【详述】不同数据库指定相同Colocation Group无法Colocate Join
【背景】
【业务影响】性能影响
【StarRocks版本】2.5.5
【集群规模】3fe +4be
【机器信息】48C/250G/万兆
【联系方式】
【附件】
表结构如下:
– dwd.dwd_site_scan_dtl definition
CREATE TABLE dwd.dwd_site_scan_dtl (
ship_id int(11) NOT NULL COMMENT “”,
sub_ship_id bigint(20) NOT NULL COMMENT “”,
scan_site int(11) NOT NULL COMMENT “”,
scan_typ varchar(10) NOT NULL ,
ins_db_tm datetime NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(ship_id, sub_ship_id)
COMMENT “OLAP”
DISTRIBUTED BY HASH(ship_id) BUCKETS 48
PROPERTIES (
“replication_num” = “3”,
“colocate_with” = “ship_id48”,
“dynamic_partition.enable” = “true”,
“dynamic_partition.time_unit” = “DAY”,
“dynamic_partition.time_zone” = “Asia/Shanghai”,
“dynamic_partition.start” = “-60”,
“dynamic_partition.end” = “20”,
“dynamic_partition.prefix” = “p”,
“dynamic_partition.buckets” = “48”,
“dynamic_partition.history_partition_num” = “0”,
“in_memory” = “false”,
“storage_format” = “DEFAULT”,
“enable_persistent_index” = “true”,
“replicated_storage” = “true”,
“compression” = “LZ4”
);
– ods.reg_bill_info_wid definition
CREATE TABLE ods.reg_bill_info_wid (
unit_tm datetime NOT NULL COMMENT “”,
ship_id int(11) NOT NULL COMMENT " ",
ins_db_tm datetime NULL COMMENT “”
) ENGINE=OLAP
DUPLICATE KEY(unit_tm, ship_id)
COMMENT “”
DISTRIBUTED BY HASH(ship_id) BUCKETS 48
PROPERTIES (
“replication_num” = “3”,
“colocate_with” = “ship_id48”,
“in_memory” = “false”,
“storage_format” = “DEFAULT”,
“enable_persistent_index” = “false”,
“compression” = “LZ4”
);
colocation_group信息:
版本确认2.5.5:
表中数据量:


收集统计信息:
analyze table dwd.dwd_site_scan_dtl;
analyze table ods.reg_bill_info_wid;
查询语句如下:
SQL1:
explain
SELECT count(*)
FROM ods.reg_bill_info_wid AS a
LEFT OUTER JOIN dwd.dwd_site_scan_dtl AS d
ON a.ship_id = d.ship_id
WHERE a.ins_db_tm BETWEEN CURRENT_DATE() - INTERVAL 20 DAY AND now();
SQL2:
explain
SELECT count(*)
FROM ods.ods_site_scan_dtl AS a
LEFT OUTER JOIN dwd.dwd_site_scan_dtl AS d
ON a.ship_id = d.ship_id
WHERE a.ins_db_tm BETWEEN CURRENT_DATE() - INTERVAL 20 DAY AND now()
;
注:语句都有实际执行过 :
以上两种SQL执行计划均如图:
结论:试了很多查询和表,统计信息也都收集了,即使是相同的一张表定义,分别创建在两个不同库中,然后关联也不会本地join,不是说2.5.4版本开始支持跨库本地join吗?难道不支持?还是我打开的方式不对?





