2.5.5版本无法跨库colocate join

【详述】不同数据库指定相同Colocation Group无法Colocate Join
【背景】
【业务影响】性能影响
【StarRocks版本】2.5.5
【集群规模】3fe +4be
【机器信息】48C/250G/万兆
【联系方式】
【附件】
表结构如下:
– dwd.dwd_site_scan_dtl definition

CREATE TABLE dwd.dwd_site_scan_dtl (
ship_id int(11) NOT NULL COMMENT “”,
sub_ship_id bigint(20) NOT NULL COMMENT “”,
scan_site int(11) NOT NULL COMMENT “”,
scan_typ varchar(10) NOT NULL ,
ins_db_tm datetime NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(ship_id, sub_ship_id)
COMMENT “OLAP”
DISTRIBUTED BY HASH(ship_id) BUCKETS 48
PROPERTIES (
“replication_num” = “3”,
“colocate_with” = “ship_id48”,
“dynamic_partition.enable” = “true”,
“dynamic_partition.time_unit” = “DAY”,
“dynamic_partition.time_zone” = “Asia/Shanghai”,
“dynamic_partition.start” = “-60”,
“dynamic_partition.end” = “20”,
“dynamic_partition.prefix” = “p”,
“dynamic_partition.buckets” = “48”,
“dynamic_partition.history_partition_num” = “0”,
“in_memory” = “false”,
“storage_format” = “DEFAULT”,
“enable_persistent_index” = “true”,
“replicated_storage” = “true”,
“compression” = “LZ4”
);

– ods.reg_bill_info_wid definition

CREATE TABLE ods.reg_bill_info_wid (
unit_tm datetime NOT NULL COMMENT “”,
ship_id int(11) NOT NULL COMMENT " ",
ins_db_tm datetime NULL COMMENT “”
) ENGINE=OLAP
DUPLICATE KEY(unit_tm, ship_id)
COMMENT “”
DISTRIBUTED BY HASH(ship_id) BUCKETS 48
PROPERTIES (
“replication_num” = “3”,
“colocate_with” = “ship_id48”,
“in_memory” = “false”,
“storage_format” = “DEFAULT”,
“enable_persistent_index” = “false”,
“compression” = “LZ4”
);

colocation_group信息

版本确认2.5.5:

表中数据量:
image
image

收集统计信息:
analyze table dwd.dwd_site_scan_dtl;
analyze table ods.reg_bill_info_wid;

查询语句如下:
SQL1:
explain
SELECT count(*)
FROM ods.reg_bill_info_wid AS a
LEFT OUTER JOIN dwd.dwd_site_scan_dtl AS d
ON a.ship_id = d.ship_id
WHERE a.ins_db_tm BETWEEN CURRENT_DATE() - INTERVAL 20 DAY AND now();

SQL2:
explain
SELECT count(*)
FROM ods.ods_site_scan_dtl AS a
LEFT OUTER JOIN dwd.dwd_site_scan_dtl AS d
ON a.ship_id = d.ship_id
WHERE a.ins_db_tm BETWEEN CURRENT_DATE() - INTERVAL 20 DAY AND now()
;
注:语句都有实际执行过 :

以上两种SQL执行计划均如图:

结论:试了很多查询和表,统计信息也都收集了,即使是相同的一张表定义,分别创建在两个不同库中,然后关联也不会本地join,不是说2.5.4版本开始支持跨库本地join吗?难道不支持?还是我打开的方式不对?

执行一下show proc “/colocation_group”; 结果发出来看下

升级后,表没有重建 ColocateGroup?

为个表是升级前创建的?

ship_id48 这两个 colocate group 应该都是升级之前建的吧,因为原来的版本不支持跨库 colocate group,这个升级上来之后不会形成跨库的colocate group,因为需要数据重新分布。你这个问题可以这么处理,
把其中一个库的 ship_id48 colocate group 的表从这个 colocate group 中都移除(这个时候这个历史的 colocate group 就别删除了),然后再把这些表都加到 ship_id48的 colocate group 中,这样它就会自动和另外库的 ship_id48 colocate group 一起 colocate。或者你如果业务允许的,可以全部新建新的同名 colocate group,这样就自动 colocate 了。

确实有一张表是升级之前创建的,另外一张表是升级之后创建的,我重建一下表试试!

我现在重新新建了两张不同库表,采用相同的最新命名的组,但是groupid为啥是不一样的呢?还是无法本地join ,难道是我升级的有问题?但是数据库未见什么异常啊,还是说我建表有问题?


表结构如下,表中都有数据,也都收集了统计信息:
CREATE TABLE dwd.dwd_site_scan_dtl_test (
ship_id int(11) NOT NULL COMMENT " ",
sub_ship_id bigint(20) NOT NULL COMMENT " "

) ENGINE=OLAP
DUPLICATE KEY(ship_id, sub_ship_id)
COMMENT “OLAP”
DISTRIBUTED BY HASH(ship_id) BUCKETS 48
PROPERTIES (
“replication_num” = “3”,
“colocate_with” = “ship_id_public”,
“in_memory” = “false”,
“storage_format” = “DEFAULT”,
“enable_persistent_index” = “true”,
“replicated_storage” = “true”,
“compression” = “LZ4”
);
CREATE TABLE ods.reg_bill_info_test (
unit_tm datetime NOT NULL COMMENT " ",
ship_id int(11) NOT NULL COMMENT " ",
ins_db_tm datetime NULL COMMENT " ",

) ENGINE=OLAP
DUPLICATE KEY(unit_tm, ship_id)
COMMENT " "
DISTRIBUTED BY HASH(ship_id) BUCKETS 48
PROPERTIES (
“replication_num” = “3”,
“colocate_with” = “ship_id_public”,
“in_memory” = “false”,
“storage_format” = “DEFAULT”,
“enable_persistent_index” = “false”,
“compression” = “LZ4”
);

写入数据了吗,如果都是空表的话,不会Colocate的

写入数据可以了吗?

数据都有的,还是不可以

@U_1653974266322_1581抱歉,我测试了一下,应该是中间的 pr 把这个跨库 colocate 的逻辑破坏了,这里有个 bug,我 fix 一下,再同步您这边

@U_1653974266322_1581 具体看了下,这个是对 group name 中的下划线处理的不好,当前版本可以临时规避,你用不带下划线的 group name ,比如 colocate_with=“shipIdPublic”,可以进行跨库 colocate