total size of single column exceed the limit of hash join

U_1649754113481_5857 · 2022-04-12 09:06

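【Details】After upgrading from Doris 1.15 to StarRocks 2.0.3, a SQL query that used to run fine now fails on the new version with: total size of single column exceed the limit of hash join. Was some limit changed?
【Background】Version upgrade
【Business impact】The SQL query fails with an error
【StarRocks version】2.0.3
【Cluster size】e.g.: 3 FE (1 follower + 2 observers) + 5 BE (FE and BE co-located)
【Machine info】vCPUs/memory/NIC, e.g.: 48C/64G/10GbE
【Attachments】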

SQL being run:
set exec_mem_limit = 68719476736;
select product_id,big_label,sum(volume) volume from
((select product_id,
volume
from table1
where date >= '20220310'
and date <= '20220325')
union all
(select product_id,
volume
from table2
where date >= '20220310'
and date <= '20220325'))
sale_product left join (select pid,big_label from table3) v5 on v5.pid = product_id where big_label = 'baihou'
group by product_id,big_label
order by volume desc
limit 0,500

Right table size: 167.698 GB
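A note on the query itself: the condition where big_label = 'baihou' throws away every row whose right side is NULL, so this LEFT JOIN returns exactly what an INNER JOIN would, and the big_label filter can be moved into the table3 subquery so that only matching dimension rows reach the hash join build side. The sketch below checks that equivalence in SQLite with a handful of invented rows. The table contents are hypothetical, the query is adapted to SQLite syntax (SQLite does not accept parentheses around UNION ALL branches, and LIMIT 500 stands in for limit 0,500), and whether StarRocks 2.0.3's optimizer already performs this rewrite on its own is not established in this thread.

```python
import sqlite3

# Hypothetical miniature versions of table1/table2/table3; not real data.
SETUP = """
CREATE TABLE table1 (date TEXT, product_id INTEGER, volume INTEGER);
CREATE TABLE table2 (date TEXT, product_id INTEGER, volume INTEGER);
CREATE TABLE table3 (pid INTEGER PRIMARY KEY, big_label TEXT);
INSERT INTO table1 VALUES ('20220311', 1, 10), ('20220312', 2, 5), ('20220401', 1, 99);
INSERT INTO table2 VALUES ('20220315', 1, 7), ('20220320', 3, 4);
INSERT INTO table3 VALUES (1, 'baihou'), (2, 'other');
"""

# The query from the post, adapted to SQLite syntax.
ORIGINAL = """
SELECT product_id, big_label, SUM(volume) AS volume
FROM (
    SELECT product_id, volume FROM table1
    WHERE date >= '20220310' AND date <= '20220325'
    UNION ALL
    SELECT product_id, volume FROM table2
    WHERE date >= '20220310' AND date <= '20220325'
) AS sale_product
LEFT JOIN (SELECT pid, big_label FROM table3) AS v5 ON v5.pid = product_id
WHERE big_label = 'baihou'
GROUP BY product_id, big_label
ORDER BY volume DESC
LIMIT 500
"""

# Same result set: INNER JOIN with the label filter pushed into the dimension side.
REWRITTEN = """
SELECT product_id, big_label, SUM(volume) AS volume
FROM (
    SELECT product_id, volume FROM table1
    WHERE date >= '20220310' AND date <= '20220325'
    UNION ALL
    SELECT product_id, volume FROM table2
    WHERE date >= '20220310' AND date <= '20220325'
) AS sale_product
INNER JOIN (SELECT pid, big_label FROM table3 WHERE big_label = 'baihou') AS v5
    ON v5.pid = product_id
GROUP BY product_id, big_label
ORDER BY volume DESC
LIMIT 500
"""

def run(query: str) -> list:
    """Run a query against a fresh in-memory database loaded with SETUP."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(query).fetchall()
    finally:
        conn.close()

print(run(ORIGINAL))
print(run(REWRITTEN))
```

Product 2 (label 'other') and product 3 (no row in table3) drop out of both versions, which is the behaviour that makes the rewrite safe.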

shemplle · 2022-04-12 09:23

What is your parallelism set to?
Please run:

show variables like "%parallel_fragment_exec_instance_num%";

U_1649754113481_5857 · 2022-04-12 09:42

show variables like "%parallel_fragment_exec_instance_num%";

Parallelism is 1.

shemplle · 2022-04-12 09:43

Try setting it to half the number of CPU cores and rerun the query.

shemplle · 2022-04-13 01:56

How are things going now?

U_1649754113481_5857 · 2022-04-13 02:43

How do I change this parameter? Do I need to restart FE/BE?

shemplle · 2022-04-13 02:44

https://docs.starrocks.com/zh-cn/main/administration/Query_management#查询相关的-session-变量
Refer to this document.

U_1649754113481_5857 · 2022-04-13 02:56

For the query I set:
set exec_mem_limit = 68719476736;
set parallel_fragment_exec_instance_num = 16;

The result is an error: Memory of process exceed limit. Used: 103601595008, Limit: 107643243478. Mem usage has exceed the limit of BE

shemplle · 2022-04-13 04:15

Could you increase exec_mem_limit and check again?

shemplle · 2022-04-13 04:25

Could you please run explain + sql and explain cost + sql and share the output?

U_1649754113481_5857 · 2022-04-13 06:08

I increased it: set exec_mem_limit = 107643243478;

It still reports the same error: Memory of process exceed limit. Used: 99249110328, Limit: 107643243478. Mem usage has exceed the limit of BE
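Converting the byte counts in these messages to GiB makes the relationship easier to see. The first exec_mem_limit was exactly 64 GiB, while both errors report a limit of about 100.25 GiB, which, as the wording "Memory of process exceed limit ... limit of BE" suggests, is the BE process-wide limit rather than the per-query one. The second attempt set exec_mem_limit to exactly that process limit, so it could not grant the query any more memory than the BE process itself allows. The snippet below only does the unit conversion; all numbers are copied from the posts above.

```python
# Byte counts copied from the exec_mem_limit settings and BE errors in this thread.
GIB = 1024 ** 3  # 1 GiB = 2**30 bytes

values = {
    "exec_mem_limit (first attempt)": 68719476736,
    "exec_mem_limit (second attempt)": 107643243478,
    "process used (first error)": 103601595008,
    "process used (second error)": 99249110328,
    "process limit (both errors)": 107643243478,
}

def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB

for name, n in values.items():
    print(f"{name:32s} {to_gib(n):7.2f} GiB")
```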

U_1649754113481_5857 · 2022-04-13 06:19

shemplle · 2022-04-13 07:22

Could you share the CREATE TABLE statements for these tables?

U_1649754113481_5857 · 2022-04-13 08:20

CREATE TABLE table1 (
…
) ENGINE=OLAP
UNIQUE KEY(date, room_id, author_id, product_id)
COMMENT "OLAP"
DISTRIBUTED BY HASH(author_id) BUCKETS 24
PROPERTIES (
"replication_num" = "3",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);

CREATE TABLE table2 (
…
) ENGINE=OLAP
UNIQUE KEY(date, aweme_id, author_id, product_id)
COMMENT "OLAP"
DISTRIBUTED BY HASH(author_id) BUCKETS 24
PROPERTIES (
"replication_num" = "3",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);

CREATE TABLE table3 (
…
) ENGINE=OLAP
UNIQUE KEY(pid)
COMMENT "OLAP"
DISTRIBUTED BY HASH(pid) BUCKETS 24
PROPERTIES (
"replication_num" = "3",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);

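shemplle · 2022-04-13 08:25

Your table design is somewhat unreasonable, which makes queries scan a lot of data. I suggest partitioning table1 and table2 by day or by month and redesigning them accordingly.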
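To illustrate the suggestion: none of the three CREATE TABLE statements declares partitions, so the date range in the query cannot prune anything at the partition level. If table1 and table2 were range-partitioned by day on date, the predicate date >= '20220310' and date <= '20220325' would only need the partitions for those 16 days; with monthly partitions the same range falls inside a single March 2022 partition. The sketch below merely enumerates which daily partitions such a range maps to. The pYYYYMMDD naming is a hypothetical convention, and the actual pruning is done by the query planner, not by client code.

```python
from datetime import date, timedelta

def daily_partitions(start: date, end: date) -> list[str]:
    """Names of hypothetical daily partitions (pYYYYMMDD) covering [start, end] inclusive."""
    days = (end - start).days
    return [f"p{(start + timedelta(days=i)):%Y%m%d}" for i in range(days + 1)]

# The date range used in the query above.
scanned = daily_partitions(date(2022, 3, 10), date(2022, 3, 25))
print(len(scanned), scanned[0], scanned[-1])
```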

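U_1649754113481_5857 · 2022-04-13 08:47

After I set the parallelism, the hash join error no longer appeared; instead it reported exceeding the memory limit. It seems the exec_mem_limit parameter isn't taking effect?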

U_1649754113481_5857 · 2022-04-13 08:48

It ran fine on the old version, but fails on 2.0.3. Does the new version add a limit on hash join?

shemplle · 2022-04-13 08:51

Was more data loaded into the tables afterwards?

U_1649754113481_5857 · 2022-04-13 08:53

Data is loaded into this table continuously.

U_1649754113481_5857 · 2022-04-13 09:11

Our problem is that a column of the joined dimension table exceeds 4G, and that kind of dimension table has no notion of a day.
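A rough way to reason about the 4G figure, and about why raising parallel_fragment_exec_instance_num made the hash join error disappear: if the limit applies to the byte size of one column inside a single join instance's hash table (taken here as 2**32 bytes, an assumption based on the 4G mentioned above), then what matters is how much of a wide string column each instance has to hold. Assuming a partitioned (shuffle) join with rows spread evenly, each of N instances builds roughly 1/N of the right side; under a broadcast join every instance would build the whole right side and more parallelism would not help. The row count and average value length below are made-up placeholders, not measurements from this thread, and the estimate ignores per-value overhead such as offsets.

```python
# Assumed per-column limit: 4 GiB = 2**32 bytes (interpretation of the "4G" above).
ASSUMED_COLUMN_LIMIT = 2 ** 32

def column_bytes(rows: int, avg_value_bytes: float) -> float:
    """Approximate payload size of one string column: rows x average value length."""
    return rows * avg_value_bytes

def per_instance_bytes(total_bytes: float, instances: int, broadcast: bool = False) -> float:
    """Bytes of that column a single join instance must hold.

    Shuffle join (assumed even hash distribution): each instance gets ~1/instances.
    Broadcast join: every instance holds the full build side.
    """
    if instances < 1:
        raise ValueError("instances must be >= 1")
    return total_bytes if broadcast else total_bytes / instances

def exceeds_limit(n_bytes: float, limit: int = ASSUMED_COLUMN_LIMIT) -> bool:
    """True if the estimated column size is above the assumed limit."""
    return n_bytes > limit

# Hypothetical dimension column: 200 million rows, 40 bytes per value on average.
total = column_bytes(200_000_000, 40)  # 8e9 bytes, about 7.45 GiB
for n in (1, 4, 16):
    share = per_instance_bytes(total, n)
    print(n, round(share / 2**30, 2), exceeds_limit(share))
```

With these placeholder numbers, a single instance would be over the assumed limit while 4 or 16 shuffle instances would each stay under it; the same check with broadcast=True stays over the limit at any parallelism.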