Query性能优化

不惑123 · 2023年03月7日 03:21

【详述】问题详细描述
执行一个查询SQL，耗时3s左右，耗时太长，需要缩短执行时长，优化查询性能。
看Profile，显示Exchange Operator的Network耗时比较长，请问有什么优化建议吗？
单分区4000万数据量
DDL：
CREATE TABLE dwd_game_event_log_d_2 (
dt date NULL COMMENT “分区”,
event_no varchar(65533) NULL COMMENT " 根据事件相关原始日志全字段生成MD5(32位小写)，用于去重",
event_tp varchar(65533) NULL COMMENT “事件类型”,
event_id varchar(65533) NULL COMMENT " 事件id",
server_time datetime NULL COMMENT " 服务器时间, yyyy-MM-dd hh:mm:ss格式",
…
INDEX i_op_id (op_id) USING BITMAP COMMENT ‘’,
INDEX event_tp (event_tp) USING BITMAP,
INDEX event_id (event_id) USING BITMAP
) ENGINE=OLAP
DUPLICATE KEY(dt, event_no)
COMMENT “事件表”
PARTITION BY RANGE(dt)
(PARTITION p20230109 VALUES [(“2023-01-09”), (“2023-01-10”)),
PARTITION p20230110 VALUES [(“2023-01-10”), (“2023-01-11”)),
PARTITION p20230111 VALUES [(“2023-01-11”), (“2023-01-12”)),
PARTITION p20230112 VALUES [(“2023-01-12”), (“2023-01-13”)))
DISTRIBUTED BY HASH(event_no) BUCKETS 15
PROPERTIES (
“replication_num” = “3”,
“bloom_filter_columns” = “server_time, pre_account, server_id”,
“in_memory” = “false”,
“storage_format” = “DEFAULT”,
“enable_persistent_index” = “false”,
“compression” = “LZ4”
);
SQL：
select
376,op_id,pre_account,account,role_id,server_id
,min(coalesce(first_stime,first_time)) first_time
,min(reg_stime) act_time
,min(f_pay_stime) first_pay_time
,min(role_first_pay_stime) role_first_pay_time
,max(if(coalesce(first_opid,’’) <> ‘’,first_opid,null)) first_opid
,max(if(coalesce(first_server_id,’’) <> ‘’,first_server_id,null)) first_server_id
,max(if(coalesce(first_ip,’’) <> ‘’,first_ip,null)) first_ip
,max(if(coalesce(reg_ip,’’) <> ‘’,reg_ip,null)) act_ip
,max(if(coalesce(first_lang,’’) <> ‘’,first_lang,null)) first_lang
,max(if(coalesce(role_first_lang,’’) <> ‘’,role_first_lang,null)) role_first_lang
,max(role_level) role_level
,max(role_vip) role_vip
,max(if(coalesce(role_name,’’) <> ‘’ and length(role_name) >= 1,role_name,null)) role_name
,max(if(coalesce(role_career,’’) <> ‘’ and length(role_career) >= 1,role_career,null)) role_career
,count(distinct if(lower(event_id) = ‘rolelogin’,event_no,null)) login_num
from dwd_game_event_log_d_2 t11
where 1=1
and dt=‘2023-01-12’
and TO_DATE(server_time) = ‘2023-01-12’
and (
(event_tp = ‘kpi’ and event_id in (‘Login’,‘Logout’,‘Recharge’,‘PayLog’))
or (event_tp = ‘sdk’ and event_id in (‘Enter’))
)
and length(pre_account) < 200
group by op_id,pre_account,account,role_id,server_id

【StarRocks版本】2.5.2
【集群规模】3fe（3 follower）+6be（fe与be混部）
【机器信息】48C/64G/千兆

【附件】

Profile信息

图片886×864 50.1 KB

profile.yaml (67.4 KB)
并行度：show variables like ‘%parallel_fragment_exec_instance_num%’;

show variables like ‘%pipeline_dop%’;
pipeline是否开启：show variables like ‘%pipeline%’;
执行计划：
explain.txt (21.3 KB)

不惑123 · 2023年03月7日 04:14

针对Exchange Operator的Network耗时比较长，我尝试增加分桶数量，增加并行度，来实现缩短__MAX_OF_NetworkTime的值。我把分桶数从15增加到30，但结果是__MIN_OF_NetworkTime的时间增加了

不惑123 · 2023年03月7日 04:19

还有一个问题，我手动设置pipeline的并行度，设置成12，但是看profile，pipeline最多只有4个，看Fragment2的每个pipeline的算子，这4个pipeline似乎有前后依赖逻辑。我理解的pipeline并行执行，是同一个fragment内的pipeline之间没有依赖关系，可以同时执行。请问pipeline引擎可以并行执行吗？

不惑123 · 2023年03月7日 11:26

这个WaitTime耗时怎么优化呢？

yuchen1019 · 2023年03月7日 12:19

这个看着是数据倾斜导致的 min和max差别太大了可以执行下这个工具看下tablet分布是否均衡工具： tools.tar.gz (24.0 MB)
下载完成后编辑config.ini信息，然后执行

./healthy_report config.ini

可以获取以下信息：

以上信息中关注下标准差那列，如果异常高，则表示该表需要重新选取hash键，建表不合理有严重的数据倾斜问题。

除此还可以关注下tablet数据平均值是否合理，一般建议该值在100MB-1GB之间，对于表总体数据量比较小时可以容忍小一点，数据量大的表建议在1G左右，如果该值与建议值差异较大，可以适当调整建表语句的bucket数量大小。

不惑123 · 2023年03月8日 07:50

yuchen1019 · 2023年03月8日 09:03

tools.tar(1).gz (23.9 MB) 重新改了下试一下这个

不惑123 · 2023年03月8日 09:24

标准差那列值挺高的，但这个值高是不是因为有很多空分区？我的sql只查询2023-01-12这个分区，我看这个分区的每个tablet的大小都一样的。

yuchen1019 · 2023年03月8日 12:08

CREATE TABLE dwd_game_event_log_d_2 (
event_tp varchar(65533) NULL COMMENT “事件类型”,
event_id varchar(65533) NULL COMMENT " 事件id",
dt date NULL COMMENT “分区”,
event_no varchar(65533) NULL COMMENT " 根据事件相关原始日志全字段生成MD5(32位小写)，用于去重",
server_time datetime NULL COMMENT " 服务器时间, yyyy-MM-dd hh:mm:ss格式",
…
INDEX i_op_id ( op_id ) USING BITMAP COMMENT ‘’,
INDEX event_tp ( event_tp ) USING BITMAP,
INDEX event_id ( event_id ) USING BITMAP
) ENGINE=OLAP
DUPLICATE KEY( event_tp , event_id )
COMMENT “事件表”
PARTITION BY RANGE( dt )
(PARTITION p20230109 VALUES [(“2023-01-09”), (“2023-01-10”)),
PARTITION p20230110 VALUES [(“2023-01-10”), (“2023-01-11”)),
PARTITION p20230111 VALUES [(“2023-01-11”), (“2023-01-12”)),
PARTITION p20230112 VALUES [(“2023-01-12”), (“2023-01-13”)))
DISTRIBUTED BY HASH( event_no ) BUCKETS 15
PROPERTIES (
“replication_num” = “3”,
“bloom_filter_columns” = “server_time, pre_account, server_id”,
“in_memory” = “false”,
“storage_format” = “DEFAULT”,
“enable_persistent_index” = “false”,
“compression” = “LZ4”
)

6个be是吗？可以吧分桶数改成30 然后建表修改主键为查询中的条件列：event_tp 、 event_id
然后还可以把sql中的TO_DATE(server_time) = ‘2023-01-12’ 修改成 server_time = ‘2023-01-12’ ，我看你本来存的就是datetime 可以重新建一张表试一下查询速率

不惑123 · 2023年03月10日 02:52

我按照这种方式尝试，性能上基本没有变化。
DDL:
CREATE TABLE dwd_game_event_log_d1 (
dt date NULL COMMENT “分区”,
event_tp varchar(20) NULL COMMENT “事件类型”,
event_id varchar(20) NULL COMMENT " 事件id",
server_time datetime NULL COMMENT " 服务器时间, yyyy-MM-dd hh:mm:ss格式",
event_no varchar(65533) NULL COMMENT " 根据事件相关原始日志全字段生成MD5(32位小写)，用于去重",
event_info varchar(65533) NULL COMMENT " 事件特有属性拼接",
cn_time datetime NULL COMMENT " 北京时间, yyyy-MM-dd hh:mm:ss格式，用于默认计算，分区时间",
…
reg_opg_id varchar(65533) NULL COMMENT “角色首登opgame_id”,
INDEX i_op_id (op_id) USING BITMAP COMMENT ‘’
) ENGINE=OLAP
DUPLICATE KEY(dt, event_tp, event_id, server_time)
COMMENT “游戏事件表”
PARTITION BY RANGE(dt)
(PARTITION p20230106 VALUES [(“2023-01-06”), (“2023-01-07”)),
PARTITION p20230107 VALUES [(“2023-01-07”), (“2023-01-08”)),
…
PARTITION p20230302 VALUES [(“2023-03-02”), (“2023-03-03”)),
PARTITION p20230303 VALUES [(“2023-03-03”), (“2023-03-04”)),
PARTITION p20230304 VALUES [(“2023-03-04”), (“2023-03-05”)))
DISTRIBUTED BY HASH(event_tp, event_id, server_time) BUCKETS 30
PROPERTIES (
“replication_num” = “3”,
“bloom_filter_columns” = “server_time, pre_account, server_id”,
“in_memory” = “false”,
“storage_format” = “DEFAULT”,
“enable_persistent_index” = “false”,
“compression” = “LZ4”
);
Query:
select
376,op_id,pre_account,account,role_id,server_id
,min(coalesce(first_stime,first_time)) first_time
,min(reg_stime) act_time
,min(f_pay_stime) first_pay_time
,min(role_first_pay_stime) role_first_pay_time
,max(if(coalesce(first_opid,’’) <> ‘’,first_opid,null)) first_opid
,max(if(coalesce(first_server_id,’’) <> ‘’,first_server_id,null)) first_server_id
,max(if(coalesce(first_ip,’’) <> ‘’,first_ip,null)) first_ip
,max(if(coalesce(reg_ip,’’) <> ‘’,reg_ip,null)) act_ip
,max(if(coalesce(first_lang,’’) <> ‘’,first_lang,null)) first_lang
,max(if(coalesce(role_first_lang,’’) <> ‘’,role_first_lang,null)) role_first_lang
,max(role_level) role_level
,max(role_vip) role_vip
,max(if(coalesce(role_name,’’) <> ‘’ and length(role_name) >= 1,role_name,null)) role_name
,max(if(coalesce(role_career,’’) <> ‘’ and length(role_career) >= 1,role_career,null)) role_career
– ,sum(case when cast(coalesce(event_info[‘onlinetime’],0) as int) > 86400 then 86400 else cast(coalesce(event_info[‘onlinetime’],0) as int) end) onlinetime
,count(distinct if(lower(event_id) = ‘rolelogin’,event_no,null)) login_num
from dwd_game_event_log_d1 t11
where 1=1
and dt=‘2023-01-12’
and TO_DATE(server_time) = ‘2023-01-12’
and (
(event_tp = ‘kpi’ and event_id in (‘RoleLogin’,‘RoleLogout’,‘PayRecharge’,‘CollectPayLog’))
or (event_tp = ‘sdk’ and event_id in (‘EnterGame’))
)
and length(pre_account) < 200
group by op_id,pre_account,account,role_id,server_id;
Explain:
explain1.txt (4.5 KB)
Profile：
profile1.txt (68.7 KB)