How can I optimize this query to make it faster?

凉城听暖 · 2022-07-01 10:33

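[Details] About 20 million rows in total across 7 tables; the query takes about 1000 ms. There seems to be plenty of room for optimization, but I don't know where to start. Should I tune the SQL, change the configuration, or add machines?
[Business impact]
[StarRocks version] 2.2.2
[Cluster size] 2 FE (1 follower + 1 observer) + 3 BE (the observer FE is co-located with a BE)
[Machine info] 4 machines, each with 8 cores and 16 GB RAM
[Attachments]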

Profile information
profile.txt (135.4 KB)

tablet_size_avg for the two large tables: 150 MB and 125 MB; bucket count: 4

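Parallelism: show variables like '%parallel_fragment_exec_instance_num%';
+-------------------------------------+-------+
| Variable_name                       | Value |
+-------------------------------------+-------+
| parallel_fragment_exec_instance_num | 1     |
+-------------------------------------+-------+
(Changing the degree of parallelism does not make the query any faster.)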
Whether CBO is enabled: show variables like '%cbo%';
+-------------------------------------+-------+
| Variable_name                       | Value |
+-------------------------------------+-------+
| cbo_cte_reuse                       | false |
| cbo_enable_dp_join_reorder          | true  |
| cbo_enable_greedy_join_reorder      | true  |
| cbo_enable_low_cardinality_optimize | true  |
| cbo_enable_replicated_join          | true  |
| cbo_max_reorder_node_use_dp         | 10    |
| cbo_max_reorder_node_use_exhaustive | 4     |
| cbo_use_correlated_join_estimate    | true  |
+-------------------------------------+-------+
CPU and memory utilization on the BE nodes is low.

Profile optimization: StarRocks Profile Analysis and Optimization Guide

shemplle · 2022-07-01 11:09

set broadcast_row_limit = 100000;
set global pipeline_dop = 0;
set enable_exchange_pass_through=true;
Changing the table to use a single bucket should be enough. Run the query again after that and see how it goes.
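
Before re-running the query, it may be worth confirming that the three settings above are actually in effect in the current session. A minimal sketch using only SHOW VARIABLES, the same statement used earlier in this thread; the assumption here is that pipeline_dop only matters when the pipeline engine is enabled, so the first statement lists every pipeline-related variable:

```sql
-- List all pipeline-related variables (assumption: pipeline_dop is only
-- relevant when the pipeline engine itself is turned on).
SHOW VARIABLES LIKE '%pipeline%';

-- Confirm the other two settings took effect in this session.
SHOW VARIABLES LIKE 'broadcast_row_limit';
SHOW VARIABLES LIKE 'enable_exchange_pass_through';
```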

shemplle · 2022-07-01 11:15

Could you please post the output of explain costs?
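
For readers following along: a cost-annotated plan is requested by prefixing the statement with EXPLAIN COSTS. A minimal sketch, using the aggregation subquery that appears later in this thread as a stand-in, since the full seven-table statement is not shown here:

```sql
-- Stand-in query: the aggregation subquery quoted later in this thread.
-- Replace it with the full statement being tuned.
EXPLAIN COSTS
SELECT
  a.customer_id,
  group_concat(b.wx_name) AS tag_names
FROM scrm_user_tag a
LEFT JOIN wwx_corp_tag b ON a.tag_id = b.id
GROUP BY a.customer_id;
```

The point of the costs output is to compare the optimizer's estimated row counts and column statistics against what the tables really contain.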

凉城听暖 · 2022-07-01 11:17

OK, that improved things by a few tens of milliseconds.

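shemplle · 2022-07-01 11:18

It looks like most of the time is spent in the agg stage, so there is not much room left for tuning; you may need to change the SQL logic to speed it up. Would you mind posting the SQL, perhaps after desensitizing it?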

shemplle · 2022-07-01 11:19

First, run analyze full table table_name; on all of the tables.
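
As a concrete sketch of that step, here it is applied to the two tables that are named in this thread; the other five tables are not named here, so they would need the same statement with their own names:

```sql
-- Collect full statistics for the two tables named in this thread.
ANALYZE FULL TABLE scrm_user_tag;
ANALYZE FULL TABLE wwx_corp_tag;

-- Repeat for each remaining table in the query, then re-run
-- EXPLAIN COSTS to check that the statistics are no longer defaults.
```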

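凉城听暖 · 2022-07-01 11:22

Got it. Indeed, many of the statistics there are the default values of 1 and 0; only id is not. I have now added the collection job following the tutorial.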

shemplle · 2022-07-01 11:31

    select
      a.customer_id,
      group_concat(b.wx_name) tag_names
    from
      scrm_user_tag a
      left join wwx_corp_tag b on a.tag_id = b.id
    group by
      a.customer_id
  ) b on a.id = b.customer_id

Please run this SQL on its own and see how long it takes.

凉城听暖 · 2022-07-01 12:27

select
  a.customer_id,
  group_concat(b.wx_name) tag_names
from
  scrm_user_tag a
  left join wwx_corp_tag b on a.tag_id = b.id
group by
  a.customer_id;
The full query takes about 1 second; this one takes about 3 seconds.

shemplle · 2022-07-01 12:39

Run it a few more times to get a stable number. Could you also post the profile for this one?

shemplle · 2022-07-04 02:32

There is not much to optimize in this one either; most of the time is spent in the exchange and agg stages.

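shemplle · 2022-07-04 03:03

We plan to support multi-table materialized views this year. Once that is available, you could consider turning this statement into a materialized view, which should give a large improvement.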

凉城听暖 · 2022-07-04 03:01

So upgrading the hardware would not help either? OK, got it, thanks.

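shemplle · 2022-07-04 03:03

You could set up monitoring and check whether the cluster hits a resource bottleneck while this SQL is running. If it does, my understanding is that adding machines would still help.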