left join查询很慢

【详述】问题详细描述
商品表 item2 明细模型, 大数据量,按时间+cat分区
品牌表 fix_brands_v2 主键模型,没有分区(300w记录左右)
单独查item2 的top 100,大概1.45秒
left join fix_brands_v2后,需要 20秒左右
单独查询item2 12个月 top 100商品
mysql> select item_id, max(cat1) cat1, max(brand_id) brand_id, sum(sales) as sales, sum(sold) as sold from item2 t1
->
-> where cat1=‘16’ and time in (‘2022-01-01’, ‘2022-02-01’, ‘2022-03-01’, ‘2022-04-01’, ‘2022-05-01’, ‘2022-06-01’, ‘2022-07-01’, ‘2022-08-01’, ‘2022-09-01’, ‘2022-10-01’, ‘2022-11-01’, ‘2022-12-01’) group by item_id order by sales desc limit 100;
100 rows in set (1.75 sec)

left join fix_brands_v2
mysql> select item_id, max(cat1) cat1, max(brand_id) brand_id, sum(sales) as sales, sum(sold) as sold from item2 t1
-> left join fix_brands_v2 t4 on t1.brand_id = t4.bid and t4.plat = ‘taobao’
-> where cat1=‘16’ and time in (‘2022-01-01’, ‘2022-02-01’, ‘2022-03-01’, ‘2022-04-01’, ‘2022-05-01’, ‘2022-06-01’, ‘2022-07-01’, ‘2022-08-01’, ‘2022-09-01’, ‘2022-10-01’, ‘2022-11-01’, ‘2022-12-01’) group by item_id order by sales desc limit 100;
100 rows in set (24.53 sec)
【背景】做过哪些操作?
【业务影响】
【StarRocks版本】2.4.0
【集群规模】3fe(1 follower+2observer)+6be(fe与be单独部署)
【机器信息】CPU虚拟核/内存/网卡,例如:48C/64G/万兆
【联系方式】为了在解决问题过程中能及时联系到您获取一些日志信息,请补充下您的联系方式,社区群12-金谡–jinsu@moojing.com
【附件】

  • Profile信息,如何获取profile
    left join fix_brands_v2 的
    | Explain String |
    ±--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | PLAN FRAGMENT 0 |
    | OUTPUT EXPRS:1: item_id | 34: max | 35: max | 36: sum | 37: sum |
    | PARTITION: UNPARTITIONED |
    | |
    | RESULT SINK |
    | |
    | 10:MERGING-EXCHANGE |
    | limit: 100 |
    | |
    | PLAN FRAGMENT 1 |
    | OUTPUT EXPRS: |
    | PARTITION: HASH_PARTITIONED: 1: item_id |
    | |
    | STREAM DATA SINK |
    | EXCHANGE ID: 10 |
    | UNPARTITIONED |
    | |
    | 9:TOP-N |
    | | order by: <slot 36> 36: sum DESC |
    | | offset: 0 |
    | | limit: 100 |
    | | |
    | 8:AGGREGATE (update finalize) |
    | | output: max(11: cat1), max(9: brand_id), sum(8: sales), sum(7: sold) |
    | | group by: 1: item_id |
    | | |
    | 7:EXCHANGE |
    | |
    | PLAN FRAGMENT 2 |
    | OUTPUT EXPRS: |
    | PARTITION: RANDOM |
    | |
    | STREAM DATA SINK |
    | EXCHANGE ID: 07 |
    | HASH_PARTITIONED: 1: item_id |
    | |
    | 6:Project |
    | | <slot 1> : 1: item_id |
    | | <slot 7> : 7: sold |
    | | <slot 8> : 8: sales |
    | | <slot 9> : 9: brand_id |
    | | <slot 11> : 11: cat1 |
    | | |
    | 5:HASH JOIN |
    | | join op: RIGHT OUTER JOIN (BUCKET_SHUFFLE) |
    | | colocate: false, reason: |
    | | equal join conjunct: 20: bid = 9: brand_id |
    | | |
    | |----4:EXCHANGE |
    | | |
    | 1:Project |
    | | <slot 20> : 20: bid |
    | | |
    | 0:OlapScanNode |
    | TABLE: fix_brands_v2 |
    | PREAGGREGATION: ON |
    | PREDICATES: 21: plat = ‘taobao’ |
    | partitions=1/1 |
    | rollup: fix_brands_v2 |
    | tabletRatio=32/32 |
    | tabletList=5990349,5990353,5990357,5990361,5990365,5990369,5990373,5990377,5990381,5990385 … |
    | cardinality=2902615 |
    | avgRowSize=16.014166 |
    | numNodes=0 |
    | |
    | PLAN FRAGMENT 3 |
    | OUTPUT EXPRS: |
    | PARTITION: RANDOM |
    | |
    | STREAM DATA SINK |
    | EXCHANGE ID: 04 |
    | BUCKET_SHUFFLE_HASH_PARTITIONED: 9: brand_id |
    | |
    | 3:Project |
    | | <slot 1> : 1: item_id |
    | | <slot 7> : 7: sold |
    | | <slot 8> : 8: sales |
    | | <slot 9> : 9: brand_id |
    | | <slot 11> : 11: cat1 |
    | | |
    | 2:OlapScanNode |
    | TABLE: item2 |
    | PREAGGREGATION: ON |
    | PREDICATES: 11: cat1 = 16, 2: time IN (‘2022-01-01’, ‘2022-02-01’, ‘2022-03-01’, ‘2022-04-01’, ‘2022-05-01’, ‘2022-06-01’, ‘2022-07-01’, ‘2022-08-01’, ‘2022-09-01’, ‘2022-10-01’, ‘2022-11-01’, ‘2022-12-01’) |
    | partitions=12/2016 |
    | rollup: item2 |
    | tabletRatio=384/384 |
    | tabletList=67795,67799,67803,67807,67811,67815,67819,67823,67827,67831 … |
    | cardinality=243717 |
    | avgRowSize=40.053226 |
    | numNodes=0 |
    ±--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    91 rows in set (0.01 sec)

  • 并行度:show variables like ‘%parallel_fragment_exec_instance_num%’;
    show variables like ‘%pipeline_dop%’;

  • pipeline是否开启:show variables like ‘%pipeline%’;

  • 执行计划:explain costs + sql

  • be节点cpu和内存使用率截图

单独查询
| Explain String |
±--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:1: item_id | 20: max | 21: max | 22: sum | 23: sum |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 5:MERGING-EXCHANGE |
| limit: 100 |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: 1: item_id |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 05 |
| UNPARTITIONED |
| |
| 4:TOP-N |
| | order by: <slot 22> 22: sum DESC |
| | offset: 0 |
| | limit: 100 |
| | |
| 3:AGGREGATE (update finalize) |
| | output: max(11: cat1), max(9: brand_id), sum(8: sales), sum(7: sold) |
| | group by: 1: item_id |
| | |
| 2:EXCHANGE |
| |
| PLAN FRAGMENT 2 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 02 |
| HASH_PARTITIONED: 1: item_id |
| |
| 1:Project |
| | <slot 1> : 1: item_id |
| | <slot 7> : 7: sold |
| | <slot 8> : 8: sales |
| | <slot 9> : 9: brand_id |
| | <slot 11> : 11: cat1 |
| | |
| 0:OlapScanNode |
| TABLE: item2 |
| PREAGGREGATION: ON |
| PREDICATES: 11: cat1 = 16, 2: time IN (‘2022-01-01’, ‘2022-02-01’, ‘2022-03-01’, ‘2022-04-01’, ‘2022-05-01’, ‘2022-06-01’, ‘2022-07-01’, ‘2022-08-01’, ‘2022-09-01’, ‘2022-10-01’, ‘2022-11-01’, ‘2022-12-01’) |
| partitions=12/2016 |
| rollup: item2 |
| tabletRatio=384/384 |
| tabletList=67795,67799,67803,67807,67811,67815,67819,67823,67827,67831 … |
| cardinality=243717 |
| avgRowSize=40.053226 |
| numNodes=0 |
±--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
54 rows in set (0.01 sec)

发个profile看下

两个查询的profile都放上来了呢

您发的是explain,可以将profile复制到文件中发一下。item2 和 fix_brands_v2 表的数据量有多大。先对t
1进行条件过滤然后group by 再join执行效果如何

item2 (40.6 KB)

brand_v2 (76.9 KB)

在查询的范围内,item2 172310974条记录

fix_brands_v2 5805230条记录

先对t1过滤,再join ,执行时间11.3秒; 这个也很慢,过滤完只剩100条数据了呢

先做group by再join呢