【详述】问题详细描述
商品表 item2 明细模型, 大数据量,按时间+cat分区
品牌表 fix_brands_v2 主键模型,没有分区(300w记录左右)
单独查item2 的top 100,大概1.45秒
left join fix_brands_v2后,需要 20秒左右
单独查询item2 12个月 top 100商品
mysql> select item_id, max(cat1) cat1, max(brand_id) brand_id, sum(sales) as sales, sum(sold) as sold from item2 t1
->
-> where cat1=‘16’ and time in (‘2022-01-01’, ‘2022-02-01’, ‘2022-03-01’, ‘2022-04-01’, ‘2022-05-01’, ‘2022-06-01’, ‘2022-07-01’, ‘2022-08-01’, ‘2022-09-01’, ‘2022-10-01’, ‘2022-11-01’, ‘2022-12-01’) group by item_id order by sales desc limit 100;
100 rows in set (1.75 sec)
left join fix_brands_v2
mysql> select item_id, max(cat1) cat1, max(brand_id) brand_id, sum(sales) as sales, sum(sold) as sold from item2 t1
-> left join fix_brands_v2 t4 on t1.brand_id = t4.bid and t4.plat = ‘taobao’
-> where cat1=‘16’ and time in (‘2022-01-01’, ‘2022-02-01’, ‘2022-03-01’, ‘2022-04-01’, ‘2022-05-01’, ‘2022-06-01’, ‘2022-07-01’, ‘2022-08-01’, ‘2022-09-01’, ‘2022-10-01’, ‘2022-11-01’, ‘2022-12-01’) group by item_id order by sales desc limit 100;
100 rows in set (24.53 sec)
【背景】做过哪些操作?
【业务影响】
【StarRocks版本】2.4.0
【集群规模】3fe(1 follower+2observer)+6be(fe与be单独部署)
【机器信息】CPU虚拟核/内存/网卡,例如:48C/64G/万兆
【联系方式】为了在解决问题过程中能及时联系到您获取一些日志信息,请补充下您的联系方式,社区群12-金谡–jinsu@moojing.com
【附件】
-
Profile信息,如何获取profile
left join fix_brands_v2 的
| Explain String |
±--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:1: item_id | 34: max | 35: max | 36: sum | 37: sum |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 10:MERGING-EXCHANGE |
| limit: 100 |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: 1: item_id |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 10 |
| UNPARTITIONED |
| |
| 9:TOP-N |
| | order by: <slot 36> 36: sum DESC |
| | offset: 0 |
| | limit: 100 |
| | |
| 8:AGGREGATE (update finalize) |
| | output: max(11: cat1), max(9: brand_id), sum(8: sales), sum(7: sold) |
| | group by: 1: item_id |
| | |
| 7:EXCHANGE |
| |
| PLAN FRAGMENT 2 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 07 |
| HASH_PARTITIONED: 1: item_id |
| |
| 6:Project |
| | <slot 1> : 1: item_id |
| | <slot 7> : 7: sold |
| | <slot 8> : 8: sales |
| | <slot 9> : 9: brand_id |
| | <slot 11> : 11: cat1 |
| | |
| 5:HASH JOIN |
| | join op: RIGHT OUTER JOIN (BUCKET_SHUFFLE) |
| | colocate: false, reason: |
| | equal join conjunct: 20: bid = 9: brand_id |
| | |
| |----4:EXCHANGE |
| | |
| 1:Project |
| | <slot 20> : 20: bid |
| | |
| 0:OlapScanNode |
| TABLE: fix_brands_v2 |
| PREAGGREGATION: ON |
| PREDICATES: 21: plat = ‘taobao’ |
| partitions=1/1 |
| rollup: fix_brands_v2 |
| tabletRatio=32/32 |
| tabletList=5990349,5990353,5990357,5990361,5990365,5990369,5990373,5990377,5990381,5990385 … |
| cardinality=2902615 |
| avgRowSize=16.014166 |
| numNodes=0 |
| |
| PLAN FRAGMENT 3 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 04 |
| BUCKET_SHUFFLE_HASH_PARTITIONED: 9: brand_id |
| |
| 3:Project |
| | <slot 1> : 1: item_id |
| | <slot 7> : 7: sold |
| | <slot 8> : 8: sales |
| | <slot 9> : 9: brand_id |
| | <slot 11> : 11: cat1 |
| | |
| 2:OlapScanNode |
| TABLE: item2 |
| PREAGGREGATION: ON |
| PREDICATES: 11: cat1 = 16, 2: time IN (‘2022-01-01’, ‘2022-02-01’, ‘2022-03-01’, ‘2022-04-01’, ‘2022-05-01’, ‘2022-06-01’, ‘2022-07-01’, ‘2022-08-01’, ‘2022-09-01’, ‘2022-10-01’, ‘2022-11-01’, ‘2022-12-01’) |
| partitions=12/2016 |
| rollup: item2 |
| tabletRatio=384/384 |
| tabletList=67795,67799,67803,67807,67811,67815,67819,67823,67827,67831 … |
| cardinality=243717 |
| avgRowSize=40.053226 |
| numNodes=0 |
±--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
91 rows in set (0.01 sec) -
并行度:show variables like ‘%parallel_fragment_exec_instance_num%’;
show variables like ‘%pipeline_dop%’; -
pipeline是否开启:show variables like ‘%pipeline%’;
-
执行计划:explain costs + sql
-
be节点cpu和内存使用率截图