3.0.2 查询性能退化

yangrong · 2023年08月7日 07:55

为了更快的定位您的问题，请提供以下信息，谢谢
【版本】3.0.2
【集群配置】3FE + 8BE（32C 128G NVMe SSD 3.5T *2)
【详述】
一个sql查询不加过滤条件，查询结果毫秒级别出来，没有数据，在这个sql的基础上加上一个大表的过滤条件后，查询时间显著变长，结果同样是没数据，查看explain 发现过滤条件被谓词下推了，导致查询性能退化，是否有参数能将谓词下推给禁止掉，不要先下推过滤后再join, 而是先join 后再过滤条件

– 不带过滤条件，146ms 就能查出数据
with t1 as (
select
*
from re_target_lexicon_detail t where t.site=‘US’ and detail_type = 2 and label_id = 584446498518880258 and lexicon_id = 584446498518880256
)

select *
from re_asin_aggregation_info r1 inner join t1
on r1.asin = t1.detail_value and r1.country = ‘US’

– 使用过滤条件的sql, 直接查询报 exceed memory limit 的错误，加上 set enable_spill=true;
set spill_mode=“force”; 后需要跑10min 以上才能出结果，结果也是没有数据

with t1 as (
select
*
from re_target_lexicon_detail t where t.site=‘US’ and detail_type = 2 and label_id = 584446498518880258 and lexicon_id = 584446498518880256
)

select * from (
select *
from re_asin_aggregation_info r1 inner join t1
on r1.asin = t1.detail_value and r1.country = ‘US’
) z
where z.price >= 10 and z.price <= 35;

慢查询的相关信息如下:

query_dump_file (13.8 KB) explain.txt (1.9 KB) explain_costs.txt (9.4 KB) profile.txt (48.0 KB)

yangrong · 2023年08月7日 07:52

看了profile 后把小表加了[broadcast] 的hint 后，查询正常了
with t1 as (
select
*
from re_target_lexicon_detail t where t.site=‘US’ and detail_type = 2 and label_id = 584446498518880258 and lexicon_id = 584446498518880256
)

select * from (
select *
from re_asin_aggregation_info r1 inner join [broadcast] t1
on r1.asin = t1.detail_value and r1.country = ‘US’
) z
where z.price >= 10 and z.price <= 35;

yangrong · 2023年08月7日 08:38

with t1 as (
select
*
from re_target_lexicon_detail t where t.site=‘US’ and detail_type = 2 and label_id = 584446498518880258 and lexicon_id = 584446498518880256
)

select * from (
select *
from t1 inner join re_asin_aggregation_info r1
on r1.asin = t1.detail_value and r1.country = ‘US’
) z
where z.price >= 10 ;
这个sql 的统计信息正常，查询性能也正常 explain costs 如下:
explain_costs_1.txt (9.8 KB)

yangrong · 2023年08月7日 08:40

加上两个过滤条件发现统计信息估算sql 过滤后只有1条数据，导致大表被broadcast了

yuchen1019 · 2023年08月7日 13:24

这个有点问题我们跟进修复一下感谢

packy · 2023年08月8日 01:47

应该是country, price列有数据倾斜, 可以参考https://docs.starrocks.io/zh-cn/latest/using_starrocks/Cost_based_optimizer#%E7%9B%B4%E6%96%B9%E5%9B%BE%E7%BB%9F%E8%AE%A1%E4%BF%A1%E6%81%AF%E5%85%83%E6%95%B0%E6%8D%AE 手动收集下country, price列的直方图. 这样会让基数估计更准

yangrong · 2023年08月8日 02:26

手动执行了 ANALYZE TABLE re_asin_aggregation_info UPDATE HISTOGRAM ON country, price; 然后再跑了explain costs, 发现估计的变得正常了
ANALYZE TABLE re_asin_aggregation_info UPDATE HISTOGRAM ON country, price;

trueeyu · 2023年08月8日 04:37

已确定是统计信息估计的问题，ANALYZE TABLE re_asin_aggregation_info UPDATE HISTOGRAM ON country, price; 恢复正常，后序要设计下怎么优化这个场景的统计信息估算。