3.0.2 查询性能退化

为了更快的定位您的问题,请提供以下信息,谢谢
【版本】3.0.2
【集群配置】3FE + 8BE(32C 128G NVMe SSD 3.5T *2)
【详述】
一个sql查询不加过滤条件,查询结果毫秒级别出来,没有数据,在这个sql的基础上加上一个大表的过滤条件后,查询时间显著变长,结果同样是没数据,查看explain 发现过滤条件被谓词下推了,导致查询性能退化,是否有参数能将谓词下推给禁止掉,不要先下推过滤后再join, 而是先join 后再过滤条件

– 不带过滤条件,146ms 就能查出数据
with t1 as (
select
*
from re_target_lexicon_detail t where t.site=‘US’ and detail_type = 2 and label_id = 584446498518880258 and lexicon_id = 584446498518880256
)

select *
from re_asin_aggregation_info r1 inner join t1
on r1.asin = t1.detail_value and r1.country = ‘US’

– 使用过滤条件的sql, 直接查询报 exceed memory limit 的错误,加上 set enable_spill=true;
set spill_mode=“force”; 后需要跑10min 以上才能出结果,结果也是没有数据

with t1 as (
select
*
from re_target_lexicon_detail t where t.site=‘US’ and detail_type = 2 and label_id = 584446498518880258 and lexicon_id = 584446498518880256
)

select * from (
select *
from re_asin_aggregation_info r1 inner join t1
on r1.asin = t1.detail_value and r1.country = ‘US’
) z
where z.price >= 10 and z.price <= 35;

慢查询的相关信息如下:

query_dump_file (13.8 KB) explain.txt (1.9 KB) explain_costs.txt (9.4 KB) profile.txt (48.0 KB)

看了profile 后把 小表加了[broadcast] 的hint 后,查询正常了
with t1 as (
select
*
from re_target_lexicon_detail t where t.site=‘US’ and detail_type = 2 and label_id = 584446498518880258 and lexicon_id = 584446498518880256
)

select * from (
select *
from re_asin_aggregation_info r1 inner join [broadcast] t1
on r1.asin = t1.detail_value and r1.country = ‘US’
) z
where z.price >= 10 and z.price <= 35;

with t1 as (
select
*
from re_target_lexicon_detail t where t.site=‘US’ and detail_type = 2 and label_id = 584446498518880258 and lexicon_id = 584446498518880256
)

select * from (
select *
from t1 inner join re_asin_aggregation_info r1
on r1.asin = t1.detail_value and r1.country = ‘US’
) z
where z.price >= 10 ;
这个sql 的统计信息正常,查询性能也正常 explain costs 如下:
explain_costs_1.txt (9.8 KB)


加上两个过滤条件发现统计信息估算sql 过滤后只有1条数据,导致大表被broadcast了

这个有点问题 我们跟进修复一下 感谢

应该是country, price列有数据倾斜, 可以参考https://docs.starrocks.io/zh-cn/latest/using_starrocks/Cost_based_optimizer#%E7%9B%B4%E6%96%B9%E5%9B%BE%E7%BB%9F%E8%AE%A1%E4%BF%A1%E6%81%AF%E5%85%83%E6%95%B0%E6%8D%AE 手动收集下country, price列的直方图. 这样会让基数估计更准

手动执行了 ANALYZE TABLE re_asin_aggregation_info UPDATE HISTOGRAM ON country, price; 然后再跑了explain costs, 发现估计的变得正常了
ANALYZE TABLE re_asin_aggregation_info UPDATE HISTOGRAM ON country, price;

已确定是统计信息估计的问题,ANALYZE TABLE re_asin_aggregation_info UPDATE HISTOGRAM ON country, price; 恢复正常,后序要设计下怎么优化这个场景的统计信息估算。