为了更快的定位您的问题,请提供以下信息,谢谢
【详述】存算分离集群,查询hive external catalog, 使用hdfs作为storage volume。希望加速看板图表的查询,尝试使用starrocks的异步物化视图。
构建的视图(SQL见附件)由于含有多个count distinct字段,所以show materialized views可以看到提示:
INVALID: no valid plan: MV contains non-SPJG operators(no view rewrite): LogicalCTEAnchorOperator{cteId=‘1’},LogicalCTEProduceOperator{cteId=‘1’},LogicalCTEConsumeOperator{cteId=‘1’, limit=-1, predicate=null},LogicalCTEConsumeOperator{cteId=‘1’, limit=-1, predicate=null},LogicalCTEConsumeOperator{cteId=‘1’, limit=-1, predicate=null},LogicalCTEConsumeOperator{cteId=‘1’, limit=-1, predicate=null}
导致无法被rewrite选中(从下图的TRACE LOGS MV 中可以确认)。
原查询SQL 执行explain看到执行计划还是扫描HdfsScanNode.
想确认类似这种包含多个count distinct的sql,该如何构建异步物化视图?因为看起来当前的方式走不通
【背景】根据 告别 Count Distinct 慢查询:StarRocks 高效去重全攻略 4.4 总结的内容,尝试设置参数 set materialized_view_rewrite_mode = ‘force’ ,但是没有效果;
【业务影响】查询无法被rewrite到物化视图上
【是否存算分离】是
【StarRocks版本】例如:4.0.12
【集群规模】3 fe +10 cn
【机器信息】64C/256G/万兆
【联系方式】yangshengjie01@zhihu.com
【附件】
查询sql:
SELECT
p_date,
count(distinct case when c_level >=4 then uid end) v1,
count(distinct case when c_level >=4 and is_odcu = 1 then uid end) v2,
count(distinct case when c_level >=4 and is_mcu = 1 and income_30d >0 then uid end) v3,
count(distinct case when c_level >=4 and is_mcu = 1 then uid end) v4
FROM
hive.dws.ads_dataclaw_author_detail_pd
WHERE
p_date >= ‘2026-01-01’ and p_date<=’{yesterday}’
group by p_date
构建视图sql:
CREATE MATERIALIZED VIEW mv_okr_real_pd
PARTITION BY (str2date(p_date, ‘%Y-%m-%d’))
DISTRIBUTED BY RANDOM buckets 1
REFRESH DEFERRED ASYNC START(‘2026-06-20 09:00:00’) EVERY (interval 1 day)
properties(
“partition_refresh_number” = “10”,
“replication_num” = “1”)
AS SELECT
p_date,
count(distinct case when c_level >=4 then uid end) v1,
count(distinct case when c_level >=4 and is_odcu = 1 then uid end) v2,
count(distinct case when c_level >=4 and is_mcu = 1 and income_30d >0 then uid end) v3,
count(distinct case when c_level >=4 and is_mcu = 1 then uid end) v4
FROM
hive.dws.ads_dataclaw_author_detail_pd
WHERE
p_date >= ‘2026-01-01’
group by p_date;
基表的表结构:
CREATE TABLE ads_dataclaw_author_detail_pd (
uid bigint(20),
c_level varchar(1073741824),
is_odcu varchar(1073741824),
is_mcu varchar(1073741824),
income_30d double,
p_date varchar(1073741824) DEFAULT NULL COMMENT “日期分区”
)
PARTITION BY (p_date)
