The partitioned table behind an async materialized view is not updated incrementally — the data is just overwritten. How can I fix this?

To help us locate your problem faster, please provide the following information. Thanks.
【Details】
The partitioned table behind the async materialized view is not updated incrementally; the data is simply overwritten. How can this be fixed?
【Background】An async materialized view periodically aggregates data from a Paimon table.
【Business impact】The partitioned table does not retain historical data.
【Shared-data (separated storage/compute)?】Yes, shared-data.
【StarRocks version】3.3.7
【Cluster size】3 FE + 3 BE
【Machine specs】FE: 24 GB RAM, 7 CPUs; BE: 32 GB RAM, 12 CPUs
The materialized view DDL is as follows:
CREATE MATERIALIZED VIEW dws_base_summary_mv
PARTITION BY str2date(dt, '%Y-%m-%d')
REFRESH ASYNC START ('2022-09-01 08:00:00') EVERY (INTERVAL 1 hour)
AS
SELECT dt,
       SUM(CASE
               WHEN key = 'test'
                    AND msg LIKE '%测评完成%'
                    AND msg NOT LIKE '%服务端测评%' THEN 1
               ELSE 0 END) AS success_cnt
FROM paimon_catalog.paimon_log.dwd_base_log
WHERE dt = concat(current_date)
GROUP BY dt;
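For reference, a likely cause is the `WHERE dt = concat(current_date)` filter: the MV definition itself only covers today's data, and each refresh runs as an INSERT OVERWRITE, so every other partition ends up empty. A hedged sketch of an alternative, assuming partition-change tracking works against the Paimon catalog — drop the date filter and cap the per-run refresh scope with properties instead:

```sql
-- Sketch (not verified on this cluster): no date filter in the MV body;
-- the refresh scope is limited via properties, so historical partitions
-- keep their data instead of being overwritten.
CREATE MATERIALIZED VIEW dws_base_summary_mv
PARTITION BY str2date(dt, '%Y-%m-%d')
REFRESH ASYNC START ('2022-09-01 08:00:00') EVERY (INTERVAL 1 hour)
PROPERTIES (
    "partition_refresh_number" = "1",       -- refresh one partition per batch
    "auto_refresh_partitions_limit" = "1"   -- auto-refresh only the latest partition
)
AS
SELECT dt,
       SUM(CASE
               WHEN key = 'test'
                    AND msg LIKE '%测评完成%'
                    AND msg NOT LIKE '%服务端测评%' THEN 1
               ELSE 0 END) AS success_cnt
FROM paimon_catalog.paimon_log.dwd_base_log
GROUP BY dt;   -- no WHERE on dt: older partitions are not emptied on refresh
```

With this shape, a scheduled refresh should only rewrite the partitions StarRocks detects as changed, rather than redefining the whole view as "today only".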
The result is shown below: only the March 7 data is present; the March 6 data was overwritten. Historical data was not retained.


75988f67-fb5c-11ef-bbfc-2692c5e4a22f mv-1824466 2025-03-07 22:00:00 2025-03-07 22:00:29 FAILED

Refresh materialized view dws_k12soe_summary_mv_1h_v2 failed after retrying 1 times(try-lock 0 times), error-msg : com.starrocks.common.DdlException: Backend node not found. Check if any backend node is down.
at com.starrocks.common.ErrorReport.reportDdlException(ErrorReport.java:90)
at com.starrocks.qe.StmtExecutor.handleDMLStmt(StmtExecutor.java:2286)
[wrapped] com.starrocks.common.UserException: Backend node not found. Check if any backend node is down.
at com.starrocks.qe.StmtExecutor.handleDMLStmt(StmtExecutor.java:2513)
at com.starrocks.load.InsertOverwriteJobRunner.executeInsert(InsertOverwriteJobRunner.java:356)
at com.starrocks.load.InsertOverwriteJobRunner.doLoad(InsertOverwriteJobRunner.java:170)
at com.starrocks.load.InsertOverwriteJobRunner.handle(InsertOverwriteJobRunner.java:150)
at com.starrocks.load.InsertOverwriteJobRunner.transferTo(InsertOverwriteJobRunner.java:211)
at com.starrocks.load.InsertOverwriteJobRunner.prepare(InsertOverwriteJobRunner.java:255)
at com.starrocks.load.InsertOverwriteJobRunner.handle(InsertOverwriteJobRunner.java:147)
at com.starrocks.load.InsertOverwriteJobRunner.run(InsertOverwriteJobRunner.java:135)
at com.starrocks.load.InsertOverwriteJobMgr.executeJob(InsertOverwriteJobMgr.java:91)
at com.starrocks.qe.StmtExecutor.handleInsertOverwrite(StmtExecutor.java:2044)
at com.starrocks.qe.StmtExecutor.handleDMLStmt(StmtExecutor.java:2139)
at com.starrocks.qe.StmtExecutor.handleDMLStmtWithProfile(StmtExecutor.java:2053)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.refreshMaterializedView(PartitionBasedMvRefreshProcessor.java:1124)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.doRefreshMaterializedView(PartitionBasedMvRefreshProcessor.java:462)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.doRefreshMaterializedViewWithRetry(PartitionBasedMvRefreshProcessor.java:368)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.doMvRefresh(PartitionBasedMvRefreshProcessor.java:327)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.processTaskRun(PartitionBasedMvRefreshProcessor.java:199)
at com.starrocks.scheduler.TaskRun.executeTaskRun(TaskRun.java:270)
at com.starrocks.scheduler.TaskRunExecutor.lambda$executeTaskRun$0(TaskRunExecutor.java:58)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)

d75cf6b8-fb64-11ef-bbfc-2692c5e4a22f mv-1824466 2025-03-07 23:00:00 2025-03-08 00:00:11 FAILED

Refresh materialized view dws_k12soe_summary_mv_1h_v2 failed after retrying 1 times(try-lock 0 times), error-msg : com.starrocks.common.TimeoutException: Query timeout. Increase the query_timeout session variable and retry
at com.starrocks.common.ErrorReport.reportTimeoutException(ErrorReport.java:104)
at com.starrocks.qe.StmtExecutor.handleDMLStmt(StmtExecutor.java:2274)
[wrapped] com.starrocks.common.UserException: Query timeout. Increase the query_timeout session variable and retry
at com.starrocks.qe.StmtExecutor.handleDMLStmt(StmtExecutor.java:2513)
at com.starrocks.load.InsertOverwriteJobRunner.executeInsert(InsertOverwriteJobRunner.java:356)
at com.starrocks.load.InsertOverwriteJobRunner.doLoad(InsertOverwriteJobRunner.java:170)
at com.starrocks.load.InsertOverwriteJobRunner.handle(InsertOverwriteJobRunner.java:150)
at com.starrocks.load.InsertOverwriteJobRunner.transferTo(InsertOverwriteJobRunner.java:211)
at com.starrocks.load.InsertOverwriteJobRunner.prepare(InsertOverwriteJobRunner.java:255)
at com.starrocks.load.InsertOverwriteJobRunner.handle(InsertOverwriteJobRunner.java:147)
at com.starrocks.load.InsertOverwriteJobRunner.run(InsertOverwriteJobRunner.java:135)
at com.starrocks.load.InsertOverwriteJobMgr.executeJob(InsertOverwriteJobMgr.java:91)
at com.starrocks.qe.StmtExecutor.handleInsertOverwrite(StmtExecutor.java:2044)
at com.starrocks.qe.StmtExecutor.handleDMLStmt(StmtExecutor.java:2139)
at com.starrocks.qe.StmtExecutor.handleDMLStmtWithProfile(StmtExecutor.java:2053)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.refreshMaterializedView(PartitionBasedMvRefreshProcessor.java:1124)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.doRefreshMaterializedView(PartitionBasedMvRefreshProcessor.java:462)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.doRefreshMaterializedViewWithRetry(PartitionBasedMvRefreshProcessor.java:368)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.doMvRefresh(PartitionBasedMvRefreshProcessor.java:327)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.processTaskRun(PartitionBasedMvRefreshProcessor.java:199)
at com.starrocks.scheduler.TaskRun.executeTaskRun(TaskRun.java:270)
at com.starrocks.scheduler.TaskRunExecutor.lambda$executeTaskRun$0(TaskRunExecutor.java:58)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)

638b1f0c-fc36-11ef-bbfc-2692c5e4a22f mv-1824466 2025-03-09 00:00:00 2025-03-09 00:00:35 FAILED

Refresh materialized view dws_k12soe_summary_mv_1h_v2 failed after retrying 1 times(try-lock 0 times), error-msg : com.starrocks.sql.common.DmlException: materialized view dws_k12soe_summary_mv_1h_v2 refresh task failed: sync partition failed
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.doRefreshMaterializedView(PartitionBasedMvRefreshProcessor.java:430)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.doRefreshMaterializedViewWithRetry(PartitionBasedMvRefreshProcessor.java:368)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.doMvRefresh(PartitionBasedMvRefreshProcessor.java:327)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.processTaskRun(PartitionBasedMvRefreshProcessor.java:199)
at com.starrocks.scheduler.TaskRun.executeTaskRun(TaskRun.java:270)
at com.starrocks.scheduler.TaskRunExecutor.lambda$executeTaskRun$0(TaskRunExecutor.java:58)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)

8df506a2-fcff-11ef-bbfc-2692c5e4a22f mv-1824466 2025-03-10 00:00:00 2025-03-10 00:00:10 FAILED

Refresh materialized view dws_k12soe_summary_mv_1h_v2 failed after retrying 1 times(try-lock 0 times), error-msg : com.starrocks.sql.common.DmlException: Find the corresponding mv partition names of ref base table dwd_prod_base_log_k12 failed, mv dws_k12soe_summary_mv_1h_v2:, ref partitions: [p20250309, p20250310]
at com.starrocks.scheduler.mv.MVPCTRefreshPartitioner.getMvPartitionNamesToRefresh(MVPCTRefreshPartitioner.java:201)
at com.starrocks.scheduler.mv.MVPCTRefreshRangePartitioner.getMVPartitionsToRefresh(MVPCTRefreshRangePartitioner.java:232)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.getPartitionsToRefreshForMaterializedView(PartitionBasedMvRefreshProcessor.java:970)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.getPartitionsToRefreshForMaterializedView(PartitionBasedMvRefreshProcessor.java:928)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.checkMvToRefreshedPartitions(PartitionBasedMvRefreshProcessor.java:289)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.doRefreshMaterializedView(PartitionBasedMvRefreshProcessor.java:439)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.doRefreshMaterializedViewWithRetry(PartitionBasedMvRefreshProcessor.java:368)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.doMvRefresh(PartitionBasedMvRefreshProcessor.java:327)
at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.processTaskRun(PartitionBasedMvRefreshProcessor.java:199)
at com.starrocks.scheduler.TaskRun.executeTaskRun(TaskRun.java:270)
at com.starrocks.scheduler.TaskRunExecutor.lambda$executeTaskRun$0(TaskRunExecutor.java:58)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)

Paimon table DDL:
CREATE TABLE IF NOT EXISTS dwd_prod_base_log_k12 (
    app_key STRING,
    env STRING,
    category STRING,
    action STRING,
    uid STRING,
    open_id STRING,
    school STRING,
    level STRING,
    msg STRING,
    ip STRING,
    channel STRING,
    version STRING,
    `timestamp` BIGINT,
    device_id STRING,
    device_type STRING,
    os_version STRING,
    platform STRING,
    arch STRING,
    hostname STRING,
    cpus STRING,
    totalmem STRING,
    terminal_timestamp BIGINT,
    is_batch STRING,
    beijing_time STRING,
    terminal_time STRING,
    dt STRING
) PARTITIONED BY (dt)
WITH (
    'bucket' = '50',
    'bucket-key' = 'app_key',
    'sink.parallelism' = '5',
    'format' = 'orc',
    'compression' = 'snappy',
    'compaction.min.file-num' = '5',
    'compaction.max.file-num' = '10'
);

The full materialized view DDL:

CREATE MATERIALIZED VIEW IF NOT EXISTS dws_k12soe_summary_mv_1h_v2
PARTITION BY str2date(dt, '%Y-%m-%d')
REFRESH ASYNC
START ('2024-01-01 09:00:00') EVERY (INTERVAL 1 hour)
AS
SELECT dt,

   SUM(CASE
           WHEN app_key = 'xx'
               AND msg LIKE '%测评完成%'
               AND msg NOT LIKE '%服务端测评%' THEN 1
           ELSE 0 END)                                                  AS soe_success_cnt,

   SUM(CASE
           WHEN app_key = 'xxxx'
               AND msg LIKE '%失败%'
               AND msg NOT LIKE '%服务端测评%'
               AND msg NOT LIKE '%测评取消失败%'
               AND msg NOT LIKE '%测评结束失败%'
               AND msg NOT LIKE '%初始化语音测评失败%'
               AND msg NOT LIKE '%联系客服%'
               AND msg NOT LIKE '%系统错误,请联系客服为您处理(err:999)%'
               AND (action = 'xxxx' OR action = 'xx') THEN 1
           ELSE 0 END)                                                  AS soe_fail_cnt,

   SUM(CASE
           WHEN level = 'error'
               AND app_key = 'xxx'
               AND msg LIKE '%首次服务端测评%'
               AND get_json_object(msg, '$.success') = 'success' THEN 1
           ELSE 0 END)                                                  AS soe_first_success_cnt,

   SUM(CASE
           WHEN app_key = 'xxx'
               AND msg LIKE '%测评成功%'
               AND msg LIKE '%服务端测评%' THEN 1
           ELSE 0 END)                                                  AS server_soe_success_cnt,
  
   SUM(CASE
           WHEN app_key = 'xxxx'
               AND msg LIKE '%失败%'
               AND msg LIKE '%服务端测评%' THEN 1
           ELSE 0 END)                                                  AS server_soe_fail_cnt,

   SUM(CASE
           WHEN level = 'error'
               AND version = '3.7.5'
               AND msg LIKE '%录音失败%'
               AND get_json_object(msg, '$.offlineEvaluating') = '0' THEN 1
           ELSE 0 END)                                                  AS default_soe_fail_cnt,

   SUM(CASE
           WHEN level = 'error'
               AND version = '3.7.5'
               AND msg LIKE '%录音失败%'
               AND get_json_object(msg, '$.offlineEvaluating') = '1' THEN 1
           ELSE 0 END)                                                  AS default_server_soe_fail_cnt,

   COUNT(DISTINCT CASE WHEN app_key = 'xxxx' THEN uid ELSE NULL END) AS soe_people_cnt,

   SUM(CASE
           WHEN app_key = 'xxxx'
               AND msg LIKE '%测评完成%'
               AND msg NOT LIKE '%服务端测评%'
               AND msg LIKE '%xxxx%' THEN 1
           ELSE 0 END)                                                  AS clio_success_cnt,

   SUM(CASE
           WHEN app_key = 'xxxx'
               AND msg LIKE '%失败%'
               AND msg NOT LIKE '%服务端测评%'
               AND msg NOT LIKE '%测评取消失败%'
               AND msg NOT LIKE '%测评结束失败%'
               AND msg NOT LIKE '%初始化语音测评失败%'
               AND msg NOT LIKE '%系统错误,请联系客服为您处理(err:999)%'
               AND (action = 'xxxxx' OR action = 'SOELog')
               AND get_json_object(get_json_object(msg, '$.info'), '$.engineType') = '3' THEN 1
           ELSE 0 END)                                                  AS clio_fail_cnt,
  
   SUM(CASE
           WHEN app_key = 'xxx'
               AND msg LIKE '%测评完成%'
               AND msg NOT LIKE '%服务端测评%'
               AND msg LIKE '%xxxx%' THEN 1
           ELSE 0 END)                                                  AS stkouyu_success_cnt,
    
   SUM(CASE
           WHEN app_key = 'xxx'
               AND msg LIKE '%失败%'
               AND msg NOT LIKE '%测评取消失败%'
               AND msg NOT LIKE '%测评结束失败%'
               AND msg NOT LIKE '%初始化语音测评失败%'
               AND msg NOT LIKE '%系统错误,请联系客服为您处理(err:999)%'
               AND (action = 'xxx' OR action = 'xxxx')
               AND get_json_object(get_json_object(msg, '$.info'), '$.engineType') = '0' THEN 1
           ELSE 0 END)                                                  AS stkouyu_fail_cnt,

   SUM(CASE
           WHEN app_key = 'unitest'
               AND msg LIKE '%测评完成%'
               AND msg NOT LIKE '%服务端测评%'
               AND msg LIKE '%xxxx%' THEN 1
           ELSE 0 END)                                                  AS zhiyan_success_cnt,
 
   SUM(CASE
           WHEN app_key = 'xxx'
               AND msg LIKE '%失败%'
               AND msg NOT LIKE '%测评取消失败%'
               AND msg NOT LIKE '%测评结束失败%'
               AND msg NOT LIKE '%初始化语音测评失败%'
               AND msg NOT LIKE '%系统错误,请联系客服为您处理(err:999)%'
               AND (action = 'xxxx' OR action = 'xxxx')
               AND get_json_object(get_json_object(msg, '$.info'), '$.engineType') = '5' THEN 1
           ELSE 0 END)                                                  AS zhiyan_fail_cnt,

   SUM(CASE
           WHEN app_key = 'unitest'
               AND msg LIKE '%测评完成%'
               AND msg NOT LIKE '%服务端测评%'
               AND msg NOT LIKE '%xxxx%'
               AND msg NOT LIKE '%xxxx%'
               AND msg NOT LIKE '%xxxxx%'
               AND msg LIKE '%xxxxx%' THEN 1
           ELSE 0 END)                                                  AS offline_success_cnt,

   SUM(CASE
           WHEN app_key = 'unitest'
               AND msg LIKE '%测评完成%'
               AND msg NOT LIKE '%服务端测评%'
               AND msg LIKE '%xxxxx%' THEN 1
           ELSE 0 END)                                                  AS chisheng_success_cnt,

   SUM(CASE
           WHEN app_key = 'unitest'
               AND msg LIKE '%失败%'
               AND msg NOT LIKE '%服务端测评%'
               AND msg NOT LIKE '%测评取消失败%'
               AND msg NOT LIKE '%测评结束失败%'
               AND msg NOT LIKE '%初始化语音测评失败%'
               AND msg NOT LIKE '%系统错误,请联系客服为您处理(err:999)%'
               AND (action = 'xx' OR action = 'xxx')
               AND get_json_object(get_json_object(msg, '$.info'), '$.engineType') = '1' THEN 1
           ELSE 0 END)                                                  AS chisheng_fail_cnt,

   SUM(CASE
           WHEN action = 'xxx-xx-xx'
               AND channel = 'xx-xx-score'
               AND device_id = 'xx-voice-xx'
               THEN 1
           ELSE 0 END)                                                  AS optimus_score_count,

   SUM(CASE
           WHEN action = 'xxxxx'
               AND channel = 'xxx'
               AND device_id = 'xxxx'
               AND msg LIKE '%\"result\":true%' THEN 1
           ELSE 0 END)                                                  AS optimus_score_success_cnt,

   SUM(CASE
           WHEN level = 'error'
               AND action = '暂无'
               AND get_json_object(msg, '$.offlineEvaluating') = '0'
               AND get_json_object(msg, '$.isErrBeforeStart') = 'true' THEN 1
           ELSE 0 END)                                                  AS unsensible_fail_cnt

FROM paimon_catalog.paimon_log.dwd_prod_base_log_k12
WHERE dt = concat(current_date) -- only process the current date's data
GROUP BY dt;

SHOW PARTITIONS FROM dws_k12soe_summary_mv_1h_v2 shows that only the latest partition contains data; all other partitions are empty. Why is that?

It keeps overwriting the data. Could someone look at this urgently?

https://docs.starrocks.io/zh/docs/3.3/sql-reference/sql-statements/materialized_view/CREATE_MATERIALIZED_VIEW/#参数-1 — parameter reference for async materialized view refresh settings

CREATE MATERIALIZED VIEW IF NOT EXISTS dws_k12soe_summary_mv_1h_v2
PARTITION BY str2date(dt, '%Y-%m-%d')
REFRESH ASYNC
START ('2024-01-01 09:00:00') EVERY (INTERVAL 1 hour)

This is exactly the reference I followed.

Do you not support conditional incremental partitioning like this:
WHERE dt = concat(current_date) -- only process the current date's data
GROUP BY dt;
Our base table is large, so we don't want to aggregate straight across all dates. The idea was to compute only one day's data each time and append it as an incremental partition.
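If the WHERE filter is removed from the MV definition and the lost historical partitions need to be rebuilt, a manual ranged refresh may work. This is a hedged sketch based on the StarRocks REFRESH MATERIALIZED VIEW statement; the dates are illustrative placeholders:

```sql
-- Backfill a specific date range after removing the WHERE filter.
-- FORCE rebuilds the partitions even if StarRocks considers them
-- already up to date.
REFRESH MATERIALIZED VIEW dws_k12soe_summary_mv_1h_v2
PARTITION START ('2025-03-01') END ('2025-03-08')
FORCE;
```

This keeps the hourly schedule cheap (only recent partitions) while letting history be repaired on demand.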

The documentation covers creation, refresh, retention, and the other policies.

CREATE MATERIALIZED VIEW IF NOT EXISTS dws_k12soe_summary_mv_1h
PARTITION BY dt
REFRESH ASYNC
START ('2024-01-01 08:10:00') EVERY (INTERVAL 1 hour)
PROPERTIES(
"auto_refresh_partitions_limit" = "2",
"partition_refresh_number" = "2"
)

I have tried setting this — it didn't help. As soon as I add the WHERE condition, the data gets overwritten.

FROM paimon_catalog.paimon_log.dwd_prod_base_log_k12
WHERE dt = concat(current_date) -- only process the current date's data
GROUP BY dt;

With the WHERE clause added, it refreshes only the current date's data, and the table ends up containing only the current_date data; all other partitions are empty.

"auto_refresh_partitions_limit" = "2",
"partition_refresh_number" = "2" — what behavior do you see with these set? Your WHERE clause already restricts the view to today's data, so a refresh that produces only today's data is the expected behavior.

The symptom is that the table always holds exactly one row. Isn't the materialized view a partitioned table? It should retain the data for all of these days. But after running for several days, the table still contains only one row — today's.

"auto_refresh_partitions_limit" = "2",
"partition_refresh_number" = "2" — create the materialized view with these properties and without the WHERE condition. After creation, trigger a manual refresh and compare the result against the underlying SQL.
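The comparison suggested above might look like this (a sketch; the date value is an illustrative placeholder):

```sql
-- 1) Trigger a manual refresh and block until it completes.
REFRESH MATERIALIZED VIEW dws_k12soe_summary_mv_1h WITH SYNC MODE;

-- 2) Verify that each day keeps its own partition with data.
SHOW PARTITIONS FROM dws_k12soe_summary_mv_1h;

-- 3) Spot-check one historical day against the MV's defining SQL
--    run directly over the Paimon base table.
SELECT * FROM dws_k12soe_summary_mv_1h WHERE dt = '2025-03-06';
```

If the partitions persist and the spot-check matches, the overwrite behavior was coming from the WHERE filter in the MV definition rather than from the refresh properties.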