开启pipeline_level_shuffle后计算丢失部分数据行数

为了更快的定位您的问题,请提供以下信息,谢谢
【详述】问题详细描述
一个线上的sql查询不出计算结果,排查后发现在row_number算子前的shuffle算子output rows行数不对,单独把子查询的sql拉出来查询是正常的,通过源码找到了enable_pipeline_level_shuffle这个配置 关闭后就能正常返回数据
【背景】做过哪些操作?
关闭前 数据行数不对 计算结果异常


关闭后 数据行数正常 计算结果正常

【业务影响】
【是否存算分离】
【StarRocks版本】3.3.6
【集群规模】例如:3fe(1 follower+2observer)+5be(fe与be混部)
【机器信息】CPU虚拟核/内存/网卡,例如:48C/64G/万兆
【表模型】例如:主键模型
【导入或者导出方式】例如:Flink
【联系方式】社区群22hpp
【附件】
运行的sql:
计算逻辑是分别算出两个conut distinct的值之后进行join,里面有一个开窗函数
SELECT /*+ SET_VAR(‘enable_pipeline_level_shuffle’ = ‘false’) */ dev_name AS dev_name,
khfl,t1dev_name,t2dev_name,
_COUNT_DISTINCT_hykh_zKD_w1W,
_COUNT_DISTINCT_uid_d2x_6q6
FROM (SELECT _persist_COLLECTION_vNsp3lCOkB_WORKING.dev_name,
‘total_sub_key’ AS khfl,
_persist_COLLECTION_vNsp3lCOkB_WORKING.dev_name AS t1dev_name,
_persist_COLLECTION_vNsp3lCOkB_WORKING0.dev_name AS t2dev_name,
CAST(_persist_COLLECTION_vNsp3lCOkB_WORKING.$f1 AS DOUBLE) AS _COUNT_DISTINCT_hykh_zKD_w1W,
CAST(_persist_COLLECTION_vNsp3lCOkB_WORKING0.$f1 AS DOUBLE) AS _COUNT_DISTINCT_uid_d2x_6q6
FROM ((SELECT dev_name, COUNT(hykh) AS $f1
FROM (SELECT dev_name, hykh
FROM (SELECT CASE
WHEN DATEDIFF(MAX($23) OVER (PARTITION BY dev_name), $23) <= 60
THEN effective_cus
ELSE CAST(NULL AS VARCHAR) END AS hykh,
uid,
dev_name
FROM (SELECT _persist_COLLECTION_vNsp3lCOkB_WORKING.uid,
_persist_COLLECTION_vNsp3lCOkB_WORKING.effective_cus,
_persist_COLLECTION_vNsp3lCOkB_WORKING.dev_name,
_persist_COLLECTION_vNsp3lCOkB_WORKING.create_time AS $23
FROM cubeappdata._persist_COLLECTION_vNsp3lCOkB_WORKING) AS _persist_COLLECTION_vNsp3lCOkB_WORKING) AS _persist_COLLECTION_vNsp3lCOkB_WORKING
GROUP BY dev_name, hykh) AS _persist_COLLECTION_vNsp3lCOkB_WORKING
GROUP BY dev_name) AS _persist_COLLECTION_vNsp3lCOkB_WORKING INNER JOIN (SELECT dev_name, COUNT(uid) AS $f1
FROM (SELECT dev_name, uid
FROM (SELECT CASE
WHEN DATEDIFF(MAX($23) OVER (PARTITION BY dev_name), $23) <= 60
THEN effective_cus
ELSE CAST(NULL AS VARCHAR) END AS hykh,
uid,
dev_name
FROM (SELECT _persist_COLLECTION_vNsp3lCOkB_WORKING.uid,
_persist_COLLECTION_vNsp3lCOkB_WORKING.effective_cus,
_persist_COLLECTION_vNsp3lCOkB_WORKING.dev_name,
_persist_COLLECTION_vNsp3lCOkB_WORKING.create_time AS $23
FROM cubeappdata._persist_COLLECTION_vNsp3lCOkB_WORKING) AS _persist_COLLECTION_vNsp3lCOkB_WORKING0) AS _persist_COLLECTION_vNsp3lCOkB_WORKING0
GROUP BY dev_name, uid) AS _persist_COLLECTION_vNsp3lCOkB_WORKING0
GROUP BY dev_name) AS _persist_COLLECTION_vNsp3lCOkB_WORKING0
ON _persist_COLLECTION_vNsp3lCOkB_WORKING.dev_name =
_persist_COLLECTION_vNsp3lCOkB_WORKING0.dev_name)
) AS t0

  • fe.log/be.INFO/相应截图
    两次查询的日志文件和expalin数据文件
    analyze.log (112.1 KB)
  • 完整的报错异常栈

能找个最小复现case吗

本地尝试复现了一下 没复现出来 可能跟具体数据有关 我再尝试一下看看