Process one query failed because IOException: com.starrocks.rpc.RpcException: transmit chunk rpc failed

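To help us locate your problem faster, please provide the following information. Thank you.
[Details] Detailed description of the problem
[Background] What operations were performed?
[Business impact] SQL query fails with the error: StarRocks process failed
[Shared-data (storage-compute separation)?] No, storage and compute are integrated
[StarRocks version] e.g. 3.1.4-0c4b2a3
[Cluster size] e.g. 1 FE (1 follower + 2 observers) + 3 BEs (FE and BE co-located)
[Machine info] vCPUs/memory/NIC, e.g. 48C/64G/10 GbE
[Contact] StarRocks3.0-存算分离 group, @糖
[Attachments]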

  • fe.log / be.INFO / relevant screenshots

  • Slow query:

    • Profile information
      PLAN FRAGMENT 0
      OUTPUT EXPRS:41: scan_id | 83: round | 84: round
      PARTITION: UNPARTITIONED

    RESULT SINK

    8:EXCHANGE
    limit: 200

PLAN FRAGMENT 1
OUTPUT EXPRS:
PARTITION: HASH_PARTITIONED: 63: scan_id

STREAM DATA SINK
EXCHANGE ID: 08
UNPARTITIONED

7:Project
| <slot 41> : 41: scan_id
| <slot 83> : round(81: avg, 4)
| <slot 84> : round(82: avg, 4)
| limit: 200
|
6:AGGREGATE (update finalize)
| output: avg(79: get_json_double), avg(80: get_json_double)
| group by: 41: scan_id
| limit: 200
|
5:Project
| <slot 41> : 41: scan_id
| <slot 79> : get_json_double(76: para_value, '$.name0010[0]')
| <slot 80> : get_json_double(76: para_value, '$.name0014[0]')
|
4:HASH JOIN
| join op: RIGHT OUTER JOIN (PARTITIONED)
| colocate: false, reason:
| equal join conjunct: 63: scan_id = 41: scan_id
|
|----3:EXCHANGE
|
1:EXCHANGE

PLAN FRAGMENT 2
OUTPUT EXPRS:
PARTITION: RANDOM

STREAM DATA SINK
EXCHANGE ID: 03
HASH_PARTITIONED: 41: scan_id

2:OlapScanNode
TABLE: CP_WAFER_0725
PREAGGREGATION: ON
partitions=47/48
rollup: CP_WAFER_0725
tabletRatio=272/272
tabletList=132138,132142,132146,132150,132154,132158,132063,132067,132071,132075 …
cardinality=1480368
avgRowSize=8.0
numNodes=0

PLAN FRAGMENT 3
OUTPUT EXPRS:
PARTITION: RANDOM

STREAM DATA SINK
EXCHANGE ID: 01
HASH_PARTITIONED: 63: scan_id

0:OlapScanNode
TABLE: CP_DIE_TEST_RESULT_0724
PREAGGREGATION: ON
partitions=71/72
rollup: CP_DIE_TEST_RESULT_0724
tabletRatio=426/426
tabletList=130081,130083,130085,130087,130089,130091,130094,130096,130098,130100 …
cardinality=1264258286
avgRowSize=1032.0
numNodes=0

  • Parallelism: 1

  • Is the pipeline engine enabled: show variables like '%pipeline%';
    |Variable_name|Value|
    |---|---|
    |enable_pipeline_engine|true|
    |enable_pipeline_query_statistic|true|
    |max_pipeline_dop|64|
    |pipeline_dop|0|
    |pipeline_profile_level|1|
    |pipeline_sink_dop|0|

  • Query error:
    2024-01-13 22:12:24,820 WARN (starrocks-mysql-nio-pool-18820|6510886) [StmtExecutor.execute():572] retry 1 times. stmt: /* ApplicationName=DBeaver 23.0.5 - SQLEditor */ with
    base as (
    select
    bw.*
    ,
    bd.part_id,
    bd.dut_id,
    bd.x_index,
    bd.y_index,
    concat_ws(',', bd.x_index, bd.y_index) as die_coor,
    bd.hard_bin_no,
    bd.soft_bin_no,
    bd.para_value
    from
    CP_WAFER bw
    left join CP_DIE_TEST_RESULT bd on
    bw.scan_id = bd.scan_id )
    select /*+ SET_VAR(query_mem_limit = 8147483648) */
    scan_id,
    round(avg(get_json_double(para_value, '$.name0010[0]')), 4) AS `name0010(MEAN)`,
    round(avg(get_json_double(para_value, '$.name0014[0]')), 4) AS `name0014(MEAN)`
    from
    base
    group by
    scan_id
    LIMIT 0, 200
    2024-01-13 22:12:24,926 WARN (thrift-server-pool-6347361|6510889) [Coordinator.updateFragmentExecStatus():1670] exec state report failed status=errorCode THRIFT_RPC_ERROR transmit chunk rpc failed [dest_instance_id=808524f2-fb0d-4b40-b5fc-334aea89d76f] [dest=172.18.53.35:8060], query_id=808524f2-fb0d-4b40-b5fc-334aea89d766, instance_id=808524f2-fb0d-4b40-b5fc-334aea89d769
    2024-01-13 22:12:24,926 WARN (thrift-server-pool-6347361|6510889) [Coordinator.updateStatus():1472] one instance report fail throw updateStatus(), need cancel. job id: -1, query id: 808524f2-fb0d-4b40-b5fc-334aea89d766, instance id: 808524f2-fb0d-4b40-b5fc-334aea89d769
    2024-01-13 22:12:24,942 WARN (thrift-server-pool-6347361|6510889) [Coordinator.updateFragmentExecStatus():1670] exec state report failed status=errorCode CANCELLED InternalError, query_id=808524f2-fb0d-4b40-b5fc-334aea89d766, instance_id=808524f2-fb0d-4b40-b5fc-334aea89d770
    2024-01-13 22:12:24,945 WARN (thrift-server-pool-6347368|6510896) [Coordinator.updateFragmentExecStatus():1670] exec state report failed status=errorCode CANCELLED InternalError, query_id=808524f2-fb0d-4b40-b5fc-334aea89d766, instance_id=808524f2-fb0d-4b40-b5fc-334aea89d76d
    2024-01-13 22:12:24,946 WARN (thrift-server-pool-6347367|6510895) [Coordinator.updateFragmentExecStatus():1670] exec state report failed status=errorCode CANCELLED InternalError, query_id=808524f2-fb0d-4b40-b5fc-334aea89d766, instance_id=808524f2-fb0d-4b40-b5fc-334aea89d76e
    2024-01-13 22:12:24,947 WARN (starrocks-mysql-nio-pool-18820|6510886) [Coordinator.getNext():1492] get next fail, need cancel. status errorCode CANCELLED InternalError, query id: 808524f2-fb0d-4b40-b5fc-334aea89d766
    2024-01-13 22:12:24,947 WARN (starrocks-mysql-nio-pool-18820|6510886) [StmtExecutor.execute():652] execute IOException
    com.starrocks.rpc.RpcException: transmit chunk rpc failed [dest_instance_id=808524f2-fb0d-4b40-b5fc-334aea89d76f] [dest=172.18.53.35:8060], host: unknown
    at com.starrocks.qe.Coordinator.getNext(Coordinator.java:1515) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:938) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:503) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:363) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:477) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:753) ~[starrocks-fe.jar:?]
    at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:69) ~[starrocks-fe.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]
    2024-01-13 22:12:24,948 WARN (starrocks-mysql-nio-pool-18820|6510886) [ConnectProcessor.handleQuery():381] Process one query failed because IOException:
    com.starrocks.rpc.RpcException: transmit chunk rpc failed [dest_instance_id=808524f2-fb0d-4b40-b5fc-334aea89d76f] [dest=172.18.53.35:8060], host: unknown
    at com.starrocks.qe.Coordinator.getNext(Coordinator.java:1515) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:938) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:503) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:363) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:477) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:753) ~[starrocks-fe.jar:?]
    at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:69) ~[starrocks-fe.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]
    2024-01-13 22:12:24,948 WARN (thrift-server-pool-6347370|6510898) [Coordinator.updateFragmentExecStatus():1670] exec state report failed status=errorCode CANCELLED InternalError, query_id=808524f2-fb0d-4b40-b5fc-334aea89d766, instance_id=808524f2-fb0d-4b40-b5fc-334aea89d767
    2024-01-13 22:12:24,950 WARN (thrift-server-pool-6347369|6510897) [Coordinator.updateFragmentExecStatus():1670] exec state report failed status=errorCode CANCELLED InternalError, query_id=808524f2-fb0d-4b40-b5fc-334aea89d766, instance_id=808524f2-fb0d-4b40-b5fc-334aea89d76f
    2024-01-13 22:12:24,957 WARN (thrift-server-pool-6347372|6510901) [Coordinator.updateFragmentExecStatus():1670] exec state report failed status=errorCode CANCELLED InternalError, query_id=808524f2-fb0d-4b40-b5fc-334aea89d766, instance_id=808524f2-fb0d-4b40-b5fc-334aea89d768
    2024-01-13 22:12:24,957 WARN (thrift-server-pool-6347361|6510889) [Coordinator.updateFragmentExecStatus():1670] exec state report failed status=errorCode THRIFT_RPC_ERROR transmit chunk rpc failed [dest_instance_id=808524f2-fb0d-4b40-b5fc-334aea89d76f] [dest=172.18.53.35:8060], query_id=808524f2-fb0d-4b40-b5fc-334aea89d766, instance_id=808524f2-fb0d-4b40-b5fc-334aea89d76c
    2024-01-13 22:12:25,109 WARN (thrift-server-pool-6347364|6510892) [Coordinator.updateFragmentExecStatus():1670] exec state report failed status=errorCode CANCELLED InternalError, query_id=808524f2-fb0d-4b40-b5fc-334aea89d766, instance_id=808524f2-fb0d-4b40-b5fc-334aea89d76a
    2024-01-13 22:12:25,183 WARN (pool-23-thread-15297|6510902) [PartitionBasedMvRefreshProcessor.collectBaseTables():1318] table 10149.234492 do not exist when refreshing materialized view:DEFECT_IMAGE
    2024-01-13 22:12:25,183 WARN (pool-23-thread-15297|6510902) [TaskRunExecutor.lambda$executeTaskRun$0():54] failed to execute TaskRun.
    com.starrocks.sql.common.DmlException: Materialized view base table: 10149.234492 not exist.
    at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.collectBaseTables(PartitionBasedMvRefreshProcessor.java:1320) ~[starrocks-fe.jar:?]
    at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.syncPartitions(PartitionBasedMvRefreshProcessor.java:654) ~[starrocks-fe.jar:?]
    at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.doMvRefresh(PartitionBasedMvRefreshProcessor.java:207) ~[starrocks-fe.jar:?]
    at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.processTaskRun(PartitionBasedMvRefreshProcessor.java:185) ~[starrocks-fe.jar:?]
    at com.starrocks.scheduler.TaskRun.executeTaskRun(TaskRun.java:189) ~[starrocks-fe.jar:?]
    at com.starrocks.scheduler.TaskRunExecutor.lambda$executeTaskRun$0(TaskRunExecutor.java:47) ~[starrocks-fe.jar:?]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]
    2024-01-13 22:12:33,797 ERROR (nioEventLoopGroup-8-8|204) [StaticResourceAction.executeGet():173] Request with wrong path. url: /static?res=starrocks.js
    2024-01-13 22:12:33,822 ERROR (nioEventLoopGroup-8-4|200) [StaticResourceAction.executeGet():173] Request with wrong path. url: /static?res=starrocks.js
    2024-01-13 22:12:34,789 ERROR (nioEventLoopGroup-8-5|201) [StaticResourceAction.executeGet():173] Request with wrong path. url: /static?res=starrocks.js
    2024-01-13 22:12:34,809 ERROR (nioEventLoopGroup-8-8|204) [StaticResourceAction.executeGet():173] Request with wrong path. url: /static?res=starrocks.js
    2024-01-13 22:12:35,672 ERROR (nioEventLoopGroup-8-4|200) [StaticResourceAction.executeGet():173] Request with wrong path. url: /static?res=starrocks.js
    2024-01-13 22:12:44,893 ERROR (nioEventLoopGroup-8-4|200) [StaticResourceAction.executeGet():173] Request with wrong path. url: /static?res=starrocks.js
    2024-01-13 22:12:44,916 ERROR (nioEventLoopGroup-8-8|204) [StaticResourceAction.executeGet():173] Request with wrong path. url: /static?res=starrocks.js
    2024-01-13 22:17:25,200 WARN (pool-23-thread-15298|6510911) [PartitionBasedMvRefreshProcessor.collectBaseTables():1318] table 10149.234492 do not exist when refreshing materialized view:DEFECT_IMAGE
    2024-01-13 22:17:25,200 WARN (pool-23-thread-15298|6510911) [TaskRunExecutor.lambda$executeTaskRun$0():54] failed to execute TaskRun.
    com.starrocks.sql.common.DmlException: Materialized view base table: 10149.234492 not exist.
    at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.collectBaseTables(PartitionBasedMvRefreshProcessor.java:1320) ~[starrocks-fe.jar:?]
    at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.syncPartitions(PartitionBasedMvRefreshProcessor.java:654) ~[starrocks-fe.jar:?]
    at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.doMvRefresh(PartitionBasedMvRefreshProcessor.java:207) ~[starrocks-fe.jar:?]
    at com.starrocks.scheduler.PartitionBasedMvRefreshProcessor.processTaskRun(PartitionBasedMvRefreshProcessor.java:185) ~[starrocks-fe.jar:?]
    at com.starrocks.scheduler.TaskRun.executeTaskRun(TaskRun.java:189) ~[starrocks-fe.jar:?]
    at com.starrocks.scheduler.TaskRunExecutor.lambda$executeTaskRun$0(TaskRunExecutor.java:47) ~[starrocks-fe.jar:?]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]
    2024-01-13 22:20:06,230 ERROR (nioEventLoopGroup-8-16|289) [StaticResourceAction.executeGet():173] Request with wrong path. url: /static?res=starrocks.js
    2024-01-13 22:20:06,273 ERROR (nioEventLoopGroup-8-16|289) [StaticResourceAction.executeGet():173] Request with wrong path. url: /static?res=starrocks.js
    2024-01-13 22:20:08,284 ERROR (nioEventLoopGroup-8-14|224) [StaticResourceAction.executeGet():173] Request with wrong path. url: /static?res=starrocks.js
    2024-01-13 22:20:08,331 ERROR (nioEventLoopGroup-8-15|255) [StaticResourceAction.executeGet():173] Request with wrong path. url: /static?res=starrocks.js
    2024-01-13 22:20:15,104 ERROR (nioEventLoopGroup-8-13|223) [StaticResourceAction.executeGet():173] Request with wrong path. url: /static?res=starrocks.js
    2024-01-13 22:20:15,138 ERROR (nioEventLoopGroup-8-11|209) [StaticResourceAction.executeGet():173] Request with wrong path. url: /static?res=starrocks.js
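
In the FE log above, most of the `exec state report failed` lines are `CANCELLED` reports that follow the first failure, and unrelated warnings (materialized view refresh, static resource requests) are mixed in. For anyone triaging a log like this, here is a minimal Python sketch that keeps only the non-`CANCELLED` reports for one query and pulls out the `dest` address. The regular expressions are inferred solely from the line shapes pasted in this post, so other StarRocks versions may format these lines differently; the names `parse_reports`, `root_failures`, and `SAMPLE_LINES` are made up for illustration.

```python
import re

# Pattern inferred from the "exec state report failed" lines in this post only.
REPORT_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) WARN .*"
    r"exec state report failed status=errorCode (?P<code>[A-Z_]+) (?P<msg>.*), "
    r"query_id=(?P<query_id>[0-9a-f-]+), instance_id=(?P<instance_id>[0-9a-f-]+)"
)
# "[dest=host:port]" appears inside the message of the RPC failures.
DEST_RE = re.compile(r"\[dest=(?P<dest>[^\]]+)\]")


def parse_reports(lines):
    """Yield one dict per 'exec state report failed' line; skip everything else."""
    for line in lines:
        match = REPORT_RE.search(line)
        if match is None:
            continue
        report = match.groupdict()
        dest = DEST_RE.search(report["msg"])
        report["dest"] = dest.group("dest") if dest else None
        yield report


def root_failures(lines, query_id):
    """Return the non-CANCELLED reports for one query, oldest first."""
    reports = [
        r for r in parse_reports(lines)
        if r["query_id"] == query_id and r["code"] != "CANCELLED"
    ]
    # Timestamps share one fixed-width format, so string order is time order.
    return sorted(reports, key=lambda r: r["ts"])


# Two lines copied from the log above, used as sample input.
SAMPLE_LINES = [
    "2024-01-13 22:12:24,926 WARN (thrift-server-pool-6347361|6510889) "
    "[Coordinator.updateFragmentExecStatus():1670] exec state report failed "
    "status=errorCode THRIFT_RPC_ERROR transmit chunk rpc failed "
    "[dest_instance_id=808524f2-fb0d-4b40-b5fc-334aea89d76f] [dest=172.18.53.35:8060], "
    "query_id=808524f2-fb0d-4b40-b5fc-334aea89d766, "
    "instance_id=808524f2-fb0d-4b40-b5fc-334aea89d769",
    "2024-01-13 22:12:24,942 WARN (thrift-server-pool-6347361|6510889) "
    "[Coordinator.updateFragmentExecStatus():1670] exec state report failed "
    "status=errorCode CANCELLED InternalError, "
    "query_id=808524f2-fb0d-4b40-b5fc-334aea89d766, "
    "instance_id=808524f2-fb0d-4b40-b5fc-334aea89d770",
]

if __name__ == "__main__":
    query = "808524f2-fb0d-4b40-b5fc-334aea89d766"
    for report in root_failures(SAMPLE_LINES, query):
        print(report["ts"], report["instance_id"], report["code"], report["dest"])
```

To run it against a real file, pass an open file handle instead of `SAMPLE_LINES`, e.g. `root_failures(open("fe.log", encoding="utf-8"), query)`. The `dest` value it reports (here `172.18.53.35:8060`) only says which BE the failed `transmit chunk` RPC was addressed to. It does not by itself say why the RPC failed, so the be.INFO log on that node is still needed.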