To help us locate your problem faster, please provide the following information. Thank you.
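【Details】We powered off one BE pod and one FE pod. While they were down, queries on internal tables worked normally; after they were powered back on, queries failed with timeouts. We currently suspect that the pod IP changed across the power cycle. However, the FQDN mode stopped relying on IPs long ago, and our deployment does not depend on IPs either. Judging from the logs below, after power-on the brpc channel that the FE picks up still carries the BE pod IP from before the power-off. After a while (the duration varies, sometimes more than 20 minutes) queries recover on their own, and from then on the brpc channel carries the BE pod IP assigned after the restart. Has the community not accounted for this kind of abrupt pod power loss?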
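To check the stale-IP suspicion, one option is to compare the IP recorded in the `bound channel` part of the FE log with what the BE FQDN resolves to right now. The following is a minimal sketch, not part of StarRocks: the helper names (`parse_bound_channel`, `resolve_ipv4`, `is_stale`, `check_log_line`) are hypothetical, and it assumes the `R:<fqdn>/<ip>:<port>` format seen in the logs below and an IPv4-only cluster.

```python
import re
import socket

# Matches the remote end of a bound-channel description, e.g.
#   R:<fqdn>/<ipv4>:<port>]
# The format is assumed from the FE log lines in this post.
REMOTE_RE = re.compile(
    r"\bR:(?P<host>[^/\s]+)/(?P<ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+)"
)


def parse_bound_channel(log_line):
    """Return (host, ip, port) of the remote end in a log line, or None."""
    match = REMOTE_RE.search(log_line)
    if match is None:
        return None
    return match.group("host"), match.group("ip"), int(match.group("port"))


def resolve_ipv4(host):
    """Return the set of IPv4 addresses DNS currently returns for host."""
    infos = socket.getaddrinfo(
        host, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return {info[4][0] for info in infos}


def is_stale(channel_ip, current_ips):
    """A channel IP is stale when it is not among the resolved addresses."""
    return channel_ip not in current_ips


def check_log_line(log_line, resolver=resolve_ipv4):
    """Return True/False for stale/fresh, or None if no channel was found.

    The resolver argument is injectable so the check can be exercised
    without real DNS.
    """
    parsed = parse_bound_channel(log_line)
    if parsed is None:
        return None
    host, ip, _port = parsed
    return is_stale(ip, resolver(host))
```

Run from inside the FE pod (so that the same cluster DNS is used), `check_log_line(line)` returning `True` for a timeout line would support the theory that the FE is still talking to the pre-restart address. It does not show where the old address is being held.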
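【Background】Powered off one BE pod and one FE pod. Internal-table queries worked normally while they were down; after power-on, queries failed with timeouts.
【Business impact】
【Shared-data or not】Shared-nothing (storage and compute coupled)
【StarRocks version】3.2.13
【Cluster size】3 FE + 3 BE, deployed in FQDN mode
【Attachments】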
[2025-07-03 16:27:27.786 +0800] [] [] [WARN] [starrocks-mysql-nio-pool-94] [FragmentInstanceExecState.java] [com.starrocks.qe.scheduler.dag.FragmentInstanceExecState] [waitForDeploymentCompletion] [274] catch a execute exception java.util.concurrent.ExecutionException: A error occurred: errorCode=62 errorMessage:method request time out, please check 'onceTalkTimeout' property. current value is:60000(MILLISECONDS) correlationId:275 timeout with bound channel =>[id: 0xbaa63cc9, L:/20.20.156.146:33578 - R:odaeqebeservice-0.odaeqebeservice-svc.sop.svc.cluster.local/20.20.65.156:28243]
at com.baidu.jprotobuf.pbrpc.client.ProtobufRpcProxy$2.get(ProtobufRpcProxy.java:578)
at com.starrocks.qe.scheduler.dag.FragmentInstanceExecState.waitForDeploymentCompletion(FragmentInstanceExecState.java:268)
at com.starrocks.qe.scheduler.Deployer.waitForDeploymentCompletion(Deployer.java:225)
at com.starrocks.qe.scheduler.Deployer.deployFragments(Deployer.java:116)
at com.starrocks.qe.DefaultCoordinator.deliverExecFragments(DefaultCoordinator.java:589)
at com.starrocks.qe.DefaultCoordinator.startScheduling(DefaultCoordinator.java:502)
at com.starrocks.qe.scheduler.Coordinator.startScheduling(Coordinator.java:102)
at com.starrocks.qe.scheduler.Coordinator.exec(Coordinator.java:85)
at com.starrocks.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:1132)
at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:634)
at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:346)
at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:542)
at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:850)
at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:70)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: com.baidu.jprotobuf.pbrpc.ErrorDataException: A error occurred: errorCode=62 errorMessage:method request time out, please check 'onceTalkTimeout' property. current value is:60000(MILLISECONDS) correlationId:275 timeout with bound channel =>[id: 0xbaa63cc9, L:/20.20.156.146:33578 - R:odaeqebeservice-0.odaeqebeservice-svc.sop.svc.cluster.local/20.20.65.156:28243]
at com.baidu.jprotobuf.pbrpc.client.ProtobufRpcProxy.doWaitCallback(ProtobufRpcProxy.java:651)
at com.baidu.jprotobuf.pbrpc.client.ProtobufRpcProxy.access$0(ProtobufRpcProxy.java:611)
at com.baidu.jprotobuf.pbrpc.client.ProtobufRpcProxy$2.get(ProtobufRpcProxy.java:576)
... 16 more
[2025-07-03 16:27:27.788 +0800] [] [] [WARN] [starrocks-mysql-nio-pool-94] [ExecuteExceptionHandler.java] [com.starrocks.qe.ExecuteExceptionHandler] [handleRpcException] [96] Query cancelled by crash of backends or RpcException, [QueryId=6a15f144-57e7-11f0-8099-ea28c28adb39] [SQL=SELECT pr_dt_wytest1953557994_logical.dn AS dn, pr_dt_wytest1953557994_logical.timestamp AS timestamp FROM (SELECT pr_dt_wytest1953557994.timestamp AS timestamp, pr_dt_wytest1953557994.dn AS dn, pr_dt_wytest1953557994.createTime AS createTime, pr_dt_wytest1953557994.arrivalTime AS arrivalTime, pr_dt_wytest1953557994.saveTime AS saveTime, pr_dt_wytest1953557994.le AS le, pr_dt_wytest1953557994.go_gc_heap_allocs_by_size_bytes_bucket AS go_gc_heap_allocs_by_size_bytes_bucket FROM _default.dte__DEFAULT_pr_dt_wytest1953557994 AS pr_dt_wytest1953557994) AS pr_dt_wytest1953557994_logical WHERE pr_dt_wytest1953557994_logical.dn = 'NE=U7v767kBRipMuCu7lBfPA' AND pr_dt_wytest1953557994_logical.timestamp >= 1751161239103 AND pr_dt_wytest1953557994_logical.timestamp <= 1751334039103 ORDER BY timestamp DESC LIMIT 1] [Plan=PLAN COST\n CPU: 1412.5039265664927\n Memory: 64.0\n\nPLAN FRAGMENT 0(F01)\n Output Exprs:2: dn | 1: timestamp\n Input Partition: UNPARTITIONED\n RESULT SINK\n\n 2:MERGING-EXCHANGE\n distribution type: GATHER\n partition type: UNPARTITIONED\n limit: 1\n cardinality: 1\n column statistics: \n * timestamp-->[1.751333943E12, 1.751334039103E12, 0.0, 8.0, 21.57037385260145] ESTIMATE\n * dn-->[-Infinity, Infinity, 0.0, 24.0, 1.0] ESTIMATE\n\nPLAN FRAGMENT 1(F00)\n\n Input Partition: RANDOM\n OutPut Partition: UNPARTITIONED\n OutPut Exchange Id: 02\n\n 1:TOP-N\n | order by: [1, BIGINT, true] DESC\n | build runtime filters:\n | - filter_id = 0, build_expr = (<slot 1> 1: timestamp), remote = false\n | offset: 0\n | limit: 1\n | cardinality: 1\n | column statistics: \n | * timestamp-->[1.751333943E12, 1.751334039103E12, 0.0, 8.0, 21.57037385260145] ESTIMATE\n | * dn-->[-Infinity, Infinity, 0.0, 24.0, 1.0] ESTIMATE\n | \n 0:OlapScanNode\n table: dte__DEFAULT_pr_dt_wytest1953557994, rollup: dte__DEFAULT_pr_dt_wytest1953557994\n preAggregation: on\n Predicates: [2: dn, VARCHAR, true] = 'NE=U7v767kBRipMuCu7lBfPA', [1: timestamp, BIGINT, true] >= 1751161239103, [1: timestamp, BIGINT, true] <= 1751334039103\n partitionsRatio=45/46, tabletsRatio=135/135\n tabletList=45261,45265,45269,46064,46068,46072,49827,49831,49835,53914 ...\n actualRows=53960, avgRowSize=32.0\n cardinality: 22\n probe runtime filters:\n - filter_id = 0, probe_expr = (<slot 1> 1: timestamp)\n column statistics: \n * timestamp-->[1.751333943E12, 1.751334039103E12, 0.0, 8.0, 21.57037385260145] ESTIMATE\n * dn-->[-Infinity, Infinity, 0.0, 24.0, 1.0] ESTIMATE\n] com.starrocks.rpc.RpcException: rpc failed with odaeqebeservice-0.odaeqebeservice-svc.sop.svc.cluster.local: exec rpc error. backend [id=97199]
at com.starrocks.qe.DefaultCoordinator.handleErrorExecution(DefaultCoordinator.java:607)
at com.starrocks.qe.scheduler.Deployer.waitForDeploymentCompletion(Deployer.java:244)
at com.starrocks.qe.scheduler.Deployer.deployFragments(Deployer.java:116)
at com.starrocks.qe.DefaultCoordinator.deliverExecFragments(DefaultCoordinator.java:589)
at com.starrocks.qe.DefaultCoordinator.startScheduling(DefaultCoordinator.java:502)
at com.starrocks.qe.scheduler.Coordinator.startScheduling(Coordinator.java:102)
at com.starrocks.qe.scheduler.Coordinator.exec(Coordinator.java:85)
at com.starrocks.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:1132)
at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:634)
at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:346)
at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:542)
at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:850)
at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:70)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.util.concurrent.ExecutionException: A error occurred: errorCode=62 errorMessage:method request time out, please check 'onceTalkTimeout' property. current value is:60000(MILLISECONDS) correlationId:275 timeout with bound channel =>[id: 0xbaa63cc9, L:/20.20.156.146:33578 - R:odaeqebeservice-0.odaeqebeservice-svc.sop.svc.cluster.local/20.20.65.156:28243]
at com.baidu.jprotobuf.pbrpc.client.ProtobufRpcProxy$2.get(ProtobufRpcProxy.java:578)
at com.starrocks.qe.scheduler.dag.FragmentInstanceExecState.waitForDeploymentCompletion(FragmentInstanceExecState.java:268)
at com.starrocks.qe.scheduler.Deployer.waitForDeploymentCompletion(Deployer.java:225)
... 14 more
Caused by: com.baidu.jprotobuf.pbrpc.ErrorDataException: A error occurred: errorCode=62 errorMessage:method request time out, please check 'onceTalkTimeout' property. current value is:60000(MILLISECONDS) correlationId:275 timeout with bound channel =>[id: 0xbaa63cc9, L:/20.20.156.146:33578 - R:odaeqebeservice-0.odaeqebeservice-svc.sop.svc.cluster.local/20.20.65.156:28243]
at com.baidu.jprotobuf.pbrpc.client.ProtobufRpcProxy.doWaitCallback(ProtobufRpcProxy.java:651)
at com.baidu.jprotobuf.pbrpc.client.ProtobufRpcProxy.access$0(ProtobufRpcProxy.java:611)
at com.baidu.jprotobuf.pbrpc.client.ProtobufRpcProxy$2.get(ProtobufRpcProxy.java:576)
at com.starrocks.qe.scheduler.dag.FragmentInstanceExecState.waitForDeploymentCompletion(FragmentInstanceExecState.java:268)
at com.starrocks.qe.scheduler.Deployer.waitForDeploymentCompletion(Deployer.java:225)