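【Details】Detailed problem description
This morning the StarRocks query service suddenly became abnormal, and some write jobs also started failing.
Monitoring showed one BE as dead,
but the BE process had not actually died.
During this period the FE logged a large number of exceptions like the following: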
2023-09-11 10:10:00,302 WARN (starrocks-mysql-nio-pool-305896|1189777) [Coordinator.prepareResultSink():619] catch a execute exception
java.util.concurrent.ExecutionException: A error occurred: errorCode=62 errorMessage:method request time out, please check 'onceTalkTimeout' property. current value is:60000(MILLISECONDS) correlationId:2721763486 timeout with bound channel =>[id: 0xb213deb4, L:/10.2.72.238:42476 - R:/10.2.72.115:8060]
at com.baidu.jprotobuf.pbrpc.client.ProtobufRpcProxy$2.get(ProtobufRpcProxy.java:578) ~[jprotobuf-rpc-core-4.2.1.jar:?]
at com.starrocks.qe.Coordinator.prepareResultSink(Coordinator.java:613) ~[starrocks-fe.jar:?]
at com.starrocks.qe.Coordinator.exec(Coordinator.java:434) ~[starrocks-fe.jar:?]
at com.starrocks.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:704) ~[starrocks-fe.jar:?]
at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:397) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:322) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:440) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:676) ~[starrocks-fe.jar:?]
at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:55) ~[starrocks-fe.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_74]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_74]
Caused by: com.baidu.jprotobuf.pbrpc.ErrorDataException: A error occurred: errorCode=62 errorMessage:method request time out, please check 'onceTalkTimeout' property. current value is:60000(MILLISECONDS) correlationId:2721763486 timeout with bound channel =>[id: 0xb213deb4, L:/10.2.72.238:42476 - R:/10.2.72.115:8060]
at com.baidu.jprotobuf.pbrpc.client.ProtobufRpcProxy.doWaitCallback(ProtobufRpcProxy.java:651) ~[jprotobuf-rpc-core-4.2.1.jar:?]
at com.baidu.jprotobuf.pbrpc.client.ProtobufRpcProxy.access$0(ProtobufRpcProxy.java:611) ~[jprotobuf-rpc-core-4.2.1.jar:?]
at com.baidu.jprotobuf.pbrpc.client.ProtobufRpcProxy$2.get(ProtobufRpcProxy.java:576) ~[jprotobuf-rpc-core-4.2.1.jar:?]
... 11 more
2023-09-11 10:10:00,302 WARN (starrocks-mysql-nio-pool-305896|1189777) [Coordinator.prepareResultSink():634] exec plan fragment failed, errmsg=exec rpc error. backend id: 11009, code: THRIFT_RPC_ERROR, fragmentId=F01, backend=10.2.72.115:9060
2023-09-11 10:10:00,303 WARN (starrocks-mysql-nio-pool-305896|1189777) [SimpleScheduler.addToBlacklist():143] add black list 11009
2023-09-11 10:10:00,303 WARN (starrocks-mysql-nio-pool-305896|1189777) [StmtExecutor.execute():408] retry 1 times. stmt: select count(distinct conn_id) as connid from user_enters_session_statistics_v2 where user_id = 159079459 and data_time = ' 2023-09-11 00:00:00 ' and isactive = 1 ;
2023-09-11 10:10:00,360 WARN (starrocks-mysql-nio-pool-305904|1189785) [ReadListener.lambda$handleEvent$0():63] Exception happened in one session(com.starrocks.mysql.nio.NConnectContext@390b4463).
java.io.IOException: Error happened when receiving packet.
at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:667) ~[starrocks-fe.jar:?]
at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:55) ~[starrocks-fe.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_74]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_74]
2023-09-11 10:10:00,403 WARN (starrocks-mysql-nio-pool-305896|1189777) [ReadListener.lambda$handleEvent$0():63] Exception happened in one session(com.starrocks.mysql.nio.NConnectContext@7345772c).
java.io.IOException: Error happened when receiving packet.
at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:667) ~[starrocks-fe.jar:?]
at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:55) ~[starrocks-fe.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_74]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_74]
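The surrounding context logs have also been uploaded.
I have seen similar posts in the community, but there is no router or firewall between the nodes here, and this is the first time this has happened. Please help take a look.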
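To make the pattern in the FE log easier to see (RPC failure, then the backend being added to the blacklist), a small script can count these events per backend. This is only a rough sketch: the regular expressions are derived solely from the log lines quoted in this post, the function name `summarize` is hypothetical, and other StarRocks versions may format these lines differently.

```python
import re
from collections import Counter

# Both patterns are assumptions based only on the FE log lines quoted in this post.
BLACKLIST_RE = re.compile(
    r"\[SimpleScheduler\.addToBlacklist\(\):\d+\] add black list (\d+)"
)
RPC_FAIL_RE = re.compile(
    r"exec plan fragment failed, errmsg=.*?backend id: (\d+), "
    r"code: (\w+),.*?backend=([\d.]+:\d+)"
)


def summarize(lines):
    """Count blacklist and fragment-RPC-failure events per backend.

    `lines` is any iterable of FE log lines (for example an open file).
    Returns (blacklisted, rpc_failures), both collections.Counter objects.
    """
    blacklisted = Counter()   # key: backend id
    rpc_failures = Counter()  # key: (backend id, host:port, error code)
    for line in lines:
        m = BLACKLIST_RE.search(line)
        if m:
            blacklisted[m.group(1)] += 1
            continue
        m = RPC_FAIL_RE.search(line)
        if m:
            backend_id, code, address = m.group(1), m.group(2), m.group(3)
            rpc_failures[(backend_id, address, code)] += 1
    return blacklisted, rpc_failures
```

It could be run over an FE log file with `with open(path, errors="replace") as f: summarize(f)`, where `path` is wherever the FE log is stored. If a single backend id dominates both counters, that points at one BE rather than a cluster-wide problem.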
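【Background】After restarting that BE, the service gradually recovered.
【Business impact】Unable to provide service.
【StarRocks version】2.3.3
【Cluster size】3 FE + 9 BE
【Machine info】CPU vCores / memory / NIC: FE: 40C/256G/10 GbE; BE: 40C/256G/10 GbE and 48C/256G/10 GbE
【Contact】qinhao01@xinye.com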
log_TMP.rar (56.1 MB)




