One of the three FE nodes stops responding to client requests, and clients fail to create connections

Version: StarRocks 1.18.4
JDK: 1.8.0_231
Frequency: once every few days

Logs:
fe.warn.log:

2022-04-21 15:25:18,117 WARN (doris-mysql-nio-pool-7507|9880) [ReadListener.lambda$handleEvent$417():58] Exception happened in one session(org.apache.doris.mysql.nio.NConnectContext@3c1ba7d3).
java.io.IOException: Error happened when receiving packet.
at org.apache.doris.qe.ConnectProcessor.processOnce(ConnectProcessor.java:617) ~[starrocks-fe.jar:?]
at org.apache.doris.mysql.nio.ReadListener.lambda$handleEvent$417(ReadListener.java:50) ~[starrocks-fe.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_231]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_231]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_231]

fe.log:
2022-04-21 15:26:15,647 WARN (doris-mysql-nio-pool-7506|9879) [Coordinator.exec():529] catch a execute exception
java.util.concurrent.ExecutionException: A error occurred: errorCode=62 errorMessage:method request time out, please check 'onceTalkTimeout' property. current value is:60000(MILLISECONDS) correlationId:2080966 timeout with bound channel =>[id: 0x2082085a, L:/10.163.136.7:32765 - R:/10.163.138.13:8060]
at com.baidu.jprotobuf.pbrpc.client.ProtobufRpcProxy$2.get(ProtobufRpcProxy.java:551) ~[jprotobuf-rpc-core-3.5.21.jar:?]
at org.apache.doris.qe.Coordinator.exec(Coordinator.java:522) ~[starrocks-fe.jar:?]
at org.apache.doris.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:745) ~[starrocks-fe.jar:?]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:359) ~[starrocks-fe.jar:?]
at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:241) ~[starrocks-fe.jar:?]
at org.apache.doris.qe.ConnectProcessor.dispatch(ConnectProcessor.java:390) ~[starrocks-fe.jar:?]
at org.apache.doris.qe.ConnectProcessor.processOnce(ConnectProcessor.java:626) ~[starrocks-fe.jar:?]
at org.apache.doris.mysql.nio.ReadListener.lambda$handleEvent$417(ReadListener.java:50) ~[starrocks-fe.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_231]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_231]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_231]
Caused by: com.baidu.jprotobuf.pbrpc.ErrorDataException: A error occurred: errorCode=62 errorMessage:method request time out, please check 'onceTalkTimeout' property. current value is:60000(MILLISECONDS) correlationId:2080966 timeout with bound channel =>[id: 0x2082085a, L:/10.163.136.7:32765 - R:/10.163.138.13:8060]
at com.baidu.jprotobuf.pbrpc.client.ProtobufRpcProxy.doWaitCallback(ProtobufRpcProxy.java:625) ~[jprotobuf-rpc-core-3.5.21.jar:?]
at com.baidu.jprotobuf.pbrpc.client.ProtobufRpcProxy.access$000(ProtobufRpcProxy.java:51) ~[jprotobuf-rpc-core-3.5.21.jar:?]
at com.baidu.jprotobuf.pbrpc.client.ProtobufRpcProxy$2.get(ProtobufRpcProxy.java:549) ~[jprotobuf-rpc-core-3.5.21.jar:?]
2022-04-21 15:26:15,648 WARN (doris-mysql-nio-pool-7506|9879) [Coordinator.exec():544] exec plan fragment failed, errmsg=exec rpc error. backend id: 11038, code: THRIFT_RPC_ERROR, fragmentId=F00, backend=10.163.138.13:9060
2022-04-21 15:26:15,648 WARN (doris-mysql-nio-pool-7506|9879) [SimpleScheduler.addToBlacklist():141] add black list 11038

Client error log:

Error querying database. Cause: org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 5000, active 1, maxActive 200, creating 1, createElapseMillis 10006

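Could you check whether your concurrency is very high, so that the connection limit is being hit? Also, how large is the JVM heap configured in fe.conf? Consider increasing it somewhat.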

The connection limit has not been reached, and the JVM heap is large enough.
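
The counters in the Druid exception from the client log can be read directly to back this up. Below is a minimal sketch, assuming the message keeps the `name value` layout shown above; `parse_druid_timeout` is a hypothetical helper name, not part of Druid.

```python
import re

# Hypothetical helper: pulls the numeric counters out of a Druid
# GetConnectionTimeoutException message like the one in the client log.
_FIELD = re.compile(
    r"(wait millis|active|maxActive|creating|createElapseMillis) (\d+)"
)

def parse_druid_timeout(message: str) -> dict:
    """Return the counters found in the message as a name -> int mapping."""
    return {name: int(value) for name, value in _FIELD.findall(message)}

# Counter portion of the message copied from the client error log.
msg = ("wait millis 5000, active 1, maxActive 200, "
       "creating 1, createElapseMillis 10006")
stats = parse_druid_timeout(msg)

# The pool is only exhausted when every allowed connection is in use.
pool_exhausted = stats["active"] >= stats["maxActive"]
print(stats, pool_exhausted)
```

With `active` at 1 out of `maxActive` 200, the client-side pool was not exhausted. The `createElapseMillis` value of 10006 appears to mean that one connection had been in the process of being created for about 10 seconds, which would be consistent with the FE not answering new connections rather than with a full pool.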

How did you end up solving this?

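Component | Port name | Default port | Direction | Description
BE | brpc_port | 8060 | FE <--> BE, BE <--> BE | brpc port on the BE, used for communication between BEs
BE | be_port | 9060 | FE --> BE | Thrift server port on the BE, used to receive requests from the FE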

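The BE at 10.163.138.13:8060 may have a communication problem with the other BEs,
and the "add black list 11038" line suggests it may even have been added to the blacklist.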
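
One quick way to narrow this down is to test whether the two BE ports are reachable at the TCP level from the FE host. The following is a minimal sketch using only the Python standard library; `port_reachable` and `probe` are hypothetical helper names, and a successful TCP connect only shows that the port accepts connections, not that brpc or Thrift requests are being answered in time.

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

def probe(host: str, ports, timeout: float = 3.0) -> dict:
    """Check several ports on one host and return a port -> bool mapping."""
    return {port: port_reachable(host, port, timeout) for port in ports}
```

Run from the FE host, `probe("10.163.138.13", (8060, 9060))` would report on the brpc and Thrift ports seen in the logs above. If both come back `True` while the RPC timeouts continue, the problem is more likely load or a stall on the BE than plain network reachability.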

Did you ever solve this? We have the same problem: every so often, connections fail or time out.

Was this ever resolved? I have run into this problem too. Judging by the JVM metrics, an 8 GB heap is sufficient.

Please open a separate post with your StarRocks version, how you are using it, and the error messages.