StarRocks 2.4.2 error: failed to read after retried 3 times!

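【Details】The Flink CDC job that loads data into StarRocks crashed
【Background】Flink CDC writes MySQL changes to Kafka, and the data is then synchronized from Kafka to StarRocks
【Business impact】Real-time synchronization has stopped
【StarRocks version】2.4.2
【Cluster size】Alibaba Cloud EMR, 3 FE (4C16G) + 3 BE (4C32G)
【Table model】Primary Key model
【Import or export method】Flink CDC
【Contact】Community group 6, 哎往年
【Attachments】FE error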


BE error

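The disk that holds the FE meta directory has high I/O load, which is affecting metadata reads and writes. Check the performance of the disk that stores the FE metadata; we recommend placing the FE meta directory on a dedicated disk.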
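One rough way to check that disk is to time small synchronous writes in the meta directory. The sketch below is only an illustration, not a StarRocks tool: it assumes write-plus-fsync latency is a reasonable proxy for metadata write performance, and the function name `fsync_latencies_ms` and the example path are hypothetical. Point it at whatever `meta_dir` is set to in your fe.conf.

```python
import os
import tempfile
import time


def fsync_latencies_ms(directory, rounds=20, size=4096):
    """Write `size` bytes and fsync `rounds` times; return each latency in ms.

    Hypothetical helper for illustration. It creates a temporary file in
    `directory` and removes it when done.
    """
    payload = b"\0" * size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(rounds):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to disk, not just the page cache
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


def summarize(latencies):
    """Return (average_ms, worst_ms) for a non-empty list of latencies."""
    return sum(latencies) / len(latencies), max(latencies)


# Example (the path is an assumption; use your own meta_dir):
# avg, worst = summarize(fsync_latencies_ms("/path/to/fe/meta"))
# print(f"avg {avg:.2f} ms, worst {worst:.2f} ms")
```

If the worst-case latency is much higher on the meta disk than on an idle disk of the same type, that would be consistent with the I/O contention described above.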

Do you mean this error: java.io.IOException: Connection reset by peer?

No, the one above it: failed to read after retried 3 times, which was raised while reading the FE metadata.

Then what about the IOException warning below, which also shows up frequently? It is logged every two or three seconds.
2023-05-17 16:23:33,589 WARN (starrocks-mysql-nio-pool-2320|3181) [AcceptListener.lambda$handleEvent$1():92] connect processor exception because
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:1.8.0_352]
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[?:1.8.0_352]
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[?:1.8.0_352]
at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[?:1.8.0_352]
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470) ~[?:1.8.0_352]
at org.xnio.nio.NioSocketConduit.write(NioSocketConduit.java:153) ~[xnio-nio-3.7.9.Final.jar:3.7.9.Final]
at org.xnio.conduits.ConduitStreamSinkChannel.write(ConduitStreamSinkChannel.java:150) ~[xnio-api-3.7.9.Final.jar:3.7.9.Final]
at org.xnio.channels.Channels.writeBlocking(Channels.java:97) ~[xnio-api-3.7.9.Final.jar:3.7.9.Final]
at com.starrocks.mysql.nio.NMysqlChannel.realNetSend(NMysqlChannel.java:82) ~[starrocks-fe.jar:?]
at com.starrocks.mysql.MysqlChannel.flush(MysqlChannel.java:218) ~[starrocks-fe.jar:?]
at com.starrocks.mysql.MysqlChannel.sendAndFlush(MysqlChannel.java:287) ~[starrocks-fe.jar:?]
at com.starrocks.mysql.MysqlProto.negotiate(MysqlProto.java:103) ~[starrocks-fe.jar:?]
at com.starrocks.mysql.nio.AcceptListener.lambda$handleEvent$1(AcceptListener.java:68) ~[starrocks-fe.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_352]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_352]
at java.lang.Thread.run(Thread.java:750) [?:1.8.0_352]

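Check the cluster monitoring to see whether I/O pressure is high. For Connection reset by peer (the remote end rejected the request), you can run ss -an to check whether the system's listen backlog is already full (that is, it has reached the limit of 128).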
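The backlog check can be scripted roughly as follows. This is a minimal sketch that assumes a Linux host with `ss` installed. On Linux, `ss` reports the current accept-queue length in Recv-Q and the configured backlog in Send-Q for sockets in the LISTEN state. The sketch uses `ss -lnt` instead of `ss -an` so that only listening TCP sockets are listed; the function names are hypothetical.

```python
import subprocess


def full_listen_queues(ss_output):
    """Return (local_address, recv_q, send_q) for LISTEN sockets whose
    accept queue has reached its backlog limit.

    Expects `ss -lnt` style rows:
    State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
    """
    full = []
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "LISTEN":
            continue  # skips the header row and any non-listening sockets
        try:
            recv_q, send_q = int(fields[1]), int(fields[2])
        except ValueError:
            continue
        # For LISTEN sockets: Recv-Q = pending connections, Send-Q = backlog
        if send_q > 0 and recv_q >= send_q:
            full.append((fields[3], recv_q, send_q))
    return full


def read_ss_output():
    """Run `ss -lnt` on the local host and return its stdout as text."""
    result = subprocess.run(
        ["ss", "-lnt"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On the FE host, `full_listen_queues(read_ss_output())` returns an empty list when no listening socket is saturated. If the FE MySQL port (9030 by default) shows up with Send-Q at 128, the accept queue is a likely cause of the resets.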