FE node reports Connection reset by peer / OutOfMemoryError / bdb recoveryTracker should overlap or follow on disk

Continuing the discussion from "FE node went offline and cannot start":

[Details] Detailed problem description

2023-02-20 01:50:37,457 WARN (starrocks-mysql-nio-pool-306|16537) [AcceptListener.lambda$handleEvent$1():92] connect processor exception because
java.io.IOException: Connection reset by peer

2023-02-20 01:56:57,241 ERROR (starrocks-mysql-nio-pool-308|16539) [AcceptListener.lambda$handleEvent$1():89] connect processor exception because
java.lang.OutOfMemoryError: Java heap space

2023-02-20 02:01:43,743 WARN (replayer|69) [GlobalStateMgr$5.setCanRead():1613] meta out of date. current time: 1676829703743, synchronized time: 1676824223122, has log: true, fe type: FOLLOWER

2023-02-20 02:34:42,549 ERROR (main|1) [BDBEnvironment.setupEnvironment():319] failed to setup environment after retried 1 times
com.sleepycat.je.EnvironmentFailureException: (JE 7.3.7) 172.17.105.69_9010_1670468419616(-1):/data/app/starrocks/fe/meta/bdb recoveryTracker should overlap or follow on disk last VLSN of 233,576,751 recoveryFirst= 233,576,753 UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. Environment is invalid and must be closed.
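
As a reading aid, the numbers in the log lines above can be checked with a little shell arithmetic. This is only a sketch: the variable names are made up for illustration, and treating the VLSN difference as a "gap" is an assumption based on the wording of the JE message, not on the JE source code.

```shell
# "meta out of date" line: both values are epoch milliseconds.
current_ms=1676829703743
synced_ms=1676824223122
lag_ms=$(( current_ms - synced_ms ))
echo "follower replay lag: $(( lag_ms / 1000 )) s (about $(( lag_ms / 60000 )) min)"

# EnvironmentFailureException line: last VLSN on disk vs. first VLSN seen by recovery.
last_on_disk=233576751
recovery_first=233576753
# Assumption: recovery expects recovery_first <= last_on_disk + 1.
missing=$(( recovery_first - last_on_disk - 1 ))
echo "VLSNs between disk and recovery: $missing"
```

Under that reading, the follower had not replayed metadata for roughly an hour and a half before the restart, and one VLSN (233,576,752) lies between what is on disk and where recovery starts.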

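[Background] What operations were performed?
[Business impact]
[StarRocks version] 2.4.2
[Cluster size] 3+3, deployed separately, on Alibaba Cloud
[Machine information]
[Contact]
[Attachments]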
fe.log.0220 (14.5 MB) fe.warn.log.0220 (2.1 MB)

Is this the error that prevents the node from starting? Is this information from a follower node?

Yes, this is a follower node. The details are in the uploaded logs.

This is a BDB bug. The manual workaround is: (1) empty the metadata directory; (2) start the FE with --helper. The issue will be fully fixed in 3.0.
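
The two steps above could be sketched roughly as follows. This is a hedged outline, not an official procedure: `FE_HOME`, `META_DIR`, `HELPER`, and the `DRY_RUN` switch are illustrative placeholders (the meta path is taken from the log in the first post), and the sketch moves the metadata directory aside instead of deleting it so that it can be restored. It assumes the node is a follower and that a healthy leader FE is reachable at the helper address.

```shell
# Sketch only: all paths and addresses below are assumptions, adjust before use.
FE_HOME="${FE_HOME:-/data/app/starrocks/fe}"
META_DIR="${META_DIR:-$FE_HOME/meta}"
HELPER="${HELPER:-masterip:9010}"   # leader FE host and its edit_log_port
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print the commands, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

recover_follower() {
  backup="${META_DIR}.bak.$(date +%Y%m%d%H%M%S)"
  run "$FE_HOME/bin/stop_fe.sh"                 # make sure the FE is stopped
  run mv "$META_DIR" "$backup"                  # step 1: clear meta (kept as a backup)
  run mkdir -p "$META_DIR"
  run "$FE_HOME/bin/start_fe.sh" --helper "$HELPER" --daemon   # step 2
}

recover_follower
```

With the default `DRY_RUN=1` the script only prints the four commands; setting `DRY_RUN=0` would actually run them on the follower.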

Judging from the heap dump, this was caused by a memory leak in the insert log job. Please upgrade to 2.4.3.

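Hello, I'm on version 2.5.4 and ran into the same problem. I tried deleting the meta directory, removing the node from the cluster, and adding it back, but it still fails with the same error. My start command is bin/start_fe.sh --helper "masterip:9010" --daemon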
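
For readers following along, the remove-and-re-add sequence described above might look like the sketch below. It only prints the SQL; `LEADER_HOST`, `QUERY_PORT`, and `FOLLOWER` are hypothetical placeholders, 9030 is assumed to be the unchanged default query port, and the statements are meant to be run against the leader FE.

```shell
# Sketch only: the host names below are placeholders, not values from this thread.
LEADER_HOST="${LEADER_HOST:-masterip}"
QUERY_PORT="${QUERY_PORT:-9030}"          # assumed default FE query port
FOLLOWER="${FOLLOWER:-followerip:9010}"   # follower host and its edit_log_port

rejoin_sql() {
  cat <<EOF
-- 1. Check the current FE list and confirm which node is the leader.
SHOW FRONTENDS;
-- 2. Remove the broken follower from the cluster.
ALTER SYSTEM DROP FOLLOWER "$FOLLOWER";
-- 3. After clearing the follower's meta directory, register it again.
ALTER SYSTEM ADD FOLLOWER "$FOLLOWER";
EOF
}

rejoin_sql
echo "# to apply: rejoin_sql | mysql -h $LEADER_HOST -P $QUERY_PORT -uroot -p"
```

The follower is then started with `--helper` pointing at the leader, as in the command quoted above.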