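【Description】We are migrating servers by scaling out the cluster (adding the new machines as nodes). There are 3 machines in total, and only 1 of them reports an error when its FE starts.
【Background】The cluster was originally deployed on version 1.19. To support a certain data type, je-7.3.7.jar was replaced with je-18.3.12.jar.
【Business impact】
【StarRocks version】e.g.: 2.3.3
【Cluster size】e.g.: 3 FE (3 followers) + 3 BE (FE and BE co-located)
【Machine info】CPU virtual cores/memory/NIC, e.g.: 48C/64G/10 GbE
【Attachments】
Error log from fe.log: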
2022-11-27 17:15:16,831 INFO (UNKNOWN 172.24.20.45_9010_1669540500448(-1)|1) [BDBEnvironment.close():442] close log databases end
2022-11-27 17:15:16,831 INFO (UNKNOWN 172.24.20.45_9010_1669540500448(-1)|1) [BDBEnvironment.close():445] start to close epoch database
2022-11-27 17:15:16,831 INFO (UNKNOWN 172.24.20.45_9010_1669540500448(-1)|1) [BDBEnvironment.close():454] close epoch database end
2022-11-27 17:15:16,831 INFO (UNKNOWN 172.24.20.45_9010_1669540500448(-1)|1) [BDBEnvironment.close():456] start to close replicated environment
2022-11-27 17:15:16,833 INFO (UNKNOWN 172.24.20.45_9010_1669540500448(-1)|1) [BDBEnvironment.close():466] close replicated environment end
2022-11-27 17:15:17,045 WARN (UNKNOWN 172.24.20.45_9010_1669540500448(-1)|1) [BDBEnvironment.setup():225] database exception
com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 18.3.12) 172.24.20.45_9010_1669540500448(-1):/data/starrocks/fe/meta/bdb recoveryTracker should overlap or follow on disk last VLSN of 89,585,827 recoveryFirst= 89,585,829 UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. Environment is invalid and must be closed.
at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:459) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.rep.vlsn.VLSNIndex.merge(VLSNIndex.java:1641) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.rep.vlsn.VLSNIndex.init(VLSNIndex.java:1534) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.rep.vlsn.VLSNIndex.<init>(VLSNIndex.java:426) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.rep.impl.RepImpl.preRecoveryCheckpointInit(RepImpl.java:575) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.recovery.RecoveryManager.recover(RecoveryManager.java:508) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.dbi.EnvironmentImpl.finishInit(EnvironmentImpl.java:895) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:222) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:278) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.Environment.<init>(Environment.java:258) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:605) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:464) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:538) ~[je-18.3.12.jar:18.3.12]
at com.starrocks.journal.bdbje.BDBEnvironment.setup(BDBEnvironment.java:185) [starrocks-fe.jar:?]
at com.starrocks.journal.bdbje.BDBJEJournal.open(BDBJEJournal.java:271) [starrocks-fe.jar:?]
at com.starrocks.persist.EditLog.open(EditLog.java:900) [starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.initialize(GlobalStateMgr.java:845) [starrocks-fe.jar:?]
at com.starrocks.StarRocksFE.start(StarRocksFE.java:109) [starrocks-fe.jar:?]
at com.starrocks.StarRocksFE.main(StarRocksFE.java:64) [starrocks-fe.jar:?]
2022-11-27 17:15:17,045 ERROR (UNKNOWN 172.24.20.45_9010_1669540500448(-1)|1) [BDBEnvironment.setup():233] error to open replicated environment. will exit.
com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 18.3.12) 172.24.20.45_9010_1669540500448(-1):/data/starrocks/fe/meta/bdb recoveryTracker should overlap or follow on disk last VLSN of 89,585,827 recoveryFirst= 89,585,829 UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. Environment is invalid and must be closed.
at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:459) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.rep.vlsn.VLSNIndex.merge(VLSNIndex.java:1641) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.rep.vlsn.VLSNIndex.init(VLSNIndex.java:1534) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.rep.vlsn.VLSNIndex.<init>(VLSNIndex.java:426) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.rep.impl.RepImpl.preRecoveryCheckpointInit(RepImpl.java:575) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.recovery.RecoveryManager.recover(RecoveryManager.java:508) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.dbi.EnvironmentImpl.finishInit(EnvironmentImpl.java:895) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:222) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:278) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.Environment.<init>(Environment.java:258) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:605) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:464) ~[je-18.3.12.jar:18.3.12]
at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:538) ~[je-18.3.12.jar:18.3.12]
at com.starrocks.journal.bdbje.BDBEnvironment.setup(BDBEnvironment.java:185) [starrocks-fe.jar:?]
at com.starrocks.journal.bdbje.BDBJEJournal.open(BDBJEJournal.java:271) [starrocks-fe.jar:?]
at com.starrocks.persist.EditLog.open(EditLog.java:900) [starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.initialize(GlobalStateMgr.java:845) [starrocks-fe.jar:?]
at com.starrocks.StarRocksFE.start(StarRocksFE.java:109) [starrocks-fe.jar:?]
at com.starrocks.StarRocksFE.main(StarRocksFE.java:64) [starrocks-fe.jar:?]
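We tried several times to clear the meta directory, drop the follower, and then re-add it to the cluster, but the same error occurs every time. The scale-out operation on the other two machines worked normally.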
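
To make the numbers in the error message easier to read, here is a minimal Python sketch that extracts the two VLSN values and computes the gap between them. It assumes (my reading of the wording "should overlap or follow", not something confirmed from the JE source) that recovery expects `recoveryFirst` to be at most one greater than the last VLSN on disk. The names `MESSAGE`, `PATTERN`, and `vlsn_gap` are made up for this illustration.

```python
import re

# Error text copied from the fe.log excerpt above.
MESSAGE = (
    "recoveryTracker should overlap or follow on disk last VLSN of "
    "89,585,827 recoveryFirst= 89,585,829"
)

# Captures the two comma-grouped numbers in the message.
PATTERN = re.compile(r"last VLSN of ([\d,]+) recoveryFirst= ([\d,]+)")


def vlsn_gap(message):
    """Return (last_on_disk, recovery_first, missing_count), or None if no match.

    missing_count assumes recoveryFirst may legally be last_on_disk + 1,
    so only values beyond that count as a gap (an assumption, see above).
    """
    match = PATTERN.search(message)
    if match is None:
        return None
    last_on_disk = int(match.group(1).replace(",", ""))
    recovery_first = int(match.group(2).replace(",", ""))
    missing = max(0, recovery_first - last_on_disk - 1)
    return last_on_disk, recovery_first, missing


print(vlsn_gap(MESSAGE))  # (89585827, 89585829, 1)
```

Under that assumption, the log shows exactly one VLSN (89,585,828) missing between what is on disk and where recovery starts.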