Only one FE can start

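【Details】After repairing the metadata on all 3 FEs, the master FE starts, but the other two followers crash every time they are started. The error log is as follows:
【Background】Nothing in particular was done; we found the cluster had stopped at 6 a.m. on the 21st.
【Business impact】
【StarRocks version】2.3.0
【Cluster size】e.g.: 3 FE (3 followers) + 5 BE (FE and BE co-located)
【Machine info】vCPUs/memory/NIC, e.g.: 32C/128G/Gigabit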
2022-12-21 18:13:22,209 ERROR (heartbeat mgr|45) [BDBJEJournal.write():162] catch an exception when writing to database. sleep and retry. journal id 7381492
com.sleepycat.je.rep.InsufficientReplicasException: (JE 7.3.7) Commit policy: SIMPLE_MAJORITY required 1 replica. But none were active with this master.
at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureReplicasForCommit(DurabilityQuorum.java:116) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.txnBeginHook(RepImpl.java:1161) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.txn.MasterTxn.txnBeginHook(MasterTxn.java:193) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.initTxn(Txn.java:377) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.<init>(Txn.java:286) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.<init>(Txn.java:265) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.txn.MasterTxn.<init>(MasterTxn.java:144) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.txn.MasterTxn$1.create(MasterTxn.java:115) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.txn.MasterTxn.create(MasterTxn.java:433) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.createRepUserTxn(RepImpl.java:1135) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.createAutoTxn(Txn.java:332) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.LockerFactory.getWritableLocker(LockerFactory.java:79) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.LockerFactory.getWritableLocker(LockerFactory.java:40) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.put(Database.java:1493) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.put(Database.java:1556) ~[je-7.3.7.jar:7.3.7]
at com.starrocks.journal.bdbje.CloseSafeDatabase.put(CloseSafeDatabase.java:28) ~[starrocks-fe.jar:?]
at com.starrocks.journal.bdbje.BDBJEJournal.write(BDBJEJournal.java:155) [starrocks-fe.jar:?]
at com.starrocks.persist.EditLog.logEdit(EditLog.java:915) [starrocks-fe.jar:?]
at com.starrocks.persist.EditLog.logHeartbeat(EditLog.java:1343) [starrocks-fe.jar:?]
at com.starrocks.system.HeartbeatMgr.runAfterCatalogReady(HeartbeatMgr.java:162) [starrocks-fe.jar:?]
at com.starrocks.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:61) [starrocks-fe.jar:?]
at com.starrocks.common.util.Daemon.run(Daemon.java:115) [starrocks-fe.jar:?]
2022-12-21 18:13:37,215 ERROR (heartbeat mgr|45) [BDBJEJournal.write():162] catch an exception when writing to database. sleep and retry. journal id 7381492
com.sleepycat.je.rep.InsufficientReplicasException: (JE 7.3.7) Commit policy: SIMPLE_MAJORITY required 1 replica. But none were active with this master.
at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureReplicasForCommit(DurabilityQuorum.java:116) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.txnBeginHook(RepImpl.java:1161) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.txn.MasterTxn.txnBeginHook(MasterTxn.java:193) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.initTxn(Txn.java:377) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.<init>(Txn.java:286) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.<init>(Txn.java:265) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.txn.MasterTxn.<init>(MasterTxn.java:144) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.txn.MasterTxn$1.create(MasterTxn.java:115) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.txn.MasterTxn.create(MasterTxn.java:433) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.createRepUserTxn(RepImpl.java:1135) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.createAutoTxn(Txn.java:332) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.LockerFactory.getWritableLocker(LockerFactory.java:79) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.LockerFactory.getWritableLocker(LockerFactory.java:40) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.put(Database.java:1493) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.put(Database.java:1556) ~[je-7.3.7.jar:7.3.7]
at com.starrocks.journal.bdbje.CloseSafeDatabase.put(CloseSafeDatabase.java:28) ~[starrocks-fe.jar:?]
at com.starrocks.journal.bdbje.BDBJEJournal.write(BDBJEJournal.java:155) [starrocks-fe.jar:?]
at com.starrocks.persist.EditLog.logEdit(EditLog.java:915) [starrocks-fe.jar:?]
at com.starrocks.persist.EditLog.logHeartbeat(EditLog.java:1343) [starrocks-fe.jar:?]
at com.starrocks.system.HeartbeatMgr.runAfterCatalogReady(HeartbeatMgr.java:162) [starrocks-fe.jar:?]
at com.starrocks.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:61) [starrocks-fe.jar:?]
at com.starrocks.common.util.Daemon.run(Daemon.java:115) [starrocks-fe.jar:?]
2022-12-21 18:13:42,218 ERROR (heartbeat mgr|45) [BDBJEJournal.write():189] write bdb failed. will exit. journalId: 7381492, bdb database Name: 7381258
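Why the master gives up: under the SIMPLE_MAJORITY commit policy, a commit needs a majority of the electable nodes in the replication group, and the master counts as one of them. With 3 FEs still registered in the group, majority is 3 // 2 + 1 = 2, so the master needs 1 replica ack ("required 1 replica"). Both followers are down ("none were active"), so every heartbeat write fails until the master exits. A minimal sketch of that arithmetic, assuming the group still lists all 3 FEs as electable; this models the rule only and is not BDB JE's actual code:

```python
def simple_majority(electable_nodes: int) -> int:
    """Electable nodes (master included) that must acknowledge a commit."""
    return electable_nodes // 2 + 1


def replicas_required(electable_nodes: int) -> int:
    """Acks needed from replicas, i.e. excluding the master itself."""
    return simple_majority(electable_nodes) - 1


def can_commit(electable_nodes: int, active_replicas: int) -> bool:
    """True if enough replicas are connected to satisfy SIMPLE_MAJORITY."""
    return active_replicas >= replicas_required(electable_nodes)


# 3 FEs registered, both followers down: the situation in the log above.
print(replicas_required(3), can_commit(3, 0))
```

This is also why dropping the two broken followers helps: once the group shrinks to 1 electable node, replicas_required(1) is 0 and the master can write again.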

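Was the master restored via recovery? If so, you need to drop the other two FEs and then add them back into the cluster. Using recovery for metadata operations is generally not recommended.

Hi, then what should we do when the FEs cannot find a master like this?

Did you run recovery on all three? The normal procedure is to recover only one and start the others normally.

Yes. For some unknown reason, a master could no longer be elected.

Then the procedure was wrong. You should recover only one FE and simply add the others back in.

Running recovery on all three is what causes this problem.

Just clear out the two abnormal FEs and add them back.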
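A sketch of the order of operations suggested in the replies: recover one FE only, drop the broken followers, then wipe and re-add each one. The hostnames fe1/fe2/fe3 are hypothetical, 9010 is assumed to be the edit_log_port (the StarRocks default), and the recovery step itself is left as a comment because its exact switch depends on the StarRocks version; check the metadata-recovery docs for 2.3. The helper only builds the step list and does not touch a cluster:

```python
from dataclasses import dataclass


@dataclass
class FeNode:
    host: str                  # hypothetical hostname
    edit_log_port: int = 9010  # assumed: StarRocks default edit_log_port

    @property
    def addr(self) -> str:
        return f"{self.host}:{self.edit_log_port}"


def rejoin_plan(recovered: FeNode, others: list[FeNode]) -> list[str]:
    """Ordered steps (SQL plus '#' shell/manual notes) to rebuild the FE group."""
    steps = [f"# 1. Run metadata recovery on {recovered.addr} ONLY; start it as sole master"]
    # 2. Remove the stale followers so the master no longer waits for their acks.
    for fe in others:
        steps.append(f'ALTER SYSTEM DROP FOLLOWER "{fe.addr}";')
    # 3. Re-add each follower with an empty meta_dir, pointing it at the master.
    for fe in others:
        steps.append(f"# on {fe.host}: stop the FE and empty its meta_dir")
        steps.append(f'ALTER SYSTEM ADD FOLLOWER "{fe.addr}";')
        steps.append(f"# on {fe.host}: bin/start_fe.sh --helper {recovered.addr} --daemon")
    return steps


for line in rejoin_plan(FeNode("fe1"), [FeNode("fe2"), FeNode("fe3")]):
    print(line)
```

Dropping both followers before re-adding any of them keeps the master writable throughout, since the group shrinks to a single electable node first.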
