[BDBJEJournal.write():190] write bdb failed. will exit. journalId: 15733716, bdb database Name: 15703717

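Version: 2.2.1
Environment: 3 FE, 7 BE
Each machine has 16 cores and 64 GB of memory
Problem: the Master FE goes down regularly at every even-numbered hour (2, 4, 6, 8, 10…)
Relevant logs: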

2022-11-02 22:04:28,659 WARN (heartbeat-mgr-pool-0|221) [Util.getResultForUrl():337] failed to get result from url: http://172.27.14.65:8030/api/bootstrap?cluster_id=760245192&token=113fbc06-19a0-4ef0-8cf8-47ccaf02b5f9. Read timed out
2022-11-02 22:04:28,659 WARN (heartbeat mgr|33) [HeartbeatMgr.runAfterCatalogReady():142] get bad heartbeat response: type: FRONTEND, status: BAD, msg: got exception, name: 172.27.14.65_9010_1655952979062, queryPort: 0, rpcPort: 0, replayedJournalId: 0, feStartTime: \N, feVersion: null
2022-11-02 22:04:32,279 ERROR (thrift-server-pool-892|2922) [BDBJEJournal.write():162] catch an exception when writing to database. sleep and retry. journal id 15733716
com.sleepycat.je.rep.InsufficientAcksException: (JE 7.3.7) Transaction: -17238310  VLSN: 32,984,930, initiated at: 22:04:22.  Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 10000ms. FeederState=172.27.14.66_9010_1655953085403(2)[MASTER]
Current feeds:
 172.27.14.65_9010_1655952979062: feederVLSN=32,984,932 replicaTxnEndVLSN=32,984,928

	at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:205) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:189) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:1426) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1385) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:226) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.txn.Txn.commit(Txn.java:772) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.txn.Txn.commit(Txn.java:625) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.txn.Txn.operationEnd(Txn.java:1803) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.Database.put(Database.java:1506) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.Database.put(Database.java:1556) ~[je-7.3.7.jar:7.3.7]
	at com.starrocks.journal.bdbje.CloseSafeDatabase.put(CloseSafeDatabase.java:28) ~[starrocks-fe.jar:?]
	at com.starrocks.journal.bdbje.BDBJEJournal.write(BDBJEJournal.java:155) [starrocks-fe.jar:?]
	at com.starrocks.persist.EditLog.logEdit(EditLog.java:860) [starrocks-fe.jar:?]
	at com.starrocks.persist.EditLog.logAddReplica(EditLog.java:1054) [starrocks-fe.jar:?]
	at com.starrocks.clone.TabletSchedCtx.unprotectedFinishClone(TabletSchedCtx.java:985) [starrocks-fe.jar:?]
	at com.starrocks.clone.TabletSchedCtx.finishCloneTask(TabletSchedCtx.java:887) [starrocks-fe.jar:?]
	at com.starrocks.clone.TabletScheduler.finishCloneTask(TabletScheduler.java:1260) [starrocks-fe.jar:?]
	at com.starrocks.master.MasterImpl.finishClone(MasterImpl.java:811) [starrocks-fe.jar:?]
	at com.starrocks.master.MasterImpl.finishTask(MasterImpl.java:250) [starrocks-fe.jar:?]
	at com.starrocks.service.FrontendServiceImpl.finishTask(FrontendServiceImpl.java:546) [starrocks-fe.jar:?]
	at com.starrocks.thrift.FrontendService$Processor$finishTask.getResult(FrontendService.java:1851) [starrocks-fe.jar:?]
	at com.starrocks.thrift.FrontendService$Processor$finishTask.getResult(FrontendService.java:1831) [starrocks-fe.jar:?]
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) [libthrift-0.13.0.jar:0.13.0]
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) [libthrift-0.13.0.jar:0.13.0]
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313) [libthrift-0.13.0.jar:0.13.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
2022-11-02 22:04:47,282 ERROR (thrift-server-pool-892|2922) [BDBJEJournal.write():162] catch an exception when writing to database. sleep and retry. journal id 15733716
com.sleepycat.je.rep.InsufficientAcksException: (JE 7.3.7) Transaction: -17238312  VLSN: 32,984,933, initiated at: 22:04:37.  Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 10000ms. FeederState=172.27.14.66_9010_1655953085403(2)[MASTER]
Current feeds:
 172.27.14.65_9010_1655952979062: feederVLSN=32,984,934 replicaTxnEndVLSN=32,984,928

	at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:205) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:189) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:1426) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1385) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:226) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.txn.Txn.commit(Txn.java:772) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.txn.Txn.commit(Txn.java:625) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.txn.Txn.operationEnd(Txn.java:1803) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.Database.put(Database.java:1506) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.Database.put(Database.java:1556) ~[je-7.3.7.jar:7.3.7]
	at com.starrocks.journal.bdbje.CloseSafeDatabase.put(CloseSafeDatabase.java:28) ~[starrocks-fe.jar:?]
	at com.starrocks.journal.bdbje.BDBJEJournal.write(BDBJEJournal.java:155) [starrocks-fe.jar:?]
	at com.starrocks.persist.EditLog.logEdit(EditLog.java:860) [starrocks-fe.jar:?]
	at com.starrocks.persist.EditLog.logAddReplica(EditLog.java:1054) [starrocks-fe.jar:?]
	at com.starrocks.clone.TabletSchedCtx.unprotectedFinishClone(TabletSchedCtx.java:985) [starrocks-fe.jar:?]
	at com.starrocks.clone.TabletSchedCtx.finishCloneTask(TabletSchedCtx.java:887) [starrocks-fe.jar:?]
	at com.starrocks.clone.TabletScheduler.finishCloneTask(TabletScheduler.java:1260) [starrocks-fe.jar:?]
	at com.starrocks.master.MasterImpl.finishClone(MasterImpl.java:811) [starrocks-fe.jar:?]
	at com.starrocks.master.MasterImpl.finishTask(MasterImpl.java:250) [starrocks-fe.jar:?]
	at com.starrocks.service.FrontendServiceImpl.finishTask(FrontendServiceImpl.java:546) [starrocks-fe.jar:?]
	at com.starrocks.thrift.FrontendService$Processor$finishTask.getResult(FrontendService.java:1851) [starrocks-fe.jar:?]
	at com.starrocks.thrift.FrontendService$Processor$finishTask.getResult(FrontendService.java:1831) [starrocks-fe.jar:?]
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) [libthrift-0.13.0.jar:0.13.0]
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) [libthrift-0.13.0.jar:0.13.0]
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313) [libthrift-0.13.0.jar:0.13.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
2022-11-02 22:05:02,285 ERROR (thrift-server-pool-892|2922) [BDBJEJournal.write():162] catch an exception when writing to database. sleep and retry. journal id 15733716
com.sleepycat.je.rep.InsufficientAcksException: (JE 7.3.7) Transaction: -17238313  VLSN: 32,984,935, initiated at: 22:04:52.  Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 10000ms. FeederState=172.27.14.66_9010_1655953085403(2)[MASTER]
Current feeds:

	at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:205) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:189) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:1426) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1385) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:226) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.txn.Txn.commit(Txn.java:772) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.txn.Txn.commit(Txn.java:625) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.txn.Txn.operationEnd(Txn.java:1803) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.Database.put(Database.java:1506) ~[je-7.3.7.jar:7.3.7]
	at com.sleepycat.je.Database.put(Database.java:1556) ~[je-7.3.7.jar:7.3.7]
	at com.starrocks.journal.bdbje.CloseSafeDatabase.put(CloseSafeDatabase.java:28) ~[starrocks-fe.jar:?]
	at com.starrocks.journal.bdbje.BDBJEJournal.write(BDBJEJournal.java:155) [starrocks-fe.jar:?]
	at com.starrocks.persist.EditLog.logEdit(EditLog.java:860) [starrocks-fe.jar:?]
	at com.starrocks.persist.EditLog.logAddReplica(EditLog.java:1054) [starrocks-fe.jar:?]
	at com.starrocks.clone.TabletSchedCtx.unprotectedFinishClone(TabletSchedCtx.java:985) [starrocks-fe.jar:?]
	at com.starrocks.clone.TabletSchedCtx.finishCloneTask(TabletSchedCtx.java:887) [starrocks-fe.jar:?]
	at com.starrocks.clone.TabletScheduler.finishCloneTask(TabletScheduler.java:1260) [starrocks-fe.jar:?]
	at com.starrocks.master.MasterImpl.finishClone(MasterImpl.java:811) [starrocks-fe.jar:?]
	at com.starrocks.master.MasterImpl.finishTask(MasterImpl.java:250) [starrocks-fe.jar:?]
	at com.starrocks.service.FrontendServiceImpl.finishTask(FrontendServiceImpl.java:546) [starrocks-fe.jar:?]
	at com.starrocks.thrift.FrontendService$Processor$finishTask.getResult(FrontendService.java:1851) [starrocks-fe.jar:?]
	at com.starrocks.thrift.FrontendService$Processor$finishTask.getResult(FrontendService.java:1831) [starrocks-fe.jar:?]
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) [libthrift-0.13.0.jar:0.13.0]
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) [libthrift-0.13.0.jar:0.13.0]
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313) [libthrift-0.13.0.jar:0.13.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
2022-11-02 22:05:07,286 ERROR (thrift-server-pool-892|2922) [BDBJEJournal.write():190] write bdb failed. will exit. journalId: 15733716, bdb database Name: 15703717
2022-11-02 22:06:44,394 WARN (UNKNOWN 172.27.14.66_9010_1655953085403(-1)|1) [Catalog.notifyNewFETypeTransfer():2396] notify new FE type transfer: UNKNOWN
2022-11-02 22:06:44,491 WARN (RepNode 172.27.14.66_9010_1655953085403(-1)|60) [Catalog.notifyNewFETypeTransfer():2396] notify new FE type transfer: MASTER
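
In the entries above, `replicaTxnEndVLSN` stays at 32,984,928 while `feederVLSN` advances, and the last exception lists no feeds at all. A minimal sketch for pulling those two values out of a larger fe.warn.log and printing their difference is given below. The function name `parse_feeds` and the regular expression are hypothetical helpers written for this post, not part of StarRocks or BDB JE, and reading the difference as "how far the follower is behind" is an assumption.

```python
import re

# Matches lines such as:
#  172.27.14.65_9010_1655952979062: feederVLSN=32,984,932 replicaTxnEndVLSN=32,984,928
FEED_RE = re.compile(
    r"^\s*(?P<node>\S+): feederVLSN=(?P<feeder>[\d,]+) "
    r"replicaTxnEndVLSN=(?P<replica>[\d,]+)\s*$"
)


def parse_feeds(log_text):
    """Return (node, feederVLSN, replicaTxnEndVLSN, difference) per feed line."""
    feeds = []
    for line in log_text.splitlines():
        m = FEED_RE.match(line)
        if m is None:
            continue  # not a "Current feeds" detail line
        feeder = int(m.group("feeder").replace(",", ""))
        replica = int(m.group("replica").replace(",", ""))
        feeds.append((m.group("node"), feeder, replica, feeder - replica))
    return feeds


# Two feed lines copied from the log excerpt in this post.
sample = """Current feeds:
 172.27.14.65_9010_1655952979062: feederVLSN=32,984,932 replicaTxnEndVLSN=32,984,928
Current feeds:
 172.27.14.65_9010_1655952979062: feederVLSN=32,984,934 replicaTxnEndVLSN=32,984,928
Current feeds:
"""

for node, feeder, replica, diff in parse_feeds(sample):
    print(node, feeder, replica, diff)
```

Running it over the full log around each even-numbered hour may show whether the same follower stalls every time before the Master exits.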

fe.warn.log (31.5 MB)

Has this problem been resolved?