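FE went down by itself

[Details] The FE went down by itself
[Background] No operations were performed
[Business impact]
[StarRocks version] 2.0.0-GA
[Cluster size] e.g. 4 FE (3 follower + 1 observer) + 4 BE (FE and BE co-located)
[Machine info] 16C/64G/10 GbE
[Attachments]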
2022-02-25 17:11:19,420 INFO (replayer|72) [DatabaseTransactionMgr.replayUpsertTransactionState():1474] replay a committed transaction TransactionState. transaction id: 405262, label: insert_0af735fe-960c-11ec-a2fc-005056a36269, db id: 10002, table id list: 10125, callback id: -1, coordinator: FE: 172.16.12.237, transaction status: COMMITTED, error replicas num: 0, replica ids: , prepare time: 1645773900945, commit time: 1645773901025, finish time: -1, reason:
2022-02-25 17:11:19,420 INFO (replayer|72) [DatabaseTransactionMgr.replayUpsertTransactionState():1477] replay a visible transaction TransactionState. transaction id: 405262, label: insert_0af735fe-960c-11ec-a2fc-005056a36269, db id: 10002, table id list: 10125, callback id: -1, coordinator: FE: 172.16.12.237, transaction status: VISIBLE, error replicas num: 0, replica ids: , prepare time: 1645773900945, commit time: 1645773901025, finish time: 1645773901044, reason:
2022-02-25 17:11:19,421 INFO (replayer|72) [TxnStateCallbackFactory.addCallback():41] add callback of txn state : 146819. current callback size: 31
2022-02-25 17:11:19,421 INFO (replayer|72) [LoadManager.replayCreateLoadJob():124] LOAD_JOB=146819, msg={replay create load job}
2022-02-25 17:11:19,429 INFO (replayer|72) [CatalogRecycleBin.replayEraseTable():305] replay erase table[142632] finished
2022-02-25 17:11:19,436 INFO (replayer|72) [DatabaseTransactionMgr.replayUpsertTransactionState():1474] replay a committed transaction TransactionState. transaction id: 405263, label: 73f863f2-b8d2-43a6-88c9-151a48239b2a, db id: 14186, table id list: 15889, callback id: -1, coordinator: BE: 172.16.12.235, transaction status: COMMITTED, error replicas num: 0, replica ids: , prepare time: 1645777745505, commit time: 1645777745737, finish time: -1, reason: attactment: com.starrocks.load.loadv2.ManualLoadTxnCommitAttachment@445815ad
2022-02-25 17:11:19,436 INFO (replayer|72) [DatabaseTransactionMgr.replayUpsertTransactionState():1477] replay a visible transaction TransactionState. transaction id: 405263, label: 73f863f2-b8d2-43a6-88c9-151a48239b2a, db id: 14186, table id list: 15889, callback id: -1, coordinator: BE: 172.16.12.235, transaction status: VISIBLE, error replicas num: 0, replica ids: , prepare time: 1645777745505, commit time: 1645777745737, finish time: 1645777745756, reason: attactment: com.starrocks.load.loadv2.ManualLoadTxnCommitAttachment@2738c279
2022-02-25 17:11:19,437 INFO (replayer|72) [DatabaseTransactionMgr.replayUpsertTransactionState():1474] replay a committed transaction TransactionState. transaction id: 405264, label: ab1dd4c8-e6cd-4e04-893d-1987792aaec8, db id: 14186, table id list: 15889, callback id: -1, coordinator: BE: 172.16.12.238, transaction status: COMMITTED, error replicas num: 0, replica ids: , prepare time: 1645777746902, commit time: 1645777747071, finish time: -1, reason: attactment: com.starrocks.load.loadv2.ManualLoadTxnCommitAttachment@4d26a87f
2022-02-25 17:11:19,437 INFO (replayer|72) [DatabaseTransactionMgr.replayUpsertTransactionState():1477] replay a visible transaction TransactionState. transaction id: 405264, label: ab1dd4c8-e6cd-4e04-893d-1987792aaec8, db id: 14186, table id list: 15889, callback id: -1, coordinator: BE: 172.16.12.238, transaction status: VISIBLE, error replicas num: 0, replica ids: , prepare time: 1645777746902, commit time: 1645777747071, finish time: 1645777747090, reason: attactment: com.starrocks.load.loadv2.ManualLoadTxnCommitAttachment@78e304f1
2022-02-25 17:11:19,444 INFO (replayer|72) [Backend.handleHbResponse():674] Backend [id=10009, host=172.16.12.235, heartbeatPort=9050, alive=false] is dead,
2022-02-25 17:11:19,458 INFO (replayer|72) [Backend.handleHbResponse():674] Backend [id=10005, host=172.16.12.236, heartbeatPort=9050, alive=false] is dead,
2022-02-25 17:11:19,459 WARN (replayer|72) [Catalog.setCanRead():2304] meta out of date. current time: 1645780279459, synchronized time: 1645779533067, has log: true, fe type: UNKNOWN
2022-02-25 17:11:21,191 INFO (UNKNOWN 172.16.12.237_9010_1640860077003(-1)|1) [Catalog.waitForReady():874] wait catalog to be ready. FE type: UNKNOWN. is ready: false
2022-02-25 17:11:21,218 WARN (UNKNOWN 172.16.12.237_9010_1640860077003(5)|59) [Catalog.notifyNewFETypeTransfer():2333] notify new FE type transfer: FOLLOWER
2022-02-25 17:11:21,218 INFO (stateListener|71) [Catalog$4.runOneCycle():2356] begin to transfer FE type from UNKNOWN to FOLLOWER
2022-02-25 17:11:21,219 INFO (stateListener|71) [Catalog$4.runOneCycle():2442] finished to transfer FE type to FOLLOWER
2022-02-25 17:11:21,235 WARN (REPLICA 172.16.12.237_9010_1640860077003(5)|59) [BDBStateChangeListener.stateChange():61] this node is DETACHED
2022-02-25 17:11:23,192 INFO (UNKNOWN 172.16.12.237_9010_1640860077003(-1)|1) [Catalog.waitForReady():874] wait catalog to be ready. FE type: FOLLOWER. is ready: false
2022-02-25 17:11:24,462 ERROR (replayer|72) [Catalog$3.runOneCycle():2263] catch insufficient log exception. please restart.
com.sleepycat.je.rep.InsufficientLogException: (JE 7.3.7) Environment must be closed, caused by: com.sleepycat.je.rep.InsufficientLogException: Environment invalid because of previous exception: (JE 7.3.7) 172.16.12.237_9010_1640860077003(5):/data2/doris-meta/bdb INSUFFICIENT_LOG: Log files at this node are obsolete. Environment is invalid and must be closed. Originally thrown by HA thread: REPLICA 172.16.12.237_9010_1640860077003(5) Originally thrown by HA thread: REPLICA 172.16.12.237_9010_1640860077003(5)refreshVLSN=10,560,284 logProviders=[Node:172.16.12.238_9010_1640860163170 172.16.12.238:9010 (is member) SECONDARY changeVersion:-1 LocalCBVLSN:10,666,341 at:星期五 二月 25 17:11:18 CST 2022 jeVersion:7.3.7
, Node:172.16.12.235_9010_1634795109117 172.16.12.235:9010 (is member) changeVersion:1 LocalCBVLSN:10,666,343 at:星期五 二月 25 17:00:00 CST 2022 jeVersion:7.3.7
, Node:172.16.12.236_9010_1634796649019 172.16.12.236:9010 (is member) changeVersion:2 LocalCBVLSN:10,655,340 at:星期五 二月 25 16:59:58 CST 2022 jeVersion:7.3.7
, Node:172.16.12.237_9010_1640860077003 172.16.12.237:9010 (is member) changeVersion:7 LocalCBVLSN:10,655,293 at:星期五 二月 25 13:01:28 CST 2022 jeVersion:7.3.7
] repImpl=com.sleepycat.je.rep.impl.RepImpl@4755f0f5 props={GROUP_NAME=PALO_JOURNAL_GROUP, REFRESH_VLSN=10560284, NODE_NAME=172.16.12.237_9010_1640860077003, HOSTNAME=172.16.12.237, P_NODETYPE3=ELECTABLE, P_NODETYPE2=ELECTABLE, P_NODETYPE1=ELECTABLE, P_NODENAME3=172.16.12.237_9010_1640860077003, P_NODETYPE0=SECONDARY, P_HOSTNAME3=172.16.12.237, P_NODENAME2=172.16.12.236_9010_1634796649019, P_HOSTNAME2=172.16.12.236, P_NODENAME1=172.16.12.235_9010_1634795109117, P_HOSTNAME1=172.16.12.235, P_NODENAME0=172.16.12.238_9010_1640860163170, PORT=9010, P_HOSTNAME0=172.16.12.238, P_NUMPROVIDERS=4, P_PORT3=9010, ENV_DIR=/data2/doris-meta/bdb, P_PORT2=9010, P_PORT1=9010, P_PORT0=9010}
at com.sleepycat.je.rep.InsufficientLogException.wrapSelf(InsufficientLogException.java:315) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.dbi.EnvironmentImpl.checkOpen(EnvironmentImpl.java:1775) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Environment.checkOpen(Environment.java:2473) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Environment.getDatabaseNames(Environment.java:2245) ~[je-7.3.7.jar:7.3.7]
at com.starrocks.journal.bdbje.BDBEnvironment.getDatabaseNames(BDBEnvironment.java:339) ~[starrocks-fe.jar:?]
at com.starrocks.journal.bdbje.BDBJEJournal.getMaxJournalId(BDBJEJournal.java:209) ~[starrocks-fe.jar:?]
at com.starrocks.persist.EditLog.getMaxJournalId(EditLog.java:98) ~[starrocks-fe.jar:?]
at com.starrocks.catalog.Catalog.getMaxJournalId(Catalog.java:5187) ~[starrocks-fe.jar:?]
at com.starrocks.catalog.Catalog.replayJournal(Catalog.java:2453) ~[starrocks-fe.jar:?]
at com.starrocks.catalog.Catalog$3.runOneCycle(Catalog.java:2258) [starrocks-fe.jar:?]
at com.starrocks.common.util.Daemon.run(Daemon.java:119) [starrocks-fe.jar:?]
Caused by: com.sleepycat.je.rep.InsufficientLogException: Environment invalid because of previous exception: (JE 7.3.7) 172.16.12.237_9010_1640860077003(5):/data2/doris-meta/bdb INSUFFICIENT_LOG: Log files at this node are obsolete. Environment is invalid and must be closed. Originally thrown by HA thread: REPLICA 172.16.12.237_9010_1640860077003(5) Originally thrown by HA thread: REPLICA 172.16.12.237_9010_1640860077003(5)
at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.setupLogRefresh(ReplicaFeederSyncup.java:664) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.verifyRollback(ReplicaFeederSyncup.java:314) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.execute(ReplicaFeederSyncup.java:157) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:711) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:474) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:409) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1873) ~[je-7.3.7.jar:7.3.7]

2022-02-25 17:11:19,459 WARN (replayer|72) [Catalog.setCanRead():2304] meta out of date. current time: 1645780279459, synchronized time: 1645779533067, has log: true, fe type: UNKNOWN

This is probably the key error. So why is the meta out of date?
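As a rough way to quantify it, the two timestamps in that WARN line can be subtracted to see how far this FE's metadata lagged behind. The sketch below is only an illustration: the helper name `meta_lag_seconds` is made up, and the 300-second tolerance is an assumption based on what I believe is the default of the FE config `meta_delay_toleration_second`, so check your own fe.conf.

```python
import re

# Log line copied from the post above.
LOG_LINE = (
    "2022-02-25 17:11:19,459 WARN (replayer|72) [Catalog.setCanRead():2304] "
    "meta out of date. current time: 1645780279459, "
    "synchronized time: 1645779533067, has log: true, fe type: UNKNOWN"
)

# Assumption: believed default of `meta_delay_toleration_second`;
# verify against the value in your own fe.conf.
ASSUMED_TOLERATION_SECONDS = 300

PATTERN = re.compile(r"current time: (\d+), synchronized time: (\d+)")


def meta_lag_seconds(line: str) -> float:
    """Return the gap in seconds between current and synchronized time."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a 'meta out of date' log line")
    current_ms, synced_ms = (int(group) for group in match.groups())
    return (current_ms - synced_ms) / 1000.0


lag = meta_lag_seconds(LOG_LINE)
print(f"lag: {lag:.1f} s, assumed tolerance: {ASSUMED_TOLERATION_SECONDS} s")
print("out of date" if lag > ASSUMED_TOLERATION_SECONDS else "within tolerance")
```

For this log line the gap is about 746 seconds (roughly 12 minutes), so under the assumed tolerance the replayer would treat the metadata as stale. That only describes the symptom; it does not explain why replay stopped advancing.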

A screenshot might be easier to read.

Can the cluster start now? If not, I recommend upgrading to 2.0.1 first and trying again.

Was this resolved? I ran into the same problem: after redeploying the cluster nodes, it fails to start, same as yours. I am currently on 2.0.1.

2.0.1 has this problem too.