FE fails to restart after loading data with Flink on version 3.1.4

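To help us locate your problem faster, please provide the following information. Thank you.
[Details] The FE reported errors and restarted, and now cannot start normally.
[Background] The problem appeared after upgrading to 3.1.4 and loading a large volume of real-time data through the Flink Connector.
[Business impact] The FE cannot start, so the cluster is currently unavailable.
[StarRocks version] 3.1.4
[Cluster size] 1 FE (1 leader) + 3 BE (FE and BE deployed separately), on K8s, shared-data (storage-compute separation) mode
[Machine info] FE: virtual machine, 4C/8GB/10 GbE; BE: 16C/48GB/10 GbE
[Contact] 倪程伟 (nichengwei120@163.com)
[Key logs]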

2023-11-04 18:01:15,199 INFO (nioEventLoopGroup-4-21|328) [TransactionLoadAction.executeTransaction():298] redirect transaction action to destination=TNetworkAddress(hostname:dragonstarrocksbe-0.dragonstarrocksbe.dragonstarrocksservices.svc.cluster.local, port:8040), db: dove_fleet_db, table: ods_market_price, op: begin, label: flink-e5a3bc13-44a1-4948-8b02-a858c090045a
2023-11-04 18:01:15,199 WARN (leaderCheckpointer|114) [GlobalStateMgr.replayJournalInner():2304] catch exception when replaying 779,
com.starrocks.journal.JournalInconsistentException: failed to load journal type 12110
        at com.starrocks.persist.EditLog.loadJournal(EditLog.java:1089) ~[starrocks-fe.jar:?]
        at com.starrocks.server.GlobalStateMgr.replayJournalInner(GlobalStateMgr.java:2293) ~[starrocks-fe.jar:?]
        at com.starrocks.server.GlobalStateMgr.replayJournal(GlobalStateMgr.java:2245) ~[starrocks-fe.jar:?]
        at com.starrocks.leader.Checkpoint.replayAndGenerateGlobalStateMgrImage(Checkpoint.java:215) ~[starrocks-fe.jar:?]
        at com.starrocks.leader.Checkpoint.runAfterCatalogReady(Checkpoint.java:106) ~[starrocks-fe.jar:?]
        at com.starrocks.common.util.FrontendDaemon.runOneCycle(FrontendDaemon.java:72) ~[starrocks-fe.jar:?]
        at com.starrocks.common.util.Daemon.run(Daemon.java:115) ~[starrocks-fe.jar:?]
Caused by: java.lang.NullPointerException
        at com.starrocks.transaction.LakeTableTxnLogApplier.applyCommitLog(LakeTableTxnLogApplier.java:43) ~[starrocks-fe.jar:?]
        at com.starrocks.transaction.DatabaseTransactionMgr.updateCatalogAfterCommitted(DatabaseTransactionMgr.java:1449) ~[starrocks-fe.jar:?]
        at com.starrocks.transaction.DatabaseTransactionMgr.replayUpsertTransactionState(DatabaseTransactionMgr.java:1553) ~[starrocks-fe.jar:?]
        at com.starrocks.transaction.GlobalTransactionMgr.replayUpsertTransactionState(GlobalTransactionMgr.java:704) ~[starrocks-fe.jar:?]
        at com.starrocks.persist.EditLog.loadJournal(EditLog.java:601) ~[starrocks-fe.jar:?]
        ... 6 more
2023-11-04 18:01:15,200 WARN (leaderCheckpointer|114) [GlobalStateMgr.replayJournal():2247] got interrupt exception or inconsistent exception when replay journal 779, will exit,
com.starrocks.journal.JournalInconsistentException: failed to load journal type 12110
        at com.starrocks.persist.EditLog.loadJournal(EditLog.java:1089) ~[starrocks-fe.jar:?]
        at com.starrocks.server.GlobalStateMgr.replayJournalInner(GlobalStateMgr.java:2293) ~[starrocks-fe.jar:?]
        at com.starrocks.server.GlobalStateMgr.replayJournal(GlobalStateMgr.java:2245) ~[starrocks-fe.jar:?]
        at com.starrocks.leader.Checkpoint.replayAndGenerateGlobalStateMgrImage(Checkpoint.java:215) ~[starrocks-fe.jar:?]
        at com.starrocks.leader.Checkpoint.runAfterCatalogReady(Checkpoint.java:106) ~[starrocks-fe.jar:?]
        at com.starrocks.common.util.FrontendDaemon.runOneCycle(FrontendDaemon.java:72) ~[starrocks-fe.jar:?]
        at com.starrocks.common.util.Daemon.run(Daemon.java:115) ~[starrocks-fe.jar:?]
Caused by: java.lang.NullPointerException
        at com.starrocks.transaction.LakeTableTxnLogApplier.applyCommitLog(LakeTableTxnLogApplier.java:43) ~[starrocks-fe.jar:?]
        at com.starrocks.transaction.DatabaseTransactionMgr.updateCatalogAfterCommitted(DatabaseTransactionMgr.java:1449) ~[starrocks-fe.jar:?]
        at com.starrocks.transaction.DatabaseTransactionMgr.replayUpsertTransactionState(DatabaseTransactionMgr.java:1553) ~[starrocks-fe.jar:?]
        at com.starrocks.transaction.GlobalTransactionMgr.replayUpsertTransactionState(GlobalTransactionMgr.java:704) ~[starrocks-fe.jar:?]
        at com.starrocks.persist.EditLog.loadJournal(EditLog.java:601) ~[starrocks-fe.jar:?]
        ... 6 more

[Attachments]
fe.log.gz (5.1 MB)
meta.tar.gz (5.1 MB)


In fe.conf, set metadata_journal_skip_bad_journal_ids=xxx, where xxx is the 779 from the log message "replay journal 779, will exit".

Judging by "catch exception when replaying 779", the ID should be 779.

That setting can't be used in this case yet. Let me send you a temporary build instead.
starrocks-fe.jar (18.8 MB)
Replace fe/lib/starrocks-fe.jar with this jar, then restart the FE.

Replaced. I'll watch how it behaves after the restart.

After replacing the jar, the same error still occurs.

After skipping one ID, exceptions keep occurring on later IDs. Is there a way to skip all of them at once?
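
Since the failing journal IDs show up one at a time in fe.log, a small script can at least gather the ones seen so far into a single value. The following is a minimal sketch, not an official tool: the function names are hypothetical, the regular expressions only match the two log messages quoted in this thread, and the comma-separated value format is an assumption. An earlier reply also says this setting could not be used in this case, so whether it takes effect on a given build is unconfirmed.

```python
import re

# Patterns for the two FE log messages quoted in this thread.
BAD_JOURNAL_PATTERNS = [
    re.compile(r"catch exception when replaying (\d+)"),
    re.compile(r"when replay journal (\d+), will exit"),
]


def collect_bad_journal_ids(log_lines):
    """Return the distinct journal IDs reported as failing, in ascending order."""
    ids = set()
    for line in log_lines:
        for pattern in BAD_JOURNAL_PATTERNS:
            match = pattern.search(line)
            if match:
                ids.add(int(match.group(1)))
    return sorted(ids)


def format_skip_setting(ids):
    """Build the fe.conf line; the comma-separated format is an assumption."""
    return "metadata_journal_skip_bad_journal_ids=" + ",".join(str(i) for i in ids)
```

For example, passing an open fe.log file object to collect_bad_journal_ids and the result to format_skip_setting yields one line to paste into fe.conf. Because the FE exits at the first journal it cannot replay, the log only ever contains IDs that have already failed, so this does not predict later bad journals.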