FE nodes crash frequently

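To help us locate your issue faster, please provide the following information. Thank you.
[Details] FE nodes crash frequently.
[Background] There are many scheduled jobs, consisting of INSERT ... SELECT and TRUNCATE operations.
[Business impact] Data is not synchronized in time, which causes subsidy payouts to be issued incorrectly.
[Shared-data (storage-compute separation)?] No, storage and compute are not separated.
[StarRocks version] 2.5.13
[Cluster size] e.g.: 3 FE (1 follower + 2 observers) + 3 BE (FE and BE co-located on the same machines)
[Machine info] CPU vCores/memory/NIC, e.g.: 48C/64G/10GbE
[Contact] 1959533192@qq.com
[Attachments] fe.gc.log.20240627-093122 (230.9 KB) fe.gc.log.20240627-231355 (159.5 KB)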

fe.warn.log (1).zip (13.9 MB)

2024-06-27 18:47:37,766 WARN (replayer|86) [BDBJournalCursor.wrapDatabaseException():89] failed to read after retried 2 times! key = 74987419, db = CloseSafeDatabase{db=74973698}
2024-06-27 18:47:41,767 WARN (replayer|86) [BDBJournalCursor.wrapDatabaseException():89] failed to read after retried 3 times! key = 74987419, db = CloseSafeDatabase{db=74973698}
2024-06-27 18:47:41,767 WARN (replayer|86) [GlobalStateMgr.replayJournalInner():1964] catch exception when replaying 74987419,
com.starrocks.journal.JournalException: failed to read after retried 3 times! key = 74987419, db = CloseSafeDatabase{db=74973698}
at com.starrocks.journal.bdbje.BDBJournalCursor.wrapDatabaseException(BDBJournalCursor.java:90) ~[starrocks-fe.jar:?]
at com.starrocks.journal.bdbje.BDBJournalCursor.next(BDBJournalCursor.java:310) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.replayJournalInner(GlobalStateMgr.java:1945) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr$5.runOneCycle(GlobalStateMgr.java:1810) ~[starrocks-fe.jar:?]
at com.starrocks.common.util.Daemon.run(Daemon.java:115) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr$5.run(GlobalStateMgr.java:1875) ~[starrocks-fe.jar:?]
Caused by: com.sleepycat.je.LockTimeoutException: (JE 7.3.7) Lock expired. Locker 970452405 -1_replayer_ReplicaThreadLocker: waited for lock on database=74973698 LockAddr:1645069336 LSN=0x1394/0x3a3da type=READ grant=WAIT_NEW timeoutMillis=1000 startTime=1719485260766 endTime=1719485261766
Owners: [<LockInfo locker="865607584 -89689224_ReplayThread_ReplayTxn" type="WRITE"/>]
Waiters: []

at com.sleepycat.je.txn.LockManager.makeTimeoutException(LockManager.java:1117) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.LockManager.waitForLock(LockManager.java:606) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.LockManager.lock(LockManager.java:345) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.BasicLocker.lockInternal(BasicLocker.java:124) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.txn.ReplicaThreadLocker.lockInternal(ReplicaThreadLocker.java:63) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Locker.lock(Locker.java:499) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.dbi.CursorImpl.lockLN(CursorImpl.java:3585) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.dbi.CursorImpl.lockLN(CursorImpl.java:3316) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.dbi.CursorImpl.lockLNAndCheckDefunct(CursorImpl.java:2138) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.dbi.CursorImpl.searchExact(CursorImpl.java:1950) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Cursor.searchExact(Cursor.java:4194) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Cursor.searchNoDups(Cursor.java:4055) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Cursor.search(Cursor.java:3857) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Cursor.getInternal(Cursor.java:1284) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.get(Database.java:1271) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.get(Database.java:1330) ~[je-7.3.7.jar:7.3.7]
at com.starrocks.journal.bdbje.CloseSafeDatabase.get(CloseSafeDatabase.java:47) ~[starrocks-fe.jar:?]
at com.starrocks.journal.bdbje.BDBJournalCursor.next(BDBJournalCursor.java:276) ~[starrocks-fe.jar:?]
... 4 more