FE nodes crash frequently with "slow db lock" (database read/write lock) errors

To help us locate your issue faster, please provide the following information. Thanks!
[Details] The FE node crashes frequently; the logs repeatedly report database read/write lock errors.
[Background] Data is synced hourly with DataX, then written to the next-layer tables via INSERT INTO SELECT.
[Business impact] The cluster becomes inaccessible.
[Shared-data cluster (separated storage and compute)] Yes, but shared-data tables are barely used.
[StarRocks version] 3.1.5
[Cluster size] 2 FE (1 follower) + 3 BE (FE and BE co-located)
[Machine spec] 12C/48G
[Contact] Community group 14-SHH qinghaoshang@163.com
[Attachments]



Please post the full log as text, not screenshots.

A db lock by itself will not cause the FE to crash.

2024-02-14 05:11:44,373 WARN (thrift-server-pool-423840|483643) [FrontendServiceImpl.loadTxnCommit():1230] failed to commit txn_id: 3384254: get database write lock timeout, database=db_ods, timeoutMillis=15000
2024-02-14 05:11:51,065 ERROR (JournalWriter|113) [BDBJEJournal.batchWriteCommit():422] failed to commit journal after retried 2 times! txn[] db[CloseSafeDatabase{db=32117065}]
com.sleepycat.je.rep.InsufficientAcksException: (JE 18.3.16) Transaction: -43035017 VLSN: 73,813,252, initiated at: 05:11:40. Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 10000ms. FeederState=192.168.0.16_9010_1691045936256(1)[MASTER]
Current feeds:
192.168.0.14_9010_1691046083233: feederVLSN=73,813,253 replicaTxnEndVLSN=73,813,247

    at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:205) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:188) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:1444) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1403) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:228) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.txn.Txn.commit(Txn.java:778) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.txn.Txn.commit(Txn.java:631) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.Transaction.commit(Transaction.java:337) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.starrocks.journal.bdbje.BDBJEJournal.batchWriteCommit(BDBJEJournal.java:416) ~[starrocks-fe.jar:?]
    at com.starrocks.journal.JournalWriter.writeOneBatch(JournalWriter.java:127) ~[starrocks-fe.jar:?]
    at com.starrocks.journal.JournalWriter$1.runOneCycle(JournalWriter.java:87) ~[starrocks-fe.jar:?]
    at com.starrocks.common.util.Daemon.run(Daemon.java:115) ~[starrocks-fe.jar:?]

2024-02-14 05:11:56,066 WARN (JournalWriter|113) [BDBJEJournal.rebuildCurrentTransaction():444] transaction is invalid, rebuild the txn with 1 kvs
2024-02-14 05:12:00,126 WARN (thrift-server-pool-423856|483659) [Database.logTryLockFailureEvent():175] try db lock failed. type: writeLock, current owner id: 127, owner name: COMPACTION_DISPATCH, owner stack: dump thread: COMPACTION_DISPATCH, id: 127
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
com.starrocks.journal.JournalTask.get(JournalTask.java:78)
com.starrocks.journal.JournalTask.get(JournalTask.java:27)
com.starrocks.persist.EditLog.waitInfinity(EditLog.java:1166)
com.starrocks.persist.EditLog.logEdit(EditLog.java:1102)
com.starrocks.persist.EditLog.logJsonObject(EditLog.java:2035)
com.starrocks.persist.EditLog.logInsertTransactionState(EditLog.java:1588)
com.starrocks.transaction.DatabaseTransactionMgr.unprotectUpsertTransactionState(DatabaseTransactionMgr.java:1107)
com.starrocks.transaction.DatabaseTransactionMgr.unprotectedCommitTransaction(DatabaseTransactionMgr.java:1018)
com.starrocks.transaction.DatabaseTransactionMgr.commitTransaction(DatabaseTransactionMgr.java:437)
com.starrocks.transaction.GlobalTransactionMgr.commitTransaction(GlobalTransactionMgr.java:408)
com.starrocks.transaction.GlobalTransactionMgr.commitTransaction(GlobalTransactionMgr.java:384)
com.starrocks.lake.compaction.CompactionScheduler.commitCompaction(CompactionScheduler.java:392)
com.starrocks.lake.compaction.CompactionScheduler.schedule(CompactionScheduler.java:139)
com.starrocks.lake.compaction.CompactionScheduler.runOneCycle(CompactionScheduler.java:104)
com.starrocks.common.util.Daemon.run(Daemon.java:115)

2024-02-14 05:12:00,126 WARN (thrift-server-pool-423856|483659) [FrontendServiceImpl.loadTxnCommit():1230] failed to commit txn_id: 3384247: get database write lock timeout, database=db_ods, timeoutMillis=15000
2024-02-14 05:12:06,147 ERROR (JournalWriter|113) [BDBJEJournal.batchWriteCommit():422] failed to commit journal after retried 3 times! txn[] db[CloseSafeDatabase{db=32117065}]
com.sleepycat.je.rep.InsufficientAcksException: (JE 18.3.16) Transaction: -43035020 VLSN: 73,813,254, initiated at: 05:11:56. Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 10000ms. FeederState=192.168.0.16_9010_1691045936256(1)[MASTER]
Current feeds:
192.168.0.14_9010_1691046083233: feederVLSN=73,813,255 replicaTxnEndVLSN=73,813,250

    at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:205) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:188) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:1444) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1403) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:228) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.txn.Txn.commit(Txn.java:778) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.txn.Txn.commit(Txn.java:631) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.Transaction.commit(Transaction.java:337) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.starrocks.journal.bdbje.BDBJEJournal.batchWriteCommit(BDBJEJournal.java:416) ~[starrocks-fe.jar:?]
    at com.starrocks.journal.JournalWriter.writeOneBatch(JournalWriter.java:127) ~[starrocks-fe.jar:?]
    at com.starrocks.journal.JournalWriter$1.runOneCycle(JournalWriter.java:87) ~[starrocks-fe.jar:?]
    at com.starrocks.common.util.Daemon.run(Daemon.java:115) ~[starrocks-fe.jar:?]

2024-02-14 05:12:06,147 WARN (JournalWriter|113) [JournalWriter.writeOneBatch():133] failed to commit batch, will abort current 1 journals.
com.starrocks.journal.JournalException: failed to commit journal after retried 3 times! txn[] db[CloseSafeDatabase{db=32117065}]
at com.starrocks.journal.bdbje.BDBJEJournal.batchWriteCommit(BDBJEJournal.java:423) ~[starrocks-fe.jar:?]
at com.starrocks.journal.JournalWriter.writeOneBatch(JournalWriter.java:127) ~[starrocks-fe.jar:?]
at com.starrocks.journal.JournalWriter$1.runOneCycle(JournalWriter.java:87) ~[starrocks-fe.jar:?]
at com.starrocks.common.util.Daemon.run(Daemon.java:115) ~[starrocks-fe.jar:?]
Caused by: com.sleepycat.je.rep.InsufficientAcksException: (JE 18.3.16) Transaction: -43035020 VLSN: 73,813,254, initiated at: 05:11:56. Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 10000ms. FeederState=192.168.0.16_9010_1691045936256(1)[MASTER]
Current feeds:
192.168.0.14_9010_1691046083233: feederVLSN=73,813,255 replicaTxnEndVLSN=73,813,250

    at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:205) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:188) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:1444) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1403) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:228) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.txn.Txn.commit(Txn.java:778) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.txn.Txn.commit(Txn.java:631) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.sleepycat.je.Transaction.commit(Transaction.java:337) ~[starrocks-bdb-je-18.3.16.jar:?]
    at com.starrocks.journal.bdbje.BDBJEJournal.batchWriteCommit(BDBJEJournal.java:416) ~[starrocks-fe.jar:?]
    ... 3 more

2024-02-14 05:12:06,148 WARN (JournalWriter|113) [BDBJEJournal.batchWriteAbort():480] failed to abort transaction because no running transaction, will just ignore and return.
2024-02-14 05:12:06,148 ERROR (JournalWriter|113) [JournalWriter.abortJournalTask():176] failed to commit journal after retried 3 times! txn[] db[CloseSafeDatabase{db=32117065}]
2024-02-14 16:47:35,918 WARN (UNKNOWN 192.168.0.16_9010_1691045936256(-1)|1) [StateChangeExecutor.notifyNewFETypeTransfer():62] notify new FE type transfer: UNKNOWN
2024-02-14 16:47:35,979 WARN (RepNode 192.168.0.16_9010_1691045936256(-1)|61) [StateChangeExecutor.notifyNewFETypeTransfer():62] notify new FE type transfer: LEADER

This is the log from the most recent crash. It has crashed many times before; I saw many db lock errors in the logs and assumed the lock was the cause.

Here is an earlier part of the log:

2024-02-14 04:40:00,029 ERROR (autovacuum-pool1-t1|175) [AutovacuumDaemon.vacuumPartitionImpl():194] failed to vacuum db_ods.amazon_order_info.p202312: A error occurred: errorCode=62 errorMessage:method request time out, please check ‘onceTalkTimeout’ property. current value is:3600000(MILLISECONDS) correlationId:12786253 timeout with bound channel =>[id: 0x5a138a24, L:/192.168.0.16:55522 - R:/192.168.0.14:8060]
2024-02-14 04:40:42,729 ERROR (autovacuum-pool1-t7|181) [AutovacuumDaemon.vacuumPartitionImpl():194] failed to vacuum db_ods.amazon_order_info.p202208: A error occurred: errorCode=62 errorMessage:method request time out, please check ‘onceTalkTimeout’ property. current value is:3600000(MILLISECONDS) correlationId:12786265 timeout with bound channel =>[id: 0xbe97fb1f, L:/192.168.0.16:54996 - R:/192.168.0.14:8060]
2024-02-14 04:41:42,229 ERROR (autovacuum-pool1-t4|178) [AutovacuumDaemon.vacuumPartitionImpl():194] failed to vacuum db_ods.amazon_order_info.p202209: A error occurred: errorCode=62 errorMessage:method request time out, please check ‘onceTalkTimeout’ property. current value is:3600000(MILLISECONDS) correlationId:12786274 timeout with bound channel =>[id: 0x2195a517, L:/192.168.0.16:54994 - R:/192.168.0.14:8060]
2024-02-14 04:42:15,129 ERROR (autovacuum-pool1-t5|179) [AutovacuumDaemon.vacuumPartitionImpl():194] failed to vacuum db_ods.amazon_order_info.p202211: A error occurred: errorCode=62 errorMessage:method request time out, please check ‘onceTalkTimeout’ property. current value is:3600000(MILLISECONDS) correlationId:12786277 timeout with bound channel =>[id: 0x8b91c37b, L:/192.168.0.16:54990 - R:/192.168.0.14:8060]
2024-02-14 04:43:53,629 ERROR (autovacuum-pool1-t2|176) [AutovacuumDaemon.vacuumPartitionImpl():194] failed to vacuum db_ods.amazon_order_info.p202310: A error occurred: errorCode=62 errorMessage:method request time out, please check ‘onceTalkTimeout’ property. current value is:3600000(MILLISECONDS) correlationId:12786280 timeout with bound channel =>[id: 0xe108f245, L:/192.168.0.16:55752 - R:/192.168.0.14:8060]
2024-02-14 04:51:40,129 ERROR (autovacuum-pool1-t8|182) [AutovacuumDaemon.vacuumPartitionImpl():194] failed to vacuum db_ods.amazon_order_info.p202307: A error occurred: errorCode=62 errorMessage:method request time out, please check ‘onceTalkTimeout’ property. current value is:3600000(MILLISECONDS) correlationId:12786283 timeout with bound channel =>[id: 0xbe97fb1f, L:/192.168.0.16:54996 - R:/192.168.0.14:8060]
2024-02-14 04:51:48,429 ERROR (autovacuum-pool1-t3|177) [AutovacuumDaemon.vacuumPartitionImpl():194] failed to vacuum db_ods.amazon_order_info.p202210: A error occurred: errorCode=62 errorMessage:method request time out, please check ‘onceTalkTimeout’ property. current value is:3600000(MILLISECONDS) correlationId:12786286 timeout with bound channel =>[id: 0x03f6fd94, L:/192.168.0.16:55754 - R:/192.168.0.14:8060]
2024-02-14 04:52:09,229 ERROR (autovacuum-pool1-t6|180) [AutovacuumDaemon.vacuumPartitionImpl():194] failed to vacuum db_ods.amazon_order_info.p202305: A error occurred: errorCode=62 errorMessage:method request time out, please check ‘onceTalkTimeout’ property. current value is:3600000(MILLISECONDS) correlationId:12786289 timeout with bound channel =>[id: 0x2bda0143, L:/192.168.0.16:54992 - R:/192.168.0.14:8060]
2024-02-14 05:02:48,993 WARN (grpc-default-executor-901|477503) [ShardManager.validateWorkerReportedReplicas():1100] shard [2045020, 2045023] not exist or have outdated info when update shard info from worker heartbeat, schedule remove from worker 3003.
2024-02-14 05:02:48,994 WARN (pool-27-thread-2|316) [ShardManager.updateShardReplicaInfoInternal():1070] shard [2045020, 2045023] not exist when update shard info from shard scheduler!
2024-02-14 05:02:49,389 WARN (grpc-default-executor-901|477503) [ShardManager.validateWorkerReportedReplicas():1100] shard [2045018, 2045021, 2045025, 2045030] not exist or have outdated info when update shard info from worker heartbeat, schedule remove from worker 3002.
2024-02-14 05:02:49,389 WARN (pool-27-thread-1|315) [ShardManager.updateShardReplicaInfoInternal():1070] shard [2045018, 2045021, 2045025, 2045030] not exist when update shard info from shard scheduler!
2024-02-14 05:02:52,495 WARN (grpc-default-executor-901|477503) [ShardManager.validateWorkerReportedReplicas():1100] shard [2045019, 2045022, 2045026, 2045028, 2045032, 2045033, 2045036, 2045040, 2045044, 2045047, 2045051, 2045054, 2045057, 2045061, 2045064, 2045068, 2045070, 2045076, 2045080, 2045083, 2045085, 2045091, 2045093] not exist or have outdated info when update shard info from worker heartbeat, schedule remove from worker 3001.
2024-02-14 05:02:52,496 WARN (pool-27-thread-2|316) [ShardManager.updateShardReplicaInfoInternal():1070] shard [2045019, 2045022, 2045026, 2045028, 2045032, 2045033, 2045036, 2045040, 2045044, 2045047, 2045051, 2045054, 2045057, 2045061, 2045064, 2045068, 2045070, 2045076, 2045080, 2045083, 2045085, 2045091, 2045093] not exist when update shard info from shard scheduler!
2024-02-14 05:02:59,010 WARN (grpc-default-executor-901|477503) [ShardManager.validateWorkerReportedReplicas():1100] shard [2045027, 2045029, 2045034, 2045037, 2045041, 2045043, 2045048, 2045049, 2045055, 2045058, 2045062, 2045065, 2045071, 2045072, 2045077, 2045078, 2045082, 2045086, 2045089, 2045092, 2045097, 2045100, 2045104, 2045107, 2045111, 2045114] not exist or have outdated info when update shard info from worker heartbeat, schedule remove from worker 3003.
2024-02-14 05:02:59,011 WARN (pool-27-thread-1|315) [ShardManager.updateShardReplicaInfoInternal():1070] shard [2045027, 2045029, 2045034, 2045037, 2045041, 2045043, 2045048, 2045049, 2045055, 2045058, 2045062, 2045065, 2045071, 2045072, 2045077, 2045078, 2045082, 2045086, 2045089, 2045092, 2045097, 2045100, 2045104, 2045107, 2045111, 2045114] not exist when update shard info from shard scheduler!
2024-02-14 05:02:59,405 WARN (grpc-default-executor-901|477503) [ShardManager.validateWorkerReportedReplicas():1100] shard [2045035, 2045039, 2045042, 2045046, 2045050, 2045053, 2045056, 2045060, 2045063, 2045067, 2045069, 2045075, 2045079, 2045081, 2045084, 2045088, 2045090, 2045095, 2045098, 2045102, 2045105, 2045109, 2045112] not exist or have outdated info when update shard info from worker heartbeat, schedule remove from worker 3002.
2024-02-14 05:02:59,406 WARN (pool-27-thread-2|316) [ShardManager.updateShardReplicaInfoInternal():1070] shard [2045035, 2045039, 2045042, 2045046, 2045050, 2045053, 2045056, 2045060, 2045063, 2045067, 2045069, 2045075, 2045079, 2045081, 2045084, 2045088, 2045090, 2045095, 2045098, 2045102, 2045105, 2045109, 2045112] not exist when update shard info from shard scheduler!
2024-02-14 05:03:02,511 WARN (grpc-default-executor-901|477503) [ShardManager.validateWorkerReportedReplicas():1100] shard [2045096, 2045099, 2045103, 2045106, 2045110, 2045113] not exist or have outdated info when update shard info from worker heartbeat, schedule remove from worker 3001.
2024-02-14 05:03:02,512 WARN (pool-27-thread-1|315) [ShardManager.updateShardReplicaInfoInternal():1070] shard [2045096, 2045099, 2045103, 2045106, 2045110, 2045113] not exist when update shard info from shard scheduler!
2024-02-14 05:05:16,384 WARN (starrocks-mysql-nio-pool-17817|483323) [Database.logSlowLockEventIfNeeded():168] slow db lock. type: tryWriteLock, db id: 11147, db name: db_dwd, wait time: 4036ms, former owner id: 483174, owner name: starrocks-mysql-nio-pool-17816, owner stack: dump thread: starrocks-mysql-nio-pool-17816, id: 483174
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
com.starrocks.journal.JournalTask.get(JournalTask.java:78)
com.starrocks.journal.JournalTask.get(JournalTask.java:27)
com.starrocks.persist.EditLog.waitInfinity(EditLog.java:1166)
com.starrocks.persist.EditLog.logEdit(EditLog.java:1102)
com.starrocks.persist.EditLog.logJsonObject(EditLog.java:2035)
com.starrocks.persist.EditLog.logInsertTransactionState(EditLog.java:1588)
com.starrocks.transaction.DatabaseTransactionMgr.unprotectUpsertTransactionState(DatabaseTransactionMgr.java:1107)
com.starrocks.transaction.DatabaseTransactionMgr.unprotectedCommitTransaction(DatabaseTransactionMgr.java:1018)
com.starrocks.transaction.DatabaseTransactionMgr.commitTransaction(DatabaseTransactionMgr.java:437)
com.starrocks.transaction.GlobalTransactionMgr.commitTransaction(GlobalTransactionMgr.java:408)
com.starrocks.transaction.GlobalTransactionMgr.commitAndPublishTransaction(GlobalTransactionMgr.java:487)
com.starrocks.qe.StmtExecutor.handleDMLStmt(StmtExecutor.java:1872)
com.starrocks.qe.StmtExecutor.handleDMLStmtWithProfile(StmtExecutor.java:1523)
com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:615)
com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:363)
com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:477)
com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:753)
com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:69)
com.starrocks.mysql.nio.ReadListener$$Lambda$929/1122474595.run(Unknown Source)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:750)
, current stack trace:
java.lang.Thread.getStackTrace(Thread.java:1564)
com.starrocks.common.util.LogUtil.getCurrentStackTrace(LogUtil.java:73)
com.starrocks.catalog.Database.logSlowLockEventIfNeeded(Database.java:170)
com.starrocks.catalog.Database.tryWriteLock(Database.java:275)
com.starrocks.transaction.GlobalTransactionMgr.commitAndPublishTransaction(GlobalTransactionMgr.java:481)
com.starrocks.qe.StmtExecutor.handleDMLStmt(StmtExecutor.java:1872)
com.starrocks.qe.StmtExecutor.handleDMLStmtWithProfile(StmtExecutor.java:1523)
com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:615)
com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:363)
com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:477)
com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:753)
com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:69)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:750)
2024-02-14 05:11:35,940 ERROR (JournalWriter|113) [BDBJEJournal.batchWriteCommit():422] failed to commit journal after retried 1 times! txn[] db[CloseSafeDatabase{db=32117065}]
com.sleepycat.je.rep.InsufficientAcksException: (JE 18.3.16) Transaction: -43035012 VLSN: 73,813,249, initiated at: 05:11:25. Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 10000ms. FeederState=192.168.0.16_9010_1691045936256(1)[MASTER]

The log is above. I previously reduced the data-import frequency, which helped a bit, but the FE still crashes every few days.
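The InsufficientAcksException entries above show the leader waiting for the lone follower's ack and giving up after 10000 ms, which matches the default replica-ack timeout. While the slow follower or disk is being investigated, the BDB JE timeouts can be raised in fe.conf as a stopgap. The parameter names below are a hedged sketch of the FE configuration, not confirmed against 3.1.5; verify them in the FE config docs before applying.

```properties
# fe.conf -- sketch only; verify parameter names against the StarRocks 3.1.5 FE config docs.
# How long the leader waits for follower acks before raising InsufficientAcksException
# (the "Timeout: 10000ms" in the log above corresponds to the 10s default).
bdbje_replica_ack_timeout_second = 30
# Heartbeat timeout between BDB JE replication-group members.
bdbje_heartbeat_timeout_second = 60
```

Raising timeouts only masks the symptom: if je.info.0 or iostat shows the meta disk saturated, the real fix is faster storage or less edit-log pressure. The FE must be restarted for fe.conf changes to take effect.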

Please upload these two logs, fe.log and meta/bdb/je.info.0, and also share a screenshot of the io.util load on the disk holding the meta directory.
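The io.util check can also be done as text instead of a screenshot. A minimal sketch, not an official command: the canned sample below stands in for live `iostat -x 1 1` output so the filter is self-contained; on a real host you would pipe iostat straight into the awk filter.

```shell
# Canned stand-in for `iostat -x 1 1` output (values are made up for illustration)
sample='Device r/s w/s rkB/s wkB/s await %util
vda 10.0 50.0 120.0 800.0 1.2 12.5
vdb 5.0 900.0 40.0 9000.0 45.7 99.8'

# Flag devices whose %util (last column) exceeds 90 -- a saturated meta disk
# shows up here and would explain the slow journal commits in the log above.
echo "$sample" | awk 'NR > 1 && $NF + 0 > 90 {print $1, "busy:", $NF "%"}'
```

On a live system, replace the canned sample with `iostat -x 1 1 | awk '...'` and watch the device that holds the FE meta directory.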

je.info.0 (6.2 MB) fe.log.20240214-1 (78.8 MB)

What kind of disk is it, and what is its IOPS?

The problem occurred around 5 AM on 2024-02-14.

The disks are all SSDs, rated at 5000 IOPS.
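5000 IOPS can still be too slow for this workload, because each BDB JE journal commit in the stack traces above blocks on a synchronous write. A rough probe of sync-write latency on the meta disk (my own sketch, not an official benchmark; `META_DIR` defaults to the current directory and should point at the FE meta directory in practice):

```shell
# Time 200 small synchronous writes; on a healthy SSD this finishes quickly,
# while a saturated or high-latency volume will take many seconds.
META_DIR=${META_DIR:-.}          # assumption: set to the FE meta dir on a real host
dd if=/dev/zero of="$META_DIR/dsync_probe" bs=4k count=200 oflag=dsync 2>&1 | tail -n 1
rm -f "$META_DIR/dsync_probe"
```

If the dd throughput line shows latency far above what the rated IOPS implies, the meta disk (not the lock) is the likely root of the journal-commit timeouts.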

Was the root cause of this issue ever found? How was it resolved?

Same question here: was the cause found, and how was it fixed?
