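On January 17 at around 05:30, our production cluster failed and we found that an FE node had gone down. The FE node has 64 GB of memory, with 42 GB configured for the JVM; the version is StarRocks 3.2.15.
The relevant logs are below: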
2026-01-17 05:28:24.017+08:00 WARN (heartbeat mgr|30) [HeartbeatMgr.runAfterCatalogReady():166] get bad heartbeat response: type: FRONTEND, status: BAD, msg: not ready, name: 10.200.222.11_9010_1741943503531, queryPort: 0, rpcPort: 0, replayedJournalId: 0, feStartTime: \N, feVersion: null
2026-01-17 05:28:24.029+08:00 WARN (thrift-server-pool-3428381|6594946) [Database.logSlowLockEventIfNeeded():171] slow db lock. type: writeLock, db id: 11016, db name: ODS_ZDH_5, wait time: 15286ms, former owner id: 328, owner name: ReportHandler, owner stack: dump thread: ReportHandler, id: 328
java.base@11.0.23/jdk.internal.misc.Unsafe.park(Native Method)
java.base@11.0.23/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
java.base@11.0.23/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
java.base@11.0.23/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039)
java.base@11.0.23/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
java.base@11.0.23/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
app//com.starrocks.journal.JournalTask.get(JournalTask.java:84)
app//com.starrocks.persist.EditLog.waitInfinity(EditLog.java:1315)
app//com.starrocks.persist.EditLog.logEdit(EditLog.java:1248)
app//com.starrocks.persist.EditLog.logJsonObject(EditLog.java:2258)
app//com.starrocks.persist.EditLog.logUpdateReplica(EditLog.java:1566)
app//com.starrocks.leader.ReportHandler.sync(ReportHandler.java:731)
app//com.starrocks.leader.ReportHandler.tabletReport(ReportHandler.java:467)
app//com.starrocks.leader.ReportHandler.access$300(ReportHandler.java:136)
app//com.starrocks.leader.ReportHandler$ReportTask.exec(ReportHandler.java:408)
app//com.starrocks.leader.ReportHandler.runOneCycle(ReportHandler.java:1871)
app//com.starrocks.common.util.Daemon.run(Daemon.java:109)
, current stack trace:
java.base/java.lang.Thread.getStackTrace(Thread.java:1606)
com.starrocks.common.util.LogUtil.getCurrentStackTrace(LogUtil.java:75)
com.starrocks.catalog.Database.logSlowLockEventIfNeeded(Database.java:173)
com.starrocks.catalog.Database.writeLock(Database.java:253)
com.starrocks.transaction.GlobalTransactionMgr.commitTransactionUnderDatabaseWLock(GlobalTransactionMgr.java:568)
com.starrocks.transaction.GlobalTransactionMgr.retryCommitOnRateLimitExceeded(GlobalTransactionMgr.java:536)
com.starrocks.transaction.GlobalTransactionMgr.commitAndPublishTransaction(GlobalTransactionMgr.java:504)
com.starrocks.service.FrontendServiceImpl.loadTxnCommitImpl(FrontendServiceImpl.java:1496)
com.starrocks.service.FrontendServiceImpl.loadTxnCommit(FrontendServiceImpl.java:1453)
com.starrocks.thrift.FrontendService$Processor$loadTxnCommit.getResult(FrontendService.java:4501)
com.starrocks.thrift.FrontendService$Processor$loadTxnCommit.getResult(FrontendService.java:4481)
org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
com.starrocks.common.SRTThreadPoolServer$WorkerProcess.run(SRTThreadPoolServer.java:311)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
java.base/java.lang.Thread.run(Thread.java:834)
Jan 17, 2026 5:44:01 AM com.baidu.jprotobuf.pbrpc.transport.RpcTimerTask run
WARNING: correlationId:4667866101 timeout with bound channel =>[id: 0xd4117850, L:/10.200.222.12:39176 - R:/10.200.222.13:8060]
Jan 17, 2026 5:45:54 AM com.baidu.jprotobuf.pbrpc.transport.RpcTimerTask run
WARNING: correlationId:4667866104 timeout with bound channel =>[id: 0x035fd388, L:/10.200.222.12:49962 - R:/10.200.222.13:8060]
Exception in thread "export_exporting_job_scheduler_thread_pool-0" java.lang.OutOfMemoryError: Java heap space
2026-01-17 06:21:30,038 starrocks-mysql-nio-pool-1328359 ERROR An exception occurred processing Appender Sys org.apache.logging.log4j.core.appender.AppenderLoggingException: java.lang.OutOfMemoryError: Java heap space
at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:165)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:134)
at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:125)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:89)
at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:683)
at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:641)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:624)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:560)
at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:82)
at org.apache.logging.log4j.core.Logger.log(Logger.java:162)
at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2205)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2159)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2142)
at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2034)
at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1899)
at org.apache.logging.log4j.spi.AbstractLogger.info(AbstractLogger.java:1444)
at com.starrocks.mysql.MysqlChannel.fetchOnePacket(MysqlChannel.java:189)
at com.starrocks.mysql.MysqlProto.readAuthPacket(MysqlProto.java:241)
at com.starrocks.mysql.MysqlProto.negotiate(MysqlProto.java:136)
at com.starrocks.mysql.nio.AcceptListener.lambda$handleEvent$1(AcceptListener.java:93)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.OutOfMemoryError: Java heap space
2026-01-17 06:21:30,039 starrocks-mysql-nio I/O-1 ERROR An exception occurred processing Appender Sys org.apache.logging.log4j.core.appender.AppenderLoggingException: java.lang.OutOfMemoryError: Java heap space
at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:165)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:134)
at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:125)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:89)
at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:683)
at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:641)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:624)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:560)
at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:82)
at org.apache.logging.log4j.core.Logger.log(Logger.java:162)
at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2205)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2159)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2142)
at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2040)
at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1907)
at org.apache.logging.log4j.spi.AbstractLogger.info(AbstractLogger.java:1449)
at com.starrocks.mysql.nio.AcceptListener.handleEvent(AcceptListener.java:81)
at com.starrocks.mysql.nio.AcceptListener.handleEvent(AcceptListener.java:57)
at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)
at org.xnio.nio.QueuedNioTcpServer2.acceptTask(QueuedNioTcpServer2.java:178)
at org.xnio.nio.WorkerThread.safeRun(WorkerThread.java:612)
at org.xnio.nio.WorkerThread.run(WorkerThread.java:479)
Caused by: java.lang.OutOfMemoryError: Java heap space
Jan 17, 2026 6:25:02 AM com.baidu.jprotobuf.pbrpc.transport.RpcTimerTask run
WARNING: correlationId:4667912906 timeout with bound channel =>[id: 0x73b21013, L:/10.200.222.12:45024 - R:/10.200.222.14:8060]
[2026-01-17 06:41:58] failed to rebuild txn! txn[null] db[CloseSafeDatabase{db=296914670}]
Exception in thread "Timer-2" java.lang.OutOfMemoryError: Java heap space
Exception in thread "export_exporting_sub_task_scheduler_thread_pool-0" java.lang.OutOfMemoryError: Java heap space
2026-01-17 07:26:02,214 thrift-server-pool-3434001 ERROR An exception occurred processing Appender SysWF org.apache.logging.log4j.core.appender.AppenderLoggingException: java.lang.OutOfMemoryError: Java heap space
at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:165)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:134)
at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:125)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:89)
at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:683)
at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:641)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:624)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:560)
at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:82)
at org.apache.logging.log4j.core.Logger.log(Logger.java:162)
at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2205)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2159)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2142)
at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2017)
at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1983)
at org.apache.logging.log4j.spi.AbstractLogger.error(AbstractLogger.java:750)
at com.starrocks.common.SRTThreadPoolServer$WorkerProcess.run(SRTThreadPoolServer.java:319)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-40-thread-1" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Repository" java.lang.OutOfMemoryError: Java heap space
2026-01-17 07:26:02,219 starrocks-mysql-nio-pool-1328748 ERROR An exception occurred processing Appender Sys org.apache.logging.log4j.core.appender.AppenderLoggingException: java.lang.OutOfMemoryError: Java heap space
at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:165)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:134)
at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:125)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:89)
at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:683)
at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:641)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:624)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:560)
at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:82)
at org.apache.logging.log4j.core.Logger.log(Logger.java:162)
at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2205)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2159)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2142)
at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2017)
at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1983)
at org.apache.logging.log4j.spi.AbstractLogger.warn(AbstractLogger.java:2671)
at com.starrocks.qe.scheduler.dag.FragmentInstanceExecState.waitForDeploymentCompletion(FragmentInstanceExecState.java:274)
at com.starrocks.qe.scheduler.Deployer.waitForDeploymentCompletion(Deployer.java:225)
at com.starrocks.qe.scheduler.Deployer.deployFragments(Deployer.java:116)
at com.starrocks.qe.DefaultCoordinator.deliverExecFragments(DefaultCoordinator.java:596)
at com.starrocks.qe.DefaultCoordinator.startScheduling(DefaultCoordinator.java:509)
at com.starrocks.qe.scheduler.Coordinator.startScheduling(Coordinator.java:102)
at com.starrocks.qe.scheduler.Coordinator.exec(Coordinator.java:85)
at com.starrocks.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:1115)
at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:619)
at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:346)
at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:540)
at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:848)
at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:69)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.OutOfMemoryError: Java heap space
Exception in thread "export_pending_job_scheduler_thread_pool-0" java.lang.OutOfMemoryError: Java heap space
Exception in thread "AuditEventProcessor" java.lang.OutOfMemoryError: Java heap space
Exception in thread "starrocks-mysql-nio Accept" java.lang.OutOfMemoryError: Java heap space
2026-01-17 07:26:02,230 starrocks-mysql-nio-pool-1328775 WARN org.apache.logging.log4j.spi.AbstractLogger caught java.lang.OutOfMemoryError logging ParameterizedMessage: slow db lock. type: {}, db id: {}, db name: {}, wait time: {}ms, former {}, current stack trace: {} java.lang.OutOfMemoryError: Java heap space
2026-01-17 07:26:02,230 thrift-server-pool-3433987 WARN org.apache.logging.slf4j.Log4jLogger caught java.lang.OutOfMemoryError logging SimpleMessage: Error closing output stream. java.lang.OutOfMemoryError: Java heap space
2026-01-17 07:26:02,231 starrocks-mysql-nio-pool-1328694 WARN org.apache.logging.log4j.spi.AbstractLogger caught java.lang.OutOfMemoryError logging ParameterizedMessage: slow db lock. type: {}, db id: {}, db name: {}, wait time: {}ms, former {}, current stack trace: {} java.lang.OutOfMemoryError: Java heap space
Exception in thread "export_exporting_job_scheduler_thread_pool-1" java.lang.OutOfMemoryError: Java heap space
[2026-01-17 07:26:02] this node is DETACHED
Here is what an LLM told me when I asked about it:
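The "Comparison method violates its general contract!" error usually occurs when a comparator has a transitivity bug or is otherwise inconsistent (Sentry). Since Java 7, the object sort used by Collections.sort and Arrays.sort has been TimSort, which throws IllegalArgumentException when it detects an inconsistency between comparisons, but it is not guaranteed to always detect one (HeapHero).
The error is thrown at line 97 of StatisticsCollectJobFactory.buildStatisticsCollectJob, where the statistics collection jobs are being sorted.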
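To make the "inconsistent comparator" explanation concrete, here is a minimal Java sketch. The `Job` class and its `health` sort key are hypothetical stand-ins: the post does not show the real StarRocks job class or what line 97 actually sorts by. The sketch shows one classic way to break the contract (a hand-written `<`/`>` comparison on doubles, where NaN compares "equal" to every value) and a brute-force checker that can flag such a comparator in a unit test even when TimSort happens not to throw.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ComparatorContractCheck {

    // Hypothetical job type; not the real StarRocks statistics job class.
    static final class Job {
        final String name;
        final double health; // hypothetical sort key
        Job(String name, double health) { this.name = name; this.health = health; }
        @Override public String toString() { return name + "(" + health + ")"; }
    }

    // Broken: NaN is neither < nor > anything, so it compares equal to every value,
    // which makes "equality" non-transitive (0.2 == NaN, NaN == 0.9, but 0.2 < 0.9).
    static final Comparator<Job> BROKEN = (a, b) ->
            a.health < b.health ? -1 : (a.health > b.health ? 1 : 0);

    // Consistent: Double.compare defines a total order (NaN sorts after all other values).
    static final Comparator<Job> FIXED = Comparator.comparingDouble(j -> j.health);

    // Brute-force check of the Comparator contract over a small sample:
    // antisymmetry, transitivity of ">", and "x == y implies x and y compare the same against any z".
    static <T> List<String> violations(List<T> sample, Comparator<T> c) {
        List<String> out = new ArrayList<>();
        for (T x : sample) {
            for (T y : sample) {
                int xy = Integer.signum(c.compare(x, y));
                if (xy != -Integer.signum(c.compare(y, x))) {
                    out.add("antisymmetry: " + x + " vs " + y);
                }
                for (T z : sample) {
                    int yz = Integer.signum(c.compare(y, z));
                    int xz = Integer.signum(c.compare(x, z));
                    if (xy > 0 && yz > 0 && xz <= 0) {
                        out.add("transitivity: " + x + " > " + y + " > " + z);
                    }
                    if (xy == 0 && xz != yz) {
                        out.add("equality: " + x + " == " + y + " but they differ against " + z);
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Job> sample = List.of(new Job("a", 0.2), new Job("b", Double.NaN), new Job("c", 0.9));
        System.out.println("BROKEN: " + violations(sample, BROKEN));
        System.out.println("FIXED:  " + violations(sample, FIXED));
    }
}
```

Switching to `Double.compare` / `Comparator.comparingDouble` is the usual fix for this particular bug. Another common cause of the same exception is a sort key read from mutable state that changes while the sort is running; snapshotting the keys before sorting avoids it. Which of these, if either, applies to `buildStatisticsCollectJob` would have to be confirmed against the 3.2.15 source.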
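Correlation between the two log files
Looking at both log files together, I found an important correlation:
- Timeline:
  - Second file: AutoStatistic errors from 05:09 to 05:29
  - First file: slow db lock warnings start at 05:28
  - First file: OOM starts at 06:21
- Possible causal chain:
  - AutoStatistic thread crashes → statistics collection fails
  - Statistics jobs pile up → memory usage grows
  - Combined with database lock contention → eventually leads to OOM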
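To check the timeline above against the raw logs, a small scanner helps. Below is a minimal Java sketch, assuming the timestamp formats visible in the logs in this post; the category names and the `AutoStatistic` keyword used to match the second log file (which is not included here) are assumptions. For each category it reports the earliest and latest timestamp, the number of matching lines, and the longest `slow db lock` wait.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FeLogTimeline {

    // Matches the "yyyy-MM-dd HH:mm:ss" prefix shared by the formats seen above
    // ("05:28:24.017+08:00", "06:21:30,038", "[2026-01-17 06:41:58]").
    static final Pattern TS = Pattern.compile("(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})");
    static final Pattern WAIT = Pattern.compile("wait time: (\\d+)ms");

    // Earliest/latest timestamp, line count and peak lock wait for one category.
    static final class Span {
        String first;
        String last;
        int count;
        long maxWaitMs = -1;
        @Override public String toString() {
            return "first=" + first + " last=" + last + " count=" + count
                    + (maxWaitMs >= 0 ? " maxWaitMs=" + maxWaitMs : "");
        }
    }

    // OOM is checked first because some OOM lines also quote the "slow db lock" message template.
    static String category(String line) {
        if (line.contains("OutOfMemoryError")) return "OOM";
        if (line.contains("slow db lock")) return "SLOW_DB_LOCK";
        if (line.contains("AutoStatistic")) return "AUTO_STATISTIC"; // assumed keyword for the second file
        if (line.contains("DETACHED")) return "DETACHED";
        return null;
    }

    static Map<String, Span> timeline(List<String> lines) {
        Map<String, Span> spans = new LinkedHashMap<>();
        for (String line : lines) {
            String cat = category(line);
            Matcher ts = TS.matcher(line);
            // Lines without a timestamp (stack frames, bare "Exception in thread" lines) are skipped.
            if (cat == null || !ts.find()) continue;
            String t = ts.group(1);
            Span s = spans.computeIfAbsent(cat, k -> new Span());
            // Same-format timestamps sort chronologically as strings.
            if (s.first == null || t.compareTo(s.first) < 0) s.first = t;
            if (s.last == null || t.compareTo(s.last) > 0) s.last = t;
            s.count++;
            Matcher w = WAIT.matcher(line);
            if (w.find()) s.maxWaitMs = Math.max(s.maxWaitMs, Long.parseLong(w.group(1)));
        }
        return spans;
    }

    public static void main(String[] args) throws Exception {
        List<String> all = new ArrayList<>();
        for (String file : args) all.addAll(Files.readAllLines(Path.of(file)));
        timeline(all).forEach((cat, span) -> System.out.println(cat + " " + span));
    }
}
```

It can be run as `java FeLogTimeline.java fe.log fe.warn.log` (single-file source launch, Java 11+). The string comparison only orders timestamps correctly if all files use the same time zone, and because untimestamped `Exception in thread ...` lines are skipped, the reported OOM start is approximate.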
Is this what caused the failure, and how should I handle it?