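[Details] The cluster originally had 3 FEs; two of them were removed, so it now runs 1 FE + 1 BE. After I restarted the only remaining FE, fe.log keeps printing entries like the one below. The prepare time in those entries starts in March and advances up to June 4 (the time of the restart), and then the replay starts over from March again. It has been looping like this for two days. The data volume is only about 100 million rows, yet the bdb directory keeps growing and is already over 300 GB. I also tried metadata_failure_recovery = true, with the same result.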
fe.log:
2024-06-06 01:30:12,708 INFO (stateChangeExecutor|95) [DatabaseTransactionMgr.replayUpsertTransactionState():1644] remove expired transaction: TransactionState. txn_id: 31526025, label: mx_login_records_routine_load-478178-0407ff05-226d-41c8-a9eb-1543c26ebb15, db id: 11385, table id list: 477392, callback id: 478178, coordinator: FE: starrocks-fe-03, transaction status: VISIBLE, error replicas num: 0, replica ids: , prepare time: 1709536015265, write end time: -1, allow commit time: -1, commit time: 1709536016115, finish time: 1709536016157, write cost: 850ms, publish total cost: 42ms, total cost: 892ms, reason: attachment: RLTaskTxnCommitAttachment [filteredRows=0, loadedRows=28, unselectedRows=0, receivedBytes=12499, taskExecutionTimeMs=696, taskId=null, jobId=0, progress=KafkaProgress [partitionIdToOffset=0_9319917]]
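To check where the replay currently stands, the prepare time in these entries can be converted to a readable date. The following is a minimal sketch, assuming the field is a Unix epoch in milliseconds (which matches the March date noted above); the helper name `parse_prepare_time` and the shortened sample line are illustrative only and not part of StarRocks.

```python
import re
from datetime import datetime, timezone

# Assumption: "prepare time" is a Unix epoch in milliseconds.
TXN_RE = re.compile(r"txn_id: (\d+).*?prepare time: (\d+)")

def parse_prepare_time(line):
    """Return (txn_id, prepare time as a UTC datetime), or None if the line does not match."""
    m = TXN_RE.search(line)
    if m is None:
        return None
    txn_id = int(m.group(1))
    seconds, _millis = divmod(int(m.group(2)), 1000)  # drop the millisecond part
    return txn_id, datetime.fromtimestamp(seconds, tz=timezone.utc)

# Shortened copy of the fe.log entry above (hypothetical abbreviation).
sample = ("TransactionState. txn_id: 31526025, db id: 11385, "
          "transaction status: VISIBLE, prepare time: 1709536015265, write end time: -1")

txn_id, ts = parse_prepare_time(sample)
print(f"txn_id {txn_id}, prepare time {ts:%Y-%m-%d %H:%M:%S} UTC")
# prints: txn_id 31526025, prepare time 2024-03-04 07:06:55 UTC
```

Running this over successive fe.log lines should show whether the txn_id and prepare time move forward and then jump back to March, which is the looping behavior described above.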
fe.out shows an error:
Jun 06, 2024 1:39:13 AM com.github.benmanes.caffeine.cache.LocalAsyncCache$AsyncBulkCompleter accept
WARNING: Exception thrown during asynchronous load
java.util.concurrent.CompletionException: java.lang.NullPointerException
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NullPointerException
at com.starrocks.lake.LakeTablet.getBackendIds(LakeTablet.java:120)
at com.starrocks.lake.LakeTablet.getQueryableReplicas(LakeTablet.java:132)
at com.starrocks.planner.OlapScanNode.addScanRangeLocations(OlapScanNode.java:489)
at com.starrocks.sql.plan.PlanFragmentBuilder$PhysicalPlanTranslator.visitPhysicalOlapScan(PlanFragmentBuilder.java:755)
at com.starrocks.sql.plan.PlanFragmentBuilder$PhysicalPlanTranslator.visitPhysicalOlapScan(PlanFragmentBuilder.java:375)
at com.starrocks.sql.optimizer.operator.physical.PhysicalOlapScanOperator.accept(PhysicalOlapScanOperator.java:185)
at com.starrocks.sql.plan.PlanFragmentBuilder$PhysicalPlanTranslator.visit(PlanFragmentBuilder.java:392)
at com.starrocks.sql.plan.PlanFragmentBuilder$PhysicalPlanTranslator.translate(PlanFragmentBuilder.java:386)
at com.starrocks.sql.plan.PlanFragmentBuilder.createPhysicalPlan(PlanFragmentBuilder.java:222)
at com.starrocks.sql.StatementPlanner.createQueryPlan(StatementPlanner.java:164)
at com.starrocks.sql.StatementPlanner.planQuery(StatementPlanner.java:125)
at com.starrocks.sql.StatementPlanner.plan(StatementPlanner.java:99)
at com.starrocks.statistic.StatisticExecutor.executeDQL(StatisticExecutor.java:333)
at com.starrocks.statistic.StatisticExecutor.executeStatisticDQL(StatisticExecutor.java:323)
at com.starrocks.statistic.StatisticExecutor.queryStatisticSync(StatisticExecutor.java:117)
at com.starrocks.sql.optimizer.statistics.ColumnBasicStatsCacheLoader.queryStatisticsData(ColumnBasicStatsCacheLoader.java:129)
at com.starrocks.sql.optimizer.statistics.ColumnBasicStatsCacheLoader.lambda$asyncLoadAll$1(ColumnBasicStatsCacheLoader.java:93)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
... 3 more
fe.warn.log:
2024-06-06 01:40:01,231 WARN (stateChangeExecutor|95) [CachedStatisticStorage.refreshTableStatistic():124] java.util.concurrent.ExecutionException: java.lang.NullPointerException
2024-06-06 01:40:01,441 WARN (stateChangeExecutor|95) [TaskManager.replayUpdateTaskRun():737] could not find query_id:5a8fce7b-dd2b-11ee-a4fc-06b2d51c4548, taskId:4613392, when replay update pendingTaskRun
2024-06-06 01:40:01,598 WARN (stateChangeExecutor|95) [TaskManager.replayUpdateTaskRun():737] could not find query_id:9057cfe7-dd2b-11ee-a4fc-06b2d51c4548, taskId:24777585, when replay update pendingTaskRun
2024-06-06 01:40:01,659 WARN (stateChangeExecutor|95) [TaskManager.replayUpdateTaskRun():737] could not find query_id:9e3e4faa-dd2b-11ee-a4fc-06b2d51c4548, taskId:4605808, when replay update pendingTaskRun
2024-06-06 01:40:04,665 WARN (stateChangeExecutor|95) [TransactionGraph.remove():127] remove txn 32087850 with dependency: [32087851] this may happen during FE upgrading
2024-06-06 01:40:04,935 WARN (stateChangeExecutor|95) [TaskManager.replayUpdateTaskRun():737] could not find query_id:3824f98e-dd30-11ee-a4fc-06b2d51c4548, taskId:24783505, when replay update pendingTaskRun
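Since fe.warn.log is dominated by a few repeating messages during replay, it can help to tally the WARN lines by the `[Class.method():line]` tag to see which ones dominate. This is only a rough sketch: the regular expression is an assumption based on the line layout shown above, and `count_warn_sources` is a hypothetical helper, not a StarRocks tool.

```python
import re
from collections import Counter

# Assumption: WARN lines look like
#   <timestamp> WARN (<thread>|<id>) [Class.method():<line>] <message>
SOURCE_RE = re.compile(r"WARN \([^)]*\) \[([\w.$]+\(\)):\d+\]")

def count_warn_sources(lines):
    """Count WARN lines per emitting Class.method()."""
    counts = Counter()
    for line in lines:
        m = SOURCE_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

# Three lines copied from the excerpt above, with the message text shortened.
sample_lines = [
    "2024-06-06 01:40:01,231 WARN (stateChangeExecutor|95) [CachedStatisticStorage.refreshTableStatistic():124] java.util.concurrent.ExecutionException",
    "2024-06-06 01:40:01,441 WARN (stateChangeExecutor|95) [TaskManager.replayUpdateTaskRun():737] could not find query_id",
    "2024-06-06 01:40:01,598 WARN (stateChangeExecutor|95) [TaskManager.replayUpdateTaskRun():737] could not find query_id",
]

for source, n in count_warn_sources(sample_lines).most_common():
    print(n, source)
```

To run it against the real file, pass an open file handle instead of `sample_lines`, for example `count_warn_sources(open("fe.warn.log", encoding="utf-8", errors="replace"))`, adjusting the path to your FE log directory.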