The FE leader node's old-generation JVM memory keeps rising

To help us locate your issue faster, please provide the following information. Thanks.
【Details】The cluster currently has 3 FE and 5 CN nodes. Memory on the leader node keeps rising. See the chart below:


(screenshot: FE leader memory usage trend)
【Business impact】Memory keeps rising, rendering the cluster unusable
【Shared-data (storage-compute separated)?】
【StarRocks version】3.4.5
【Cluster size】3 FE (1 follower + 2 observer) + 5 CN
【Machine info】(vCPU / memory / NIC) FE spec: 16C 64G
【Contact】cuijunle@supcon.com
【Attachment】fe.gc.log.20250729-160927 (3.6 MB)

  1. grep -i full fe.gc.log
  2. What are the cluster's current TabletNum and ReplicaNum? Run: show proc '/statistic';
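The two checks above can be sketched as follows (the sample log content is fabricated just to make the demo runnable, and the `mysql` connection parameters are assumptions; adjust for your cluster):

```shell
# 1) Count full-GC events in the FE GC log. JDK 11's unified GC logging
#    writes "Pause Full", while JDK 8 GC logs write "Full GC"; a
#    case-insensitive match on "full" covers both formats.
#    (The sample log below is fabricated for demonstration only.)
cat > /tmp/fe.gc.sample.log <<'EOF'
[2025-07-30T16:00:01.123+0800] GC(100) Pause Young (Normal) (G1 Evacuation Pause) 512M->300M(32768M) 45.1ms
[2025-07-30T16:00:05.456+0800] GC(101) Pause Full (G1 Compaction Pause) 30000M->29500M(32768M) 8123.4ms
[2025-07-30T16:00:20.789+0800] GC(102) Pause Full (G1 Compaction Pause) 30100M->29800M(32768M) 8456.7ms
EOF
grep -i -c 'full' /tmp/fe.gc.sample.log

# 2) Tablet/replica counts (requires a live FE; host, port, and user
#    below are assumptions):
# mysql -h fe_host -P 9030 -u root -e "show proc '/statistic';"
```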

1. GC:

(screenshot)

2.

(screenshot)

Looking through the FE log, it frequently prints the following:
2025-07-30 16:00:10.843+08:00 WARN (starrocks-taskrun-pool-13|21573) [ThreadPoolManager$LogDiscardPolicy.rejectedExecution():207] Task java.util.concurrent.CompletableFuture$AsyncSupply@3acf6f31 rejected from cache-dict java.util.concurrent.ThreadPoolExecutor@7d556d31[Running, pool size = 16, active threads = 16, queued tasks = 0, completed tasks = 15521]
2025-07-30 16:00:10.843+08:00 WARN (starrocks-taskrun-pool-6|17465) [ThreadPoolManager$LogDiscardPolicy.rejectedExecution():207] Task java.util.concurrent.CompletableFuture$AsyncSupply@734cc693 rejected from cache-dict java.util.concurrent.ThreadPoolExecutor@7d556d31[Running, pool size = 16, active threads = 16, queued tasks = 0, completed tasks = 15521]
... (11 more near-identical "rejected from cache-dict" warnings omitted)
2025-07-30 16:00:10.928+08:00 WARN (scheduler-dispatch-pool-6|17623) [ShardManager.updateShardReplicaInfoInternal():1285] shard [132894398, 132894400, 132894405, 133128720, 133128722, 133129683, 133129685, 133129695, 133129701, 133129720, 133129724, 133129735, 133129738, 133129975, 133129982, 133130034, 133130037, 133130190, 133130195, 133130234, 133130237, 133130327, 133130329, 133130863, 133130871, 133130889, 133130894, 133131017, 133131019, 133131081, 133131083, 133131248, 133131249, 133131273, 133131274, 133131285, 133131286, 133131326, 133131330, 133131363, 133131364, 133131374, 133131375, 133131387, 133131393, 133131415, 133131417, 133131429, 133131432, 133131454, 133131463, 133131467, 133131473, 133131493, 133131501, 133131519, 133131526, 133131534, 133131538, 133131545, 133131553, 133131556, 133131561, 133131600, 133131605, 133131609, 133131612, 133131621, 133131625, 133131666, 133131668, 133131927, 133131928, 133131953, 133131954, 133132066, 133132071, 133132081, 133132084] not exist when update shard info from shard scheduler!
2025-07-30 16:00:11.865+08:00 WARN (starrocks-taskrun-pool-12|21474) [StatisticsCollectionTrigger.waitFinish():222] await collect statistic task failed after 1 seconds, which mean too many jobs in the queue
2025-07-30 16:00:12.882+08:00 WARN (starrocks-taskrun-pool-13|21573) [StatisticsCollectionTrigger.waitFinish():222] await collect statistic task failed after 1 seconds, which mean too many jobs in the queue
... (5 more identical StatisticsCollectionTrigger.waitFinish warnings omitted)

filtered.log (6.7 MB)

2025-07-30 16:37:09.404+08:00 INFO (MemoryUsageTracker|65) [MemoryUsageTracker.trackMemory():111] total tracked memory: 53.5MB, jvm: Process used: 11.6GB, heap used: 9.1GB, non heap used: 373.7MB, direct buffer used: 203.7MB

No. From this it looks like your JVM is set to 11G and the heap has reached 9.1G, which is indeed a bit tight. The machine has 64G; can you raise the heap to 32G? I suspect the heap is simply too small. Since 3.3, StarRocks runs on JDK 11, which normally uses a bit more memory than JDK 8.

Hi, it is already set to 32G. This log entry was likely captured after the OOM and restart, when usage was back down to 11.6G.

Add the following JVM options: -XX:+UnlockExperimentalVMOptions -XX:G1MaxNewSizePercent=50
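For reference, these flags go into the FE's JVM options in fe.conf. The variable name and the other flags shown below are assumptions based on a typical fe.conf, so merge the two G1 flags into your existing line rather than copying this verbatim:

```shell
# Hypothetical fe.conf fragment: keep your existing options and append
# the two G1 flags. G1MaxNewSizePercent caps the young generation at
# 50% of the heap; it is an experimental flag, hence the
# -XX:+UnlockExperimentalVMOptions switch that must precede it.
JAVA_OPTS="-Xmx32g -XX:+UseG1GC -XX:+UnlockExperimentalVMOptions -XX:G1MaxNewSizePercent=50"
```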


Added the options, but memory still rises noticeably.

mem-profile-20250805-112120.html.tar.gz (210.9 KB)

Reviewed the mem-profile file: the plan and execution phases of asynchronous materialized views were holding most of the memory. After changing all real-time materialized view refreshes in our workload to minute-level intervals, memory is being reclaimed normally.
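For anyone hitting the same issue: the mitigation above corresponds to switching an async materialized view's refresh schedule to a fixed minute-level interval. A sketch of the statement (the view name `my_mv` is a placeholder; verify the ALTER MATERIALIZED VIEW syntax against your StarRocks version):

```sql
-- Switch an async materialized view to a minute-level refresh interval
-- (view name is hypothetical).
ALTER MATERIALIZED VIEW my_mv
REFRESH ASYNC EVERY (INTERVAL 1 MINUTE);
```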
