StarRocks 3.2.16: all 3 BEs crashed one after another

[Details]
StarRocks 3.2.16
StarRocks with 3 FE and 3 BE, co-located on the same machines

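For reasons I don't understand, all 3 BEs crashed this afternoon. systemctl restarted them automatically, but I would like to find out what caused the crash.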

I checked resource usage at the time: the load was somewhat higher than usual, but not by much.

To Platform & Ops [1. Subject]: 10.176.46.208-8040 abnormal [2. Alert time]: 2025.12.20 13:16:38 [3. Details]: Port 8040:0 [4. Cluster]: gl-starrocks1 [5. Alert content]: StarRocks-be,have some problem.

To Platform & Ops [1. Subject]: 10.176.46.209-8040 abnormal [2. Alert time]: 2025.12.20 13:16:23 [3. Details]: Port 8040:0 [4. Cluster]: gl-starrocks1 [5. Alert content]: StarRocks-be,have some problem.

To Platform & Ops [1. Subject]: 10.176.46.210-8040 abnormal [2. Alert time]: 2025.12.20 13:16:38 [3. Details]: Port 8040:0 [4. Cluster]: gl-starrocks1 [5. Alert content]: StarRocks-be,have some problem.

Taking 208 as an example:

Alert time: 2025.12.20 13:19:08
Recovery time: 2025.12.20 13:19:38
The other two machines show roughly the same times.

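[Business impact] Some queries failed at the time
[Shared-data or shared-nothing] Shared-nothing (storage and compute coupled)
[StarRocks version] 3.2.16
[Cluster size] 3 FE (1 follower + 2 observers) + 3 BE
[Machine info]
[Contact] StarRocks community group 20 - 生鱼片
[Attachments] All logs from the three machines around the time of the incident are attached; please help analyze them: starrocks_log.zip (14.3 MB)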

The earliest alert across the three machines was at 2025.12.20 13:16:23.

All logs are in the zip. The be.out content is as follows:
query_id:02755598-dd63-11f0-96fa-005056ab376f, fragment_instance:02755598-dd63-11f0-96fa-005056ab3773
tracker:process consumption: 6578850264
tracker:jemalloc_metadata consumption: 131776160
tracker:query_pool consumption: 144394568
tracker:query_pool/connector_scan consumption: 0
tracker:load consumption: 6180800
tracker:metadata consumption: 1030454570
tracker:tablet_metadata consumption: 696756311
tracker:rowset_metadata consumption: 322068760
tracker:segment_metadata consumption: 1700647
tracker:column_metadata consumption: 9928852
tracker:tablet_schema consumption: 4358703
tracker:segment_zonemap consumption: 1278606
tracker:short_key_index consumption: 213140
tracker:column_zonemap_index consumption: 2517364
tracker:ordinal_index consumption: 3749888
tracker:bitmap_index consumption: 0
tracker:bloom_filter_index consumption: 0
tracker:compaction consumption: 3076552
tracker:schema_change consumption: 0
tracker:column_pool consumption: 0
tracker:page_cache consumption: 1455662000
tracker:update consumption: 2030586237
tracker:chunk_allocator consumption: 180011648
tracker:passthrough consumption: 0
tracker:clone consumption: 0
tracker:consistency consumption: 0
tracker:datacache consumption: 0
tracker:replication consumption: 0
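
To make the tracker figures above easier to read, here is a minimal Python sketch that parses `tracker:<name> consumption: <n>` lines and prints them largest first. It assumes the values are bytes, which the dump itself does not state; `SAMPLE`, `parse_trackers` and `to_gib` are illustrative names, not part of StarRocks.

```python
import re

# A few tracker lines copied verbatim from the be.out excerpt above.
SAMPLE = """\
tracker:process consumption: 6578850264
tracker:query_pool consumption: 144394568
tracker:metadata consumption: 1030454570
tracker:page_cache consumption: 1455662000
tracker:update consumption: 2030586237
"""

# Matches lines of the form "tracker:<name> consumption: <integer>".
TRACKER_RE = re.compile(r"^tracker:(\S+) consumption: (\d+)$")


def parse_trackers(text: str) -> dict[str, int]:
    """Return {tracker_name: consumption}, ignoring lines that do not match."""
    result = {}
    for line in text.splitlines():
        match = TRACKER_RE.match(line.strip())
        if match:
            result[match.group(1)] = int(match.group(2))
    return result


def to_gib(value: int) -> float:
    """Convert a value to GiB, assuming (not confirmed by the log) it is in bytes."""
    return value / 2**30


trackers = parse_trackers(SAMPLE)

# Print the largest consumers first.
for name, size in sorted(trackers.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<12} {to_gib(size):6.2f} GiB")
```

Under that assumption, the process tracker comes to roughly 6.1 GiB, with `update`, `page_cache` and `metadata` as the largest of the sampled trackers.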
*** Aborted at 1766207776 (unix time) try "date -d @1766207776" if you are using GNU date ***
PC: @ 0x7fc88b56a79e __memcpy_evex_unaligned_erms
*** SIGSEGV (@0x373a81d) received by PID 1502337 (TID 0x7fc764185640) from PID 57911325; stack trace: ***
@ 0x6d79c02 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fc88c404bcb os::Linux::chained_handler()
@ 0x7fc88c409a2d JVM_handle_linux_signal
@ 0x7fc88c3fd538 signalHandler()
@ 0x7fc88b43e730 (unknown)
@ 0x7fc88b56a79e __memcpy_evex_unaligned_erms
@ 0x2dc474b std::__cxx11::basic_string<>::_M_assign()
@ 0x497b528 starrocks::LeadLagWindowFunction<>::get_values()
@ 0x37664a9 starrocks::Analytor::_streaming_process_for_half_unbounded_rows_frame()
@ 0x3769b91 starrocks::Analytor::process()
@ 0x3b1e1c9 starrocks::pipeline::AnalyticSinkOperator::push_chunk()
@ 0x3adccf6 starrocks::pipeline::PipelineDriver::process()
@ 0x3acd7f7 starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0x30496ac starrocks::ThreadPool::dispatch_thread()
@ 0x30424ca starrocks::thread::supervise_thread()
@ 0x7fc88b489d22 start_thread
@ 0x7fc88b50ed40 __clone3
@ 0x0 (unknown)
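
To line the crash up with the alert times, the epoch value in the `*** Aborted at ...` line can be converted to local time. This is a minimal Python sketch that assumes the machines run on China Standard Time (UTC+8), as the `CST` in the restart banner suggests; `ABORT_EPOCH` and `epoch_to_cst` are illustrative names.

```python
from datetime import datetime, timedelta, timezone

# Epoch seconds copied from the "*** Aborted at ..." line of be.out.
ABORT_EPOCH = 1766207776

# Assumption: "CST" on these hosts means China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8), name="CST")


def epoch_to_cst(epoch_seconds: int) -> datetime:
    """Convert Unix epoch seconds to a timezone-aware datetime in UTC+8."""
    return datetime.fromtimestamp(epoch_seconds, tz=CST)


crash_time = epoch_to_cst(ABORT_EPOCH)
print(crash_time.strftime("%Y.%m.%d %H:%M:%S %Z"))  # 2025.12.20 13:16:16 CST
```

The result, 13:16:16, falls a few seconds before the earliest alert (13:16:23) and before the restart banner (13:16:25), which is consistent with the SIGSEGV being the event that took this BE down.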
start time: Sat Dec 20 01:16:25 PM CST 2025, server uptime: 13:16:25 up 296 days, 2:33, 0 users, load average: 89.98, 29.79, 11.94
[0.001s][warning][gc] -XX:+PrintGCDetails is deprecated. Will use -Xlog:gc* instead.
[0.006s][info ][gc,heap] Heap region size: 16M
[0.013s][info ][gc ] Using G1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/starrocks/StarRocks/be/lib/jni-packages/starrocks-jdbc-bridge-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/starrocks/StarRocks/be/lib/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
[0.979s][info ][gc,start] GC(0) Pause Young (Concurrent Start) (Metadata GC Threshold)
[0.983s][info ][gc,task ] GC(0) Using 23 workers of 23 for evacuation
[1.018s][info ][gc,phases] GC(0) Pre Evacuate Collection Set: 0.0ms
[1.018s][info ][gc,phases] GC(0) Evacuate Collection Set: 21.2ms
[1.018s][info ][gc,phases] GC(0) Post Evacuate Collection Set: 13.5ms
[1.018s][info ][gc,phases] GC(0) Other: 4.8ms
[1.018s][info ][gc,heap ] GC(0) Eden regions: 6->0(5)
[1.018s][info ][gc,heap ] GC(0) Survivor regions: 0->1(1)
[1.018s][info ][gc,heap ] GC(0) Old regions: 0->1
[1.018s][info ][gc,heap ] GC(0) Humongous regions: 0->0
[1.018s][info ][gc,metaspace] GC(0) Metaspace: 20974K->20974K(22528K)
[1.018s][info ][gc ] GC(0) Pause Young (Concurrent Start) (Metadata GC Threshold) 95M->17M(2016M) 39.372ms
[1.019s][info ][gc,cpu ] GC(0) User=0.42s Sys=0.07s Real=0.04s
[1.019s][info ][gc ] GC(1) Concurrent Cycle
[1.019s][info ][gc,marking ] GC(1) Concurrent Clear Claimed Marks
[1.019s][info ][gc,marking ] GC(1) Concurrent Clear Claimed Marks 0.013ms
[1.019s][info ][gc,marking ] GC(1) Concurrent Scan Root Regions
[1.021s][info ][gc,marking ] GC(1) Concurrent Scan Root Regions 2.834ms
[1.021s][info ][gc,marking ] GC(1) Concurrent Mark (1.021s)
[1.021s][info ][gc,marking ] GC(1) Concurrent Mark From Roots
[1.023s][info ][gc,task ] GC(1) Using 6 workers of 6 for marking
[1.037s][info ][gc,marking ] GC(1) Concurrent Mark From Roots 15.604ms
[1.037s][info ][gc,marking ] GC(1) Concurrent Preclean
[1.037s][info ][gc,marking ] GC(1) Concurrent Preclean 0.131ms
[1.037s][info ][gc,marking ] GC(1) Concurrent Mark (1.021s, 1.037s) 15.763ms
[1.038s][info ][gc,start ] GC(1) Pause Remark
[1.040s][info ][gc,stringtable] GC(1) Cleaned string and symbol table, strings: 8590 processed, 0 removed, symbols: 71345 processed, 49 removed
[1.040s][info ][gc ] GC(1) Pause Remark 20M->20M(2016M) 2.240ms
[1.040s][info ][gc,cpu ] GC(1) User=0.03s Sys=0.00s Real=0.00s
[1.040s][info ][gc,marking ] GC(1) Concurrent Rebuild Remembered Sets
[1.041s][info ][gc,marking ] GC(1) Concurrent Rebuild Remembered Sets 1.428ms
[1.042s][info ][gc,start ] GC(1) Pause Cleanup
[1.042s][info ][gc ] GC(1) Pause Cleanup 20M->20M(2016M) 0.139ms
[1.042s][info ][gc,cpu ] GC(1) User=0.00s Sys=0.00s Real=0.00s
[1.042s][info ][gc,marking ] GC(1) Concurrent Cleanup for Next Mark
[1.046s][info ][gc,marking ] GC(1) Concurrent Cleanup for Next Mark 4.101ms
[1.046s][info ][gc ] GC(1) Concurrent Cycle 27.414ms
[1.742s][info ][gc,start ] GC(2) Pause Young (Normal) (G1 Evacuation Pause)
[1.742s][info ][gc,task ] GC(2) Using 23 workers of 23 for evacuation
[1.750s][info ][gc,phases ] GC(2) Pre Evacuate Collection Set: 0.0ms
[1.750s][info ][gc,phases ] GC(2) Evacuate Collection Set: 6.4ms
[1.750s][info ][gc,phases ] GC(2) Post Evacuate Collection Set: 1.8ms
[1.750s][info ][gc,phases ] GC(2) Other: 0.3ms
[1.750s][info ][gc,heap ] GC(2) Eden regions: 5->0(5)
[1.750s][info ][gc,heap ] GC(2) Survivor regions: 1->1(1)
[1.750s][info ][gc,heap ] GC(2) Old regions: 1->2
[1.750s][info ][gc,heap ] GC(2) Humongous regions: 0->0
[1.750s][info ][gc,metaspace ] GC(2) Metaspace: 31915K->31915K(32768K)
[1.750s][info ][gc ] GC(2) Pause Young (Normal) (G1 Evacuation Pause) 97M->22M(2016M) 8.401ms
[1.750s][info ][gc,cpu ] GC(2) User=0.14s Sys=0.01s Real=0.01s
[1.994s][info ][gc,start ] GC(3) Pause Young (Concurrent Start) (Metadata GC Threshold)
[1.994s][info ][gc,task ] GC(3) Using 23 workers of 23 for evacuation
[2.005s][info ][gc,phases ] GC(3) Pre Evacuate Collection Set: 0.0ms
[2.005s][info ][gc,phases ] GC(3) Evacuate Collection Set: 11.0ms
[2.005s][info ][gc,phases ] GC(3) Post Evacuate Collection Set: 0.4ms
[2.006s][info ][gc,phases ] GC(3) Other: 0.4ms
[2.006s][info ][gc,heap ] GC(3) Eden regions: 2->0(5)
[2.006s][info ][gc,heap ] GC(3) Survivor regions: 1->1(1)
[2.006s][info ][gc,heap ] GC(3) Old regions: 2->2
[2.006s][info ][gc,heap ] GC(3) Humongous regions: 0->0
[2.006s][info ][gc,metaspace ] GC(3) Metaspace: 35435K->35435K(36864K)
[2.006s][info ][gc ] GC(3) Pause Young (Concurrent Start) (Metadata GC Threshold) 38M->23M(2016M) 11.710ms
[2.006s][info ][gc,cpu ] GC(3) User=0.22s Sys=0.00s Real=0.01s
[2.006s][info ][gc ] GC(4) Concurrent Cycle
[2.006s][info ][gc,marking ] GC(4) Concurrent Clear Claimed Marks
[2.006s][info ][gc,marking ] GC(4) Concurrent Clear Claimed Marks 0.018ms
[2.006s][info ][gc,marking ] GC(4) Concurrent Scan Root Regions
[2.008s][info ][gc,marking ] GC(4) Concurrent Scan Root Regions 2.508ms
[2.008s][info ][gc,marking ] GC(4) Concurrent Mark (2.008s)
[2.008s][info ][gc,marking ] GC(4) Concurrent Mark From Roots
[2.008s][info ][gc,task ] GC(4) Using 6 workers of 6 for marking
[2.014s][info ][gc,marking ] GC(4) Concurrent Mark From Roots 5.932ms
[2.014s][info ][gc,marking ] GC(4) Concurrent Preclean
[2.014s][info ][gc,marking ] GC(4) Concurrent Preclean 0.177ms
[2.014s][info ][gc,marking ] GC(4) Concurrent Mark (2.008s, 2.014s) 6.140ms
[2.015s][info ][gc,start ] GC(4) Pause Remark
[2.027s][info ][gc,stringtable] GC(4) Cleaned string and symbol table, strings: 16916 processed, 249 removed, symbols: 112528 processed, 29 removed
[2.027s][info ][gc ] GC(4) Pause Remark 27M->27M(2016M) 12.268ms
[2.027s][info ][gc,cpu ] GC(4) User=0.23s Sys=0.01s Real=0.01s
[2.027s][info ][gc,marking ] GC(4) Concurrent Rebuild Remembered Sets
[2.032s][info ][gc,marking ] GC(4) Concurrent Rebuild Remembered Sets 4.594ms
[2.032s][info ][gc,start ] GC(4) Pause Cleanup
[2.032s][info ][gc ] GC(4) Pause Cleanup 28M->28M(2016M) 0.130ms
[2.032s][info ][gc,cpu ] GC(4) User=0.00s Sys=0.00s Real=0.00s
[2.032s][info ][gc,marking ] GC(4) Concurrent Cleanup for Next Mark
[2.036s][info ][gc,marking ] GC(4) Concurrent Cleanup for Next Mark 3.315ms
[2.036s][info ][gc ] GC(4) Concurrent Cycle 29.990ms