Shared-nothing (integrated storage and compute) 3.2.12-3faf7d4: BE crash

[Details] The BE crashed. The be.out log is as follows:

query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
tracker:process consumption: 44555244824
tracker:jemalloc_metadata consumption: 3254386080
tracker:jemalloc_fragmentation consumption: 2213640184
tracker:query_pool consumption: 0
tracker:query_pool/connector_scan consumption: 0
tracker:load consumption: 2960
tracker:metadata consumption: 10133231469
tracker:tablet_metadata consumption: 3133308949
tracker:rowset_metadata consumption: 1720772328
tracker:segment_metadata consumption: 560807897
tracker:column_metadata consumption: 4718342295
tracker:tablet_schema consumption: 8772325
tracker:segment_zonemap consumption: 478843199
tracker:short_key_index consumption: 27504693
tracker:column_zonemap_index consumption: 1028743879
tracker:ordinal_index consumption: 1691566816
tracker:bitmap_index consumption: 7064000
tracker:bloom_filter_index consumption: 0
tracker:compaction consumption: 17218976
tracker:schema_change consumption: 0
tracker:column_pool consumption: 0
tracker:page_cache consumption: 284776528
tracker:update consumption: 12505968202
tracker:chunk_allocator consumption: 2147834784
tracker:clone consumption: 0
tracker:consistency consumption: 0
tracker:datacache consumption: 0
tracker:replication consumption: 0
*** Aborted at 1729915586 (unix time) try "date -d @1729915586" if you are using GNU date ***
PC: @ 0x551a664 starrocks::MetadataCache::_cache_value_deleter()
*** SIGSEGV (@0x0) received by PID 3337 (TID 0x7f87b7df4700) from PID 0; stack trace: ***
@ 0x6c90c02 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f905ce6e095 os::Linux::chained_handler()
@ 0x7f905ce73091 JVM_handle_linux_signal
@ 0x7f905ce65e48 signalHandler()
@ 0x7f905c319630 (unknown)
@ 0x551a664 starrocks::MetadataCache::_cache_value_deleter()
@ 0x3032975 starrocks::LRUCache::insert()
@ 0x3032b42 starrocks::ShardedLRUCache::insert()
@ 0x551ad4e starrocks::MetadataCache::cache_rowset()
@ 0x5d133ff starrocks::Rowset::do_load()
@ 0x5d135ef starrocks::Rowset::load()
@ 0x5599114 starrocks::TabletReader::prepare()
@ 0x5d0814d starrocks::VerticalCompactionTask::_compact_column_group()
@ 0x5d08d43 starrocks::VerticalCompactionTask::_vertical_compaction_data()
@ 0x5d094b9 starrocks::VerticalCompactionTask::run_impl()
@ 0x5d02989 starrocks::CompactionTask::run()
@ 0x5605db3 _ZNSt17_Function_handlerIFvvEZN9starrocks17CompactionManager9_scheduleEvEUlvE_E9_M_invokeERKSt9_Any_data
@ 0x3020b8c starrocks::ThreadPool::dispatch_thread()
@ 0x301a10a starrocks::thread::supervise_thread()
@ 0x7f905c311ea5 start_thread
@ 0x7f905b712b0d __clone
@ 0x0 (unknown)
start time: Sat Oct 26 12:20:45 CST 2024, server uptime: 12:20:45 up 922 days, 2:35, 1 user, load average: 13.33, 13.97, 16.83
Ignored unknown config: default_rowset_type
I0000 00:00:00.000000 26297 vlog_is_on.cc:197] RAW: Set VLOG level for "*" to 10
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apps/STARROCKS/starrocks-3.2.12-customize/be/lib/jni-packages/starrocks-jdbc-bridge-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/apps/STARROCKS/starrocks-3.2.12-customize/be/lib/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
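
The tracker lines in the dump above are easier to compare once converted to a common unit. The following is a minimal sketch that parses a few of them and prints the sizes in GiB. It assumes the `consumption` values are byte counts, which the log itself does not state; `parse_trackers` and `to_gib` are hypothetical helper names, not part of StarRocks.

```python
import re

# A few lines copied verbatim from the be.out dump above.
DUMP = """\
tracker:process consumption: 44555244824
tracker:metadata consumption: 10133231469
tracker:update consumption: 12505968202
tracker:chunk_allocator consumption: 2147834784
"""

# Matches "tracker:<name> consumption: <integer>".
LINE_RE = re.compile(r"^tracker:(\S+) consumption: (\d+)$")


def parse_trackers(text):
    """Return {tracker_name: value} for every tracker line in the text."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            result[match.group(1)] = int(match.group(2))
    return result


def to_gib(n):
    """Convert a byte count to GiB (assumes the input is in bytes)."""
    return n / 2**30


trackers = parse_trackers(DUMP)
# Print the largest consumers first.
for name, size in sorted(trackers.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {to_gib(size):8.2f} GiB")
```

Pasting the full dump into `DUMP` works the same way, since names such as `query_pool/connector_scan` contain no whitespace.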

[Background]
[Business impact]
[Shared-data (storage-compute separation)?] No
[StarRocks version] 3.2.12-3faf7d4
[Cluster size] e.g., 3 FE + 5 BE

@trueeyu Could you please take a look at what caused this crash?

This is caused by the BE metadata cache. Setting metadata_cache_memory_limit_percent=0 in be.conf works around it. The root cause still needs to be investigated.
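
For reference, the workaround written out as a be.conf entry. The thread does not say whether the BE must be restarted for the change to take effect; a restart is assumed here.

```
# be.conf
# Workaround from this thread: set the BE metadata cache memory limit to 0.
# Assumption: the BE is restarted afterwards so the new value is picked up.
metadata_cache_memory_limit_percent=0
```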

Please send me the logs from that BE around the time of the crash.
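
The crash time can be recovered from the `*** Aborted at 1729915586 (unix time)` line in be.out. The sketch below converts that timestamp to local time and computes a window around it for pulling the relevant log lines. It assumes the server clock is UTC+8, as the `CST` in the `start time` line suggests; the 10-minute window and the `log_window` helper are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone

# Unix time taken from the "*** Aborted at ..." line in be.out.
ABORT_TS = 1729915586

# Assumption: "CST" in the log means China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8), "CST")


def log_window(ts, minutes=10, tz=CST):
    """Return (start, crash, end) datetimes around a unix timestamp."""
    crash = datetime.fromtimestamp(ts, tz)
    delta = timedelta(minutes=minutes)
    return crash - delta, crash, crash + delta


start, crash, end = log_window(ABORT_TS)
print("crash time :", crash.isoformat())  # 2024-10-26T12:06:26+08:00
print("log window :", start.isoformat(), "to", end.isoformat())
```

The result is consistent with the restart recorded at 12:20:45 CST in the `start time` line, roughly 14 minutes after the abort.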

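The BE logs from that time are no longer available. I will send them once the problem occurs again. So far there has been no further crash since the last one.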