BE crashes frequently on version 3.0.5

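[Description] BE nodes crash frequently
[Background]
[Business impact]
[Shared-data (storage-compute separation)?]
[StarRocks version] 3.0.5
[Cluster size] 3 FE + 4 BE (FE and BE co-located)
[Machine info] 72 cores / 370 GB RAM / 10 GbE
[Contact]
[Attachments]
BE.OUT: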

query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
tracker:process consumption: 91258836576
tracker:query_pool consumption: 28090164
tracker:process consumption: 91258836576
tracker:load consumption: 0
tracker:metadata consumption: 2845724823
tracker:query_pool consumption: 28090164
tracker:tablet_metadata consumption: 483315192
tracker:load consumption: 0
tracker:rowset_metadata consumption: 359997559
tracker:metadata consumption: 2845724823
tracker:segment_metadata consumption: 415178664
tracker:tablet_metadata consumption: 483315192
tracker:column_metadata consumption: 1587233408
tracker:rowset_metadata consumption: 359997559
tracker:tablet_schema consumption: 289128
tracker:segment_metadata consumption: 415178664
tracker:segment_zonemap consumption: 314760935
tracker:column_metadata consumption: 1587233408
tracker:short_key_index consumption: 70573056
tracker:tablet_schema consumption: 289128
tracker:column_zonemap_index consumption: 700714048
tracker:segment_zonemap consumption: 314760935
tracker:ordinal_index consumption: 402856480
tracker:short_key_index consumption: 70573056
tracker:bitmap_index consumption: 0
tracker:column_zonemap_index consumption: 700714048
tracker:bloom_filter_index consumption: 0
tracker:ordinal_index consumption: 402856480
tracker:compaction consumption: 10928536
tracker:bitmap_index consumption: 0
tracker:schema_change consumption: 0
tracker:bloom_filter_index consumption: 0
tracker:column_pool consumption: 7078212211
tracker:compaction consumption: 10928536
tracker:page_cache consumption: 76526671264
tracker:schema_change consumption: 0
tracker:update consumption: 1549834
tracker:column_pool consumption: 7078212211
tracker:chunk_allocator consumption: 2136124208
tracker:page_cache consumption: 76526671264
tracker:clone consumption: 0
tracker:update consumption: 1549834
tracker:consistency consumption: 0
tracker:chunk_allocator consumption: 2136124208
*** Check failure stack trace: ***
tracker:clone consumption: 0
tracker:consistency consumption: 0
*** Aborted at 1700375600 (unix time) try "date -d @1700375600" if you are using GNU date ***
PC: @ 0x2b9e602f21f7 __GI_raise
*** SIGABRT (@0x7d00001111f) received by PID 69919 (TID 0x2ba20fb33700) from PID 69919; stack trace: ***
@ 0x63c5622 google::(anonymous namespace)::FailureSignalHandler()
@ 0x2b9e5f9a25e0 (unknown)
@ 0x2b9e602f21f7 __GI_raise
@ 0x2b9e602f38e8 __GI_abort
@ 0x30d708e starrocks::failure_function()
@ 0x63b8ffd google::LogMessage::Fail()
@ 0x63bb46f google::LogMessage::SendToLog()
@ 0x63b8b4e google::LogMessage::Flush()
@ 0x63bba79 google::LogMessageFatal::~LogMessageFatal()
@ 0x486a3c5 starrocks::TabletUpdates::_check_for_apply()
@ 0x487dfa6 starrocks::TabletUpdates::rowset_commit()
@ 0x4812f8e starrocks::Tablet::rowset_commit()
@ 0x489f653 starrocks::TxnManager::publish_txn()
@ 0x3096026 _ZZN9starrocks24run_publish_version_taskEPNS_15ThreadPoolTokenERKNS_22TPublishVersionRequestERNS_18TFinishTaskRequestERSt13unordered_setIPNS_7DataDirESt4hashIS9_ESt8equal_toIS9_ESaIS9_EEjENKUlvE_clEv
@ 0x51a4692 starrocks::ThreadPool::dispatch_thread()
@ 0x519f18a starrocks::thread::supervise_thread()
@ 0x2b9e5f99ae25 start_thread
@ 0x2b9e603b534d __clone
@ 0x0 (unknown)
start time: Sun Nov 19 23:52:38 CST 2023

Please follow "[Troubleshooting] BE Crash" to collect the fatal log.

Fatal log:
F1122 01:12:48.527740 150889 tablet_updates.cpp:788] submit apply task failed: Runtime error: Could not create thread: Resource temporarily unavailable tablet:432759625 #version:88 [30 98@86 98.1] pending: rowsets:1
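
The fatal line above reports that thread creation failed with "Resource temporarily unavailable" (EAGAIN), which usually points to a thread or process limit rather than to memory. As a rough sketch, the thread count of a process can be read from its `/proc/<pid>/status` file and set against the kernel-wide ceilings. The helper name `thread_count` is made up for illustration, and the example reads the current shell rather than a real BE process:

```shell
#!/usr/bin/env bash
# Print the thread count recorded in a /proc/<pid>/status style file.
# (thread_count is a hypothetical helper, not a StarRocks tool.)
thread_count() {
    awk '/^Threads:/ {print $2}' "$1"
}

# Example against the current shell; for a BE node, pass /proc/$be_pid/status instead.
if [ -r "/proc/$$/status" ]; then
    echo "threads in this shell: $(thread_count "/proc/$$/status")"
fi

# Kernel-wide ceilings that can also make thread creation fail.
for f in /proc/sys/kernel/threads-max /proc/sys/kernel/pid_max; do
    if [ -r "$f" ]; then
        echo "$f: $(cat "$f")"
    fi
done
```

Note that the per-user process limit counts threads across all processes of the same user, so on a host where FE and BE run under one account the BE thread count alone is only a lower bound.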

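Please check against this document whether the basic environment settings such as ulimit are all configured correctly: https://docs.starrocks.io/zh/docs/deployment/environment_configurations/

I checked the basic environment and everything is set as required, but it crashed again just now.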

cat /proc/$be_pid/limits

(screenshot of the limits output)

The running BE process is still using the default value for the process limit (Max processes).
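
Since the limits output in this thread survives only as a screenshot, here is a minimal sketch of pulling the relevant row out of a `/proc/<pid>/limits` file. The helper name `nproc_limits` is an assumption for illustration, and the example reads the current shell's limits rather than the BE's:

```shell
#!/usr/bin/env bash
# Print the soft and hard "Max processes" limits from a /proc/<pid>/limits style file.
# (nproc_limits is a hypothetical helper name.)
nproc_limits() {
    # Row layout: "Max processes  <soft>  <hard>  processes"
    awk '/^Max processes/ {print "soft=" $3, "hard=" $4}' "$1"
}

# Example against the current shell; for a BE node, pass /proc/$be_pid/limits instead.
if [ -r "/proc/$$/limits" ]; then
    nproc_limits "/proc/$$/limits"
fi
```

What matters here is the value the running process actually has, which can differ from what `ulimit -u` prints in a fresh login shell if the process was started from a different session or before the configuration was changed.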

sudo prlimit --pid=$be_pid --nproc=819200:819200

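OK, done. I'll keep an eye on it for a while.

Have there been any more crashes recently?

No crashes recently. It looks like the process limit being too low was the cause. Thanks a lot!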
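
One caveat for anyone landing here later: `prlimit --pid` changes the limit only for the process that is already running, so a restarted BE gets whatever limit its launching session has. A sketch for finding which PAM limits entries set `nproc` follows. The helper name `list_nproc_entries` is made up, the default paths are the usual PAM locations and may differ on your system, and on some distributions a drop-in file under `limits.d` can override the value in `limits.conf`:

```shell
#!/usr/bin/env bash
# List every active (non-comment) nproc entry in a limits.conf file and its drop-in directory.
# Arguments (both optional): main config path, drop-in directory.
# (list_nproc_entries is a hypothetical helper name.)
list_nproc_entries() {
    local main_conf="${1:-/etc/security/limits.conf}"
    local dropin_dir="${2:-/etc/security/limits.d}"
    local f
    for f in "$main_conf" "$dropin_dir"/*.conf; do
        [ -r "$f" ] || continue
        # Entry layout: <domain> <type> <item> <value>; print "file: line" when item is nproc.
        awk -v f="$f" '!/^[[:space:]]*#/ && $3 == "nproc" {print f ": " $0}' "$f"
    done
}

list_nproc_entries
```

If a drop-in file shows a lower soft value than the one you put in `limits.conf`, that is a likely reason the BE still started with the default.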