3.1.9: BE crashes during Stream Load

Machine configuration: 8 cores / 32 GB RAM
When the Stream Load volume is very large, the BE crashes.

Ran ulimit -n 65536
vm.overcommit_memory=1
vm.max_map_count=262144
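
To confirm that the two kernel settings above are actually in effect on the host (for example after a reboot), a minimal sketch like the following could be used. It assumes a Linux host that exposes sysctl values under /proc/sys; the names EXPECTED and check_sysctl are illustrative and not part of any StarRocks tooling. It does not check ulimit -n, which is a per-shell limit and only applies to processes started from that shell.

```python
from pathlib import Path

# Values quoted in the post. The dict name and helper are illustrative only.
EXPECTED = {
    "vm/overcommit_memory": 1,
    "vm/max_map_count": 262144,
}


def check_sysctl(expected=EXPECTED, proc_sys="/proc/sys"):
    """Return {key: (expected, actual)} for every setting that differs.

    actual is None when the value cannot be read or parsed,
    for example on a host without /proc/sys.
    """
    mismatches = {}
    for key, want in expected.items():
        try:
            actual = int((Path(proc_sys) / key).read_text().strip())
        except (OSError, ValueError):
            actual = None
        if actual != want:
            mismatches[key] = (want, actual)
    return mismatches
```

Calling check_sysctl() with no arguments returns an empty dict when both values match, and otherwise lists each mismatching key with its expected and actual value.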

The be.out log is as follows:
3.1.9 RELEASE (build e1c6e4e)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
tracker:process consumption: 17406960688
tracker:query_pool consumption: 719582104
tracker:load consumption: 148457672
tracker:metadata consumption: 5706633955
tracker:tablet_metadata consumption: 56167089
tracker:rowset_metadata consumption: 332868484
tracker:segment_metadata consumption: 920086296
tracker:column_metadata consumption: 4397512086
tracker:tablet_schema consumption: 73905
tracker:segment_zonemap consumption: 792990488
tracker:short_key_index consumption: 10143084
tracker:column_zonemap_index consumption: 1620726662
tracker:ordinal_index consumption: 816018208
tracker:bitmap_index consumption: 0
tracker:bloom_filter_index consumption: 0
tracker:compaction consumption: 297072
tracker:schema_change consumption: 0
tracker:column_pool consumption: 411501905
tracker:page_cache consumption: 2286606000
tracker:update consumption: 4215901166
tracker:chunk_allocator consumption: 1868885184
tracker:clone consumption: 0
tracker:consistency consumption: 0
tracker:datacache consumption: 0
tracker:replication consumption: 0
*** Aborted at 1719217513 (unix time) try "date -d @1719217513" if you are using GNU date ***
PC: @ 0x7fb29f920005 __GI_raise
*** SIGABRT (@0x3f000000ab8) received by PID 2744 (TID 0x7fb212361640) from PID 2744; stack trace: ***
@ 0x64fe742 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fb2a0c841d0 (unknown)
@ 0x7fb29f920005 __GI_raise
@ 0x7fb29f8f2894 __GI_abort
@ 0x28739ec _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
@ 0x88bf226 __cxxabiv1::__terminate()
@ 0x88bf291 std::terminate()
@ 0x88bf9ff __cxa_pure_virtual
@ 0x36607b5 starrocks::pipeline::PipelineDriverPoller::run_internal()
@ 0x2cf39ea starrocks::thread::supervise_thread()
@ 0x7fb2a0c793fb start_thread
@ 0x7fb29f909e83 __GI___clone
@ 0x0 (unknown)
start time: Mon Jun 24 04:25:44 PM CST 2024
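
The tracker lines in the log above report raw byte counts, which are hard to compare by eye. As a rough reading aid, the sketch below parses lines of that shape and prints them in GiB, largest first. The helper names are hypothetical and the sample is a four-line excerpt of the log, not the whole dump. The process tracker works out to about 16.2 GiB on this 32 GB machine. The sketch deliberately does not sum the trackers, since some of them look like children of others (an assumption based on their names) and would be double counted.

```python
import re

# Matches lines such as "tracker:process consumption: 17406960688".
TRACKER_RE = re.compile(r"tracker:(\w+) consumption: (\d+)")


def parse_trackers(log_text):
    """Map tracker name -> bytes consumed, from be.out-style lines."""
    return {name: int(value) for name, value in TRACKER_RE.findall(log_text)}


def to_gib(n_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


# Excerpt copied from the log above.
sample = """\
tracker:process consumption: 17406960688
tracker:metadata consumption: 5706633955
tracker:update consumption: 4215901166
tracker:page_cache consumption: 2286606000
"""

trackers = parse_trackers(sample)
# Print the largest consumers first.
for name, used in sorted(trackers.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {to_gib(used):6.2f} GiB")
```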

The machine doesn't have swap enabled, does it?

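After making the following two adjustments, the stress test ran normally:

  1. Disabled swap on the machine and set vm.overcommit_memory = 1
  2. Because FE and BE are deployed on the same machine, adjusted the FE JVM heap and the BE mem_limit so that FE and BE memory together add up to less than the machine's total memory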
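
The second adjustment is simple arithmetic, and a small sketch can make the check explicit. Everything here is an assumption for illustration: the function names, the 2 GiB reserve for the OS, and the example FE heap ("8G") and BE mem_limit ("90%", "60%") values, none of which come from the post. It also assumes mem_limit may be written either as a percentage of machine memory or as an absolute size. The JVM heap size understates the real FE footprint, which is one reason to leave some headroom.

```python
def parse_size_gib(value, total_gib):
    """Convert '80%', '20G', or '8192M' to GiB.

    Percentages are taken relative to total_gib.
    """
    v = value.strip().upper()
    if v.endswith("%"):
        return total_gib * float(v[:-1]) / 100
    if v.endswith("G"):
        return float(v[:-1])
    if v.endswith("M"):
        return float(v[:-1]) / 1024
    raise ValueError(f"unsupported size: {value!r}")


def memory_budget(total_gib, fe_heap, be_mem_limit, os_reserve_gib=2.0):
    """Return (fe_gib, be_gib, headroom_gib); negative headroom means overcommitted."""
    fe = parse_size_gib(fe_heap, total_gib)
    be = parse_size_gib(be_mem_limit, total_gib)
    headroom = total_gib - fe - be - os_reserve_gib
    return fe, be, headroom


# Hypothetical settings on a 32 GB host.
for be_limit in ("90%", "60%"):
    fe, be, headroom = memory_budget(32, "8G", be_limit)
    status = "ok" if headroom >= 0 else "overcommitted"
    print(f"FE {fe:.1f} GiB + BE {be:.1f} GiB, headroom {headroom:.1f} GiB: {status}")
```

With these made-up numbers, a percentage-based BE limit that would be fine on a dedicated machine leaves negative headroom once an FE shares the host, which is the situation the adjustment above avoids.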