StarRocks BE crashed again

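【Details】A BE node in the test environment crashed again.
【Background】After upgrading to the new version 2.2.13, the cluster ran stably for 2 days; on day 3 a BE crashed.
【Business impact】Development and testing are affected. The crashes are frequent and still occur after upgrading from 2.2.5 to 2.2.13.
【StarRocks version】2.2.13
【Cluster size】3 FE (3 followers) + 3 BE (FE and BE co-located on the same machines)
【Machine info】Virtual machines, 4C/16G
【Contact】StarRocks community WeChat group 6, user 土豆; just @ me
【Attachments】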

  • be crash
    • be.out

start time: Mon Apr 24 19:28:08 CST 2023
tcmalloc: large alloc 1195728896 bytes == 0x14212e000 @ 0x5594edf 0x582675c 0x1f5e8b8 0x57769d5
tcmalloc: large alloc 1347379200 bytes == 0x1d9590000 @ 0x5594edf 0x582675c 0x1f5e8b8 0x57769d5
terminate called after throwing an instance of 'terminate called recursively
query_id:00000000-0000-0000-0000-000000000000
*** Aborted at 1682539052 (unix time) try "date -d @1682539052" if you are using GNU date ***
PC: @ 0x7f336a75e387 __GI_raise
*** SIGABRT (@0x3e80002108b) received by PID 135307 (TID 0x7f32ae269700) from PID 135307; stack trace: ***
std::bad_alloc'
what(): std::bad_alloc
@ 0x3db8592 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f336b213630 (unknown)
@ 0x7f336a75e387 __GI_raise
@ 0x7f336a75fa78 __GI_abort
@ 0x57779d2 __gnu_cxx::__verbose_terminate_handler()
@ 0x5776486 __cxxabiv1::__terminate()
@ 0x57764f1 std::terminate()
@ 0x5776644 __cxa_throw
@ 0x183cd14 _Znwm.cold
@ 0x57edeaa std::__cxx11::basic_string<>::_M_mutate()
@ 0x57ee8d0 std::__cxx11::basic_string<>::_M_replace_aux()
@ 0x1ea7b0d apache::thrift::protocol::TBinaryProtocolT<>::readStringBody<>()
@ 0x1ea7cbc apache::thrift::protocol::TVirtualProtocol<>::readMessageBegin_virt()
@ 0x206e749 apache::thrift::TDispatchProcessor::process()
@ 0x3da0d88 apache::thrift::server::TConnectedClient::run()
@ 0x3d99284 apache::thrift::server::TThreadedServer::TConnectedClientRunner::run()
@ 0x3d9ba8d apache::thrift::concurrency::thread::threadMain()
@ 0x3d81096 std::thread::_State_impl<>::_M_run()
@ 0x57f0810 execute_native_thread_routine
@ 0x7f336b20bea5 start_thread
@ 0x7f336a826b0d __clone
@ 0x0 (unknown)
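To relate the numbers in be.out to the report, here is a minimal Python sketch that converts the abort timestamp into local time, computes how long the BE had been running, and expresses the two tcmalloc "large alloc" sizes in GiB. It assumes that "CST" in the log means China Standard Time (UTC+8); the helper names `crash_time` and `gib` are made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "CST" in be.out is China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))


def crash_time(epoch_seconds: int) -> datetime:
    """Convert the 'Aborted at <unix time>' value to a CST datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=CST)


def gib(num_bytes: int) -> float:
    """Express a byte count in GiB, rounded to two decimals."""
    return round(num_bytes / 2**30, 2)


# Values copied from the be.out excerpt above.
started = datetime(2023, 4, 24, 19, 28, 8, tzinfo=CST)  # "start time" line
crashed = crash_time(1682539052)                        # "Aborted at" line
uptime = crashed - started

print("crashed at:", crashed.isoformat())
print("uptime:", uptime)
for size in (1195728896, 1347379200):                   # "large alloc" lines
    print(size, "bytes =", gib(size), "GiB")
```

Under that time zone assumption the BE ran for roughly 2 days and 8.5 hours before aborting, which matches the "crashed on day 3" description, and each large allocation request exceeds 1 GiB on a 16G machine.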

Monitoring before the crash:

Is it possible to find the cause of this problem?

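1. Run cat /proc/sys/vm/overcommit_memory and check whether it is set to 1.
2. Since the BE is co-deployed with other services, set mem_limit in be.conf to: total memory - memory used by other services - 1g.
3. When the BE has little memory (<=16g), also set the following in be.conf: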
disable_column_pool=true
chunk_reserved_bytes_limit=100000000
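The three checks above can be sketched as a small Python helper. This is only an illustration: the function names are hypothetical, the 4 GB figure for the co-located FE is an assumed example value (it does not come from this thread), and the result is a plain number of gigabytes rather than a ready-made be.conf line.

```python
from pathlib import Path
from typing import Dict, Optional


def read_overcommit_memory(
    path: str = "/proc/sys/vm/overcommit_memory",
) -> Optional[int]:
    """Return the current vm.overcommit_memory value, or None if unreadable."""
    p = Path(path)
    if not p.exists():
        return None
    return int(p.read_text().strip())


def recommended_mem_limit_gb(
    total_gb: float, other_services_gb: float, reserve_gb: float = 1.0
) -> float:
    """Apply the rule: mem_limit = total memory - other services - 1g."""
    limit = total_gb - other_services_gb - reserve_gb
    if limit <= 0:
        raise ValueError("no memory left for the BE after reservations")
    return limit


def low_memory_overrides(total_gb: float) -> Dict[str, str]:
    """be.conf settings suggested in the thread for machines with <= 16g."""
    if total_gb <= 16:
        return {
            "disable_column_pool": "true",
            "chunk_reserved_bytes_limit": "100000000",
        }
    return {}


if __name__ == "__main__":
    print("vm.overcommit_memory =", read_overcommit_memory())
    # Hypothetical: 16 GB VM where the co-located FE is given 4 GB.
    print("mem_limit (GB) =", recommended_mem_limit_gb(16, 4))
    for key, value in low_memory_overrides(16).items():
        print(f"{key}={value}")
```

With the assumed 4 GB for the FE, a 16 GB machine leaves 11 GB for mem_limit; substitute the real memory footprint of whatever else runs on the node.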

1. That option is currently 0.
2. The BE mem_limit setting meets the requirement.
3. Both parameters are already configured:
disable_column_pool=true
chunk_reserved_bytes_limit=100000000

Please change this setting (overcommit_memory) to 1.

Changed. I will keep observing for a few more days.

Has the crash happened again since this change?

The cause has been identified and a fix is in progress.