BE node crash

[Details] Version 2.3.7. Our production service crashed; please help take a look.
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
*** Aborted at 1680319408 (unix time) try "date -d @1680319408" if you are using GNU date ***
PC: @ 0x7f996046d387 __GI_raise
*** SIGABRT (@0xe42) received by PID 3650 (TID 0x7f9849dd5700) from PID 3650; stack trace: ***
@ 0x40e1c82 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f9960f22630 (unknown)
@ 0x7f996046d387 __GI_raise
@ 0x7f996046ea78 __GI_abort
@ 0x1913cbd _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
@ 0x5ae5886 __cxxabiv1::__terminate()
@ 0x5b89959 __cxa_call_terminate
@ 0x5ae52a1 __gxx_personality_v0
@ 0x5b905ae _Unwind_RaiseException_Phase2
@ 0x5b910a6 _Unwind_Resume
@ 0x182789c _ZN4brpc6policy17ProcessRpcRequestEPNS_16InputMessageBaseE.cold
@ 0x420a487 brpc::ProcessInputMessage()
@ 0x420b333 brpc::InputMessenger::OnNewMessages()
@ 0x42b1ffe brpc::Socket::ProcessEvent()
@ 0x41bff8f bthread::TaskGroup::task_runner()
@ 0x4348771 bthread_make_fcontext
start time: Sat Apr 1 11:23:35 CST 2023
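As a quick cross-check of the log above, the sketch below converts the abort timestamp to local time and compares it with the "start time" line. It assumes that "CST" in that line means China Standard Time (UTC+8); the constant and function names are illustrative only and do not come from StarRocks.

```python
from datetime import datetime, timedelta, timezone

# Values copied verbatim from the crash log.
ABORT_UNIX_TIME = 1680319408   # from the "*** Aborted at ..." line
SIGABRT_SENDER_HEX = "0xe42"   # from the "*** SIGABRT (@0xe42) ..." line

# Assumption: "CST" in the "start time" line is China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8), name="CST")


def abort_time_cst(unix_time: int) -> datetime:
    """Convert a unix timestamp to a timezone-aware datetime in UTC+8."""
    return datetime.fromtimestamp(unix_time, tz=CST)


crash = abort_time_cst(ABORT_UNIX_TIME)
# Restart time taken from the "start time: Sat Apr 1 11:23:35 CST 2023" line.
restart = datetime(2023, 4, 1, 11, 23, 35, tzinfo=CST)

print(crash.isoformat())                    # crash time in UTC+8
print((restart - crash).total_seconds())    # seconds between abort and restart
print(int(SIGABRT_SENDER_HEX, 16))          # address in the SIGABRT line, as decimal
```

Under that assumption the abort falls a few seconds before the logged start time, which fits a crash followed by an automatic restart. The hexadecimal value in the SIGABRT line also decodes to the reported PID, as expected when a process aborts itself.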

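Hello. Judging from this log, the BE ran out of memory (std::bad_alloc), which caused the crash and restart. Could you share monitoring screenshots of the relevant memory metrics so we can confirm whether the memory limit was reached?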


Another dozen or so nodes just restarted.

Each server has 128 GB of memory, but resource groups are enabled.

Please send the resource isolation configuration.

Also the be.info log for this time period.

1.log (13.7 MB) 20230403-153925.xlsx (7.8 KB)

See "常见 Crash / BUG / 优化 查询" (Common Crash / BUG / Optimization Lookup). This issue has already been fixed in the latest 2.3 release.