BE 2.3.8-4739a1e process crashes unexpectedly

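【Description】While the SR cluster was running, one BE node exited abnormally
【Background】No operations or maintenance actions were performed
【Business impact】6-node cluster; one BE exited abnormally
【StarRocks version】2.3.8-4739a1e
【Cluster size】6 FE (1 follower + 2 observer) + 6 BE (FE and BE co-located)
【Machine info】32C/64G/Gigabit network
【Contact】lirulei90@126.com
【Attachments】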

  • be.out error output
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
*** Aborted at 1679398462 (unix time) try "date -d @1679398462" if you are using GNU date ***
PC: @     0x7f7cf4e68f66 jni_CallVoidMethodV
*** SIGSEGV (@0x0) received by PID 9300 (TID 0x7f7cbb791700) from PID 0; stack trace: ***
    @          0x40e8c82 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f7cf50aae9b os::Linux::chained_handler()
    @     0x7f7cf50af90c JVM_handle_linux_signal
    @     0x7f7cf50a2858 signalHandler()
    @     0x7f7cf458a630 (unknown)
    @     0x7f7cf4e68f66 jni_CallVoidMethodV
    @          0x33931e9 JNIEnv_::CallVoidMethod()
    @          0x35ca2a8 starrocks::vectorized::JDBCScanner::close()
    @          0x357825e starrocks::connector::JDBCDataSource::close()
    @          0x293ba68 starrocks::pipeline::ConnectorChunkSource::close()
    @          0x2933471 starrocks::pipeline::ScanOperator::~ScanOperator()
    @          0x2968b5a starrocks::pipeline::PipelineDriver::~PipelineDriver()
    @          0x1a166da std::_Sp_counted_base<>::_M_release()
    @          0x35dad8a std::_Sp_counted_ptr_inplace<>::_M_dispose()
    @          0x1a166da std::_Sp_counted_base<>::_M_release()
    @          0x2975809 starrocks::pipeline::QueryContext::~QueryContext()
    @          0x1a166da std::_Sp_counted_base<>::_M_release()
    @          0x205c720 starrocks::PriorityThreadPool::work_thread()
    @          0x4083e47 thread_proxy
    @     0x7f7cf4582ea5 start_thread
    @     0x7f7cf3b9d8dd __clone
    @                0x0 (unknown)
start time: Tue Mar 21 19:37:30 CST 2023

For some time before the crash, the whole SR cluster was running very slowly. The be.warn log from the moment the slowdown began is as follows:

W0321 16:23:23.682701  2197 fragment_mgr.cpp:180] Fail to open fragment 828892d2-c7c0-11ed-8dd0-00163e24fbfc: Cancelled: Cancelled SenderQueue::get_chunk
/root/starrocks/be/src/exec/exchange_node.cpp:132 _stream_recvr->get_chunk(&_input_chunk)
/root/starrocks/be/src/exec/vectorized/hash_join_node.cpp:574 child(0)->get_next(state, &_cur_left_input_chunk, &_probe_eos)
/root/starrocks/be/src/exec/vectorized/hash_join_node.cpp:335 _probe(state, probe_timer, chunk, tmp_eos)
/root/starrocks/be/src/exec/vectorized/project_node.cpp:121 _children[0]->get_next(state, chunk, eos)
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:313 _plan->get_next(_runtime_state, &_chunk, &_done)
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:205 _get_next_internal_vectorized(&chunk)
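
To line up the three timestamps in the logs above (the abort time in be.out, the "start time" line that follows it, and the first be.warn entry), a minimal Python sketch follows. It assumes that "CST" here means China Standard Time (UTC+8) and that the `W0321` entry belongs to 2023, since glog prefixes omit the year. The helper name `glog_abort_time` is hypothetical and exists only for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "CST" in be.out is China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8), "CST")


def glog_abort_time(epoch_seconds: int) -> datetime:
    """Convert the unix time from the '*** Aborted at ...' line to CST."""
    return datetime.fromtimestamp(epoch_seconds, tz=CST)


# Taken from: *** Aborted at 1679398462 (unix time) ...
crash = glog_abort_time(1679398462)

# Taken from: start time: Tue Mar 21 19:37:30 CST 2023
restart = datetime(2023, 3, 21, 19, 37, 30, tzinfo=CST)

# Taken from: W0321 16:23:23.682701 (year 2023 is an assumption)
slowdown = datetime(2023, 3, 21, 16, 23, 23, tzinfo=CST)

print("crash:            ", crash.isoformat())
print("restart after:    ", restart - crash)
print("slow before crash:", crash - slowdown)
```

Under these assumptions the crash falls at 19:34:22 CST, about three minutes before the logged start time and roughly three hours after the first warning, which is the window for which monitoring data would be most useful.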

Hello, could you share the CPU, memory, and I/O monitoring for the period leading up to the crash?