BE node crashes unexpectedly

【Details】Detailed description of the problem
The BE node crashes unexpectedly.

【Background】What operations were performed?
Syncing data from Kafka.

【Business impact】
【StarRocks version】e.g. 1.18.2
2.3.3

【Cluster size】e.g. 3 FE (1 follower + 2 observers) + 5 BE (FE and BE co-located)
3 FE + 4 BE

【Machine info】vCPUs/memory/NIC, e.g. 48C/64G/10 Gigabit
Alibaba Cloud ECS, 8 cores/64G/Gigabit

【Attachments】

  • fe.warn.log/be.warn.log/relevant screenshots
    be.out:
    start time: Sun Oct 9 10:07:57 CST 2022
    start time: Tue Oct 11 16:57:01 CST 2022
    terminate called after throwing an instance of 'terminate called recursively
    query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
    *** Aborted at 1665572077 (unix time) try "date -d @1665572077" if you are using GNU date ***
    std::bad_alloc'
    what(): std::bad_alloc
    PC: @ 0x7f0499722387 __GI_raise
    *** SIGABRT (@0x2b9b) received by PID 11163 (TID 0x7f0396fa8700) from PID 11163; stack trace: ***
    @ 0x3ff7972 google::(anonymous namespace)::FailureSignalHandler()
    @ 0x7f049a1d7630 (unknown)
    @ 0x7f0499722387 __GI_raise
    @ 0x7f0499723a78 __GI_abort
    @ 0x59fc6b2 __gnu_cxx::__verbose_terminate_handler()
    @ 0x59fb166 __cxxabiv1::__terminate()
    @ 0x59fb1d1 std::terminate()
    @ 0x59fb324 __cxa_throw
    @ 0x18c8d44 _Znwm.cold
    @ 0x1a500f1 std::vector<>::reserve()
    @ 0x1a5019f starrocks::vectorized::BinaryColumnBase<>::_build_slices()
    @ 0x19d47f5 starrocks::vectorized::BinaryColumnBase<>::raw_data()
    @ 0x1c625a9 starrocks::ShardByLengthSliceHashIndex::try_replace()
    @ 0x1c52d49 starrocks::PrimaryIndex::try_replace()
    @ 0x1adf68e starrocks::TabletUpdates::_apply_compaction_commit()
    @ 0x1ae0fdd starrocks::TabletUpdates::do_apply()
    @ 0x2178d8d starrocks::ThreadPool::dispatch_thread()
    @ 0x217459a starrocks::thread::supervise_thread()
    @ 0x7f049a1cfea5 start_thread
    @ 0x7f04997eab0d __clone
    @ 0x0 (unknown)
    start time: Wed Oct 12 18:57:31 CST 2022

  • Slow queries:

    • Profile information
    • Parallelism: show variables like '%parallel_fragment_exec_instance_num%';
    • Whether CBO is enabled: show variables like '%cbo%';
    • Screenshots of BE node CPU and memory usage

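This is caused by loads into the Primary Key table consuming a large amount of memory. Is the Primary Key table currently partitioned? When memory usage is high, run curl -s http://BE_IP:8040/metrics | grep -E "^starrocks_be_.*_mem_bytes|^starrocks_be_tcmalloc_bytes_in_use" to check how memory is distributed inside the BE.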
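
If the raw grep output is hard to read, the same check can be sketched in Python. This is only a rough helper, not an official StarRocks tool: it assumes the metrics endpoint returns plain `name value` lines in Prometheus text format, and the function names `fetch_metrics`, `parse_mem_metrics`, and `print_report` are made up for illustration. It keeps the metrics matched by the pattern in the command above and sorts them by size so the largest consumer shows up first.

```python
import re
import urllib.request

# Same pattern as the grep command: per-module memory trackers plus tcmalloc usage.
PATTERN = re.compile(r"^starrocks_be_.*_mem_bytes|^starrocks_be_tcmalloc_bytes_in_use")


def fetch_metrics(be_host, port=8040):
    """Download the metrics page from one BE (port 8040 as in the curl command)."""
    with urllib.request.urlopen(f"http://{be_host}:{port}/metrics", timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_mem_metrics(text):
    """Return (metric_name, bytes) pairs for memory metrics, largest first.

    Assumes each metric line looks like `name value`; comment lines and
    lines that do not parse are skipped.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or not PATTERN.match(line):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            rows.append((parts[0], float(parts[1])))
        except ValueError:
            continue
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def print_report(rows):
    """Print one line per metric with the value converted to GiB."""
    for name, value in rows:
        print(f"{name:<50} {value / 1024 ** 3:8.2f} GiB")
```

For example, `print_report(parse_mem_metrics(fetch_metrics("BE_IP")))` with `BE_IP` replaced by a real BE address, run while memory usage is high.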

An optimization for this is already available. Would you be willing to test it with us?