BE nodes crash intermittently

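【Description】BE nodes crash intermittently during normal use.
【Background】Neither the crash time nor the affected node is consistent, so it is hard to tell whether the crashes are triggered by queries or by data loading.
【Business impact】When a node crashes, in-progress data loads fail with: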
Message":"Commit failed. txn: 238645568 table: t_order tablet: 105784180 quorum: 1<2 errorReplicas: 120553906
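【StarRocks version】2.2.10
【Cluster size】4 FE (3 follower + 1 observer) + 5 BE (FE and BE co-located)
【Machine info】CPU virtual cores / memory / NIC: 56C / 384G / 10 GbE
【Contact】Community group 7 - 哦豁
【Attachments】
Below is the be.out log from one of the BE nodes: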

query_id:00000000-0000-0000-0000-000000000000
*** Aborted at 1691143626 (unix time) try "date -d @1691143626" if you are using GNU date ***
PC: @          0x19e2c0c starrocks::TabletUpdates::_debug_version_info()
*** SIGSEGV (@0x0) received by PID 286842 (TID 0x7f876f269700) from PID 0; stack trace: ***
    @          0x3d0e5d2 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f883c210cf0 (unknown)
    @          0x19e2c0c starrocks::TabletUpdates::_debug_version_info()
    @          0x19ecc23 starrocks::TabletUpdates::remove_expired_versions()
    @          0x19b6200 starrocks::TabletManager::start_trash_sweep()
    @          0x198620e starrocks::StorageEngine::_start_trash_sweep()
    @          0x19777c3 starrocks::StorageEngine::_garbage_sweeper_thread_callback()
    @          0x5746810 execute_native_thread_routine
    @     0x7f883c2061cf start_thread
    @     0x7f883b6e3e73 __GI___clone
    @                0x0 (unknown)
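
The "Aborted at" line reports the crash time as a Unix timestamp and suggests `date -d @1691143626` to decode it. For lining the crash up against load and query logs, the same conversion can be sketched in Python. The helper name `crash_time` and the UTC+8 offset are assumptions made for illustration; the thread does not state the cluster's time zone.

```python
from datetime import datetime, timedelta, timezone


def crash_time(unix_ts: int, utc_offset_hours: int = 0) -> datetime:
    """Convert the Unix timestamp from an 'Aborted at' line to an aware datetime.

    utc_offset_hours is a hypothetical parameter for viewing the time in the
    cluster's local zone; 0 means UTC.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(unix_ts, tz=tz)


if __name__ == "__main__":
    ts = 1691143626  # taken from the "*** Aborted at ..." line in be.out
    print(crash_time(ts).isoformat())     # UTC
    print(crash_time(ts, 8).isoformat())  # assuming the cluster runs in UTC+8
```

For this log the crash falls at 2023-08-04 10:07:06 UTC, which is 18:07:06 if the cluster is in UTC+8.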

  • be crash

The be.out file you posted does not contain this stack trace.

It does. I am re-uploading it: be.out (66.9 KB)

This issue was fixed in 2.3: https://github.com/StarRocks/starrocks/pull/7731