BE occasionally crashes

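【Details】The BE crashes from time to time, and I can't work out why.
【Background】No large SQL queries were running in the background. The BE went down suddenly while it was running fairly steadily, which is why I'm puzzled.
【Business impact】None
【Shared-data or not】Shared-nothing architecture (storage and compute coupled)
【StarRocks version】e.g.: 3.0.8
【Cluster size】e.g.: 3 FE (1 follower + 2 observer) + 9 BE
【Machine info】CPU vCores/memory/NIC, e.g.: 16C/64G/10 GbE
【Contact】342554099@qq.com
【Attachments】
Neither be.WARNING nor be.INFO shows any obvious error; the BE simply goes down. A supervisor daemon is configured, so nothing is affected, but I can't figure out why it happens.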

  • be.out contains the relevant information:
    3.0.8 RELEASE (build 1975985)
    query_id:00000000-0000-0000-0000-000000000000, fragment_instance:9e7fdcfd-2a4e-11ef-993b-005056040111
    tracker:process consumption: 21008361320
    tracker:query_pool consumption: -9912560
    tracker:load consumption: 0
    tracker:metadata consumption: 762479505
    tracker:tablet_metadata consumption: 52378193
    tracker:rowset_metadata consumption: 9665930
    tracker:segment_metadata consumption: 91510292
    tracker:column_metadata consumption: 608925090
    tracker:tablet_schema consumption: 2210361
    tracker:segment_zonemap consumption: 24579826
    tracker:short_key_index consumption: 64138582
    tracker:column_zonemap_index consumption: 231842682
    tracker:ordinal_index consumption: 321167544
    tracker:bitmap_index consumption: 871152
    tracker:bloom_filter_index consumption: 0
    tracker:compaction consumption: 354963928
    tracker:schema_change consumption: 0
    tracker:column_pool consumption: 1415622940
    tracker:page_cache consumption: 13632563216
    tracker:update consumption: 2294549291
    tracker:chunk_allocator consumption: 2148256520
    tracker:clone consumption: 0
    tracker:consistency consumption: 0
    *** Aborted at 1718370262 (unix time) try "date -d @1718370262" if you are using GNU date ***
    PC: @ 0x0 (unknown)
    *** SIGSEGV (@0x0) received by PID 36261 (TID 0x7fefc9a58700) from PID 0; stack trace: ***
    @ 0x66c7da2 google::(anonymous namespace)::FailureSignalHandler()
    @ 0x7ff0c1534630 (unknown)
    @ 0x0 (unknown)
    start time: Fri Jun 14 21:05:02 CST 2024
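
The `Aborted at ... (unix time)` line can be lined up with the `start time` line that follows it. Here is a minimal sketch in Python, assuming the `CST` in be.out means China Standard Time (UTC+8); the helper name `abort_time` is made up for illustration:

```python
from datetime import datetime, timedelta, timezone

# Assumption: "CST" in the be.out "start time" lines is China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8), name="CST")


def abort_time(epoch_seconds: int) -> datetime:
    """Convert the epoch value from an '*** Aborted at ...' line to CST."""
    return datetime.fromtimestamp(epoch_seconds, tz=CST)


# Epoch value taken from the crash banner above.
print(abort_time(1718370262).isoformat())  # 2024-06-14T21:04:22+08:00
```

Under that assumption the crash happened at 21:04:22, about 40 seconds before the `start time: Fri Jun 14 21:05:02 CST 2024` restart line.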

Did you keep a core dump?
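
If it is unclear whether the BE process is allowed to write a core file at all, one way to check on Linux is to read the `Max core file size` row of `/proc/<pid>/limits`. The sketch below only parses that text; the function name `core_limits` and the PID in the commented usage are hypothetical, and a soft limit of `0` normally means no core file is written.

```python
from typing import Optional, Tuple


def core_limits(limits_text: str) -> Optional[Tuple[str, str]]:
    """Return (soft, hard) for 'Max core file size' from /proc/<pid>/limits text."""
    prefix = "Max core file size"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns are: soft limit, hard limit, units.
            fields = line[len(prefix):].split()
            if len(fields) >= 2:
                return fields[0], fields[1]
    return None


# Hypothetical usage on the BE host (replace 12345 with the BE process ID):
# with open("/proc/12345/limits") as f:
#     print(core_limits(f.read()))

sample = "Max core file size        0                    unlimited            bytes\n"
print(core_limits(sample))  # ('0', 'unlimited')
```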

Replying here myself: I asked 鳄总, and this counts as a known bug in 3.0.8; the fix is to upgrade to 3.1.X.

I'll continue in this thread. Today we upgraded to 3.1.13, and the BE still crashes occasionally. Here is what I captured from be.out:
start time: Wed May 22 13:24:02 CST 2024
start time: Mon Jun 24 11:00:23 CST 2024
3.0.8 RELEASE (build 1975985)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:719be4c5-3202-11ef-b147-005056040103
tracker:process consumption: 17748302568
tracker:query_pool consumption: -3126272
tracker:load consumption: 0
tracker:metadata consumption: 391799380
tracker:tablet_metadata consumption: 46552678
tracker:rowset_metadata consumption: 8490107
tracker:segment_metadata consumption: 37869337
tracker:column_metadata consumption: 298887258
tracker:tablet_schema consumption: 2173486
tracker:segment_zonemap consumption: 13836774
tracker:short_key_index consumption: 22451566
tracker:column_zonemap_index consumption: 116633754
tracker:ordinal_index consumption: 156377408
tracker:bitmap_index consumption: 378592
tracker:bloom_filter_index consumption: 0
tracker:compaction consumption: 0
tracker:schema_change consumption: 0
tracker:column_pool consumption: 1341698504
tracker:page_cache consumption: 12834929808
tracker:update consumption: 513823227
tracker:chunk_allocator consumption: 2143042600
tracker:clone consumption: 0
tracker:consistency consumption: 0
*** Aborted at 1719217154 (unix time) try "date -d @1719217154" if you are using GNU date ***
PC: @ 0x537e130 starrocks::DataStreamRecvr::PipelineSenderQueue::add_chunks<>()
*** SIGSEGV (@0x0) received by PID 113135 (TID 0x7f6a03eb4700) from PID 0; stack trace: ***
@ 0x66c7da2 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f6a87f3b630 (unknown)
@ 0x537e130 starrocks::DataStreamRecvr::PipelineSenderQueue::add_chunks<>()
@ 0x5375f92 starrocks::DataStreamRecvr::PipelineSenderQueue::add_chunks()
@ 0x52e812b starrocks::DataStreamRecvr::add_chunks()
@ 0x528794f starrocks::DataStreamMgr::transmit_chunk()
@ 0x5e20f7c starrocks::PInternalServiceImplBase<>::_transmit_chunk()
@ 0x52ad820 starrocks::PriorityThreadPool::work_thread()
@ 0x6687767 thread_proxy
@ 0x7f6a87f33ea5 start_thread
@ 0x7f6a8754eb0d __clone
@ 0x0 (unknown)
start time: Mon Jun 24 16:20:02 CST 2024
start time: Wed Jul 24 09:02:10 CST 2024, server uptime: 09:02:10 up 98 days, 17:35, 2 users, load average: 2.04, 3.10, 2.80
3.1.13 RELEASE (build d9d3ed7)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
tracker:process consumption: 18314355096
tracker:query_pool consumption: 0
tracker:load consumption: 0
tracker:metadata consumption: 542778581
tracker:tablet_metadata consumption: 66889316
tracker:rowset_metadata consumption: 20726815
tracker:segment_metadata consumption: 42360424
tracker:column_metadata consumption: 412802026
tracker:tablet_schema consumption: 2406308
tracker:segment_zonemap consumption: 20827719
tracker:short_key_index consumption: 19678193
tracker:column_zonemap_index consumption: 187585786
tracker:ordinal_index consumption: 153217824
tracker:bitmap_index consumption: 364496
tracker:bloom_filter_index consumption: 2317056
tracker:compaction consumption: 0
tracker:schema_change consumption: 0
tracker:column_pool consumption: 1493731043
tracker:page_cache consumption: 10345647280
tracker:update consumption: 2091434685
tracker:chunk_allocator consumption: 2147671816
tracker:clone consumption: 0
tracker:consistency consumption: 0
tracker:datacache consumption: 0
tracker:replication consumption: 0
*** Aborted at 1721805172 (unix time) try "date -d @1721805172" if you are using GNU date ***
PC: @ 0x515eb82 starrocks::ImmutableIndex::_read_page()
*** SIGSEGV (@0x7fca481fe034) received by PID 63641 (TID 0x7fc68f9fd700) from PID 1210048564; stack trace: ***
@ 0x653d562 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fca4f7ed630 (unknown)
@ 0x515eb82 starrocks::ImmutableIndex::_read_page()
@ 0x5173108 starrocks::ImmutableIndex::_get_in_shard_by_page()
@ 0x5176baa starrocks::ImmutableIndex::_get_in_shard()
@ 0x5177276 starrocks::ImmutableIndex::get()
@ 0x5177e3d starrocks::PersistentIndex::_get_from_immutable_index()
@ 0x51847cc starrocks::PersistentIndex::upsert()
@ 0x4dd1bb5 starrocks::PrimaryIndex::_upsert_into_persistent_index()
@ 0x4dd1f26 starrocks::PrimaryIndex::upsert()
@ 0x4ed9b48 starrocks::TabletUpdates::_do_update()
@ 0x4eea0d6 starrocks::TabletUpdates::_apply_normal_rowset_commit()
@ 0x4eecd46 starrocks::TabletUpdates::_apply_rowset_commit()
@ 0x4eed096 starrocks::TabletUpdates::do_apply()
@ 0x2d0d8ad starrocks::ThreadPool::dispatch_thread()
@ 0x2d072fa starrocks::thread::supervise_thread()
@ 0x7fca4f7e5ea5 start_thread
@ 0x7fca4ebe6b0d __clone
@ 0x0 (unknown)
start time: Wed Jul 24 15:13:02 CST 2024, server uptime: 15:13:02 up 98 days, 23:46, 0 users, load average: 1.33, 2.43, 2.70
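
The be.out excerpt above contains two crash blocks with different version banners and different faulting frames: one under `3.0.8 RELEASE` at `DataStreamRecvr::PipelineSenderQueue::add_chunks<>()`, and one under `3.1.13 RELEASE` at `ImmutableIndex::_read_page()`. To tell such crashes apart in a long be.out, a small script can pair each `PC:` line with the version banner printed before it. This is only a sketch that assumes the line formats shown above; `crash_signatures` is a made-up helper name.

```python
import re
from typing import List, Tuple

# Patterns are based only on the be.out lines shown in this thread.
VERSION_RE = re.compile(r"^\s*(\d+\.\d+\.\d+) RELEASE \(build (\w+)\)")
PC_RE = re.compile(r"^\s*PC: @ 0x[0-9a-f]+ (.+)$")


def crash_signatures(log_text: str) -> List[Tuple[str, str]]:
    """Pair each 'PC:' line with the most recent version banner before it."""
    version = "unknown"
    signatures = []
    for line in log_text.splitlines():
        m = VERSION_RE.match(line)
        if m:
            version = m.group(1)
            continue
        m = PC_RE.match(line)
        if m:
            signatures.append((version, m.group(1).strip()))
    return signatures


sample = """\
3.0.8 RELEASE (build 1975985)
PC: @ 0x537e130 starrocks::DataStreamRecvr::PipelineSenderQueue::add_chunks<>()
3.1.13 RELEASE (build d9d3ed7)
PC: @ 0x515eb82 starrocks::ImmutableIndex::_read_page()
"""
for version, frame in crash_signatures(sample):
    print(version, frame)
```

Run against the full be.out, this would list one `(version, frame)` pair per crash, which makes it easier to see whether a crash after the upgrade is the same one as before.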

@trueeyu Paging you remotely; could you take a look when you have time?

A follow-up note: this issue has already been flagged in the expert's post.

3.1.14+ attempts to fix this issue.