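【Description】A BE node crashed with no abnormal operations or large queries running
【Background】None
【Business impact】Production
【Shared-data (storage-compute separation)】No
【StarRocks version】e.g.: 3.2.6
【Cluster size】e.g.: 3 FE (1 follower) + 3 BE (FE and BE deployed separately)
【Machine info】CPU vCores/memory/NIC, e.g.: 16C/64G/10GbE
【Contact】Community group 15 - 沐不木
【Attachments】
be.out log: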
branch-3.2 RELEASE (build 8a4f5a4)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
tracker:process consumption: 21137044168
tracker:query_pool consumption: 0
tracker:query_pool/connector_scan consumption: 0
tracker:load consumption: 11269080
tracker:metadata consumption: 1746018921
tracker:tablet_metadata consumption: 11149858
tracker:rowset_metadata consumption: 19402576
tracker:segment_metadata consumption: 94806894
tracker:column_metadata consumption: 1620659593
tracker:tablet_schema consumption: 2338570
tracker:segment_zonemap consumption: 11087998
tracker:short_key_index consumption: 80527003
tracker:column_zonemap_index consumption: 218090009
tracker:ordinal_index consumption: 1298686560
tracker:bitmap_index consumption: 232160
tracker:bloom_filter_index consumption: 119712
tracker:compaction consumption: 795328
tracker:schema_change consumption: 0
tracker:column_pool consumption: 0
tracker:page_cache consumption: 10806469616
tracker:update consumption: 4372157393
tracker:chunk_allocator consumption: 2147882376
tracker:clone consumption: 0
tracker:consistency consumption: 0
tracker:datacache consumption: 0
tracker:replication consumption: 0
*** Aborted at 1723012295 (unix time) try "date -d @1723012295" if you are using GNU date ***
PC: @ 0x4c6f44c std::_Sp_counted_ptr<>::_M_dispose()
*** SIGSEGV (@0x0) received by PID 761 (TID 0x2bc1fba02700) from PID 0; stack trace: ***
    @ 0x6229862 google::(anonymous namespace)::FailureSignalHandler()
    @ 0x2baa317c1630 (unknown)
    @ 0x4c6f44c std::_Sp_counted_ptr<>::_M_dispose()
    @ 0x291746a std::_Sp_counted_base<>::_M_release()
    @ 0x4ed2aab starrocks::Segment::~Segment()
    @ 0x55bc4aa starrocks::Rowset::do_close()
    @ 0x4ddfae1 starrocks::Rowset::close()
    @ 0x55b7008 starrocks::CompactionTask::_commit_compaction()
    @ 0x55b64d5 starrocks::HorizontalCompactionTask::run_impl()
    @ 0x55b156a starrocks::CompactionTask::run()
    @ 0x4feb433 _ZNSt17_Function_handlerIFvvEZN9starrocks17CompactionManager9_scheduleEvEUlvE_E9_M_invokeERKSt9_Any_data
    @ 0x2bccd4c starrocks::ThreadPool::dispatch_thread()
    @ 0x2bc69ca starrocks::Thread::supervise_thread()
    @ 0x2baa317b9ea5 start_thread
    @ 0x2baa323f496d __clone
    @ 0x0 (unknown)
start time: Wed Aug 7 15:55:45 CST 2024
Ignored unknown config: starlet_port
Ignored unknown config: tc_use_memory_min
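The tracker values in be.out are raw byte counts, which makes the largest consumers hard to spot. A minimal sketch for ranking them in GiB follows. It assumes the `tracker:NAME consumption: N` layout seen in this log; `parse_trackers` and `to_gib` are hypothetical helper names, and only the five largest trackers are copied into the sample string.

```python
import re

# Excerpt of the tracker dump from be.out (values copied verbatim from the log).
TRACKER_LINE = (
    "tracker:process consumption: 21137044168 "
    "tracker:metadata consumption: 1746018921 "
    "tracker:page_cache consumption: 10806469616 "
    "tracker:update consumption: 4372157393 "
    "tracker:chunk_allocator consumption: 2147882376"
)


def parse_trackers(line):
    """Return {tracker_name: bytes} for each 'tracker:NAME consumption: N' pair."""
    pairs = re.findall(r"tracker:(\S+) consumption: (\d+)", line)
    return {name: int(value) for name, value in pairs}


def to_gib(num_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


trackers = parse_trackers(TRACKER_LINE)

# Print trackers from largest to smallest.
for name, value in sorted(trackers.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<16} {to_gib(value):6.2f} GiB")
```

Pasting the full tracker dump into `TRACKER_LINE` works the same way, since the pattern matches every `tracker:` entry it finds.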
be.INFO log at the corresponding time:
I0807 14:31:35.656782 29322 compaction_task.cpp:137] compaction finish. status:OK, task info:[CompactionTaskInfo] task_id:276134, tablet_id:670057, compaction score:26.3174, algorithm:HORIZONTAL_COMPACTION, state:COMPACTION_SUCCESS, compaction_type:cumulative, output_version:[27853-28041], start_time:2024-08-07 14:31:35.645, end_time:2024-08-07 14:31:35.656, elapsed_time:11129 us, input_rowsets_size:19016, input_segments_num:5, input_rowsets_num:5, input_rows_num:3303, output_num_rows:3274, merged_rows:29, filtered_rows:0, output_segments_num:1, output_rowset_size:14932, column_group_size:0, total_output_num_rows:0, total_merged_rows:0, total_del_filtered_rows:0, is_shortcut_compaction:0, is_manual_compaction:0, progress:100
I0807 14:31:35.656939 29319 compaction_task.cpp:137] compaction finish. status:OK, task info:[CompactionTaskInfo] task_id:276131, tablet_id:670012, compaction score:26.3248, algorithm:HORIZONTAL_COMPACTION, state:COMPACTION_SUCCESS, compaction_type:cumulative, output_version:[27545-27737], start_time:2024-08-07 14:31:35.645, end_time:2024-08-07 14:31:35.656, elapsed_time:11531 us, input_rowsets_size:18412, input_segments_num:5, input_rowsets_num:5, input_rows_num:3061, output_num_rows:3034, merged_rows:27, filtered_rows:0, output_segments_num:1, output_rowset_size:14307, column_group_size:0, total_output_num_rows:0, total_merged_rows:0, total_del_filtered_rows:0, is_shortcut_compaction:0, is_manual_compaction:0, progress:100
I0807 14:31:35.659538 1440 compaction_task.cpp:137] compaction finish. status:OK, task info:[CompactionTaskInfo] task_id:276138, tablet_id:670092, compaction score:26.3033, algorithm:HORIZONTAL_COMPACTION, state:COMPACTION_SUCCESS, compaction_type:cumulative, output_version:[28458-28650], start_time:2024-08-07 14:31:35.653, end_time:2024-08-07 14:31:35.659, elapsed_time:5800 us, input_rowsets_size:19841, input_segments_num:5, input_rowsets_num:5, input_rows_num:3659, output_num_rows:3632, merged_rows:27, filtered_rows:0, output_segments_num:1, output_rowset_size:15616, column_group_size:0, total_output_num_rows:0, total_merged_rows:0, total_del_filtered_rows:0, is_shortcut_compaction:0, is_manual_compaction:0, progress:100
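To check that the be.INFO excerpt covers the moment of the crash, the abort timestamp from be.out can be converted to local time. This is a small sketch that assumes "CST" in the be.out "start time" line means China Standard Time (UTC+8); `unix_to_cst` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CST here is China Standard Time (UTC+8), not US Central time.
CST = timezone(timedelta(hours=8), name="CST")

# Taken from "*** Aborted at 1723012295 (unix time)" in be.out.
ABORT_UNIX_TIME = 1723012295


def unix_to_cst(ts):
    """Convert a Unix timestamp (seconds) to a timezone-aware CST datetime."""
    return datetime.fromtimestamp(ts, tz=CST)


abort_time = unix_to_cst(ABORT_UNIX_TIME)
print(abort_time.strftime("%Y-%m-%d %H:%M:%S %Z"))  # prints "2024-08-07 14:31:35 CST"
```

Under that assumption the abort falls in the same second as the compaction-finish entries in be.INFO (14:31:35), while the "start time" of 15:55:45 in be.out is a later timestamp.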
Running grep "Current memory statistics" be.INFO | less shows the BE node's memory usage at the time, as follows:
Monitoring screenshot: