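【Details】BE crash
【Background】SELECT and INSERT
【Business impact】BE nodes crash
【Shared-data (storage-compute separation)?】No
【StarRocks version】3.2.10
【Cluster size】3 FE + 4 BE (FE and BE co-located on the same hosts)
【Machine specs】48 cores / 251 GB RAM / 10 GbE
【Contact】Community group 3 - 阿坚
【Attachments】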
be.conf
default_rowset_type = beta
streaming_load_rpc_max_alive_time_sec=4800
tablet_writer_open_rpc_timeout_sec=480
base_compaction_check_interval_seconds = 10
cumulative_compaction_num_threads_per_disk = 2
base_compaction_num_threads_per_disk = 1
cumulative_compaction_check_interval_seconds = 10
query_mem_limit = 53687091200
load_process_max_memory_limit_percent = 70
load_process_max_memory_limit_bytes = 161061273600
upload_worker_count = 6
download_worker_count = 6
max_download_speed_kbps = 500000
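First crash
The first crash was caused by a full-table query with no LIMIT against a table holding a very large amount of data, which crashed the BE.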
- be.out
query_id:cfd9b47b-b6bd-11ef-b56d-dc99141a34ca, fragment_instance:cfd9b47b-b6bd-11ef-b56d-dc99141a34d5
tracker:process consumption: 71426454494
tracker:query_pool consumption: 4012378056
tracker:query_pool/connector_scan consumption: 0
tracker:load consumption: 1169291174
tracker:metadata consumption: 925391162
tracker:tablet_metadata consumption: 48426555
tracker:rowset_metadata consumption: 121384980
tracker:segment_metadata consumption: 141282687
tracker:column_metadata consumption: 614296940
tracker:tablet_schema consumption: 1565715
tracker:segment_zonemap consumption: 67334634
tracker:short_key_index consumption: 66764399
tracker:column_zonemap_index consumption: 276826796
tracker:ordinal_index consumption: 179912944
tracker:bitmap_index consumption: 0
tracker:bloom_filter_index consumption: 0
tracker:compaction consumption: 152832
tracker:schema_change consumption: 0
tracker:column_pool consumption: 0
tracker:page_cache consumption: 43795580560
tracker:update consumption: 7781679091
tracker:chunk_allocator consumption: 2049061928
tracker:clone consumption: 0
tracker:consistency consumption: 0
tracker:datacache consumption: 0
tracker:replication consumption: 0
*** Aborted at 1733811182 (unix time) try "date -d @1733811182" if you are using GNU date ***
PC: @ 0x5275007 starrocks::serde::(anonymous namespace)::read_raw()
*** SIGSEGV (@0x2acaad3de000) received by PID 191800 (TID 0x2ac860826700) from PID 18446744072321097728; stack trace: ***
@ 0x6baeb82 google::(anonymous namespace)::FailureSignalHandler()
@ 0x2ac7cbf7e25a os::Linux::chained_handler()
@ 0x2ac7cbf8385e JVM_handle_linux_signal
@ 0x2ac7cbf77748 signalHandler()
@ 0x2ac7cc5d15d0 (unknown)
@ 0x5275007 starrocks::serde::(anonymous namespace)::read_raw()
@ 0x527aeb7 starrocks::ColumnVisitorMutableAdapter<>::visit()
@ 0x2d44ebc starrocks::ColumnFactory<>::accept_mutable()
@ 0x527a628 starrocks::serde::ColumnArraySerde::deserialize()
@ 0x527d53a starrocks::serde::ProtobufChunkDeserializer::deserialize()
@ 0x2f7cb10 starrocks::DataStreamRecvr::SenderQueue::_deserialize_chunk()
@ 0x2f816dd starrocks::DataStreamRecvr::PipelineSenderQueue::get_chunk()
@ 0x2f531a3 starrocks::DataStreamRecvr::get_chunk_for_pipeline()
@ 0x3980cd8 starrocks::pipeline::ExchangeSourceOperator::pull_chunk()
@ 0x3a4537d starrocks::pipeline::PipelineDriver::process()
@ 0x3a372de starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0x30528ec starrocks::ThreadPool::dispatch_thread()
@ 0x304c56a starrocks::supervise_thread()
@ 0x2ac7cc5c9dd5 start_thread
@ 0x2ac7cd202ead __clone
@ 0x0 (unknown)
start time: Tue Dec 10 14:17:41 CST 2024, server uptime: 14:17:41 up 359 days, 13:47, 2 users, load average: 1023.04, 747.65, 332.02
Ignored unknown config: default_rowset_type
Ignored unknown config: routine_load_thread_pool_size
Ignored unknown config: query_mem_limit
Second crash
This crash was caused by inserting about 100 million rows into a Primary Key table with a single INSERT statement.
- be.out
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
tracker:process consumption: 67407514828
tracker:query_pool consumption: 568643352
tracker:query_pool/connector_scan consumption: 0
tracker:load consumption: 1177578244
tracker:metadata consumption: 652691175
tracker:tablet_metadata consumption: 27593696
tracker:rowset_metadata consumption: 45860856
tracker:segment_metadata consumption: 96001141
tracker:column_metadata consumption: 483235482
tracker:tablet_schema consumption: 686504
tracker:segment_zonemap consumption: 33660571
tracker:short_key_index consumption: 58949529
tracker:column_zonemap_index consumption: 246783594
tracker:ordinal_index consumption: 149189688
tracker:bitmap_index consumption: 0
tracker:bloom_filter_index consumption: 0
tracker:compaction consumption: 4135600
tracker:schema_change consumption: 0
tracker:column_pool consumption: 0
tracker:page_cache consumption: 43855300672
tracker:update consumption: 6967090813
tracker:chunk_allocator consumption: 2148116872
tracker:clone consumption: 0
tracker:consistency consumption: 0
tracker:datacache consumption: 0
tracker:replication consumption: 0
*** Aborted at 1733816953 (unix time) try "date -d @1733816953" if you are using GNU date ***
PC: @ 0x2b47a815b207 __GI_raise
*** SIGABRT (@0x7d00001cfab) received by PID 118699 (TID 0x2b48460fa700) from PID 118699; stack trace: ***
@ 0x6baeb82 google::(anonymous namespace)::FailureSignalHandler()
@ 0x2b47a75f15d0 (unknown)
@ 0x2b47a815b207 __GI_raise
@ 0x2b47a815c8f8 __GI_abort
@ 0x2b7472e _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
@ 0x8f75cf6 __cxxabiv1::__terminate()
@ 0x8f75d61 std::terminate()
@ 0x8f75eb4 __cxa_throw
@ 0x2b76174 std::__throw_bad_alloc()
@ 0x2dde2fd fmt::v8::basic_memory_buffer<>::grow()
@ 0x2ddf2ba fmt::v8::detail::copy_str_noinline<>()
@ 0x2deb971 fmt::v8::detail::parse_replacement_field<>()
@ 0x2dec327 fmt::v8::detail::vformat_to<>()
@ 0x78baf08 fmt::v8::vformat()
@ 0x51d6049 starrocks::io_error()
@ 0x51d766d starrocks::PosixFileSystem::delete_file()
@ 0x542edbb starrocks::Segment::_open()
@ 0x542ef11 starrocks::Segment::open()
@ 0x542f1fd starrocks::Segment::open()
@ 0x5c62465 starrocks::Rowset::do_load()
@ 0x5c62a47 starrocks::Rowset::load()
@ 0x54f5323 starrocks::OlapMetaReader::_get_segments()
@ 0x54f552e starrocks::OlapMetaReader::_init_seg_meta_collecters()
@ 0x54f59b5 starrocks::OlapMetaReader::init()
@ 0x3c5d270 starrocks::OlapMetaScanner::init()
@ 0x39cf86e starrocks::pipeline::OlapMetaScanPrepareOperator::_prepare_scan_context()
@ 0x39ceca2 starrocks::pipeline::MetaScanPrepareOperator::prepare()
@ 0x3a42758 starrocks::pipeline::PipelineDriver::prepare()
@ 0x39818cb ZNSt17_Function_handlerIFN9starrocks6StatusERKSt10shared_ptrINS0_8pipeline14PipelineDriverEEEZNS3_16FragmentExecutor7executeEPNS0_7ExecEnvEEUlS7_E0_E9_M_invokeERKSt9_Any_dataS7
@ 0x3a4ed59 starrocks::pipeline::FragmentContext::iterate_drivers()
@ 0x3982acc starrocks::pipeline::FragmentExecutor::execute()
@ 0x5ccf490 starrocks::PInternalServiceImplBase<>::_exec_plan_fragment_by_pipeline()
start time: Tue Dec 10 15:52:50 CST 2024, server uptime: 15:52:50 up 359 days, 14:49, 2 users, load average: 994.13, 621.80, 261.97
Ignored unknown config: default_rowset_type
Ignored unknown config: routine_load_thread_pool_size
Ignored unknown config: query_mem_limit
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/d/p2/app/StarRocks/be/lib/jni-packages/starrocks-jdbc-bridge-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/d/p2/app/StarRocks/be/lib/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
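During both crashes the CPU load was extremely high, close to 1000 (1-minute load averages of 1023.04 and 994.13 reported at BE restart, on 48-core hosts).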
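The glog header in each be.out suggests running date -d @<ts> to decode the abort time. The same conversion can be done in Python; this sketch assumes that "CST" in the start time lines means China Standard Time (UTC+8), and the names ABORTS, RESTARTS and abort_time are illustrative only. Under that assumption the first BE aborted at 14:13:02 and the second at 15:49:13 on 2024-12-10, i.e. 4 min 39 s and 3 min 37 s before the respective restarts.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "CST" in be.out means China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8), "CST")

# Abort timestamps copied from the "*** Aborted at ... ***" lines.
ABORTS = {"first crash": 1733811182, "second crash": 1733816953}

# BE restart times copied from the "start time:" lines.
RESTARTS = {
    "first crash": datetime(2024, 12, 10, 14, 17, 41, tzinfo=CST),
    "second crash": datetime(2024, 12, 10, 15, 52, 50, tzinfo=CST),
}


def abort_time(ts):
    """Convert a unix timestamp to a timezone-aware datetime in UTC+8."""
    return datetime.fromtimestamp(ts, tz=CST)


if __name__ == "__main__":
    for label, ts in ABORTS.items():
        aborted = abort_time(ts)
        gap = RESTARTS[label] - aborted
        print(f"{label}: aborted {aborted:%Y-%m-%d %H:%M:%S %Z}, restarted {gap} later")
```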