BE memory leak (LocalExchange)
A memory leak in LocalExchange causes BE memory usage to grow slowly
- GitHub Issue:
- GitHub Fix PR:
- Jira:
- Affected versions:
- 2.2.0 ~ 2.2.13
- 2.3.0 ~ 2.3.11
- 2.4.0 ~ 2.4.4
- 2.5.0 ~ 2.5.4
- Fixed versions:
- 2.2.14+
- 2.3.12+
- 2.4.5+
- 2.5.5+
- Temporary workaround:
- None
- Root cause:
- The destructor was not declared virtual
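The root cause above can be illustrated with a minimal C++ sketch. The class names ExchangerBase and BufferingExchanger are hypothetical and are not the actual StarRocks types; the sketch only assumes that an object owning a buffer is destroyed through a pointer to its base class, and it shows the corrected form with a virtual destructor.

```cpp
#include <memory>
#include <vector>

// Counts how many times the derived destructor has run.
int g_derived_destroyed = 0;

// Hypothetical base class, for illustration only.
class ExchangerBase {
public:
    // `virtual` is the important part. Without it, deleting a derived
    // object through an ExchangerBase pointer is undefined behavior; in
    // practice the derived destructor is typically skipped, so members
    // owned by the derived class are never released.
    virtual ~ExchangerBase() = default;
};

// Hypothetical derived class that owns memory.
class BufferingExchanger : public ExchangerBase {
public:
    ~BufferingExchanger() override { ++g_derived_destroyed; }

private:
    std::vector<char> _buffer = std::vector<char>(4096);
};

// Destroys a derived object through a base-class pointer and returns how
// many derived destructors ran as a result.
int destroy_through_base_pointer() {
    const int before = g_derived_destroyed;
    std::unique_ptr<ExchangerBase> owner = std::make_unique<BufferingExchanger>();
    owner.reset();  // calls ~BufferingExchanger() because the base destructor is virtual
    return g_derived_destroyed - before;
}
```

With the virtual destructor in place, destroy_through_base_pointer() reports that the derived destructor ran, so the buffer is freed. Removing `virtual` (and the `override` it requires) gives the problematic pattern named in the root cause.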
*** Aborted at 1681439883 (unix time) try "date -d @1681439883" if you are using GNU date ***
PC: @ 0x7f87e0eb5720 __memcpy_ssse3_back
*** SIGSEGV (@0x7f83fcd49ffd) received by PID 1695 (TID 0x7f862f707700) from PID 18446744073656377341; stack trace: ***
@ 0x5769222 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f87e23749db os::Linux::chained_handler()
@ 0x7f87e23794bc JVM_handle_linux_signal
@ 0x7f87e236c378 signalHandler()
@ 0x7f87e184a630 (unknown)
@ 0x7f87e0eb5720 __memcpy_ssse3_back
@ 0x2c38092 starrocks::vectorized::BinaryColumnBase<>::append_selective()
@ 0x4d29e93 starrocks::vectorized::NullableColumn::append_selective()
@ 0x4d0d42a starrocks::vectorized::Chunk::append_selective()
@ 0x310b6ee starrocks::pipeline::LocalExchangeSourceOperator::_pull_shuffle_chunk()
@ 0x310bfc7 starrocks::pipeline::LocalExchangeSourceOperator::pull_chunk()
@ 0x2c57583 starrocks::pipeline::PipelineDriver::process()
@ 0x4e075e7 starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0x4812f8d starrocks::ThreadPool::dispatch_thread()
@ 0x480dd1a starrocks::Thread::supervise_thread()
@ 0x7f87e1842ea5 start_thread
@ 0x7f87e0e5db0d __clone
@ 0x0 (unknown)
(1064, 'There are multi count(distinct) function call, multi distinct rewrite error')
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
*** Aborted at 1684761824 (unix time) try "date -d @1684761824" if you are using GNU date ***
PC: @ 0x315124a starrocks::PersistentIndex::_merge_compaction()
*** SIGFPE (@0x315124a) received by PID 19356 (TID 0x7f792b578700) from PID 51712586; stack trace: ***
@ 0x4877742 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f794dda7630 (unknown)
@ 0x315124a starrocks::PersistentIndex::_merge_compaction()
@ 0x315529e starrocks::PersistentIndex::commit()
@ 0x2ed2c8e starrocks::PrimaryIndex::commit()
@ 0x2fa6786 starrocks::TabletUpdates::_apply_rowset_commit()
@ 0x2fa9023 starrocks::TabletUpdates::do_apply()
@ 0x37a9945 starrocks::ThreadPool::dispatch_thread()
@ 0x37a4d7a starrocks::Thread::supervise_thread()
@ 0x7f794dd9fea5 start_thread
@ 0x7f794d3ba8dd __clone
@ 0x0 (unknown)
This issue can also cause incorrect BitmapIndex query results. It is most easily triggered when a query hits multiple BitmapIndexes.
*** Aborted at 1666056468 (unix time) try "date -d @1666056468" if you are using GNU date ***
PC: @ 0x416239c run_container_andnot
*** SIGSEGV (@0x0) received by PID 38015 (TID 0x7f6c3cb49700) from PID 0; stack trace: ***
@ 0x3cf85d2 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f6c99db5630 (unknown)
@ 0x416239c run_container_andnot
@ 0x4160ab9 run_run_container_andnot
@ 0x4160aef run_run_container_iandnot
@ 0x4146ad5 roaring_bitmap_andnot_inplace
@ 0x1a6b0de starrocks::vectorized::SegmentIterator::_apply_bitmap_index()
@ 0x1a6fe4a starrocks::vectorized::SegmentIterator::_init()
@ 0x1a70539 starrocks::vectorized::SegmentIterator::do_get_next()
@ 0x1acd5b2 starrocks::vectorized::ProjectionIterator::do_get_next()
@ 0x1e2dc0a starrocks::SegmentIteratorWrapper::do_get_next()
@ 0x1b0566b starrocks::vectorized::TimedChunkIterator::do_get_next()
@ 0x1afe0ce starrocks::vectorized::TabletReader::do_get_next()
@ 0x27c9c4d starrocks::pipeline::OlapChunkSource::_read_chunk_from_storage()
@ 0x27ca2d0 starrocks::pipeline::OlapChunkSource::buffer_next_batch_chunks_blocking()
@ 0x27cd573 _ZNSt17_Function_handlerIFvvEZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS1_12RuntimeStateEiEUlvE0_E9_M_invokeERKSt9_Any_data
@ 0x1e54dd0 starrocks::PriorityThreadPool::work_thread()
@ 0x3c93c07 thread_proxy
@ 0x7f6c99dadea5 start_thread
@ 0x7f6c993c8b0d __clone
@ 0x0 (unknown)
#0 0x00000000025818f6 in starrocks::vectorized::Chunk::clone_empty_with_slot (this=0x15ebf4b70, size=212) at /root/starrocks/be/src/column/chunk.cpp:188
#1 0x0000000002581dc3 in starrocks::vectorized::Chunk::clone_empty_with_slot (this=0x15ebf4b70) at /root/starrocks/be/src/column/chunk.cpp:181
#2 0x0000000002a71f10 in starrocks::pipeline::LocalExchangeSourceOperator::_pull_shuffle_chunk (this=0xa7267210, state=0x3e6b76000) at /root/starrocks/be/src/exec/pipeline/exchange/local_exchange_source_operator.cpp:112
#3 0x0000000002a72a67 in starrocks::pipeline::LocalExchangeSourceOperator::pull_chunk (this=0xa7267210, state=0x3e6b76000) at /root/starrocks/be/src/exec/pipeline/exchange/local_exchange_source_operator.cpp:75
#4 0x000000000297df33 in starrocks::pipeline::PipelineDriver::process (this=this@entry=0xaf736910, runtime_state=runtime_state@entry=0x3e6b76000, worker_id=worker_id@entry=23) at /root/starrocks/be/src/exec/pipeline/pipeline_driver.cpp:164
#5 0x000000000297462e in starrocks::pipeline::GlobalDriverExecutor::_worker_thread (this=0xa77d880) at /root/starrocks/be/src/exec/pipeline/pipeline_driver_executor.cpp:124
#6 0x00000000021da2c9 in std::function<void ()>::operator()() const (this=<optimized out>) at /usr/include/c++/10.3.0/bits/std_function.h:622
#7 starrocks::FunctionRunnable::run (this=<optimized out>) at /root/starrocks/be/src/util/threadpool.cpp:44
#8 starrocks::ThreadPool::dispatch_thread (this=0xbda1500) at /root/starrocks/be/src/util/threadpool.cpp:513
#9 0x00000000021d5e7a in std::function<void ()>::operator()() const (this=0x298e4c58) at /usr/include/c++/10.3.0/bits/std_function.h:622
#10 starrocks::Thread::supervise_thread (arg=0x298e4c40) at /root/starrocks/be/src/util/thread.cpp:326
#11 0x00007fa26ee31ea5 in ?? ()
#12 0x0000000000000000 in ?? ()
The query result may also be incorrect.
*** Aborted at 1685449309 (unix time) try "date -d @1685449309" if you are using GNU date ***
PC: @ 0x374c255 starrocks::NullableAggregateFunctionUnary<>::update_batch_selectively()
*** SIGSEGV (@0x10) received by PID 14038 (TID 0x7f3361305700) from PID 16; stack trace: ***
@ 0x6240182 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f33df4ed630 (unknown)
@ 0x374c255 starrocks::NullableAggregateFunctionUnary<>::update_batch_selectively()
@ 0x34eeb6e starrocks::Aggregator::compute_batch_agg_states_with_selection()
@ 0x311b8ab starrocks::AggregateBlockingNode::open()
@ 0x4f4fe14 starrocks::PlanFragmentExecutor::_open_internal_vectorized()
@ 0x4f5221d starrocks::PlanFragmentExecutor::open()
@ 0x4e9cb2b starrocks::FragmentExecState::execute()
@ 0x4ea3303 starrocks::FragmentMgr::exec_actual()
@ 0x506b022 starrocks::ThreadPool::dispatch_thread()
@ 0x5065b1a starrocks::Thread::supervise_thread()
@ 0x7f33df4e5ea5 start_thread
@ 0x7f33deb00b0d __clone
@ 0x0 (unknown)
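This kind of problem is usually caused by incorrectly sorted data in the Segment file, which makes lookups through the prefix index return wrong results.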
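To see why mis-sorted data breaks an index lookup, consider the following minimal C++ sketch. It is not StarRocks code: binary_search_key is a hypothetical helper that stands in for any lookup that, like a prefix index, assumes its keys are stored in ascending order.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical lookup that assumes `keys` is sorted in ascending order,
// the same assumption a sorted-key index relies on.
bool binary_search_key(const std::vector<int>& keys, int target) {
    std::size_t lo = 0;
    std::size_t hi = keys.size();
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (keys[mid] == target) {
            return true;
        }
        if (keys[mid] < target) {
            lo = mid + 1;  // target can only be in the upper half
        } else {
            hi = mid;      // target can only be in the lower half
        }
    }
    return false;
}
```

On the sorted input {1, 3, 5, 7, 9} the lookup finds key 3. On {1, 7, 5, 3, 9}, where the same key is present but out of order, the search discards the half that contains it and reports the key as missing, which is the kind of wrong result described above.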
After a SchemaChange modifies Key columns, query results are inconsistent.
Expression child number xxxx exceeded the maximum 10000
perf top shows native_queued_spin_lock_slowpath consuming a large share of CPU.
This is usually more severe on machines with many cores under high concurrency.
*** Aborted at 1686552759 (unix time) try "date -d @1686552759" if you are using GNU date ***
PC: @ 0x33f6c80 starrocks::vectorized::RuntimeBloomFilter<>::insert()
*** SIGSEGV (@0x207fa2c000) received by PID 40541 (TID 0x7f6fd50cb700) from PID 2141372416; stack trace: ***
@ 0x3f8c022 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f707104a630 (unknown)
@ 0x33f6c80 starrocks::vectorized::RuntimeBloomFilter<>::insert()
@ 0x33eb37e starrocks::vectorized::RuntimeFilterHelper::fill_runtime_bloom_filter()
@ 0x2a3824a starrocks::pipeline::PartialRuntimeFilterMerger::merge_local_bloom_filters()
@ 0x2a349bf starrocks::pipeline::HashJoinBuildOperator::set_finishing()
@ 0x29df067 starrocks::pipeline::PipelineDriver::_mark_operator_finishing()
@ 0x29dfc85 starrocks::pipeline::PipelineDriver::process()
@ 0x29d65be starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0x2235c39 starrocks::ThreadPool::dispatch_thread()
@ 0x22317ea starrocks::Thread::supervise_thread()
@ 0x7f7071042ea5 start_thread
@ 0x7f707065d9fd __clone
@ 0x0 (unknown)
BE crashes repeatedly while loading Tablets at startup
*** SIGSEGV (@0x8) received by PID 244327 (TID 0x7facab9fe700) from PID 8; stack trace: ***
@ 0x481e332 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7facdbc62630 (unknown)
@ 0x24d2f7b starrocks::Rowset::do_load()
@ 0x24d35cf starrocks::Rowset::load()
@ 0x24d3966 starrocks::Rowset::get_segment_iterators2()
@ 0x20334ec starrocks::RowsetUpdateState::_do_load()
@ 0x2034f78 _ZZSt9call_onceIZN9starrocks17RowsetUpdateState4loadEPNS0_6TabletEPNS0_6RowsetEEUlvE_JEEvRSt9once_flagOT_DpOT0_ENUlvE0_4_FUNEv
@ 0x7facdbc5920b __pthread_once_slow
@ 0x202fd63 starrocks::RowsetUpdateState::load()
@ 0x1e6da98 starrocks::TabletUpdates::_apply_rowset_commit()
@ 0x1e73bb3 starrocks::TabletUpdates::do_apply()
@ 0x2680af5 starrocks::ThreadPool::dispatch_thread()
@ 0x267bf2a starrocks::supervise_thread()
@ 0x7facdbc5aea5 start_thread
@ 0x7facdb27596d __clone
@ 0x0 (unknown)
For example, a partition clause like this: PARTITION BY date_trunc('day', dt)
*** Aborted at 1686563917 (unix time) try "date -d @1686563917" if you are using GNU date ***
PC: @ 0x30d5585 starrocks::BinaryColumnBase<>::compare_at()
*** SIGSEGV (@0x4f95) received by PID 725038 (TID 0x7f70fd996700) from PID 20373; stack trace: ***
@ 0x62de642 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f715ea9b370 (unknown)
@ 0x30d5585 starrocks::BinaryColumnBase<>::compare_at()
@ 0x56f695e starrocks::OlapTablePartitionParam::find_tablets()
@ 0x5710634 starrocks::stream_load::OlapTableSink::send_chunk()
@ 0x4fd3928 starrocks::PlanFragmentExecutor::_open_internal_vectorized()
@ 0x4fd5a2d starrocks::PlanFragmentExecutor::open()
@ 0x4f1f9bb starrocks::FragmentExecState::execute()
@ 0x4f261e3 starrocks::FragmentMgr::exec_actual()
@ 0x50ed9b2 starrocks::ThreadPool::dispatch_thread()
@ 0x50e84aa starrocks::Thread::supervise_thread()
@ 0x7f715ea93dc5 start_thread
@ 0x7f715e0b476d __clone
@ 0x0 (unknown)
starrocks_be: rdkafka_broker.c:5702: rd_kafka_broker_add_logical: Assertion `rkb && *"failed to create broker thread"' failed.
*** Aborted at 1680075003 (unix time) try "date -d @1680075003" if you are using GNU date ***
PC: @ 0x7f8d378f4207 __GI_raise
*** SIGABRT (@0x7d10000a58e) received by PID 42382 (TID 0x7f8c6ff86700) from PID 42382; stack trace: ***
@ 0x354c222 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f8d385be5d0 (unknown)
@ 0x7f8d378f4207 __GI_raise
@ 0x7f8d378f58f8 __GI_abort
@ 0x7f8d378ed026 __assert_fail_base
@ 0x7f8d378ed0d2 __GI___assert_fail
@ 0x4713d5e rd_kafka_broker_add_logical
@ 0x475a2ea rd_kafka_cgrp_new
@ 0x46fcfaf rd_kafka_new
@ 0x46e78ff RdKafka::KafkaConsumer::create()
@ 0x1cfdd14 starrocks::KafkaDataConsumer::init()
@ 0x1ca19ce starrocks::DataConsumerPool::get_consumer()
@ 0x2ec7d1a starrocks::RoutineLoadTaskExecutor::get_kafka_partition_offset()
@ 0x1d16075 starrocks::PInternalServiceImpl<>::get_info()
@ 0x36d7cee brpc::policy::ProcessRpcRequest()
@ 0x36ce757 brpc::ProcessInputMessage()
@ 0x36cf603 brpc::InputMessenger::OnNewMessages()
@ 0x377634e brpc::Socket::ProcessEvent()
@ 0x368425f bthread::TaskGroup::task_runner()
@ 0x380cc11 bthread_make_fcontext
*** Aborted at 1675922674 (unix time) try "date -d @1675922674" if you are using GNU date ***
PC: @ 0x7f9632f0d465 __memcpy_ssse3
*** SIGSEGV (@0x7f918d6fe000) received by PID 30379 (TID 0x7f95579e4700) from PID 18446744071787503616; stack trace: ***
@ 0x56ec9c2 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f96343c608b os::Linux::chained_handler()
@ 0x7f96343caacd JVM_handle_linux_signal
@ 0x7f96343bdcd8 signalHandler()
@ 0x7f96338a9630 (unknown)
@ 0x7f9632f0d465 __memcpy_ssse3
@ 0x4cfd328 starrocks::stream_load::OlapTableSink::_print_varchar_error_msg()
@ 0x4cffc09 starrocks::stream_load::OlapTableSink::_validate_data()
@ 0x4d0c093 starrocks::stream_load::OlapTableSink::send_chunk()
@ 0x4d7def9 starrocks::pipeline::OlapTableSinkOperator::push_chunk()
@ 0x2c39826 starrocks::pipeline::PipelineDriver::process()
@ 0x4d8cff7 starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0x47a3a9d starrocks::ThreadPool::dispatch_thread()
@ 0x479e82a starrocks::Thread::supervise_thread()
@ 0x7f96338a1ea5 start_thread
@ 0x7f9632ebc9fd __clone
@ 0x0 (unknown)
[HttpServerHandler.channelRead():70] accept bad request: /api/test/f_l_c_eutrancelltdd_q/_stream_load, error: HTTP header is larger than 8192 bytes.
fe.warn.log:458:com.starrocks.http.HttpRequestException: HTTP header is larger than 8192 bytes