Common Crash / Bug / Optimization Lookup

  1. Primary Key model: Too many versions

The BE log contains entries like the following:

 failed to perform update compaction. res=Not supported: primary key type not support: NONE
  1. Column reuse in a query corrupts memory and causes a crash

*** Aborted at 1697185757 (unix time) try "date -d @1697185757" if you are using GNU date ***
PC: @          0x4282f24 starrocks::TabletManager::find_best_tablet_to_do_update_compaction()
*** SIGSEGV (@0x60) received by PID 117044 (TID 0x7f4bc236a700) from PID 96; stack trace: ***
    @          0x5b97b22 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f51cdfbc5d0 (unknown)
    @          0x4282f24 starrocks::TabletManager::find_best_tablet_to_do_update_compaction()
    @          0x423d536 starrocks::StorageEngine::_perform_update_compaction()
    @          0x44b857e starrocks::StorageEngine::_update_compaction_thread_callback()
    @          0x80a6480 execute_native_thread_routine
    @     0x7f51cdfb4dd5 start_thread
    @     0x7f51cd5cfead __clone
    @                0x0 (unknown)
  • Github Issue:

  • Github Fix PR:

  • Jira

  • Affected versions:

    • 2.5.0 ~ 2.5.13
    • 3.0.0 ~ 3.0.6
    • 3.1.0 ~ 3.1.3
  • Fixed versions:

    • 2.5.14+
    • 3.0.7+
    • 3.1.4+
  • Root cause:

    • Column reuse corrupted memory.
  • Temporary workaround:

  1. A delete condition on an Array column causes a crash

*** SIGSEGV (@0x0) received by PID 99369 (TID 0x7f9f304e7700) from PID 0; stack trace: ***
    @         0x13eb7c32 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f9ff71e3630 (unknown)
    @         0x10ce038a starrocks::get_type_info()
    @         0x10d852ea starrocks::SegmentIterator::_get_row_ranges_by_zone_map()
    @         0x10d7ce97 starrocks::SegmentIterator::_init()
    @         0x10d88bdf starrocks::SegmentIterator::do_get_next()
    @          0xae71c0a starrocks::ChunkIterator::get_next()
    @         0x1270cb45 starrocks::SegmentIteratorWrapper::do_get_next()
    @          0xae71c0a starrocks::ChunkIterator::get_next()
    @         0x1238aba8 starrocks::TimedChunkIterator::do_get_next()
    @          0xae71c0a starrocks::ChunkIterator::get_next()
    @         0x10fc9791 starrocks::TabletReader::do_get_next()
    @          0xae71c0a starrocks::ChunkIterator::get_next()
    @          0xcfb0500 starrocks::pipeline::OlapChunkSource::_read_chunk_from_storage()
    @          0xcfaf422 starrocks::pipeline::OlapChunkSource::_read_chunk()
    @          0xcf9cdbc starrocks::pipeline::ChunkSource::buffer_next_batch_chunks_blocking()
    @          0xbfc9bd4 _ZZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS_12RuntimeStateEiENKUlvE_clEv
    @          0xbfcf452 _ZSt13__invoke_implIvRZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS0_12RuntimeStateEiEUlvE_JEET_St14__invoke_otherOT0_DpOT1_
    @          0xbfcf300 _ZSt10__invoke_rIvRZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS0_12RuntimeStateEiEUlvE_JEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EES8_E4typeEOS9_DpOSA_
    @          0xbfcf175 _ZNSt17_Function_handlerIFvvEZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS1_12RuntimeStateEiEUlvE_E9_M_invokeERKSt9_Any_data
    @          0x971254a std::function<>::operator()()
    @          0xc4f3cf3 starrocks::workgroup::ScanExecutor::worker_thread()
    @          0xc4f3518 _ZZN9starrocks9workgroup12ScanExecutor10initializeEiENKUlvE_clEv
    @          0xc4f54ce _ZSt13__invoke_implIvRZN9starrocks9workgroup12ScanExecutor10initializeEiEUlvE_JEET_St14__invoke_otherOT0_DpOT1_
    @          0xc4f509d _ZSt10__invoke_rIvRZN9starrocks9workgroup12ScanExecutor10initializeEiEUlvE_JEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EES6_E4typeEOS7_DpOS8_
    @          0xc4f4a3a _ZNSt17_Function_handlerIFvvEZN9starrocks9workgroup12ScanExecutor10initializeEiEUlvE_E9_M_invokeERKSt9_Any_data
    @          0x971254a std::function<>::operator()()
    @          0xa154154 starrocks::FunctionRunnable::run()
    @          0xa150ca3 starrocks::ThreadPool::dispatch_thread()
    @          0xa16daee std::__invoke_impl<>()
    @          0xa16d5c1 std::__invoke<>()
    @          0xa16c4ee _ZNSt5_BindIFMN9starrocks10ThreadPoolEFvvEPS1_EE6__callIvJEJLm0EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
  1. Routine Load crashes when loading decimal-type data

*** Aborted at 1695409324 (unix time) try "date -d @1695409324" if you are using GNU date ***
PC: @     0x7f8dc4974387 __GI_raise
*** SIGABRT (@0x3f000021cf6) received by PID 138486 (TID 0x7f8bf307f700) from PID 138486; stack trace: ***
    @          0xba4a762 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f8dc4d1b630 (unknown)
    @     0x7f8dc4974387 __GI_raise
    @     0x7f8dc4975a78 __GI_abort
    @     0x7f8dc496d1a6 __assert_fail_base
    @     0x7f8dc496d252 __GI___assert_fail
    @          0x509278d down_cast<>()
    @          0x7f9f1e0 starrocks::vectorized::ColumnHelper::cast_to_raw<>()
    @          0xa1ea62d starrocks::vectorized::DecimalNonDecimalCast<>::decimal_from()
    @          0xa1e27b4 starrocks::vectorized::DecimalFrom<>::evaluate<>()
    @          0xa1cb195 starrocks::vectorized::UnpackConstColumnUnaryFunction<>::evaluate<>()
    @          0xa162f78 starrocks::vectorized::DealNullableColumnUnaryFunction<>::evaluate<>()
    @          0xa0859d0 starrocks::vectorized::VectorizedCastExpr<>::evaluate()
    @          0x975f31e starrocks::ExprContext::evaluate()
    @          0x975f044 starrocks::ExprContext::evaluate()
    @          0x8b8fd1f starrocks::vectorized::FileScanner::materialize()
    @          0x8048f66 starrocks::vectorized::JsonScanner::get_next()
    @          0x8020bb1 starrocks::vectorized::FileScanNode::_scanner_scan()
    @          0x8021d22 starrocks::vectorized::FileScanNode::_scanner_worker()
    @          0x802ab79 std::__invoke_impl<>()
    @          0x802a906 std::__invoke<>()
    @          0x802a7fd _ZNSt6thread8_InvokerISt5tupleIJMN9starrocks10vectorized12FileScanNodeEFviiEPS4_imEEE9_M_invokeIJLm0ELm1ELm2ELm3EEEEvSt12_Index_tupleIJXspT_EEE
    @          0x802a77e std::thread::_Invoker<>::operator()()
    @          0x802a762 std::thread::_State_impl<>::_M_run()
    @          0xd840410 execute_native_thread_routine
    @     0x7f8dc4d13ea5 start_thread
    @     0x7f8dc4a3cb0d __clone
    @                0x0 (unknown)
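
Each trace in this list opens with an `*** Aborted at <N> (unix time)` header, which is the quickest way to line a crash up against load jobs or queries running at that moment. A minimal sketch for decoding it without GNU `date`, assuming the header format shown in the traces here; the function name `crash_time_utc` is illustrative and not part of any StarRocks tool:

```python
import re
from datetime import datetime, timezone

# Pattern for the failure header that opens each trace in this list.
ABORT_RE = re.compile(r"\*\*\* Aborted at (\d+) \(unix time\)")


def crash_time_utc(trace: str):
    """Return the crash time as a timezone-aware UTC datetime, or None if no header is found."""
    match = ABORT_RE.search(trace)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


# Header copied from the Routine Load trace above.
header = '*** Aborted at 1695409324 (unix time) try "date -d @1695409324" if you are using GNU date ***'
print(crash_time_utc(header).isoformat())
```

The result is in UTC; convert to the cluster's local time zone before comparing it with FE or BE log timestamps.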
  1. SpillToDisk use-after-free causes a crash

This may occur when Spill is enabled:

*** Aborted at 1698028689 (unix time) try "date -d @1698028689" if you are using GNU date ***
PC: @          0x2a6bd20 starrocks::ScopedTimer<>::~ScopedTimer()
*** SIGSEGV (@0x0) received by PID 11471 (TID 0x7fddce6b2700) from PID 0; stack trace: ***
    @          0x62f3702 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7fdea56ac5f0 (unknown)
    @          0x2a6bd20 starrocks::ScopedTimer<>::~ScopedTimer()
    @          0x2d787a4 _ZNSt17_Function_handlerIFvvEZN9starrocks5spill16RawSpillerWriter5flushIRNS2_14IOTaskExecutorERNS2_23ResourceMemTrackerGuardIJSt8weak_ptrINS1_8pipeline12QueryContextEES8_INS2_7SpillerEEEEEEENS1_6StatusEPNS1_12RuntimeStateEOT_OT0_EUlvE0_E9_M_invokeERKSt9_Any_data
    @          0x2aa6881 starrocks::workgroup::ScanExecutor::worker_thread()
    @          0x4f07072 starrocks::ThreadPool::dispatch_thread()
    @          0x4f01b6a starrocks::Thread::supervise_thread()
    @     0x7fdea56a4e65 start_thread
    @     0x7fdea4cbf88d __clone
  1. Join reorder combined with window functions causes a crash

*** Aborted at 1697526858 (unix time) try "date -d @1697526858" if you are using GNU date ***
PC: @     0x7f0f33393387 __GI_raise
*** SIGABRT (@0x2431) received by PID 9265 (TID 0x7f0e5251a700) from PID 9265; stack trace: ***
    @          0x5960be2 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f0f33e48630 (unknown)
    @     0x7f0f33393387 __GI_raise
    @     0x7f0f33394a78 __GI_abort
    @          0x2c8343e starrocks::failure_function()
    @          0x59545bd google::LogMessage::Fail()
    @          0x5956a2f google::LogMessage::SendToLog()
    @          0x595410e google::LogMessage::Flush()
    @          0x5957039 google::LogMessageFatal::~LogMessageFatal()
    @          0x4ea2dc2 _ZN9starrocks20type_dispatch_columnINS_10vectorized13ColumnBuilderEJNS_14TypeDescriptorEmEEEDaNS_13PrimitiveTypeET_DpT0_
    @          0x4ea028b starrocks::vectorized::ColumnHelper::create_column()
    @          0x51cecf0 starrocks::serde::ProtobufChunkDeserializer::deserialize()
    @          0x487e470 starrocks::DataStreamRecvr::SenderQueue::_deserialize_chunk()
    @          0x488178a starrocks::DataStreamRecvr::PipelineSenderQueue::get_chunk()
    @          0x47f9e03 starrocks::DataStreamRecvr::get_chunk_for_pipeline()
    @          0x2fb035a starrocks::pipeline::ExchangeSourceOperator::pull_chunk()
    @          0x2d1dc30 starrocks::pipeline::PipelineDriver::process()
    @          0x4f91993 starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x4983a52 starrocks::ThreadPool::dispatch_thread()
    @          0x497e54a starrocks::Thread::supervise_thread()
    @     0x7f0f33e40ea5 start_thread
    @     0x7f0f3345bb0d __clone
    @                0x0 (unknown)

build chunk meta error

Question: can this bug only be resolved by upgrading? On 3.0.1 the FE now fails to start after a restart.

Fixed in 3.0.7 and 3.1.4.

  1. smp_call_function_many consumes a large amount of CPU

A suspected deadlock is also observed.

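  • Affected versions
    • All versions that use jemalloc: >= 2.4
  • Workaround
    • Edit be/bin/start_backend.sh and change muzzy_decay_ms:5000,dirty_decay_ms:5000 to muzzy_decay_ms:30000,dirty_decay_ms:30000
    • This parameter change only mitigates the problem and does not eliminate it; a proper fix still needs systematic investigation.
    • Disable NUMA in Linux and reboot the machine.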
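
The decay-setting change above can be scripted when many BE nodes need the same edit. The sketch below only rewrites the exact substring named in the workaround. The sample line, including the variable name and the other option on it, is hypothetical and used only for illustration, since the real line in `start_backend.sh` may differ between versions:

```python
OLD = "muzzy_decay_ms:5000,dirty_decay_ms:5000"
NEW = "muzzy_decay_ms:30000,dirty_decay_ms:30000"


def raise_decay(script_text: str) -> str:
    """Return script_text with the jemalloc decay settings raised from 5000 ms to 30000 ms."""
    if OLD not in script_text:
        # Refuse to guess: the script may already be patched or may use other values.
        raise ValueError("expected decay settings not found; edit the script by hand")
    return script_text.replace(OLD, NEW)


# Hypothetical line for illustration only; check the actual start_backend.sh.
sample = 'export JEMALLOC_CONF="percpu_arena:percpu,muzzy_decay_ms:5000,dirty_decay_ms:5000"'
print(raise_decay(sample))
```

To apply it to a real file, read `be/bin/start_backend.sh` into a string, keep a backup copy, write the returned text back, and restart the BE so the new settings take effect.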
  1. collect_query_statistics crash

Crash while collecting post-execution query statistics (such as ScanBytes).

*** Aborted at 1698221708 (unix time) try "date -d @1698221708" if you are using GNU date ***
PC: @          0x32acb84 starrocks::pipeline::QueryContextManager::collect_query_statistics()
*** SIGSEGV (@0x38) received by PID 30662 (TID 0x7f1b00698700) from PID 56; stack trace: ***
    @          0x6641b82 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f1f016c394b os::Linux::chained_handler()
    @     0x7f1f016c8b6d JVM_handle_linux_signal
    @     0x7f1f016bc1b8 signalHandler()
    @     0x7f1f00855630 (unknown)
    @          0x32acb84 starrocks::pipeline::QueryContextManager::collect_query_statistics()
    @          0x5da60d4 starrocks::PInternalServiceImplBase<>::collect_query_statistics()
    @          0x6874ccd brpc::policy::ProcessRpcRequest()
    @          0x6955797 brpc::ProcessInputMessage()
    @          0x695666b brpc::InputMessenger::OnNewMessages()
    @          0x67fb74e brpc::Socket::ProcessEvent()
    @          0x67a061f bthread::TaskGroup::task_runner()
    @          0x68e3e91 bthread_make_fcontext
  1. Global Runtime Filter crash

*** Aborted at 1700708505 (unix time) try "date -d @1700708505" if you are using GNU date ***
PC: @     0x7fd3a38b36a6 __memcpy_ssse3_back
*** SIGSEGV (@0x1ffff0) received by PID 3045 (TID 0x7fd329f54700) from PID 2097136; stack trace: ***
    @          0x63a11c2 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7fd3a52a1cfb os::Linux::chained_handler()
    @     0x7fd3a52a686e JVM_handle_linux_signal
    @     0x7fd3a529aad8 signalHandler()
    @     0x7fd3a4462630 (unknown)
    @     0x7fd3a38b36a6 __memcpy_ssse3_back
    @          0x43ecae9 starrocks::JoinRuntimeFilter::serialize()
    @          0x2c41066 starrocks::RuntimeBloomFilter<>::serialize()
    @          0x43cf16d starrocks::RuntimeFilterHelper::serialize_runtime_filter()
    @          0x4e32e1b starrocks::RuntimeFilterPort::publish_runtime_filters()
    @          0x2e5b879 starrocks::pipeline::HashJoinBuildOperator::set_finishing()
    @          0x2ac99e7 starrocks::pipeline::PipelineDriver::_mark_operator_finishing()
    @          0x2acb8fc starrocks::pipeline::PipelineDriver::process()
    @          0x56a4e5e starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x4f8afa2 starrocks::ThreadPool::dispatch_thread()
    @          0x4f85a3a starrocks::Thread::supervise_thread()
    @     0x7fd3a445aea5 start_thread
    @     0x7fd3a385bb0d __clone
    @                0x0 (unknown)

*** Aborted at 1688607906 (unix time) try "date -d @1688607906" if you are using GNU date ***
PC: @     0x7fcb2a79aa19 __memmove_avx_unaligned_erms
*** SIGSEGV (@0x7f8a86400000) received by PID 10757 (TID 0x7fca8dba8700) from PID 18446744071666925568; stack trace: ***
    @          0x5824342 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7fcb2b19fc20 (unknown)
    @     0x7fcb2a79aa19 __memmove_avx_unaligned_erms
    @          0x3e953d6 starrocks::vectorized::JoinRuntimeFilter::serialize()
    @          0x3e6a213 starrocks::vectorized::RuntimeBloomFilter<>::serialize()
    @          0x3e60b12 starrocks::vectorized::RuntimeFilterHelper::serialize_runtime_filter()
    @          0x47a1a20 starrocks::RuntimeFilterPort::publish_runtime_filters()
    @          0x2fd8afb starrocks::pipeline::HashJoinBuildOperator::set_finishing()
    @          0x2ca6bb9 starrocks::pipeline::PipelineDriver::_mark_operator_finishing()
    @          0x2ca6c69 starrocks::pipeline::PipelineDriver::_mark_operator_finished()
    @          0x2ca722b starrocks::pipeline::PipelineDriver::_mark_operator_cancelled()
    @          0x2ca780a starrocks::pipeline::PipelineDriver::cancel_operators()
    @          0x4ec3e4f starrocks::pipeline::PipelineDriverPoller::run_internal()
    @          0x48be68a starrocks::Thread::supervise_thread()
    @     0x7fcb2b19517a start_thread
    @     0x7fcb2a736df3 __GI___clone
    @                0x0 (unknown)
  1. remove_expired_versions crash

*** Aborted at 1699314386 (unix time) try "date -d @1699314386" if you are using GNU date ***
PC: @          0x8046b03 std::_Rb_tree_rebalance_for_erase()
*** SIGSEGV (@0x0) received by PID 1950245 (TID 0x7fc7ae8fd700) from PID 0; stack trace: ***
    @          0x5ba1a42 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7fc9499b07fb os::Linux::chained_handler()
    @     0x7fc9499b54bd JVM_handle_linux_signal
    @     0x7fc9499a7e78 signalHandler()
    @     0x7fc9486a9cf0 (unknown)
    @          0x8046b03 std::_Rb_tree_rebalance_for_erase()
    @          0x42db369 starrocks::TabletUpdates::remove_expired_versions()
    @          0x4294fa0 starrocks::TabletManager::start_trash_sweep()
    @          0x424aefb starrocks::StorageEngine::_start_trash_sweep()
    @          0x44c3620 starrocks::StorageEngine::_garbage_sweeper_thread_callback()
    @          0x80b4f20 execute_native_thread_routine
    @     0x7fc94869f1ca start_thread
    @     0x7fc94830be73 __GI___clone
    @                0x0 (unknown) 

*** Aborted at 1693450540 (unix time) try "date -d @1693450540" if you are using GNU date ***
PC: @          0x459fa2d starrocks::ParsedPageV2::read()
*** SIGSEGV (@0xfffffffffffffffc) received by PID 1205872 (TID 0x7feaec60f700) from PID 18446744073709551612; stack trace: ***
    @          0x5960be2 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7fec5d00ec20 (unknown)
    @          0x459fa2d starrocks::ParsedPageV2::read()
    @          0x4573fcb starrocks::ScalarColumnIterator::next_batch()
    @          0x4164573 starrocks::vectorized::SegmentIterator::_read()
    @          0x415ad8c starrocks::vectorized::SegmentIterator::_do_get_next()
    @          0x415e380 starrocks::vectorized::SegmentIterator::do_get_next()
    @          0x41e2ab2 starrocks::vectorized::ProjectionIterator::do_get_next()
    @          0x4786ee5 starrocks::SegmentIteratorWrapper::do_get_next()
    @          0x45b6723 starrocks::vectorized::TimedChunkIterator::do_get_next()
    @          0x420b30e starrocks::vectorized::TabletReader::do_get_next()
    @          0x2fd2dfb starrocks::pipeline::OlapChunkSource::_read_chunk_from_storage()
    @          0x2fd34db starrocks::pipeline::OlapChunkSource::_read_chunk()
    @          0x2fc2dcf starrocks::pipeline::ChunkSource::buffer_next_batch_chunks_blocking()
    @          0x2d410b4 _ZZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS_12RuntimeStateEiENKUlvE_clEv
    @          0x2d521ae starrocks::workgroup::ScanExecutor::worker_thread()
    @          0x4983a52 starrocks::ThreadPool::dispatch_thread()
    @          0x497e54a starrocks::Thread::supervise_thread()
    @     0x7fec5d00417a start_thread
    @     0x7fec5c5a5dc3 __GI___clone
    @                0x0 (unknown)
  1. Spill crash

*** Aborted at 1701857859 (unix time) try "date -d @1701857859" if you are using GNU date ***
PC: @          0x2e10aa2 _ZNSt17_Function_handlerIFN9starrocks6StatusEvEZZNS0_8pipeline34SpillablePartitionSortSinkOperator13set_finishingEPNS0_12RuntimeStateEENKUlS6_T_E0_clISt10shared_ptrINS0_5spill14IOTaskExecutorEEEEDaS6_S7_EUlvE_E9_M_invokeERKSt9_Any_data
*** SIGSEGV (@0x0) received by PID 2438961 (TID 0x7f37bb86d700) from PID 0; stack trace: ***
    @          0x6387fa2 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f39722bacf0 (unknown)
    @          0x2e10aa2 _ZNSt17_Function_handlerIFN9starrocks6StatusEvEZZNS0_8pipeline34SpillablePartitionSortSinkOperator13set_finishingEPNS0_12RuntimeStateEENKUlS6_T_E0_clISt10shared_ptrINS0_5spill14IOTaskExecutorEEEEDaS6_S7_EUlvE_E9_M_invokeERKSt9_Any_data
    @          0x2dde13d _ZNSt17_Function_handlerIFN9starrocks6StatusEvEZNS0_5spill7Spiller23set_flush_all_call_backINS3_23ResourceMemTrackerGuardIJSt8weak_ptrINS0_8pipeline12QueryContextEES7_IS4_EEEEEES1_RKSt8functionIS2_EPNS0_12RuntimeStateERNS3_14IOTaskExecutorERKT_EUlvE_E9_M_invokeERKSt9_Any_data
    @          0x2e773a5 starrocks::spill::SpillerWriter::_decrease_running_flush_tasks()
    @          0x2dd1f27 _ZZZN9starrocks5spill16RawSpillerWriter5flushIRNS0_14IOTaskExecutorERNS0_23ResourceMemTrackerGuardIJSt8weak_ptrINS_8pipeline12QueryContextEES6_INS0_7SpillerEEEEEEENS_6StatusEPNS_12RuntimeStateEOT_OT0_ENKUlvE0_clEvENKUlvE1_clEv
    @          0x2dd5ad7 _ZNSt17_Function_handlerIFvvEZN9starrocks5spill16RawSpillerWriter5flushIRNS2_14IOTaskExecutorERNS2_23ResourceMemTrackerGuardIJSt8weak_ptrINS1_8pipeline12QueryContextEES8_INS2_7SpillerEEEEEEENS1_6StatusEPNS1_12RuntimeStateEOT_OT0_EUlvE0_E9_M_invokeERKSt9_Any_data
    @          0x2b017b1 starrocks::workgroup::ScanExecutor::worker_thread()
    @          0x4f782b2 starrocks::ThreadPool::dispatch_thread()
    @          0x4f72d4a starrocks::Thread::supervise_thread()
    @     0x7f39722b01ca start_thread
    @     0x7f3971576e73 __GI___clone
    @                0x0 (unknown)
  1. get_tablet_stats crash in shared-data (storage-compute separation) mode

*** Aborted at 1692934579 (unix time) try "date -d @1692934579" if you are using GNU date ***
PC: @          0x95af4bd bthread_mutex_unlock
*** SIGSEGV (@0x1) received by PID 2750248 (TID 0x7fb460dff640) from PID 1; stack trace: ***
    @          0x943287a google::(anonymous namespace)::FailureSignalHandler()
    @     0x7fb4e1842520 (unknown)
    @          0x95af4bd bthread_mutex_unlock
    @          0x848a49a _ZNSt17_Function_handlerIFvvEZN9starrocks15LakeServiceImpl16get_tablet_statsEPN6google8protobuf13RpcControllerEPKNS1_4lake17TabletStatRequestEPNS7_18TabletStatResponseEPNS4_7ClosureEEUlvE_E9_M_invokeERKSt9_Any_data
    @          0x85b429b starrocks::ThreadPool::dispatch_thread()
    @          0x85ae62a starrocks::Thread::supervise_thread()
    @     0x7fb4e1894b43 (unknown)
    @     0x7fb4e1926a00 (unknown)
    @                0x0 (unknown)
  1. Spill crash

*** Aborted at 1702867669 (unix time) try "date -d @1702867669" if you are using GNU date ***
PC: @          0x32d6d38 starrocks::MergeCursorsCascade::init()
*** SIGSEGV (@0x0) received by PID 15893 (TID 0x7fdc0f5bf700) from PID 0; stack trace: ***
    @          0x6240182 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7fde620c8630 (unknown)
    @          0x32d6d38 starrocks::MergeCursorsCascade::init()
    @          0x4f127db starrocks::CascadeChunkMerger::init()
    @          0x346548d starrocks::spill::OrderedInputStream::init()
    @          0x3465c1e starrocks::spill::BlockGroup::as_ordered_stream()
    @          0x34567f8 starrocks::spill::RawSpillerWriter::acquire_stream()
    @          0x345d2d5 starrocks::spill::Spiller::_acquire_input_stream()
    @          0x33fa3cd _ZNSt17_Function_handlerIFN9starrocks6StatusEvEZNS0_5spill7Spiller23set_flush_all_call_backERKSt8functionIS2_EPNS0_12RuntimeStateERNS3_14IOTaskExecutorERKNS3_15MemTrackerGuardEEUlvE_E9_M_invokeERKSt9_Any_data
    @          0x3456395 starrocks::spill::SpillerWriter::_decrease_running_flush_tasks()
    @          0x345b546 starrocks::spill::RawSpillerWriter::set_flush_all_call_back()
    @          0x340e31d _ZZN9starrocks8pipeline38SpillableAggregateBlockingSinkOperator13set_finishingEPNS_12RuntimeStateEENKUlS3_T_E0_clISt10shared_ptrINS_5spill14IOTaskExecutorEEEEDaS3_S4_
    @          0x340f162 starrocks::pipeline::SpillableAggregateBlockingSinkOperator::set_finishing()
    @          0x30c6817 starrocks::pipeline::PipelineDriver::_mark_operator_finishing()
    @          0x30c6929 starrocks::pipeline::PipelineDriver::_mark_operator_finished()
    @          0x30c6fdb starrocks::pipeline::PipelineDriver::_mark_operator_cancelled()
    @          0x30c769a starrocks::pipeline::PipelineDriver::cancel_operators()
    @          0x56fc80f starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x506b022 starrocks::ThreadPool::dispatch_thread()
    @          0x5065b1a starrocks::Thread::supervise_thread()
    @     0x7fde620c0ea5 start_thread
    @     0x7fde616dbb0d __clone
    @                0x0 (unknown)

*** Aborted at 1688627331 (unix time) try "date -d @1688627331" if you are using GNU date ***
PC: @     0x7f7b534e69d5 __GI_raise
*** SIGABRT (@0x3e900237c64) received by PID 2325604 (TID 0x7f7a69fef640) from PID 2325604; stack trace: ***
    @          0x59bd9c2 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f7b5369b1d0 (unknown)
    @     0x7f7b534e69d5 __GI_raise
    @     0x7f7b534cf894 __GI_abort
    @          0x237c0d2 _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
    @          0x787cad6 __cxxabiv1::__terminate()
    @          0x787cb41 std::terminate()
    @          0x787d1ff __cxa_pure_virtual
    @          0x2997fab starrocks::spill::RawSpillerWriter::acquire_stream()
    @          0x28f7603 _ZZN9starrocks5spill24PartitionedSpillerWriter5flushIRNS0_14IOTaskExecutorERNS0_23ResourceMemTrackerGuardIJSt8weak_ptrINS_8pipeline12QueryContextEEEEEEENS_6StatusEPNS_12RuntimeStateEOT_OT0_ENKUlvE0_clEv
    @          0x28f8c84 _ZNSt17_Function_handlerIFvvEZN9starrocks5spill24PartitionedSpillerWriter5flushIRNS2_14IOTaskExecutorERNS2_23ResourceMemTrackerGuardIJSt8weak_ptrINS1_8pipeline12QueryContextEEEEEEENS1_6StatusEPNS1_12RuntimeStateEOT_OT0_EUlvE0_E9_M_invokeERKSt9_Any_data
    @          0x2624271 starrocks::workgroup::ScanExecutor::worker_thread()
    @          0x488f0f2 starrocks::ThreadPool::dispatch_thread()
    @          0x4889bea starrocks::Thread::supervise_thread()
    @     0x7f7b536903fb start_thread
    @     0x7f7b535acc23 __GI___clone
    @                0x0 (unknown)
  1. Agg Crash

*** Aborted at 1702555022 (unix time) try "date -d @1702555022" if you are using GNU date ***
PC: @          0x26ca130 starrocks::vectorized::TDistinctAggregateFunction<>::merge()
*** SIGSEGV (@0x2ad195326000) received by PID 1513 (TID 0x2ad102607700) from PID 18446744071917690880; stack trace: ***
    @          0x4875742 google::(anonymous namespace)::FailureSignalHandler()
    @     0x2ad0cd8e2630 (unknown)
    @          0x26ca130 starrocks::vectorized::TDistinctAggregateFunction<>::merge()
    @          0x2501d23 starrocks::vectorized::NullableAggregateFunctionBase<>::merge_batch()
    @          0x228c98d starrocks::Aggregator::compute_batch_agg_states()
    @          0x2232738 starrocks::pipeline::AggregateBlockingSinkOperator::push_chunk()
    @          0x1e5c71c starrocks::pipeline::PipelineDriver::process()
    @          0x3ce10ad starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x37a7915 starrocks::ThreadPool::dispatch_thread()
    @          0x37a2e3a starrocks::Thread::supervise_thread()
    @     0x2ad0cd8daea5 start_thread
    @     0x2ad0ce2fb96d __clone
    @                0x0 (unknown)
  1. HashJoin crash

*** Aborted at 1702536561 (unix time) try "date -d @1702536561" if you are using GNU date ***
PC: @          0x2d44521 starrocks::vectorized::FixedLengthColumnBase<>::serialize()
*** SIGSEGV (@0x2c126d41b000) received by PID 23691 (TID 0x2b7c47dea700) from PID 1833021440; stack trace: ***
    @          0x5bfc642 google::(anonymous namespace)::FailureSignalHandler()
    @     0x2b7a336657fb os::Linux::chained_handler()
    @     0x2b7a3366a4bd JVM_handle_linux_signal
    @     0x2b7a3365ce78 signalHandler()
    @     0x2b7a33d59630 (unknown)
    @          0x2d44521 starrocks::vectorized::FixedLengthColumnBase<>::serialize()
    @          0x2e5a7e8 starrocks::vectorized::SerializedJoinBuildFunc::_build_columns()
    @          0x2e5dbf2 starrocks::vectorized::SerializedJoinBuildFunc::construct_hash_table()
    @          0x2e5f381 starrocks::vectorized::JoinHashTable::build()
    @          0x322a187 starrocks::vectorized::HashJoiner::_build()
    @          0x322a30c starrocks::vectorized::HashJoiner::build_ht()
    @          0x30dbc8c starrocks::pipeline::HashJoinBuildOperator::set_finishing()
    @          0x2da6e59 starrocks::pipeline::PipelineDriver::_mark_operator_finishing()
    @          0x2da6f09 starrocks::pipeline::PipelineDriver::_mark_operator_finished()
    @          0x2da74fb starrocks::pipeline::PipelineDriver::_mark_operator_cancelled()
    @          0x2da7aea starrocks::pipeline::PipelineDriver::cancel_operators()
    @          0x51d3ca7 starrocks::pipeline::PipelineDriverPoller::run_internal()
    @          0x4bb586a starrocks::Thread::supervise_thread()
    @     0x2b7a33d51ea5 start_thread
    @     0x2b7a34772b0d __clone
    @                0x0 (unknown)
  • Github Issue:
  • Github Fix PR:
  • Jira
  • Affected versions:
    • 2.3.0 ~ 2.3.18
    • 2.4.0 ~ latest
    • 2.5.0 ~ 2.5.16
    • 3.0.0 ~ 3.0.6
    • 3.1.0 ~ 3.1.3
  • Fixed versions:
    • 2.3.19+
    • 2.4: not fixed
    • 2.5.17+
    • 3.0.7+
    • 3.1.4+
  • Root cause:
  • Temporary workaround:
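
To check whether a given BE version falls inside the affected ranges listed for this HashJoin crash, the ranges can be encoded per release branch. This is a sketch under two assumptions: version strings are plain dotted numbers such as `2.5.16`, and "2.4.0 ~ latest" means every 2.4.x release is affected because no fix is listed for that branch. The names `AFFECTED` and `is_affected` are illustrative:

```python
def parse(version: str):
    """Turn '2.5.16' into (2, 5, 16) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# (first affected, last affected) per branch, copied from the list above.
# None means no fixed release is listed for that branch.
AFFECTED = [
    ("2.3.0", "2.3.18"),
    ("2.4.0", None),
    ("2.5.0", "2.5.16"),
    ("3.0.0", "3.0.6"),
    ("3.1.0", "3.1.3"),
]


def is_affected(version: str) -> bool:
    """Return True if version lies in a listed affected range on its own branch."""
    v = parse(version)
    for low, high in AFFECTED:
        lo = parse(low)
        if v[:2] != lo[:2]:
            continue  # only compare within the same major.minor branch
        return v >= lo and (high is None or v <= parse(high))
    return False  # branch not listed here; this is "unknown", not "known safe"


for candidate in ("2.5.16", "2.5.17", "2.4.9", "3.1.4"):
    print(candidate, is_affected(candidate))
```

Matching on the major.minor branch first keeps an open-ended range such as 2.4 from swallowing later branches like 2.5.17 or 3.0.7.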
  1. _column_index_mem_usage crash in shared-data deployments

*** Aborted at 1703239184 (unix time) try "date -d @1703239184" if you are using GNU date ***
PC: @          0x4aab274 starrocks::ColumnReader::mem_usage()
*** SIGSEGV (@0x0) received by PID 13993 (TID 0x7f570561f700) from PID 0; stack trace: ***
    @          0x6387fa2 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f5793e6c630 (unknown)
    @          0x4aab274 starrocks::ColumnReader::mem_usage()
    @          0x463d07d starrocks::Segment::_column_index_mem_usage()
    @          0x47a9226 starrocks::lake::TabletManager::cache_segment()
    @          0x47a92f3 starrocks::lake::TabletManager::update_segment_cache_size()
    @          0x4641611 starrocks::Segment::open()
    @          0x47a0f18 starrocks::lake::Tablet::load_segment()
    @          0x4792f76 starrocks::lake::Rowset::load_segments()
    @          0x4794fee starrocks::lake::Rowset::read()
    @          0x47b49b5 starrocks::lake::TabletReader::get_segment_iterators()
    @          0x47b5a41 starrocks::lake::TabletReader::init_collector()
    @          0x47b720b starrocks::lake::TabletReader::open()
    @          0x55c6e84 starrocks::connector::LakeDataSource::init_tablet_reader()
    @          0x55c7844 starrocks::connector::LakeDataSource::open()
    @          0x2afb401 starrocks::pipeline::ConnectorChunkSource::_open_data_source()
    @          0x2afc4e1 starrocks::pipeline::ConnectorChunkSource::_read_chunk()
    @          0x2db0867 starrocks::pipeline::ChunkSource::buffer_next_batch_chunks_blocking()
    @          0x2af8d0b _ZZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS_12RuntimeStateEiENKUlvE_clEv
    @          0x2b017b1 starrocks::workgroup::ScanExecutor::worker_thread()
    @          0x4f782b2 starrocks::ThreadPool::dispatch_thread()
    @          0x4f72d4a starrocks::Thread::supervise_thread()
    @     0x7f5793e64ea5 start_thread
    @     0x7f5793265b0d __clone
    @                0x0 (unknown)

Is there a PR for this?

  1. array_overlap crash

*** Aborted at 1705652227 (unix time) try "date -d @1705652227" if you are using GNU date ***
PC: @          0x4dbc078 starrocks::vectorized::ArrayColumn::get()
*** SIGSEGV (@0x1010) received by PID 10923 (TID 0x7f0b95ff0700) from PID 4112; stack trace: ***
    @          0x5824342 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f0e366013c2 os::Linux::chained_handler()
    @     0x7f0e36608196 JVM_handle_linux_signal
    @     0x7f0e365fe253 signalHandler()
    @     0x7f0e35ac9630 (unknown)
    @          0x4dbc078 starrocks::vectorized::ArrayColumn::get()
    @          0x4d2e38e starrocks::vectorized::ArrayOverlap<>::_array_overlap_item<>()
    @          0x4d2ecd9 starrocks::vectorized::ArrayOverlap<>::_array_overlap<>()
    @          0x4d2f313 starrocks::vectorized::ArrayFunctions::array_overlap_varchar()
    @          0x3ddde57 starrocks::vectorized::VectorizedFunctionCallExpr::evaluate()
    @          0x37c586e starrocks::ExprContext::evaluate()
    @          0x2cde821 starrocks::eager_prune_eval_conjuncts()
    @          0x2ce05d6 starrocks::ExecNode::eval_conjuncts()
    @          0x2f5c505 starrocks::pipeline::OlapChunkSource::_read_chunk_from_storage()
    @          0x2f5c96b starrocks::pipeline::OlapChunkSource::_read_chunk()
    @          0x2f4c34c starrocks::pipeline::ChunkSource::buffer_next_batch_chunks_blocking()
    @          0x2ccaca4 _ZZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS_12RuntimeStateEiENKUlvE_clEv
    @          0x2cdbd5e starrocks::workgroup::ScanExecutor::worker_thread()
    @          0x48c3b92 starrocks::ThreadPool::dispatch_thread()
    @          0x48be68a starrocks::Thread::supervise_thread()
    @     0x7f0e35ac1ea5 start_thread
    @     0x7f0e350dcb0d __clone
    @                0x0 (unknown)
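
Since this list is meant for lookup, a new BE crash trace can be screened against the frames recorded above before reading the entries by hand. The sketch below uses a small, non-exhaustive table of frame substrings copied from the traces in this list; the labels are shortened English summaries of the entries, and both the table and the function name `match_known_crashes` are illustrative:

```python
from typing import List

# (frame substring, entry label). Substrings are copied from traces in this list.
KNOWN_SIGNATURES = [
    ("SegmentIterator::_get_row_ranges_by_zone_map", "delete condition on an Array column"),
    ("DecimalNonDecimalCast", "Routine Load with decimal type"),
    ("QueryContextManager::collect_query_statistics", "collect_query_statistics crash"),
    ("JoinRuntimeFilter::serialize", "Global Runtime Filter crash"),
    ("TabletUpdates::remove_expired_versions", "remove_expired_versions crash"),
    ("SerializedJoinBuildFunc::_build_columns", "HashJoin crash"),
    ("ColumnReader::mem_usage", "_column_index_mem_usage crash"),
    ("ArrayOverlap", "array_overlap crash"),
]


def match_known_crashes(trace: str) -> List[str]:
    """Return the labels of all entries whose signature substring appears in the trace."""
    return [label for signature, label in KNOWN_SIGNATURES if signature in trace]


# Two frames copied from the array_overlap trace above.
sample_trace = """\
    @          0x4dbc078 starrocks::vectorized::ArrayColumn::get()
    @          0x4d2e38e starrocks::vectorized::ArrayOverlap<>::_array_overlap_item<>()
"""
print(match_known_crashes(sample_trace))
```

A substring hit is only a hint: unrelated bugs can pass through the same frames, so confirm against the full trace and the affected versions of the matching entry.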