Common Crash / Bug / Optimization Lookup

  1. FE startup failure: audit plugin fails to load

 FE start failed: java.lang.NoClassDefFoundError: com/starrocks/plugin/audit/AuditLoaderPlugin$AuditLoaderConf
  1. FE startup failure: failed to load journal type 10081

2023-09-16 12:02:04,536 WARN (replayer|66) [GlobalStateMgr.replayJournalInner():1963] catch exception when replaying 87,com.starrocks.journal.JournalInconsistentException: failed to load journal type 10081
        at com.starrocks.persist.EditLog.loadJournal(EditLog.java:967) ~[starrocks-fe.jar:?]
        at com.starrocks.server.GlobalStateMgr.replayJournalInner(GlobalStateMgr.java:1952) ~[starrocks-fe.jar:?]
        at com.starrocks.server.GlobalStateMgr$5.runOneCycle(GlobalStateMgr.java:1809) ~[starrocks-fe.jar:?]
        at com.starrocks.common.util.Daemon.run(Daemon.java:115) ~[starrocks-fe.jar:?]
        at com.starrocks.server.GlobalStateMgr$5.run(GlobalStateMgr.java:1874) ~[starrocks-fe.jar:?]
Caused by: java.lang.NullPointerException
        at com.starrocks.scheduler.TaskRunBuilder.build(TaskRunBuilder.java:37) ~[starrocks-fe.jar:?]
        at com.starrocks.scheduler.TaskManager.replayCreateTaskRun(TaskManager.java:597) ~[starrocks-fe.jar:?]
        at com.starrocks.persist.EditLog.loadJournal(EditLog.java:669) ~[starrocks-fe.jar:?]
        ... 4 more
  • Github Issue:

  • Github Fix PR:

  • Jira

  • Affected versions:

    • 2.5.0 ~ 2.5.12
    • 3.0.0 ~ 3.0.6
    • 3.1.0 ~ 3.1.3
  • Fixed versions:

    • 2.5.13+
    • 3.0.7+
    • 3.1.4+
  • Root cause:

    • A field was added to Task, and its default value after deserialization is null.
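
The root cause above (a newly added field coming back as null after deserialization) can be illustrated with a minimal Python sketch. This is not the StarRocks code: the `TaskRecord` class, the `source` field, and the `"UNKNOWN"` default are all invented for illustration. It only shows why a record written before the field existed deserializes to a null value, and one way a reader could guard against it.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRecord:
    """Hypothetical stand-in for a persisted task; not the StarRocks class."""
    name: str
    # Field added in a later release; records written earlier do not carry it.
    source: Optional[str] = None


def load_task(raw: str, default_source: str = "UNKNOWN") -> TaskRecord:
    """Deserialize a record, substituting a default for an absent or null field."""
    data = json.loads(raw)
    source = data.get("source")
    if source is None:
        # Without this guard, later code that dereferences `source` would fail.
        source = default_source
    return TaskRecord(name=data["name"], source=source)


old_record = '{"name": "refresh_mv"}'                  # written before the field existed
new_record = '{"name": "refresh_mv", "source": "MV"}'  # written after the field was added

print(load_task(old_record))
print(load_task(new_record))
```

The guard lives in the loading path so that every consumer sees a non-null value, regardless of which release wrote the record.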
  1. BE uses too much memory loading metadata at startup

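While the BE starts up, loading metadata consumes too much memory and startup takes a long time. Once startup succeeds, memory usage drops back down.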

  1. After a table is restored, the 3 replicas are inconsistent

Some replicas may be empty.

  1. Window function crash

*** Aborted at 1694398352 (unix time) try "date -d @1694398352" if you are using GNU date ***
PC: @     0x7f5d104cff83 __memmove_avx_unaligned_erms
*** SIGSEGV (@0x7f5ce63ff000) received by PID 221675 (TID 0x7f5a8d70c700) from PID 18446744073277534208; stack trace: ***
    @          0x5b1ba42 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f5d11412ce0 (unknown)
    @     0x7f5d104cff83 __memmove_avx_unaligned_erms
    @          0x2d41ce4 starrocks::vectorized::FixedLengthColumnBase<>::remove_first_n_values()
    @          0x3071e03 (unknown)
    @          0x3077bf0 starrocks::pipeline::LocalPartitionTopnContext::push_one_chunk_to_partitioner()
    @          0x3052899 starrocks::pipeline::LocalPartitionTopnSinkOperator::push_chunk()
    @          0x2d7bace starrocks::pipeline::PipelineDriver::process()
    @          0x51333fa starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x4b17352 starrocks::ThreadPool::dispatch_thread()
    @          0x4b11e4a starrocks::Thread::supervise_thread()
    @     0x7f5d114081cf start_thread
    @     0x7f5d10439d83 __GI___clone
    @                0x0 (unknown)
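
The "Aborted at ... (unix time)" line at the top of a crash dump gives the crash time as an epoch value. Where GNU `date` is not at hand, a small Python sketch can convert it. The helper name `crash_time_utc` is made up for this example, and the output is in UTC, whereas `date -d` prints local time.

```python
import re
from datetime import datetime, timezone

# Matches the header glog prints at the top of a crash dump.
ABORT_LINE = re.compile(r"\*\*\* Aborted at (\d+) \(unix time\)")


def crash_time_utc(line: str):
    """Return the crash time as an aware UTC datetime, or None if the line does not match."""
    match = ABORT_LINE.search(line)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


header = '*** Aborted at 1694398352 (unix time) try "date -d @1694398352" if you are using GNU date ***'
print(crash_time_utc(header).isoformat())  # 2023-09-11T02:12:32+00:00
```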
  1. Date_diff function crash

*** SIGSEGV (@0x0) received by PID 135714 (TID 0x7fdd8bbfd700) from PID 0; stack trace: ***
    @          0x6e57cb2 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7fdee909c630 (unknown)
    @          0x5223dd2 starrocks::TimeFunctions::datediff()
    @          0x50c2de4 starrocks::VectorizedFunctionCallExpr::evaluate_checked()
    @          0x482e1d3 starrocks::ExprContext::evaluate()
    @          0x482e51f starrocks::ExprContext::evaluate()
    @          0x3e2bed4 starrocks::pipeline::ProjectOperator::push_chunk()
    @          0x3eea3e4 starrocks::pipeline::PipelineDriver::process()
    @          0x3ed892e starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x5fe477a starrocks::ThreadPool::dispatch_thread()
    @          0x5fdeaca starrocks::Thread::supervise_thread()
    @     0x7fdee9094ea5 start_thread
    @     0x7fdee86afb0d __clone
    @                0x0 (unknown)
  1. Column reuse during Union causes random crashes

PC: @          0x1a06303 starrocks::TabletMeta::max_version()
*** SIGSEGV (@0x8) received by PID 126209 (TID 0x7f6172bc0700) from PID 8; stack trace: ***
    @          0x3db9592 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f62944dd5d0 (unknown)
    @          0x1a06303 starrocks::TabletMeta::max_version()
    @          0x19e241a starrocks::Tablet::rowset_with_max_version()
    @          0x19e5073 starrocks::Tablet::can_do_compaction()
    @          0x19f7898 starrocks::TabletManager::find_best_tablet_to_compaction()
    @          0x19d2b35 starrocks::StorageEngine::_perform_cumulative_compaction()
    @          0x19bf09d starrocks::StorageEngine::_cumulative_compaction_thread_callback()
    @          0x57f1810 execute_native_thread_routine
    @     0x7f62944d5dd5 start_thread
    @     0x7f6293af0ead __clone
    @                0x0 (unknown)
  1. BE stream load deadlock

Run admin execute on 10004 'System.print(ExecEnv.get_stack_trace_for_all_threads())'; and look for typical stacks like the ones below (10004 is the BE ID, which show backends displays):

48 tids: 495840,495841,495842,495843,495844,495845,495846,495847,495848,495849,495850,495851,495852,495853,495854,495855,495856,495857,495858,495859,495860,495861,495862,495863,495864,495865,495866,495867,495868,495869,495870,495871,495872,495873,495874,495875,495876,495877,495878,495879,495880,495881,495882,495883,495884,495885,495886,495887
    0x7fddf41edaf7  syscall
         0x91db983  std::__atomic_futex_unsigned_base::_M_futex_wait_until()
         0x531efa0  starrocks::StreamLoadAction::_handle()
         0x531f4c1  starrocks::StreamLoadAction::handle()
         0x5ef5277  evhttp_handle_request
         0x5ef5f23  bufferevent_readcb
         0x5ee2662  event_process_active_single_queue
         0x5ee2d9f  event_base_loop
         0x53058c4  _ZZN9starrocks12EvHttpServer5startEvENKUlvE_clEv
         0x920b9d0  execute_native_thread_routine
    0x7fddf4455fa3  start_thread
    0x7fddf41f306f  clone
             (nil)  (unknown)
48 tids: 495214,495215,495216,495217,495218,495219,495220,495221,495222,495223,495224,495225,495226,495227,495228,495229,495230,495231,495232,495233,495234,495235,495236,495237,495238,495239,495240,495241,495242,495243,495244,495245,495246,495247,495248,495249,495250,495251,495252,495253,495254,495255,495256,495257,495258,495259,495260,495261
    0x7fddf445c00a  __pthread_cond_wait
         0x91a063c  std::condition_variable::wait()
         0x4cf9cd6  starrocks::StreamLoadPipe::read()
         0x335e25d  starrocks::vectorized::JsonReader::_read_and_parse_json()
         0x3362447  starrocks::vectorized::JsonScanner::_open_next_reader()
         0x3363cda  starrocks::vectorized::JsonScanner::get_next()
         0x53c76d1  starrocks::connector::FileDataSource::get_next()
         0x3460c45  starrocks::vectorized::ConnectorScanNode::_scanner_thread()
         0x4c7c2f0  starrocks::PriorityThreadPool::work_thread()
         0x5e29b67  thread_proxy
    0x7fddf4455fa3  start_thread
    0x7fddf41f306f  clone
             (nil)  (unknown)
120 tids: 495264,495265,495266,495267,495268,495269,495270,495271,495272,495273,495274,495275,495276,495277,495278,495279,495280,495281,495282,495283,495284,495285,495286,495287,495288,495289,495290,495291,495292,495293,495294,495295,495296,495297,495298,495299,495300,495301,495302,495303,495304,495305,495306,495307,495308,495309,495310,495311,495312,495313,495314,495315,495316,495317,495318,495319,495320,495321,495322,495323,495324,495325,495326,495327,2154605,2154606,2154607,2154608,2154610,2154671,2154964,2154965,2154966,2154967,2154969,2154984,2155017,2155039,2155069,2155070,2155080,2155081,2155082,2155083,2155084,2155085,2155086,2155087,2155088,2155113,2155125,2155128,2155130,2155131,2155132,2155133,2155148,2155153,2155154,2155335,2155336,2155349,2155354,2155365,2155404,2155417,2155432,2155451,2155461,2155462,2155485,2155486,2155503,2155509,2155521,2155576,2155639,2155712,2155713,2155733
    0x7fddf445c00a  __pthread_cond_wait
         0x91a063c  std::condition_variable::wait()
         0x3460520  starrocks::vectorized::ConnectorScanNode::get_next()
         0x4d4ad53  starrocks::PlanFragmentExecutor::_get_next_internal_vectorized()
         0x4d4b140  starrocks::PlanFragmentExecutor::_open_internal_vectorized()
         0x4d4d2dd  starrocks::PlanFragmentExecutor::open()
         0x4c9e71b  starrocks::FragmentExecState::execute()
         0x4ca4993  starrocks::FragmentMgr::exec_actual()
         0x4e43062  starrocks::ThreadPool::dispatch_thread()
         0x4e3db5a  starrocks::Thread::supervise_thread()
    0x7fddf4455fa3  start_thread
    0x7fddf41f306f  clone
             (nil)  (unknown)
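
A dump like the one above is easier to read once each thread group is reduced to its thread count and its first `starrocks::` frame. The following Python sketch does that. It assumes the layout shown here (an "N tids:" header followed by address and symbol lines), and the function name `summarize_thread_groups` is made up for this example.

```python
import re

GROUP_HEADER = re.compile(r"^(\d+) tids:")
FRAME = re.compile(r"^\s*(?:0x[0-9a-f]+|\(nil\))\s+(.+)$")


def summarize_thread_groups(dump: str):
    """Return a (thread_count, first starrocks:: frame) pair for each group in the dump."""
    groups = []
    count, frame = None, None
    for line in dump.splitlines():
        header = GROUP_HEADER.match(line)
        if header:
            if count is not None:
                groups.append((count, frame))
            count, frame = int(header.group(1)), None
            continue
        match = FRAME.match(line)
        if match and count is not None and frame is None:
            symbol = match.group(1)
            if symbol.startswith("starrocks::"):
                frame = symbol
    if count is not None:
        groups.append((count, frame))
    return groups


# Abbreviated sample: the tid lists are truncated, the counts come from the headers.
sample = """\
48 tids: 495840,495841
    0x7fddf41edaf7  syscall
         0x91db983  std::__atomic_futex_unsigned_base::_M_futex_wait_until()
         0x531efa0  starrocks::StreamLoadAction::_handle()
48 tids: 495214,495215
    0x7fddf445c00a  __pthread_cond_wait
         0x91a063c  std::condition_variable::wait()
         0x4cf9cd6  starrocks::StreamLoadPipe::read()
"""

for thread_count, top_frame in summarize_thread_groups(sample):
    print(thread_count, top_frame)
```

For the full dump in this entry, the summary would list 48 threads in `StreamLoadAction::_handle()`, 48 in `StreamLoadPipe::read()`, and 120 in `ConnectorScanNode::get_next()`.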
  • Github Issue:

  • Github Fix PR:

  • Jira

  • Affected versions:

    • 2.5.0 ~ 2.5.12
    • 3.0.0 ~ 3.0.6
    • 3.1.0 ~ 3.1.3
  • Fixed versions:

    • 2.5.13+
    • 3.0.7+
    • 3.1.4+
  • Root cause:

  • Temporary workaround:

    • Edit be.conf and increase these two settings:

webserver_num_workers=128 (default 48)
scanner_thread_pool_thread_num=128 (default 48)
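
To check whether a given be.conf already carries the workaround, a rough Python sketch such as the following could be used. It assumes be.conf consists of `key = value` lines with `#` comments. The default of 48 and the target of 128 are taken from the values above, and the helper names are made up for this example.

```python
# Targets and defaults as stated in the workaround above.
RECOMMENDED = {"webserver_num_workers": 128, "scanner_thread_pool_thread_num": 128}
DEFAULTS = {"webserver_num_workers": 48, "scanner_thread_pool_thread_num": 48}


def parse_conf(text: str) -> dict:
    """Parse key = value lines, ignoring blank lines and # comments (assumed format)."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def settings_below_recommended(text: str) -> dict:
    """Return the settings whose effective value is still below the workaround target."""
    conf = parse_conf(text)
    low = {}
    for key, target in RECOMMENDED.items():
        current = int(conf.get(key, DEFAULTS[key]))
        if current < target:
            low[key] = current
    return low


sample_conf = """\
webserver_num_workers = 128
# scanner_thread_pool_thread_num = 128
"""
print(settings_below_recommended(sample_conf))
```

In the sample, the second setting is commented out, so it is reported at its default value.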

  1. group_concat crash

*** Aborted at 1682471071 (unix time) try "date -d @1682471071" if you are using GNU date ***
PC: @          0x3534a66 starrocks::vectorized::GroupConcatAggregateFunction<>::finalize_to_column()
*** SIGSEGV (@0x7f6c017fd000) received by PID 3835823 (TID 0x7f7901aea700) from PID 25153536; stack trace: ***
    @          0x5824342 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f7a97ef44c0 (unknown)
    @          0x3534a66 starrocks::vectorized::GroupConcatAggregateFunction<>::finalize_to_column()
    @          0x3058a19 starrocks::Aggregator::_finalize_to_chunk()
    @          0x30aa4f6 starrocks::Aggregator::convert_to_chunk_no_groupby()
    @          0x2fb5690 starrocks::pipeline::AggregateBlockingSourceOperator::pull_chunk()
    @          0x2ca7d73 starrocks::pipeline::PipelineDriver::process()
    @          0x4ec2213 starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x48c3b92 starrocks::ThreadPool::dispatch_thread()
    @          0x48be68a starrocks::Thread::supervise_thread()
    @     0x7f7a97ee9f1b (unknown)
    @     0x7f7a97c8c1a0 clone
    @                0x0 (unknown)
  1. Schema change keeps failing

The BE log contains entries like this:

get base tablet rowsets error tablet
  1. group_concat crash

query_id:e501e17d-68e1-11ee-9050-005056aadc5e, fragment_instance:e501e17d-68e1-11ee-9050-005056aadc65
*** Aborted at 1697102991 (unix time) try "date -d @1697102991" if you are using GNU date ***
PC: @          0x2cc9888 starrocks::vectorized::GroupConcatAggregateFunction<>::convert_to_serialize_format()
*** SIGSEGV (@0x2d0) received by PID 2437363 (TID 0x7f4e25ef5700) from PID 720; stack trace: ***
    @          0x3f973c2 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f4eacc12ce0 (unknown)
    @          0x2cc9888 starrocks::vectorized::GroupConcatAggregateFunction<>::convert_to_serialize_format()
    @          0x2d69266 starrocks::vectorized::NullableAggregateFunctionVariadic<>::convert_to_serialize_format()
    @          0x2a624d0 starrocks::Aggregator::output_chunk_by_streaming()
    @          0x29fd67f starrocks::pipeline::AggregateStreamingSinkOperator::_push_chunk_by_auto()
    @          0x2a0415d starrocks::pipeline::AggregateStreamingSinkOperator::push_chunk()
    @          0x29e45a7 starrocks::pipeline::PipelineDriver::process()
    @          0x29da46e starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x22395b9 starrocks::ThreadPool::dispatch_thread()
    @          0x223516a starrocks::Thread::supervise_thread()
    @     0x7f4eacc081cf start_thread
    @     0x7f4eabc39d83 __GI___clone
    @                0x0 (unknown)
  1. Primary Key model: Too many versions

The BE log contains entries like the following:

 failed to perform update compaction. res=Not supported: primary key type not support: NONE
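
Several entries on this page are identified by a fixed log string. A small Python sketch can scan log lines for those strings. The signature table below contains only messages quoted on this page, and the function name and the short issue labels are made up for this example.

```python
# Log substrings quoted on this page, mapped to a short label for the matching entry.
KNOWN_SIGNATURES = {
    "failed to load journal type 10081": "FE startup failure (journal type 10081)",
    "get base tablet rowsets error tablet": "Schema change keeps failing",
    "primary key type not support: NONE": "Primary Key model: Too many versions",
}


def match_known_issues(log_lines):
    """Return (line_number, label) pairs for lines that contain a known signature."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        for signature, label in KNOWN_SIGNATURES.items():
            if signature in line:
                hits.append((number, label))
    return hits


sample_log = [
    "some unrelated log line",
    " failed to perform update compaction. res=Not supported: primary key type not support: NONE",
]
print(match_known_issues(sample_log))
```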
  1. Column reuse in queries corrupts memory and causes a crash

*** Aborted at 1697185757 (unix time) try "date -d @1697185757" if you are using GNU date ***
PC: @          0x4282f24 starrocks::TabletManager::find_best_tablet_to_do_update_compaction()
*** SIGSEGV (@0x60) received by PID 117044 (TID 0x7f4bc236a700) from PID 96; stack trace: ***
    @          0x5b97b22 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f51cdfbc5d0 (unknown)
    @          0x4282f24 starrocks::TabletManager::find_best_tablet_to_do_update_compaction()
    @          0x423d536 starrocks::StorageEngine::_perform_update_compaction()
    @          0x44b857e starrocks::StorageEngine::_update_compaction_thread_callback()
    @          0x80a6480 execute_native_thread_routine
    @     0x7f51cdfb4dd5 start_thread
    @     0x7f51cd5cfead __clone
    @                0x0 (unknown)
  • Github Issue:

  • Github Fix PR:

  • Jira

  • Affected versions:

    • 2.5.0 ~ 2.5.13
    • 3.0.0 ~ 3.0.6
    • 3.1.0 ~ 3.1.3
  • Fixed versions:

    • 2.5.14+
    • 3.0.7+
    • 3.1.4+
  • Root cause:

    • Column reuse corrupted memory.
  • Temporary workaround:

  1. A delete condition on an Array column causes a crash

*** SIGSEGV (@0x0) received by PID 99369 (TID 0x7f9f304e7700) from PID 0; stack trace: ***
    @         0x13eb7c32 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f9ff71e3630 (unknown)
    @         0x10ce038a starrocks::get_type_info()
    @         0x10d852ea starrocks::SegmentIterator::_get_row_ranges_by_zone_map()
    @         0x10d7ce97 starrocks::SegmentIterator::_init()
    @         0x10d88bdf starrocks::SegmentIterator::do_get_next()
    @          0xae71c0a starrocks::ChunkIterator::get_next()
    @         0x1270cb45 starrocks::SegmentIteratorWrapper::do_get_next()
    @          0xae71c0a starrocks::ChunkIterator::get_next()
    @         0x1238aba8 starrocks::TimedChunkIterator::do_get_next()
    @          0xae71c0a starrocks::ChunkIterator::get_next()
    @         0x10fc9791 starrocks::TabletReader::do_get_next()
    @          0xae71c0a starrocks::ChunkIterator::get_next()
    @          0xcfb0500 starrocks::pipeline::OlapChunkSource::_read_chunk_from_storage()
    @          0xcfaf422 starrocks::pipeline::OlapChunkSource::_read_chunk()
    @          0xcf9cdbc starrocks::pipeline::ChunkSource::buffer_next_batch_chunks_blocking()
    @          0xbfc9bd4 _ZZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS_12RuntimeStateEiENKUlvE_clEv
    @          0xbfcf452 _ZSt13__invoke_implIvRZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS0_12RuntimeStateEiEUlvE_JEET_St14__invoke_otherOT0_DpOT1_
    @          0xbfcf300 _ZSt10__invoke_rIvRZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS0_12RuntimeStateEiEUlvE_JEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EES8_E4typeEOS9_DpOSA_
    @          0xbfcf175 _ZNSt17_Function_handlerIFvvEZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS1_12RuntimeStateEiEUlvE_E9_M_invokeERKSt9_Any_data
    @          0x971254a std::function<>::operator()()
    @          0xc4f3cf3 starrocks::workgroup::ScanExecutor::worker_thread()
    @          0xc4f3518 _ZZN9starrocks9workgroup12ScanExecutor10initializeEiENKUlvE_clEv
    @          0xc4f54ce _ZSt13__invoke_implIvRZN9starrocks9workgroup12ScanExecutor10initializeEiEUlvE_JEET_St14__invoke_otherOT0_DpOT1_
    @          0xc4f509d _ZSt10__invoke_rIvRZN9starrocks9workgroup12ScanExecutor10initializeEiEUlvE_JEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EES6_E4typeEOS7_DpOS8_
    @          0xc4f4a3a _ZNSt17_Function_handlerIFvvEZN9starrocks9workgroup12ScanExecutor10initializeEiEUlvE_E9_M_invokeERKSt9_Any_data
    @          0x971254a std::function<>::operator()()
    @          0xa154154 starrocks::FunctionRunnable::run()
    @          0xa150ca3 starrocks::ThreadPool::dispatch_thread()
    @          0xa16daee std::__invoke_impl<>()
    @          0xa16d5c1 std::__invoke<>()
    @          0xa16c4ee _ZNSt5_BindIFMN9starrocks10ThreadPoolEFvvEPS1_EE6__callIvJEJLm0EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
  1. Routine load crashes when loading a decimal type

*** Aborted at 1695409324 (unix time) try "date -d @1695409324" if you are using GNU date ***
PC: @     0x7f8dc4974387 __GI_raise
*** SIGABRT (@0x3f000021cf6) received by PID 138486 (TID 0x7f8bf307f700) from PID 138486; stack trace: ***
    @          0xba4a762 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f8dc4d1b630 (unknown)
    @     0x7f8dc4974387 __GI_raise
    @     0x7f8dc4975a78 __GI_abort
    @     0x7f8dc496d1a6 __assert_fail_base
    @     0x7f8dc496d252 __GI___assert_fail
    @          0x509278d down_cast<>()
    @          0x7f9f1e0 starrocks::vectorized::ColumnHelper::cast_to_raw<>()
    @          0xa1ea62d starrocks::vectorized::DecimalNonDecimalCast<>::decimal_from()
    @          0xa1e27b4 starrocks::vectorized::DecimalFrom<>::evaluate<>()
    @          0xa1cb195 starrocks::vectorized::UnpackConstColumnUnaryFunction<>::evaluate<>()
    @          0xa162f78 starrocks::vectorized::DealNullableColumnUnaryFunction<>::evaluate<>()
    @          0xa0859d0 starrocks::vectorized::VectorizedCastExpr<>::evaluate()
    @          0x975f31e starrocks::ExprContext::evaluate()
    @          0x975f044 starrocks::ExprContext::evaluate()
    @          0x8b8fd1f starrocks::vectorized::FileScanner::materialize()
    @          0x8048f66 starrocks::vectorized::JsonScanner::get_next()
    @          0x8020bb1 starrocks::vectorized::FileScanNode::_scanner_scan()
    @          0x8021d22 starrocks::vectorized::FileScanNode::_scanner_worker()
    @          0x802ab79 std::__invoke_impl<>()
    @          0x802a906 std::__invoke<>()
    @          0x802a7fd _ZNSt6thread8_InvokerISt5tupleIJMN9starrocks10vectorized12FileScanNodeEFviiEPS4_imEEE9_M_invokeIJLm0ELm1ELm2ELm3EEEEvSt12_Index_tupleIJXspT_EEE
    @          0x802a77e std::thread::_Invoker<>::operator()()
    @          0x802a762 std::thread::_State_impl<>::_M_run()
    @          0xd840410 execute_native_thread_routine
    @     0x7f8dc4d13ea5 start_thread
    @     0x7f8dc4a3cb0d __clone
    @                0x0 (unknown)
  1. SpillToDisk use-after-free causes a crash

This may occur when Spill is enabled.

*** Aborted at 1698028689 (unix time) try "date -d @1698028689" if you are using GNU date ***
PC: @          0x2a6bd20 starrocks::ScopedTimer<>::~ScopedTimer()
*** SIGSEGV (@0x0) received by PID 11471 (TID 0x7fddce6b2700) from PID 0; stack trace: ***
    @          0x62f3702 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7fdea56ac5f0 (unknown)
    @          0x2a6bd20 starrocks::ScopedTimer<>::~ScopedTimer()
    @          0x2d787a4 _ZNSt17_Function_handlerIFvvEZN9starrocks5spill16RawSpillerWriter5flushIRNS2_14IOTaskExecutorERNS2_23ResourceMemTrackerGuardIJSt8weak_ptrINS1_8pipeline12QueryContextEES8_INS2_7SpillerEEEEEEENS1_6StatusEPNS1_12RuntimeStateEOT_OT0_EUlvE0_E9_M_invokeERKSt9_Any_data
    @          0x2aa6881 starrocks::workgroup::ScanExecutor::worker_thread()
    @          0x4f07072 starrocks::ThreadPool::dispatch_thread()
    @          0x4f01b6a starrocks::Thread::supervise_thread()
    @     0x7fdea56a4e65 start_thread
    @     0x7fdea4cbf88d __clone
  1. Join reorder + window function causes a crash

*** Aborted at 1697526858 (unix time) try "date -d @1697526858" if you are using GNU date ***
PC: @     0x7f0f33393387 __GI_raise
*** SIGABRT (@0x2431) received by PID 9265 (TID 0x7f0e5251a700) from PID 9265; stack trace: ***
    @          0x5960be2 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f0f33e48630 (unknown)
    @     0x7f0f33393387 __GI_raise
    @     0x7f0f33394a78 __GI_abort
    @          0x2c8343e starrocks::failure_function()
    @          0x59545bd google::LogMessage::Fail()
    @          0x5956a2f google::LogMessage::SendToLog()
    @          0x595410e google::LogMessage::Flush()
    @          0x5957039 google::LogMessageFatal::~LogMessageFatal()
    @          0x4ea2dc2 _ZN9starrocks20type_dispatch_columnINS_10vectorized13ColumnBuilderEJNS_14TypeDescriptorEmEEEDaNS_13PrimitiveTypeET_DpT0_
    @          0x4ea028b starrocks::vectorized::ColumnHelper::create_column()
    @          0x51cecf0 starrocks::serde::ProtobufChunkDeserializer::deserialize()
    @          0x487e470 starrocks::DataStreamRecvr::SenderQueue::_deserialize_chunk()
    @          0x488178a starrocks::DataStreamRecvr::PipelineSenderQueue::get_chunk()
    @          0x47f9e03 starrocks::DataStreamRecvr::get_chunk_for_pipeline()
    @          0x2fb035a starrocks::pipeline::ExchangeSourceOperator::pull_chunk()
    @          0x2d1dc30 starrocks::pipeline::PipelineDriver::process()
    @          0x4f91993 starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x4983a52 starrocks::ThreadPool::dispatch_thread()
    @          0x497e54a starrocks::Thread::supervise_thread()
    @     0x7f0f33e40ea5 start_thread
    @     0x7f0f3345bb0d __clone
    @                0x0 (unknown)

build chunk meta error

Question: can this bug only be resolved by upgrading? On 3.0.1, the FE no longer comes up after a restart.

Answer: it is fixed in 3.0.7 and 3.1.4.
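
Whether a given release already contains the fix can be checked per branch. The sketch below is a hypothetical helper that uses the fixed versions listed on this page for the journal type 10081 issue (2.5.13+, 3.0.7+, 3.1.4+). It expects a plain `X.Y.Z` version string and returns `None` for branches this page does not list.

```python
# First fixed patch release per branch, as listed on this page.
FIXED_IN = {(2, 5): 13, (3, 0): 7, (3, 1): 4}


def parse_version(version: str):
    """Split a plain X.Y.Z string into integers (suffixes such as -rc1 are not handled)."""
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    return major, minor, patch


def has_fix(version: str):
    """Return True or False for listed branches, or None when the branch is not listed."""
    major, minor, patch = parse_version(version)
    fixed_patch = FIXED_IN.get((major, minor))
    if fixed_patch is None:
        return None
    return patch >= fixed_patch


for candidate in ("3.0.1", "3.0.7", "3.1.4", "2.4.0"):
    print(candidate, has_fix(candidate))
```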

  1. smp_call_function_many consumes a large amount of CPU

There are also signs of a suspected deadlock.

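  • Affected versions
    • All versions that use jemalloc: >= 2.4
  • Solution
    • In be/bin/start_backend.sh, change muzzy_decay_ms:5000,dirty_decay_ms:5000 to muzzy_decay_ms:30000,dirty_decay_ms:30000
    • This parameter change only mitigates the problem and does not fix the root cause; a proper fix needs further systematic investigation
    • Disable NUMA in Linux and reboot the machine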
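
The decay-time change above amounts to a text substitution on the jemalloc option string. A minimal Python sketch of that substitution follows. It operates only on the option fragment quoted above, since the surrounding line of start_backend.sh is not reproduced on this page, and the function name is made up for this example.

```python
import re

# Matches either decay option together with its current millisecond value.
DECAY_OPTION = re.compile(r"\b(muzzy_decay_ms|dirty_decay_ms):\d+")


def raise_decay_ms(text: str, new_ms: int = 30000) -> str:
    """Rewrite both jemalloc decay options in `text` to `new_ms`, leaving the rest untouched."""
    return DECAY_OPTION.sub(lambda match: f"{match.group(1)}:{new_ms}", text)


fragment = "muzzy_decay_ms:5000,dirty_decay_ms:5000"
print(raise_decay_ms(fragment))  # muzzy_decay_ms:30000,dirty_decay_ms:30000
```

It is safer to apply the function to a copy of the script and review the difference before replacing the original.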