Common Crash / Bug / Optimization Lookup

  1. Primary Key table: repeated reads and writes of the Persistent Index l0 file cause high disk I/O

  1. roaring2range consumes a large amount of CPU and degrades concurrent query performance on Primary Key tables

                         |          |                                                                  --51.02%--starrocks::vectorized::SegmentIterator::_init
                          |          |                                                                            |          
                          |          |                                                                            |--48.93%--starrocks::vectorized::SegmentIterator::_apply_del_vector
                          |          |                                                                            |          |          
                          |          |                                                                            |          |--47.45%--starrocks::vectorized::roaring2range
                          |          |                                                                            |          |          |          
                          |          |                                                                            |          |          |--20.68%--roaring_read_uint32_iterator
                          |          |                                                                            |          |          |          
                          |          |                                                                            |          |          |--4.62%--starrocks::vectorized::SparseRange::add
                          |          |                                                                            |          |          |          
                          |          |                                                                            |          |           --1.68%--std::vector<starrocks::vectorized::Range, std::allocator<starrocks::vectorized::Range> >::_M_realloc_insert<starrocks::vectorized::Range const&>
                          |          |                                                                            |          |          
                          |          |                                                                            |           --1.36%--starrocks::vectorized::SparseRange::add


This PR mitigates the problem but does not fully resolve it.
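The profile above attributes most of the time in `roaring2range` to iterating the delete vector (`roaring_read_uint32_iterator`) and appending to a `SparseRange`. As a rough illustration of that kind of work, and not the StarRocks implementation, the following Python sketch turns a sorted list of deleted row ids into the ranges of rows that are still readable. The function name `deleted_rows_to_ranges` and the half-open `(begin, end)` representation are assumptions made for this example.

```python
from typing import Iterable, List, Tuple


def deleted_rows_to_ranges(deleted: Iterable[int], num_rows: int) -> List[Tuple[int, int]]:
    """Return half-open [begin, end) ranges of row ids that are NOT deleted.

    `deleted` must be sorted ascending and unique, like ids read one by one
    from a bitmap iterator.
    """
    ranges: List[Tuple[int, int]] = []
    begin = 0  # first row id not yet covered by a range or a delete
    for row in deleted:
        if row >= num_rows:
            break  # ignore ids beyond the segment
        if row > begin:
            ranges.append((begin, row))  # live rows before this deleted row
        begin = row + 1  # skip the deleted row itself
    if begin < num_rows:
        ranges.append((begin, num_rows))  # trailing live rows
    return ranges


if __name__ == "__main__":
    print(deleted_rows_to_ranges([2, 3, 7], 10))
```

In this sketch the cost grows linearly with the number of deleted rows, because every deleted id is visited once. That is at least consistent with the iterator dominating the profile when a segment carries a large delete vector.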

  • Github Issue:

  • Github Fix PR:

  • Jira

  • Affected versions:

    • 2.3.0 ~ latest

    • 2.4.0 ~ latest

    • 2.5.0 ~ latest

    • 3.0.0 ~ latest

    • 3.1.0 ~ 3.1.11

    • 3.2.0 ~ 3.2.6

  • Fixed versions:

    • 2.3: not fixed

    • 2.4: not fixed

    • 2.5: not fixed

    • 3.0: not fixed

    • 3.1.12+

    • 3.2.7+

  • Root cause:

  • Workaround:

  1. Grouping sets crash

*** Aborted at 1705967445 (unix time) try "date -d @1705967445" if you are using GNU date ***
PC: @ 0x2d57320 starrocks::vectorized::FixedLengthColumnBase<>::append_selective()
*** SIGSEGV (@0x1000) received by PID 14905 (TID 0x7ffb0cafb700) from PID 4096; stack trace: ***
@ 0x5b97b22 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7ffbd8c54630 (unknown)
@ 0x2d57320 starrocks::vectorized::FixedLengthColumnBase<>::append_selective()
@ 0x50cb458 starrocks::vectorized::NullableColumn::append_selective()
@ 0x50ae6ca starrocks::vectorized::Chunk::append_selective()
@ 0x324dfbe starrocks::pipeline::LocalExchangeSourceOperator::_pull_shuffle_chunk()
@ 0x324e897 starrocks::pipeline::LocalExchangeSourceOperator::pull_chunk()
@ 0x2d906c0 starrocks::pipeline::PipelineDriver::process()
@ 0x51add6a starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0x4b968f2 starrocks::ThreadPool::dispatch_thread()
@ 0x4b9138a starrocks::supervise_thread()
@ 0x7ffbd8c4cea5 start_thread
@ 0x7ffbd8267b0d __clone
@ 0x0 (unknown)
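Every crash trace on this page starts with the same two header lines: an `Aborted at <unix time>` line and a `SIGSEGV (@addr) received by PID ... (TID ...)` line. When matching a new crash against the entries here, it can help to extract those fields mechanically. The Python sketch below does that with regular expressions written against the lines shown on this page; the function name `parse_crash_header` and the returned keys are assumptions made for this example.

```python
import re
from datetime import datetime, timezone
from typing import Dict

ABORT_RE = re.compile(r"Aborted at (\d+) \(unix time\)")
SIGNAL_RE = re.compile(
    r"(SIG[A-Z]+) \(@(0x[0-9a-fA-F]+)\) received by PID (\d+) \(TID (0x[0-9a-fA-F]+)\)"
)


def parse_crash_header(text: str) -> Dict[str, object]:
    """Extract crash time, signal, fault address, PID and TID from a trace."""
    info: Dict[str, object] = {}
    m = ABORT_RE.search(text)
    if m:
        ts = int(m.group(1))
        info["unix_time"] = ts
        # Same conversion as `date -d @<ts>`, but always rendered in UTC.
        info["utc_time"] = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
    m = SIGNAL_RE.search(text)
    if m:
        info["signal"] = m.group(1)
        info["fault_addr"] = int(m.group(2), 16)
        info["pid"] = int(m.group(3))
        info["tid"] = int(m.group(4), 16)
    return info


if __name__ == "__main__":
    header = (
        '*** Aborted at 1705967445 (unix time) try "date -d @1705967445" '
        "if you are using GNU date ***\n"
        "*** SIGSEGV (@0x1000) received by PID 14905 (TID 0x7ffb0cafb700) "
        "from PID 4096; stack trace: ***\n"
    )
    print(parse_crash_header(header))
```

In the traces on this page, the number after `from PID` matches the fault address (for example `@0x1000` and `from PID 4096`), so it is probably not a real sender process id and is left out of the parsed result.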
  1. ThreadResourceMgr lock contention keeps BE CPU utilization low and limits concurrent performance

Symptoms:

  • Low CPU utilization

  • Severe lock contention

  • Poor concurrent performance

#0  0x00007f09c61f675d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f09c61efa79 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x0000000001ea9242 in __gthread_mutex_lock (__mutex=0x7f09c373d688) at /usr/include/c++/10.3.0/x86_64-pc-linux-gnu/bits/gthr-default.h:749
#3  std::mutex::lock (this=0x7f09c373d688) at /usr/include/c++/10.3.0/bits/std_mutex.h:100
#4  std::unique_lock<std::mutex>::lock (this=0x7ef689befd70) at /usr/include/c++/10.3.0/bits/unique_lock.h:138
#5  std::unique_lock<std::mutex>::unique_lock (__m=..., this=0x7ef689befd70) at /usr/include/c++/10.3.0/bits/unique_lock.h:68
#6  starrocks::ThreadResourceMgr::unregister_pool (this=0x7f09c373d680, pool=0x7f05bcd373a0) at /root/starrocks/be/src/runtime/thread_resource_mgr.cpp:96
#7  0x0000000001f1c07e in starrocks::RuntimeState::~RuntimeState (this=0x7efa59c5ac10, __in_chrg=<optimized out>) at /root/starrocks/be/src/runtime/exec_env.h:141
#8  0x0000000001ead272 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7efa59c5ac00) at /usr/include/c++/10.3.0/ext/atomicity.h:70
#9  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7efa59c5ac00) at /usr/include/c++/10.3.0/bits/shared_ptr_base.h:151
#10 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/10.3.0/bits/shared_ptr_base.h:733
#11 std::__shared_ptr<starrocks::RuntimeState, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/10.3.0/bits/shared_ptr_base.h:1183
#12 std::shared_ptr<starrocks::RuntimeState>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/10.3.0/bits/shared_ptr.h:121
#13 starrocks::FragmentExecState::~FragmentExecState (this=<optimized out>, __in_chrg=<optimized out>) at /root/starrocks/be/src/runtime/fragment_mgr.cpp:170
#14 0x0000000001eb67eb in std::_Sp_counted_ptr<starrocks::FragmentExecState*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=<optimized out>) at /usr/include/c++/10.3.0/bits/shared_ptr_base.h:379
#15 0x000000000192690a in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7ef1aeb59840) at /usr/include/c++/10.3.0/ext/atomicity.h:70
#16 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7ef1aeb59840) at /usr/include/c++/10.3.0/bits/shared_ptr_base.h:151
#17 0x0000000001eae775 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7efa585fa010, __in_chrg=<optimized out>) at /usr/include/c++/10.3.0/bits/std_function.h:245
#18 std::__shared_ptr<starrocks::FragmentExecState, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7efa585fa008, __in_chrg=<optimized out>) at /usr/include/c++/10.3.0/bits/shared_ptr_base.h:1183
#19 std::shared_ptr<starrocks::FragmentExecState>::~shared_ptr (this=0x7efa585fa008, __in_chrg=<optimized out>) at /usr/include/c++/10.3.0/bits/shared_ptr.h:121
#20 ~<lambda> (this=0x7efa585fa000, __in_chrg=<optimized out>) at /root/starrocks/be/src/runtime/fragment_mgr.cpp:438
#21 std::_Function_base::_Base_manager<starrocks::FragmentMgr::exec_plan_fragment(const starrocks::TExecPlanFragmentParams&, const StartSuccCallback&, const FinishCallback&)::<lambda()> >::_M_destroy (__victim=...) at /usr/include/c++/10.3.0/bits/std_function.h:176
#22 std::_Function_base::_Base_manager<starrocks::FragmentMgr::exec_plan_fragment(const starrocks::TExecPlanFragmentParams&, const StartSuccCallback&, const FinishCallback&)::<lambda()> >::_M_manager (__op=<optimized out>, __source=..., __dest=...) at /usr/include/c++/10.3.0/bits/std_function.h:200
#23 std::_Function_handler<void(), starrocks::FragmentMgr::exec_plan_fragment(const starrocks::TExecPlanFragmentParams&, const StartSuccCallback&, const FinishCallback&)::<lambda()> >::_M_manager(std::_Any_data &, const std::_Any_data &, std::_Manager_operation) (__dest=..., __source=..., __op=<optimized out>) at /usr/include/c++/10.3.0/bits/std_function.h:283
#24 0x0000000001ff7692 in std::_Function_base::~_Function_base (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/10.3.0/bits/std_function.h:245
#25 std::function<void ()>::~function() (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/10.3.0/bits/std_function.h:303
#26 starrocks::FunctionRunnable::~FunctionRunnable (this=<optimized out>, __in_chrg=<optimized out>) at /root/starrocks/be/src/util/threadpool.cpp:41
#27 0x0000000001ff7192 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7efa585fa040) at /root/starrocks/be/src/util/threadpool.cpp:471
#28 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7efa585fa040) at /usr/include/c++/10.3.0/bits/shared_ptr_base.h:151
#29 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/10.3.0/bits/shared_ptr_base.h:733
#30 std::__shared_ptr<starrocks::Runnable, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/10.3.0/bits/shared_ptr_base.h:1183
#31 std::__shared_ptr<starrocks::Runnable, (__gnu_cxx::_Lock_policy)2>::reset (this=<synthetic pointer>) at /usr/include/c++/10.3.0/bits/shared_ptr_base.h:1301
#32 starrocks::ThreadPool::dispatch_thread (this=0x7f09c4003c00) at /root/starrocks/be/src/util/threadpool.cpp:522
#33 0x0000000001ff298a in std::function<void ()>::operator()() const (this=0x7efe056fd8d8) at /usr/include/c++/10.3.0/bits/std_function.h:248
#34 starrocks::Thread::supervise_thread (arg=0x7efe056fd8c0) at /root/starrocks/be/src/util/thread.cpp:327
#35 0x00007f09c61ed17a in start_thread () from /lib64/libpthread.so.0
#36 0x00007f09c578edf3 in clone () from /lib64/libc.so.6
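A single backtrace like the one above only shows one thread waiting on the mutex. To judge how severe the contention is, it helps to count how many threads are blocked on a lock and in which StarRocks function. The Python sketch below assumes the input is the text of a gdb `thread apply all bt` dump, with a `Thread N (...)` header before each thread's frames. That input format and the function name `count_mutex_waiters` are assumptions for illustration, not part of the original report.

```python
import re
from collections import Counter
from typing import List

# Matches "#6  starrocks::Foo::bar (args...)" and "#0  0x... in func (args...)".
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(.+?) \(")


def count_mutex_waiters(bt_dump: str) -> Counter:
    """Count threads blocked in a mutex lock, keyed by innermost starrocks:: frame."""
    blocks: List[List[str]] = []
    for raw in bt_dump.splitlines():
        line = raw.strip()
        if line.startswith("Thread "):
            blocks.append([])  # a new thread section begins
        elif line.startswith("#"):
            if not blocks:
                blocks.append([])  # dump of a single thread without a header
            blocks[-1].append(line)

    waiters: Counter = Counter()
    for frames in blocks:
        blocked = any("__lll_lock_wait" in f or "pthread_mutex_lock" in f for f in frames)
        if not blocked:
            continue
        for frame in frames:
            m = FRAME_RE.match(frame)
            if m and m.group(1).startswith("starrocks::"):
                waiters[m.group(1)] += 1
                break  # only the innermost StarRocks frame is counted
    return waiters


if __name__ == "__main__":
    import sys
    for func, n in count_mutex_waiters(sys.stdin.read()).most_common(10):
        print(n, func)
```

If most blocked threads report `starrocks::ThreadResourceMgr::unregister_pool`, the dump matches the pattern described in this entry.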
  1. Duplicate columns in the sort key of a Primary Key table cause a BE crash

For example:

CREATE TABLE orders2 (
    order_id bigint NOT NULL,
    dt date NOT NULL,
    merchant_id int NOT NULL,
    user_id int NOT NULL,
    good_id int NOT NULL,
    good_name string NOT NULL,
    price int NOT NULL,
    cnt int NOT NULL,
    revenue int NOT NULL,
    state tinyint NOT NULL
)
PRIMARY KEY (order_id,dt,merchant_id)
PARTITION BY date_trunc('day', dt)
DISTRIBUTED BY HASH (merchant_id)
ORDER BY (dt,merchant_id,dt) -- dt is duplicated
PROPERTIES (
    "enable_persistent_index" = "true"
);
*** Aborted at 1710928318 (unix time) try "date -d @1710928318" if you are using GNU date ***
PC: @          0x58f1556 _ZZN9starrocksL17prepare_ops_datasERKNS_6SchemaERKSt6vectorIjSaIjEERKNS_5ChunkEPS3_IPFvPKviPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEESaISL_EEPS3_ISC_SaISC_EEENUlSC_iSJ_E6_4_FUNESC_iSJ_
*** SIGSEGV (@0x8) received by PID 36293 (TID 0x701034335640) from PID 8; stack trace: ***
    @          0x7cd4b2a google::(anonymous namespace)::FailureSignalHandler()
    @     0x70112d442520 (unknown)
    @          0x58f1556 _ZZN9starrocksL17prepare_ops_datasERKNS_6SchemaERKSt6vectorIjSaIjEERKNS_5ChunkEPS3_IPFvPKviPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEESaISL_EEPS3_ISC_SaISC_EEENUlSC_iSJ_E6_4_FUNESC_iSJ_
    @          0x58f365d starrocks::PrimaryKeyEncoder::encode_sort_key()
    @          0x5acf518 starrocks::MergeEntry<>::next()
    @          0x5ad2999 starrocks::RowsetMergerImpl<>::_do_merge_horizontally()
    @          0x5ad44fc starrocks::RowsetMergerImpl<>::_do_merge_vertically()
    @          0x5ad651c starrocks::RowsetMergerImpl<>::do_merge()
    @          0x5ac894f starrocks::compaction_merge_rowsets()
    @          0x3ebfcfc starrocks::TabletUpdates::_do_compaction()
    @          0x3ec13a6 starrocks::TabletUpdates::compaction()
    @          0x3c95451 starrocks::StorageEngine::_perform_update_compaction()
    @          0x3cb4607 starrocks::StorageEngine::_update_compaction_thread_callback()
    @          0xa34af34 execute_native_thread_routine
    @     0x70112d494ac3 (unknown)
    @     0x70112d526850 (unknown)
    @                0x0 (unknown)
  • Github Issue:

  • Github Fix PR:

  • Jira

  • Affected versions:

    • 2.5.0 ~ 2.5.20

    • 3.0.0 ~ latest

    • 3.1.0 ~ 3.1.10

    • 3.2.0 ~ 3.2.6

  • Fixed versions:

    • 2.5.21+

    • 3.0: not fixed

    • 3.1.11+

    • 3.2.7+

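  • Root cause:

  • Solution:

    • Use DROP TABLE with FORCE to remove the affected table. If the BE fails to start, use meta_tool to remove the affected tablets, then upgrade. Upgrading alone does not fix the problem: the affected tablets must be removed before upgrading.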
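Because the crash only shows up later, during compaction, it may be worth checking ORDER BY column lists before running DDL on an affected version. The following Python sketch flags repeated columns in a sort key. The function name `duplicate_sort_key_columns` is hypothetical, and treating column names as case-insensitive is an assumption of this example.

```python
from typing import List


def duplicate_sort_key_columns(sort_key: List[str]) -> List[str]:
    """Return the column names that appear more than once in a sort key."""
    seen = set()
    duplicates: List[str] = []
    for col in sort_key:
        # Assumption: names are compared case-insensitively, ignoring padding.
        name = col.strip().lower()
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates


if __name__ == "__main__":
    # Sort key from the CREATE TABLE statement in this entry.
    print(duplicate_sort_key_columns(["dt", "merchant_id", "dt"]))
```

An empty result means the sort key has no repeated column. For the example table in this entry the check reports `dt`.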
  1. Cross-cluster data synchronization causes a BE crash

*** Aborted at 1716360239 (unix time) try "date -d @1716360239" if you are using GNU date ***
PC: @          0x5072261 starrocks::ReplicationUtils::calc_column_unique_id_map<>()
*** SIGSEGV (@0x18) received by PID 186073 (TID 0x2af0f7202700) from PID 24; stack trace: ***
    @          0x67749a2 google::(anonymous namespace)::FailureSignalHandler()
    @     0x2aeeec3c2630 (unknown)
    @          0x5072261 starrocks::ReplicationUtils::calc_column_unique_id_map<>()
    @          0x506d87a starrocks::ReplicationTxnManager::replicate_remote_snapshot()
    @          0x506e168 starrocks::ReplicationTxnManager::replicate_snapshot()
    @          0x341f4d0 starrocks::run_replicate_snapshot_task()
    @          0x2e79d7c starrocks::ThreadPool::dispatch_thread()
    @          0x2e739fa starrocks::Thread::supervise_thread()
    @     0x2aeeec3baea5 start_thread
    @     0x2aeeecff596d __clone
    @                0x0 (unknown)
  1. str_to_jodatime crash

select str_to_jodatime('2014-12-21 12:34:56', 'yyyy-MM-dd HH:mm:ss');
*** Aborted at 1703578849 (unix time) try "date -d @1703578849" if you are using GNU date ***
PC: @          0x4aac80a _ZNSt17_Function_handlerIFbvEZN9starrocks4joda10JodaFormat7prepareESt17basic_string_viewIcSt11char_traitsIcEEEUlvE11_E9_M_invokeERKSt9_Any_data
*** SIGSEGV (@0x0) received by PID 32502 (TID 0x7f0104fda700) from PID 0; stack trace: ***
    @          0x5e5a102 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f020d5bb7ab os::Linux::chained_handler()
    @     0x7f020d5c028c JVM_handle_linux_signal
    @     0x7f020d5b3148 signalHandler()
    @     0x7f020ca916d0 (unknown)
    @          0x4aac80a _ZNSt17_Function_handlerIFbvEZN9starrocks4joda10JodaFormat7prepareESt17basic_string_viewIcSt11char_traitsIcEEEUlvE11_E9_M_invokeERKSt9_Any_data
    @          0x4aaf57a starrocks::joda::JodaFormat::parse()
    @          0x55096d2 starrocks::TimeFunctions::parse_jodatime()
    @          0x408dfb4 starrocks::VectorizedFunctionCallExpr::evaluate_checked()
    @          0x37d5c93 starrocks::ExprContext::evaluate()
    @          0x37d5fdf starrocks::ExprContext::evaluate()
    @          0x2b0e864 starrocks::pipeline::ProjectOperator::push_chunk()
    @          0x281ea8c starrocks::pipeline::PipelineDriver::process()
    @          0x53ced3e starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x4cc33d2 starrocks::ThreadPool::dispatch_thread()
    @          0x4cbde6a starrocks::Thread::supervise_thread()
    @     0x7f020ca89e25 start_thread
    @     0x7f020be8cbad __clone
    @                0x0 (unknown)
  1. A constant value in GROUP BY causes unstable query results or a crash

*** Aborted at 1717004982 (unix time) try "date -d @1717004982" if you are using GNU date ***
PC: @          0x3d0b5fc starrocks::AggregateFunctionBatchHelper<>::merge_batch()
*** SIGSEGV (@0x7f8158d15000) received by PID 698674 (TID 0x7f810c387640) from PID 1490112512; stack trace: ***
    @          0x5ee6742 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f81b0454df0 (unknown)
    @          0x3d0b5fc starrocks::AggregateFunctionBatchHelper<>::merge_batch()
    @          0x350338c starrocks::Aggregator::compute_batch_agg_states()
    @          0x3409e6d starrocks::pipeline::AggregateStreamingSinkOperator::_push_chunk_by_force_preaggregation()
    @          0x340bc1f starrocks::pipeline::AggregateStreamingSinkOperator::push_chunk()
    @          0x33d9378 starrocks::pipeline::PipelineDriver::process()
    @          0x33ca1ee starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x2a28a1a starrocks::ThreadPool::dispatch_thread()
    @          0x2a2348a starrocks::Thread::supervise_thread()
    @     0x7f81b049f802 start_thread
    @     0x7f81b043f450 __clone3
    @                0x0 (unknown)
  1. Aggregation results are incorrect when the table has only one tablet (bucket)

  1. Hive catalog query crash

*** Aborted at 1717678507 (unix time) try "date -d @1717678507" if you are using GNU date ***
PC: @ 0x6fa6714 starrocks::CSVReader::buff_capacity()
*** SIGSEGV (@0x98) received by PID 128 (TID 0x7f1faf12b640) from PID 152; stack trace: ***
@ 0x9854bba google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f204629e520 (unknown)
@ 0x6fa6714 starrocks::CSVReader::buff_capacity()
@ 0x6fa0896 starrocks::HdfsTextScanner::estimated_mem_usage()
@ 0x72d5872 starrocks::pipeline::ConnectorChunkSource::close()
@ 0x5255034 starrocks::pipeline::ScanOperator::_close_chunk_source_unlocked()
@ 0x5253d4f starrocks::pipeline::ScanOperator::_finish_chunk_source_task()
@ 0x5258fde _ZZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS_12RuntimeStateEiENKUlRT_E_clINS_9workgroup12YieldContextEEEDaS5_.constprop.0
@ 0x543a72b starrocks::workgroup::ScanExecutor::worker_thread()
@ 0x82d2d7c starrocks::ThreadPool::dispatch_thread()
@ 0x82cc22a starrocks::supervise_thread()
@ 0x7f20462f0ac3 (unknown)
@ 0x7f2046381a04 clone
@ 0x0 (unknown)
  1. Primary Key table compaction crash

PC: @          0x540c550 starrocks::ShardByLengthMutableIndex::~ShardByLengthMutableIndex()
*** SIGSEGV (@0x0) received by PID 160027 (TID 0x2c6508e61700) from PID 0; stack trace: ***
    @          0x67c3642 google::(anonymous namespace)::FailureSignalHandler()
    @     0x2b0a3c181acf os::Linux::chained_handler()
    @     0x2b0a3c187938 JVM_handle_linux_signal
    @     0x2b0a3c179338 signalHandler()
    @     0x2b0a3cae7630 (unknown)
    @          0x540c550 starrocks::ShardByLengthMutableIndex::~ShardByLengthMutableIndex()
    @          0x53a13f3 starrocks::PersistentIndex::_reload()
    @          0x53a7acb starrocks::PersistentIndex::major_compaction()
    @          0x500424b starrocks::PrimaryIndex::major_compaction()
    @          0x511683d starrocks::TabletUpdates::pk_index_major_compaction()
    @          0x5352272 starrocks::PkIndexMajorCompactionTask::run()
    @          0x2e7aadd starrocks::ThreadPool::dispatch_thread()
    @          0x2e744fa starrocks::Thread::supervise_thread()
    @     0x2b0a3cadfea5 start_thread
    @     0x2b0a3d71a96d __clone
    @                0x0 (unknown)
  1. Primary Key table data load crash

*** Aborted at 1716988014 (unix time) try "date -d @1716988014" if you are using GNU date ***
PC: @          0x2c45d7b starrocks::BinaryColumnBase<>::append_selective()
*** SIGSEGV (@0x7feb929f2ffc) received by PID 118163 (TID 0x7fea91cf9700) from PID 18446744071874490364; stack trace: ***
    @          0x67749a2 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7feabbade630 (unknown)
    @          0x2c45d7b starrocks::BinaryColumnBase<>::append_selective()
    @          0x343bb6a starrocks::Chunk::append_selective()
    @          0x5259033 starrocks::MemTable::_split_upserts_deletes()
    @          0x525a640 starrocks::MemTable::finalize()
    @          0x52501ae starrocks::DeltaWriter::flush_memtable_async()
    @          0x5250bd3 starrocks::DeltaWriter::close()
    @          0x51cc109 starrocks::AsyncDeltaWriter::_execute()
    @          0x68d76dc bthread::ExecutionQueueBase::_execute()
    @          0x68d8458 bthread::ExecutionQueueBase::_execute_tasks()
    @          0x2e79d7c starrocks::ThreadPool::dispatch_thread()
    @          0x2e739fa starrocks::Thread::supervise_thread()
    @     0x7feabbad6ea5 start_thread
    @     0x7feabaed7b0d __clone
    @                0x0 (unknown)
  1. Iceberg table query crash

tracker:replication consumption: 0
*** Aborted at 1718109279 (unix time) try "date -d @1718109279" if you are using GNU date ***
PC: @          0x6d29ccb starrocks::parquet::ScalarColumnReader::fill_dst_column()
*** SIGSEGV (@0x0) received by PID 548886 (TID 0x7fede17fe640) from PID 0; stack trace: ***
    @          0x9c8aa1a google::(anonymous namespace)::FailureSignalHandler()
    @     0x7fefe8d8825a os::Linux::chained_handler()
    @     0x7fefe8d8d6f2 JVM_handle_linux_signal
    @     0x7fefe8d81748 signalHandler()
    @     0x7fefe7d06520 (unknown)
    @          0x6d29ccb starrocks::parquet::ScalarColumnReader::fill_dst_column()
    @          0x6d2297a starrocks::parquet::GroupReader::_fill_dst_chunk()
    @          0x6d23114 starrocks::parquet::GroupReader::get_next()
    @          0x6cf4830 starrocks::parquet::FileReader::get_next()
    @          0x6b30c0f starrocks::HdfsParquetScanner::do_get_next()
    @          0x6b2086b starrocks::HdfsScanner::get_next()
    @          0x6aadb56 starrocks::connector::HiveDataSource::get_next()
    @          0x3f351ff starrocks::pipeline::ConnectorChunkSource::_read_chunk()
    @          0x4286d86 starrocks::pipeline::ChunkSource::buffer_next_batch_chunks_blocking()
    @          0x3f25588 _ZZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS_12RuntimeStateEiENKUlRT_E_clINS_9workgroup12YieldContextEEEDaS5_.constprop.0
    @          0x4047c7b starrocks::workgroup::ScanExecutor::worker_thread()
    @          0x34f960c starrocks::ThreadPool::dispatch_thread()
    @          0x34f37ea starrocks::Thread::supervise_thread()
    @     0x7fefe7d58ac3 (unknown)
    @     0x7fefe7dea850 (unknown)
    @                0x0 (unknown)
  1. Text file query crash

*** Aborted at 1717678507 (unix time) try "date -d @1717678507" if you are using GNU date ***
PC: @ 0x6fa6714 starrocks::CSVReader::buff_capacity()
*** SIGSEGV (@0x98) received by PID 128 (TID 0x7f1faf12b640) from PID 152; stack trace: ***
@ 0x9854bba google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f204629e520 (unknown)
@ 0x6fa6714 starrocks::CSVReader::buff_capacity()
@ 0x6fa0896 starrocks::HdfsTextScanner::estimated_mem_usage()
@ 0x72d5872 starrocks::pipeline::ConnectorChunkSource::close()
@ 0x5255034 starrocks::pipeline::ScanOperator::_close_chunk_source_unlocked()
@ 0x5253d4f starrocks::pipeline::ScanOperator::_finish_chunk_source_task()
@ 0x5258fde _ZZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS_12RuntimeStateEiENKUlRT_E_clINS_9workgroup12YieldContextEEEDaS5_.constprop.0
@ 0x543a72b starrocks::workgroup::ScanExecutor::worker_thread()
@ 0x82d2d7c starrocks::ThreadPool::dispatch_thread()
@ 0x82cc22a starrocks::supervise_thread()
@ 0x7f20462f0ac3 (unknown)
@ 0x7f2046381a04 clone
@ 0x0 (unknown)
  1. couldn't found dict cid:

 couldn't found dict cid:23 
  1. Window function crash

*** Aborted at 1718873562 (unix time) try "date -d @1718873562" if you are using GNU date ***
PC: @          0x32370a3 starrocks::ColumnCompare::do_visit()
*** SIGSEGV (@0x0) received by PID 27779 (TID 0x7f95b7fa8640) from PID 0; stack trace: ***
    @          0x5ee6742 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f9656d6b7fb os::Linux::chained_handler()
    @     0x7f9656d7035c JVM_handle_linux_signal
    @     0x7f9656d62e78 signalHandler()
    @     0x7f9656054df0 (unknown)
    @          0x32370a3 starrocks::ColumnCompare::do_visit()
    @          0x3237246 starrocks::ColumnVisitorAdapter<>::visit()
    @          0x2facfcc starrocks::ColumnFactory<>::accept()
    @          0x3230642 starrocks::compare_column()
    @          0x3230757 starrocks::compare_columns()
    @          0x31983c8 starrocks::get_compare_results_colwise()
    @          0x3198658 starrocks::DataSegment::get_filter_array()
    @          0x31dde6b starrocks::ChunksSorterTopn::_filter_and_sort_data()
    @          0x31dfb2a starrocks::ChunksSorterTopn::_sort_chunks()
    @          0x31dff3b starrocks::ChunksSorterTopn::update()
    @          0x339e8b1 (unknown)
    @          0x33b431b starrocks::pipeline::LocalPartitionTopnContext::transfer_all_chunks_from_partitioner_to_sorters()
    @          0x3395025 starrocks::pipeline::LocalPartitionTopnSinkOperator::set_finishing()
    @          0x33d688b starrocks::pipeline::PipelineDriver::_mark_operator_finishing()
    @          0x33d6999 starrocks::pipeline::PipelineDriver::_mark_operator_finished()
    @          0x33d6ffb starrocks::pipeline::PipelineDriver::_mark_operator_cancelled()
    @          0x33d762a starrocks::pipeline::PipelineDriver::cancel_operators()
    @          0x33ca506 starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x2a28a1a starrocks::ThreadPool::dispatch_thread()
    @          0x2a2348a starrocks::Thread::supervise_thread()
    @     0x7f965609f802 start_thread
    @     0x7f965603f450 __clone3
    @                0x0 (unknown)
  • Github Issue:

  • Github Fix PR:

  • Jira

  • Affected versions:

    • 2.5.0 ~ latest

    • 3.0.0 ~ latest

    • 3.1.0 ~ 3.1.13

    • 3.2.0 ~ 3.2.8

    • 3.3.0

  • Fixed versions:

    • 2.5: not fixed

    • 3.0: not fixed

    • 3.1.14+

    • 3.2.9+

    • 3.3.1+

  • Root cause:

  • Workaround:

  1. Primary Key table write crash

*** Aborted at 1720652163 (unix time) try "date -d @1720652163" if you are using GNU date ***
PC: @          0x515ebd2 starrocks::ImmutableIndex::_read_page()
*** SIGSEGV (@0x7f8d3df760b8) received by PID 383772 (TID 0x7f6fdb9dc700) from PID 1039622328; stack trace: ***
    @          0x653d642 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f90873acc17 os::Linux::chained_handler()
    @     0x7f90873b4565 JVM_handle_linux_signal
    @     0x7f90873a97b3 signalHandler()
    @     0x7f908684a630 (unknown)
    @          0x515ebd2 starrocks::ImmutableIndex::_read_page()
    @          0x5173158 starrocks::ImmutableIndex::_get_in_shard_by_page()
    @          0x5176bfa starrocks::ImmutableIndex::_get_in_shard()
    @          0x51772c6 starrocks::ImmutableIndex::get()
    @          0x5177c25 starrocks::PersistentIndex::_get_from_immutable_index()
    @          0x518481c starrocks::PersistentIndex::upsert()
    @          0x4dd1c05 starrocks::PrimaryIndex::_upsert_into_persistent_index()
    @          0x4dd1f76 starrocks::PrimaryIndex::upsert()
    @          0x4ed9b98 starrocks::TabletUpdates::_do_update()
    @          0x4eea126 starrocks::TabletUpdates::_apply_normal_rowset_commit()
    @          0x4eecd96 starrocks::TabletUpdates::_apply_rowset_commit()
    @          0x4eed0e6 starrocks::TabletUpdates::do_apply()
    @          0x2d0d8ed starrocks::ThreadPool::dispatch_thread()
    @          0x2d0733a starrocks::Thread::supervise_thread()
    @     0x7f9086842ea5 start_thread
    @     0x7f9085c43b0d __clone
    @                0x0 (unknown)
  • Github Issue:

  • Github Fix PR:

  • Jira

  • Affected versions:

    • 3.1.0 ~ 3.1.13

    • 3.2.0 ~ 3.2.9

    • 3.3.0

  • Fixed versions:

    • 3.1.14+

    • 3.2.10+

    • 3.3.1+

  • Root cause:

  • Workaround:

    • In be.conf, set enable_pindex_read_by_page=true
  1. Metadata memory statistics are negative

  1. A full disk causes Primary Key index errors

 Fail to apply staros://275836/log/000000000004357C_00000000008C2B30.log: Internal error: prepare_primary_index: load primary index failed: Already exist: FixedMutableIndex<16> insert found duplicate key 800000003B9B13F080000017F816EE46, new(rssid=5610 rowid=2), old(rssid=5609 rowid=0)
load primary index failed: Already exist: FixedMutableIndex<16> insert found duplicate key 800000003B9ACF8D80000000411AB34E, new(rssid=11649 rowid=0), old(rssid=11645 rowid=0)
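The two messages above share a fixed shape: the duplicated key in hex, followed by the `rssid` and `rowid` of the new and the old entry. When many tablets report this error, pulling those fields out makes it easier to see which rowsets collide. The Python sketch below is written against the messages shown here; the function name `parse_duplicate_keys` and the result layout are assumptions for illustration.

```python
import re
from typing import Dict, List

DUP_KEY_RE = re.compile(
    r"insert found duplicate key (?P<key>[0-9A-Fa-f]+), "
    r"new\(rssid=(?P<new_rssid>\d+) rowid=(?P<new_rowid>\d+)\), "
    r"old\(rssid=(?P<old_rssid>\d+) rowid=(?P<old_rowid>\d+)\)"
)


def parse_duplicate_keys(log_text: str) -> List[Dict[str, object]]:
    """Return one record per 'insert found duplicate key' message in the log."""
    records: List[Dict[str, object]] = []
    for m in DUP_KEY_RE.finditer(log_text):
        records.append(
            {
                "key": m.group("key"),
                "new": (int(m.group("new_rssid")), int(m.group("new_rowid"))),
                "old": (int(m.group("old_rssid")), int(m.group("old_rowid"))),
            }
        )
    return records


if __name__ == "__main__":
    line = (
        "load primary index failed: Already exist: FixedMutableIndex<16> insert found "
        "duplicate key 800000003B9ACF8D80000000411AB34E, "
        "new(rssid=11649 rowid=0), old(rssid=11645 rowid=0)"
    )
    for rec in parse_duplicate_keys(line):
        print(rec)
```

Each record holds the key plus `(rssid, rowid)` pairs for the new and old entries, which can then be grouped by rowset id.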
  1. Reading a table with Decimal columns crashes after the data was migrated with the migration tool

*** Aborted at 1720971684 (unix time) try "date -d @1720971684" if you are using GNU date ***
PC: @ 0x2c39b4d starrocks::DecimalV3Column<>::put_mysql_row_buffer()
*** SIGSEGV (@0x8) received by PID 6303 (TID 0x2b3d6a763700) from PID 8; stack trace: ***
@ 0x67283e2 google::(anonymous namespace)::FailureSignalHandler()
@ 0x2b3d38f77cab os::Linux::chained_handler()
@ 0x2b3d38f7c59c JVM_handle_linux_signal
@ 0x2b3d38f6f8f8 signalHandler()
@ 0x2b3d3965b5d0 (unknown)
@ 0x2c39b4d starrocks::DecimalV3Column<>::put_mysql_row_buffer()
@ 0x5860704 starrocks::MysqlResultWriter::process_chunk()
@ 0x37a1e0d starrocks::pipeline::ResultSinkOperator::push_chunk()
@ 0x3842949 starrocks::pipeline::PipelineDriver::process()
@ 0x383378e starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0x2e493aa starrocks::ThreadPool::dispatch_thread()
@ 0x2e43e0a starrocks::supervise_thread()
@ 0x2b3d39653dd5 start_thread
@ 0x2b3d3a28d02d __clone
@ 0x0 (unknown)