OlapScanNode use-after-free
*** Aborted at 1666617333 (unix time) try "date -d @1666617333" if you are using GNU date ***
PC: @ 0x2b5243b starrocks::ExprContext::close()
*** SIGSEGV (@0x60) received by PID 302338 (TID 0x7f58702a0700) from PID 96; stack trace: ***
@ 0x3cf55d2 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f5921984630 (unknown)
@ 0x2b5243b starrocks::ExprContext::close()
@ 0x2b535c0 starrocks::Expr::close()
@ 0x27467e6 starrocks::vectorized::TabletScanner::close()
@ 0x2746e78 starrocks::vectorized::TabletScanner::~TabletScanner()
@ 0x246bbd7 _ZZN9starrocks10ObjectPool3addINS_10vectorized13TabletScannerEEEPT_S5_ENUlPvE_4_FUNES6_
@ 0x246b3ff starrocks::vectorized::OlapScanNode::~OlapScanNode()
@ 0x246ba22 starrocks::vectorized::OlapScanNode::~OlapScanNode()
@ 0x1ed7787 std::_Sp_counted_ptr<>::_M_dispose()
@ 0x18e7fda std::_Sp_counted_base<>::_M_release()
@ 0x1ed3e42 starrocks::RuntimeState::~RuntimeState()
@ 0x1e65a22 starrocks::FragmentExecState::~FragmentExecState()
@ 0x1e6edab std::_Sp_counted_ptr<>::_M_dispose()
@ 0x18e7fda std::_Sp_counted_base<>::_M_release()
@ 0x1e66db5
Or the stack trace may be incomplete:
*** SIGSEGV (@0x0) received by PID 3104 (TID 0x7f51b752b700) from PID 0; stack trace: ***
@ 0x3ff7972 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f522347c630 (unknown)
@ 0x0 (unknown)
- Github Issue: https://github.com/StarRocks/starrocks/issues/11395
- Github Fix PR: https://github.com/StarRocks/starrocks/pull/11396
- Affected versions:
  - 2.1.0 ~ 2.1.13
  - 2.2.0 ~ 2.2.8
  - 2.3.0 ~ 2.3.3
  - 2.4.0
- Fixed versions:
  - 2.1.14+
  - 2.2.9+
  - 2.3.4+
  - 2.4.1+
- Workaround:
  - None
- Root cause:
  - Incorrect destruction order
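To check whether a deployed version falls inside the affected ranges listed above, the versions can be compared as integer tuples. This is a minimal sketch, assuming versions are plain major.minor.patch strings with an optional build suffix; the names parse_version and is_affected are hypothetical helpers, and the ranges are copied from this entry.

```python
# Affected ranges for the OlapScanNode use-after-free entry (inclusive bounds).
AFFECTED = [
    ("2.1.0", "2.1.13"),
    ("2.2.0", "2.2.8"),
    ("2.3.0", "2.3.3"),
    ("2.4.0", "2.4.0"),
]


def parse_version(text):
    """Turn '2.2.8' or '2.2.8-3edb7b7' into (2, 2, 8); the build suffix is ignored."""
    core = text.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def is_affected(version, ranges=AFFECTED):
    """Return True if version lies inside any inclusive (low, high) range."""
    v = parse_version(version)
    return any(parse_version(lo) <= v <= parse_version(hi) for lo, hi in ranges)


print(is_affected("2.2.8-3edb7b7"))  # inside 2.2.0 ~ 2.2.8
print(is_affected("2.2.9"))          # first fixed 2.2.x release
```

Comparing tuples rather than strings matters here: as strings, "2.1.13" would sort before "2.1.3".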
array_agg crash
*** Aborted at 1670508783 (unix time) try "date -d @1670508783" if you are using GNU date ***
PC: @ 0x28f91b5 starrocks::vectorized::NullableAggregateFunctionUnary<>::update_batch_selectively()
*** SIGSEGV (@0xa0) received by PID 4440 (TID 0x7f9a6a737700) from PID 160; stack trace: ***
@ 0x3ca37d2 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f9ad08a3630 (unknown)
@ 0x28f91b5 starrocks::vectorized::NullableAggregateFunctionUnary<>::update_batch_selectively()
@ 0x26c9cd9 starrocks::Aggregator::compute_batch_agg_states_with_selection()
@ 0x263df4f starrocks::pipeline::AggregateStreamingSinkOperator::_push_chunk_by_auto()
@ 0x26446cd starrocks::pipeline::AggregateStreamingSinkOperator::push_chunk()
@ 0x2627d5d starrocks::pipeline::PipelineDriver::process()
@ 0x261da11 starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0x1f5a169 starrocks::ThreadPool::dispatch_thread()
@ 0x1f55d1a starrocks::Thread::supervise_thread()
@ 0x7f9ad089bea5 start_thread
@ 0x7f9acfeb6b0d __clone
@ 0x0 (unknown)
- Github Issue: https://github.com/StarRocks/starrocks/issues/12073
- Github Fix PR: https://github.com/StarRocks/starrocks/pull/12074
- Affected versions:
  - 2.2.0 ~ 2.2.7
  - 2.3.0 ~ 2.3.4
- Fixed versions:
  - 2.2.8+
  - 2.3.5+
- Workaround:
  - Avoid using array_agg
- Root cause:
  - See the issue description
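When matching a crash against the entries on this page, the two glog header lines (the "Aborted at" line and the signal line) carry the crash time, signal, fault address, and PID. The following is a rough sketch of extracting them; parse_crash_header and both regular expressions are hypothetical and only cover the header layout shown in the traces on this page.

```python
import re
from datetime import datetime, timezone

ABORT_RE = re.compile(r"\*\*\* Aborted at (\d+) \(unix time\)")
SIGNAL_RE = re.compile(
    r"\*\*\* (SIG[A-Z]+) \(@(0x[0-9a-fA-F]+)\) received by PID (\d+) "
    r"\(TID (0x[0-9a-fA-F]+)\)"
)


def parse_crash_header(lines):
    """Collect crash time (UTC), signal name, fault address, PID and TID."""
    info = {}
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            info["time_utc"] = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
        m = SIGNAL_RE.search(line)
        if m:
            info["signal"] = m.group(1)
            info["fault_addr"] = int(m.group(2), 16)
            info["pid"] = int(m.group(3))
            info["tid"] = m.group(4)
    return info


header = parse_crash_header([
    '*** Aborted at 1670508783 (unix time) try "date -d @1670508783" if you are using GNU date ***',
    "*** SIGSEGV (@0xa0) received by PID 4440 (TID 0x7f9a6a737700) from PID 160; stack trace: ***",
])
print(header["signal"], hex(header["fault_addr"]), header["time_utc"].isoformat())
```

A very small fault address such as 0xa0 usually points at a member access through a null pointer, which is worth noting before comparing stack frames.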
Primary Key table crashes during load when the persistent index is enabled
The BE may also consume a large amount of memory at startup:
W0912 16:42:51.123860 2268472 mem_hook.cpp:254] large memory alloc: 103079215105 bytes, stack:
@ 0x4a360eb malloc
@ 0x7faf765 operator new()
@ 0x41ed62d std::__cxx11::basic_string<>::_M_mutate()
@ 0x4458b11 std::__cxx11::basic_string<>::resize()
@ 0x44f3f97 starrocks::SliceMutableIndex::load_snapshot()
@ 0x4448f76 starrocks::ShardByLengthMutableIndex::load_snapshot()
@ 0x4459795 starrocks::ShardByLengthMutableIndex::load()
@ 0x44669dd starrocks::PersistentIndex::_load()
@ 0x4467a3f starrocks::PersistentIndex::load()
@ 0x446e6c3 starrocks::PersistentIndex::load_from_tablet()
@ 0x4180f92 starrocks::PrimaryIndex::_do_load()
@ 0x41824bf starrocks::PrimaryIndex::load()
@ 0x426265e starrocks::TabletUpdates::_apply_rowset_commit()
@ 0x4266353 starrocks::TabletUpdates::do_apply()
@ 0x4b17465 starrocks::ThreadPool::dispatch_thread()
@ 0x4b11e4a starrocks::Thread::supervise_thread()
@ 0x7f759126c609 start_thread
@ 0x7f7591030133 clone
@ (nil) (unknown)
Or:
start time: Tue Nov 8 07:23:17 CST 2022
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
*** Aborted at 1667863398 (unix time) try "date -d @1667863398" if you are using GNU date ***
PC: @ 0x7fd3c5943207 __GI_raise
*** SIGABRT (@0x5abc) received by PID 23228 (TID 0x7fd3515fd700) from PID 23228; stack trace: ***
@ 0x481e332 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fd3c63f75d0 (unknown)
@ 0x7fd3c5943207 __GI_raise
@ 0x7fd3c59448f8 __GI_abort
@ 0x1c4acef _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
@ 0x62a0af6 __cxxabiv1::__terminate()
@ 0x62a0b61 std::terminate()
@ 0x62a0cb4 __cxa_throw
@ 0x1c4abf6 _Znwm.cold
@ 0x22bab2b starrocks::FixedMutableIndex::load_snapshot()
@ 0x229e9e6 starrocks::ShardByLengthMutableIndex::load()
@ 0x22aa9bc starrocks::PersistentIndex::_load()
@ 0x22abe77 starrocks::PersistentIndex::load()
@ 0x22ad821 starrocks::PersistentIndex::load_from_tablet()
@ 0x1ff713c starrocks::PrimaryIndex::_do_load()
@ 0x1ff7edf starrocks::PrimaryIndex::load()
@ 0x1e6df30 starrocks::TabletUpdates::_apply_rowset_commit()
@ 0x1e73bb3 starrocks::TabletUpdates::do_apply()
@ 0x2680af5 starrocks::ThreadPool::dispatch_thread()
@ 0x267bf2a starrocks::Thread::supervise_thread()
@ 0x7fd3c63efdd5 start_thread
@ 0x7fd3c5a0aead clone
- Github Issue: https://github.com/StarRocks/starrocks/issues/14844
- Github Fix PR: https://github.com/StarRocks/starrocks/pull/14819
- Affected versions:
  - 2.4.0 ~ 2.4.1
- Fixed versions:
  - 2.4.2+
- Workaround:
  - None
- Root cause:
  - Enable vlog, find the problematic tablet, and delete that tablet with meta_tool.sh
I see that the cherry-pick of this PR to the 2.3 branch failed and was then closed. Was there a particular reason?
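The size in the "large memory alloc" warning of this entry is easier to judge in GiB. A quick arithmetic sketch (describe_alloc is a hypothetical helper) splits the byte count into whole GiB and a remainder:

```python
GIB = 1024 ** 3  # bytes per GiB


def describe_alloc(num_bytes):
    """Split a byte count into (whole GiB, leftover bytes)."""
    return divmod(num_bytes, GIB)


# Byte count taken from the mem_hook.cpp warning in this entry.
print(describe_alloc(103079215105))  # (96, 1)
```

The single resize() request is therefore 96 GiB plus one byte. A request of that shape looks more like a bogus length read while loading the index snapshot than a genuine data size, though that reading is an inference and not something stated in the issue.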
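FE reports an incorrect CPU core count after the machine is upgraded
SHOW BACKENDS; still reports the original 32 cores in CpuCores, although the machine now has 64 cores: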
Version: 2.2.8-3edb7b7
Status: {"lastSuccessReportTabletsTime":"2022-12-11 12:00:45"}
DataTotalCapacity: 4.584 TB
DataUsedPct: 20.59 %
CpuCores: 32
- Github Issue: https://github.com/StarRocks/starrocks/issues/12188
- Github Fix PR: https://github.com/StarRocks/starrocks/pull/12187
- Affected versions:
  - 2.1.0 ~ latest
  - 2.2.0 ~ 2.2.8
  - 2.3.0 ~ 2.3.3
- Fixed versions:
  - 2.2.9+
  - 2.3.4+
- Workaround:
  - None
- Root cause:
  - FE metadata is not updated after the BE's CPU core count increases
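One way to spot this condition is to compare the CpuCores value from the SHOW BACKENDS output with the core count the host really has. The sketch below assumes the output has been saved as "Key: Value" lines like the excerpt in this entry; parse_backend_fields and cpu_cores_stale are hypothetical helper names, and the actual core count is passed in by the caller.

```python
def parse_backend_fields(text):
    """Parse 'Key: Value' lines into a dict, splitting on the first colon only."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def cpu_cores_stale(show_backends_text, actual_cores):
    """Return True if the CpuCores value recorded by the FE differs from the real count."""
    reported = int(parse_backend_fields(show_backends_text)["CpuCores"])
    return reported != actual_cores


sample = """Version: 2.2.8-3edb7b7
Status: {"lastSuccessReportTabletsTime":"2022-12-11 12:00:45"}
CpuCores: 32"""
print(cpu_cores_stale(sample, actual_cores=64))
```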
Join + bitmap crash
terminate called after throwing an instance of 'std::runtime_error'
what(): failed memory alloc in constructor
*** Aborted at 1670985104 (unix time) try "date -d @1670985104" if you are using GNU date ***
PC: @ 0x7fae92edf387 __GI_raise
*** SIGABRT (@0x3dc0001022a) received by PID 66090 (TID 0x7fae04ae5700) from PID 66090; stack trace: ***
@ 0x3fa3ad2 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fae93994630 (unknown)
@ 0x7fae92edf387 __GI_raise
@ 0x7fae92ee0a78 __GI_abort
@ 0x188857d _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
@ 0x59a72a6 __cxxabiv1::__terminate()
@ 0x59a7311 std::terminate()
@ 0x59a74b6 __cxa_rethrow
@ 0x16067b0 _ZNSt8_Rb_treeIjSt4pairIKj7RoaringESt10_Select1stIS3_ESt4lessIjESaIS3_EE7_M_copyINS9_11_Alloc_nodeEEEPSt13_Rb_tree_nodeIS3_EPKSD_PSt18_Rb_tree_node_baseRT_.isra.0.cold
@ 0x209078b starrocks::BitmapValue::BitmapValue()
@ 0x24e859b starrocks::vectorized::ObjectColumn<>::append()
@ 0x24e88b5 starrocks::vectorized::ObjectColumn<>::append_selective()
@ 0x26918f8 starrocks::vectorized::JoinHashMap<>::_copy_build_column()
@ 0x2692099 starrocks::vectorized::JoinHashMap<>::_build_output()
@ 0x26e6289 starrocks::vectorized::JoinHashMap<>::probe()
@ 0x2673478 starrocks::vectorized::JoinHashTable::probe()
@ 0x26646a8 starrocks::vectorized::HashJoinNode::_probe()
@ 0x2665839 starrocks::vectorized::HashJoinNode::get_next()
@ 0x274bd91 starrocks::vectorized::ProjectNode::get_next()
@ 0x2051f23 starrocks::PlanFragmentExecutor::_get_next_internal_vectorized()
@ 0x205375e starrocks::PlanFragmentExecutor::_open_internal_vectorized()
@ 0x2054347 starrocks::PlanFragmentExecutor::open()
@ 0x1fd8fab starrocks::FragmentExecState::execute()
@ 0x1fdd5dc starrocks::FragmentMgr::exec_actual()
@ 0x1fdde81 _ZNSt17_Function_handlerIFvvEZN9starrocks11FragmentMgr18exec_plan_fragmentERKNS1_23TExecPlanFragmentParamsERKSt8functionIFvPNS1_20PlanFragmentExecutorEEESC_EUlvE_E9_M_invokeERKSt9_Any_data
@ 0x2132549 starrocks::ThreadPool::dispatch_thread()
@ 0x212e0fa starrocks::Thread::supervise_thread()
@ 0x7fae9398cea5 start_thread
@ 0x7fae92fa796d __clone
@ 0x0 (unknown)
- Github Issue: https://github.com/StarRocks/starrocks/issues/10265
- Github Fix PR: https://github.com/StarRocks/starrocks/pull/10249
- Affected versions:
  - 2.1.0 ~ 2.1.12
  - 2.2.0 ~ 2.2.5
  - 2.3.0 ~ 2.3.1
- Fixed versions:
  - 2.1.13+
  - 2.2.6+
  - 2.3.2+
- Workaround:
  - set global exec_mem_limit=<a very large value>
- Root cause:
  - RoaringBitmap throws a runtime_error exception that is not caught
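Several frames in these traces are still mangled (for example _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold). A tool such as c++filt gives the full demangled name; when only the owning namespace and class are needed, the leading length-prefixed identifiers of an Itanium-ABI symbol can be read directly. The sketch below is deliberately partial: nested_name_components is a hypothetical helper that decodes only that simple prefix and does not interpret template arguments, substitutions, or parameter types.

```python
import re


def nested_name_components(symbol):
    """Return the leading length-prefixed identifiers of a mangled name.

    Handles symbols starting with _ZN or _ZZN, optionally followed by St (std).
    Decoding stops at the first character that is not a length digit.
    """
    match = re.match(r"_ZZ?N(St)?", symbol)
    if not match:
        return []
    parts = ["std"] if match.group(1) else []
    pos = match.end()
    while pos < len(symbol) and symbol[pos].isdigit():
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[pos:end])
        parts.append(symbol[end:end + length])
        pos = end + length
    return parts


print("::".join(nested_name_components(
    "_ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold")))
```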
Crash when reading Parquet
*** Aborted at 1670504382 (unix time) try "date -d @1670504382" if you are using GNU date ***
PC: @ 0xcd14fa9 starrocks::parquet::DictDecoder<>::get_dict_values()
*** SIGSEGV (@0x604313bec270) received by PID 155105 (TID 0x7f7e6ae2b700) from PID 331268720; stack trace: ***
@ 0xdb15d42 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f7f4792f3ab os::Linux::chained_handler()
@ 0x7f7f47933efc JVM_handle_linux_signal
@ 0x7f7f47926d48 signalHandler()
@ 0x7f7f46e04630 (unknown)
@ 0xcd14fa9 starrocks::parquet::DictDecoder<>::get_dict_values()
@ 0xccfe8b3 starrocks::parquet::ColumnChunkReader::get_dict_values()
@ 0xccfee2c starrocks::parquet::StoredColumnReader::get_dict_values()
@ 0xccf0520 starrocks::parquet::ScalarColumnReader::get_dict_values()
@ 0xcce43d0 starrocks::parquet::GroupReader::_dict_decode()
@ 0xccdd403 starrocks::parquet::GroupReader::get_next()
@ 0xcca4e6a starrocks::parquet::FileReader::get_next()
@ 0xc96db97 starrocks::vectorized::HdfsParquetScanner::do_get_next()
@ 0xc9414b9 starrocks::vectorized::HdfsScanner::get_next()
@ 0xc825ce6 starrocks::connector::HiveDataSource::get_next()
@ 0x749bd98 starrocks::pipeline::ConnectorChunkSource::_read_chunk()
@ 0x7449923 starrocks::pipeline::ChunkSource::buffer_next_batch_chunks_blocking()
@ 0x63a7ae8 _ZZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS_12RuntimeStateEiENKUlvE_clEv
@ 0x63ac2fe _ZSt13__invoke_implIvRZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS0_12RuntimeStateEiEUlvE_JEET_St14__invoke_otherOT0_DpOT1_
@ 0x63ac1ac _ZSt10__invoke_rIvRZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS0_12RuntimeStateEiEUlvE_JEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EES8_E4typeEOS9_DpOSA_
@ 0x63ac021 _ZNSt17_Function_handlerIFvvEZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS1_12RuntimeStateEiEUlvE_E9_M_invokeERKSt9_Any_data
@ 0x6022428 std::function<>::operator()()
@ 0x63bf8be starrocks::workgroup::ScanExecutor::worker_thread()
@ 0x63bf108 _ZZN9starrocks9workgroup12ScanExecutor10initializeEiENKUlvE_clEv
@ 0x63c0ada _ZSt13__invoke_implIvRZN9starrocks9workgroup12ScanExecutor10initializeEiEUlvE_JEET_St14__invoke_otherOT0_DpOT1_
@ 0x63c07a2 _ZSt10__invoke_rIvRZN9starrocks9workgroup12ScanExecutor10initializeEiEUlvE_JEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EES6_E4typeEOS7_DpOS8_
@ 0x63c0317 _ZNSt17_Function_handlerIFvvEZN9starrocks9workgroup12ScanExecutor10initializeEiEUlvE_E9_M_invokeERKSt9_Any_data
@ 0x6022428 std::function<>::operator()()
@ 0xbbe3054 starrocks::FunctionRunnable::run()
@ 0xbbdfe94 starrocks::ThreadPool::dispatch_thread()
@ 0xbbfd420 std::__invoke_impl<>()
@ 0xbbfcd79 std::__invoke<>()
- Github Issue: None
- Github Fix PR: https://github.com/StarRocks/starrocks/pull/15189
- Affected versions:
  - 2.3.0 ~ 2.3.5
  - 2.4.0 ~ 2.4.2
- Fixed versions:
  - 2.3.6+
  - 2.4.3+
- Workaround:
  - None
- Root cause:
  - A scan range spans multiple row groups, where one row group is dictionary-encoded and another is not
TopN crash
query_id:09e1a166-803f-11ed-b2ae-c4b8b44f4875, fragment_instance:09e1a166-803f-11ed-b2ae-c4b8b44f4875
*** Aborted at 1671524376 (unix time) try "date -d @1671524376" if you are using GNU date ***
PC: @ 0x2af0ba23c676 __memcmp_sse4_1
*** SIGSEGV (@0xe4d00000019) received by PID 20674 (TID 0x2af0f4fcd700) from PID 25; stack trace: ***
@ 0x3ff4972 google::(anonymous namespace)::FailureSignalHandler()
@ 0x2af0b97b6630 (unknown)
@ 0x2af0ba23c676 __memcmp_sse4_1
@ 0x27e54d8 _ZSt13__adjust_heapIN9__gnu_cxx17__normal_iteratorIPN9starrocks10vectorized20VerticalColumnSorter16CompactChunkItemINS2_5SliceEEESt6vectorIS7_SaIS7_EEEElS7_NS0_5__ops15_Iter_comp_iterIZNS3_L19sort_and_tie_helperIZNS4_8do_visitIjEENS2_6StatusERKNS3_16BinaryColumnBaseIT_EEEUlRKS7_SO_E_SB_EESH_RKbPKNS3_6ColumnEbRT0_RS9_IhSaIhEESJ_St4pairIiiEbmPmEUlSJ_SV_E_EEEvSJ_SV_SV_T1_T2_
@ 0x2884c07 starrocks::vectorized::VerticalColumnSorter::do_visit<>()
@ 0x2885a76 starrocks::ColumnVisitorAdapter<>::visit()
@ 0x19d1edc starrocks::vectorized::ColumnFactory<>::accept()
@ 0x27dcc5b starrocks::vectorized::sort_vertical_columns()
@ 0x2859c4b starrocks::vectorized::VerticalColumnSorter::do_visit()
@ 0x2859e26 starrocks::ColumnVisitorAdapter<>::visit()
@ 0x252298c starrocks::vectorized::ColumnFactory<>::accept()
@ 0x27dcc5b starrocks::vectorized::sort_vertical_columns()
@ 0x27de21a starrocks::vectorized::sort_vertical_chunks()
@ 0x27529e5 starrocks::vectorized::ChunksSorterTopn::_partial_sort_col_wise()
@ 0x2752e9c starrocks::vectorized::ChunksSorterTopn::_filter_and_sort_data()
@ 0x2756434 starrocks::vectorized::ChunksSorterTopn::_sort_chunks()
@ 0x2756b10 starrocks::vectorized::ChunksSorterTopn::done()
@ 0x27419e5 starrocks::vectorized::ChunksSorter::finish()
@ 0x28ba860 starrocks::pipeline::PartitionSortSinkOperator::set_finishing()
@ 0x28def07 starrocks::pipeline::PipelineDriver::_mark_operator_finishing()
@ 0x28dff3b starrocks::pipeline::PipelineDriver::process()
@ 0x28d67dc starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0x21772c9 starrocks::ThreadPool::dispatch_thread()
@ 0x2172e7a starrocks::Thread::supervise_thread()
@ 0x2af0b97aeea5 start_thread
@ 0x2af0ba1cfb0d __clone
@ 0x0 (unknown)
- Github Issue: https://github.com/StarRocks/starrocks/issues/10987
- Github Fix PR: https://github.com/StarRocks/starrocks/pull/10988
- Affected versions:
  - 2.3.0 ~ 2.3.2
- Fixed versions:
  - 2.3.3+
- Workaround:
  - None
- Root cause:
  - See the issue description (TopN is handled incorrectly for ORDER BY ... LIMIT queries)
CrossJoin Crash
*** SIGABRT (@0x9219) received by PID 37401 (TID 0x2aaeee9bf700) from PID 37401; stack trace: ***
@ 0x3d0a5d2 google::(anonymous namespace)::FailureSignalHandler()
@ 0x2aaeb4ebe630 (unknown)
@ 0x2aaeb580f387 __GI_raise
@ 0x2aaeb5810a78 __GI_abort
@ 0x17f69ed _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
@ 0x56c8086 __cxxabiv1::__terminate()
@ 0x56c80f1 std::terminate()
@ 0x56c8244 __cxa_throw
@ 0x17f85d1 std::__throw_length_error()
@ 0x196c682 std::vector<>::_M_range_insert<>()
@ 0x1966ce4 starrocks::vectorized::BinaryColumn::append()
@ 0x232fd96 starrocks::vectorized::NullableColumn::append()
@ 0x2677b80 starrocks::pipeline::CrossJoinLeftOperator::_copy_joined_rows_with_index_base_probe()
@ 0x2678041 starrocks::pipeline::CrossJoinLeftOperator::pull_chunk()
@ 0x268f05a starrocks::pipeline::PipelineDriver::process()
@ 0x268525e starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0x1fb37d9 starrocks::ThreadPool::dispatch_thread()
@ 0x1faf38a starrocks::Thread::supervise_thread()
@ 0x2aaeb4eb6ea5 start_thread
@ 0x2aaeb58d7b0d __clone
@ 0x0 (unknown)
- Github Issue: None
- Github Fix PR: https://github.com/StarRocks/starrocks/pull/15497
- Affected versions:
  - 2.3.0 ~ 2.3.6
  - 2.4.0 ~ 2.4.2
- Fixed versions:
  - 2.3.7+
  - 2.4.3+
- Workaround:
  - None
- Root cause:
  - Cardinality estimation is wrong, which leads to an unreasonable execution plan
BThread crash
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
*** Aborted at 1672016623 (unix time) try "date -d @1672016623" if you are using GNU date ***
PC: @ 0x7fa77afe1387 __GI_raise
*** SIGABRT (@0x3eb00002b0f) received by PID 11023 (TID 0x7fa5b3873700) from PID 11023; stack trace: ***
@ 0x403cce2 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fa77ba96630 (unknown)
@ 0x7fa77afe1387 __GI_raise
@ 0x7fa77afe2a78 __GI_abort
@ 0x18cd85d _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
@ 0x5a40906 __cxxabiv1::__terminate()
@ 0x5ae49d9 __cxa_call_terminate
@ 0x5a40321 __gxx_personality_v0
@ 0x5aeb62e _Unwind_RaiseException_Phase2
@ 0x5aec126 _Unwind_Resume
@ 0x17e1422 _ZN4brpc6policy17ProcessRpcRequestEPNS_16InputMessageBaseE.cold
@ 0x41654e7 brpc::ProcessInputMessage()
@ 0x4166393 brpc::InputMessenger::OnNewMessages()
@ 0x420d05e brpc::Socket::ProcessEvent()
@ 0x411afef bthread::TaskGroup::task_runner()
@ 0x42a37d1 bthread_make_fcontext
Or:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
*** Aborted at 1672390713 (unix time) try "date -d @1672390713" if you are using GNU date ***
PC: @ 0x7fac0aea3387 __GI_raise
*** SIGABRT (@0x3f000004a75) received by PID 19061 (TID 0x7fab128b6700) from PID 19061; stack trace: ***
@ 0x4875742 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fac0b958630 (unknown)
@ 0x7fac0aea3387 __GI_raise
@ 0x7fac0aea4a78 __GI_abort
@ 0x1c4f8af _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
@ 0x62f82f6 __cxxabiv1::__terminate()
@ 0x62f8361 std::terminate()
@ 0x62f84b4 __cxa_throw
@ 0x1c4f7b6 _Znwm.cold
@ 0x4868341 google::LogMessage::Init()
@ 0x4868a01 google::LogMessage::LogMessage()
@ 0x499ec02 brpc::InputMessenger::OnNewMessages()
@ 0x4a4595e brpc::Socket::ProcessEvent()
@ 0x4953a0f bthread::TaskGroup::task_runner()
@ 0x4adc0d1 bthread_make_fcontext
- Github Issue: https://github.com/StarRocks/starrocks/issues/16046
- Github Fix PR:
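- Affected versions:
  - 2.3.5 ~ 2.3.7
  - 2.4.2
  - 2.5.0 ~ 2.5.3
- Fixed versions:
  - 2.3.8+
  - 2.4.3+
  - 2.5.4+
- Workaround:
  - None
- Root cause:
  - The BThread MemTracker memory-control logic is faulty
  - It may also be caused by swap being enabled; in that case, disable swap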
Incorrect results for GROUP BY followed by LIMIT: the number of returned rows fluctuates
- Github Issue: https://github.com/StarRocks/starrocks/issues/11274
- Github Fix PR: https://github.com/StarRocks/starrocks/pull/11330
- Affected versions:
  - 2.4.0 ~ 2.4.2
- Fixed versions:
  - 2.4.3+
- Workaround:
  - None
- Root cause:
  - Limit push-down is faulty
Primary Key table compaction crash
start time: Tue Dec 27 17:12:33 CST 2022
*** Aborted at 1672132354 (unix time) try "date -d @1672132354" if you are using GNU date ***
PC: @ 0x17eb7f7 starrocks::vectorized::ChunkHelper::column_from_field()
*** SIGSEGV (@0x0) received by PID 34830 (TID 0x7f181273b700) from PID 0; stack trace: ***
@ 0x34fe482 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f18767605e0 (unknown)
@ 0x17eb7f7 starrocks::vectorized::ChunkHelper::column_from_field()
@ 0x17ebdda starrocks::vectorized::ChunkHelper::new_chunk()
@ 0x18ff0ae starrocks::vectorized::RowsetMergerImpl<>::_do_merge_horizontally()
@ 0x19021b2 starrocks::vectorized::RowsetMergerImpl<>::do_merge()
@ 0x18ef267 starrocks::vectorized::compaction_merge_rowsets()
@ 0x17d88e8 starrocks::TabletUpdates::_do_compaction()
@ 0x17d9999 starrocks::TabletUpdates::compaction()
@ 0x176314c starrocks::StorageEngine::_perform_update_compaction()
@ 0x1757e9f starrocks::StorageEngine::_update_compaction_thread_callback()
@ 0x4fdb870 execute_native_thread_routine
@ 0x7f1876758e25 start_thread
@ 0x7f1875b6234d __clone
@ 0x0 (unknown)
- Github Issue: https://github.com/StarRocks/starrocks/issues/4452
- Github Fix PR: https://github.com/StarRocks/starrocks/pull/4454
- Jira:
- Affected versions:
  - 2.1.0 ~ 2.1.3
- Fixed versions:
  - 2.1.4+
- Workaround:
  - None
- Root cause:
  - A problem in ARRAY support for Primary Key tables
Version already been compacted
version already been compacted
- Github Issue: https://github.com/StarRocks/starrocks/issues/3689
- Github Fix PR: https://github.com/StarRocks/starrocks/pull/13830
- Jira:
- Affected versions:
  - 2.2.0 ~ 2.2.10
  - 2.3.0 ~ 2.3.4
  - 2.4.0 ~ 2.4.1
- Fixed versions:
  - 2.2.11+
  - 2.3.5+
  - 2.4.2+
- Workaround:
  - None
- Root cause:
  - The query path does not take the lock when fetching rowset versions
get_json_int or get_json_double crash
terminate called after throwing an instance of 'arangodb::velocypack::Exception'
what(): Expecting numeric type
query_id:3c5935cf-8299-11ed-8c5b-06d48912a230, fragment_instance:3c5935cf-8299-11ed-8c5b-06d48912a231
*** Aborted at 1671783017 (unix time) try "date -d @1671783017" if you are using GNU date ***
PC: @ 0x7fddd4449ca0 __GI_raise
*** SIGABRT (@0xbb2) received by PID 2994 (TID 0x7fdd59562700) from PID 2994; stack trace: ***
@ 0x481e332 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fddd4f208e0 (unknown)
@ 0x7fddd4449ca0 __GI_raise
@ 0x7fddd444b148 __GI_abort
@ 0x1c4acef _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
@ 0x62a0af6 __cxxabiv1::__terminate()
@ 0x62a0b61 std::terminate()
@ 0x62a0cb4 __cxa_throw
@ 0x3a146e2 starrocks::vectorized::JsonFunctions::_json_query_impl<>()
@ 0x3a0f152 starrocks::vectorized::JsonFunctions::get_native_json_double()
@ 0x39af998 starrocks::vectorized::VectorizedFunctionCallExpr::evaluate()
@ 0x3454e2c starrocks::ExprContext::evaluate()
@ 0x2e26eb2 starrocks::pipeline::ProjectOperator::push_chunk()
@ 0x2e7868c starrocks::pipeline::PipelineDriver::process()
@ 0x2e6e5a3 starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0x2680a05 starrocks::ThreadPool::dispatch_thread()
@ 0x267bf2a starrocks::Thread::supervise_thread()
@ 0x7fddd4f1644b start_thread
@ 0x7fddd450556f __GI___clone
@ 0x0 (unknown)
- Github Issue: https://github.com/StarRocks/starrocks/issues/12987
- Github Fix PR: https://github.com/StarRocks/starrocks/pull/12988
- Jira:
- Affected versions:
  - 2.3.0 ~ 2.3.3
  - 2.4.0
- Fixed versions:
  - 2.3.4+
  - 2.4.1+
- Workaround:
  - None
- Root cause:
  - Exceptions thrown during JSON parsing are not caught (no try/catch)
Crash when querying Hive through a Hive catalog
PC: @ 0x3510d8a starrocks::connector::HiveDataSource::_init_scanner()
*** SIGSEGV (@0x10) received by PID 407690 (TID 0x7f0d01763700) from PID 16; stack trace: ***
@ 0x403cce2 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f0d52195852 os::Linux::chained_handler()
@ 0x7f0d5219c676 JVM_handle_linux_signal
@ 0x7f0d52192653 signalHandler()
@ 0x7f39493f05d0 (unknown)
@ 0x3510d8a starrocks::vectorized::HdfsScanner::_build_scanner_context()
@ 0x35117ef starrocks::vectorized::HdfsScanner::open()
@ 0x34822eb starrocks::connector::HiveDataSource::_init_scanner()
@ 0x3484a33 starrocks::connector::HiveDataSource::open()
@ 0x28b0fdc starrocks::pipeline::ConnectorChunkSource::_read_chunk()
@ 0x28b10c3 starrocks::pipeline::ConnectorChunkSource::buffer_next_batch_chunks_blocking()
@ 0x28ac22c _ZNSt17_Function_handlerIFvvEZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS1_12RuntimeStateEiEUlvE0_E9_M_invokeERKSt9_Any_data
@ 0x2011c60 starrocks::PriorityThreadPool::work_thread()
@ 0x3f92fa7 thread_proxy
@ 0x7f0d51653dd5 start_thread
@ 0x7f0d50c6eead __clone
@ 0x0 (unknown)
- Github Issue: None
- Github Fix PR: https://github.com/StarRocks/starrocks/pull/15486
- Jira:
- Affected versions:
  - 2.2.0 ~ 2.2.11
  - 2.3.0 ~ 2.3.6
  - 2.4.0 ~ 2.4.2
- Fixed versions:
  - 2.2.12+
  - 2.3.7+
  - 2.4.3+
- Workaround:
  - None
- Root cause:
  - See the PR description
Not found dict for cid
The query fails with:
Not found dict for cid
- Github Issue: None
- Github Fix PR: https://github.com/StarRocks/starrocks/pull/13185
- Jira:
- Affected versions:
  - 2.2.0 ~ 2.2.9
  - 2.3.0 ~ 2.3.4
  - 2.4.0 ~ 2.4.1
- Fixed versions:
  - 2.2.10+
  - 2.3.5+
  - 2.4.2+
- Workaround:
  - set global cbo_enable_low_cardinality_optimize=false;
- Root cause:
  - See the PR description
Queries hang when resource groups are used
pstack shows the following stack:
pstack <PID of starrocks_be>
#0 0x00007fe5bc2cca35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000005a229bc in __gthread_cond_wait (__mutex=<optimized out>, __cond=__cond@entry=0x37cf64bf8) at /var/local/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/gthr-default.h:865
#2 std::condition_variable::wait (this=this@entry=0x37cf64bf8, __lock=...) at ../../../.././libstdc++-v3/src/c++11/condition_variable.cc:53
#3 0x00000000028f3367 in starrocks::pipeline::QuerySharedDriverQueue::take (this=0x37cf64400) at /gaia/workspace-job/git.xiaojukeji.com/datainfra-hadoop/didi-starrock/be/src/exec/pipeline/pipeline_driver_queue.cpp:95
#4 0x00000000028f3d22 in starrocks::pipeline::WorkGroupDriverQueue::take (this=<optimized out>) at /gaia/workspace-job/git.xiaojukeji.com/datainfra-hadoop/didi-starrock/be/src/exec/pipeline/pipeline_driver_queue.cpp:244
#5 0x00000000028f0305 in starrocks::pipeline::GlobalDriverExecutor::_worker_thread (this=0xa892ee0) at /gaia/workspace-job/git.xiaojukeji.com/datainfra-hadoop/didi-starrock/be/src/exec/pipeline/pipeline_driver_executor.cpp:86
#6 0x000000000217fef9 in std::function<void ()>::operator()() const (this=<optimized out>) at /usr/include/c++/10.3.0/bits/std_function.h:248
#7 starrocks::FunctionRunnable::run (this=<optimized out>) at /gaia/workspace-job/git.xiaojukeji.com/datainfra-hadoop/didi-starrock/be/src/util/threadpool.cpp:44
#8 starrocks::ThreadPool::dispatch_thread (this=0x19a50000) at /gaia/workspace-job/git.xiaojukeji.com/datainfra-hadoop/didi-starrock/be/src/util/threadpool.cpp:513
#9 0x000000000217baaa in std::function<void ()>::operator()() const (this=0x17fa08d8) at /usr/include/c++/10.3.0/bits/std_function.h:248
#10 starrocks::Thread::supervise_thread (arg=0x17fa08c0) at /gaia/workspace-job/git.xiaojukeji.com/datainfra-hadoop/didi-starrock/be/src/util/thread.cpp:326
#11 0x00007fe5bc2c8ea5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fe5bb8e3b0d in clone () from /lib64/libc.so.6
Two take frames appear together:
#3 0x00000000028f3367 in starrocks::pipeline::QuerySharedDriverQueue::take (this=0x37cf64400) at /gaia/workspace-job/git.xiaojukeji.com/datainfra-hadoop/didi-starrock/be/src/exec/pipeline/pipeline_driver_queue.cpp:95
#4 0x00000000028f3d22 in starrocks::pipeline::WorkGroupDriverQueue::take (this=<optimized out>) at /gaia/workspace-job/git.xiaojukeji.com/datainfra-hadoop/didi-starrock/be/src/exec/pipeline/pipeline_driver_queue.cpp:244
- Github Issue:
- Github Fix PR: https://github.com/StarRocks/starrocks/pull/14859
- Jira: https://starrocks.atlassian.net/browse/SR-14791
- Affected versions:
  - 2.3.0 ~ 2.3.5
  - 2.4.0 ~ 2.4.1
- Fixed versions:
  - 2.3.6+
  - 2.4.2+
- Workaround:
  - Restart the BE
- Root cause:
  - See the PR description
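To see how many threads show the pattern described in this entry, the saved pstack output can be scanned for a QuerySharedDriverQueue::take frame directly followed by a WorkGroupDriverQueue::take frame. This is a rough sketch: count_threads_waiting_in_take is a hypothetical helper, and it assumes only that frame lines start with "#" and that any other line separates threads.

```python
def count_threads_waiting_in_take(pstack_text):
    """Count stacks where QuerySharedDriverQueue::take is called from WorkGroupDriverQueue::take."""
    count = 0
    previous = ""
    for raw in pstack_text.splitlines():
        line = raw.strip()
        if not line.startswith("#"):
            previous = ""  # a non-frame line ends the current thread's stack
            continue
        if ("WorkGroupDriverQueue::take" in line
                and "QuerySharedDriverQueue::take" in previous):
            count += 1
        previous = line
    return count
```

A call such as count_threads_waiting_in_take(open("pstack.txt").read()) returns the number of matching threads, where pstack.txt is an assumed file name. If every pipeline worker thread matches while queries are still pending, the output is consistent with this issue.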
StatisticsCollectJob on table _statistics.column_statistics fails with Too many versions
2023-01-05 10:54:05,173 WARN (thrift-server-pool-39|12567) [Coordinator.updateFragmentExecStatus():2174] one instance report fail errorCode SERVICE_UNAVAILABLE Too many versions. tablet_id: 10226, version_count: 1001, limit: 1000: be:XXX.XXX.XXX.26, query_id=3772d178-8ca4-11ed-854d-6cfe54388271 instance_id=3772d178-8ca4-11ed-854d-6cfe54388275
2023-01-05 10:54:05,173 WARN (thrift-server-pool-39|12567) [Coordinator.updateStatus():1249] one instance report fail throw updateStatus(), need cancel. job id: -1, query id: 3772d178-8ca4-11ed-854d-6cfe54388271, instance id: 3772d178-8ca4-11ed-854d-6cfe54388275
2023-01-05 10:54:05,174 WARN (AutoStatistic|38) [StmtExecutor.handleDMLStmt():1338] insert failed: Too many versions. tablet_id: 10226, version_count: 1001, limit: 1000: be:XXX.XXX.XXX.26
2023-01-05 10:54:05,174 WARN (AutoStatistic|38) [StmtExecutor.handleDMLStmt():1415] handle insert stmt fail: insert_3772d178-8ca4-11ed-854d-6cfe54388271
com.starrocks.common.DdlException: Too many versions. tablet_id: 10226, version_count: 1001, limit: 1000: be:XXX.XXX.XXX.26
at com.starrocks.common.ErrorReport.reportDdlException(ErrorReport.java:80) ~[starrocks-fe.jar:?]
at com.starrocks.qe.StmtExecutor.handleDMLStmt(StmtExecutor.java:1339) [starrocks-fe.jar:?]
at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:471) [starrocks-fe.jar:?]
at com.starrocks.statistic.StatisticsCollectJob.collectStatisticSync(StatisticsCollectJob.java:92) [starrocks-fe.jar:?]
at com.starrocks.statistic.FullStatisticsCollectJob.collect(FullStatisticsCollectJob.java:62) [starrocks-fe.jar:?]
at com.starrocks.statistic.StatisticExecutor.collectStatistics(StatisticExecutor.java:190) [starrocks-fe.jar:?]
at com.starrocks.statistic.StatisticAutoCollector.runAfterCatalogReady(StatisticAutoCollector.java:61) [starrocks-fe.jar:?]
at com.starrocks.common.util.LeaderDaemon.runOneCycle(LeaderDaemon.java:60) [starrocks-fe.jar:?]
at com.starrocks.common.util.Daemon.run(Daemon.java:115) [starrocks-fe.jar:?]
- Github Issue: https://github.com/StarRocks/starrocks/issues/13261
- Github Fix PR: https://github.com/StarRocks/starrocks/pull/13235
- Jira:
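- Affected versions:
  - 2.4.0 ~ 2.4.1
- Fixed versions:
  - 2.4.2+
- Workaround:
  - None
- Root cause:
  - Statistics collection writes too fast and triggers the error. The fix PR only adds a backpressure mechanism, so occasional errors may still occur after the fix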
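To find which tablets hit the limit and by how much, the tablet ID, version count, and limit can be pulled out of the FE log lines shown in this entry. The sketch below is illustrative; the regular expression follows the message text in those log lines, and parse_too_many_versions is a hypothetical helper name.

```python
import re

TOO_MANY_VERSIONS_RE = re.compile(
    r"Too many versions\. tablet_id: (\d+), version_count: (\d+), limit: (\d+)"
)


def parse_too_many_versions(line):
    """Return tablet_id, version_count and limit from a log line, or None if absent."""
    m = TOO_MANY_VERSIONS_RE.search(line)
    if m is None:
        return None
    tablet_id, version_count, limit = (int(g) for g in m.groups())
    return {"tablet_id": tablet_id, "version_count": version_count, "limit": limit}


line = ("insert failed: Too many versions. tablet_id: 10226, "
        "version_count: 1001, limit: 1000: be:XXX.XXX.XXX.26")
print(parse_too_many_versions(line))
```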
Memory leak on insert (insert or insert into select)
FE followers leak memory while the leader is normal. The memory distribution shows TxnStateCallbackFactory using a large amount of memory
jmap -histo pid
num #instances #bytes class name
----------------------------------------------
1: 65039949 6979006048 [C
2: 4022619 2525925768 [B
3: 51632292 2478350016 java.util.HashMap
4: 73877356 1773056544 java.lang.String
5: 10172354 1546197808 com.starrocks.load.loadv2.InsertLoadJob
6: 20355243 977051664 com.google.gson.internal.LinkedTreeMap$Node
7: 20352822 976935456 com.google.gson.internal.LinkedTreeMap
8: 10727986 935362088 [Ljava.util.HashMap$Node;
9: 38312936 919510464 java.lang.Long
10: 10172713 813817488 [Lorg.apache.commons.collections.map.AbstractHashedMap$HashEntry;
11: 22247256 711912192 java.util.HashMap$Node
12: 10230960 654781440 java.util.concurrent.ConcurrentHashMap
13: 10461293 585832408 java.util.LinkedHashMap
14: 10172712 569671872 com.starrocks.load.EtlStatus
15: 10172712 569671872 org.apache.commons.collections.map.HashedMap
16: 10172400 488275200 java.util.concurrent.locks.ReentrantReadWriteLock$FairSync
If com.starrocks.load.loadv2.InsertLoadJob takes up a large share of the histogram, this is the issue.
- Github Issue: https://github.com/StarRocks/starrocks/issues/14717
- Github Fix PR: https://github.com/StarRocks/starrocks/pull/14718
- Jira:
- Affected versions:
  - 2.2.0 ~ 2.2.11
  - 2.3.0 ~ 2.3.6
  - 2.4.0 ~ 2.4.2
- Fixed versions:
  - 2.2.12+
  - 2.3.7+
  - 2.4.3+
- Workaround:
  - None
- Root cause:
  - See the issue description
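The check described in this entry (looking for com.starrocks.load.loadv2.InsertLoadJob in the jmap -histo output) can be scripted. The sketch below assumes the histogram rows have the "num: #instances #bytes class name" layout shown in this entry; histogram_entry is a hypothetical helper.

```python
import re

# rank: instances bytes class-name
HISTO_RE = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def histogram_entry(jmap_text, class_name):
    """Return (instances, bytes) for class_name, or None if it is not listed."""
    for line in jmap_text.splitlines():
        m = HISTO_RE.match(line)
        if m and m.group(3) == class_name:
            return int(m.group(1)), int(m.group(2))
    return None


sample = """num #instances #bytes class name
4: 73877356 1773056544 java.lang.String
5: 10172354 1546197808 com.starrocks.load.loadv2.InsertLoadJob
"""
print(histogram_entry(sample, "com.starrocks.load.loadv2.InsertLoadJob"))
```

Millions of live InsertLoadJob instances on a follower, as in the sample rows, match the symptom described here.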