Common Crash / Bug / Optimization Lookup

  1. Parquet files written by StarRocks cannot be read by Hive

Failed with exception java.io.IOException:org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file
  • Github Issue:

  • Github Fix PR:

  • Jira

  • Affected versions:

    • 3.2.0~latest

    • 3.3.0~3.3.4

  • Fixed versions:

    • 3.2: not fixed

    • 3.3.5+

  • Root cause:

    • StarRocks writes a newer version of the Parquet format, which Hive 3.0 and earlier cannot read.
  • Workaround:

  1. Morsel queue crash

*** SIGSEGV (@0xa0) received by PID 1328927 (TID 0x14f3e4c3a640) from PID 160; stack trace: ***
    @     0x14f509070ee8 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x99ee7)
    @          0xa37c049 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
    @     0x14f509e7835d os::Linux::chained_handler(int, siginfo*, void*)
    @     0x14f509e7df5f JVM_handle_linux_signal
    @     0x14f509e6f968 signalHandler(int, siginfo*, void*)
    @     0x14f509019520 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x4251f)
    @          0x6d622bc starrocks::TabletReader::_to_seek_tuple(std::shared_ptr<starrocks::TabletSchema const> const&, starrocks::OlapTuple const&, starrocks::SeekTuple*, starrocks::MemPool*)
    @          0x6d62ebe starrocks::TabletReader::parse_seek_range(std::shared_ptr<starrocks::TabletSchema const> const&, starrocks::TabletReaderParams::RangeStartOperation, starrocks::TabletReaderParams::RangeEndOperation, std::vector<starrocks::OlapTuple, std::allocator<starrock...
    @          0x758ae83 starrocks::pipeline::PhysicalSplitMorselQueue::_init_segment()
    @          0x758b46f starrocks::pipeline::PhysicalSplitMorselQueue::_try_get_split_from_single_tablet()
    @          0x758bd67 starrocks::pipeline::PhysicalSplitMorselQueue::try_get()
    @          0x7587e53 starrocks::pipeline::BucketSequenceMorselQueue::try_get()
    @          0x5456ace starrocks::pipeline::ScanOperator::_pickup_morsel(starrocks::RuntimeState*, int)
    @          0x54555bc starrocks::pipeline::ScanOperator::_try_to_trigger_next_scan(starrocks::RuntimeState*)
    @          0x545584a starrocks::pipeline::ScanOperator::pull_chunk(starrocks::RuntimeState*)
    @          0x544c8b8 starrocks::pipeline::PipelineDriver::process(starrocks::RuntimeState*, int)
    @          0x7ebce58 starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x8c00b73 starrocks::ThreadPool::dispatch_thread()
    @          0x8bf81c9 starrocks::Thread::supervise_thread(void*)
    @     0x14f50906bac3 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x94ac2)
    @     0x14f5090fd850 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x12684f)
  • Github Issue:

  • Github Fix PR:

  • Jira

  • Affected versions:

    • 3.3.0~3.3.17

    • 3.4.0~3.4.16

    • 3.5.0~3.5.4

  • Fixed versions:

    • 3.3.18+

    • 3.4.17+

    • 3.5.5+

  • Root cause:

  • Workaround:

    • set global enable_per_bucket_optimize = false
  1. Arm parquet reader crash

*** SIGSEGV (@0x0) received by PID 28 (TID 0xfffea329fe00) LWP(602) from PID 0; stack trace: ***
    @     0xffffb81e25c4 (/usr/lib/aarch64-linux-gnu/libc.so.6+0x825c3)
    @          0xf73dd28 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
    @     0xffffb989c850 ([vdso]+0x84f)
    @     0xffffb81f7bbc (/usr/lib/aarch64-linux-gnu/libc.so.6+0x97bbb)
    @          0xb350584 starrocks::parquet::Int32ToDateConverter::convert(starrocks::Cow<starrocks::Column>::ImmutPtr<starrocks::Column> const&, starrocks::Column*)
    @          0xb37c948 starrocks::parquet::StatisticsHelper::decode_value_into_column(starrocks::Cow<starrocks::Column>::MutPtr<starrocks::Column> const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::ba
    @          0xb29af2c starrocks::parquet::RawColumnReader::_row_group_zone_map_filter(std::vector<starrocks::ColumnPredicate const*, std::allocator<starrocks::ColumnPredicate const*> > const&, starrocks::CompoundNodeType, starrocks::TypeDescriptor const&, unsigned long, unsigne
    @          0xb29f114 starrocks::parquet::ScalarColumnReader::row_group_zone_map_filter(std::vector<starrocks::ColumnPredicate const*, std::allocator<starrocks::ColumnPredicate const*> > const&, starrocks::CompoundNodeType, unsigned long, unsigned long) const
    @          0xb254340 starrocks::StatusOr<std::optional<starrocks::SparseRange<unsigned long> > > starrocks::parquet::PredicateFilterEvaluator::visit_for_rowgroup_zonemap<(starrocks::CompoundNodeType)0>(starrocks::PredicateCompoundNode<(starrocks::CompoundNodeType)0> const&)
    @          0xb2573cc starrocks::StatusOr<std::optional<starrocks::SparseRange<unsigned long> > > starrocks::parquet::PredicateFilterEvaluator::operator()<(starrocks::CompoundNodeType)0>(starrocks::PredicateCompoundNode<(starrocks::CompoundNodeType)0> const&, starrocks::parquet
    @          0xb22ac94 starrocks::parquet::FileReader::_filter_group(std::shared_ptr<starrocks::parquet::GroupReader> const&)
    @          0xb22b3a8 starrocks::parquet::FileReader::_init_group_readers()
    @          0xb22c3ac starrocks::parquet::FileReader::init(starrocks::HdfsScannerContext*)
    @          0xab53f4c starrocks::HdfsParquetScanner::do_open(starrocks::RuntimeState*)
    @          0xaa038a8 starrocks::HdfsScanner::open(starrocks::RuntimeState*)
    @          0xa9de95c starrocks::connector::HiveDataSource::_init_scanner(starrocks::RuntimeState*)
    @          0xa9df6d8 starrocks::connector::HiveDataSource::open(starrocks::RuntimeState*)
    @          0xa9bcf88 starrocks::pipeline::ConnectorChunkSource::_open_data_source(starrocks::RuntimeState*, bool*)
    @          0xa9bd9fc starrocks::pipeline::ConnectorChunkSource::_read_chunk(starrocks::RuntimeState*, std::shared_ptr<starrocks::Chunk>*)
    @          0xa9c399c starrocks::pipeline::ChunkSource::buffer_next_batch_chunks_blocking(starrocks::RuntimeState*, unsigned long, starrocks::workgroup::WorkGroup const*)
    @          0xa299bf0 auto starrocks::pipeline::ScanOperator::_trigger_next_scan(starrocks::RuntimeState*, int)::{lambda(auto:1&)#1}::operator()<starrocks::workgroup::YieldContext>(starrocks::workgroup::YieldContext&) const [clone .constprop.0]
    @          0xa92c89c starrocks::workgroup::ScanExecutor::worker_thread()
    @          0xc411dc4 starrocks::ThreadPool::dispatch_thread()
    @          0xc408a68 starrocks::Thread::supervise_thread(void*)
  1. min_by/max_by function crash

*** Aborted at 1754458860 (unix time) try "date -d @1754458860" if you are using GNU date ***
PC: @     0x7f0ee04d5711 __memcpy_ssse3_back
*** SIGSEGV (@0x7f0e74196ff1) received by PID 16389 (TID 0x7f0e5895e700) from PID 1947824113; stack trace: ***
    @     0x7f0ee107b20b __pthread_once_slow
    @          0x7dc3dc0 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
    @     0x7f0ee1f0935d os::Linux::chained_handler(int, siginfo*, void*)
    @     0x7f0ee1f0ef5f JVM_handle_linux_signal
    @     0x7f0ee1f00968 signalHandler(int, siginfo*, void*)
    @     0x7f0ee1084630 (/usr/lib64/libpthread-2.17.so+0xf62f)
    @     0x7f0ee04d5711 __memcpy_ssse3_back
    @          0x4ce607d starrocks::AggregateFunctionBatchHelper<starrocks::MinByAggregateData<(starrocks::LogicalType)17, true, int>, starrocks::MaxMinByAggregateFunction<(starrocks::LogicalType)17, starrocks::MinByAggregateData<(starrocks::LogicalType)17, true, int>, starrocks::...
    @          0x46200dc starrocks::Aggregator::compute_batch_agg_states(starrocks::Chunk*, unsigned long)
    @          0x451ffcd starrocks::pipeline::AggregateBlockingSinkOperator::push_chunk(starrocks::RuntimeState*, std::shared_ptr<starrocks::Chunk> const&)
    @          0x452f3e1 starrocks::pipeline::BucketProcessSinkOperator::push_chunk(starrocks::RuntimeState*, std::shared_ptr<starrocks::Chunk> const&)
    @          0x44f49d4 starrocks::pipeline::PipelineDriver::process(starrocks::RuntimeState*, int)
    @          0x47b85e3 starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x398e8e3 starrocks::ThreadPool::dispatch_thread()
    @          0x3985f66 starrocks::Thread::supervise_thread(void*)
    @     0x7f0ee107cea5 start_thread
    @     0x7f0ee047db0d __clone
  1. RecoverableStub crash

*** SIGABRT (@0x272000296e6) received by PID 169702 (TID 0x15162f6fb640) from PID 169702; stack trace: ***
    @     0x151dd468ee18 __pthread_once_slow
    @          0x7da2500 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
    @     0x151dd463e6f0 (/usr/lib64/libc.so.6+0x3e6ef)
    @     0x151dd468b94c __pthread_kill_implementation
    @     0x151dd463e646 __GI_raise
    @     0x151dd46287f3 __GI_abort
    @          0x3328555 __gnu_cxx::__verbose_terminate_handler() [clone .cold]
    @          0xc480c56 __cxxabiv1::__terminate(void (*)())
    @          0xc480cc1 std::terminate()
    @          0xc48144f __cxa_pure_virtual
    @          0x3821545 starrocks::LocalTabletsChannel::_abort_replica_tablets(starrocks::PTabletWriterAddChunkRequest const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<long, std::vector<long, std::allocator<long> >...
    @          0x38237f0 starrocks::LocalTabletsChannel::add_chunk(starrocks::Chunk*, starrocks::PTabletWriterAddChunkRequest const&, starrocks::PTabletWriterAddBatchResult*)
    @          0x3815939 starrocks::LoadChannel::_add_chunk(starrocks::Chunk*, starrocks::PTabletWriterAddChunkRequest const&, starrocks::PTabletWriterAddBatchResult*)
    @          0x3816b2c starrocks::LoadChannel::add_chunks(starrocks::PTabletWriterAddChunksRequest const&, starrocks::PTabletWriterAddBatchResult*)
    @          0x38101c3 starrocks::LoadChannelMgr::add_chunks(starrocks::PTabletWriterAddChunksRequest const&, starrocks::PTabletWriterAddBatchResult*)
    @          0x38d163b starrocks::BackendInternalServiceImpl<starrocks::PInternalService>::tablet_writer_add_chunks(google::protobuf::RpcController*, starrocks::PTabletWriterAddChunksRequest const*, starrocks::PTabletWriterAddBatchResult*, google::protobuf::Closure*)
    @          0x802f2d4 brpc::policy::ProcessRpcRequest(brpc::InputMessageBase*)
    @          0x7f5b677 brpc::ProcessInputMessage(void*)
    @          0x7f5c9f5 brpc::InputMessenger::OnNewMessages(brpc::Socket*)
    @          0x7f4acee brpc::Socket::ProcessEvent(void*)
    @          0x7f1bd72 bthread::TaskGroup::task_runner(long)
    @          0x8071201 bthread_make_fcontext
  1. AggHashMapWithSerializedKey crash

*** SIGSEGV (@0x10) received by PID 61982 (TID 0x151907171640) LWP(66306) from PID 16; stack trace: ***
    @     0x1522e1c8ef38 __pthread_once_slow
    @          0xbed4e94 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
    @     0x1522e318c519 PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0]
    @     0x1522e318cf6e JVM_handle_linux_signal
    @     0x1522e1c3e730 (/usr/lib64/libc.so.6+0x3e72f)
    @          0x565de1e void starrocks::AggHashMapWithSerializedKey<phmap::flat_hash_map<starrocks::Slice, unsigned char*, starrocks::SliceHashWithSeed<(starrocks::PhmapSeed)0>, starrocks::SliceEqual, std::allocator<std::pair<starrocks::Slice const, unsigned char*> > > >::compute...
    @          0x5687d74 starrocks::Aggregator::build_hash_map_with_selection(unsigned long)
    @          0x54667fc starrocks::pipeline::AggregateStreamingSinkOperator::_push_chunk_by_selective_preaggregation(std::shared_ptr<starrocks::Chunk> const&, unsigned long, bool)
    @          0x54671f2 starrocks::pipeline::AggregateStreamingSinkOperator::_push_chunk_by_auto(std::shared_ptr<starrocks::Chunk> const&, unsigned long)
    @          0x54682d2 starrocks::pipeline::AggregateStreamingSinkOperator::push_chunk(starrocks::RuntimeState*, std::shared_ptr<starrocks::Chunk> const&)
    @          0x5426a49 starrocks::pipeline::PipelineDriver::process(starrocks::RuntimeState*, int)
    @          0x58bfc71 starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x45fee3f starrocks::ThreadPool::dispatch_thread()
    @          0x45f5c30 starrocks::Thread::supervise_thread(void*)
    @     0x1522e1c89d22 start_thread
    @     0x1522e1d0ed40 __clone3
  1. Field function crash

*** Aborted at 1770806357 (unix time) try "date -d @1770806357" if you are using GNU date ***
PC: @          0x4e40660 starrocks::BinaryColumnBase<unsigned int>::_build_slices() const
*** SIGSEGV (@0x0) received by PID 19649 (TID 0x1530d22c5640) LWP(24061) from PID 0; stack trace: ***
    @     0x1538ba08ef38 __pthread_once_slow
    @          0xbf0a514 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
    @     0x1538bb58c519 PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0]
    @     0x1538bb58cf6e JVM_handle_linux_signal
    @     0x1538ba03e730 (/usr/lib64/libc.so.6+0x3e72f)
    @          0x4e40660 starrocks::BinaryColumnBase<unsigned int>::_build_slices() const
    @          0x8722837 starrocks::StatusOr<starrocks::Cow<starrocks::Column>::ImmutPtr<starrocks::Column> > starrocks::StringFunctions::field<(starrocks::LogicalType)17>(starrocks::FunctionContext*, std::vector<starrocks::Cow<starrocks::Column>::ImmutPtr<starrocks::Column>, std:...
    @          0x86e702a std::_Function_handler<starrocks::StatusOr<starrocks::Cow<starrocks::Column>::ImmutPtr<starrocks::Column> > (starrocks::FunctionContext*, std::vector<starrocks::Cow<starrocks::Column>::ImmutPtr<starrocks::Column>, std::allocator<starrocks::Cow<starrocks::C...
    @          0x745a68d starrocks::VectorizedFunctionCallExpr::evaluate_checked(starrocks::ExprContext*, starrocks::Chunk*)
    @          0x665dbdb starrocks::ExprContext::evaluate(starrocks::Expr*, starrocks::Chunk*, unsigned char*)
    @          0x665df1b starrocks::ExprContext::evaluate(starrocks::Chunk*, unsigned char*)
    @          0x538b16c starrocks::pipeline::ProjectOperator::push_chunk(starrocks::RuntimeState*, std::shared_ptr<starrocks::Chunk> const&)
    @          0x544347a starrocks::pipeline::PipelineDriver::process(starrocks::RuntimeState*, int)
    @          0x58e2aed starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x460d0d7 starrocks::ThreadPool::dispatch_thread()
    @          0x4603eb0 starrocks::Thread::supervise_thread(void*)
    @     0x1538ba089d22 start_thread
    @     0x1538ba10ed40 __clone3
  1. get_fe_metrics crash

*** Aborted at 1768557257 (unix time) try "date -d @1768557257" if you are using GNU date ***
PC: @     0x1498c0c8ba6c __pthread_kill_implementation
*** SIGABRT (@0x1f20001129f) received by PID 70303 (TID 0x148d3ff87640) LWP(76947) from PID 70303; stack trace: ***
    @     0x1498c0c8ef38 __pthread_once_slow
    @          0xbed4e94 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
    @     0x1498c0c3e730 (/usr/lib64/libc.so.6+0x3e72f)
    @     0x1498c0c8ba6c __pthread_kill_implementation
    @     0x1498c0c3e686 __GI_raise
    @     0x1498c0c28833 __GI_abort
    @          0x3f76f21 __gnu_cxx::__verbose_terminate_handler() [clone .cold]
    @          0xfb1d096 __cxxabiv1::__terminate(void (*)())
    @          0x3f76db9 std::terminate()
    @          0xfb1d233 __cxa_throw
    @          0x45d8de8 __wrap___cxa_throw
    @          0x5860ba1 starrocks::SchemaFeMetricsScanner::_get_fe_metrics(starrocks::RuntimeState*)
    @          0x5864713 starrocks::SchemaFeMetricsScanner::start(starrocks::RuntimeState*)
    @          0x58b8fea std::once_flag::_Prepare_execution::_Prepare_execution<std::call_once<starrocks::pipeline::SchemaChunkSource::start(starrocks::RuntimeState*)::{lambda()#1}>(std::once_flag&, starrocks::pipeline::SchemaChunkSource::start(starrocks::RuntimeState*)::{lambda()...
    @     0x1498c0c8ef38 __pthread_once_slow
    @          0x58b90d4 starrocks::pipeline::SchemaChunkSource::start(starrocks::RuntimeState*)
    @          0x53914d6 auto starrocks::pipeline::ScanOperator::_trigger_next_scan(starrocks::RuntimeState*, int)::{lambda(auto:1&)#1}::operator()<starrocks::workgroup::YieldContext>(starrocks::workgroup::YieldContext&) const [clone .isra.0]
    @          0x54d7a29 starrocks::workgroup::ScanExecutor::worker_thread()
    @          0x45fee3f starrocks::ThreadPool::dispatch_thread()
    @          0x45f5c30 starrocks::Thread::supervise_thread(void*)
    @     0x1498c0c89d22 start_thread
    @     0x1498c0d0ed40 __clone3
  1. Tablet channel use-after-free

    @          0x406ef03 starrocks::FixedLengthColumnBase<starrocks::TimestampValue>::append_selective(starrocks::Column const&, unsigned int const*, unsigned int, unsigned int)
    @          0x635da25 starrocks::MemTable::insert(starrocks::Chunk const&, unsigned int const*, unsigned int, unsigned int)
    @          0x63529c5 starrocks::DeltaWriter::write(starrocks::Chunk const&, unsigned int const*, unsigned int, unsigned int)
    @          0x62c92a6 starrocks::AsyncDeltaWriter::_execute(void*, bthread::TaskIterator<starrocks::AsyncDeltaWriter::Task>&)
    @          0x7f2a85c bthread::ExecutionQueueBase::_execute(bthread::TaskNode*, bool, int*)
    @          0x7f2b84b bthread::ExecutionQueueBase::_execute_tasks(void*)
    @          0x398b053 starrocks::ThreadPool::dispatch_thread()
    @          0x3983296 starrocks::Thread::supervise_thread(void*)
    @     0x14914ac89d22 start_thread
    @     0x14914ad0ed40 __clone3
  • Github Issue:

  • Github Fix PR:

  • Jira

  • Affected versions:

    • 3.1.0 ~ latest

    • 3.2.0 ~ latest

    • 3.3.0 ~ 3.3.19

    • 3.4.0 ~ 3.4.8

    • 3.5.0 ~ 3.5.7

    • 4.0.0

  • Fixed versions:

    • 3.1: not fixed

    • 3.2: not fixed

    • 3.3.20+

    • 3.4.9+

    • 3.5.8+

    • 4.0.1+

  • Root cause:

  • Workaround:

*** Aborted at 1772777751 (unix time) try "date -d @1772777751" if you are using GNU date ***
PC: @          0x637e83f starrocks::lake::AsyncDeltaWriter::close()
*** SIGSEGV (@0x0) received by PID 4101738 (TID 0x7f7ddb8ff640) from PID 0; stack trace: ***
    @     0x7f80534904f8 __pthread_once_slow
    @          0x7d0ad20 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
    @     0x7f805441c25a os::Linux::chained_handler(int, siginfo*, void*)
    @     0x7f805442185e JVM_handle_linux_signal
    @     0x7f8054415748 signalHandler(int, siginfo*, void*)
    @     0x7f805343fc30 (/usr/lib64/libc.so.6+0x3fc2f)
    @          0x637e83f starrocks::lake::AsyncDeltaWriter::close()
    @          0x384e239 starrocks::LakeTabletsChannel::abort()
    @          0x37f924b starrocks::LoadChannel::abort()
    @          0x37f4590 starrocks::LoadChannelMgr::cancel(brpc::Controller*, starrocks::PTabletWriterCancelRequest const&, starrocks::PTabletWriterCancelResult*, google::protobuf::Closure*)
    @          0x7f97af4 brpc::policy::ProcessRpcRequest(brpc::InputMessageBase*)
    @          0x7ec3e97 brpc::ProcessInputMessage(void*)
    @          0x7ec5215 brpc::InputMessenger::OnNewMessages(brpc::Socket*)
    @          0x7eb350e brpc::Socket::ProcessEvent(void*)
    @          0x7e84592 bthread::TaskGroup::task_runner(long)
    @          0x7fd9a21 bthread_make_fcontext
  • Github Issue:
  • Github Fix PR:
  • Jira
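  • Affected versions:
  • Fixed versions:
    • 3.3.19+
    • 3.4.8+
    • 3.5.6+
  • Root cause:
    The LoadChannel was reopened, but the corresponding tablet writer was not, so the writer pointer is nullptr when it is used.
  • Workaround: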
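
The root cause above (a channel that is reopened while one of its writers is not) can be illustrated with a small sketch. This is not the StarRocks code: `Writer`, `Channel`, `register_tablet`, and `abort_all` are hypothetical names, and the null check is only one defensive option, not necessarily what the actual fix does.

```cpp
#include <map>
#include <memory>

// Hypothetical stand-in for a per-tablet writer.
struct Writer {
    bool closed = false;
    void close() { closed = true; }
};

// Hypothetical stand-in for a tablets channel that owns one writer per tablet.
struct Channel {
    std::map<long, std::unique_ptr<Writer>> writers;

    // Registers a tablet slot. Passing create_writer = false models a channel
    // that was reopened without reopening the writer for that tablet.
    void register_tablet(long tablet_id, bool create_writer) {
        if (create_writer) {
            writers[tablet_id] = std::make_unique<Writer>();
        } else {
            writers[tablet_id] = nullptr;
        }
    }

    // Closes every writer that exists and returns how many were closed.
    // Skipping null slots avoids dereferencing a writer that was never opened.
    int abort_all() {
        int closed = 0;
        for (auto& entry : writers) {
            if (entry.second == nullptr) continue;
            entry.second->close();
            ++closed;
        }
        return closed;
    }
};
```

Without the `nullptr` check, `abort_all` would call `close()` through a null pointer, which matches the `SIGSEGV (@0x0)` pattern in the trace above.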
  1. AsyncFlushOutputStream use-after-free causes BE crash
PC: @ 0xd5a81ed starrocks::io::AsyncFlushOutputStream::write(unsigned char const*, long)
*** SIGSEGV (@0x0) received by PID 76996 (TID 0x14bbe3cd6640) LWP(77375) from PID 0; stack trace: ***
    @ 0x14bc0e41bee8 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x99ee7)
    @ 0x14ac8d46 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
    @ 0x14bc0e3c4520 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x4251f)
    @ 0xd5a81ed starrocks::io::AsyncFlushOutputStream::write(unsigned char const*, long)
    @ 0xd5696c8 starrocks::parquet::AsyncParquetOutputStream::Write(void const*, long)
    @ 0x1615e7f6 parquet::FileMetaData::FileMetaDataImpl::WriteTo(arrow::io::OutputStream*, std::shared_ptr<parquet::Encryptor> const&) const
    @ 0x160d650e parquet::WriteFileMetaData(parquet::FileMetaData const&, arrow::io::OutputStream*)
    @ 0x160d96aa parquet::FileSerializer::Close()
    @ 0x160d6d40 parquet::ParquetFileWriter::Close()
    @ 0x160d6e8f parquet::ParquetFileWriter::~ParquetFileWriter()
    @ 0xd33b8da std::_Sp_counted_deleter<parquet::ParquetFileWriter*, ...>::_M_dispose()
    @ 0xd336ee8 starrocks::formats::ParquetFileWriter::~ParquetFileWriter()
    @ 0xc6e9b38 std::_Sp_counted_ptr_inplace<starrocks::connector::BufferPartitionChunkWriter, ...>::_M_dispose()
    @ 0xc6f9b29 starrocks::connector::ConnectorChunkSink::write_partition_chunk(...)
    @ 0xc6fa222 starrocks::connector::ConnectorChunkSink::add(std::shared_ptr<starrocks::Chunk> const&)
    @ 0xfd4e805 starrocks::pipeline::ConnectorSinkOperator::push_chunk(starrocks::RuntimeState*, std::shared_ptr<starrocks::Chunk> const&)
    @ 0xc5f40ff starrocks::pipeline::PipelineDriver::process(starrocks::RuntimeState*, int)
    @ 0xc669dd1 starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @ 0x11034ebb starrocks::ThreadPool::dispatch_thread()
    @ 0x1102beef starrocks::Thread::supervise_thread(void*)
  • Github Issue:
  • Github Fix PR:
  • Jira:
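  • Affected versions:
    • 3.5.0 ~ 3.5.14
    • 4.0.0 ~ 4.0.6
  • Fixed versions:
    • 3.5.15+
    • 4.0.7+
  • Root cause:
    Destroying _filter_writer triggers a flush of _out_stream. If _out_stream is destroyed before _filter_writer, the flush touches freed memory (use-after-free) and the BE crashes. The fix reorders the member declarations so that _out_stream is destroyed after _filter_writer.
  • Workaround: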
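
The declaration-order fix relies on a C++ rule: non-static data members are destroyed in reverse order of declaration. The sketch below shows the corrected ordering only. `OutStream`, `FilterWriter`, `FileWriter`, and `destroy_and_trace` are hypothetical stand-ins, not the real StarRocks classes.

```cpp
#include <string>
#include <vector>

// Records events so the destruction order is observable.
static std::vector<std::string> g_events;

// Hypothetical stand-in for the output stream.
struct OutStream {
    ~OutStream() { g_events.push_back("stream destroyed"); }
    void flush() { g_events.push_back("flush"); }
};

// Hypothetical stand-in for a writer whose destructor flushes the stream.
struct FilterWriter {
    OutStream* out;
    explicit FilterWriter(OutStream* o) : out(o) {}
    ~FilterWriter() {
        out->flush();  // safe only while *out is still alive
        g_events.push_back("writer destroyed");
    }
};

// Members are destroyed in reverse declaration order, so declaring the
// stream first guarantees that it outlives the writer that flushes it.
struct FileWriter {
    OutStream out_stream;                     // declared first, destroyed last
    FilterWriter filter_writer{&out_stream};  // declared last, destroyed first
};

// Creates and destroys a FileWriter, then returns the recorded events.
std::vector<std::string> destroy_and_trace() {
    g_events.clear();
    { FileWriter w; }
    return g_events;
}
```

Swapping the two member declarations would make the writer's destructor flush a stream that has already been destroyed, which is the use-after-free described above.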
  1. BE crash when ParquetFileWriter::close throws an exception other than ParquetStatusException
*** SIGABRT (@0x74ee) received by PID 29934 (TID 0x2ad55605e700) from PID 29934; stack trace: ***
    @ 0x2ad416033e20 __GI___pthread_once
    @ 0x7d2b0c0 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
    @ 0x2ad4160365e0 (/usr/lib64/libpthread-2.17.so+0xf5df)
    @ 0x2ad416ba01f7 __GI_raise
    @ 0x2ad416ba18e8 __GI_abort
    @ 0x330c327 __gnu_cxx::__verbose_terminate_handler() [clone .cold]
    @ 0xc408c86 __cxxabiv1::__terminate(void (*)())
    @ 0xc408cf1 std::terminate()
    @ 0xc408e44 __cxa_throw
    @ 0x36faf61 __wrap___cxa_throw
    @ 0x8bf930e parquet::ThrowRowsMisMatchError(int, long, long)
    @ 0x8bfa9e8 parquet::FileSerializer::Close()
    @ 0x8bf7ad0 parquet::ParquetFileWriter::Close() [clone .localalias]
    @ 0x7183400 starrocks::formats::ParquetFileWriter::commit()
    @ 0x6f6e7df starrocks::connector::ConnectorChunkSink::finish()
    @ 0x6fd6714 starrocks::pipeline::ConnectorSinkOperator::set_finishing(starrocks::RuntimeState*)
    @ 0x44b4154 starrocks::pipeline::PipelineDriver::_mark_operator_finishing(...)
    @ 0x44b43ac starrocks::pipeline::PipelineDriver::_mark_operator_finished(...)
    @ 0x44b4b03 starrocks::pipeline::PipelineDriver::_mark_operator_cancelled(...)
    @ 0x44b5025 starrocks::pipeline::PipelineDriver::cancel_operators(starrocks::RuntimeState*)
    @ 0x477b524 starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @ 0x3966ca3 starrocks::ThreadPool::dispatch_thread()
    @ 0x395e326 starrocks::Thread::supervise_thread(void*)
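
The trace above ends in `std::terminate()` because an exception escaped from the close path. As a rough illustration (assuming nothing about the actual fix), a close wrapper can convert every exception type into a status value instead of handling only one expected class. `Status`, `StatusException`, and `close_safely` are hypothetical names for this sketch.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical minimal status type used only in this sketch.
struct Status {
    bool ok;
    std::string message;
};

// Hypothetical exception type standing in for the "expected" error class.
struct StatusException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Runs a close-like operation and converts every exception into a Status,
// so an unexpected exception type cannot escape and reach std::terminate.
Status close_safely(const std::function<void()>& close_fn) {
    try {
        close_fn();
        return {true, ""};
    } catch (const StatusException& e) {
        return {false, std::string("status error: ") + e.what()};
    } catch (const std::exception& e) {
        return {false, std::string("unexpected error: ") + e.what()};
    } catch (...) {
        return {false, "unknown error"};
    }
}
```

The handlers are ordered from most to least specific, so the expected class keeps its own message while anything else is still caught.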
  1. Fix for an FE deadlock caused by hadoop-client
jdk.internal.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:341)
java.util.concurrent.ForkJoinTask.awaitDone(ForkJoinTask.java:468)
java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:687)
java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:927)
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
org.apache.hadoop.fs.statistics.impl.EvaluatingStatisticsMap.entrySet(EvaluatingStatisticsMap.java:166)
java.util.Collections$UnmodifiableMap.entrySet(Collections.java:1529)
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.copyMap(IOStatisticsBinding.java:172)
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.snapshotMap(IOStatisticsBinding.java:216)
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.snapshotMap(IOStatisticsBinding.java:199)
org.apache.hadoop.fs.statistics.IOStatisticsSnapshot.snapshot(IOStatisticsSnapshot.java:165)
org.apache.hadoop.fs.statistics.IOStatisticsSnapshot.<init>(IOStatisticsSnapshot.java:125)
org.apache.hadoop.fs.statistics.IOStatisticsSupport.snapshotIOStatistics(IOStatisticsSupport.java:49)
  1. BE crash when the field function runs concurrently (BinaryColumn slice cache is not thread-safe)
PC: @ 0x4e40660 starrocks::BinaryColumnBase<unsigned int>::_build_slices() const
*** SIGSEGV (@0x0) received by PID 19649 (TID 0x1530d22c5640) LWP(24061) from PID 0; stack trace: ***
    @ 0x1538ba08ef38 __pthread_once_slow
    @ 0xbf0a514 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
    @ 0x1538bb58c519 PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0]
    @ 0x1538bb58cf6e JVM_handle_linux_signal
    @ 0x1538ba03e730 (/usr/lib64/libc.so.6+0x3e72f)
    @ 0x4e40660 starrocks::BinaryColumnBase<unsigned int>::_build_slices() const
    @ 0x8722837 starrocks::StringFunctions::field<(starrocks::LogicalType)17>(starrocks::FunctionContext*, ...)
    @ 0x86e702a std::_Function_handler<...>::_M_invoke(...)
    @ 0x745a68d starrocks::VectorizedFunctionCallExpr::evaluate_checked(starrocks::ExprContext*, starrocks::Chunk*)
    @ 0x665dbdb starrocks::ExprContext::evaluate(starrocks::Expr*, starrocks::Chunk*, unsigned char*)
    @ 0x665df1b starrocks::ExprContext::evaluate(starrocks::Chunk*, unsigned char*)
    @ 0x538b16c starrocks::pipeline::ProjectOperator::push_chunk(starrocks::RuntimeState*, std::shared_ptr<starrocks::Chunk> const&)
    @ 0x544347a starrocks::pipeline::PipelineDriver::process(starrocks::RuntimeState*, int)
    @ 0x58e2aed starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @ 0x460d0d7 starrocks::ThreadPool::dispatch_thread()
    @ 0x4603eb0 starrocks::Thread::supervise_thread(void*)
  • Github Issue:
  • Github Fix PR:
  • Jira:
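  • Affected versions:
    • 3.5.0~3.5.13
    • 4.0.0~4.0.6
  • Fixed versions:
    • 3.5.14+
    • 4.0.7+
  • Root cause:
    When the field function runs concurrently on multiple threads, it calls BinaryColumn::get_data() to build the slice cache. The cache is built lazily and modifies mutable state such as _slices_cache and _slices, which is not thread-safe, so concurrent access crashes. The fix reads the values of constant arguments in the prepare phase and caches them in FieldFuncState; the evaluate phase uses the cached values directly and no longer mutates shared state.
  • Workaround: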
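
The prepare-then-evaluate pattern from the root cause above can be sketched as follows. This is a simplified model, not the StarRocks implementation: `LazyColumn`, `FieldState`, `prepare`, and `field_index` are hypothetical names, and the 1-based "position or 0" return convention is assumed for illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical column with a lazily built cache: a const accessor that
// mutates internal state, which is the unsafe pattern described above.
class LazyColumn {
public:
    explicit LazyColumn(std::vector<std::string> values) : _values(std::move(values)) {}

    // Not thread-safe: concurrent first calls race on _cache and _cached.
    const std::vector<std::string>& get_data() const {
        if (!_cached) {
            _cache = _values;
            _cached = true;
        }
        return _cache;
    }

private:
    std::vector<std::string> _values;
    mutable std::vector<std::string> _cache;
    mutable bool _cached = false;
};

// Hypothetical per-function state, filled once in a single-threaded prepare step.
struct FieldState {
    std::vector<std::string> const_args;
};

// Prepare phase: copies the constant arguments while only one thread runs.
FieldState prepare(const LazyColumn& col) {
    return FieldState{col.get_data()};
}

// Evaluate phase: reads only the immutable snapshot, so concurrent calls
// never touch the lazy cache. Returns the 1-based position, or 0 if absent.
int field_index(const FieldState& state, const std::string& needle) {
    for (std::size_t i = 0; i < state.const_args.size(); ++i) {
        if (state.const_args[i] == needle) return static_cast<int>(i) + 1;
    }
    return 0;
}
```

Because `field_index` takes the state by const reference and never calls `get_data()`, any number of threads can evaluate against the same `FieldState`.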
  1. Fix for a crash caused by all-NULL value handling in Sync MV (ScalarColumnWriter::append SIGSEGV)
*** Aborted at 1770689188 (unix time) try "date -d @1770689188" if you are using GNU date ***
PC: @ 0x97cfe8c starrocks::ScalarColumnWriter::append(starrocks::Column const&)
*** SIGSEGV (@0x11) received by PID 9202 (TID 0x154ae3f71640) LWP(13831) from PID 17; stack trace: ***
    @ 0x97cfe8c starrocks::ScalarColumnWriter::append(starrocks::Column const&)
    @ 0x97bc041 starrocks::SegmentWriter::append_chunk(starrocks::Chunk const&)
    @ 0xa56ab7f starrocks::HorizontalRowsetWriter::add_chunk(starrocks::Chunk const&, ...)
    @ 0xa70fd58 starrocks::SchemaChangeDirectly::process(...)
    @ 0xa717343 starrocks::SchemaChangeHandler::_convert_historical_rowsets(...)
    @ 0xa71b34a starrocks::SchemaChangeHandler::_do_process_alter_tablet_normal(...)
    @ 0xa71d200 starrocks::SchemaChangeHandler::_do_process_alter_tablet(...)
    @ 0xa71e18f starrocks::SchemaChangeHandler::process_alter_tablet(...)
    @ 0xbc4eeab starrocks::EngineAlterTabletTask::execute()
    @ 0xc0f245b starrocks::ThreadPool::dispatch_thread()
    @ 0xc0e8c49 starrocks::Thread::supervise_thread(void*)
  • Github Issue:
  • Github Fix PR:
  • Jira:
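  • Affected versions:
    • 3.4.0~3.4.10
    • 3.5.0~3.5.13
    • 4.0.0~4.0.6
  • Fixed versions:
    • 3.4.11+
    • 3.5.14+
    • 4.0.7+
  • Root cause:
    With a synchronous materialized view (Sync MV), an MV expression can evaluate to an all-NULL constant column, but ColumnWriter requires non-const input. Expanding an only-null column with unpack_and_duplicate_const_column yields a FixedLengthColumn<uint8_t> instead of the actual target type, which causes an invalid memory access on write. The fix expands the column with ColumnHelper::unfold_const_column(type_desc, ...), passing the correct TypeDescriptor.
  • Workaround: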
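
The type mismatch described above can be pictured with a toy model in which a column is just raw bytes plus an element width. Everything here is hypothetical (`RawColumn`, `unfold_all_null`, `is_safe_for_writer`); the real columns and TypeDescriptor are far richer. The sketch only shows why the expansion has to know the target type.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified column: raw bytes plus element width.
struct RawColumn {
    std::size_t elem_size;            // bytes per value
    std::vector<std::uint8_t> data;   // elem_size * rows bytes
    std::vector<std::uint8_t> nulls;  // 1 = NULL, one entry per row
    std::size_t rows() const { return nulls.size(); }
};

// Expands an all-NULL constant into `rows` values using the given element
// width. Passing the target type's width keeps a typed writer in bounds.
RawColumn unfold_all_null(std::size_t target_elem_size, std::size_t rows) {
    RawColumn col;
    col.elem_size = target_elem_size;
    col.data.assign(target_elem_size * rows, 0);
    col.nulls.assign(rows, 1);
    return col;
}

// A writer that reads writer_elem_size bytes per row can only consume a
// column whose buffer was laid out with that same width.
bool is_safe_for_writer(const RawColumn& col, std::size_t writer_elem_size) {
    return col.elem_size == writer_elem_size &&
           col.data.size() == writer_elem_size * col.rows();
}
```

In this model, a column unfolded with a 1-byte width fails the check for an 8-byte writer, which corresponds to the uint8_t-versus-target-type mismatch in the root cause.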
  1. Hash Join Spill crashes when the set finishing task fails
*** SIGSEGV (@0x0) received by PID 25 (TID 0x7f2676133640) from PID 0; stack trace: ***
    @     0x7f27de726ee8 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x99ee7)
    @          0x9b0fc69 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
    @     0x7f27df5ba526 os::Linux::chained_handler(int, siginfo_t*, void*)
    @     0x7f27df5c021b JVM_handle_linux_signal
    @     0x7f27df5b307c signalHandler(int, siginfo_t*, void*)
    @     0x7f27de6cf520 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x4251f)
    @          0x506ba30 void std::vector<unsigned char, starrocks::raw::RawAllocator<unsigned char, 16ul, std::allocator<unsigned char> > >::_M_range_insert<unsigned char const*>(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, starrocks::raw::RawAllocator<...
    @          0x506e803 starrocks::BinaryColumnBase<unsigned int>::append(starrocks::Column const&, unsigned long, unsigned long)
    @          0x5092e16 starrocks::NullableColumn::append(starrocks::Column const&, unsigned long, unsigned long)
    @          0x7283cd6 starrocks::JoinHashTable::append_chunk(std::shared_ptr<starrocks::Chunk> const&, std::vector<std::shared_ptr<starrocks::Column>, std::allocator<std::shared_ptr<starrocks::Column> > > const&)
    @          0x726a2af starrocks::HashJoinBuilder::append_chunk(std::shared_ptr<starrocks::Chunk> const&)
    @          0x724f9fa starrocks::pipeline::SpillableHashJoinProbeOperator::_load_partition_build_side(starrocks::workgroup::YieldContext&, starrocks::RuntimeState*, std::shared_ptr<starrocks::spill::SpillerReader> const&, unsigned long)
    @          0x724fc78 std::_Function_handler<void (starrocks::workgroup::YieldContext&), starrocks::pipeline::SpillableHashJoinProbeOperator::_load_all_partition_build_side(starrocks::RuntimeState*)::{lambda(auto:1&)#1}>::_M_invoke(std::_Any_data const&, starrocks::workgroup::Y...
    @          0x4fbaede starrocks::workgroup::ScanExecutor::worker_thread()
    @          0x87563fe starrocks::ThreadPool::dispatch_thread()
    @          0x874ce19 starrocks::Thread::supervise_thread(void*)
    @     0x7f27de721ac3 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x94ac2)
    @     0x7f27de7b2a74 clone
  • Github Issue:
  • Github Fix PR: https://github.com/StarRocks/starrocks/pull/65027
  • Jira:
  • Affected versions:
    • 3.3.0~3.3.19
    • 3.4.0~3.4.8
    • 3.5.0~3.5.8
    • 4.0.0~4.0.1
  • Fixed versions:
    • 3.3.20+
    • 3.4.9+
    • 3.5.9+
    • 4.0.1+
  • Root cause: SpillableHashJoinBuildOperator::set_finishing submits an asynchronous task. If the task fails, the error status is written to the spiller, but set_finishing itself still returns success. SpillableHashJoinProbeOperator then keeps running and may hit a SIGSEGV or loop forever.
  • Workaround:
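
The gap between "set_finishing returned success" and "the asynchronous task failed" can be modeled in a few lines. This is a sketch under assumptions, not the code from the fix PR: `Status`, `Spiller`, `run_pending`, and `start_probe` are hypothetical, and the queued `std::function` stands in for a task submitted to an executor.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical minimal status type used only in this sketch.
struct Status {
    bool ok = true;
    std::string message;
};

// Hypothetical spiller that records the outcome of background work.
struct Spiller {
    Status task_status;
    std::function<void()> pending;  // stands in for a task queued on an executor
};

// Build side: queues the finishing task and returns success immediately.
// A later failure is visible only through spiller.task_status.
Status set_finishing(Spiller& spiller, std::function<Status()> task) {
    spiller.pending = [&spiller, task] { spiller.task_status = task(); };
    return Status{};
}

// Runs the queued task, as an executor thread would at some later point.
void run_pending(Spiller& spiller) {
    if (!spiller.pending) return;
    std::function<void()> task = std::move(spiller.pending);
    spiller.pending = nullptr;
    task();
}

// Probe side: checks the recorded status before consuming build-side data,
// instead of trusting the return value of set_finishing.
Status start_probe(const Spiller& spiller) {
    if (!spiller.task_status.ok) return spiller.task_status;
    return Status{};
}
```

The point of the sketch is the check in `start_probe`: the probe side has to consult the status recorded by the background task, because the build side's return value was produced before that task ran.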
  1. MemTracker use-after-free in AsyncFlushOutputStream causes a crash (SIGSEGV)
*** Aborted at 1755550827 (unix time) try "date -d @1755550827" if you are using GNU date ***
PC: @ 0x4fafef1 starrocks::CurrentThread::MemCacheManager::commit(bool)
*** SIGSEGV (@0x0) received by PID 25 (TID 0x7f86b0a64640) from PID 0; stack trace: ***
@ 0x7f87c21bfee8 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x99ee7)
@ 0x9b1ba89 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
@ 0x4fafef1 starrocks::CurrentThread::MemCacheManager::commit(bool)
@ 0x748d3e2 std::_Function_handler<void (), starrocks::io::AsyncFlushOutputStream::close()::{lambda()#1}>::_M_invoke(std::_Any_data const&)
@ 0x4f8007d starrocks::PriorityThreadPool::work_thread(int)
@ 0x9ace5bb thread_proxy
@ 0x7f87c21baac3 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x94ac2)
@ 0x7f87c224ba04 clone
  • Github Issue:
  • Github Fix PR: https://github.com/StarRocks/starrocks/pull/64735
  • Jira:
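  • Affected versions:
    • 3.3.0~3.3.19
    • 3.4.0~3.4.8
    • 3.5.0~3.5.7
    • 4.0.0
  • Fixed versions:
    • 3.3.20+
    • 3.4.9+
    • 3.5.8+
    • 4.0.1+
  • Root cause: SCOPED_THREAD_LOCAL_MEM_TRACKER_SETTER commits the cached memory usage to mem_tracker in its destructor. In the asynchronous lambda of AsyncFlushOutputStream::close(), however, once _promise.set_value(_io_status) completes, query_context has already been destroyed and mem_tracker is released with it. The later access to mem_tracker on the async thread is then a use-after-free and raises SIGSEGV.
  • Workaround: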
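
The ordering problem in the root cause above comes down to a scoped guard whose destructor runs after the completion signal. The sketch below shows one possible safe ordering, assuming nothing about how the fix PR actually resolves it. `MemTracker`, `ScopedTrackerSetter`, `close_task`, and `run_close` are hypothetical, and a plain callback stands in for `_promise.set_value`.

```cpp
#include <functional>
#include <memory>

// Hypothetical tracker owned by the query context.
struct MemTracker {
    long consumed = 0;
};

// Hypothetical scoped guard that commits cached usage when it is destroyed.
class ScopedTrackerSetter {
public:
    ScopedTrackerSetter(MemTracker* tracker, long cached) : _tracker(tracker), _cached(cached) {}
    ~ScopedTrackerSetter() { _tracker->consumed += _cached; }

private:
    MemTracker* _tracker;
    long _cached;
};

// The guard lives in an inner scope that ends before notify_done is called,
// so the commit happens while the tracker is still guaranteed to be alive.
// If the guard were declared at function scope, its destructor would run
// after notify_done and could touch a tracker that was already freed.
void close_task(MemTracker* tracker, const std::function<void()>& notify_done) {
    {
        ScopedTrackerSetter guard(tracker, 128);
        // ... flush buffered data here ...
    }
    notify_done();  // after this call the owner may destroy the tracker
}

// Models an owner that frees the tracker as soon as it is notified.
long run_close() {
    auto tracker = std::make_unique<MemTracker>();
    long observed = -1;
    close_task(tracker.get(), [&] {
        observed = tracker->consumed;  // the commit is already visible here
        tracker.reset();               // the owner releases the tracker
    });
    return observed;
}
```

`close_task` does not touch the tracker after `notify_done()` returns, which is the property the crashing code path lacked.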