BE nodes keep crashing

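【Description】BE nodes in the test environment keep crashing
【Background】New version 2.2.13. The 8C/32G servers host many other co-located services; FE is allocated 4 GB of memory and BE is configured with 6 GB.
【Business impact】Blocks product development and testing; crashes are frequent, several times a day
【StarRocks version】2.2.13
【Cluster size】3 FE (3 followers) + 3 BE (FE and BE co-located)
【Machine info】Virtual machines; see the Background line
【Contact】StarRocks community WeChat group 6, 土豆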

  • be.out

src/central_freelist.cc:333] tcmalloc: allocation failed 262144
terminate called after throwing an instance of ‘terminate called recursively
std::bad_alloc’
terminate called recursively
what(): std::bad_allocterminate called recursively
query_id:00000000-0000-0000-0000-000000000000

*** Aborted at 1681807390 (unix time) try "date -d @1681807390" if you are using GNU date ***
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
PC: @ 0x7f8aad88c387 __GI_raise
*** SIGABRT (@0x271a0000bd0e) received by PID 48398 (TID 0x7f89f0356700) from PID 48398; stack trace: ***
@ 0x3db8592 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f8aae341630 (unknown)
@ 0x7f8aad88c387 __GI_raise
@ 0x7f8aad88da78 __GI_abort
@ 0x57779d2 __gnu_cxx::__verbose_terminate_handler()
@ 0x5776486 __cxxabiv1::__terminate()
@ 0x57764f1 std::terminate()
@ 0x5776644 __cxa_throw
@ 0x183cd14 _Znwm.cold
@ 0x1e3fca4 starrocks::vectorized::ReplaceAggregator<>::append_data()
@ 0x1e4bb42 starrocks::vectorized::ValueColumnAggregator<>::aggregate_values()
@ 0x1e3c17e starrocks::vectorized::ReplaceNullableColumnAggregator::aggregate_values()
@ 0x1b5ac5f starrocks::vectorized::ChunkAggregator::aggregate()
@ 0x1b6435e starrocks::vectorized::MemTable::_aggregate()
@ 0x1b6b5b7 starrocks::vectorized::MemTable::_merge()
@ 0x1b6bc00 starrocks::vectorized::MemTable::finalize()
@ 0x3284a2c starrocks::vectorized::DeltaWriter::_flush_memtable_async()
@ 0x328522c starrocks::vectorized::DeltaWriter::close()
@ 0x327678c starrocks::vectorized::AsyncDeltaWriter::_execute()
@ 0x3e88dac bthread::ExecutionQueueBase::_execute()
@ 0x3e89b78 bthread::ExecutionQueueBase::_execute_tasks()
@ 0x2001889 starrocks::ThreadPool::dispatch_thread()
@ 0x1ffd43a starrocks::thread::supervise_thread()
@ 0x7f8aae339ea5 start_thread
@ 0x7f8aad954b0d __clone
@ 0x0 (unknown)
start time: Tue Apr 18 17:04:42 CST 2023
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
query_id:00000000-0000-0000-0000-000000000000
*** Aborted at 1681871983 (unix time) try "date -d @1681871983" if you are using GNU date ***
PC: @ 0x7f2efae1b387 __GI_raise
*** SIGABRT (@0x271a00018fb0) received by PID 102320 (TID 0x7f2ecc9a5700) from PID 102320; stack trace: ***
@ 0x3db8592 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f2efb8d0630 (unknown)
@ 0x7f2efae1b387 __GI_raise
@ 0x7f2efae1ca78 __GI_abort
@ 0x183ce0d _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
@ 0x5776486 __cxxabiv1::__terminate()
@ 0x57764f1 std::terminate()
@ 0x5776644 __cxa_throw
@ 0x183cd14 _Znwm.cold
@ 0x19b408e std::vector<>::_M_range_insert<>()
@ 0x19af175 starrocks::vectorized::BinaryColumn::append_continuous_strings()
@ 0x237ed8e starrocks::vectorized::NullableColumn::append_continuous_strings()
@ 0x1e5af2d starrocks::BinaryPlainPageDecoder<>::next_batch()
@ 0x1cab96a starrocks::ParsedPageV2::read()
@ 0x1c828a2 starrocks::ScalarColumnIterator::next_batch()
@ 0x1ac0320 starrocks::vectorized::SegmentIterator::_read()
@ 0x1ab9d16 starrocks::vectorized::SegmentIterator::_do_get_next()
@ 0x1abcc71 starrocks::vectorized::SegmentIterator::do_get_next()
@ 0x1b1b242 starrocks::vectorized::ProjectionIterator::do_get_next()
terminate called recursively
@ 0x1b592e4 starrocks::vectorized::UnionIterator::do_get_next()
@ 0x1e7dada starrocks::SegmentIteratorWrapper::do_get_next()
@ 0x1b538bb starrocks::vectorized::TimedChunkIterator::do_get_next()
@ 0x1b592e4 starrocks::vectorized::UnionIterator::do_get_next()
@ 0x1b4c0ae starrocks::vectorized::TabletReader::do_get_next()
@ 0x282f32d starrocks::pipeline::OlapChunkSource::_read_chunk_from_storage()
@ 0x282f9b0 starrocks::pipeline::OlapChunkSource::buffer_next_batch_chunks_blocking()
@ 0x2833103 _ZNSt17_Function_handlerIFvvEZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS1_12RuntimeStateEiEUlvE0_E9_M_invokeERKSt9_Any_data
@ 0x1ea52d0 starrocks::PriorityThreadPool::work_thread()
@ 0x3d53bc7 thread_proxy
@ 0x7f2efb8c8ea5 start_thread
@ 0x7f2efaee3b0d __clone
@ 0x0 (unknown)
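
The two "Aborted at" lines give the crash times as unix timestamps. As a small sketch, they can be converted to local time without GNU date; the helper name `abort_time` is made up for illustration, and the UTC+8 offset is an assumption based on the "CST" in the log's "start time" line.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the servers run on China Standard Time (UTC+8),
# as suggested by "start time: Tue Apr 18 17:04:42 CST 2023" in be.out.
CST = timezone(timedelta(hours=8))

def abort_time(unix_seconds):
    """Convert an 'Aborted at <unix time>' value into a timezone-aware datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=CST)

# The two timestamps copied from the be.out excerpt above.
for ts in (1681807390, 1681871983):
    print(ts, abort_time(ts).isoformat())
```

Under that assumption, the first abort falls at 16:43:10 on April 18, shortly before the 17:04:42 restart recorded in the log, and the second at 10:39:43 on April 19.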

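Hello, judging from the stack trace this is a known issue. Currently, loading into a Primary Key table has to read all the data from the storage layer row by row and cannot read it by column group, so when the table has many columns and a large amount of data is loaded at once, it consumes a lot of memory.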

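The symptom is that loading takes up a large amount of memory. If you have resources to spare you can give BE more memory, or mitigate it in the following ways: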

  1. Reduce the amount of data per load
  2. Increase the number of buckets
  3. Set load_process_max_memory_limit_percent in be.conf to a smaller value
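
To get a rough feel for what item 3 changes, here is a minimal sketch that estimates the memory budget load tasks would get on one BE. It assumes the budget is the smaller of a percentage of the BE memory limit and an absolute byte cap. The helper name, the 100 GiB cap, and the sample percentages are illustrative assumptions, not values confirmed in this thread, so please check them against the documentation for your version.

```python
GIB = 1024 ** 3

def load_memory_budget_bytes(be_mem_limit_bytes, limit_percent, limit_bytes_cap):
    """Hypothetical helper: estimate the memory available to load tasks on one BE.

    Assumes the budget is min(BE limit * percent / 100, absolute cap).
    """
    by_percent = be_mem_limit_bytes * limit_percent // 100
    return min(by_percent, limit_bytes_cap)

be_limit = 6 * GIB   # BE memory from the original post
cap = 100 * GIB      # assumed absolute cap; verify for your version

# Compare a few candidate percentages for the 6 GB BE described above.
for percent in (30, 20, 10):
    budget = load_memory_budget_bytes(be_limit, percent, cap)
    print(f"load_process_max_memory_limit_percent={percent}: {budget / GIB:.1f} GiB")
```

With only 6 GB per BE, the percentage is what decides the budget in this sketch, so lowering it shrinks the room for loads and leaves more for queries and everything else in the process.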

We don't use Primary Key tables, only Duplicate Key and Unique Key tables.
I'll adjust the parameters as you suggested.