Large query on a table with tens of millions of rows crashes a BE node

[StarRocks version] e.g. 2.2.1
[Cluster size] e.g. 3 FE (1 follower + 2 observers) + 3 BE (FE and BE co-located)
[Machine info] 8C/16G/gigabit network

Each server has 16 GB of memory. The FE maximum memory is set to 3 GB and the BE maximum memory to 6 GB.

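I ran a large query against a table of more than 100 GB with tens of millions of rows. After the SQL was submitted there was no response, and after a while it failed with the following error: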

Error 1064: Memory of process exceed limit. read and decompress page Used: 5798259544, Limit: 5798205849. Mem usage has exceed the limit of BE
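The numbers in this error can be checked against the BE memory setting. Below is a minimal sketch, assuming that the "6G" BE setting means 6 GiB; the function name `parse_mem_error` is made up for illustration. The 90% ratio it prints is an observation from the arithmetic, not something confirmed in this thread.

```python
import re

ERROR = ("Error 1064: Memory of process exceed limit. read and decompress page "
         "Used: 5798259544, Limit: 5798205849. Mem usage has exceed the limit of BE")

def parse_mem_error(message):
    """Extract the Used and Limit byte counts from the BE error text."""
    match = re.search(r"Used: (\d+), Limit: (\d+)", message)
    if match is None:
        raise ValueError("no Used/Limit pair found")
    return int(match.group(1)), int(match.group(2))

GIB = 1024 ** 3
used, limit = parse_mem_error(ERROR)
configured = 6 * GIB  # assumption: the 6G BE setting means 6 GiB

print(f"limit              = {limit / GIB:.2f} GiB")
print(f"overshoot          = {used - limit} bytes")
print(f"limit / configured = {limit / configured:.3f}")
```

Under that assumption the limit works out to about 5.40 GiB, which is 90% of 6 GiB, and the process was only 53695 bytes over it when the query was rejected. In other words, the BE as a whole was at its ceiling, not just this one query.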

I then found that one BE node had crashed.

No OOM messages were found in the system log.

[Attachments]
be.warn.log and be.INFO mostly contain entries like the following:

I0629 14:01:08.323690 10974 task_worker_pool.cpp:863] get publish version task, signature:271593 txn_id: 271593 priority queue size: 1
I0629 14:01:08.324478 10974 task_worker_pool.cpp:791] Publish version on partition. partition: 80120, txn_id: 271593, version: 113376
I0629 14:01:08.324502 10974 task_worker_pool.cpp:892] publish_version success. signature:271593 txn_id: 271593 related tablet num: 0 time: 1ms
I0629 14:01:08.434604 10956 compaction.cpp:139] succeed to do cumulative compaction. tablet=86548, output_version=628-638, input infos [segments=11, rows=9206, disk size=18193189], output infos [segments=1, rows=9206, disk size=17523775]. elapsed time=3.90068s.
I0629 14:01:08.451534 10956 tablet_manager.cpp:598] Found the best tablet to compact. compaction_type=cumulative tablet_id=81761 highest_score=3
W0629 14:01:09.129861 10912 pipeline_driver.cpp:146] pull_chunk returns not ok status Invalid argument: Fail to do LZ4FRAME decompress, res=ERROR_allocation_failed
/root/starrocks/be/src/storage/rowset/page_io.cpp:187 opts.codec->decompress(compressed_body, &decompressed_body)
/root/starrocks/be/src/storage/rowset/scalar_column_iterator.cpp:299 _reader->read_page(_opts, iter.page(), &handle, &page_body, &footer)
/root/starrocks/be/src/storage/rowset/scalar_column_iterator.cpp:101 _read_data_page(_page_iter)
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:106 iter->seek_to_ordinal(pos)
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:599 _context->seek_columns(_cur_rowid)
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:675 _read(chunk, rowid, chunk_capacity - chunk_start)
/root/starrocks/be/src/storage/vectorized/tablet_reader.cpp:86 _collect_iter->get_next(chunk)
/root/starrocks/be/src/exec/pipeline/olap_chunk_source.cpp:430 _prj_iter->get_next(chunk)
/root/starrocks/be/src/exec/pipeline/scan_operator.cpp:140 get_scan_status()
W0629 14:01:09.134925 10913 pipeline_driver.cpp:146] pull_chunk returns not ok status Invalid argument: Fail to do LZ4FRAME decompress, res=ERROR_allocation_failed
/root/starrocks/be/src/storage/rowset/page_io.cpp:187 opts.codec->decompress(compressed_body, &decompressed_body)
/root/starrocks/be/src/storage/rowset/scalar_column_iterator.cpp:299 _reader->read_page(_opts, iter.page(), &handle, &page_body, &footer)
/root/starrocks/be/src/storage/rowset/scalar_column_iterator.cpp:259 _read_data_page(_page_iter)
/root/starrocks/be/src/storage/rowset/scalar_column_iterator.cpp:208 _load_next_page(&eos)
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:126 _column_iterators[i]->next_batch(range, col.get())
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:608 _context->read_columns(chunk, range)
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:675 _read(chunk, rowid, chunk_capacity - chunk_start)
/root/starrocks/be/src/storage/vectorized/tablet_reader.cpp:86 _collect_iter->get_next(chunk)
/root/starrocks/be/src/exec/pipeline/olap_chunk_source.cpp:430 _prj_iter->get_next(chunk)
/root/starrocks/be/src/exec/pipeline/scan_operator.cpp:140 get_scan_status()
W0629 14:01:09.136391 10909 pipeline_driver.cpp:146] pull_chunk returns not ok status Invalid argument: Fail to do LZ4FRAME decompress, res=ERROR_allocation_failed
/root/starrocks/be/src/storage/rowset/page_io.cpp:187 opts.codec->decompress(compressed_body, &decompressed_body)
/root/starrocks/be/src/storage/rowset/scalar_column_iterator.cpp:299 _reader->read_page(_opts, iter.page(), &handle, &page_body, &footer)
/root/starrocks/be/src/storage/rowset/scalar_column_iterator.cpp:101 _read_data_page(_page_iter)
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:106 iter->seek_to_ordinal(pos)
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:599 _context->seek_columns(_cur_rowid)
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:675 _read(chunk, rowid, chunk_capacity - chunk_start)
/root/starrocks/be/src/storage/vectorized/tablet_reader.cpp:86 _collect_iter->get_next(chunk)
/root/starrocks/be/src/exec/pipeline/olap_chunk_source.cpp:430 _prj_iter->get_next(chunk)
/root/starrocks/be/src/exec/pipeline/scan_operator.cpp:140 get_scan_status()
W0629 14:01:09.214877 10909 pipeline_driver_executor.cpp:119] [Driver] Process error, query_id=db0db8b6-f770-11ec-86a1-00163e35239f, instance_id=db0db8b6-f770-11ec-86a1-00163e3523a0, error=Invalid argument: Fail to do LZ4FRAME decompress, res=ERROR_allocation_failed
/root/starrocks/be/src/storage/rowset/page_io.cpp:187 opts.codec->decompress(compressed_body, &decompressed_body)
/root/starrocks/be/src/storage/rowset/scalar_column_iterator.cpp:299 _reader->read_page(_opts, iter.page(), &handle, &page_body, &footer)
/root/starrocks/be/src/storage/rowset/scalar_column_iterator.cpp:101 _read_data_page(_page_iter)
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:106 iter->seek_to_ordinal(pos)
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:599 _context->seek_columns(_cur_rowid)
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:675 _read(chunk, rowid, chunk_capacity - chunk_start)
/root/starrocks/be/src/storage/vectorized/tablet_reader.cpp:86 _collect_iter->get_next(chunk)
/root/starrocks/be/src/exec/pipeline/olap_chunk_source.cpp:430 _prj_iter->get_next(chunk)
/root/starrocks/be/src/exec/pipeline/scan_operator.cpp:140 get_scan_status()
W0629 14:01:09.214761 10912 pipeline_driver_executor.cpp:119] [Driver] Process error, query_id=db0db8b6-f770-11ec-86a1-00163e35239f, instance_id=db0db8b6-f770-11ec-86a1-00163e3523a0, error=Invalid argument: Fail to do LZ4FRAME decompress, res=ERROR_allocation_failed
/root/starrocks/be/src/storage/rowset/page_io.cpp:187 opts.codec->decompress(compressed_body, &decompressed_body)
/root/starrocks/be/src/storage/rowset/scalar_column_iterator.cpp:299 _reader->read_page(_opts, iter.page(), &handle, &page_body, &footer)
/root/starrocks/be/src/storage/rowset/scalar_column_iterator.cpp:101 _read_data_page(_page_iter)
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:106 iter->seek_to_ordinal(pos)
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:599 _context->seek_columns(_cur_rowid)
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:675 _read(chunk, rowid, chunk_capacity - chunk_start)
/root/starrocks/be/src/storage/vectorized/tablet_reader.cpp:86 _collect_iter->get_next(chunk)
/root/starrocks/be/src/exec/pipeline/olap_chunk_source.cpp:430 _prj_iter->get_next(chunk)
/root/starrocks/be/src/exec/pipeline/scan_operator.cpp:140 get_scan_status()
W0629 14:01:09.214823 10913 pipeline_driver_executor.cpp:119] [Driver] Process error, query_id=db0db8b6-f770-11ec-86a1-00163e35239f, instance_id=db0db8b6-f770-11ec-86a1-00163e3523a0, error=Invalid argument: Fail to do LZ4FRAME decompress, res=ERROR_allocation_failed
/root/starrocks/be/src/storage/rowset/page_io.cpp:187 opts.codec->decompress(compressed_body, &decompressed_body)
/root/starrocks/be/src/storage/rowset/scalar_column_iterator.cpp:299 _reader->read_page(_opts, iter.page(), &handle, &page_body, &footer)
/root/starrocks/be/src/storage/rowset/scalar_column_iterator.cpp:259 _read_data_page(_page_iter)
/root/starrocks/be/src/storage/rowset/scalar_column_iterator.cpp:208 _load_next_page(&eos)
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:126 _column_iterators[i]->next_batch(range, col.get())
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:608 _context->read_columns(chunk, range)
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:675 _read(chunk, rowid, chunk_capacity - chunk_start)
/root/starrocks/be/src/storage/vectorized/tablet_reader.cpp:86 _collect_iter->get_next(chunk)
/root/starrocks/be/src/exec/pipeline/olap_chunk_source.cpp:430 _prj_iter->get_next(chunk)
/root/starrocks/be/src/exec/pipeline/scan_operator.cpp:140 get_scan_status()
I0629 14:01:09.153100 10950 load_channel_mgr.cpp:175] Memory consumption(bytes) limit=1739461754 current=46084032 peak=167423568
I0629 14:01:09.249935 10907 pipeline_driver_executor.cpp:203] [Driver] Succeed to report exec state: fragment_instance_id=db0db8b6-f770-11ec-86a1-00163e3523a0
I0629 14:01:09.463865 10956 tablet_manager.cpp:598] Found the best tablet to compact. compaction_type=cumulative tablet_id=81764 highest_score=3
I0629 14:17:50.295936 27912 daemon.cpp:254] version 2.2.1 RELEASE (build 147f178)
Built on 2022-06-01 19:36:57 by StarRocks@docker
I0629 14:17:50.394028 27912 mem_info.cpp:78] Physical Memory: 15.51 GB
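
With this much repeated output, it can help to count the warnings by source location instead of reading every stack. The sketch below assumes the standard glog prefix seen in these logs (severity letter, MMDD, time, thread id, file:line); the function name `summarize_warnings` is hypothetical, and the sample lines are the warning headers from the excerpt above with their wrapped halves joined.

```python
import re
from collections import Counter

# glog prefix: severity letter, MMDD, time, thread id, file:line, then the message.
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[\w.]+:\d+)\] (?P<msg>.*)$"
)

def summarize_warnings(lines):
    """Count warning, error and fatal entries per source location."""
    counts = Counter()
    for line in lines:
        m = GLOG_LINE.match(line)
        if m and m.group("sev") in "WEF":
            counts[m.group("src")] += 1
    return counts

TAIL = ("pull_chunk returns not ok status Invalid argument: "
        "Fail to do LZ4FRAME decompress, res=ERROR_allocation_failed")
sample = [
    "W0629 14:01:09.129861 10912 pipeline_driver.cpp:146] " + TAIL,
    "W0629 14:01:09.134925 10913 pipeline_driver.cpp:146] " + TAIL,
    "W0629 14:01:09.136391 10909 pipeline_driver.cpp:146] " + TAIL,
    "I0629 14:01:09.153100 10950 load_channel_mgr.cpp:175] Memory consumption(bytes) "
    "limit=1739461754 current=46084032 peak=167423568",
]

for src, n in summarize_warnings(sample).most_common():
    print(src, n)
```

On the excerpt above this shows three different pipeline driver threads failing at the same place within a few milliseconds, all with the same allocation failure during page decompression.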

  • Profile information
  • Parallelism: show variables like '%parallel_fragment_exec_instance_num%';
    image
  • Whether CBO is enabled: show variables like '%cbo%';
    image
  • Screenshots of BE node CPU and memory usage
    Memory monitoring curve:

CPU usage rose slightly.

In this situation, is increasing the memory allocated to the BE the only option?

Please provide the be.out log.

be.out contains only this:

src/central_freelist.cc:333] tcmalloc: allocation failed 49152
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
*** Aborted at 1656482469 (unix time) try "date -d @1656482469" if you are using GNU date ***
PC: @ 0x7f4901486387 __GI_raise
*** SIGABRT (@0x1f4000029ba) received by PID 10682 (TID 0x7f4840149700) from PID 10682; stack trace: ***
@ 0x3ca57d2 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f4901f3b630 (unknown)
@ 0x7f4901486387 __GI_raise
@ 0x7f4901487a78 __GI_abort
@ 0x17b3f6d _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
@ 0x5663446 __cxxabiv1::__terminate()
@ 0x56634b1 std::terminate()
@ 0x5663604 __cxa_throw
@ 0x17b3e74 _Znwm.cold
@ 0x3121c44 starrocks::vectorized::add_binary_column()
@ 0x310db25 starrocks::vectorized::add_nullable_column()
@ 0x310ebf3 starrocks::vectorized::add_nullable_column()
@ 0x25ad686 starrocks::vectorized::JsonReader::_construct_row_in_slot_order()
@ 0x25ae445 starrocks::vectorized::JsonReader::_construct_row()
@ 0x25b216b starrocks::vectorized::JsonReader::_read_rows<>()
@ 0x25ae781 starrocks::vectorized::JsonReader::read_chunk()
@ 0x25aeaec starrocks::vectorized::JsonScanner::get_next()
@ 0x259c1e0 starrocks::vectorized::FileScanNode::_scanner_scan()
@ 0x259db3f starrocks::vectorized::FileScanNode::_scanner_worker()
@ 0x56dd7d0 execute_native_thread_routine
@ 0x7f4901f33ea5 start_thread
@ 0x7f490154eb0d __clone
@ 0x0 (unknown)
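
To line this abort up with the be.INFO entries, the unix time in the "Aborted at" line can be converted to local time. This is a small sketch, assuming the servers log in UTC+8 (the timezone is not stated in the thread):

```python
from datetime import datetime, timedelta, timezone

ABORT_UNIX_TIME = 1656482469               # from the "*** Aborted at ..." line in be.out
SERVER_TZ = timezone(timedelta(hours=8))   # assumption: the servers log in UTC+8

abort_time = datetime.fromtimestamp(ABORT_UNIX_TIME, tz=SERVER_TZ)
print(abort_time.strftime("%Y-%m-%d %H:%M:%S"))
```

Under that assumption the abort falls at 14:01:09 on June 29, the same second as the LZ4FRAME allocation failures in be.INFO. The frames in the trace (JsonReader under FileScanNode) also suggest that the thread that threw std::bad_alloc was reading JSON on a file scan path rather than running the query's scan operator, so it may be worth checking what else was running on that BE at the time.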