BE crashes after upgrading to 2.0.1

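After upgrading from 1.19.2 to 2.0.1, the BEs frequently crash when large SQL queries are run: two of the three BEs go down. The test cluster has 16 GB of memory.
What could be causing this? be.info.txt (24.4 KB) be.out.txt (4.5 KB)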

Run `dmesg -T | grep starrocks` to check whether there was an OOM kill.
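If you want to script this check, here is a minimal Python sketch that scans `dmesg` output for kernel OOM-killer messages. It assumes the usual kernel wording (`Killed process <pid> (<name>)` on newer kernels, `Kill process <pid> (<name>)` on older ones) and that the BE process name contains `starrocks`; both are assumptions, so compare against your own `dmesg` output. The function name `find_oom_kills` is hypothetical.

```python
import re

# Assumed kernel OOM-killer wording, e.g.
#   "Out of memory: Killed process 2349 (starrocks_be) total-vm:..."  (newer kernels)
#   "Out of memory: Kill process 2349 (starrocks_be) score ..."       (older kernels)
OOM_KILL = re.compile(r"Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(dmesg_text: str, name_fragment: str = "starrocks") -> list[tuple[int, str]]:
    """Return (pid, process name) for each OOM kill whose process name contains name_fragment."""
    kills = []
    for line in dmesg_text.splitlines():
        match = OOM_KILL.search(line)
        if match and name_fragment in match.group(2):
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

To use it on a live host, pass it the output of `subprocess.run(["dmesg", "-T"], capture_output=True, text=True).stdout`; reading the kernel log usually requires root.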

*** SIGSEGV (@0x0) received by PID 2349 (TID 0x7f47433b8700) from PID 0; stack trace: ***
@ 0x33a1be2 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f4768b48340 (unknown)
@ 0x2ca8ae7 starrocks::segment_v2::ColumnDecoder::encode_to_global_id()
@ 0x2c105ab starrocks::vectorized::SegmentIterator::_encode_to_global_id()
@ 0x2c17b5f starrocks::vectorized::SegmentIterator::_do_get_next()
@ 0x2c1ace1 starrocks::vectorized::SegmentIterator::do_get_next()
@ 0x2c3d7f9 starrocks::vectorized::ProjectionIterator::do_get_next()
@ 0x1b0a40a starrocks::SegmentIteratorWrapper::do_get_next()
@ 0x189116b starrocks::vectorized::TimedChunkIterator::do_get_next()
@ 0x18d294b starrocks::vectorized::UnionIterator::do_get_next()
@ 0x18c701a starrocks::vectorized::TabletReader::do_get_next()
@ 0x2364024 starrocks::vectorized::TabletScanner::get_chunk()
@ 0x20e400b starrocks::vectorized::OlapScanNode::_scanner_thread()
@ 0x1b3213d starrocks::PriorityThreadPool::work_thread()
@ 0x3349b37 thread_proxy
@ 0x7f4768b40182 start_thread
@ 0x7f4767f4247d clone
@ 0x0 (unknown)

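Judging from the `encode_to_global_id` frames in the stack trace, this may be caused by the global dictionary optimization.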

As a workaround, you can disable it for now: `set global cbo_enable_low_cardinality_optimize = false;`

If there was an OOM and FE and BE are deployed on the same machines, set `mem_limit` in be.conf to (machine memory minus the memory reserved for FE).
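The rule above is simple arithmetic. Below is a minimal sketch with a hypothetical helper, `be_mem_limit`, assuming that `mem_limit` accepts a size with a `G` suffix and that the FE reservation is known in whole GB (for example, the FE JVM heap set via `-Xmx`). It does not account for any other processes on the host.

```python
def be_mem_limit(machine_gb: int, fe_reserved_gb: int) -> str:
    """Hypothetical helper: build a be.conf mem_limit line as machine memory minus FE reservation."""
    be_gb = machine_gb - fe_reserved_gb
    if be_gb <= 0:
        # Reserving this much for FE would leave BE with no memory at all.
        raise ValueError(
            f"reserving {fe_reserved_gb}G for FE leaves nothing for BE on a {machine_gb}G host"
        )
    return f"mem_limit = {be_gb}G"


# 16 GB machine as in this thread, with a hypothetical 4 GB reserved for FE.
print(be_mem_limit(16, 4))  # mem_limit = 12G
```

Keeping the result as an explicit size, instead of relying on a percentage of total memory, makes it obvious how much room is left for a co-located FE.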

Could you share the SQL?

OK, I'll set that. I later adjusted the FE memory and it no longer crashes; I'll keep watching it.

Yes, there was an OOM. Yesterday I adjusted the FE memory and it hasn't crashed since; I'll keep monitoring it for a while.

You still need to set `mem_limit` in be.conf to a sensible value.