BE node hangs on startup

[Details] Detailed problem description
2023-01-13 13:41:10 This BE node was reported as unavailable.
2023-01-13 14:40:49 Restarted the BE, but the startup hung.

be.INFO log
I0113 14:32:44.677588 13201 tablet_updates.cpp:1037] ap
I0113 14:32:52.383054 14329 daemon.cpp:280] version 2.4.2 RELEASE (build 3994421)
Built on 2022-12-14 16:19:02 by StarRocks@docker
I0113 14:32:52.384732 14329 mem_info.cpp:74] Physical Memory: 30.33 GB
I0113 14:32:52.384743 14329 daemon.cpp:286] Cpu Info:
Model: Intel® Xeon® Platinum 8378A CPU @ 3.00GHz
Cores: 16
Max Possible Cores: 16
L1 Cache: 48.00 KB (Line: 64.00 B)
L2 Cache: 1.25 MB (Line: 64.00 B)
L3 Cache: 48.00 MB (Line: 64.00 B)
Hardware Supports:
ssse3
sse4_1
sse4_2
popcnt
avx
avx2
Numa Nodes: 1
Numa Nodes of Cores: 0->0 | 1->0 | 2->0 | 3->0 | 4->0 | 5->0 | 6->0 | 7->0 | 8->0 | 9->0 | 10->0 | 11->0 | 12->0 | 13->0 | 14->0 | 15->0 |
I0113 14:32:52.384778 14329 daemon.cpp:287] Disk Info:
Num disks 3: sda, sdb, sdc
I0113 14:32:52.384781 14329 daemon.cpp:288] Mem Info: 30.33 GB
I0113 14:32:52.477399 14329 daemon.cpp:263] Minidump is disabled
I0113 14:32:52.477434 14329 backend_options.cpp:100] priority cidrs in conf: 10.125.152.0/24
I0113 14:32:52.477490 14329 backend_options.cpp:77] localhost 10.125.152.18
I0113 14:32:52.478034 14329 exec_env.cpp:403] Set storage page cache size 3478923509
I0113 14:32:52.478029 14341 daemon.cpp:191] Current memory statistics: process(28279584), query_pool(0), load(0), metadata(0), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0113 14:32:52.479008 14345 data_dir.cpp:121] path: /data/software/starrocks2.4.2/be/storage, hash: 3679224121446567612
I0113 14:32:52.612040 14413 data_dir.cpp:245] start to load tablets from /data/software/starrocks2.4.2/be/storage
I0113 14:32:52.612057 14413 data_dir.cpp:251] begin loading rowset from meta
I0113 14:32:56.292480 14413 data_dir.cpp:269] load rowset from meta finished, data dir: /data/software/starrocks2.4.2/be/storage
I0113 14:32:56.292495 14413 data_dir.cpp:274] begin loading tablet from meta
I0113 14:33:07.479049 14341 daemon.cpp:191] Current memory statistics: process(2183224368), query_pool(0), load(0), metadata(1482709103), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0113 14:33:11.505074 14413 data_dir.cpp:310] load tablet from meta finished, loaded tablet: 399917, error tablet: 0, path: /data/software/starrocks2.4.2/be/storage
I0113 14:33:12.253716 14544 fragment_mgr.cpp:507] FragmentMgr cancel worker start working.
I0113 14:33:12.255510 14329 exec_env.cpp:163] [PIPELINE] Exec thread pool: thread_num=16
I0113 14:33:12.309255 14772 runtime_filter_worker.cpp:735] RuntimeFilterWorker start working.
I0113 14:33:12.309348 14774 result_buffer_mgr.cpp:135] result buffer manager cancel thread begin.

be.out log
start time: Fri Jan 13 14:40:49 CST 2023

be.WARNING.log
The log contains no entries after this startup time (Fri Jan 13 14:40:49 CST 2023).

However, the following errors were logged before that:
W0113 13:41:03.373188 28957 tablet_sink.cpp:975] close channel failed. channel_name=NodeChannel[34635016-10003], load_info=load_id=de0fc0ed-9304-11ed-8617-fa163e5f7af3, txn_id: 9817341, parallel=1, compress_type=2, error_msg=Too many versions. tablet_id: 34635017, version_count: 23912, limit: 1000
W0113 13:41:03.373208 28957 tablet_sink.cpp:975] close channel failed. channel_name=NodeChannel[34635016-10003], load_info=load_id=de0fc0ed-9304-11ed-8617-fa163e5f7af3, txn_id: 9817341, parallel=1, compress_type=2, error_msg=Too many versions. tablet_id: 34635017, version_count: 23912, limit: 1000
W0113 13:41:03.373221 28957 tablet_sink.cpp:975] close channel failed. channel_name=NodeChannel[34635016-10003], load_info=load_id=de0fc0ed-9304-11ed-8617-fa163e5f7af3, txn_id: 9817341, parallel=1, compress_type=2, error_msg=Too many versions. tablet_id: 34635017, version_count: 23912, limit: 1000
W0113 13:41:03.373230 28957 tablet_sink.cpp:975] close channel failed. channel_name=NodeChannel[34635016-10003], load_info=load_id=de0fc0ed-9304-11ed-8617-fa163e5f7af3, txn_id: 9817341, parallel=1, compress_type=2, error_msg=Too many versions. tablet_id: 34635017, version_count: 23912, limit: 1000
W0113 13:41:03.373245 28957 tablet_sink.cpp:975] close channel failed. channel_name=NodeChannel[34635016-10003], load_info=load_id=de0fc0ed-9304-11ed-8617-fa163e5f7af3, txn_id: 9817341, parallel=1, compress_type=2, error_msg=Too many versions. tablet_id: 34635017, version_count: 23912, limit: 1000
W0113 13:41:03.373257 28957 tablet_sink.cpp:975] close channel failed. channel_name=NodeChannel[34635016-10003], load_info=load_id=de0fc0ed-9304-11ed-8617-fa163e5f7af3, txn_id: 9817341, parallel=1, compress_type=2, error_msg=Too many versions. tablet_id: 34635017, version_count: 23912, limit: 1000
W0113 13:41:03.373279 28957 tablet_sink.cpp:975] close channel failed. channel_name=NodeChannel[34635016-10003], load_info=load_id=de0fc0ed-9304-11ed-8617-fa163e5f7af3, txn_id: 9817341, parallel=1, compress_type=2, error_msg=Too many versions. tablet_id: 34635017, version_count: 23912, limit: 1000
W0113 13:41:03.373291 28957 tablet_sink.cpp:975] close channel failed. channel_name=NodeChannel[34635016-10003], load_info=load_id=de0fc0ed-9304-11ed-8617-fa163e5f7af3, txn_id: 9817341, parallel=1, compress_type=2, error_msg=Too many versions. tablet_id: 34635017, version_count: 23912, limit: 1000
W0113 13:41:03.373299 28957 tablet_sink.cpp:975] close channel failed. channel_name=NodeChannel[34635016-10003], load_info=load_id=de0fc0ed-9304-11ed-8617-fa163e5f7af3, txn_id: 9817341, parallel=1, compress_type=2, error_msg=Too many versions. tablet_id: 34635017, version_count: 23912, limit: 1000
W0113 13:41:03.375905 28957 tablet_sink.cpp:975] close channel failed. channel_name=NodeChannel[34635016-10003], load_info=load_id=de0fc0ed-9304-11ed-8617-fa163e5f7af3, txn_id: 9817341, parallel=1, compress_type=2, error_msg=Too many versions. tablet_id: 34635017, version_count: 23912, limit: 1000
W0113 13:41:03.380998 28957 tablet_sink.cpp:1043] close channel failed. channel_name=NodeChannel[34635016-10003], load_info=load_id=de0fc0ed-9304-11ed-8617-fa163e5f7af3, txn_id: 9817341, parallel=1, compress_type=2, error_msg=Too many versions. tablet_id: 34635017, version_count: 23912, limit: 1000
E0113 13:41:03.705703 29192 delta_writer.cpp:110] Too many versions. tablet_id: 34635017, version_count: 23912, limit: 1000
W0113 13:41:03.705720 29192 load_channel.cpp:76] Fail to open index 34635016 of load de48d25e930411ed-8617fa163e5f7af3: Service unavailable: Too many versions. tablet_id: 34635017, version_count: 23912, limit: 1000
/root/starrocks/be/src/storage/delta_writer.cpp:22 writer->_init()
/root/starrocks/be/src/runtime/local_tablets_channel.cpp:482 res.status()
/root/starrocks/be/src/runtime/local_tablets_channel.cpp:228 _open_all_writers(params)
E0113 13:41:08.898676 29197 delta_writer.cpp:110] Too many versions. tablet_id: 34635017, version_count: 23994, limit: 1000
W0113 13:41:08.898715 29197 load_channel.cpp:76] Fail to open index 34635016 of load e14dd51e930411ed-8617fa163e5f7af3: Service unavailable: Too many versions. tablet_id: 34635017, version_count: 23994, limit: 1000
/root/starrocks/be/src/storage/delta_writer.cpp:22 writer->_init()
/root/starrocks/be/src/runtime/local_tablets_channel.cpp:482 res.status()
/root/starrocks/be/src/runtime/local_tablets_channel.cpp:228 _open_all_writers(params)


E0113 13:41:09.950217 29239 delta_writer.cpp:110] Too many versions. tablet_id: 34635017, version_count: 23994, limit: 1000
W0113 13:41:09.950255 29239 load_channel.cpp:76] Fail to open index 34635016 of load e20453e2930411ed-8617fa163e5f7af3: Service unavailable: Too many versions. tablet_id: 34635017, version_count: 23994, limit: 1000
/root/starrocks/be/src/storage/delta_writer.cpp:22 writer->_init()
/root/starrocks/be/src/runtime/local_tablets_channel.cpp:482 res.status()
/root/starrocks/be/src/runtime/local_tablets_channel.cpp:228 _open_all_writers(params)
E0113 13:41:10.107261 29189 delta_writer.cpp:110] Too many versions. tablet_id: 34635017, version_count: 23994, limit: 1000
W0113 13:41:10.107287 29189 load_channel.cpp:76] Fail to open index 34635016 of load e218ed57930411ed-8617fa163e5f7af3: Service unavailable: Too many versions. tablet_id: 34635017, version_count: 23994, limit: 1000
/root/starrocks/be/src/storage/delta_writer.cpp:22 writer->_init()
/root/starrocks/be/src/runtime/local_tablets_channel.cpp:482 res.status()
/root/starrocks/be/src/runtime/local_tablets_channel.cpp:228 _open_all_writers(params)
E0113 13:41:10.259924 29195 delta_writer.cpp:110] Too many versions. tablet_id: 34635017, version_count: 23994, limit: 1000
W0113 13:41:10.259943 29195 load_channel.cpp:76] Fail to open index 34635016 of load e230e22d930411ed-8617fa163e5f7af3: Service unavailable: Too many versions. tablet_id: 34635017, version_count: 23994, limit: 1000
/root/starrocks/be/src/storage/delta_writer.cpp:22 writer->_init()
/root/starrocks/be/src/runtime/local_tablets_channel.cpp:482 res.status()
/root/starrocks/be/src/runtime/local_tablets_channel.cpp:228 _open_all_writers(params)
E0113 13:41:10.567404 29194 delta_writer.cpp:110] Too many versions. tablet_id: 34635017, version_count: 23994, limit: 1000
W0113 13:41:10.567423 29194 load_channel.cpp:76] Fail to open index 34635016 of load e262a099930411ed-8617fa163e5f7af3: Service unavailable: Too many versions. tablet_id: 34635017, version_count: 23994, limit: 1000
/root/starrocks/be/src/storage/delta_writer.cpp:22 writer->_init()
/root/starrocks/be/src/runtime/local_tablets_channel.cpp:482 res.status()
/root/starrocks/be/src/runtime/local_tablets_channel.cpp:228 _open_all_writers(params)
E0113 13:41:11.041436 29237 delta_writer.cpp:110] Too many versions. tablet_id: 34635017, version_count: 23994, limit: 1000
W0113 13:41:11.041477 29237 load_channel.cpp:76] Fail to open index 34635016 of load e2a85c4d930411ed-8617fa163e5f7af3: Service unavailable: Too many versions. tablet_id: 34635017, version_count: 23994, limit: 1000
/root/starrocks/be/src/storage/delta_writer.cpp:22 writer->_init()
/root/starrocks/be/src/runtime/local_tablets_channel.cpp:482 res.status()
/root/starrocks/be/src/runtime/local_tablets_channel.cpp:228 _open_all_writers(params)
W0113 13:41:11.346036 28937 tablet_sink.cpp:955] NodeChannel[37579350-10004], tablet add chunk failed, load_id=d7493b16-65f1-8385-9250-d64ae29795bd, txn_id: 9817412, parallel=1, compress_type=2, node=10.125.152.156:8060, errmsg=Memory of process exceed limit. try consume:917504 Used: 22632846408, Limit: 17394617548. Mem usage has exceed the limit of BE
W0113 13:41:11.346680 28937 tablet_sink.cpp:955] NodeChannel[37579350-10003], tablet add chunk failed, load_id=d7493b16-65f1-8385-9250-d64ae29795bd, txn_id: 9817412, parallel=1, compress_type=2, node=10.125.152.239:8060, errmsg=Memory of process exceed limit. try consume:917504 Used: 22632846408, Limit: 17394617548. Mem usage has exceed the limit of BE
W0113 13:41:11.346707 28937 plan_fragment_executor.cpp:178] Fail to open fragment, instance_id=d7493b16-65f1-8385-9250-d64ae29795be, status=Memory limit exceeded: Memory of process exceed limit. try consume:917504 Used: 22632846408, Limit: 17394617548. Mem usage has exceed the limit of BE
/root/starrocks/be/src/exec/tablet_sink.cpp:377 _serialize_chunk(chunk.get(), pchunk)
/root/starrocks/be/src/exec/tablet_sink.cpp:922 _send_chunk_by_node(chunk, _channels[i].get(), _validate_select_idx)
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:224 _sink->send_chunk(runtime_state(), chunk.get())
W0113 13:41:11.392951 28937 fragment_mgr.cpp:182] Fail to open fragment d7493b16-65f1-8385-9250-d64ae29795be: Memory limit exceeded: Memory of process exceed limit. try consume:917504 Used: 22632846408, Limit: 17394617548. Mem usage has exceed the limit of BE
/root/starrocks/be/src/exec/tablet_sink.cpp:377 _serialize_chunk(chunk.get(), pchunk)
/root/starrocks/be/src/exec/tablet_sink.cpp:922 _send_chunk_by_node(chunk, _channels[i].get(), _validate_select_idx)
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:224 _sink->send_chunk(runtime_state(), chunk.get())
W0113 13:41:11.393337 28937 stream_load_executor.cpp:92] fragment execute failed, query_id=d7493b1665f18385-9250d64ae29795bd, err_msg=Memory of process exceed limit. try consume:917504 Used: 22632846408, Limit: 17394617548. Mem usage has exceed the limit of BE, id=d7493b1665f18385-9250d64ae29795bd, job_id=-1, txn_id: 9817412, label=13a7a4f4-65ba-47b7-ba45-c5ae910ce788, db=ods_gpmpp_tenant_org_v3
W0113 13:41:11.393378 29276 stream_load.cpp:135] Fail to handle streaming load, id=d7493b1665f18385-9250d64ae29795bd errmsg=Memory of process exceed limit. try consume:917504 Used: 22632846408, Limit: 17394617548. Mem usage has exceed the limit of BE
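
The "Too many versions" entries above repeat many times, so it can help to collapse them per tablet. The following is a minimal Python sketch, assuming the message format stays exactly as in the excerpt; the function name `summarize_version_errors` and the two-line sample are illustrative only and are not part of StarRocks.

```python
import re

# Matches the "Too many versions" message format seen in the excerpt above.
VERSION_RE = re.compile(
    r"Too many versions\. tablet_id: (\d+), version_count: (\d+), limit: (\d+)"
)

def summarize_version_errors(lines):
    """Return {tablet_id: (occurrences, max_version_count, limit)}."""
    stats = {}
    for line in lines:
        m = VERSION_RE.search(line)
        if m is None:
            continue  # not a "Too many versions" line
        tablet_id, count, limit = (int(g) for g in m.groups())
        seen, max_count, _ = stats.get(tablet_id, (0, 0, limit))
        stats[tablet_id] = (seen + 1, max(max_count, count), limit)
    return stats

# Two lines copied from the be.WARNING excerpt above.
sample = [
    "E0113 13:41:03.705703 29192 delta_writer.cpp:110] Too many versions. "
    "tablet_id: 34635017, version_count: 23912, limit: 1000",
    "E0113 13:41:08.898676 29197 delta_writer.cpp:110] Too many versions. "
    "tablet_id: 34635017, version_count: 23994, limit: 1000",
]
summary = summarize_version_errors(sample)
for tablet_id, (seen, max_count, limit) in summary.items():
    print(f"tablet {tablet_id}: {seen} errors, "
          f"max version_count {max_count}, limit {limit}")
```

To run it against the whole file, pass an open file handle for be.WARNING instead of `sample`. In the excerpt, every such error refers to tablet 34635017, whose version_count grows from 23912 to 23994 against a limit of 1000.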

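Could you share the cluster configuration: how many FEs and BEs there are, and whether it is a mixed deployment? Judging from "start time: Fri Jan 13 14:40:49 CST 2023" in be.out, the BE did start successfully. When the node was reported as unavailable, had the BE process crashed, or was it something else? How frequently are you loading data? The logs show both too-many-versions errors and memory-limit-exceeded errors. Did the BE hit an OOM? If it is a mixed deployment, have you configured mem_limit in be.conf?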
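
For the memory question, the "Used" and "Limit" figures in the be.WARNING lines can be converted to a readable form. This is a small sketch, assuming the message keeps the `Used: <bytes>, Limit: <bytes>` layout shown in the log; `memory_overshoot` is a hypothetical helper name, not a StarRocks tool.

```python
import re

# Byte counts as printed in "Memory of process exceed limit" messages.
MEM_RE = re.compile(r"Used: (\d+), Limit: (\d+)")
GIB = 1024 ** 3

def memory_overshoot(line):
    """Return (used_gib, limit_gib, used/limit), or None if the line has no figures."""
    m = MEM_RE.search(line)
    if m is None:
        return None
    used, limit = int(m.group(1)), int(m.group(2))
    return used / GIB, limit / GIB, used / limit

# One line copied from the be.WARNING excerpt in the original post.
line = (
    "W0113 13:41:11.393378 29276 stream_load.cpp:135] Fail to handle streaming load, "
    "id=d7493b1665f18385-9250d64ae29795bd errmsg=Memory of process exceed limit. "
    "try consume:917504 Used: 22632846408, Limit: 17394617548. "
    "Mem usage has exceed the limit of BE"
)
used_gib, limit_gib, ratio = memory_overshoot(line)
print(f"used {used_gib:.2f} GiB, limit {limit_gib:.2f} GiB, ratio {ratio:.2f}")
```

For the line above, this works out to about 21.1 GiB used against a limit of about 16.2 GiB, roughly 130% of the limit. The limit is a bit over half of the 30.33 GB of physical memory reported in be.INFO, which is why the mem_limit setting in be.conf is worth confirming.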