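After creating an asynchronous multi-table materialized view, BE nodes go offline and cannot be restarted; process memory keeps climbing during startup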

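[Details] After an asynchronous multi-table materialized view was created, the BE nodes went offline and could not be restarted. While a BE is starting, its process memory keeps climbing.
[Background] Before the cluster failed, an asynchronous multi-table materialized view was created. It joined many tables over a fairly large data volume, and the BE nodes were knocked offline. The materialized view was deleted afterwards, but the BE nodes still fail to start and their memory usage keeps growing. (ES is also deployed on the same servers.)
[Business impact]
[StarRocks version] 2.5.0
[Cluster size] 1 FE + 3 BE
[Machine info] 8C/32G
[Attachments]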

  • Relevant excerpt from be.INFO

I0412 17:06:34.036810 25587 daemon.cpp:277] version 2.5.0 RELEASE (build 6abafd3)
Built on 2023-01-21 22:52:53 by StarRocks@docker
I0412 17:06:34.037530 25587 mem_info.cpp:90] Physical Memory: 31.26 GB
I0412 17:06:34.037542 25587 daemon.cpp:283] Cpu Info:
Model: Intel® Xeon® Platinum 8255C CPU @ 2.50GHz
Cores: 8
Max Possible Cores: 8
L1 Cache: 32.00 KB (Line: 64.00 B)
L2 Cache: 2.00 MB (Line: 64.00 B)
L3 Cache: 0 (Line: 0)
Hardware Supports:
ssse3
sse4_1
sse4_2
popcnt
avx
avx2
Numa Nodes: 1
Numa Nodes of Cores: 0->0 | 1->0 | 2->0 | 3->0 | 4->0 | 5->0 | 6->0 | 7->0 |
I0412 17:06:34.037575 25587 daemon.cpp:284] Disk Info:
Num disks 3: vda, vdb, sr
I0412 17:06:34.037580 25587 daemon.cpp:285] Mem Info: 31.26 GB
I0412 17:06:34.500223 25587 daemon.cpp:260] Minidump is disabled
I0412 17:06:34.500955 25599 daemon.cpp:188] Current memory statistics: process(0), query_pool(0), load(0), metadata(0), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0412 17:06:34.501024 25587 backend_options.cpp:77] localhost 10.120.154.7
I0412 17:06:34.501091 25587 exec_env.cpp:415] Set storage page cache size 906291928
I0412 17:06:34.502997 25601 data_dir.cpp:113] path: /apps/StarRocks-2.3.0/be/storage, hash: -1914924885135165005
I0412 17:06:34.637490 25668 data_dir.cpp:237] start to load tablets from /apps/StarRocks-2.3.0/be/storage
I0412 17:06:34.637507 25668 data_dir.cpp:243] begin loading rowset from meta
I0412 17:06:40.567783 25668 data_dir.cpp:261] load rowset from meta finished, data dir: /apps/StarRocks-2.3.0/be/storage
I0412 17:06:40.567803 25668 data_dir.cpp:266] begin loading tablet from meta
I0412 17:06:40.598260 25668 tablet_manager.cpp:709] Loaded shutdown tablet 10017
I0412 17:06:40.602175 25668 tablet_manager.cpp:709] Loaded shutdown tablet 10020
I0412 17:06:40.605949 25668 tablet_manager.cpp:709] Loaded shutdown tablet 10023
I0412 17:06:40.611582 25668 tablet_manager.cpp:709] Loaded shutdown tablet 10026
I0412 17:06:40.619177 25668 tablet_manager.cpp:709] Loaded shutdown tablet 10032
I0412 17:06:40.625247 25668 tablet_manager.cpp:709] Loaded shutdown tablet 10035
I0412 17:06:40.629135 25668 tablet_manager.cpp:709] Loaded shutdown tablet 10038
I0412 17:06:49.501686 25599 daemon.cpp:188] Current memory statistics: process(1387755328), query_pool(0), load(0), metadata(805302439), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0412 17:07:04.502326 25599 daemon.cpp:188] Current memory statistics: process(1823979976), query_pool(0), load(0), metadata(1075181877), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0412 17:07:19.503038 25599 daemon.cpp:188] Current memory statistics: process(2203673768), query_pool(0), load(0), metadata(1310690898), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0412 17:07:34.503759 25599 daemon.cpp:188] Current memory statistics: process(2594066304), query_pool(0), load(0), metadata(1552253905), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0412 17:07:49.504436 25599 daemon.cpp:188] Current memory statistics: process(3000922400), query_pool(0), load(0), metadata(1805073621), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0412 17:08:04.506279 25599 daemon.cpp:188] Current memory statistics: process(3400121488), query_pool(0), load(0), metadata(2050601638), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0412 17:08:19.507077 25599 daemon.cpp:188] Current memory statistics: process(3790198912), query_pool(0), load(0), metadata(2292483370), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0412 17:08:34.507892 25599 daemon.cpp:188] Current memory statistics: process(4176135848), query_pool(0), load(0), metadata(2531933908), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0412 17:08:49.508658 25599 daemon.cpp:188] Current memory statistics: process(4597672328), query_pool(0), load(0), metadata(2795348395), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0412 17:09:04.509507 25599 daemon.cpp:188] Current memory statistics: process(4979362280), query_pool(0), load(0), metadata(3032972033), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0412 17:09:19.510375 25599 daemon.cpp:188] Current memory statistics: process(5407190728), query_pool(0), load(0), metadata(3298725202), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0412 17:09:34.511139 25599 daemon.cpp:188] Current memory statistics: process(5831338600), query_pool(0), load(0), metadata(3555238175), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0412 17:09:49.511901 25599 daemon.cpp:188] Current memory statistics: process(6192166296), query_pool(0), load(0), metadata(3776846220), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0412 17:10:04.512692 25599 daemon.cpp:188] Current memory statistics: process(6492062040), query_pool(0), load(0), metadata(3961943290), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
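To put a number on the growth visible in the log, here is a minimal Python sketch that parses the `Current memory statistics` lines and averages the increase between the first and last sample. The two sample lines are copied from the excerpt above; the function names `parse_line` and `growth_per_second` are illustrative and not part of StarRocks.

```python
import re

# First and last non-zero samples, copied verbatim from the be.INFO excerpt.
SAMPLES = [
    "I0412 17:06:49.501686 25599 daemon.cpp:188] Current memory statistics: process(1387755328), query_pool(0), load(0), metadata(805302439), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)",
    "I0412 17:10:04.512692 25599 daemon.cpp:188] Current memory statistics: process(6492062040), query_pool(0), load(0), metadata(3961943290), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)",
]

# Matches the glog prefix (time of day only) and captures the tracker list.
LINE_RE = re.compile(
    r"^I\d{4} (\d{2}):(\d{2}):(\d{2})\.\d+ .*Current memory statistics: (.*)$"
)
# Matches one "name(bytes)" pair such as "metadata(805302439)".
TRACKER_RE = re.compile(r"(\w+)\((\d+)\)")


def parse_line(line):
    """Return (seconds_since_midnight, {tracker: bytes}), or None for other lines."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    hours, minutes, seconds = (int(m.group(i)) for i in (1, 2, 3))
    trackers = {name: int(value) for name, value in TRACKER_RE.findall(m.group(4))}
    return hours * 3600 + minutes * 60 + seconds, trackers


def growth_per_second(lines, names=("process", "metadata")):
    """Average growth in bytes/second between the first and last parsed sample."""
    parsed = [p for p in map(parse_line, lines) if p is not None]
    (t0, first), (t1, last) = parsed[0], parsed[-1]
    return {name: (last[name] - first[name]) / (t1 - t0) for name in names}


if __name__ == "__main__":
    for name, rate in growth_per_second(SAMPLES).items():
        print(f"{name}: {rate / 2**20:.1f} MiB/s")
```

On these two samples, taken 195 seconds apart, the process tracker grows by roughly 25 MiB/s and the metadata tracker by roughly 15 MiB/s, so metadata loading accounts for most of the increase during startup. The sketch assumes all samples fall on the same day.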

fe.log shows the BE nodes continuously repairing tablets on their own, and the tablet count keeps decreasing.



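So the question is: how did the tablet count on the BE nodes climb to 3.6 million? The asynchronous materialized view is partitioned by day, with 15 buckets.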
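A rough sanity check shows why 3.6 million looks far too high for this layout. The Python sketch below assumes that each daily partition contributes one tablet per bucket on a given BE (that is, one replica of every bucket lands on that node, with no rollup indexes). The thread does not state the replication factor or whether other tables share the node, so treat the result only as an order-of-magnitude estimate. `implied_partitions` is a hypothetical helper name.

```python
def implied_partitions(tablets_on_be, buckets_per_partition):
    """Number of partitions needed to account for the tablets seen on one BE.

    Assumption (not from the thread): the BE holds exactly one replica of
    every bucket of every partition, and there are no extra rollup indexes.
    """
    return tablets_on_be / buckets_per_partition


if __name__ == "__main__":
    partitions = implied_partitions(3_600_000, 15)
    years = partitions / 365  # the view is partitioned by day
    print(f"implied daily partitions: {partitions:,.0f} (about {years:,.0f} years)")
```

Under these assumptions, 3.6 million tablets would correspond to 240,000 daily partitions, which is hundreds of years of data. This suggests that most of the tablets do not belong to live partitions of the view, which is consistent with the `Loaded shutdown tablet` entries in be.INFO.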

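It looks like the BE nodes currently hold a large number of useless tablets that have not been cleaned up yet. What can I do to speed up tablet cleanup so that meta recovery completes?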

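Creation and deletion are both asynchronous processes, not cancel operations, so let them run to completion. You have three BEs at 8C/32G each; are the disks perhaps HDDs? With a large data volume, creating a multi-table asynchronous materialized view can put the cluster under heavy pressure.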

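The BE nodes are co-located with ES, which already uses about 60% of memory. BE startup is now very difficult and keeps consuming memory. Even when a BE does start, total memory usage is close to 100%, and before long ES gets squeezed offline. ES availability has to come first here, and the memory cap set by mem_limit does not hold. Is there another way to forcibly limit BE memory usage?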

Please refer to the recommended production configuration.

Also, when co-deploying with other services, set mem_limit in be.conf.
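As a starting point for choosing that value, the following Python sketch budgets memory for the BE from the figures in this thread: 31.26 GB of physical memory and ES using about 60% of it. The 2 GB reserved for the operating system is an assumption, and `be_mem_limit_percent` is a hypothetical helper. The result is expressed as a percentage because `mem_limit` in be.conf accepts a percentage of physical memory.

```python
import math


def be_mem_limit_percent(total_gb, other_services_fraction, os_reserve_gb):
    """Percentage of physical memory left for the BE, rounded down.

    total_gb: physical memory on the host (31.26 GB in be.INFO).
    other_services_fraction: share used by co-located services (ES, ~0.60).
    os_reserve_gb: headroom for the OS and page cache (assumed value).
    """
    be_budget_gb = total_gb * (1 - other_services_fraction) - os_reserve_gb
    if be_budget_gb <= 0:
        raise ValueError("no memory left for the BE under these assumptions")
    return math.floor(be_budget_gb / total_gb * 100)


if __name__ == "__main__":
    percent = be_mem_limit_percent(31.26, 0.60, 2.0)
    # Line to place in be.conf; adjust after observing real usage.
    print(f"mem_limit = {percent}%")
```

With these inputs the budget comes to about 10.5 GB, or 33% of physical memory. As reported above, the BE can exceed its configured limit while loading tablet metadata at startup, so this figure is only a budget, not a guarantee that ES stays protected.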