3.1.5: SQL on read/write-separated nodes reports memory limit exceeded, but the message looks wrong because memory is still free

【Details】 Loading data and executing SQL fail with: failed.msg:SQLSyntaxErrorException: Memory of process exceed limit. Start execute plan fragment. Used: 114615020088, Limit: 113770363944. Mem usage has exceed the limit of BE backend
Actual usage: [image]

【Background】 Already configured


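【Business impact】 Execution fails with an error
【Shared-data (storage-compute separation)】 Yes
【StarRocks version】 3.1.5
【Cluster size】 e.g. 3 FE (1 follower + 2 observers) + 5 BE (FE and BE co-located)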
【Machine info】 2 CN with 16 cores / 128 GB each; 1 FE with 16 cores / 32 GB
【Contact】 haoziu@163.com
【Attachments】
The memory statistics don't really match the actual usage.

Same after switching to 256 GB: it errors out at a bit over 130 GB.

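Memory is pre-reserved, and the error is raised when the reservation would exceed the limit. The memory statistics in mem_tracker are reported once every 15 seconds, whereas top shows an instantaneous value.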

It pre-reserves that much? That doesn't seem reasonable. 2 or 3 GB would be fine, but on the 256 GB machine even 90 GB isn't enough.

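Is there a way to skip this error? It fails anyway, so it might as well be allowed to keep using the memory. Right now the memory is wasted and queries still can't run. Even 256 GB isn't enough.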

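This error means that a single BE has reached its memory limit. The Used value in the error is the total memory currently held by that BE, not the amount requested by a single SQL query.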
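To make the Used and Limit numbers easier to read, here is a minimal Python sketch that pulls them out of the error text from the first post and prints them in GiB, along with how far Used is over Limit. The helper name `parse_used_limit` is made up for illustration, and the sketch only does arithmetic on the message; it says nothing about how the BE computes either number.

```python
import re

# Error text copied from the first post of this thread.
ERROR = ("Memory of process exceed limit. Start execute plan fragment. "
         "Used: 114615020088, Limit: 113770363944. "
         "Mem usage has exceed the limit of BE backend")

GIB = 1024 ** 3


def parse_used_limit(message):
    """Return (used, limit) in bytes from a 'Used: N, Limit: M' message.

    Hypothetical helper for this thread, not part of StarRocks.
    """
    match = re.search(r"Used:\s*(\d+),\s*Limit:\s*(\d+)", message)
    if match is None:
        raise ValueError("no 'Used: ..., Limit: ...' pair found")
    return int(match.group(1)), int(match.group(2))


used, limit = parse_used_limit(ERROR)
print(f"used  = {used / GIB:.2f} GiB")
print(f"limit = {limit / GIB:.2f} GiB")
print(f"over  = {used - limit} bytes")
```

Both values are process-wide for the BE, so the comparison is against the BE limit rather than against any per-query setting.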

Please also post the query error message from the 256 GB setup.

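With 256 GB it's about the same: SQLSyntaxErrorException: Memory of process exceed limit. Start execute plan fragment. Used: 227634391912, Limit: 227634303466. Mem usage has exceed the limit of BE backen. But actual usage is only 130 GB, and it fails every time at that point. That's a lot of wasted memory. While the error is occurring, only small queries can run.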

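Right, but on the 128 GB machine the error appears when total system memory usage is only 70 GB, and on the 256 GB machine it appears at 130 GB. Can this be avoided? The main problem is that when the error occurs it affects loading and any moderately large query. This doesn't seem reasonable to me. Is it because of the separated architecture, with cache enabled on many tables?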

In shared-data mode the cache is stored on disk.

In the be.INFO of the node that reported the error, search for the memory statistics just before and after the error: Current memory statistics

I1218 16:34:50.746234 9848 daemon.cpp:211] Current memory statistics: process(105685322816), query_pool(0), load(0), metadata(437552748), compaction(1010120016), schema_change(0), column_pool(65496358273), page_cache(339230848), update(27580040573), chunk_allocator(157780136), clone(0), consistency(0)

I1218 16:35:05.750115 9848 daemon.cpp:211] Current memory statistics: process(111992735560), query_pool(0), load(0), metadata(440315868), compaction(538701352), schema_change(0), column_pool(71239417690), page_cache(344498672), update(27915786173), chunk_allocator(157780136), clone(0), consistency(0)

I1218 16:35:20.754274 9848 daemon.cpp:211] Current memory statistics: process(113704838904), query_pool(-47408704), load(0), metadata(442195867), compaction(1079213504), schema_change(0), column_pool(72221713582), page_cache(340346928), update(27915786173), chunk_allocator(157694120), clone(0), consistency(0)

I1218 16:35:35.761488 9848 daemon.cpp:211] Current memory statistics: process(112521752008), query_pool(1307016), load(0), metadata(444732188), compaction(1859840144), schema_change(0), column_pool(70517661507), page_cache(328100336), update(27932594901), chunk_allocator(157771944), clone(0), consistency(0)

W1218 16:35:50.757510 10122 pipeline_driver_executor.cpp:161] [Driver] Process error, query_id=730c3afd-9d80-11ee-b55a-1a5e7986dccb, instance_id=730c3afd-9d80-11ee-b55a-1a5e7986dccc, status=Memory limit exceeded: Memory of process exceed limit. Pipeline Backend: 172.31.143.160, fragment: 730c3afd-9d80-11ee-b55a-1a5e7986dccc Used: 117493358375, Limit: 117363112279. Mem usage has exceed the limit of BE

I1218 16:35:50.765421 9848 daemon.cpp:211] Current memory statistics: process(117392726264), query_pool(1420032), load(0), metadata(446900077), compaction(3322393272), schema_change(0), column_pool(70596151497), page_cache(344030720), update(27647350717), chunk_allocator(157780136), clone(0), consistency(0)

I1218 16:36:05.769121 9848 daemon.cpp:211] Current memory statistics: process(109173265984), query_pool(2126512), load(11984), metadata(447996526), compaction(2524388992), schema_change(0), column_pool(65461688539), page_cache(334846832), update(27412107613), chunk_allocator(157042856), clone(0), consistency(0)
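For anyone reading these lines by eye, a small Python sketch like the one below can split a "Current memory statistics" line into its trackers and rank them. The sample line is the 16:35:50 entry copied from the log above; the function name is hypothetical. One caveat, stated as an assumption: the trackers listed here may overlap and need not add up to `process`, so the percentages are only a rough guide.

```python
import re

# The 16:35:50 line copied from the be.INFO excerpt in this thread.
LINE = ("I1218 16:35:50.765421 9848 daemon.cpp:211] Current memory statistics: "
        "process(117392726264), query_pool(1420032), load(0), metadata(446900077), "
        "compaction(3322393272), schema_change(0), column_pool(70596151497), "
        "page_cache(344030720), update(27647350717), chunk_allocator(157780136), "
        "clone(0), consistency(0)")

GIB = 1024 ** 3


def parse_memory_statistics(line):
    """Map tracker name -> bytes for one 'Current memory statistics' line.

    Hypothetical helper; values can be negative (see query_pool at 16:35:20).
    """
    _, _, stats = line.partition("Current memory statistics:")
    return {name: int(value)
            for name, value in re.findall(r"(\w+)\((-?\d+)\)", stats)}


stats = parse_memory_statistics(LINE)
process = stats.pop("process")
top3 = sorted(stats.items(), key=lambda kv: kv[1], reverse=True)[:3]
for name, value in top3:
    share = 100 * value / process
    print(f"{name:<16}{value / GIB:8.2f} GiB  {share:5.1f}% of process")
```

On this sample the two largest entries are column_pool and update, which matches what the raw numbers in the log already show.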

Yes, all the tables are created like this, pointing at an OSS bucket:
"datacache.enable" = "true",
"datacache.partition_duration" = "1 MONTH",
"enable_async_write_back" = "false",
"replication_num" = "1"

Was the query memory-limit error reported around this point in time? And is the machine already on 256 GB of memory now?

This one is the 128 GB machine; the 256 GB one looks about the same. In any case, a lot of the memory doesn't match this usage. [image]

I guess it's mostly data loading that consumes a bit?

In top, does every node show roughly 79 GB of memory?

curl -XGET -s http://be_ip:8040/metrics | grep "^starrocks_be_.*_mem_bytes|^starrocks_be_tcmalloc_bytes_in_use"
Please post screenshots of the output from both CNs.

It returns nothing.

This way it works:
curl -X GET -s http://172.31.143.161:8041/metrics | grep -E "starrocks_be_.*_mem_bytes|starrocks_be_tcmalloc_bytes_in_use"

# TYPE starrocks_be_bitmap_index_mem_bytes gauge
starrocks_be_bitmap_index_mem_bytes 0
# TYPE starrocks_be_bloom_filter_index_mem_bytes gauge
starrocks_be_bloom_filter_index_mem_bytes 0
# TYPE starrocks_be_chunk_allocator_mem_bytes gauge
starrocks_be_chunk_allocator_mem_bytes 204283016
# TYPE starrocks_be_clone_mem_bytes gauge
starrocks_be_clone_mem_bytes 0
# TYPE starrocks_be_column_metadata_mem_bytes gauge
starrocks_be_column_metadata_mem_bytes 1045454667
# TYPE starrocks_be_column_pool_mem_bytes gauge
starrocks_be_column_pool_mem_bytes 77461658913
# TYPE starrocks_be_column_zonemap_index_mem_bytes gauge
starrocks_be_column_zonemap_index_mem_bytes 186247915
# TYPE starrocks_be_compaction_mem_bytes gauge
starrocks_be_compaction_mem_bytes 990414504
# TYPE starrocks_be_consistency_mem_bytes gauge
starrocks_be_consistency_mem_bytes 0
# TYPE starrocks_be_load_mem_bytes gauge
starrocks_be_load_mem_bytes 0
# TYPE starrocks_be_metadata_mem_bytes gauge
starrocks_be_metadata_mem_bytes 1163738761
# TYPE starrocks_be_ordinal_index_mem_bytes gauge
starrocks_be_ordinal_index_mem_bytes 663445952
# TYPE starrocks_be_process_mem_bytes gauge
starrocks_be_process_mem_bytes 110788966320
# TYPE starrocks_be_query_mem_bytes gauge
starrocks_be_query_mem_bytes 243429477
# TYPE starrocks_be_rowset_metadata_mem_bytes gauge
starrocks_be_rowset_metadata_mem_bytes 0
# TYPE starrocks_be_schema_change_mem_bytes gauge
starrocks_be_schema_change_mem_bytes 0
# TYPE starrocks_be_segment_metadata_mem_bytes gauge
starrocks_be_segment_metadata_mem_bytes 117788093
# TYPE starrocks_be_segment_zonemap_mem_bytes gauge
starrocks_be_segment_zonemap_mem_bytes 96053244
# TYPE starrocks_be_short_key_index_mem_bytes gauge
starrocks_be_short_key_index_mem_bytes 1179267
# TYPE starrocks_be_storage_page_cache_mem_bytes gauge
starrocks_be_storage_page_cache_mem_bytes 337246752
# TYPE starrocks_be_tablet_metadata_mem_bytes gauge
starrocks_be_tablet_metadata_mem_bytes 496001
# TYPE starrocks_be_tablet_schema_mem_bytes gauge
starrocks_be_tablet_schema_mem_bytes 496001
# TYPE starrocks_be_update_mem_bytes gauge
starrocks_be_update_mem_bytes 19996417541
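A rough Python sketch for reading this kind of output: it parses the `*_mem_bytes` gauges from the metrics text and lists them from largest to smallest next to the process total. The sample holds only a few lines copied from the output above, the function name is hypothetical, and no network call is made; a live check would feed in the text returned by the /metrics endpoint used in the curl command. As an assumption worth stating, the gauges may overlap, so they are not expected to sum to `starrocks_be_process_mem_bytes`.

```python
# A few lines copied from the metrics output in this thread. TYPE comment
# lines are skipped whether or not they keep their leading "#".
SAMPLE = """\
# TYPE starrocks_be_column_pool_mem_bytes gauge
starrocks_be_column_pool_mem_bytes 77461658913
TYPE starrocks_be_process_mem_bytes gauge
starrocks_be_process_mem_bytes 110788966320
starrocks_be_query_mem_bytes 243429477
starrocks_be_update_mem_bytes 19996417541
"""

GIB = 1024 ** 3


def parse_mem_metrics(text):
    """Return {metric name: bytes} for unlabeled '*_mem_bytes' samples.

    Hypothetical helper for this thread, not a StarRocks or Prometheus API.
    """
    metrics = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("TYPE "):
            continue
        name, _, value = line.rpartition(" ")
        if name.endswith("_mem_bytes"):
            metrics[name] = int(float(value))
    return metrics


metrics = parse_mem_metrics(SAMPLE)
process = metrics.pop("starrocks_be_process_mem_bytes")
for name, value in sorted(metrics.items(), key=lambda kv: kv[1], reverse=True):
    share = 100 * value / process
    print(f"{name}: {value / GIB:.2f} GiB ({share:.1f}% of process)")
```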

Did you change the default value of http_port? The screenshot shows 8041, while http_port defaults to port 8040.