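Regarding the OOM issue:
1. FE and BE are deployed independently. The BE memory limit parameter mem_limit defaults to 90%.
2. First check whether /proc/sys/vm/overcommit_memory is set to 1. If it is not, change it to 1.
3. The memory a single BE uses is also bounded by the degree of parallelism (parallel_fragment_exec_instance_num) * exec_mem_limit, so adjust these two variables to match the memory on your BE nodes.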
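The checks in points 1 to 3 can be sketched in Python as below. This is a minimal sketch, not a StarRocks tool. The helper names and the example sizes are hypothetical: a 64 GiB node, 8 instances, and 2 GiB per instance. The sketch also treats the 90% BE limit as a plain percentage of physical memory.

```python
"""Sketch: sanity-check the memory settings discussed in this reply.

Assumptions (hypothetical, not StarRocks APIs): the helper names, the
example sizes, and treating the BE limit as a percentage of physical RAM.
"""
from pathlib import Path


def overcommit_mode(path: str = "/proc/sys/vm/overcommit_memory") -> int:
    # The kernel exposes the current overcommit mode as a single integer.
    return int(Path(path).read_text().strip())


def be_memory_limit(physical_bytes: int, mem_limit_pct: float = 90.0) -> int:
    # BE process cap expressed as a percentage of physical memory.
    return int(physical_bytes * mem_limit_pct / 100)


def worst_case_query_mem(parallel_instances: int, exec_mem_limit: int) -> int:
    # Upper bound from the rule of thumb above: parallelism * exec_mem_limit.
    return parallel_instances * exec_mem_limit


if __name__ == "__main__":
    GiB = 1 << 30
    physical = 64 * GiB                        # hypothetical node size
    limit = be_memory_limit(physical)
    need = worst_case_query_mem(8, 2 * GiB)    # hypothetical settings
    print(f"BE limit: {limit / GiB:.1f} GiB, one query may use up to {need / GiB:.1f} GiB")
    try:
        mode = overcommit_mode()
        print("overcommit_memory =", mode, "(OK)" if mode == 1 else "(consider setting it to 1)")
    except (OSError, ValueError):
        print("overcommit_memory is not readable on this machine")
```

If the worst-case figure approaches the BE limit, one option is to lower parallel_fragment_exec_instance_num or exec_mem_limit, as point 3 suggests.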
Did you eventually solve this? My BE memory also keeps rising and never comes back down, though I am on version 2.2.10.
Check it like this: curl -XGET -s http://BE_IP:BE_HTTP_PORT/metrics | grep -E '^starrocks_be_.*_mem_bytes|^starrocks_be_tcmalloc_bytes_in_use'
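For repeated checks, the same filter can be applied in Python. This is a sketch that uses only the standard library. BE_IP and BE_HTTP_PORT remain placeholders. The sketch assumes metric lines have the form "name value" with no trailing timestamp, and any labels are kept as part of the name. The sample metrics in the demo are made up.

```python
"""Python equivalent of the curl | grep above (a sketch).

Assumptions: be_url is a placeholder such as "http://BE_IP:BE_HTTP_PORT";
metric lines look like "name value" with no trailing timestamp.
"""
import re
import urllib.request

# Same pattern as the grep -E expression, applied to each non-comment line.
PATTERN = re.compile(r"^starrocks_be_.*_mem_bytes|^starrocks_be_tcmalloc_bytes_in_use")


def parse_mem_metrics(text: str) -> dict:
    """Return {metric_name: value} for the memory-related BE metrics."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or not PATTERN.match(line):
            continue
        name, value = line.rsplit(None, 1)
        result[name] = float(value)
    return result


def fetch_mem_metrics(be_url: str) -> dict:
    # Fetch /metrics from a BE and keep only the memory metrics.
    with urllib.request.urlopen(f"{be_url}/metrics", timeout=10) as resp:
        return parse_mem_metrics(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Made-up sample in the same text format, so the demo runs offline.
    sample = (
        "# TYPE starrocks_be_process_mem_bytes gauge\n"
        "starrocks_be_process_mem_bytes 1073741824\n"
        "starrocks_be_tcmalloc_bytes_in_use 805306368\n"
        "starrocks_be_unrelated_metric 42\n"
    )
    for name, value in parse_mem_metrics(sample).items():
        print(f"{name}: {value / (1 << 30):.2f} GiB")
```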
I have checked those. starrocks_be_tcmalloc_bytes_in_use and starrocks_be_process_mem_bytes both keep growing.
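To put a number on "keeps growing," you can fit a line to periodic samples of one of these metrics. The sketch below uses an ordinary least-squares slope, which is a generic technique and not something StarRocks computes. It assumes you collect the (unix_seconds, bytes) pairs yourself, for example by polling /metrics. The sample values are made up.

```python
"""Sketch: estimate how fast a memory metric grows from periodic samples.

Assumptions: samples are (unix_seconds, bytes) pairs collected by you;
the slope is a plain least-squares fit, not a StarRocks feature.
"""
from typing import List, Optional, Tuple


def growth_rate_per_hour(samples: List[Tuple[float, float]]) -> float:
    """Least-squares slope of bytes over time, in bytes per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples need distinct timestamps")
    return cov / var * 3600


def hours_until_limit(current: float, limit: float, rate_per_hour: float) -> Optional[float]:
    """Rough hours until `limit` at a constant rate; None if not growing."""
    if rate_per_hour <= 0:
        return None
    return max(limit - current, 0) / rate_per_hour


if __name__ == "__main__":
    GiB = 1 << 30
    # Made-up samples: +1 GiB every 6 hours.
    samples = [(0, 10 * GiB), (6 * 3600, 11 * GiB), (12 * 3600, 12 * GiB)]
    rate = growth_rate_per_hour(samples)
    print(f"growth: {rate / GiB:.3f} GiB/h")
    print("hours to a 57.6 GiB limit:", hours_until_limit(12 * GiB, 57.6 * GiB, rate))
```

A steady positive slope that persists while the load is idle points to a leak or a cache that never releases memory. A slope that tracks query load points to the workload itself.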
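Could you post a screenshot? Is the BE co-located with other services? How much memory does the machine have? Could you also post your be.out file?
Here is be.out. The BE was restarted in between, on 2022-12-15: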
start time: Thu Sep 1 14:09:20 CST 2022
tcmalloc: large alloc 1306877952 bytes == 0x630d58000 @ 0x5493aef 0x572535c 0x1ecee0e 0x56755d5 0x1929c44 0x22ee0a6 0x22d8df2 0x22dd512 0x2562ae9 0x25138ec 0x2514b6f 0x241afd7 0x2319a5e 0x25d2c8b 0x1e948f4 0x1e955f7 0x1e294bb 0x1e2db9c 0x1e2e401 0x1f723d9 0x1f6df8a 0x7f14fc1d2e65
start time: Wed Nov 30 14:15:27 CST 2022
start time: Thu Dec 1 22:42:33 CST 2022
start time: Thu Dec 1 23:16:55 CST 2022
start time: Fri Dec 2 10:14:48 CST 2022
start time: Fri Dec 2 12:14:40 CST 2022
start time: Sat Dec 3 21:04:13 CST 2022
start time: Wed Dec 7 11:17:40 CST 2022
start time: Thu Dec 15 16:02:11 CST 2022
[warn] Error from accept() call: Invalid argument   (this line repeated 48 times)
*** Check failure stack trace: ***
@ 0x3d01fad google::LogMessage::Fail()
@ 0x3d0441f google::LogMessage::SendToLog()
@ 0x3d01afe google::LogMessage::Flush()
@ 0x3d04a29 google::LogMessageFatal::~LogMessageFatal()
@ 0x1fb3ab0 starrocks::ThreadPool::~ThreadPool()
@ 0x198ab14 starrocks::StorageEngine::~StorageEngine()
@ 0x17fbfdc main
@ 0x7f2a55957505 __libc_start_main
@ 0x18e971e (unknown)
@ (nil) (unknown)
start time: Thu Dec 15 16:03:58 CST 2022
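To turn a be.out like this into numbers, a small Python sketch can count how many times the BE was started, measure the tcmalloc "large alloc" requests, and tally the accept() warnings. It only recognises the line shapes visible in this excerpt. The function names are hypothetical, and the sketch assumes each launch leaves exactly one "start time:" line. For reference, the single large alloc above is 1306877952 bytes, roughly 1.22 GiB.

```python
"""Sketch: summarize a be.out file like the one above.

Assumptions: only the line shapes seen in this thread are recognised,
and each BE launch is assumed to write one "start time:" line.
"""
import re
import sys
from typing import List

START_RE = re.compile(r"^start time: (.+)$")
LARGE_ALLOC_RE = re.compile(r"tcmalloc: large alloc (\d+) bytes")
ACCEPT_WARN = "Error from accept() call"


def start_times(text: str) -> List[str]:
    """Timestamps of every 'start time:' line (assumed one per BE launch)."""
    times = []
    for line in text.splitlines():
        m = START_RE.match(line.strip())
        if m:
            times.append(m.group(1).strip())
    return times


def large_allocs_gib(text: str) -> List[float]:
    """Sizes from tcmalloc 'large alloc N bytes' warnings, in GiB."""
    return [int(n) / (1 << 30) for n in LARGE_ALLOC_RE.findall(text)]


def accept_warnings(text: str) -> int:
    """Number of lines containing the accept() warning."""
    return sum(1 for line in text.splitlines() if ACCEPT_WARN in line)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            log = f.read()
    else:
        # Fallback: a shortened excerpt of the be.out above.
        log = (
            "start time: Thu Sep 1 14:09:20 CST 2022\n"
            "tcmalloc: large alloc 1306877952 bytes == 0x630d58000 @ 0x5493aef\n"
            "start time: Wed Nov 30 14:15:27 CST 2022\n"
        )
    starts = start_times(log)
    print(f"{len(starts)} start(s), i.e. {max(len(starts) - 1, 0)} restart(s)")
    for gib in large_allocs_gib(log):
        print(f"tcmalloc large alloc: {gib:.2f} GiB")
    print("accept() warnings:", accept_warnings(log))
```

Pass the path to your own be.out as the first argument. Without an argument, the script summarizes the embedded excerpt.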
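Could we add each other on WeChat and talk it over?
Are you in the community group? If so, I'll add you there. Just tell me the group number and your ID.
Community group 15, group nickname: 老张
Has this been solved? I see the same behavior on version 2.0.2.
Not yet, I just restarted for now. If you solve it, please share.
For now only a restart fixes it, which is the nuclear option after all. Which data models do you mostly use?
Mostly Unique and Duplicate.
After a restart, memory slowly climbs back up.
Yes, same behavior here. I have been wondering whether a bug in the cleanup after a query plan finishes is causing a memory leak. I'm still reading the code.
Here are some issues I came across earlier. Let's look at them together:
https://github.com/StarRocks/starrocks/pull/1800
I haven't had time to try it yet. Take a look and let's discuss.
OK, I'll take a look first.
Hi, did you find the cause of this problem?
That said, the starrocks_be_tablet_meta_mem_bytes metric is indeed inaccurate; the StarRocks developers have said so too. I'm not sure whether the growth is caused by metadata memory not being released.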