Although the spill to disk feature is now GA, some scenarios still run out of memory.

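  1. Deduplicating with the row_number window function and keeping rn=1: with large data volumes the query runs out of memory and cannot finish.
  2. UNION deduplication: the execution plan is a GROUP BY over all columns, and the query runs out of memory and cannot finish even when the data volume is small.
  3. group_concat(distinct ): when aggregating text with deduplication, the query cannot finish once the data volume gets slightly larger.
  4. DISTINCT over multiple columns: with large data volumes the query runs out of memory and cannot finish.

We hope these scenarios can be optimized further. We currently run offline jobs on StarRocks, and some SQL statements frequently fail to finish.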

Scenario 3 probably will not work; the others are all expected to work. For the ones that fail, please provide the profile.

OK, I will collect the profiles of the queries that fail.

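Could you take a look? This row_number window function deduplication runs over only 1 million rows. Spill is enabled, but it still runs out of memory after using 30 GB.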

SQLException: (conn=673) Memory of process exceed limit. read chunk from storage Backend: 172.16.48.14, fragment: 4a07c129-294c-11ef-9a8c-02425f883893 Used: 35678076816, Limit: 35672555520. Mem usage has exceed the limit of BE
row_number_oom.profile (53.1 KB)
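
For readers who want to see how far over the limit the process was, here is a minimal Python sketch that pulls the two byte counters out of the error text above and converts them to GiB. The helper name `parse_mem_limit_error` is made up for this illustration, and the regular expression assumes the `Used: N, Limit: M` wording shown in this one message. Note that the message reports the memory limit of the whole BE process, not of the single query.

```python
import re

# Error text copied from the post above.
message = (
    "Memory of process exceed limit. read chunk from storage "
    "Backend: 172.16.48.14, fragment: 4a07c129-294c-11ef-9a8c-02425f883893 "
    "Used: 35678076816, Limit: 35672555520. "
    "Mem usage has exceed the limit of BE"
)

GIB = 1024 ** 3
MIB = 1024 ** 2


def parse_mem_limit_error(text):
    """Hypothetical helper: return (used_bytes, limit_bytes) from the message."""
    match = re.search(r"Used:\s*(\d+),\s*Limit:\s*(\d+)", text)
    if match is None:
        raise ValueError("no 'Used: ..., Limit: ...' pair found")
    return int(match.group(1)), int(match.group(2))


used, limit = parse_mem_limit_error(message)
print(f"used  = {used / GIB:.2f} GiB")            # used  = 33.23 GiB
print(f"limit = {limit / GIB:.2f} GiB")           # limit = 33.22 GiB
print(f"over  = {(used - limit) / MIB:.2f} MiB")  # over  = 5.27 MiB
```

The process was only about 5 MiB past a limit of roughly 33 GiB, so the query that received the error is not necessarily the one holding most of the memory.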

Most of the memory here is probably being used by something else; this query itself only used about 5 GB.

Please follow this to capture a heap profile: https://github.com/StarRocks/starrocks/pull/35322

Could you take a look? The BE nodes all restart together.

digraph "/opt/starrocks/be/lib/starrocks_be; 14235.7 MB" {
node [width=0.375,height=0.25];
Legend [shape=box,fontsize=24,shape=plaintext,label="/opt/starrocks/be/lib/starrocks_be\lTotal MB: 14235.7\lFocusing on: 14235.7\lDropped nodes with <= 71.2 abs(MB)\lDropped edges with <= 14.2 MB\l"];
N1 [label="__clone@@GLIBC_2.2.5\n0.0 (0.0%)\rof 14235.2 (100.0%)\r",shape=box,fontsize=8.0];
N2 [label="pthread_condattr_setpshared@GLIBC_2.2.5\n0.0 (0.0%)\rof 14235.2 (100.0%)\r",shape=box,fontsize=8.0];
N3 [label="starrocks\nThread\nsupervise_thread\n0.0 (0.0%)\rof 14235.2 (100.0%)\r",shape=box,fontsize=8.0];
N4 [label="starrocks\nThreadPool\ndispatch_thread\n0.0 (0.0%)\rof 14235.2 (100.0%)\r",shape=box,fontsize=8.0];
N5 [label="starrocks\nTabletUpdates\ndo_apply\n0.0 (0.0%)\rof 12704.7 (89.2%)\r",shape=box,fontsize=8.0];
N6 [label="starrocks\nPersistentIndex\nget\n0.0 (0.0%)\rof 11264.0 (79.1%)\r",shape=box,fontsize=8.0];
N7 [label="starrocks\nPersistentIndex\ntry_replace\n0.0 (0.0%)\rof 11264.0 (79.1%)\r",shape=box,fontsize=8.0];
N8 [label="starrocks\nPrimaryIndex\n_replace_persistent_index\n0.0 (0.0%)\rof 11264.0 (79.1%)\r",shape=box,fontsize=8.0];
N9 [label="starrocks\nPrimaryIndex\ntry_replace\n0.0 (0.0%)\rof 11264.0 (79.1%)\r",shape=box,fontsize=8.0];
N10 [label="starrocks\nTabletUpdates\n_apply_compaction_commit\n0.0 (0.0%)\rof 11264.0 (79.1%)\r",shape=box,fontsize=8.0];
N11 [label="std\nvector\nemplace_back\n8252.0 (58.0%)\r",shape=box,fontsize=46.1];
N12 [label="starrocks\nFixedMutableIndex\nget\n0.0 (0.0%)\rof 8192.0 (57.5%)\r",shape=box,fontsize=8.0];
N13 [label="starrocks\nShardByLengthMutableIndex\nget\n0.0 (0.0%)\rof 8192.0 (57.5%)\r",shape=box,fontsize=8.0];
N14 [label="starrocks\nPersistentIndex\n_get_from_immutable_index_parallel\n3149.0 (22.1%)\r",shape=box,fontsize=31.5];
N15 [label="starrocks\nGetFromImmutableIndexTask\nrun\n0.0 (0.0%)\rof 1530.4 (10.8%)\r",shape=box,fontsize=8.0];
N16 [label="starrocks\nImmutableIndex\nget\n1.6 (0.0%)\rof 1530.4 (10.8%)\r",shape=box,fontsize=8.5];
N17 [label="starrocks\nPersistentIndex\nget_from_one_immutable_index\n0.0 (0.0%)\rof 1530.4 (10.8%)\r",shape=box,fontsize=8.0];
N18 [label="std\nvector\nemplace_back\n[clone\n.isra.0]\n1527.8 (10.7%)\r",shape=box,fontsize=24.4];
N19 [label="starrocks\nTabletUpdates\n_apply_normal_rowset_commit\n0.0 (0.0%)\rof 1440.7 (10.1%)\r",shape=box,fontsize=8.0];
N20 [label="starrocks\nTabletUpdates\n_apply_rowset_commit\n0.0 (0.0%)\rof 1440.7 (10.1%)\r",shape=box,fontsize=8.0];
N21 [label="starrocks\nPrimaryIndex\n_upsert_into_persistent_index\n35.0 (0.2%)\rof 1405.7 (9.9%)\r",shape=box,fontsize=10.5];
N22 [label="starrocks\nPrimaryIndex\nupsert\n0.0 (0.0%)\rof 1405.7 (9.9%)\r",shape=box,fontsize=8.0];
N23 [label="starrocks\nTabletUpdates\n_do_update\n0.0 (0.0%)\rof 1405.7 (9.9%)\r",shape=box,fontsize=8.0];
N24 [label="starrocks\nPersistentIndex\nupsert\n0.0 (0.0%)\rof 1255.7 (8.8%)\r",shape=box,fontsize=8.0];
N25 [label="starrocks\nShardByLengthMutableIndex\nupsert\n0.0 (0.0%)\rof 664.0 (4.7%)\r",shape=box,fontsize=8.0];
N26 [label="starrocks\nFixedMutableIndex\nupsert\n0.0 (0.0%)\rof 640.0 (4.5%)\r",shape=box,fontsize=8.0];
N27 [label="phmap\npriv\nraw_hash_set\nprepare_insert\n0.0 (0.0%)\rof 580.0 (4.1%)\r",shape=box,fontsize=8.0];
N28 [label="phmap\npriv\nraw_hash_set\nresize\n580.0 (4.1%)\r",shape=box,fontsize=18.1];
N29 [label="starrocks\nImmutableIndex\nload\n0.5 (0.0%)\rof 514.7 (3.6%)\r",shape=box,fontsize=8.3];
N30 [label="starrocks\nPersistentIndex\n_flush_advance_or_append_wal\n0.0 (0.0%)\rof 514.7 (3.6%)\r",shape=box,fontsize=8.0];
N31 [label="starrocks\nBloomFilter\ninit\n512.7 (3.6%)\r",shape=box,fontsize=17.5];
N32 [label="starrocks\nPersistentIndex\n_merge_compaction_advance\n0.0 (0.0%)\rof 403.0 (2.8%)\r",shape=box,fontsize=8.0];
N33 [label="starrocks\nPersistentIndex\nflush_advance\n0.0 (0.0%)\rof 111.8 (0.8%)\r",shape=box,fontsize=8.0];
N34 [label="starrocks\nPrimaryIndex\n_build_persistent_keys\n80.0 (0.6%)\r",shape=box,fontsize=11.7];
N3 -> N4 [label=14235.2, weight=100000, style="setlinewidth(2.000000)"];
N1 -> N2 [label=14235.2, weight=100000, style="setlinewidth(2.000000)"];
N2 -> N3 [label=14235.2, weight=100000, style="setlinewidth(2.000000)"];
N4 -> N5 [label=12704.7, weight=100000, style="setlinewidth(2.000000)"];
N9 -> N8 [label=11264.0, weight=100000, style="setlinewidth(2.000000)"];
N7 -> N6 [label=11264.0, weight=100000, style="setlinewidth(2.000000)"];
N5 -> N10 [label=11264.0, weight=100000, style="setlinewidth(2.000000)"];
N8 -> N7 [label=11264.0, weight=100000, style="setlinewidth(2.000000)"];
N10 -> N9 [label=11264.0, weight=100000, style="setlinewidth(2.000000)"];
N13 -> N12 [label=8192.0, weight=100000, style="setlinewidth(2.000000)"];
N12 -> N11 [label=8192.0, weight=100000, style="setlinewidth(2.000000)"];
N6 -> N13 [label=8192.0, weight=100000, style="setlinewidth(2.000000)"];
N6 -> N14 [label=3072.0, weight=100000, style="setlinewidth(1.294777)"];
N4 -> N15 [label=1530.4, weight=100000, style="setlinewidth(0.645038)"];
N17 -> N16 [label=1530.4, weight=100000, style="setlinewidth(0.645038)"];
N15 -> N17 [label=1530.4, weight=100000, style="setlinewidth(0.645038)"];
N16 -> N18 [label=1527.8, weight=100000, style="setlinewidth(0.643942)"];
N20 -> N19 [label=1440.7, weight=100000, style="setlinewidth(0.607235)"];
N5 -> N20 [label=1440.7, weight=100000, style="setlinewidth(0.607235)"];
N19 -> N23 [label=1405.7, weight=100000, style="setlinewidth(0.592483)"];
N22 -> N21 [label=1405.7, weight=100000, style="setlinewidth(0.592483)"];
N23 -> N22 [label=1405.7, weight=100000, style="setlinewidth(0.592483)"];
N21 -> N24 [label=1255.7, weight=100000, style="setlinewidth(0.529262)"];
N24 -> N25 [label=664.0, weight=100000, style="setlinewidth(0.279861)"];
N25 -> N26 [label=640.0, weight=100000, style="setlinewidth(0.269746)"];
N27 -> N28 [label=580.0, weight=100000, style="setlinewidth(0.244457)"];
N26 -> N27 [label=580.0, weight=100000, style="setlinewidth(0.244457)"];
N24 -> N30 [label=514.7, weight=100000, style="setlinewidth(0.216947)"];
N29 -> N31 [label=512.7, weight=100000, style="setlinewidth(0.216085)"];
N30 -> N32 [label=403.0, weight=100000, style="setlinewidth(0.169844)"];
N32 -> N29 [label=403.0, weight=100000, style="setlinewidth(0.169844)"];
N33 -> N29 [label=111.8, weight=100000, style="setlinewidth(0.047102)"];
N30 -> N33 [label=111.8, weight=100000, style="setlinewidth(0.047102)"];
N21 -> N34 [label=80.0, weight=100000, style="setlinewidth(0.033718)"];
N24 -> N14 [label=77.0, weight=100000, style="setlinewidth(0.032454)"];
N26 -> N11 [label=60.0, weight=100000, style="setlinewidth(0.025289)"];
}
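
For anyone reading the dump above without rendering it, here is a rough Python sketch that extracts the per-node sizes from a few of the node lines. It assumes each label holds the function name split across `\n` separators, followed by the node's own size in MB and, when it differs, an `of ...` cumulative size. Joining the name parts with `::` and the helper name `parse_node` are choices made for this illustration. Only three lines from the dump are retyped here, with straight quotes.

```python
import re

# Three node lines retyped from the dump above (straight quotes, raw strings
# so that \n and \r stay as literal two-character sequences).
dot_lines = [
    r'N5 [label="starrocks\nTabletUpdates\ndo_apply\n0.0 (0.0%)\rof 12704.7 (89.2%)\r",shape=box,fontsize=8.0];',
    r'N6 [label="starrocks\nPersistentIndex\nget\n0.0 (0.0%)\rof 11264.0 (79.1%)\r",shape=box,fontsize=8.0];',
    r'N11 [label="std\nvector\nemplace_back\n8252.0 (58.0%)\r",shape=box,fontsize=46.1];',
]

NODE_RE = re.compile(r'^(N\d+) \[label="(.*?)",shape=')
SIZE_RE = re.compile(r'^(?:of )?([\d.]+) \([\d.]+%\)$')


def parse_node(line):
    """Hypothetical helper: return (name, own_mb, cumulative_mb) or None."""
    match = NODE_RE.match(line)
    if match is None:
        return None
    # Split the label on the literal "\n" and "\r" escape sequences.
    parts = [p for p in re.split(r'\\[nr]', match.group(2)) if p]
    sizes = [float(SIZE_RE.match(p).group(1)) for p in parts if SIZE_RE.match(p)]
    names = [p for p in parts if not SIZE_RE.match(p)]
    own_mb = sizes[0]
    # A node without an "of ..." part has the same own and cumulative size.
    cumulative_mb = sizes[1] if len(sizes) > 1 else own_mb
    return "::".join(names), own_mb, cumulative_mb


nodes = [parse_node(line) for line in dot_lines]
for name, own_mb, cumulative_mb in sorted(nodes, key=lambda n: -n[2]):
    print(f"{cumulative_mb:>9.1f} MB cumulative, {own_mb:>7.1f} MB own  {name}")
```

Run on these three lines, it lists `TabletUpdates::do_apply` first with 12704.7 MB cumulative, then `PersistentIndex::get` with 11264.0 MB, then `vector::emplace_back` with 8252.0 MB allocated directly.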

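This is mainly the primary key index, which is using about 10 GB of memory. Have you enabled persistence for the primary key index?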

Yes, it is enabled.

This pull request optimizes memory usage for row_number = 1 style queries:
https://github.com/StarRocks/starrocks/pull/49011

@许秀不许秀
After upgrading to 3.3.2 there are still some problems.