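To help us locate your problem faster, please provide the following information. Thank you.
【Details】BE memory and CPU suddenly hit full load, and the BE process was then killed by the OOM killer
【Background】Normal operation: queries and streaming-job writes
【Business impact】Cluster unavailable
【Shared-data (storage-compute separation)】No
【StarRocks version】3.1.8
【Cluster size】3 FE (1 follower + 2 observers) + 5 BE (FE and BE co-located)
【Machine info】32 cores / 128 GB RAM / 10 GbE
【Contact】Community group 15, 朴实无华
【Attachments】
be.INFO contains no error logs, and be.out contains none either. Kernel log from the OOM kill: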
[Tue May 14 19:22:17 2024] pip_wg_executor invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[Tue May 14 19:22:17 2024] pip_wg_executor cpuset=/ mems_allowed=0-1
[Tue May 14 19:22:17 2024] CPU: 10 PID: 2627 Comm: pip_wg_executor Tainted: G OE ------------ 3.10.0-514.el7.x86_64 #1
[Tue May 14 19:22:17 2024] Hardware name: Unis Huashan Technologies Co., Ltd. R4900 G2/RS32M2C9S, BIOS 1.01.12 07/24/2017
[Tue May 14 19:22:17 2024] ffff88041f590000 00000000f288d896 ffff880143c87a78 ffffffff81685fac
[Tue May 14 19:22:17 2024] ffff880143c87b08 ffffffff81680f57 0000000000000001 000000000000fa88
[Tue May 14 19:22:17 2024] 000000000000043f ffffffff819f1160 ffff88019963b478 0000000001f693cf
[Tue May 14 19:22:17 2024] Call Trace:
[Tue May 14 19:22:17 2024] [] dump_stack+0x19/0x1b
[Tue May 14 19:22:17 2024] [] dump_header+0x8e/0x225
[Tue May 14 19:22:17 2024] [] oom_kill_process+0x24e/0x3c0
[Tue May 14 19:22:17 2024] [] ? oom_unkillable_task+0xcd/0x120
[Tue May 14 19:22:17 2024] [] ? find_lock_task_mm+0x56/0xc0
[Tue May 14 19:22:17 2024] [] ? has_capability_noaudit+0x1e/0x30
[Tue May 14 19:22:17 2024] [] out_of_memory+0x4b6/0x4f0
[Tue May 14 19:22:17 2024] [] __alloc_pages_slowpath+0x5d7/0x725
[Tue May 14 19:22:17 2024] [] __alloc_pages_nodemask+0x405/0x420
[Tue May 14 19:22:17 2024] [] alloc_pages_vma+0x9a/0x150
[Tue May 14 19:22:17 2024] [] handle_mm_fault+0xc6f/0xfe0
[Tue May 14 19:22:17 2024] [] __do_page_fault+0x154/0x450
[Tue May 14 19:22:17 2024] [] ? __do_page_fault+0x19f/0x450
[Tue May 14 19:22:17 2024] [] do_page_fault+0x35/0x90
[Tue May 14 19:22:17 2024] [] page_fault+0x28/0x30
[Tue May 14 19:22:17 2024] Mem-Info:
[Tue May 14 19:22:17 2024] active_anon:32317399 inactive_anon:28763 isolated_anon:0
active_file:5233 inactive_file:4323 isolated_file:95
unevictable:0 dirty:0 writeback:0 unstable:0
slab_reclaimable:149086 slab_unreclaimable:27216
mapped:6232 shmem:47867 pagetables:72237 bounce:0
free:87775 free_pcp:2243 free_cma:0
[Tue May 14 19:22:17 2024] Node 0 DMA free:11796kB min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Tue May 14 19:22:17 2024] lowmem_reserve[]: 0 1671 64133 64133
[Tue May 14 19:22:17 2024] Node 0 DMA32 free:250904kB min:1168kB low:1460kB high:1752kB active_anon:1265396kB inactive_anon:1000kB active_file:104kB inactive_file:188kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1965688kB managed:1713780kB mlocked:0kB dirty:0kB writeback:0kB mapped:100kB shmem:1496kB slab_reclaimable:177436kB slab_unreclaimable:9892kB kernel_stack:752kB pagetables:2972kB unstable:0kB bounce:0kB free_pcp:1356kB local_pcp:20kB free_cma:0kB writeback_tmp:0kB pages_scanned:1441 all_unreclaimable? yes
[Tue May 14 19:22:17 2024] lowmem_reserve[]: 0 0 62461 62461
[Tue May 14 19:22:17 2024] Node 0 Normal free:43368kB min:43744kB low:54680kB high:65616kB active_anon:62937344kB inactive_anon:72644kB active_file:11316kB inactive_file:7832kB unevictable:0kB isolated(anon):0kB isolated(file):380kB present:65011712kB managed:63960528kB mlocked:0kB dirty:0kB writeback:0kB mapped:16944kB shmem:123632kB slab_reclaimable:182420kB slab_unreclaimable:56480kB kernel_stack:18720kB pagetables:141180kB unstable:0kB bounce:0kB free_pcp:3800kB local_pcp:132kB free_cma:0kB writeback_tmp:0kB pages_scanned:33535 all_unreclaimable? yes
[Tue May 14 19:22:17 2024] lowmem_reserve[]: 0 0 0 0
[Tue May 14 19:22:17 2024] Node 1 Normal free:45032kB min:45180kB low:56472kB high:67768kB active_anon:65066856kB inactive_anon:41408kB active_file:9512kB inactive_file:9272kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:67108864kB managed:66057444kB mlocked:0kB dirty:0kB writeback:0kB mapped:7884kB shmem:66340kB slab_reclaimable:236488kB slab_unreclaimable:42492kB kernel_stack:3552kB pagetables:144796kB unstable:0kB bounce:0kB free_pcp:3816kB local_pcp:204kB free_cma:0kB writeback_tmp:0kB pages_scanned:32970 all_unreclaimable? yes
[Tue May 14 19:22:17 2024] lowmem_reserve[]: 0 0 0 0
[Tue May 14 19:22:17 2024] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11796kB
[Tue May 14 19:22:17 2024] Node 0 DMA32: 132*4kB (UE) 141*8kB (UE) 4068*16kB (UEM) 2918*32kB (UEM) 1110*64kB (UEM) 139*128kB (UEM) 2*256kB (EM) 1*512kB (M) 1*1024kB (M) 0*2048kB 0*4096kB = 251000kB
[Tue May 14 19:22:17 2024] Node 0 Normal: 176*4kB (UE) 323*8kB (UEM) 117*16kB (UEM) 377*32kB (UEM) 89*64kB (UEM) 58*128kB (UEM) 13*256kB (UEM) 3*512kB (EM) 1*1024kB (U) 0*2048kB 2*4096kB (M) = 44424kB
[Tue May 14 19:22:17 2024] Node 1 Normal: 214*4kB (UEM) 371*8kB (UEM) 189*16kB (UEM) 376*32kB (UEM) 112*64kB (UEM) 31*128kB (UEM) 8*256kB (EM) 5*512kB (UEM) 2*1024kB (UM) 0*2048kB 2*4096kB (U) = 44864kB
[Tue May 14 19:22:17 2024] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Tue May 14 19:22:17 2024] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Tue May 14 19:22:17 2024] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Tue May 14 19:22:17 2024] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Tue May 14 19:22:17 2024] 58202 total pagecache pages
[Tue May 14 19:22:17 2024] 0 pages in swap cache
[Tue May 14 19:22:17 2024] Swap cache stats: add 0, delete 0, find 0/0
[Tue May 14 19:22:17 2024] Free swap = 0kB
[Tue May 14 19:22:17 2024] Total swap = 0kB
[Tue May 14 19:22:17 2024] 33525560 pages RAM
[Tue May 14 19:22:17 2024] 0 pages HighMem/MovableOnly
[Tue May 14 19:22:17 2024] 588649 pages reserved
[Tue May 14 19:22:17 2024] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Tue May 14 19:22:17 2024] [ 661] 0 661 18520 7919 40 0 0 systemd-journal
[Tue May 14 19:22:17 2024] [ 695] 0 695 10896 153 21 0 -1000 systemd-udevd
[Tue May 14 19:22:17 2024] [ 1036] 0 1036 13854 114 26 0 -1000 auditd
[Tue May 14 19:22:17 2024] [ 1057] 998 1057 132432 1906 54 0 0 polkitd
[Tue May 14 19:22:17 2024] [ 1058] 0 1058 53186 449 57 0 0 abrtd
[Tue May 14 19:22:17 2024] [ 1063] 0 1063 52566 340 55 0 0 abrt-watch-log
[Tue May 14 19:22:17 2024] [ 1065] 81 1065 6655 134 18 0 -900 dbus-daemon
[Tue May 14 19:22:17 2024] [ 1069] 0 1069 50303 124 39 0 0 gssproxy
[Tue May 14 19:22:17 2024] [ 1070] 997 1070 2131 37 9 0 0 lsmd
[Tue May 14 19:22:17 2024] [ 1080] 0 1080 6109 127 14 0 0 systemd-logind
[Tue May 14 19:22:17 2024] [ 1085] 995 1085 28962 97 26 0 0 chronyd
[Tue May 14 19:22:17 2024] [ 1086] 0 1086 109477 623 64 0 0 NetworkManager
[Tue May 14 19:22:17 2024] [ 1716] 0 1716 20617 217 42 0 -1000 sshd
[Tue May 14 19:22:17 2024] [ 1719] 0 1719 181329 4709 59 0 0 node_exporter
[Tue May 14 19:22:17 2024] [ 1728] 0 1728 138289 3200 89 0 0 tuned
[Tue May 14 19:22:17 2024] [ 1729] 0 1729 87979 250 41 0 0 rsyslogd
[Tue May 14 19:22:17 2024] [ 1758] 0 1758 23079 158 50 0 0 login
[Tue May 14 19:22:17 2024] [ 1785] 0 1785 6461 52 17 0 0 atd
[Tue May 14 19:22:17 2024] [ 1801] 0 1801 31556 155 18 0 0 crond
[Tue May 14 19:22:17 2024] [ 3451] 1202 3451 516182 4685 99 0 0 process-exporte
[Tue May 14 19:22:17 2024] [26807] 0 26807 29083 351 15 0 0 bash
[Tue May 14 19:22:17 2024] [16690] 0 16690 19868 197 38 0 0 zabbix_agentd
[Tue May 14 19:22:17 2024] [16692] 0 16692 19869 668 38 0 0 zabbix_agentd
[Tue May 14 19:22:17 2024] [16693] 0 16693 19868 200 38 0 0 zabbix_agentd
[Tue May 14 19:22:17 2024] [16694] 0 16694 19868 200 38 0 0 zabbix_agentd
[Tue May 14 19:22:17 2024] [16695] 0 16695 19868 200 38 0 0 zabbix_agentd
[Tue May 14 19:22:17 2024] [16696] 0 16696 19868 236 38 0 0 zabbix_agentd
[Tue May 14 19:22:17 2024] [ 1696] 0 1696 40477526 32271129 71159 0 0 starrocks_be
[Tue May 14 19:22:17 2024] Out of memory: Kill process 1696 (starrocks_be) score 952 or sacrifice child
[Tue May 14 19:22:17 2024] Killed process 1696 (starrocks_be) total-vm:161910104kB, anon-rss:129084516kB, file-rss:0kB, shmem-rss:0kB
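As a quick sanity check on the kernel log above, the sketch below converts the page counts into sizes and compares the free memory of the two Normal zones against their min watermarks. It assumes a 4 KiB page size (the x86_64 default); all figures are copied from the log, and the variable and function names are only illustrative.

```python
# Figures copied from the oom-killer output above.
PAGE_KIB = 4  # assumption: 4 KiB pages (x86_64 default)

total_ram_pages = 33525560   # "pages RAM"
reserved_pages = 588649      # "pages reserved"
be_rss_pages = 32271129      # rss column of the starrocks_be row
be_anon_rss_kb = 129084516   # anon-rss in the "Killed process" line


def pages_to_gib(pages: int) -> float:
    """Convert a page count to GiB under the assumed page size."""
    return pages * PAGE_KIB / (1024 * 1024)


# RSS of starrocks_be in kB, to compare with the anon-rss figure.
be_rss_kb = be_rss_pages * PAGE_KIB

# Share of non-reserved RAM held by starrocks_be when it was killed.
usable_pages = total_ram_pages - reserved_pages
be_share = be_rss_pages / usable_pages

# (free kB, min watermark kB) from the "Node N Normal" lines.
normal_zones = {
    "Node 0 Normal": (43368, 43744),
    "Node 1 Normal": (45032, 45180),
}
below_min = {
    zone: free_kb < min_kb for zone, (free_kb, min_kb) in normal_zones.items()
}

print(f"total RAM:         {pages_to_gib(total_ram_pages):.1f} GiB")
print(f"starrocks_be RSS:  {pages_to_gib(be_rss_pages):.1f} GiB")
print(f"share of usable:   {be_share:.1%}")
print(f"RSS matches anon-rss: {be_rss_kb == be_anon_rss_kb}")
for zone, is_below in below_min.items():
    print(f"{zone} below min watermark: {is_below}")
```

Under the 4 KiB assumption the RSS page count reproduces the logged anon-rss value exactly (129084516 kB, about 123.1 GiB), which is roughly 98% of the non-reserved memory on the machine. Both Normal zones are also below their min watermark, and no swap is configured.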
be.out contains only start time entries:
start time: Tue May 14 15:48:57 CST 2024
start time: Tue May 14 15:54:34 CST 2024
start time: Tue May 14 16:08:39 CST 2024
start time: Tue May 14 16:30:26 CST 2024
start time: Tue May 14 16:34:22 CST 2024
start time: Tue May 14 16:47:51 CST 2024
start time: Tue May 14 17:01:28 CST 2024
start time: Tue May 14 17:05:04 CST 2024
start time: Tue May 14 17:14:13 CST 2024
start time: Tue May 14 17:38:26 CST 2024
start time: Tue May 14 18:04:29 CST 2024
start time: Tue May 14 19:00:30 CST 2024
start time: Tue May 14 19:03:43 CST 2024
start time: Tue May 14 19:13:21 CST 2024
start time: Tue May 14 19:22:33 CST 2024
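To make the restart pattern easier to read, the sketch below computes the gaps between the start time entries listed above. The timezone token is ignored, and the comparison with the OOM kill assumes that the kernel log timestamp (19:22:17) is in the same timezone as be.out; the helper name `parse_start` is only illustrative.

```python
from datetime import datetime

MONTHS = {"May": 5}  # only the month that appears in be.out above

START_LINES = """\
start time: Tue May 14 15:48:57 CST 2024
start time: Tue May 14 15:54:34 CST 2024
start time: Tue May 14 16:08:39 CST 2024
start time: Tue May 14 16:30:26 CST 2024
start time: Tue May 14 16:34:22 CST 2024
start time: Tue May 14 16:47:51 CST 2024
start time: Tue May 14 17:01:28 CST 2024
start time: Tue May 14 17:05:04 CST 2024
start time: Tue May 14 17:14:13 CST 2024
start time: Tue May 14 17:38:26 CST 2024
start time: Tue May 14 18:04:29 CST 2024
start time: Tue May 14 19:00:30 CST 2024
start time: Tue May 14 19:03:43 CST 2024
start time: Tue May 14 19:13:21 CST 2024
start time: Tue May 14 19:22:33 CST 2024
""".splitlines()


def parse_start(line: str) -> datetime:
    """Parse one be.out line; the weekday and timezone tokens are dropped."""
    _, _, _dow, mon, day, clock, _tz, year = line.split()
    hour, minute, second = map(int, clock.split(":"))
    return datetime(int(year), MONTHS[mon], int(day), hour, minute, second)


starts = [parse_start(line) for line in START_LINES]

# Seconds between consecutive BE starts.
intervals = [int((b - a).total_seconds()) for a, b in zip(starts, starts[1:])]

# Timestamp of the oom-killer entry in the kernel log (assumed same timezone).
oom_kill = datetime(2024, 5, 14, 19, 22, 17)
restart_delay = int((starts[-1] - oom_kill).total_seconds())

print(f"BE starts: {len(starts)}")
print(f"shortest gap: {min(intervals)} s, longest gap: {max(intervals)} s")
print(f"last start is {restart_delay} s after the logged OOM kill")
```

The BE was started 15 times in a little over three and a half hours, with gaps ranging from about 3 minutes to about 56 minutes, and the last start follows the logged OOM kill by 16 seconds.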
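In the end I stopped all streaming-job writes and re-enabled them one at a time, and found that writes to one table were causing the problem. The write volume to that table is very small, though: a few dozen rows every 30 seconds.
CREATE TABLE statement: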
CREATE TABLE `ods_auto_open_app_auth_d` (
  `ModelCode` varchar(256) NOT NULL DEFAULT "",
  `ts` bigint(20) NULL,
  `msg` varchar(65533) NULL,
  `Type` varchar(20) NULL,
  `BrandCode` varchar(256) NULL,
  `DeviceID` varchar(256) NULL,
  `SignatureCore` varchar(65533) NULL,
  `SignFwk` varchar(65533) NULL,
  `AppID` varchar(256) NULL,
  `PkgName` varchar(256) NULL,
  `PkgVersion` varchar(64) NULL,
  `SignatureApp` varchar(65533) NULL,
  `AppSource` varchar(128) NULL,
  `OperatorName` varchar(100) NULL,
  `OS` varchar(64) NULL,
  `SupplierCode` varchar(64) NULL,
  `VehicleType` varchar(64) NULL
) ENGINE=OLAP
DUPLICATE KEY(`ModelCode`)
DISTRIBUTED BY HASH(`ModelCode`)
PROPERTIES (
"replication_num" = "3",
"in_memory" = "false",
"enable_persistent_index" = "false",
"replicated_storage" = "true",
"compression" = "LZ4"
);
No bucket count was specified when the table was created.
Table data distribution: