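【Details】Detailed problem description
【Background】What operations were performed?
【Business impact】
【StarRocks version】2.3
【Cluster size】e.g. 3fe+5be
【Machine info】48C/256G/10 GbE
With roughly 200 million+ rows, importing from table B into table A causes a BE node to go down; it then recovers by itself. This happened in every one of several test runs.
insert into t_A select * from t_B ;
Can this be solved by tuning parameters?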
Please post the be.out log.
I suspect the memory limit was exceeded. Consider adjusting parallel_fragment_exec_instance_num and load_mem_limit.
Also, are any other services deployed on this machine?
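No other services are deployed, and be.out contains no error logs.
Could this be a false alarm from monitoring?
I set SET load_mem_limit = 21474836480; and the result is still the same.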
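For reference, 21474836480 bytes is 20 GiB. Below is a minimal shell sketch of that conversion, which also prints the two session statements for the variables suggested earlier in the thread. The helper names `gib_to_bytes` and `print_session_settings` are hypothetical, and the value 1 for parallel_fragment_exec_instance_num is only an illustrative assumption, not a recommended setting.

```shell
# Convert a size in GiB to bytes using shell integer arithmetic.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print the session statements for a given memory limit (GiB) and
# parallelism. Both helper names are hypothetical; the parallelism
# value passed below is an illustrative assumption.
print_session_settings() {
    echo "SET load_mem_limit = $(gib_to_bytes "$1");"
    echo "SET parallel_fragment_exec_instance_num = $2;"
}

print_session_settings 20 1
```

The printed statements would then be run in the same session as the insert, since session variables set with SET apply only to the current connection.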
dmesg -T | tail -n 500
Please run this command.
[root@cdh85-138 ~]# /bin/dmesg -T | tail -n 50
[四 7月 14 15:16:18 2022] IPv6: ADDRCONF(NETDEV_UP): p5p1: link is not ready
[四 7月 14 15:16:18 2022] IPv6: ADDRCONF(NETDEV_UP): p5p1: link is not ready
[四 7月 14 15:16:18 2022] [qede_link_update:2051(p5p1)]Link is up
[四 7月 14 15:16:18 2022] IPv6: ADDRCONF(NETDEV_CHANGE): p5p1: link becomes ready
[四 7月 14 15:16:18 2022] IPv6: ADDRCONF(NETDEV_UP): p5p2: link is not ready
[四 7月 14 15:16:18 2022] IPv6: ADDRCONF(NETDEV_UP): p5p2: link is not ready
[四 7月 14 15:16:18 2022] [qede_link_update:2057(p5p1)]Link is down
[四 7月 14 15:16:18 2022] [qede_link_update:2051(p5p1)]Link is up
[四 7月 14 15:16:18 2022] Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
[四 7月 14 15:16:18 2022] [qede_link_update:2051(p5p2)]Link is up
[四 7月 14 15:16:18 2022] IPv6: ADDRCONF(NETDEV_CHANGE): p5p2: link becomes ready
[四 7月 14 15:16:18 2022] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
[四 7月 14 15:16:18 2022] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
[四 7月 14 15:16:18 2022] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
[四 7月 14 15:16:18 2022] bond0: Setting MII monitoring interval to 100
[四 7月 14 15:16:18 2022] bond0: Setting up delay to 0
[四 7月 14 15:16:18 2022] bond0: Setting down delay to 0
[四 7月 14 15:16:18 2022] bond0: Setting ad_actor_sys_prio to 65535
[四 7月 14 15:16:18 2022] bond0: Setting ad_select to stable (0)
[四 7月 14 15:16:18 2022] bond0: Setting ad_user_port_key to 0
[四 7月 14 15:16:18 2022] bond0: Setting arp_all_targets to any (0)
[四 7月 14 15:16:18 2022] bond0: Setting fail_over_mac to none (0)
[四 7月 14 15:16:18 2022] bond0: Setting LACP rate to slow (0)
[四 7月 14 15:16:18 2022] bond0: Setting min links value to 0
[四 7月 14 15:16:18 2022] bond0: Setting primary_reselect to always (0)
[四 7月 14 15:16:18 2022] bond0: Setting resend_igmp to 1
[四 7月 14 15:16:18 2022] bond0: Setting use_carrier to 1
[四 7月 14 15:16:18 2022] bond0: Setting xmit hash policy to layer2+3 (2)
[四 7月 14 15:16:18 2022] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
[四 7月 14 15:16:18 2022] bond0: Enslaving p5p1 as a backup interface with a down link
[四 7月 14 15:16:18 2022] [qede_link_update:2051(p5p1)]Link is up
[四 7月 14 15:16:18 2022] bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
[四 7月 14 15:16:18 2022] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[四 7月 14 15:16:18 2022] bond0: Enslaving p5p2 as a backup interface with a down link
[四 7月 14 15:16:18 2022] bond0: link status definitely up for interface p5p1, 10000 Mbps full duplex
[四 7月 14 15:16:18 2022] bond0: first active interface up!
[四 7月 14 15:16:18 2022] [qede_link_update:2057(p5p1)]Link is down
[四 7月 14 15:16:18 2022] [qede_link_update:2051(p5p1)]Link is up
[四 7月 14 15:16:18 2022] [qede_link_update:2051(p5p2)]Link is up
[四 7月 14 15:16:18 2022] bond0: link status definitely up for interface p5p2, 10000 Mbps full duplex
[四 7月 14 15:16:28 2022] warning: `clickhouse-serv' uses 32-bit capabilities (legacy support in use)
[四 7月 14 15:16:33 2022] sctp: Hash tables configured (bind 4096/4096)
[四 7月 14 15:16:34 2022] usb 1-1.4: USB disconnect, device number 3
[四 7月 14 15:16:34 2022] usb 2-1.2: USB disconnect, device number 3
[四 7月 14 15:16:34 2022] hid-generic 0003:413C:2113.0002: usb_submit_urb(ctrl) failed: -19
[四 7月 14 16:56:06 2022] perf: interrupt took too long (2513 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[四 7月 14 17:32:02 2022] perf: interrupt took too long (3145 > 3141), lowering kernel.perf_event_max_sample_rate to 63000
[四 7月 14 18:16:57 2022] perf: interrupt took too long (3942 > 3931), lowering kernel.perf_event_max_sample_rate to 50000
[四 7月 14 20:34:21 2022] perf: interrupt took too long (4932 > 4927), lowering kernel.perf_event_max_sample_rate to 40000
[一 7月 18 16:41:42 2022] perf: interrupt took too long (6175 > 6165), lowering kernel.perf_event_max_sample_rate to 32000
Try this instead: dmesg -T | grep starrocks_be
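If the BE process was killed by the kernel for using too much memory, the kernel log usually contains lines mentioning the OOM killer, which may or may not include the process name on the same line. A minimal sketch of a slightly broader filter follows; the function name `filter_oom_lines` is hypothetical, and the patterns are an assumption about typical Linux kernel OOM messages rather than an exhaustive list.

```shell
# Keep only kernel log lines that mention the OOM killer or the BE process.
# Reads log text on stdin so it also works on saved dmesg output.
# The function name is hypothetical; the patterns are assumed typical
# kernel OOM messages, not a complete list.
filter_oom_lines() {
    grep -Ei 'out of memory|oom-killer|oom-kill|starrocks_be'
}

# Usage on a live host (may need root to read the kernel ring buffer):
#   dmesg -T | filter_oom_lines
```

No matching lines around the time of the BE restart would suggest the process was not killed by the kernel OOM killer, and the cause should be looked for elsewhere.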