3.1.6: A hung BE node causes query and load failures

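To help us locate your problem faster, please provide the following information. Thank you.
【Details】Detailed problem description
One BE node hung, which caused both queries and loads to fail.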

Date: February 18

After 18:33, metrics reporting from the node started to fail.

After 18:40, many WARNING logs like the following started to appear:
pipeline_driver_poller.cpp:84] [Driver] Timeout

tablet_sink.cpp:1201] NodeChannel[31263229], tablet open failed, load_id=0e180187-ce4a-11ee-9a58-525400b41863, txn_id: 1173056133, parallel=1, compress_type=2, node=, errmsg=[E1008]Reached timeout=30000ms @
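
To see how many loads hit this timeout and which transactions were involved, the `tablet_sink.cpp` warning can be parsed with a regular expression. The following is a minimal Python sketch, assuming other occurrences follow the same layout as the sample line above. The function name `parse_open_failure` is hypothetical and not part of StarRocks.

```python
import re

# Sample line copied from the BE WARNING log quoted above.
SAMPLE = (
    "tablet_sink.cpp:1201] NodeChannel[31263229], tablet open failed, "
    "load_id=0e180187-ce4a-11ee-9a58-525400b41863, txn_id: 1173056133, "
    "parallel=1, compress_type=2, node=, errmsg=[E1008]Reached timeout=30000ms @"
)

# Assumption: the field order and separators match the sample line.
OPEN_FAILED_RE = re.compile(
    r"load_id=(?P<load_id>[0-9a-f-]+), txn_id: (?P<txn_id>\d+).*?"
    r"errmsg=\[(?P<code>E\d+)\]Reached timeout=(?P<timeout_ms>\d+)ms"
)


def parse_open_failure(line):
    """Return the load id, txn id, error code and timeout, or None if no match."""
    match = OPEN_FAILED_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["txn_id"] = int(fields["txn_id"])
    fields["timeout_ms"] = int(fields["timeout_ms"])
    return fields


if __name__ == "__main__":
    print(parse_open_failure(SAMPLE))
```

Running this over the whole log and grouping by `txn_id` would show whether the timeouts are spread across many loads or concentrated in a few.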

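Around 20:15 I restarted the BE node and it recovered. I did not capture a pstack, so there is no record of what the process was doing at the time.

The FE and BE logs have been backed up.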

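【Background】What operations were performed?

【Business impact】Queries and loads were affected
【Shared-data (storage-compute separation)?】No
【StarRocks version】e.g. 3.1.6
【Cluster size】e.g. 1 FE + 7 BE
【Machine info】CPU vCores/memory/NIC, e.g. 32C/128G/10GbE
【Contact】Community group 3, 杨荣
【Attachments】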

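The error logs are similar to the ones in this post. Version 3.1 enables resource groups by default, and I am not sure whether the problem lies there.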

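Before the manual restart, the node was shown as alive the whole time, right? Could you send this node's be.out log separately so I can double-check? We are short on information at the moment. If it happens again, please capture a pstack so we can confirm where the process is hanging.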
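
For the next occurrence, a stack dump can be captured before restarting. The following is a minimal Python sketch, assuming the `pstack` utility is installed on the BE host and the BE process ID is already known. The function name `dump_stacks` and the output file naming are hypothetical. The `runner` parameter exists only so the function can be exercised without a live process.

```python
import subprocess
import sys
import time
from pathlib import Path


def dump_stacks(pid, out_dir=".", runner=subprocess.run):
    """Run `pstack <pid>` and save its output to a timestamped file."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    out_path = Path(out_dir) / f"pstack-{pid}-{stamp}.txt"
    # pstack attaches to the process through gdb, so it usually has to run
    # as the user that owns the BE process, or as root.
    result = runner(
        ["pstack", str(pid)], capture_output=True, text=True, timeout=120
    )
    out_path.write_text(result.stdout)
    return out_path


if __name__ == "__main__":
    # Usage: python dump_stacks.py <be_pid>
    print(dump_stacks(int(sys.argv[1])))
```

Taking two or three dumps a few seconds apart makes it easier to tell a thread that is stuck from one that is merely busy.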

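The process never went down before the restart. The be.out content is below. The logs from this environment have already been sent to trueeyu, so if convenient you could look at them together with him.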

start time: Fri Dec 15 23:24:12 CST 2023
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/StarRocks/be/lib/jni-packages/starrocks-jdbc-bridge-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/StarRocks/be/lib/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
start time: Thu Dec 21 21:06:48 CST 2023
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/StarRocks/be/lib/jni-packages/starrocks-jdbc-bridge-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/StarRocks/be/lib/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
start time: Mon Feb 5 12:56:48 CST 2024
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/StarRocks/be/lib/jni-packages/starrocks-jdbc-bridge-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/StarRocks/be/lib/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
start time: Sun Feb 18 20:14:41 CST 2024
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/StarRocks/be/lib/jni-packages/starrocks-jdbc-bridge-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/StarRocks/be/lib/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
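
The `start time:` lines in be.out give the restart history of the BE process, which shows how long the node had been running before the hang. The following is a minimal Python sketch for extracting them, assuming every start line follows the format shown above. The function name `start_times` is hypothetical, and the time zone abbreviation is dropped rather than interpreted.

```python
import re
from datetime import datetime

# Matches lines such as "start time: Sun Feb 18 20:14:41 CST 2024".
START_RE = re.compile(
    r"^start time: (\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \w+ (\d{4})$"
)


def start_times(text):
    """Return the BE start timestamps found in be.out, in file order."""
    times = []
    for line in text.splitlines():
        match = START_RE.match(line.strip())
        if match:
            # Rejoin date and year without the time zone token, because
            # strptime's %Z only accepts a few zone names reliably.
            stamp = f"{match.group(1)} {match.group(2)}"
            times.append(datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"))
    return times


if __name__ == "__main__":
    with open("be.out") as handle:
        starts = start_times(handle.read())
    # Print each start time and the gap since the previous start.
    for previous, current in zip([None] + starts[:-1], starts):
        gap = "" if previous is None else f"  (+{current - previous})"
        print(f"{current}{gap}")
```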