Routine load on the BE nodes is misbehaving: the jobs are in RUNNING state but are not consuming any data.

Errors in the log:
W0424 14:15:03.573925 622 lake_service.cpp:221] Fail to submit publish version task: Service unavailable: Thread pool is at capacity (16/16 tasks running, 2048/2048 tasks queued). tablet_id=33303 txn_ids=552206
W0424 14:15:03.573930 622 lake_service.cpp:221] Fail to submit publish version task: Service unavailable: Thread pool is at capacity (16/16 tasks running, 2048/2048 tasks queued). tablet_id=33304 txn_ids=552206
W0424 14:15:03.573935 622 lake_service.cpp:221] Fail to submit publish version task: Service unavailable: Thread pool is at capacity (16/16 tasks running, 2048/2048 tasks queued). tablet_id=33305 txn_ids=552206
W0424 14:15:03.573940 622 lake_service.cpp:221] Fail to submit publish version task: Service unavailable: Thread pool is at capacity (16/16 tasks running, 2048/2048 tasks queued). tablet_id=33306 txn_ids=552206
W0424 14:15:03.573945 622 lake_service.cpp:221] Fail to submit publish version task: Service unavailable: Thread pool is at capacity (16/16 tasks running, 2048/2048 tasks queued). tablet_id=33307 txn_ids=552206
W0424 14:15:03.573951 622 lake_service.cpp:221] Fail to submit publish version task: Service unavailable: Thread pool is at capacity (16/16 tasks running, 2048/2048 tasks queued). tablet_id=33309 txn_ids=552206
W0424 14:15:03.573956 622 lake_service.cpp:221] Fail to submit publish version task: Service unavailable: Thread pool is at capacity (16/16 tasks running, 2048/2048 tasks queued). tablet_id=33310 txn_ids=552206
W0424 14:15:03.573961 622 lake_service.cpp:221] Fail to submit publish version task: Service unavailable: Thread pool is at capacity (16/16 tasks running, 2048/2048 tasks queued). tablet_id=33312 txn_ids=552206
W0424 14:15:03.573966 622 lake_service.cpp:221] Fail to submit publish version task: Service unavailable: Thread pool is at capacity (16/16 tasks running, 2048/2048 tasks queued). tablet_id=33313 txn_ids=552206
W0424 14:15:03.573971 622 lake_service.cpp:221] Fail to submit publish version task: Service unavailable: Thread pool is at capacity (16/16 tasks running, 2048/2048 tasks queued). tablet_id=33315 txn_ids=552206
W0424 14:15:03.573978 622 lake_service.cpp:221] Fail to submit publish version task: Service unavailable: Thread pool is at capacity (16/16 tasks running, 2048/2048 tasks queued). tablet_id=33316 txn_ids=552206
W0424 14:15:03.573983 622 lake_service.cpp:221] Fail to submit publish version task: Service unavailable: Thread pool is at capacity (16/16 tasks running, 2048/2048 tasks queued). tablet_id=33317 txn_ids=552206
W0424 14:15:03.573988 622 lake_service.cpp:221] Fail to submit publish version task: Service unavailable: Thread pool is at capacity (16/16 tasks running, 2048/2048 tasks queued). tablet_id=33318 txn_ids=552206
W0424 14:15:03.573993 622 lake_service.cpp:221] Fail to submit publish version task: Service unavailable: Thread pool is at capacity (16/16 tasks running, 2048/2048 t
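The log lines above say that a bounded thread pool rejected new publish-version tasks because all 16 workers were busy and all 2048 queue slots were taken. As an illustration only (not StarRocks' actual implementation; `BoundedThreadPool` and `PoolFullError` are hypothetical names), this Python sketch models that admission rule: a task starts immediately if a worker is free, is queued if the queue has room, and is rejected otherwise.

```python
import collections
import threading


class PoolFullError(Exception):
    """Hypothetical error standing in for 'Service unavailable: Thread pool is at capacity'."""


class BoundedThreadPool:
    """Illustrative pool: at most max_threads running tasks plus max_queue waiting tasks."""

    def __init__(self, max_threads, max_queue):
        self.max_threads = max_threads
        self.max_queue = max_queue
        self._lock = threading.Lock()
        self._running = 0                   # number of busy worker threads
        self._queue = collections.deque()   # tasks waiting for a free worker

    def submit(self, fn):
        with self._lock:
            if self._running < self.max_threads:
                # A worker slot is free: start a thread for this task right away.
                self._running += 1
                threading.Thread(target=self._worker, args=(fn,), daemon=True).start()
            elif len(self._queue) < self.max_queue:
                # All workers are busy but the queue has room: park the task.
                self._queue.append(fn)
            else:
                # Workers and queue are both full: reject instead of blocking.
                raise PoolFullError(
                    f"Thread pool is at capacity ({self._running}/{self.max_threads} "
                    f"tasks running, {len(self._queue)}/{self.max_queue} tasks queued)"
                )

    def _worker(self, fn):
        while fn is not None:
            try:
                fn()
            except Exception:
                pass  # a real pool would log the failure here
            with self._lock:
                # Keep draining the queue; exit (and free the slot) once it is empty.
                if self._queue:
                    fn = self._queue.popleft()
                else:
                    fn = None
                    self._running -= 1


# Usage: 2 workers + 3 queue slots; every task blocks until `gate` is set.
gate = threading.Event()
pool = BoundedThreadPool(max_threads=2, max_queue=3)
for _ in range(5):
    pool.submit(gate.wait)      # 2 start running, 3 are queued
try:
    pool.submit(gate.wait)      # 6th task: no free worker and no queue slot
    rejected = None
except PoolFullError as err:
    rejected = str(err)
gate.set()                      # let the blocked tasks finish
print(rejected)
```

Under this model, once the queue is full every further submission fails at once rather than waiting, which would be consistent with the burst of rejections for consecutive tablet_ids of the same txn in the log.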

StarRocks version: 3.2.4, shared-data (storage-compute separation) deployment, 3 FE + 5 CN.

Hello, could you send us the be.INFO log?

cn.INFO (39 MB)

The log has been uploaded.

Got the log; we're looking into it.

I0424 16:51:52.380627 46412 data_consumer.h:102] kafka log-3-FAIL, event: [thrd:GroupCoordinator]: GroupCoordinator: SASL authentication error: SaslAuthenticateRequest failed: Local: Broker transport failure (after 0ms in state DOWN)
There is a SASL authentication error here. Please check it.

Do you have a lot of load jobs? The load frequency looks quite high.

That part should be fine: the jobs had been running all along, authentication worked before this, and the Kafka service is healthy.

We have 500+ jobs.

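That afternoon I was scaling out the CN nodes, and at the same time a colleague was running Broker Load and other operations; the failure started after that. I'm not sure how these events are related.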

After restarting the jobs today, they ran normally at first, but now the same error as yesterday is back.

Got it, we'll keep investigating.

Are the newly added CN nodes OK in terms of permissions, network, and authentication?

500+ jobs is indeed quite a lot. Are they all routine load jobs?

No problems there.

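We've now cut it down to 200. Before, the error only appeared after the jobs had been running for a while; now the jobs can't run at all. Restarting the CN and FE nodes didn't help, and the jobs still won't run.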

The jobs are stopped, but the log is still flooded with messages like this:
Fail to submit publish version task: Timeout: acquire semaphore reached deadline=1714096744719. tablet_id=43274 txn_ids=683243
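The later error changes from an immediate rejection to a timeout while acquiring a semaphore. As a hedged sketch (not StarRocks code; `acquire_before_deadline`, `deadline_to_utc`, and the single-permit setup are hypothetical), this Python models a caller that waits for one of a limited number of permits until an absolute deadline and reports a timeout otherwise. It also assumes that the `deadline=` value is a Unix timestamp in milliseconds; under that assumption it corresponds to 2024-04-26 01:59:04.719 UTC.

```python
import threading
import time
from datetime import datetime, timedelta, timezone


def deadline_to_utc(deadline_ms):
    """Interpret a deadline as Unix epoch milliseconds (assumption) and return a UTC datetime."""
    seconds, millis = divmod(deadline_ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


def acquire_before_deadline(sem, deadline_ms):
    """Wait for a permit until an absolute deadline in epoch ms (hypothetical helper)."""
    remaining = max(deadline_ms / 1000 - time.time(), 0)
    if sem.acquire(timeout=remaining):
        return "OK"
    return f"Timeout: acquire semaphore reached deadline={deadline_ms}"


# The deadline from the log, read as epoch milliseconds.
print(deadline_to_utc(1714096744719))  # 2024-04-26 01:59:04.719000+00:00

# One permit: the first caller gets it; the second waits about 50 ms and times out.
permits = threading.BoundedSemaphore(1)
now_ms = int(time.time() * 1000)
first = acquire_before_deadline(permits, now_ms + 1000)
second_deadline = now_ms + 50
second = acquire_before_deadline(permits, second_deadline)
permits.release()  # return the permit held by the first caller
print(first, second)
```

In this model a timeout only means that no permit was released before the deadline; it says nothing about why the permit holders are slow.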