BE hangs at 2 a.m. every day, leaving frontend queries unable to fetch data for about one minute

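【Details】Detailed description of the problem
The failure occurs at 2 a.m. every day: the BE hangs and frontend queries are unavailable for about one minute. Is there a scheduled task running at that time?
【Background】What operations were performed?
The service runs normally and responds quickly at all other times.
【Business impact】
【StarRocks version】e.g. 2.3.0
【Cluster size】e.g. 3 FE (3 followers) + 3 BE (FE and BE co-located)
【Machine info】CPU virtual cores/memory/NIC, e.g. 16C/64G/Gigabit
【Attachments】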

The BE logs show nothing abnormal during that period; the FE leader has the following log entries:
2022-12-17 02:00:59,756 INFO (thrift-server-pool-2270|69541) [ThriftServerEventProcessor.createContext():98] create thrift context. client: TNetworkAddress(hostname:10.111.13.204, port:48550)
2022-12-17 02:00:59,756 INFO (thrift-server-pool-2270|69541) [FrontendServiceImpl.loadTxnBegin():763] receive txn begin request, db: dealmoon, tbl: sp_subject_user_impression_stats, label: 72514817-d26c-4359-912d-81b5904a8b43, backend: 10.111.13.204
2022-12-17 02:00:59,814 WARN (thrift-server-pool-1523|49031) [Database.tryReadLock():144] database lock is held by: dump thread: PUBLISH_VERSION, id: 29
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
com.starrocks.transaction.DatabaseTransactionMgr.writeLock(DatabaseTransactionMgr.java:153)
com.starrocks.transaction.DatabaseTransactionMgr.finishTransaction(DatabaseTransactionMgr.java:894)
com.starrocks.transaction.GlobalTransactionMgr.finishTransaction(GlobalTransactionMgr.java:418)
com.starrocks.transaction.PublishVersionDaemon.publishVersion(PublishVersionDaemon.java:122)
com.starrocks.transaction.PublishVersionDaemon.runAfterCatalogReady(PublishVersionDaemon.java:52)
com.starrocks.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:61)
com.starrocks.common.util.Daemon.run(Daemon.java:115)

2022-12-17 02:00:59,814 WARN (thrift-server-pool-2331|69968) [Database.tryReadLock():144] database lock is held by: dump thread: PUBLISH_VERSION, id: 29
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
com.starrocks.transaction.DatabaseTransactionMgr.writeLock(DatabaseTransactionMgr.java:153)
com.starrocks.transaction.DatabaseTransactionMgr.finishTransaction(DatabaseTransactionMgr.java:894)
com.starrocks.transaction.GlobalTransactionMgr.finishTransaction(GlobalTransactionMgr.java:418)
com.starrocks.transaction.PublishVersionDaemon.publishVersion(PublishVersionDaemon.java:122)
com.starrocks.transaction.PublishVersionDaemon.runAfterCatalogReady(PublishVersionDaemon.java:52)
com.starrocks.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:61)
com.starrocks.common.util.Daemon.run(Daemon.java:115)

2022-12-17 02:00:59,814 WARN (thrift-server-pool-1523|49031) [FrontendServiceImpl.streamLoadPut():1021] failed to get stream load plan: get database read lock timeout, database=default_cluster:dealmoon
2022-12-17 02:00:59,814 WARN (thrift-server-pool-1037|27591) [Database.tryReadLock():144] database lock is held by: dump thread: PUBLISH_VERSION, id: 29
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
com.starrocks.transaction.DatabaseTransactionMgr.writeLock(DatabaseTransactionMgr.java:153)
com.starrocks.transaction.DatabaseTransactionMgr.finishTransaction(DatabaseTransactionMgr.java:894)
com.starrocks.transaction.GlobalTransactionMgr.finishTransaction(GlobalTransactionMgr.java:418)
com.starrocks.transaction.PublishVersionDaemon.publishVersion(PublishVersionDaemon.java:122)
com.starrocks.transaction.PublishVersionDaemon.runAfterCatalogReady(PublishVersionDaemon.java:52)
com.starrocks.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:61)
com.starrocks.common.util.Daemon.run(Daemon.java:115)
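
To see whether these lock warnings really cluster around 02:00, the FE log can be tallied per minute. Below is a minimal sketch, assuming the WARN lines keep exactly the format shown above; the function name `summarize_lock_waits` and the regular expression are illustrative and not part of StarRocks.

```python
import re
import sys
from collections import Counter

# Matches WARN lines in the format seen in the FE log above, e.g.
# "2022-12-17 02:00:59,814 WARN (thrift-server-pool-1523|49031)
#  [Database.tryReadLock():144] database lock is held by: dump thread: PUBLISH_VERSION, id: 29"
# Assumption: other FE versions may format this line differently.
LOCK_WARN = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3} WARN "
    r"\((?P<waiter>[^|]+)\|\d+\) \[Database\.tryReadLock\(\):\d+\] "
    r"database lock is held by: dump thread: (?P<holder>[^,]+), id: \d+"
)


def summarize_lock_waits(lines):
    """Count read-lock timeout warnings per (minute, lock-holder thread)."""
    counts = Counter()
    for line in lines:
        match = LOCK_WARN.match(line)
        if match:
            counts[(match.group("minute"), match.group("holder"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python lock_waits.py /path/to/fe.log  (the script name is hypothetical)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for (minute, holder), n in sorted(summarize_lock_waits(log_file).items()):
            print(minute, holder, n)
```

If nearly all hits fall in the 02:00 minute with the same holder thread, that supports the idea of something periodic, rather than random load, holding the database lock.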

  • Slow query:
    • Profile information
    • Parallelism: show variables like '%parallel_fragment_exec_instance_num%';
    • Whether pipeline is enabled: show variables like '%pipeline%';
    • Screenshots of BE node CPU and memory usage
  • Query error:
  • BE crash
    • be.out

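You could check whether the machines were under heavy I/O pressure during that period, and whether an ANALYZE job (statistics collection) was running at 2 a.m.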

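After the user restarted the service, the problem did not recur. The logs and monitoring data from that time are no longer available, so we will analyze it further the next time it occurs.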