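【Details】Detailed description of the problem
Creating a broker load job brought down the cluster's FE and BE nodes outright, leaving every service unavailable.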
【Background】What operations were performed?
Created a broker load job, after which the cluster's FE and BE nodes went down.
【Business impact】
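【StarRocks version】e.g. 3.0.0, shared-data (storage-compute separated) edition
【Cluster size】e.g. 3 FE (1 follower + 2 observers) + 6 BE (FE and BE deployed separately)
【Machine info】CPU virtual cores/memory/NIC, e.g. 16C/64G/10 GbE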
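【Contact】So that we can reach you promptly for log information while resolving the issue, please add your contact details, e.g. community group 4 - Xiao Li, or an email address. Thank you.
Community group 13 - Sltily.w fantasticmao@gmail.com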
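【Attachments】
- fe.log/be.INFO/relevant screenshots
- Slow queries:
- Profile information: obtain the Profile, then use it to analyze the query bottleneck
- Parallelism: show variables like '%parallel_fragment_exec_instance_num%';
- Whether pipeline is enabled: show variables like '%pipeline%';
- Screenshots of BE node CPU and memory usage
- Query errors:
- query_dump: how to obtain the query_dump file
- BE crash
- be.out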
2023-06-01 06:23:44,531 INFO (leaderCheckpointer|189) [EditLog.loadJournal():202] Begin to unprotect create table. db = flow table = 493354
2023-06-01 06:23:44,533 WARN (leaderCheckpointer|189) [GlobalStateMgr.replayJournalInner():2012] catch exception when replaying 201672,
com.starrocks.journal.JournalInconsistentException: failed to load journal type 118
at com.starrocks.persist.EditLog.loadJournal(EditLog.java:981) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.replayJournalInner(GlobalStateMgr.java:2001) [starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.replayJournal(GlobalStateMgr.java:1953) [starrocks-fe.jar:?]
at com.starrocks.leader.Checkpoint.replayAndGenerateGlobalStateMgrImage(Checkpoint.java:215) [starrocks-fe.jar:?]
at com.starrocks.leader.Checkpoint.runAfterCatalogReady(Checkpoint.java:106) [starrocks-fe.jar:?]
at com.starrocks.common.util.LeaderDaemon.runOneCycle(LeaderDaemon.java:73) [starrocks-fe.jar:?]
at com.starrocks.common.util.Daemon.run(Daemon.java:115) [starrocks-fe.jar:?]
Caused by: java.lang.NullPointerException
at com.starrocks.lake.StarOSAgent.getServiceId(StarOSAgent.java:101) ~[starrocks-fe.jar:?]
at com.starrocks.lake.StarOSAgent.prepare(StarOSAgent.java:94) ~[starrocks-fe.jar:?]
at com.starrocks.lake.StarOSAgent.getShardReplicas(StarOSAgent.java:393) ~[starrocks-fe.jar:?]
at com.starrocks.lake.StarOSAgent.getBackendIdsByShard(StarOSAgent.java:444) ~[starrocks-fe.jar:?]
at com.starrocks.lake.LakeTablet.getBackendIds(LakeTablet.java:88) ~[starrocks-fe.jar:?]
at com.starrocks.server.LocalMetastore.truncateTableInternal(LocalMetastore.java:4833) ~[starrocks-fe.jar:?]
at com.starrocks.server.LocalMetastore.replayTruncateTable(LocalMetastore.java:4862) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.replayTruncateTable(GlobalStateMgr.java:3520) ~[starrocks-fe.jar:?]
at com.starrocks.persist.EditLog.loadJournal(EditLog.java:574) ~[starrocks-fe.jar:?]
... 6 more
2023-06-01 06:23:44,533 WARN (leaderCheckpointer|189) [GlobalStateMgr.replayJournal():1955] got interrupt exception or inconsistent exception when replay journal 201672, will exit,
com.starrocks.journal.JournalInconsistentException: failed to load journal type 118
at com.starrocks.persist.EditLog.loadJournal(EditLog.java:981) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.replayJournalInner(GlobalStateMgr.java:2001) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.replayJournal(GlobalStateMgr.java:1953) [starrocks-fe.jar:?]
at com.starrocks.leader.Checkpoint.replayAndGenerateGlobalStateMgrImage(Checkpoint.java:215) [starrocks-fe.jar:?]
at com.starrocks.leader.Checkpoint.runAfterCatalogReady(Checkpoint.java:106) [starrocks-fe.jar:?]
at com.starrocks.common.util.LeaderDaemon.runOneCycle(LeaderDaemon.java:73) [starrocks-fe.jar:?]
at com.starrocks.common.util.Daemon.run(Daemon.java:115) [starrocks-fe.jar:?]
Caused by: java.lang.NullPointerException
at com.starrocks.lake.StarOSAgent.getServiceId(StarOSAgent.java:101) ~[starrocks-fe.jar:?]
at com.starrocks.lake.StarOSAgent.prepare(StarOSAgent.java:94) ~[starrocks-fe.jar:?]
at com.starrocks.lake.StarOSAgent.getShardReplicas(StarOSAgent.java:393) ~[starrocks-fe.jar:?]
at com.starrocks.lake.StarOSAgent.getBackendIdsByShard(StarOSAgent.java:444) ~[starrocks-fe.jar:?]
at com.starrocks.lake.LakeTablet.getBackendIds(LakeTablet.java:88) ~[starrocks-fe.jar:?]
at com.starrocks.server.LocalMetastore.truncateTableInternal(LocalMetastore.java:4833) ~[starrocks-fe.jar:?]
at com.starrocks.server.LocalMetastore.replayTruncateTable(LocalMetastore.java:4862) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.replayTruncateTable(GlobalStateMgr.java:3520) ~[starrocks-fe.jar:?]
at com.starrocks.persist.EditLog.loadJournal(EditLog.java:574) ~[starrocks-fe.jar:?]
... 6 more
2023-06-01 06:23:44,535 INFO (Thread-52|114) [StarRocksFE.lambda$addShutdownHook$1():368] start to execute shutdown hook
2023-06-01 06:23:44,553 INFO (Thread-52|114) [StarRocksFE.lambda$addShutdownHook$1():393] shutdown hook end