FE takes a very long time to start up and recover

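【Details】FE startup takes a very long time.
【Background】After all 3 FEs went down and failed to restart, I tried starting one of them as a single-node cluster using metadata_failure_recovery. jps showed the process immediately, but about 2 days later the FE ports were still not in use, so I stopped it and started it again. About 4 days after that, the FE ports showed up as in use (meaning the FE services were all up). Reads and writes are all normal, but the log is being flooded with leaderCheckpointer entries; as of now these entries have been printing for 6 days.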
【Business impact】
【StarRocks version】2.4.2
【Cluster size】3 FE + 3 BE (FE and BE co-located)
【Machine info】FE memory was changed to 20 GB for the restart
【Contact】Community group 8 - 空格
【Other】
Usage pattern: every 3 minutes, a single thread runs INSERT OVERWRITE against 1,000 tables.
During startup, I saw the FE replaying the INSERT OVERWRITE jobs that existed before the FEs went down.

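Right now I don't dare remove the metadata_failure_recovery parameter from the config file and restart, because I'm worried a single startup will again take several days. Current fe.log content: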
2023-06-25 10:10:46,806 WARN (leaderCheckpointer|456) [InsertOverwriteJobRunner.replayStateChange():137] invalid job info. current state:OVERWRITE_RUNNING, from state:OVERWRITE_FAILED

2023-06-25 10:10:46,806 INFO (leaderCheckpointer|456) [InsertOverwriteJobRunner.replayStateChange():135] replay state change:InsertOverwriteStateChangeInfo{jobId=113212222, fromState=OVERWRITE_FAILED, toState=OVERWRITE_FAILED, sourcePartitionIds=[108308418], tmpPartitionIds=[113212223]}
2023-06-25 10:10:46,806 WARN (leaderCheckpointer|456) [InsertOverwriteJobRunner.replayStateChange():137] invalid job info. current state:OVERWRITE_RUNNING, from state:OVERWRITE_FAILED
2023-06-25 10:10:46,806 INFO (leaderCheckpointer|456) [InsertOverwriteJobRunner.replayStateChange():135] replay state change:InsertOverwriteStateChangeInfo{jobId=113212225, fromState=OVERWRITE_FAILED, toState=OVERWRITE_FAILED, sourcePartitionIds=[108308546], tmpPartitionIds=[113212226]}
2023-06-25 10:10:46,806 WARN (leaderCheckpointer|456) [InsertOverwriteJobRunner.replayStateChange():137] invalid job info. current state:OVERWRITE_RUNNING, from state:OVERWRITE_FAILED
2023-06-25 10:10:46,806 INFO (leaderCheckpointer|456) [InsertOverwriteJobRunner.replayStateChange():135] replay state change:InsertOverwriteStateChangeInfo{jobId=113212228, fromState=OVERWRITE_FAILED, toState=OVERWRITE_FAILED, sourcePartitionIds=[108308461], tmpPartitionIds=[113212229]}
2023-06-25 10:10:46,806 WARN (leaderCheckpointer|456) [InsertOverwriteJobRunner.replayStateChange():137] invalid job info. current state:OVERWRITE_RUNNING, from state:OVERWRITE_FAILED
2023-06-25 10:10:46,806 INFO (leaderCheckpointer|456) [InsertOverwriteJobRunner.replayStateChange():135] replay state change:InsertOverwriteStateChangeInfo{jobId=113212231, fromState=OVERWRITE_FAILED, toState=OVERWRITE_FAILED, sourcePartitionIds=[108308503], tmpPartitionIds=[113212232]}
2023-06-25 10:10:46,806 WARN (leaderCheckpointer|456) [InsertOverwriteJobRunner.replayStateChange():137] invalid job info. current state:OVERWRITE_RUNNING, from state:OVERWRITE_FAILED
2023-06-25 10:10:46,806 INFO (leaderCheckpointer|456) [InsertOverwriteJobRunner.replayStateChange():135] replay state change:InsertOverwriteStateChangeInfo{jobId=113212234, fromState=OVERWRITE_FAILED, toState=OVERWRITE_FAILED, sourcePartitionIds=[108308674], tmpPartitionIds=[113212235]}
2023-06-25 10:10:46,806 WARN (leaderCheckpointer|456) [InsertOverwriteJobRunner.replayStateChange():137] invalid job info. current state:OVERWRITE_RUNNING, from state:OVERWRITE_FAILED
2023-06-25 10:10:46,806 INFO (leaderCheckpointer|456) [InsertOverwriteJobRunner.replayStateChange():135] replay state change:InsertOverwriteStateChangeInfo{jobId=113212237, fromState=OVERWRITE_FAILED, toState=OVERWRITE_FAILED, sourcePartitionIds=[108308632], tmpPartitionIds=[113212238]}
2023-06-25 10:10:46,806 WARN (leaderCheckpointer|456) [InsertOverwriteJobRunner.replayStateChange():137] invalid job info. current state:OVERWRITE_RUNNING, from state:OVERWRITE_FAILED
2023-06-25 10:10:46,806 INFO (leaderCheckpointer|456) [InsertOverwriteJobRunner.replayStateChange():135] replay state change:InsertOverwriteStateChangeInfo{jobId=113212240, fromState=OVERWRITE_FAILED, toState=OVERWRITE_FAILED, sourcePartitionIds=[108308589], tmpPartitionIds=[113212241]}
2023-06-25 10:10:46,806 WARN (leaderCheckpointer|456) [InsertOverwriteJobRunner.replayStateChange():137] invalid job info. current state:OVERWRITE_RUNNING, from state:OVERWRITE_FAILED
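
To get a sense of how many distinct INSERT OVERWRITE jobs the checkpoint thread is replaying, a rough Python sketch like the one below could be used to summarize fe.log. It only assumes the two line formats visible in the excerpt above; the function name `summarize` and the `SAMPLE` list are made up for illustration and are not part of StarRocks.

```python
import re
from collections import Counter

# Matches the "replay state change" INFO lines shown in the log excerpt.
REPLAY_RE = re.compile(
    r"replay state change:InsertOverwriteStateChangeInfo\{jobId=(\d+), "
    r"fromState=(\w+), toState=(\w+)"
)
# Matches the "invalid job info" WARN lines shown in the log excerpt.
INVALID_RE = re.compile(
    r"invalid job info\. current state:(\w+), from state:(\w+)"
)


def summarize(lines):
    """Count distinct replayed job IDs, state transitions, and invalid-state warnings."""
    job_ids = set()
    transitions = Counter()
    invalid = Counter()
    for line in lines:
        m = REPLAY_RE.search(line)
        if m:
            job_ids.add(int(m.group(1)))
            transitions[(m.group(2), m.group(3))] += 1
            continue
        m = INVALID_RE.search(line)
        if m:
            invalid[(m.group(1), m.group(2))] += 1
    return {
        "distinct_jobs": len(job_ids),
        "transitions": transitions,
        "invalid": invalid,
    }


# Two lines copied from the excerpt above, used here as sample input.
SAMPLE = [
    "2023-06-25 10:10:46,806 INFO (leaderCheckpointer|456) [InsertOverwriteJobRunner.replayStateChange():135] replay state change:InsertOverwriteStateChangeInfo{jobId=113212222, fromState=OVERWRITE_FAILED, toState=OVERWRITE_FAILED, sourcePartitionIds=[108308418], tmpPartitionIds=[113212223]}",
    "2023-06-25 10:10:46,806 WARN (leaderCheckpointer|456) [InsertOverwriteJobRunner.replayStateChange():137] invalid job info. current state:OVERWRITE_RUNNING, from state:OVERWRITE_FAILED",
]

print(summarize(SAMPLE))
```

For the real log, the same function can be fed an open file handle, for example `summarize(open("fe.log", encoding="utf-8", errors="replace"))`, since it only iterates over lines.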