A Flink CDC job intermittently fails with: com.starrocks.data.load.stream.exception.StreamLoadFailException: Stream load failed because of error, db: XXX, table: XXX, label: XXX, responseBody: {"status":"FAILED","msg":"No backend alive."}

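To help us locate your problem faster, please provide the following information. Thank you.
【Details】I am using Flink CDC to sync MySQL data to StarRocks. The job intermittently hits this error; each occurrence fails the job and consumes one of Flink's restart attempts, and once the attempts are used up the job dies.
【Background】Everything works in the test environment. In production the job dies from this error every day (possibly because of the larger data volume?).
【Business impact】When the Flink job dies, its checkpoints become unusable and the sync has to be restarted from scratch. (The source is a read-only RDS replica with a short binlog retention period.)
【Shared-data (storage-compute separation)】No
【StarRocks version】2.5.22-5dffd65
【Cluster size】1 FE + 1 BE (FE and BE deployed on the same machine)
【Machine info】CPU vCores/memory/NIC, e.g. 8C/16G
【Table model】Primary Key table
【Import or export method】Flink
【Contact】381147241@qq.com
【Attachments】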
Suppressed: java.lang.RuntimeException: com.starrocks.data.load.stream.exception.StreamLoadFailException: Stream load failed because of error, db: data_center, table: order_receipt, label: flink-7bf7b7ba-0f30-48ef-9d2b-cf39cc39134f,
responseBody: {"status":"FAILED","msg":"No backend alive."}
errorLog: null
at com.starrocks.data.load.stream.v2.StreamLoadManagerV2.AssertNotException(StreamLoadManagerV2.java:427)
at com.starrocks.data.load.stream.v2.StreamLoadManagerV2.flush(StreamLoadManagerV2.java:355)
at com.starrocks.connector.flink.table.sink.StarRocksDynamicSinkFunctionV2.close(StarRocksDynamicSinkFunctionV2.java:251)
at org.apache.flink.api.common.functions.util.FunctionUtils.closeFunction(FunctionUtils.java:41)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.close(AbstractUdfStreamOperator.java:114)
at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.close(StreamOperatorWrapper.java:163)
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.closeAllOperators(RegularOperatorChain.java:125)
at org.apache.flink.streaming.runtime.tasks.StreamTask.closeAllOperators(StreamTask.java:1000)
at org.apache.flink.util.IOUtils.closeAll(IOUtils.java:254)
at org.apache.flink.core.fs.AutoCloseableRegistry.doClose(AutoCloseableRegistry.java:72)
at org.apache.flink.util.AbstractAutoCloseableRegistry.close(AbstractAutoCloseableRegistry.java:127)
at org.apache.flink.streaming.runtime.tasks.StreamTask.cleanUp(StreamTask.java:919)
at org.apache.flink.runtime.taskmanager.Task.lambda$restoreAndInvoke$0(Task.java:930)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:930)
... 3 more
Caused by: com.starrocks.data.load.stream.exception.StreamLoadFailException: Stream load failed because of error, db: data_center, table: order_receipt, label: flink-7bf7b7ba-0f30-48ef-9d2b-cf39cc39134f,
responseBody: {"status":"FAILED","msg":"No backend alive."}
errorLog: null
at com.starrocks.data.load.stream.DefaultStreamLoader.sendToSR(DefaultStreamLoader.java:339)
at com.starrocks.data.load.stream.DefaultStreamLoader.lambda$send$3(DefaultStreamLoader.java:170)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete snapshot 1266 for operator Sink: data_center (1/3)#5. Failure reason: Checkpoint was declined.
at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:269)
at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:173)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:345)
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.checkpointStreamOperator(RegularOperatorChain.java:227)
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.buildOperatorSnapshotFutures(RegularOperatorChain.java:212)
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.snapshotState(RegularOperatorChain.java:192)
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:647)
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:320)
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$12(StreamTask.java:1256)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1244)
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1201)
... 22 more
Caused by: java.lang.RuntimeException: com.starrocks.data.load.stream.exception.StreamLoadFailException: Stream load failed because of error, db: data_center, table: order_receipt, label: flink-7bf7b7ba-0f30-48ef-9d2b-cf39cc39134f,
responseBody: {"status":"FAILED","msg":"No backend alive."}
errorLog: null
at com.starrocks.data.load.stream.v2.StreamLoadManagerV2.AssertNotException(StreamLoadManagerV2.java:427)
at com.starrocks.data.load.stream.v2.StreamLoadManagerV2.flush(StreamLoadManagerV2.java:355)
at com.starrocks.connector.flink.table.sink.StarRocksDynamicSinkFunctionV2.snapshotState(StarRocksDynamicSinkFunctionV2.java:264)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.trySnapshotFunctionState(StreamingFunctionUtils.java:118)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.snapshotFunctionState(StreamingFunctionUtils.java:99)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.snapshotState(AbstractUdfStreamOperator.java:87)
at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:222)
... 33 more
Caused by: [CIRCULAR REFERENCE: com.starrocks.data.load.stream.exception.StreamLoadFailException: Stream load failed because of error, db: data_center, table: order_receipt, label: flink-7bf7b7ba-0f30-48ef-9d2b-cf39cc39134f,
responseBody: {"status":"FAILED","msg":"No backend alive."}
errorLog: null]
","timestamp":1720598471397,"taskName":"Sink: data_center (1/3) - execution #5","location":"localhost:35305","concurrentExceptions":[]}],"truncated":false}}
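For anyone triaging a similar trace: every nested exception above carries the same `responseBody`, so the useful signal is its `status` and `msg` fields. Below is a minimal sketch in Python for pulling that body out of a log excerpt and checking for this specific error. The helper names `parse_response_body` and `is_no_backend_alive` are hypothetical and not part of the StarRocks connector. The sketch assumes the body is a flat JSON object on a single line, written with straight double quotes.

```python
import json
import re

# Matches the flat JSON object that follows "responseBody:" in the exception text.
# Assumption: the object has no nested braces, as in the trace above.
RESPONSE_BODY = re.compile(r'responseBody:\s*(\{.*?\})')


def parse_response_body(log_text):
    """Return the first responseBody JSON object in log_text as a dict, or None."""
    match = RESPONSE_BODY.search(log_text)
    if match is None:
        return None
    return json.loads(match.group(1))


def is_no_backend_alive(log_text):
    """True if the first responseBody reports FAILED with a 'No backend alive' message."""
    body = parse_response_body(log_text)
    if body is None:
        return False
    return body.get("status") == "FAILED" and "No backend alive" in body.get("msg", "")


if __name__ == "__main__":
    excerpt = (
        "Stream load failed because of error, db: data_center, table: order_receipt,\n"
        'responseBody: {"status":"FAILED","msg":"No backend alive."}\n'
        "errorLog: null"
    )
    print(parse_response_body(excerpt))
    print(is_no_backend_alive(excerpt))
```

A check like this only classifies the failure from the log text. It says nothing about why the FE considered the BE unavailable at that moment.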