[starrocks-connector] Checkpoint processing failure

A starrocks-connector-for-apache-flink job fails at checkpoint while running.

Checkpoint interval: 3 minutes. Checkpoint timeout: 3 minutes.

Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete snapshot 4 for operator Source: Custom Source -> Map -> Sink: XX事件写sr (28/32)#1. Failure reason: Checkpoint was declined.
at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:265)
at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:170)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:348)
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.checkpointStreamOperator(RegularOperatorChain.java:233)
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.buildOperatorSnapshotFutures(RegularOperatorChain.java:206)
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.snapshotState(RegularOperatorChain.java:186)
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:605)
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:315)
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$14(StreamTask.java:1329)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93)
at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1315)
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointAsyncInMailbox(StreamTask.java:1163)
… 13 more
Caused by: java.lang.RuntimeException: Snapshot state failed by prepare
at com.starrocks.connector.flink.table.sink.StarRocksDynamicSinkFunctionV2.snapshotState(StarRocksDynamicSinkFunctionV2.java:201)
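When reading a nested Java trace like the one above, the last "Caused by" entry is usually the most specific one. The following is a minimal sketch, not part of the connector or of Flink, that extracts that innermost cause and its first frame from pasted trace text. The `TRACE` string is an abbreviated copy of the trace above, and the helper name `innermost_cause` is hypothetical.

```python
import re

# Abbreviated copy of the trace above (most frames trimmed for brevity).
TRACE = """\
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete snapshot 4 for operator Source: Custom Source -> Map -> Sink: XX事件写sr (28/32)#1. Failure reason: Checkpoint was declined.
at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:265)
Caused by: java.lang.RuntimeException: Snapshot state failed by prepare
at com.starrocks.connector.flink.table.sink.StarRocksDynamicSinkFunctionV2.snapshotState(StarRocksDynamicSinkFunctionV2.java:201)
"""

# "Caused by: <exception class>: <message>"
CAUSE = re.compile(r"^Caused by: (?P<cls>[\w.$]+): (?P<msg>.*)$")
# "at <fully.qualified.Method>(<File>.java:<line>)"
FRAME = re.compile(r"^\s*at (?P<frame>\S+)")


def innermost_cause(trace):
    """Return (exception class, message, first frame) of the last 'Caused by' entry."""
    cause = None
    for line in trace.splitlines():
        m = CAUSE.match(line)
        if m:
            # A later "Caused by" replaces the earlier one.
            cause = [m["cls"], m["msg"], None]
            continue
        f = FRAME.match(line)
        if f and cause is not None and cause[2] is None:
            # Remember only the first frame under the current cause.
            cause[2] = f["frame"]
    return tuple(cause) if cause else None


root = innermost_cause(TRACE)
print(root)
```

For this trace the innermost cause is the `RuntimeException` raised in `StarRocksDynamicSinkFunctionV2.snapshotState`, which points at the sink's snapshot step rather than at Flink's checkpoint coordinator itself.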

Is there any follow-up on this issue?

Which cluster version are you running? Is this error raised when reading data with the Spark Connector?

The StarRocks cluster version is 2.4.4.

Is there any other information you can add? If the problem still occurs, please share the task ID and the corresponding logs. Thanks.

Here is the error information:
2023-08-05 13:13:55,823 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Custom Source -> Process -> Sink: AppFaultBasicDetailBusinessFilterPassProd (1/1) (382d72300150e726d0a658dfd02dfade) switched from RUNNING to FAILED on container_e71_1691118567152_0049_01_000002 @ prod-23(dataPort=18682).
java.lang.Exception: Could not perform checkpoint 321 for operator Source: Custom Source -> Process -> Sink: AppFaultBasicDetailBusinessFilterPassProd (1/1)#1.
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointAsyncInMailbox(StreamTask.java:1006) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$triggerCheckpointAsync$7(StreamTask.java:958) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:344) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:330) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:202) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_162]
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete snapshot 321 for operator Source: Custom Source -> Process -> Sink: AppFaultBasicDetailBusinessFilterPassProd (1/1)#1. Failure reason: Checkpoint was declined.
at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:264) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:169) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:371) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointStreamOperator(SubtaskCheckpointCoordinatorImpl.java:706) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.buildOperatorSnapshotFutures(SubtaskCheckpointCoordinatorImpl.java:627) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:590) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:312) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$8(StreamTask.java:1092) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1076) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointAsyncInMailbox(StreamTask.java:994) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
… 13 more
Caused by: java.lang.RuntimeException: Snapshot state failed by prepare
at com.starrocks.connector.flink.table.sink.StarRocksDynamicSinkFunctionV2.snapshotState(StarRocksDynamicSinkFunctionV2.java:200) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.trySnapshotFunctionState(StreamingFunctionUtils.java:118) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.snapshotFunctionState(StreamingFunctionUtils.java:99) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.snapshotState(AbstractUdfStreamOperator.java:89) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:218) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:169) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:371) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointStreamOperator(SubtaskCheckpointCoordinatorImpl.java:706) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.buildOperatorSnapshotFutures(SubtaskCheckpointCoordinatorImpl.java:627) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:590) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:312) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$8(StreamTask.java:1092) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1076) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointAsyncInMailbox(StreamTask.java:994) ~[AppFaultBasicDetailBusinessFilterPassProd-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
… 13 more
2023-08-05 13:13:55,826 INFO org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] - Clearing resource requirements of job a894837bb45bdeee082d521f1c18a666
2023-08-05 13:13:55,827 INFO org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - Calculating tasks to restart to recover the failed task cbc357ccb763df2852fee8c4fc7d55f2_0.
2023-08-05 13:13:55,828 INFO org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - 1 tasks should be restarted to recover the failed task cbc357ccb763df2852fee8c4fc7d55f2_0.
2023-08-05 13:13:55,829 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job Flink Streaming Job (a894837bb45bdeee082d521f1c18a666) switched from state RUNNING to RESTARTING.
2023-08-05 13:14:05,833 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job Flink Streaming Job (a894837bb45bdeee082d521f1c18a666) switched from state RESTARTING to RUNNING.
2023-08-05 13:14:05,835 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Restoring job a894837bb45bdeee082d521f1c18a666 from Savepoint 319 @ 0 for a894837bb45bdeee082d521f1c18a666 located at hdfs://didonline/apps/flink/flink-checkpoints/74a9361f36537dbf59350d91105a26d8/chk-319.
2023-08-05 13:14:05,836 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - No master state to restore
2023-08-05 13:14:05,836 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Custom Source -> Process -> Sink: AppFaultBasicDetailBusinessFilterPassProd (1/1) (fb172d27eb7de07fe7341c391b8d8132) switched from CREATED to SCHEDULED.
2023-08-05 13:14:05,838 INFO org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] - Received resource requirements from job a894837bb45bdeee082d521f1c18a666: [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, numberOfRequiredSlots=1}]
2023-08-05 13:14:05,839 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Custom Source -> Process -> Sink: AppFaultBasicDetailBusinessFilterPassProd (1/1) (fb172d27eb7de07fe7341c391b8d8132) switched from SCHEDULED to DEPLOYING.
2023-08-05 13:14:05,839 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying Source: Custom Source -> Process -> Sink: AppFaultBasicDetailBusinessFilterPassProd (1/1) (attempt #2) with attempt id fb172d27eb7de07fe7341c391b8d8132 to container_e71_1691118567152_0049_01_000002 @ prod-23.chain.cloud (dataPort=18682) with allocation id de9d100636213a713207773e1e3140ba
2023-08-05 13:14:05,855 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Custom Source -> Process -> Sink: AppFaultBasicDetailBusinessFilterPassProd (1/1) (fb172d27eb7de07fe7341c391b8d8132) switched from DEPLOYING to INITIALIZING.
2023-08-05 13:14:05,883 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Custom Source -> Process -> Sink: AppFaultBasicDetailBusinessFilterPassProd (1/1) (fb172d27eb7de07fe7341c391b8d8132) switched from INITIALIZING to RUNNING.
2023-08-05 13:14:13,054 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering checkpoint 322 (type=CHECKPOINT) @ 1691212453038 for job a894837bb45bdeee082d521f1c18a666.
2023-08-05 13:14:13,653 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed checkpoint 322 for job a894837bb45bdeee082d521f1c18a666 (10903 bytes in 613 ms).
2023-08-05 13:17:13,060 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering checkpoint 323 (type=CHECKPOINT) @ 1691212633038 for job a894837bb45bdeee082d521f1c18a666.
2023-08-05 13:17:18,722 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Custom Source -> Process -> Sink: AppFaultBasicDetailBusinessFilterPassProd (1/1) (fb172d27eb7de07fe7341c391b8d8132) switched from RUNNING to FAILED on container_e71_1691118567152_0049_01_000002 prod-23.chain.cloud (dataPort=18682).
java.lang.Exception: Could not perform checkpoint 323 for operator Source: Custom Source -> Process -> Sink: AppFaultBasicDetailBusinessFilterPassProd (1/1)#2.
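One thing the log above allows checking is how long after the checkpoint trigger the task failed. The sketch below is a rough illustration rather than a diagnostic tool: it parses abbreviated copies of four log lines from above (the `...` marks trimmed text) and reports the gap between "Triggering checkpoint" and the next "RUNNING to FAILED" transition. The helper name `seconds_until_failure` is hypothetical.

```python
import re
from datetime import datetime

# Abbreviated copies of log lines from above; "..." marks trimmed text.
LOG = """\
2023-08-05 13:14:13,054 INFO ... CheckpointCoordinator [] - Triggering checkpoint 322 (type=CHECKPOINT) @ 1691212453038 for job a894837bb45bdeee082d521f1c18a666.
2023-08-05 13:14:13,653 INFO ... CheckpointCoordinator [] - Completed checkpoint 322 for job a894837bb45bdeee082d521f1c18a666 (10903 bytes in 613 ms).
2023-08-05 13:17:13,060 INFO ... CheckpointCoordinator [] - Triggering checkpoint 323 (type=CHECKPOINT) @ 1691212633038 for job a894837bb45bdeee082d521f1c18a666.
2023-08-05 13:17:18,722 INFO ... ExecutionGraph [] - Source: Custom Source -> Process -> Sink: AppFaultBasicDetailBusinessFilterPassProd (1/1) (fb172d27eb7de07fe7341c391b8d8132) switched from RUNNING to FAILED on ...
"""

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TRIGGER = re.compile(r"Triggering checkpoint (\d+)")


def parse_ts(line):
    """Parse the 23-character timestamp prefix, e.g. '2023-08-05 13:17:13,060'."""
    return datetime.strptime(line[:23], TS_FORMAT)


def seconds_until_failure(log):
    """Return [(checkpoint id, seconds from trigger to the next FAILED transition)]."""
    results = []
    last_trigger = None  # (checkpoint id, trigger time) of the most recent trigger
    for line in log.splitlines():
        m = TRIGGER.search(line)
        if m:
            last_trigger = (int(m.group(1)), parse_ts(line))
        elif "switched from RUNNING to FAILED" in line and last_trigger:
            cp_id, started = last_trigger
            gap = (parse_ts(line) - started).total_seconds()
            results.append((cp_id, round(gap, 3)))
            last_trigger = None
    return results


gaps = seconds_until_failure(LOG)
print(gaps)
```

On these lines the task fails about 5.7 seconds after checkpoint 323 is triggered, far short of the 3-minute checkpoint timeout. That suggests, though it does not prove, that the checkpoint was declined because of the exception in the sink's snapshot step rather than because it timed out.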

<dependency>
    <groupId>com.starrocks</groupId>
    <artifactId>flink-connector-starrocks</artifactId>
    <version>1.2.5_flink-1.13_2.11</version>
</dependency>

OK, I'll verify it.

Thanks. The job usually runs for a day or two before this error appears.

Has this issue occurred again since?