flink实时写入SR表,滚动重启BE会报错label state: UNKNOWN.

【详述】flink实时写入SR表报错
【背景】滚动重启BE服务后Flink任务写入失败
【业务影响】影响业务
【是否存算分离】集群存算一体
【StarRocks版本】3.2.16
【集群规模】例如:3fe(1 follower+2observer)+ 8BE
【机器信息】96C + 512G + 8T
【附件】

2025-08-19 19:55:56
java.lang.RuntimeException: Failed to notify checkpoint complete for checkpoint id 9223372036854775807
at com.starrocks.connector.flink.table.sink.StarRocksDynamicSinkFunctionV2.notifyCheckpointComplete(StarRocksDynamicSinkFunctionV2.java:367)
at com.starrocks.connector.flink.table.sink.StarRocksDynamicSinkFunctionV2.openForExactlyOnce(StarRocksDynamicSinkFunctionV2.java:231)
at com.starrocks.connector.flink.table.sink.StarRocksDynamicSinkFunctionV2.open(StarRocksDynamicSinkFunctionV2.java:212)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:34)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:101)
at org.apache.flink.streaming.api.operators.StreamSink.open(StreamSink.java:46)
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:107)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:753)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:728)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:693)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:953)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:922)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:746)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: com.starrocks.data.load.stream.exception.StreamLoadFailException: Transaction commit failed, db: cdc, table: erp_order, label: flink-07bd50d0-5689-425b-ab12-41fa56dac0e5, commit response status: TXN_NOT_EXISTS, label state: UNKNOWN. commit response status with TXN_NOT_EXISTS or label state with UNKNOWN often happens when transaction timeouts, and please check StarRocks FE leader’s log to confirm it. You can find the transaction id for the label in the FE log first, and search with the transaction id and the keyword ‘expired’
at com.starrocks.data.load.stream.TransactionStreamLoader.commit(TransactionStreamLoader.java:310)
at com.starrocks.data.load.stream.DefaultStreamLoader.commit(DefaultStreamLoader.java:239)
at com.starrocks.data.load.stream.v2.StreamLoadManagerV2.commit(StreamLoadManagerV2.java:401)
at com.starrocks.connector.flink.table.sink.StarRocksDynamicSinkFunctionV2.notifyCheckpointComplete(StarRocksDynamicSinkFunctionV2.java:355)
… 15 more
Caused by: com.starrocks.data.load.stream.exception.StreamLoadFailException: Transaction commit failed, db: cdc_test, table: tms_t_failed_order, label: flink-07bd50d0-5689-425b-ab12-41fa56dac0e5, commit response status: TXN_NOT_EXISTS, label state: UNKNOWN. commit response status with TXN_NOT_EXISTS or label state with UNKNOWN often happens when transaction timeouts, and please check StarRocks FE leader’s log to confirm it. You can find the transaction id for the label in the FE log first, and search with the transaction id and the keyword ‘expired’
at com.starrocks.data.load.stream.TransactionStreamLoader.commit(TransactionStreamLoader.java:308)
… 18 more

可以根据日志提示去看下fe log 里面用label 去搜一下事务的状态,看下是否是由于恢复一个比较旧的checkpoint ,事务已经被清理了 。
解决办法:
考虑从最新的cp 来恢复。
另外建议配置label prefix ,可以避免这个问题
image https://docs.mirrorship.cn/zh/docs/3.3/loading/Flink-connector-starrocks/#exactly-once