Flood of "Fail to publish partition 27955075 of txnIds" log entries

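To help us locate your problem faster, please provide the following information. Thank you.
【Details】
【Background】The logs are growing explosively. I confirmed that the flood consists of this message: Fail to publish partition 27955075 of txnIds [55058500,
【Business impact】
The disk is filling up.
【Shared-data (storage-compute separation)?】
【StarRocks version】StarRocks-3.4.0-centos-amd64
【Cluster size】3 FE (3 followers) + 3 BE (FE and BE co-located)
【Machine info】8C/64G/10 GbE
【Contact】WeChat winter_1123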
fe.log

2025-05-12 15:49:32.772+08:00 INFO (PUBLISH_VERSION|25) [PublishVersionDaemon.publishLakeTransactionBatchAsync():656] start publish lake batch db:26433994 table:27955073 txns:55058500,55102727,55122108,55134531,55151789,55174678,55196883,55219542,55238106,55249897
2025-05-12 15:49:32.785+08:00 ERROR (lake-publish-task-139|638) [PublishVersionDaemon.publishPartitionBatch():551] Fail to publish partition 27955075 of txnIds [55058500, 55102727, 55122108, 55134531, 55151789, 55174678, 55196883, 55219542, 55238106, 55249897]:
I found an INSERT load that had been running for a very long time and cancelled it with: CANCEL LOAD FROM kmyy WHERE LABEL = 'insert_586e3dbc-2216-11f0-b075-00163e11d395';
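
When the log is flooded with this message, it can help to pull the partition ID and the transaction IDs out of the FE error lines so they can be compared against metadata queries. The following is only a minimal sketch: the regular expression is inferred from the single fe.log line quoted above and may not match other StarRocks versions, and the function name parse_publish_failure is hypothetical.

```python
import re

# Pattern inferred from the one fe.log sample in this post (an assumption,
# not an official log format specification).
FAIL_RE = re.compile(r"Fail to publish partition (\d+) of txnIds \[([\d,\s]+)\]")


def parse_publish_failure(line):
    """Return (partition_id, [txn_ids]) or None if the line does not match."""
    m = FAIL_RE.search(line)
    if m is None:
        return None
    partition_id = int(m.group(1))
    # int() tolerates the leading space left after splitting on ",".
    txn_ids = [int(t) for t in m.group(2).split(",") if t.strip()]
    return partition_id, txn_ids


sample = (
    "2025-05-12 15:49:32.785+08:00 ERROR (lake-publish-task-139|638) "
    "[PublishVersionDaemon.publishPartitionBatch():551] "
    "Fail to publish partition 27955075 of txnIds [55058500, 55102727, "
    "55122108, 55134531, 55151789, 55174678, 55196883, 55219542, "
    "55238106, 55249897]:"
)

print(parse_publish_failure(sample))
```

Running the same function over every line of fe.log and collecting the distinct partition IDs would show whether one partition or several are affected.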

cn.log
W20250512 16:58:13.410660 140629145740864 transactions.cpp:282] txn_log of txn: 55058500 not found, and can not find the tablet_meta
W20250512 16:58:13.410699 140629145740864 lake_service.cpp:226] Fail to publish version: Internal error: Both txn_log and corresponding tablet_meta missing. tablet_id=27955076 txn_ids=txn_id: 55058500
commit_time: 1745469988
combined_txn_log: false
txn_type: TXN_NORMAL
force_publish: false
gtid: 351552277200240640
,txn_id: 55102727
commit_time: 1745498779
combined_txn_log: false
txn_type: TXN_NORMAL
force_publish: false
gtid: 351612654776745984
,txn_id: 55122108
commit_time: 1745513179
combined_txn_log: false
txn_type: TXN_NORMAL
force_publish: false
gtid: 351642854698778624
,txn_id: 55134531
commit_time: 1745527579
combined_txn_log: false
txn_type: TXN_NORMAL
force_publish: false
gtid: 351673054392221696
,txn_id: 55151789
commit_time: 1745541980
combined_txn_log: false
txn_type: TXN_NORMAL
force_publish: false
gtid: 351703254312157184
,txn_id: 55174678
commit_time: 1745556382
combined_txn_log: false
txn_type: TXN_NORMAL
force_publish: false
gtid: 351733457373626368
,txn_id: 55196883
commit_time: 1745570781
combined_txn_log: false
txn_type: TXN_NORMAL
force_publish: false
gtid: 351763653896175616
,txn_id: 55219542
commit_time: 1745585181
combined_txn_log: false
txn_type: TXN_NORMAL
force_publish: false
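
The commit_time fields in the cn.log excerpt above look like Unix epoch seconds. Under that assumption, converting them shows when each stuck transaction was committed. This is a small sketch, not a StarRocks tool: the helper name commit_time_to_local is hypothetical, and the +08:00 offset is taken from the fe.log timestamps.

```python
from datetime import datetime, timedelta, timezone

# commit_time values copied from the cn.log excerpt, keyed by txn_id.
# Interpreting them as Unix epoch seconds is an assumption.
COMMIT_TIMES = {
    55058500: 1745469988,
    55102727: 1745498779,
    55122108: 1745513179,
    55134531: 1745527579,
    55151789: 1745541980,
    55174678: 1745556382,
    55196883: 1745570781,
    55219542: 1745585181,
}

# +08:00, the offset used by the fe.log timestamps in this post.
CST = timezone(timedelta(hours=8))


def commit_time_to_local(epoch_seconds):
    """Convert epoch seconds to a timezone-aware datetime in +08:00."""
    return datetime.fromtimestamp(epoch_seconds, tz=CST)


for txn_id, ts in COMMIT_TIMES.items():
    print(txn_id, commit_time_to_local(ts).isoformat())
```

If the epoch interpretation is right, these transactions were committed on 2025-04-24 and 2025-04-25, more than two weeks before the 2025-05-12 log lines, which would be consistent with a long-stuck load rather than a fresh failure.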
【Attachments】
combined_logs_sr_01.tar.gz (70.5 KB) combined_logs_sr_03.tar.gz (765.1 KB) combined_logs_sr_02.tar.gz (997.9 KB)

Restarting the FE and CN nodes does not help; the error still appears.

fe_sr03.log is from the leader FE.

How can I determine which table is causing this? Would simply recreating the table fix it?

This is a shared-data (storage-compute separated) cluster, with Alibaba Cloud OSS as the storage.

select *
from information_schema.partitions_meta
where partition_id=7955075

Querying by partition ID returns the corresponding table and file location.

It differs from what is in the log: 55058500, 55102727, 55122108, 55134531, 55151789, 55174678, 55196883, 55219542, 55238106, 55249897
The cause is uncertain.
After running drop table kmyy.temp_data_validation force, no errors were reported.

Thanks to Kai for the answer.