Stream Load transaction commits fail intermittently

To help us locate your problem faster, please provide the following information. Thank you.
【Details】Stream Load transaction commits fail intermittently. Error message:
```
Data loading to remote server failed - {"TxnId":100167076,"Label":"growthdata-f2f09a16-1dc8-442f-b9b6-3c848f3cf8d9","Status":"Fail","Message":"Commit transaction fail cause call frontend service failed, address=TNetworkAddress(hostname=10.200.227.2, port=9020), reason=No more data to read., Transaction status unknown, you can retry with same
```

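【Background】Real-time imports through the FanRuan FDL tool intermittently produce dirty data, and the dirty-data error is the one shown above. No large-scale queries or writes were observed during those periods.
【Business impact】Data is lost and has to be handled manually, and the failures have become more frequent recently.
【Shared-data (storage-compute separation)】No
【StarRocks version】3.2.1
【Cluster size】3 FE (1 follower + 2 observers) + 5 BE
【Machine info】FE: 32C/64GB/dual 10GbE; BE: 72C/512GB/dual 10GbE
【Contact】Community group 18, 常志远
【Attachments】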
![31f3538b18039e73cd938c5d4141f73|690x330](upload://7ed50hbAhafyzW4vMYqTvFDNUdA.png) 
![image|690x452](upload://zOdv6RcdV9VrLF0NoYxZCPevejT.png) 
![image|690x259](upload://y4dGlT41OWPqCFZoLCfLIQaNFcy.png) 
![image|690x310](upload://7uTPU3mOmdX6xWqU0QgHQUfFVV2.png) 
![image|690x232](upload://m660jHUvvOMxA86PhCep9Xow92x.png) 
![image|690x167](upload://tQS4VMQUUFx1qTwqfgixYNprePU.png) 
![image|690x294](upload://e3sNSVmAemc2vWDyZtIShQfFjNn.png) 
- Slow queries:
  - Profile information:
![image|690x80](upload://zgvDgMsFIblNfYvpHiL71enQU90.png) 
  - Parallelism: show variables like '%parallel_fragment_exec_instance_num%';
![image|483x85](upload://4IqDj68jVYyZ2FkzxtGUyu4x5rA.png) 
  - Whether pipeline is enabled: show variables like '%pipeline%';
![image|353x143](upload://dIekAbv7duVAMwME1wk5k3fg8GL.png) 
  - Screenshot of BE node CPU and memory usage
![image|690x322](upload://aFBvOFfpucicYJe80d6BWcmiiQh.png) 
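CPU usage is around 30%. Because Grafana was upgraded to 9.4.3, some metrics are no longer displayed.
- Query errors:
  - None occurred
- BE crash
  - be.out
  - None occurred
- External table query errors
  - None occurred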

Could someone take a look at this issue?

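This looks like the FE timed out while responding. Can you confirm whether the FE was restarted, or whether it hung? Also, which version is the cluster?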

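The cluster version is 3.2.1 and all three FEs are healthy. The commit fails intermittently for an occasional batch; the rest of the time everything works normally.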

2024-04-25 09:43:01,397 INFO (starrocks-mysql-nio-pool-178399|440516838) [QeProcessorImpl.unregisterQuery():147] deregister query id = 26b3c6ec-02a5-11ef-8bf2-347379263aad
2024-04-25 09:43:01,400 INFO (nioEventLoopGroup-7-68|3556) [RestBaseAction.handleRequest():73] receive http request. url=/api/ODS_JK_5_2C/growthdata/_stream_load
2024-04-25 09:43:01,400 INFO (nioEventLoopGroup-7-68|3556) [LoadAction.executeWithoutPasswordInternal():139] redirect load action to destination=TNetworkAddress(hostname:10.200.227.5, port:8040), db: ODS_JK_5_2C, tbl: growthdata, label: growthdata-f2f09a16-1dc8-442f-b9b6-3c848f3cf8d9
2024-04-25 09:43:01,411 INFO (thrift-server-pool-439237648|440523903) [FrontendServiceImpl.loadTxnBegin():1273] receive txn begin request, db: ODS_JK_5_2C, tbl: growthdata, label: growthdata-f2f09a16-1dc8-442f-b9b6-3c848f3cf8d9, backend: 10.200.227.5
2024-04-25 09:43:01,411 INFO (thrift-server-pool-439237648|440523903) [DatabaseTransactionMgr.beginTransaction():317] begin transaction: txn_id: 100167076 with label growthdata-f2f09a16-1dc8-442f-b9b6-3c848f3cf8d9 from coordinator BE: 10.200.227.5, listner id: -1
2024-04-25 09:43:01,412 INFO (thrift-server-pool-439237648|440523903) [FrontendServiceImpl.streamLoadPut():1698] receive stream load put request. db:ODS_JK_5_2C, tbl: growthdata, txn_id: 100167076, load id: bd4a76d3-f732-5cfd-4035-069f9985eb9b, backend: 10.200.227.5
2024-04-25 09:43:01,412 INFO (thrift-server-pool-439237648|440523903) [StreamLoadPlanner.plan():286] load job id: bd4a76d3-f732-5cfd-4035-069f9985eb9b tx id 100167076 parallel 0 compress NO_COMPRESSION replicated true quorum MAJORITY
2024-04-25 09:43:01,555 INFO (thrift-server-pool-439237763|440524018) [QeProcessorImpl.reportExecStatus():188] ReportExecStatus() failed, query does not exist, fragment_instance_id=26b3c6ec-02a5-11ef-8bf2-347379263aae, query_id=26b3c6ec-02a5-11ef-8bf2-347379263aad,
2024-04-25 09:43:01,596 INFO (thrift-server-pool-439237766|440524021) [FrontendServiceImpl.loadTxnCommit():1371] receive txn commit request. db: ODS_JK_4_K, tbl: historydata_K, txn_id: 100167062, backend: 10.200.227.7
The relevant transaction logs are above.
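To check which steps of a transaction actually reached the FE, the log lines can be grouped by transaction id. The following is only a rough sketch: the helper names `steps_by_txn` and `missing_commit` are made up for illustration, it assumes each log entry sits on a single line, and it only recognizes the `txn_id: N` and `tx id N` spellings that appear in the excerpt above.

```python
import re
from collections import defaultdict

# "txn_id: 123" / "txn_id:123", plus the "tx id 123" form used by StreamLoadPlanner.
TXN_RE = re.compile(r"(?:txn_id:?\s*|tx id\s+)(\d+)")
# Method name inside the "[Class.method():line]" part of an FE log entry.
METHOD_RE = re.compile(r"\[\w+\.(\w+)\(\):\d+\]")


def steps_by_txn(lines):
    """Map each txn id to the ordered list of FE methods that logged it."""
    steps = defaultdict(list)
    for line in lines:
        txn = TXN_RE.search(line)
        method = METHOD_RE.search(line)
        if txn and method:
            steps[int(txn.group(1))].append(method.group(1))
    return dict(steps)


def missing_commit(steps):
    """Txn ids that logged beginTransaction but no loadTxnCommit."""
    return [
        txn
        for txn, methods in steps.items()
        if "beginTransaction" in methods and "loadTxnCommit" not in methods
    ]
```

Run over the excerpt above, this would list txn 100167076 with `beginTransaction`, `streamLoadPut` and `plan` but no `loadTxnCommit`; the only commit request in the excerpt belongs to txn 100167062. That may just mean the excerpt ends too early or comes from a different FE than the one the BE called, so it is a hint for where to look next rather than a conclusion.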

Another batch transaction failed today:


Commit transaction fail cause call frontend service failed, address=TNetworkAddress(hostname=10.200.227.2, port=9020), reason=write() send(): Connection reset by peer, Transaction status unknown, you can retry with same label., backend: 10.200.227.7
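Since the message itself says the load can be retried with the same label, a client-side workaround could look roughly like the sketch below. It is not what FDL does internally. `send` is a hypothetical callable that submits the batch with the given label and returns the parsed JSON response, and the `Label Already Exists` / `ExistingJobStatus` handling follows the Stream Load return fields as I understand them from the StarRocks documentation, so please verify against your version.

```python
import time


def load_with_retry(send, label, max_attempts=3, backoff_s=1.0):
    """Resubmit a batch under the SAME label until its outcome is known.

    `send(label)` is a hypothetical callable returning the Stream Load
    response as a dict (keys such as "Status" and "Message").
    """
    for attempt in range(1, max_attempts + 1):
        resp = send(label)
        status = resp.get("Status")
        if status in ("Success", "Publish Timeout"):
            # Committed; "Publish Timeout" only means not yet visible.
            return resp
        if status == "Label Already Exists":
            # An earlier attempt with this label reached the FE.
            if resp.get("ExistingJobStatus") == "FINISHED":
                return resp
            # Otherwise it is still running: fall through and poll again.
        elif "Transaction status unknown" not in resp.get("Message", ""):
            # A definite failure (bad data, etc.); retrying will not help.
            raise RuntimeError(f"load {label} failed: {resp.get('Message')}")
        if attempt < max_attempts:
            time.sleep(backoff_s * attempt)
    raise RuntimeError(f"load {label} still unconfirmed after {max_attempts} attempts")
```

Reusing the label is the point of the design: if the first commit did succeed and only the reply was lost, the retry is rejected as a duplicate instead of writing the batch twice.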

Tracing the BE node logs:

Fail to handle streaming load, id=6f410ca66a9f1737-fc65d2fe0ffdc89c errmsg=Commit transaction fail cause call frontend service failed, address=TNetworkAddress(hostname=10.200.227.2, port=9020), reason=write() send(): Connection reset by peer, Transaction status unknown, you can retry with same label. id=6f410ca66a9f1737-fc65d2fe0ffdc89c, job_id=-1, txn_id: 100988631, label=growthdata-908b2fb5-fbb1-4db4-a22d-0cee748b3a18, db=ODS_JK_5_2A

I0426 02:38:07.984542 12316 tablet_sink_sender.cpp:327] Olap table sink statistics. load_id: 6f410ca6-6a9f-1737-fc65-d2fe0ffdc89c, txn_id: 100988631, add chunk time(ms)/wait lock time(ms)/num: {10005:(411)(0)(1)} {10006:(350)(0)(1)} {10007:(400)(0)(1)} {10008:(340)(0)(1)} {10009:(214)(0)(1)}
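To compare the write phase across batches, the per-entry timings in an `Olap table sink statistics` line can be pulled out with a small parser. This is a minimal sketch: `parse_sink_stats` is a made-up helper name, the three values are read in the order given by the line's own header (add chunk time in ms, wait lock time in ms, num), and treating the leading number as a BE node id is my assumption.

```python
import re

# One "{id:(add_chunk_ms)(wait_lock_ms)(num)}" group per entry.
NODE_RE = re.compile(r"\{(\d+):\((\d+)\)\((\d+)\)\((\d+)\)\}")


def parse_sink_stats(line):
    """Return {id: (add_chunk_ms, wait_lock_ms, num)} for one statistics line."""
    return {
        int(node): (int(add_ms), int(lock_ms), int(num))
        for node, add_ms, lock_ms, num in NODE_RE.findall(line)
    }
```

For the line above, every entry reports well under a second of add-chunk time and no lock wait, which would be consistent with the data write itself being fast and the failure happening later, in the commit call to the FE.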
- TxnID: 100988631