【Details】The Spark Streaming job frequently fails on write with the error below.
Job error:
responseBody: {
"Status": "THRIFT_RPC_ERROR",
"Message": "call frontend service failed, address=TNetworkAddress(hostname=FE leader node, port=9020), reason=THRIFT_EAGAIN (timed out)"
}
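For readers who hit the same response, here is a minimal Python sketch of how a client could recognize this status and retry the batch with exponential backoff. It is only a workaround sketch, not a fix for the lock contention shown in the FE log. The names `RETRYABLE_STATUSES`, `is_retryable`, `load_with_retry`, and the `send_batch` callable are hypothetical and not part of StarRocks or any connector; treating `THRIFT_RPC_ERROR` as transient is an assumption.

```python
import json
import time

# Assumption: this status marks a transient FE-side failure worth retrying.
RETRYABLE_STATUSES = {"THRIFT_RPC_ERROR"}


def is_retryable(response_body: str) -> bool:
    """Return True if the response body reports a status we treat as transient."""
    try:
        status = json.loads(response_body).get("Status")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: do not retry blindly.
        return False
    return status in RETRYABLE_STATUSES


def load_with_retry(send_batch, max_attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Call send_batch() until the response is not retryable or attempts run out.

    send_batch is a hypothetical callable that performs one Stream Load
    request and returns the response body as a string.
    """
    for attempt in range(1, max_attempts + 1):
        body = send_batch()
        if not is_retryable(body) or attempt == max_attempts:
            return body
        # Exponential backoff: base, 2*base, 4*base, ...
        sleep(base_delay_s * 2 ** (attempt - 1))
```

Whether a retry should reuse the same label depends on how the failed transaction ended on the FE side, so that decision is left to `send_batch`.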
Key excerpts from the FE node log:
2024-03-29 11:40:59,804 WARN (thrift-server-pool-145291339|146510701) [FrontendServiceImpl.streamLoadPut():1263] failed to get stream load plan: get database read lock timeout, database=dm
2024-03-29 11:40:59,807 WARN (thrift-server-pool-145291340|146510702) [Database.logTryLockFailureEvent():150] try db lock failed. type: readLock, current owner id: 146510543, owner name: thrift-server-pool-145291181, owner stack: dump thread: thrift-server-pool-145291181, id: 146510543
…
…
2024-03-29 11:41:09,278 INFO (thrift-server-pool-145291181|146510543) [DatabaseTransactionMgr.commitTransaction():484] transaction:[TransactionState. txn_id: 33202343, label: spark_streamload_20240329_114052_4eaf658d120c4ed5ab456ed8b6dd9e37, db id: 12139, table id list: 4933128, callback id: -1, coordinator: BE: 172.30.30.117, transaction status: COMMITTED, error replicas num: 0, replica ids: , prepare time: 1711683652787, commit time: 1711683654795, finish time: -1, write cost: 2008ms, reason: attachment: com.starrocks.load.loadv2.ManualLoadTxnCommitAttachment@7d3e0ccc] successfully committed
2024-03-29 11:41:09,278 INFO (thrift-server-pool-145291372|146510734) [DatabaseTransactionMgr.beginTransaction():309] begin transaction: txn_id: 33202354 with label spark_streamload_20240329_114100_e513189f51944092870a2169d22b5236 from coordinator BE: 172.30.30.117, listner id: -1
2024-03-29 11:41:09,278 WARN (PUBLISH_VERSION|19) [Database.logSlowLockEventIfNeeded():143] slow db lock. type: readLock, db id: 12139, db name: dm, wait time: 16252ms, former owner id: 146510543, owner name: thrift-server-pool-145291181, owner stack: dump thread: thrift-server-pool-145291181, id: 146510543
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
com.starrocks.transaction.DatabaseTransactionMgr.readLock(DatabaseTransactionMgr.java:144)
com.starrocks.transaction.DatabaseTransactionMgr.commitTransaction(DatabaseTransactionMgr.java:379)
com.starrocks.transaction.GlobalTransactionMgr.commitTransaction(GlobalTransactionMgr.java:376)
com.starrocks.transaction.GlobalTransactionMgr.commitAndPublishTransaction(GlobalTransactionMgr.java:454)
com.starrocks.service.FrontendServiceImpl.loadTxnCommitImpl(FrontendServiceImpl.java:1036)
com.starrocks.service.FrontendServiceImpl.loadTxnCommit(FrontendServiceImpl.java:995)
com.starrocks.thrift.FrontendService$Processor$loadTxnCommit.getResult(FrontendService.java:2716)
com.starrocks.thrift.FrontendService$Processor$loadTxnCommit.getResult(FrontendService.java:2696)
org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
com.starrocks.common.SRTThreadPoolServer$WorkerProcess.run(SRTThreadPoolServer.java:311)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
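To see how often the lock on a database is held this long, the `slow db lock` lines can be extracted from the FE log. The sketch below is a rough Python helper whose regular expression is modeled only on the single `slow db lock` line quoted above; the function names are hypothetical, and the pattern may need adjusting for other StarRocks versions or log layouts.

```python
import re

# Pattern modeled on the "slow db lock" WARN line shown in the log excerpt.
SLOW_LOCK_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) WARN .*?"
    r"slow db lock\. type: (?P<type>\w+), db id: (?P<db_id>\d+), "
    r"db name: (?P<db_name>[^,]+), wait time: (?P<wait_ms>\d+)ms, "
    r"former owner id: (?P<owner_id>\d+), owner name: (?P<owner_name>[^,]+),"
)


def parse_slow_lock(line):
    """Return a dict describing one slow-lock event, or None if the line does not match."""
    m = SLOW_LOCK_RE.search(line)
    if m is None:
        return None
    event = m.groupdict()
    event["wait_ms"] = int(event["wait_ms"])
    return event


def slow_locks(lines, min_wait_ms=0):
    """Yield slow-lock events whose wait time is at least min_wait_ms."""
    for line in lines:
        event = parse_slow_lock(line)
        if event and event["wait_ms"] >= min_wait_ms:
            yield event
```

Feeding the lines of the FE log file into `slow_locks(..., min_wait_ms=10000)` and grouping the results by `owner_name` would show which threads were holding the lock before each long wait.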
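【Background】None
【Business impact】The job fails to write data
【Shared-data mode (storage-compute separation)】No
【StarRocks version】2.5.11
【Cluster size】3 FE + 4 BE (FE and BE co-located)
【Machine specs】12 cores / 48 GB
【Contact】Community group 3, Mr。xiao
【Attachments】
Screenshot of the Spark Streaming job error log:
Screenshot of the leader FE log:
Screenshot of the FE log entries found by searching for the Stream Load job's label: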