Compaction tasks failing

To help us locate your problem faster, please provide the following information. Thank you.
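[Details] There is a table, tv_total_info, with about 3 billion rows. I loaded it several times and roughly three attempts failed; those attempts used INSERT INTO ... SELECT through a Broker. The load finally succeeded with DataX.
However, for the past two days compaction on tv_total_info has kept failing.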

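[Business impact]
[Shared-data (storage-compute separated)] Yes
[StarRocks version] 3.1.4
[Cluster size] 1 FE + 2 BE (FE and BE co-located)
[Machine info] CPU vCores / memory / NIC, e.g. 48C/64G/10GbE
[Contact] Community group 17, user 不知不觉
[Attachments]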
2023-12-06 02:30:23,677 INFO (COMPACTION_DISPATCH|110) [DatabaseTransactionMgr.abortTransaction():1235] transaction:[TransactionState. txn_id: 17450, label: COMPACTION_10089-22549-22599-1701714623453, db id: 10089, table id list: 22549, callback id: -1, coordinator: FE: 172.16.132.124, transaction status: ABORTED, error replicas num: 0, replica ids: , prepare time: 1701714623453, commit time: -1, finish time: 1701801023675, total cost: 86400222ms, reason: A error occurred: errorCode=62 errorMessage:method request time out, please check ‘onceTalkTimeout’ property. current value is:86400000(MILLISECONDS) correlationId:55584 timeout with bound channel =>[id: 0x7b621da7, L:/172.16.132.124:35377 - R:/172.16.132.124:8060]] successfully rollback

2023-12-06 02:30:23,678 ERROR (COMPACTION_DISPATCH|110) [CompactionScheduler.schedule():148] Compaction job TxnId=17451 partition=enlightent_daily.tv_total_info.p201810 failed: A error occurred: errorCode=62 errorMessage:method request time out, please check ‘onceTalkTimeout’ property. current value is:86400000(MILLISECONDS) correlationId:55585 timeout with bound channel =>[id: 0x8a707e20, L:/172.16.132.124:35385 - R:/172.16.132.124:8060]

2023-12-06 02:30:23,678 INFO (COMPACTION_DISPATCH|110) [CompactionTask.abort():119] aborted compaction task, txn_id: 17451, node: 10034

2023-12-06 02:30:23,679 INFO (COMPACTION_DISPATCH|110) [DatabaseTransactionMgr.abortTransaction():1235] transaction:[TransactionState. txn_id: 17451, label: COMPACTION_10089-22549-22597-1701714623453, db id: 10089, table id list: 22549, callback id: -1, coordinator: FE: 172.16.132.124, transaction status: ABORTED, error replicas num: 0, replica ids: , prepare time: 1701714623453, commit time: -1, finish time: 1701801023678, total cost: 86400225ms, reason: A error occurred: errorCode=62 errorMessage:method request time out, please check ‘onceTalkTimeout’ property. current value is:86400000(MILLISECONDS) correlationId:55585 timeout with bound channel =>[id: 0x8a707e20, L:/172.16.132.124:35385 - R:/172.16.132.124:8060]] successfully rollback

2023-12-06 02:30:23,680 ERROR (COMPACTION_DISPATCH|110) [CompactionScheduler.schedule():148] Compaction job TxnId=17449 partition=enlightent_daily.tv_total_info.p201904 failed: A error occurred: errorCode=62 errorMessage:method request time out, please check ‘onceTalkTimeout’ property. current value is:86400000(MILLISECONDS) correlationId:55583 timeout with bound channel =>[id: 0x73b9362d, L:/172.16.132.124:35399 - R:/172.16.132.124:8060]

2023-12-06 02:30:23,680 INFO (COMPACTION_DISPATCH|110) [CompactionTask.abort():119] aborted compaction task, txn_id: 17449, node: 10034

2023-12-06 02:30:23,683 INFO (COMPACTION_DISPATCH|110) [DatabaseTransactionMgr.abortTransaction():1235] transaction:[TransactionState. txn_id: 17449, label: COMPACTION_10089-22549-22609-1701714623453, db id: 10089, table id list: 22549, callback id: -1, coordinator: FE: 172.16.132.124, transaction status: ABORTED, error replicas num: 0, replica ids: , prepare time: 1701714623453, commit time: -1, finish time: 1701801023680, total cost: 86400227ms, reason: A error occurred: errorCode=62 errorMessage:method request time out, please check ‘onceTalkTimeout’ property. current value is:86400000(MILLISECONDS) correlationId:55583 timeout with bound channel =>[id: 0x73b9362d, L:/172.16.132.124:35399 - R:/172.16.132.124:8060]] successfully rollback
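The three aborted jobs above share one pattern that a little arithmetic makes explicit: they all have prepare time 1701714623453 and a finish time almost exactly 86,400,000 ms later. For txn 17450, 1701801023675 - 1701714623453 = 86,400,222 ms, i.e. the FE waited the full 24-hour `onceTalkTimeout` on the RPC to 172.16.132.124:8060 before aborting. The sketch below is a hypothetical helper (`summarize_abort` is not part of StarRocks) that pulls these fields out of an FE `abortTransaction` line; it assumes the field layout shown in the log excerpt above.

```python
import re

# Hypothetical helper: parses the FE "TransactionState" abort lines shown above.
# Field names (txn_id, label, prepare time, finish time) follow that excerpt.
TXN_RE = re.compile(
    r"txn_id: (?P<txn>\d+), label: (?P<label>[^,]+),.*?"
    r"prepare time: (?P<prepare>\d+),.*?finish time: (?P<finish>\d+)"
)

# onceTalkTimeout reported in the error message: 86400000 ms = 24 hours.
ONCE_TALK_TIMEOUT_MS = 86_400_000


def summarize_abort(line: str) -> dict:
    """Return txn id, label and elapsed milliseconds for one abort log line."""
    m = TXN_RE.search(line)
    if m is None:
        raise ValueError("line does not contain a TransactionState record")
    elapsed_ms = int(m["finish"]) - int(m["prepare"])
    return {
        "txn_id": int(m["txn"]),
        "label": m["label"],
        "elapsed_ms": elapsed_ms,
        # True when the job waited at least the full RPC timeout before aborting.
        "hit_timeout": elapsed_ms >= ONCE_TALK_TIMEOUT_MS,
    }


if __name__ == "__main__":
    # Shortened copy of the txn 17450 entry from the FE log above.
    sample = (
        "transaction:[TransactionState. txn_id: 17450, "
        "label: COMPACTION_10089-22549-22599-1701714623453, db id: 10089, "
        "prepare time: 1701714623453, commit time: -1, "
        "finish time: 1701801023675, total cost: 86400222ms]"
    )
    print(summarize_abort(sample))
```

Running it over every line of the FE log lists all compaction transactions that were aborted only after hitting the timeout, instead of checking each entry by eye. Lines such as `begin transaction: txn_id: 56116 with label ...` do not match the pattern and raise `ValueError`, so filter or catch those.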

2023-12-06 02:30:23,685 INFO (COMPACTION_DISPATCH|110) [DatabaseTransactionMgr.beginTransaction():313] begin transaction: txn_id: 56116 with label COMPACTION_10089-22549-22719-1701801023685 from coordinator FE: 172.16.132.124, listner id: -1

2023-12-06 02:30:23,685 INFO (COMPACTION_DISPATCH|110) [DatabaseTransactionMgr.beginTransaction():313] begin transaction: txn_id: 56117 with label COMPACTION_10089-22549-22667-1701801023685 from coordinator FE: 172.16.132.124, listner id: -1

2023-12-06 02:30:23,685 INFO (COMPACTION_DISPATCH|110) [DatabaseTransactionMgr.beginTransaction():313] begin transaction: txn_id: 56118 with label COMPACTION_10089-22549-22665-1701801023685 from coordinator FE: 172.16.132.124, listner id: -1

select * from information_schema.be_cloud_native_compactions where TXN_ID = 17449;

2023-12-06 16:38:57,326 WARN (starrocks-mysql-nio-pool-574|21484) [Coordinator.getNext():1518] query failed: Query exceeded time limit of 300 seconds

2023-12-06 16:38:57,326 INFO (starrocks-mysql-nio-pool-574|21484) [QeProcessorImpl.unregisterQuery():139] deregister query id = 328827ed-9412-11ee-9dd1-00163e354101

2023-12-06 16:38:57,326 INFO (starrocks-mysql-nio-pool-574|21484) [StmtExecutor.execute():660] execute Exception, sql: SELECT @@max_allowed_packet,@@system_time_zone,@@time_zone,@@auto_increment_increment, error: Query exceeded time limit of 300 seconds

The FE node can no longer be connected to, and the FE log is scrolling extremely fast, even though nobody is actually running any SQL.

Take a jstack of the FE and have a look, then restart the FE.

Judging from the logs here, though, this BE seems to have a problem. Does the BE have any abnormal log entries?

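I loaded the data with DataX, and DataX reports that the job has finished, but in the background the FE is still frantically logging DataX Stream Load entries. Is starrockswriter asynchronous?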

W1206 16:49:01.847401 3160 fragment_context.cpp:170] [Driver] Canceled, query_id=9adae4c6-9413-11ee-9dd1-00163e354101, instance_id=9adae4c6-9413-11ee-9dd1-00163e354102, reason=InternalError
W1206 16:45:32.964785 3172 pipeline_driver.cpp:715] fragment_id 1e596b72-9413-11ee-9dd1-00163e354104 driver query_id=1e596b72-9413-11ee-9dd1-00163e354101 fragment_id=1e596b72-9413-11ee-9dd1-00163e354104 driver=driver_3_1, status=INPUT_EMPTY, operator-chain: [exchange_source_3_0x7f0a32d59890(X) -> hash_join_build_4_0x7f09878ca710(X)(HashJoiner=0x7f09878c9a90)] cancels operator hash_join_build_4_0x7f09878ca710(X)(HashJoiner=0x7f09878c9a90) with finished error runtime state is cancelled

The BE node does have abnormal logs. It keeps printing: Canceled, query_id=3d8e75f8-9413-11ee-9dd1-00163e354101, instance_id=3d8e75f8-9413-11ee-9dd1-00163e354102, reason=InternalError
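To tell whether this flood of `Canceled ... reason=InternalError` warnings comes from a handful of queries or from many different ones, you can count the warnings per `query_id`. The helper below is a hypothetical sketch (not a StarRocks tool) and assumes every warning follows the exact `Canceled, query_id=..., instance_id=..., reason=...` layout quoted above.

```python
import re
from collections import Counter

# Hypothetical helper: matches the BE warning shown above, e.g.
# "[Driver] Canceled, query_id=..., instance_id=..., reason=InternalError".
CANCEL_RE = re.compile(
    r"Canceled, query_id=(?P<query>[0-9a-f-]+), "
    r"instance_id=(?P<instance>[0-9a-f-]+), reason=(?P<reason>\w+)"
)


def tally_cancellations(lines):
    """Count cancelled fragment instances per query_id and per reason."""
    by_query = Counter()
    by_reason = Counter()
    for line in lines:
        m = CANCEL_RE.search(line)
        if m is not None:
            by_query[m["query"]] += 1
            by_reason[m["reason"]] += 1
    return by_query, by_reason


if __name__ == "__main__":
    sample = [
        "W1206 16:49:01.847401 3160 fragment_context.cpp:170] [Driver] Canceled, "
        "query_id=9adae4c6-9413-11ee-9dd1-00163e354101, "
        "instance_id=9adae4c6-9413-11ee-9dd1-00163e354102, reason=InternalError",
        "W1206 16:45:32.964785 3172 pipeline_driver.cpp:715] fragment_id ...",
    ]
    by_query, by_reason = tally_cancellations(sample)
    print(by_query.most_common(5))
    print(by_reason)
```

If a few query_ids dominate the count, those ids can then be looked up in the FE log to see which statements they belong to.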

The compactions table is now empty, and I don't know why.

Search this BE's log for the txn id of a failed compaction task and look at the surrounding context.
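Searching a log for a txn id and reading a few lines on either side is essentially `grep -C`. A minimal Python equivalent is sketched below. `collect_context` is a hypothetical helper, and because this thread does not show how the BE formats txn ids in its log, the sketch simply searches for the bare number (for example `17449`), which may also match unrelated numbers such as timestamps.

```python
from collections import deque


def collect_context(lines, needle, before=5, after=5):
    """Return every line containing `needle` plus up to `before`/`after`
    neighbouring lines, similar to `grep -B before -A after`."""
    pending = deque(maxlen=before)  # most recent lines not yet emitted
    remaining_after = 0
    out = []
    for line in lines:
        if needle in line:
            out.extend(pending)  # flush the leading context
            pending.clear()
            out.append(line)
            remaining_after = after
        elif remaining_after > 0:
            out.append(line)  # trailing context of the previous match
            remaining_after -= 1
        else:
            pending.append(line)
    return out


if __name__ == "__main__":
    sample = ["a", "b", "txn 17449 start", "c", "d", "txn 17449 abort", "e"]
    for line in collect_context(sample, "17449", before=1, after=1):
        print(line)
```

In practice `lines` would be an open file object for the BE log (e.g. `with open(path) as f: collect_context(f, "17449")`), where `path` is wherever the BE writes its logs on that node.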