Data export error: transmit chunk rpc failed

Version: 2.5.11
5 FE, 11 BE
When exporting data with DataX, the job fails and exits at a fixed 1 hour mark: transmit chunk rpc failed:1d35ec94-933f-11ee-b3c4-28e424bb0239

I added the following parameters on every BE:

brpc_socket_max_unwritten_bytes=10737418240
thrift_rpc_timeout_ms=10000
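
For readers checking the magnitudes, here is a small sketch that converts the two values above into readable units. It only does arithmetic on the numbers quoted in this post; the dictionary name `settings` is purely illustrative. Neither value works out to anything near one hour (3600 s), which may be a hint that the fixed cutoff comes from somewhere else.

```python
# Values copied from the two configuration lines above.
settings = {
    "brpc_socket_max_unwritten_bytes": 10737418240,
    "thrift_rpc_timeout_ms": 10000,
}

GIB = 1024 ** 3

# Convert bytes to GiB and milliseconds to seconds.
unwritten_gib = settings["brpc_socket_max_unwritten_bytes"] / GIB
timeout_s = settings["thrift_rpc_timeout_ms"] / 1000

print(f"brpc_socket_max_unwritten_bytes = {unwritten_gib:g} GiB")
print(f"thrift_rpc_timeout_ms = {timeout_s:g} s")
```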

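I then requested another host on the same subnet and ran DataX there as well. It exported 10G of data without any error. The difference is that this host, being on the same subnet, has a high transfer rate, so the export finished well within the hour (the log below reports 440s in total, with All Task WaitReaderTime at about 85s).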

2023-12-05 16:38:27.955 [job-0] INFO  StandAloneJobContainerCommunicator - Total 23347520 records, 10644187238 bytes | Speed 24.66MB/s, 56281 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 31.217s |  All Task WaitReaderTime 85.333s | Percentage 0.00%
2023-12-05 16:38:33.408 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select * from sales.sales act where act.visit_date BETWEEN '2023-06-01' and '2023-12-04';
] jdbcUrl:[jdbc:mysql://10.204.128.68:6033/sales?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2023-12-05 16:38:35.359 [0-0-0-writer] INFO  TxtFileWriter$Task - end do write
2023-12-05 16:38:35.436 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[437635]ms
2023-12-05 16:38:35.437 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-12-05 16:38:37.958 [job-0] INFO  StandAloneJobContainerCommunicator - Total 24236840 records, 11050266238 bytes | Speed 38.73MB/s, 88932 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 31.318s |  All Task WaitReaderTime 88.078s | Percentage 100.00%
2023-12-05 16:38:37.958 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2023-12-05 16:38:37.959 [job-0] INFO  JobContainer - DataX Writer.Job [txtfilewriter] do post work.
2023-12-05 16:38:37.960 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do post work.
2023-12-05 16:38:37.960 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2023-12-05 16:38:37.963 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /data/datax/datax/hook
2023-12-05 16:38:37.965 [job-0] INFO  JobContainer - 
	 [total cpu info] => 
		averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
		-1.00%                         | -1.00%                         | -1.00%
                        

	 [total gc info] => 
		 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
		 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
		 PS Scavenge          | 2504               | 1733               | 771                | 9.524s             | 6.814s             | 2.710s             

2023-12-05 16:38:37.965 [job-0] INFO  JobContainer - PerfTrace not enable!
2023-12-05 16:38:37.966 [job-0] INFO  StandAloneJobContainerCommunicator - Total 24236840 records, 11050266238 bytes | Speed 23.95MB/s, 55083 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 31.318s |  All Task WaitReaderTime 88.078s | Percentage 100.00%
2023-12-05 16:38:37.968 [job-0] INFO  JobContainer - 
任务启动时刻                    : 2023-12-05 16:31:17
任务结束时刻                    : 2023-12-05 16:38:37
任务总计耗时                    :                440s
任务平均流量                    :           23.95MB/s
记录写入速度                    :          55083rec/s
读出记录总数                    :            24236840
读写失败总数                    :                   0
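
As a rough cross-check of the figures in the log above, the sketch below recomputes the average throughput and estimates how slow an export of this size would have to be before it runs into the one-hour cutoff. It assumes that DataX's "MB" means 1024 * 1024 bytes (an assumption, chosen because it reproduces the logged 23.95MB/s) and that the cutoff is exactly 3600 seconds.

```python
MIB = 1024 * 1024  # assumption: DataX "MB" is 1024 * 1024 bytes

total_bytes = 11050266238  # "Total ... bytes" from the final progress line
elapsed_s = 440            # total elapsed time from the job summary
cutoff_s = 3600            # the one-hour disconnect described in this thread

# Average throughput over the whole job.
avg_mib_s = total_bytes / elapsed_s / MIB

# Slowest average rate at which this export still finishes before the cutoff.
min_mib_s = total_bytes / cutoff_s / MIB

print(f"average throughput: {avg_mib_s:.2f} MiB/s")
print(f"minimum rate to finish inside one hour: {min_mib_s:.2f} MiB/s")
```

Under these assumptions, an export of this size would cross the one-hour mark whenever the sustained rate drops below roughly 3 MiB/s, which fits the report that slow external downloads are the ones that fail.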

Have you tried increasing brpc_socket_max_unwritten_bytes further?

brpc_socket_max_unwritten_bytes=10737418240

brpc_socket_max_unwritten_bytes=10737418240, so it is already set to 10G.

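This issue was reported by the BI team: downloads over the external network are slow, and the connection always drops at 1 hour.
At first I assumed it was a network problem. Later I requested a host on the same subnet and ran a rate-limited DataX export, and it also exited at 1 hour.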

I don't think this parameter is the cause, because exports larger than 10G also succeed as long as they finish within 1 hour.

Hello. So what you want is for the rate-limited job to run to completion without a timeout error, is that right?

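Yes. Because of network rate limiting, some users' extractions from StarRocks take longer than 1 hour. The problem now is that the download can never finish :joy: it always disconnects at 1 hour.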

OK, let me check whether this is a timeout limit on the task or a timeout setting in DataX.

We see the same behavior with FanRuan (帆软): it disconnects at 1 hour.

3.1.0 has this issue too :joy:

2.5.5 disconnects at 88 minutes :joy:

The latest 3.1 release fixes this.

Is that 3.1.5?

Yes, version 3.1.5.

OK, I'll try it in the test environment first.

@许秀不许秀 @yuchen1019 I tested on 3.1.5 and confirmed that the issue is fixed. Has the fix also been applied to 2.5.16?

It has not been merged into the 2.5 branch so far. If it is convenient for you, we recommend upgrading to the latest 3.1.* release.

Is there a plan to merge it later?