Export to OSS fails: Part number must be an integer between 1 and 10000, inclusive.

Exporting table A's data to OSS. Table A size: 19 TB total across 3 replicas, 250 billion rows.
Error message:

type:RUN_FAIL; msg:export job fail. query id: bd0419e7-bf76-41ce-a53d-08f5f5d3ec1f, fail msg: S3: Fail to upload part of devops/sr/export/A/__starrocks_export_tmp_693ca72c-ff3b-11ee-83d4-0c42a16417cc/A_ass693ca72c-ff3b-11ee-83d4-0c42a16417cc_0_0_2_1_4_3_5_6_8_7_10_9_11_13_12_14_15_17_16_19_18_21_20_23_22_26_25_24_27_28_29_0.csv.1713639981348: Unable to parse ExceptionName: InvalidArgument Message: Part number must be an integer between 1 and 10000, inclusive.

Current export-related parameters:
load_mem_limit=85899345920
export_max_bytes_per_be_per_task=1073741824
export_task_pool_size=20

The current cluster is fairly large: 68 BEs, each with 96 cores and 512 GB of RAM, so load_mem_limit was set to 80 GB. The point of this export is to migrate the data to another, smaller SR cluster. From the error, it looks like the export produced too many multipart-upload parts, exceeding OSS's limit of 10000 parts per object.

https://help.aliyun.com/zh/oss/developer-reference/completemultipartupload?spm=a2c4g.11186623.0.i7#reference-lq1-dtx-wdb
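
As a rough sanity check (a back-of-the-envelope sketch, not the actual splitting logic; it assumes the 16 MB default part size mentioned later in this thread, and that each of the 3 tasks writes roughly one large file):

# Rough arithmetic behind the 10000-part limit. Assumed inputs: 19 TB across
# 3 replicas (from this thread) and the 16 MB default multipart part size
# (see the experimental_s3_* parameters suggested below).
TB = 1024 ** 4
MB = 1024 ** 2
single_replica_bytes = 19 * TB / 3            # ~6.3 TB of unique data
part_size = 16 * MB                           # default multipart part size
max_parts = 10000                             # OSS hard limit per object
print(part_size * max_parts / TB)             # ~0.15 TB (~156 GB) max object size
# If the ~6.3 TB is spread over only 3 tasks, one file could be ~2.1 TB:
print(single_replica_bytes / 3 / part_size)   # ~138,000 parts, far above 10000

Any single file larger than roughly 156 GB trips this error at the default part size, which matches the failure above.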

Cluster info
[StarRocks version] 3.1.0-rc01-64ca37e
[Shared-data (storage-compute separation)?] No

Question 1: How should I tune the export parameters to reduce the CompleteMultipartUpload part count so this data can be exported to OSS successfully?

Question 2: What is the splitting logic for an export job? Why did such a large export get split into only 3 export tasks? Is this strongly tied to load_mem_limit, i.e. did setting it to 80 GB result in fewer splits?

Table A currently has a single partition, 48 buckets, and 288 tablets. (Yes, this partition count for this data volume is a bit absurd; the plan is to see whether we can export directly from the current SR cluster to the other cluster first and optimize the schema afterwards.)

[FE error log]

2024-04-21 01:28:32,890 INFO (thrift-server-pool-833249|834744) [HdfsFsManager.getOSSFileSystem():981] could not find file system for path oss://devops/sr/export/A/__starrocks_export_tmp_693ca72c-ff3b-11ee-83d4-0c42a16417cc create a new one
2024-04-21 01:28:32,911 INFO (thrift-server-pool-833249|834744) [ExportJob.genCoordinators():445] split export job to tasks. job id: 41227733, job query id: 693ca72c-ff3b-11ee-83d4-0c42a16417cc, task idx: 0, task query id: bd0419e7-bf76-41ce-b8d2-b03f21c544b1
2024-04-21 01:28:32,911 INFO (thrift-server-pool-833249|834744) [ExportJob.genCoordinators():445] split export job to tasks. job id: 41227733, job query id: 693ca72c-ff3b-11ee-83d4-0c42a16417cc, task idx: 1, task query id: bd0419e7-bf76-41cf-b8d2-b03f21c544b1
2024-04-21 01:28:32,911 INFO (thrift-server-pool-833249|834744) [ExportJob.genCoordinators():445] split export job to tasks. job id: 41227733, job query id: 693ca72c-ff3b-11ee-83d4-0c42a16417cc, task idx: 2, task query id: bd0419e7-bf76-41d0-b8d2-b03f21c544b1
2024-04-21 03:06:21,075 WARN (thrift-server-pool-1692813|1695046) [Coordinator.updateFragmentExecStatus():1524] exec state report failed status=errorCode IO_ERROR S3: Fail to upload part of devops/sr/export/A/__starrocks_export_tmp_693ca72c-ff3b-11ee-83d4-0c42a16417cc/A693ca72c-ff3b-11ee-83d4-0c42a16417cc_0_1_0_2_4_3_0.csv.1713634121519: Unable to parse ExceptionName: InvalidArgument Message: Part number must be an integer between 1 and 10000, inclusive., query_id=bd0419e7-bf76-41ce-b8d2-b03f21c544b1, instance_id=bd0419e7-bf76-41ce-b8d2-b03f21c544b5
com.starrocks.common.UserException: S3: Fail to upload part of devops/sr/export/A/__starrocks_export_tmp_693ca72c-ff3b-11ee-83d4-0c42a16417cc/A693ca72c-ff3b-11ee-83d4-0c42a16417cc_0_1_0_2_4_3_0.csv.1713634121519: Unable to parse ExceptionName: InvalidArgument Message: Part number must be an integer between 1 and 10000, inclusive.
2024-04-21 03:06:21,078 INFO (export_exporting_sub_task_pool-2|835860) [HdfsFsManager.getOSSFileSystem():981] could not find file system for path oss://devops/sr/export/A/__starrocks_export_tmp_693ca72c-ff3b-11ee-83d4-0c42a16417cc create a new one
2024-04-21 03:06:21,098 WARN (export_exporting_sub_task_pool-2|835860) [ExportExportingTask$ExportExportingSubTask.exec():364] export sub task fail. err: S3: Fail to upload part of devops/sr/export/A/__starrocks_export_tmp_693ca72c-ff3b-11ee-83d4-0c42a16417cc/A693ca72c-ff3b-11ee-83d4-0c42a16417cc_0_1_0_2_4_3_0.csv.1713634121519: Unable to parse ExceptionName: InvalidArgument Message: Part number must be an integer between 1 and 10000, inclusive… task idx: 0, task query id: bd0419e7-bf76-41ce-b8d2-b03f21c544b1. retry: 0, new query id: bd0419e7-bf76-41ce-a53d-08f5f5d3ec1f
2024-04-21 03:08:14,031 WARN (thrift-server-pool-1713471|1715718) [Coordinator.updateFragmentExecStatus():1524] exec state report failed status=errorCode IO_ERROR S3: Fail to upload part of devops/sr/export/A/__starrocks_export_tmp_693ca72c-ff3b-11ee-83d4-0c42a16417cc/A693ca72c-ff3b-11ee-83d4-0c42a16417cc_1_0_0.csv.1713634122522: Unable to parse ExceptionName: InvalidArgument Message: Part number must be an integer between 1 and 10000, inclusive., query_id=bd0419e7-bf76-41cf-b8d2-b03f21c544b1, instance_id=bd0419e7-bf76-41cf-b8d2-b03f21c544b2
com.starrocks.common.UserException: S3: Fail to upload part of devops/sr/export/A/__starrocks_export_tmp_693ca72c-ff3b-11ee-83d4-0c42a16417cc/A693ca72c-ff3b-11ee-83d4-0c42a16417cc_1_0_0.csv.1713634122522: Unable to parse ExceptionName: InvalidArgument Message: Part number must be an integer between 1 and 10000, inclusive.
2024-04-21 03:08:14,092 WARN (export_exporting_sub_task_pool-3|835861) [ExportExportingTask$ExportExportingSubTask.exec():364] export sub task fail. err: S3: Fail to upload part of devops/sr/export/A/__starrocks_export_tmp_693ca72c-ff3b-11ee-83d4-0c42a16417cc/A693ca72c-ff3b-11ee-83d4-0c42a16417cc_1_0_0.csv.1713634122522: Unable to parse ExceptionName: InvalidArgument Message: Part number must be an integer between 1 and 10000, inclusive… task idx: 1, task query id: bd0419e7-bf76-41cf-b8d2-b03f21c544b1. retry: 0, new query id: bd0419e7-bf76-41cf-90b7-ebffca4af287
2024-04-21 03:09:54,065 WARN (thrift-server-pool-1727802|1730064) [Coordinator.updateFragmentExecStatus():1524] exec state report failed status=errorCode IO_ERROR S3: Fail to upload part of devops/sr/export/A/__starrocks_export_tmp_693ca72c-ff3b-11ee-83d4-0c42a16417cc/A693ca72c-ff3b-11ee-83d4-0c42a16417cc_2_0_0.csv.1713634121484: Unable to parse ExceptionName: InvalidArgument Message: Part number must be an integer between 1 and 10000, inclusive., query_id=bd0419e7-bf76-41d0-b8d2-b03f21c544b1, instance_id=bd0419e7-bf76-41d0-b8d2-b03f21c544b2
com.starrocks.common.UserException: S3: Fail to upload part of devops/sr/export/A/__starrocks_export_tmp_693ca72c-ff3b-11ee-83d4-0c42a16417cc/A693ca72c-ff3b-11ee-83d4-0c42a16417cc_2_0_0.csv.1713634121484: Unable to parse ExceptionName: InvalidArgument Message: Part number must be an integer between 1 and 10000, inclusive.
2024-04-21 03:09:54,067 WARN (export_exporting_sub_task_pool-4|835862) [ExportExportingTask$ExportExportingSubTask.exec():364] export sub task fail. err: S3: Fail to upload part of devops/sr/export/A/__starrocks_export_tmp_693ca72c-ff3b-11ee-83d4-0c42a16417cc/A693ca72c-ff3b-11ee-83d4-0c42a16417cc_2_0_0.csv.1713634121484: Unable to parse ExceptionName: InvalidArgument Message: Part number must be an integer between 1 and 10000, inclusive… task idx: 2, task query id: bd0419e7-bf76-41d0-b8d2-b03f21c544b1. retry: 0, new query id: bd0419e7-bf76-41d0-b8dd-081535b1484f
2024-04-21 04:42:02,617 WARN (thrift-server-pool-2472853|2475750) [Coordinator.updateFragmentExecStatus():1524] exec state report failed status=errorCode IO_ERROR S3: Fail to upload part of devops/sr/export/A/__starrocks_export_tmp_693ca72c-ff3b-11ee-83d4-0c42a16417cc/A693ca72c-ff3b-11ee-83d4-0c42a16417cc_0_0_2_1_4_3_5_6_8_7_10_9_11_13_12_14_15_17_16_19_18_21_20_23_22_26_25_24_27_28_29_0.csv.1713639981348: Unable to parse ExceptionName: InvalidArgument Message: Part number must be an integer between 1 and 10000, inclusive., query_id=bd0419e7-bf76-41ce-a53d-08f5f5d3ec1f, instance_id=bd0419e7-bf76-41ce-a53d-08f5f5d3ec3d
com.starrocks.common.UserException: S3: Fail to upload part of devops/sr/export/A/__starrocks_export_tmp_693ca72c-ff3b-11ee-83d4-0c42a16417cc/A693ca72c-ff3b-11ee-83d4-0c42a16417cc_0_0_2_1_4_3_5_6_8_7_10_9_11_13_12_14_15_17_16_19_18_21_20_23_22_26_25_24_27_28_29_0.csv.1713639981348: Unable to parse ExceptionName: InvalidArgument Message: Part number must be an integer between 1 and 10000, inclusive.
2024-04-21 04:42:02,620 WARN (export_exporting_sub_task_pool-2|835860) [ExportExportingTask$ExportExportingSubTask.onSubTaskFailed():435] export sub task fail. task idx: 0, task query id: bd0419e7-bf76-41ce-a53d-08f5f5d3ec1f, err: export job fail. query id: bd0419e7-bf76-41ce-a53d-08f5f5d3ec1f, fail msg: S3: Fail to upload part of devops/sr/export/A/__starrocks_export_tmp_693ca72c-ff3b-11ee-83d4-0c42a16417cc/A693ca72c-ff3b-11ee-83d4-0c42a16417cc_0_0_2_1_4_3_5_6_8_7_10_9_11_13_12_14_15_17_16_19_18_21_20_23_22_26_25_24_27_28_29_0.csv.1713639981348: Unable to parse ExceptionName: InvalidArgument Message: Part number must be an integer between 1 and 10000, inclusive.
2024-04-21 04:42:02,620 INFO (export_exporting_job_pool-1|835859) [HdfsService.deletePath():72] receive a delete path request, path: oss://devops/sr/export/A/__starrocks_export_tmp_693ca72c-ff3b-11ee-83d4-0c42a16417cc
2024-04-21 04:42:03,186 INFO (export_exporting_job_pool-1|835859) [FsStats.logStats():18] cmd=delete, src=oss://devops/sr/export/A/__starrocks_export_tmp_693ca72c-ff3b-11ee-83d4-0c42a16417cc, dst=null, size=0, parameter=null, time-in-ms=488, version=4.6.8
2024-04-21 04:42:03,186 INFO (export_exporting_job_pool-1|835859) [ExportJob.cancelInternal():810] remove export temp path success, path: oss://devops/sr/export/A/__starrocks_export_tmp_693ca72c-ff3b-11ee-83d4-0c42a16417cc
2024-04-21 04:42:03,187 INFO (export_exporting_job_pool-1|835859) [ExportJob.cancelInternal():832] export job cancelled. job: ExportJob [jobId=41227733, dbId=10929, tableId=4932516, state=CANCELLED, path=oss://devops/sr/export/A/, partitions=(null), progress=100, createTimeMs=2024-04-21 01:28:32, exportStartTimeMs=2024-04-21 01:28:36, exportFinishTimeMs=2024-04-21 04:42:03, failMsg=ExportFailMsg [cancelType=RUN_FAIL, msg=export job fail. query id: bd0419e7-bf76-41ce-a53d-08f5f5d3ec1f, fail msg: S3: Fail to upload part of devops/sr/export/A/__starrocks_export_tmp_693ca72c-ff3b-11ee-83d4-0c42a16417cc/A693ca72c-ff3b-11ee-83d4-0c42a16417cc_0_0_2_1_4_3_5_6_8_7_10_9_11_13_12_14_15_17_16_19_18_21_20_23_22_26_25_24_27_28_29_0.csv.1713639981348: Unable to parse ExceptionName: InvalidArgument Message: Part number must be an integer between 1 and 10000, inclusive.], tmp files=(), files=()]

up up

@lvlouisaslia Could you please take a look at this? I don't know who else to turn to, please help me out.

Check whether there are any corresponding error logs on the BEs.

Adjust the values of the following two parameters in be.conf:

experimental_s3_max_single_part_size = 16777216
experimental_s3_min_upload_part_size = 16777216

The default is 16 MB; try increasing it to 100 MB or larger.
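
For example, in be.conf (a sketch of the suggestion above; 104857600 bytes = 100 MB):

experimental_s3_max_single_part_size = 104857600
experimental_s3_min_upload_part_size = 104857600

At 100 MB parts, the per-object cap becomes 100 MB x 10000, roughly 976 GB. If each of the 3 tasks really is writing files on the order of 2 TB (see the estimate earlier in the thread), an even larger part size, or a split into more and smaller files, may still be needed.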

OK, thanks a lot, I'll give it a try.

Why not use starrocks-cluster-sync for cluster-to-cluster migration? That's the tool; I tried it a while back, it handles everything in one step and seemed to work fine.

What's the largest single-table data volume you've tried with it? I'm migrating from a shared-nothing cluster to a shared-data one, and our shared-nothing cluster is on 3.1.0. I ran into some issues with that tool before, so I switched to export-then-import instead.