When loading MinIO data with FILES, paths of partitioned tables that contain = fail

To help us locate your issue faster, please provide the following information, thanks.
【Details】When the s3a path contains no =, querying and loading work fine; when it does contain =, e.g. dt=20240301, it fails.
【Background】
SELECT * FROM FILES
(
"aws.s3.endpoint" = "…",
"path" = "s3a://bucket/datawarehouse/tables/dwd_financial_be_fee_mi/dt=202403/*",
"aws.s3.enable_ssl" = "false",
"aws.s3.access_key" = "…",
"aws.s3.secret_key" = "…",
"format" = "orc",
"aws.s3.use_aws_sdk_default_behavior" = "false",
"aws.s3.use_instance_profile" = "false",
"aws.s3.enable_path_style_access" = "true"
)
LIMIT 3
;

A packet capture shows the HTTP API is used; the request is: GET /bucket/?list-type=2&delimiter=%2F&max-keys=5000&prefix=datawarehouse%2Ftables%2Fdwd_financial_be_fee_mi%2Fdt%3D202403%2F&fetch-owner=false HTTP/1.1\r\n

For the HTTP API, = is presumably a sensitive character, and the = in the path would cause parsing problems.

【Business impact】Partitioned data cannot be loaded successfully
【Storage-compute separation】No
【StarRocks version】3.1.9
【Cluster size】1 FE + 2 BE
【Machine specs】8C/32G/10GbE
【Load/unload method】FILES
【Contact】55309107@qq.com
【Attachments】

What is the error message? The path s3a://bucket/datawarehouse/tables/dwd_financial_be_fee_mi/dt=202403/ does exist, right? And the partition is by month?

The path does exist.
If I move the files under dt=202403 up to s3a://bucket/datawarehouse/tables/dwd_financial_be_fee_mi/, they can be accessed; with the files under dt=202403 I get this error:

com.starrocks.sql.analyzer.StorageAccessException: Access storage error. Error message: failed to get file schema, path: s3a://bucket/datawarehouse/tables/dwd_financial_be_fee_mi/dt=202403/, error: [BE access S3 file failed, SdkResponseCode=403, SdkErrorType=15, SdkErrorMessage=No response body.]
at com.starrocks.sql.analyzer.QueryAnalyzer.resolveTableFunctionTable(QueryAnalyzer.java:1004) ~[starrocks-fe.jar:?]
at com.starrocks.sql.analyzer.QueryAnalyzer.access$100(QueryAnalyzer.java:97) ~[starrocks-fe.jar:?]
at com.starrocks.sql.analyzer.QueryAnalyzer$Visitor.resolveTableRef(QueryAnalyzer.java:247) ~[starrocks-fe.jar:?]
at com.starrocks.sql.analyzer.QueryAnalyzer$Visitor.visitSelect(QueryAnalyzer.java:194) ~[starrocks-fe.jar:?]
at com.starrocks.sql.analyzer.QueryAnalyzer$Visitor.visitSelect(QueryAnalyzer.java:114) ~[starrocks-fe.jar:?]
at com.starrocks.sql.ast.SelectRelation.accept(SelectRelation.java:242) ~[starrocks-fe.jar:?]
at com.starrocks.sql.analyzer.QueryAnalyzer$Visitor.process(QueryAnalyzer.java:119) ~[starrocks-fe.jar:?]
at com.starrocks.sql.analyzer.QueryAnalyzer$Visitor.visitQueryRelation(QueryAnalyzer.java:134) ~[starrocks-fe.jar:?]
at com.starrocks.sql.analyzer.QueryAnalyzer$Visitor.visitQueryStatement(QueryAnalyzer.java:124) ~[starrocks-fe.jar:?]
at com.starrocks.sql.analyzer.QueryAnalyzer$Visitor.visitQueryStatement(QueryAnalyzer.java:114) ~[starrocks-fe.jar:?]
at com.starrocks.sql.ast.QueryStatement.accept(QueryStatement.java:56) ~[starrocks-fe.jar:?]
at com.starrocks.sql.analyzer.QueryAnalyzer$Visitor.process(QueryAnalyzer.java:119) ~[starrocks-fe.jar:?]
at com.starrocks.sql.analyzer.QueryAnalyzer.analyze(QueryAnalyzer.java:107) ~[starrocks-fe.jar:?]
at com.starrocks.sql.analyzer.AnalyzerVisitor.visitQueryStatement(AnalyzerVisitor.java:351) ~[starrocks-fe.jar:?]
at com.starrocks.sql.analyzer.AnalyzerVisitor.visitQueryStatement(AnalyzerVisitor.java:134) ~[starrocks-fe.jar:?]
at com.starrocks.sql.ast.QueryStatement.accept(QueryStatement.java:56) ~[starrocks-fe.jar:?]
at com.starrocks.sql.ast.AstVisitor.visit(AstVisitor.java:57) ~[starrocks-fe.jar:?]
at com.starrocks.sql.analyzer.AnalyzerVisitor.analyze(AnalyzerVisitor.java:136) ~[starrocks-fe.jar:?]
at com.starrocks.sql.analyzer.Analyzer.analyze(Analyzer.java:34) ~[starrocks-fe.jar:?]
at com.starrocks.sql.StatementPlanner.plan(StatementPlanner.java:78) ~[starrocks-fe.jar:?]
at com.starrocks.sql.StatementPlanner.plan(StatementPlanner.java:62) ~[starrocks-fe.jar:?]
at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:482) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:375) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:481) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:767) ~[starrocks-fe.jar:?]
at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:69) ~[starrocks-fe.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:842) ~[?:?]
Caused by: com.starrocks.common.DdlException: failed to get file schema, path: s3a://bucket/datawarehouse/tables/dwd_financial_be_fee_mi/dt=202403/
, error: [BE access S3 file failed, SdkResponseCode=403, SdkErrorType=15, SdkErrorMessage=No response body.]
at com.starrocks.catalog.TableFunctionTable.getFileSchema(TableFunctionTable.java:235) ~[starrocks-fe.jar:?]
at com.starrocks.catalog.TableFunctionTable.<init>(TableFunctionTable.java:77) ~[starrocks-fe.jar:?]
at com.starrocks.sql.analyzer.QueryAnalyzer.resolveTableFunctionTable(QueryAnalyzer.java:1002) ~[starrocks-fe.jar:?]
… 28 more

Hi, you can try adding aws_sdk_enable_compliant_rfc3986_encoding=true to be.conf.

I have added it and restarted the FE:
[root@fe-1 StarRocks-3.1.9]# cat fe/conf/fe.conf | grep aws
aws_sdk_enable_compliant_rfc3986_encoding=true

It still errors out after running:
SQL Error [1064] [42000]: Access storage error. Error message: failed to get file schema, path: s3a://bucket/datawarehouse/tables/dws_financial_expense_be_final_monthly/dt=202403/*, error: [BE access S3 file failed, SdkResponseCode=403, SdkErrorType=15, SdkErrorMessage=No response body.]

After switching to a URI without =, by copying the files from dt=202403 into its parent directory, s3a://bucket/datawarehouse/tables/dws_financial_expense_be_final_monthly/* successfully returns the data.
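
To make the comparison concrete, here is a minimal sketch of the two calls described above; the aws.s3.* connection properties are assumed to be the same as in the original query at the top of this post, with credentials omitted:

-- Fails with SdkResponseCode=403: the partition directory name contains =
SELECT * FROM FILES
(
"path" = "s3a://bucket/datawarehouse/tables/dws_financial_expense_be_final_monthly/dt=202403/*",
"format" = "orc"
-- plus the same aws.s3.* properties as in the original query
)
LIMIT 3;

-- Works: the same files copied into the parent directory, no = in the path
SELECT * FROM FILES
(
"path" = "s3a://bucket/datawarehouse/tables/dws_financial_expense_be_final_monthly/*",
"format" = "orc"
-- plus the same aws.s3.* properties as in the original query
)
LIMIT 3;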


It seems related to partitioning. I also tested through a catalog with two tables whose only difference is that one is partitioned and the other is not; the other columns and contents are exactly the same. Querying the partitioned table fails, while the non-partitioned one works fine. See the two screenshots above: temp_partition is the partitioned one and its query errors out, while temp_nopartition has no partitions and returns data.
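
Roughly, the catalog test looked like this; the catalog and database names below are placeholders, only the table names come from the actual test:

-- Non-partitioned table: returns data as expected
SELECT * FROM hive_catalog.test_db.temp_nopartition LIMIT 3;

-- Partitioned table with identical columns and data: fails with the same S3 403 error
SELECT * FROM hive_catalog.test_db.temp_partition LIMIT 3;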

The Hive CREATE TABLE statements for these two tables are in the screenshots below, where you can also see that querying temp_partition in Hive does return data:
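
The exact DDL is only in the screenshots, but based on the description above the two definitions would differ only in the partition clause, roughly along these lines (column names and types are placeholders):

-- Non-partitioned table: queries through FILES / the catalog work
CREATE TABLE temp_nopartition (
col1 STRING,
col2 BIGINT
)
STORED AS ORC;

-- Partitioned table: identical columns, but its files land under dt=<value>/ directories,
-- and querying it from StarRocks hits the 403 error
CREATE TABLE temp_partition (
col1 STRING,
col2 BIGINT
)
PARTITIONED BY (dt STRING)
STORED AS ORC;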