How can StarRocks query Delta Lake tables in Azure Databricks/ADLS Gen2?

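【Details】How can StarRocks query Delta Lake tables in Azure Databricks/ADLS Gen2?
【Background】I am trying to use StarRocks to connect to Delta Lake tables in Databricks for analysis. However, every example in the official tutorial requires a Hive Metastore, whereas Databricks uses Unity Catalog. We tried several configurations, and none of them worked.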

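We noticed that the community question "How to use the Catalog feature without HMS?" was answered with "not supported". However, the community article "Tencent's experimentation platform builds its lakehouse foundation on StarRocks" (Featured Articles / User Cases) describes a setup that does support Databricks. We therefore made a number of attempts, including connecting directly to the Delta Lake files stored in Azure Data Lake Storage and accessing them through Unity Catalog.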

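Delta Lake also supports UniForm, and Databricks provides Iceberg catalog support; see the article "Use UniForm to read Delta tables with Iceberg clients" and the Snowflake example "How to Read Unity Catalog Tables in Snowflake". We could not get this approach working either, so we are asking the community for help.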

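【Business impact】We want to use StarRocks as our unified OLAP engine. If it cannot use the data in Databricks, this plan will be rejected.
【Storage-compute separation】Yes
【StarRocks version】version info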
Version: 3.3.7
Git: 00177de
Build Info: StarRocks@localhost (Ubuntu 22.04.4 LTS)
Build Time: 2024-11-29 09:41:21
【Cluster size】1 FE + 1 BE (FE and BE deployed on the same machine)
【Machine info】24C/64GB
【Contact】newforesee@outlook.com
【Attachments】Error when trying the UniForm approach: SQL Error [1064] [HY000]: (conn=4560) Not authorized: {"error_code":401,"message":"Credential was not sent or was of an unsupported type for this API."}

Could you post the steps you used to connect?

【Following the official documentation】

CREATE EXTERNAL CATALOG deltalake_catalog_hms
PROPERTIES
(
"type" = "deltalake",
"hive.metastore.type" = "hive",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083",
"azure.adls2.oauth2_client_id" = "<service_client_id>",
"azure.adls2.oauth2_client_secret" = "<service_principal_client_secret>",
"azure.adls2.oauth2_client_endpoint" = "<service_principal_client_endpoint>"
);

Every example in the official tutorial requires a Hive Metastore, but Databricks uses Unity Catalog, so we have no way to obtain a Thrift service URI.

【REST catalog approach】

Delta Lake also supports UniForm, and Databricks provides Iceberg catalog support; see the article "Use UniForm to read Delta tables with Iceberg clients" and the Snowflake example "How to Read Unity Catalog Tables in Snowflake". We could not get this working.

CREATE EXTERNAL CATALOG 'iceberg'
COMMENT "External catalog to Apache Iceberg on Gen2"
PROPERTIES
(
"type"="iceberg",
"iceberg.catalog.type"="rest",
"iceberg.catalog.uri"="https://adb-3xxxxxxxxxxxxxxx64.4.azuredatabricks.net/api/2.1/unity-catalog/iceberg",
"iceberg.catalog.warehouse"="warehouse",
"azure.blob.access_key" = "a5916818-cxxxxxxxxxxxxxxxxxxxxfxxxxxx1864",
"azure.blob.secret_key"="doseb3xxxxxxxxxxxxxxxxxxxxxxxx791ae",
"azure.blob.endpoint" = "https://starrocksadlsgen2.blob.core.windows.net/;QueueEndpoint=https://starrocksadlsgen2.queue.core.windows.net/;FileEndpoint=https://starrocksadlsgen2.file.core.windows.net/;TableEndpoint=https://starrocksadlsgen2.table.core.windows.net/;SharedAccessSignature=sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupyx&se=2025-12-10T14:23:23Z&st=2024-12-10T06:23:23Z&spr=https,http&sig=NsdnvDAIP5xxxxxxxxxxxxxxxxxxxxxxxxxxxxNzBtDs%3D",
"azure.blob.enable_path_style_access"="true",
"azure.blob.shared_key" = "sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupyx&se=2025-12-10T14:23:23Z&st=2024-12-10T06:23:23Z&spr=https,http&sig=NsdnxxxxxxxxxxxxxxxxxxxxxxNzBtDs%3D",
"client.factory"="com.starrocks.connector.iceberg.IcebergAwsClientFactory"
);

SHOW CATALOGS;

DROP catalog iceberg_test;

SET CATALOG iceberg;

SHOW DATABASES;

Error:

Executing: SHOW DATABASES;
SQL Error [1064] [HY000]: (conn=4560) Not authorized: {"error_code":401,"message":"Credential was not sent or was of an unsupported type for this API."}
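
One possible reading of the 401 is that the catalog definition above carries only storage credentials and sends no bearer token to the Unity Catalog endpoint. The following is a minimal sketch of a definition that does pass a token, not a verified fix. It assumes that the `iceberg.catalog.security` and `iceberg.catalog.oauth2.token` properties described in recent StarRocks Iceberg REST catalog documentation are honored by the version in use (this has not been checked against 3.3.7), that a Databricks personal access token is accepted as the bearer token, and that the warehouse value should be the Unity Catalog catalog name. The catalog name `uc_iceberg` and the placeholders `<workspace-host>`, `<uc_catalog_name>`, and `<databricks_pat>` are hypothetical; the `azure.adls2.oauth2_*` properties are reused from the Delta Lake example earlier in this thread.

```sql
-- Sketch only: the token-related property names are an assumption (see above).
CREATE EXTERNAL CATALOG uc_iceberg
COMMENT "Unity Catalog via Iceberg REST (hypothetical example)"
PROPERTIES
(
    "type" = "iceberg",
    "iceberg.catalog.type" = "rest",
    "iceberg.catalog.uri" = "https://<workspace-host>/api/2.1/unity-catalog/iceberg",
    -- Assumed: the warehouse is the name of the Unity Catalog catalog.
    "iceberg.catalog.warehouse" = "<uc_catalog_name>",
    -- Assumed: bearer-token authentication with a Databricks personal access token.
    "iceberg.catalog.security" = "oauth2",
    "iceberg.catalog.oauth2.token" = "<databricks_pat>",
    -- ADLS Gen2 service principal credentials, as in the Delta Lake example.
    "azure.adls2.oauth2_client_id" = "<service_client_id>",
    "azure.adls2.oauth2_client_secret" = "<service_principal_client_secret>",
    "azure.adls2.oauth2_client_endpoint" = "<service_principal_client_endpoint>"
);

-- If the catalog is created successfully, listing databases exercises the token.
SET CATALOG uc_iceberg;
SHOW DATABASES;
```

If `SHOW DATABASES` still returns 401 with a definition like this, the token itself or the endpoint path would be the next thing to check.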

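【JDBC catalog】

We also found that the official StarRocks documentation describes a JDBC catalog. Below is our attempt to connect to a Databricks SQL warehouse over JDBC. The official documentation does not list the Databricks JDBC driver as supported; we tried it anyway, and the Databricks driver could not be loaded.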

CREATE EXTERNAL CATALOG jdbc
PROPERTIES
(
"type"="jdbc",
"user"="token",
"password"="dapia59xxxxxxxxxxxxxxxxxbc2e-3",
"jdbc_uri"="jdbc:databricks://adb-567856787656787.3.azuredatabricks.net:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/78e8ed94bafdc448;",
"driver_url"="https://repo1.maven.org/maven2/com/databricks/databricks-jdbc/2.6.40/databricks-jdbc-2.6.40.jar",
"driver_class"="com.databricks.client.jdbc.Driver"
);

SHOW CATALOGS;

DROP catalog jdbc;

SET CATALOG jdbc;

SHOW DATABASES FROM jdbc;

Error:

Executing: SHOW DATABASES FROM jdbc;
SQL Error [1064] [HY000]: (conn=10) doesn't find class: com.databricks.client.jdbc.Driver

These are the steps we have tried. How can we query Delta tables in Databricks?