hive catalog 无法实现分区自动同步

为了更快的定位您的问题,请提供以下信息,谢谢
【详述】期望使用starrocks-实现对数据湖做联邦查询,根据官方配置提示,配置了hive-site.xml文件,重启hivemeta store服务,同时开启全局元数据同步和单个catalog同步,重启了FE服务,对于表中新增的分区,starocks无法正常识别
【背景】1、修改hive-site.xml,重启服务(hive、fe)2、在hive中使用beeline进行建库、创建分区表
1、 create database demo;
2、 use demo;
3、 create table t_demo1(id int) partitioned by(dt date) stored as orc; //可以正常同步
4、 insert into t_demo1 values(2,‘2012-10-26’); //可以正常同步
5、 insert into t_demo1 values(3,‘2012-10-27’);//无法正常同步

第3、4步骤元数据可以正常同步,但是第5步骤的新增分区在starrocks中无法显示

【业务影响】
【StarRocks版本】例如:3.1.3-384ba23
【集群规模】例如:fe与be混部 单机
【机器信息】CPU虚拟核/内存/网卡,例如:48C/64G/万兆
【联系方式】Starocks社区群16-中洲
【附件】
1)starocks配置


starrocks查询
2)hive中执行

必须执行refresh external table 指令才能够识别新增的分区

FE 日志文件中搜索 event id ,然后通过查看事件 ID 来检查事件监听器是否配置成功。如果配置失败,则所有 event id 均为 0

另外不refresh external table的话,5秒之后能正常同步吗?

refresh external table 是可以正常同步的。


首次创建的新表和分区是可以的,后续再创建分区和写入数据就看不到了,一下是我的hive-site.xml文件的配置信息:

<configuration>
   <property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://ip:3306/hive?createDatabaseIfNotExist=true</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>***</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://CentOS:9083</value>
</property>

<property>
<name>hive.server2.active.passive.ha.enable</name>
<value>true</value>
</property>
<!--监听-->
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.notifications.add.thrift.objects</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.alter.notifications.basic</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.dml.events</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.transactional.event.listeners</name>
<value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
</property>
<property>
<name>hive.metastore.event.db.listener.timetolive</name>
<value>172800s</value>
</property>
<property>
<name>hive.metastore.server.max.message.size</name>
<value>858993459</value>
</property>
</configuration>

下面是catalog的ddl语句

mysql> show create catalog hive_catalog_hms;
+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Catalog          | Create Catalog                                                                                                                                                                                                  |
+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| hive_catalog_hms | CREATE EXTERNAL CATALOG `hive_catalog_hms`
PROPERTIES ("hive.metastore.type"  =  "hive",
"hive.metastore.uris"  =  "thrift://CentOS:9083",
"type"  =  "hive",
"enable_hms_events_incremental_sync"  =  "true"
) |
+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

hive中的sql脚本

0: jdbc:hive2://CentOS:10000> create database demo;
No rows affected (0.089 seconds)
0: jdbc:hive2://CentOS:10000> use demo;
No rows affected (0.047 seconds)
0: jdbc:hive2://CentOS:10000>  create table t_demo1(id int) partitioned by(dt date) stored as orc;//可以识别
No rows affected (0.092 seconds)
0: jdbc:hive2://CentOS:10000> insert into t_demo1 values(2,'2012-10-26');//可以识别
No rows affected (31.987 seconds)
0: jdbc:hive2://CentOS:10000> insert into t_demo1 values(3,'2012-10-27');//等待多久,都无法识别,必须执行refresh external table才可以同步
No rows affected (31.128 seconds)
0: jdbc:hive2://CentOS:10000>

后台能看到日志event id日志

2023-10-13 20:06:47,790 WARN (hms-event-processor-executor-2|1804) [AlterPartitionEvent.process():108] Partition [Catalog: [hive_catalog_hms], Table: [demo.t_demo1]. Partition name: [HivePartitionName{databaseName='demo', tableName='t_demo1', partitionValues=[2012-10-27], partitionNames=Optional[dt=2012-10-27]}] ] doesn't exist in cache on event id [53]
2023-10-13 20:06:47,790 WARN (hms-event-processor-executor-2|1804) [AlterPartitionEvent.process():108] Partition [Catalog: [hive_catalog_hms], Table: [demo.t_demo1]. Partition name: [HivePartitionName{databaseName='demo', tableName='t_demo1', partitionValues=[2012-10-27], partitionNames=Optional[dt=2012-10-27]}] ] doesn't exist in cache on event id [53]
2023-10-13 20:07:40,762 INFO (com.starrocks.connector.hive.events.MetastoreEventsProcessor|29) [HiveMetastore.getNextEventResponse():206] Received 2 events. Start event id : 54. Last synced id : 53 on catalog : resource_mapping_inside_catalog_hive_hive0
2023-10-13 20:07:40,771 INFO (com.starrocks.connector.hive.events.MetastoreEventsProcessor|29) [HiveMetastore.getNextEventResponse():206] Received 2 events. Start event id : 54. Last synced id : 53 on catalog : hive_catalog_hms
2023-10-13 20:07:40,772 WARN (hms-event-processor-executor-0|181) [AlterPartitionEvent.process():108] Partition [Catalog: [hive_catalog_hms], Table: [demo.t_demo1]. Partition name: [HivePartitionName{databaseName='demo', tableName='t_demo1', partitionValues=[2012-10-28], partitionNames=Optional[dt=2012-10-28]}] ] doesn't exist in cache on event id [55]
2023-10-13 20:07:40,772 WARN (hms-event-processor-executor-0|181) [AlterPartitionEvent.process():108] Partition [Catalog: [hive_catalog_hms], Table: [demo.t_demo1]. Partition name: [HivePartitionName{databaseName='demo', tableName='t_demo1', partitionValues=[2012-10-28], partitionNames=Optional[dt=2012-10-28]}] ] doesn't exist in cache on event id [55]
2023-10-13 20:09:46,095 INFO (com.starrocks.connector.hive.events.MetastoreEventsProcessor|29) [HiveMetastore.getNextEventResponse():206] Received 2 events. Start event id : 56. Last synced id : 55 on catalog : resource_mapping_inside_catalog_hive_hive0
2023-10-13 20:09:46,103 INFO (com.starrocks.connector.hive.events.MetastoreEventsProcessor|29) [HiveMetastore.getNextEventResponse():206] Received 2 events. Start event id : 56. Last synced id : 55 on catalog : hive_catalog_hms
2023-10-13 20:09:46,104 WARN (hms-event-processor-executor-3|1837) [AlterPartitionEvent.process():108] Partition [Catalog: [hive_catalog_hms], Table: [demo.t_demo1]. Partition name: [HivePartitionName{databaseName='demo', tableName='t_demo1', partitionValues=[2012-10-29], partitionNames=Optional[dt=2012-10-29]}] ] doesn't exist in cache on event id [57]
2023-10-13 20:09:46,104 WARN (hms-event-processor-executor-3|1837) [AlterPartitionEvent.process():108] Partition [Catalog: [hive_catalog_hms], Table: [demo.t_demo1]. Partition name: [HivePartitionName{databaseName='demo', tableName='t_demo1', partitionValues=[2012-10-29], partitionNames=Optional[dt=2012-10-29]}] ] doesn't exist in cache on event id [57]

但是查询starocks如下

C:\Program Files\MySQL\MySQL Server 8.0\bin>mysql.exe -h CentOS -P9030 -uroot -proot
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 87
Server version: 5.1.0 3.1.3-384ba23

Copyright (c) 2000, 2023, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>
mysql> show catalogs;
+------------------+----------+------------------------------------------------------------------+
| Catalog          | Type     | Comment                                                          |
+------------------+----------+------------------------------------------------------------------+
| default_catalog  | Internal | An internal catalog contains this cluster's self-managed tables. |
| hive_catalog_hms | Hive     | NULL                                                             |
| hudi_catalog_hms | Hudi     | NULL                                                             |
+------------------+----------+------------------------------------------------------------------+
3 rows in set (0.00 sec)

mysql> show create catalog  hive_catalog_hms;
+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Catalog          | Create Catalog                                                                                                                                                                                                  |
+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| hive_catalog_hms | CREATE EXTERNAL CATALOG `hive_catalog_hms`
PROPERTIES ("hive.metastore.type"  =  "hive",
"hive.metastore.uris"  =  "thrift://CentOS:9083",
"type"  =  "hive",
"enable_hms_events_incremental_sync"  =  "true"
) |
+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> set catalog hive_catalog_hms;
Query OK, 0 rows affected (0.00 sec)

mysql> show databases;
+----------+
| Database |
+----------+
| default  |
| demo     |
| jiangzz  |
| jtlas    |
| ods      |
| stg      |
+----------+
6 rows in set (0.01 sec)

mysql> use demo;
Database changed
mysql> show tables;
+----------------+
| Tables_in_demo |
+----------------+
| t_demo1        |
+----------------+
1 row in set (0.01 sec)

mysql> select * from t_demo1;
+------+------------+
| id   | dt         |
+------+------------+
|    2 | 2012-10-26 |
+------+------------+
1 row in set (0.10 sec)

mysql>

fe.log里有比如Unable to fetch notifications from metastore的ERROR日志吗,或者可以的话能提供一下压缩后的fe.log吗?
另外当前不太推荐使用元数据的自动同步了,更推荐使用周期性刷新元数据缓存,https://docs.starrocks.io/zh-cn/latest/data_source/catalog/hive_catalog#周期性刷新元数据缓存