com.starrocks.connector.hive.events.MetastoreNotificationFetchException: Failed to get next notification based on last event id

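To help us locate your problem faster, please provide the following information. Thanks.
[Details] When we query Hudi data through a Catalog in StarRocks, StarRocks does not automatically pick up incremental updates to the Hive metadata. The symptom is as follows. A Flink job has written a new row into the Hudi table, and the row is already visible from the Hive client. When we query through the Hudi catalog, the first query does not return the row, but running the same query again does.
[Background] While troubleshooting, we found the following exceptions logged for Hive: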
Caused by: com.starrocks.connector.hive.events.MetastoreNotificationFetchException: Failed to get next notification based on last event id: 2652902, msg: null
2023-07-21 17:01:39,142 INFO (com.starrocks.connector.hive.events.MetastoreEventsProcessor|30) [MetastoreEventsProcessor.runAfterCatalogReady():204] Start to pull [[catalog_juslink_hive]] events. resource mapping catalog size [0], normal catalog log size [1]
2023-07-21 17:01:39,142 INFO (com.starrocks.connector.hive.events.MetastoreEventsProcessor|30) [MetastoreEventsProcessor.getNextHMSEvents():111] Start to pull events on catalog [catalog_juslink_hive]
2023-07-21 17:01:39,145 INFO (com.starrocks.connector.hive.events.MetastoreEventsProcessor|30) [HiveMetaStoreThriftClient.open():415] Trying to connect to metastore with URI thrift://cdh2:9083
2023-07-21 17:01:39,145 INFO (com.starrocks.connector.hive.events.MetastoreEventsProcessor|30) [HiveMetaStoreThriftClient.open():495] Opened a connection to metastore, current connections: 2
2023-07-21 17:01:39,146 INFO (com.starrocks.connector.hive.events.MetastoreEventsProcessor|30) [HiveMetaStoreThriftClient.open():550] Connected to metastore.
2023-07-21 17:01:39,358 ERROR (com.starrocks.connector.hive.events.MetastoreEventsProcessor|30) [HiveMetaClient.callRPC():144] Failed to get next notification based on last event id: 2652902
at com.starrocks.connector.hive.events.MetastoreEventsProcessor.getNextHMSEvents(MetastoreEventsProcessor.java:118) [starrocks-fe.jar:?]
at com.starrocks.connector.hive.events.MetastoreEventsProcessor.getNextHMSEvents(MetastoreEventsProcessor.java:143) [starrocks-fe.jar:?]
at com.starrocks.connector.hive.events.MetastoreEventsProcessor.runAfterCatalogReady(MetastoreEventsProcessor.java:210) [starrocks-fe.jar:?]
2023-07-21 17:01:39,358 ERROR (com.starrocks.connector.hive.events.MetastoreEventsProcessor|30) [HiveMetaClient.callRPC():151] An exception occurred when using the current long link to access metastore. msg: Failed to get next notification based on last event id: 2652902
2023-07-21 17:01:39,359 INFO (com.starrocks.connector.hive.events.MetastoreEventsProcessor|30) [HiveMetaStoreThriftClient.close():575] Closed a connection to metastore, current connections: 1
2023-07-21 17:01:39,359 ERROR (com.starrocks.connector.hive.events.MetastoreEventsProcessor|30) [HiveMetastore.getNextEventResponse():177] Unable to fetch notifications from metastore. Last synced event id is 2652902
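The log above suggests that event-based sync works like a cursor: the processor remembers the last synced event id (2652902 here), asks the metastore for the notifications after it, and keeps the cursor unchanged when that fetch fails. The sketch below is a simplified, hypothetical model of that loop, not StarRocks' actual code; `EventSyncer`, `FetchError`, and `fetch_events` are illustrative names only.

```python
# Hypothetical, simplified model of event-based metadata sync.
# Not StarRocks' code: names and structure are illustrative only.

class FetchError(Exception):
    """Stands in for MetastoreNotificationFetchException."""


class EventSyncer:
    def __init__(self, fetch_events, last_event_id):
        # fetch_events(last_id) returns [(event_id, payload), ...] for events
        # newer than last_id, or raises FetchError.
        self.fetch_events = fetch_events
        self.last_event_id = last_event_id
        self.applied = []

    def poll_once(self):
        try:
            events = self.fetch_events(self.last_event_id)
        except FetchError:
            # Cursor stays put; the next poll retries from the same event id.
            return False
        for event_id, payload in events:
            self.applied.append(payload)    # e.g. invalidate a cached table/partition
            self.last_event_id = event_id   # advance only after applying the event
        return True


if __name__ == "__main__":
    def broken_fetch(last_id):
        raise FetchError(f"Failed to get next notification based on last event id: {last_id}")

    syncer = EventSyncer(broken_fetch, 2652902)
    print(syncer.poll_once(), syncer.last_event_id)  # False 2652902
```

In this model, if every fetch fails (for example because one notification cannot be parsed), the cursor never moves past 2652902, and incremental changes stop reaching the cache until something else refreshes it.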


[Business impact]
[StarRocks version] 2.5.3
[Cluster size] 4 FE (2 follower + 1 observer) + 3 BE; BEs and FEs are deployed on the same machines

Thanks.

I get the same error. It started after I upgraded from 2.4.3 to 2.5.9.

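Using events to sync metadata is no longer recommended. Could you consider upgrading to 2.5.5 or later and switching to periodic metadata refresh instead? https://docs.starrocks.io/zh-cn/3.1/data_source/catalog/hive_catalog#周期性刷新元数据缓存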

Then what should we do about catalogs on earlier versions that rely on event-based sync?

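Is this a bug in StarRocks itself, or is it caused by something in our configuration? Upgrading to a later version may take us some time. Also, with periodic refresh the default interval is 10 minutes. If the Hudi data changes within that window and new Parquet files are generated before StarRocks has picked up that information, will queries still fail with a file-not-found exception?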

Judging from the exception, it looks like parsing the HMS message JSON failed.

Have you done any schema change on the Hive side? Further down in the exception I see a field (on_way_qty) whose column type is incompatible.


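If StarRocks has not yet picked up this part of the Hive metadata, it will only scan the files that were previously under the table and will not scan any newly added files until StarRocks is back in sync with the HMS metadata.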
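As a rough illustration of the behaviour described above, the sketch below models a cached file listing that queries plan against until the cache is refreshed. It is a hypothetical model, not StarRocks' implementation; `CachedTableFiles`, `list_files`, and `refresh` are illustrative names, and the file names are made up.

```python
# Hypothetical model of a cached file listing; not StarRocks' code.

class CachedTableFiles:
    def __init__(self, list_files):
        # list_files() returns the data files currently in storage.
        self.list_files = list_files
        self.snapshot = list(list_files())

    def files_for_query(self):
        # Queries plan against the cached snapshot, not the live directory.
        return list(self.snapshot)

    def refresh(self):
        # In practice triggered by an HMS event or a periodic refresh task.
        self.snapshot = list(self.list_files())


if __name__ == "__main__":
    storage = ["part-0001.parquet"]
    cache = CachedTableFiles(lambda: storage)
    storage.append("part-0002.parquet")  # e.g. a new commit written by Flink
    print(cache.files_for_query())       # ['part-0001.parquet']
    cache.refresh()
    print(cache.files_for_query())       # ['part-0001.parquet', 'part-0002.parquet']
```

Whether the refresh is driven by HMS events or by a periodic task only changes when `refresh()` runs; between refreshes, new files stay invisible to queries in this model.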

This may be related to the bugfix in https://github.com/StarRocks/starrocks/pull/27972, which affects both the 2.5 and 3.0 versions.