【不用看了是误操作】基于hive分区表建starrocks的视图,然后再基于视图建分区物化视图,源端hive覆写分区后,starrocks做refresh external table和refresh mv后无法更新mv中的数据,感觉元数据同步有问题,但是不知道怎么看是否有问题

基于hive分区表建starrocks的视图,然后再基于视图建分区物化视图,源端hive覆写分区后,starrocks做refresh external table和refresh mv后无法更新mv中的数据,这个我以前测都是没问题的,mv都能刷新到新的数据的,不知道为什么现在不行了

上周我改动过hive端的hive-site.xml,里面增加了starrocks文档 https://docs.starrocks.io/zh/docs/data_source/catalog/hive_catalog/#自动增量更新 里面写的实时同步hive catalog的配置参数,重启了hive,但是当时没把hive-site.xml复制到fe和be的conf目录,好像是今天中午把新的hive-site.xml复制到fe和be的conf目录才这样的

'root'@(none) Tue Jan  9 18:08:29 2024>show create view default_catalog.hiveviewtestdb.v821\G
*************************** 1. row ***************************
                View: v821
         Create View: CREATE VIEW `v821` (`p_brand`, `LO_ORDERDATE`, `revenue_sum`) AS SELECT `hive`.`hivessbtest`.`p`.`p_brand`, `hive`.`hivessbtest`.`l`.`LO_ORDERDATE`, sum(`hive`.`hivessbtest`.`l`.`LO_REVENUE`) AS `revenue_sum`
FROM `hive`.`hivessbtest`.`P_LINEORDER` AS `l` LEFT OUTER JOIN `hive`.`hivessbtest`.`part` AS `p` ON `hive`.`hivessbtest`.`l`.`LO_PARTKEY` = `hive`.`hivessbtest`.`p`.`P_PARTKEY`
GROUP BY `hive`.`hivessbtest`.`p`.`p_brand`, `hive`.`hivessbtest`.`l`.`LO_ORDERDATE`;
character_set_client: utf8
collation_connection: utf8_general_ci
1 row in set (0.01 sec)

'root'@(none) Tue Jan  9 18:08:57 2024>show create view default_catalog.hiveviewtestdb.mv8211\G
*************************** 1. row ***************************
                View: mv8211
         Create View: CREATE VIEW `mv8211` AS SELECT `hiveviewtestdb`.`v821`.`p_brand`, `hiveviewtestdb`.`v821`.`LO_ORDERDATE`, `hiveviewtestdb`.`v821`.`revenue_sum`
FROM `hiveviewtestdb`.`v821`
character_set_client: utf8
collation_connection: utf8_general_ci
1 row in set (0.01 sec)

hive 使用insert overwrite写入2个分区

0: jdbc:hive2://music-impala-olap-10.gy.ntes:> INSERT overwrite TABLE P_LINEORDER PARTITION(LO_ORDERDATE)
. . . . . . . . . . . . . . . . . . . . . . .> SELECT LO_ORDERKEY,     
. . . . . . . . . . . . . . . . . . . . . . .> LO_LINENUMBER,
. . . . . . . . . . . . . . . . . . . . . . .> LO_CUSTKEY,  
. . . . . . . . . . . . . . . . . . . . . . .> LO_PARTKEY,     
. . . . . . . . . . . . . . . . . . . . . . .> LO_SUPPKEY,   
. . . . . . . . . . . . . . . . . . . . . . .> LO_ORDERPRIOTITY,
. . . . . . . . . . . . . . . . . . . . . . .> LO_SHIPPRIOTITY, 
. . . . . . . . . . . . . . . . . . . . . . .> LO_QUANTITY,     
. . . . . . . . . . . . . . . . . . . . . . .> LO_EXTENDEDPRICE,
. . . . . . . . . . . . . . . . . . . . . . .> LO_ORDTOTALPRICE,
. . . . . . . . . . . . . . . . . . . . . . .> LO_DISCOUNT,     
. . . . . . . . . . . . . . . . . . . . . . .> LO_REVENUE,      
. . . . . . . . . . . . . . . . . . . . . . .> LO_SUPPLYCOST,   
. . . . . . . . . . . . . . . . . . . . . . .> LO_TAX,          
. . . . . . . . . . . . . . . . . . . . . . .> LO_COMMITDATE,   
. . . . . . . . . . . . . . . . . . . . . . .> LO_SHIPMODE,
. . . . . . . . . . . . . . . . . . . . . . .> LO_EXTENDEDPRICE * LO_DISCOUNT AS V_REVENUE,
. . . . . . . . . . . . . . . . . . . . . . .> LO_ORDERDATE
. . . . . . . . . . . . . . . . . . . . . . .> FROM LINEORDER
. . . . . . . . . . . . . . . . . . . . . . .> where LO_ORDERDATE in (19990201, 19991101)
. . . . . . . . . . . . . . . . . . . . . . .> DISTRIBUTE BY LO_ORDERDATE;
No rows affected (76.787 seconds)

此时元数据还没同步,explain显示查询视图命中物化视图

'root'@(none) Tue Jan  9 17:58:44 2024>explain select    p_brand,    sum(revenue_sum) as revenue_sum_sum  from    default_catalog.hiveviewtestdb.v821  group by    p_brand;
+-----------------------------------------------------------------------------------------------------+
| Explain String                                                                                      |
+-----------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                                                     |
|  OUTPUT EXPRS:23: p_brand | 29: sum                                                                 |
|   PARTITION: UNPARTITIONED                                                                          |
|                                                                                                     |
|   RESULT SINK                                                                                       |
|                                                                                                     |
|   5:EXCHANGE                                                                                        |
|                                                                                                     |
| PLAN FRAGMENT 1                                                                                     |
|  OUTPUT EXPRS:                                                                                      |
|   PARTITION: HASH_PARTITIONED: 30: p_brand                                                          |
|                                                                                                     |
|   STREAM DATA SINK                                                                                  |
|     EXCHANGE ID: 05                                                                                 |
|     UNPARTITIONED                                                                                   |
|                                                                                                     |
|   4:Project                                                                                         |
|   |  <slot 23> : 30: p_brand                                                                        |
|   |  <slot 29> : 33: sum                                                                            |
|   |  <slot 33> : clone(33: sum)                                                                     |
|   |                                                                                                 |
|   3:AGGREGATE (merge finalize)                                                                      |
|   |  output: sum(33: sum)                                                                           |
|   |  group by: 30: p_brand                                                                          |
|   |                                                                                                 |
|   2:EXCHANGE                                                                                        |
|                                                                                                     |
| PLAN FRAGMENT 2                                                                                     |
|  OUTPUT EXPRS:                                                                                      |
|   PARTITION: RANDOM                                                                                 |
|                                                                                                     |
|   STREAM DATA SINK                                                                                  |
|     EXCHANGE ID: 02                                                                                 |
|     HASH_PARTITIONED: 30: p_brand                                                                   |
|                                                                                                     |
|   1:AGGREGATE (update serialize)                                                                    |
|   |  STREAMING                                                                                      |
|   |  output: sum(32: revenue_sum)                                                                   |
|   |  group by: 30: p_brand                                                                          |
|   |                                                                                                 |
|   0:OlapScanNode                                                                                    |
|      TABLE: mv8211                                                                                  |
|      PREAGGREGATION: ON                                                                             |
|      partitions=2406/2406                                                                           |
|      rollup: mv8211                                                                                 |
|      tabletRatio=24060/24060                                                                        |
|      tabletList=3779146,3779148,3779150,3779152,3779154,3779156,3779158,3779160,3779162,3779164 ... |
|      cardinality=2203395                                                                            |
|      avgRowSize=8.043875                                                                            |
|      MaterializedView: true                                                                         |
+-----------------------------------------------------------------------------------------------------+
50 rows in set (0.79 sec)

这里强制同步外表的元数据

'root'@(none) Tue Jan  9 17:59:11 2024>refresh external table hive.hivessbtest.p_lineorder;
Query OK, 0 rows affected (4.42 sec)

再次查看explain,发现还是能完全命中物化视图,这里不符合预期

'root'@(none) Tue Jan  9 17:59:19 2024>explain select    p_brand,    sum(revenue_sum) as revenue_sum_sum  from    default_catalog.hiveviewtestdb.v821  group by    p_brand;
+-----------------------------------------------------------------------------------------------------+
| Explain String                                                                                      |
+-----------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                                                     |
|  OUTPUT EXPRS:23: p_brand | 29: sum                                                                 |
|   PARTITION: UNPARTITIONED                                                                          |
|                                                                                                     |
|   RESULT SINK                                                                                       |
|                                                                                                     |
|   5:EXCHANGE                                                                                        |
|                                                                                                     |
| PLAN FRAGMENT 1                                                                                     |
|  OUTPUT EXPRS:                                                                                      |
|   PARTITION: HASH_PARTITIONED: 30: p_brand                                                          |
|                                                                                                     |
|   STREAM DATA SINK                                                                                  |
|     EXCHANGE ID: 05                                                                                 |
|     UNPARTITIONED                                                                                   |
|                                                                                                     |
|   4:Project                                                                                         |
|   |  <slot 23> : 30: p_brand                                                                        |
|   |  <slot 29> : 33: sum                                                                            |
|   |  <slot 33> : clone(33: sum)                                                                     |
|   |                                                                                                 |
|   3:AGGREGATE (merge finalize)                                                                      |
|   |  output: sum(33: sum)                                                                           |
|   |  group by: 30: p_brand                                                                          |
|   |                                                                                                 |
|   2:EXCHANGE                                                                                        |
|                                                                                                     |
| PLAN FRAGMENT 2                                                                                     |
|  OUTPUT EXPRS:                                                                                      |
|   PARTITION: RANDOM                                                                                 |
|                                                                                                     |
|   STREAM DATA SINK                                                                                  |
|     EXCHANGE ID: 02                                                                                 |
|     HASH_PARTITIONED: 30: p_brand                                                                   |
|                                                                                                     |
|   1:AGGREGATE (update serialize)                                                                    |
|   |  STREAMING                                                                                      |
|   |  output: sum(32: revenue_sum)                                                                   |
|   |  group by: 30: p_brand                                                                          |
|   |                                                                                                 |
|   0:OlapScanNode                                                                                    |
|      TABLE: mv8211                                                                                  |
|      PREAGGREGATION: ON                                                                             |
|      partitions=2406/2406                                                                           |
|      rollup: mv8211                                                                                 |
|      tabletRatio=24060/24060                                                                        |
|      tabletList=3779146,3779148,3779150,3779152,3779154,3779156,3779158,3779160,3779162,3779164 ... |
|      cardinality=2203395                                                                            |
|      avgRowSize=8.043875                                                                            |
|      MaterializedView: true                                                                         |
+-----------------------------------------------------------------------------------------------------+
50 rows in set (0.53 sec)

手工刷新物化视图

'root'@(none) Tue Jan  9 17:59:31 2024>refresh materialized view default_catalog.hiveviewtestdb.mv8211;
+--------------------------------------+
| QUERY_ID                             |
+--------------------------------------+
| ce8b0b53-aed5-11ee-af3a-1070fd5198c8 |
+--------------------------------------+
1 row in set (0.00 sec)

物化视图没有更新任何分区,因为EXTRA_MESSAGE里面没有显示有分区内容

'root'@(none) Tue Jan  9 17:59:41 2024>select * from default_catalog.information_schema.task_runs where `DATABASE` = 'hiveviewtestdb' and CREATE_TIME  >= '2024-01-09 09:00:00' and DEFINITION like '%`mv8211`%' order by create_time desc limit 1\G
*************************** 1. row ***************************
     QUERY_ID: ce8b0b53-aed5-11ee-af3a-1070fd5198c8
    TASK_NAME: mv-1001381
  CREATE_TIME: 2024-01-09 17:59:41
  FINISH_TIME: 2024-01-09 17:59:44
        STATE: SUCCESS
     DATABASE: hiveviewtestdb
   DEFINITION: insert overwrite `mv8211` SELECT `hiveviewtestdb`.`v821`.`p_brand`, `hiveviewtestdb`.`v821`.`LO_ORDERDATE`, `hiveviewtestdb`.`v821`.`revenue_sum`
FROM `hiveviewtestdb`.`v821`
  EXPIRE_TIME: 2024-01-10 17:59:41
   ERROR_CODE: 0
ERROR_MESSAGE: NULL
     PROGRESS: 100%
EXTRA_MESSAGE: {"forceRefresh":false,"mvPartitionsToRefresh":[],"refBasePartitionsToRefreshMap":{},"basePartitionsToRefreshMap":{}}
   PROPERTIES: {"FORCE":"false"}
1 row in set (1.84 sec)

再次explain,还是命中物化视图,这里应该做hive和物化视图的分区union的

'root'@(none) Tue Jan  9 18:00:15 2024>explain select    p_brand,    sum(revenue_sum) as revenue_sum_sum  from    default_catalog.hiveviewtestdb.v821  group by    p_brand;
+-----------------------------------------------------------------------------------------------------+
| Explain String                                                                                      |
+-----------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                                                     |
|  OUTPUT EXPRS:23: p_brand | 29: sum                                                                 |
|   PARTITION: UNPARTITIONED                                                                          |
|                                                                                                     |
|   RESULT SINK                                                                                       |
|                                                                                                     |
|   5:EXCHANGE                                                                                        |
|                                                                                                     |
| PLAN FRAGMENT 1                                                                                     |
|  OUTPUT EXPRS:                                                                                      |
|   PARTITION: HASH_PARTITIONED: 30: p_brand                                                          |
|                                                                                                     |
|   STREAM DATA SINK                                                                                  |
|     EXCHANGE ID: 05                                                                                 |
|     UNPARTITIONED                                                                                   |
|                                                                                                     |
|   4:Project                                                                                         |
|   |  <slot 23> : 30: p_brand                                                                        |
|   |  <slot 29> : 33: sum                                                                            |
|   |  <slot 33> : clone(33: sum)                                                                     |
|   |                                                                                                 |
|   3:AGGREGATE (merge finalize)                                                                      |
|   |  output: sum(33: sum)                                                                           |
|   |  group by: 30: p_brand                                                                          |
|   |                                                                                                 |
|   2:EXCHANGE                                                                                        |
|                                                                                                     |
| PLAN FRAGMENT 2                                                                                     |
|  OUTPUT EXPRS:                                                                                      |
|   PARTITION: RANDOM                                                                                 |
|                                                                                                     |
|   STREAM DATA SINK                                                                                  |
|     EXCHANGE ID: 02                                                                                 |
|     HASH_PARTITIONED: 30: p_brand                                                                   |
|                                                                                                     |
|   1:AGGREGATE (update serialize)                                                                    |
|   |  STREAMING                                                                                      |
|   |  output: sum(32: revenue_sum)                                                                   |
|   |  group by: 30: p_brand                                                                          |
|   |                                                                                                 |
|   0:OlapScanNode                                                                                    |
|      TABLE: mv8211                                                                                  |
|      PREAGGREGATION: ON                                                                             |
|      partitions=2406/2406                                                                           |
|      rollup: mv8211                                                                                 |
|      tabletRatio=24060/24060                                                                        |
|      tabletList=3779146,3779148,3779150,3779152,3779154,3779156,3779158,3779160,3779162,3779164 ... |
|      cardinality=2203395                                                                            |
|      avgRowSize=8.043875                                                                            |
|      MaterializedView: true                                                                         |
+-----------------------------------------------------------------------------------------------------+
50 rows in set (1.00 sec)

不查View,使用View的SQL直接查hive表,会命中物化视图:

'root'@hivessbtest Tue Jan  9 20:23:22 2024>explain
    -> SELECT hive.hivessbtest.p.p_brand, hive.hivessbtest.l.LO_ORDERDATE, sum(hive.hivessbtest.l.LO_REVENUE) AS revenue_sum
    -> FROM hive.hivessbtest.P_LINEORDER AS l LEFT OUTER JOIN hive.hivessbtest.part AS p ON hive.hivessbtest.l.LO_PARTKEY = hive.hivessbtest.p.P_PARTKEY
    -> GROUP BY hive.hivessbtest.p.p_brand, hive.hivessbtest.l.LO_ORDERDATE;
+-----------------------------------------------------------------------------------------------------+
| Explain String                                                                                      |
+-----------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                                                     |
|  OUTPUT EXPRS:23: p_brand | 18: lo_orderdate | 28: sum                                              |
|   PARTITION: UNPARTITIONED                                                                          |
|                                                                                                     |
|   RESULT SINK                                                                                       |
|                                                                                                     |
|   2:EXCHANGE                                                                                        |
|                                                                                                     |
| PLAN FRAGMENT 1                                                                                     |
|  OUTPUT EXPRS:                                                                                      |
|   PARTITION: RANDOM                                                                                 |
|                                                                                                     |
|   STREAM DATA SINK                                                                                  |
|     EXCHANGE ID: 02                                                                                 |
|     UNPARTITIONED                                                                                   |
|                                                                                                     |
|   1:Project                                                                                         |
|   |  <slot 18> : 30: LO_ORDERDATE                                                                   |
|   |  <slot 23> : 29: p_brand                                                                        |
|   |  <slot 28> : 31: revenue_sum                                                                    |
|   |                                                                                                 |
|   0:OlapScanNode                                                                                    |
|      TABLE: mv8211                                                                                  |
|      PREAGGREGATION: ON                                                                             |
|      partitions=2406/2406                                                                           |
|      rollup: mv8211                                                                                 |
|      tabletRatio=24060/24060                                                                        |
|      tabletList=3779146,3779148,3779150,3779152,3779154,3779156,3779158,3779160,3779162,3779164 ... |
|      cardinality=1654073                                                                            |
|      avgRowSize=6.0                                                                                 |
|      MaterializedView: true                                                                         |
+-----------------------------------------------------------------------------------------------------+
31 rows in set (0.40 sec)

查hive基表,可以查


'root'@hivessbtest Tue Jan  9 20:27:46 2024>explain select lo_suppkey, count(*) from hive.hivessbtest.P_LINEORDER group by lo_suppkey;
+----------------------------------------------+
| Explain String                               |
+----------------------------------------------+
| PLAN FRAGMENT 0                              |
|  OUTPUT EXPRS:5: lo_suppkey | 19: count      |
|   PARTITION: UNPARTITIONED                   |
|                                              |
|   RESULT SINK                                |
|                                              |
|   4:EXCHANGE                                 |
|                                              |
| PLAN FRAGMENT 1                              |
|  OUTPUT EXPRS:                               |
|   PARTITION: HASH_PARTITIONED: 5: lo_suppkey |
|                                              |
|   STREAM DATA SINK                           |
|     EXCHANGE ID: 04                          |
|     UNPARTITIONED                            |
|                                              |
|   3:AGGREGATE (merge finalize)               |
|   |  output: count(19: count)                |
|   |  group by: 5: lo_suppkey                 |
|   |                                          |
|   2:EXCHANGE                                 |
|                                              |
| PLAN FRAGMENT 2                              |
|  OUTPUT EXPRS:                               |
|   PARTITION: RANDOM                          |
|                                              |
|   STREAM DATA SINK                           |
|     EXCHANGE ID: 02                          |
|     HASH_PARTITIONED: 5: lo_suppkey          |
|                                              |
|   1:AGGREGATE (update serialize)             |
|   |  STREAMING                               |
|   |  output: count(*)                        |
|   |  group by: 5: lo_suppkey                 |
|   |                                          |
|   0:HdfsScanNode                             |
|      TABLE: p_lineorder                      |
|      partitions=2406/2406                    |
|      cardinality=6001172                     |
|      avgRowSize=1.0                          |
+----------------------------------------------+
40 rows in set (0.35 sec)


'root'@hivessbtest Tue Jan  9 20:27:52 2024>select lo_suppkey, count(*) from hive.hivessbtest.P_LINEORDER group by lo_suppkey;
...
...
...
|        558 |     3004 |
|        720 |     3042 |
|        393 |     3064 |
+------------+----------+
2000 rows in set (5.24 sec)


【背景】做过哪些操作?
没有做过操作
【业务影响】
未上线业务
【是否存算分离】

【StarRocks版本】3.2.2
【集群规模】
3fe(1 follower+2observer)+5be(fe与be混部)
【机器信息】CPU虚拟核/内存/网卡,96C 512G
【联系方式】为了在解决问题过程中能及时联系到您获取一些日志信息,请补充下您的联系方式,例如:社区群4-小李或者邮箱,谢谢

抱歉是我搞错了,应该是我在hive执行insert … select where LO_ORDERDATE in (19990201, 19991101)的时候,where LO_ORDERDATE in (19990201, 19991101)读不到数据,因为表里面没有这2个分区。

没问题就好。