基于hive分区表建starrocks的视图,然后再基于视图建分区物化视图,源端hive覆写分区后,starrocks做refresh external table和refresh mv后无法更新mv中的数据,这个我以前测都是没问题的,mv都能刷新到新的数据的,不知道为什么现在不行了
上周我改动过hive端的hive-site.xml,里面增加了starrocks文档 https://docs.starrocks.io/zh/docs/data_source/catalog/hive_catalog/#自动增量更新 里面写的实时同步hive catalog的配置参数,重启了hive,但是当时没把hive-site.xml复制到fe和be的conf目录,好像是今天中午把新的hive-site.xml复制到fe和be的conf目录才这样的
'root'@(none) Tue Jan 9 18:08:29 2024>show create view default_catalog.hiveviewtestdb.v821\G
*************************** 1. row ***************************
View: v821
Create View: CREATE VIEW `v821` (`p_brand`, `LO_ORDERDATE`, `revenue_sum`) AS SELECT `hive`.`hivessbtest`.`p`.`p_brand`, `hive`.`hivessbtest`.`l`.`LO_ORDERDATE`, sum(`hive`.`hivessbtest`.`l`.`LO_REVENUE`) AS `revenue_sum`
FROM `hive`.`hivessbtest`.`P_LINEORDER` AS `l` LEFT OUTER JOIN `hive`.`hivessbtest`.`part` AS `p` ON `hive`.`hivessbtest`.`l`.`LO_PARTKEY` = `hive`.`hivessbtest`.`p`.`P_PARTKEY`
GROUP BY `hive`.`hivessbtest`.`p`.`p_brand`, `hive`.`hivessbtest`.`l`.`LO_ORDERDATE`;
character_set_client: utf8
collation_connection: utf8_general_ci
1 row in set (0.01 sec)
'root'@(none) Tue Jan 9 18:08:57 2024>show create view default_catalog.hiveviewtestdb.mv8211\G
*************************** 1. row ***************************
View: mv8211
Create View: CREATE VIEW `mv8211` AS SELECT `hiveviewtestdb`.`v821`.`p_brand`, `hiveviewtestdb`.`v821`.`LO_ORDERDATE`, `hiveviewtestdb`.`v821`.`revenue_sum`
FROM `hiveviewtestdb`.`v821`
character_set_client: utf8
collation_connection: utf8_general_ci
1 row in set (0.01 sec)
hive 使用insert overwrite写入2个分区
0: jdbc:hive2://music-impala-olap-10.gy.ntes:> INSERT overwrite TABLE P_LINEORDER PARTITION(LO_ORDERDATE)
. . . . . . . . . . . . . . . . . . . . . . .> SELECT LO_ORDERKEY,
. . . . . . . . . . . . . . . . . . . . . . .> LO_LINENUMBER,
. . . . . . . . . . . . . . . . . . . . . . .> LO_CUSTKEY,
. . . . . . . . . . . . . . . . . . . . . . .> LO_PARTKEY,
. . . . . . . . . . . . . . . . . . . . . . .> LO_SUPPKEY,
. . . . . . . . . . . . . . . . . . . . . . .> LO_ORDERPRIOTITY,
. . . . . . . . . . . . . . . . . . . . . . .> LO_SHIPPRIOTITY,
. . . . . . . . . . . . . . . . . . . . . . .> LO_QUANTITY,
. . . . . . . . . . . . . . . . . . . . . . .> LO_EXTENDEDPRICE,
. . . . . . . . . . . . . . . . . . . . . . .> LO_ORDTOTALPRICE,
. . . . . . . . . . . . . . . . . . . . . . .> LO_DISCOUNT,
. . . . . . . . . . . . . . . . . . . . . . .> LO_REVENUE,
. . . . . . . . . . . . . . . . . . . . . . .> LO_SUPPLYCOST,
. . . . . . . . . . . . . . . . . . . . . . .> LO_TAX,
. . . . . . . . . . . . . . . . . . . . . . .> LO_COMMITDATE,
. . . . . . . . . . . . . . . . . . . . . . .> LO_SHIPMODE,
. . . . . . . . . . . . . . . . . . . . . . .> LO_EXTENDEDPRICE * LO_DISCOUNT AS V_REVENUE,
. . . . . . . . . . . . . . . . . . . . . . .> LO_ORDERDATE
. . . . . . . . . . . . . . . . . . . . . . .> FROM LINEORDER
. . . . . . . . . . . . . . . . . . . . . . .> where LO_ORDERDATE in (19990201, 19991101)
. . . . . . . . . . . . . . . . . . . . . . .> DISTRIBUTE BY LO_ORDERDATE;
No rows affected (76.787 seconds)
此时元数据还没同步,explain显示查询视图命中物化视图
'root'@(none) Tue Jan 9 17:58:44 2024>explain select p_brand, sum(revenue_sum) as revenue_sum_sum from default_catalog.hiveviewtestdb.v821 group by p_brand;
+-----------------------------------------------------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:23: p_brand | 29: sum |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 5:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: 30: p_brand |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 05 |
| UNPARTITIONED |
| |
| 4:Project |
| | <slot 23> : 30: p_brand |
| | <slot 29> : 33: sum |
| | <slot 33> : clone(33: sum) |
| | |
| 3:AGGREGATE (merge finalize) |
| | output: sum(33: sum) |
| | group by: 30: p_brand |
| | |
| 2:EXCHANGE |
| |
| PLAN FRAGMENT 2 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 02 |
| HASH_PARTITIONED: 30: p_brand |
| |
| 1:AGGREGATE (update serialize) |
| | STREAMING |
| | output: sum(32: revenue_sum) |
| | group by: 30: p_brand |
| | |
| 0:OlapScanNode |
| TABLE: mv8211 |
| PREAGGREGATION: ON |
| partitions=2406/2406 |
| rollup: mv8211 |
| tabletRatio=24060/24060 |
| tabletList=3779146,3779148,3779150,3779152,3779154,3779156,3779158,3779160,3779162,3779164 ... |
| cardinality=2203395 |
| avgRowSize=8.043875 |
| MaterializedView: true |
+-----------------------------------------------------------------------------------------------------+
50 rows in set (0.79 sec)
这里强制同步外表的元数据
'root'@(none) Tue Jan 9 17:59:11 2024>refresh external table hive.hivessbtest.p_lineorder;
Query OK, 0 rows affected (4.42 sec)
再次查看explain,发现还是能完全命中物化视图,这里不符合预期
'root'@(none) Tue Jan 9 17:59:19 2024>explain select p_brand, sum(revenue_sum) as revenue_sum_sum from default_catalog.hiveviewtestdb.v821 group by p_brand;
+-----------------------------------------------------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:23: p_brand | 29: sum |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 5:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: 30: p_brand |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 05 |
| UNPARTITIONED |
| |
| 4:Project |
| | <slot 23> : 30: p_brand |
| | <slot 29> : 33: sum |
| | <slot 33> : clone(33: sum) |
| | |
| 3:AGGREGATE (merge finalize) |
| | output: sum(33: sum) |
| | group by: 30: p_brand |
| | |
| 2:EXCHANGE |
| |
| PLAN FRAGMENT 2 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 02 |
| HASH_PARTITIONED: 30: p_brand |
| |
| 1:AGGREGATE (update serialize) |
| | STREAMING |
| | output: sum(32: revenue_sum) |
| | group by: 30: p_brand |
| | |
| 0:OlapScanNode |
| TABLE: mv8211 |
| PREAGGREGATION: ON |
| partitions=2406/2406 |
| rollup: mv8211 |
| tabletRatio=24060/24060 |
| tabletList=3779146,3779148,3779150,3779152,3779154,3779156,3779158,3779160,3779162,3779164 ... |
| cardinality=2203395 |
| avgRowSize=8.043875 |
| MaterializedView: true |
+-----------------------------------------------------------------------------------------------------+
50 rows in set (0.53 sec)
手工刷新物化视图
'root'@(none) Tue Jan 9 17:59:31 2024>refresh materialized view default_catalog.hiveviewtestdb.mv8211;
+--------------------------------------+
| QUERY_ID |
+--------------------------------------+
| ce8b0b53-aed5-11ee-af3a-1070fd5198c8 |
+--------------------------------------+
1 row in set (0.00 sec)
物化视图没有更新任何分区,因为EXTRA_MESSAGE里面没有显示有分区内容
'root'@(none) Tue Jan 9 17:59:41 2024>select * from default_catalog.information_schema.task_runs where `DATABASE` = 'hiveviewtestdb' and CREATE_TIME >= '2024-01-09 09:00:00' and DEFINITION like '%`mv8211`%' order by create_time desc limit 1\G
*************************** 1. row ***************************
QUERY_ID: ce8b0b53-aed5-11ee-af3a-1070fd5198c8
TASK_NAME: mv-1001381
CREATE_TIME: 2024-01-09 17:59:41
FINISH_TIME: 2024-01-09 17:59:44
STATE: SUCCESS
DATABASE: hiveviewtestdb
DEFINITION: insert overwrite `mv8211` SELECT `hiveviewtestdb`.`v821`.`p_brand`, `hiveviewtestdb`.`v821`.`LO_ORDERDATE`, `hiveviewtestdb`.`v821`.`revenue_sum`
FROM `hiveviewtestdb`.`v821`
EXPIRE_TIME: 2024-01-10 17:59:41
ERROR_CODE: 0
ERROR_MESSAGE: NULL
PROGRESS: 100%
EXTRA_MESSAGE: {"forceRefresh":false,"mvPartitionsToRefresh":[],"refBasePartitionsToRefreshMap":{},"basePartitionsToRefreshMap":{}}
PROPERTIES: {"FORCE":"false"}
1 row in set (1.84 sec)
再次explain,还是命中物化视图,这里应该做hive和物化视图的分区union的
'root'@(none) Tue Jan 9 18:00:15 2024>explain select p_brand, sum(revenue_sum) as revenue_sum_sum from default_catalog.hiveviewtestdb.v821 group by p_brand;
+-----------------------------------------------------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:23: p_brand | 29: sum |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 5:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: 30: p_brand |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 05 |
| UNPARTITIONED |
| |
| 4:Project |
| | <slot 23> : 30: p_brand |
| | <slot 29> : 33: sum |
| | <slot 33> : clone(33: sum) |
| | |
| 3:AGGREGATE (merge finalize) |
| | output: sum(33: sum) |
| | group by: 30: p_brand |
| | |
| 2:EXCHANGE |
| |
| PLAN FRAGMENT 2 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 02 |
| HASH_PARTITIONED: 30: p_brand |
| |
| 1:AGGREGATE (update serialize) |
| | STREAMING |
| | output: sum(32: revenue_sum) |
| | group by: 30: p_brand |
| | |
| 0:OlapScanNode |
| TABLE: mv8211 |
| PREAGGREGATION: ON |
| partitions=2406/2406 |
| rollup: mv8211 |
| tabletRatio=24060/24060 |
| tabletList=3779146,3779148,3779150,3779152,3779154,3779156,3779158,3779160,3779162,3779164 ... |
| cardinality=2203395 |
| avgRowSize=8.043875 |
| MaterializedView: true |
+-----------------------------------------------------------------------------------------------------+
50 rows in set (1.00 sec)
不查View,使用View的SQL直接查hive表,会命中物化视图:
'root'@hivessbtest Tue Jan 9 20:23:22 2024>explain
-> SELECT hive.hivessbtest.p.p_brand, hive.hivessbtest.l.LO_ORDERDATE, sum(hive.hivessbtest.l.LO_REVENUE) AS revenue_sum
-> FROM hive.hivessbtest.P_LINEORDER AS l LEFT OUTER JOIN hive.hivessbtest.part AS p ON hive.hivessbtest.l.LO_PARTKEY = hive.hivessbtest.p.P_PARTKEY
-> GROUP BY hive.hivessbtest.p.p_brand, hive.hivessbtest.l.LO_ORDERDATE;
+-----------------------------------------------------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:23: p_brand | 18: lo_orderdate | 28: sum |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 2:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 02 |
| UNPARTITIONED |
| |
| 1:Project |
| | <slot 18> : 30: LO_ORDERDATE |
| | <slot 23> : 29: p_brand |
| | <slot 28> : 31: revenue_sum |
| | |
| 0:OlapScanNode |
| TABLE: mv8211 |
| PREAGGREGATION: ON |
| partitions=2406/2406 |
| rollup: mv8211 |
| tabletRatio=24060/24060 |
| tabletList=3779146,3779148,3779150,3779152,3779154,3779156,3779158,3779160,3779162,3779164 ... |
| cardinality=1654073 |
| avgRowSize=6.0 |
| MaterializedView: true |
+-----------------------------------------------------------------------------------------------------+
31 rows in set (0.40 sec)
查hive基表,可以查
'root'@hivessbtest Tue Jan 9 20:27:46 2024>explain select lo_suppkey, count(*) from hive.hivessbtest.P_LINEORDER group by lo_suppkey;
+----------------------------------------------+
| Explain String |
+----------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:5: lo_suppkey | 19: count |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 4:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: 5: lo_suppkey |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 04 |
| UNPARTITIONED |
| |
| 3:AGGREGATE (merge finalize) |
| | output: count(19: count) |
| | group by: 5: lo_suppkey |
| | |
| 2:EXCHANGE |
| |
| PLAN FRAGMENT 2 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 02 |
| HASH_PARTITIONED: 5: lo_suppkey |
| |
| 1:AGGREGATE (update serialize) |
| | STREAMING |
| | output: count(*) |
| | group by: 5: lo_suppkey |
| | |
| 0:HdfsScanNode |
| TABLE: p_lineorder |
| partitions=2406/2406 |
| cardinality=6001172 |
| avgRowSize=1.0 |
+----------------------------------------------+
40 rows in set (0.35 sec)
'root'@hivessbtest Tue Jan 9 20:27:52 2024>select lo_suppkey, count(*) from hive.hivessbtest.P_LINEORDER group by lo_suppkey;
...
...
...
| 558 | 3004 |
| 720 | 3042 |
| 393 | 3064 |
+------------+----------+
2000 rows in set (5.24 sec)
【背景】做过哪些操作?
没有做过操作
【业务影响】
未上线业务
【是否存算分离】
否
【StarRocks版本】3.2.2
【集群规模】
3fe(1 follower+2observer)+5be(fe与be混部)
【机器信息】CPU虚拟核/内存/网卡,96C 512G
【联系方式】为了在解决问题过程中能及时联系到您获取一些日志信息,请补充下您的联系方式,例如:社区群4-小李或者邮箱,谢谢