异步物化视图不能自动改写

物化视图创建:

CREATE MATERIALIZED VIEW IF NOT EXISTS test_mv
DISTRIBUTED BY HASH(uid)
REFRESH MANUAL
PROPERTIES (
    "replication_num" = "3",
    "force_external_table_query_rewrite" = "TRUE"
)
as select
    imp_date,
    el_lm_experiment_id,
    uid
from
    hive.test_db.test_table
group by
    imp_date,
    el_lm_experiment_id,
    uid
order by imp_date, el_lm_experiment_id;

查询语句:

        select
            imp_date,
            el_lm_experiment_id,
            uid
        from
            hive.test_db.test_table
        where
            imp_date = 20231112
            and el_lm_experiment_id in ('10901522')
        group by
            imp_date,
            el_lm_experiment_id,
            uid

explain执行计划:

+--------------------------------------------------------------------------------------------+
| Explain String                                                                             |
+--------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                                            |
|  OUTPUT EXPRS:1: imp_date | 3: el_lm_experiment_id | 10: uid                           |
|   PARTITION: UNPARTITIONED                                                                 |
|                                                                                            |
|   RESULT SINK                                                                              |
|                                                                                            |
|   4:EXCHANGE                                                                               |
|                                                                                            |
| PLAN FRAGMENT 1                                                                            |
|  OUTPUT EXPRS:                                                                             |
|   PARTITION: HASH_PARTITIONED: 1: imp_date, 3: el_lm_experiment_id, 10: uid            |
|                                                                                            |
|   STREAM DATA SINK                                                                         |
|     EXCHANGE ID: 04                                                                        |
|     UNPARTITIONED                                                                          |
|                                                                                            |
|   3:AGGREGATE (merge finalize)                                                             |
|   |  group by: 1: imp_date, 3: el_lm_experiment_id, 10: uid                            |
|   |                                                                                        |
|   2:EXCHANGE                                                                               |
|                                                                                            |
| PLAN FRAGMENT 2                                                                            |
|  OUTPUT EXPRS:                                                                             |
|   PARTITION: RANDOM                                                                        |
|                                                                                            |
|   STREAM DATA SINK                                                                         |
|     EXCHANGE ID: 02                                                                        |
|     HASH_PARTITIONED: 1: imp_date, 3: el_lm_experiment_id, 10: uid                     |
|                                                                                            |
|   1:AGGREGATE (update serialize)                                                           |
|   |  STREAMING                                                                             |
|   |  group by: 1: imp_date, 3: el_lm_experiment_id, 10: uid                            |
|   |                                                                                        |
|   0:HdfsScanNode                                                                           |
|      TABLE: test_table                                          |
|      PARTITION PREDICATES: 1: imp_date = 20231112                                          |
|      NON-PARTITION PREDICATES: 1: imp_date = 20231112, 3: el_lm_experiment_id = '10901522' |
|      partitions=2/368                                                                      |
|      cardinality=1                                                                         |
|      avgRowSize=3.0                                                                        |
|      numNodes=0                                                                            |
+--------------------------------------------------------------------------------------------+

请问是什么版本呢,另外帮忙发下explain costs + sql的结果呢

版本是3.1
explain costs

+--------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                           |
+--------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0(F02)                                                                                                     |
|   Output Exprs:1: imp_date | 3: el_lm_experiment_id | 10: uid                                                        |
|   Input Partition: UNPARTITIONED                                                                                         |
|   RESULT SINK                                                                                                            |
|                                                                                                                          |
|   4:EXCHANGE                                                                                                             |
|      cardinality: 451703015                                                                                              |
|                                                                                                                          |
| PLAN FRAGMENT 1(F01)                                                                                                     |
|                                                                                                                          |
|   Input Partition: HASH_PARTITIONED: 1: imp_date, 3: el_lm_experiment_id, 10: uid                                    |
|   OutPut Partition: UNPARTITIONED                                                                                        |
|   OutPut Exchange Id: 04                                                                                                 |
|                                                                                                                          |
|   3:AGGREGATE (merge finalize)                                                                                           |
|   |  group by: [1: imp_date, BIGINT, true], [3: el_lm_experiment_id, VARCHAR, true], [10: uid, VARCHAR, true]        |
|   |  cardinality: 451703015                                                                                              |
|   |  column statistics:                                                                                                  |
|   |  * imp_date-->[2.0231112E7, 2.0231112E7, 0.0, 1.0, 1.0] UNKNOWN                                                      |
|   |  * el_lm_experiment_id-->[-Infinity, Infinity, 0.0, 1.0, 1.0] UNKNOWN                                                |
|   |  * uid-->[-Infinity, Infinity, 0.0, 1.0, 1.0] UNKNOWN                                                            |
|   |                                                                                                                      |
|   2:EXCHANGE                                                                                                             |
|      distribution type: SHUFFLE                                                                                          |
|      partition exprs: [1: imp_date, BIGINT, true], [3: el_lm_experiment_id, VARCHAR, true], [10: uid, VARCHAR, true] |
|      cardinality: 819415901                                                                                              |
|                                                                                                                          |
| PLAN FRAGMENT 2(F00)                                                                                                     |
|                                                                                                                          |
|   Input Partition: RANDOM                                                                                                |
|   OutPut Partition: HASH_PARTITIONED: 1: imp_date, 3: el_lm_experiment_id, 10: uid                                   |
|   OutPut Exchange Id: 02                                                                                                 |
|                                                                                                                          |
|   1:AGGREGATE (update serialize)                                                                                         |
|   |  STREAMING                                                                                                           |
|   |  group by: [1: imp_date, BIGINT, true], [3: el_lm_experiment_id, VARCHAR, true], [10: uid, VARCHAR, true]        |
|   |  cardinality: 819415901                                                                                              |
|   |  column statistics:                                                                                                  |
|   |  * imp_date-->[2.0231112E7, 2.0231112E7, 0.0, 1.0, 1.0] UNKNOWN                                                      |
|   |  * el_lm_experiment_id-->[-Infinity, Infinity, 0.0, 1.0, 1.0] UNKNOWN                                                |
|   |  * uid-->[-Infinity, Infinity, 0.0, 1.0, 1.0] UNKNOWN                                                            |
|   |                                                                                                                      |
|   0:HdfsScanNode                                                                                                         |
|      TABLE: test_table                                                                        |
|      PARTITION PREDICATES: 1: imp_date = 20231112                                                                        |
|      NON-PARTITION PREDICATES: 1: imp_date = 20231112, 3: el_lm_experiment_id = '10901522'                               |
|      partitions=2/368                                                                                                    |
|      avgRowSize=3.0                                                                                                      |
|      numNodes=0                                                                                                          |
|      cardinality: 1486468754                                                                                             |
|      column statistics:                                                                                                  |
|      * imp_date-->[2.0231112E7, 2.0231112E7, 0.0, 1.0, 1.0] UNKNOWN                                                      |
|      * el_lm_experiment_id-->[-Infinity, Infinity, 0.0, 1.0, 1.0] UNKNOWN                                                |
|      * uid-->[-Infinity, Infinity, 0.0, 1.0, 1.0] UNKNOWN                                                            |
+--------------------------------------------------------------------------------------------------------------------------+

3.1的哪个小版本呢,select current_version();的结果可以看下

可以把物化视图定义里的order by去掉试试

去掉了order by依然不能替换。用的是3.1.4版本。

麻烦执行下这个语句看下结果
trace rewrite select
imp_date,
el_lm_experiment_id,
uid
from
hive.test_db.test_table
where
imp_date = 20231112
and el_lm_experiment_id in (‘10901522’)
group by
imp_date,
el_lm_experiment_id,
uid;

另外还有show materialized views;看下这个物化视图的状态

感谢回复!
找到原因了,因为刷新视图没有成功所以没能替换:

[SYNC=false] MV test_mv is outdated, stale partitions [test_mv]

show materialized view

last_refresh_state: FAILED

手动refresh成功之后可以替换了

一晚上过后又不能替换了。

因为hive外表的分区字段是varchar,不能以此建带分区的物化视图,于是mv里面加了where条件,昨晚是可以替换的,今天即使手动refresh materialized view之后也还是不能替换。

> trace rewrite  select             imp_date,             el_lm_experiment_id,             uid         from             hive.test_db.test_table         where             imp_date = 20231112             and el_lm_experiment_id in ('10901522')         group by             imp_date,             el_lm_experiment_id,             uid;
+-------------------------------------------------------------------------------------------------------+
| Explain String                                                                                        |
+-------------------------------------------------------------------------------------------------------+
| -- [TRACE: PREPARE GLOBAL]                                                                            |
|    [SYNC=false] RelatedMVs: [test_mv_20231112], CandidateMVs: [] |
|    [SYNC=true] There are no related mvs for the query plan                                            |
| -- [TRACE: Summary]                                                                                   |
|    Query cannot be rewritten, please check the trace logs to find more information.                   |
+-------------------------------------------------------------------------------------------------------+
show materialized views;

                                  id: 12908
                       database_name: test
                                name: test_mv_20231112
                        refresh_type: MANUAL
                           is_active: true
                     inactive_reason: NULL
                      partition_type: UNPARTITIONED
                             task_id: 12947
                           task_name: mv-12908
             last_refresh_start_time: 2023-11-17 09:49:25
          last_refresh_finished_time: 2023-11-17 09:51:26
               last_refresh_duration: 121.116
                  last_refresh_state: SUCCESS
          last_refresh_force_refresh: false
        last_refresh_start_partition: 
          last_refresh_end_partition: 
last_refresh_base_refresh_partitions: {test_table=[imp_date_hive_part=default, imp_date_hive_part=p_20231112]}
  last_refresh_mv_refresh_partitions: test_mv_20231112
             last_refresh_error_code: 0
          last_refresh_error_message: 
                                rows: 1725741349
                                text: CREATE MATERIALIZED VIEW `test_mv_20231112` (`imp_date`, `el_lm_experiment_id`, `uid`)
COMMENT "MATERIALIZED_VIEW"
DISTRIBUTED BY HASH(`uid`)
REFRESH MANUAL
PROPERTIES (
"replicated_storage" = "true",
"mv_owner" = "root",
"force_external_table_query_rewrite" = "CHECKED",
"replication_num" = "3",
"datacache.enable" = "false",
"enable_async_write_back" = "false",
"storage_volume" = "builtin_storage_volume"
)
AS SELECT `test_table`.`imp_date`, `test_table`.`el_lm_experiment_id`, `test_table`.`uid`
FROM `hive`.`test_db`.`test_table`
WHERE `test_table`.`imp_date` = 20231112
GROUP BY `test_table`.`imp_date`, `test_table`.`el_lm_experiment_id`, `test_table`.`uid`;

你有试过建带分区的物化视图吗?会报错吗?可以试一下

带分区的物化视图刷新时会报错:

--手动刷新指定分区
 refresh materialized view  test_mv partition start ("p_20231112") end ("p_20231113");
--报错
ERROR 1064 (HY000): Getting analyzing error. Detail message: Unsupported batch partition build type:VARCHAR(1048576).

你这个是hive外表吗?后续推荐使用catalog了,外表不推荐使用了。刚刚那个不刷新的问题稍等我们再看下

是通过hive catalog访问的外表

CREATE MATERIALIZED VIEW test_mv_20231112 (imp_date, el_lm_experiment_id, uid)
COMMENT “MATERIALIZED_VIEW”
partition by str2date(imp_date, “%Y%m%d”)
DISTRIBUTED BY HASH(uid)
REFRESH MANUAL
PROPERTIES (
“replicated_storage” = “true”,
“mv_owner” = “root”,
“force_external_table_query_rewrite” = “CHECKED”,
“replication_num” = “3”,
“datacache.enable” = “false”,
“enable_async_write_back” = “false”,
“storage_volume” = “builtin_storage_volume”
)
AS SELECT test_table.imp_date, test_table.el_lm_experiment_id, test_table.uid
FROM hive.test_db.test_table
WHERE test_table.imp_date = 20231112
GROUP BY test_table.imp_date, test_table.el_lm_experiment_id, test_table.uid;

加上 **partition by str2date(imp_date, “%Y%m%d”)**试试

会报错:
ERROR 1064 (HY000): Getting syntax error from line 3, column 13 to line 3, column 50. Detail message: Unsupported expr ‘str2date(imp_date_hive_part, “ % Y % m % d”)’ in PARTITION BY clause.

不应该,check一下是不是全角半角字符导致的?贴一下对应的create语句看看

重新打了一遍报错是:
ERROR 1064 (HY000): Getting analyzing error from line 3, column 13 to line 3, column 40. Detail message: Materialized view partition function str2date check failed.

另外,imp_date是int类型的,hive表的分区字段是imp_date_hive_part 它的格式是’p_20231112’,这种是不是没法建分区表

哦,imp_date是int类型的话,直接加partition by imp_date
看看