一个SQL里面的一个子查询和物化视图的定义相同,但是这个SQL却不能用物化视图来透明改写,请问为什么呢?

一个SQL里面的一个子查询和物化视图的定义相同,但是这个SQL却不能用物化视图来透明改写,请问为什么呢?

物化视图定义:

‘root’@hivessbtest Thu Jan 4 21:17:41 2024>create materialized view default_catalog.hiveviewtestdb.mv8212 PARTITION BY lo_orderdate DISTRIBUTED BY HASH(LO_ORDERDATE) REFRESH async START (‘2024-01-04 21:18:00’) every (interval 10 minute) as select
-> p_brand,
-> LO_ORDERDATE,
-> sum(LO_REVENUE) as revenue_sum
-> from
-> P_LINEORDER l
-> left join part p on l.LO_PARTKEY = p.P_PARTKEY
-> group by
-> p_brand,
-> LO_ORDERDATE;
Query OK, 0 rows affected (0.02 sec)

物化视图已经刷新

‘root’@hivessbtest Mon Jan 8 16:57:38 2024>select * from default_catalog.information_schema.task_runs where DATABASE = ‘hiveviewtestdb’ and CREATE_TIME >= ‘2024-01-08 15:00:00’ and DEFINITION like ‘%mv8212%’ order by create_time desc limit 1\G
*************************** 1. row ***************************
QUERY_ID: a03576d9-ae02-11ee-9d33-5e9962dd9b0e
TASK_NAME: mv-1102449
CREATE_TIME: 2024-01-08 16:47:59
FINISH_TIME: 2024-01-08 16:50:19
STATE: SUCCESS
DATABASE: hiveviewtestdb
DEFINITION: insert overwrite mv8212 SELECT hive.hivessbtest.p.p_brand, hive.hivessbtest.l.LO_ORDERDATE, sum(hive.hivessbtest.l.LO_REVENUE) AS revenue_sum
FROM hive.hivessbtest.P_LINEORDER AS l LEFT OUTER JOIN hive.hivessbtest.part AS p ON hive.hivessbtest.l.LO_PARTKEY = hive.hivessbtest.p.P_PARTKEY
GROUP BY hive.hivessbtest.p.p_brand, hive.hivessbtest.l.LO_ORDERDATE
EXPIRE_TIME: 2024-01-09 16:47:59
ERROR_CODE: 0
ERROR_MESSAGE: NULL
PROGRESS: 100%
EXTRA_MESSAGE: {“forceRefresh”:false,“mvPartitionsToRefresh”:[],“refBasePartitionsToRefreshMap”:{},“basePartitionsToRefreshMap”:{}}
PROPERTIES: {“mvId”:“1102449”,“replication_num”:“1”,“in_memory”:“false”}
1 row in set (4.13 sec)

下面的这个SQL,第一个子查询的定义和上面的物化视图定义相同,物化视图却不能用来做查询改写

‘root’@hivessbtest Mon Jan 8 16:58:32 2024>explain
-> select
-> t1.p_brand as p_brand,
-> t1.LO_ORDERDATE as LO_ORDERDATE,
-> SUM(revenue_sum) + SUM(supplycost_sum) as revenue_and_supplycost_sum
-> from
-> (
-> select
-> p_brand,
-> LO_ORDERDATE,
-> sum(LO_REVENUE) as revenue_sum
-> from
-> P_LINEORDER l
-> left join part p on l.LO_PARTKEY = p.P_PARTKEY
-> group by
-> p_brand,
-> LO_ORDERDATE
-> ) t1
-> inner join (
-> select
-> p_brand,
-> LO_ORDERDATE,
-> sum(LO_SUPPLYCOST) as supplycost_sum
-> from
-> P_LINEORDER l
-> left join part p on l.LO_PARTKEY = p.P_PARTKEY
-> group by
-> p_brand,
-> LO_ORDERDATE
-> ) t2 on t1.p_brand = t2.p_brand
-> and t1.LO_ORDERDATE = t2.LO_ORDERDATE
-> group by
-> t1.p_brand,
-> t1.LO_ORDERDATE;
±--------------------------------------------------------------+
| Explain String |
±--------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:23: p_brand | 18: lo_orderdate | 59: expr |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 20:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: 23: p_brand, 18: lo_orderdate |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 20 |
| UNPARTITIONED |
| |
| 19:Project |
| | <slot 18> : 18: lo_orderdate |
| | <slot 23> : 23: p_brand |
| | <slot 59> : 57: sum + 58: sum |
| | |
| 18:AGGREGATE (update finalize) |
| | output: sum(28: sum), sum(56: sum) |
| | group by: 23: p_brand, 18: lo_orderdate |
| | |
| 17:Project |
| | <slot 18> : 18: lo_orderdate |
| | <slot 23> : 23: p_brand |
| | <slot 28> : 28: sum |
| | <slot 56> : 56: sum |
| | |
| 16:HASH JOIN |
| | join op: INNER JOIN (BUCKET_SHUFFLE(S)) |
| | colocate: false, reason: |
| | equal join conjunct: 23: p_brand = 51: p_brand |
| | equal join conjunct: 18: lo_orderdate = 46: lo_orderdate |
| | |
| |----15:AGGREGATE (merge finalize) |
| | | output: sum(56: sum) |
| | | group by: 51: p_brand, 46: lo_orderdate |
| | | |
| | 14:EXCHANGE |
| | |
| 7:AGGREGATE (merge finalize) |
| | output: sum(28: sum) |
| | group by: 23: p_brand, 18: lo_orderdate |
| | |
| 6:EXCHANGE |
| |
| PLAN FRAGMENT 2 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 14 |
| HASH_PARTITIONED: 51: p_brand, 46: lo_orderdate |
| |
| 13:AGGREGATE (update serialize) |
| | STREAMING |
| | output: sum(41: lo_supplycost) |
| | group by: 51: p_brand, 46: lo_orderdate |
| | |
| 12:Project |
| | <slot 41> : 41: lo_supplycost |
| | <slot 46> : 46: lo_orderdate |
| | <slot 51> : 51: p_brand |
| | |
| 11:HASH JOIN |
| | join op: INNER JOIN (BROADCAST) |
| | colocate: false, reason: |
| | equal join conjunct: 32: lo_partkey = 47: p_partkey |
| | |
| |----10:EXCHANGE |
| | |
| 8:HdfsScanNode |
| TABLE: p_lineorder |
| PARTITION PREDICATES: 46: lo_orderdate IS NOT NULL |
| partitions=2406/2406 |
| cardinality=6001181 |
| avgRowSize=6.0 |
| |
| PLAN FRAGMENT 3 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 10 |
| UNPARTITIONED |
| |
| 9:HdfsScanNode |
| TABLE: part |
| NON-PARTITION PREDICATES: 51: p_brand IS NOT NULL |
| partitions=1/1 |
| cardinality=1 |
| avgRowSize=2.0 |
| |
| PLAN FRAGMENT 4 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 06 |
| HASH_PARTITIONED: 23: p_brand, 18: lo_orderdate |
| |
| 5:AGGREGATE (update serialize) |
| | STREAMING |
| | output: sum(12: lo_revenue) |
| | group by: 23: p_brand, 18: lo_orderdate |
| | |
| 4:Project |
| | <slot 12> : 12: lo_revenue |
| | <slot 18> : 18: lo_orderdate |
| | <slot 23> : 23: p_brand |
| | |
| 3:HASH JOIN |
| | join op: INNER JOIN (BROADCAST) |
| | colocate: false, reason: |
| | equal join conjunct: 4: lo_partkey = 19: p_partkey |
| | |
| |----2:EXCHANGE |
| | |
| 0:HdfsScanNode |
| TABLE: p_lineorder |
| PARTITION PREDICATES: 18: lo_orderdate IS NOT NULL |
| partitions=2406/2406 |
| cardinality=6001181 |
| avgRowSize=6.0 |
| |
| PLAN FRAGMENT 5 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 02 |
| UNPARTITIONED |
| |
| 1:HdfsScanNode |
| TABLE: part |
| NON-PARTITION PREDICATES: 23: p_brand IS NOT NULL |
| partitions=1/1 |
| cardinality=1 |
| avgRowSize=2.0 |
±--------------------------------------------------------------+
142 rows in set (0.63 sec)

【背景】做过哪些操作?
没有做过操作
【业务影响】
未上线业务
【是否存算分离】

【StarRocks版本】3.2.2
【集群规模】
3fe(1 follower+2observer)+5be(fe与be混部)
【机器信息】CPU虚拟核/内存/网卡,96C 512G
【联系方式】为了在解决问题过程中能及时联系到您获取一些日志信息,请补充下您的联系方式,例如:社区群4-小李或者邮箱,谢谢

您好 您是3.2.2版本吗

嗯是的,3.2.2

mark 一下,同样的问题、

3.2+版本用

trace logs mv <query> 

查看下mv rewrite的日志。

注:下面的日志是我用branch-3.2的最新代码编译的,替换了fe/lib整个目录,版本号branch-3.2-8db6910

trace_logs_mv_subquery_test.log (1.3 MB)

但是,如果子查询是一个表,就能命中,有点奇怪。

'root'@hiveviewtestdb Tue Jan 23 19:10:02 2024>show create view v811\G
*************************** 1. row ***************************
                View: v811
         Create View: CREATE VIEW `v811` (`LO_SHIPMODE`, `LO_ORDERDATE`, `revenue_sum`) AS SELECT `hive`.`hivessbtest`.`P_LINEORDER`.`LO_SHIPMODE`, `hive`.`hivessbtest`.`P_LINEORDER`.`LO_ORDERDATE`, sum(`hive`.`hivessbtest`.`P_LINEORDER`.`LO_REVENUE`) AS `revenue_sum`
FROM `hive`.`hivessbtest`.`P_LINEORDER`
GROUP BY `hive`.`hivessbtest`.`P_LINEORDER`.`LO_SHIPMODE`, `hive`.`hivessbtest`.`P_LINEORDER`.`LO_ORDERDATE`;
character_set_client: utf8
collation_connection: utf8_general_ci
1 row in set (0.01 sec)

'root'@hiveviewtestdb Tue Jan 23 19:10:33 2024>show materialized views where name like 'mv8111'\G
*************************** 1. row ***************************
                                  id: 1710089
                       database_name: hiveviewtestdb
                                name: mv8111
                        refresh_type: ASYNC
                           is_active: true
                     inactive_reason: 
                      partition_type: RANGE
                             task_id: 1710091
                           task_name: mv-1710089
             last_refresh_start_time: 2024-01-23 19:07:24
          last_refresh_finished_time: 2024-01-23 19:07:26
               last_refresh_duration: 2.504
                  last_refresh_state: SUCCESS
          last_refresh_force_refresh: false
        last_refresh_start_partition: 
          last_refresh_end_partition: 
last_refresh_base_refresh_partitions: {}
  last_refresh_mv_refresh_partitions: 
             last_refresh_error_code: 0
          last_refresh_error_message: 
                                rows: 16842
                                text: CREATE MATERIALIZED VIEW `mv8111` (`LO_SHIPMODE`, `LO_ORDERDATE`, `revenue_sum`)
PARTITION BY (`lo_orderdate`)
DISTRIBUTED BY HASH(`LO_ORDERDATE`)
REFRESH ASYNC START("2024-01-05 22:00:00") EVERY(INTERVAL 10 MINUTE)
PROPERTIES (
"replicated_storage" = "true",
"replication_num" = "1",
"storage_medium" = "HDD"
)
AS SELECT `v811`.`LO_SHIPMODE`, `v811`.`LO_ORDERDATE`, `v811`.`revenue_sum`
FROM `hiveviewtestdb`.`v811`;
1 row in set (0.09 sec)


'root'@hiveviewtestdb Tue Jan 23 19:11:16 2024>set catalog hive;
Query OK, 0 rows affected (0.00 sec)

'root'@hiveviewtestdb Tue Jan 23 19:11:21 2024>use hivessbtest;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
'root'@hivessbtest Tue Jan 23 19:11:26 2024>explain
    -> select 
    ->   t1.LO_SHIPMODE, 
    ->   t1.LO_ORDERDATE, 
    ->   SUM(revenue_sum) + SUM(supplycost_sum) as revenue_and_supplycost_sum
    -> from 
    ->   (
    ->     select 
    ->       LO_SHIPMODE, 
    ->       LO_ORDERDATE, 
    ->       sum(LO_REVENUE) as revenue_sum 
    ->     from 
    ->       P_LINEORDER 
    ->     group by 
    ->       LO_SHIPMODE, 
    ->       LO_ORDERDATE
    ->   ) t1 
    ->   inner join (
    ->     select 
    ->       LO_SHIPMODE, 
    ->       LO_ORDERDATE, 
    ->       sum(LO_SUPPLYCOST) as supplycost_sum 
    ->     from 
    ->       P_LINEORDER 
    ->     group by 
    ->       LO_SHIPMODE, 
    ->       LO_ORDERDATE
    ->   ) t2 on t1.LO_SHIPMODE = t2.LO_SHIPMODE 
    ->   and t1.LO_ORDERDATE = t2.LO_ORDERDATE 
    -> group by 
    ->   t1.LO_SHIPMODE, 
    ->   t1.LO_ORDERDATE;
+---------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                |
+---------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                                                               |
|  OUTPUT EXPRS:16: lo_shipmode | 18: lo_orderdate | 41: expr                                                   |
|   PARTITION: UNPARTITIONED                                                                                    |
|                                                                                                               |
|   RESULT SINK                                                                                                 |
|                                                                                                               |
|   11:EXCHANGE                                                                                                 |
|                                                                                                               |
| PLAN FRAGMENT 1                                                                                               |
|  OUTPUT EXPRS:                                                                                                |
|   PARTITION: HASH_PARTITIONED: 35: lo_shipmode, 37: lo_orderdate                                              |
|                                                                                                               |
|   STREAM DATA SINK                                                                                            |
|     EXCHANGE ID: 11                                                                                           |
|     UNPARTITIONED                                                                                             |
|                                                                                                               |
|   10:Project                                                                                                  |
|   |  <slot 16> : 16: lo_shipmode                                                                              |
|   |  <slot 18> : 18: lo_orderdate                                                                             |
|   |  <slot 41> : 39: sum + 40: sum                                                                            |
|   |                                                                                                           |
|   9:AGGREGATE (update finalize)                                                                               |
|   |  output: sum(19: sum), sum(38: sum)                                                                       |
|   |  group by: 16: lo_shipmode, 18: lo_orderdate                                                              |
|   |                                                                                                           |
|   8:Project                                                                                                   |
|   |  <slot 16> : 16: lo_shipmode                                                                              |
|   |  <slot 18> : 18: lo_orderdate                                                                             |
|   |  <slot 19> : 19: sum                                                                                      |
|   |  <slot 38> : 38: sum                                                                                      |
|   |                                                                                                           |
|   7:HASH JOIN                                                                                                 |
|   |  join op: INNER JOIN (BUCKET_SHUFFLE(S))                                                                  |
|   |  colocate: false, reason:                                                                                 |
|   |  equal join conjunct: 16: lo_shipmode = 35: lo_shipmode                                                   |
|   |  equal join conjunct: 18: lo_orderdate = 37: lo_orderdate                                                 |
|   |                                                                                                           |
|   |----6:AGGREGATE (merge finalize)                                                                           |
|   |    |  output: sum(38: sum)                                                                                |
|   |    |  group by: 35: lo_shipmode, 37: lo_orderdate                                                         |
|   |    |                                                                                                      |
|   |    5:EXCHANGE                                                                                             |
|   |                                                                                                           |
|   2:EXCHANGE                                                                                                  |
|                                                                                                               |
| PLAN FRAGMENT 2                                                                                               |
|  OUTPUT EXPRS:                                                                                                |
|   PARTITION: RANDOM                                                                                           |
|                                                                                                               |
|   STREAM DATA SINK                                                                                            |
|     EXCHANGE ID: 05                                                                                           |
|     HASH_PARTITIONED: 35: lo_shipmode, 37: lo_orderdate                                                       |
|                                                                                                               |
|   4:AGGREGATE (update serialize)                                                                              |
|   |  STREAMING                                                                                                |
|   |  output: sum(32: lo_supplycost)                                                                           |
|   |  group by: 35: lo_shipmode, 37: lo_orderdate                                                              |
|   |                                                                                                           |
|   3:HdfsScanNode                                                                                              |
|      TABLE: p_lineorder                                                                                       |
|      PARTITION PREDICATES: 37: lo_orderdate IS NOT NULL                                                       |
|      NON-PARTITION PREDICATES: 35: lo_shipmode IS NOT NULL                                                    |
|      partitions=2406/2406                                                                                     |
|      cardinality=1                                                                                            |
|      avgRowSize=3.0                                                                                           |
|                                                                                                               |
| PLAN FRAGMENT 3                                                                                               |
|  OUTPUT EXPRS:                                                                                                |
|   PARTITION: RANDOM                                                                                           |
|                                                                                                               |
|   STREAM DATA SINK                                                                                            |
|     EXCHANGE ID: 02                                                                                           |
|     HASH_PARTITIONED: 16: lo_shipmode, 18: lo_orderdate                                                       |
|                                                                                                               |
|   1:Project                                                                                                   |
|   |  <slot 16> : 127: LO_SHIPMODE                                                                             |
|   |  <slot 18> : 128: LO_ORDERDATE                                                                            |
|   |  <slot 19> : 129: revenue_sum                                                                             |
|   |                                                                                                           |
|   0:OlapScanNode                                                                                              |
|      TABLE: mv8111                                                                                            |
|      PREAGGREGATION: ON                                                                                       |
|      partitions=2406/2406                                                                                     |
|      rollup: mv8111                                                                                           |
|      tabletRatio=24060/24060                                                                                  |
|      tabletList=15481994,15481996,15481998,15482000,15482002,15482004,15482006,15482008,15482010,15482012 ... |
|      cardinality=1                                                                                            |
|      avgRowSize=3.0                                                                                           |
|      MaterializedView: true                                                                                   |
+---------------------------------------------------------------------------------------------------------------+
89 rows in set (1.10 sec)