【物化视图】【Hive catalog】列存储错序,命中物化视图后给出错误的数据类型,执行SQL失败

【详述】在使用concat的时候命中物化视图,数据错列存储,给出了不支持的数据类型float,SQL执行失败,报错向量错误,再次执行访问非法地址,BE crash。

--出错语句
SELECT CONCAT(`query_data_file_csv`.`string_01`, 'test') AS `re` FROM `dte_hive_catalog`.`_default`.`ry_data_file_csv` AS `query_data_file_csv` WHERE `query_data_file_csv`.`string_01` = 'string00';

第一次执行成功,explain costs显示命中外表

+-----------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                          |
+-----------------------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0(F01)                                                                                                                    |
|   Output Exprs:53: concat                                                                                                               |
|   Input Partition: UNPARTITIONED                                                                                                        |
|   RESULT SINK                                                                                                                           |
|                                                                                                                                         |
|   2:EXCHANGE                                                                                                                            |
|      cardinality: 812592                                                                                                                |
|                                                                                                                                         |
| PLAN FRAGMENT 1(F00)                                                                                                                    |
|                                                                                                                                         |
|   Input Partition: RANDOM                                                                                                               |
|   OutPut Partition: UNPARTITIONED                                                                                                       |
|   OutPut Exchange Id: 02                                                                                                                |
|                                                                                                                                         |
|   1:Project                                                                                                                             |
|   |  output columns:                                                                                                                    |
|   |  53 <-> concat[([2: string_01, VARCHAR, true], 'test'); args: VARCHAR; result: VARCHAR; args nullable: true; result nullable: true] |
|   |  cardinality: 812592                                                                                                                |
|   |  column statistics:                                                                                                                 |
|   |  * concat-->[-Infinity, Infinity, 0.0, 1.0, 1.0] UNKNOWN                                                                            |
|   |                                                                                                                                     |
|   0:HdfsScanNode                                                                                                                        |
|      TABLE: ry_data_file_csv                                                                                                                                                                                                     |
|      NON-PARTITION PREDICATES: 2: string_01 = 'string00'                                                                                |
|      MIN/MAX PREDICATES: 2: string_01 <= 'string00', 2: string_01 >= 'string00'                                                         |
|      partitions=1368/1368                                                                                                               |
|      avgRowSize=2.0                                                                                                                     |
|      cardinality: 812592                                                                                                                |
|      column statistics:                                                                                                                 |
|      * string_01-->[-Infinity, Infinity, 0.0, 1.0, 1.0] UNKNOWN                                                                         |
|      * concat-->[-Infinity, Infinity, 0.0, 1.0, 1.0] UNKNOWN                                                                            |
+-----------------------------------------------------------------------------------------------------------------------------------------+

第二次命中物化视图,传入不支持的数据类型float,报错向量错误;

-- explain costs
+---------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                        |
+---------------------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0(F01)                                                                                                                  |
|   Output Exprs:53: concat                                                                                                             |
|   Input Partition: UNPARTITIONED                                                                                                      |
|   RESULT SINK                                                                                                                         |
|                                                                                                                                       |
|   2:EXCHANGE                                                                                                                          |
|      cardinality: 915                                                                                                                 |
|                                                                                                                                       |
| PLAN FRAGMENT 1(F00)                                                                                                                  |
|                                                                                                                                       |
|   Input Partition: RANDOM                                                                                                             |
|   OutPut Partition: UNPARTITIONED                                                                                                     |
|   OutPut Exchange Id: 02                                                                                                              |
|                                                                                                                                       |
|   1:Project                                                                                                                           |
|   |  output columns:                                                                                                                  |
|   |  53 <-> concat[([56: float_02, FLOAT, true], 'test'); args: VARCHAR; result: VARCHAR; args nullable: true; result nullable: true] |
|   |  cardinality: 915                                                                                                                 |
|   |  column statistics:                                                                                                               |
|   |  * concat-->[-Infinity, Infinity, 0.0, 1.0, 1.0] UNKNOWN                                                                          |
|   |                                                                                                                                   |
|   0:OlapScanNode                                                                                                                      |
|      table: DEFAULT__mv_col__test, rollup: dte__DEFAULT__mv_col__test                                                            |
|      preAggregation: on                                                                                                               |
|      partitionsRatio=1/1, tabletsRatio=6/6                                                                                            |
|      tabletList=20517,20519,20521,20523,20525,20527                                                                                   |
|      actualRows=915, avgRowSize=9.0                                                                                                   |
|      MaterializedView: true                                                                                                           |
|      cardinality: 915                                                                                                                 |
|      column statistics:                                                                                                               |
|      * concat-->[-Infinity, Infinity, 0.0, 1.0, 1.0] UNKNOWN                                                                          |
|      * float_02-->[100000.0, 100000.0, 0.0, 8.0, 1.0] ESTIMATE                                                                        |
+---------------------------------------------------------------------------------------------------------------------------------------+

物化视图结构

--describe DEFAULT__mv_col__test
+-----------+------------------+------+-------+---------+-------+
| Field     | Type             | Null | Key   | Default | Extra |
+-----------+------------------+------+-------+---------+-------+
| string_01 | varchar(1048576) | YES  | true  | NULL    |       |
| expTmp    | double           | YES  | false | NULL    | NONE  |
| float_02  | float            | YES  | false | NULL    | NONE  |
+-----------+------------------+------+-------+---------+-------+

物化视图数据

--select * from DEFAULT__mv_col__test limit 5;
--剩余数据和显示数据保持一致,共1300+行
+-----------+--------+----------+
| string_01 | expTmp | float_02 |
+-----------+--------+----------+
| string00  |   NULL |   100000 |
| string00  |   NULL |   100000 |
| string00  |   NULL |   100000 |
| string00  |   NULL |   100000 |
| string00  |   NULL |   100000 |
+-----------+--------+----------+

问题初步定位是由于异步物化视图数据存储错位导致的,具体原因未知,需要社区协助

-- 物化视图创建语句
CREATE MATERIALIZED VIEW `DEFAULT__mv_col__test` 
DISTRIBUTED BY RANDOM
REFRESH ASYNC START("2025-01-12 04:22:01") EVERY(INTERVAL 1 MINUTE)
PROPERTIES (
"replicated_storage" = "true",
"replication_num" = "1",
"session.query_timeout" = "3600",
"storage_medium" = "HDD"
)
AS SELECT exp(`query_data_logical_fact`.`integer_01`) AS `expTmp`, `query_data_logical_fact`.`float_02` AS `float_02`, `query_data_logical_fact`.`string_01` AS `string_01`
FROM (SELECT `query_data_file_csv`.`time_stamp`, `query_data_file_csv`.`string_01`, `query_data_file_csv`.`string_02`, `query_data_file_csv`.`string_03`, `query_data_file_csv`.`string_04`, `query_data_file_csv`.`string_05`, `query_data_file_csv`.`string_06`, `query_data_file_csv`.`string_07`, `query_data_file_csv`.`string_08`, `query_data_file_csv`.`string_09`, `query_data_file_csv`.`string_10`, `query_data_file_csv`.`integer_01`, `query_data_file_csv`.`integer_02`, `query_data_file_csv`.`integer_03`, `query_data_file_csv`.`integer_04`, `query_data_file_csv`.`integer_05`, `query_data_file_csv`.`integer_06`, `query_data_file_csv`.`integer_07`, `query_data_file_csv`.`integer_08`, `query_data_file_csv`.`integer_09`, `query_data_file_csv`.`integer_10`, `query_data_file_csv`.`long_01`, `query_data_file_csv`.`long_02`, `query_data_file_csv`.`long_03`, `query_data_file_csv`.`long_04`, `query_data_file_csv`.`long_05`, `query_data_file_csv`.`long_06`, `query_data_file_csv`.`long_07`, `query_data_file_csv`.`long_08`, `query_data_file_csv`.`long_09`, `query_data_file_csv`.`long_10`, `query_data_file_csv`.`float_01`, `query_data_file_csv`.`float_02`, `query_data_file_csv`.`float_03`, `query_data_file_csv`.`float_04`, `query_data_file_csv`.`float_05`, `query_data_file_csv`.`float_06`, `query_data_file_csv`.`float_07`, `query_data_file_csv`.`float_08`, `query_data_file_csv`.`float_09`, `query_data_file_csv`.`float_10`, `query_data_file_csv`.`double_01`, `query_data_file_csv`.`double_02`, `query_data_file_csv`.`double_03`, `query_data_file_csv`.`double_04`, `query_data_file_csv`.`double_05`, `query_data_file_csv`.`double_06`, `query_data_file_csv`.`double_07`, `query_data_file_csv`.`double_08`, `query_data_file_csv`.`double_09`, `query_data_file_csv`.`double_10`
FROM `dte_hive_catalog`.`_default`.`ry_data_file_csv` AS `query_data_file_csv`) `query_data_logical_fact`
WHERE `query_data_logical_fact`.`string_01` = 'string00';

但是show create显示了列名,和数据并不对应

-- 创建时不论是否指定列名,其中列名总是string_01, expTmp, float_02
CREATE MATERIALIZED VIEW `DEFAULT__mv_col__test` (`string_01`, `expTmp`, `float_02`)
DISTRIBUTED BY RANDOM
REFRESH ASYNC START("2025-01-12 04:22:01") EVERY(INTERVAL 1 MINUTE)
PROPERTIES (
"replicated_storage" = "true",
"replication_num" = "1",
"session.query_timeout" = "3600",
"storage_medium" = "HDD"
)
AS SELECT exp(`query_data_logical_fact`.`integer_01`) AS `expTmp`, `query_data_logical_fact`.`float_02`, `query_data_logical_fact`.`string_01`
FROM (SELECT `query_data_file_csv`.`time_stamp`, `query_data_file_csv`.`string_01`, `query_data_file_csv`.`string_02`, `query_data_file_csv`.`string_03`, `query_data_file_csv`.`string_04`, `query_data_file_csv`.`string_05`, `query_data_file_csv`.`string_06`, `query_data_file_csv`.`string_07`, `query_data_file_csv`.`string_08`, `query_data_file_csv`.`string_09`, `query_data_file_csv`.`string_10`, `query_data_file_csv`.`integer_01`, `query_data_file_csv`.`integer_02`, `query_data_file_csv`.`integer_03`, `query_data_file_csv`.`integer_04`, `query_data_file_csv`.`integer_05`, `query_data_file_csv`.`integer_06`, `query_data_file_csv`.`integer_07`, `query_data_file_csv`.`integer_08`, `query_data_file_csv`.`integer_09`, `query_data_file_csv`.`integer_10`, `query_data_file_csv`.`long_01`, `query_data_file_csv`.`long_02`, `query_data_file_csv`.`long_03`, `query_data_file_csv`.`long_04`, `query_data_file_csv`.`long_05`, `query_data_file_csv`.`long_06`, `query_data_file_csv`.`long_07`, `query_data_file_csv`.`long_08`, `query_data_file_csv`.`long_09`, `query_data_file_csv`.`long_10`, `query_data_file_csv`.`float_01`, `query_data_file_csv`.`float_02`, `query_data_file_csv`.`float_03`, `query_data_file_csv`.`float_04`, `query_data_file_csv`.`float_05`, `query_data_file_csv`.`float_06`, `query_data_file_csv`.`float_07`, `query_data_file_csv`.`float_08`, `query_data_file_csv`.`float_09`, `query_data_file_csv`.`float_10`, `query_data_file_csv`.`double_01`, `query_data_file_csv`.`double_02`, `query_data_file_csv`.`double_03`, `query_data_file_csv`.`double_04`, `query_data_file_csv`.`double_05`, `query_data_file_csv`.`double_06`, `query_data_file_csv`.`double_07`, `query_data_file_csv`.`double_08`, `query_data_file_csv`.`double_09`, `query_data_file_csv`.`double_10`
FROM `dte_hive_catalog`.`_default`.`ry_data_file_csv
` AS `query_data_file_csv`) `query_data_logical_fact`
WHERE `query_data_logical_fact`.`string_01` = 'string00'; |

通过该现象倒推,修改select语句中的数据顺序,即在AS 后的select顺序改成string-expTmp-float,命中时取到了正确的varchar数据进行拼接,SQL执行成功。(注:类似exp(query_data_logical_fact.integer_01) AS expTmp在string和float中加AS指定参数也是没有用的,必须按照该顺序排布才可执行成功)

该物化视图是基于select语句创建的,如果根据物化视图存储列再去对select中的表进行调整,很明显逻辑有问题,为什么describe的时候是正确的结构,但是在命中时会提供错误的数据列。

【背景】
创建hive catalog查询外表,基于该查询建立物化视图。
在concat的时候,第一次explain costs显示命中外表,第二次explain costs显示命中物化视图,但是给出不支持的数据类型,导致向量错误,第三次执行concat语句,访问非法地址,be crash。
【业务影响】
【是否存算分离】否
【StarRocks版本】3.2.0
【集群规模】3fe + 3be
【联系方式】lrabbit125@gmail.com

这个问题在当前场景下是一个必现问题,社区可以帮忙看一下吗?

多谢反馈。

3.2.0 是一个比较老的版本,应该有些已知的问题(在支持mv指定Order key的时候引入的)。 可以升级到3.2最新的版本,drop & recreate mv再试下嘛?

我们老版本还是走3.2.0,最新版本解决了的话也要将修改回合到老版本中,社区方便找下修复order key的pr吗? 我在github上筛选了,没有找到对应的更改…