【Details】When executing a CONCAT query, the query is rewritten to hit a materialized view whose data is stored in the wrong columns. concat is fed the unsupported data type FLOAT and the SQL fails with a vector error; executing the statement again accesses an illegal address and crashes the BE.
-- Failing statement
SELECT CONCAT(`query_data_file_csv`.`string_01`, 'test') AS `re` FROM `dte_hive_catalog`.`_default`.`ry_data_file_csv` AS `query_data_file_csv` WHERE `query_data_file_csv`.`string_01` = 'string00';
The first execution succeeds; EXPLAIN COSTS shows the query scanning the external table directly:
+-----------------------------------------------------------------------------------------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0(F01) |
| Output Exprs:53: concat |
| Input Partition: UNPARTITIONED |
| RESULT SINK |
| |
| 2:EXCHANGE |
| cardinality: 812592 |
| |
| PLAN FRAGMENT 1(F00) |
| |
| Input Partition: RANDOM |
| OutPut Partition: UNPARTITIONED |
| OutPut Exchange Id: 02 |
| |
| 1:Project |
| | output columns: |
| | 53 <-> concat[([2: string_01, VARCHAR, true], 'test'); args: VARCHAR; result: VARCHAR; args nullable: true; result nullable: true] |
| | cardinality: 812592 |
| | column statistics: |
| | * concat-->[-Infinity, Infinity, 0.0, 1.0, 1.0] UNKNOWN |
| | |
| 0:HdfsScanNode |
| TABLE: ry_data_file_csv |
| NON-PARTITION PREDICATES: 2: string_01 = 'string00' |
| MIN/MAX PREDICATES: 2: string_01 <= 'string00', 2: string_01 >= 'string00' |
| partitions=1368/1368 |
| avgRowSize=2.0 |
| cardinality: 812592 |
| column statistics: |
| * string_01-->[-Infinity, Infinity, 0.0, 1.0, 1.0] UNKNOWN |
| * concat-->[-Infinity, Infinity, 0.0, 1.0, 1.0] UNKNOWN |
+-----------------------------------------------------------------------------------------------------------------------------------------+
The second execution hits the materialized view, which feeds concat the unsupported data type FLOAT, and a vector error is reported:
-- explain costs
+---------------------------------------------------------------------------------------------------------------------------------------+
| Explain String |
+---------------------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0(F01) |
| Output Exprs:53: concat |
| Input Partition: UNPARTITIONED |
| RESULT SINK |
| |
| 2:EXCHANGE |
| cardinality: 915 |
| |
| PLAN FRAGMENT 1(F00) |
| |
| Input Partition: RANDOM |
| OutPut Partition: UNPARTITIONED |
| OutPut Exchange Id: 02 |
| |
| 1:Project |
| | output columns: |
| | 53 <-> concat[([56: float_02, FLOAT, true], 'test'); args: VARCHAR; result: VARCHAR; args nullable: true; result nullable: true] |
| | cardinality: 915 |
| | column statistics: |
| | * concat-->[-Infinity, Infinity, 0.0, 1.0, 1.0] UNKNOWN |
| | |
| 0:OlapScanNode |
| table: DEFAULT__mv_col__test, rollup: dte__DEFAULT__mv_col__test |
| preAggregation: on |
| partitionsRatio=1/1, tabletsRatio=6/6 |
| tabletList=20517,20519,20521,20523,20525,20527 |
| actualRows=915, avgRowSize=9.0 |
| MaterializedView: true |
| cardinality: 915 |
| column statistics: |
| * concat-->[-Infinity, Infinity, 0.0, 1.0, 1.0] UNKNOWN |
| * float_02-->[100000.0, 100000.0, 0.0, 8.0, 1.0] ESTIMATE |
+---------------------------------------------------------------------------------------------------------------------------------------+
Materialized view schema
--describe DEFAULT__mv_col__test
+-----------+------------------+------+-------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-------+---------+-------+
| string_01 | varchar(1048576) | YES | true | NULL | |
| expTmp | double | YES | false | NULL | NONE |
| float_02 | float | YES | false | NULL | NONE |
+-----------+------------------+------+-------+---------+-------+
Materialized view data
--select * from DEFAULT__mv_col__test limit 5;
-- The remaining rows look the same as those shown above; 1300+ rows in total
+-----------+--------+----------+
| string_01 | expTmp | float_02 |
+-----------+--------+----------+
| string00 | NULL | 100000 |
| string00 | NULL | 100000 |
| string00 | NULL | 100000 |
| string00 | NULL | 100000 |
| string00 | NULL | 100000 |
+-----------+--------+----------+
Our preliminary diagnosis is that the async materialized view stores its data in misaligned columns; the exact cause is unknown and we need the community's help to dig further.
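One way to sanity-check the suspected misalignment (a minimal sketch, not part of the original report): evaluate the MV's defining expressions directly against the base table and compare them with the stored MV rows shown above; if the MV stored its data correctly, the two result sets should line up column by column.
-- Defining expressions evaluated against the base table (assumed manual check):
SELECT `string_01`, exp(`integer_01`) AS `expTmp`, `float_02`
FROM `dte_hive_catalog`.`_default`.`ry_data_file_csv`
WHERE `string_01` = 'string00'
LIMIT 5;
-- Stored MV rows for comparison:
SELECT `string_01`, `expTmp`, `float_02` FROM `DEFAULT__mv_col__test` LIMIT 5;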
-- Materialized view creation statement
CREATE MATERIALIZED VIEW `DEFAULT__mv_col__test`
DISTRIBUTED BY RANDOM
REFRESH ASYNC START("2025-01-12 04:22:01") EVERY(INTERVAL 1 MINUTE)
PROPERTIES (
"replicated_storage" = "true",
"replication_num" = "1",
"session.query_timeout" = "3600",
"storage_medium" = "HDD"
)
AS SELECT exp(`query_data_logical_fact`.`integer_01`) AS `expTmp`, `query_data_logical_fact`.`float_02` AS `float_02`, `query_data_logical_fact`.`string_01` AS `string_01`
FROM (SELECT `query_data_file_csv`.`time_stamp`, `query_data_file_csv`.`string_01`, `query_data_file_csv`.`string_02`, `query_data_file_csv`.`string_03`, `query_data_file_csv`.`string_04`, `query_data_file_csv`.`string_05`, `query_data_file_csv`.`string_06`, `query_data_file_csv`.`string_07`, `query_data_file_csv`.`string_08`, `query_data_file_csv`.`string_09`, `query_data_file_csv`.`string_10`, `query_data_file_csv`.`integer_01`, `query_data_file_csv`.`integer_02`, `query_data_file_csv`.`integer_03`, `query_data_file_csv`.`integer_04`, `query_data_file_csv`.`integer_05`, `query_data_file_csv`.`integer_06`, `query_data_file_csv`.`integer_07`, `query_data_file_csv`.`integer_08`, `query_data_file_csv`.`integer_09`, `query_data_file_csv`.`integer_10`, `query_data_file_csv`.`long_01`, `query_data_file_csv`.`long_02`, `query_data_file_csv`.`long_03`, `query_data_file_csv`.`long_04`, `query_data_file_csv`.`long_05`, `query_data_file_csv`.`long_06`, `query_data_file_csv`.`long_07`, `query_data_file_csv`.`long_08`, `query_data_file_csv`.`long_09`, `query_data_file_csv`.`long_10`, `query_data_file_csv`.`float_01`, `query_data_file_csv`.`float_02`, `query_data_file_csv`.`float_03`, `query_data_file_csv`.`float_04`, `query_data_file_csv`.`float_05`, `query_data_file_csv`.`float_06`, `query_data_file_csv`.`float_07`, `query_data_file_csv`.`float_08`, `query_data_file_csv`.`float_09`, `query_data_file_csv`.`float_10`, `query_data_file_csv`.`double_01`, `query_data_file_csv`.`double_02`, `query_data_file_csv`.`double_03`, `query_data_file_csv`.`double_04`, `query_data_file_csv`.`double_05`, `query_data_file_csv`.`double_06`, `query_data_file_csv`.`double_07`, `query_data_file_csv`.`double_08`, `query_data_file_csv`.`double_09`, `query_data_file_csv`.`double_10`
FROM `dte_hive_catalog`.`_default`.`ry_data_file_csv` AS `query_data_file_csv`) `query_data_logical_fact`
WHERE `query_data_logical_fact`.`string_01` = 'string00';
However, SHOW CREATE displays explicit column names that do not correspond to the stored data.
-- Regardless of whether a column list is specified at creation time, the column names always come out as string_01, expTmp, float_02
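-- Output presumably obtained via: show create materialized view DEFAULT__mv_col__test; (assumed command; the exact statement is not shown in the original post)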
CREATE MATERIALIZED VIEW `DEFAULT__mv_col__test` (`string_01`, `expTmp`, `float_02`)
DISTRIBUTED BY RANDOM
REFRESH ASYNC START("2025-01-12 04:22:01") EVERY(INTERVAL 1 MINUTE)
PROPERTIES (
"replicated_storage" = "true",
"replication_num" = "1",
"session.query_timeout" = "3600",
"storage_medium" = "HDD"
)
AS SELECT exp(`query_data_logical_fact`.`integer_01`) AS `expTmp`, `query_data_logical_fact`.`float_02`, `query_data_logical_fact`.`string_01`
FROM (SELECT `query_data_file_csv`.`time_stamp`, `query_data_file_csv`.`string_01`, `query_data_file_csv`.`string_02`, `query_data_file_csv`.`string_03`, `query_data_file_csv`.`string_04`, `query_data_file_csv`.`string_05`, `query_data_file_csv`.`string_06`, `query_data_file_csv`.`string_07`, `query_data_file_csv`.`string_08`, `query_data_file_csv`.`string_09`, `query_data_file_csv`.`string_10`, `query_data_file_csv`.`integer_01`, `query_data_file_csv`.`integer_02`, `query_data_file_csv`.`integer_03`, `query_data_file_csv`.`integer_04`, `query_data_file_csv`.`integer_05`, `query_data_file_csv`.`integer_06`, `query_data_file_csv`.`integer_07`, `query_data_file_csv`.`integer_08`, `query_data_file_csv`.`integer_09`, `query_data_file_csv`.`integer_10`, `query_data_file_csv`.`long_01`, `query_data_file_csv`.`long_02`, `query_data_file_csv`.`long_03`, `query_data_file_csv`.`long_04`, `query_data_file_csv`.`long_05`, `query_data_file_csv`.`long_06`, `query_data_file_csv`.`long_07`, `query_data_file_csv`.`long_08`, `query_data_file_csv`.`long_09`, `query_data_file_csv`.`long_10`, `query_data_file_csv`.`float_01`, `query_data_file_csv`.`float_02`, `query_data_file_csv`.`float_03`, `query_data_file_csv`.`float_04`, `query_data_file_csv`.`float_05`, `query_data_file_csv`.`float_06`, `query_data_file_csv`.`float_07`, `query_data_file_csv`.`float_08`, `query_data_file_csv`.`float_09`, `query_data_file_csv`.`float_10`, `query_data_file_csv`.`double_01`, `query_data_file_csv`.`double_02`, `query_data_file_csv`.`double_03`, `query_data_file_csv`.`double_04`, `query_data_file_csv`.`double_05`, `query_data_file_csv`.`double_06`, `query_data_file_csv`.`double_07`, `query_data_file_csv`.`double_08`, `query_data_file_csv`.`double_09`, `query_data_file_csv`.`double_10`
FROM `dte_hive_catalog`.`_default`.`ry_data_file_csv` AS `query_data_file_csv`) `query_data_logical_fact`
WHERE `query_data_logical_fact`.`string_01` = 'string00';
Working backwards from this behavior, we reordered the columns in the SELECT list of the MV definition (the SELECT after AS) to string_01 - expTmp - float_02. With that order, the rewritten query picks up the correct VARCHAR data for the concatenation and the SQL executes successfully. (Note: adding AS aliases to the string and float columns, in the same way as exp(`query_data_logical_fact`.`integer_01`) AS `expTmp`, does not help; the statement only succeeds when the columns are arranged in exactly that order.)
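For reference, a minimal sketch of the reordered definition that works (assumptions: the MV name suffix is ours, and the original 51-column inner subquery is dropped for brevity since it only projects base-table columns; apart from the column order, everything matches the original creation statement):
-- Sketch only; column order in the outer SELECT is the sole intentional difference.
CREATE MATERIALIZED VIEW `DEFAULT__mv_col__test_reordered`  -- hypothetical name
DISTRIBUTED BY RANDOM
REFRESH ASYNC START("2025-01-12 04:22:01") EVERY(INTERVAL 1 MINUTE)
PROPERTIES (
    "replicated_storage" = "true",
    "replication_num" = "1",
    "session.query_timeout" = "3600",
    "storage_medium" = "HDD"
)
AS SELECT
    `query_data_file_csv`.`string_01` AS `string_01`,
    exp(`query_data_file_csv`.`integer_01`) AS `expTmp`,
    `query_data_file_csv`.`float_02` AS `float_02`
FROM `dte_hive_catalog`.`_default`.`ry_data_file_csv` AS `query_data_file_csv`
WHERE `query_data_file_csv`.`string_01` = 'string00';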
The materialized view is defined by a SELECT statement, so having to go back and reorder that SELECT to match the MV's storage layout is clearly backwards. Why does DESCRIBE report the correct schema while the query rewrite supplies the wrong data columns when the MV is hit?
【Background】
A Hive catalog was created to query the external table, and the materialized view was built on that query.
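For completeness, the catalog setup looks roughly like the sketch below (an assumed reconstruction; the actual Hive metastore address is not included in this report):
-- Assumed reconstruction of the catalog setup; the metastore URI is a placeholder.
CREATE EXTERNAL CATALOG `dte_hive_catalog`
PROPERTIES (
    "type" = "hive",
    "hive.metastore.uris" = "thrift://<metastore-host>:9083"
);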
When running the concat query, the first EXPLAIN COSTS shows the query hitting the external table; the second shows it hitting the materialized view, which supplies an unsupported data type and triggers the vector error; on the third execution of the concat statement, an illegal address is accessed and the BE crashes.
【Business impact】
【Shared-data (storage-compute separation)】No
【StarRocks version】3.2.0
【Cluster size】3 FE + 3 BE
【Contact】lrabbit125@gmail.com