【BrokerLoad】StarRocks 2.1.12 起 将会对字段长度进行严格管控。(2.1.12以前版本遇到字段长度越界问题会把值置为null, 而新版本则会报错)

  1. 背景

StarRocks2.1.12 起 将会对字段长度进行严格管控。(2.1.12以前版本遇到字段长度越界问题会把值置为null, 而新版本则会报错)。

  1. 问题呈现

在进行 broker load 时, 如出现下面问题,代表 字段长度不足(越界) type:LOAD_RUN_FAIL; msg: string length(39) > limit(36) 从而导致整个broker load 失败。

  1. 问题疑问

问:为什么以前导入正常的job现在反而失败了?

答:因为在2.1.12以前的版本,在 broker load 遇到越界字段行为时, 会把值置为null,后期的版本将会对字段长度进行严格管控。

问:怎么处理这个异常?
答:调整字段长度 eg. alter table ods.schema modify column column_name varchar(500)。

问:怎么复现这个问题?
答:使用 properties(“strict_mode” = “true”) 可以稳定复现。

  1. 问题预测

  • 4.1. broker load**
    正常 BROKER LOAD 语句
    ${LABELNAME}_${TIMESTAMPS} (随意字符串, 但需保持唯一性,所以这边使用了timestamp时间戳)

${HDFSPATH} (HDFS路径,如: hdfs://cnprod1ha/warehouse/tablespace/external/hive/cn_ods_bulkpo.db/o_bas_sku_on_hand_audit/*)

${SRTB} (需要导入数据的StarRocks表名, 如: bulkpo__sku_on_hand_audit)

${COLUMNS} (表字段, 如: item_nbr,dept_nbr,xxx, …)

${PRINCIPAL} (需要申请, 如: USER@HADOOP_CNPROD1.WAL-MART.COM)

${KEYTAB} (需要申请, 如: /home/dorisops/.vn52xxx.keytab)

Broker Load 语法

StarRocks(StarRocks)中执行语句:

LOAD LABEL ${LABELNAME}_${TIMESTAMPS}
(
DATA INFILE(" ${HDFSPATH} ")
INTO TABLE ${SRTB}
format as “orc”
( ${COLUMNS} )
)
WITH BROKER prdbroker
(
“hadoop.security.authentication” = “kerberos”,
“kerberos_principal” = " ${PRINCIPAL} ",
“kerberos_keytab” = " ${KEYTAB} ",
“dfs.nameservices” = “cnprod1ha”,
“dfs.ha.namenodes.cnprod1ha” = “nn1,nn2”,
“dfs.namenode.rpc-address.cnprod1ha.nn1” = “oser406435.cn.wal-mart.com:8020”,
“dfs.namenode.rpc-address.cnprod1ha.nn2” = “oser406433.cn.wal-mart.com:8020”,
“dfs.client.failover.proxy.provider.cnprod1ha” = “org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider”
)
PROPERTIES
(
“timezone” = “UTC”
)

检测 BROKER LOAD 语句

(增加 properties(“strict_mode” = “true”) )

Broker Load 语法

StarRocks(StarRocks)中执行语句:

LOAD LABEL ${LABELNAME}_${TIMESTAMPS}
(
DATA INFILE("${HDFSPATH}")
INTO TABLE ${DORISTB}
format as “orc”
(${COLUMNS})
)
WITH BROKER prdbroker
(
“hadoop.security.authentication” = “kerberos”,
“kerberos_principal” = “${PRINCIPAL}”,
“kerberos_keytab” = “${KEYTAB}”,
“dfs.nameservices” = “cnprod1ha”,
“dfs.ha.namenodes.cnprod1ha” = “nn1,nn2”,
“dfs.namenode.rpc-address.cnprod1ha.nn1” = “oser406435.cn.wal-mart.com:8020”,
“dfs.namenode.rpc-address.cnprod1ha.nn2” = “oser406433.cn.wal-mart.com:8020”,
“dfs.client.failover.proxy.provider.cnprod1ha” = “org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider”
)
PROPERTIES
(
“timezone” = “UTC”,
“strict_mode” = “true”
)

4.2 找到label

StarRocks(StarRocks)中执行语句:

show load from ods where label=’${LABELNAME}_${TIMESTAMPS}’

*************************** 1. row ***************************
JobId: 191176490
Label: po_automation__po_change_order_info_1661333110
State: CANCELLED
Progress: ETL:N/A; LOAD:N/A
Type: BROKER
EtlInfo: unselected.rows=0; dpp.abnorm.ALL=2279; dpp.norm.ALL=814076
TaskInfo: cluster:N/A; timeout(s):14400; max_filter_ratio:0.0
ErrorMsg: type:ETL_QUALITY_UNSATISFIED; msg:quality not good enough to cancel
CreateTime: 2022-08-24 09:25:10
EtlStartTime: 2022-08-24 09:25:13
EtlFinishTime: 2022-08-24 09:25:13
LoadStartTime: 2022-08-24 09:25:13
LoadFinishTime: 2022-08-24 09:25:27
URL: http://10.233.76.31:8040/api/_load_error_log?file=error_log_3e447272f1e645e3_851dd6205d7e8cdd
JobDetails: {“Unfinished backends”:{“3e447272-f1e6-45e3-851d-d6205d7e8cdc”:[]},“ScannedRows”:816355,“TaskNumber”:1,“All backends”:{“3e447272-f1e6-45e3-851d-d6205d7e8cdc”:[10005]},“FileNumber”:1,“FileSize”:23082903}
1 row in set (0.004 sec)

4.3 提取URL

a. 浏览器打开url

b. linux执行命令

/bin/curl -s http://10.233.76.31:8040/api/_load_error_log?file=error_log_3e447272f1e645e3_851dd6205d7e8cdd


这样就拿到相关越界数据, 字段,长度。

例如:

Error: String ‘椰树箱装超1/3保质期,供应商提供现成本6.22折折扣’ is too long. The type of ’ updatereason ’ is VARCHAR(50) '. Row:

updatereason 字段 varchar(50) 存不下 “椰树箱装超1/3保质期,供应商提供现成本6.22折折扣” 这笔数据

从而导致整个 broker load 任务失败!

5.问题解决

(拓展字段长度)

StarRocks(StarRocks)中执行语句:

alter table ods.schema modify column column_name **varchar(500)。