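【Details】On StarRocks 2.3, a Routine Load job importing JSON from Kafka hit a message with an invalid encoding. abortedTaskNum keeps increasing and the job is stuck at the current offset.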
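【Business impact】
【StarRocks version】e.g. 2.3
【Cluster size】e.g. 3 FE (1 follower + 2 observers) + 5 BE (FE and BE co-located)
【Machine info】CPU virtual cores/memory/NIC, e.g. 48C/64G/10 GbE
【Table model】e.g. primary key model
【Import or export method】
The error message is as follows: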
Error: Data quality error: Failed to iterate document stream as object. error: The input is not valid UTF-8. Row: parser current location: {“uniid”:“1d58cc41769d78cf2af98f19f68e47b4”,“uni_type”:“imei”,“brand”:“亿ç¾è®¯èå¼åç§ææéå “,“model”:””,“os”:“ANDROID”,“os_version”:“4.1.0”,“user_category”:null}
Is there a way to skip this message?
Load statement:
CREATE ROUTINE LOAD load_dim_RealTime_adx_baidu_uniid_categorys_res on dim_RealTime_adx_baidu_uniid_categorys
COLUMNS(uniid, uni_type, brand, model, os, os_version, user_category, timedt=CURRENT_TIMESTAMP())
WHERE user_category is not null
PROPERTIES
(
"desired_concurrent_number"="6",
"max_batch_rows"="200000",
"max_error_number"="200", -- maximum number of error rows allowed
"format" ="json"
)
FROM KAFKA
(
"kafka_broker_list"= "alikafka-pre-cn-zvp2qsdyu007-1-vpc.alikafka.aliyuncs.com:9092,alikafka-pre-cn-zvp2qsdyu007-2-vpc.alikafka.aliyuncs.com:9092,alikafka-pre-cn-zvp2qsdyu007-3-vpc.alikafka.aliyuncs.com:9092",
"kafka_topic" = "adx_baidu_user_label",
"property.group.id" = "adx_baidu_user_label-v1",
"property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
The failing message:
{
“uniid”: “1d58cc41769d78cf2af98f19f68e47b4”,
“uni_type”: “imei”,
“brand”: “亿ç¾è®¯èå¼åç§ææéå�”,
“model”: “”,
“os”: “ANDROID”,
“os_version”: “4.1.0”,
“user_category”: null
}
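For anyone trying to reproduce this, here is a minimal Python sketch for locating the first byte sequence that a strict UTF-8 decoder rejects in a raw Kafka message value. The helper name `find_invalid_utf8` and the sample payload are hypothetical. The payload simulates a multi-byte character that was cut short, which is only one possible cause and is not confirmed for the message above.

```python
from typing import Optional, Tuple


def find_invalid_utf8(raw: bytes) -> Optional[Tuple[int, str]]:
    """Return (byte offset, reason) of the first invalid UTF-8 sequence, or None."""
    try:
        raw.decode("utf-8")  # strict decoding raises on the first bad sequence
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


# Hypothetical payloads, not taken from the real topic.
prefix = b'{"brand": "'
good = prefix + "亿".encode("utf-8") + b'"}'
# Keep only the first two of the three bytes of the character to simulate truncation.
bad = prefix + "亿".encode("utf-8")[:2] + b'"}'

print(find_invalid_utf8(good))
print(find_invalid_utf8(bad))
```

Running the same check on the raw bytes of the failing Kafka record (not on text copied from a terminal or a log, which has already been decoded) should show whether the message itself is invalid UTF-8.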
Table schema:
CREATE TABLE dim_RealTime_adx_baidu_uniid_category
(
uniid string NOT NULL default '0' COMMENT "device ID",
uni_type string NULL COMMENT "device ID type",
brand string NULL COMMENT "brand",
model string NULL COMMENT "model",
os string NULL COMMENT "operating system",
os_version string NULL COMMENT "version number",
user_category ARRAY not NULL COMMENT "user tags",
timedt datetime NULL COMMENT "update time"
) ENGINE=OLAP
PRIMARY KEY(uniid)
COMMENT "OLAP"
DISTRIBUTED BY HASH(uniid) BUCKETS 32
PROPERTIES (
"replication_num" = "3",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);
Related errors in be.log:
I0321 10:30:27.497920 16898 data_consumer_group.cpp:101] start consumer group: 944404f4ce2f1251-be9775d8e14de7a4. max time(ms): 15000, batch size: 4294967296. id=4d7c375a6b304297-85f4a7c241858709, job_id=6409570, txn_id: 14642551, label=load_dim_RealTime_adx_baidu_uniid_categorys_res-6409570-4d7c375a-6b30-4297-85f4-a7c241858709-14642551, db=default_cluster:dim
W0321 10:30:28.024767 16761 stream_load_executor.cpp:89] fragment execute failed, query_id=4d7c375a6b304297-85f4a7c241858709, err_msg=Failed to iterate document stream as object. error: The input is not valid UTF-8, id=4d7c375a6b304297-85f4a7c241858709, job_id=6409570, txn_id: 14642551, label=load_dim_RealTime_adx_baidu_uniid_categorys_res-6409570-4d7c375a-6b30-4297-85f4-a7c241858709-14642551, db=default_cluster:dim
I0321 10:30:28.026865 16898 routine_load_task_executor.cpp:191] finished routine load task id=4d7c375a6b304297-85f4a7c241858709, job_id=6409570, txn_id: 14642551, label=load_dim_RealTime_adx_baidu_uniid_categorys_res-6409570-4d7c375a-6b30-4297-85f4-a7c241858709-14642551, db=default_cluster:dim, status: Failed to iterate document stream as object. error: The input is not valid UTF-8, current tasks num: 3
- Table: dim_RealTime_adx_baidu_uniid_categorys
- Rollup: dim_RealTime_adx_baidu_uniid_categorys
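These garbled characters are most likely the result of a bad byte-to-character conversion, but when I parse the message myself it still decodes as UTF-8. I don't know why the database rejects it.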
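One possible workaround, sketched here as an assumption and not as a StarRocks feature: clean the bytes on the producer side (or in a small relay consumer) before they reach the topic that Routine Load reads. The helper name `sanitize_utf8` and the sample payload are hypothetical. Note that this replaces each invalid sequence with U+FFFD, so the affected field value is altered, not recovered.

```python
import json


def sanitize_utf8(raw: bytes) -> bytes:
    """Re-encode raw bytes as valid UTF-8, replacing invalid sequences with U+FFFD."""
    return raw.decode("utf-8", errors="replace").encode("utf-8")


# Hypothetical payload: the three-byte character is truncated to two bytes.
prefix = b'{"brand": "'
bad = prefix + "亿".encode("utf-8")[:2] + b'"}'

clean = sanitize_utf8(bad)
record = json.loads(clean.decode("utf-8"))  # strict decoding now succeeds
print(record)
```

Input that is already valid UTF-8 passes through unchanged, so the helper could be applied to every message without a separate validity check.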
【Contact】dyuan_vip@126.com