The Routine Load job has max_error_number set to 2000000, the same as the error detection window, in the expectation that it would ignore all bad rows. In practice, however, when the job consumes a bad record from Kafka it gets stuck and cannot continue consuming.

【StarRocks version】3.1.2-4f3a2ee
【Cluster size】3 FEs, 3 BEs, co-located deployment
【Server spec】16 cores, 64 GB RAM, 500 GB disk, 10 GbE network, 3 machines in total
【Contact】via this forum
【Problem】The Routine Load job has max_error_number (the upper limit on the number of error rows allowed) set to 2000000, the same as the error detection window, in the expectation that it would ignore all bad rows and never stop consuming from Kafka because of them.
In practice, however, when the job consumes a non-JSON (bad) record from Kafka, it does not skip it automatically; instead it gets stuck and cannot continue consuming.

Attachment: 【Routine Load job CREATE statement】
CREATE ROUTINE LOAD test_db.kafka2sr_ods_detail_test_2023111416 ON detail_table
COLUMNS(tmp_event_time,uid,event_time=from_unixtime(tmp_event_time,'yyyy-MM-dd HH:mm:ss'))
PROPERTIES
(
"desired_concurrent_number" = "3",
"format" = "json",
"strip_outer_array" = "true",
"json_root" = "$.records",
"jsonpaths" = "[\"$.event_time\",\"$.uid\"]",
"max_error_number" = "2000000",
"strict_mode" = "true",
"log_rejected_record_num" = "-1"
)
FROM KAFKA
(
"kafka_broker_list" = "node-103:9092,node-104:9092,node-105:9092",
"kafka_topic" = "ods_detail_test",
"property.kafka_default_offsets" = "OFFSET_BEGINNING"
);

Attachment: 【Sample data in Kafka】
1. Normal record:
{"records":[{"event_time":1640966401,"uid":3030881},{"event_time":1640966402,"uid":3030882}]}


2. Bad record (non-JSON data); the following content is a single message in Kafka:
--------------------------6b5a916796e2c62d
Content-Disposition: attachment; name="records"

Array
--------------------------6b5a916796e2c62d–
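For reference, the multipart payload above is not parseable as JSON, which is why the load task rejects it. A minimal consumer-side filter (a sketch with a hypothetical helper, not part of StarRocks or the Kafka client) could drop such records before they ever reach the job:

```python
import json

def is_valid_json(message: bytes) -> bool:
    """Return True if the Kafka message body parses as JSON."""
    try:
        json.loads(message)
        return True
    except (json.JSONDecodeError, UnicodeDecodeError):
        return False

# The normal record from the sample data parses; the multipart body does not.
good = b'{"records":[{"event_time":1640966401,"uid":3030881}]}'
bad = (b"--------------------------6b5a916796e2c62d\r\n"
       b'Content-Disposition: attachment; name="records"\r\n\r\nArray')

print(is_valid_json(good))  # True
print(is_valid_json(bad))   # False
```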

Please post the result of SHOW ROUTINE LOAD TASK WHERE JobName = "kafka2sr_ods_detail_test_2023111416"; and the CREATE TABLE statement as well.

【ROUTINE LOAD TASK output】
mysql> SHOW ROUTINE LOAD TASK WHERE JobName = "kafka2sr_ods_detail_test_2023111416"\G
*************************** 1. row ***************************
TaskId: a10c3c5f-a8e8-4149-be61-b2abc1754d59
TxnId: -1
TxnStatus: UNKNOWN
JobId: 12456
CreateTime: 2023-11-14 16:00:01
LastScheduledTime: 2023-11-14 16:00:59
ExecuteStartTime: NULL
Timeout: 60
BeId: -1
DataSourceProperties: Progress:{"1":268},LatestOffset:null
Message: there is no new data in kafka/pulsar, wait for 10 seconds to schedule again
*************************** 2. row ***************************
TaskId: 9d2e6798-2545-4314-8e48-4dc7281e3747
TxnId: -1
TxnStatus: UNKNOWN
JobId: 12456
CreateTime: 2023-11-14 16:00:03
LastScheduledTime: NULL
ExecuteStartTime: NULL
Timeout: 60
BeId: -1
DataSourceProperties: Progress:{"0":257},LatestOffset:null
Message: previous task aborted because of illegal json started with 45
*************************** 3. row ***************************
TaskId: 38d05739-1369-4fe2-9043-f07aa61e7002
TxnId: -1
TxnStatus: UNKNOWN
JobId: 12456
CreateTime: 2023-11-14 16:00:03
LastScheduledTime: NULL
ExecuteStartTime: NULL
Timeout: 60
BeId: -1
DataSourceProperties: Progress:{"2":253},LatestOffset:null
Message: previous task aborted because of illegal json started with 45
3 rows in set (0.00 sec)
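The "illegal json started with 45" message is consistent with the multipart record shown earlier: 45 is the decimal ASCII code of '-', the first byte of the boundary line, which the JSON parser rejects immediately. A quick check:

```python
# 45 is the ASCII/UTF-8 code of '-', the first byte of the
# multipart boundary line that the JSON parser choked on.
first_byte = b"--------------------------6b5a916796e2c62d"[0]
print(first_byte)       # 45
print(chr(first_byte))  # -
```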

【StarRocks CREATE TABLE statement】
CREATE TABLE test_db.detail_table (
event_time DATETIME NOT NULL COMMENT "time",
uid INT NOT NULL COMMENT "user ID"
)
DUPLICATE KEY(event_time,uid)
PARTITION BY date_trunc('day', event_time)
DISTRIBUTED BY HASH(uid)
PROPERTIES(
"replication_num" = "2",
"bloom_filter_columns" = "uid"
);

Is there a solution to this problem yet?

Hi, a DATETIME column cannot be written to directly with a Unix timestamp; the value needs to be written in date format.

Please check whether the data matches the table schema.
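As a reference for the format the DATETIME column expects, the sketch below (plain Python, UTC assumed; the job's COLUMNS clause already does the equivalent conversion server-side with from_unixtime) converts the epoch seconds from the sample data into the 'yyyy-MM-dd HH:mm:ss' string form:

```python
from datetime import datetime, timezone

def epoch_to_datetime_str(ts: int) -> str:
    """Convert epoch seconds to the 'yyyy-MM-dd HH:mm:ss' string
    form that a StarRocks DATETIME column accepts (UTC assumed)."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

print(epoch_to_datetime_str(1640966401))  # 2021-12-31 16:00:01
```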