[HDFS load failure] type:ETL_QUALITY_UNSATISFIED; msg:quality not good enough to cancel

[Details]
When I run a LOAD WITH BROKER statement to load Hive data, it fails every time. The error details are as follows:


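The statement already specifies COLUMNS TERMINATED BY "\t".
I checked the source file on HDFS and it is indeed delimited by \t, as shown in the screenshot below. I don't understand why the columns never get split.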


[StarRocks version] 3.1.2-4f3a2ee
[Cluster size] 3 FE + 3 BE
The load statement is as follows:
LOAD LABEL xxx.ads_plat_fraudusers_risk_alert
(
    DATA INFILE("hdfs://xxx/warehouse/xxx/ads_app_fraudinviters_risk_alert/*")
    INTO TABLE ads_plat_fraudusers_risk_alert
    COLUMNS TERMINATED BY "\t"
    (created_at,user_id,regist_at,inviter_id,kyc_level,assumption_country,phone,email,fullname,gender,nationality,address_line1,rule1_deposit,rule2_withdraw,rule3_email,deposit_from_address,withdraw_to_address,email_root)
)
WITH BROKER
(
    "username"="xxx",
    "password"="xxx"
)
PROPERTIES
(
    "timeout"="3600"
);

Try specifying the delimiter as \\t.


Still the same error. I have tried \t, \\t, and \x01.

You tried it with two backslashes and it still fails?

Yes. I have tried \t, \\t, \0x1, and \\0x1, and none of them work. Below is the source file as shown by hexdump.
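For readers who want to repeat this byte-level check without a screenshot, here is a minimal sketch. It assumes a local copy of one of the HDFS files; the name `count_delimiters` and the candidate list are illustrative only and are not part of StarRocks. It counts how often each delimiter byte discussed in this thread occurs in the raw data.

```python
# Candidate delimiters discussed in this thread: tab (0x09) and \x01 (SOH).
# The names and the candidate set are assumptions made for illustration.
CANDIDATES = {"tab (0x09)": b"\t", "SOH (0x01)": b"\x01"}


def count_delimiters(raw: bytes) -> dict:
    """Return how many times each candidate delimiter byte occurs in raw."""
    return {name: raw.count(byte) for name, byte in CANDIDATES.items()}


# In practice, read the bytes from a local copy of the file, for example:
#     with open("tmp.txt", "rb") as f:
#         print(count_delimiters(f.read()))
sample = b"2023-01-01\t1001\tfoo\n"
print(count_delimiters(sample))  # {'tab (0x09)': 2, 'SOH (0x01)': 0}
```

A non-zero tab count confirms only that the delimiter is present. It does not show whether every row has as many fields as the load statement expects.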

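Could you generate a test file and upload it? For example, a desensitized one with only one or two rows of data, so we can try it ourselves.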

tmp.txt (1.1 KB)
Sure, thanks for your help.

Could you also post the CREATE TABLE statement?

LOAD LABEL xxx.ads_plat_fraudusers_risk_alert(DATA INFILE("hdfs://xxx/warehouse/ads_app_fraudinviters_risk_alert/*") INTO TABLE ads_plat_fraudusers_risk_alert COLUMNS TERMINATED BY "\t" (created_at,user_id,regist_at,inviter_id,kyc_level,assumption_country,phone,email,fullname,gender,nationality,address_line1,rule1_deposit,rule2_withdraw,rule3_email,deposit_from_address,withdraw_to_address,email_root)) WITH BROKER("username"="xxx","password"="xxx") PROPERTIES("timeout"="3600");

This is the CREATE TABLE statement for ads_plat_fraudusers_risk_alert, obtained by running show create table ads_plat_fraudusers_risk_alert; in StarRocks:

CREATE TABLE `ads_plat_fraudusers_risk_alert` (
  `created_at` varchar(96) NULL COMMENT "",
  `user_id` largeint(40) NULL COMMENT "",
  `regist_at` datetime NULL COMMENT "",
  `inviter_id` largeint(40) NULL COMMENT "",
  `kyc_level` largeint(40) NULL COMMENT "",
  `assumption_country` varchar(96) NULL COMMENT "",
  `phone` varchar(96) NULL COMMENT "",
  `email` varchar(96) NULL COMMENT "",
  `fullname` varchar(96) NULL COMMENT "",
  `gender` varchar(96) NULL COMMENT "",
  `nationality` varchar(96) NULL COMMENT "",
  `address_line1` varchar(96) NULL COMMENT "",
  `rule1_deposit` tinyint(4) NULL COMMENT "",
  `rule2_withdraw` tinyint(4) NULL COMMENT "",
  `rule3_email` tinyint(4) NULL COMMENT "",
  `deposit_from_address` varchar(65533) NULL COMMENT "",
  `withdraw_to_address` varchar(192) NULL COMMENT "",
  `email_root` varchar(96) NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`created_at`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 4
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"enable_persistent_index" = "true",
"replicated_storage" = "true",
"compression" = "LZ4"
);

The table has 18 columns. Why does the dataset have only 15?

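I tried making the number of columns in the data match the number of columns in the table, and with \t as the delimiter the load succeeds. The error message for this case is misleading; we will improve it.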

Thanks a lot. I will look into it further on my side as well.
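The diagnosis above (rows with 15 fields loaded against a column list of 18) can be checked before submitting a load. The following is a minimal sketch, assuming a local copy of the data file; the helper name `field_count_mismatches` is hypothetical and not part of StarRocks. It reports every line whose field count differs from the number of columns named in the load statement.

```python
def field_count_mismatches(lines, expected, sep="\t"):
    """Return (line_number, field_count) for each line whose number of
    sep-delimited fields differs from expected."""
    mismatches = []
    for lineno, line in enumerate(lines, start=1):
        # Strip only the line terminator; trailing empty fields must be kept.
        fields = line.rstrip("\n").split(sep)
        if len(fields) != expected:
            mismatches.append((lineno, len(fields)))
    return mismatches


# In practice, pass an open file and the length of the column list, e.g.:
#     with open("tmp.txt", encoding="utf-8") as f:
#         print(field_count_mismatches(f, expected=18))
sample = ["a\tb\tc\n", "a\tb\n"]
print(field_count_mismatches(sample, expected=3))  # [(2, 2)]
```

An empty result means every row has the expected number of fields. Any reported line is a candidate for the rows rejected under ETL_QUALITY_UNSATISFIED.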