curl --location-trusted -u root -T /mnt/tpch-kit/dbgen/orders.tbl.8 -H "column_separator:|" http://192.186.0.21:8030/api/tpch/orders/_stream_load
Enter host password for user 'root':
{
    "TxnId": 1111,
    "Label": "83a7d9ea-0dc2-40ce-be69-567968b27471",
    "Status": "Fail",
    "Message": "too many filtered rows",
    "NumberTotalRows": 75000000,
    "NumberLoadedRows": 11870911,
    "NumberFilteredRows": 63129089,
    "NumberUnselectedRows": 0,
    "LoadBytes": 8985137514,
    "LoadTimeMs": 41645,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 0,
    "ReadDataTimeMs": 34742,
    "WriteDataTimeMs": 41643,
    "CommitAndPublishTimeMs": 0,
    "ErrorURL": "http://192.186.0.21:8040/api/_load_error_log?file=__shard_0/error_log_insert_stmt_2c4e1d7f-314c-8d42-b1de-beba55774292_2c4e1d7f314c8d42_b1debeba55774292"
}
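This is a data quality problem. Open the ErrorURL from the response and check the details there to find out why the rows were rejected. If you want to ignore the bad rows, add -H "max_filter_ratio:0.1" to your original curl command so that dirty data is filtered out, for example:
curl --location-trusted -u root -T /mnt/tpch-kit/dbgen/orders.tbl.8 -H "max_filter_ratio:0.1" -H "column_separator:|" http://192.186.0.21:8030/api/tpch/orders/_stream_load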
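To judge whether a given max_filter_ratio would be enough, it may help to compute the filtered ratio from the numbers in the response. The sketch below is only an illustration: the function names are made up for this example, and it assumes the load is accepted when the share of filtered rows does not exceed max_filter_ratio.

```python
def filtered_ratio(total_rows, filtered_rows):
    """Return the fraction of rows that the load rejected."""
    if total_rows == 0:
        return 0.0
    return filtered_rows / total_rows


def within_tolerance(total_rows, filtered_rows, max_filter_ratio):
    """Assumed rule: the load passes if the filtered ratio is at most max_filter_ratio."""
    return filtered_ratio(total_rows, filtered_rows) <= max_filter_ratio


# Values taken from NumberTotalRows and NumberFilteredRows in the response above.
ratio = filtered_ratio(75_000_000, 63_129_089)
print(f"filtered ratio: {ratio:.4f}")  # filtered ratio: 0.8417
print("passes with max_filter_ratio=0.1:", within_tolerance(75_000_000, 63_129_089, 0.1))
```

With about 84% of the rows filtered, a tolerance of 0.1 would most likely still end in "too many filtered rows", so the underlying cause in the error log needs to be fixed rather than masked.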
The error log shows this for every row:
Reason: null value for not null column, column=O_ORDERKEY. src line: [];
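This happens because the column was declared NOT NULL when you created the table, but the value for that column in the data is null, so the rows cannot be loaded. You can change the table definition and remove the NOT NULL constraint.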
The column was created as INT, and the id values are out of its range.
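A quick way to confirm this kind of overflow before loading is to scan the key column of the data file. The following is a rough sketch, not part of the original thread: it assumes INT is a signed 32-bit type, that O_ORDERKEY is the first "|"-separated field, and the helper name count_out_of_range is hypothetical.

```python
INT_MIN = -2**31        # assumed lower bound of a signed 32-bit INT
INT_MAX = 2**31 - 1     # assumed upper bound, 2147483647


def count_out_of_range(lines, sep="|", col=0):
    """Count rows whose key field is not an integer or does not fit in INT."""
    bad = 0
    for line in lines:
        field = line.split(sep)[col].strip()
        try:
            value = int(field)
        except ValueError:
            bad += 1          # empty or non-numeric key
            continue
        if not INT_MIN <= value <= INT_MAX:
            bad += 1          # key overflows INT
    return bad


# Two made-up sample rows: the second key is larger than INT_MAX.
sample = ["1|36901|O|", "3000000000|78002|O|"]
print(count_out_of_range(sample))  # 1
```

To check the real file, pass an open file object, for example count_out_of_range(open("/mnt/tpch-kit/dbgen/orders.tbl.8")). If many keys are out of range, declaring the column as BIGINT instead of INT is probably the simplest fix.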