routine load消费数据和kafka中数据对不上

采集的定单表,kafka 里pay_date的有2450条,confirm_date有2408条,routine load 用主键更新模式采集到pay_date 1900多条,confirm_date只有1100多条
image
image

建表语句
CREATE TABLE ods_orders (
order_id varchar(65533) NOT NULL COMMENT “订单id”,
order_no varchar(65533) NULL COMMENT “订单号”,
started_date varchar(65533) NULL COMMENT “下单时间”,
buyerid varchar(65533) NULL COMMENT “买家id”,
supplierid varchar(65533) NULL COMMENT “卖家id”,
ship_type varchar(65533) NULL COMMENT “运输方式”,
formula_type varchar(65533) NULL COMMENT “公式类型”,
commid varchar(65533) NULL COMMENT “佣金类型”,
fromdetailinfo varchar(65533) NULL COMMENT “下单来源”,
site_id varchar(65533) NULL COMMENT “下单站点”,
leading_time decimal64(4, 1) NULL COMMENT “备货期”,
is_shipoption varchar(65533) NULL COMMENT “是否自定义运费”,
create_date varchar(65533) NULL COMMENT “创建日期”,
pay_date varchar(65533) NULL COMMENT “支付时间”,
confirm_date varchar(65533) NULL COMMENT “风控通过时间”,
delivery_date varchar(65533) NULL COMMENT “发货时间”,
order_status_id varchar(65533) NULL COMMENT “订单状态”,
update_date varchar(65533) NULL COMMENT “更新时间”,
order_progress_id varchar(65533) NULL COMMENT “订单状态过程id”,
delivery_status_id varchar(65533) NULL COMMENT “发货状态”,
feedback_status varchar(65533) NULL COMMENT “评论状态”,
processing_date varchar(65533) NULL COMMENT “发货截止日”,
receive_date varchar(65533) NULL COMMENT “确认收货时间”,
complete_date varchar(65533) NULL COMMENT “订单关闭时间”,
cancel_date varchar(65533) NULL COMMENT “订单取消时间”,
version int(11) NULL COMMENT “版本”,
payout_date varchar(65533) NULL COMMENT “放款时间”,
shipping_to_day varchar(65533) NULL COMMENT “妥投时长”,
promise_shipped_date varchar(65533) NULL COMMENT “承诺运达时间”,
syn_nc tinyint(4) NULL COMMENT “是否同步 0:未同步 1:已同步”,
show_status tinyint(4) NULL COMMENT “可见状态 1,null:可见 2 :删除”,
type varchar(65533) NULL COMMENT “”,
table varchar(65533) NULL COMMENT “”
) ENGINE=OLAP
PRIMARY KEY(order_id)
COMMENT “OLAP”
DISTRIBUTED BY HASH(order_id) BUCKETS 10
PROPERTIES (
“replication_num” = “3”,
“in_memory” = “false”,
“storage_format” = “DEFAULT”,
“storage_medium” = “SSD”
);

routine 语句
create routine load test.ods_orders1 on ods_orders
columns(order_id,order_no,started_date,buyerid,supplierid,ship_type,formula_type,commid,fromdetailinfo,site_id,leading_time,is_shipoption,create_date,pay_date,confirm_date,delivery_date,order_status_id,update_date,order_progress_id,delivery_status_id,feedback_status,processing_date,receive_date,complete_date,cancel_date,version,payout_date,shipping_to_day,promise_shipped_date,syn_nc,show_status,type,table) ,
WHERE table like ‘orders%’
PROPERTIES (
“format”=“json”
)
FROM KAFKA (
“kafka_broker_list”= ******
“kafka_topic” = “kf-ods-orders”,
“property.group.id” = “test.ods_orders”,
“property.kafka_default_offsets” = “OFFSET_BEGINNING”

);

kafka里数据样式

{“order_no”:“3480774280”,“pay_date”:“2022-10-28 00:21:24”,“supplierid”:“8aaa94347f2fc341017f350c295f027b”,“order_status_id”:“102001”,“started_date”:“2022-10-28 00:21:06”,“fromdetailinfo”:“mobile_p333_androidphone”,“ship_type”:“4px-postlink standard registered mail(g)”,“buyerid”:“ff808081752ba30b0175434aa88b05d2”,“type”:“update”,“version”:“2”,“update_date”:“2022-10-28 00:21:24”,“is_shipoption”:“1”,“shipping_to_day”:“28”,“syn_nc”:“0”,“order_progress_id”:“8aaaa7bb84193f3901841a3f8f0e2902”,“commid”:“commissionwv2”,“site_id”:“en”,“feedback_status”:“0”,“formula_type”:“f007_g3”,“create_date”:“2022-10-28 00:21:17”,“order_id”:“8aaa959e8419410701841a3f72d16e46”,“leading_time”:“4.0”,“table”:“orders_7b”}

以前数据还能对上,近期发现对不上了

刚才改了明细模型可以对上


主建模型建表有问题吗

show routine load看下主键表和明细表消费的行数一样多不

24014199 明细表 ods_orders2 2022/10/28 15:17 default_cluster:test ods_orders_1 RUNNING KAFKA 5 {“partitions”:"",“partial_update”:“false”,“columnToColumnExpr”:“order_id,order_no,started_date,buyerid,supplierid,ship_type,formula_type,commid,fromdetailinfo,site_id,leading_time,is_shipoption,create_date,pay_date,confirm_date,delivery_date,order_status_id,update_date,order_progress_id,delivery_status_id,feedback_status,processing_date,receive_date,complete_date,cancel_date,version,payout_date,shipping_to_day,promise_shipped_date,syn_nc,show_status,type,table”,“maxBatchIntervalS”:“10”,“whereExpr”:"",“dataFormat”:“json”,“timezone”:“Asia/Shanghai”,“format”:“json”,“json_root”:"",“strict_mode”:“false”,“jsonpaths”:"",“desireTaskConcurrentNum”:“5”,“maxErrorNum”:“0”,“strip_outer_array”:“false”,“currentTaskConcurrentNum”:“5”,“maxBatchRows”:“200000”} {“topic”:“kf-ods-orders”,“currentKafkaPartitions”:“0,1,2,3,4,5,6,7,8,9”,“brokerList”:} {“kafka_default_offsets”:“OFFSET_BEGINNING”,“group.id”:“test.ods_orders1”} {“receivedBytes”:16943197564,“errorRows”:0,“committedTaskNum”:706,“loadedRows”:21442625,“loadRowsRate”:9000,“abortedTaskNum”:0,“totalRows”:21442625,“unselectedRows”:0,“receivedBytesRate”:7260000,“taskExecuteTimeMs”:2333610} {“0”:“12113030”,“1”:“12089861”,“2”:“11718806”,“3”:“10645973”,“4”:“10571379”,“5”:“10703362”,“6”:“10602480”,“7”:“12119226”,“8”:“12090015”,“9”:“11721077”}
24014155 主键表 ods_orders1 2022/10/28 15:10 default_cluster:test ods_orders RUNNING KAFKA 5 {“partitions”:"",“partial_update”:“false”,“columnToColumnExpr”:“order_id,order_no,started_date,buyerid,supplierid,ship_type,formula_type,commid,fromdetailinfo,site_id,leading_time,is_shipoption,create_date,pay_date,confirm_date,delivery_date,order_status_id,update_date,order_progress_id,delivery_status_id,feedback_status,processing_date,receive_date,complete_date,cancel_date,version,payout_date,shipping_to_day,promise_shipped_date,syn_nc,show_status,type,table”,“maxBatchIntervalS”:“10”,“whereExpr”:"",“dataFormat”:“json”,“timezone”:“Asia/Shanghai”,“format”:“json”,“json_root”:"",“strict_mode”:“false”,“jsonpaths”:"",“desireTaskConcurrentNum”:“5”,“maxErrorNum”:“0”,“strip_outer_array”:“false”,“currentTaskConcurrentNum”:“5”,“maxBatchRows”:“200000”} {“topic”:“kf-ods-orders”,“currentKafkaPartitions”:“0,1,2,3,4,5,6,7,8,9”,“brokerList”:""} {“kafka_default_offsets”:“OFFSET_BEGINNING”,“group.id”:“test.ods_orders”} {“receivedBytes”:16943196096,“errorRows”:0,“committedTaskNum”:916,“loadedRows”:21442623,“loadRowsRate”:7000,“abortedTaskNum”:0,“totalRows”:21442623,“unselectedRows”:0,“receivedBytesRate”:5554000,“taskExecuteTimeMs”:3050500} {“0”:“12113031”,“1”:“12089860”,“2”:“11718806”,“3”:“10645973”,“4”:“10571379”,“5”:“10703362”,“6”:“10602478”,“7”:“12119226”,“8”:“12090015”,“9”:“11721077”}

看起来是一样的…

这个看着消费是没问题的,这里的不一致应该是这样的:你的主键key是order_id,然后导入过程中同一个key的type可能发生了数据更新,导致你用type分组后数据会变少,直接用orderid去重看下数据就知道了


去重后查出来的数据是不一样的

直接count distinct orderid不要加type列就是哪个pay,con列

加pay和con只是为了区分卡的时间,我后面where 限制了时间

我用两个同样的主键表,同样的routine load,同样的sql查出来结果条数都是不一样的


这样不加时间限制跑出来是一样的


这样一限制时间,就感觉confirm_date的订单没有更新


下单数一样 ,付款的数据感觉没更新过来,或是数据乱序了?

你的主键是orderid,你加个type和时间过滤后去看countdistinct的结果对不上有什么意义呢?你再看看?

好的,我再看看,感觉数据可能是乱序了