[StarRocks version] 2.2.0
Flink has two records:
ship_id | ins_db_tm
123 | 20220531 09:00:00
123 | 20220531 10:00:00
StarRocks has this table:
CREATE TABLE ods.reg_scan_dtl_lod (
    ship_id int NOT NULL COMMENT 'master waybill number',
    ins_db_tm datetime NOT NULL COMMENT 'database insert time'
) PRIMARY KEY (ship_id, ins_db_tm)
COMMENT ""
PARTITION BY RANGE (ins_db_tm) ( START ("2022-03-01") END ("2022-07-01") EVERY (INTERVAL 1 day) )
DISTRIBUTED BY HASH(ship_id) BUCKETS 24
PROPERTIES (
    "replication_num" = "1",
    "colocate_with" = "int_ship_id",
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-30",
    "dynamic_partition.end" = "10",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "24"
);
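When the two Flink records are inserted into the target table ods.reg_scan_dtl_lod in order of ins_db_tm, both rows are kept.
However, I only want to keep the row with the largest ins_db_tm for each ship_id (123 | 20220531 10:00:00).
Because ins_db_tm is the partition column, it has to be part of the primary key, so the two rows have different keys and are never deduplicated.
If the primary key model did not require the partition key to appear in the primary key, it would meet my requirement exactly.
Leaving the table unpartitioned would also make deduplication work, but with this much data, an unpartitioned table would hurt performance.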
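
To make the desired result concrete, here is a minimal sketch of "keep only the latest row per ship_id" computed at query time with a window function. It only illustrates the target semantics against the table as defined above. It does not deduplicate at ingestion, which is what I am actually after. The aliases t and rn are names made up for this example.

```sql
-- Sketch only: return the row with the largest ins_db_tm for each ship_id.
-- "t" and "rn" are illustrative aliases, not part of the original table.
SELECT ship_id, ins_db_tm
FROM (
    SELECT
        ship_id,
        ins_db_tm,
        -- Number the rows of each ship_id from newest to oldest.
        ROW_NUMBER() OVER (PARTITION BY ship_id ORDER BY ins_db_tm DESC) AS rn
    FROM ods.reg_scan_dtl_lod
) t
WHERE rn = 1;
```

For the two sample records this should return only the 10:00:00 row for ship_id 123. The cost is that both rows remain stored and the window function runs on every read.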