SR-v2.5主键表数据写入丢失问题

为了更快的定位您的问题,请提供以下信息,谢谢
【详述】对主键表dwt_user_label_member_id_middle_partition_tenant_daily同时进行delete操作和insert into操作,发现存在数据写入失败的场景。
【背景】

  • dwt_user_label_member_id_middle_partition_tenant_daily表结构,以(tenant_id, partition_version, label_code, identity_id)为主键,以partition_version为range分区。

  • 建表语句如下:
    CREATE TABLE dwt_user_label_member_id_middle_partition_tenant_daily (
    tenant_id bigint(20) NOT NULL COMMENT “租户ID”,
    partition_version int(11) NOT NULL COMMENT “版本分区”,
    label_code varchar(256) NOT NULL COMMENT “LabelCode”,
    identity_id varchar(64) NOT NULL COMMENT “身份ID”,
    label_value varchar(1024) NULL COMMENT “标签值”,
    data_update_time varchar(64) NULL COMMENT “数据更新日期”
    ) ENGINE=OLAP
    PRIMARY KEY(tenant_id, partition_version, label_code, identity_id)
    COMMENT “租户标签分区表”
    PARTITION BY RANGE(partition_version)
    (PARTITION p1 VALUES [(“1”), (“2”)),
    PARTITION p2 VALUES [(“2”), (“3”)),
    PARTITION p3 VALUES [(“3”), (“4”)))
    DISTRIBUTED BY HASH(tenant_id, identity_id) BUCKETS 200
    PROPERTIES (
    “replication_num” = “3”,
    “colocate_with” = “tenant_identity”,
    “in_memory” = “false”,
    “storage_format” = “DEFAULT”,
    “enable_persistent_index” = “true”,
    “replicated_storage” = “true”,
    “compression” = “LZ4”,
    “storage_medium” = “HDD”
    );
    SQL执行

    • 依次执行的sql为:(执行的sql是 insert into && delete,一共执行了三对)。
      timestamp, queryTime, stmt
      – 1.向2号分区写入G_200000001335817数据,该insert操作写入成功,且2号分区有数据
      2024-09-30 18:00:13,4315,“INSERT INTO rms_member.dwt_user_label_member_id_middle_partition_tenant_daily PARTITION(p2) WITH Label G_200000001335817_1840693120472338522_1727690413003 ( tenant_id, partition_version, label_code, identity_id, label_value, data_update_time ) SELECT tenant_id, ‘2’ AS partition_version, ‘G_200000001335817’ AS label_code, identity_id AS identity_id, label AS label_value, ‘2024-09-30 18:00:12’ AS data_update_time FROM ( SELECT identity_id, tenant_id, label FROM ( SELECT U.identityId AS identity_id, U.tenantId AS tenant_id, ‘1’ AS label, 0 AS label_index FROM ( SELECT u.identityId, u.tenantId FROM ( SELECT u.identity AS identityId, u.tenant_id AS tenantId, u.uattr1004, u.uattr32 FROM ( SELECT identity AS identity, u.tenant_id AS tenant_id, item1.wx_app_id AS uattr32, item1.wx_template_10000 AS uattr1004 FROM rms_member.dwt_member_profile_ss_daily AS u LEFT JOIN rms_member.dim_member_item_ss_daily AS item1 ON u.tenant_id = item1.tenant_id AND u.one_id = item1.identity_id AND item1.item_type = ‘wx_app_quota’ AND ( (item1.tenant_id IN (‘11750013’)) AND (item1.id_type = ‘@member_id’) ) WHERE ( (u.tenant_id IN (‘11750013’)) AND (u.id_type = ‘@member_id’) ) ) AS u WHERE 1 = 1 ) AS u WHERE 1 = 1 AND ( (u.uattr32 = ‘wx29850215cb395a68’) AND ((CAST(u.uattr1004 AS decimal(20, 6)) > 0)) ) GROUP BY u.tenantId, u.identityId ) AS U ) AS U ) AS U”
      – 2.删除表中之前可能存在的G_200000001335817数据,表中之前并没有G_200000001335817的数据。
      2024-09-30 18:00:17,4812,delete from rms_member.dwt_user_label_member_id_middle_partition_tenant_daily where partition_version in (2) and label_code = ‘G_200000001335817’ and tenant_id = 11750013 and data_update_time < ‘2024-09-30 18:00:12’
      – 3.向3号分区写入G_200000001335817数据,该insert操作成功,且3号分区有数据。
      2024-09-30 18:03:53,3896,“INSERT INTO rms_member.dwt_user_label_member_id_middle_partition_tenant_daily PARTITION(p3) WITH Label G_200000001335817_1840694046666891298_1727690633825 ( tenant_id, partition_version, label_code, identity_id, label_value, data_update_time ) SELECT tenant_id, ‘3’ AS partition_version, ‘G_200000001335817’ AS label_code, identity_id AS identity_id, label AS label_value, ‘2024-09-30 18:03:53’ AS data_update_time FROM ( SELECT identity_id, tenant_id, label FROM ( SELECT U.identityId AS identity_id, U.tenantId AS tenant_id, ‘1’ AS label, 0 AS label_index FROM ( SELECT u.identityId, u.tenantId FROM ( SELECT u.identity AS identityId, u.tenant_id AS tenantId, u.uattr1004, u.uattr32 FROM ( SELECT identity AS identity, u.tenant_id AS tenant_id, item1.wx_app_id AS uattr32, item1.wx_template_10000 AS uattr1004 FROM rms_member.dwt_member_profile_ss_daily AS u LEFT JOIN rms_member.dim_member_item_ss_daily AS item1 ON u.tenant_id = item1.tenant_id AND u.one_id = item1.identity_id AND item1.item_type = ‘wx_app_quota’ AND ( (item1.tenant_id IN (‘11750013’)) AND (item1.id_type = ‘@member_id’) ) WHERE ( (u.tenant_id IN (‘11750013’)) AND (u.id_type = ‘@member_id’) ) ) AS u WHERE 1 = 1 ) AS u WHERE 1 = 1 AND ( (u.uattr32 = ‘wx29850215cb395a68’) AND ((CAST(u.uattr1004 AS decimal(20, 6)) > 0)) ) GROUP BY u.tenantId, u.identityId ) AS U ) AS U ) AS U”
      – 4.删除第一步往2号分区写入的数据,delete操作执行了10s。
      2024-09-30 18:03:57,10748,“delete from rms_member.dwt_user_label_member_id_middle_partition_tenant_daily where partition_version in (2,3) and label_code = ‘G_200000001335817’ and tenant_id = 11750013 and data_update_time < ‘2024-09-30 18:03:53’”
      – 5.在删除2号分区语句执行过程中,往2号分区写入数据,(写入的数据和删除的数据主键一致),且该写入操作和上一步的delete操作结束时间非常接近。这一步数据写入失败,2号分区数据丢失。
      – 第4步删除的数据应该是data_update_time < '2024-09-30 18:03:53’的条件,但是当前写入数据并不满足这个条件,却还是没写入成功。
      2024-09-30 18:04:02,6066,“INSERT INTO rms_member.dwt_user_label_member_id_middle_partition_tenant_daily PARTITION(p2) WITH Label G_200000001335817_1840694080435241023_1727690641877 ( tenant_id, partition_version, label_code, identity_id, label_value, data_update_time ) SELECT tenant_id, ‘2’ AS partition_version, ‘G_200000001335817’ AS label_code, identity_id AS identity_id, label AS label_value, ‘2024-09-30 18:04:01’ AS data_update_time FROM ( SELECT identity_id, tenant_id, label FROM ( SELECT U.identityId AS identity_id, U.tenantId AS tenant_id, ‘1’ AS label, 0 AS label_index FROM ( SELECT u.identityId, u.tenantId FROM ( SELECT u.identity AS identityId, u.tenant_id AS tenantId, u.uattr1004, u.uattr32 FROM ( SELECT identity AS identity, u.tenant_id AS tenant_id, item1.wx_app_id AS uattr32, item1.wx_template_10000 AS uattr1004 FROM rms_member.dwt_member_profile_ss_daily AS u LEFT JOIN rms_member.dim_member_item_ss_daily AS item1 ON u.tenant_id = item1.tenant_id AND u.one_id = item1.identity_id AND item1.item_type = ‘wx_app_quota’ AND ( (item1.tenant_id IN (‘11750013’)) AND (item1.id_type = ‘@member_id’) ) WHERE ( (u.tenant_id IN (‘11750013’)) AND (u.id_type = ‘@member_id’) ) ) AS u WHERE 1 = 1 ) AS u WHERE 1 = 1 AND ( (u.uattr32 = ‘wx29850215cb395a68’) AND ((CAST(u.uattr1004 AS decimal(20, 6)) > 0)) ) GROUP BY u.tenantId, u.identityId ) AS U ) AS U ) AS U”
      – 6.删除第3步往3号分区写入的数据,数据删除成功。
      2024-09-30 18:04:09,8962,“delete from rms_member.dwt_user_label_member_id_middle_partition_tenant_daily where partition_version in (3,2) and label_code = ‘G_200000001335817’ and tenant_id = 11750013 and data_update_time < ‘2024-09-30 18:04:01’”
      – 7.表中没有任何G_200000001335817的数据。
  • 其中insert into 后面跟的select语句都一致,且查询结果都有数据。仅仅是partition_version和update_time字段不一致。

  • 初始dwt_user_label_member_id_middle_partition_tenant_daily中不存在label_code = 'G_200000001335817’的任何数据,但是每个分区存在很多其他数据。

猜测
对于SR主键表,执行数据删除的时候,同时插入和删除数据主键一样的数据,
就算插入的数据并不满足删除的条件,该插入操作也会写入不成功。
【业务影响】数据写入失败,导致未圈选到人
【是否存算分离】否
【StarRocks版本】例如:2.5
【集群规模】例如:3fe(1 follower+2observer)+8be(fe与be混部)
【机器信息】CPU虚拟核/内存/网卡,例如:48C/64G/万兆
【联系方式】为了在解决问题过程中能及时联系到您获取一些日志信息,请补充下您的联系方式,例如:社区群4-小李或者邮箱,谢谢
【附件】
过了国庆,当时日志已经缺失。

涉及相同主键的场景下确实有可能 主键表是通过合并去重的方式处理数据,在并发执行时,可能会导致插入操作在删除操作完成前被阻塞或拒绝,从而无法成功写入 建议可以尽量将删除和插入操作分成不同的事务或批次执行,以免同一主键记录在短时间内被重复处理

1赞

你好,麻烦问一下,如果有这种操作,时间间隔大概多久会比较好