Bitmap 最佳实践和使用手册 4 - 建模

Bitmap 相关的问题可以加 wx: lxhhust350@qq.com 细聊

数据建模

数据建模需要考虑这几个问题:

  1. 明细数据每天是需要保存最新一条数据,还是全量数据

  2. Bitmap表是否需要历史上的快照

  3. Bitmap表的单行Bitmap是否需要更新

  4. UID, create_time, tag_id 在Key里的顺序

9.1 [明细表] 宽表

Duplicate/Primary/Unique

# 建表
create table t1(
    uid            BIGINT,
    create_time    DATETIME,
    tag_id_1       STRING,
    tag_id_2       BIGINT
)
PRIMARY KEY(uid, create_time)
PARTITION BY RANGE(create_time)()
DISTRIBUTED BY HASH(uid)
PROPERTIES(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-10",
    "dynamic_partition.end" = "10",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.history_partition_num" = "0"
);
# 写入测试数据
insert into t1 values (1, "2024-02-22 00:00:00", "beijing", 11);
insert into t1 values (2, "2024-02-22 00:00:00", "shanghai", 22);
# 查询
mysql> select * from t1;
+------+---------------------+----------+----------+
| uid  | create_time         | tag_id_1 | tag_id_2 |
+------+---------------------+----------+----------+
|    2 | 2024-02-22 00:00:00 | shanghai |       22 |
|    1 | 2024-02-22 00:00:00 | beijing  |       11 |
+------+---------------------+----------+----------+

优点:

  • 可以单独设置类型,根据 Value 过滤,因为整数有索引,过滤效果好。

  • 用户使用灵活

缺点:

  • 列多,导入性能不行

  • SchemaChange成本高(支持LightSchemaChange后,这个问题应该可以缓解)

  • 列级别的BE元数据占用大量内存。

9.2 [明细表] 窄表

9.2.1 tag_id 作为 Key 列的第一列

Duplicate/Primary/Unique

# 建表
create table t1(
    tag_id         STRING,
    uid            BIGINT,
    create_time    DATETIME,
    tag_value      STRING
)
PRIMARY KEY(tag_id, uid, create_time)
PARTITION BY RANGE(create_time)()
DISTRIBUTED BY HASH(uid)
PROPERTIES(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-10",
    "dynamic_partition.end" = "10",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.history_partition_num" = "0"
);

# 写入测试数据
insert into t1 values ("tag_1", 1, "2024-02-22 00:00:00", "beijing");
insert into t1 values ("tag_2", 1, "2024-02-22 00:00:00", "11");
insert into t1 values ("tag_1", 2, "2024-02-22 00:00:00", "shanghai");
insert into t1 values ("tag_2", 2, "2024-02-22 00:00:00", "22");

# 查询
mysql> select * from t1;
+--------+------+---------------------+-----------+
| tag_id | uid  | create_time         | tag_value |
+--------+------+---------------------+-----------+
| tag_1  |    2 | 2024-02-22 00:00:00 | shanghai  |
| tag_2  |    2 | 2024-02-22 00:00:00 | 22        |
| tag_1  |    1 | 2024-02-22 00:00:00 | beijing   |
| tag_2  |    1 | 2024-02-22 00:00:00 | 11        |
+--------+------+---------------------+-----------+

优点:

  • 不需要 SchemaChange

  • 导入性能好

  • 基于 tag_id 查或是基于 tag_id 生成 bitmap 性能好

缺点:

  • Value列都是String,过滤效果不行

  • 使用不灵活

9.2.2 uid 作为 Key 列的第一列

Duplicate/Primary/Unique

# 建表
create table t1(
    uid            BIGINT,
    tag_id         STRING,
    create_time    DATETIME,
    tag_value      STRING
)
PRIMARY KEY(uid, tag_id, create_time)
PARTITION BY RANGE(create_time)()
DISTRIBUTED BY HASH(uid)
PROPERTIES(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-10",
    "dynamic_partition.end" = "10",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.history_partition_num" = "0"
);

# 写入测试数据
insert into t1 values (1, "tag_1", "2024-02-22 00:00:00", "beijing");
insert into t1 values (1, "tag_2", "2024-02-22 00:00:00", "11");
insert into t1 values (2, "tag_1", "2024-02-22 00:00:00", "shanghai");
insert into t1 values (2, "tag_2", "2024-02-22 00:00:00", "22");

# 查询
mysql> select * from t1;
+------+--------+---------------------+-----------+
| uid  | tag_id | create_time         | tag_value |
+------+--------+---------------------+-----------+
|    2 | tag_1  | 2024-02-22 00:00:00 | shanghai  |
|    2 | tag_2  | 2024-02-22 00:00:00 | 22        |
|    1 | tag_1  | 2024-02-22 00:00:00 | beijing   |
|    1 | tag_2  | 2024-02-22 00:00:00 | 11        |
+------+--------+---------------------+-----------+

优点:

  • 不需要 SchemaChange

  • 导入性能好

  • 基于 uid 点查性能好

缺点:

  • Value列都是String,过滤效果不行

  • 使用不灵活

9.3 蛇行表

假设用户有 4 个 tag: 地域/年龄/身高分布/薪水

# 建表
create table t1(
    round          BIGINT,
    uid            BIGINT,
    create_time    DATETIME,
    tag_value_1    STRING,
    tag_value_2    BIGINT
)
PRIMARY KEY(round, uid, create_time)
PARTITION BY RANGE(create_time)()
DISTRIBUTED BY HASH(uid)
PROPERTIES(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-10",
    "dynamic_partition.end" = "10",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.history_partition_num" = "0"
);

# 写入测试数据
insert into t1 values (0, 1, "2024-02-22 00:00:00", "beijing", 11);
insert into t1 values (1, 1, "2024-02-22 00:00:00", "high", 30000);

# 根据地域查询
mysql> select * from t1 where round=0 and tag_value_1="beijing";
+-------+------+---------------------+-------------+-------------+
| round | uid  | create_time         | tag_value_1 | tag_value_2 |
+-------+------+---------------------+-------------+-------------+
|     0 |    1 | 2024-02-22 00:00:00 | beijing     |          11 |
+-------+------+---------------------+-------------+-------------+

# 根据薪水查询
mysql> select * from t1 where round=1 and tag_value_2="30000";
+-------+------+---------------------+-------------+-------------+
| round | uid  | create_time         | tag_value_1 | tag_value_2 |
+-------+------+---------------------+-------------+-------------+
|     1 |    1 | 2024-02-22 00:00:00 | high        |       30000 |
+-------+------+---------------------+-------------+-------------+

列数固定,根据 tag_id % 列数 计算出round,写入。查询的时候,也是根据这个规则来算。

优点:

  • 不需要SchemaChange

  • 导入性能好

  • 映射关系做好后,Value字段也可以使用不同的数据类型,从而实现查询加速

缺点:

  • 用户需要维护这个映射关系

  • 查询的时候,也需要知道这个映射关系

9.4 [Bitmap表] 聚合模型

聚合表中,Bitmap 类型支持三种语义:

  • BITMAP_UNION: 同一个 key 对应的两个 bitmap 会 Union

  • REPLACE: 同一个 key 对应的 Bitmap 会覆盖前一个

  • REPLACE_IF_NOT_NULL: 只有当 bitmap 不是 null 时,才会覆盖相同 key 对应的 bitmap, 常用于部分列更新

9.4.1 需要保留历史 bitmap 快照的场景

  • Compaction 成本高 (会消耗大量内存和CPU)

  • 查询性能差 (会消耗大量内存和CPU)

  • Delete后对性能影响比较大 (会消耗大量内存和CPU)

  • 支持Union语义

  • 如果用户需要 replace 语义的话,需要 Delete 后,再写入

# 建表
CREATE TABLE t1(
    tag_id         BIGINT,
    tag_value      STRING,
    refresh_time   DATETIME,
    value          BITMAP BITMAP_UNION
)
AGGREGATE KEY(tag_id, tag_value, refresh_time)
PARTITION BY RANGE(refresh_time) (
    PARTITION p20200321 VALUES LESS THAN ("2024-02-20 00:00:00"),
    PARTITION p20200322 VALUES LESS THAN ("2024-02-21 00:00:00"),
    PARTITION p20200323 VALUES LESS THAN ("2024-02-22 00:00:00")
)
DISTRIBUTED BY HASH(tag_id)
PROPERTIES(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-10",
    "dynamic_partition.end" = "10",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.history_partition_num" = "0"
);

# 写入测试数据
insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("1,11,111,1111");
insert into t1 select 1, "shanghai", "2024-02-21 00:00:00", bitmap_from_string("2,22,222,2222");
insert into t1 select 1, "guangzhou", "2024-02-21 00:00:00", bitmap_from_string("3,33,333,3333");
insert into t1 select 2, "male", "2024-02-21 00:00:00", bitmap_from_string("4,44,444,4444");
insert into t1 select 2, "female", "2024-02-21 00:00:00", bitmap_from_string("5,55,555,5555");

# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time        | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
|      1 | beijing   | 2024-02-21 00:00:00 | 1,11,111,1111           |
|      1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333           |
|      1 | shanghai  | 2024-02-21 00:00:00 | 2,22,222,2222           |
|      2 | female    | 2024-02-21 00:00:00 | 5,55,555,5555           |
|      2 | male      | 2024-02-21 00:00:00 | 4,44,444,4444           |
+--------+-----------+---------------------+-------------------------+

# 覆盖更新 (这里的 bitmap 是 union 语义,所以需要删除后再insert)
mysql> delete from t1 where tag_id=1 and tag_value="beijing" and refresh_time="2024-02-21 00:00:00";

mysql> insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("9,99,999,9999");

# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time        | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
|      1 | beijing   | 2024-02-21 00:00:00 | 9,99,999,9999           |
|      1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333           |
|      1 | shanghai  | 2024-02-21 00:00:00 | 2,22,222,2222           |
|      2 | female    | 2024-02-21 00:00:00 | 5,55,555,5555           |
|      2 | male      | 2024-02-21 00:00:00 | 4,44,444,4444           |
+--------+-----------+---------------------+-------------------------+

# Union 更新
mysql> insert into t1 select 2, "male", "2024-02-21 00:00:00", bitmap_from_string("44444,444444");

# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+----------------------------+
| tag_id | tag_value | refresh_time        | bitmap_to_string(value)    |
+--------+-----------+---------------------+----------------------------+
|      1 | beijing   | 2024-02-21 00:00:00 | 9,99,999,9999              |
|      1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333              |
|      1 | shanghai  | 2024-02-21 00:00:00 | 2,22,222,2222              |
|      2 | female    | 2024-02-21 00:00:00 | 5,55,555,5555              |
|      2 | male      | 2024-02-21 00:00:00 | 4,44,444,4444,44444,444444 |
+--------+-----------+---------------------+----------------------------+

9.4.2 不需要保留历史 bitmap 快照的场景

  • Compaction 成本高 (会消耗大量内存和CPU)

  • 查询性能差 (会消耗大量内存和CPU)

  • Delete后对性能影响比较大 (会消耗大量内存和CPU)

  • 支持Union语义

  • 如果用户需要 replace 语义的话,需要 Delete 后,再写入

# 建表
CREATE TABLE t1(
    tag_id         BIGINT,
    tag_value      STRING,
    refresh_time   DATETIME REPLACE,
    value          BITMAP BITMAP_UNION
)
AGGREGATE KEY(tag_id, tag_value)
DISTRIBUTED BY HASH(tag_id);

# 写入数据
insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("1,11,111,1111");
insert into t1 select 1, "shanghai", "2024-02-21 00:00:00", bitmap_from_string("2,22,222,2222");
insert into t1 select 1, "guangzhou", "2024-02-21 00:00:00", bitmap_from_string("3,33,333,3333");
insert into t1 select 2, "male", "2024-02-21 00:00:00", bitmap_from_string("4,44,444,4444");
insert into t1 select 2, "female", "2024-02-21 00:00:00", bitmap_from_string("5,55,555,5555");

# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time        | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
|      1 | beijing   | 2024-02-21 00:00:00 | 1,11,111,1111           |
|      1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333           |
|      1 | shanghai  | 2024-02-21 00:00:00 | 2,22,222,2222           |
|      2 | female    | 2024-02-21 00:00:00 | 5,55,555,5555           |
|      2 | male      | 2024-02-21 00:00:00 | 4,44,444,4444           |
+--------+-----------+---------------------+-------------------------+

# 覆盖更新 (这里的 bitmap 是 union 语义,所以需要删除后再insert)
mysql> delete from t1 where tag_id=1 and tag_value="beijing";

mysql> insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("9,99,999,9999");

# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time        | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
|      1 | beijing   | 2024-02-21 00:00:00 | 9,99,999,9999           |
|      1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333           |
|      1 | shanghai  | 2024-02-21 00:00:00 | 2,22,222,2222           |
|      2 | female    | 2024-02-21 00:00:00 | 5,55,555,5555           |
|      2 | male      | 2024-02-21 00:00:00 | 4,44,444,4444           |
+--------+-----------+---------------------+-------------------------+

# Union 更新
mysql> insert into t1 select 2, "male", "2024-02-21 00:00:00", bitmap_from_string("44444,444444");

# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+----------------------------+
| tag_id | tag_value | refresh_time        | bitmap_to_string(value)    |
+--------+-----------+---------------------+----------------------------+
|      1 | beijing   | 2024-02-21 00:00:00 | 9,99,999,9999              |
|      1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333              |
|      1 | shanghai  | 2024-02-21 00:00:00 | 2,22,222,2222              |
|      2 | female    | 2024-02-21 00:00:00 | 5,55,555,5555              |
|      2 | male      | 2024-02-21 00:00:00 | 4,44,444,4444,44444,444444 |
+--------+-----------+---------------------+----------------------------+

9.4.3 不需要保留历史 bitmap 快照的场景 [REPLACE]

  • Compaction 成本高 (会消耗大量内存和CPU)

  • 查询性能差 (会消耗大量内存和CPU)

  • Delete后对性能影响比较大 (会消耗大量内存和CPU)

  • 支持 Replace 语义

  • 如果用户需要 Union 语义的话,需要读出来,构造好,再写入

# 建表
CREATE TABLE t1(
    tag_id         BIGINT,
    tag_value      STRING,
    refresh_time   DATETIME REPLACE,
    value          BITMAP REPLACE
)
AGGREGATE KEY(tag_id, tag_value)
DISTRIBUTED BY HASH(tag_id);

# 写入数据
insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("1,11,111,1111");
insert into t1 select 1, "shanghai", "2024-02-21 00:00:00", bitmap_from_string("2,22,222,2222");
insert into t1 select 1, "guangzhou", "2024-02-21 00:00:00", bitmap_from_string("3,33,333,3333");
insert into t1 select 2, "male", "2024-02-21 00:00:00", bitmap_from_string("4,44,444,4444");
insert into t1 select 2, "female", "2024-02-21 00:00:00", bitmap_from_string("5,55,555,5555");

# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time        | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
|      1 | beijing   | 2024-02-21 00:00:00 | 1,11,111,1111           |
|      1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333           |
|      1 | shanghai  | 2024-02-21 00:00:00 | 2,22,222,2222           |
|      2 | female    | 2024-02-21 00:00:00 | 5,55,555,5555           |
|      2 | male      | 2024-02-21 00:00:00 | 4,44,444,4444           |
+--------+-----------+---------------------+-------------------------+

# 覆盖更新
mysql> insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("9,99,999,9999");

# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time        | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
|      1 | beijing   | 2024-02-21 00:00:00 | 9,99,999,9999           |
|      1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333           |
|      1 | shanghai  | 2024-02-21 00:00:00 | 2,22,222,2222           |
|      2 | female    | 2024-02-21 00:00:00 | 5,55,555,5555           |
|      2 | male      | 2024-02-21 00:00:00 | 4,44,444,4444           |
+--------+-----------+---------------------+-------------------------+

# Union 更新 (读出来,构造好再写入)
mysql> insert into t1 select tag_id, tag_value, "2024-03-01", bitmap_or(value, bitmap_from_string("11111,111111")) from t1 where tag_id=1 and tag_value="beijing";

# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+----------------------------+
| tag_id | tag_value | refresh_time        | bitmap_to_string(value)    |
+--------+-----------+---------------------+----------------------------+
|      1 | beijing   | 2024-03-01 00:00:00 | 9,99,999,9999,11111,111111 |
|      1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333              |
|      1 | shanghai  | 2024-02-21 00:00:00 | 2,22,222,2222              |
|      2 | female    | 2024-02-21 00:00:00 | 5,55,555,5555              |
|      2 | male      | 2024-02-21 00:00:00 | 4,44,444,4444              |
+--------+-----------+---------------------+----------------------------+

9.4.4 需要保留历史 bitmap 快照的场景 (通过 List 分区优化更新或删除带来的性能问题)

  • Compaction 成本低

  • 查询性能好

  • Delete后对性能影响小

  • 支持Union语义

  • 如果用户需要 replace 语义的话,需要 Truncate 后,再写入

  • 导入数据前需要提前创建分区

# 建表
CREATE TABLE t1(
    tag_id         BIGINT NOT NULL,
    tag_value      STRING NOT NULL,
    refresh_time   DATETIME REPLACE,
    value          BITMAP BITMAP_UNION
)
AGGREGATE KEY(tag_id, tag_value)
PARTITION BY (tag_id, tag_value)
DISTRIBUTED BY HASH(tag_id);

# 写入数据
insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("1,11,111,1111");
insert into t1 select 1, "shanghai", "2024-02-21 00:00:00", bitmap_from_string("2,22,222,2222");
insert into t1 select 1, "guangzhou", "2024-02-21 00:00:00", bitmap_from_string("3,33,333,3333");
insert into t1 select 2, "male", "2024-02-21 00:00:00", bitmap_from_string("4,44,444,4444");
insert into t1 select 2, "female", "2024-02-21 00:00:00", bitmap_from_string("5,55,555,5555");

# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time        | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
|      1 | beijing   | 2024-02-21 00:00:00 | 1,11,111,1111           |
|      1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333           |
|      1 | shanghai  | 2024-02-21 00:00:00 | 2,22,222,2222           |
|      2 | female    | 2024-02-21 00:00:00 | 5,55,555,5555           |
|      2 | male      | 2024-02-21 00:00:00 | 4,44,444,4444           |
+--------+-----------+---------------------+-------------------------+

# 覆盖更新 (这里的 bitmap 是 union 语义,所以需要truncate后再insert)
mysql> truncate table t1 partition(p1_beijing);

mysql> insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("9,99,999,9999");

# 查询
mysql>  select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time        | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
|      1 | beijing   | 2024-02-21 00:00:00 | 9,99,999,9999           |
|      1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333           |
|      1 | shanghai  | 2024-02-21 00:00:00 | 2,22,222,2222           |
|      2 | female    | 2024-02-21 00:00:00 | 5,55,555,5555           |
|      2 | male      | 2024-02-21 00:00:00 | 4,44,444,4444           |
+--------+-----------+---------------------+-------------------------+

# Union 更新
mysql> insert into t1 select 2, "male", "2024-02-21 00:00:00", bitmap_from_string("44444,444444");

# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+----------------------------+
| tag_id | tag_value | refresh_time        | bitmap_to_string(value)    |
+--------+-----------+---------------------+----------------------------+
|      1 | beijing   | 2024-02-21 00:00:00 | 9,99,999,9999              |
|      1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333              |
|      1 | shanghai  | 2024-02-21 00:00:00 | 2,22,222,2222              |
|      2 | female    | 2024-02-21 00:00:00 | 5,55,555,5555              |
|      2 | male      | 2024-02-21 00:00:00 | 4,44,444,4444,44444,444444 |
+--------+-----------+---------------------+----------------------------+

9.5 [推荐] [Bitmap表] Primary Key 模型

  • 不需要 Delete

  • 查询性能好

  • Compaction 成本低

  • 不支持Union语义,需要用Union语义的话,需要读出来更新后再覆盖写入.

9.5.1 需要保留历史 Bitmap 快照的场景

# 建表
CREATE TABLE t1(
    tag_id         BIGINT,
    tag_value      STRING,
    refresh_time   DATETIME,
    value          BITMAP
)
PRIMARY KEY(tag_id, tag_value, refresh_time)
PARTITION BY RANGE(refresh_time) (
    PARTITION p20200321 VALUES LESS THAN ("2024-02-20 00:00:00"),
    PARTITION p20200322 VALUES LESS THAN ("2024-02-21 00:00:00"),
    PARTITION p20200323 VALUES LESS THAN ("2024-02-22 00:00:00")
)
DISTRIBUTED BY HASH(tag_id)
PROPERTIES(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-10",
    "dynamic_partition.end" = "10",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.history_partition_num" = "0"
);

# 写入测试数据
insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("1,11,111,1111");
insert into t1 select 1, "shanghai", "2024-02-21 00:00:00", bitmap_from_string("2,22,222,2222");
insert into t1 select 1, "guangzhou", "2024-02-21 00:00:00", bitmap_from_string("3,33,333,3333");
insert into t1 select 2, "male", "2024-02-21 00:00:00", bitmap_from_string("4,44,444,4444");
insert into t1 select 2, "female", "2024-02-21 00:00:00", bitmap_from_string("5,55,555,5555");

# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time        | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
|      1 | beijing   | 2024-02-21 00:00:00 | 1,11,111,1111           |
|      1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333           |
|      1 | shanghai  | 2024-02-21 00:00:00 | 2,22,222,2222           |
|      2 | female    | 2024-02-21 00:00:00 | 5,55,555,5555           |
|      2 | male      | 2024-02-21 00:00:00 | 4,44,444,4444           |
+--------+-----------+---------------------+-------------------------+

# 覆盖更新
insert into t1 select 2, "female", "2024-02-21 00:00:00", bitmap_from_string("6,66,666,6666");

# 查询
mysql> select tag_id, tag_value, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+-------------------------+
| tag_id | tag_value | bitmap_to_string(value) |
+--------+-----------+-------------------------+
|      1 | beijing   | 1,11,111,1111           |
|      1 | guangzhou | 3,33,333,3333           |
|      1 | shanghai  | 2,22,222,2222           |
|      2 | female    | 6,66,666,6666           |
|      2 | male      | 4,44,444,4444           |
+--------+-----------+-------------------------+

# union 后更新 (primary 无法支持 bitmap_union 语义,所以需要读出来,更新完再覆盖写入)
insert into t1 select tag_id, tag_value, refresh_time, bitmap_or(value, bitmap_from_string("11111,111111")) from t1 where tag_id=1 and tag_value="beijing" and refresh_time="2024-02-21 00:00:00";

# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+----------------------------+
| tag_id | tag_value | refresh_time        | bitmap_to_string(value)    |
+--------+-----------+---------------------+----------------------------+
|      1 | beijing   | 2024-02-21 00:00:00 | 1,11,111,1111,11111,111111 |
|      1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333              |
|      1 | shanghai  | 2024-02-21 00:00:00 | 2,22,222,2222              |
|      2 | female    | 2024-02-21 00:00:00 | 6,66,666,6666              |
|      2 | male      | 2024-02-21 00:00:00 | 4,44,444,4444              |
+--------+-----------+---------------------+----------------------------+

9.5.2 不需要保留历史 Bitmap 快照的场景

# 建表
CREATE TABLE t1(
    tag_id         BIGINT,
    tag_value      STRING,
    refresh_time   DATETIME,
    value          BITMAP
)
PRIMARY KEY(tag_id, tag_value)
DISTRIBUTED BY HASH(tag_id);

# 写入测试数据
insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("1,11,111,1111");
insert into t1 select 1, "shanghai", "2024-02-21 00:00:00", bitmap_from_string("2,22,222,2222");
insert into t1 select 1, "guangzhou", "2024-02-21 00:00:00", bitmap_from_string("3,33,333,3333");
insert into t1 select 2, "male", "2024-02-21 00:00:00", bitmap_from_string("4,44,444,4444");
insert into t1 select 2, "female", "2024-02-21 00:00:00", bitmap_from_string("5,55,555,5555");

# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time        | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
|      1 | beijing   | 2024-02-21 00:00:00 | 1,11,111,1111           |
|      1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333           |
|      1 | shanghai  | 2024-02-21 00:00:00 | 2,22,222,2222           |
|      2 | female    | 2024-02-21 00:00:00 | 5,55,555,5555           |
|      2 | male      | 2024-02-21 00:00:00 | 4,44,444,4444           |
+--------+-----------+---------------------+-------------------------+

# 覆盖更新
insert into t1 select 2, "female", "2024-03-01 00:00:00", bitmap_from_string("6,66,666,6666");

# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time        | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
|      1 | beijing   | 2024-02-21 00:00:00 | 1,11,111,1111           |
|      1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333           |
|      1 | shanghai  | 2024-02-21 00:00:00 | 2,22,222,2222           |
|      2 | female    | 2024-03-01 00:00:00 | 6,66,666,6666           |
|      2 | male      | 2024-02-21 00:00:00 | 4,44,444,4444           |
+--------+-----------+---------------------+-------------------------+

# union 更新 (primary 无法支持 bitmap_union 语义,所以需要读出来,更新完再覆盖写入)
insert into t1 select tag_id, tag_value, "2024-03-01", bitmap_or(value, bitmap_from_string("11111,111111")) from t1 where tag_id=1 and tag_value="beijing";

# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+----------------------------+
| tag_id | tag_value | refresh_time        | bitmap_to_string(value)    |
+--------+-----------+---------------------+----------------------------+
|      1 | beijing   | 2024-03-01 00:00:00 | 1,11,111,1111,11111,111111 |
|      1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333              |
|      1 | shanghai  | 2024-02-21 00:00:00 | 2,22,222,2222              |
|      2 | female    | 2024-03-01 00:00:00 | 6,66,666,6666              |
|      2 | male      | 2024-02-21 00:00:00 | 4,44,444,4444              |
+--------+-----------+---------------------+----------------------------+

9.6 正交建模

//TODO

2赞

都是纯干货,等待继续更新的更多干货… :smiley_cat: :smiley_cat: :smiley_cat: :smiley_cat:

感谢大佬分享,这不收藏一个可还行 :sunglasses: