Bitmap 相关的问题可以加 wx: lxhhust350@qq.com 细聊
数据建模
数据建模需要考虑这几个问题:
-
明细数据每天是需要保存最新一条数据,还是全量数据
-
Bitmap表是否需要历史上的快照
-
Bitmap表的单行Bitmap是否需要更新
-
UID, create_time, tag_id 在Key里的顺序
9.1 [明细表] 宽表
Duplicate/Primary/Unique
# 建表
create table t1(
uid BIGINT,
create_time DATETIME,
tag_id_1 STRING,
tag_id_2 BIGINT
)
PRIMARY KEY(uid, create_time)
PARTITION BY RANGE(create_time)()
DISTRIBUTED BY HASH(uid)
PROPERTIES(
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-10",
"dynamic_partition.end" = "10",
"dynamic_partition.prefix" = "p",
"dynamic_partition.history_partition_num" = "0"
);
# 写入测试数据
insert into t1 values (1, "2024-02-22 00:00:00", "beijing", 11);
insert into t1 values (2, "2024-02-22 00:00:00", "shanghai", 22);
# 查询
mysql> select * from t1;
+------+---------------------+----------+----------+
| uid | create_time | tag_id_1 | tag_id_2 |
+------+---------------------+----------+----------+
| 2 | 2024-02-22 00:00:00 | shanghai | 22 |
| 1 | 2024-02-22 00:00:00 | beijing | 11 |
+------+---------------------+----------+----------+
优点:
-
可以单独设置类型,根据 Value 过滤,因为整数有索引,过滤效果好。
-
用户使用灵活
缺点:
-
列多,导入性能不行
-
SchemaChange成本高(支持LightSchemaChange后,这个问题应该可以缓解)
-
列级别的BE元数据占用大量内存。
9.2 [明细表] 窄表
9.2.1 tag_id 作为 Key 列的第一列
Duplicate/Primary/Unique
# 建表
create table t1(
tag_id STRING,
uid BIGINT,
create_time DATETIME,
tag_value STRING
)
PRIMARY KEY(tag_id, uid, create_time)
PARTITION BY RANGE(create_time)()
DISTRIBUTED BY HASH(uid)
PROPERTIES(
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-10",
"dynamic_partition.end" = "10",
"dynamic_partition.prefix" = "p",
"dynamic_partition.history_partition_num" = "0"
);
# 写入测试数据
insert into t1 values ("tag_1", 1, "2024-02-22 00:00:00", "beijing");
insert into t1 values ("tag_2", 1, "2024-02-22 00:00:00", "11");
insert into t1 values ("tag_1", 2, "2024-02-22 00:00:00", "shanghai");
insert into t1 values ("tag_2", 2, "2024-02-22 00:00:00", "22");
# 查询
mysql> select * from t1;
+--------+------+---------------------+-----------+
| tag_id | uid | create_time | tag_value |
+--------+------+---------------------+-----------+
| tag_1 | 2 | 2024-02-22 00:00:00 | shanghai |
| tag_2 | 2 | 2024-02-22 00:00:00 | 22 |
| tag_1 | 1 | 2024-02-22 00:00:00 | beijing |
| tag_2 | 1 | 2024-02-22 00:00:00 | 11 |
+--------+------+---------------------+-----------+
优点:
-
不需要 SchemaChange
-
导入性能好
-
基于 tag_id 查或是基于 tag_id 生成 bitmap 性能好
缺点:
-
Value列都是String,过滤效果不行
-
使用不灵活
9.2.2 uid 作为 Key 列的第一列
Duplicate/Primary/Unique
# 建表
create table t1(
uid BIGINT,
tag_id STRING,
create_time DATETIME,
tag_value STRING
)
PRIMARY KEY(uid, tag_id, create_time)
PARTITION BY RANGE(create_time)()
DISTRIBUTED BY HASH(uid)
PROPERTIES(
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-10",
"dynamic_partition.end" = "10",
"dynamic_partition.prefix" = "p",
"dynamic_partition.history_partition_num" = "0"
);
# 写入测试数据
insert into t1 values (1, "tag_1", "2024-02-22 00:00:00", "beijing");
insert into t1 values (1, "tag_2", "2024-02-22 00:00:00", "11");
insert into t1 values (2, "tag_1", "2024-02-22 00:00:00", "shanghai");
insert into t1 values (2, "tag_2", "2024-02-22 00:00:00", "22");
# 查询
mysql> select * from t1;
+------+--------+---------------------+-----------+
| uid | tag_id | create_time | tag_value |
+------+--------+---------------------+-----------+
| 2 | tag_1 | 2024-02-22 00:00:00 | shanghai |
| 2 | tag_2 | 2024-02-22 00:00:00 | 22 |
| 1 | tag_1 | 2024-02-22 00:00:00 | beijing |
| 1 | tag_2 | 2024-02-22 00:00:00 | 11 |
+------+--------+---------------------+-----------+
优点:
-
不需要 SchemaChange
-
导入性能好
-
基于 uid 点查性能好
缺点:
-
Value列都是String,过滤效果不行
-
使用不灵活
9.3 蛇行表
假设用户有 4 个 tag: 地域/年龄/身高分布/薪水
# 建表
create table t1(
round BIGINT,
uid BIGINT,
create_time DATETIME,
tag_value_1 STRING,
tag_value_2 BIGINT
)
PRIMARY KEY(round, uid, create_time)
PARTITION BY RANGE(create_time)()
DISTRIBUTED BY HASH(uid)
PROPERTIES(
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-10",
"dynamic_partition.end" = "10",
"dynamic_partition.prefix" = "p",
"dynamic_partition.history_partition_num" = "0"
);
# 写入测试数据
insert into t1 values (0, 1, "2024-02-22 00:00:00", "beijing", 11);
insert into t1 values (1, 1, "2024-02-22 00:00:00", "high", 30000);
# 根据地域查询
mysql> select * from t1 where round=0 and tag_value_1="beijing";
+-------+------+---------------------+-------------+-------------+
| round | uid | create_time | tag_value_1 | tag_value_2 |
+-------+------+---------------------+-------------+-------------+
| 0 | 1 | 2024-02-22 00:00:00 | beijing | 11 |
+-------+------+---------------------+-------------+-------------+
# 根据薪水查询
mysql> select * from t1 where round=1 and tag_value_2="30000";
+-------+------+---------------------+-------------+-------------+
| round | uid | create_time | tag_value_1 | tag_value_2 |
+-------+------+---------------------+-------------+-------------+
| 1 | 1 | 2024-02-22 00:00:00 | high | 30000 |
+-------+------+---------------------+-------------+-------------+
列数固定,根据 tag_id % 列数 计算出round,写入。查询的时候,也是根据这个规则来算。
优点:
-
不需要SchemaChange
-
导入性能好
-
映射关系做好后,Value字段也可以使用不同的数据类型,从而实现查询加速
缺点:
-
用户需要维护这个映射关系
-
查询的时候,也需要知道这个映射关系
9.4 [Bitmap表] 聚合模型
聚合表中,Bitmap 类型支持三种语义:
-
BITMAP_UNION: 同一个 key 对应的两个 bitmap 会 Union
-
REPLACE: 同一个 key 对应的 Bitmap 会覆盖前一个
-
REPLACE_IF_NOT_NULL: 只有当 bitmap 不是 null 时,才会覆盖相同 key 对应的 bitmap, 常用于部分列更新
9.4.1 需要保留历史 bitmap 快照的场景
-
Compaction 成本高 (会消耗大量内存和CPU)
-
查询性能差 (会消耗大量内存和CPU)
-
Delete后对性能影响比较大 (会消耗大量内存和CPU)
-
支持Union语义
-
如果用户需要 replace 语义的话,需要 Delete 后,再写入
# 建表
CREATE TABLE t1(
tag_id BIGINT,
tag_value STRING,
refresh_time DATETIME,
value BITMAP BITMAP_UNION
)
AGGREGATE KEY(tag_id, tag_value, refresh_time)
PARTITION BY RANGE(refresh_time) (
PARTITION p20200321 VALUES LESS THAN ("2024-02-20 00:00:00"),
PARTITION p20200322 VALUES LESS THAN ("2024-02-21 00:00:00"),
PARTITION p20200323 VALUES LESS THAN ("2024-02-22 00:00:00")
)
DISTRIBUTED BY HASH(tag_id)
PROPERTIES(
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-10",
"dynamic_partition.end" = "10",
"dynamic_partition.prefix" = "p",
"dynamic_partition.history_partition_num" = "0"
);
# 写入测试数据
insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("1,11,111,1111");
insert into t1 select 1, "shanghai", "2024-02-21 00:00:00", bitmap_from_string("2,22,222,2222");
insert into t1 select 1, "guangzhou", "2024-02-21 00:00:00", bitmap_from_string("3,33,333,3333");
insert into t1 select 2, "male", "2024-02-21 00:00:00", bitmap_from_string("4,44,444,4444");
insert into t1 select 2, "female", "2024-02-21 00:00:00", bitmap_from_string("5,55,555,5555");
# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
| 1 | beijing | 2024-02-21 00:00:00 | 1,11,111,1111 |
| 1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333 |
| 1 | shanghai | 2024-02-21 00:00:00 | 2,22,222,2222 |
| 2 | female | 2024-02-21 00:00:00 | 5,55,555,5555 |
| 2 | male | 2024-02-21 00:00:00 | 4,44,444,4444 |
+--------+-----------+---------------------+-------------------------+
# 覆盖更新 (这里的 bitmap 是 union 语义,所以需要删除后再insert)
mysql> delete from t1 where tag_id=1 and tag_value="beijing" and refresh_time="2024-02-21 00:00:00";
mysql> insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("9,99,999,9999");
# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
| 1 | beijing | 2024-02-21 00:00:00 | 9,99,999,9999 |
| 1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333 |
| 1 | shanghai | 2024-02-21 00:00:00 | 2,22,222,2222 |
| 2 | female | 2024-02-21 00:00:00 | 5,55,555,5555 |
| 2 | male | 2024-02-21 00:00:00 | 4,44,444,4444 |
+--------+-----------+---------------------+-------------------------+
# Union 更新
mysql> insert into t1 select 2, "male", "2024-02-21 00:00:00", bitmap_from_string("44444,444444");
# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+----------------------------+
| tag_id | tag_value | refresh_time | bitmap_to_string(value) |
+--------+-----------+---------------------+----------------------------+
| 1 | beijing | 2024-02-21 00:00:00 | 9,99,999,9999 |
| 1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333 |
| 1 | shanghai | 2024-02-21 00:00:00 | 2,22,222,2222 |
| 2 | female | 2024-02-21 00:00:00 | 5,55,555,5555 |
| 2 | male | 2024-02-21 00:00:00 | 4,44,444,4444,44444,444444 |
+--------+-----------+---------------------+----------------------------+
9.4.2 不需要保留历史 bitmap 快照的场景
-
Compaction 成本高 (会消耗大量内存和CPU)
-
查询性能差 (会消耗大量内存和CPU)
-
Delete后对性能影响比较大 (会消耗大量内存和CPU)
-
支持Union语义
-
如果用户需要 replace 语义的话,需要 Delete 后,再写入
# 建表
CREATE TABLE t1(
tag_id BIGINT,
tag_value STRING,
refresh_time DATETIME REPLACE,
value BITMAP BITMAP_UNION
)
AGGREGATE KEY(tag_id, tag_value)
DISTRIBUTED BY HASH(tag_id);
# 写入数据
insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("1,11,111,1111");
insert into t1 select 1, "shanghai", "2024-02-21 00:00:00", bitmap_from_string("2,22,222,2222");
insert into t1 select 1, "guangzhou", "2024-02-21 00:00:00", bitmap_from_string("3,33,333,3333");
insert into t1 select 2, "male", "2024-02-21 00:00:00", bitmap_from_string("4,44,444,4444");
insert into t1 select 2, "female", "2024-02-21 00:00:00", bitmap_from_string("5,55,555,5555");
# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
| 1 | beijing | 2024-02-21 00:00:00 | 1,11,111,1111 |
| 1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333 |
| 1 | shanghai | 2024-02-21 00:00:00 | 2,22,222,2222 |
| 2 | female | 2024-02-21 00:00:00 | 5,55,555,5555 |
| 2 | male | 2024-02-21 00:00:00 | 4,44,444,4444 |
+--------+-----------+---------------------+-------------------------+
# 覆盖更新 (这里的 bitmap 是 union 语义,所以需要删除后再insert)
mysql> delete from t1 where tag_id=1 and tag_value="beijing";
mysql> insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("9,99,999,9999");
# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
| 1 | beijing | 2024-02-21 00:00:00 | 9,99,999,9999 |
| 1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333 |
| 1 | shanghai | 2024-02-21 00:00:00 | 2,22,222,2222 |
| 2 | female | 2024-02-21 00:00:00 | 5,55,555,5555 |
| 2 | male | 2024-02-21 00:00:00 | 4,44,444,4444 |
+--------+-----------+---------------------+-------------------------+
# Union 更新
mysql> insert into t1 select 2, "male", "2024-02-21 00:00:00", bitmap_from_string("44444,444444");
# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+----------------------------+
| tag_id | tag_value | refresh_time | bitmap_to_string(value) |
+--------+-----------+---------------------+----------------------------+
| 1 | beijing | 2024-02-21 00:00:00 | 9,99,999,9999 |
| 1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333 |
| 1 | shanghai | 2024-02-21 00:00:00 | 2,22,222,2222 |
| 2 | female | 2024-02-21 00:00:00 | 5,55,555,5555 |
| 2 | male | 2024-02-21 00:00:00 | 4,44,444,4444,44444,444444 |
+--------+-----------+---------------------+----------------------------+
9.4.3 不需要保留历史 bitmap 快照的场景 [REPLACE]
-
Compaction 成本高 (会消耗大量内存和CPU)
-
查询性能差 (会消耗大量内存和CPU)
-
Delete后对性能影响比较大 (会消耗大量内存和CPU)
-
支持 Replace 语义
-
如果用户需要 Union 语义的话,需要读出来,构造好,再写入
# 建表
CREATE TABLE t1(
tag_id BIGINT,
tag_value STRING,
refresh_time DATETIME REPLACE,
value BITMAP REPLACE
)
AGGREGATE KEY(tag_id, tag_value)
DISTRIBUTED BY HASH(tag_id);
# 写入数据
insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("1,11,111,1111");
insert into t1 select 1, "shanghai", "2024-02-21 00:00:00", bitmap_from_string("2,22,222,2222");
insert into t1 select 1, "guangzhou", "2024-02-21 00:00:00", bitmap_from_string("3,33,333,3333");
insert into t1 select 2, "male", "2024-02-21 00:00:00", bitmap_from_string("4,44,444,4444");
insert into t1 select 2, "female", "2024-02-21 00:00:00", bitmap_from_string("5,55,555,5555");
# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
| 1 | beijing | 2024-02-21 00:00:00 | 1,11,111,1111 |
| 1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333 |
| 1 | shanghai | 2024-02-21 00:00:00 | 2,22,222,2222 |
| 2 | female | 2024-02-21 00:00:00 | 5,55,555,5555 |
| 2 | male | 2024-02-21 00:00:00 | 4,44,444,4444 |
+--------+-----------+---------------------+-------------------------+
# 覆盖更新
mysql> insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("9,99,999,9999");
# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
| 1 | beijing | 2024-02-21 00:00:00 | 9,99,999,9999 |
| 1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333 |
| 1 | shanghai | 2024-02-21 00:00:00 | 2,22,222,2222 |
| 2 | female | 2024-02-21 00:00:00 | 5,55,555,5555 |
| 2 | male | 2024-02-21 00:00:00 | 4,44,444,4444 |
+--------+-----------+---------------------+-------------------------+
# Union 更新 (读出来,构造好再写入)
mysql> insert into t1 select tag_id, tag_value, "2024-03-01", bitmap_or(value, bitmap_from_string("11111,111111")) from t1 where tag_id=1 and tag_value="beijing";
# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+----------------------------+
| tag_id | tag_value | refresh_time | bitmap_to_string(value) |
+--------+-----------+---------------------+----------------------------+
| 1 | beijing | 2024-03-01 00:00:00 | 9,99,999,9999,11111,111111 |
| 1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333 |
| 1 | shanghai | 2024-02-21 00:00:00 | 2,22,222,2222 |
| 2 | female | 2024-02-21 00:00:00 | 5,55,555,5555 |
| 2 | male | 2024-02-21 00:00:00 | 4,44,444,4444 |
+--------+-----------+---------------------+----------------------------+
9.4.4 需要保留历史 bitmap 快照的场景 (通过 List 分区优化更新或删除带来的性能问题)
-
Compaction 成本低
-
查询性能好
-
Delete后对性能影响小
-
支持Union语义
-
如果用户需要 replace 语义的话,需要 Truncate 后,再写入
-
导入数据前需要提前创建分区
# 建表
CREATE TABLE t1(
tag_id BIGINT NOT NULL,
tag_value STRING NOT NULL,
refresh_time DATETIME REPLACE,
value BITMAP BITMAP_UNION
)
AGGREGATE KEY(tag_id, tag_value)
PARTITION BY (tag_id, tag_value)
DISTRIBUTED BY HASH(tag_id);
# 写入数据
insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("1,11,111,1111");
insert into t1 select 1, "shanghai", "2024-02-21 00:00:00", bitmap_from_string("2,22,222,2222");
insert into t1 select 1, "guangzhou", "2024-02-21 00:00:00", bitmap_from_string("3,33,333,3333");
insert into t1 select 2, "male", "2024-02-21 00:00:00", bitmap_from_string("4,44,444,4444");
insert into t1 select 2, "female", "2024-02-21 00:00:00", bitmap_from_string("5,55,555,5555");
# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
| 1 | beijing | 2024-02-21 00:00:00 | 1,11,111,1111 |
| 1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333 |
| 1 | shanghai | 2024-02-21 00:00:00 | 2,22,222,2222 |
| 2 | female | 2024-02-21 00:00:00 | 5,55,555,5555 |
| 2 | male | 2024-02-21 00:00:00 | 4,44,444,4444 |
+--------+-----------+---------------------+-------------------------+
# 覆盖更新 (这里的 bitmap 是 union 语义,所以需要truncate后再insert)
mysql> truncate table t1 partition(p1_beijing);
mysql> insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("9,99,999,9999");
# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
| 1 | beijing | 2024-02-21 00:00:00 | 9,99,999,9999 |
| 1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333 |
| 1 | shanghai | 2024-02-21 00:00:00 | 2,22,222,2222 |
| 2 | female | 2024-02-21 00:00:00 | 5,55,555,5555 |
| 2 | male | 2024-02-21 00:00:00 | 4,44,444,4444 |
+--------+-----------+---------------------+-------------------------+
# Union 更新
mysql> insert into t1 select 2, "male", "2024-02-21 00:00:00", bitmap_from_string("44444,444444");
# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+----------------------------+
| tag_id | tag_value | refresh_time | bitmap_to_string(value) |
+--------+-----------+---------------------+----------------------------+
| 1 | beijing | 2024-02-21 00:00:00 | 9,99,999,9999 |
| 1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333 |
| 1 | shanghai | 2024-02-21 00:00:00 | 2,22,222,2222 |
| 2 | female | 2024-02-21 00:00:00 | 5,55,555,5555 |
| 2 | male | 2024-02-21 00:00:00 | 4,44,444,4444,44444,444444 |
+--------+-----------+---------------------+----------------------------+
9.5 [推荐] [Bitmap表] Primary Key 模型
-
不需要 Delete
-
查询性能好
-
Compaction 成本低
-
不支持Union语义,需要用Union语义的话,需要读出来更新后再覆盖写入.
9.5.1 需要保留历史 Bitmap 快照的场景
# 建表
CREATE TABLE t1(
tag_id BIGINT,
tag_value STRING,
refresh_time DATETIME,
value BITMAP
)
PRIMARY KEY(tag_id, tag_value, refresh_time)
PARTITION BY RANGE(refresh_time) (
PARTITION p20200321 VALUES LESS THAN ("2024-02-20 00:00:00"),
PARTITION p20200322 VALUES LESS THAN ("2024-02-21 00:00:00"),
PARTITION p20200323 VALUES LESS THAN ("2024-02-22 00:00:00")
)
DISTRIBUTED BY HASH(tag_id)
PROPERTIES(
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-10",
"dynamic_partition.end" = "10",
"dynamic_partition.prefix" = "p",
"dynamic_partition.history_partition_num" = "0"
);
# 写入测试数据
insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("1,11,111,1111");
insert into t1 select 1, "shanghai", "2024-02-21 00:00:00", bitmap_from_string("2,22,222,2222");
insert into t1 select 1, "guangzhou", "2024-02-21 00:00:00", bitmap_from_string("3,33,333,3333");
insert into t1 select 2, "male", "2024-02-21 00:00:00", bitmap_from_string("4,44,444,4444");
insert into t1 select 2, "female", "2024-02-21 00:00:00", bitmap_from_string("5,55,555,5555");
# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
| 1 | beijing | 2024-02-21 00:00:00 | 1,11,111,1111 |
| 1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333 |
| 1 | shanghai | 2024-02-21 00:00:00 | 2,22,222,2222 |
| 2 | female | 2024-02-21 00:00:00 | 5,55,555,5555 |
| 2 | male | 2024-02-21 00:00:00 | 4,44,444,4444 |
+--------+-----------+---------------------+-------------------------+
# 覆盖更新
insert into t1 select 2, "female", "2024-02-21 00:00:00", bitmap_from_string("6,66,666,6666");
# 查询
mysql> select tag_id, tag_value, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+-------------------------+
| tag_id | tag_value | bitmap_to_string(value) |
+--------+-----------+-------------------------+
| 1 | beijing | 1,11,111,1111 |
| 1 | guangzhou | 3,33,333,3333 |
| 1 | shanghai | 2,22,222,2222 |
| 2 | female | 6,66,666,6666 |
| 2 | male | 4,44,444,4444 |
+--------+-----------+-------------------------+
# union 后更新 (primary 无法支持 bitmap_union 语义,所以需要读出来,更新完再覆盖写入)
insert into t1 select tag_id, tag_value, refresh_time, bitmap_or(value, bitmap_from_string("11111,111111")) from t1 where tag_id=1 and tag_value="beijing" and refresh_time="2024-02-21 00:00:00";
# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+----------------------------+
| tag_id | tag_value | refresh_time | bitmap_to_string(value) |
+--------+-----------+---------------------+----------------------------+
| 1 | beijing | 2024-02-21 00:00:00 | 1,11,111,1111,11111,111111 |
| 1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333 |
| 1 | shanghai | 2024-02-21 00:00:00 | 2,22,222,2222 |
| 2 | female | 2024-02-21 00:00:00 | 6,66,666,6666 |
| 2 | male | 2024-02-21 00:00:00 | 4,44,444,4444 |
+--------+-----------+---------------------+----------------------------+
9.5.2 不需要保留历史 Bitmap 快照的场景
# 建表
CREATE TABLE t1(
tag_id BIGINT,
tag_value STRING,
refresh_time DATETIME,
value BITMAP
)
PRIMARY KEY(tag_id, tag_value)
DISTRIBUTED BY HASH(tag_id);
# 写入测试数据
insert into t1 select 1, "beijing", "2024-02-21 00:00:00", bitmap_from_string("1,11,111,1111");
insert into t1 select 1, "shanghai", "2024-02-21 00:00:00", bitmap_from_string("2,22,222,2222");
insert into t1 select 1, "guangzhou", "2024-02-21 00:00:00", bitmap_from_string("3,33,333,3333");
insert into t1 select 2, "male", "2024-02-21 00:00:00", bitmap_from_string("4,44,444,4444");
insert into t1 select 2, "female", "2024-02-21 00:00:00", bitmap_from_string("5,55,555,5555");
# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
| 1 | beijing | 2024-02-21 00:00:00 | 1,11,111,1111 |
| 1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333 |
| 1 | shanghai | 2024-02-21 00:00:00 | 2,22,222,2222 |
| 2 | female | 2024-02-21 00:00:00 | 5,55,555,5555 |
| 2 | male | 2024-02-21 00:00:00 | 4,44,444,4444 |
+--------+-----------+---------------------+-------------------------+
# 覆盖更新
insert into t1 select 2, "female", "2024-03-01 00:00:00", bitmap_from_string("6,66,666,6666");
# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+-------------------------+
| tag_id | tag_value | refresh_time | bitmap_to_string(value) |
+--------+-----------+---------------------+-------------------------+
| 1 | beijing | 2024-02-21 00:00:00 | 1,11,111,1111 |
| 1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333 |
| 1 | shanghai | 2024-02-21 00:00:00 | 2,22,222,2222 |
| 2 | female | 2024-03-01 00:00:00 | 6,66,666,6666 |
| 2 | male | 2024-02-21 00:00:00 | 4,44,444,4444 |
+--------+-----------+---------------------+-------------------------+
# union 更新 (primary 无法支持 bitmap_union 语义,所以需要读出来,更新完再覆盖写入)
insert into t1 select tag_id, tag_value, "2024-03-01", bitmap_or(value, bitmap_from_string("11111,111111")) from t1 where tag_id=1 and tag_value="beijing";
# 查询
mysql> select tag_id, tag_value, refresh_time, bitmap_to_string(value) from t1 order by tag_id, tag_value;
+--------+-----------+---------------------+----------------------------+
| tag_id | tag_value | refresh_time | bitmap_to_string(value) |
+--------+-----------+---------------------+----------------------------+
| 1 | beijing | 2024-03-01 00:00:00 | 1,11,111,1111,11111,111111 |
| 1 | guangzhou | 2024-02-21 00:00:00 | 3,33,333,3333 |
| 1 | shanghai | 2024-02-21 00:00:00 | 2,22,222,2222 |
| 2 | female | 2024-03-01 00:00:00 | 6,66,666,6666 |
| 2 | male | 2024-02-21 00:00:00 | 4,44,444,4444 |
+--------+-----------+---------------------+----------------------------+
9.6 正交建模
//TODO