Row count after bitmap distinct deduplication looks wrong

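【Business impact】Currently in the testing phase
【StarRocks version】2.5.3
【Cluster size】3 FE (3 followers) + 3 BE (FE and BE co-located)
【Machine specs】80C/64G
【Table model】Aggregate model
【Import or export method】Broker Load
Today I tested bitmap deduplication in an aggregate-model table, and the row count after deduplication doesn't look right.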
Original Hive table:
create table test.page2(dt string, page string, user_id BIGINT);
insert into page2 values
("20191206","waimai",101),
("20191206","waimai",102),
("20191206","xiaoxiang",101),
("20191206","xiaoxiang",101),
("20191206","xiaoxiang",101),
("20191206","waimai",101),
("20191207","waimai",22222);
StarRocks:
CREATE TABLE page5 (
    dt varchar(65533) not NULL COMMENT "",
    page varchar(65533) not NULL COMMENT "",
    user_id BITMAP BITMAP_UNION not null COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(dt, page)
DISTRIBUTED BY HASH(dt) BUCKETS 8
PROPERTIES (
    "replication_num" = "3",
    "in_memory" = "false",
    "storage_format" = "DEFAULT",
    "enable_persistent_index" = "false",
    "compression" = "LZ4"
);
LOAD LABEL load_test.page50
(
    DATA INFILE("hdfs://nameservice1/user/hive/warehouse/test.db/page2/*")
    INTO TABLE page5
    COLUMNS TERMINATED BY "\x01"
    (dt,page,user_id)
    set(
        -- user_id=to_bitmap(user_id)
        user_id=bitmap_hash(user_id)
    )
)
WITH BROKER
(
    "username" = "hdfs",
    "password" = "hdfs"
)
PROPERTIES
(
    "timeout" = "3600"
);
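To see what the aggregate model actually stored after the load, a query along these lines could be run (a sketch; the aliases `uv` and `members` are illustrative, not from the thread). Each (dt, page) key should come back as a single row whose bitmap holds that group's distinct users, so for the sample data `uv` should be 2, 1 and 1. Because the load uses bitmap_hash, the bitmap members are 32-bit hashes of user_id rather than the raw ids; since user_id is a BIGINT, the commented-out to_bitmap would store the ids exactly.

```sql
-- One row per aggregate key (dt, page); the bitmap column is already merged
SELECT dt, page,
       BITMAP_COUNT(user_id)     AS uv,       -- cardinality of this row's bitmap
       BITMAP_TO_STRING(user_id) AS members   -- hashed ids, since bitmap_hash was used at load time
FROM page5
ORDER BY dt, page;
```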
Query in Hive:
select count(distinct user_id) from page2 group by dt,page;
In StarRocks:
SELECT COUNT(user_id) from page5 group by dt,page;
By my reasoning it should be 4, not 3.
Can anyone explain this? It's urgent.

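After grouping by dt, page, COUNT(user_id) should be 1 for every group. Run the Hive query again with the other columns included and show the full output.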
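A sketch of the suggested comparison, with the grouping columns included in both queries so each count can be matched to its group (the aliases `uv` and `row_cnt` are illustrative). The Hive results in the comments are worked out by hand from the seven sample rows above.

```sql
-- Hive: distinct users per (dt, page)
select dt, page, count(distinct user_id) as uv
from test.page2
group by dt, page;
-- Expected for the sample rows:
-- 20191206  waimai     2   (101, 102)
-- 20191206  xiaoxiang  1   (101)
-- 20191207  waimai     1   (22222)

-- StarRocks: COUNT counts rows, and the aggregate model keeps one row per key,
-- so every group returns 1 no matching how many users its bitmap holds
SELECT dt, page, COUNT(user_id) AS row_cnt
FROM page5
GROUP BY dt, page;
```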

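I found the cause: I have to use count(distinct ...) to get matching results. But that doesn't seem logical. Isn't the stored data already deduplicated once the versions are merged? Why do I still need distinct? Very strange.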

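A bitmap column exists for count(distinct) in the first place: each row stores the set of distinct users as a bitmap, and count(distinct) on it returns the size of that set, whereas count() only counts rows.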
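For a BITMAP_UNION column, the distinct count comes from an aggregate that unions the bitmaps and takes the cardinality. As a sketch, the three per-group queries below should return the same counts; StarRocks treats COUNT(DISTINCT) on such a column as a bitmap union count. The last query illustrates the main benefit: when rolling up to dt, bitmaps from different pages are unioned, so user 101, who appears under both waimai and xiaoxiang on 20191206, is counted once (assuming no bitmap_hash collisions among the sample ids).

```sql
-- Equivalent ways to get exact distinct users per (dt, page)
SELECT dt, page, COUNT(DISTINCT user_id)             AS uv FROM page5 GROUP BY dt, page;
SELECT dt, page, BITMAP_UNION_COUNT(user_id)         AS uv FROM page5 GROUP BY dt, page;
SELECT dt, page, BITMAP_COUNT(BITMAP_UNION(user_id)) AS uv FROM page5 GROUP BY dt, page;

-- Roll-up to dt: 20191206 -> 2 (101, 102) and 20191207 -> 1 (22222),
-- whereas summing the per-page counts for 20191206 would give 2 + 1 = 3
SELECT dt, BITMAP_UNION_COUNT(user_id) AS uv FROM page5 GROUP BY dt;
```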

I thought it only deduplicated the values; now I understand.