求问,starrocks怎么实现ck中的localQuery呀?有明细数据去重场景,希望把待去重id作为分桶键,然后去重的时候节点内去重即可呀?
ck中的localQuery节点内计算出的去重结果数值,可以减少很多网络传输
求问,starrocks怎么实现ck中的localQuery呀?有明细数据去重场景,希望把待去重id作为分桶键,然后去重的时候节点内去重即可呀?
ck中的localQuery节点内计算出的去重结果数值,可以减少很多网络传输
这个就是 colocate agg
那太好了,不过我在官方文档没找到colocate agg的相关资料,老师有空能辛苦帮忙举例说明下我需要怎么才能让我的聚合查询走colocate agg吗?
【场景】我的场景是分区的bitmap聚合表,表结构如下,按block_offset分桶,需要桶内求出bitmap的基数结果,然后各桶之间的数据进行求和即可
【表结构】
CREATE TABLE bitmap_table (
dt
varchar(65533) NOT NULL COMMENT “dt”,
dim1
varchar(65533) NOT NULL COMMENT “dim1”,
block_offset
bigint NOT NULL COMMENT “用户id的二进制低x位”,
user_bitmap
bitmap BITMAP_UNION NULL COMMENT “用户bitmap(剔除低x位后,其余位存储为bitmap)”
) ENGINE=OLAP
AGGREGATE KEY(dt
,dim1
,block_offset
)
COMMENT “bitmap_table”
PARTITION BY (dt
)
DISTRIBUTED BY HASH(block_offset
)
PROPERTIES (
“replication_num” = “3”,
“in_memory” = “false”,
“enable_persistent_index” = “false”,
“replicated_storage” = “true”,
“partition_live_number” = “100”,
“compression” = “LZ4”
);
【查询】
希望能走colocate agg,保障节点中直接计算出uv结果,然后节点间进行uv的求和
select
dim1
,sum(uv) as uv
from(
select
dim1
,block_offset
,bitmap_union_count(user_bitmap) as uv
from bitmap_table
group by dim1
,block_offset
) t1
group by dim1
你好,是说colocate agg可以直接参考colocate join是吗?
哦,没注意到你这个是单表不是join。
这个你可以explain看下里面的子查询是不是已经按你的期望进行了,因为这个schema和sql看起来节点之间并没有什么东西需要交换,毕竟已经按block_offset分桶了
好的,感谢大佬,我看了下执行计划,group by block_offset后,节点间的确只有结果数据的交换了,我这再实际测试下看看
大佬,我发现在查询分区表的单个分区的时候,group by带上分桶键,的确不会进行bitamp的shuffle,会直接在节点内计算出bitmap的整形结果,然后节点间汇总,但是同样的sql,查询多个分区的时候,就会多一步shuffle,节点内会进行bitmap_union,然后节点间shuffle后进行bitmap_union_count
case如下:
【查询单个分区】
EXPLAIN
select
dt
,dim2
,sum(pv2) as pv2
,bitmap_union_count(custom_bitmap2) as custom_num2
from dm.test_lzl_bitmap_20240822_005_part
where dt in (‘20241001’)
group by dt,dim2;
======「查询计划」:
Explain String
PLAN FRAGMENT 0
OUTPUT EXPRS:2: dt | 4: dim2 | 9: sum | 10: bitmap_union_count
PARTITION: UNPARTITIONED
RESULT SINK
2:EXCHANGE
PLAN FRAGMENT 1
OUTPUT EXPRS:
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 02
UNPARTITIONED
1:AGGREGATE (update finalize)
| output: sum(6: pv2), bitmap_union_count(8: custom_bitmap2)
| group by: 2: dt, 4: dim2
|
0:OlapScanNode
TABLE: test_lzl_bitmap_20240822_005_part
PREAGGREGATION: ON
PREDICATES: 2: dt = ‘20241001’
partitions=1/121
rollup: test_lzl_bitmap_20240822_005_part
tabletRatio=2/2
tabletList=2318946,2318948
cardinality=1179
avgRowSize=1048595.9
【查询多个分区】
EXPLAIN
select
dt
,dim2
,sum(pv2) as pv2
,bitmap_union_count(custom_bitmap2) as custom_num2
from dm.test_lzl_bitmap_20240822_005_part
where dt in (‘20241001’,‘20241002’)
group by dt,dim2
=====执行计划
Explain String
PLAN FRAGMENT 0
OUTPUT EXPRS:2: dt | 4: dim2 | 9: sum | 10: bitmap_union_count
PARTITION: UNPARTITIONED
RESULT SINK
4:EXCHANGE
PLAN FRAGMENT 1 – !!!!多了一步!!!!
OUTPUT EXPRS:
PARTITION: HASH_PARTITIONED: 2: dt, 4: dim2
STREAM DATA SINK
EXCHANGE ID: 04
UNPARTITIONED
3:AGGREGATE (merge finalize)
| output: sum(9: sum), bitmap_union_count(10: bitmap_union_count)
| group by: 2: dt, 4: dim2
|
2:EXCHANGE
PLAN FRAGMENT 2
OUTPUT EXPRS:
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 02
HASH_PARTITIONED: 2: dt, 4: dim2
1:AGGREGATE (update serialize)
| STREAMING
| output: sum(6: pv2), bitmap_union_count(8: custom_bitmap2)
| group by: 2: dt, 4: dim2
|
0:OlapScanNode
TABLE: test_lzl_bitmap_20240822_005_part
PREAGGREGATION: ON
PREDICATES: 2: dt IN (‘20241001’, ‘20241002’)
partitions=2/121
rollup: test_lzl_bitmap_20240822_005_part
tabletRatio=4/4
tabletList=2318954,2318956,2318946,2318948
cardinality=2358
avgRowSize=1048595.9
大佬能再请教下如何制定colocate key吗?分区表不指定colocate key的确会无法走colocate agg
“colocate_with” = “group_name”
试了下,是可以的,感谢~