3.1.1升至3.1.4,bitmap_from_string(group_concat(concat(user_id), ','))人数错误

为了更快的定位您的问题,请提供以下信息,谢谢
【详述】3.1.1升至3.1.4,bitmap_from_string(group_concat(concat(user_id), ‘,’))人数错误
【背景】正常的话人数是160万+,实际得到的人数是5000+,确认是 这块函数的问题,但不知如何解决
【业务影响】
【是否存算分离】否
【StarRocks版本】3.1.4
【集群规模】例如:3fe(1 follower+2observer)+5be(fe与be混部)
【机器信息】CPU虚拟核/内存/网卡,例如:48C/64G/万兆
【联系方式】为了在解决问题过程中能及时联系到您获取一些日志信息,请补充下您的联系方式,例如:社区群4-小李或者邮箱,谢谢
【附件】

使用bitmap_union(to_bitmap(user_id))结果是正确的

具体参考 https://docs.starrocks.io/zh/docs/sql-reference/sql-functions/string-functions/group_concat/

结果还是错的:

先确认下user id的list是对的吗,group_concat(user_id SEPARATOR ‘,’)

另外user_id是什么类型,varchar?

number数字类型

查询结果集是5000多的explain costs + sql 和 profile发下

profies.txt (680.5 KB)

explain costs + sql也发下

bitmap_count(bitmap_from_string(group_concat(concat(user_id) SEPARATOR ‘,’))) mem_codes 这个的explain costs + sql哈

profile里有

耗时2s+

我的意思拿一下这个的执行计划

PLAN FRAGMENT 0
OUTPUT EXPRS:288: bitmap_count
PARTITION: UNPARTITIONED
RESULT SINK
82:Project
| <slot 288> : bitmap_count(bitmap_from_string(287: group_concat))
|
81:AGGREGATE (merge finalize)
| output: group_concat(287: group_concat, ‘,’, ‘,’)
| group by:
|
80:EXCHANGE
PLAN FRAGMENT 1
OUTPUT EXPRS:
PARTITION: HASH_PARTITIONED: 285: user_id
STREAM DATA SINK
EXCHANGE ID: 80
UNPARTITIONED
79:AGGREGATE (update serialize)
| output: group_concat(286: concat, ‘,’, ‘,’)
| group by:
|
78:Project
| <slot 286> : concat(CAST(285: user_id AS VARCHAR))
|
77:AGGREGATE (merge finalize)
| group by: 285: user_id
|
76:EXCHANGE
PLAN FRAGMENT 2
OUTPUT EXPRS:
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 76
HASH_PARTITIONED: 285: user_id
75:AGGREGATE (update serialize)
| STREAMING
| group by: 285: user_id
|
0:UNION
|
|----74:EXCHANGE
|
35:EXCHANGE
PLAN FRAGMENT 3
OUTPUT EXPRS:
PARTITION: HASH_PARTITIONED: 195: user_id
STREAM DATA SINK
EXCHANGE ID: 74
RANDOM
73:AGGREGATE (update finalize)
| group by: 195: user_id
|
72:EXCHANGE
PLAN FRAGMENT 4
OUTPUT EXPRS:
PARTITION: HASH_PARTITIONED: 204: mem_code
STREAM DATA SINK
EXCHANGE ID: 72
HASH_PARTITIONED: 195: user_id
71:Project
| <slot 195> : 195: user_id
|
70:HASH JOIN
| join op: RIGHT OUTER JOIN (PARTITIONED)
| colocate: false, reason:
| equal join conjunct: 204: mem_code = 200: any_value
|
|----69:EXCHANGE
|
38:EXCHANGE
PLAN FRAGMENT 5
OUTPUT EXPRS:
PARTITION: HASH_PARTITIONED: 201: any_value
STREAM DATA SINK
EXCHANGE ID: 69
HASH_PARTITIONED: 200: any_value
68:Project
| <slot 195> : 195: user_id
| <slot 200> : 200: any_value
|
67:HASH JOIN
| join op: INNER JOIN (PARTITIONED)
| colocate: false, reason:
| equal join conjunct: 201: any_value = 224: open_id
|
|----66:EXCHANGE
|
63:EXCHANGE
PLAN FRAGMENT 6
OUTPUT EXPRS:
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 66
HASH_PARTITIONED: 224: open_id
65:Project
| <slot 224> : 224: open_id
|
64:OlapScanNode
TABLE: dwd_cdp_crm_mem_comsume_p_mem
PREAGGREGATION: ON
PREDICATES: 229: consume_time >= ‘2023-05-23’, 229: consume_time <= ‘2023-11-23’, 222: group_code = 61990, 223: mem_code = 0, 229: consume_time > ‘2023-05-23’, 229: consume_time < ‘2023-11-23’
partitions=1/1
rollup: dwd_cdp_crm_mem_comsume_p_mem
tabletRatio=1/64
tabletList=708870
cardinality=1
avgRowSize=61.914787
numNodes=0
PLAN FRAGMENT 7
OUTPUT EXPRS:
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 63
HASH_PARTITIONED: 201: any_value
62:Project
| <slot 195> : 195: user_id
| <slot 200> : 200: any_value
| <slot 201> : 201: any_value
|
61:HASH JOIN
| join op: INNER JOIN (BROADCAST)
| colocate: false, reason:
| equal join conjunct: 201: any_value = 247: any_value
|
|----60:EXCHANGE
|
41:Project
| <slot 195> : 195: user_id
| <slot 200> : 200: any_value
| <slot 201> : 201: any_value
|
40:AGGREGATE (update finalize)
| output: any_value(192: mem_code), any_value(193: open_id)
| group by: 191: group_code, 195: user_id
| having: 201: any_value IS NOT NULL
|
39:OlapScanNode
TABLE: dwd_cdp_crm_mem_open_real_p_group
PREAGGREGATION: ON
PREDICATES: 191: group_code = 61990
partitions=1/1
rollup: dwd_cdp_crm_mem_open_real_p_group
tabletRatio=1/64
tabletList=804277
cardinality=5181089
avgRowSize=48.630325
numNodes=0
PLAN FRAGMENT 8
OUTPUT EXPRS:
PARTITION: HASH_PARTITIONED: 247: any_value
STREAM DATA SINK
EXCHANGE ID: 60
UNPARTITIONED
59:Project
| <slot 247> : 247: any_value
|
58:AGGREGATE (merge finalize)
| output: sum(284: sum)
| group by: 247: any_value
| having: 284: sum > 0
|
57:EXCHANGE
PLAN FRAGMENT 9
OUTPUT EXPRS:
PARTITION: HASH_PARTITIONED: 246: any_value
STREAM DATA SINK
EXCHANGE ID: 57
HASH_PARTITIONED: 247: any_value
56:AGGREGATE (update serialize)
| STREAMING
| output: sum(283: case)
| group by: 247: any_value
|
55:Project
| <slot 247> : 247: any_value
| <slot 283> : if(char_length(281: add_item_id) > 0, 1, 0)
|
54:HASH JOIN
| join op: LEFT OUTER JOIN (BROADCAST)
| colocate: false, reason:
| equal join conjunct: 247: any_value = 270: open_id
|
|----53:EXCHANGE
|
50:Project
| <slot 247> : 247: any_value
|
49:HASH JOIN
| join op: LEFT OUTER JOIN (PARTITIONED)
| colocate: false, reason:
| equal join conjunct: 246: any_value = 250: mem_code
|
|----48:EXCHANGE
|
45:EXCHANGE
PLAN FRAGMENT 10
OUTPUT EXPRS:
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 53
UNPARTITIONED
52:Project
| <slot 270> : 270: open_id
| <slot 281> : 281: add_item_id
|
51:OlapScanNode
TABLE: dwd_cdp_crm_mem_comsume_p_mem
PREAGGREGATION: ON
PREDICATES: 268: group_code = 61990, 269: mem_code = 0, 275: consume_time > ‘2023-05-23’, 275: consume_time < ‘2023-11-23’
partitions=1/1
rollup: dwd_cdp_crm_mem_comsume_p_mem
tabletRatio=1/64
tabletList=708870
cardinality=1
avgRowSize=62.720535
numNodes=0
PLAN FRAGMENT 11
OUTPUT EXPRS:
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 48
HASH_PARTITIONED: 250: mem_code
47:Project
| <slot 250> : 250: mem_code
|
46:OlapScanNode
TABLE: dim_crm_member
PREAGGREGATION: ON
PREDICATES: 248: group_code = 61990
partitions=1/1
rollup: dim_crm_member
tabletRatio=1/64
tabletList=703771
cardinality=4743474
avgRowSize=16.0
numNodes=0
PLAN FRAGMENT 12
OUTPUT EXPRS:
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 45
HASH_PARTITIONED: 246: any_value
44:Project
| <slot 246> : 246: any_value
| <slot 247> : 247: any_value
|
43:AGGREGATE (update finalize)
| output: any_value(238: mem_code), any_value(239: open_id)
| group by: 237: group_code, 241: user_id
|
42:OlapScanNode
TABLE: dwd_cdp_crm_mem_open_real_p_group
PREAGGREGATION: ON
PREDICATES: 237: group_code = 61990
partitions=1/1
rollup: dwd_cdp_crm_mem_open_real_p_group
tabletRatio=1/64
tabletList=804277
cardinality=5181089
avgRowSize=48.630325
numNodes=0
PLAN FRAGMENT 13
OUTPUT EXPRS:
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 38
HASH_PARTITIONED: 204: mem_code
37:Project
| <slot 204> : 204: mem_code
|
36:OlapScanNode
TABLE: dim_crm_member
PREAGGREGATION: ON
PREDICATES: 202: group_code = 61990
partitions=1/1
rollup: dim_crm_member
tabletRatio=1/64
tabletList=703771
cardinality=4743474
avgRowSize=16.0
numNodes=0
PLAN FRAGMENT 14
OUTPUT EXPRS:
PARTITION: HASH_PARTITIONED: 101: user_id
STREAM DATA SINK
EXCHANGE ID: 35
RANDOM
34:AGGREGATE (update finalize)
| group by: 101: user_id
|
33:EXCHANGE
PLAN FRAGMENT 15
OUTPUT EXPRS:
PARTITION: HASH_PARTITIONED: 106: any_value
STREAM DATA SINK
EXCHANGE ID: 33
HASH_PARTITIONED: 101: user_id
32:Project
| <slot 101> : 101: user_id
|
31:HASH JOIN
| join op: RIGHT OUTER JOIN (BUCKET_SHUFFLE(S))
| colocate: false, reason:
| equal join conjunct: 110: mem_code = 106: any_value
|
|----30:Project
| | <slot 101> : 101: user_id
| | <slot 106> : 106: any_value
| |
| 29:HASH JOIN
| | join op: INNER JOIN (BUCKET_SHUFFLE(S))
| | colocate: false, reason:
| | equal join conjunct: 106: any_value = 152: any_value
| |
| |----28:Project
| | | <slot 152> : 152: any_value
| | |
| | 27:AGGREGATE (update finalize)
| | | output: sum(189: case)
| | | group by: 152: any_value
| | | having: 190: sum > 0
| | |
| | 26:Project
| | | <slot 152> : 152: any_value
| | | <slot 189> : if(char_length(187: add_item_id) > 0, 1, 0)
| | |
| | 25:HASH JOIN
| | | join op: LEFT OUTER JOIN (BUCKET_SHUFFLE(S))
| | | colocate: false, reason:
| | | equal join conjunct: 152: any_value = 175: mem_code
| | |
| | |----24:EXCHANGE
| | |
| | 21:Project
| | | <slot 152> : 152: any_value
| | |
| | 20:HASH JOIN
| | | join op: LEFT OUTER JOIN (PARTITIONED)
| | | colocate: false, reason:
| | | equal join conjunct: 152: any_value = 156: mem_code
| | |
| | |----19:EXCHANGE
| | |
| | 16:EXCHANGE
| |
| 12:Project
| | <slot 101> : 101: user_id
| | <slot 106> : 106: any_value
| |
| 11:HASH JOIN
| | join op: INNER JOIN (PARTITIONED)
| | colocate: false, reason:
| | equal join conjunct: 106: any_value = 129: mem_code
| |
| |----10:EXCHANGE
| |
| 7:EXCHANGE
|
3:EXCHANGE
PLAN FRAGMENT 16
OUTPUT EXPRS:
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 24
HASH_PARTITIONED: 175: mem_code
23:Project
| <slot 175> : 175: mem_code
| <slot 187> : 187: add_item_id
|
22:OlapScanNode
TABLE: dwd_cdp_crm_mem_comsume_p_mem
PREAGGREGATION: ON
PREDICATES: 174: group_code = 61990, 175: mem_code != 0, 181: consume_time > ‘2023-05-23’, 181: consume_time < ‘2023-11-23’
partitions=1/1
rollup: dwd_cdp_crm_mem_comsume_p_mem
tabletRatio=1/64
tabletList=708870
cardinality=4786813
avgRowSize=35.805748
numNodes=0
PLAN FRAGMENT 17
OUTPUT EXPRS:
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 19
HASH_PARTITIONED: 156: mem_code
18:Project
| <slot 156> : 156: mem_code
|
17:OlapScanNode
TABLE: dim_crm_member
PREAGGREGATION: ON
PREDICATES: 154: group_code = 61990
partitions=1/1
rollup: dim_crm_member
tabletRatio=1/64
tabletList=703771
cardinality=4743474
avgRowSize=16.0
numNodes=0
PLAN FRAGMENT 18
OUTPUT EXPRS:
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 16
HASH_PARTITIONED: 152: any_value
15:Project
| <slot 152> : 152: any_value
|
14:AGGREGATE (update finalize)
| output: any_value(144: mem_code)
| group by: 143: group_code, 147: user_id
|
13:OlapScanNode
TABLE: dwd_cdp_crm_mem_open_real_p_group
PREAGGREGATION: ON
PREDICATES: 143: group_code = 61990
partitions=1/1
rollup: dwd_cdp_crm_mem_open_real_p_group
tabletRatio=1/64
tabletList=804277
cardinality=5181089
avgRowSize=24.0
numNodes=0
PLAN FRAGMENT 19
OUTPUT EXPRS:
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 10
HASH_PARTITIONED: 129: mem_code
9:Project
| <slot 129> : 129: mem_code
|
8:OlapScanNode
TABLE: dwd_cdp_crm_mem_comsume_p_mem
PREAGGREGATION: ON
PREDICATES: 135: consume_time >= ‘2023-05-23’, 135: consume_time <= ‘2023-11-23’, 128: group_code = 61990, 129: mem_code != 0, 135: consume_time > ‘2023-05-23’, 135: consume_time < ‘2023-11-23’
partitions=1/1
rollup: dwd_cdp_crm_mem_comsume_p_mem
tabletRatio=1/64
tabletList=708870
cardinality=1196703
avgRowSize=35.0
numNodes=0
PLAN FRAGMENT 20
OUTPUT EXPRS:
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 07
HASH_PARTITIONED: 106: any_value
6:Project
| <slot 101> : 101: user_id
| <slot 106> : 106: any_value
|
5:AGGREGATE (update finalize)
| output: any_value(98: mem_code)
| group by: 97: group_code, 101: user_id
| having: 106: any_value IS NOT NULL
|
4:OlapScanNode
TABLE: dwd_cdp_crm_mem_open_real_p_group
PREAGGREGATION: ON
PREDICATES: 97: group_code = 61990
partitions=1/1
rollup: dwd_cdp_crm_mem_open_real_p_group
tabletRatio=1/64
tabletList=804277
cardinality=5181089
avgRowSize=24.0
numNodes=0
PLAN FRAGMENT 21
OUTPUT EXPRS:
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 03
HASH_PARTITIONED: 110: mem_code
2:Project
| <slot 110> : 110: mem_code
|
1:OlapScanNode
TABLE: dim_crm_member
PREAGGREGATION: ON
PREDICATES: 108: group_code = 61990
partitions=1/1
rollup: dim_crm_member
tabletRatio=1/64
tabletList=703771
cardinality=4743474
avgRowSize=16.0
numNodes=0

为了防止内存爆掉 3.1.3 额外有个防御性质的session变量 group_concat_max_len, 默认是65535,是这个导致的

1赞