SQL查询慢，加了bitmap索引也没有提高查询性能反而降低

zengyong · 2022年11月23日 02:43

【详述】问题详细描述
【背景】做过哪些操作？
【业务影响】
【StarRocks版本】2.4.0
【集群规模】1fe+3be（fe与be混部）
【机器信息】16C/32G/万兆
【附件】

Profile信息
并行度：未加bitmap索引前profile.txt (53.7 KB) 加了索引后profile.txt (52.3 KB)
pipeline是否开启：show variables like ‘%pipeline%’;
执行计划：explain costs + sql

单表数据量1700w
select water_point_id,
flux_collector_time,
device_code,
inside_source,
flux_collector_interval,
data_send_interval,
instant_flow_rate,
instant_flow_rate_interpolate_value,
positive_cumulative,
positive_cumulative_interpolate_value,
negative_cumulative,
negative_cumulative_interpolate_value,
net_cumulative,
net_cumulative_interpolate_value,
pressure,
warn,
is_interpolate,
is_delayed,
flux_receive_time,
gapWaterNow,
gapWaterNow - LAG(gapWaterNow, 1) over ( ORDER BY flux_collector_time ) AS gas_water
from (
SELECT water_point_id,
flux_collector_time,
device_code,
inside_source,
flux_collector_interval,
data_send_interval,
instant_flow_rate,
instant_flow_rate_interpolate_value,
positive_cumulative,
positive_cumulative_interpolate_value,
negative_cumulative,
negative_cumulative_interpolate_value,
net_cumulative,
net_cumulative_interpolate_value,
pressure,
warn,
is_interpolate,
is_delayed,
flux_receive_time,
IF
(is_interpolate = 0, positive_cumulative,
positive_cumulative_interpolate_value) AS gapWaterNow,
ROW_NUMBER() over ( PARTITION BY water_point_id,flux_collector_time ORDER BY create_time DESC ) AS new_index
FROM dws_potential_meter_pressure
where flux_collector_time >= ‘2022-10-20 00:00:00.000’
AND flux_collector_time <= ‘2022-11-22 00:00:00.000’
AND water_point_id = 10
) t
where new_index = 1 limit 10;

CREATE INDEX meter_water_point_index ON test_dws.dws_potential_meter_pressure (water_point_id) USING BITMAP ;
CREATE INDEX meter_water_device_index ON test_dws.dws_potential_meter_pressure (device_id) USING BITMAP;
CREATE INDEX meter_water_tenant_index ON test_dws.dws_potential_meter_pressure (tenant_id) USING BITMAP ;

PROPERTIES (
“replication_num” = “3”,
“bloom_filter_columns”=“flux_collector_time,create_time”,
“dynamic_partition.enable” = “true”,
“dynamic_partition.time_unit” = “MONTH”,
“dynamic_partition.time_zone” = “Asia/Shanghai”,
“dynamic_partition.start” = “-2147483648”,
“dynamic_partition.end” = “3”,
“dynamic_partition.prefix” = “p”,
“dynamic_partition.buckets” = “12”,
“in_memory” = “false”,
“storage_format” = “DEFAULT”,
“enable_persistent_index” = “true”
);
-explain计划

PLAN FRAGMENT 0
OUTPUT EXPRS:3: water_point_id | 1: flux_collector_time | 8: device_code | 26: inside_source | 49: flux_collector_interval | 50: data_send_interval | 28: instant_flow_rate | 29: instant_flow_rate_interpolate_value | 30: positive_cumulative | 31: positive_cumulative_interpolate_value | 32: negative_cumulative | 33: negative_cumulative_interpolate_value | 38: net_cumulative | 41: net_cumulative_interpolate_value | 40: pressure | 39: warn | 42: is_interpolate | 43: is_delayed | 54: flux_receive_time | 62: if | 64: expr
PARTITION: UNPARTITIONED
“”
RESULT SINK
“”
9:Project
| <slot 1> : 1: flux_collector_time
| <slot 3> : 3: water_point_id
| <slot 8> : 8: device_code
| <slot 26> : 26: inside_source
| <slot 28> : 28: instant_flow_rate
| <slot 29> : 29: instant_flow_rate_interpolate_value
| <slot 30> : 30: positive_cumulative
| <slot 31> : 31: positive_cumulative_interpolate_value
| <slot 32> : 32: negative_cumulative
| <slot 33> : 33: negative_cumulative_interpolate_value
| <slot 38> : 38: net_cumulative
| <slot 39> : 39: warn
| <slot 40> : 40: pressure
| <slot 41> : 41: net_cumulative_interpolate_value
| <slot 42> : 42: is_interpolate
| <slot 43> : 43: is_delayed
| <slot 49> : 49: flux_collector_interval
| <slot 50> : 50: data_send_interval
| <slot 54> : 54: flux_receive_time
| <slot 62> : 62: if
" | <slot 64> : 62: if - 63: lag(62: if, 1, null)"
|
8:ANALYTIC
" | functions: [, lag(62: if, 1, NULL), ]"
| order by: 1: flux_collector_time ASC
| window: ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
|
7:MERGING-EXCHANGE
“”
PLAN FRAGMENT 1
OUTPUT EXPRS:
" PARTITION: HASH_PARTITIONED: 3: water_point_id, 1: flux_collector_time"
“”
STREAM DATA SINK
EXCHANGE ID: 07
UNPARTITIONED
“”
6:SORT
| order by: <slot 1> 1: flux_collector_time ASC
| offset: 0
|
5:Project
| <slot 1> : 1: flux_collector_time
| <slot 3> : 3: water_point_id
| <slot 8> : 8: device_code
| <slot 26> : 26: inside_source
| <slot 28> : 28: instant_flow_rate
| <slot 29> : 29: instant_flow_rate_interpolate_value
| <slot 30> : 30: positive_cumulative
| <slot 31> : 31: positive_cumulative_interpolate_value
| <slot 32> : 32: negative_cumulative
| <slot 33> : 33: negative_cumulative_interpolate_value
| <slot 38> : 38: net_cumulative
| <slot 39> : 39: warn
| <slot 40> : 40: pressure
| <slot 41> : 41: net_cumulative_interpolate_value
| <slot 42> : 42: is_interpolate
| <slot 43> : 43: is_delayed
| <slot 49> : 49: flux_collector_interval
| <slot 50> : 50: data_send_interval
| <slot 54> : 54: flux_receive_time
" | <slot 62> : if(42: is_interpolate = 0, 30: positive_cumulative, 31: positive_cumulative_interpolate_value)"
|
4:SELECT
| predicates: 61: row_number() = 1
|
3:ANALYTIC
" | functions: [, row_number(), ]"
" | partition by: 3: water_point_id, 1: flux_collector_time"
| order by: 2: create_time DESC
| window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
|
2:SORT
" | order by: <slot 3> 3: water_point_id ASC, <slot 1> 1: flux_collector_time ASC, <slot 2> 2: create_time DESC"
| offset: 0
|
1:EXCHANGE
“”
PLAN FRAGMENT 2
OUTPUT EXPRS:
PARTITION: RANDOM
“”
STREAM DATA SINK
EXCHANGE ID: 01
" HASH_PARTITIONED: 3: water_point_id, 1: flux_collector_time"
“”
0:OlapScanNode
TABLE: dws_potential_meter_pressure
PREAGGREGATION: ON
" PREDICATES: 1: flux_collector_time >= ‘2022-10-20 00:00:00.000’, 1: flux_collector_time <= ‘2022-11-22 00:00:00.000’, 3: water_point_id = 10"
partitions=23/38
rollup: dws_potential_meter_pressure
tabletRatio=276/276
" tabletList=47883200,47883204,47883208,47883212,47883216,47883220,47883224,47883228,47883232,47883236 …"
cardinality=2633900
avgRowSize=20.0
numNodes=0

shemplle · 2022年11月23日 07:23

bitmap有其适用场景，可以看下加bitmap索引的列的基数。列的基数太大，或者没有存在大量重复及时命中了bitmap索引可能也不会提高效率的。

zengyong · 2022年11月24日 11:24

是低基数列这个确定。