为了更快的定位您的问题,请提供以下信息,谢谢
【详述】问题详细描述
主键表,使用distinct和不使用distinct,count计算,性能差5倍。
单层查询表,分组计算count(member_id)与count(distinct member_id) 耗时都最500ms左右
内嵌一层查询后,count(member_id)与count(distinct member_id) 耗时相差5倍以上。
【背景】
【业务影响】报表查询性能不满足业务要求
【是否存算分离】否(阿里云)
【StarRocks版本】3.2.14-b86884b
【集群规模】例如:3fe(1 follower+2observer)+3be
【机器信息】32C/128G/万兆
【联系方式】
【附件】
- fe.log/beINFO/相应截图
- 慢查询:
-
Profile信息,获取Profile,通过Profile分析查询瓶颈
-
并行度:show variables like ‘%parallel_fragment_exec_instance_num%’;
parallel_fragment_exec_instance_num 1
-
pipeline是否开启:show variables like ‘%pipeline%’;
-
|enable_pipeline_engine|true|
|max_pipeline_dop|64|
|pipeline_dop|0|
|pipeline_profile_level|1|
|pipeline_sink_dop|0|
- be节点cpu和内存使用率截图
- 查询报错:
- query_dump,怎么获取query_dump文件
- be crash
- be.out
- coredump,如何获取coredump
- 外表查询报错
- be.out和fe.warn.log
**-表数据量:**6000万+
-表结构定义:
CREATE TABLE fct_ads_obj_member_bind_rpt_df
(
member_id
varchar(255) NOT NULL ,
first_channel_id
varchar(255) NOT NULL,
first_channel_name
varchar(255) NULL,
tenant_id
varchar(255) NULL,
tenant_name
varchar(255) NULL ,
bind_date
varchar(20) NULL,
second_channel_id
varchar(255) NULL,
second_channel_name
varchar(255) NULL,
third_channel_id
varchar(255) NULL ,
third_channel_name
varchar(255) NULL,
fourth_source_id
varchar(255) NULL,
fourth_source_name
varchar(255) NULL,
reg_province_id
varchar(255) NULL,
reg_province_name
varchar(255) NULL,
reg_city_id
varchar(255) NULL,
reg_city_name
varchar(255) NULL,
bottling_plant_ownership
varchar(255) NULL ,
create_datetime
varchar(255) NULL,
create_user
varchar(255) NULL ,
create_workflow
varchar(255) NULL,
member_status
varchar(2) NULL
) ENGINE=OLAP
PRIMARY KEY(member_id
, first_channel_id
)
DISTRIBUTED BY HASH(member_id
) BUCKETS 20
PROPERTIES (
“replication_num” = “3”,
“in_memory” = “false”,
“enable_persistent_index” = “false”,
“replicated_storage” = “true”,
“compression” = “LZ4”
);
-业务sql: 执行时间16s,执行profile见附件 “SQL1 PROFILE”
SQL1 PROFILE (5.7 KB)
select
渠道,
count( distcint member_id) as 人数,
from
(
select
member_id,
group_concat( first_channel_name order by first_channel_name separator ‘,’) as 渠道
from fct_ads_obj_member_bind_rpt_df
where 1=1
group by
member_id
) t
group by 渠道
-业务sql去掉distinct:执行时间3.3s,执行profile见附件 “SQL2 PROFILE”
SQL2 PROFILE (4.7 KB)
select
渠道,
count( member_id) as 人数,
from
(
select
member_id,
group_concat( first_channel_name order by first_channel_name separator ‘,’) as 渠道
from fct_ads_obj_member_bind_rpt_df
where 1=1
group by
member_id
) t
group by 渠道
-表直接distinct操作,600ms。
select second_channel_name ,count(distinct member_id)
from fct_ads_obj_member_bind_rpt_df
group by second_channel_name;