StarRocks BE v3.2.3 drives disk IO to 100% every night, causing a huge performance hit

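[Details] Monitoring and logs show that the StarRocks BEs in production run compaction every night on a fixed schedule, from 11 PM to 5:35 AM. Throughout this window disk IO stays at 100% and swap usage is at 100%, while memory usage is normal. In the test environment the window is 15:30 to 3 AM.
[Background] What have you tried? In be.info I can only see compaction running continuously on a few of our StarRocks tables. The table schema is as follows: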
CREATE TABLE xxx (
f1 varchar(100) NOT NULL COMMENT "",
f2 varchar(100) NOT NULL COMMENT "",
f3 varchar(100) NOT NULL COMMENT "",
f4 varchar(4) NOT NULL COMMENT "",
datetime datetime NOT NULL COMMENT "",
f5 int(11) NOT NULL COMMENT "",
f6 largeint(40) NOT NULL COMMENT "",
f7 varchar(70) NOT NULL COMMENT "",
INDEX t_type (f4) USING BITMAP
) ENGINE=OLAP
DUPLICATE KEY(f1)
DISTRIBUTED BY HASH(f1) BUCKETS 1000
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"enable_persistent_index" = "false",
"replicated_storage" = "true",
"compression" = "LZ4"
);
Each table holds roughly 10 TB of data.
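For scale, here is a rough average tablet size for this table. It is a sketch under stated assumptions, not a measurement: the ~10 TB figure is taken as on-disk size, the table is treated as a single partition (the DDL has no PARTITION BY clause), and rows are assumed to be spread evenly across the 1000 hash buckets.

```python
# Rough average tablet size for the table above.
# Assumptions (not from the logs): ~10 TiB on disk, one partition,
# rows spread evenly across the hash buckets.
TIB = 1024 ** 4
GIB = 1024 ** 3


def per_tablet_gib(table_bytes: float, buckets: int, partitions: int = 1) -> float:
    """Average tablet size in GiB, assuming an even hash distribution."""
    return table_bytes / (buckets * partitions) / GIB


if __name__ == "__main__":
    size = per_tablet_gib(10 * TIB, buckets=1000)
    print(f"~{size:.2f} GiB per tablet")  # prints "~10.24 GiB per tablet"
```

With "replication_num" = "1" each tablet has a single replica, so no extra multiplier for replicas applies.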
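[Business impact] During this window, every query against StarRocks runs extremely slowly.
[Shared-data (storage-compute separation)?] Yes
[StarRocks version] 3.2.3
[Cluster size] e.g. 3 FE + 4 BE
[Machine info] vCPUs/memory/NIC, e.g. 32C/64G/10GbE
[Contact] Email: crow8389@gmail.com
[Attachments]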


  • be.info only shows compaction running continuously:
    I0714 16:31:52.951963 8066 size_tiered_compaction_policy.cpp:353] pick tablet 193862 for size-tiered compaction rowset version=243341-243354 score=14.8193 level_size=2214 total_size=4228 segment_num=5 force_base_compaction=0 reached_max_versions=0
    I0714 16:31:52.952040 3001164 size_tiered_compaction_policy.cpp:353] pick tablet 193894 for size-tiered compaction rowset version=243349-243354 score=29 level_size=1 total_size=2112 segment_num=6 force_base_compaction=0 reached_max_versions=0
    I0714 16:31:52.952096 3001165 size_tiered_compaction_policy.cpp:353] pick tablet 193916 for size-tiered compaction rowset version=243349-243354 score=22.1811 level_size=2684 total_size=13663 segment_num=6 force_base_compaction=0 reached_max_versions=0
    I0714 16:31:52.952138 3001164 size_tiered_compaction_policy.cpp:353] pick tablet 193942 for size-tiered compaction rowset version=243349-243354 score=29 level_size=1 total_size=4029 segment_num=6 force_base_compaction=0 reached_max_versions=0
    I0714 16:31:52.952186 8066 size_tiered_compaction_policy.cpp:353] pick tablet 193956 for size-tiered compaction rowset version=243345-243354 score=16.3563 level_size=3890 total_size=10418 segment_num=5 force_base_compaction=0 reached_max_versions=0
    I0714 16:31:52.952209 3001166 size_tiered_compaction_policy.cpp:353] pick tablet 193964 for size-tiered compaction rowset version=243341-243354 score=13.0027 level_size=2919 total_size=2923 segment_num=5 force_base_compaction=0 reached_max_versions=0
    I0714 16:31:52.952231 3001167 size_tiered_compaction_policy.cpp:353] pick tablet 193972 for size-tiered compaction rowset version=243341-243354 score=15.5253 level_size=3514 total_size=7951 segment_num=5 force_base_compaction=0 reached_max_versions=0
    I0714 16:31:52.952322 8066 size_tiered_compaction_policy.cpp:353] pick tablet 194032 for size-tiered compaction rowset version=243345-243354 score=15.7977 level_size=6674 total_size=16010 segment_num=5 force_base_compaction=0 reached_max_versions=0
    I0714 16:31:52.952322 3001166 size_tiered_compaction_policy.cpp:353] pick tablet 194030 for size-tiered compaction rowset version=243345-243354 score=15.1015 level_size=2029 total_size=4161 segment_num=5 force_base_compaction=0 reached_max_versions=0
    I0714 16:31:52.952334 3001164 size_tiered_compaction_policy.cpp:353] pick tablet 194040 for size-tiered compaction rowset version=243345-243354 score=17.363 level_size=2832 total_size=9010 segment_num=5 force_base_compaction=0 reached_max_versions=0
    I0714 16:31:52.952474 8066 size_tiered_compaction_policy.cpp:353] pick tablet 194148 for size-tiered compaction rowset version=243349-243354 score=29 level_size=1 total_size=8298 segment_num=6 force_base_compaction=0 reached_max_versions=0
    I0714 16:31:52.952478 3001164 size_tiered_compaction_policy.cpp:353] pick tablet 194158 for size-tiered compaction rowset version=243341-243354 score=16.1868 level_size=2639 total_size=6844 segment_num=5 force_base_compaction=0 reached_max_versions=0
    I0714 16:31:52.952549 3001167 size_tiered_compaction_policy.cpp:353] pick tablet 194220 for size-tiered compaction rowset version=243349-243354 score=29 level_size=1 total_size=4042 segment_num=6 force_base_compaction=0 reached_max_versions=0
    I0714 16:31:52.952584 3001166 size_tiered_compaction_policy.cpp:353] pick tablet 194256 for size-tiered compaction rowset version=243341-243354 score=16.1132 level_size=2632 total_size=6729 segment_num=5 force_base_compaction=0 reached_max_versions=0
    I0714 16:31:52.952628 8066 size_tiered_compaction_policy.cpp:353] pick tablet 194294 for size-tiered compaction rowset version=243345-243354 score=15.2772 level_size=4120 total_size=8811 segment_num=5 force_base_compaction=0 reached_max_versions=0
    I0714 16:31:52.952759 3001170 compaction_task.cpp:137] compaction finish. status:OK,
    task info:[CompactionTaskInfo] task_id:22534125,
    tablet_id:10217,
    compaction score:14.9867,
    algorithm:VERTICAL_COMPACTION,
    state:COMPACTION_SUCCESS,
    compaction_type:cumulative,
    output_version:[284540-284573],
    start_time:2024-07-14 16:31:52.944,
    end_time:2024-07-14 16:31:52.952,
    elapsed_time:8708 us,
    input_rowsets_size:18528,
    input_segments_num:5,
    input_rowsets_num:5,
    input_rows_num:65,
    output_num_rows:65,
    merged_rows:0,
    filtered_rows:0,
    output_segments_num:1,
    output_rowset_size:10963,
    column_group_size:3,
    total_output_num_rows:195,
    total_merged_rows:0,
    total_del_filtered_rows:0,
    is_shortcut_compaction:0,
    is_manual_compaction:0,
    progress:100
    I0714 16:31:52.952775 3001165 size_tiered_compaction_policy.cpp:353] pick tablet 194450 for size-tiered compaction rowset version=243341-243354 score=13.0031 level_size=2576 total_size=2580 segment_num=5 force_base_compaction=0 reached_max_versions=0
    I0714 16:31:52.952818 3001170 compaction_task.cpp:39] start compaction. task_id:22534129,
    tablet:10263,
    algorithm:VERTICAL_COMPACTION,
    compaction_type:cumulative,
    compaction_score:14.1073,
    output_version:[284520-284573],
    input rowsets size:5
    I0714 16:31:52.952832 8069 compaction_manager.cpp:87] submit task to compaction pool,
    task_id:22534140,
    tablet_id:194220,
    compaction_type:cumulative,
    compaction_score:45.6048 for round:22560298,
    candidates_size:882
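To check whether the same tablets are picked over and over during the high-IO window, the "pick tablet" lines can be aggregated per tablet. A minimal Python sketch, assuming the exact line format in the excerpt above (other versions may print different fields). The log does not state the unit of total_size; it looks like bytes, but that is an assumption. The file name in the usage comment is hypothetical.

```python
import re
from collections import defaultdict

# Matches the "pick tablet" lines printed by size_tiered_compaction_policy.cpp
# in the excerpt above. Only the fields visible there are captured.
PICK_RE = re.compile(
    r"pick tablet (?P<tablet>\d+) for size-tiered compaction "
    r"rowset version=(?P<v_start>\d+)-(?P<v_end>\d+) "
    r"score=(?P<score>[\d.]+) level_size=(?P<level_size>\d+) "
    r"total_size=(?P<total_size>\d+) segment_num=(?P<segments>\d+)"
)


def summarize_picks(lines):
    """Aggregate per-tablet pick count, max score and summed total_size."""
    stats = defaultdict(lambda: {"picks": 0, "max_score": 0.0, "total_size": 0})
    for line in lines:
        m = PICK_RE.search(line)
        if not m:
            continue  # not a "pick tablet" line
        s = stats[int(m["tablet"])]
        s["picks"] += 1
        s["max_score"] = max(s["max_score"], float(m["score"]))
        s["total_size"] += int(m["total_size"])
    return dict(stats)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python summarize_picks.py be.INFO
    with open(sys.argv[1], errors="replace") as f:
        summary = summarize_picks(f)
    busiest = sorted(summary.items(), key=lambda kv: kv[1]["picks"], reverse=True)
    for tablet, s in busiest[:20]:
        print(tablet, s)
```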

Urgent!!! Please help, everyone :sob:

Just turn off swap.

OK, I'll give it a try.

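Following the official docs at https://docs.starrocks.io/zh/docs/deployment/environment_configurations/#swap-space
I disabled swap space and set swappiness to 0, but disk IO is still very high during the same fixed window, and be.info still shows compaction running nonstop: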
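It is worth confirming on every BE host that the change actually took effect; `swapoff -a` and a runtime swappiness change do not survive a reboot unless they are also persisted (for example in /etc/fstab and /etc/sysctl.conf). A minimal Python sketch that only reads the standard Linux procfs files /proc/swaps and /proc/sys/vm/swappiness; it does not touch StarRocks.

```python
from __future__ import annotations

from pathlib import Path


def active_swap_devices(proc_swaps_text: str) -> list[str]:
    """Return the swap devices listed in /proc/swaps (first line is a header)."""
    lines = proc_swaps_text.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]


def swap_status(root: str = "/proc") -> dict:
    """Read active swap devices and vm.swappiness from procfs (Linux only)."""
    swaps = Path(root, "swaps").read_text()
    swappiness = int(Path(root, "sys/vm/swappiness").read_text().strip())
    return {"swap_devices": active_swap_devices(swaps), "swappiness": swappiness}


if __name__ == "__main__":
    status = swap_status()
    print(status)
    if status["swap_devices"]:
        print("swap is still enabled on:", ", ".join(status["swap_devices"]))
```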


Here is our be.conf:
cumulative_compaction_num_threads_per_disk = 4
base_compaction_num_threads_per_disk = 2
cumulative_compaction_check_interval_seconds = 2
update_compaction_size_threshold=67108864

cumulative_compaction_num_threads_per_disk = 1
base_compaction_num_threads_per_disk = 1
#cumulative_compaction_check_interval_seconds = 2 (remove this line)
update_compaction_size_threshold=67108864
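One way to compare the two configs is by how many compaction tasks a BE may run at once. The formula below is an assumption, our reading of the 3.x BE compaction manager rather than documented behavior: the limit is (cumulative threads per disk + base threads per disk) × number of data directories, capped by max_compaction_concurrency when that is set to a positive value. The disk count is hypothetical; verify both against the StarRocks 3.2 source before relying on them.

```python
# Hypothetical estimate of concurrent compaction tasks on one BE.
# ASSUMPTION: limit = (cumulative + base threads per disk) * number of data
# directories, capped by max_compaction_concurrency when it is > 0.


def max_compaction_tasks(cumu_per_disk: int, base_per_disk: int,
                         num_data_dirs: int, max_concurrency: int = -1) -> int:
    limit = (cumu_per_disk + base_per_disk) * num_data_dirs
    if max_concurrency > 0:
        limit = min(limit, max_concurrency)
    return limit


if __name__ == "__main__":
    DISKS = 4  # hypothetical: number of paths in storage_root_path
    before = max_compaction_tasks(4, 2, DISKS)  # original be.conf
    after = max_compaction_tasks(1, 1, DISKS)   # suggested be.conf
    print(f"before: {before} tasks, after: {after} tasks")  # 24 vs 8
```

Under this assumption, lowering the per-disk thread counts cuts the number of compaction tasks writing at the same time, trading longer total compaction time for lower peak IO.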

Is it read IO or write IO that's high? Are these spinning disks?

SSD; write IO is high.
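Reads and writes can be told apart with `iostat -x`; the same numbers come from /proc/diskstats, where (per the Linux iostats documentation) the 6th and 10th columns are sectors read and sectors written, always in 512-byte units. A dependency-free Python sketch; the device name is hypothetical and should be the disk behind storage_root_path.

```python
from __future__ import annotations

import time
from pathlib import Path

SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def parse_diskstats(text: str) -> dict[str, tuple[int, int]]:
    """Map device name -> (bytes read, bytes written) since boot."""
    result = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 10:
            continue
        result[f[2]] = (int(f[5]) * SECTOR_BYTES, int(f[9]) * SECTOR_BYTES)
    return result


def io_rates(device: str, interval: float = 5.0) -> tuple[float, float]:
    """Sample twice and return (read MiB/s, write MiB/s) for one device."""
    path = Path("/proc/diskstats")
    r0, w0 = parse_diskstats(path.read_text())[device]
    time.sleep(interval)
    r1, w1 = parse_diskstats(path.read_text())[device]
    mib = 1024 * 1024
    return (r1 - r0) / interval / mib, (w1 - w0) / interval / mib


if __name__ == "__main__":
    dev = "nvme0n1"  # hypothetical device name
    read_mibs, write_mibs = io_rates(dev)
    print(f"{dev}: read {read_mibs:.1f} MiB/s, write {write_mibs:.1f} MiB/s")
```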

I'll try it tomorrow.

Additional machine specs: 64 CPU cores, 251 GiB memory, 10GbE.

After disabling swap:
[screenshot]

I adjusted the settings to match this config, but IO is still very high. By the way, why lower the number of compaction threads?
[screenshot]

When IO is high, run iotop and post a screenshot so we can see which threads are doing it.
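If iotop is not available, a similar per-thread view can be approximated from procfs. A minimal sketch, assuming a Linux kernel with task I/O accounting (so /proc/<pid>/task/<tid>/io exists) and that it runs as root or as the BE user. Per proc(5), write_bytes counts bytes the thread caused to be sent to the storage layer. Thread names are whatever the BE sets; no particular compaction thread name is assumed here.

```python
from __future__ import annotations

import time
from pathlib import Path


def parse_io(text: str) -> dict[str, int]:
    """Parse a /proc/<pid>/task/<tid>/io file into {field: value}."""
    out = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip():
            out[key.strip()] = int(value)
    return out


def thread_write_bytes(pid: int) -> dict[int, tuple[str, int]]:
    """Return {tid: (thread name, write_bytes)} for every thread of pid."""
    result = {}
    for task in Path(f"/proc/{pid}/task").iterdir():
        try:
            name = (task / "comm").read_text().strip()
            io = parse_io((task / "io").read_text())
        except (FileNotFoundError, PermissionError, ProcessLookupError):
            continue  # thread exited, or we lack permission to read it
        result[int(task.name)] = (name, io.get("write_bytes", 0))
    return result


def top_writers(pid: int, interval: float = 5.0, n: int = 10):
    """Threads that wrote the most bytes to storage during the interval."""
    before = thread_write_bytes(pid)
    time.sleep(interval)
    after = thread_write_bytes(pid)
    deltas = [(after[t][1] - before[t][1], t, after[t][0]) for t in after if t in before]
    return sorted(deltas, reverse=True)[:n]


if __name__ == "__main__":
    import sys

    be_pid = int(sys.argv[1])  # BE process id, e.g. from `pgrep -f starrocks_be`
    for delta, tid, name in top_writers(be_pid):
        print(f"{tid:>8} {name:<20} {delta / 1024 / 1024:.1f} MiB written")
```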




@trueeyu

It looks like it's mainly compaction, possibly related to the bitmap index. Could you send the BE logs from a period when IO is high?


As far as I can tell, none of our tables has a bitmap index. When IO is high, the log just shows compaction running nonstop.

INDEX t_type ( f4 ) USING BITMAP

Please just send the logs; it cuts down on the back-and-forth.

OK, I've uploaded them: 202408011800 (50 MB)
@trueeyu

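Sorry to bother you again. One hour of logs is over 700 MB, so I could only upload a small part: partA_202408011800 (5 MB)
You mentioned earlier that this might be related to bitmap indexes. I checked: the few tables with large data volumes (around 1 TB) all have bitmap indexes, but each indexed column has only two enum values. The remaining tables are just a few materialized views. If I simply drop the bitmap indexes, will that hurt query performance?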
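To see why a two-value column may gain little from a bitmap index, here is a toy model in Python. It illustrates the general idea of a bitmap index only and is not StarRocks' on-disk format; the f4 values are hypothetical. With two evenly mixed values, each bitmap covers about half the rows, so a filter such as f4 = 'A' still has to read roughly half the data. If one of the two values is rare, a filter on that value is selective and the index may still help, so it is worth measuring on real data before dropping it.

```python
from __future__ import annotations


def build_bitmap_index(column: list[str]) -> dict[str, int]:
    """Toy bitmap index: value -> bitmask whose bit i is set if row i has that value."""
    index: dict[str, int] = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def selectivity(index: dict[str, int], value: str, num_rows: int) -> float:
    """Fraction of rows a filter `col = value` would still have to read."""
    return bin(index.get(value, 0)).count("1") / num_rows


if __name__ == "__main__":
    # Hypothetical f4 values: two enum values, evenly mixed.
    f4 = ["A", "B"] * 5000
    idx = build_bitmap_index(f4)
    print({v: selectivity(idx, v, len(f4)) for v in idx})  # {'A': 0.5, 'B': 0.5}
```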

Compress them; they'll be much smaller.

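OK, I've re-uploaded them: be.INFO.log.20240804-060302-0630.zip (35.7 MB)
One more thing: I noticed that with swap disabled, compaction takes longer than it did with swap enabled, by an extra 4 to 5 hours per day.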