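CPU frequently above 80%, IO read/write rate around 20 MB/s

【Details】CPU is frequently above 80%, and the IO read/write rate is above 20 MB/s.
【Background】The workload is write-heavy, including writes at a fixed frequency (once every 10 seconds).
【Business impact】
【StarRocks version】e.g.: 2.0.8
【Cluster size】1 FE + 3 BE (FE and BE deployed separately)
【Machine info】CPU virtual cores/memory/NIC, e.g.: 8C/32G/10GbE
【Contact】thc1987@qq.com
【Attachments】
Below is the monitoring for the three BEs: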

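BE CPU usage: the spikes mostly start at 3 a.m. After investigating, I found a scheduled job at 3 a.m. that updates a table. It first batch-queries records, 500 at a time, and then modifies them with INSERT. CPU rises while this job runs.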

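In the code, ids holds about 90,000 entries and is processed in shards of 500 at a time. Previously each shard held 2,000 ids, and CPU was still high.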
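To make the sharding concrete, here is a minimal sketch in Python. The original job's code is not reproduced in this post, so `chunked` and the generated `ids` list are hypothetical stand-ins, not the actual implementation. The sketch only counts how many batches each shard size produces. Assuming each batch is issued as its own INSERT statement, that count is also the number of separate write operations the job submits per run.

```python
from typing import Iterator, List, Sequence


def chunked(ids: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Hypothetical stand-in for the roughly 90,000 ids handled by the 3 a.m. job.
ids = list(range(90_000))

batches_500 = list(chunked(ids, 500))
batches_2000 = list(chunked(ids, 2000))

print(len(batches_500))   # 180 batches at 500 ids each
print(len(batches_2000))  # 45 batches at 2,000 ids each
```

Moving from 2,000 to 500 ids per shard makes each batch smaller but quadruples the number of batches (45 to 180), so the total number of write operations per run goes up rather than down.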

-- Update 2023-5-23 16:08:36
Checking the BE logs, I found a large number of "too many tablet versions" entries, as shown below:

 tablet writer add chunk failed, message=too many tablet versions, id=75cd81a1-43b0-11ec-8a6e-0242a61c1c56, index_id=17167, sender_id=0
W1112 20:02:51.204150 11522 internal_service.cpp:205] tablet writer add chunk failed, message=too many tablet versions, id=75d5e576-43b0-11ec-8a6e-0242a61c1c56, index_id=17167, sender_id=0
W1112 20:02:51.247951 11507 internal_service.cpp:205] tablet writer add chunk failed, message=too many tablet versions, id=75dc9c9b-43b0-11ec-8a6e-0242a61c1c56, index_id=17167, sender_id=0
W1112 20:02:51.248013 11506 tablet_sink.cpp:199] NodeChannel[17167-10002] add batch req success but status isn't ok, load_id=75dc9c9b-43b0-11ec-8a6e-0242a61c1c56, txn_id=7826, node=11:8060, errmsg=too many tablet versions
W1112 20:02:51.248064 11503 tablet_sink.cpp:199] NodeChannel[17167-10004] add batch req success but status isn't ok, load_id=75dc9c9b-43b0-11ec-8a6e-0242a61c1c56, txn_id=7826, node=22:8060, errmsg=too many tablet versions
W1112 20:02:51.248108 11506 tablet_sink.cpp:199] NodeChannel[17167-10003] add batch req success but status isn't ok, load_id=75dc9c9b-43b0-11ec-8a6e-0242a61c1c56, txn_id=7826, node=33:8060, errmsg=too many tablet versions
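To see which BE nodes report the error most often, the warnings can be tallied with a short script. This is a rough sketch in Python: the regular expression is written against the `tablet_sink.cpp` lines in the excerpt above only, `count_version_errors` is a hypothetical helper, and treating the first number in `NodeChannel[...]` as the index id is an assumption based on the `index_id=17167` field in the neighboring lines.

```python
import re
from collections import Counter
from typing import Iterable

# Written against the tablet_sink.cpp warning format in the excerpt above.
# Assumption: the first number inside NodeChannel[...] is the index id.
NODE_ERR = re.compile(
    r"NodeChannel\[(?P<index_id>\d+)-\d+\].*"
    r"node=(?P<node>[^,\s]+),\s*errmsg=too many tablet versions"
)


def count_version_errors(lines: Iterable[str]) -> Counter:
    """Count 'too many tablet versions' warnings per (index_id, node)."""
    counts: Counter = Counter()
    for line in lines:
        match = NODE_ERR.search(line)
        if match:
            counts[(match.group("index_id"), match.group("node"))] += 1
    return counts


# Sample lines copied from the log excerpt above.
sample = [
    "W1112 20:02:51.247951 11507 internal_service.cpp:205] tablet writer add chunk failed, message=too many tablet versions, id=75dc9c9b-43b0-11ec-8a6e-0242a61c1c56, index_id=17167, sender_id=0",
    "W1112 20:02:51.248013 11506 tablet_sink.cpp:199] NodeChannel[17167-10002] add batch req success but status isn't ok, load_id=75dc9c9b-43b0-11ec-8a6e-0242a61c1c56, txn_id=7826, node=11:8060, errmsg=too many tablet versions",
    "W1112 20:02:51.248064 11503 tablet_sink.cpp:199] NodeChannel[17167-10004] add batch req success but status isn't ok, load_id=75dc9c9b-43b0-11ec-8a6e-0242a61c1c56, txn_id=7826, node=22:8060, errmsg=too many tablet versions",
    "W1112 20:02:51.248108 11506 tablet_sink.cpp:199] NodeChannel[17167-10003] add batch req success but status isn't ok, load_id=75dc9c9b-43b0-11ec-8a6e-0242a61c1c56, txn_id=7826, node=33:8060, errmsg=too many tablet versions",
]

for (index_id, node), n in sorted(count_version_errors(sample).items()):
    print(index_id, node, n)
```

For a real log, the same function can be fed an open file object, since it only needs an iterable of lines.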

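The BEs have only 8 cores each, which is a fairly low spec.

After adding the following settings to the BE configuration, IO came down. I followed this reference: CloudCanal data import FAQ @ CloudCanal_load_faq @ StarRocks Docs (mirrorship.cn)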

cumulative_compaction_num_threads_per_disk = 4
base_compaction_num_threads_per_disk = 2
cumulative_compaction_check_interval_seconds = 2
update_compaction_num_threads_per_disk = 2