Disk I/O at 100% when upgrading from 2.5.3 to 3.0.8

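To help us locate your problem faster, please provide the following information. Thanks.
【Details】Detailed description of the problem
While upgrading from 2.5.3 to 3.0.8, disk I/O on the first BE reached 100% as soon as I upgraded it.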

be.WARNING is flooded with errors like these:

W1219 15:59:08.051925 88874 compaction_manager.cpp:100] submit compaction task 203518 to compaction pool failed. status:Service unavailable: Thread pool is at capacity (48/48 tasks running, 1000/1000 tasks queued)
W1219 15:59:08.051932 88874 compaction_manager.cpp:100] submit compaction task 203519 to compaction pool failed. status:Service unavailable: Thread pool is at capacity (48/48 tasks running, 1000/1000 tasks queued)
W1219 15:59:08.051940 88874 compaction_manager.cpp:100] submit compaction task 203520 to compaction pool failed. status:Service unavailable: Thread pool is at capacity (48/48 tasks running, 1000/1000 tasks queued)
W1219 15:59:08.051949 88874 compaction_manager.cpp:100] submit compaction task 203521 to compaction pool failed. status:Service unavailable: Thread pool is at capacity (48/48 tasks running, 1000/1000 tasks queued)
W1219 15:59:08.051964 88874 compaction_manager.cpp:100] submit compaction task 203522 to compaction pool failed. status:Service unavailable: Thread pool is at capacity (48/48 tasks running, 1000/1000 tasks queued)
W1219 15:59:08.051977 88874 compaction_manager.cpp:100] submit compaction task 203523 to compaction pool failed. status:Service unavailable: Thread pool is at capacity (48/48 tasks running, 1000/1000 tasks queued)
W1219 15:59:08.051986 88874 compaction_manager.cpp:100] submit compaction task 203524 to compaction pool failed. status:Service unavailable: Thread pool is at capacity (48/48 tasks running, 1000/1000 tasks queued)
W1219 15:59:08.051995 88874 compaction_manager.cpp:100] submit compaction task 203525 to compaction pool failed. status:Service unavailable: Thread pool is at capacity (48/48 tasks running, 1000/1000 tasks queued)
W1219 15:59:08.052006 88874 compaction_manager.cpp:100] submit compaction task 203526 to compaction pool failed. status:Service unavailable: Thread pool is at capacity (48/48 tasks running, 1000/1000 tasks queued)
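To quantify a flood like this, a small script can count the rejected submissions per second and confirm which pool limits are being hit. This is a minimal sketch, not a StarRocks tool: the function name summarize is hypothetical, and the regex is written against the exact line format in the excerpt above, so logs from other BE versions may need a different pattern.

```python
import re
from collections import Counter

# Matches the "submit compaction task ... failed" lines exactly as they appear in the excerpt.
FAIL_RE = re.compile(
    r"^W(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2})\.\d+ \d+ compaction_manager\.cpp:\d+\] "
    r"submit compaction task (?P<task>\d+) to compaction pool failed\. "
    r"status:Service unavailable: Thread pool is at capacity "
    r"\((?P<running>\d+)/(?P<max_running>\d+) tasks running, "
    r"(?P<queued>\d+)/(?P<max_queued>\d+) tasks queued\)"
)


def summarize(lines):
    """Count rejected submissions per second and collect the (max_running, max_queued) limits seen."""
    per_second = Counter()
    limits = set()
    for line in lines:
        m = FAIL_RE.match(line)
        if not m:
            continue  # ignore every other log line
        per_second[m["time"]] += 1
        limits.add((int(m["max_running"]), int(m["max_queued"])))
    return per_second, limits
```

Passing an open be.WARNING file object also works, since the function only iterates over lines.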

【Background】What operations were performed?
Under 2.5.3, be.conf had the following parameters set:

cumulative_compaction_num_threads_per_disk=6
base_compaction_num_threads_per_disk = 2
cumulative_compaction_check_interval_seconds = 2
tablet_max_versions=20000
base_compaction_interval_seconds_since_last_operation=43200
trash_file_expire_time_sec=3600
thrift_client_retry_interval_ms=200
alter_tablet_worker_count=6
streaming_load_max_batch_size_mb=1000
clone_worker_count=24
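A quick sanity check on these numbers. Suppose the BE sizes its compaction thread pool as (number of storage disks) × (cumulative_compaction_num_threads_per_disk + base_compaction_num_threads_per_disk). That is my assumption about the 3.0 internals, not something stated in this thread, and I have not verified it against the source. With the 6 disks per BE mentioned later, the 2.5.3 settings would give 6 × (6 + 2) = 48, which matches the 48/48 running tasks in the log. The helper names below are hypothetical.

```python
def parse_be_conf(text):
    """Parse uncommented key=value lines from be.conf text into a dict of strings."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def implied_compaction_pool_size(conf, num_disks):
    """ASSUMPTION: pool size = disks * (cumulative + base compaction threads per disk)."""
    cumulative = int(conf["cumulative_compaction_num_threads_per_disk"])
    base = int(conf["base_compaction_num_threads_per_disk"])
    return num_disks * (cumulative + base)


OLD_CONF = """
cumulative_compaction_num_threads_per_disk=6
base_compaction_num_threads_per_disk = 2
cumulative_compaction_check_interval_seconds = 2
"""

print(implied_compaction_pool_size(parse_be_conf(OLD_CONF), num_disks=6))  # 48
```

If the formula holds for your version, the match suggests these per-disk overrides were still sizing the pool after the upgrade; treat it as a hint, not proof.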

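【Business impact】
【Shared-data (storage-compute separation) deployment】 No
【StarRocks version】 In the middle of upgrading from 2.5.3 to 3.0.8
【Cluster size】 3 FE (1 follower + 2 observers) + 3 BE (FE and BE co-located on the same machines)
【Machine specs】 vCPUs/memory/NIC: 64 cores/320 GB/10 GbE
【Contact】 Community group 2, 巡山的大王大人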

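In the end I turned enable_event_based_compaction_framework off and the problem went away. I'm baffled: isn't this parameter supposed to be an optimization? Instead it drove disk I/O to 100%... The relevant part of be.conf now reads: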
enable_event_based_compaction_framework=false
#cumulative_compaction_num_threads_per_disk=6
#base_compaction_num_threads_per_disk = 2
#cumulative_compaction_check_interval_seconds = 2


On 2.5, if you set enable_event_based_compaction_framework=true and remove all the other compaction-related settings, do you still have the problem?

Run show backends; to see how many tablets each BE holds. How many disks are attached to one BE?

"Set enable_event_based_compaction_framework=true and remove all the other compaction-related settings": I'll try that in a bit.

TabletNum: 241488
Each BE has 6 × 1.8 TB SSDs attached.
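Rough arithmetic on these figures, assuming TabletNum is the per-BE count and tablets are spread evenly over the six disks (both assumptions are mine; real placement can be uneven): about 40,248 tablets per disk. If each disk gets 8 compaction slots (6 cumulative + 2 base, per the old be.conf), roughly 5,031 tablets compete for each slot, which gives a sense of how quickly a 1000-entry queue can fill up.

```python
TABLET_NUM = 241488      # TabletNum reported by show backends for one BE
DISKS_PER_BE = 6         # 6 x 1.8 TB SSDs per BE
SLOTS_PER_DISK = 6 + 2   # cumulative + base compaction threads per disk in the 2.5.3 be.conf

tablets_per_disk = TABLET_NUM // DISKS_PER_BE          # assumes an even spread across disks
tablets_per_slot = tablets_per_disk // SLOTS_PER_DISK  # assumes per-disk slot sizing

print(tablets_per_disk, tablets_per_slot)  # 40248 5031
```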

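While I'm at it, a question for the experts: the error below keeps showing up. How should I deal with it? I couldn't find any related parameter in the docs.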
2023-12-21 16:42:54,086 WARN (thrift-server-pool-25421|26593) [Coordinator.updateFragmentExecStatus():1669] exec state report failed status=errorCode GLOBAL_DICT_ERROR global dict greater than DICT_DECODE_MAX_SIZE, query_id=eea20164-9fdc-11ee-bdbf-12bc6512a2f4, instance_id=eea20164-9fdc-11ee-bdbf-12bc6512a2f8

This error is harmless; you can just ignore it.

Remove these compaction-related settings and set enable_event_based_compaction_framework=true; that should fix it, and CPU usage will drop as well.
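One way to apply this advice mechanically is to scan be.conf for compaction overrides that are still active once the event-based framework is switched on. This is a hypothetical sketch: it treats any uncommented key whose name contains "compaction" as an override, which is my own heuristic rather than a list from the StarRocks documentation, so review the output by hand.

```python
import configparser

FRAMEWORK_KEY = "enable_event_based_compaction_framework"


def check_be_conf(text):
    """Return (framework_enabled, sorted list of other active compaction keys)."""
    # be.conf has no section headers, so prepend a dummy one for configparser.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",), strict=False)
    parser.read_string("[be]\n" + text)
    conf = dict(parser["be"])  # lines starting with '#' are skipped as comments
    enabled = conf.get(FRAMEWORK_KEY, "").lower() == "true"
    leftovers = sorted(k for k in conf if "compaction" in k and k != FRAMEWORK_KEY)
    return enabled, leftovers
```

A result of (True, []) means the switch is on and no other key with "compaction" in its name is still overriding defaults.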

So in the end, should this parameter be set to true or false?

What problem are you running into?