A tablet is severely skewed (it has reached 20 GB), and its compaction affects queries and ingestion across the whole cluster

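To help locate your problem faster, please provide the following information. Thank you.
【Details】Detailed description of the problem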
W0206 15:26:13.798131 2700 query_context.cpp:645] Retrying ReportExecStatus: No more data to read.
W0206 15:26:43.803716 2700 query_context.cpp:645] Retrying ReportExecStatus: No more data to read.
W0206 15:27:12.321235 2846 tablet_updates.cpp:1591] wait_for_version slow(143473ms) version:11563927.1 tablet:164888 #version:865 [11563074 11563927.1@864 11563927.1] pending: rowsets:9[id/seg/row/del/byte/compaction]: [0/1/414384/0/175.54 MB/80.46 MB],[2652493/1/178309/0/74.71 MB/181.29 MB],[3077194/1/20379/0/9.47 MB/246.53 MB],[3088042/1/1/0/4.65 KB/256.00 MB],[3088043/1/1/0/4.38 KB/256.00 MB],[3088044/0/0/0/0/256.00 MB],[3088045/0/0/0/0/256.00 MB],[3088046/0/0/0/0/256.00 MB],[3088047/1/645/0/359.56 KB/255.65 MB]
W0206 15:27:12.321483 2846 storage_engine.cpp:943] Trace:
0206 15:24:46.789855 (+ 0us) storage_engine.cpp:947] start to perform update compaction
0206 15:24:48.833150 (+2043295us) storage_engine.cpp:955] found best tablet 164888
Metrics: {}
Why are the timestamps printed in this trace in the past, earlier than the time of the log line that prints them?
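A quick arithmetic check on the timestamps above may help frame that question. The sketch below subtracts the reported wait (143473 ms) from the time of the wait_for_version warning and compares the result with the "found best tablet" trace entry. The year 2025 is taken from the FE log further down; the variable names are only illustrative. If the two times line up, that would be consistent with the trace recording when each step ran and being printed only after the slow operation finished, though this is an inference from the log and not confirmed from the source code.

```python
from datetime import datetime, timedelta

# glog timestamps carry no year; 2025 is assumed from the FE log (ts=2025-02-06).
FMT = "%Y%m%d %H:%M:%S.%f"


def parse_glog(stamp, year="2025"):
    """Parse a glog-style 'MMDD HH:MM:SS.uuuuuu' timestamp."""
    return datetime.strptime(year + stamp, FMT)


# Values copied from the BE log lines above.
best_tablet_found = parse_glog("0206 15:24:48.833150")
slow_warning = parse_glog("0206 15:27:12.321235")
reported_wait = timedelta(milliseconds=143473)  # from "wait_for_version slow(143473ms)"

# When the wait must have started if the warning was printed right after it ended.
wait_started = slow_warning - reported_wait
gap_seconds = (wait_started - best_tablet_found).total_seconds()

print("wait started at     :", wait_started.time())
print("best tablet found at:", best_tablet_found.time())
print("gap in seconds      :", gap_seconds)
```

The computed start of the wait falls within a few hundredths of a second of the "found best tablet 164888" entry, so the roughly 143 seconds in question appear to be spent entirely after the tablet was selected for update compaction.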
E0206 15:29:02.813109 2296 scan_operator.cpp:422] scan fragment 09b1b517-e45c-11ef-b562-d0946660c565 driver 0 Scan tasks error: Cancelled: Cancelled because of runtime state is cancelled

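How should this problem be analyzed? One table is known to have a severely skewed tablet that has reached 20 GB. I want to understand why this made the BE unavailable; there is no related crash output in the logs.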
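As one starting point for the analysis, the pending rowsets listed in the wait_for_version warning above can be summarized with a small script. This is only a sketch: the field order follows the header printed in the log itself (id/seg/row/del/byte/compaction), the size units are assumed to be binary (1 KB = 1024 B), and the last field is skipped because its exact meaning is not clear to me.

```python
import re

# The "pending" part of the wait_for_version warning, copied from the BE log above.
LOG_LINE = (
    "pending: rowsets:9[id/seg/row/del/byte/compaction]: "
    "[0/1/414384/0/175.54 MB/80.46 MB],[2652493/1/178309/0/74.71 MB/181.29 MB],"
    "[3077194/1/20379/0/9.47 MB/246.53 MB],[3088042/1/1/0/4.65 KB/256.00 MB],"
    "[3088043/1/1/0/4.38 KB/256.00 MB],[3088044/0/0/0/0/256.00 MB],"
    "[3088045/0/0/0/0/256.00 MB],[3088046/0/0/0/0/256.00 MB],"
    "[3088047/1/645/0/359.56 KB/255.65 MB]"
)

# Four integer fields followed by two size fields; the textual header does not match.
ROWSET_RE = re.compile(r"\[(\d+)/(\d+)/(\d+)/(\d+)/([^/\]]+)/([^/\]]+)\]")

# Assumption: sizes in the log use binary units.
UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(text):
    """Convert '175.54 MB', '4.65 KB' or a bare '0' into a number of bytes."""
    parts = text.split()
    unit = parts[1] if len(parts) > 1 else "B"
    return float(parts[0]) * UNITS[unit]


def parse_rowsets(line):
    """Return one dict per rowset entry found in the log line."""
    rowsets = []
    for rid, seg, row, deleted, size, _compaction in ROWSET_RE.findall(line):
        rowsets.append({
            "id": int(rid),
            "segments": int(seg),
            "rows": int(row),
            "deletes": int(deleted),
            "bytes": to_bytes(size),
        })
    return rowsets


rowsets = parse_rowsets(LOG_LINE)
total_rows = sum(r["rows"] for r in rowsets)
total_mb = sum(r["bytes"] for r in rowsets) / UNITS["MB"]
empty = [r["id"] for r in rowsets if r["rows"] == 0]

print(f"{len(rowsets)} pending rowsets, {total_rows} rows, {total_mb:.2f} MB")
print("empty rowsets:", empty)
```

By this count the pending rowsets add up to only about 260 MB across 9 rowsets, three of them empty, which is far smaller than the 20 GB tablet itself.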

The FE printed:
ts=2025-02-06 11:56:26;thread_name=starrocks-taskrun-pool-1214;id=def0a;is_daemon=true;priority=5;TCCL=jdk.internal.loader.ClassLoaders$AppClassLoader@659e0bfd
@com.starrocks.qe.SimpleScheduler.addToBlacklist()
at com.starrocks.qe.DefaultCoordinator.handleErrorExecution(DefaultCoordinator.java:580)
at com.starrocks.qe.scheduler.Deployer.waitForDeploymentCompletion(Deployer.java:244)
at com.starrocks.qe.scheduler.Deployer.deployFragments(Deployer.java:116)
at com.starrocks.qe.DefaultCoordinator.deliverExecFragments(DefaultCoordinator.java:564)
at com.starrocks.qe.DefaultCoordinator.startScheduling(DefaultCoordinator.java:491)
at com.starrocks.qe.scheduler.Coordinator.startScheduling(Coordinator.java:102)
at com.starrocks.qe.scheduler.Coordinator.exec(Coordinator.java:85)

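In addition, the profile shows that deploywaittime accounts for almost the entire SQL query time, even though this query itself runs very fast.
【Background】What operations were performed?
【Business impact】
During this period the whole cluster stalled, and queries took a very long time.
【Shared-data (storage-compute separation) deployment?】
【StarRocks version】e.g. 3.2.5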