Disk usage differs greatly from the reported statistics / large amount of dirty data

【Business impact】
【StarRocks version】2.5.12
【Cluster scale】3 FE + 9 BE
【Machine info】BE: 32 cores, 128 GB RAM, 14 TB disk
【Details】
show data; — cumulative volume across all DBs: ~40 TB
show backends; — DataUsedCapacity total: ~55 TB
System df -h — total: ~63 TB

mysql> show backends;
+-----------+-----------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------+----------------+--------------------------------------------------------+-------------------+-------------+----------+-------------------+------------+------------+
| BackendId | IP              | HeartbeatPort | BePort | HttpPort | BrpcPort | LastStartTime       | LastHeartbeat       | Alive | SystemDecommissioned | ClusterDecommissioned | TabletNum | DataUsedCapacity | AvailCapacity | TotalCapacity | UsedPct | MaxDiskUsedPct | ErrMsg | Version        | Status                                                 | DataTotalCapacity | DataUsedPct | CpuCores | NumRunningQueries | MemUsedPct | CpuUsedPct |
+-----------+-----------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------+----------------+--------------------------------------------------------+-------------------+-------------+----------+-------------------+------------+------------+
| 144937054 | 192.xxxxx.116 | 9050          | 9060   | 8040     | 8060     | 2023-10-19 06:52:31 | 2023-11-15 10:40:21 | true  | false                | false                 | 41991     | 6.363 TB         | 6.112 TB      | 13.759 TB     | 55.58 % | 55.62 %        |        | 2.5.12-cb07d99 | {"lastSuccessReportTabletsTime":"2023-11-15 10:39:44"} | 12.475 TB         | 51.01 %     | 32       | 1                 | 29.81 %    | 13.7 %     |
| 144998293 | 192.xxxxx.117 | 9050          | 9060   | 8040     | 8060     | 2023-11-13 11:56:27 | 2023-11-15 10:40:21 | true  | false                | false                 | 41947     | 6.363 TB         | 6.127 TB      | 13.759 TB     | 55.47 % | 55.49 %        |        | 2.5.12-cb07d99 | {"lastSuccessReportTabletsTime":"2023-11-15 10:40:17"} | 12.490 TB         | 50.94 %     | 32       | 0                 | 26.30 %    | 14.9 %     |
| 140849148 | 192.xxxxx.118 | 9050          | 9060   | 8040     | 8060     | 2023-11-13 11:58:20 | 2023-11-15 10:40:21 | true  | false                | false                 | 40589     | 6.137 TB         | 5.594 TB      | 13.571 TB     | 58.78 % | 59.05 %        |        | 2.5.12-cb07d99 | {"lastSuccessReportTabletsTime":"2023-11-15 10:40:05"} | 11.732 TB         | 52.31 %     | 32       | 0                 | 28.07 %    | 14.3 %     |
| 51234149  | 192.xxxxx.119 | 9050          | 9060   | 8040     | 8060     | 2023-09-14 06:25:11 | 2023-11-15 10:40:21 | true  | false                | false                 | 40980     | 6.181 TB         | 5.592 TB      | 13.662 TB     | 59.06 % | 59.67 %        |        | 2.5.12-cb07d99 | {"lastSuccessReportTabletsTime":"2023-11-15 10:40:05"} | 11.774 TB         | 52.50 %     | 32       | 0                 | 29.96 %    | 14.3 %     |
| 53629671  | 192.xxxxx.120 | 9050          | 9060   | 8040     | 8060     | 2023-09-18 10:05:04 | 2023-11-15 10:40:21 | true  | false                | false                 | 40718     | 6.141 TB         | 5.623 TB      | 13.662 TB     | 58.84 % | 58.96 %        |        | 2.5.12-cb07d99 | {"lastSuccessReportTabletsTime":"2023-11-15 10:40:22"} | 11.764 TB         | 52.20 %     | 32       | 1                 | 32.23 %    | 14.7 %     |
| 53629767  | 192.xxxxx.121 | 9050          | 9060   | 8040     | 8060     | 2023-09-18 10:05:04 | 2023-11-15 10:40:21 | true  | false                | false                 | 39722     | 5.969 TB         | 5.469 TB      | 13.662 TB     | 59.97 % | 60.87 %        |        | 2.5.12-cb07d99 | {"lastSuccessReportTabletsTime":"2023-11-15 10:39:32"} | 11.438 TB         | 52.18 %     | 32       | 0                 | 31.72 %    | 12.0 %     |
| 134159165 | 192.xxxxx.122 | 9050          | 9060   | 8040     | 8060     | 2023-09-18 10:05:04 | 2023-11-15 10:40:21 | true  | false                | false                 | 40686     | 6.143 TB         | 5.630 TB      | 13.662 TB     | 58.79 % | 59.04 %        |        | 2.5.12-cb07d99 | {"lastSuccessReportTabletsTime":"2023-11-15 10:39:51"} | 11.773 TB         | 52.18 %     | 32       | 1                 | 31.69 %    | 14.4 %     |
| 53820254  | 192.xxxxx.123 | 9050          | 9060   | 8040     | 8060     | 2023-09-18 10:05:04 | 2023-11-15 10:40:21 | true  | false                | false                 | 40595     | 6.129 TB         | 5.631 TB      | 13.662 TB     | 58.79 % | 59.18 %        |        | 2.5.12-cb07d99 | {"lastSuccessReportTabletsTime":"2023-11-15 10:40:01"} | 11.759 TB         | 52.12 %     | 32       | 0                 | 31.73 %    | 14.6 %     |
| 130825528 | 192.xxxxx.124 | 9050          | 9060   | 8040     | 8060     | 2023-09-18 10:05:04 | 2023-11-15 10:40:21 | true  | false                | false                 | 40435     | 6.117 TB         | 5.616 TB      | 13.665 TB     | 58.90 % | 59.30 %        |        | 2.5.12-cb07d99 | {"lastSuccessReportTabletsTime":"2023-11-15 10:40:17"} | 11.733 TB         | 52.14 %     | 32       | 0                 | 31.86 %    | 14.7 %     |
+-----------+-----------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------+----------------+--------------------------------------------------------+-------------------+-------------+----------+-------------------+------------+------------+
9 rows in set (0.00 sec)
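(For completeness, a minimal sketch of how the df -h total above can be gathered across all BE hosts. The hostnames and mount points are placeholders, not from the original post:)

for host in be01 be02 be03 be04 be05 be06 be07 be08 be09; do
    # --total prints a summary line for the listed mounts (GNU coreutils)
    ssh "$host" 'df -h --total /data1 /data2 | tail -1'
done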


OS-level statistics, taking node 118 as an example:
Each BE has two disks; the data in trash is negligible, so the usage is essentially the data directory:
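(The screenshots from the original post are not reproduced here. A hedged sketch of the kind of check involved, assuming the two storage_root_path entries are /data1/starrocks and /data2/starrocks; adjust to your be.conf:)

# Run on the BE host; paths are assumptions, check storage_root_path in be.conf
for root in /data1/starrocks /data2/starrocks; do
    echo "== $root =="
    du -sh "$root/data" "$root/trash"    # tablet data vs. trashed data
done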


Were there any DROP operations? Files on the data disk are only cleaned up once they have been dropped for more than 3 days.

There were indeed DROP operations, but our cleanup window is configured to two hours: trash_file_expire_time_sec = 7200.
The numbers above were all taken more than three hours after the drops, and as shown, the trash directory holds almost no data.
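(For reference, a sketch of the two usual ways to apply this BE parameter. The host is a placeholder, and the runtime update assumes your BE build treats this item as dynamically configurable:)

# Persistent: add to be.conf, then restart the BE
trash_file_expire_time_sec = 7200

# Runtime: through the BE HTTP port (8040 per the HttpPort column above)
curl -XPOST "http://<be_host>:8040/api/update_config?trash_file_expire_time_sec=7200"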

A dropped object is kept in FE memory for 24 hours; during that window it can be restored with RECOVER, and only after that does the data move to trash. The parameter catalog_trash_expire_second controls this in-memory retention time.
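(A sketch of the corresponding FE-side commands; the table name is hypothetical:)

-- Restore a dropped table while it is still within the retention window
RECOVER TABLE example_tbl;

-- Shorten the in-memory retention from the default 86400 s (24 h) to 2 h;
-- persist the same value in fe.conf so it survives an FE restart
ADMIN SET FRONTEND CONFIG ("catalog_trash_expire_second" = "7200");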

Understood — we had indeed not configured catalog_trash_expire_second. Checking again 24 h after the drops, the gap has shrunk considerably.
The volumes reported by each channel are now as follows, but there is still a gap of roughly 8 TB:
show data; → 40 TB
DataUsedCapacity: 40 TB
Actual on-disk size of the data directories: 47 TB

Is the extra data some kind of runtime cache, or indexes that the statistics do not count?
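(The thread does not resolve this, but one hedged way to locate the remaining gap is to measure every subdirectory under each storage root, not just data and trash. Directory names here assume a typical BE storage layout, where e.g. snapshot/ and clone/ are not included in show data:)

for root in /data1/starrocks /data2/starrocks; do
    # largest subdirectories first; compare against the show data total
    du -sh "$root"/* | sort -rh
done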
