Shared-data (storage-compute separated) deployment: a simple LIMIT 1 query on an internal table takes roughly 30s to 2min

[Details] A simple LIMIT 1 query on an internal table, with no filter other than the partition filter, takes roughly 30s to 2min. (Storage is backed by HDFS.)
[Shared-data deployment] Yes
[StarRocks version] 3.1.11

SQL: select * from ads_report_ad_mi where partition_time > '2024-06-07' limit 1;

Explain analyze:
+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Summary |
| QueryId: bcf74a8a-2d19-11ef-998c-525400648378 |
| Version: 3.1.11-c2bf2e2 |
| State: Finished |
| TotalTime: 38s844ms |
| ExecutionTime: 38s816ms [Scan: 38s670ms (99.62%), Network: 596.776ms (1.54%), ResultDeliverTime: 0ns (0.00%), ScheduleTime: 86.593ms (0.22%)] |
| CollectProfileTime: 109ms |
| FrontendProfileMergeTime: 4.788ms |
| QueryPeakMemoryUsage: 12.553 GB, QueryAllocatedMemoryUsage: 158.106 GB |
| Top Most Time-consuming Nodes: |
| 1. OLAP_SCAN (id=0) : 38s745ms (98.48%) |
| 2. EXCHANGE (id=1) [GATHER]: 597.091ms (1.52%) |
| 3. RESULT_SINK: 99.094us (0.00%) |
| Top Most Memory-consuming Nodes: |
| 1. EXCHANGE (id=1) [GATHER]: 71.195 KB |
| NonDefaultVariables: |
| consistent_hash_virtual_number: 128 -> 32 |
| enable_adaptive_sink_dop: false -> true |
| enable_async_profile: true -> false |
| enable_parallel_merge: true -> false |
| enable_profile: false -> true |
| group_concat_max_len: 1024 -> 65535 |
| Fragment 0 |
| │ BackendNum: 1 |
| │ InstancePeakMemoryUsage: 967.531 KB, InstanceAllocatedMemoryUsage: 4.569 MB |
| │ PrepareTime: 853.075us |
| └──RESULT_SINK |
| │ TotalTime: 99.094us (0.00%) [CPUTime: 99.094us] |
| │ OutputRows: 1 |
| │ SinkType: MYSQL_PROTOCAL |
| └──EXCHANGE (id=1) [GATHER] |
| Estimates: [row: 1, cpu: 0.00, memory: 0.00, network: 0.00, cost: 394.72] |
| TotalTime: 597.091ms (1.52%) [CPUTime: 315.007us, NetworkTime: 596.776ms] |
| OutputRows: 1 |
| PeakMemory: 71.195 KB, AllocatedMemory: 13.210 MB |
| |
| Fragment 1 |
| │ BackendNum: 17 |
| │ InstancePeakMemoryUsage: 7.285 GB, InstanceAllocatedMemoryUsage: 158.101 GB |
| │ PrepareTime: 1.312ms |
| └──DATA_STREAM_SINK (id=1) |
| │ PartitionType: UNPARTITIONED |
| └──OLAP_SCAN (id=0) |
| Estimates: [row: 1, cpu: 157.89, memory: 0.00, network: 0.00, cost: 78.94] |
| TotalTime: 38s745ms (98.48%) [CPUTime: 75.120ms, ScanTime: 38s670ms] |
| OutputRows: 16 |
| Table: : ads_report_ad_mi |
| SubordinateOperators: |
| CHUNK_ACCUMULATE |
| Detail Timers: [ScanTime = IOTaskExecTime + IOTaskWaitTime] |
| IOTaskExecTime: 4s146ms [min=0ns, max=38s670ms] |
| Aggr: 4s138ms [min=0ns, max=38s655ms] |
| IOStatistics: 0 |
| IOTimeRemote: 2s62ms [min=0ns, max=24s541ms] |
| IOTimeTotal: 2s63ms [min=0ns, max=24s549ms] |
| IOTime: 2s63ms [min=0ns, max=24s549ms] |
| LateMaterialize: 8s196ms [min=111.414ms, max=21s253ms] |
| SegmentInit: 2s336ms [min=0ns, max=27s495ms] |
| ColumnIteratorInit: 2s251ms [min=0ns, max=26s780ms] |
| Sort: 4s137ms [min=0ns, max=38s651ms] |
| IOTaskWaitTime: 876.876ms [min=0ns, max=35s455ms] |

Follow-up 1: One interesting observation: adding the partition filter partition_time > '2024-06-07' actually makes the query much slower, over a minute, whereas a plain select * from ads_report_ad_mi limit 1; stays stable at 10s - 20s. According to EXPLAIN, the partitions are pruned correctly (a sketch of how to check this follows these notes).
Follow-up 2: This table is written to fairly frequently.
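
Not from the original post, just a hedged sketch of how partition pruning can be verified for the filtered query: a plain EXPLAIN should show the OLAP scan node with a partitions=m/n counter, where m is the number of partitions remaining after pruning (the exact plan wording may vary by version).

EXPLAIN select * from ads_report_ad_mi where partition_time > '2024-06-07' limit 1;
-- In the OlapScanNode section of the plan, look for a line such as:
--   partitions=<pruned>/<total>
-- If <pruned> covers only partitions after '2024-06-07', pruning has been applied.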

  1. Does the table's CREATE TABLE statement enable the local disk cache?
  2. Please provide the complete query profile file: https://docs.starrocks.io/zh/docs/sql-reference/sql-functions/utility-functions/get_query_profile/ (a usage sketch follows below)
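
A minimal sketch of collecting the profile with the function referenced above, assuming enable_profile is turned on for the session and the query id is still held by the FE; the query id shown is the one from the summary at the top, substitute your own.

-- Turn on profiling for this session, then rerun the slow query.
SET enable_profile = true;
select * from ads_report_ad_mi where partition_time > '2024-06-07' limit 1;

-- Look up the id of the query just executed, then dump its profile text.
select last_query_id();
select get_query_profile('bcf74a8a-2d19-11ef-998c-525400648378');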

1. The data cache is not enabled for this internal table. Part of the CREATE TABLE statement is as follows:
AGGREGATE KEY(partition_time, process_time, site_set, site_id, crm_advertiser_industry_id, buying_type, is_adx_ad, is_rta, is_ocpx, bid_type, dpa_ad_type, ocpx_type, optimization_goal, second_optimization_goal, roi_goal, adgroup_id, advertiser_id, agent_id, product_id)
COMMENT "行业实时分钟表"
PARTITION BY date_trunc('hour', partition_time)
DISTRIBUTED BY HASH(adgroup_id)
PROPERTIES (
"replication_num" = "1",
"bloom_filter_columns" = "optimization_goal, partition_time, adgroup_id, is_rta, agent_id, site_set, bid_type, dpa_ad_type, ocpx_type, process_time, buying_type, advertiser_id, second_optimization_goal, roi_goal, product_id, site_id, crm_advertiser_industry_id",
"datacache.enable" = "false",
"storage_volume" = "builtin_storage_volume",
"enable_async_write_back" = "false",
"enable_persistent_index" = "false",
"partition_live_number" = "720",
"compression" = "LZ4"
);
2. Profile file:
select_1_profile.rtf (35.7 KB)
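
Not part of the original thread, just a hedged sketch related to point 1 above: if the missing local disk cache turns out to matter, the property shown in the snippet could be set to true. Setting it at table creation is the usual path; whether it can also be flipped in place via ALTER TABLE ... SET depends on the StarRocks release, so treat that statement as an assumption to verify.

-- At table creation time (shared-data mode), enable the per-table data cache:
--   "datacache.enable" = "true"
-- Possibly also changeable in place on newer releases (assumption, check your version):
ALTER TABLE ads_report_ad_mi SET ("datacache.enable" = "true");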

One more note: the root cause appears to be that a LIMIT 1 query on an aggregate table does no tablet pruning, so during init the scan opens every tablet on the node (and if that node happens to hold many tablets at the time, this takes a long time).
Which raises a question: why does a LIMIT 1 query on an aggregate table not do tablet pruning?
Can the bucketing key be something other than a subset of the aggregate key? If not, then a filter on the bucketing key is effectively a filter on the aggregate key, so a LIMIT 1 should not need to read all tablets either, should it? (A sanity-check sketch follows below.)
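
A hedged way to sanity-check the hypothesis above is to count how many tablets the pruned partitions actually contain: with partition_live_number = 720 hourly partitions, even a modest bucket count per partition spreads thousands of tablets across the 17 backends, so opening all of them during segment init could plausibly dominate the scan time. The statements below are standard StarRocks commands; the partition name is only an illustrative placeholder for the hourly partitions created by date_trunc('hour', partition_time).

-- Partitions that survive the filter, plus the bucket count of each:
SHOW PARTITIONS FROM ads_report_ad_mi WHERE PartitionName > 'p2024060700';
-- Tablets inside one of those partitions (replace with a real partition name):
SHOW TABLET FROM ads_report_ad_mi PARTITION (p2024060700);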