查询时长不稳定

【详述】今天下午3点20到4点20 耗时突然变长, 这是流量很小, 也没有量比较大的导入任务。 fe和be的 CPU和内存也没有明显变化, 两个be节点,每个节点的缓存是1T, 已经满了。


【背景】
【业务影响】线上产品的使用,慢查询会打满连接池
【是否存算分离】是
【StarRocks版本】3.2.4
【集群规模】1 fe , 2 be 混布
【机器信息】16C 128G
【附件】

  • fe.log/beINFO/相应截图
  • 慢查询:
    • Profile信息

    • 并行度:show variables like ‘%parallel_fragment_exec_instance_num%’;
      并行度为 1

    • pipeline是否开启:show variables like ‘%pipeline%’; 已开启
      Variable_name |Value|
      ----------------------±----+
      enable_pipeline_engine|true |
      max_pipeline_dop |64 |
      pipeline_dop |0 |
      pipeline_profile_level|1 |
      pipeline_sink_dop |0 |

    • be节点cpu和内存使用率截图

Query:barrage.profile (28.9 KB) Query:crawler.profile (31.8 KB) 两个查询的profile。

显示是 OutputFullTime 引起的

  • Variables: parallel_fragment_exec_instance_num=1,max_parallel_scan_instance_num=-1,pipeline_dop=0,enable_adaptive_sink_dop=true,enable_runtime_adaptive_dop=false,runtime_profile_report_interval=10
    • NonDefaultSessionVariables: {“cbo_push_down_topn_limit”:{“defaultValue”:1000,“actualValue”:0},“big_query_profile_threshold”:{“defaultValue”:“0s”,“actualValue”:“5s”},“enable_scan_datacache”:{“defaultValue”:false,“actualValue”:true},“enable_iceberg_column_statistics”:{“defaultValue”:false,“actualValue”:true},“exec_mem_limit”:{“defaultValue”:2147483648,“actualValue”:50179869184},“enable_adaptive_sink_dop”:{“defaultValue”:false,“actualValue”:true},“enable_spill”:{“defaultValue”:false,“actualValue”:true},“consistent_hash_virtual_number”:{“defaultValue”:128,“actualValue”:32},“full_sort_late_materialization_v2”:{“defaultValue”:true,“actualValue”:false},“spill_operator_min_bytes”:{“defaultValue”:52428800,“actualValue”:10485760}}

      • PendingTime: 8s632ms
        • __MAX_OF_PendingTime: 9s153ms
        • __MIN_OF_PendingTime: 8s113ms
        • InputEmptyTime: 35.560ms
          • __MAX_OF_InputEmptyTime: 35.570ms
          • __MIN_OF_InputEmptyTime: 35.551ms
          • FirstInputEmptyTime: 35.560ms
            • __MAX_OF_FirstInputEmptyTime: 35.570ms
            • __MIN_OF_FirstInputEmptyTime: 35.551ms
          • FollowupInputEmptyTime: 0ns
        • OutputFullTime: 8s596ms
          • __MAX_OF_OutputFullTime: 9s118ms
          • __MIN_OF_OutputFullTime: 8s78ms
        • PendingFinishTime: 0ns
        • PreconditionBlockTime: 0ns

±--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String |
±--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Summary |
| QueryId: ae89ef58-f617-11ee-a6a4-02427221d54e |
| Version: 3.2.4-613f0b5 |
| State: Finished |
| TotalTime: 11s189ms |
| ExecutionTime: 11s168ms [Scan: 8s147ms (72.95%), Network: 541.664us (0.00%), ResultDeliverTime: 0ns (0.00%), ScheduleTime: 386.211ms (3.46%)] |
| CollectProfileTime: 5ms |
| FrontendProfileMergeTime: 3.685ms |
| QueryPeakMemoryUsage: 546.417 MB, QueryAllocatedMemoryUsage: 51.756 GB |
| Top Most Time-consuming Nodes: |
| 1. OLAP_SCAN (id=0) : 8s255ms (97.11%) |
| 2. TOP_N (id=1) [ROW_NUMBER, TOP-N]: 243.237ms (2.86%) |
| 3. MERGE_EXCHANGE (id=2) [GATHER]: 1.178ms (0.01%) |
| 4. RESULT_SINK: 768.993us (0.01%) |
| Top Most Memory-consuming Nodes: |
| 1. MERGE_EXCHANGE (id=2) [GATHER]: 878.740 KB |
| 2. TOP_N (id=1) [ROW_NUMBER, TOP-N]: 19.849 KB |
| NonDefaultVariables: |
| autocommit: true -> false |
| big_query_profile_threshold: 0s -> 5s |
| cbo_push_down_topn_limit: 1000 -> 0 |
| character_set_results: utf8 -> NULL |
| consistent_hash_virtual_number: 128 -> 32 |
| enable_adaptive_sink_dop: false -> true |
| enable_iceberg_column_statistics: false -> true |
| enable_scan_datacache: false -> true |
| enable_spill: false -> true |
| exec_mem_limit: 2147483648 -> 50179869184 |
| full_sort_late_materialization_v2: true -> false |
| spill_operator_min_bytes: 52428800 -> 10485760 |
| sql_mode_v2: 32 -> 2097184 |
| Fragment 0 |
| │ BackendNum: 1 |
| │ InstancePeakMemoryUsage: 861.688 KB, InstanceAllocatedMemoryUsage: 2.212 MB |
| │ PrepareTime: 634.473us |
| └──RESULT_SINK |
| │ TotalTime: 768.993us (0.01%) [CPUTime: 768.993us] |
| │ OutputRows: 200 |
| │ SinkType: MYSQL_PROTOCAL |
| └──MERGE_EXCHANGE (id=2) [GATHER] |
| Estimates: [row: 200, cpu: 79847.67, memory: 79847.67, network: 79847.67, cost: 73889070881.04] |
| TotalTime: 1.178ms (0.01%) [CPUTime: 637.218us, NetworkTime: 541.664us] |
| OutputRows: 200 |
| PeakMemory: 878.740 KB, AllocatedMemory: 1.571 MB |
| SubordinateOperators: |
| LOCAL_EXCHANGE [Passthrough] |
| |
| Fragment 1 |
| │ BackendNum: 2 |
| │ InstancePeakMemoryUsage: 536.596 MB, InstanceAllocatedMemoryUsage: 51.754 GB |
| │ PrepareTime: 913.830us |
| └──DATA_STREAM_SINK (id=2) |
| │ PartitionType: UNPARTITIONED |
| └──TOP_N (id=1) [ROW_NUMBER, TOP-N] |
| │ Estimates: [row: 200, cpu: 29555436718.00, memory: 79847.67, network: 29555436718.00, cost: 73888751490.35] |
| │ TotalTime: 243.237ms (2.86%) [CPUTime: 243.237ms] |
| │ OutputRows: 400 |
| │ PeakMemory: 19.849 KB, AllocatedMemory: 120.551 MB |
| │ OrderByExprs: [<slot 3> 3: timestamp_5m] |
| │ SubordinateOperators: |
| │ LOCAL_EXCHANGE [Passthrough] |
| └──OLAP_SCAN (id=0) |
| Estimates: [row: 74029551, cpu: 29555436718.00, memory: 0.00, network: 0.00, cost: 14777718359.00] |
| TotalTime: 8s255ms (97.11%) [CPUTime: 108.348ms, ScanTime: 8s147ms] |
| OutputRows: 17.422M (17421721) |
| RuntimeFilter: 17.422M (17421721) -> 17.422M (17421721) (0.00%) |
| Table: : minute_index_20 |
| Detail Timers: [ScanTime = IOTaskExecTime + IOTaskWaitTime] |
| IOTaskExecTime: 4s456ms [min=640.193ms, max=8s71ms] |
| Aggr: 4s84ms [min=543.756ms, max=7s459ms] |
| Sort: 3s216ms [min=374.457ms, max=6s469ms] |
| IOTaskWaitTime: 77.682ms [min=20.782ms, max=140.575ms]