【执行计划超时】查询 2400+ 条的 default_catalog.information_schema.task_runs 表报执行计划超时

xlrei · 2025年02月17日 07:02

为了更快的定位您的问题，请提供以下信息，谢谢
【详述】问题详细描述
【背景】做过哪些操作？
查询 task_runs:
SELECT TASK_NAME,STATE,ERROR_MESSAGE AS ERR_MSG FROM default_catalog.information_schema.task_runs where task_name=‘79d9161dd9c28e08c25fff22f6d24833_2025_02_16_1’;

当前 new_planner_optimize_timeout 设置的 300000（300s）， default_catalog.information_schema.task_runs 总条数为 2448 条，并不算多，为何执行计划需要耗时如此之久？
【业务影响】查询计划超时
【是否存算分离】是
【StarRocks版本】例如：3.3.5
【集群规模】例如：3fe（1 follower+2observer）+35cn（fe与cn混部）
【机器信息】CN 节点 CPU虚拟核/内存/网卡，例如：8C/60G/万兆
【联系方式】xlrei@163.com
【附件】

fe.log/beINFO/相应截图
慢查询：
- Profile信息
- 并行度：show variables like ‘%parallel_fragment_exec_instance_num%’;
  ±------------------------------------±------+
  | Variable_name | Value |
  ±------------------------------------±------+
  | parallel_fragment_exec_instance_num | 1 |
  ±------------------------------------±------+
- pipeline是否开启：show variables like ‘%pipeline%’;
  ±-------------------------------------------±------+
  | Variable_name | Value |
  ±-------------------------------------------±------+
  | enable_pipeline_engine | true |
  | enable_pipeline_level_multi_partitioned_rf | false |
  | max_pipeline_dop | 64 |
  | pipeline_dop | 0 |
  | pipeline_profile_level | 1 |
  | pipeline_sink_dop | 0 |
  ±-------------------------------------------±------+

cn节点cpu和内存使用率截图

image732×560 115 KB

查询报错：
[2025-02-17T13:53:03.043+0000] {pod_manager.py:367} INFO - Query execution timed out or an error occurred: 1064 (HY000): StarRocks planner use long time 300000 ms in logical phase, This probably because 1. FE Full GC, 2. Hive external table fetch metadata took a long time, 3. The SQL is very complex. You could 1. adjust FE JVM config, 2. try query again, 3. enlarge new_planner_optimize_timeout session variable

xlrei · 2025年02月17日 07:17

补上 FE 节点信息：