ERROR 1064 (HY000): FE RPC failure

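To help us locate your problem faster, please provide the following information. Thank you.
[Details] Cannot query the information_schema.task_runs and materialized_views tables; other tables work normally.
[Background] Appeared after the BEs recovered from a crash.
[Business impact]
[Shared-data architecture?] No
[StarRocks version] 3.3.16-a81c69f
[Cluster size] e.g. 3 FE (1 leader + 2 followers) + 3 BE (FE and BE not co-located)
[Machine info] FE deployed on k8s, 8C/16G; BE 16C/64G
[Contact] andy_xsh@163.com
[Attachments]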
mysql> select count(1) from information_schema.task_runs;
ERROR 1064 (HY000): FE RPC failure, address=TNetworkAddress(hostname=kube-starrocks-fe-2.kube-starrocks-fe-search.starrocks.svc.cluster.local, port=9020), reason=Internal error processing getTaskRuns: BE:10001, host: unknown
mysql> select count(1) from information_schema.task_runs;
ERROR 1064 (HY000): FE RPC failure, address=TNetworkAddress(hostname=kube-starrocks-fe-2.kube-starrocks-fe-search.starrocks.svc.cluster.local, port=9020), reason=Internal error processing getTaskRuns: BE:10003, host: unknown

mysql> SHOW PROC '/backends'\G
*************************** 1. row ***************************
BackendId: 10002
IP: kube-starrocks-be-0.kube-starrocks-be-search.starrocks.svc.cluster.local
HeartbeatPort: 9050
BePort: 9060
HttpPort: 8040
BrpcPort: 8060
LastStartTime: 2025-07-18 19:24:27
LastHeartbeat: 2025-07-18 19:45:47
Alive: true
SystemDecommissioned: false
ClusterDecommissioned: false
TabletNum: 2482
DataUsedCapacity: 0.000 B
AvailCapacity: 1.535 TB
TotalCapacity: 1.564 TB
UsedPct: 1.84 %
MaxDiskUsedPct: 4.90 %
ErrMsg:
Version: 3.3.16-a81c69f
Status: {"lastSuccessReportTabletsTime":"2025-07-18 19:45:29"}
DataTotalCapacity: 1.535 TB
DataUsedPct: 0.00 %
CpuCores: 16
MemLimit: 43.740GB
NumRunningQueries: 0
MemUsedPct: 0.43 %
CpuUsedPct: 0.0 %
DataCacheMetrics: Status: Normal, DiskUsage: 0B/1TB, MemUsage: 0B/0B
Location:
*************************** 2. row ***************************
BackendId: 10001
IP: kube-starrocks-be-1.kube-starrocks-be-search.starrocks.svc.cluster.local
HeartbeatPort: 9050
BePort: 9060
HttpPort: 8040
BrpcPort: 8060
LastStartTime: 2025-07-18 19:23:37
LastHeartbeat: 2025-07-18 19:45:47
Alive: true
SystemDecommissioned: false
ClusterDecommissioned: false
TabletNum: 3145
DataUsedCapacity: 0.000 B
AvailCapacity: 1.531 TB
TotalCapacity: 1.563 TB
UsedPct: 2.02 %
MaxDiskUsedPct: 5.65 %
ErrMsg:
Version: 3.3.16-a81c69f
Status: {"lastSuccessReportTabletsTime":"2025-07-18 19:45:39"}
DataTotalCapacity: 1.531 TB
DataUsedPct: 0.00 %
CpuCores: 16
MemLimit: 43.740GB
NumRunningQueries: 0
MemUsedPct: 0.43 %
CpuUsedPct: 0.0 %
DataCacheMetrics: Status: Normal, DiskUsage: 0B/1TB, MemUsage: 0B/0B
Location:
*************************** 3. row ***************************
BackendId: 10003
IP: kube-starrocks-be-2.kube-starrocks-be-search.starrocks.svc.cluster.local
HeartbeatPort: 9050
BePort: 9060
HttpPort: 8040
BrpcPort: 8060
LastStartTime: 2025-07-18 19:22:52
LastHeartbeat: 2025-07-18 19:45:47
Alive: true
SystemDecommissioned: false
ClusterDecommissioned: false
TabletNum: 2470
DataUsedCapacity: 0.000 B
AvailCapacity: 1.534 TB
TotalCapacity: 1.563 TB
UsedPct: 1.84 %
MaxDiskUsedPct: 4.91 %
ErrMsg:
Version: 3.3.16-a81c69f
Status: {"lastSuccessReportTabletsTime":"2025-07-18 19:44:54"}
DataTotalCapacity: 1.534 TB
DataUsedPct: 0.00 %
CpuCores: 16
MemLimit: 43.740GB
NumRunningQueries: 0
MemUsedPct: 0.43 %
CpuUsedPct: 0.0 %
DataCacheMetrics: Status: Normal, DiskUsage: 0B/1TB, MemUsage: 0B/0B
Location:
3 rows in set (0.00 sec)

Resolved. The cause was that one of the multiple data directories was not persisted.
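
For anyone hitting the same thing, here is a minimal sketch of how one might check for it, assuming the BE data directories come from `storage_root_path` in be.conf (entries separated by `;`, optionally suffixed with `,medium:SSD`) and that you know which mount paths are backed by persistent volumes. The function names and the example paths are hypothetical and are not taken from this cluster.

```python
from pathlib import PurePosixPath


def parse_storage_root_path(value: str) -> list[str]:
    """Split a be.conf storage_root_path value into its directory paths.

    Entries are separated by ';' and may carry a ',medium:SSD' style suffix.
    """
    paths = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Drop optional attributes such as ",medium:SSD".
        paths.append(entry.split(",", 1)[0].strip())
    return paths


def unpersisted_dirs(storage_root_path: str, persistent_mounts: list[str]) -> list[str]:
    """Return data directories that are not at or below any persistent mount path."""
    mounts = [PurePosixPath(m) for m in persistent_mounts]
    missing = []
    for raw in parse_storage_root_path(storage_root_path):
        path = PurePosixPath(raw)
        if not any(path == m or m in path.parents for m in mounts):
            missing.append(raw)
    return missing


if __name__ == "__main__":
    # Hypothetical values: two data directories, only one backed by a volume.
    conf = "/opt/starrocks/be/storage;/data2,medium:SSD"
    print(unpersisted_dirs(conf, ["/opt/starrocks/be/storage"]))
```

Any directory this reports would lose its contents when the BE pod is recreated, which matches the situation described above where the problem appeared after the BEs came back from a crash.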

Search the FE for this getTaskRuns.
