Querying an Elasticsearch external table fails

【Details】
After upgrading the StarRocks cluster to the latest version, 3.2.2, queries against an Elasticsearch external table fail:
select order_time from pay_accounting_journal_info limit 1;
Error message:
ERROR 1064 (HY000): Backend node not found. Check if any backend node is down.backend: [10.112.130.13 alive: true inBlacklist: false] [10.58.108.228 alive: true inBlacklist: false] [10.58.108.229 alive: true inBlacklist: false]

However, the BE and FE statuses are all OK (queries against native StarRocks tables and Routine Load ingestion both work normally).
backends
+-----------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------+---------------+--------------------------------------------------------+-------------------+-------------+----------+-------------------+------------+------------+
| BackendId | IP            | HeartbeatPort | BePort | HttpPort | BrpcPort | LastStartTime       | LastHeartbeat       | Alive | SystemDecommissioned | ClusterDecommissioned | TabletNum | DataUsedCapacity | AvailCapacity | TotalCapacity | UsedPct | MaxDiskUsedPct | ErrMsg | Version       | Status                                                 | DataTotalCapacity | DataUsedPct | CpuCores | NumRunningQueries | MemUsedPct | CpuUsedPct |
+-----------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------+---------------+--------------------------------------------------------+-------------------+-------------+----------+-------------------+------------+------------+
| 1587424   | 10.112.130.13 | 9050          | 9060   | 8041     | 8060     | 2024-01-04 14:01:01 | 2024-01-04 16:44:40 | true  | false                | false                 | 241686    | 6.419 TB         | 2.962 TB      | 10.314 TB     | 71.29 % | 71.76 %        |        | 3.2.2-269e832 | {"lastSuccessReportTabletsTime":"2024-01-04 16:43:44"} | 9.381 TB          | 68.43 %     | 64       | 4                 | 42.59 %    | 33.5 %     |
| 10126     | 10.58.108.228 | 9050          | 9060   | 8041     | 8060     | 2024-01-04 13:58:31 | 2024-01-04 16:44:40 | true  | false                | false                 | 241687    | 6.422 TB         | 2.973 TB      | 10.311 TB     | 71.16 % | 71.74 %        |        | 3.2.2-269e832 | {"lastSuccessReportTabletsTime":"2024-01-04 16:44:40"} | 9.395 TB          | 68.35 %     | 64       | 0                 | 46.63 %    | 30.3 %     |
| 10142     | 10.58.108.229 | 9050          | 9060   | 8041     | 8060     | 2024-01-04 13:52:25 | 2024-01-04 16:44:40 | true  | false                | false                 | 241686    | 6.416 TB         | 3.081 TB      | 10.311 TB     | 70.12 % | 70.74 %        |        | 3.2.2-269e832 | {"lastSuccessReportTabletsTime":"2024-01-04 16:43:26"} | 9.497 TB          | 67.56 %     | 64       | 2                 | 45.09 %    | 30.8 %     |
+-----------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------+---------------+--------------------------------------------------------+-------------------+-------------+----------+-------------------+------------+------------+
frontends
+----------------------------------+---------------+-------------+----------+-----------+---------+----------+------------+------+-------+-------------------+---------------------+----------+--------+---------------------+---------------+
| Name                             | IP            | EditLogPort | HttpPort | QueryPort | RpcPort | Role     | ClusterId  | Join | Alive | ReplayedJournalId | LastHeartbeat       | IsHelper | ErrMsg | StartTime           | Version       |
+----------------------------------+---------------+-------------+----------+-----------+---------+----------+------------+------+-------+-------------------+---------------------+----------+--------+---------------------+---------------+
| 10.112.130.13_9010_1673264933579 | 10.112.130.13 | 9010        | 8031     | 9031      | 9020    | FOLLOWER | 1571137796 | true | true  | 236773079         | 2024-01-04 16:45:10 | true     |        | 2024-01-02 19:19:12 | 3.2.2-269e832 |
| 10.58.108.229_9010_1667271311201 | 10.58.108.229 | 9010        | 8031     | 9031      | 9020    | LEADER   | 1571137796 | true | true  | 236773118         | 2024-01-04 16:45:10 | true     |        | 2024-01-02 19:16:16 | 3.2.2-269e832 |
| 10.58.108.228_9010_1667271890843 | 10.58.108.228 | 9010        | 8031     | 9031      | 9020    | FOLLOWER | 1571137796 | true | true  | 236773079         | 2024-01-04 16:45:10 | true     |        | 2024-01-04 14:06:41 | 3.2.2-269e832 |
+----------------------------------+---------------+-------------+----------+-----------+---------+----------+------------+------+-------+-------------------+---------------------+----------+--------+---------------------+---------------+

compute nodes
+---------------+-------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+--------+---------------+----------+-------------------+------------+------------+----------------+
| ComputeNodeId | IP          | HeartbeatPort | BePort | HttpPort | BrpcPort | LastStartTime       | LastHeartbeat       | Alive | SystemDecommissioned | ClusterDecommissioned | ErrMsg | Version       | CpuCores | NumRunningQueries | MemUsedPct | CpuUsedPct | HasStoragePath |
+---------------+-------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+--------+---------------+----------+-------------------+------------+------------+----------------+
| 10638916      | 10.232.69.4 | 9050          | 9060   | 8041     | 8060     | 2024-01-04 11:22:56 | 2024-01-04 16:46:10 | true  | false                | false                 |        | 3.2.2-269e832 | 48       | 0                 | 0.97 %     | 0.1 %      | false          |
| 10638917      | 10.232.69.5 | 9050          | 9060   | 8041     | 8060     | 2024-01-04 11:22:56 | 2024-01-04 16:46:10 | true  | false                | false                 |        | 3.2.2-269e832 | 48       | 0                 | 0.93 %     | 0.5 %      | false          |
| 10642050      | 10.232.69.6 | 9050          | 9060   | 8041     | 8060     | 2024-01-04 16:41:25 | 2024-01-04 16:46:10 | true  | false                | false                 |        | 3.2.2-269e832 | 48       | 8                 | 0.94 %     | 0.9 %      | false          |
| 10642051      | 10.232.69.7 | 9050          | 9060   | 8041     | 8060     | 2024-01-04 16:41:30 | 2024-01-04 16:46:10 | true  | false                | false                 |        | 3.2.2-269e832 | 48       | 0                 | 0.90 %     | 0.5 %      | false          |
+---------------+-------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+--------+---------------+----------+-------------------+------------+------------+----------------+

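One thing is very strange: I have 8 CN machines in total. With any two of them added as compute nodes, every query succeeds. With 3 CN nodes, the error above appears occasionally. With 4 or more CN nodes, every query fails.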
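
The behaviour can be narrowed down by registering compute nodes one at a time and rerunning the query after each change. A minimal sketch, assuming the CN address and heartbeat port (9050) shown in the compute nodes listing; which node to add or drop is only illustrative:

```sql
-- List the compute nodes that are currently registered.
SHOW COMPUTE NODES;

-- Register one more compute node (host:heartbeat_port), then rerun the failing query.
ALTER SYSTEM ADD COMPUTE NODE "10.232.69.6:9050";
SELECT order_time FROM pay_accounting_journal_info LIMIT 1;

-- Drop the node again to return to the previous configuration.
ALTER SYSTEM DROP COMPUTE NODE "10.232.69.6:9050";
```

Recording the CN count at which the query first fails makes the report easier to reproduce.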

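【Business impact】
【Storage-compute separation?】
【StarRocks version】3.2.2
【Cluster size】e.g. 3 FE (1 follower + 2 observers) + 3 BE (FE and BE co-located) + 8 CN
【Machine info】CPU vCores/memory/NIC, e.g. 48C/64G/10GbE
【Attachments】
- fe.log error message (logged only at INFO level, oddly)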
2024-01-04 14:17:52,926 INFO (starrocks-mysql-nio-pool-6|631) [StmtExecutor.execute():710] execute Exception, sql: SELECT *
FROM pay_accounting_journal_info
LIMIT 100, error: Backend node not found. Check if any backend node is down.backend: [10.112.130.13 alive: true inBlacklist: false] [10.58.108.228 alive: true inBlacklist: false] [10.58.108.229 alive: true inBlacklist: false]

mysql> show create table pay_accounting_journal_info\G
*************************** 1. row ***************************
Table: pay_accounting_journal_info
Create Table: CREATE EXTERNAL TABLE pay_accounting_journal_info (
  order_no varchar(2048) NULL COMMENT "",
  merchant_code varchar(2048) NULL COMMENT "",
  trans_code varchar(2048) NULL COMMENT "",
  busi_order_no varchar(2048) NULL COMMENT "",
  pair_account_no varchar(2048) NULL COMMENT "",
  request_no varchar(2048) NULL COMMENT "",
  account_category varchar(2048) NULL COMMENT "",
  sub_code varchar(2048) NULL COMMENT "",
  accounting_type varchar(2048) NULL COMMENT "",
  order_time varchar(2048) NULL COMMENT "",
  accounting_no varchar(2048) NULL COMMENT "",
  account_no varchar(2048) NULL COMMENT "",
  balance_flag varchar(2048) NULL COMMENT "",
  content varchar(2048) NULL COMMENT "",
  account_busi_type_id varchar(2048) NULL COMMENT "",
  anti_journal_time varchar(2048) NULL COMMENT "",
  id varchar(2048) NULL COMMENT "",
  accounting_status varchar(2048) NULL COMMENT "",
  batch_accounting_no varchar(2048) NULL COMMENT "",
  create_date varchar(2048) NULL COMMENT "",
  pair_sub_code varchar(2048) NULL COMMENT "",
  anti_accounting_no varchar(2048) NULL COMMENT "",
  amount varchar(2048) NULL COMMENT "",
  qunar_trade_no varchar(2048) NULL COMMENT "",
  off_balan varchar(2048) NULL COMMENT "",
  book_category varchar(2048) NULL COMMENT "",
  busi_type_id varchar(2048) NULL COMMENT "",
  trans_time varchar(2048) NULL COMMENT "",
  cur_id varchar(2048) NULL COMMENT "",
  fund_change_direction varchar(2048) NULL COMMENT "",
  user_id varchar(2048) NULL COMMENT "",
  debit_ind varchar(2048) NULL COMMENT "",
  org_code varchar(2048) NULL COMMENT "",
  accounting_date varchar(2048) NULL COMMENT "",
  journal_seq varchar(2048) NULL COMMENT "",
  journal_type varchar(2048) NULL COMMENT ""
) ENGINE=ELASTICSEARCH
COMMENT "ELASTICSEARCH"
PROPERTIES (
  "hosts" = "http://xxx:80",
  "user" = "elastic",
  "password" = "",
  "index" = "pay_accounting_journal_info",
  "type" = "_doc",
  "transport" = "http",
  "enable_docvalue_scan" = "true",
  "max_docvalue_fields" = "50",
  "enable_keyword_sniff" = "true",
  "es.nodes.wan.only" = "true"
);

mysql> show variables like '%compute%';
+---------------------+-------+
| Variable_name       | Value |
+---------------------+-------+
| prefer_compute_node | false |
| use_compute_nodes   | 0     |
+---------------------+-------+
2 rows in set (0.05 sec)

Could you share be.out so we can take a look?

The relevant error appears only in fe.log; there is nothing in fe.warn, be.out, be.INFO, or be.WARNING.
[powerop@SVR32908IN5112 log]$ cat be.out
start time: Thu Jan 4 13:51:48 CST 2024

fe.log
2024-01-04 14:17:52,926 INFO (starrocks-mysql-nio-pool-6|631) [StmtExecutor.execute():710] execute Exception, sql: SELECT *
FROM pay_accounting_journal_info
LIMIT 100, error: Backend node not found. Check if any backend node is down.backend: [10.112.130.13 alive: true inBlacklist: false] [10.58.108.228 alive: true inBlacklist: false] [10.58.108.229 alive: true inBlacklist: false]

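Is there any parameter that depends on the number of CN nodes? Of the eight CN nodes, adding any two to the cluster lets queries against the ES external table succeed; with 4 or more they always fail, and with 3 they fail occasionally.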

Has any BE or CN been restarted?