[BE error] SQL queries fail after scaling out the BE cluster

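To help us locate your problem faster, please provide the following information. Thank you.
[Details] Detailed description of the problem
After scaling out the BE cluster by adding 5 nodes, SQL queries fail with the following error: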
ProgrammingError: (pymysql.err.ProgrammingError) (1064, u'error setting certificate verify locations: CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none backend:172.16.19.6')
[SQL: select time, sum(item_count) item_count, count(shop_id) shop_count, sum(sales) sales, sum(sold) sold, case when sum(sold) != 0 then sum(sales)/sum(sold) else 0 end as avg_price from (select t1.time, count(distinct t1.item_id) item_count, t1.shop_id, sum(t1.sales) sales, sum(t1.sold) sold from (select t1.time, t1.item_id, t1.shop_id, coalesce(max(c1.sales), max(c2.sales), max(t1.sales)) sales, max(cp_sold(date_format(t1.time,'%%Y-%%m-%%d'), c1.sold, c1.sold_cp, c1.sold_min, c1.sold_min_v2, c2.sold, c2.sold_cp, c2.sold_min, c2.sold_min_v2, t1.sold, '', 'item', 'sold')) sold from item2 t1 left JOIN item2_patch_1698818499 c1 ON t1.cat1 = c1.cat1 AND t1.time = c1.time AND t1.item_id = c1.item_id AND c1.username = 'fan.yi@moojing.com' and (t1.cat2 = c1.cat2) left JOIN item2_patch_1698818499 c2 ON t1.cat1 = c2.cat1 AND t1.time = c2.time AND t1.item_id = c2.item_id AND c2.username is null and (t1.cat2 = c2.cat2) where 'fan.yi@moojing.com' = 'fan.yi@moojing.com' and ((((t1.cat1='50002768' and t1.cat2='50018398' and t1.cat3 in ('201801202', '50018409', '50018403', '50023375', '201339009', '50018399', '201161307', '50018402', '201163102', '50018400', '50018401', '201157408', '50018404', '50018405', '201801303', '50018408', '201325302', '50018410', '50018407')) ))) and (t1.time in ('2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01', '2022-05-01', '2022-06-01', '2022-07-01', '2022-08-01', '2022-09-01', '2022-12-01', '2023-01-01', '2023-02-01')) and (replace(lower(t1.title), ' ','') like '%%\u8fd0\u52a8%%' or replace(lower(t1.title), ' ','') like '%%\u5065\u8eab%%' or replace(lower(t1.title), ' ','') like '%%\u7b4b\u819c\u67aa%%') group by t1.time, t1.shop_id, t1.item_id) t1 group by t1.shop_id, t1.time) t1 group by t1.time]
(Background on this error at: http://sqlalche.me/e/13/f405)
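[Background] What operations were performed?
Scaled out the cluster's BEs by adding 5 nodes.
[Business impact] SQL queries fail; the cluster is unusable.
[StarRocks version] e.g.: 2.4.5
[Cluster size] e.g.: 3 FE (1 follower + 2 observers) + 10 BE
[Machine info] CPU virtual cores / memory / NIC: 32C / 192G / 10 GbE
[Contact] Community group 12 - 金谡 - jinsu@moojing.com
[Attachments]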

  • fe.log / be.INFO / relevant screenshots
    Cluster state at the time of the failure
    mysql> show proc '/backends';
    +-----------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------+---------------+----------------------------------------+-------------------+-------------+----------+
    | BackendId | IP            | HeartbeatPort | BePort | HttpPort | BrpcPort | LastStartTime       | LastHeartbeat       | Alive | SystemDecommissioned | ClusterDecommissioned | TabletNum | DataUsedCapacity | AvailCapacity | TotalCapacity | UsedPct | MaxDiskUsedPct | ErrMsg | Version       | Status                                 | DataTotalCapacity | DataUsedPct | CpuCores |
    +-----------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------+---------------+----------------------------------------+-------------------+-------------+----------+
    | 7012482   | 172.16.16.219 | 9050          | 9060   | 8040     | 8060     | 2023-11-03 12:09:43 | 2023-11-03 12:09:43 | true  | false                | false                 | 9836      | 0.000            | 2.736 TB      | 2.883 TB      | 5.08 %  | 5.08 %         |        | 2.4.5-b910c2b | {"lastSuccessReportTabletsTime":"N/A"} | 2.736 TB          | 0.00 %      | 32       |
    | 10008     | 172.16.16.24  | 9050          | 9060   | 8040     | 8060     | 2023-11-02 10:45:59 | 2023-11-02 10:45:59 | true  | false                | false                 | 138558    | 0.000            | 2.736 TB      | 2.883 TB      | 5.08 %  | 5.08 %         |        | 2.4.5-b910c2b | {"lastSuccessReportTabletsTime":"N/A"} | 2.736 TB          | 0.00 %      | 32       |
    | 10004     | 172.16.16.28  | 9050          | 9060   | 8040     | 8060     | 2023-11-02 10:30:18 | 2023-11-02 10:30:18 | true  | false                | false                 | 138953    | 0.000            | 2.736 TB      | 2.883 TB      | 5.08 %  | 5.08 %         |        | 2.4.5-b910c2b | {"lastSuccessReportTabletsTime":"N/A"} | 2.736 TB          | 0.00 %      | 32       |
    | 10005     | 172.16.16.36  | 9050          | 9060   | 8040     | 8060     | 2023-11-02 10:45:38 | 2023-11-02 10:45:39 | true  | false                | false                 | 138699    | 0.000            | 2.736 TB      | 2.883 TB      | 5.08 %  | 5.08 %         |        | 2.4.5-b910c2b | {"lastSuccessReportTabletsTime":"N/A"} | 2.736 TB          | 0.00 %      | 32       |
    | 10003     | 172.16.16.38  | 9050          | 9060   | 8040     | 8060     | 2023-11-02 10:30:03 | 2023-11-02 10:30:03 | true  | false                | false                 | 140475    | 0.000            | 2.736 TB      | 2.883 TB      | 5.08 %  | 5.08 %         |        | 2.4.5-b910c2b | {"lastSuccessReportTabletsTime":"N/A"} | 2.736 TB          | 0.00 %      | 32       |
    | 10007     | 172.16.16.40  | 9050          | 9060   | 8040     | 8060     | 2023-11-02 10:45:49 | 2023-11-02 10:45:49 | true  | false                | false                 | 138789    | 0.000            | 2.736 TB      | 2.883 TB      | 5.08 %  | 5.08 %         |        | 2.4.5-b910c2b | {"lastSuccessReportTabletsTime":"N/A"} | 2.736 TB          | 0.00 %      | 32       |
    | 10006     | 172.16.16.47  | 9050          | 9060   | 8040     | 8060     | 2023-11-02 10:45:44 | 2023-11-02 10:45:44 | true  | false                | false                 | 141672    | 0.000            | 2.736 TB      | 2.883 TB      | 5.08 %  | 5.08 %         |        | 2.4.5-b910c2b | {"lastSuccessReportTabletsTime":"N/A"} | 2.736 TB          | 0.00 %      | 32       |
    | 3760638   | 172.16.16.55  | 9050          | 9060   | 8040     | 8060     | 2023-11-02 10:48:59 | 2023-11-02 10:48:59 | true  | false                | false                 | 141754    | 0.000            | 2.736 TB      | 2.883 TB      | 5.08 %  | 5.08 %         |        | 2.4.5-b910c2b | {"lastSuccessReportTabletsTime":"N/A"} | 2.736 TB          | 0.00 %      | 32       |
    | 7012486   | 172.16.16.93  | 9050          | 9060   | 8040     | 8060     | 2023-11-03 12:09:43 | 2023-11-03 12:09:43 | true  | false                | false                 | 9808      | 0.000            | 2.736 TB      | 2.883 TB      | 5.08 %  | 5.08 %         |        | 2.4.5-b910c2b | {"lastSuccessReportTabletsTime":"N/A"} | 2.736 TB          | 0.00 %      | 32       |
    | 7012484   | 172.16.17.172 | 9050          | 9060   | 8040     | 8060     | 2023-11-03 12:09:43 | 2023-11-03 12:09:43 | true  | false                | false                 | 9783      | 0.000            | 2.736 TB      | 2.883 TB      | 5.08 %  | 5.08 %         |        | 2.4.5-b910c2b | {"lastSuccessReportTabletsTime":"N/A"} | 2.736 TB          | 0.00 %      | 32       |
    | 3760830   | 172.16.17.38  | 9050          | 9060   | 8040     | 8060     | 2023-11-02 10:49:29 | 2023-11-02 10:49:29 | true  | false                | false                 | 142947    | 0.000            | 2.736 TB      | 2.883 TB      | 5.08 %  | 5.08 %         |        | 2.4.5-b910c2b | {"lastSuccessReportTabletsTime":"N/A"} | 2.736 TB          | 0.00 %      | 32       |
    | 7012483   | 172.16.18.40  | 9050          | 9060   | 8040     | 8060     | 2023-11-03 12:09:43 | 2023-11-03 12:09:43 | true  | false                | false                 | 9823      | 0.000            | 2.736 TB      | 2.883 TB      | 5.08 %  | 5.08 %         |        | 2.4.5-b910c2b | {"lastSuccessReportTabletsTime":"N/A"} | 2.736 TB          | 0.00 %      | 32       |
    | 4948302   | 172.16.18.88  | 9050          | 9060   | 8040     | 8060     | 2023-11-02 10:49:50 | 2023-11-02 10:49:50 | true  | false                | false                 | 143102    | 0.000            | 2.736 TB      | 2.883 TB      | 5.08 %  | 5.08 %         |        | 2.4.5-b910c2b | {"lastSuccessReportTabletsTime":"N/A"} | 2.736 TB          | 0.00 %      | 32       |
    | 4948303   | 172.16.19.120 | 9050          | 9060   | 8040     | 8060     | 2023-11-02 10:50:06 | 2023-11-02 10:50:06 | true  | false                | false                 | 143131    | 0.000            | 2.736 TB      | 2.883 TB      | 5.08 %  | 5.08 %         |        | 2.4.5-b910c2b | {"lastSuccessReportTabletsTime":"N/A"} | 2.736 TB          | 0.00 %      | 32       |
    | 7012485   | 172.16.19.6   | 9050          | 9060   | 8040     | 8060     | 2023-11-03 12:09:43 | 2023-11-03 12:09:43 | true  | false                | false                 | 9786      | 0.000            | 2.736 TB      | 2.883 TB      | 5.08 %  | 5.08 %         |        | 2.4.5-b910c2b | {"lastSuccessReportTabletsTime":"N/A"} | 2.736 TB          | 0.00 %      | 32       |
    +-----------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------+---------------+----------------------------------------+-------------------+-------------+----------+
    15 rows in set (0.00 sec)

fe.warn.log
2023-11-03 13:49:27,914 WARN (starrocks-mysql-nio-pool-14651|18801) [Coordinator.deliverExecBatchFragmentsRequests():1065] exec plan fragment failed, errmsg=error setting certificate verify locations: CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none, code: INTERNAL_ERROR, fragmentId=F04, backend=172.16.17.172:9060
2023-11-03 13:49:27,916 WARN (starrocks-mysql-nio-pool-14651|18801) [Coordinator.deliverExecBatchFragmentsRequests():1065] exec plan fragment failed, errmsg=error setting certificate verify locations: CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none, code: INTERNAL_ERROR, fragmentId=F04, backend=172.16.18.40:9060
2023-11-03 13:49:27,916 WARN (starrocks-mysql-nio-pool-14651|18801) [Coordinator.deliverExecBatchFragmentsRequests():1065] exec plan fragment failed, errmsg=error setting certificate verify locations: CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none, code: INTERNAL_ERROR, fragmentId=F04, backend=172.16.16.219:9060
2023-11-03 13:49:27,920 WARN (starrocks-mysql-nio-pool-14651|18801) [Coordinator.deliverExecBatchFragmentsRequests():1065] exec plan fragment failed, errmsg=error setting certificate verify locations: CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none, code: INTERNAL_ERROR, fragmentId=F04, backend=172.16.16.93:9060
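
To see which backends the failures come from, the WARN lines above can be tallied per backend IP. The following is only a minimal sketch: the function name `failing_backends` and the set `ASSUMED_NEW_NODES` are illustrative, and treating the five backends with LastStartTime 2023-11-03 12:09:43 as the newly added nodes is an assumption read off the `show proc '/backends'` output, not something the log itself states.

```python
import re
from collections import Counter

# Matches the trailing "backend=<ip>:<port>" field of the WARN lines quoted above.
BACKEND_RE = re.compile(r"backend=(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def failing_backends(log_lines):
    """Count 'exec plan fragment failed' warnings per backend IP."""
    counts = Counter()
    for line in log_lines:
        if "exec plan fragment failed" not in line:
            continue
        match = BACKEND_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Two of the fe.warn.log lines quoted above, copied verbatim.
SAMPLE_LINES = [
    "2023-11-03 13:49:27,914 WARN (starrocks-mysql-nio-pool-14651|18801) "
    "[Coordinator.deliverExecBatchFragmentsRequests():1065] exec plan fragment failed, "
    "errmsg=error setting certificate verify locations: CAfile: "
    "/etc/pki/tls/certs/ca-bundle.crt CApath: none, code: INTERNAL_ERROR, "
    "fragmentId=F04, backend=172.16.17.172:9060",
    "2023-11-03 13:49:27,916 WARN (starrocks-mysql-nio-pool-14651|18801) "
    "[Coordinator.deliverExecBatchFragmentsRequests():1065] exec plan fragment failed, "
    "errmsg=error setting certificate verify locations: CAfile: "
    "/etc/pki/tls/certs/ca-bundle.crt CApath: none, code: INTERNAL_ERROR, "
    "fragmentId=F04, backend=172.16.18.40:9060",
]

# Assumption: the backends whose LastStartTime is 2023-11-03 12:09:43 in the
# table above are the five nodes added during the scale-out.
ASSUMED_NEW_NODES = {
    "172.16.16.219",
    "172.16.16.93",
    "172.16.17.172",
    "172.16.18.40",
    "172.16.19.6",
}

if __name__ == "__main__":
    for ip, count in sorted(failing_backends(SAMPLE_LINES).items()):
        tag = "new" if ip in ASSUMED_NEW_NODES else "existing"
        print(f"{ip}\t{count}\t{tag}")
```

To run it against the whole log instead of the sample, pass the open file object, for example `failing_backends(open("fe.warn.log"))`; the path is a placeholder for wherever the FE writes its warn log.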

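  • Slow queries:
    • Profile information
    • Parallelism: show variables like '%parallel_fragment_exec_instance_num%';
    • Whether pipeline is enabled: show variables like '%pipeline%';
    • Screenshots of BE node CPU and memory usage
  • Query errors:
  • BE crash
  • External table query errors
    • be.out and fe.warn.log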

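Based on the error message, I manually copied the file:
cp /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt

After that, the query errors disappeared.

However, I don't understand why this error occurred in the first place: on the cluster's existing BE nodes, /etc/pki/tls/certs/ca-bundle.crt does not exist either.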
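
To compare nodes, it may help to check on each BE host which of the two CA bundle paths mentioned above actually exist. This is a minimal sketch meant to be run locally on each node; `ca_bundle_status` and `CANDIDATE_CA_FILES` are illustrative names. The final line prints the default verify paths of the OpenSSL build linked into Python, which is shown for comparison only and is not necessarily what the BE process uses.

```python
import os
import ssl

# The two paths that appear in this post: the one from the error message and
# the one used as the source of the manual copy.
CANDIDATE_CA_FILES = [
    "/etc/pki/tls/certs/ca-bundle.crt",
    "/etc/ssl/certs/ca-certificates.crt",
]


def ca_bundle_status(paths):
    """Return {path: bool}, True if the path is a readable regular file."""
    return {p: os.path.isfile(p) and os.access(p, os.R_OK) for p in paths}


if __name__ == "__main__":
    for path, ok in ca_bundle_status(CANDIDATE_CA_FILES).items():
        print(f"{path}: {'present' if ok else 'missing'}")
    # For comparison only: defaults of the OpenSSL build used by this Python.
    print(ssl.get_default_verify_paths())
```

Running this on one old node and one new node should show whether the two groups differ in which bundle file is present.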