BACKUP fails: Couldn't resolve host name

【Details】
Using version 2.5.6, with no broker nodes deployed.

  1. Statement used to create the repository

    > create REPOSITORY big_data_backup_repo  
      WITH BROKER 
      ON LOCATION "s3a://big-data-repo" 
      PROPERTIES (
         "fs.s3a.access.key"="minio",
         "fs.s3a.secret.key"="xxx",
         "fs.s3a.endpoint"="http://ip:port"
      );
    

    The directory was also created in MinIO,

    and the following line shows up in fe.log:

    [BlobStorage.listWithoutBroker():798] finished to list remote path s3a://big-data-repo/__starrocks_repository_big_data_backup_repo/__ss_*. get files: []
    
  2. Backing up the data

 > BACKUP SNAPSHOT test.test_data_sync_snapshot 
   TO big_data_backup_repo 
   ON ( 
      data_sync 
    );
  3. The error
> show backup from test\G;
*************************** 1. row ***************************
               JobId: 107147698
        SnapshotName: test_data_sync_snapshot
              DbName: test
               State: UPLOADING
          BackupObjs: [test.data_sync]
          CreateTime: 2023-06-06 16:12:44
SnapshotFinishedTime: 2023-06-06 16:12:49
  UploadFinishedTime: NULL
        FinishedTime: NULL
     UnfinishedTasks: 107147699=8693692
            Progress: [107147699: 0/0]
          TaskErrMsg: [107147699: S3: fail to list s3a://big-data-repo/__starrocks_repository_big_data_backup_repo/__ss_test_data_sync_snapshot/__ss_content/__db_1859003/__tbl_106934759/__part_106934758/__idx_106934760/__106934761: curlCode: 6, Couldn't resolve host name]
              Status: [OK]
             Timeout: 86400
1 row in set (0.13 sec)

【Business impact】Backups cannot be taken.
【StarRocks version】2.5.6
【Cluster size】5 FE + 5 BE (FE and BE co-located)
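`curlCode: 6` in the TaskErrMsg above is libcurl's `CURLE_COULDNT_RESOLVE_HOST`: the BE's S3 client could not look up the host name, so no connection was made at all. That points at name resolution on the BE side, not at credentials or bucket permissions. A small Python sketch that pulls the task id, remote path and curl code out of such a message; the regex is written only against the message format shown in this thread, and `explain_task_err` is a hypothetical helper name.

```python
import re

# A few libcurl error codes that commonly appear in S3 upload failures.
CURL_CODES = {
    6: "CURLE_COULDNT_RESOLVE_HOST (DNS lookup of the target host failed)",
    7: "CURLE_COULDNT_CONNECT (host resolved, but the TCP connect failed)",
    28: "CURLE_OPERATION_TIMEDOUT",
    35: "CURLE_SSL_CONNECT_ERROR (TLS handshake failed)",
}

# Matches e.g. "[<task id>: S3: fail to list <path>: curlCode: <n>, <message>]"
TASK_ERR = re.compile(
    r"\[(?P<task>\d+): S3: fail to list (?P<path>\S+): "
    r"curlCode: (?P<code>\d+), (?P<msg>.*)\]"
)


def explain_task_err(task_err_msg: str) -> dict:
    """Split a SHOW BACKUP TaskErrMsg into its parts and name the curl code."""
    m = TASK_ERR.search(task_err_msg)
    if m is None:
        raise ValueError("unrecognized TaskErrMsg format")
    code = int(m.group("code"))
    return {
        "task_id": int(m.group("task")),
        "path": m.group("path"),
        "curl_code": code,
        "meaning": CURL_CODES.get(code, "see the libcurl error code list"),
        "message": m.group("msg"),
    }
```

Paste the TaskErrMsg value as a string and print the result; anything other than code 6 would mean the problem has moved past DNS.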

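A broker node is required. Start the broker service and try again.

Isn't the broker no longer needed since 2.5? Do I still have to start it?

Please post the BE log from around that time so we can take a look.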

W0124 01:59:58.918219 3079 snapshot_loader.cpp:809] failed to list files in remote path: s3a://srbackrepo/__starrocks_repository_srbackrepo/__ss_snapshot24010401/__ss_content/__db_10005/__tbl_46103/__part_46069/__idx_46104/__47105, msg: S3: fail to list s3a://srbackrepo/__starrocks_repository_srbackrepo/__ss_snapshot24010401/__ss_content/__db_10005/__tbl_46103/__part_46069/__idx_46104/__47105: curlCode: 6, Couldn't resolve host name
W0124 01:59:58.918334 3079 agent_task.cpp:473] Fail to upload job id=206617 msg=S3: fail to list s3a://srbackrepo/__starrocks_repository_srbackrepo/__ss_snapshot24010401/__ss_content/__db_10005/__tbl_46103/__part_46069/__idx_46104/__47105: curlCode: 6, Couldn't resolve host name
I0124 01:59:58.919879 3079 agent_task.cpp:491] Uploaded task signature=206618 job id=206617
I0124 01:59:58.919929 3079 agent_task.cpp:463] Got upload task signature=206619 job id=206617
I0124 01:59:58.919936 3079 snapshot_loader.cpp:78] begin to upload snapshot files. num: 1, job: 206617, task206619
I0124 01:59:58.919943 3079 snapshot_loader.cpp:963] report to frontend. job id: 206617, task id: 206619, finished num: 0, total num:0
I0124 01:59:58.923324 3079 snapshot_loader.cpp:734] all local snapshot paths are existing. num: 1
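To pull the failed uploads out of a long BE log, filtering on the glog-style line prefix shown above is enough. A Python sketch; the regex is derived only from the lines in this thread, and `failed_uploads` is a hypothetical helper name.

```python
import re

# glog-style prefix as seen above:
# <severity><MMDD> <HH:MM:SS.micros> <thread id> <file>:<line>] <message>
GLOG = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<tid>\d+) (?P<src>[\w.]+:\d+)\] (?P<msg>.*)$"
)
UPLOAD_FAIL = re.compile(r"Fail to upload job id=(?P<job>\d+) msg=(?P<err>.*)")


def failed_uploads(lines):
    """Yield (job_id, error_message) for every 'Fail to upload' warning line."""
    for line in lines:
        m = GLOG.match(line)
        if not m or m.group("sev") != "W":
            continue  # not a glog line, or not a warning
        f = UPLOAD_FAIL.search(m.group("msg"))
        if f:
            yield int(f.group("job")), f.group("err")
```

Any iterable of lines works, including an open log file, e.g. `for job, err in failed_uploads(open(path)): print(job, err)`.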

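Perhaps the StarRocks machines have no host name resolution configured for the HDFS hosts?

We're using MinIO, and in theory it is connected. This continues the post above; we're from the same company.

The directories were all created correctly in MinIO.

The error says host name resolution failed. Check whether the StarRocks machines can resolve the MinIO endpoint, and configure the mapping in /etc/hosts.

We're using an IP address.

Does it have to be a domain name?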
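A quick check on each BE host is to ask the system resolver (the one that reads `/etc/hosts`) for the endpoint's address. A minimal Python sketch; `endpoint_host_port` and `can_resolve` are hypothetical helper names, and a successful lookup here does not prove that the BE's own S3 client asks for the same host name.

```python
import socket
from urllib.parse import urlsplit


def endpoint_host_port(endpoint: str):
    """Split an endpoint such as 'http://minio.example:9000' into (host, port)."""
    parts = urlsplit(endpoint if "://" in endpoint else "http://" + endpoint)
    default = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default


def can_resolve(host: str, port: int) -> bool:
    """Return True if this machine's resolver (/etc/hosts, DNS, ...) yields an address."""
    try:
        return bool(socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP))
    except socket.gaierror:
        return False
```

For example, run `can_resolve(*endpoint_host_port("http://192.168.60.155:28933"))` on every BE node. A numeric IP always passes this check without any DNS lookup, so if it passes while the BE still reports curlCode 6, the BE is probably looking up a different host name than the one in the endpoint.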
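One possible explanation, not confirmed anywhere in this thread: the FE listed the repository fine (see the fe.log line in the first post) while the BE's curl-based client failed, which would fit the two clients building different URLs. S3 clients that use virtual-hosted-style addressing put the bucket name in front of the endpoint host, so an IP endpoint such as `192.168.60.155:28933` turns into the host name `srbackrepo.192.168.60.155`, which normally will not resolve. Path-style addressing leaves the host unchanged. This Python sketch only shows how the two URL forms differ; `s3_object_url` is a hypothetical helper, not part of StarRocks or any AWS SDK. If this is the cause, look for a path-style option among your version's S3 repository properties (later StarRocks docs list `aws.s3.enable_path_style_access`; whether 2.5.6 honors it for repositories is not verified here).

```python
from urllib.parse import urlsplit


def s3_object_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    """Build the request URL an S3 client would use for bucket/key.

    Virtual-hosted style: http://<bucket>.<host>/<key>  (bucket becomes part of the host)
    Path style:           http://<host>/<bucket>/<key>  (host stays as configured)
    """
    parts = urlsplit(endpoint)
    if path_style:
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"
```

With the endpoint from this thread, the virtual-hosted form has the host `srbackrepo.192.168.60.155`, which is a name rather than an IP and therefore needs a DNS or `/etc/hosts` entry.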

On a StarRocks machine, use a MinIO client to access this path and see what happens: s3a://srbackrepo/__starrocks_repository_srbackrepo/__ss_snapshot24010401/__ss_content/__db_10005/__tbl_46103/__part_46069/__idx_46104/__47105

[root@master192 ~]# rclone ls -vv ops://srbackrepo/__starrocks_repository_srbackrepo/__ss_snapshot24010401/__ss_content/__db_10005/__tbl_46103/__part_46069/__idx_46104/__47105
2024/01/24 17:11:52 DEBUG : rclone: Version "v1.65.1" starting with parameters ["rclone" "ls" "-vv" "ops://srbackrepo/__starrocks_repository_srbackrepo/__ss_snapshot24010401/__ss_content/__db_10005/__tbl_46103/__part_46069/__idx_46104/__47105"]
2024/01/24 17:11:52 DEBUG : Creating backend with remote "ops://srbackrepo/__starrocks_repository_srbackrepo/__ss_snapshot24010401/__ss_content/__db_10005/__tbl_46103/__part_46069/__idx_46104/__47105"
2024/01/24 17:11:52 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
2024/01/24 17:11:52 DEBUG : Resolving service "s3" region "us-east-1"
2024/01/24 17:11:52 DEBUG : fs cache: renaming cache item "ops://srbackrepo/__starrocks_repository_srbackrepo/__ss_snapshot24010401/__ss_content/__db_10005/__tbl_46103/__part_46069/__idx_46104/__47105" to be canonical "ops:srbackrepo/__starrocks_repository_srbackrepo/__ss_snapshot24010401/__ss_content/__db_10005/__tbl_46103/__part_46069/__idx_46104/__47105"
2024/01/24 17:11:52 DEBUG : 6 go routines active

rclone reports no error, but it returns no file information either.

[root@master192 ~]# rclone lsd ops:
-1 2024-01-23 17:50:42 -1 srbackrepo
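An empty listing of the exact per-tablet path is consistent with that upload never having written anything. Walking up the path one level at a time shows how far the repository layout actually got in MinIO. A Python sketch; the `ops` remote name is taken from the rclone session above, and both helpers are hypothetical names.

```python
def parent_prefixes(path: str):
    """Yield the path itself and each parent prefix up to the bucket, longest first."""
    scheme, rest = path.split("://", 1)
    parts = rest.rstrip("/").split("/")
    for n in range(len(parts), 0, -1):
        yield f"{scheme}://{'/'.join(parts[:n])}"


def to_rclone(path: str, remote: str = "ops") -> str:
    """Map s3a://bucket/key to rclone's '<remote>:bucket/key' form."""
    return f"{remote}:{path.split('://', 1)[1]}"
```

For example, `for p in parent_prefixes(failing_path): print("rclone lsf", to_rclone(p))` prints one `rclone lsf` command per level; the first level that lists something is the deepest one that exists.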

Could you share the redacted SQL you used for CREATE REPOSITORY?

When creating the repository, switch to the property format below, and turn SSL off:
aws.s3.access_key=xxx
aws.s3.secret_key=xxxx
aws.s3.enable_ssl=false
aws.s3.endpoint='xxxx'
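The old `fs.s3a.*` keys map one-to-one onto the suggested `aws.s3.*` keys. A Python sketch that rewrites them and renders the statement with plain ASCII quotes (curly quotes pasted from a chat window are not valid SQL string delimiters). Deriving `enable_ssl` from the endpoint scheme is an assumption of this sketch, and `to_aws_props` and `create_repo_sql` are hypothetical helper names.

```python
# Old Hadoop-style keys -> the aws.s3.* keys suggested above.
KEY_MAP = {
    "fs.s3a.access.key": "aws.s3.access_key",
    "fs.s3a.secret.key": "aws.s3.secret_key",
    "fs.s3a.endpoint": "aws.s3.endpoint",
}


def to_aws_props(old: dict) -> dict:
    """Rename known keys; unknown keys are dropped."""
    new = {KEY_MAP[k]: v for k, v in old.items() if k in KEY_MAP}
    # Assumption: turn SSL off only when the endpoint is plain http://.
    endpoint = new.get("aws.s3.endpoint", "")
    new["aws.s3.enable_ssl"] = "false" if endpoint.startswith("http://") else "true"
    return new


def create_repo_sql(name: str, location: str, props: dict) -> str:
    """Render a CREATE REPOSITORY statement using straight double quotes."""
    for v in props.values():
        if '"' in v:
            raise ValueError("property values must not contain double quotes")
    body = ",\n  ".join(f'"{k}" = "{v}"' for k, v in props.items())
    return (f"CREATE REPOSITORY {name}\nWITH BROKER\n"
            f'ON LOCATION "{location}"\nPROPERTIES (\n  {body}\n);')
```

Feeding in the three `fs.s3a.*` properties from the statement that follows produces the same shape as the `srbackrepo2` statement further down, with `"aws.s3.enable_ssl" = "false"` added because the endpoint is `http://`.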

create REPOSITORY srbackrepo WITH BROKER ON LOCATION "s3a://srbackrepo" PROPERTIES ("fs.s3a.access.key"="minio","fs.s3a.secret.key"="!xxxx","fs.s3a.endpoint"="http://192.168.60.155:28933");

mysql> create REPOSITORY srbackrepo2 WITH BROKER ON LOCATION "s3a://srbackrepo" PROPERTIES ("aws.s3.access_key"="minio","aws.s3.secret_key"="xxxx","aws.s3.endpoint"="http://minio.ops.svc:9000","aws.s3.enable_ssl"="false");

ERROR 1064 (HY000): Unexpected exception: Failed to create repository: failed to list remote path: s3a://srbackrepo/__starrocks_repository_srbackrepo2/__repo_info. msg: unknown error when get file status: Connection pool shut down

The error has changed: it is no longer a host name resolution failure. Check the stack trace for this one in fe.log.
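To grab that stack trace, split fe.log into records and keep the ones that mention the error. A Python sketch, assuming each fe.log record starts with a `YYYY-MM-DD hh:mm:ss` timestamp and that continuation lines (the exception class line, `at ...`, `Caused by: ...`) do not; `records_with` is a hypothetical helper name.

```python
import re

# Assumption: a new fe.log record begins with a "YYYY-MM-DD hh:mm:ss" timestamp.
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def records_with(lines, needle):
    """Group lines into log records and return the records whose text contains needle."""
    records, current = [], []
    for line in lines:
        if RECORD_START.match(line) and current:
            records.append(current)  # previous record is complete
            current = []
        current.append(line.rstrip("\n"))
    if current:
        records.append(current)
    joined = ["\n".join(r) for r in records]
    return [r for r in joined if needle in r]
```

For example, open fe.log with `encoding="utf-8", errors="replace"` and print `"\n\n".join(records_with(f, "Connection pool shut down"))` to get each matching record together with its full stack trace.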