HDFS备份后恢复失败

【详述】创建HDFS类型的Repository,并执行简单的表备份和恢复操作
【背景】

  1. 创建hdfs Repository
  2. backup snapshot 2张表
  3. 执行 restore snapshot 其中的一张表。
    【业务影响】
    测试中,暂无生产业务
    【StarRocks版本】3.1.2
MySQL [(none)]> select current_version();
+-------------------+
| current_version() |
+-------------------+
| 3.1.2-4f3a2ee     |
+-------------------+
1 row in set (0.01 sec)

MySQL [(none)]> 

【集群规模】3fe+3be(fe与be混部)
【机器信息】CPU虚拟核/内存/网卡 48C/128G/万兆
【联系方式】社区群16-段方伟-Daniel Duan 邮箱 duanfangwei2012@sina.com
【附件】

创建 Repository

CREATE   REPOSITORY hz_hdfs_repo
WITH BROKER
ON LOCATION "hdfs://nameservice1/starrocks_repo/"
PROPERTIES (
"username"="hive",
"password"="xxxxxxxxx"
);

备份

BACKUP SNAPSHOT test.snap_20231107
TO hz_hdfs_repo
ON (colocate_tb1,colocate_tb2)

表结构

REATE TABLE `colocate_tb1` (
    `data_time` date NOT NULL COMMENT "",
    `order_id` int(11) NOT NULL COMMENT "",
    `price` int(11) SUM NOT NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`data_time`, `order_id`)
PARTITION BY RANGE(`data_time`)
(
    PARTITION p1 VALUES LESS THAN ('2019-05-31'),
    PARTITION p2 VALUES LESS THAN ('2019-06-30')
)
DISTRIBUTED BY HASH(`order_id`) BUCKETS 32
PROPERTIES (
    "colocate_with" = "test_group",
    "replication_num" = "3",
    "enable_persistent_index" = "true",
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-3",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "32",
    "dynamic_partition.history_partition_num" = "0"
);

CREATE TABLE `colocate_tb2` (
    `data_time` date NOT NULL COMMENT "",
    `order_id` int(11) NOT NULL COMMENT "",
    `order_count` int(11) SUM NOT NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`data_time`, `order_id`)
PARTITION BY RANGE(`data_time`)
(
    PARTITION p1 VALUES LESS THAN ('2019-05-31'),
    PARTITION p2 VALUES LESS THAN ('2019-06-30')
)
DISTRIBUTED BY HASH(`order_id`)  BUCKETS 32
PROPERTIES (
    "colocate_with" = "test_group",
    "replication_num" = "3",
    "enable_persistent_index" = "true",
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-3",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "32",
    "dynamic_partition.history_partition_num" = "0"
);

清空表

truncate table colocate_tb1

查看 快照

MySQL [(none)]> SHOW SNAPSHOT ON hz_hdfs_repo;
+---------------+-------------------------+--------+
| Snapshot      | Timestamp               | Status |
+---------------+-------------------------+--------+
| snap_20231107 | 2023-11-07-15-04-24-494 | OK     |
+---------------+-------------------------+--------+
1 row in set (0.02 sec)

MySQL [(none)]> 

Restore

RESTORE SNAPSHOT test.snap_20231107
FROM hz_hdfs_repo
ON (colocate_tb1)
PROPERTIES (
    "backup_timestamp"="2023-11-07-15-04-24-494",
    "replication_num" = "3"
);

查看进度

show restore\G;

-- 部分错误日志
 [27574: hdfsOpenFile failed, file=hdfs://nameservice1/starrocks_repo//__starrocks_repository_hz_hdfs_repo/__ss_snap_20231107/__ss_content/__db_10227/__tbl_23679/__part_25508/__idx_23680/__25549/5549.dcgs_snapshot.d8a7e7f14979e81f7a58077fa19a147e], [27575: hdfsOpenFile failed, file=hdfs://nameservice1/starrocks_repo//__starrocks_repository_hz_hdfs_repo/__ss_snap_20231107/__ss_content/__db_10227/__tbl_23679/__part_25508/__idx_23680/__25613/5613.dcgs_snapshot.d8a7e7f14979e81f7a58077fa19a147e], [27576: hdfsOpenFile failed, file=hdfs://nameservice1/starrocks_repo//__starrocks_repository_hz_hdfs_repo/__ss_snap_20231107/__ss_content/__db_10227/__tbl_23679/__part_25508/__idx_23680/__25581/5581.dcgs_snapshot.d8a7e7f14979e81f7a58077fa19a147e

比如最后一个错误文件是 5581.dcgs_snapshot.d8a7e7f14979e81f7a58077fa19a147e
而hdfs对应文件夹的文件是 25581.dcgs_snapshot.d8a7e7f14979e81f7a58077fa19a147e

好的,我们先复现一下这个问题

@DanielDuan 可以把仓库路径最后的“/”去掉再试试看,应该是这块路径解析有一些问题

去掉路径最后的“/” 重新测试后,可以恢复。