To help us locate your problem faster, please provide the following information. Thank you.
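[Details] We have two StarRocks clusters and need to migrate data from the production cluster to the test cluster. Because there is no network connectivity between them, the only option is to back up the production data to HDFS first and then restore it from HDFS on the test cluster. I followed the official documentation: https://docs.starrocks.io/zh/docs/administration/management/Backup_and_restore/ . The BACKUP command successfully backs up the production data to HDFS, and I then run the RESTORE command to restore it to the test environment. However, after the restore job is submitted it stays stuck in the commit phase, and the BE log keeps reporting errors; the error messages are in the attachment below.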
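[Background] The production and test environments run the same StarRocks version, 3.3.5, and the table schemas are identical.
[Business impact] None
[Shared-data (storage-compute separation)?] No, shared-nothing
[StarRocks version] 3.3.5
[Cluster size] Production: 3 FE + 9 BE (all physical machines, deployed separately); test: 3 physical machines (FE and BE co-located)
[Machine specs] 48 cores / 64 GB RAM / 10 GbE
[Table model] Duplicate Key table
[Import or export method] Backup and restore
[Contact]
[Attachments]
Part of the be.log output follows: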
I20241218 14:51:59.078144 140155886192384 agent_server.cpp:503] Submit task success. type=MOVE, signature=150078, task_count_in_queue=211
I20241218 14:51:59.078146 140155886192384 agent_server.cpp:503] Submit task success. type=MOVE, signature=150079, task_count_in_queue=212
I20241218 14:51:59.078148 140155886192384 agent_server.cpp:503] Submit task success. type=MOVE, signature=150081, task_count_in_queue=213
I20241218 14:51:59.082742 139576068798208 snapshot_manager.cpp:784] Replacing rowset id 0200000000e9f2bb2b4fade09f1df283bdaa115d733248a8 with 020000001044683acd4bcfac18f10c9504bb753e65e419b1
I20241218 14:51:59.082787 139576068798208 snapshot_manager.cpp:784] Replacing rowset id 0200000000eaa58c2b4fade09f1df283bdaa115d733248a8 with 020000001044683bcd4bcfac18f10c9504bb753e65e419b1
I20241218 14:51:59.083332 139576068798208 tablet_updates.cpp:4750] load full snapshot start #rowset:2 version:18 tablet:148181 #version:1 [1 1@0 1] pending: rowsets:0
W20241218 14:51:59.083420 139576068798208 agent_task.cpp:796] Fail to move job id=147192, mismatched schema hash
W20241218 14:51:59.083434 139576068798208 agent_task.cpp:817] Fail to move dir=/opt/starrocks/starrocks_3.3.5/be/storage/snapshot/20241218145122.225.86400/148181/60140210 tablet id=148181 signature=149634 job id=147192
I20241218 14:51:59.087112 139576068798208 agent_task.cpp:157] Remove task success. type=MOVE, signature=149634, task_count_in_queue=212
I20241218 14:51:59.087130 139576068798208 agent_task.cpp:806] Got move dir task signature=149635 job id=147192
I20241218 14:51:59.087136 139576068798208 snapshot_loader.cpp:444] begin to move snapshot files. from: /opt/starrocks/starrocks_3.3.5/be/storage/snapshot/20241218145122.230.86400/147855/60140210, to: /opt/starrocks/starrocks_3.3.5/be/storage/data/427/147855/60140210, store: /opt/starrocks/starrocks_3.3.5/be/storage, job: 147192, task id: 147855
I20241218 14:51:59.087297 139576068798208 snapshot_manager.cpp:784] Replacing rowset id 02000000009a356f734ef578daea932662f3ea1c05a4dc80 with 020000001044683ccd4bcfac18f10c9504bb753e65e419b1
I20241218 14:51:59.087318 139576068798208 snapshot_manager.cpp:784] Replacing rowset id 0200000000ea7669734ef578daea932662f3ea1c05a4dc80 with 020000001044683dcd4bcfac18f10c9504bb753e65e419b1
I20241218 14:51:59.087327 139576068798208 snapshot_manager.cpp:784] Replacing rowset id 0200000000eaed7a734ef578daea932662f3ea1c05a4dc80 with 020000001044683ecd4bcfac18f10c9504bb753e65e419b1
I20241218 14:51:59.111161 140156812031744 stream_load.cpp:243] new income streaming load request.id=6242328cb7b9e681-7820706af4bb888c, job_id=-1, txn_id: -1, label=f2f53ae8-df64-4e46-9e4e-cc8113d5e61a, db=dsg_fdcas, db=dsg_fdcas, tbl=ees_fd_uva_hist
I20241218 14:51:59.119480 140156812031744 stream_load_executor.cpp:77] begin to execute job. label=f2f53ae8-df64-4e46-9e4e-cc8113d5e61a, txn_id: 1042530, query_id=6242328c-b7b9-e681-7820-706af4bb888c
I20241218 14:51:59.119523 140156812031744 plan_fragment_executor.cpp:83] Prepare(): query_id=6242328c-b7b9-e681-7820-706af4bb888c fragment_instance_id=6242328c-b7b9-e681-7820-706af4bb888d backend_num=0
I20241218 14:51:59.123286 140171456026368 plan_fragment_executor.cpp:192] Open(): fragment_instance_id=6242328c-b7b9-e681-7820-706af4bb888d
I20241218 14:51:59.131667 139576068798208 tablet_updates.cpp:4750] load full snapshot start #rowset:3 version:16 tablet:147855 #version:1 [1 1@0 1] pending: rowsets:0
W20241218 14:51:59.131739 139576068798208 agent_task.cpp:796] Fail to move job id=147192, mismatched schema hash
W20241218 14:51:59.131748 139576068798208 agent_task.cpp:817] Fail to move dir=/opt/starrocks/starrocks_3.3.5/be/storage/snapshot/20241218145122.230.86400/147855/60140210 tablet id=147855 signature=149635 job id=147192
I20241218 14:51:59.131915 139576068798208 agent_task.cpp:157] Remove task success. type=MOVE, signature=149635, task_count_in_queue=211
I20241218 14:51:59.131928 139576068798208 agent_task.cpp:806] Got move dir task signature=149636 job id=147192
I20241218 14:51:59.131932 139576068798208 snapshot_loader.cpp:444] begin to move snapshot files. from: /opt/starrocks/starrocks_3.3.5/be/storage/snapshot/20241218145122.229.86400/148018/60140210, to: /opt/starrocks/starrocks_3.3.5/be/storage/data/454/148018/60140210, store: /opt/starrocks/starrocks_3.3.5/be/storage, job: 147192, task id: 148018
I20241218 14:51:59.132052 139576068798208 snapshot_manager.cpp:784] Replacing rowset id 02000000009aac42734ef578daea932662f3ea1c05a4dc80 with 0200000010446c52cd4bcfac18f10c9504bb753e65e419b1
I20241218 14:51:59.132069 139576068798208 snapshot_manager.cpp:784] Replacing rowset id 0200000000eaed56734ef578daea932662f3ea1c05a4dc80 with 0200000010446c55cd4bcfac18f10c9504bb753e65e419b1
I20241218 14:51:59.132481 140157253773056 local_tablets_channel.cpp:725] LocalTabletsChannel txn_id: 1042530 load_id: 6242328c-b7b9-e681-7820-706af4bb888c open 1091 delta writers, 0 failed_tablets: _num_remaining_senders: 1
I20241218 14:51:59.176252 139576068798208 tablet_updates.cpp:4750] load full snapshot start #rowset:2 version:5 tablet:148018 #version:1 [1 1@0 1] pending: rowsets:0
W20241218 14:51:59.176310 139576068798208 agent_task.cpp:796] Fail to move job id=147192, mismatched schema hash
W20241218 14:51:59.176319 139576068798208 agent_task.cpp:817] Fail to move dir=/opt/starrocks/starrocks_3.3.5/be/storage/snapshot/20241218145122.229.86400/148018/60140210 tablet id=148018 signature=149636 job id=147192
I20241218 14:51:59.176479 139576068798208 agent_task.cpp:157] Remove task success. type=MOVE, signature=149636, task_count_in_queue=210
I20241218 14:51:59.176492 139576068798208 agent_task.cpp:806] Got move dir task signature=149637 job id=147192
I20241218 14:51:59.176496 139576068798208 snapshot_loader.cpp:444] begin to move snapshot files. from: /opt/starrocks/starrocks_3.3.5/be/storage/snapshot/20241218145122.226.86400/148248/60140210, to: /opt/starrocks/starrocks_3.3.5/be/storage/data/492/148248/60140210, store: /opt/starrocks/starrocks_3.3.5/be/storage, job: 147192, task id: 148248
I20241218 14:51:59.176636 139576068798208 snapshot_manager.cpp:784] Replacing rowset id 020000000099bf852b4fade09f1df283bdaa115d733248a8 with 0200000010446c84cd4bcfac18f10c9504bb753e65e419b1
I20241218 14:51:59.176673 139576068798208 snapshot_manager.cpp:784] Replacing rowset id 0200000000ea10702b4fade09f1df283bdaa115d733248a8 with 0200000010446c85cd4bcfac18f10c9504bb753e65e419b1
I20241218 14:51:59.176681 139576068798208 snapshot_manager.cpp:784] Replacing rowset id 0200000000eac31e2b4fade09f1df283bdaa115d733248a8 with 0200000010446c86cd4bcfac18f10c9504bb753e65e419b1
I20241218 14:51:59.176967 139576068798208 tablet_updates.cpp:4750] load full snapshot start #rowset:3 version:18 tablet:148248 #version:1 [1 1@0 1] pending: rowsets:0
W20241218 14:51:59.177019 139576068798208 agent_task.cpp:796] Fail to move job id=147192, mismatched schema hash
W20241218 14:51:59.177026 139576068798208 agent_task.cpp:817] Fail to move dir=/opt/starrocks/starrocks_3.3.5/be/storage/snapshot/20241218145122.226.86400/148248/60140210 tablet id=148248 signature=149637 job id=147192
I20241218 14:51:59.177223 139576068798208 agent_task.cpp:157] Remove task success. type=MOVE, signature=149637, task_count_in_queue=209
I20241218 14:51:59.177248 139576068798208 agent_task.cpp:806] Got move dir task signature=149638 job id=147192
I20241218 14:51:59.177252 139576068798208 snapshot_loader.cpp:444] begin to move snapshot files. from: /opt/starrocks/starrocks_3.3.5/be/storage/snapshot/20241218145122.232.86400/129994/60140210, to: /opt/starrocks/starrocks_3.3.5/be/storage/data/343/129994/60140210, store: /opt/starrocks/starrocks_3.3.5/be/storage, job: 147192, task id: 129994
I20241218 14:51:59.177377 139576068798208 snapshot_manager.cpp:784] Replacing rowset id 02000000009a02b1a9420e4e7397134cfa0b7779f1575c8e with 0200000010446c87cd4bcfac18f10c9504bb753e65e419b1
I20241218 14:51:59.177394 139576068798208 snapshot_manager.cpp:784] Replacing rowset id 0200000000ea582aa9420e4e7397134cfa0b7779f1575c8e with 0200000010446c88cd4bcfac18f10c9504bb753e65e419b1
I20241218 14:51:59.177403 139576068798208 snapshot_manager.cpp:784] Replacing rowset id 0200000000eaed56a9420e4e7397134cfa0b7779f1575c8e with 0200000010446c89cd4bcfac18f10c9504bb753e65e419b1
I20241218 14:51:59.177679 139576068798208 tablet_updates.cpp:4750] load full snapshot start #rowset:3 version:16 tablet:129994 #version:1 [1 1@0 1] pending: rowsets:0
W20241218 14:51:59.177729 139576068798208 agent_task.cpp:796] Fail to move job id=147192, mismatched schema hash
W20241218 14:51:59.177735 139576068798208 agent_task.cpp:817] Fail to move dir=/opt/starrocks/starrocks_3.3.5/be/storage/snapshot/20241218145122.232.86400/129994/60140210 tablet id=129994 signature=149638 job id=147192
I20241218 14:51:59.177941 139576068798208 agent_task.cpp:157] Remove task success. type=MOVE, signature=149638, task_count_in_queue=208
I20241218 14:51:59.177951 139576068798208 agent_task.cpp:806] Got move dir task signature=149639 job id=147192
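Since the excerpt above interleaves the failed MOVE tasks with unrelated stream load activity, a small script can help collect which tablets are affected. The following is a minimal sketch in Python, assuming the warning lines keep exactly the wording shown in this log; the function name `summarize_move_failures` and the dictionary keys are hypothetical, chosen only for illustration, and are not part of StarRocks.

```python
import re
from collections import Counter

# Patterns copied from the two warning lines in the log excerpt above.
# Assumption: other BE versions may word these messages differently.
MISMATCH_RE = re.compile(
    r"Fail to move job id=(?P<job>\d+), mismatched schema hash"
)
FAIL_DIR_RE = re.compile(
    r"Fail to move dir=(?P<dir>\S+) tablet id=(?P<tablet>\d+) "
    r"signature=(?P<sig>\d+) job id=(?P<job>\d+)"
)


def summarize_move_failures(lines):
    """Scan be.log lines and collect failed MOVE tasks.

    Returns a tuple (failures, mismatches):
      failures   - list of dicts, one per "Fail to move dir=..." line
      mismatches - Counter mapping job id -> number of
                   "mismatched schema hash" warnings
    """
    failures = []
    mismatches = Counter()
    for line in lines:
        m = MISMATCH_RE.search(line)
        if m:
            mismatches[int(m.group("job"))] += 1
            continue
        m = FAIL_DIR_RE.search(line)
        if m:
            failures.append(
                {
                    "job_id": int(m.group("job")),
                    "tablet_id": int(m.group("tablet")),
                    "signature": int(m.group("sig")),
                    "snapshot_dir": m.group("dir"),
                }
            )
    return failures, mismatches
```

To use it, open be.log and pass the file object to `summarize_move_failures`, then print the distinct `tablet_id` values. The resulting tablet list can be compared against the tables in the restore job to see whether every tablet fails or only those of particular tables.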
Had the tables to be restored already been created on the test cluster before the restore, or were they created by the RESTORE statement itself?