NullPointerException during data migration with the StarRocks cross-cluster data migration tool

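To help us locate your problem faster, please provide the following information. Thank you.
【Details】Detailed description of the problem
【Background】StarRocks data migration
【Business impact】
【Shared-data (storage-compute separation)】No
【StarRocks version】e.g. StarRocks 2.4.1 to 3.1.9
【Cluster size】e.g. 5 FE (3 follower + 2 observer) + 20 BE (FE and BE co-located)
【Machine info】CPU virtual cores/memory/NIC, e.g. 48C/64G/10 GbE
【Table model】e.g. Primary Key model
【Import or export method】e.g. cross-cluster data migration tool
【Attachments】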

  • fe.log/be.INFO/relevant screenshots
  • Full exception stack trace
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:54): config one_time_run_mode: true
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:55): config source_fe_host: 192.168.190.35
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:56): config target_fe_host: 192.168.190.203
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:57): config target_cluster_storage_volume:
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:58): config meta_job_interval_seconds: 180
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:59): config meta_job_threads: 4
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:60): config ddl_job_interval_seconds: 10
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:61): config ddl_job_batch_size: 10
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:62): config ddl_job_allow_drop_target_only: false
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:63): config ddl_job_allow_drop_schema_change_table: true
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:64): config ddl_job_allow_drop_inconsistent_partition: true
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:65): config replication_job_interval_seconds: 10
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:66): config replication_job_batch_size: 10
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:67): config report_interval_seconds: 300
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:68): config include databases: []
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:69): config include tables: []
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:70): config exclude databases: []
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:71): config exclude tables: []
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:72): config source cluster hosts: {}
    24/08/29 11:55:00 INFO [main] initConf(SyncJob.java:73): config target cluster hosts: {}
    24/08/29 11:55:00 INFO [replication-job-handler] lambda$start$2(SyncJob.java:106): replication handler started.
    24/08/29 11:55:00 INFO [ddl-handler] lambda$start$1(SyncJob.java:93): DDL handler started.
    24/08/29 11:55:00 INFO [meta-handler] lambda$start$0(SyncJob.java:78): meta handler started.
    24/08/29 11:55:00 INFO [sync-reporter] lambda$start$3(SyncJob.java:119): report handler started.
    24/08/29 11:55:00 INFO [sync-reporter] report(SyncJob.java:189): DDL job queue size: 0
    24/08/29 11:55:00 INFO [sync-reporter] report(SyncJob.java:191): Replication job queue size: 0
    24/08/29 11:55:00 INFO [sync-reporter] report(SyncJob.java:196): Total estimated size of data pending synchronization: 0
    24/08/29 11:55:00 INFO [source-cluster-meta-collector] lambda$updateClusterMeta$0(ClusterMetaKeeper.java:65): Source cluster metadata synchronization started.
    24/08/29 11:55:00 INFO [target-cluster-meta-collector] lambda$updateClusterMeta$1(ClusterMetaKeeper.java:77): Target cluster metadata synchronization started.
    24/08/29 11:55:00 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1188): Databases pending synchronization: []
    24/08/29 11:55:00 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1189): Databases that only exist in the target cluster: []
    24/08/29 11:55:00 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1224): Total source size: 0.000 B, total target size: 0.000 B, total diff: 0.000 B
    24/08/29 11:55:00 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1227): Total source running txn: 0, total source finished txn: 0, total target running txn: 0, total target finished txn: 0
    24/08/29 11:55:00 INFO [sync-reporter] report(SyncJob.java:257): Sync progress: 0.00%, total: 0, ddlPending: 0, ddlRunning: 0, jobPending: 0, sent: 0, jobRunning: 0, finished: 0, failed: 0, unknown: 0
    24/08/29 11:55:00 INFO [sync-reporter] report(SyncJob.java:279): Sync table progress: 0.00%, total table: 0, finished table: 0, unfinished table: 0, unfinished detail: [].
    24/08/29 11:55:01 INFO [source-cluster-meta-collector] lambda$updateClusterBasicMetaInfo$6(ClusterMetaKeeper.java:519): Get cluster 192.168.190.35 run mode: SHARED_NOTHING.
    24/08/29 11:55:01 INFO [target-cluster-meta-collector] lambda$updateClusterBasicMetaInfo$6(ClusterMetaKeeper.java:519): Get cluster 192.168.190.203 run mode: SHARED_NOTHING.
    24/08/29 12:26:45 ERROR [sync-reporter] lambda$start$3(SyncJob.java:125): Failed to report sync progress
    java.lang.NullPointerException: null
    24/08/29 12:26:45 INFO [sync-reporter] report(SyncJob.java:189): DDL job queue size: 0
    24/08/29 12:26:45 INFO [sync-reporter] report(SyncJob.java:191): Replication job queue size: 223
    24/08/29 12:26:45 INFO [sync-reporter] report(SyncJob.java:196): Total estimated size of data pending synchronization: 22276759029670
    24/08/29 12:26:45 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1188): Databases pending synchronization: []
    24/08/29 12:26:45 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1189): Databases that only exist in the target cluster: []
    24/08/29 12:26:45 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1208): Database bds, source size: 0.000 B, target size: 0.000 B, diff: 0.000 B
    24/08/29 12:26:45 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1220): Database bds , source running txn: 0, source finished txn: 0, target running txn: 0, target finished txn: 0
    24/08/29 12:26:45 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1208): Database iot, source size: 24.606 TB, target size: 635.781 GB, diff: 23.985 TB
    24/08/29 12:26:45 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1220): Database iot , source running txn: 0, source finished txn: 0, target running txn: 14, target finished txn: 1002
    24/08/29 12:26:45 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1224): Total source size: 24.606 TB, total target size: 635.781 GB, total diff: 23.985 TB
    24/08/29 12:26:45 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1227): Total source running txn: 0, total source finished txn: 0, total target running txn: 14, total target finished txn: 1002
    24/08/29 12:26:45 ERROR [sync-reporter] lambda$start$3(SyncJob.java:125): Failed to report sync progress
    java.lang.NullPointerException: null
    24/08/29 12:26:45 INFO [sync-reporter] report(SyncJob.java:189): DDL job queue size: 0
    24/08/29 12:26:45 INFO [sync-reporter] report(SyncJob.java:191): Replication job queue size: 223
    24/08/29 12:26:45 INFO [sync-reporter] report(SyncJob.java:196): Total estimated size of data pending synchronization: 22276759029670
    24/08/29 12:26:45 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1188): Databases pending synchronization: []
    24/08/29 12:26:45 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1189): Databases that only exist in the target cluster: []
    24/08/29 12:26:45 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1208): Database bds, source size: 0.000 B, target size: 0.000 B, diff: 0.000 B
    24/08/29 12:26:45 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1220): Database bds , source running txn: 0, source finished txn: 0, target running txn: 0, target finished txn: 0
    24/08/29 12:26:45 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1208): Database iot, source size: 24.606 TB, target size: 635.781 GB, diff: 23.985 TB
    24/08/29 12:26:45 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1220): Database iot , source running txn: 0, source finished txn: 0, target running txn: 14, target finished txn: 1002
    24/08/29 12:26:45 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1224): Total source size: 24.606 TB, total target size: 635.781 GB, total diff: 23.985 TB
    24/08/29 12:26:45 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1227): Total source running txn: 0, total source finished txn: 0, total target running txn: 14, total target finished txn: 1002
    24/08/29 12:26:45 ERROR [sync-reporter] lambda$start$3(SyncJob.java:125): Failed to report sync progress
    java.lang.NullPointerException: null

Try downloading the latest version of the migration tool again: wget https://releases.starrocks.io/starrocks/starrocks-cluster-sync.tar.gz

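I used the migration tool to migrate from 3.1.11 to 3.1.11. The table DDL was synchronized successfully, but the data of some tables was not migrated.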

24/11/01 16:54:47 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1704): Total source size: 1.007 TB, total target size: 81.968 GB, total diff: 949.340 GB
24/11/01 16:54:47 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1707): Total source running txn: 0, total source finished txn: 0, total target running txn: 0, total target finished txn: 72
24/11/01 16:54:47 INFO [sync-reporter] report(SyncJob.java:382): Sync job progress: 0.40%, total: 17930, ddlPending: 17857, ddlRunning: 1, jobPending: 0, sent: 0, jobRunning: 0, finished: 72, failed: 0, sent_failed: 0, unknown: 0
24/11/01 16:54:47 INFO [sync-reporter] report(SyncJob.java:415): Running table detail: []
24/11/01 16:54:47 INFO [sync-reporter] report(SyncJob.java:441): Sync table progress, finishedTableRatio: 100.00%, expiredTableRatio: 1.83%, total table: 109, finished table: 109, unfinished table: 0, expired table: 2.
24/11/01 16:54:47 INFO [sync-reporter] report(SyncJob.java:454): All tables have been synchronized, exit.

In the last few log lines you can see that the tool exited even though total diff: 949.340 GB of data had still not been synchronized.
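To make this kind of mismatch easier to spot, here is a minimal Python sketch that scans the tool's log output for the last "Total source size" and "Sync job progress" lines and flags a run that still has a size diff or pending DDL jobs. It is not part of the migration tool: the function names (`to_bytes`, `summarize`) are hypothetical, the regular expressions are derived only from the log lines quoted above, and the 1024-based unit conversion is an assumption.

```python
import re

# Patterns derived from the sync-reporter lines quoted in this post.
TOTAL_RE = re.compile(
    r"Total source size: (?P<src>[\d.]+) (?P<src_unit>\w+), "
    r"total target size: (?P<dst>[\d.]+) (?P<dst_unit>\w+), "
    r"total diff: (?P<diff>[\d.]+) (?P<diff_unit>\w+)"
)
JOB_RE = re.compile(
    r"Sync job progress: .*?total: (?P<total>\d+), ddlPending: (?P<ddl>\d+), "
    r".*?finished: (?P<finished>\d+), failed: (?P<failed>\d+)"
)

# Assumption: the tool prints binary (1024-based) units.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def to_bytes(value: str, unit: str) -> float:
    """Convert a printed size such as ('949.340', 'GB') to bytes."""
    return float(value) * UNITS[unit]


def summarize(log_text: str) -> dict:
    """Return the figures from the last progress report found in log_text."""
    result = {"diff_bytes": None, "total_jobs": None,
              "ddl_pending": None, "finished": None, "failed": None}
    for line in log_text.splitlines():
        m = TOTAL_RE.search(line)
        if m:
            result["diff_bytes"] = to_bytes(m["diff"], m["diff_unit"])
        m = JOB_RE.search(line)
        if m:
            result["total_jobs"] = int(m["total"])
            result["ddl_pending"] = int(m["ddl"])
            result["finished"] = int(m["finished"])
            result["failed"] = int(m["failed"])
    # Flag the run if any data or DDL work is still outstanding.
    result["incomplete"] = bool(result["diff_bytes"]) or bool(result["ddl_pending"])
    return result


# Two of the log lines quoted above, used as sample input.
SAMPLE = """\
24/11/01 16:54:47 INFO [sync-reporter] reportDbProgress(ClusterMetaKeeper.java:1704): Total source size: 1.007 TB, total target size: 81.968 GB, total diff: 949.340 GB
24/11/01 16:54:47 INFO [sync-reporter] report(SyncJob.java:382): Sync job progress: 0.40%, total: 17930, ddlPending: 17857, ddlRunning: 1, jobPending: 0, sent: 0, jobRunning: 0, finished: 72, failed: 0, sent_failed: 0, unknown: 0
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Run against the lines above, the sketch reports the run as incomplete, because the job progress line still shows ddlPending: 17857 and a non-zero total diff even though the table progress line says all tables are synchronized.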