starrocks同步starrocks外表数据

【详述】starrocks与starrocks做数据同步:
clusterA 为源同步到 clusterB
在A上新建一个db: tmp_ex__ 作为存放 extanel table的地方。
clusterB上目标表: tbl_a, 聚合模型、非分区表 ;tbl_b, 唯一模型、动态DAY分区
在tmp_ex__下创建外表: tbl_a_ex, tbl_b_ex
在clusterA上执行 insert into tmp_ex__.tbl_a_ex select * from other 报错:[1064][42000]: data cannot be inserted into table with empty partition。

尝试修改tbl_b_ex的分区表达模式,按照START (“2021-01-01”) END (“2023-01-06”) EVERY (INTERVAL 1 DAY) 模式创建表保证 tbl_b与tbl_b_ex分区一致。此时show create table tbl_b_ex是不会报错未知异常

【背景】

  1. 确认端口为fe rpc, AB网络通畅
  2. 确认tbl_a_ex properties中的host为 fe leader
    【业务影响】
    【StarRocks版本】2.3.3
    【集群规模】
    A : 3fe(1 follower+2observer)+5be
    A : 3fe(1 follower+2observer)+3be

tbl_b_ex使用分区表达式模式建表后。通过show create table 报错NPE

2023-01-03 19:51:50,353 WARN (starrocks-mysql-nio-pool-24133|232093) [StmtExecutor.execute():474] execute Exception, sql /* ApplicationName=DataGrip 2022.1.5 */ show create table tmp_ex_db__.cd_dm_label_user_after_sale_info_ex
java.lang.NullPointerException: null

CREATE EXTERNAL TABLE  `cd_dm_label_user_after_sale_info_ex` (
  `dt` date NULL COMMENT "分区字段",
  `user_id` varchar(255) NOT NULL ,
  `return_goods_num_1m` bigint(20) NOT NULL ,
  `return_goods_num_3m` bigint(20) NOT NULL ,
  `return_goods_num_6m` bigint(20) NOT NULL ,
  `return_goods_num_12m` bigint(20) NOT NULL,
  `complaint_num_1m` bigint(20) NOT NULL ,
  `complaint_num_3m` bigint(20) NOT NULL ,
  `complaint_num_6m` bigint(20) NOT NULL ,
  `complaint_num_12m` bigint(20) NOT NULL,
  `etl_load_time` datetime NULL
) ENGINE=olap
UNIQUE KEY(`dt`, `user_id`)
PARTITION BY RANGE (dt) (
START ("2021-01-01") END ("2023-01-06") EVERY (INTERVAL 1 DAY)
)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 3
PROPERTIES
(
    "host"="xxx",
    "port"="9020",
    "user"="xxx",
    "password"="XXXX",
    "database"="xxx",
    "table"="cd_dm_label_user_after_sale_info"
);

分区表不带分区表达式时:
在clusterA上操作:

CREATE EXTERNAL TABLE  tmp_ex_db__.`cd_dm_label_user_after_sale_info_ex` (
  `dt` date NULL COMMENT "分区字段",
  `user_id` varchar(255) NOT NULL,
  `return_goods_num_1m` bigint(20) NOT NULL ,
  `return_goods_num_3m` bigint(20) NOT NULL ,
  `return_goods_num_6m` bigint(20) NOT NULL ,
  `return_goods_num_12m` bigint(20) NOT NULL,
  `complaint_num_1m` bigint(20) NOT NULL ,
  `complaint_num_3m` bigint(20) NOT NULL ,
  `complaint_num_6m` bigint(20) NOT NULL ,
  `complaint_num_12m` bigint(20) NOT NULL,
  `etl_load_time` datetime NULL
) ENGINE=olap
UNIQUE KEY(`dt`, `user_id`)
/*PARTITION BY RANGE (dt) (
START ("2021-01-01") END ("2023-01-06") EVERY (INTERVAL 1 DAY)
)*/
DISTRIBUTED BY HASH(`user_id`) BUCKETS 3
PROPERTIES
(
    "host"="clusterB fe leader",
    "port"="9020",
    "user"="xx",
    "password"="xxx",
    "database"="xxx",
    "table"="cd_dm_label_user_after_sale_info"
);

insert into  tmp_ex_db__.cd_dm_label_user_after_sale_info_ex select * from labelx.cd_dm_label_user_after_sale_info;

另外一张聚合模型、非分区表报错信息与这个一样都是:
[42000][1064] data cannot be inserted into table with empty partition.Use SHOW PARTITIONS FROM cd_dm_label_user_after_sale_info_ex to see the currently partitions of this table.

@Natsume729

tbl_a和tbl_b的建表语句请发一下

非分区表

CREATE TABLE `entity_50_dictionary` (
  `code` varchar(1024) NOT NULL,
  `parent_key` varchar(1024) NOT NULL  ,
  `value` varchar(1024) NOT NULL,
  `ordinal` int(11) REPLACE NOT NULL,
  `alias` varchar(1024) REPLACE NULL L,
  `level` int(11) REPLACE NULLL,
  `update_time` datetime REPLACE NOT NULL
) ENGINE=OLAP
AGGREGATE KEY(`code`, `parent_key`, `value`)
DISTRIBUTED BY HASH(`code`, `parent_key`, `value`) BUCKETS 3
PROPERTIES (
"replication_num" = "3",
"in_memory" = "false",
"storage_format" = "DEFAULT",
"enable_persistent_index" = "false"
);

分区表

CREATE TABLE `cd_dm_label_user_after_sale_info` (
  `dt` date NULL ,
  `user_id` varchar(255) NOT NULL,
  `return_goods_num_1m` bigint(20) NOT NULL,
  `return_goods_num_3m` bigint(20) NOT NULL,
  `return_goods_num_6m` bigint(20) NOT NULL,
  `return_goods_num_12m` bigint(20) NOT NULL,
  `complaint_num_1m` bigint(20) NOT NULL,
  `complaint_num_3m` bigint(20) NOT NULL,
  `complaint_num_6m` bigint(20) NOT NULL,
  `complaint_num_12m` bigint(20) NOT NULL,
  `etl_load_time` datetime NULL COMMENT
) ENGINE=OLAP 
UNIQUE KEY(`dt`, `user_id`)
PARTITION BY RANGE(`dt`)
(
    START ("2021-01-01") END ("2023-01-06") EVERY (INTERVAL 1 DAY)
)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 3 
PROPERTIES (
"replication_num" = "3",
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.time_zone" = "Asia/Shanghai",
"dynamic_partition.start" = "-2147483648",
"dynamic_partition.end" = "2",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "16",
"in_memory" = "false",
"storage_format" = "DEFAULT",
"enable_persistent_index" = "false"
);

您两个sr集群版本是一致的吗?

同一个版本,2.3.3

有看过FE日志吗?里有什么异常吗?

按照你发的建表语句,我本地测试了下,没有遇到"data cannot be inserted into table with empty partition"这个问题,可以发一下当时外表所在的FE日志吗?

@dongquan @wupan @U_1660533790308_5109 SR跨集群外表分区表,在目标是不是不能指定啊,不指定,show create 无异常
像这种:
PARTITION BY RANGE(data_day)
()
跟原表定义一样的话,实测就不行