为了更快的定位您的问题,请提供以下信息,谢谢
【详述】StarRocks 导出 CSV 格式异常
【背景】将 StarRocks 数据导出为 CSV 至 S3
【业务影响】无
【是否存算分离】否
【StarRocks版本】3.3.4
【connector版本】无
【集群规模】例如:1fe+3be
【机器信息】CPU虚拟核/内存/网卡,例如:8C/32G/万兆
【联系方式】社区群19-枫
【附件】
目标
我想将 StarRocks 的数据从 StarRocks 导入至 PostgrsSql 中,因此我执行了以下几步:
-
将 StarRocks 的数据以 CSV 格式,导出至 S3 上,这里有两种方案:
-
使用 AWS PostgresSql 将 S3 上面的 CSV 数据导入至数据库,方案如下:
基础数据
我的数据有几个 VARCHAR 字段,里面的数据存储了一些 JSON ,里面字符串中包含了 |
,,
和 #
等常见的分隔符。
数据表结构:
CREATE TABLE test_table
(
id INT NULL,
lrn VARCHAR(255) NULL,
awb VARCHAR(255) NULL,
tracking_number VARCHAR(255) NULL,
recipient VARCHAR(65533) NULL,
sender VARCHAR(65533) NULL,
currency VARCHAR(255) NULL,
item VARCHAR(65533) NULL,
container VARCHAR(255) NULL,
created_at DATETIME NULL,
updated_at DATETIME NULL,
ioss VARCHAR(255) NULL,
parcel_no VARCHAR(255) NULL,
vat_number VARCHAR(255) NULL,
origin_ioss VARCHAR(255) NULL,
update_ioss_at DATETIME NULL,
archived TINYINT(1) NULL,
transport_cost DECIMAL(9, 2) NULL
)
数据如下:
49321237,
FTLaaaaaaa256993,
235-111111934,
6G58638256993,
{"city": "| COI****THIER", "name": "an****HER", "address": "798 **** Mo", "country": "FR", "zipcode": "7***0", "address2": "", "cityCode": "73"},
{"city": "BEIJING", "name": "BEIJ *** LTD", "address": "R *** HENG", "country": "CN", "zipcode": "1***23", "address2": "SHI***R", "cityCode": "10"},
EUR,
{"count": 1, "value": 0.58, "hsCode": "6***9", "modrep": 3, "nbrart": 1, "nbrcol": 1, "weight": 0.187, "opedest": "O***ONNEL", "currency": "EUR", "describe": "CLOTHES", "nattrans": 11, "packagingType": ""},
FRFR1111111156,
2023-01-01 04:03:11.311006,
2023-01-01 04:03:11.311006,
\N,
\N,
\N,
\N,
\N,
false,
\N
EXPORT 方案
导出 SQL
EXPORT TABLE test_table
PARTITION ('p202212')
TO "s3://ftl-db-archive/starrocks/export/ods_parcel/s1/"
PROPERTIES
(
"column_separator"=","
)
WITH BROKER
(
"aws.s3.access_key" = "AKIaaaaaaaaaaaaSMKS",
"aws.s3.secret_key" = "TpiaaaaaaaaaaaaaaaxyCuDDaYZr",
"aws.s3.region" = "eu-west-1"
);
CSV 数据
49318060,FTL611111151839,157-1111113,6G58111111119,{"city": "BaaaaST", "name": "ARNaaaaRY", "address": "5 SaaaaN", "country": "FR", "zipcode": "21110", "address2": "", "cityCode": "29"},{"city": "SHaaaaHAI", "name": "SHANaaaaLTD", "address": "Pudoaaaat", "country": "CN", "zipcode": "211115", "address2": "", "cityCode": "20"},EUR,{"count": 1, "value": 24.89, "hsCode": "3111110", "modrep": 3, "nbrart": 1, "nbrcol": 1, "weight": 0.624, "opedest": "OCaaaaaNEL", "currency": "EUR", "describe": "Diaaaaent", "nattrans": 11, "packagingType": ""},YWFTL11111111103001,2022-12-31 10:47:50.911922,2022-12-31 10:47:50.911922,R8j5uaiaaaaaaaaaHBFw==,\N,\N,\N,\N,true,0.00
CSV 中的 JSON 数据包含了 ,
,应该使用 "
进行包裹:
"{""city"": ""BaaaaST"", ""name"": ""ARNaaaaRY""}"
INSERT INTO FILES 方案
sql
INSERT INTO
FILES(
"path" = "s3://ftl-db-archive/**/insert",
"format" = "csv",
"csv.column_separator"=",",
"csv.enclose"='"',
"csv.escape"="\\",
"compression" = "uncompressed",
"target_max_file_size" = "1024", -- 1KB
"aws.s3.access_key" = "AKIAaaaaaaaaSMKS",
"aws.s3.secret_key" = "TpiojLaaaaaaaaaaaauDDaYZr",
"aws.s3.region" = "eu-west-1"
)
SELECT * FROM test_table LIMIT 1000;
csv 数据
49318068,FTL61111541685,157-1111113,6G511111111185,{"city": "ME11C", "name": "CaaETTE", "address": "10aa rige", "country": "FR", "zipcode": "11110", "address2": "", "cityCode": "19"},{"city": "SH11AI", "name": "SHAN11SS CO.LTD", "address": "P11 ort", "country": "CN", "zipcode": "2115", "address2": "", "cityCode": "20"},EUR,{"count": 1, "value": 14.09, "hsCode": "8111", "modrep": 3, "nbrart": 1, "nbrcol": 1, "weight": 0.34, "opedest": "OCC111EL", "currency": "EUR", "describe": "Camping light", "nattrans": 11, "packagingType": ""},YW11113058,2022-12-31 10:47:50.911922,2022-12-31 10:47:50.911922,GUub04OA1111E51kG+p2Sg==,\N,\N,\N,\N,true,0.00
上面的 JSON 数据依旧显示不正确,所以,我该怎么做?