StarRocks 导出 CSV 格式异常

为了更快的定位您的问题,请提供以下信息,谢谢
【详述】StarRocks 导出 CSV 格式异常
【背景】将 StarRocks 数据导出为 CSV 至 S3
【业务影响】无
【是否存算分离】否
【StarRocks版本】3.3.4
【connector版本】无
【集群规模】例如:1fe+3be
【机器信息】CPU虚拟核/内存/网卡,例如:8C/32G/万兆
【联系方式】社区群19-枫
【附件】

目标

我想将 StarRocks 的数据从 StarRocks 导入至 PostgrsSql 中,因此我执行了以下几步:

  1. 将 StarRocks 的数据以 CSV 格式,导出至 S3 上,这里有两种方案:

    1. Export data using EXPORT | StarRocks
    2. Unload data using INSERT INTO FILES | StarRocks
  2. 使用 AWS PostgresSql 将 S3 上面的 CSV 数据导入至数据库,方案如下:

    1. Importing data from Amazon S3 to your RDS for PostgreSQL DB instance - Amazon Relational Database Service

基础数据

我的数据有几个 VARCHAR 字段,里面的数据存储了一些 JSON ,里面字符串中包含了 |,# 等常见的分隔符。

数据表结构:

CREATE TABLE test_table
(
    id              INT            NULL,
    lrn             VARCHAR(255)   NULL,
    awb             VARCHAR(255)   NULL,
    tracking_number VARCHAR(255)   NULL,
    recipient       VARCHAR(65533) NULL,
    sender          VARCHAR(65533) NULL,
    currency        VARCHAR(255)   NULL,
    item            VARCHAR(65533) NULL,
    container       VARCHAR(255)   NULL,
    created_at      DATETIME       NULL,
    updated_at      DATETIME       NULL,
    ioss            VARCHAR(255)   NULL,
    parcel_no       VARCHAR(255)   NULL,
    vat_number      VARCHAR(255)   NULL,
    origin_ioss     VARCHAR(255)   NULL,
    update_ioss_at  DATETIME       NULL,
    archived        TINYINT(1)     NULL,
    transport_cost  DECIMAL(9, 2)  NULL
)

数据如下:

49321237,
FTLaaaaaaa256993,
235-111111934,
6G58638256993,
{"city": "| COI****THIER", "name": "an****HER", "address": "798 **** Mo", "country": "FR", "zipcode": "7***0", "address2": "", "cityCode": "73"},
{"city": "BEIJING", "name": "BEIJ ***  LTD", "address": "R *** HENG", "country": "CN", "zipcode": "1***23", "address2": "SHI***R", "cityCode": "10"},
EUR,
{"count": 1, "value": 0.58, "hsCode": "6***9", "modrep": 3, "nbrart": 1, "nbrcol": 1, "weight": 0.187, "opedest": "O***ONNEL", "currency": "EUR", "describe": "CLOTHES", "nattrans": 11, "packagingType": ""},
FRFR1111111156,
2023-01-01 04:03:11.311006,
2023-01-01 04:03:11.311006,
\N,
\N,
\N,
\N,
\N,
false,
\N

EXPORT 方案

导出 SQL

EXPORT TABLE test_table
PARTITION ('p202212') 
TO "s3://ftl-db-archive/starrocks/export/ods_parcel/s1/"
PROPERTIES
(
    "column_separator"=","
) 
WITH BROKER
(
    "aws.s3.access_key" = "AKIaaaaaaaaaaaaSMKS",
    "aws.s3.secret_key" = "TpiaaaaaaaaaaaaaaaxyCuDDaYZr",
    "aws.s3.region" = "eu-west-1"
);

CSV 数据

49318060,FTL611111151839,157-1111113,6G58111111119,{"city": "BaaaaST", "name": "ARNaaaaRY", "address": "5 SaaaaN", "country": "FR", "zipcode": "21110", "address2": "", "cityCode": "29"},{"city": "SHaaaaHAI", "name": "SHANaaaaLTD", "address": "Pudoaaaat", "country": "CN", "zipcode": "211115", "address2": "", "cityCode": "20"},EUR,{"count": 1, "value": 24.89, "hsCode": "3111110", "modrep": 3, "nbrart": 1, "nbrcol": 1, "weight": 0.624, "opedest": "OCaaaaaNEL", "currency": "EUR", "describe": "Diaaaaent", "nattrans": 11, "packagingType": ""},YWFTL11111111103001,2022-12-31 10:47:50.911922,2022-12-31 10:47:50.911922,R8j5uaiaaaaaaaaaHBFw==,\N,\N,\N,\N,true,0.00

CSV 中的 JSON 数据包含了 , ,应该使用 " 进行包裹:

"{""city"": ""BaaaaST"", ""name"": ""ARNaaaaRY""}"

INSERT INTO FILES 方案

sql

INSERT INTO 
FILES(
    "path" = "s3://ftl-db-archive/**/insert",
    "format" = "csv",
    "csv.column_separator"=",",
    "csv.enclose"='"',
    "csv.escape"="\\",
    "compression" = "uncompressed",
    "target_max_file_size" = "1024", -- 1KB
    "aws.s3.access_key" = "AKIAaaaaaaaaSMKS",
    "aws.s3.secret_key" = "TpiojLaaaaaaaaaaaauDDaYZr",
    "aws.s3.region" = "eu-west-1"
)
SELECT * FROM test_table LIMIT 1000;

csv 数据

49318068,FTL61111541685,157-1111113,6G511111111185,{"city": "ME11C", "name": "CaaETTE", "address": "10aa rige", "country": "FR", "zipcode": "11110", "address2": "", "cityCode": "19"},{"city": "SH11AI", "name": "SHAN11SS CO.LTD", "address": "P11 ort", "country": "CN", "zipcode": "2115", "address2": "", "cityCode": "20"},EUR,{"count": 1, "value": 14.09, "hsCode": "8111", "modrep": 3, "nbrart": 1, "nbrcol": 1, "weight": 0.34, "opedest": "OCC111EL", "currency": "EUR", "describe": "Camping light", "nattrans": 11, "packagingType": ""},YW11113058,2022-12-31 10:47:50.911922,2022-12-31 10:47:50.911922,GUub04OA1111E51kG+p2Sg==,\N,\N,\N,\N,true,0.00

上面的 JSON 数据依旧显示不正确,所以,我该怎么做?