Syncing two StarRocks clusters with DataX when the tables contain JSON columns

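To help locate your problem faster, please provide the following information. Thank you.
[Details] I am using DataX to sync data between two StarRocks clusters. The table contains JSON-typed columns that cannot be replaced with VARCHAR, and the JSON values contain many special characters. I want to sync the data in JSON format, but the data format differs before and after the sync.
[Background] Using DataX to sync data between two StarRocks clusters
[Business impact] The changed data format would affect production
[Shared-data (storage-compute separation)?] No, shared-nothing (storage and compute integrated)
[StarRocks version] 3.2.10 to 4.0
[Cluster size] e.g.: 3 FE (2 followers) + 6 BE (FE and BE deployed separately)
[Machine info] vCPUs / memory / NIC: 64C / 256G / 10 GbE
[Table model] Primary Key table
[Import or export method] DataX, using JSON format
[Contact] 2422203515@qq.com
[Attachments]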

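1. Because the table being synced contains a large number of special characters, CSV is not my first choice; I would prefer to use JSON format.
2. With CSV format, after the sync has been running for a while, a "label already exists" error appears and a BE node goes down.
The log is as follows: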
[INFO] 2026-05-26 06:46:21.311 +0000 - -> 2026-05-26 14:46:20.373 [Thread-4] INFO StarRocksStreamLoadVisitor - Executing stream load to: 'http://10.10.96.3:8030/api/dws/dws_paten_opma_patent_detail_colums/_stream_load', size: '94401353'
2026-05-26 14:46:21.115 [Thread-4] WARN StarRocksWriterManager - Failed to flush batch data to StarRocks, retry times = 0
com.starrocks.shade.org.apache.http.NoHttpResponseException: 10.10.96.24:8040 failed to respond
    at com.starrocks.shade.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141) ~[starrockswriter-release.jar:na]
    at com.starrocks.shade.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) ~[starrockswriter-release.jar:na]
    at com.starrocks.shade.org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) ~[starrockswriter-release.jar:na]
    at com.starrocks.shade.org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) ~[starrockswriter-release.jar:na]
    at com.starrocks.shade.org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) ~[starrockswriter-release.jar:na]
    at com.starrocks.shade.org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) ~[starrockswriter-release.jar:na]
    at com.starrocks.shade.org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) ~[starrockswriter-release.jar:na]
    at com.starrocks.shade.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[starrockswriter-release.jar:na]
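
For anyone reading the log above, a small Python sketch that pulls the key facts out of it may help: which host the request was sent to, how large the batch was, and which host failed to respond. The two log lines are copied from the post (quotes normalized to ASCII); the variable names are my own. Stream Load requests sent to an FE are normally redirected to a BE, which would be consistent with the request going to port 8030 and the failure coming from port 8040, but that reading is an assumption about this cluster, not something the log states.

```python
import re

# Two lines copied from the DataX log in the post (quotes normalized to ASCII).
LOG = """\
2026-05-26 14:46:20.373 [Thread-4] INFO StarRocksStreamLoadVisitor - Executing stream load to: 'http://10.10.96.3:8030/api/dws/dws_paten_opma_patent_detail_colums/_stream_load', size: '94401353'
com.starrocks.shade.org.apache.http.NoHttpResponseException: 10.10.96.24:8040 failed to respond
"""

QUOTE = "['\u2018\u2019]"  # accept straight or curly quotes around the values

request = re.search(
    rf"stream load to: {QUOTE}http://([^/]+)/api/([^/]+)/([^/]+)/_stream_load{QUOTE}, "
    rf"size: {QUOTE}(\d+){QUOTE}",
    LOG,
)
failure = re.search(r"NoHttpResponseException: (\S+) failed to respond", LOG)

entry_host, database, table, size_text = request.groups()
size_bytes = int(size_text)
size_mib = size_bytes / (1024 * 1024)  # bytes to MiB
failed_host = failure.group(1)

print(f"request sent to : {entry_host} ({database}.{table})")
print(f"batch size      : {size_bytes} bytes = {size_mib:.2f} MiB")
print(f"no response from: {failed_host}")
```

The single batch in this log is roughly 90 MiB, which may be worth keeping in mind when looking at the BE side at the time of the failure.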

Querying the corresponding cluster returned:


Running the SQL from the error message returns the following:
Failed to access err: Internal error: URL rejected: Malformed input to a URL function
be/src/exec/schema_scanner/schema_load_tracking_logs_scanner.cpp:141 client->execute(&tracking_msg)

Here are the DataX configurations:
1. CSV:
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "starrocksreader",
                    "parameter": {
                        "username": "root",
                        "password": "XXXX",
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://XX.XX.XX.XX:9030"],
                                "querySql": ["SELECT /*+ SET_VAR(query_timeout = 1000000) */ * FROM tbl.tableA "]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "starrockswriter",
                    "parameter": {
                        "username": "root",
                        "password": "XXXX",
                        "database": "tbl",
                        "table": "tableB",
                        "jdbcUrl": "jdbc:mysql://XX.XX.XX.XX:9030/tbl",
                        "loadUrl": ["XX.XX.XX.XX:8030"],
                        "column": ["*"],
                        "loadProps": {
                            "column_separator": "\\x1F",
                            "row_delimiter": "\\x1D"
                        }
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 5,
                "record": 500000
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0
            }
        }
    }
}
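
A note on the separators in the CSV job above, as a hedged Python sketch. After JSON decoding, `"\\x1F"` is the four-character text `\x1F`, not the control character itself; the sketch assumes the loader then interprets that text as the single byte 0x1F (and `\x1D` as 0x1D). The helper names and the sample values are hypothetical and only illustrate how one might check whether column data already contains either separator, which would split a row incorrectly in CSV mode.

```python
import json

# The loadProps fragment exactly as it appears in the job file.
job_fragment = r'{"column_separator": "\\x1F", "row_delimiter": "\\x1D"}'
load_props = json.loads(job_fragment)


def hex_notation_to_char(text: str) -> str:
    """Turn the 4-character text \\xHH into the single character it names.

    Assumption: the loader reads this notation as one hex-encoded byte.
    """
    if len(text) == 4 and text.startswith("\\x"):
        return chr(int(text[2:], 16))
    return text


separators = {name: hex_notation_to_char(value) for name, value in load_props.items()}


def find_collisions(values, separators):
    """Return (value index, separator name) pairs where a value contains a separator."""
    return [
        (index, name)
        for index, value in enumerate(values)
        for name, char in separators.items()
        if char in value
    ]


# Hypothetical column values, not taken from the real table.
sample_values = ['{"k": "a,b|c\td"}', '{"k": "left\x1fright"}']

print({name: repr(value) for name, value in load_props.items()})
print(find_collisions(sample_values, separators))
```

Commas, pipes and tabs do not collide with these separators; only a value that itself contains 0x1F or 0x1D would.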

2. JSON:
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "starrocksreader",
                    "parameter": {
                        "username": "root",
                        "password": "XXXX",
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://XX.XX.XX.XX:9030"],
                                "querySql": ["SELECT /*+ SET_VAR(query_timeout = 100000) */ * FROM tbl.tableA"]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "starrockswriter",
                    "parameter": {
                        "username": "root",
                        "password": "XXXX",
                        "database": "tbl",
                        "table": "tableB",
                        "jdbcUrl": "jdbc:mysql://XX.XX.XX.XX:9030/tbl",
                        "loadUrl": ["XX.XX.XX.XX:8030"],
                        "column": ["*"],
                        "loadProps": {
                            "format": "json",
                            "strip_outer_array": "true"
                        }
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 5,
                "record": 500000
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0
            }
        }
    }
}
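
To make the "format changed" symptom easier to pin down, here is a minimal Python sketch of two things that could be checked. It is not a diagnosis: the post does not show what actually changed, and the sample value, the column names `id` and `detail`, and the helper functions are all hypothetical. With `strip_outer_array` set to `true`, the payload is a JSON array in which each element is one row. A JSON column can travel inside such a row either as an escaped JSON string or as a nested object, and the two are different values even though they carry the same data. Separately, two JSON texts can differ in key order or whitespace and still be equal by value.

```python
import json

# Hypothetical stand-in for one JSON column value; not real table data.
source_detail = {"title": 'a,b\tc "quoted"', "tags": ["x|y", "z\u001fw"]}
detail_text = json.dumps(source_detail)  # JSON text, as a reader might hand it over

# Encoding A: the column value travels as a JSON string (escaped text).
payload_as_string = json.dumps([{"id": 1, "detail": detail_text}])
# Encoding B: the column value travels as a nested JSON object.
payload_as_object = json.dumps([{"id": 1, "detail": json.loads(detail_text)}])


def detail_kind(payload: str) -> str:
    """Report which JSON type the `detail` field of the first row has."""
    first_row = json.loads(payload)[0]  # outer array: one element per row
    return type(first_row["detail"]).__name__


def same_json(left: str, right: str) -> bool:
    """Compare two JSON texts by value, ignoring key order and whitespace."""
    return json.loads(left) == json.loads(right)


print(detail_kind(payload_as_string))   # str
print(detail_kind(payload_as_object))   # dict
print(same_json('{"b": 1, "a": [1, 2]}', '{"a":[1,2],"b":1}'))
```

Comparing a source value and a target value with something like `same_json` would show whether the difference is only textual (ordering, spacing, escaping) or a real change in the stored value.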