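To help locate your problem faster, please provide the following information. Thank you.
【Details】I am using DataX to synchronize data between two StarRocks clusters. The table contains a JSON-typed column that cannot be replaced with VARCHAR, and the JSON values contain many special characters. I want to synchronize using the JSON format, but the data format differs before and after synchronization.
【Background】Using DataX to synchronize data between two StarRocks clusters
【Business impact】The changed data format affects production
【Shared-data (storage-compute separation)?】No, shared-nothing (storage and compute integrated)
【StarRocks version】3.2.10 to 4.0
【Cluster size】e.g.: 3 FE (2 followers) + 6 BE (FE and BE deployed separately)
【Machine info】vCPUs / memory / NIC: 64C / 256G / 10 GbE
【Table model】Primary Key table
【Import or export method】DataX, using the JSON format
【Contact】2422203515@qq.com
【Attachments】
- Data in the source table:
- Data format in the target table: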
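To narrow down what kind of change occurs between the two tables, one option is to compare a source value and a target value as parsed JSON rather than as raw text. The following is a minimal Python sketch under assumptions: the helper name `classify_json_change` and its category labels are hypothetical, and both values are assumed to have already been fetched as strings from the source and target tables.

```python
import json


def classify_json_change(source_text: str, target_text: str) -> str:
    """Classify how a JSON value differs between source and target (hypothetical helper)."""
    if source_text == target_text:
        return "identical"
    try:
        src = json.loads(source_text)
        dst = json.loads(target_text)
    except json.JSONDecodeError:
        # At least one side is not parseable JSON text.
        return "not-valid-json"
    if src == dst:
        # Same parsed value: only whitespace, key order, or escape style differs.
        return "formatting-only"
    if isinstance(dst, str):
        # The target may hold the JSON document as an escaped string.
        try:
            if json.loads(dst) == src:
                return "double-encoded"
        except json.JSONDecodeError:
            pass
    return "content-changed"
```

If sampled rows come back as `formatting-only`, the stored content is equivalent and only the textual rendering differs; `double-encoded` or `content-changed` would point at the load path instead.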
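1. Because the tables being synchronized contain many special characters, CSV is not the preferred option; I would like to use the JSON format first.
2. With the CSV format, after synchronization has been running for a while, a "label already exists" error appears and a BE node then crashes.
The logs are as follows: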
[INFO] 2026-05-26 06:46:21.311 +0000 - -> 2026-05-26 14:46:20.373 [Thread-4] INFO StarRocksStreamLoadVisitor - Executing stream load to: 'http://10.10.96.3:8030/api/dws/dws_paten_opma_patent_detail_colums/_stream_load', size: '94401353'
2026-05-26 14:46:21.115 [Thread-4] WARN StarRocksWriterManager - Failed to flush batch data to StarRocks, retry times = 0
com.starrocks.shade.org.apache.http.NoHttpResponseException: 10.10.96.24:8040 failed to respond
    at com.starrocks.shade.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141) ~[starrockswriter-release.jar:na]
    at com.starrocks.shade.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) ~[starrockswriter-release.jar:na]
    at com.starrocks.shade.org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) ~[starrockswriter-release.jar:na]
    at com.starrocks.shade.org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) ~[starrockswriter-release.jar:na]
    at com.starrocks.shade.org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) ~[starrockswriter-release.jar:na]
    at com.starrocks.shade.org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) ~[starrockswriter-release.jar:na]
    at com.starrocks.shade.org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) ~[starrockswriter-release.jar:na]
    at com.starrocks.shade.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[starrockswriter-release.jar:na]
Querying the corresponding cluster returns:
Looking up the failing SQL gives the following:
Failed to access err: Internal error: URL rejected: Malformed input to a URL function
be/src/exec/schema_scanner/schema_load_tracking_logs_scanner.cpp:141 client->execute(&tracking_msg)
Here are the DataX configurations:
1. CSV:
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "starrocksreader",
          "parameter": {
            "username": "root",
            "password": "XXXX",
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://XX.XX.XX.XX:9030"],
                "querySql": ["SELECT /*+ SET_VAR(query_timeout = 1000000) */ * FROM tbl.tableA "]
              }
            ]
          }
        },
        "writer": {
          "name": "starrockswriter",
          "parameter": {
            "username": "root",
            "password": "XXXX",
            "database": "tbl",
            "table": "tableB",
            "jdbcUrl": "jdbc:mysql://XX.XX.XX.XX:9030/tbl",
            "loadUrl": ["XX.XX.XX.XX:8030"],
            "column": ["*"],
            "loadProps": {
              "column_separator": "\\x1F",
              "row_delimiter": "\\x1D"
            }
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 5,
        "record": 500000
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0
      }
    }
  }
}
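Whether CSV is actually unsafe for this table depends on whether any field contains the two control characters chosen as separators. The following is a minimal Python sketch for checking that on a sample of rows. It assumes the `\\x1F` and `\\x1D` values in `loadProps` denote the bytes 0x1F and 0x1D, that rows have already been fetched as tuples, and the helper name `find_delimiter_collisions` is hypothetical.

```python
# Assumed to match the separators configured in loadProps above.
COLUMN_SEPARATOR = "\x1f"
ROW_DELIMITER = "\x1d"


def find_delimiter_collisions(rows):
    """Return (row_index, column_index, delimiter_name) for every field
    whose text contains one of the configured delimiters."""
    hits = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            if value is None:
                continue  # NULL fields cannot collide with a delimiter
            text = str(value)
            if COLUMN_SEPARATOR in text:
                hits.append((r, c, "column_separator"))
            if ROW_DELIMITER in text:
                hits.append((r, c, "row_delimiter"))
    return hits
```

An empty result on a representative sample suggests the special characters in the data do not clash with these separators; any hit identifies a field that would be split incorrectly in CSV mode.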
2. JSON format:
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "starrocksreader",
          "parameter": {
            "username": "root",
            "password": "XXXX",
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://XX.XX.XX.XX:9030"],
                "querySql": ["SELECT /*+ SET_VAR(query_timeout = 100000) */ * FROM tbl.tableA"]
              }
            ]
          }
        },
        "writer": {
          "name": "starrockswriter",
          "parameter": {
            "username": "root",
            "password": "XXXX",
            "database": "tbl",
            "table": "tableB",
            "jdbcUrl": "jdbc:mysql://XX.XX.XX.XX:9030/tbl",
            "loadUrl": ["XX.XX.XX.XX:8030"],
            "column": ["*"],
            "loadProps": {
              "format": "json",
              "strip_outer_array": "true"
            }
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 5,
        "record": 500000
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0
      }
    }
  }
}
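With `"format": "json"` and `"strip_outer_array": "true"`, Stream Load expects the request body to be a JSON array whose elements are the rows. A JSON column can appear in such a body in two shapes: as a nested object, or as a string holding escaped JSON text. The Python sketch below only illustrates the two shapes so they can be compared against the target data. Which shape starrockswriter actually emits is not confirmed here, and the helper `build_stream_load_body` and the column names in the usage are hypothetical.

```python
import json


def build_stream_load_body(column_names, rows, json_columns=()):
    """Serialize rows as a JSON array of objects (illustrative helper).

    Columns listed in json_columns are parsed and embedded as nested JSON;
    all other values, including JSON text, are emitted as plain values.
    """
    records = []
    for row in rows:
        record = {}
        for name, value in zip(column_names, row):
            if name in json_columns and isinstance(value, str):
                value = json.loads(value)  # nested object instead of an escaped string
            record[name] = value
        records.append(record)
    # ensure_ascii=False keeps non-ASCII characters unescaped in the body.
    return json.dumps(records, ensure_ascii=False)
```

Calling the helper once with `json_columns={"doc"}` and once without it on the same row yields the two candidate bodies; the one that reproduces the format seen in the target table indicates where the conversion happens.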
