Real-time sync from MySQL to StarRocks via Flink keeps failing

【Details】Following the official documentation, real-time sync from MySQL to StarRocks via Flink keeps failing: the Flink TaskManager process keeps dying, and the load on the Flink machine is very high. The machine has enough physical memory and there is no OOM. What could be the cause?

【StarRocks version】2.5.14
【Cluster size】3 FE + 3 BE (FE and BE co-located)
【Machine info】FE/BE: 3 machines, 8c16g each; Flink machine: 8c32g

Does the TaskManager log print any error messages? Which flink-cdc and connector versions are you using? Is the load on the SR cluster high?

Thanks for the reply.
So far I have seen two kinds of error logs:

  1. 2024-02-21 14:28:29,533 ERROR com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Stream Load response:
    com.starrocks.connector.flink.manager.StarRocksStreamLoadFailedException: Failed to flush data to StarRocks, Error response:

  2. 2024-02-21 12:00:01,542 ERROR io.debezium.pipeline.ErrorHandler [] - Producer failure
    io.debezium.DebeziumException: A slave with the same server_uuid/server_id as this slave has connected to the master; the first event '' at 4, the last event read from './mysql-bin.084959' at 767, the last byte read from './mysql-bin.084959' at 767. Error code: 1236; SQLSTATE: HY000.
    2024-02-21 12:00:01,808 ERROR org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager [] - Received uncaught exception.
    Caused by: com.ververica.cdc.connectors.shaded.org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped.
    at io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:42) ~[flink-sql-connector-mysql-cdc-2.1.1.jar:2.1.1]
    Caused by: io.debezium.DebeziumException: A slave with the same server_uuid/server_id as this slave has connected to the master; the first event '' at 4, the last event read from './mysql-bin.084959' at 767, the last byte read from './mysql-bin.084959' at 767. Error code: 1236; SQLSTATE: HY000.
    at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.checkErrors(SplitFetcherManager.java:223)

I did set a dedicated server-id in the Flink configuration.
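For reference, here is a minimal sketch of how a dedicated server-id range is typically declared directly on the MySQL CDC source table in Flink SQL (the table, column, host, and credential names are hypothetical). The range must not collide with any other replication client of the same MySQL instance, and must contain at least as many IDs as the source's parallelism, otherwise the "same server_uuid/server_id" error above can occur:

```sql
-- Hypothetical CDC source; 'server-id' must be unique among all
-- clients replicating from this MySQL instance.
CREATE TABLE mysql_orders_src (
    id BIGINT,
    order_no STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector'     = 'mysql-cdc',
    'hostname'      = 'mysql-host',
    'port'          = '3306',
    'username'      = 'flink_user',
    'password'      = '******',
    'database-name' = 'demo_db',
    'table-name'    = 'orders',
    'server-id'     = '5400-5404'  -- one ID per parallel source reader
);
```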

I followed the official documentation. The versions are flink-connector-starrocks-1.2.3_flink-1.14_2.11.jar and flink-sql-connector-mysql-cdc-2.2.0.jar. CPU load on the SR cluster looks fine: load average: 1.77, 2.35, 1.86
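For context, the sink side of such a job is usually declared with the StarRocks connector roughly as follows. This is only a sketch of the setup under discussion; the table, host, and credential values are hypothetical:

```sql
-- Hypothetical StarRocks sink using flink-connector-starrocks;
-- data is delivered to the FE via Stream Load over 'load-url'.
CREATE TABLE sr_orders_sink (
    id BIGINT,
    order_no STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector'     = 'starrocks',
    'jdbc-url'      = 'jdbc:mysql://fe-host:9030',
    'load-url'      = 'fe-host:8030',
    'database-name' = 'demo_db',
    'table-name'    = 'orders',
    'username'      = 'root',
    'password'      = ''
);
```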

The problem just reproduced again, and I captured new error logs:
2024-02-21 14:45:47,756 ERROR com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Stream Load response:
{"Status":"Fail","BeginTxnTimeMs":0,"Message":"Primary-key index exceeds the limit. tablet_id: 34434, consumption: 7073760808, limit: 6979756621. Memory stats of top five tablets: 34223(271M)34304(207M)34300(207M)34737(135M)33656(135M): be:10.10.9.138","NumberUnselectedRows":0,"CommitAndPublishTimeMs":0,"Label":"914d0279-26f5-4b08-b566-54bb11e85ac3","LoadBytes":94604536,"StreamLoadPlanTimeMs":0,"NumberTotalRows":0,"WriteDataTimeMs":1,"TxnId":311963,"LoadTimeMs":250,"ReadDataTimeMs":95,"NumberLoadedRows":0,"NumberFilteredRows":0}

2024-02-21 14:45:47,757 WARN com.starrocks.connector.flink.manager.StarRocksSinkManager [] - Failed to flush batch data to StarRocks, retry times = 3
com.starrocks.connector.flink.manager.StarRocksStreamLoadFailedException: Failed to flush data to StarRocks, Error response:
{"Status":"Fail","BeginTxnTimeMs":0,"Message":"Primary-key index exceeds the limit. tablet_id: 34434, consumption: 7073760808, limit: 6979756621. Memory stats of top five tablets: 34223(271M)34304(207M)34300(207M)34737(135M)33656(135M): be:10.10.9.138","NumberUnselectedRows":0,"CommitAndPublishTimeMs":0,"Label":"914d0279-26f5-4b08-b566-54bb11e85ac3","LoadBytes":94604536,"StreamLoadPlanTimeMs":0,"NumberTotalRows":0,"WriteDataTimeMs":1,"TxnId":311963,"LoadTimeMs":250,"ReadDataTimeMs":95,"NumberLoadedRows":0,"NumberFilteredRows":0}

I also checked the error logs on the BE node:
W0221 14:45:00.855679 26949 fragment_mgr.cpp:186] Fail to open fragment ed42e625-ec45-c461-8a09-c75b49993a89: Memory limit exceeded: Primary-key index exceeds the limit.

So is it insufficient memory on the BE node that is causing Flink to crash?

Yes. Loading into a Primary Key model table is using too much memory for the primary-key index. You can enable persistent primary-key indexes (enable_persistent_index) to spill the index to disk and reduce memory usage. See: Primary Key table | StarRocks
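A minimal sketch of turning the property on (the database and table names are hypothetical; enable_persistent_index is a real table property of StarRocks Primary Key tables):

```sql
-- Enable persistent primary-key index on an existing table, so the
-- index is kept on disk instead of fully in BE memory.
ALTER TABLE demo_db.orders SET ("enable_persistent_index" = "true");

-- Or set it at table creation time:
CREATE TABLE demo_db.orders_pk (
    id BIGINT,
    order_no STRING
) PRIMARY KEY (id)
DISTRIBUTED BY HASH (id)
PROPERTIES ("enable_persistent_index" = "true");
```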

Great, I have added this parameter and restarted the sync to observe.

The problem seems to be resolved; the sync has been running normally since. Thanks for your answer.