[Details] Following the official documentation, real-time sync from MySQL to StarRocks via Flink keeps failing: the Flink TaskManager process keeps crashing, and the load on the Flink machine is very high. Physical memory is sufficient and there is no OOM. What could be the cause?
[StarRocks version] 2.5.14
[Cluster size] 3 FE + 3 BE (FE and BE co-located)
[Machine info] FE/BE: 3 machines, 8c16g each; Flink machine: 8c32g
Does the TaskManager log print any error messages? Which flink-cdc and connector versions are you using? Is the load on the SR cluster machines high?
Thanks for the reply.
So far I see two kinds of error logs:
2024-02-21 14:28:29,533 ERROR com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Stream Load response:
com.starrocks.connector.flink.manager.StarRocksStreamLoadFailedException: Failed to flush data to StarRocks, Error response:
2024-02-21 12:00:01,542 ERROR io.debezium.pipeline.ErrorHandler [] - Producer failure
io.debezium.DebeziumException: A slave with the same server_uuid/server_id as this slave has connected to the master; the first event '' at 4, the last event read from './mysql-bin.084959' at 767, the last byte read from './mysql-bin.084959' at 767. Error code: 1236; SQLSTATE: HY000.
2024-02-21 12:00:01,808 ERROR org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager [] - Received uncaught exception.
Caused by: com.ververica.cdc.connectors.shaded.org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped.
    at io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:42) ~[flink-sql-connector-mysql-cdc-2.1.1.jar:2.1.1]
Caused by: io.debezium.DebeziumException: A slave with the same server_uuid/server_id as this slave has connected to the master; the first event '' at 4, the last event read from './mysql-bin.084959' at 767, the last byte read from './mysql-bin.084959' at 767. Error code: 1236; SQLSTATE: HY000.
    at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.checkErrors(SplitFetcherManager.java:223)
I did set a separate server-id in the Flink configuration.
I followed the official documentation. The versions are flink-connector-starrocks-1.2.3_flink-1.14_2.11.jar and flink-sql-connector-mysql-cdc-2.2.0.jar. CPU load on the SR cluster is fine: load average: 1.77, 2.35, 1.86.
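For reference, the MySQL CDC connector lets each job declare its own server-id (or a range, when the source runs with parallelism > 1) in the table DDL, which is what avoids the server_uuid/server_id collision in the Error 1236 message above. A minimal sketch in Flink SQL; the host, credentials, database/table names, and the id range 5400-5404 are all placeholders, not values from this thread:

```sql
-- Hypothetical CDC source table. Pick a server-id range that no other
-- replication client of this MySQL instance uses; the range must hold
-- at least as many ids as the source parallelism.
CREATE TABLE mysql_orders (
    id BIGINT,
    amount DECIMAL(10, 2),
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'mysql-host',       -- placeholder
    'port' = '3306',
    'username' = 'flink_user',       -- placeholder
    'password' = '******',
    'database-name' = 'mydb',        -- placeholder
    'table-name' = 'orders',         -- placeholder
    'server-id' = '5400-5404'        -- one id per parallel reader
);
```

If two jobs (or a restarted job racing its old task) connect with overlapping ids, MySQL rejects the second binlog dump thread with exactly this Error 1236.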
The problem just reproduced again, and I captured a new error log:
2024-02-21 14:45:47,756 ERROR com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Stream Load response:
{"Status":"Fail","BeginTxnTimeMs":0,"Message":"Primary-key index exceeds the limit. tablet_id: 34434, consumption: 7073760808, limit: 6979756621. Memory stats of top five tablets: 34223(271M)34304(207M)34300(207M)34737(135M)33656(135M): be:10.10.9.138","NumberUnselectedRows":0,"CommitAndPublishTimeMs":0,"Label":"914d0279-26f5-4b08-b566-54bb11e85ac3","LoadBytes":94604536,"StreamLoadPlanTimeMs":0,"NumberTotalRows":0,"WriteDataTimeMs":1,"TxnId":311963,"LoadTimeMs":250,"ReadDataTimeMs":95,"NumberLoadedRows":0,"NumberFilteredRows":0}
2024-02-21 14:45:47,757 WARN com.starrocks.connector.flink.manager.StarRocksSinkManager [] - Failed to flush batch data to StarRocks, retry times = 3
com.starrocks.connector.flink.manager.StarRocksStreamLoadFailedException: Failed to flush data to StarRocks, Error response:
{"Status":"Fail","BeginTxnTimeMs":0,"Message":"Primary-key index exceeds the limit. tablet_id: 34434, consumption: 7073760808, limit: 6979756621. Memory stats of top five tablets: 34223(271M)34304(207M)34300(207M)34737(135M)33656(135M): be:10.10.9.138","NumberUnselectedRows":0,"CommitAndPublishTimeMs":0,"Label":"914d0279-26f5-4b08-b566-54bb11e85ac3","LoadBytes":94604536,"StreamLoadPlanTimeMs":0,"NumberTotalRows":0,"WriteDataTimeMs":1,"TxnId":311963,"LoadTimeMs":250,"ReadDataTimeMs":95,"NumberLoadedRows":0,"NumberFilteredRows":0}
I also checked the error log on the BE node:
W0221 14:45:00.855679 26949 fragment_mgr.cpp:186] Fail to open fragment ed42e625-ec45-c461-8a09-c75b49993a89: Memory limit exceeded: Primary-key index exceeds the limit.
Is this a case of insufficient memory on the BE node causing the Flink job to crash?
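(The reply that named the parameter is not quoted in the thread, so the following is an assumption, not the answer actually given.) A common mitigation for "Primary-key index exceeds the limit" on StarRocks primary key tables is to move the primary-key index from BE memory to disk via the persistent index; a sketch, where the table name `orders` is a placeholder:

```sql
-- Assumption: the sink is a PRIMARY KEY table; `orders` is a
-- placeholder name. With a persistent index, the primary-key index is
-- kept on disk instead of fully in BE memory, so loads no longer hit
-- the in-memory index limit reported in the Stream Load response.
ALTER TABLE orders SET ("enable_persistent_index" = "true");
```

On memory-constrained BEs (16 GB here, FE/BE co-located), keeping the full primary-key index in memory for every loaded tablet can exhaust the budget exactly as the consumption/limit figures in the error show.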
OK, I've added that parameter and am re-running the sync to observe.
The problem seems to be resolved; the sync has been running normally so far. Thank you for your reply.