Flink 1.17 checkpoints take a very long time [StarRocks 3 shared-data architecture]

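[StarRocks version] 3.1.2
[Cluster size] 3 FE (1 follower + 2 observers) + 5 BE (FE and BE co-located on the same hosts)
[Machine specs] 48 cores / 256 GB RAM / 10 GbE
[Details] The Flink logs for Stream Load into a StarRocks 3 shared-data (storage-compute separated) cluster sometimes show timeouts. How should I tune the configuration to fix this?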

2023-11-02 09:52:58,894 INFO com.starrocks.data.load.stream.TransactionStreamLoader [] - Transaction commit, label: flink-f2f8b0f4-bb43-48f3-bcb3-ee390bd1fd32, request : POST http://10.211.4.12:8030/api/transaction/commit HTTP/1.1

2023-11-02 09:53:00,802 INFO com.starrocks.data.load.stream.TransactionStreamLoader [] - Transaction prepared, label : flink-8f07c37b-1bfe-4fd9-a9ab-88481618d751, body : {
    "Status": "OK",
    "Message": "",
    "Label": "flink-8f07c37b-1bfe-4fd9-a9ab-88481618d751",
    "TxnId": 202740,
    "NumberTotalRows": 19380,
    "NumberLoadedRows": 19380,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 43336962,
    "LoadTimeMs": 115149,
    "StreamLoadPlanTimeMs": 180,
    "ReceivedDataTimeMs": 108,
    "WriteDataTimeMs": 114964,
    "CommitAndPublishTimeMs": 1
}

2023-11-02 09:53:00,802 INFO com.starrocks.data.load.stream.v2.StreamLoadManagerV2 [] - Receive load response, cacheByteBeforeFlush: 11425808, currentCacheBytes: 11425808, totalFlushRows : 1007446

2023-11-02 09:53:00,803 INFO com.starrocks.data.load.stream.TransactionStreamLoader [] - Transaction commit, label: flink-8f07c37b-1bfe-4fd9-a9ab-88481618d751, request : POST http://10.211.4.12:8030/api/transaction/commit HTTP/1.1

2023-11-02 09:53:00,947 INFO com.starrocks.data.load.stream.TransactionStreamLoader [] - Transaction prepared, label : flink-fcbf0c29-5737-4167-93c3-4b64024c998c, body : {
    "Status": "OK",
    "Message": "",
    "Label": "flink-fcbf0c29-5737-4167-93c3-4b64024c998c",
    "TxnId": 202741,
    "NumberTotalRows": 19379,
    "NumberLoadedRows": 19379,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 43367608,
    "LoadTimeMs": 115297,
    "StreamLoadPlanTimeMs": 179,
    "ReceivedDataTimeMs": 124,
    "WriteDataTimeMs": 115112,
    "CommitAndPublishTimeMs": 1
}
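Both prepare responses break LoadTimeMs down by phase. The short Java sketch below computes each phase's share for TxnId 202740, with the values copied from the log. The class and method names are hypothetical and not part of the StarRocks connector.

```java
// Hypothetical helper (not part of the StarRocks connector): splits the
// LoadTimeMs of one "Transaction prepared" response into its reported phases.
public class LoadTimeBreakdown {

    // Returns phaseMs as a percentage of totalMs.
    static double percentOf(long phaseMs, long totalMs) {
        if (totalMs <= 0) {
            throw new IllegalArgumentException("totalMs must be positive");
        }
        return 100.0 * phaseMs / totalMs;
    }

    public static void main(String[] args) {
        // Values copied from the prepare response for TxnId 202740.
        long loadTimeMs = 115149;
        String[] phases = {"StreamLoadPlanTimeMs", "ReceivedDataTimeMs", "WriteDataTimeMs", "CommitAndPublishTimeMs"};
        long[] phaseMs = {180, 108, 114964, 1};

        // Print each phase's duration and its share of the total load time.
        for (int i = 0; i < phases.length; i++) {
            System.out.printf("%-24s %8d ms  %6.2f%%%n", phases[i], phaseMs[i], percentOf(phaseMs[i], loadTimeMs));
        }
    }
}
```

For TxnId 202740, WriteDataTimeMs comes out at about 99.84% of LoadTimeMs, and TxnId 202741 (115112 of 115297 ms) gives almost the same split. Nearly all of the roughly 115 s per load is spent writing data, not in planning, receiving data, or the commit step reported at prepare time.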

2023-11-02 09:53:00,947 INFO com.starrocks.data.load.stream.v2.StreamLoadManagerV2 [] - Receive load response, cacheByteBeforeFlush: 11431432, currentCacheBytes: 11431432, totalFlushRows : 1007446

2023-11-02 09:53:00,947 INFO com.starrocks.data.load.stream.TransactionStreamLoader [] - Transaction commit, label: flink-fcbf0c29-5737-4167-93c3-4b64024c998c, request : POST http://10.211.4.12:8030/api/transaction/commit HTTP/1.1

2023-11-02 09:53:02,162 INFO com.starrocks.data.load.stream.TransactionStreamLoader [] - Transaction committed, lable: flink-c2d2014c-87e7-497b-a5ca-f93370c349b0, body : {
    "Status": "OK",
    "Message": "",
    "Label": "flink-c2d2014c-87e7-497b-a5ca-f93370c349b0"
}

2023-11-02 09:53:02,162 INFO com.starrocks.data.load.stream.v2.StreamLoadManagerV2 [] - Receive load response, cacheByteBeforeFlush: 11854442, currentCacheBytes: 11854442, totalFlushRows : 1007447

2023-11-02 09:53:02,162 INFO com.starrocks.data.load.stream.v2.TransactionTableRegion [] - Success to commit transaction: Transaction{database='logData', table='logData_url', label='flink-c2d2014c-87e7-497b-a5ca-f93370c349b0', finish=false}, duration: 417248 ms

2023-11-02 09:53:05,753 INFO com.starrocks.data.load.stream.TransactionStreamLoader [] - Transaction committed, lable: flink-f95b7167-5398-458b-881f-69a17368539b, body : {
    "Status": "FAILED",
    "Message": "class com.starrocks.common.UserException: publish timeout: 20000"
}

2023-11-02 09:53:05,753 INFO com.starrocks.data.load.stream.TransactionStreamLoader [] - Transaction committed, lable: flink-f692e043-dddc-4129-b30b-a068bd7f3e0f, body : {
    "Status": "FAILED",
    "Message": "class com.starrocks.common.UserException: publish timeout: 20000"
}

2023-11-02 09:53:05,754 INFO com.starrocks.data.load.stream.TransactionStreamLoader [] - Transaction committed, lable: flink-e3854c1c-e64a-438b-80c8-407e4a7361ff, body : {
    "Status": "FAILED",
    "Message": "class com.starrocks.common.UserException: publish timeout: 20000"
}

2023-11-02 09:53:05,756 INFO com.starrocks.data.load.stream.TransactionStreamLoader [] - Transaction committed, lable: flink-953c673f-b357-492c-a71f-1c01761fdc7d, body : {
    "Status": "FAILED",
    "Message": "class com.starrocks.common.UserException: publish timeout: 20000"
}

2023-11-02 09:53:05,756 INFO com.starrocks.data.load.stream.DefaultStreamLoader [] - Response for get_load_state, label: flink-e3854c1c-e64a-438b-80c8-407e4a7361ff, response status code: 200, response body : {"state":"COMMITTED","status":"OK","msg":"Success"}

2023-11-02 09:53:05,756 INFO com.starrocks.data.load.stream.DefaultStreamLoader [] - Response for get_load_state, label: flink-f95b7167-5398-458b-881f-69a17368539b, response status code: 200, response body : {"state":"COMMITTED","status":"OK","msg":"Success"}

2023-11-02 09:53:05,756 INFO com.starrocks.data.load.stream.v2.TransactionTableRegion [] - Success to commit transaction: Transaction{database='logData', table='logData_url', label='flink-e3854c1c-e64a-438b-80c8-407e4a7361ff', finish=false}, duration: 425611 ms

2023-11-02 09:53:05,756 INFO com.starrocks.data.load.stream.v2.TransactionTableRegion [] - Success to commit transaction: Transaction{database='logData', table='logData_url', label='flink-f95b7167-5398-458b-881f-69a17368539b', finish=false}, duration: 413962 ms

2023-11-02 09:53:05,756 INFO com.starrocks.data.load.stream.DefaultStreamLoader [] - Response for get_load_state, label: flink-f692e043-dddc-4129-b30b-a068bd7f3e0f, response status code: 200, response body : {"state":"COMMITTED","status":"OK","msg":"Success"}

2023-11-02 09:53:05,756 INFO com.starrocks.data.load.stream.v2.TransactionTableRegion [] - Success to commit transaction: Transaction{database='logData', table='logData_url', label='flink-f692e043-dddc-4129-b30b-a068bd7f3e0f', finish=false}, duration: 428003 ms

2023-11-02 09:53:05,757 INFO com.starrocks.data.load.stream.DefaultStreamLoader [] - Response for get_load_state, label: flink-953c673f-b357-492c-a71f-1c01761fdc7d, response status code: 200, response body : {"state":"COMMITTED","status":"OK","msg":"Success"}

2023-11-02 09:53:05,757 INFO com.starrocks.data.load.stream.v2.TransactionTableRegion [] - Success to commit transaction: Transaction{database='logData', table='logData_url', label='flink-953c673f-b357-492c-a71f-1c01761fdc7d', finish=false}, duration: 420841 ms

2023-11-02 09:53:18,897 INFO com.starrocks.data.load.stream.TransactionStreamLoader [] - Transaction committed, lable: flink-f2f8b0f4-bb43-48f3-bcb3-ee390bd1fd32, body : {
    "Status": "FAILED",
    "Message": "class com.starrocks.common.UserException: publish timeout: 20000"
}

2023-11-02 09:53:18,899 INFO com.starrocks.data.load.stream.DefaultStreamLoader [] - Response for get_load_state, label: flink-f2f8b0f4-bb43-48f3-bcb3-ee390bd1fd32, response status code: 200, response body : {"state":"COMMITTED","status":"OK","msg":"Success"}

2023-11-02 09:53:18,899 INFO com.starrocks.data.load.stream.v2.TransactionTableRegion [] - Success to commit transaction: Transaction{database='logData', table='logData_url', label='flink-f2f8b0f4-bb43-48f3-bcb3-ee390bd1fd32', finish=false}, duration: 426393 ms
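The duration values in the six "Success to commit transaction" lines can be summarized to see how long each transaction took from the connector's point of view. A minimal sketch, assuming the TransactionTableRegion log format shown above; the class name is hypothetical and the input uses shortened copies of those lines that keep only the part the regex needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log-analysis helper (not part of the connector): collects the
// "duration: N ms" values from "Success to commit transaction" log lines.
public class CommitDurations {

    private static final Pattern DURATION =
            Pattern.compile("Success to commit transaction: .*duration: (\\d+) ms");

    // Returns the durations (in ms) of all matching lines, in input order.
    static List<Long> durationsMs(List<String> logLines) {
        List<Long> result = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = DURATION.matcher(line);
            if (m.find()) {
                result.add(Long.parseLong(m.group(1)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened versions of the six TransactionTableRegion lines in the log.
        List<String> lines = List.of(
                "Success to commit transaction: Transaction{table='logData_url'}, duration: 417248 ms",
                "Success to commit transaction: Transaction{table='logData_url'}, duration: 425611 ms",
                "Success to commit transaction: Transaction{table='logData_url'}, duration: 413962 ms",
                "Success to commit transaction: Transaction{table='logData_url'}, duration: 428003 ms",
                "Success to commit transaction: Transaction{table='logData_url'}, duration: 420841 ms",
                "Success to commit transaction: Transaction{table='logData_url'}, duration: 426393 ms");
        LongSummaryStatistics stats = durationsMs(lines).stream()
                .mapToLong(Long::longValue)
                .summaryStatistics();
        System.out.printf("count=%d min=%d ms max=%d ms avg=%.0f ms%n",
                stats.getCount(), stats.getMin(), stats.getMax(), stats.getAverage());
    }
}
```

For these six lines the minimum is 413962 ms, the maximum is 428003 ms, and the average is about 422 s. That is roughly 7 minutes per transaction as reported by the connector, compared with about 115 s of WriteDataTimeMs in the prepare responses.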

2023-11-02 09:53:20,805 INFO com.starrocks.data.load.stream.TransactionStreamLoader [] - Transaction committed, lable: flink-8f07c37b-1bfe-4fd9-a9ab-88481618d751, body : {
    "Status": "FAILED",
    "Message": "class com.starrocks.common.UserException: publish timeout: 20000"
}
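The timestamps show what the publish timeout costs. The commit request for flink-f2f8b0f4-bb43-48f3-bcb3-ee390bd1fd32 is sent at 09:52:58,894, and the FAILED response arrives at 09:53:18,897, about 20 s later, which matches publish timeout: 20000 (ms). The same holds for flink-8f07c37b-1bfe-4fd9-a9ab-88481618d751 (09:53:00,803 to 09:53:20,805). After each timeout the connector calls get_load_state, receives COMMITTED, and logs "Success to commit transaction". The Java sketch below reproduces that decision. It was written from the log, not from the connector's source. The class, the method, and the set of accepted states are assumptions.

```java
import java.util.Set;

// Hypothetical sketch of the decision visible in the log; NOT the StarRocks
// connector's source code. A commit that fails with "publish timeout" is
// re-checked with get_load_state, and an already-committed state counts as success.
public class CommitOutcome {

    // States in which the transaction is already committed (assumed set).
    private static final Set<String> COMMITTED_STATES = Set.of("COMMITTED", "VISIBLE");

    static boolean isCommitSuccessful(String commitStatus, String commitMessage, String loadState) {
        if ("OK".equals(commitStatus)) {
            return true; // committed and published within the timeout
        }
        boolean publishTimedOut = commitMessage != null && commitMessage.contains("publish timeout");
        // In this sketch only a publish timeout is re-checked; any other failure is treated as an error.
        return publishTimedOut && loadState != null && COMMITTED_STATES.contains(loadState);
    }

    public static void main(String[] args) {
        String msg = "class com.starrocks.common.UserException: publish timeout: 20000";
        System.out.println(isCommitSuccessful("FAILED", msg, "COMMITTED")); // prints true
        System.out.println(isCommitSuccessful("FAILED", "some other error", "ABORTED")); // prints false
    }
}
```

Under this reading, the FAILED responses do not indicate lost data: the transactions are already COMMITTED and become visible once publish finishes. However, each such commit call still waits the full 20 s before it returns.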

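In 3.1.2 you can improve publish efficiency by tuning the following parameters:

FE: enable enable_new_publish_mechanism
BE: increase transaction_publish_version_worker_count

We are also working on publish performance. In 3.1.5, publish performance for Primary Key tables will improve dramatically.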

OK, I'll change them and give it a try. Thanks a lot!

What do these two parameters actually do?