StarRocks upgrade 3.2.4 -> 3.3.1: Flink-CDC can no longer sync data

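Problem description

[Details]

I upgraded StarRocks from 3.2.4 to 3.3.1. The upgrade itself succeeded, and the FE and BE nodes all run normally. However, as soon as I start the Flink-CDC sync jobs, the Flink jobs fail with errors, and the tables being synced can no longer be queried.

Flink error message:

Tablet information:

Table query error message: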

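Basic information

[Background] The only change made was the StarRocks version upgrade.
[Business impact] None of the Flink-CDC sync jobs can run, and every table written to by a sync job after the upgrade reports corrupted tablets and cannot be queried.
[Shared-data (storage-compute separation)] No
[StarRocks version] Upgraded from 3.2.4 to 3.3.1
[Cluster size] 1 FE (4 vCPU, 8 GB) + 3 BE (8 vCPU, 32 GB)
[Machine info] AWS EC2; FE: c5a.xlarge, Amazon Linux; BE: m6a.2xlarge, Amazon Linux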

[Flink info] Flink-CDC version: 2.3.0; Flink version: 1.5.4 (likely 1.15.4, given the connector's flink-1.15 build); source database: PostgreSQL; StarRocks Connector version: 1.2.6_flink-1.15

[Contact] Community group 19, 枫

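Solution

The current workaround is to downgrade StarRocks along the path 3.3.1 -> 3.2.9 -> 3.2.4.

After the downgrade completed, Flink-CDC imports data normally again, and the cluster has essentially fully recovered.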

FE log:

2024-07-22 03:56:38.299Z INFO (ReportHandler|170) [ReportHandler.tabletReport():441] backend[25620097] reports 19551 tablet(s). report version: 17216146893513
2024-07-22 03:56:38.314Z INFO (ReportHandler|170) [TabletInvertedIndex.tabletReport():307] finished to do tablet diff with backend[25620097]. sync: 23. metaDel: 0. foundValid: 19513. foundInvalid: 0. migration: 0. found invalid transactions 0. found republish transactions 0  cost: 11 ms
2024-07-22 03:56:38.315Z INFO (ReportHandler|170) [ReportHandler.sync():643] before sync tablets in db[70006868]. report num: 23. backend[25620097]
2024-07-22 03:56:38.315Z INFO (ReportHandler|170) [ReportHandler.sync():777] sync 0 update 0 in 23 tablets in db[70006868]. backend[25620097]
2024-07-22 03:56:40.753Z INFO (starrocks-mysql-nio-pool-117|1626) [QeProcessorImpl.registerQuery():108] register query id = 66f66469-47de-11ef-abf0-064e9c7716eb
2024-07-22 03:56:40.756Z WARN (starrocks-mysql-nio-pool-117|1626) [FragmentInstanceExecState.waitForDeploymentCompletion():297] exec plan fragment failed, errmsg=get_applied_rowsets failed, tablet updates is in error state: tablet:96156411 apply tablet: 96156411 failed, status: Invalid argument: _apply_rowset_commit error: apply rowset update state failed: Invalid argument: Fail to do LZ4FRAME decompress, res=ERROR_frameType_unknown
be/src/storage/persistent_index.cpp:2495 codec->decompress(compressed_body, &decompressed_body)
be/src/storage/persistent_index.cpp:2611 _read_page(shard_idx, pageid, &page, stat)
be/src/storage/persistent_index.cpp:2907 _get_in_shard(shard_off + i, n, keys, keys_info_by_shard[i].key_infos, values, found_keys_info, stat)
be/src/storage/persistent_index.cpp:3604 _l1_vec[i - 1]->get(n, keys, keys_info, values, &found_keys_info, key_size, stat)
be/src/storage/persistent_index.cpp:3852 _get_from_immutable_index(n, keys, old_values, not_founds_by_key_size, stat)
be/src/storage/primary_index.cpp:1308 _persistent_index->upsert(n, vkeys, reinterpret_cast<IndexValue*>(values.data()), reinterpret_cast<IndexValue*>(old_values.data()), stat)
be/src/storage/tablet_updates.cpp:1865 index.upsert(rowset_id + upsert_idx, 0, *upserts[upsert_idx], new_deletes, iostat.get()) tablet:96156411 #version:6 [54451.1 54454.1@4 54455] pending: rowsets:4
  5738 [seg:1 row:2908699 del:82767 bytes:440658055 row_size:0 compaction_score:-109527999 compaction_level:-1 partial_update_by_column:false]
  6471 [seg:1 row:4620066 del:26215 bytes:693523049 row_size:0 compaction_score:-405411788 compaction_level:-1 partial_update_by_column:false]
  8089 [seg:1 row:72361 del:0 bytes:10734604 row_size:0 compaction_score:257700852 compaction_level:-1 partial_update_by_column:false]
  8090 [seg:1 row:1209 del:0 bytes:180960 row_size:461111 compaction_score:268254496 compaction_level:-1 partial_update_by_column:false] backend [id=45988457] [host=10.3.2.61], code=INTERNAL_ERROR, fragmentId=F00, backend=10.3.2.61:9060
2024-07-22 03:56:40.757Z WARN (starrocks-mysql-nio-pool-117|1626) [FragmentInstanceExecState.waitForDeploymentCompletion():297] exec plan fragment failed, errmsg=get_applied_rowsets failed, tablet updates is in error state: tablet:96156408 apply tablet: 96156408 failed, status: Invalid argument: _apply_rowset_commit error: apply rowset update state failed: Invalid argument: Fail to do LZ4FRAME decompress, res=ERROR_frameType_unknown
be/src/storage/persistent_index.cpp:2495 codec->decompress(compressed_body, &decompressed_body)
be/src/storage/persistent_index.cpp:2611 _read_page(shard_idx, pageid, &page, stat)
be/src/storage/persistent_index.cpp:2907 _get_in_shard(shard_off + i, n, keys, keys_info_by_shard[i].key_infos, values, found_keys_info, stat)
be/src/storage/persistent_index.cpp:3604 _l1_vec[i - 1]->get(n, keys, keys_info, values, &found_keys_info, key_size, stat)
be/src/storage/persistent_index.cpp:3852 _get_from_immutable_index(n, keys, old_values, not_founds_by_key_size, stat)
be/src/storage/primary_index.cpp:1308 _persistent_index->upsert(n, vkeys, reinterpret_cast<IndexValue*>(values.data()), reinterpret_cast<IndexValue*>(old_values.data()), stat)
be/src/storage/tablet_updates.cpp:1865 index.upsert(rowset_id + upsert_idx, 0, *upserts[upsert_idx], new_deletes, iostat.get()) tablet:96156408 #version:4 [54453 54454.1@2 54455] pending: rowsets:4
  5647 [seg:1 row:2849692 del:85510 bytes:431791456 row_size:0 compaction_score:-98572710 compaction_level:-1 partial_update_by_column:false]
  6490 [seg:1 row:4684775 del:29469 bytes:703291788 row_size:0 compaction_score:-412736482 compaction_level:-1 partial_update_by_column:false]
  8126 [seg:1 row:72714 del:0 bytes:10727884 row_size:0 compaction_score:257707572 compaction_level:-1 partial_update_by_column:false]
  8127 [seg:1 row:1208 del:0 bytes:180563 row_size:460846 compaction_score:268254893 compaction_level:-1 partial_update_by_column:false] backend [id=25620097] [host=10.3.2.94], code=INTERNAL_ERROR, fragmentId=F00, backend=10.3.2.94:9060
2024-07-22 03:56:40.757Z INFO (starrocks-mysql-nio-pool-117|1626) [QueryRuntimeProfile.finishAllInstances():239] unfinished instances: [66f66469-47de-11ef-abf0-064e9c7716ec]
2024-07-22 03:56:40.757Z WARN (starrocks-mysql-nio-pool-117|1626) [StmtExecutor.execute():645] Query 66f66469-47de-11ef-abf0-064e9c7716eb failed. Planner profile : Planner:
  Reason:

2024-07-22 03:56:40.757Z INFO (starrocks-mysql-nio-pool-117|1626) [QueryRuntimeProfile.finishAllInstances():239] unfinished instances: [66f66469-47de-11ef-abf0-064e9c7716ec]
2024-07-22 03:56:40.757Z INFO (starrocks-mysql-nio-pool-117|1626) [QeProcessorImpl.unregisterQuery():149] deregister query id = 66f66469-47de-11ef-abf0-064e9c7716eb
2024-07-22 03:56:40.757Z INFO (starrocks-mysql-nio-pool-117|1626) [StmtExecutor.execute():726] execute Exception, sql: SELECT * FROM bi_ods.ccms_clr_parcel_clearance_hot LIMIT 10, error: get_applied_rowsets failed, tablet updates is in error state: tablet:96156411 apply tablet: 96156411 failed, status: Invalid argument: _apply_rowset_commit error: apply rowset update state failed: Invalid argument: Fail to do LZ4FRAME decompress, res=ERROR_frameType_unknown
be/src/storage/persistent_index.cpp:2495 codec->decompress(compressed_body, &decompressed_body)
be/src/storage/persistent_index.cpp:2611 _read_page(shard_idx, pageid, &page, stat)
be/src/storage/persistent_index.cpp:2907 _get_in_shard(shard_off + i, n, keys, keys_info_by_shard[i].key_infos, values, found_keys_info, stat)
be/src/storage/persistent_index.cpp:3604 _l1_vec[i - 1]->get(n, keys, keys_info, values, &found_keys_info, key_size, stat)
be/src/storage/persistent_index.cpp:3852 _get_from_immutable_index(n, keys, old_values, not_founds_by_key_size, stat)
be/src/storage/primary_index.cpp:1308 _persistent_index->upsert(n, vkeys, reinterpret_cast<IndexValue*>(values.data()), reinterpret_cast<IndexValue*>(old_values.data()), stat)
be/src/storage/tablet_updates.cpp:1865 index.upsert(rowset_id + upsert_idx, 0, *upserts[upsert_idx], new_deletes, iostat.get()) tablet:96156411 #version:6 [54451.1 54454.1@4 54455] pending: rowsets:4
  5738 [seg:1 row:2908699 del:82767 bytes:440658055 row_size:0 compaction_score:-109527999 compaction_level:-1 partial_update_by_column:false]
  6471 [seg:1 row:4620066 del:26215 bytes:693523049 row_size:0 compaction_score:-405411788 compaction_level:-1 partial_update_by_column:false]
  8089 [seg:1 row:72361 del:0 bytes:10734604 row_size:0 compaction_score:257700852 compaction_level:-1 partial_update_by_column:false]
  8090 [seg:1 row:1209 del:0 bytes:180960 row_size:461111 compaction_score:268254496 compaction_level:-1 partial_update_by_column:false] backend [id=45988457] [host=10.3.2.61]
2024-07-22 03:56:40.757Z WARN (starrocks-mysql-nio-pool-117|1626) [DefaultCoordinator.cancel():855] cancel execState of query, this is outside invoke
2024-07-22 03:56:40.757Z INFO (starrocks-mysql-nio-pool-117|1626) [QueryRuntimeProfile.finishAllInstances():239] unfinished instances: [66f66469-47de-11ef-abf0-064e9c7716ec]
2024-07-22 03:56:40.757Z INFO (thrift-server-pool-26|335) [QeProcessorImpl.reportExecStatus():192] ReportExecStatus() failed, query does not exist, fragment_instance_id=66f66469-47de-11ef-abf0-064e9c7716ec, query_id=66f66469-47de-11ef-abf0-064e9c7716eb,
2024-07-22 03:56:42.270Z INFO (consistency checker|21) [CheckConsistencyJob.tryFinishJob():343] tablet[99948274] is consistent: [70021848, 45988457]
2024-07-22 03:56:42.272Z INFO (consistency checker|21) [CheckConsistencyJob.tryFinishJob():343] tablet[99948262] is consistent: [45988457, 25620097]
2024-07-22 03:56:42.274Z INFO (consistency checker|21) [CheckConsistencyJob.tryFinishJob():343] tablet[99948256] is consistent: [70021848, 45988457]
2024-07-22 03:56:42.276Z INFO (consistency checker|21) [CheckConsistencyJob.tryFinishJob():343] tablet[99948259] is consistent: [70021848, 25620097]
2024-07-22 03:56:42.277Z INFO (consistency checker|21) [CheckConsistencyJob.tryFinishJob():343] tablet[99948268] is consistent: [70021848, 25620097]
2024-07-22 03:56:42.279Z INFO (consistency checker|21) [CheckConsistencyJob.tryFinishJob():343] tablet[99948271] is consistent: [45988457, 25620097]
2024-07-22 03:56:42.281Z INFO (consistency checker|21) [CheckConsistencyJob.tryFinishJob():343] tablet[99948265] is consistent: [70021848, 45988457]
2024-07-22 03:56:42.283Z INFO (consistency checker|21) [CheckConsistencyJob.tryFinishJob():343] tablet[99948244] is consistent: [45988457, 25620097]
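
Each failure in the FE log above spans several lines, so it is easy to lose track of which tablets and backends are affected. The following is a minimal Python sketch, not part of the original report, that collects the tablet IDs reported as "tablet updates is in error state" together with the backend that reported them. The function name `find_error_tablets` and both regular expressions are assumptions derived only from the log lines shown here; other StarRocks versions may format these messages differently.

```python
import re
from collections import defaultdict

# Tablet id from: "tablet updates is in error state: tablet:<id>"
ERROR_RE = re.compile(r"tablet updates is in error state: tablet:(\d+)")
# Reporting backend from: "backend [id=<id>] [host=<ip>]"
BACKEND_RE = re.compile(r"backend \[id=(\d+)\] \[host=([\d.]+)\]")


def find_error_tablets(lines):
    """Return {tablet_id: {(backend_id, host), ...}} for tablets in error state.

    An error entry starts on the line carrying the tablet id and ends on the
    later line carrying the backend info, so the current tablet is remembered
    between lines.
    """
    result = defaultdict(set)
    current = None  # tablet id of the entry currently being read
    for line in lines:
        err = ERROR_RE.search(line)
        if err:
            current = err.group(1)
            result[current]  # record the tablet even if no backend line follows
        backend = BACKEND_RE.search(line)
        if backend and current is not None:
            result[current].add((backend.group(1), backend.group(2)))
            current = None  # the backend line closes the multi-line entry
    return dict(result)
```

To use it, pass any iterable of log lines, for example an open file handle on the FE log (the path depends on the deployment). For the excerpt above it would be expected to report tablets 96156411 and 96156408, on backends 10.3.2.61 and 10.3.2.94 respectively.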

BE log
The relevant entries start at line 2552 of the log.
be_war.log (1019.9 KB)

OK, thank you.