StarRocks Stream Load import failure

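To help us locate your problem faster, please provide the following information. Thank you.
【Details】Stream Load imports fail between 9:00 and 10:00 every day. A Stream Load failure occurred at 10:05.
【Background】6:00 to 10:00 is the peak period for business imports. The tables include both Primary Key tables and Duplicate Key tables.
【Business impact】
【Shared-data (storage-compute separation)】No
【StarRocks version】e.g. 2.5.12
【Cluster size】e.g. 3 FE (1 follower + 2 observers) + 15 BE (FE and BE co-located)
【Machine info】CPU virtual cores / memory / NIC, e.g. 32C/128G/10 GbE
【Contact】So that we can reach you promptly for log information while resolving the problem, please add your contact details, e.g. community group 9-折耳根先生 or an email address. Thank you.
【Attachments】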
Corresponding machine:
Memory: [image]
CPU: [image]
IO: [image]

  • be.Warn

W1128 10:04:55.260007 29476 engine_storage_migration_task.cpp:102] could not migration because has unfinished txns.
W1128 10:04:55.260023 29476 agent_task.cpp:292] local tablet migration failed. status: Internal error: could not migration because has unfinished txns., signature: 15289983
W1128 10:04:55.261433 2332 engine_storage_migration_task.cpp:102] could not migration because has unfinished txns.
W1128 10:04:55.261453 2332 agent_task.cpp:292] local tablet migration failed. status: Internal error: could not migration because has unfinished txns., signature: 15303179
W1128 10:04:55.261544 2333 engine_storage_migration_task.cpp:102] could not migration because has unfinished txns.
W1128 10:04:55.261565 2333 agent_task.cpp:292] local tablet migration failed. status: Internal error: could not migration because has unfinished txns., signature: 15302475
W1128 10:05:04.031605 29548 tablet_updates.cpp:1294] wait_for_version slow(3174ms) version:152259.1 tablet:8744200 #version:391 [151882 152259.1@390 152259.1] pending: rowsets:3[id/seg/row/del/byte/compaction]: [154973/1/5257/0/40.00 KB/255.96 MB],[154974/1/5372/0/40.77 KB/255.96 MB],[154975/1/5775629/10568/24.00 MB/232.22 MB]
W1128 10:05:05.977300 29550 mem_hook.cpp:254] large memory alloc: 1074232162 bytes, stack:
    @ 0x4a360eb malloc
    @ 0x7faf765 operator new()
    @ 0x41b737e starrocks::PrimaryKeyEncoder::encode()
    @ 0x46a88eb starrocks::vectorized::CompactionState::_load_segments()
    @ 0x46a95fb starrocks::vectorized::CompactionState::_do_load()
    @ 0x46a96a5 _ZZSt9call_onceIZN9starrocks10vectorized15CompactionState4loadEPNS0_6RowsetEEUlvE_JEEvRSt9once_flagOT_DpOT0_ENUlvE0_4_FUNEv
    @ 0x7f8d42282e40 __GI___pthread_once
    @ 0x46a7c48 starrocks::vectorized::CompactionState::load()
    @ 0x425dc07 starrocks::TabletUpdates::_commit_compaction()
    @ 0x4267a54 starrocks::TabletUpdates::_do_compaction()
    @ 0x4268d10 starrocks::TabletUpdates::compaction()
    @ 0x41c1f83 starrocks::StorageEngine::_perform_update_compaction()
    @ 0x443a6be starrocks::StorageEngine::_update_compaction_thread_callback()
    @ 0x802a4a0 execute_native_thread_routine
    @ 0x7f8d4227ddd5 start_thread
    @ 0x7f8d4189902d __clone
    @ (nil) (unknown)
W1128 10:05:13.868614 29550 tablet_updates.cpp:1294] wait_for_version slow(6209ms) version:152259.1 tablet:8744180 #version:338 [151936 152259.1@336 152260] pending: rowsets:4[id/seg/row/del/byte/compaction]: [155053/1/3595/0/29.23 KB/255.97 MB],[155054/1/6909/0/51.15 KB/255.95 MB],[155055/1/10956315/10448/45.60 MB/210.62 MB],[155056/1/2540/0/22.01 KB/255.98 MB]
W1128 10:05:13.871249 29442 tablet_updates.cpp:1294] wait_for_version slow(5526ms) version:152260 tablet:8744180 #version:338 [151936 152260@337 152260] pending: rowsets:4[id/seg/row/del/byte/compaction]: [155053/1/3595/0/29.23 KB/255.97 MB],[155054/1/6909/0/51.15 KB/255.95 MB],[155055/1/10956315/12974/45.60 MB/210.67 MB],[155056/1/2540/0/22.01 KB/255.98 MB]
W1128 10:05:22.527350 29476 engine_storage_migration_task.cpp:102] could not migration because has unfinished txns.
W1128 10:05:22.527410 29476 agent_task.cpp:292] local tablet migration failed. status: Internal error: could not migration because has unfinished txns., signature: 15289983
W1128 10:05:22.527580 2562 engine_storage_migration_task.cpp:102] could not migration because has unfinished txns.
W1128 10:05:22.527599 2562 agent_task.cpp:292] local tablet migration failed. status: Internal error: could not migration because has unfinished txns., signature: 15303179
W1128 10:05:22.527693 2563 engine_storage_migration_task.cpp:102] could not migration because has unfinished txns.
W1128 10:05:22.527709 2563 agent_task.cpp:292] local tablet migration failed. status: Internal error: could not migration because has unfinished txns., signature: 15302475
W1128 10:05:22.528018 29475 engine_storage_migration_task.cpp:148] tablet_meta already exist. tablet:12738435.71653699.0b46f4830faebba3-baeef847bbceafbf, tablet state:1, dest path:/data2/starrocks/storage, source path:/data/apps/StarRocks-2.3.0-rc01/be/storage
W1128 10:05:22.528065 29475 agent_task.cpp:292] local tablet migration failed. status: Already exist: tablet_meta already exist. tablet: 12738435.71653699.0b46f4830faebba3-baeef847bbceafbf, signature: 12738435
W1128 10:05:22.630252 7153 agent_server.cpp:477] fail to make_snapshot. tablet_id:8744200 msg:Not found: get_rowsets_for_snapshot: no version to clone tablet:8744200 #version:338 [151936 152261@337 152261] #pending:0 request_version:152262,
W1128 10:05:22.630597 7151 agent_server.cpp:477] fail to make_snapshot. tablet_id:8744220 msg:Not found: get_rowsets_for_snapshot: no version to clone tablet:8744220 #version:338 [151936 152261@337 152261] #pending:0 request_version:152262,
W1128 10:05:37.988448 29502 utils.cpp:67] Fail to finish_task. host=xxx.36.112, port=9020, error=THRIFT_EAGAIN (timed out)
W1128 10:05:37.988493 29502 finish_task.cpp:42] finish task failed retry: 1/3client_status: -1 status_code: OK
W1128 10:05:38.853788 29481 utils.cpp:67] Fail to finish_task. host=xxx.36.112, port=9020, error=THRIFT_EAGAIN (timed out)
W1128 10:05:38.853839 29481 finish_task.cpp:42] finish task failed retry: 1/3client_status: -1 status_code: OK
W1128 10:05:47.185961 29548 tablet_updates.cpp:1294] wait_for_version slow(3563ms) version:235442.1 tablet:7471421 #version:43 [235412 235442.1@42 235442.1] pending: rowsets:1[id/seg/row/del/byte/compaction]: [323870/1/2025276/0/122.61 MB/133.39 MB]
W1128 10:05:54.288962 29476 engine_storage_migration_task.cpp:102] could not migration because has unfinished txns.
W1128 10:05:54.289003 29476 agent_task.cpp:292] local tablet migration failed. status: Internal error: could not migration because has unfinished txns., signature: 15289983
W1128 10:05:54.289240 2765 engine_storage_migration_task.cpp:102] could not migration because has unfinished txns.
W1128 10:05:54.289259 2765 agent_task.cpp:292] local tablet migration failed. status: Internal error: could not migration because has unfinished txns., signature: 15303179
W1128 10:05:54.289326 2766 engine_storage_migration_task.cpp:102] could not migration because has unfinished txns.
W1128 10:05:54.289343 2766 agent_task.cpp:292] local tablet migration failed. status: Internal error: could not migration because has unfinished txns., signature: 15302475
W1128 10:05:54.289676 29475 engine_storage_migration_task.cpp:148] tablet_meta already exist. tablet:12738435.71653699.0b46f4830faebba3-baeef847bbceafbf, tablet state:1, dest path:/data2/starrocks/storage, source path:/data/apps/StarRocks-2.3.0-rc01/be/storage
W1128 10:05:54.289706 29475 agent_task.cpp:292] local tablet migration failed. status: Already exist: tablet_meta already exist. tablet: 12738435.71653699.0b46f4830faebba3-baeef847bbceafbf, signature: 12738435
W1128 10:06:07.725986 29480 utils.cpp:67] Fail to finish_task. host=xxx.36.112, port=9020, error=THRIFT_EAGAIN (timed out)
W1128 10:06:07.726025 29480 finish_task.cpp:42] finish task failed retry: 1/3client_status: -1 status_code: OK
W1128 10:06:10.362488 29550 tablet_updates.cpp:1294] wait_for_version slow(3365ms) version:235443.1 tablet:7471437 #version:42 [235412.1 235443.1@41 235443.1] pending: rowsets:1[id/seg/row/del/byte/compaction]: [323278/1/2023744/0/123.14 MB/132.86 MB]
W1128 10:06:11.202448 29725 stream_load_executor.cpp:211] commit transaction failed, errmsg=Publish timeout. The data will be visible after a whileerrors: {tablet:8267433 quorum:2 version:207324 #replica:3 err: xxx.36.xxx:207323 xxx.36.xxx:207323 xxx.36.xxx:207323} {tablet:8267437 quorum:2 version:207324 #replica:3 err: xxx.36.xxx:207323 xxx.36.xxx:207323 xxx.36.xxx:...id=b2458b93606dd62b-814434b34d0229bd, job_id=-1, txn_id: 91431127, label=34c1be32-6f27-499e-905e-980957c9aab4, db=cn_report
W1128 10:06:14.555198 29726 stream_load_executor.cpp:211] commit transaction failed, errmsg=Publish timeout. The data will be visible after a whileerrors: {tablet:15303215 quorum:2 version:13 #replica:3 err: xxx.36.115:12 xxx.36.113:12 xxx.36.113:12} {tablet:15303219 quorum:2 version:13 #replica:3 err: xxx.36.112:12 xxx.36.112:12 xxx.36.112:12} {tablet:15303223 quoru...id=ad44b38327caad2b-2ddf40dcf7f377b3, job_id=-1, txn_id: 91431142, label=f25d41c4-a5f3-415e-abf9-bef0bf6b02f3, db=dm_report
W1128 10:06:15.328320 18065 agent_server.cpp:477] fail to make_snapshot. tablet_id:222497 msg:Not found: get_rowsets_for_snapshot: no version to clone tablet:222497 #version:42 [478666 478697.1@41 478697.1] #pending:0 request_version:478698,
W1128 10:06:31.343701 29725 stream_load_executor.cpp:211] commit transaction failed, errmsg=Publish timeout. The data will be visible after a whileerrors: {tablet:15302423 quorum:2 version:195 #replica:3 err: xxx.36.xxx:194 xxx.36.xxx:194 xxx.36.114:194} {tablet:15302427 quorum:2 version:195 #replica:3 err: xxx.36.115:194 xxx.36.113:194 xxx.36.113:194} {tablet:153024...id=b5458d206e5dffed-198ef95edbb6e79d, job_id=-1, txn_id: 91431207, label=acaf6cea-bbf6-4a8f-a581-f2f449075c42, db=ec_report
W1128 10:06:41.271317 29550 tablet_updates.cpp:1294] wait_for_version slow(3476ms) version:235443.1 tablet:7471397 #version:41 [235412.1 235443.1@40 235443.1] pending: rowsets:1[id/seg/row/del/byte/compaction]: [323070/1/2024351/0/122.10 MB/133.90 MB]
W1128 10:06:45.883687 29477 utils.cpp:67] Fail to finish_task. host=xxx.36.112, port=9020, error=THRIFT_EAGAIN (timed out)
W1128 10:06:45.883729 29477 finish_task.cpp:42] finish task failed retry: 1/3client_status: -1 status_code: OK
W1128 10:06:46.008070 29492 utils.cpp:67] Fail to finish_task. host=xxx.36.112, port=9020, error=THRIFT_EAGAIN (timed out)
W1128 10:06:46.008109 29492 finish_task.cpp:42] finish task failed retry: 1/3client_status: -1 status_code: OK
W1128 10:06:46.275207 29497 utils.cpp:67] Fail to finish_task. host=xxx.36.112, port=9020, error=THRIFT_EAGAIN (timed out)
W1128 10:06:46.275255 29497 finish_task.cpp:42] finish task failed retry: 1/3client_status: -1 status_code: OK
W1128 10:06:46.647153 29503 utils.cpp:67] Fail to finish_task. host=xxx.36.112, port=9020, error=THRIFT_EAGAIN (timed out)
W1128 10:06:46.647212 29503 finish_task.cpp:42] finish task failed retry: 1/3client_status: -1 status_code: OK
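
To see at a glance which source locations emit the most warnings in a be.WARNING excerpt like this one, a small script can tally entries by `file:line`. The following is only a rough sketch, not a StarRocks tool: it assumes one glog-style entry per line (`W<mmdd> <time> <thread id> <file>:<line>] <message>`), and the function name `count_warning_sources` and the `sample` list are made up for illustration, with the sample lines copied from the log above.

```python
import re
from collections import Counter

# Assumed glog warning prefix: W<mmdd> <hh:mm:ss.micros> <thread id> <file>:<line>]
GLOG_PREFIX = re.compile(r"^W\d{4} \d{2}:\d{2}:\d{2}\.\d{6}\s+\d+ ([\w.]+):(\d+)\]")


def count_warning_sources(lines):
    """Count warning entries per 'file:line'; lines without the prefix are skipped."""
    counts = Counter()
    for line in lines:
        match = GLOG_PREFIX.match(line)
        if match:
            counts[f"{match.group(1)}:{match.group(2)}"] += 1
    return counts


# Hypothetical input: a few entries copied from the excerpt, one per line.
sample = [
    "W1128 10:04:55.260007 29476 engine_storage_migration_task.cpp:102] could not migration because has unfinished txns.",
    "W1128 10:06:11.202448 29725 stream_load_executor.cpp:211] commit transaction failed, errmsg=Publish timeout.",
    "W1128 10:06:45.883687 29477 utils.cpp:67] Fail to finish_task. host=xxx.36.112, port=9020, error=THRIFT_EAGAIN (timed out)",
    "W1128 10:06:46.008070 29492 utils.cpp:67] Fail to finish_task. host=xxx.36.112, port=9020, error=THRIFT_EAGAIN (timed out)",
]

for source, n in count_warning_sources(sample).most_common():
    print(source, n)
```

Run against the full file, this kind of tally makes it easier to separate the recurring tablet migration warnings from the `stream_load_executor.cpp` entries that correspond to the failed loads.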

Hi experts, has this problem been resolved yet?