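BE node crashed abnormally and cannot be restarted

To help us locate your problem faster, please provide the following information. Thank you.
【Details】Detailed description of the problem
【Background】What operations were performed? None
Impact: the BE node cannot start normally and rejoin the cluster
Shared-nothing (storage and compute integrated)
【StarRocks version】e.g. 2.5.12
【Cluster size】e.g. 7 FE (1 follower + 2 observer) + 10 BE (FE and BE co-located)
【Machine info】CPU virtual cores / memory / NIC, e.g. 80C/256G/10GbE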

【Attachments】
be.info.log (17.1 KB)
be.out (4.3 KB)

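The BE node crashed while running. On restart it reports corrupted files and cannot start. After upgrading the BE node to 2.5.19 it still reports corrupted files and cannot start.

FE and BE are deployed separately.

Was the machine rebooted at some point?

Use meta_tool.sh to delete the problematic tablet: 323592188

A memory fault caused the server to reboot.


After deleting 323592188 with meta_tool.sh and restarting, it still reports errors: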

I0326 11:22:24.667174 308973 daemon.cpp:277] version 2.5.12 RELEASE (build cb07d99)
Built on 2023-09-04 06:17:57 by StarRocks@localhost
I0326 11:22:24.672102 308973 mem_info.cpp:91] Physical Memory: 251.34 GB
I0326 11:22:24.672118 308973 daemon.cpp:283] Cpu Info:
Model: Intel® Xeon® Gold 6133 CPU @ 2.50GHz
Cores: 80
Max Possible Cores: 80
L1 Cache: 32.00 KB (Line: 64.00 B)
L2 Cache: 1.00 MB (Line: 64.00 B)
L3 Cache: 27.50 MB (Line: 64.00 B)
Hardware Supports:
ssse3
sse4_1
sse4_2
popcnt
avx
avx2
Numa Nodes: 2
Numa Nodes of Cores: 0->0 | 1->1 | 2->0 | 3->1 | 4->0 | 5->1 | 6->0 | 7->1 | 8->0 | 9->1 | 10->0 | 11->1 | 12->0 | 13->1 | 14->0 | 15->1 | 16->0 | 17->1 | 18->0 | 19->1 | 20->0 | 21->1 | 22->0 | 23->1 | 24->0 | 25->1 | 26->0 | 27->1 | 28->0 | 29->1 | 30->0 | 31->1 | 32->0 | 33->1 | 34->0 | 35->1 | 36->0 | 37->1 | 38->0 | 39->1 | 40->0 | 41->1 | 42->0 | 43->1 | 44->0 | 45->1 | 46->0 | 47->1 | 48->0 | 49->1 | 50->0 | 51->1 | 52->0 | 53->1 | 54->0 | 55->1 | 56->0 | 57->1 | 58->0 | 59->1 | 60->0 | 61->1 | 62->0 | 63->1 | 64->0 | 65->1 | 66->0 | 67->1 | 68->0 | 69->1 | 70->0 | 71->1 | 72->0 | 73->1 | 74->0 | 75->1 | 76->0 | 77->1 | 78->0 | 79->1 |
I0326 11:22:24.672158 308973 daemon.cpp:284] Disk Info:
Num disks 6: nvme0n, nvme1n, sda, sdb, md, md127p
I0326 11:22:24.672166 308973 daemon.cpp:285] Mem Info: 251.34 GB
I0326 11:22:24.815426 308973 daemon.cpp:260] Minidump is disabled
I0326 11:22:24.815469 308973 backend_options.cpp:100] priority cidrs in conf: 10.10.10.55/23
I0326 11:22:24.815845 308973 backend_options.cpp:77] localhost 10.10.10.55
I0326 11:22:24.816041 308973 exec_env.cpp:434] Set storage page cache size 43718810296
I0326 11:22:24.818608 309020 daemon.cpp:188] Current memory statistics: process(29259480), query_pool(0), load(0), metadata(0), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0326 11:22:24.821791 309022 data_dir.cpp:113] path: /data/starrocks/storage/be, hash: 7707643974414374032
I0326 11:22:24.951689 309098 data_dir.cpp:237] start to load tablets from /data/starrocks/storage/be
I0326 11:22:24.951712 309098 data_dir.cpp:243] begin loading rowset from meta
I0326 11:22:27.594189 309098 data_dir.cpp:261] load rowset from meta finished, data dir: /data/starrocks/storage/be
I0326 11:22:27.594228 309098 data_dir.cpp:266] begin loading tablet from meta
I0326 11:22:39.823786 309020 daemon.cpp:188] Current memory statistics: process(1230482296), query_pool(0), load(0), metadata(610412929), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
W0326 11:22:40.504978 309175 rowset.cpp:141] Fail to open /data/starrocks/storage/be/data/1021/321209132/1248468755/0200000000005e48a648e0f47d7f27a81aa03e6bcc4b45b4_0.dat: Corruption: Bad segment file /data/starrocks/storage/be/data/1021/321209132/1248468755/0200000000005e48a648e0f47d7f27a81aa03e6bcc4b45b4_0.dat: file size 0 < 12
/build/starrocks/be/src/storage/rowset/segment.cpp:195 Segment::parse_segment_footer(read_file.get(), &footer, footer_length_hint, partial_rowset_footer)
/build/starrocks/be/src/storage/rowset/segment.cpp:67 segment->_open(footer_length_hint, partial_rowset_footer)
W0326 11:22:40.505415 309175 rowset_update_state.cpp:39] load RowsetUpdateState error: Corruption: Bad segment file /data/starrocks/storage/be/data/1021/321209132/1248468755/0200000000005e48a648e0f47d7f27a81aa03e6bcc4b45b4_0.dat: file size 0 < 12
/build/starrocks/be/src/storage/rowset/segment.cpp:195 Segment::parse_segment_footer(read_file.get(), &footer, footer_length_hint, partial_rowset_footer)
/build/starrocks/be/src/storage/rowset/segment.cpp:67 segment->_open(footer_length_hint, partial_rowset_footer)
/build/starrocks/be/src/storage/rowset/rowset.cpp:75 do_load()
/build/starrocks/be/src/storage/rowset/rowset.cpp:454 load()
/build/starrocks/be/src/storage/rowset_update_state.cpp:161 _load_upserts(rowset, 0, pk_column.get()) tablet:321209132 stack:
@ 0x46a4e19 _ZZSt9call_onceIZN9starrocks17RowsetUpdateState4loadEPNS0_6TabletEPNS0_6RowsetEEUlvE_JEEvRSt9once_flagOT_DpOT0_ENUlvE0_4_FUNEv
@ 0x7fa3bbf2220b __pthread_once_slow
@ 0x469fde7 starrocks::RowsetUpdateState::load()
@ 0x4262183 starrocks::TabletUpdates::_apply_rowset_commit()
@ 0x4266353 starrocks::TabletUpdates::do_apply()
@ 0x4b17465 starrocks::ThreadPool::dispatch_thread()
@ 0x4b11e4a starrocks::thread::supervise_thread()
@ 0x7fa3bbf23ea5 start_thread
@ 0x7fa3bb53e96d __clone
@ (nil) (unknown)
W0326 11:22:40.586253 309175 rowset_update_state.cpp:27] bad RowsetUpdateState released tablet:321209132
E0326 11:22:40.586325 309175 tablet_updates.cpp:893] _apply_rowset_commit error: load rowset update state failed: Corruption: Bad segment file /data/starrocks/storage/be/data/1021/321209132/1248468755/0200000000005e48a648e0f47d7f27a81aa03e6bcc4b45b4_0.dat: file size 0 < 12
/build/starrocks/be/src/storage/rowset/segment.cpp:195 Segment::parse_segment_footer(read_file.get(), &footer, footer_length_hint, partial_rowset_footer)
/build/starrocks/be/src/storage/rowset/segment.cpp:67 segment->_open(footer_length_hint, partial_rowset_footer)
/build/starrocks/be/src/storage/rowset/rowset.cpp:75 do_load()
/build/starrocks/be/src/storage/rowset/rowset.cpp:454 load()
/build/starrocks/be/src/storage/rowset_update_state.cpp:161 _load_upserts(rowset, 0, pk_column.get()) tablet:321209132 #version:65 [3445.1 3500@59 3505] pending: rowsets:21
242 [seg:1 row:2292769 del:3569 bytes:277952717 compaction:-7353911]
957 [seg:1 row:3420402 del:4673 bytes:407845467 compaction:-136623991]
1126 [seg:1 row:3562116 del:5606 bytes:427480279 compaction:-155681018]
1438 [seg:1 row:2270929 del:4223 bytes:271922966 compaction:-959185]
1815 [seg:1 row:2263968 del:4019 bytes:273998034 compaction:-3130573]
2353 [seg:1 row:2370086 del:3697 bytes:285349941 compaction:-14688960]
2735 [seg:1 row:2275565 del:3264 bytes:274392024 compaction:-3988673]
3216 [seg:1 row:2781385 del:2522 bytes:337111248 compaction:-67147427]
3771 [seg:1 row:30144 del:1 bytes:2859317 compaction:265576609]
3772 [seg:1 row:65 del:0 bytes:12728 compaction:268422728]
3773 [seg:1 row:44 del:0 bytes:10476 compaction:268424980]
3774 [seg:1 row:63 del:0 bytes:12764 compaction:268422692]
3775 [seg:1 row:65 del:0 bytes:12635 compaction:268422821]
3776 [seg:1 row:62 del:0 bytes:12789 compaction:268422667]
3777 [seg:1 row:7 del:0 bytes:6192 compaction:268429264]
3778 [seg:1 row:15 del:0 bytes:6799 compaction:268428657]
3779 [seg:1 row:15 del:0 bytes:7014 compaction:268428442]
3780 [seg:1 row:16 del:0 bytes:6991 compaction:268428465]
3781 [seg:1 row:18 del:0 bytes:7331 compaction:268428125]
3782 [seg:1 row:72 del:0 bytes:13216 compaction:268422240]
3783 [seg:1 row:15 del:0 bytes:6997 compaction:268428459]
I0326 11:22:40.670964 309175 persistent_index.cpp:2675] load persistent index tablet:323593146 version:625 size: 6206492 l0_size: 0 l0_capacity:0 #shard: 583 l1_size:6100200 memory: 0 status: OK time:1ms
W0326 11:22:40.752521 309176 persistent_index.cpp:2683] load persistent index failed, tablet: 323592188, status: IO error: can not read fully
/build/starrocks/be/src/storage/persistent_index.cpp:1851 read_file->read_at_fully(offset, buff.data(), buff.size())
/build/starrocks/be/src/storage/persistent_index.cpp:2428 _l0->load(l0_meta)
/build/starrocks/be/src/storage/persistent_index.cpp:2379 _load(index_meta)
W0326 11:22:40.752588 309176 persistent_index.cpp:2690] delete error l0 index file: index.l0.77.0, status: Not found: index.l0.77.0: No such file or directory
F0326 11:22:40.785439 309175 tablet_updates.cpp:1131] delvec inconsistent tablet:323593146 rssid:657 #old:108 #add:15 #new:119 old_v:625 v:626
W0326 11:22:40.835842 309176 rowset.cpp:141] Fail to open /data/starrocks/storage/be/data/195/323592188/1813030133/0200000000001c95a648e0f47d7f27a81aa03e6bcc4b45b4_0.dat: Corruption: Bad segment file /data/starrocks/storage/be/data/195/323592188/1813030133/0200000000001c95a648e0f47d7f27a81aa03e6bcc4b45b4_0.dat: file size 0 < 12
/build/starrocks/be/src/storage/rowset/segment.cpp:195 Segment::parse_segment_footer(read_file.get(), &footer, footer_length_hint, partial_rowset_footer)
/build/starrocks/be/src/storage/rowset/segment.cpp:67 segment->_open(footer_length_hint, partial_rowset_footer)
I0326 11:22:40.837478 309098 data_dir.cpp:317] load tablet from meta finished, loaded tablet: 112297, error tablet: 0, path: /data/starrocks/storage/be
I0326 11:22:40.840662 309098 txn_manager.cpp:285] Commit txn successfully. tablet: 266442524, txn_id: 247128076, rowsetid: 020000005ac9e8fa6d4bd3c563a6aff0a2efd4116d81e4bf #segment:1 #delfile:0
I0326 11:22:40.840674 309098 data_dir.cpp:352] Added committed rowset=020000005ac9e8fa6d4bd3c563a6aff0a2efd4116d81e4bf tablet=266442524 schema hash=2135605729 txn_id: 247128076
W0326 11:22:40.846357 309098 data_dir.cpp:367] Found invalid rowset=020000001ccc04f9a74294fdc76c0c2b0c7d59cdda920db6 tablet id=298623355 tablet uid=054d848093c1e14b-d59c61088ec6648f schema hash=773891670 txn_id: 224974137 current valid tablet uid=364e41dbf9b46208-edc927769fddd18e
W0326 11:22:40.846382 309098 data_dir.cpp:367] Found invalid rowset=020000001ccc061fa74294fdc76c0c2b0c7d59cdda920db6 tablet id=298623355 tablet uid=054d848093c1e14b-d59c61088ec6648f schema hash=773891670 txn_id: 224974138 current valid tablet uid=364e41dbf9b46208-edc927769fddd18e
W0326 11:22:40.846387 309098 data_dir.cpp:367] Found invalid rowset=020000001ccc0744a74294fdc76c0c2b0c7d59cdda920db6 tablet id=298623355 tablet uid=054d848093c1e14b-d59c61088ec6648f schema hash=773891670 txn_id: 224974139 current valid tablet uid=364e41dbf9b46208-edc927769fddd18e
I0326 11:22:40.849648 309098 txn_manager.cpp:285] Commit txn successfully. tablet: 266442544, txn_id: 247128076, rowsetid: 020000005ac9e8fd6d4bd3c563a6aff0a2efd4116d81e4bf #segment:1 #delfile:0
I0326 11:22:40.849654 309098 data_dir.cpp:352] Added committed rowset=020000005ac9e8fd6d4bd3c563a6aff0a2efd4116d81e4bf tablet=266442544 schema hash=2135605729 txn_id: 247128076
I0326 11:22:40.857275 309098 txn_manager.cpp:285] Commit txn successfully. tablet: 299788624, txn_id: 247128130, rowsetid: 020000005aca154c6d4bd3c563a6aff0a2efd4116d81e4bf #segment:1 #delfile:0
I0326 11:22:40.857280 309098 data_dir.cpp:352] Added committed rowset=020000005aca154c6d4bd3c563a6aff0a2efd4116d81e4bf tablet=299788624 schema hash=1309589001 txn_id: 247128130
I0326 11:22:40.863637 309098 txn_manager.cpp:285] Commit txn successfully. tablet: 291472512, txn_id: 247128374, rowsetid: 020000005aca985a6d4bd3c563a6aff0a2efd4116d81e4bf #segment:1 #delfile:0
I0326 11:22:40.863642 309098 data_dir.cpp:352] Added committed rowset=020000005aca985a6d4bd3c563a6aff0a2efd4116d81e4bf tablet=291472512 schema hash=1781286787 txn_id: 247128374
I0326 11:22:40.865213 309098 txn_manager.cpp:285] Commit txn successfully. tablet: 323593662, txn_id: 247128521, rowsetid: 020000005acb19ae6d4bd3c563a6aff0a2efd4116d81e4bf #segment:1 #delfile:0
I0326 11:22:40.865217 309098 data_dir.cpp:352] Added committed rowset=020000005acb19ae6d4bd3c563a6aff0a2efd4116d81e4bf tablet=323593662 schema hash=820924035 txn_id: 247128521
I0326 11:22:40.865571 309098 txn_manager.cpp:285] Commit txn successfully. tablet: 231699688, txn_id: 247128241, rowsetid: 020000005aca5a786d4bd3c563a6aff0a2efd4116d81e4bf #segment:0 #delfile:0
I0326 11:22:40.865574 309098 data_dir.cpp:352] Added committed rowset=020000005aca5a786d4bd3c563a6aff0a2efd4116d81e4bf tablet=231699688 schema hash=1638904495 txn_id: 247128241
I0326 11:22:40.865577 309098 txn_manager.cpp:285] Commit txn successfully. tablet: 231699688, txn_id: 247128242, rowsetid: 020000005aca5a7c6d4bd3c563a6aff0a2efd4116d81e4bf #segment:1 #delfile:0
I0326 11:22:40.865581 309098 data_dir.cpp:352] Added committed rowset=020000005aca5a7c6d4bd3c563a6aff0a2efd4116d81e4bf tablet=231699688 schema hash=1638904495 txn_id: 247128242
I0326 11:22:40.867766 309098 txn_manager.cpp:285] Commit txn successfully. tablet: 231699692, txn_id: 247128241, rowsetid: 020000005aca5a796d4bd3c563a6aff0a2efd4116d81e4bf #segment:1 #delfile:0
I0326 11:22:40.867771 309098 data_dir.cpp:352] Added committed rowset=020000005aca5a796d4bd3c563a6aff0a2efd4116d81e4bf tablet=231699692 schema hash=1638904495 txn_id: 247128241
I0326 11:22:40.867775 309098 txn_manager.cpp:285] Commit txn successfully. tablet: 231699692, txn_id: 247128242, rowsetid: 020000005aca5a7d6d4bd3c563a6aff0a2efd4116d81e4bf #segment:0 #delfile:0
I0326 11:22:40.867779 309098 data_dir.cpp:352] Added committed rowset=020000005aca5a7d6d4bd3c563a6aff0a2efd4116d81e4bf tablet=231699692 schema hash=1638904495 txn_id: 247128242
I0326 11:22:40.876984 309098 txn_manager.cpp:285] Commit txn successfully. tablet: 266442532, txn_id: 247128076, rowsetid: 020000005ac9e8fc6d4bd3c563a6aff0a2efd4116d81e4bf #segment:1 #delfile:0
I0326 11:22:40.876989 309098 data_dir.cpp:352] Added committed rowset=020000005ac9e8fc6d4bd3c563a6aff0a2efd4116d81e4bf tablet=266442532 schema hash=2135605729 txn_id: 247128076
W0326 11:22:40.835889 309176 primary_index.cpp:943] load PrimaryIndex error: Corruption: Bad segment file /data/starrocks/storage/be/data/195/323592188/1813030133/0200000000001c95a648e0f47d7f27a81aa03e6bcc4b45b4_0.dat: file size 0 < 12
/build/starrocks/be/src/storage/rowset/segment.cpp:195 Segment::parse_segment_footer(read_file.get(), &footer, footer_length_hint, partial_rowset_footer)
/build/starrocks/be/src/storage/rowset/segment.cpp:67 segment->_open(footer_length_hint, partial_rowset_footer)
/build/starrocks/be/src/storage/rowset/rowset.cpp:75 do_load()
/build/starrocks/be/src/storage/rowset/rowset.cpp:454 load()
/build/starrocks/be/src/storage/persistent_index.cpp:2821 _insert_rowsets(tablet, rowsets, pkey_schema, apply_version, std::move(pk_column)) tablet:0 stack:
@ 0x41825d0 starrocks::PrimaryIndex::load()
@ 0x426265e starrocks::TabletUpdates::_apply_rowset_commit()
@ 0x4266353 starrocks::TabletUpdates::do_apply()
@ 0x4b17465 starrocks::ThreadPool::dispatch_thread()
@ 0x4b11e4a starrocks::thread::supervise_thread()
@ 0x7fa3bbf23ea5 start_thread
@ 0x7fa3bb53e96d __clone
@ (nil) (unknown)
E0326 11:22:40.878813 309176 tablet_updates.cpp:911] _apply_rowset_commit error: load primary index failed: Corruption: Bad segment file /data/starrocks/storage/be/data/195/323592188/1813030133/0200000000001c95a648e0f47d7f27a81aa03e6bcc4b45b4_0.dat: file size 0 < 12
/build/starrocks/be/src/storage/rowset/segment.cpp:195 Segment::parse_segment_footer(read_file.get(), &footer, footer_length_hint, partial_rowset_footer)
/build/starrocks/be/src/storage/rowset/segment.cpp:67 segment->_open(footer_length_hint, partial_rowset_footer)
/build/starrocks/be/src/storage/rowset/rowset.cpp:75 do_load()
/build/starrocks/be/src/storage/rowset/rowset.cpp:454 load()
/build/starrocks/be/src/storage/persistent_index.cpp:2821 _insert_rowsets(tablet, rowsets, pkey_schema, apply_version, std::move(pk_column)) tablet:323592188 #version:13 [114 122@9 125] pending: rowsets:10
124 [seg:1 row:632799 del:0 bytes:7817679 compaction:260617777]
125 [seg:1 row:7052 del:0 bytes:94399 compaction:268341057]
126 [seg:1 row:1508 del:0 bytes:26228 compaction:268409228]
127 [seg:1 row:7029 del:0 bytes:95326 compaction:268340130]
128 [seg:1 row:1059 del:0 bytes:20167 compaction:268415289]
129 [seg:1 row:7449 del:0 bytes:96046 compaction:268339410]
130 [seg:1 row:1132 del:0 bytes:21710 compaction:268413746]
131 [seg:1 row:8246 del:0 bytes:114645 compaction:268320811]
132 [seg:1 row:8259 del:0 bytes:116335 compaction:268319121]
133 [seg:1 row:606 del:0 bytes:14084 compaction:268421372]
I0326 11:22:40.882707 309098 txn_manager.cpp:285] Commit txn successfully. tablet: 266442512, txn_id: 247128076, rowsetid: 020000005ac9e8f96d4bd3c563a6aff0a2efd4116d81e4bf #segment:1 #delfile:0
I0326 11:22:40.882722 309098 data_dir.cpp:352] Added committed rowset=020000005ac9e8f96d4bd3c563a6aff0a2efd4116d81e4bf tablet=266442512 schema hash=2135605729 txn_id: 247128076
I0326 11:22:40.891373 309098 txn_manager.cpp:285] Commit txn successfully. tablet: 150953368, txn_id: 247128141, rowsetid: 020000005aca235a6d4bd3c563a6aff0a2efd4116d81e4bf #segment:1 #delfile:0
I0326 11:22:40.891378 309098 data_dir.cpp:352] Added committed rowset=020000005aca235a6d4bd3c563a6aff0a2efd4116d81e4bf tablet=150953368 schema hash=1061772174 txn_id: 247128141
I0326 11:22:40.892314 309098 txn_manager.cpp:285] Commit txn successfully. tablet: 150953384, txn_id: 247128141, rowsetid: 020000005aca235c6d4bd3c563a6aff0a2efd4116d81e4bf #segment:1 #delfile:0
I0326 11:22:40.892318 309098 data_dir.cpp:352] Added committed rowset=020000005aca235c6d4bd3c563a6aff0a2efd4116d81e4bf tablet=150953384 schema hash=1061772174 txn_id: 247128141
I0326 11:22:40.892599 309098 txn_manager.cpp:285] Commit txn successfully. tablet: 274484164, txn_id: 247128078, rowsetid: 020000005ac9e9086d4bd3c563a6aff0a2efd4116d81e4bf #segment:1 #delfile:0
I0326 11:22:40.892602 309098 data_dir.cpp:352] Added committed rowset=020000005ac9e9086d4bd3c563a6aff0a2efd4116d81e4bf tablet=274484164 schema hash=729321178 txn_id: 247128078
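
More than one tablet shows up in the `Bad segment file` messages above, so it can help to collect the affected tablet IDs in one pass. The following is a minimal sketch, assuming the path layout visible in this log (`<root_path>/data/<shard>/<tablet_id>/<schema_hash>/<segment>.dat`). The function name `bad_segment_tablets` is made up for illustration and is not part of StarRocks.

```shell
# Print the unique tablet IDs that appear in "Bad segment file" messages.
# Assumes the path layout seen in the log above:
#   <root_path>/data/<shard>/<tablet_id>/<schema_hash>/<segment>.dat
# bad_segment_tablets is a hypothetical helper, not a StarRocks tool.
bad_segment_tablets() {
  # Keep only the "Bad segment file <path>.dat" part of each matching line,
  # then reduce the path to its <tablet_id> component.
  grep -o 'Bad segment file [^ ]*\.dat' "$1" \
    | sed -E 's#.*/data/[0-9]+/([0-9]+)/[0-9]+/[^/]*\.dat$#\1#' \
    | sort -u
}
```

Usage would be something like `bad_segment_tablets be.info.log`. For the excerpt above it should list 321209132 and 323592188, since those are the only two tablets named in `Bad segment file` lines.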

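The command was not run correctly.

--root_path=/data/starrocks/storage/be

Once that is handled, the BE should start successfully. Upgrade to 2.5.20 as soon as possible; this issue has been fixed there.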

./bin/meta_tool.sh --operation=delete_persistent_index_meta --root_path=/data/starrocks/storage/be --tablet_id=323592188

Like this?

Yes, written like that is fine.

If delete_persistent_index_meta does not work, use delete_meta instead.
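
When several tablets need the same treatment, the command can be generated per tablet and reviewed before anything is run. This is only a sketch: it prints commands and does not execute them. The `--operation`, `--root_path` and `--tablet_id` flags and the two operation names are the ones quoted in this thread; `print_meta_tool_cmds` is a hypothetical helper name.

```shell
# Print (do not run) one meta_tool.sh command per tablet ID.
# Flags and operation names are the ones used in this thread;
# print_meta_tool_cmds itself is a hypothetical helper for illustration.
print_meta_tool_cmds() {
  operation=$1   # delete_persistent_index_meta or delete_meta
  root_path=$2   # BE storage root, e.g. /data/starrocks/storage/be
  shift 2        # remaining arguments are tablet IDs
  for tablet_id in "$@"; do
    printf './bin/meta_tool.sh --operation=%s --root_path=%s --tablet_id=%s\n' \
      "$operation" "$root_path" "$tablet_id"
  done
}

# Reproduces the command confirmed above for tablet 323592188.
print_meta_tool_cmds delete_persistent_index_meta /data/starrocks/storage/be 323592188
```

Printing first keeps the destructive step manual: each line can be checked against the log before it is pasted into a shell on the affected BE host.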

be.info.log (17.4 KB)


It still fails to start.

Take another look at the errors.

Bad segment file

Handle every tablet that reports this error.
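
The `file size 0 < 12` part of the message indicates that the segment file is empty on disk. Rather than finding such files one restart at a time, the storage directory can be scanned for zero-length `.dat` files directly. A minimal sketch, assuming the same `<root_path>/data/<shard>/<tablet_id>/<schema_hash>/<segment>.dat` layout seen in the log; `empty_segment_tablets` is a hypothetical helper name.

```shell
# List tablet IDs that own at least one zero-length .dat file under a BE storage root.
# Assumes <root_path>/data/<shard>/<tablet_id>/<schema_hash>/<segment>.dat,
# as seen in the log paths; empty_segment_tablets is a hypothetical helper.
empty_segment_tablets() {
  # -size 0 matches only files that occupy zero blocks, i.e. empty files.
  find "$1/data" -type f -name '*.dat' -size 0 \
    | awk -F/ '{ print $(NF-2) }' \
    | sort -u
}
```

It would be called as `empty_segment_tablets /data/starrocks/storage/be`. The scan is read-only, so it only reports candidates; what to do with each tablet is still decided as discussed above.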

How should this error be handled?

What error? Wasn't this one handled successfully?