- Persistent index causes BE startup failure
*** Aborted at 1711349618 (unix time) try "date -d @1711349618" if you are using GNU date ***
PC: @ 0x7f39cefe5387 __GI_raise
*** SIGABRT (@0x3e80000c50f) received by PID 50447 (TID 0x7f39425ff700) from PID 50447; stack trace: ***
@ 0x5b1ba42 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f39cfa9a630 (unknown)
@ 0x7f39cefe5387 __GI_raise
@ 0x7f39cefe6a78 __GI_abort
@ 0x2cdf2be starrocks::failure_function()
@ 0x5b0f41d google::LogMessage::Fail()
@ 0x5b1188f google::LogMessage::SendToLog()
@ 0x5b0ef6e google::LogMessage::Flush()
@ 0x5b11e99 google::LogMessageFatal::~LogMessageFatal()
@ 0x4265eaf starrocks::TabletUpdates::_apply_rowset_commit()
@ 0x4266353 starrocks::TabletUpdates::do_apply()
@ 0x4b17465 starrocks::ThreadPool::dispatch_thread()
@ 0x4b11e4a starrocks::Thread::supervise_thread()
@ 0x7f39cfa92ea5 start_thread
@ 0x7f39cf0ad96d __clone
@ 0x0 (unknown)
W0325 14:58:06.988127 52911 rowset.cpp:141] Fail to open /data/starrocks/storage/be/data/1021/321209132/1248468755/0200000000005e48a648e0f47d7f27a81aa03e6bcc4b45b4_0.dat: Corruption: Bad segment file /data/starrocks/storage/be/data/1021/321209132/1248468755/0200000000005e48a648e0f47d7f27a81aa03e6bcc4b45b4_0.dat: file size 0 < 12
/build/starrocks/be/src/storage/rowset/segment.cpp:195 Segment::parse_segment_footer(read_file.get(), &footer, footer_length_hint, partial_rowset_footer)
/build/starrocks/be/src/storage/rowset/segment.cpp:67 segment->_open(footer_length_hint, partial_rowset_footer)
W0325 14:58:06.988687 52911 rowset_update_state.cpp:39] load RowsetUpdateState error: Corruption: Bad segment file /data/starrocks/storage/be/data/1021/321209132/1248468755/0200000000005e48a648e0f47d7f27a81aa03e6bcc4b45b4_0.dat: file size 0 < 12
/build/starrocks/be/src/storage/rowset/segment.cpp:195 Segment::parse_segment_footer(read_file.get(), &footer, footer_length_hint, partial_rowset_footer)
/build/starrocks/be/src/storage/rowset/segment.cpp:67 segment->_open(footer_length_hint, partial_rowset_footer)
/build/starrocks/be/src/storage/rowset/rowset.cpp:75 do_load()
/build/starrocks/be/src/storage/rowset/rowset.cpp:454 load()
/build/starrocks/be/src/storage/rowset_update_state.cpp:161 _load_upserts(rowset, 0, pk_column.get()) tablet:321209132 stack:
@ 0x46a4e19 _ZZSt9call_onceIZN9starrocks17RowsetUpdateState4loadEPNS0_6TabletEPNS0_6RowsetEEUlvE_JEEvRSt9once_flagOT_DpOT0_ENUlvE0_4_FUNEv
@ 0x7fb3c0d7b20b __pthread_once_slow
@ 0x469fde7 starrocks::RowsetUpdateState::load()
@ 0x4262183 starrocks::TabletUpdates::_apply_rowset_commit()
@ 0x4266353 starrocks::TabletUpdates::do_apply()
@ 0x4b17465 starrocks::ThreadPool::dispatch_thread()
@ 0x4b11e4a starrocks::Thread::supervise_thread()
@ 0x7fb3c0d7cea5 start_thread
@ 0x7fb3c039796d __clone
@ (nil) (unknown)
get_applied_rowsets failed, tablet updates is in error state: tablet:85018 actual row size changed after compaction 50531 -> 50041tablet:85018 #version:13 [29445 29456.1@12 29456.1] #pending:0 backend
F0423 22:31:29.743636 475679 tablet_updates.cpp:1132] delvec inconsistent tablet:8858730 rssid:5262 #old:1402 #add:4 #new:1402 old_v:10497 v:10498
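The affected tablets appear in the log lines above as tablet:<id> tokens. A minimal sketch for pulling those IDs out of a saved crash excerpt follows. The function name extract_tablet_ids is hypothetical, and it assumes the excerpt contains only the relevant error lines, since a full BE log mentions many healthy tablets as well.

```shell
#!/bin/sh
# Sketch: list the unique tablet IDs found in a log excerpt read from stdin.
# Assumes IDs are written as "tablet:<digits>", as in the excerpts above.
extract_tablet_ids() {
    grep -oE 'tablet:[0-9]+' \
        | cut -d: -f2 \
        | sort -un
}
```

Usage: save the crash lines to a file and run `extract_tablet_ids < excerpt.log`; each ID is printed once, in ascending numeric order.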
- GitHub Issue:
- GitHub Fix PR:
- Jira:
- Affected versions:
  - 2.5.0 ~ 2.5.20
  - 3.0.0 ~ 3.0.8
  - 3.1.0 ~ 3.1.5
- Fixed versions:
  - 2.5.21+
  - 3.0.9+
  - 3.1.6+
- Root cause:
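- Workaround:
  - Use ./meta_tool.sh --operation=delete_persistent_index_meta to delete the persistent index of the affected tablet, then restart the BE. If the tablet has 3 replicas, you can instead use ./meta_tool.sh --operation=delete_meta to delete the tablet's meta on this BE.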
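The workaround can be sketched as the following dry-run helpers, which only print the commands so they can be reviewed before being run by hand. Only the two --operation values come from this note. The --root_path and --tablet_id flags, the helper names, and the example storage root (inferred from the file path in the log above) are assumptions; check the usage text of meta_tool.sh in your version, which may also require further flags such as a schema hash for delete_meta.

```shell
#!/bin/sh
# Sketch only: prints commands, does not execute meta_tool.sh.
# --root_path / --tablet_id are assumed flag names; verify before use.

# Print the command that deletes one tablet's persistent index meta.
# $1 = BE storage root path, $2 = tablet ID
print_delete_persistent_index_cmd() {
    printf './meta_tool.sh --operation=delete_persistent_index_meta --root_path=%s --tablet_id=%s\n' "$1" "$2"
}

# Print the command that deletes the tablet meta itself
# (the alternative mentioned for 3-replica setups).
print_delete_meta_cmd() {
    printf './meta_tool.sh --operation=delete_meta --root_path=%s --tablet_id=%s\n' "$1" "$2"
}

# Example using the tablet from the "Bad segment file" warning above.
print_delete_persistent_index_cmd /data/starrocks/storage/be 321209132
```

Printing rather than executing keeps the sketch safe to run anywhere; once the flags are confirmed for your version, run the printed command on the failing BE and then restart it.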