BE fails to restart on every node: tablet init missing rowset error

itdatasir · 2023-08-24 04:34

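To help locate your problem faster, please provide the following information. Thank you.
【Details】
I was trying to make Routine Load tasks that consume from Kafka flush to disk faster.
After I changed the configuration and restarted BE, no node in the cluster could restart successfully.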

【Background】
Changes to be.conf:
routine_load_thread_pool_size = 128
Changes to fe.conf:
max_running_txn_num_per_db = 1000
max_routine_load_task_num_per_be = 32
enable_auto_tablet_distribution = true

Restarting FE worked without problems; restarting BE failed.

【Business impact】

Queries cannot be run.

SQL Error [1064] [42000]: get_applied_rowsets failed, tablet updates is in error state: tablet:21661594 actual row size changed after compaction 683463 -> 0tablet:21661594 #version:2 [6937 6937.1@1 6937.1] #pending:0 backend:172.20.192.74
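
If the same error shows up for many queries, it can help to pull the tablet ID, the row counts and the backend address out of the message text so the affected tablets can be listed per BE. Below is a minimal sketch that parses the message quoted above. The function name `parse_compaction_error` and the regular expression are my own illustration, not part of StarRocks, and the pattern only assumes the message layout seen in this one post.

```python
import re
from typing import Optional

# Error text copied from the post above (without the "SQL Error [...]" prefix).
ERROR = (
    "get_applied_rowsets failed, tablet updates is in error state: "
    "tablet:21661594 actual row size changed after compaction 683463 -> 0"
    "tablet:21661594 #version:2 [6937 6937.1@1 6937.1] #pending:0 "
    "backend:172.20.192.74"
)

# Hypothetical pattern, written against the single message above.
PATTERN = re.compile(
    r"tablet:(?P<tablet>\d+) actual row size changed after compaction "
    r"(?P<before>\d+) -> (?P<after>\d+)"
    r".*?backend:(?P<backend>[\d.]+)"
)


def parse_compaction_error(message: str) -> Optional[dict]:
    """Return tablet id, row counts and backend, or None if no match."""
    match = PATTERN.search(message)
    if match is None:
        return None
    return {
        "tablet_id": int(match.group("tablet")),
        "rows_before": int(match.group("before")),
        "rows_after": int(match.group("after")),
        "backend": match.group("backend"),
    }


if __name__ == "__main__":
    print(parse_compaction_error(ERROR))
```

For the message in this post the sketch reports tablet 21661594 going from 683463 rows to 0 on backend 172.20.192.74.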

【StarRocks version】2.5.10

【Cluster size】4 FE (3 follower + 1 observer) + 4 BE (FE and BE deployed on the same machines)

【Machine info】CPU virtual cores / memory / NIC, e.g. 256C/2048G/10GbE

Error message

*** Aborted at 1692840766 (unix time) try "date -d @1692840766" if you are using GNU date ***
PC: @ 0x7f01c301d387 __GI_raise
*** SIGABRT (@0xe5b3) received by PID 58803 (TID 0x7f01777fe700) from PID 58803; stack trace: ***
@ 0x5aed0a2 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f01c3ad2630 (unknown)
@ 0x7f01c301d387 __GI_raise
@ 0x7f01c301ea78 __GI_abort
@ 0x2cce6fe starrocks::failure_function()
@ 0x5ae0a7d google::LogMessage::Fail()
@ 0x5ae2eef google::LogMessage::SendToLog()
@ 0x5ae05ce google::LogMessage::Flush()
@ 0x5ae34f9 google::LogMessageFatal::~LogMessageFatal()
@ 0x41c8a12 starrocks::DataDir::load()
@ 0x41a9e3b _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN9starrocks13StorageEngine14load_data_dirsERKSt6vectorIPNS3_7DataDirESaIS7_EEEUlvE_EEEEE6_M_runEv
@ 0x7ffb6e0 execute_native_thread_routine
@ 0x7f01c3acaea5 start_thread
@ 0x7f01c30e5b0d __clone
@ 0x0 (unknown)
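
The first line of the trace gives the abort time as a Unix timestamp. For anyone without GNU `date` at hand, here is a small sketch that converts it to UTC so it can be lined up with the BE log entries. The helper name `abort_time_utc` is only illustrative, and the BE logs may be written in local time rather than UTC, so the offset has to be applied by hand.

```python
from datetime import datetime, timezone


def abort_time_utc(unix_seconds: int) -> datetime:
    """Convert the glog 'Aborted at' Unix timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


if __name__ == "__main__":
    # Timestamp taken from the "*** Aborted at ..." line above.
    print(abort_time_utc(1692840766).isoformat())  # 2023-08-24T01:32:46+00:00
```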

The INFO log shows: tablet init missing rowset error

This matches what is described on GitHub.