3.0.2 BE crash

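【Details】BE crash
【Background】
【Business impact】The database crashed and queries cannot be run
【StarRocks version】3.0.2
【Cluster size】e.g. 3 FE + 2 BE (FE and BE co-located)
【Machine info】72 cores / 376 GB RAM / 10 GbE
【Contact】
【Attachments】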
be.out:

3.0.2 RELEASE (build c833698)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
tracker:process consumption: 88120637048
tracker:query_pool consumption: 0
tracker:load consumption: 0
tracker:metadata consumption: 1417292878
tracker:tablet_metadata consumption: 662931429
tracker:rowset_metadata consumption: 453003728
tracker:segment_metadata consumption: 434096313
tracker:column_metadata consumption: -132738592
tracker:tablet_schema consumption: 258325
tracker:segment_zonemap consumption: 361168021
tracker:short_key_index consumption: 40294941
tracker:column_zonemap_index consumption: 684558536
tracker:ordinal_index consumption: -1334937096
tracker:bitmap_index consumption: 0
tracker:bloom_filter_index consumption: 0
tracker:compaction consumption: 0
tracker:schema_change consumption: 0
tracker:column_pool consumption: 5975094597
tracker:page_cache consumption: 75695586032
tracker:update consumption: 10088338
tracker:chunk_allocator consumption: 2164696336
tracker:clone consumption: 0
tracker:consistency consumption: 0
*** Aborted at 1693642076 (unix time) try "date -d @1693642076" if you are using GNU date ***
PC: @ 0x6d41767 extent_recycle.isra.0
*** SIGSEGV (@0x0) received by PID 317395 (TID 0x2b9b90208700) from PID 0; stack trace: ***
@ 0x62d7062 google::(anonymous namespace)::FailureSignalHandler()
@ 0x2b60f19495e0 (unknown)
@ 0x6d41767 extent_recycle.isra.0
@ 0x6d07251 arena_bin_malloc_hard
@ 0x6d09a7b je_arena_tcache_fill_small
@ 0x6d7ad1e je_tcache_alloc_small_hard
@ 0x6cefbc2 je_malloc_default
@ 0x4fe9288 malloc
@ 0x8a56d85 operator new()
@ 0x5f76f1c _ZZN7rocksdb28FragmentedRangeTombstoneList18FragmentTombstonesESt10unique_ptrINS_20InternalIteratorBaseINS_5SliceEEESt14default_deleteIS4_EERKNS_21InternalKeyComparatorEbRKSt6vectorImSaImEEENKUlRKS3_E_clESH_
@ 0x5f7725c rocksdb::FragmentedRangeTombstoneList::FragmentTombstones()
@ 0x5f7797f rocksdb::FragmentedRangeTombstoneList::FragmentedRangeTombstoneList()
@ 0x5f5c066 rocksdb::MemTable::NewRangeTombstoneIterator()
@ 0x5f5c344 rocksdb::MemTable::Get()
@ 0x5e9ef0f rocksdb::DBImpl::GetImpl()
@ 0x5e9fc01 rocksdb::DBImpl::Get()
@ 0x49e0efd starrocks::KVStore::get()
@ 0x4798dc5 starrocks::TabletMetaManager::get_persistent_index_meta()
@ 0x47d5e64 starrocks::TabletUpdates::_apply_rowset_commit()
@ 0x47d8c73 starrocks::TabletUpdates::do_apply()
@ 0x50e5335 starrocks::ThreadPool::dispatch_thread()
@ 0x50dfd1a starrocks::thread::supervise_thread()
@ 0x2b60f1941e25 start_thread
@ 0x2b60f235c34d __clone
@ 0x0 (unknown)
start time: Sat Sep 2 19:02:49 CST 2023
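
The memory-tracker block in the be.out above is easier to read with a small script. The following is a minimal Python sketch, not a StarRocks tool: the helper names `parse_trackers` and `abort_time_cst` and both regular expressions are assumptions made for illustration, and only a few tracker lines are copied from the log. It computes what share of tracked process memory the page cache held and converts the abort timestamp to CST (UTC+8).

```python
import re
from datetime import datetime, timedelta, timezone

# A few lines copied from the be.out above (the rest are omitted for brevity).
BE_OUT_EXCERPT = """\
tracker:process consumption: 88120637048
tracker:query_pool consumption: 0
tracker:column_pool consumption: 5975094597
tracker:page_cache consumption: 75695586032
tracker:chunk_allocator consumption: 2164696336
*** Aborted at 1693642076 (unix time) try "date -d @1693642076" if you are using GNU date ***
"""

# Hypothetical patterns written for this log layout; values may be negative.
TRACKER_RE = re.compile(r"^tracker:(\w+) consumption: (-?\d+)$", re.MULTILINE)
ABORT_RE = re.compile(r"Aborted at (\d+) \(unix time\)")


def parse_trackers(text):
    """Return {tracker_name: bytes} for every tracker line in the text."""
    return {name: int(value) for name, value in TRACKER_RE.findall(text)}


def abort_time_cst(text):
    """Return the abort time as a timezone-aware datetime in UTC+8."""
    epoch = int(ABORT_RE.search(text).group(1))
    return datetime.fromtimestamp(epoch, tz=timezone(timedelta(hours=8)))


trackers = parse_trackers(BE_OUT_EXCERPT)
page_cache_share = trackers["page_cache"] / trackers["process"]
print(f"page_cache share of process memory: {page_cache_share:.1%}")
print("aborted at:", abort_time_cst(BE_OUT_EXCERPT).isoformat())
```

On the values in this log, the page cache accounts for roughly 86% of the tracked process memory. That figure only describes where memory was held at the time of the crash; it does not by itself explain the SIGSEGV.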

Is this deployed in Docker, or on regular physical machines?

On regular physical machines.

Is it still restarting frequently? Please capture a core file so we can take a look. See "How to obtain a coredump".
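
Before waiting for the next crash, it may be worth checking whether core dumps are enabled at all. The sketch below relies on generic Linux behaviour and is not a StarRocks-specific procedure: it reads the core-size limit of the current process and the kernel's `core_pattern`. The helper names are hypothetical. The limits it reports apply to the process running the script and its children, so it should be run as the same user and from the same shell that starts the BE.

```python
import resource
from pathlib import Path


def core_dump_settings():
    """Return (soft_limit, hard_limit, core_pattern) for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    pattern_file = Path("/proc/sys/kernel/core_pattern")  # Linux only
    try:
        pattern = pattern_file.read_text().strip()
    except OSError:
        pattern = None  # not Linux, or /proc is not readable
    return soft, hard, pattern


def describe(limit):
    """Render an rlimit value in a human-readable form."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"


soft, hard, pattern = core_dump_settings()
print("core size soft limit:", describe(soft))
print("core size hard limit:", describe(hard))
print("kernel core_pattern: ", pattern)
if soft == 0:
    # A soft limit of 0 means the kernel will not write a core file.
    print("core dumps are disabled here; raise the limit "
          "(for example `ulimit -c unlimited`) before starting the BE")
```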

Not very frequently; it has been more than ten days since the previous crash.

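3.0.2 has too many problems, so I plan to upgrade directly to 3.1.2. Today I simulated the upgrade from 3.0.2 to 3.1.2 in a test environment. While upgrading the FEs, one follower upgraded successfully, but another follower reported the following error during its upgrade: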

2023-09-04 14:11:59,673 WARN (replayer|74) [GlobalStateMgr.replayJournalInner():2283] catch exception when replaying 23819,
com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 90 path $.p.m2.
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:226) ~[spark-dpp-1.0.0.jar:?]
at com.starrocks.persist.gson.GsonUtils$ProcessHookTypeAdapterFactory$1.read(GsonUtils.java:654) ~[starrocks-fe.jar:?]
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:41) ~[spark-dpp-1.0.0.jar:?]
at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:186) ~[spark-dpp-1.0.0.jar:?]
at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:145) ~[spark-dpp-1.0.0.jar:?]
at com.starrocks.persist.gson.GsonUtils$ProcessHookTypeAdapterFactory$1.read(GsonUtils.java:654) ~[starrocks-fe.jar:?]
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:131) ~[spark-dpp-1.0.0.jar:?]
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:222) ~[spark-dpp-1.0.0.jar:?]
at com.starrocks.persist.gson.GsonUtils$ProcessHookTypeAdapterFactory$1.read(GsonUtils.java:654) ~[starrocks-fe.jar:?]
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:131) ~[spark-dpp-1.0.0.jar:?]
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:222) ~[spark-dpp-1.0.0.jar:?]
at com.starrocks.persist.gson.GsonUtils$ProcessHookTypeAdapterFactory$1.read(GsonUtils.java:654) ~[starrocks-fe.jar:?]
at com.google.gson.Gson.fromJson(Gson.java:963) ~[spark-dpp-1.0.0.jar:?]
at com.google.gson.Gson.fromJson(Gson.java:928) ~[spark-dpp-1.0.0.jar:?]
at com.google.gson.Gson.fromJson(Gson.java:877) ~[spark-dpp-1.0.0.jar:?]
at com.google.gson.Gson.fromJson(Gson.java:848) ~[spark-dpp-1.0.0.jar:?]
at com.starrocks.persist.UserPrivilegeCollectionInfo.read(UserPrivilegeCollectionInfo.java:75) ~[starrocks-fe.jar:?]
at com.starrocks.journal.JournalEntity.readFields(JournalEntity.java:998) ~[starrocks-fe.jar:?]
at com.starrocks.journal.bdbje.BDBJournalCursor.deserializeData(BDBJournalCursor.java:251) ~[starrocks-fe.jar:?]
at com.starrocks.journal.bdbje.BDBJournalCursor.next(BDBJournalCursor.java:295) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.replayJournalInner(GlobalStateMgr.java:2264) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr$5.runOneCycle(GlobalStateMgr.java:2130) ~[starrocks-fe.jar:?]
at com.starrocks.common.util.Daemon.run(Daemon.java:115) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr$5.run(GlobalStateMgr.java:2195) ~[starrocks-fe.jar:?]
Caused by: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 90 path $.p.m2.
at com.google.gson.stream.JsonReader.beginObject(JsonReader.java:384) ~[spark-dpp-1.0.0.jar:?]
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:215) ~[spark-dpp-1.0.0.jar:?]
... 23 more
2023-09-04 14:11:59,673 ERROR (replayer|74) [GlobalStateMgr$5.runOneCycle():2139] replayer thread catch an exception when replay journal 23819.
com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 90 path $.p.m2.
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:226) ~[spark-dpp-1.0.0.jar:?]
at com.starrocks.persist.gson.GsonUtils$ProcessHookTypeAdapterFactory$1.read(GsonUtils.java:654) ~[starrocks-fe.jar:?]
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:41) ~[spark-dpp-1.0.0.jar:?]
at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:186) ~[spark-dpp-1.0.0.jar:?]
at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:145) ~[spark-dpp-1.0.0.jar:?]
at com.starrocks.persist.gson.GsonUtils$ProcessHookTypeAdapterFactory$1.read(GsonUtils.java:654) ~[starrocks-fe.jar:?]
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:131) ~[spark-dpp-1.0.0.jar:?]
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:222) ~[spark-dpp-1.0.0.jar:?]
at com.starrocks.persist.gson.GsonUtils$ProcessHookTypeAdapterFactory$1.read(GsonUtils.java:654) ~[starrocks-fe.jar:?]
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:131) ~[spark-dpp-1.0.0.jar:?]
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:222) ~[spark-dpp-1.0.0.jar:?]
at com.starrocks.persist.gson.GsonUtils$ProcessHookTypeAdapterFactory$1.read(GsonUtils.java:654) ~[starrocks-fe.jar:?]
at com.google.gson.Gson.fromJson(Gson.java:963) ~[spark-dpp-1.0.0.jar:?]
at com.google.gson.Gson.fromJson(Gson.java:928) ~[spark-dpp-1.0.0.jar:?]
at com.google.gson.Gson.fromJson(Gson.java:877) ~[spark-dpp-1.0.0.jar:?]
at com.google.gson.Gson.fromJson(Gson.java:848) ~[spark-dpp-1.0.0.jar:?]
at com.starrocks.persist.UserPrivilegeCollectionInfo.read(UserPrivilegeCollectionInfo.java:75) ~[starrocks-fe.jar:?]
at com.starrocks.journal.JournalEntity.readFields(JournalEntity.java:998) ~[starrocks-fe.jar:?]
at com.starrocks.journal.bdbje.BDBJournalCursor.deserializeData(BDBJournalCursor.java:251) ~[starrocks-fe.jar:?]
at com.starrocks.journal.bdbje.BDBJournalCursor.next(BDBJournalCursor.java:295) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.replayJournalInner(GlobalStateMgr.java:2264) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr$5.runOneCycle(GlobalStateMgr.java:2130) ~[starrocks-fe.jar:?]
at com.starrocks.common.util.Daemon.run(Daemon.java:115) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr$5.run(GlobalStateMgr.java:2195) ~[starrocks-fe.jar:?]
Caused by: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 90 path $.p.m2.
at com.google.gson.stream.JsonReader.beginObject(JsonReader.java:384) ~[spark-dpp-1.0.0.jar:?]
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:215) ~[spark-dpp-1.0.0.jar:?]
... 23 more

Run alter system create image on 3.0.2 first, then upgrade to 3.1.
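
As a sketch of how that statement might be issued from a script, the Python below only builds and prints a `mysql` command line; it does not connect to anything. The host name `fe-host.example`, the `root` user, and port 9030 are placeholders assumed for illustration, and `build_create_image_command` is a hypothetical helper. Substitute the values for your own FE.

```python
import shlex

# The statement suggested in the reply above.
CREATE_IMAGE_SQL = "ALTER SYSTEM CREATE IMAGE"


def build_create_image_command(host, port=9030, user="root"):
    """Build the argv for running the statement through the `mysql` CLI."""
    return ["mysql", "-h", host, "-P", str(port), "-u", user, "-e", CREATE_IMAGE_SQL]


argv = build_create_image_command("fe-host.example")  # hypothetical host
print(shlex.join(argv))
```

When the command looks right, the list can be passed to `subprocess.run(argv, check=True)`; add `-p` if the account has a password. Run it against the 3.0.2 cluster before replacing the FE binaries, as the reply describes.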

Could you send me the complete be.out file? I'd like to see which crashes have occurred.

The BE has crashed four times in total.
3.0.2-becrash.txt (11.3 KB)

After running that, the upgrade succeeded.

I looked at the complete be.out file. This is a random crash caused by memory corruption. I recommend upgrading to the latest 3.0 patch release.

Upgraded to 3.0.5. It has been running stably for a week with no problems so far.
