BE fails to start after running for two days

[Details] Detailed problem description

After running for two days, the BE can no longer start.

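[Background] Data storage only; we are testing stability.
[Business impact] The BE went down and cannot be restarted. It starts again only after all data files are cleared.
[StarRocks version] master 2.5.1
[Cluster size] 3 FE (2 follower + 1 observer) + 3 BE (FE and BE co-located)
[Machine info] 8C16G
[Contact] 425277456@qq.com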

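[Attachments]

  • fe.log/beINFO/relevant screenshots
  • Slow query:
    • Profile information: obtain the Profile and use it to analyze the query bottleneck
    • Parallelism: show variables like '%parallel_fragment_exec_instance_num%';
    • Whether pipeline is enabled: show variables like '%pipeline%';
    • Screenshots of CPU and memory usage on the BE nodes
  • Query error:
  • BE crash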
    • be.out
      start time: Tue Mar 28 15:57:57 CST 2023
      query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
      tracker:process consumption: 29221640
      tracker:query_pool consumption: 0
      tracker:load consumption: 0
      tracker:metadata consumption: 166131
      tracker:tablet_metadata consumption: 77467
      tracker:rowset_metadata consumption: 78370
      tracker:segment_metadata consumption: 1318
      tracker:column_metadata consumption: 8976
      tracker:tablet_schema consumption: 3019
      tracker:segment_zonemap consumption: 672
      tracker:short_key_index consumption: 0
      tracker:column_zonemap_index consumption: 1344
      tracker:ordinal_index consumption: 3888
      tracker:bitmap_index consumption: 0
      tracker:bloom_filter_index consumption: 0
      tracker:compaction consumption: 0
      tracker:schema_change consumption: 0
      tracker:column_pool consumption: 0
      tracker:page_cache consumption: 0
      tracker:update consumption: 91
      tracker:chunk_allocator consumption: 0
      tracker:clone consumption: 0
      tracker:consistency consumption: 0
      *** Aborted at 1679990277 (unix time) try "date -d @1679990277" if you are using GNU date ***
      PC: @ 0x62dd87d ra_init
      *** SIGILL (@0x62dd87d) received by PID 29334 (TID 0x7f6dfd4df700) from PID 103667837; stack trace: ***
      @ 0x5c3e682 google::(anonymous namespace)::FailureSignalHandler()
      @ 0x7f6e1a982630 (unknown)
      @ 0x62dd87d ra_init
      @ 0x4690960 starrocks::DelVector::_add_dels()
      @ 0x469126c starrocks::DelVector::add_dels_as_new_version()
      @ 0x44d2ae4 starrocks::TabletUpdates::_apply_rowset_commit()
      @ 0x44d48e2 starrocks::TabletUpdates::do_apply()
      @ 0x4d24335 starrocks::ThreadPool::dispatch_thread()
      @ 0x4d1f17a starrocks::thread::supervise_thread()
      @ 0x7f6e1a97aea5 start_thread
      @ 0x7f6e19f95b0d __clone
      @ 0x0 (unknown)
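For readers without GNU date at hand, the abort timestamp in the trace above can be decoded with a few lines of Python. This is only a convenience sketch: the helper name `abort_time` is made up for illustration, and the fixed UTC+8 offset is an assumption based on the `CST` shown in the be.out start time.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "CST" in be.out means China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8), "CST")


def abort_time(epoch_seconds: int) -> str:
    """Format a Unix timestamp the same way the be.out 'start time' line does."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=CST)
    return moment.strftime("%a %b %d %H:%M:%S %Z %Y")


if __name__ == "__main__":
    # Value copied from the "*** Aborted at ..." line.
    print(abort_time(1679990277))
```

The decoded time falls in the same second as the `start time` line, which suggests the BE aborts almost immediately after launch rather than after serving traffic.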

be.log
I0328 15:53:53.660302 8914 daemon.cpp:297] Disk Info:
Num disks 4: vda, vdb, sr, dm-
I0328 15:53:53.660310 8914 daemon.cpp:298] Mem Info: 15.51 GB
I0328 15:53:53.837191 8914 daemon.cpp:273] Minidump is disabled
I0328 15:53:53.837273 8914 backend_options.cpp:100] priority cidrs in conf: 192.168.0.112
I0328 15:53:53.837415 8914 backend_options.cpp:77] localhost 192.168.0.112
I0328 15:53:53.838598 8914 exec_env.cpp:445] Set storage page cache size 2698293731
I0328 15:53:53.838594 8942 daemon.cpp:201] Current memory statistics: process(28777432), query_pool(0), load(0), metadata(0), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0328 15:53:53.840044 8945 data_dir.cpp:126] path: /xxx/abdi/disks/data1/starrocks/storage, hash: -4203646903927453555
I0328 15:53:53.936587 9028 data_dir.cpp:250] start to load tablets from /xxx/abdi/disks/data1/starrocks/storage
I0328 15:53:53.936607 9028 data_dir.cpp:256] begin loading rowset from meta
I0328 15:53:53.937952 9028 data_dir.cpp:274] load rowset from meta finished, data dir: /xxx/abdi/disks/data1/starrocks/storage
I0328 15:53:53.937964 9028 data_dir.cpp:279] begin loading tablet from meta
I0328 15:53:53.940366 9029 primary_index.cpp:1131] load primary index finish table:10097 tablet:10111 version:7 #rowset:1 #segment:1 data_size:3592 rowsets:7 size:3 capacity:3 memory:91 duration: 0ms
I0328 15:53:53.940425 9028 data_dir.cpp:315] load tablet from meta finished, loaded tablet: 66, error tablet: 0, path: /xxx/abdi/disks/data1/starrocks/storage
I0328 15:53:53.944919 9096 fragment_mgr.cpp:529] FragmentMgr cancel worker start working.
I0328 15:53:53.951252 8914 exec_env.cpp:194] [PIPELINE] Exec thread pool: thread_num=8
I0328 15:57:57.154309 29334 daemon.cpp:290] version UNKNOWN RELEASE (build d848355)
Built on 2023-03-21 20:27:52 by root@docker
I0328 15:57:57.155941 29334 mem_info.cpp:104] Physical Memory: 15.51 GB
I0328 15:57:57.155961 29334 daemon.cpp:296] Cpu Info:
Model: Intel® Core™2 Duo CPU T7700 @ 2.40GHz
Cores: 8
Max Possible Cores: 8
L1 Cache: 32.00 KB (Line: 64.00 B)
L2 Cache: 2.00 MB (Line: 64.00 B)
L3 Cache: 0 (Line: 0)
Hardware Supports:
ssse3
sse4_2
Numa Nodes: 1
Numa Nodes of Cores: 0->0 | 1->0 | 2->0 | 3->0 | 4->0 | 5->0 | 6->0 | 7->0 |
I0328 15:57:57.155992 29334 daemon.cpp:297] Disk Info:
Num disks 4: vda, vdb, sr, dm-
I0328 15:57:57.155998 29334 daemon.cpp:298] Mem Info: 15.51 GB
I0328 15:57:57.333062 29334 daemon.cpp:273] Minidump is disabled
I0328 15:57:57.333122 29334 backend_options.cpp:100] priority cidrs in conf: 192.168.0.112
I0328 15:57:57.333237 29334 backend_options.cpp:77] localhost 192.168.0.112
I0328 15:57:57.334159 29334 exec_env.cpp:445] Set storage page cache size 2698293731
I0328 15:57:57.334151 29364 daemon.cpp:201] Current memory statistics: process(28777432), query_pool(0), load(0), metadata(0), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0328 15:57:57.335328 29366 data_dir.cpp:126] path: /xxx/abdi/disks/data1/starrocks/storage, hash: -4203646903927453555
I0328 15:57:57.440623 29461 data_dir.cpp:250] start to load tablets from /xxx/abdi/disks/data1/starrocks/storage
I0328 15:57:57.440641 29461 data_dir.cpp:256] begin loading rowset from meta
I0328 15:57:57.441974 29461 data_dir.cpp:274] load rowset from meta finished, data dir: /xxx/abdi/disks/data1/starrocks/storage
I0328 15:57:57.441985 29461 data_dir.cpp:279] begin loading tablet from meta
I0328 15:57:57.444051 29461 data_dir.cpp:315] load tablet from meta finished, loaded tablet: 66, error tablet: 0, path: /xxx/abdi/disks/data1/starrocks/storage
I0328 15:57:57.444849 29462 primary_index.cpp:1131] load primary index finish table:10097 tablet:10111 version:7 #rowset:1 #segment:1 data_size:3592 rowsets:7 size:3 capacity:3 memory:91 duration: 0ms
I0328 15:57:57.448243 29536 fragment_mgr.cpp:529] FragmentMgr cancel worker

The last log entries on the FE node were:
I0327 20:10:56.815253 27530 tablet_updates.cpp:528] commit rowset tablet:10103 version:8 txn_id: 14 0200000000000156eb4cac48df230cece3382ffef94603a3 rowset:8 #seg:0 #delfile:0 #row:0 size:0 #pending:0
I0327 20:10:56.815297 27536 tablet_updates.cpp:528] commit rowset tablet:10119 version:8 txn_id: 14 020000000000015aeb4cac48df230cece3382ffef94603a3 rowset:8 #seg:0 #delfile:0 #row:0 size:0 #pending:0
I0327 20:10:56.815318 27529 tablet_updates.cpp:528] commit rowset tablet:10123 version:8 txn_id: 14 020000000000015beb4cac48df230cece3382ffef94603a3 rowset:8 #seg:0 #delfile:0 #row:0 size:0 #pending:0
I0327 20:10:56.815160 27535 tablet_updates.cpp:528] commit rowset tablet:10099 version:8 txn_id: 14 0200000000000155eb4cac48df230cece3382ffef94603a3 rowset:8 #seg:0 #delfile:0 #row:0 size:0 #pending:0
I0327 20:10:56.815356 27533 tablet_updates.cpp:528] commit rowset tablet:10115 version:8 txn_id: 14 0200000000000159eb4cac48df230cece3382ffef94603a3 rowset:8 #seg:0 #delfile:0 #row:0 size:0 #pending:0
I0327 20:10:56.815608 16927 tablet_updates.cpp:1074] apply_rowset_commit finish. tablet:10107 version:8 txn_id: 14 total del/row:0/0 0% rowset:8 #seg:0 #op(upsert:0 del:0) #del:0+0=0 #dv:0 duration:0ms(0/0/0/0)
I0327 20:10:56.815670 16927 tablet_updates.cpp:1074] apply_rowset_commit finish. tablet:10127 version:8 txn_id: 14 total del/row:0/0 0% rowset:8 #seg:0 #op(upsert:0 del:0) #del:0+0=0 #dv:0 duration:0ms(0/0/0/0)
I0327 20:10:56.815701 27531 publish_version.cpp:123] Publish txn success tablet:10127 version:8 tablet_max_version:8 partition:10096 txn_id: 14 rowset:020000000000015ceb4cac48df230cece3382ffef94603a3
I0327 20:10:56.815716 16927 tablet_updates.cpp:1074] apply_rowset_commit finish. tablet:10119 version:8 txn_id: 14 total del/row:0/0 0% rowset:8 #seg:0 #op(upsert:0 del:0) #del:0+0=0 #dv:0 duration:0ms(0/0/0/0)
I0327 20:10:56.815739 27536 publish_version.cpp:123] Publish txn success tablet:10119 version:8 tablet_max_version:8 partition:10096 txn_id: 14 rowset:020000000000015aeb4cac48df230cece3382ffef94603a3
I0327 20:10:56.815764 27534 publish_version.cpp:123] Publish txn success tablet:10107 version:8 tablet_max_version:8 partition:10096 txn_id: 14 rowset:0200000000000157eb4cac48df230cece3382ffef94603a3
I0327 20:10:56.815771 27530 publish_version.cpp:123] Publish txn success tablet:10103 version:8 tablet_max_version:8 partition:10096 txn_id: 14 rowset:0200000000000156eb4cac48df230cece3382ffef94603a3
I0327 20:10:56.815785 27531 tablet_updates.cpp:528] commit rowset tablet:10131 version:8 txn_id: 14 020000000000015deb4cac48df230cece3382ffef94603a3 rowset:8 #seg:0 #delfile:0 #row:0 size:0 #pending:0
I0327 20:10:56.815827 27536 tablet_updates.cpp:528] commit rowset tablet:10135 version:8 txn_id: 14 020000000000015eeb4cac48df230cece3382ffef94603a3 rowset:8 #seg:0 #delfile:0 #row:0 size:0 #pending:0
I0327 20:10:56.815968 16929 tablet_updates.cpp:1074] apply_rowset_commit finish. tablet:10103 version:8 txn_id: 14 total del/row:0/1 0% rowset:8 #seg:0 #op(upsert:0 del:0) #del:0+0=0 #dv:0 duration:0ms(0/0/0/0)
I0327 20:10:56.816058 16927 tablet_updates.cpp:1074] apply_rowset_commit finish. tablet:10099 version:8 txn_id: 14 total del/row:0/1 0% rowset:8 #seg:0 #op(upsert:0 del:0) #del:0+0=0 #dv:0 duration:0ms(0/0/0/0)
I0327 20:10:56.816082 27535 publish_version.cpp:123] Publish txn success tablet:10099 version:8 tablet_max_version:8 partition:10096 txn_id: 14 rowset:0200000000000155eb4cac48df230cece3382ffef94603a3
I0327 20:10:56.816175 16930 tablet_updates.cpp:1074] apply_rowset_commit finish. tablet:10135 version:8 txn_id: 14 total del/row:0/0 0% rowset:8 #seg:0 #op(upsert:0 del:0) #del:0+0=0 #dv:0 duration:0ms(0/0/0/0)
I0327 20:10:56.816226 16929 tablet_updates.cpp:1074] apply_rowset_commit finish. tablet:10115 version:8 txn_id: 14 total del/row:0/0 0% rowset:8 #seg:0 #op(upsert:0 del:0) #del:0+0=0 #dv:0 duration:0ms(0/0/0/0)
I0327 20:10:56.816233 27536 publish_version.cpp:123] Publish txn success tablet:10135 version:8 tablet_max_version:8 partition:10096 txn_id: 14 rowset:020000000000015eeb4cac48df230cece3382ffef9

Does this cluster support the AVX2 instruction set?

No, it does not, but AVX2 was already disabled when the package was built.
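Since SIGILL generally means the CPU was asked to execute an instruction it does not implement, it may help to confirm exactly which SIMD extensions each BE host exposes. The sketch below assumes an x86 Linux host where `/proc/cpuinfo` contains a `flags` line; the function names and the list of features to check are illustrative choices, not anything StarRocks ships.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the CPU feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (for example, a non-x86 host)


def missing_features(cpuinfo_text: str,
                     wanted=("ssse3", "sse4_2", "avx", "avx2")) -> list:
    """List the wanted features that the CPU does not advertise, in the order given."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in wanted if name not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # assumption: running on Linux
    if cpuinfo.exists():
        print("missing:", missing_features(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found; this check only applies to Linux hosts")
```

Running this on every BE machine and comparing the result with the instruction sets the binary was built for is one way to narrow down whether something other than AVX2 (plain AVX, for instance) is still being emitted by the build.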