BE node crashed unexpectedly

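[Details] One BE node went down unexpectedly over the weekend.
[Background]
[Business impact]
[Shared-data (storage-compute separation) deployment] No
[StarRocks version] 3.0.5
[Cluster size] 3 FE + 5 BE
[Machine specs] 32 cores / 128 GB RAM / 10 GbE
[Contact] Community group 13, 番茄西红柿
[Attachments]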

Check be.INFO for additional log entries or a backtrace around the time of the crash.
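One way to pull out the entries around the crash time is a small filter over be.INFO. The following is a minimal sketch, assuming the file uses the usual glog prefix (severity letter, month and day, time, thread id, source file and line) as seen in the log excerpts in this thread. The function names `parse_ts` and `lines_around` are hypothetical helpers, not part of StarRocks, and the year has to be supplied by the caller because glog lines do not carry it.

```python
import re
from datetime import datetime, timedelta

# glog prefix, e.g. "W1126 09:46:19.157271 162368 primary_index.cpp:949] ..."
GLOG_PREFIX = re.compile(
    r"^([IWEF])(\d{2})(\d{2}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ \S+:\d+\] "
)


def parse_ts(line, year):
    """Return the timestamp of a glog header line, or None for other lines."""
    m = GLOG_PREFIX.match(line)
    if m is None:
        return None
    _, month, day, clock = m.groups()
    # The year is an assumption supplied by the caller; glog does not log it.
    return datetime.strptime(f"{year}-{month}-{day} {clock}", "%Y-%m-%d %H:%M:%S.%f")


def lines_around(lines, center, window, year):
    """Keep entries whose timestamp is within +/- window of center.

    Lines without a glog prefix (stack frames, wrapped messages) follow
    the decision made for the most recent header line before them.
    """
    lo, hi = center - window, center + window
    keep = False
    kept = []
    for line in lines:
        ts = parse_ts(line, year)
        if ts is not None:
            keep = lo <= ts <= hi
        if keep:
            kept.append(line)
    return kept
```

To use it, read be.INFO into a list of lines and call `lines_around(lines, crash_time, timedelta(minutes=5), year)`, where `crash_time` and the five-minute window are values chosen by the person investigating.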

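I can see some binlog synchronization entries in the log. I am not sure whether a surge in the volume of binlog-synced data caused the crash. The attachment contains logs from before and after the crash. Could you please take a look?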
be.INFO (706.3 KB)

W1126 09:46:19.157271 162368 primary_index.cpp:949] load PrimaryIndex error: Internal error: FixedMutableIndex<48> insert found duplicate key 80000000000000088000000000001CBB80000000245C039180000000000000A581780D3D98F84C9BA589E0000E8C4228
/build/starrocks/be/src/storage/persistent_index.cpp:1559 _shards[shard_offset + i]->insert(keys, values, idxes_by_shard[i])
/build/starrocks/be/src/storage/persistent_index.cpp:3099 _l0->insert(n, keys, values, check_l1_key_sizes)
/build/starrocks/be/src/storage/persistent_index.cpp:2766 _insert_rowsets(tablet, rowsets, pkey_schema, apply_version, std::move(pk_column)) tablet:0 stack:
    @          0x47a3688  starrocks::PrimaryIndex::load()
    @          0x487fc1c  starrocks::TabletUpdates::_apply_compaction_commit()
    @          0x48867fd  starrocks::TabletUpdates::do_apply()
    @          0x51a47a5  starrocks::ThreadPool::dispatch_thread()
    @          0x519f18a  starrocks::Thread::supervise_thread()
    @     0x7f691b7edea5  start_thread
    @     0x7f691ae08b0d  __clone
    @              (nil)  (unknown)
W1126 09:46:19.157194 162370 primary_index.cpp:949] load PrimaryIndex error: Internal error: FixedMutableIndex<48> insert found duplicate key 80000000000000088000000000001CBB80000000245BFBC1800000000000000481780D3D98FA2493A589E0002B844240
/build/starrocks/be/src/storage/persistent_index.cpp:1559 _shards[shard_offset + i]->insert(keys, values, idxes_by_shard[i])
/build/starrocks/be/src/storage/persistent_index.cpp:3099 _l0->insert(n, keys, values, check_l1_key_sizes)
/build/starrocks/be/src/storage/persistent_index.cpp:2766 _insert_rowsets(tablet, rowsets, pkey_schema, apply_version, std::move(pk_column)) tablet:0 stack:
    @          0x47a3688  starrocks::PrimaryIndex::load()
    @          0x487fc1c  starrocks::TabletUpdates::_apply_compaction_commit()
    @          0x48867fd  starrocks::TabletUpdates::do_apply()
    @          0x51a47a5  starrocks::ThreadPool::dispatch_thread()
    @          0x519f18a  starrocks::Thread::supervise_thread()
    @     0x7f691b7edea5  start_thread
    @     0x7f691ae08b0d  __clone
    @              (nil)  (unknown)
E1126 09:46:19.222044 162368 tablet_updates.cpp:1621] _apply_compaction_commit error: load primary index failed: Internal error: FixedMutableIndex<48> insert found duplicate key 80000000000000088000000000001CBB80000000245C039180000000000000A581780D3D98F84C9BA589E0000E8C4228
/build/starrocks/be/src/storage/persistent_index.cpp:1559 _shards[shard_offset + i]->insert(keys, values, idxes_by_shard[i])
/build/starrocks/be/src/storage/persistent_index.cpp:3099 _l0->insert(n, keys, values, check_l1_key_sizes)
/build/starrocks/be/src/storage/persistent_index.cpp:2766 _insert_rowsets(tablet, rowsets, pkey_schema, apply_version, std::move(pk_column)) tablet:3260935 #version:2 [7000.1 7000.1@0 7000.2] pending: rowsets:1
  19 [seg:1 row:6039456 del:0 bytes:179185692 compaction:89249764]
E1126 09:46:19.222046 162370 tablet_updates.cpp:1621] _apply_compaction_commit error: load primary index failed: Internal error: FixedMutableIndex<48> insert found duplicate key 80000000000000088000000000001CBB80000000245BFBC1800000000000000481780D3D98FA2493A589E0002B844240
/build/starrocks/be/src/storage/persistent_index.cpp:1559 _shards[shard_offset + i]->insert(keys, values, idxes_by_shard[i])
/build/starrocks/be/src/storage/persistent_index.cpp:3099 _l0->insert(n, keys, values, check_l1_key_sizes)
/build/starrocks/be/src/storage/persistent_index.cpp:2766 _insert_rowsets(tablet, rowsets, pkey_schema, apply_version, std::move(pk_column)) tablet:3260947 #version:2 [7000 7000@0 7000.1] pending: rowsets:1
  19 [seg:1 row:5930003 del:0 bytes:179068330 compaction:89367126]
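The affected tablet IDs and duplicate keys can be collected from output like the above with a short script. This is a minimal sketch, assuming each "insert found duplicate key" message is followed by a line carrying `tablet:<id>` before the next such message (log lines interleaved from several threads could break that assumption). The names `duplicate_keys_by_tablet` and `show_tablet_statements` are hypothetical helpers. Entries reporting `tablet:0`, as in the warning lines, are skipped because they do not identify a tablet.

```python
import re

# Matches the error text seen in the log; the number is the template
# parameter printed in "FixedMutableIndex<48>".
DUP_KEY = re.compile(
    r"FixedMutableIndex<(\d+)> insert found duplicate key ([0-9A-Fa-f]+)"
)
TABLET = re.compile(r"\btablet:(\d+)")


def duplicate_keys_by_tablet(lines):
    """Map tablet id -> set of (index size, duplicate key hex) pairs."""
    result = {}
    pending = None  # most recent duplicate-key report not yet tied to a tablet
    for line in lines:
        m = DUP_KEY.search(line)
        if m:
            pending = (int(m.group(1)), m.group(2))
        t = TABLET.search(line)
        if t and pending is not None:
            tablet_id = int(t.group(1))
            if tablet_id != 0:  # tablet:0 carries no usable id
                result.setdefault(tablet_id, set()).add(pending)
            pending = None
    return result


def show_tablet_statements(result):
    """Build one SHOW TABLET statement per affected tablet id."""
    return [f"SHOW TABLET {tablet_id};" for tablet_id in sorted(result)]
```

The generated statements can then be run from a MySQL client connected to the FE to look up which table each tablet belongs to.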

This is related to the primary key index.

Has the machine hosting this BE been rebooted recently?

Please share the CREATE TABLE statement for this table.

No, it has not been rebooted recently.

I cannot find the table from the tablet IDs in the log. How can I look up the table they belong to?

show tablet 3260935;
show tablet 3260947;

Please post the output of who -b so we can double-check whether the machine was rebooted.

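  1. Run show tablet $tablet_id; to find the table name.
  2. Run show create table $tablename; and post the output.
  3. Confirm where the data in this table comes from and how it is written into StarRocks: whether partial-column updates or conditional updates are involved, and whether any special handling is applied at write time.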

The command reports that this tablet no longer exists.

The screenshot shows the result returned by the command.
image