no tablet exist

【Details】Stream Load fails with the errors below:


【Background】
【Business impact】
【StarRocks version】2.4.3
【Cluster size】3 FE + 5 BE (FE and BE co-located)
【Machine specs】32C/128G/10 GbE
【Contact】StarRocks community group 3; 如斯

fe.warn

W0820 16:01:24.956701 80679 tablet_sink.cpp:795] NodeChannel[1570555-850618], tablet open failed, load_id=2a49b3e2-2e7c-77de-d724-d5f7d684de93, txn_id: 67703553, parallel=1, compress_type=2, node=10.65.23.236:8060, errmsg=Fail to get tablet. tablet_id=1598212
W0820 16:01:24.956708 80679 tablet_sink.cpp:804] Open channel failed. load_id: 2a49b3e22e7c77de-d724d5f7d684de93, error: Internal error: Fail to get tablet. tablet_id=1598212: be:10.65.23.236
W0820 16:01:24.956723 80679 plan_fragment_executor.cpp:178] Fail to open fragment, instance_id=2a49b3e2-2e7c-77de-d724-d5f7d684de94, status=Internal error: Fail to get tablet. tablet_id=1598212: be:10.65.23.236
W0820 16:01:24.957011 80679 fragment_mgr.cpp:182] Fail to open fragment 2a49b3e2-2e7c-77de-d724-d5f7d684de94: Internal error: Fail to get tablet. tablet_id=1598212: be:10.65.23.236
W0820 16:01:24.957494 80679 stream_load_executor.cpp:92] fragment execute failed, query_id=2a49b3e22e7c77de-d724d5f7d684de93, err_msg=Fail to get tablet. tablet_id=1598212: be:10.65.23.236, id=2a49b3e22e7c77de-d724d5f7d684de93, job_id=-1, txn_id: 67703553, label=8d4147f5-731a-4a73-8437-588e16af96ea, db=aweme_gmv
W0820 16:01:24.957532 81344 stream_load.cpp:135] Fail to handle streaming load, id=2a49b3e22e7c77de-d724d5f7d684de93 errmsg=Fail to get tablet. tablet_id=1598212: be:10.65.23.236
W0820 19:25:52.175015 80682 tablet_sink.cpp:795] NodeChannel[1570555-850618], tablet open failed, load_id=164a4cb8-e378-e359-ceed-e1ac8a748d84, txn_id: 67734117, parallel=1, compress_type=2, node=10.65.23.236:8060, errmsg=Fail to get tablet. tablet_id=1598212
W0820 19:25:52.175019 80682 tablet_sink.cpp:804] Open channel failed. load_id: 164a4cb8e378e359-ceede1ac8a748d84, error: Internal error: Fail to get tablet. tablet_id=1598212: be:10.65.23.236
W0820 19:25:52.175026 80682 plan_fragment_executor.cpp:178] Fail to open fragment, instance_id=164a4cb8-e378-e359-ceed-e1ac8a748d85, status=Internal error: Fail to get tablet. tablet_id=1598212: be:10.65.23.236
W0820 19:25:52.175194 80682 fragment_mgr.cpp:182] Fail to open fragment 164a4cb8-e378-e359-ceed-e1ac8a748d85: Internal error: Fail to get tablet. tablet_id=1598212: be:10.65.23.236

W0820 19:25:52.175529 80682 stream_load_executor.cpp:92] fragment execute failed, query_id=164a4cb8e378e359-ceede1ac8a748d84, err_msg=Fail to get tablet. tablet_id=1598212: be:10.65.23.236, id=164a4cb8e378e359-ceede1ac8a748d84, job_id=-1, txn_id: 67734117, label=a493f1aa-0176-483e-81bf-e621f17398e1, db=aweme_gmv
W0820 19:53:15.346398 81289 olap_scan_node.cpp:659] failed to get tablet. tablet_id=1598212, with schema_hash=79328473, reason=tablet does not exist
W0820 19:53:15.346876 81289 internal_service.cpp:186] exec multi plan fragments failed, errmsg=failed to get tablet. tablet_id=1598212, with schema_hash=79328473, reason=tablet does not exist
W0820 19:54:14.229293 81289 olap_scan_node.cpp:659] failed to get tablet. tablet_id=1598212, with schema_hash=79328473, reason=tablet does not exist
W0820 19:54:14.229751 81289 internal_service.cpp:186] exec multi plan fragments failed, errmsg=failed to get tablet. tablet_id=1598212, with schema_hash=79328473, reason=tablet does not exist
W0821 10:26:20.452546 81349 utils.cpp:124] Failed to open file: /usr/StarRocks/be/www//api/meta/header/1598212/-1

be.info

W0821 11:22:43.381673 98298 delta_writer.cpp:84] Fail to get tablet. tablet_id=1598212

W0821 11:22:43.386008 98273 load_channel.cpp:76] Fail to open index 1570555 of load 23406b3b29443a23-224e3a44639e9ba1: Internal error: Fail to get tablet. tablet_id=1598212
/root/starrocks/be/src/storage/delta_writer.cpp:22 writer->_init()
/root/starrocks/be/src/runtime/local_tablets_channel.cpp:484 res.status()
/root/starrocks/be/src/runtime/local_tablets_channel.cpp:230 _open_all_writers(params)
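
To see whether more tablets than 1598212 are affected, the warning log can be summarized per tablet and BE. The following is only a rough sketch: the regular expression is derived from the `Fail to get tablet. tablet_id=...: be:...` text in the lines above, and the function name `count_missing_tablets` is made up for illustration. It does not match the lowercase `failed to get tablet` lines from olap_scan_node.cpp.

```python
import re
from collections import Counter

# Pattern taken from the log text above, for example:
#   "Fail to get tablet. tablet_id=1598212: be:10.65.23.236"
# The ": be:<ip>" suffix is optional because some lines omit it.
FAIL_RE = re.compile(r"Fail to get tablet\. tablet_id=(\d+)(?:: be:([\d.]+))?")


def count_missing_tablets(lines):
    """Count failing log lines per (tablet_id, be_host).

    be_host is None when the line does not carry a "be:" suffix.
    Only the first match in each line is counted.
    """
    counts = Counter()
    for line in lines:
        match = FAIL_RE.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts
```

Feeding it the lines of the warning log (for example `count_missing_tablets(open(path))`, where `path` is wherever the log lives on your machine) gives a quick list of tablet IDs to check with SHOW TABLET.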

Checking the tablet status

Query on the leader FE:

SHOW TABLET 1598212;

Query on a follower FE:

Query with SHOW PROC '/dbs/1003462/1570554/partitions/1569831/1570555/1598212';

Querying http://10.65.23.236:8040/api/meta/header/1598212 returns an error:

{
    "status": "Fail",
    "msg": "no tablet exist"
}
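
For checking several tablets against the BE in one go, something like the sketch below might help. It assumes the same URL shape as the request above (`/api/meta/header/<tablet_id>` on HTTP port 8040) and treats the exact JSON body shown above as the "missing" signal. The function names are hypothetical, and I have not verified how other failure modes are reported by this endpoint.

```python
import json
import urllib.error
import urllib.request


def header_url(be_host, tablet_id, http_port=8040):
    """Build the BE meta header URL in the same form as the request above."""
    return f"http://{be_host}:{http_port}/api/meta/header/{tablet_id}"


def tablet_missing(body):
    """Return True if body is the 'no tablet exist' failure JSON shown above."""
    try:
        doc = json.loads(body)
    except ValueError:
        return False
    return (
        isinstance(doc, dict)
        and doc.get("status") == "Fail"
        and doc.get("msg") == "no tablet exist"
    )


def check_tablet(be_host, tablet_id, timeout=5.0):
    """Fetch the header from the BE and report whether the tablet is missing.

    Connection problems (urllib.error.URLError) are left to propagate.
    """
    try:
        with urllib.request.urlopen(header_url(be_host, tablet_id), timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Assumption: the error JSON may arrive with a non-2xx status code.
        body = err.read().decode("utf-8", errors="replace")
    return tablet_missing(body)
```

For example, `check_tablet("10.65.23.236", 1598212)` would be expected to return True in the situation described in this post.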

Check the partition replica status:
admin show replica status from DB.NAME partition(p20230109);

VersionNum is -1.

Is this caused by corrupted tablet metadata?

Several other tablets whose VersionNum is -1 have the same problem.

Querying on one FE shows VersionNum = -1, while the other two FEs report VersionNum = 1 for the same tablets.
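
To list every tablet on which the FEs disagree, the VersionNum values collected from each FE can be compared with a small helper. This is a sketch under assumptions: the input is a plain dict that you fill in yourself from the replica status output of each FE, and the name `find_version_disagreements` and the FE labels are hypothetical.

```python
def find_version_disagreements(per_fe):
    """Find tablets whose VersionNum differs between FEs.

    per_fe maps an FE label to {tablet_id: VersionNum} as collected by hand
    from each FE. Returns {tablet_id: {fe_label: VersionNum or None}} for
    every tablet where the FEs do not all report the same value. A tablet
    that an FE does not list at all shows up as None for that FE.
    """
    tablet_ids = set()
    for versions in per_fe.values():
        tablet_ids.update(versions)

    disagreements = {}
    for tablet_id in sorted(tablet_ids):
        seen = {fe: versions.get(tablet_id) for fe, versions in per_fe.items()}
        if len(set(seen.values())) > 1:
            disagreements[tablet_id] = seen
    return disagreements
```

With the values described above (one FE reporting -1 and two reporting 1 for tablet 1598212), the helper returns that tablet together with the value seen on each FE.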