truncate table process failed

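【Details】truncate table process failed
【Background】Every night an ETL job loads the T+1 data: the table is truncated first, then repopulated with INSERT.
【Business impact】The table cannot be overwritten, so its data is not updated.
【StarRocks version】2.3.0
【Cluster size】3 FE (3 followers) + 5 BE (FE and BE co-located on the same hosts)
【Contact】15623937986
【Attachments】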

  • fe.log/be.INFO/relevant screenshots
    FE log:
    2023-02-04 07:33:41,430 WARN (thrift-server-pool-43|234) [StmtExecutor.handleDdlStmt():978] DDL statement(truncate table pete_dw.t_ref_tgt_impact_lot_list) process failed.
    com.starrocks.common.DdlException: fail to create tablet: timed out. unfinished replicas(3/6): 200932067(192.168.52.146) 200932055(192.168.52.147) 200932067(192.168.52.147) timeout=2s
    at com.starrocks.server.LocalMetastore.waitForFinished(LocalMetastore.java:1628) ~[starrocks-fe.jar:?]
    at com.starrocks.server.LocalMetastore.sendCreateReplicaTasksAndWaitForFinished(LocalMetastore.java:1580) ~[starrocks-fe.jar:?]
    at com.starrocks.server.LocalMetastore.buildPartitionsSequentially(LocalMetastore.java:1431) ~[starrocks-fe.jar:?]
    at com.starrocks.server.LocalMetastore.buildPartitions(LocalMetastore.java:1400) ~[starrocks-fe.jar:?]
    at com.starrocks.server.LocalMetastore.truncateTable(LocalMetastore.java:3731) ~[starrocks-fe.jar:?]
    at com.starrocks.server.GlobalStateMgr.truncateTable(GlobalStateMgr.java:2989) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.DdlExecutor.execute(DdlExecutor.java:215) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.StmtExecutor.handleDdlStmt(StmtExecutor.java:961) [starrocks-fe.jar:?]
    at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:449) [starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.proxyExecute(ConnectProcessor.java:586) [starrocks-fe.jar:?]
    at com.starrocks.service.FrontendServiceImpl.forward(FrontendServiceImpl.java:737) [starrocks-fe.jar:?]
    at com.starrocks.thrift.FrontendService$Processor$forward.getResult(FrontendService.java:2071) [starrocks-fe.jar:?]
    at com.starrocks.thrift.FrontendService$Processor$forward.getResult(FrontendService.java:2051) [starrocks-fe.jar:?]
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) [libthrift-0.13.0.jar:0.13.0]
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) [libthrift-0.13.0.jar:0.13.0]
    at com.starrocks.common.SRTThreadPoolServer$WorkerProcess.run(SRTThreadPoolServer.java:310) [starrocks-fe.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
    2023-02-04 07:33:41,523 WARN (thrift-server-pool-20|194) [MasterImpl.finishTask():194] cannot find task. type: CREATE, backendId: 852425
    BE log:
    I0204 07:33:39.429392 15152 task_worker_pool.cpp:266] success to submit task. type=CREATE, signature=[200932055,200932067], task_count_in_queue=327
    I0204 07:33:41.553824 11262 tablet_manager.cpp:136] Creating tablet 200932055
    I0204 07:33:41.554236 11262 tablet_manager.cpp:178] Created tablet 200932055
    I0204 07:33:50.869511 15152 task_worker_pool.cpp:266] success to submit task. type=DROP, signature=[200814583,200814571,200814554,200814537,200814524,200815222,200815209,200815196,200815179,200815167,200815154,200815137,200815124,200815111,200815094,200815082,200815069,200815052,200815039,200815026,200815009,200814997,200814984,200814967,200814954,200814941,200814924,200814912,200814899,200814882,200814869,200814856,200814839,200814827,200814814,200814797,200814784,200814771,200814754,200814742,200814729,200814712,200814699,200814686,200814669,200814657,200814644,200814627,200814614,200822748,200822725,200822682,200822662,200822652,200822597,200822581,200822526,200822512,200822475,200823728,200823685,200823670,200823659,200823617,200823591,200823575,200823557,200823515,200823476,200823460,200823413,200823388,200823345,200823333,200823302,200823278,200823247,200823207,200823189,200823154,200823129,200823120,200823077,200823068,200823052,200823005,200822985,200822962,200822941,200822894,200822852,200822832,200822826,200822810,200822789,200847102,200847090,200847077,200847060,200847047,200932067,200932055,200932035,199129952,199129278], task_count_in_queue=105
    I0204 07:33:50.876425 11267 tablet_manager.cpp:331] Dropping tablet 200932055
    I0204 07:33:50.876437 11267 tablet_manager.cpp:1227] drop tablet:200932055, stop compaction task
    I0204 07:34:41.814851 11063 tablet_manager.cpp:929] Moved /data/data8/starrocks8/data/327/200932055

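We eventually worked around it by recreating the table, but the root cause was never found, and a piece of garbage metadata is left behind, as shown below: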

(1) Recreate the table:
create table pete_dw.t_ref_tgt_impact_lot_list_bak like pete_dw.t_ref_tgt_impact_lot_list;
(2) Copy the data over:
insert into pete_dw.t_ref_tgt_impact_lot_list_bak select * from pete_dw.t_ref_tgt_impact_lot_list;
(3) Swap the table names:
alter table t_ref_tgt_impact_lot_list rename t_ref_tgt_impact_lot_list_bak1;
alter table t_ref_tgt_impact_lot_list_bak rename t_ref_tgt_impact_lot_list;
(4) truncate table t_ref_tgt_impact_lot_list_bak1 still fails:
java.sql.SQLSyntaxErrorException: fail to create tablet: timed out. unfinished replicas(3/3): 205230045(192.168.52.146) 205230041(192.168.52.147) 205230041(192.168.52.147) timeout=2s
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:120)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
at com.mysql.cj.jdbc.StatementImpl.executeInternal(StatementImpl.java:764)
at com.mysql.cj.jdbc.StatementImpl.execute(StatementImpl.java:648)
(5) After drop table t_ref_tgt_impact_lot_list_bak1 force, the garbage metadata is still there:

show tablet 200932055

DbName | TableName | PartitionName | IndexName | DbId | TableId | PartitionId | IndexId | IsSync | DetailCmd
       |           |               |           | -1   | -1      | -1          | -1      | false  | SHOW PROC '/dbs/-1/-1/partitions/-1/-1/200932055';
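Whether this row is really leftover metadata can be checked cheaply. If SHOW TABLET fills every ID column with -1 and IsSync with false whenever a tablet ID is not found in the FE's tablet index at all (an assumption worth verifying on 2.3.0), then the row above would simply mean "tablet not found". A sketch of that comparison, where 999999999999 is an arbitrary, hypothetical ID assumed not to exist in the cluster:

```sql
-- Tablet that was created and then dropped after the TRUNCATE timeout.
SHOW TABLET 200932055;

-- Arbitrary, hypothetical tablet ID that should not exist anywhere.
SHOW TABLET 999999999999;

-- If both return the same all -1 row with IsSync = false (apart from the ID
-- in DetailCmd), the first row most likely means "not found" rather than
-- orphaned metadata.
```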

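Are your old logs still available? Could you back them up? Let's find a time to look at this together. We need be.INFO and fe.log from around the time the error occurred.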

Do you have an email address? Is it OK if I send the logs to you by email?

Either lxhhust350@qq.com or add me on WeChat (lxhhust350); preferably compress the logs first.

Have you sent me the logs yet?...

I have just sent you the log files, please check. Thanks! My email address is:
RENFENG_LIU

They have been sent to your personal email, please check. Thanks!

OK, received. I'll analyze them first and then get back to you.

Have you found the problem yet?

Still investigating...

Hi, have you found the cause? Thanks!

Add me and I'll set up a group chat to look into this issue.

I'm in StarRocks community group 8. What's your WeChat ID? I'll add you.

lxhhust350

After the user increased the timeout, the problem went away, so the more likely cause is a problem with the network (VPC).

We changed it, but the problem still happens occasionally, although much less often.
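The thread does not say which timeout was raised. Assuming it was the FE wait for tablet creation named in the error message, a sketch of raising it at runtime is shown below. The values 10 and 600 are illustrative only, not settings confirmed in this thread, and the sketch assumes both keys are dynamically mutable in 2.3. Note that this only widens how long the FE waits; if the real bottleneck is the BE CREATE queue (task_count_in_queue=327 in the BE log), a longer wait hides it rather than fixes it.

```sql
-- Hypothetical values: raise the per-tablet CREATE wait and its overall cap.
-- Assumed to affect only the FE you are connected to; run it on the Leader FE,
-- where the forwarded TRUNCATE actually executes.
ADMIN SET FRONTEND CONFIG ("tablet_create_timeout_second" = "10");
ADMIN SET FRONTEND CONFIG ("max_create_table_timeout_second" = "600");

-- Runtime changes are not persisted across an FE restart; to keep them,
-- the same keys would also go into fe.conf on each FE:
--   tablet_create_timeout_second = 10
--   max_create_table_timeout_second = 600
```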

May I ask which timeout was changed here? We have run into this problem too.

Which version are you using, and what are the symptoms?