StarRocks write failure

【Details】Detailed problem description
When writing data to StarRocks from DolphinScheduler (JDBC data source) and from Python, the writes fail with: execute sql error: get database write lock timeout, database=default_cluster:app, timeoutMillis=10000. This error only started appearing recently. I raised the parameter txn_commit_rpc_timeout_ms to 30000, but it has no effect. (A minimal reproducer sketch of the write path is included after this form.)
【Background】What operations were performed?
【Business impact】
Data writes fail.
【StarRocks version】2.3.10
【Cluster size】3 FE (3 followers) + 3 BE
【Machine info】CPU (virtual cores) / memory / NIC: 16C / 64 GB / 10 GbE
【Table model】UNIQUE KEY model
【Load or export method】Python, JDBC (INSERT INTO)
【Contact】Community group 5 - Lao Zhang
【Attachments】
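For context, the failing write path is a plain INSERT over the MySQL protocol. Below is a minimal reproducer sketch, assuming pymysql and a hypothetical host, credentials, and table, that surfaces the same error at commit time:

```python
import pymysql

# Hypothetical connection details; 9030 is the FE MySQL-protocol port.
conn = pymysql.connect(host="fe_host", port=9030, user="root",
                       password="", database="app", autocommit=True)

try:
    with conn.cursor() as cur:
        # Hypothetical UNIQUE KEY table. The statement itself is trivial;
        # the timeout happens while the FE commits the transaction and
        # has to take the database write lock.
        cur.execute("INSERT INTO my_unique_table (id, val) VALUES (1, 'x')")
except pymysql.Error as e:
    # Surfaces as: execute sql error: get database write lock timeout, ...
    print("insert failed:", e)
finally:
    conn.close()
```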


To add: fe.log contains a large number of entries like the following:
2023-07-26 00:00:17,127 WARN (starrocks-mysql-nio-pool-8018|409382) [Database.logTryLockFailureEvent():157] try db lock failed. type: writeLock, current
2023-07-26 00:00:17,127 WARN (starrocks-mysql-nio-pool-8018|409382) [StmtExecutor.handleDMLStmt():1270] handle insert stmt fail: insert_501a52ca-2b04-11ee-9fbd-06322d653dc1
com.starrocks.common.UserException: get database write lock timeout, database=default_cluster:ods, timeoutMillis=10000
at com.starrocks.transaction.GlobalTransactionMgr.commitAndPublishTransaction(GlobalTransactionMgr.java:348) ~[starrocks-fe.jar:?]
at com.starrocks.transaction.GlobalTransactionMgr.commitAndPublishTransaction(GlobalTransactionMgr.java:337) ~[starrocks-fe.jar:?]
at com.starrocks.qe.StmtExecutor.handleDMLStmt(StmtExecutor.java:1248) ~[starrocks-fe.jar:?]
at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:447) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:325) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:443) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:706) ~[starrocks-fe.jar:?]
at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:55) ~[starrocks-fe.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_341]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_341]
at java.lang.Thread.run(Thread.java:750) [?:1.8.0_341]

See the "get database write lock timeout" troubleshooting handbook.

I've already read it; it doesn't help much for our scenario. Besides, we already keep the load frequency well under control, so it's not that frequent.

In the last couple of days a different error has started appearing:
2023-07-31 00:02:07,304 INFO (starrocks-mysql-nio-pool-4604|11440) [StmtExecutor.analyze():605] begin to analyze stmt: 1550454, forwarded stmt id: 0
2023-07-31 00:02:07,306 INFO (starrocks-mysql-nio-pool-4601|11435) [QeProcessorImpl.registerQuery():78] register query id = 6eeccd12-2ef2-11ee-94bc-06322d653dc1, job: -1
2023-07-31 00:02:07,306 INFO (starrocks-mysql-nio-pool-4600|11434) [QeProcessorImpl.registerQuery():78] register query id = 2607e982-2ef2-11ee-94bc-06322d653dc1, job: -1
2023-07-31 00:02:07,306 WARN (replayer|69) [Database.logSlowLockEventIfNeeded():150] slow db lock. type: writeLock, db id: 10070, db name: default_cluster:ods, wait time: 99023ms, former , current stack trace:
java.lang.Exception: null
at com.starrocks.catalog.Database.logSlowLockEventIfNeeded(Database.java:151) [starrocks-fe.jar:?]
at com.starrocks.catalog.Database.writeLock(Database.java:228) [starrocks-fe.jar:?]
at com.starrocks.server.LocalMetastore.replayUpdateReplica(LocalMetastore.java:2588) [starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.replayUpdateReplica(GlobalStateMgr.java:2274) [starrocks-fe.jar:?]
at com.starrocks.persist.EditLog.loadJournal(EditLog.java:398) [starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.replayJournal(GlobalStateMgr.java:1726) [starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr$4.runOneCycle(GlobalStateMgr.java:1518) [starrocks-fe.jar:?]
at com.starrocks.common.util.Daemon.run(Daemon.java:115) [starrocks-fe.jar:?]
2023-07-31 00:02:07,306 WARN (replayer|69) [GlobalStateMgr.replayJournal():1739] replay journal cost too much time: 99025 replayedJournalId: 127068269
2023-07-31 00:02:07,307 INFO (starrocks-mysql-nio-pool-4605|11441) [QeProcessorImpl.registerQuery():78] register query id = 6a9580d8-2ef2-11ee-94bc-06322d653dc1, job: -1
2023-07-31 00:02:07,308 INFO (starrocks-mysql-nio-pool-4601|11435) [QeProcessorImpl.unregisterQuery():88] deregister query id 6eeccd12-2ef2-11ee-94bc-06322d653dc1
2023-07-31 00:02:07,309 INFO (starrocks-mysql-nio-pool-4604|11440) [QeProcessorImpl.registerQuery():78] register query id = 6eed1b33-2ef2-11ee-94bc-06322d653dc1, job: -1
2023-07-31 00:02:07,309 INFO (replayer|69) [GlobalStateMgr.replayJournal():1711] replayed journal id is 127068269, replay to journal id is 127069148
2023-07-31 00:02:07,310 INFO (starrocks-mysql-nio-pool-4601|11435) [StmtExecutor.analyze():605] begin to analyze stmt: 1550457, forwarded stmt id: 0

One FE node keeps reporting this error from time to time, and when it happens, connecting to the database, queries, and inserts all become slow.
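Since the slow-lock warning comes from the replayer thread, one thing worth checking is whether that follower is lagging on journal replay compared with the other FEs. A minimal sketch, assuming pymysql and that this version's SHOW FRONTENDS output includes the Name, Role, Alive, and ReplayedJournalId columns (column names can differ between versions):

```python
import pymysql

# Hypothetical FE address and credentials; any live FE can be queried.
conn = pymysql.connect(host="fe_host", port=9030, user="root", password="",
                       cursorclass=pymysql.cursors.DictCursor)

with conn.cursor() as cur:
    cur.execute("SHOW FRONTENDS")
    for fe in cur.fetchall():
        # A follower whose ReplayedJournalId stays far behind the others
        # points at the node that is slow to replay the edit log.
        print(fe.get("Name"), fe.get("Role"), fe.get("Alive"),
              fe.get("ReplayedJournalId"))
conn.close()
```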

I'm in community group 15, nickname Lao Zhang; would it be convenient to add me on WeChat?

Could the INSERT INTO statements be too frequent? Check whether the table has too many tablets.

Ours are batch jobs, so INSERT INTO is not frequent. How many tablets counts as too many?

It's best to keep each tablet at around 1 GB.
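A rough way to check this from the MySQL client is to compare a table's tablet count with its reported data size. A minimal sketch, assuming pymysql and that SHOW TABLET FROM / SHOW DATA FROM behave as in the 2.x docs (whether SHOW TABLET lists one row per tablet or per replica can differ, so treat the count accordingly):

```python
import pymysql

# Hypothetical FE address, credentials, and table name.
conn = pymysql.connect(host="fe_host", port=9030, user="root",
                       password="", database="app")

def tablet_overview(table: str) -> None:
    """Print the tablet/replica count and the reported data size for a table."""
    with conn.cursor() as cur:
        cur.execute(f"SHOW TABLET FROM {table}")
        tablets = cur.fetchall()
        cur.execute(f"SHOW DATA FROM {table}")
        sizes = cur.fetchall()
    print(f"{table}: {len(tablets)} tablet rows reported")
    for row in sizes:
        print("  size row:", row)  # contains the index name and its size

tablet_overview("my_unique_table")  # hypothetical table name
conn.close()
```

Dividing the reported size by the tablet count (and by the replication factor if the rows are per replica) gives a rough per-tablet size to compare against the ~1 GB guideline.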

Some are under 1 GB; not every tablet can be kept at 1 GB.

What load methods are you using, and how frequent are they? Are there also a lot of operations that drop table partitions and create new ones?

Most jobs use INSERT INTO SELECT, with a minimum scheduling interval of 5 minutes. There are also Stream Load jobs that flush every 10 seconds in production, plus some Kafka Routine Load jobs.

Try increasing the INSERT INTO SELECT scheduling interval and observe.

It's already once every 5 minutes; isn't that long enough?

How many tables are loaded with INSERT INTO SELECT? Could the next batch run start before the previous one has finished?

No. Each job is a single INSERT INTO against a single table, and each batch run finishes quite quickly.

If you need to guarantee uniqueness, you can try the Primary Key model; if there is no uniqueness requirement, you can use the Duplicate Key model.

Both of those paths are INSERT INTO VALUES under the hood, right? Wrap the writes with Stream Load instead; INSERT INTO VALUES is not recommended in production. Also, SHOW PROC '/transactions/$dbid/finished' lets you see the write concurrency.
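For the jobs that currently write with INSERT INTO VALUES, a thin Stream Load wrapper in Python can replace the SQL write without changing the scheduling. A minimal sketch, assuming the requests library, the standard /api/{db}/{table}/_stream_load endpoint on the FE HTTP port (8030 here), CSV rows, and hypothetical credentials and table:

```python
import uuid
import requests

# Hypothetical endpoint and credentials; adjust for your cluster.
FE_HTTP = "http://fe_host:8030"
AUTH = ("root", "")

def stream_load(db, table, rows):
    """Send newline-separated CSV rows to StarRocks via Stream Load."""
    url = f"{FE_HTTP}/api/{db}/{table}/_stream_load"
    headers = {
        "label": f"py_load_{uuid.uuid4()}",   # unique label per batch
        "column_separator": ",",
        "Expect": "100-continue",
    }
    body = "\n".join(rows).encode("utf-8")
    # The FE typically answers with a 307 redirect to a BE; requests drops
    # the Authorization header across hosts, so follow the redirect by hand.
    resp = requests.put(url, data=body, headers=headers, auth=AUTH,
                        allow_redirects=False)
    if resp.status_code == 307:
        resp = requests.put(resp.headers["Location"], data=body,
                            headers=headers, auth=AUTH)
    return resp.json()

print(stream_load("app", "my_unique_table", ["1,foo", "2,bar"]))
```

Each call is a single transaction, so SHOW PROC '/transactions/$dbid/finished' (as mentioned above) still shows the resulting write concurrency.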

Also, please provide a copy of fe.log.

In many of our scenarios there is no way to wrap the writes with Stream Load; they have to be done through SQL.

INSERT INTO VALUES is not a good fit for this.