2.5.12: INSERT INTO load tasks time out frequently during peak query hours

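To help us locate your problem faster, please provide the following information. Thank you.
[Details] Detailed description of the problem
During peak business query hours, once the QPS on a single FE exceeds roughly 150, INSERT INTO load tasks time out frequently, and increasing the timeout parameter does not resolve it.
CPU, memory, and I/O usage are all within normal ranges. Could some BE parameter have hit a bottleneck?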

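[Background] What operations have been performed?
[Business impact]
[StarRocks version] e.g. 2.5.12
[Cluster size] e.g. 5 FE (3 follower + 2 observer) + 19 BE
[Machine info] CPU virtual cores / memory / NIC, e.g. FE 32C 128G / BE 64C 256G, NVMe SSD 3.5T * 4
[Contact] Community group 3, 杨荣
[Attachments]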

  • fe.log / be.INFO / relevant screenshots
    Log from fe.warn.log:
    2023-10-09 10:16:20,707 WARN (starrocks-mysql-nio-pool-5706|1328214) [StmtExecutor.handleDMLStmtImpl():1559] failed to handle stmt xxx
    label: insert_8e8dafb2-6649-11ee-b5e7-525400b25688
    com.starrocks.common.DdlException: Query timeout. Increase the query_timeout session variable and retry
    at com.starrocks.common.ErrorReport.reportDdlException(ErrorReport.java:86) ~[starrocks-fe.jar:?]
    at com.starrocks.common.ErrorReport.reportDdlException(ErrorReport.java:81) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.StmtExecutor.handleDMLStmtImpl(StmtExecutor.java:1459) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.StmtExecutor.handleDMLStmt(StmtExecutor.java:1267) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:503) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:325) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:442) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:700) ~[starrocks-fe.jar:?]
    at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:55) ~[starrocks-fe.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
    2023-10-09 10:16:20,707 WARN (starrocks-mysql-nio-pool-5706|1328214) [StmtExecutor.handleDMLStmtImpl():1582] errors when abort txn
    com.starrocks.transaction.TransactionNotFoundException: transaction not found
    at com.starrocks.transaction.DatabaseTransactionMgr.abortTransaction(DatabaseTransactionMgr.java:1265) ~[starrocks-fe.jar:?]
    at com.starrocks.transaction.DatabaseTransactionMgr.abortTransaction(DatabaseTransactionMgr.java:1246) ~[starrocks-fe.jar:?]
    at com.starrocks.transaction.GlobalTransactionMgr.abortTransaction(GlobalTransactionMgr.java:487) ~[starrocks-fe.jar:?]
    at com.starrocks.transaction.GlobalTransactionMgr.abortTransaction(GlobalTransactionMgr.java:475) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.StmtExecutor.handleDMLStmtImpl(StmtExecutor.java:1574) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.StmtExecutor.handleDMLStmt(StmtExecutor.java:1267) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:503) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:325) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:442) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:700) ~[starrocks-fe.jar:?]
    at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:55) ~[starrocks-fe.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
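  • Slow query:
    A profile was captured from a reproduced INSERT INTO load task that timed out.
    Normally the task finishes within 10s; in the abnormal case it times out at 120s.
    • Profile: profile.txt (32.0 KB)
    • Parallelism: show variables like '%parallel_fragment_exec_instance_num%';
    • Pipeline enabled: yes, pipeline_dop=4
    • Screenshots of BE node CPU and memory usage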
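
To check whether the timeouts really cluster around the QPS peaks, the "Query timeout" entries in fe.warn.log could be bucketed per minute and compared against the QPS graph. The following is only a minimal sketch: it assumes the log has the same layout as the excerpt above (a timestamped header line, followed by the exception line without its own timestamp), and the function name `count_timeouts_per_minute` is made up for this example, not part of StarRocks.

```python
import re
from collections import Counter

# Header of an FE log entry, e.g. "2023-10-09 10:16:20,707 WARN (...) ...".
# Group 1 captures the timestamp truncated to the minute.
ENTRY_RE = re.compile(r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3} [A-Z]+ ")

# Exception text taken from the fe.warn.log excerpt in this post.
TIMEOUT_MARKER = "DdlException: Query timeout"


def count_timeouts_per_minute(lines):
    """Count 'Query timeout' exceptions per minute.

    The exception line has no timestamp of its own, so it is attributed
    to the most recent entry header seen before it (an assumption about
    the log layout, based on the excerpt).
    """
    counts = Counter()
    current_minute = None
    for line in lines:
        header = ENTRY_RE.match(line)
        if header:
            current_minute = header.group(1)
        elif TIMEOUT_MARKER in line and current_minute is not None:
            counts[current_minute] += 1
    return counts
```

Passing an open file handle for fe.warn.log to `count_timeouts_per_minute` and printing the sorted items gives one line per minute, which can then be lined up with the FE QPS monitoring for the same time window.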