查询出错,msg:StarRocks planner use long time 30000 remaining task num 4

【详述】同时提交多个查询报执行计划超时
【背景】调整过执行计划超时时间至30s未能解决问题
【业务影响】
【StarRocks版本】例如:1.19.5

【附件】

  • fe.log
    2022-01-17 10:43:50,035 WARN (starrocks-mysql-nio-pool-372|33583) [StmtExecutor.execute():315] New planner error: select count(1) from (SELECT /*+ SET_VAR(new_planner_optimize_timeout = 30000) */ ‘2022-01-16’ daytime, ROUND(IFNULL(SUM(dev_init_cnt), 0), 2) AS dev_init_cnt, ROUND(IFNULL(SUM(device_reg_cnt), 0), 2) AS device_reg_cnt, ROUND(IFNULL(IFNULL(SUM(device_reg_cnt), 0) / IFNULL(SUM(dev_init_cnt), 0), 0), 4) AS convert_rate FROM tl_advert_daily_active_basic WHERE appkey IN (‘5dd2c66c90cf4d6d8b20ee12a6fab966’, ‘6b2cec81030e4e1996f615ad8193bc14’, ‘fda16bbd14754c53baaa24798d364055’, ‘8c0b4d32b0674424a3753135cd491379’, ‘db3bd98333c144b0abe404dc648ecf55’, ‘987235e403ad42f185e85c230069ac32’, ‘440aa82ef5084638ae18db4014c436b3’, ‘1aa9c860218b4e739af1e0fcc7c5a533’, ‘5e5027c8bd9b4b899f80936773c7cd02’, ‘cbf559f5f8ce4b0a9bdbdfd418284fc9’, ‘8e8c7cdb10694aa9ab2d97353ce51135’, ‘3ad69fc2cc77442ab5341fca28b7dada’, ‘78c0a36619854a9ab8f30d71bd6526d4’, ‘c060c384937843e18c5bc0d43859000a’, ‘28e1a9c9a8ba4f92b8526b79d870fff7’, ‘547d9f104cd94e798bd4d4c34c2c2120’, ‘56f4588dd9414fa7bd88ac175e945a28’, ‘04b691902f0544339c4f24e3b95c3f9a’, ‘4d93959b4a064bc4a5f8a5f06ca1c95f’, ‘4407f2a46e8a4638bffb601a75eb6045’, ‘18f45b4962b34d4f8bfe1505c5e35bbc’, ‘dc9f970fb68746b1b7e072da084c6902’, ‘e4aaa0b804dd4f1d810e331496850f3f’) AND daytime BETWEEN ‘2021-03-01’ AND ‘2022-01-16’ AND platform = ‘SERVER’ ORDER BY daytime DESC ) cnum
    com.starrocks.sql.common.StarRocksPlannerException: StarRocks planner use long time 30000 remaining task num 4
    at com.starrocks.sql.optimizer.task.SeriallyTaskScheduler.executeTasks(SeriallyTaskScheduler.java:32) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.optimizer.Optimizer.optimize(Optimizer.java:167) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.StatementPlanner.createQueryPlan(StatementPlanner.java:73) ~[starrocks-fe.jar:?]
    at com.starrocks.sql.StatementPlanner.plan(StatementPlanner.java:44) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:301) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:248) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:397) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:633) ~[starrocks-fe.jar:?]
    at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$485(ReadListener.java:54) ~[starrocks-fe.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_301]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_301]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_301]

看下fe是不是有fullgc,一般可以调整fe的conf文件种jvm参数大小,改为16G重试下查询看看

这是我们参数的设置
JAVA_OPTS="-Xmx51200m -XX:+UseMembar -XX:SurvivorRatio=8 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -Xloggc:$DORIS_HOME/log/fe.gc.log.$DATE"

JAVA_OPTS_FOR_JDK_9="-Xmx32768m -XX:SurvivorRatio=8 -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -Xlog:gc*:$DORIS_HOME/log/fe.gc.log.$DATE:time"`

设置这么大,重启fe后还有问题吗?看下fe.gc.log有没有频繁的fullgc

   我们fe节点是64G的,这个参数是不设置的太大了?
 2022-01-17T14:52:19.718+0800: 270377.362: [GC (Allocation Failure) 2022-01-17T14:52:19.719+0800: 270377.362: [ParNew: 309056K->34304K(309056K), 0.0559949 secs] 13118383K->12942211K(36039200K), 0.0562204 secs] [Times: user=0.64 sys=0.00, real=0.05 secs] 
        2022-01-17T14:52:20.003+0800: 270377.647: [GC (Allocation Failure) 2022-01-17T14:52:20.003+0800: 270377.647: [ParNew: 309056K->34304K(309056K), 0.1200192 secs] 13216963K->13049697K(36039200K), 0.1202677 secs] [Times: user=1.29 sys=0.02, real=0.12 secs] 
        2022-01-17T14:52:20.412+0800: 270378.056: [GC (Allocation Failure) 2022-01-17T14:52:20.412+0800: 270378.056: [ParNew: 309056K->34304K(309056K), 0.0633669 secs] 13324449K->13146734K(36039200K), 0.0636049 secs] [Times: user=0.74 sys=0.00, real=0.06 secs] 
        2022-01-17T14:52:20.733+0800: 270378.377: [GC (Allocation Failure) 2022-01-17T14:52:20.734+0800: 270378.377: [ParNew: 309056K->34304K(309056K), 0.1144339 secs] 13421486K->13260860K(36039200K), 0.1146695 secs] [Times: user=1.24 sys=0.01, real=0.12 secs] 
        2022-01-17T14:52:21.107+0800: 270378.751: [GC (Allocation Failure) 2022-01-17T14:52:21.107+0800: 270378.751: [ParNew: 309056K->34304K(309056K), 0.0744008 secs] 13535612K->13386665K(36039200K), 0.0746316 secs] [Times: user=0.78 sys=0.02, real=0.08 secs] 
        2022-01-17T14:52:21.405+0800: 270379.049: [GC (Allocation Failure) 2022-01-17T14:52:21.405+0800: 270379.049: [ParNew: 309056K->34304K(309056K), 0.1190523 secs] 13661417K->13496831K(36039200K), 0.1192965 secs] [Times: user=1.29 sys=0.00, real=0.12 secs] 
        2022-01-17T14:52:21.803+0800: 270379.447: [GC (Allocation Failure) 2022-01-17T14:52:21.804+0800: 270379.447: [ParNew: 309056K->34304K(309056K), 0.0789693 secs] 13771583K->13629345K(36039200K), 0.0791930 secs] [Times: user=0.87 sys=0.01, real=0.08 secs] 
        2022-01-17T14:52:22.137+0800: 270379.780: [GC (Allocation Failure) 2022-01-17T14:52:22.137+0800: 270379.781: [ParNew: 309056K->34304K(309056K), 0.1157660 secs] 13904097K->13741066K(36039200K), 0.1159921 secs] [Times: user=1.27 sys=0.00, real=0.12 secs] 
        2022-01-17T14:52:22.502+0800: 270380.146: [GC (Allocation Failure) 2022-01-17T14:52:22.502+0800: 270380.146: [ParNew: 309056K->34304K(309056K), 0.0559946 secs] 14015818K->13832968K(36039200K), 0.0562335 secs] [Times: user=0.67 sys=0.01, real=0.06 secs] 
        2022-01-17T14:52:22.786+0800: 270380.430: [GC (Allocation Failure) 2022-01-17T14:52:22.786+0800: 270380.430: [ParNew: 301438K->34304K(309056K), 0.1214521 secs] 14100102K->13943407K(36039200K), 0.1216945 secs] [Times: user=1.32 sys=0.00, real=0.13 secs] 
        2022-01-17T14:52:23.191+0800: 270380.835: [GC (Allocation Failure) 2022-01-17T14:52:23.192+0800: 270380.835: [ParNew: 309056K->34304K(309056K), 0.0841277 secs] 14218159K->14068739K(36039200K), 0.0843902 secs] [Times: user=0.94 sys=0.00, real=0.08 secs] 
        2022-01-17T14:52:23.541+0800: 270381.185: [GC (Allocation Failure) 2022-01-17T14:52:23.541+0800: 270381.185: [ParNew: 309056K->34304K(309056K), 0.1163437 secs] 14343491K->14180915K(36039200K), 0.1165927 secs] [Times: user=1.26 sys=0.01, real=0.12 secs] 
        2022-01-17T14:52:24.254+0800: 270381.898: [GC (Allocation Failure) 2022-01-17T14:52:24.254+0800: 270381.898: [ParNew: 309056K->33866K(309056K), 0.0442170 secs] 14455667K->14249638K(36039200K), 0.0444348 secs] [Times: user=0.56 sys=0.00, real=0.05 secs] 
        2022-01-17T14:52:26.931+0800: 270384.575: [GC (Allocation Failure) 2022-01-17T14:52:26.931+0800: 270384.575: [ParNew: 308618K->17257K(309056K), 0.0307612 secs] 14524390K->14233028K(36039200K), 0.0309736 secs] [Times: user=0.39 sys=0.00, real=0.03 secs] 
        2022-01-17T14:52:29.067+0800: 270386.711: [GC (Allocation Failure) 2022-01-17T14:52:29.067+0800: 270386.711: [ParNew: 292009K->20877K(309056K), 0.0315529 secs] 14507780K->14236648K(36039200K), 0.0317824 secs] [Times: user=0.40 sys=0.00, real=0.04 secs] 
        2022-01-17T14:52:29.681+0800: 270387.325: [GC (Allocation Failure) 2022-01-17T14:52:29.681+0800: 270387.325: [ParNew: 295629K->34304K(309056K), 0.0322766 secs] 14511400K->14253251K(36039200K), 0.0324986 secs] [Times: user=0.41 sys=0.00, real=0.03 secs] 
        2022-01-17T14:52:29.759+0800: 270387.403: [GC (Allocation Failure) 2022-01-17T14:52:29.759+0800: 270387.403: [ParNew: 309056K->19640K(309056K), 0.0346881 secs] 14528003K->14267325K(36039200K), 0.0349107 secs] [Times: user=0.39 sys=0.01, real=0.03 secs] 
        2022-01-17T14:52:29.968+0800: 270387.611: [GC (Allocation Failure) 2022-01-17T14:52:29.968+0800: 270387.612: [ParNew: 294392K->27942K(309056K), 0.0303505 secs] 14542077K->14275627K(36039200K), 0.0305685 secs] [Times: user=0.38 sys=0.00, real=0.03 secs] 
        2022-01-17T14:52:30.595+0800: 270388.239: [GC (Allocation Failure) 2022-01-17T14:52:30.595+0800: 270388.239: [ParNew: 302694K->23359K(309056K), 0.0341396 secs] 14550379K->14282732K(36039200K), 0.0343727 secs] [Times: user=0.41 sys=0.01, real=0.04 secs] 
        2022-01-17T14:52:31.367+0800: 270389.011: [GC (Allocation Failure) 2022-01-17T14:52:31.367+0800: 270389.011: [ParNew: 298111K->21624K(309056K), 0.0305944 secs] 14557484K->14280996K(36039200K), 0.0308096 secs] [Times: user=0.39 sys=0.00, real=0.03 secs] 
        2022-01-17T14:52:32.155+0800: 270389.799: [GC (Allocation Failure) 2022-01-17T14:52:32.155+0800: 270389.799: [ParNew: 296376K->20120K(309056K), 0.0377128 secs] 14555748K->14289756K(36039200K), 0.0379315 secs] [Times: user=0.45 sys=0.00, real=0.04 secs] 
        2022-01-17T14:52:32.854+0800: 270390.498: [GC (Allocation Failure) 2022-01-17T14:52:32.854+0800: 270390.498: [ParNew: 294872K->21375K(309056K), 0.0325501 secs] 14564508K->14291011K(36039200K), 0.0327810 secs] [Times: user=0.41 sys=0.00, real=0.03 secs] 
        2022-01-17T14:52:33.481+0800: 270391.125: [GC (Allocation Failure) 2022-01-17T14:52:33.481+0800: 270391.125: [ParNew: 296127K->23717K(309056K), 0.0340077 secs] 14565763K->14295475K(36039200K), 0.0342295 secs] [Times: user=0.42 sys=0.00, real=0.03 secs] 
        2022-01-17T14:52:34.249+0800: 270391.893: [GC (Allocation Failure) 2022-01-17T14:52:34.249+0800: 270391.893: [ParNew: 298469K->21621K(309056K), 0.0279048 secs] 14570227K->14293379K(36039200K), 0.0281129 secs] [Times: user=0.35 sys=0.00, real=0.03 secs] 
        2022-01-17T14:52:34.895+0800: 270392.538: [GC (Allocation Failure) 2022-01-17T14:52:34.895+0800: 270392.539: [ParNew: 296373K->23913K(309056K), 0.0334206 secs] 14568131K->14297750K(36039200K), 0.0336289 secs] [Times: user=0.41 sys=0.01, real=0.04 secs] 
        2022-01-17T14:52:35.642+0800: 270393.286: [GC (Allocation Failure) 2022-01-17T14:52:35.642+0800: 270393.286: [ParNew: 298665K->29377K(309056K), 0.0381031 secs] 14572502K->14307228K(36039200K), 0.0383148 secs] [Times: user=0.46 sys=0.00, real=0.04 secs] 
        2022-01-17T14:52:36.280+0800: 270393.923: [GC (Allocation Failure) 2022-01-17T14:52:36.280+0800: 270393.924: [ParNew: 304129K->22129K(309056K), 0.0293410 secs] 14581980K->14307687K(36039200K), 0.0295669 secs] [Times: user=0.36 sys=0.00, real=0.03 secs] 
        2022-01-17T14:52:37.101+0800: 270394.745: [GC (Allocation Failure) 2022-01-17T14:52:37.101+0800: 270394.745: [ParNew: 296881K->20083K(309056K), 0.0333489 secs] 14582439K->14311172K(36039200K), 0.0335709 secs] [Times: user=0.41 sys=0.00, real=0.03 secs] 
        2022-01-17T14:52:37.985+0800: 270395.628: [GC (Allocation Failure) 2022-01-17T14:52:37.985+0800: 270395.629: [ParNew: 294835K->19570K(309056K), 0.0295413 secs] 14585924K->14310660K(36039200K), 0.0297525 secs] [Times: user=0.37 sys=0.00, real=0.03 secs] 
        2022-01-17T14:52:40.535+0800: 270398.178: [GC (Allocation Failure) 2022-01-17T14:52:40.535+0800: 270398.179: [ParNew: 294322K->34304K(309056K), 0.0388594 secs] 14585412K->14335475K(36039200K), 0.0390941 secs] [Times: user=0.48 sys=0.00, real=0.04 secs] 
        2022-01-17T14:52:43.469+0800: 270401.113: [GC (Allocation Failure) 2022-01-17T14:52:43.469+0800: 270401.113: [ParNew: 309056K->25693K(309056K), 0.0473806 secs] 14610227K->14361126K(36039200K), 0.0476311 secs] [Times: user=0.48 sys=0.02, real=0.04 secs] 
        2022-01-17T14:52:44.720+0800: 270402.364: [GC (Allocation Failure) 2022-01-17T14:52:44.720+0800: 270402.364: [ParNew: 300445K->16278K(309056K), 0.0293862 secs] 14635878K->14351711K(36039200K), 0.0296488 secs] [Times: user=0.37 sys=0.00, real=0.03 secs] 
        2022-01-17T14:52:45.739+0800: 270403.383: [GC (Allocation Failure) 2022-01-17T14:52:45.739+0800: 270403.383: [ParNew: 291030K->20826K(309056K), 0.0294906 secs] 14626463K->14356260K(36039200K), 0.0297362 secs] [Times: user=0.38 sys=0.00, real=0.03 secs] 
        2022-01-17T14:52:46.518+0800: 270404.162: [GC (Allocation Failure) 2022-01-17T14:52:46.518+0800: 270404.162: [ParNew: 295578K->22440K(309056K), 0.0308134 secs] 14631012K->14357874K(36039200K), 0.0310484 secs] [Times: user=0.39 sys=0.00, real=0.03 secs] 
        2022-01-17T14:52:47.290+0800: 270404.933: [GC (Allocation Failure) 2022-01-17T14:52:47.290+0800: 270404.934: [ParNew: 297192K->26537K(309056K), 0.0316360 secs] 14632626K->14361970K(36039200K), 0.0318804 secs] [Times: user=0.40 sys=0.00, real=0.03 secs] 
        2022-01-17T14:52:48.162+0800: 270405.806: [GC (Allocation Failure) 2022-01-17T14:52:48.162+0800: 270405.806: [ParNew: 301289K->28990K(309056K), 0.0334051 secs] 14636722K->14372194K(36039200K), 0.0336163 secs] [Times: user=0.41 sys=0.00, real=0.03 secs] 
        2022-01-17T14:52:49.416+0800: 270407.060: [GC (Allocation Failure) 2022-01-17T14:52:49.416+0800: 270407.060: [ParNew: 303742K->34304K(309056K), 0.0450071 secs] 14646946K->14387215K(36039200K), 0.0452268 secs] [Times: user=0.55 sys=0.00, real=0.05 secs] 
        2022-01-17T14:52:50.972+0800: 270408.616: [GC (Allocation Failure) 2022-01-17T14:52:50.973+0800: 270408.616: [ParNew: 309056K->29710K(309056K), 0.0498385 secs] 14661967K->14416854K(36039200K), 0.0500679 secs] [Times: user=0.56 sys=0.00, real=0.05 secs] 
        2022-01-17T14:52:52.146+0800: 270409.790: [GC (Allocation Failure) 2022-01-17T14:52:52.146+0800: 270409.790: [ParNew: 304462K->34304K(309056K), 0.0444381 secs] 14691606K->14432970K(36039200K), 0.0446627 secs] [Times: user=0.54 sys=0.01, real=0.05 secs] 
        2022-01-17T14:52:55.732+0800: 270413.376: [GC (Allocation Failure) 2022-01-17T14:52:55.732+0800: 270413.376: [ParNew: 309056K->21304K(309056K), 0.0428764 secs] 14707722K->14449123K(36039200K), 0.0431319 secs] [Times: user=0.51 sys=0.00, real=0.04 secs] 
        2022-01-17T14:52:58.481+0800: 270416.125: [GC (Allocation Failure) 2022-01-17T14:52:58.481+0800: 270416.125: [ParNew: 296056K->18776K(309056K), 0.0316082 secs] 14723875K->14446596K(36039200K), 0.0318183 secs] [Times: user=0.39 sys=0.01, real=0.03 secs] 
        2022-01-17T14:53:03.134+0800: 270420.778: [GC (Allocation Failure) 2022-01-17T14:53:03.134+0800: 270420.778: [ParNew: 293528K->16315K(309056K), 0.0380460 secs] 14721348K->14454349K(36039200K), 0.0382771 secs] [Times: user=0.48 sys=0.00, real=0.04 secs] 
        2022-01-17T14:53:07.352+0800: 270424.996: [GC (Allocation Failure) 2022-01-17T14:53:07.352+0800: 270424.996: [ParNew: 291067K->34304K(309056K), 0.0499326 secs] 14729101K->14480108K(36039200K), 0.0501692 secs] [Times: user=0.61 sys=0.01, real=0.05 secs] 
        2022-01-17T14:53:07.603+0800: 270425.247: [GC (Allocation Failure) 2022-01-17T14:53:07.603+0800: 270425.247: [ParNew: 309056K->26680K(309056K), 0.0525757 secs] 14754860K->14505818K(36039200K), 0.0528151 secs] [Times: user=0.60 sys=0.00, real=0.05 secs] 
        2022-01-17T14:53:09.359+0800: 270427.003: [GC (Allocation Failure) 2022-01-17T14:53:09.359+0800: 270427.003: [ParNew: 301432K->20161K(309056K), 0.0318616 secs] 14780570K->14499300K(36039200K), 0.0320932 secs] [Times: user=0.41 sys=0.00, real=0.03 secs] 
        2022-01-17T14:53:14.548+0800: 270432.192: [GC (Allocation Failure) 2022-01-17T14:53:14.548+0800: 270432.192: [ParNew: 294913K->33676K(309056K), 0.0332366 secs] 14774052K->14512814K(36039200K), 0.0334778 secs] [Times: user=0.42 sys=0.00, real=0.03 secs]

集群查询压力很大吗?64G的集群fe jvm没必要设置50多个g,16G/32G够用了

暂时查询压力不是特别大,不过我们这外提供查询,一次性行会提交十几个SQL,我一会改成16G,在观察下?

嗯。更改后重启fe可以观察下

能否调整为G1回收器?