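【Details】Detailed problem description
All 3 FE nodes ran out of memory (OOM) at the same time, so clients could no longer connect to the service.
【Background】What operations were performed?
【Business impact】The service is inaccessible
【StarRocks version】e.g. 2.4.0
【Cluster size】3 FE (1 follower + 2 observers) + 8 BE (FE and BE deployed on separate machines)
【Machine info】CPU vCores / memory / NIC. BE: 32C/128G/10GbE; FE: 8C/32G/10GbE
【Contact】So that we can reach you promptly for log information while working on the issue, please add your contact details, e.g. community group 12-金谡-jinsu@moojing.com
【Attachments】
- fe.log / be.INFO / relevant screenshots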
[2023-01-30 06:47:38] failed to read after retried 2 times! key = 36738032, db = CloseSafeDatabase{db=}
Got EnvironmentFailureException and the current ReplicatedEnvironment is invalid, will exit.
com.sleepycat.je.EnvironmentFailureException: (JE 7.3.7) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.3.7) 10.19.16.54_9010_1672039015714(3):/data/starrocks_240/meta/bdb java.lang.OutOfMemoryError: Java heap space UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.
at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228)
at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766)
at com.sleepycat.je.dbi.EnvironmentImpl.checkOpen(EnvironmentImpl.java:1775)
at com.sleepycat.je.Environment.checkOpen(Environment.java:2473)
at com.sleepycat.je.Environment.getDatabaseNames(Environment.java:2245)
at com.starrocks.journal.bdbje.BDBEnvironment.getDatabaseNamesWithPrefix(BDBEnvironment.java:501)
at com.starrocks.journal.bdbje.BDBJEJournal.getMaxJournalId(BDBJEJournal.java:132)
at com.starrocks.server.GlobalStateMgr.getMaxJournalId(GlobalStateMgr.java:2541)
at com.starrocks.metric.MetricRepo$4.getValue(MetricRepo.java:203)
at com.starrocks.metric.MetricRepo$4.getValue(MetricRepo.java:200)
at com.starrocks.metric.PrometheusMetricVisitor.visit(PrometheusMetricVisitor.java:181)
at com.starrocks.metric.MetricRepo.getMetric(MetricRepo.java:504)
at com.starrocks.http.rest.MetricsAction.execute(MetricsAction.java:93)
at com.starrocks.http.rest.RestBaseAction.handleRequest(RestBaseAction.java:59)
at com.starrocks.http.HttpServerHandler.channelRead(HttpServerHandler.java:69)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.handler.codec.MessageToMessageCodec.channelRead(MessageToMessageCodec.java:111)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.3.7) 10.19.16.54_9010_1672039015714(3):/data/starrocks_240/meta/bdb java.lang.OutOfMemoryError: Java heap space UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.
at com.sleepycat.je.utilint.StoppableThread.handleUncaughtException(StoppableThread.java:205)
at com.sleepycat.je.utilint.StoppableThread$UncaughtHandler.uncaughtException(StoppableThread.java:236)
at java.base/java.lang.Thread.dispatchUncaughtException(Thread.java:1997)
Caused by: java.lang.OutOfMemoryError: Java heap space
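In the trace above, the outer EnvironmentFailureException is only a wrapper; the last "Caused by:" line names the actual failure. A minimal Python sketch for pulling that line out of a pasted fe.log excerpt follows. The helper name `root_cause` is hypothetical, and the sample is a shortened copy of the trace above, not additional log output.

```python
def root_cause(trace: str) -> str:
    """Return the text after the last 'Caused by:' in a Java stack trace.

    Returns an empty string when the trace has no 'Caused by:' line.
    """
    prefix = "Caused by:"
    causes = [
        line.strip()[len(prefix):].strip()
        for line in trace.splitlines()
        if line.strip().startswith(prefix)
    ]
    return causes[-1] if causes else ""


# Shortened copy of the trace above (messages abbreviated for illustration).
sample = """com.sleepycat.je.EnvironmentFailureException: (JE 7.3.7) Environment must be closed
at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228)
Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception
at com.sleepycat.je.utilint.StoppableThread.handleUncaughtException(StoppableThread.java:205)
Caused by: java.lang.OutOfMemoryError: Java heap space
"""

print(root_cause(sample))
```

For this report the innermost cause is the heap OutOfMemoryError, so the BDB JE environment failure is a consequence of the FE heap filling up rather than a separate problem.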
using java version 11
-Dlog4j2.formatMsgNoLookups=true -Xmx16384m -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -Xlog:gc*:/data/var/log/starrocks/log/fe.gc.log.20230130-082003:time
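To relate the heap limit in the options above to the FE machine size from the report (32G), here is a small Python sketch that extracts `-Xmx` and computes the fraction of host memory it covers. The function name `max_heap_mib` is hypothetical, the option string is shortened from the line above, and treating 32G as 32 × 1024 MiB is an assumption.

```python
import re


def max_heap_mib(jvm_opts: str):
    """Return the -Xmx value in MiB, or None if the flag is absent.

    A value without a unit suffix is in bytes, as the JVM treats it.
    """
    m = re.search(r"-Xmx(\d+)([kKmMgG]?)", jvm_opts)
    if m is None:
        return None
    value, unit = int(m.group(1)), m.group(2).lower()
    factor = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}[unit]
    return value * factor


# Shortened copy of the JVM options logged above.
opts = "-Dlog4j2.formatMsgNoLookups=true -Xmx16384m -XX:SurvivorRatio=8"
heap = max_heap_mib(opts)
host_mib = 32 * 1024  # assumption: FE host memory of 32G expressed in MiB

print(f"max heap: {heap} MiB, {heap / host_mib:.0%} of host memory")
```

With the values in this report the FE heap is capped at 16 GiB, half of the machine's memory, so the OutOfMemoryError means the FE used up that 16 GiB limit, not all physical memory on the host.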