Common Crash / Bug / Optimization Lookup

97. failed to call frontend service

BE error:

failed to call frontend service

FE error:

2023-07-23 06:49:00,883 WARN (thrift-server-accept|85) [ThreadPoolManager$LogDiscardPolicy.rejectedExecution():178] Task com.starrocks.common.SRTThreadPoolServer$WorkerProcess@5362a7df rejected from thrift-server-pool java.util.concurrent.ThreadPoolExecutor@6a6406a6[Running, pool size = 4096, active threads = 4096, queued tasks = 0, completed tasks = 4558444]
  • Github Issue:
  • Github Fix PR:
  • Jira
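  • Affected versions:
    • 2.2.0 ~ latest
    • 2.3.0 ~ latest
    • 2.4.0 ~ latest
  • Fixed versions:
    • 2.5.0+
  • Temporary workaround:
    • Edit fe.conf: thrift_server_max_worker_threads=8192 (default: 4096)
    • Lower the session variable parallel_fragment_exec_instance_num
  • Root cause:
    • Thrift thread pool issue; 2.5 includes dedicated optimizations for it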
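The FE warning for this item prints the pool size and the active thread count, so saturation can be checked mechanically. Below is a minimal sketch, assuming the log line format quoted in this post; the helper name `thrift_pool_saturated` is hypothetical and not part of StarRocks.

```python
import re

# Matches the ThreadPoolExecutor state printed in the FE rejection warning.
POOL_STATE = re.compile(
    r"rejected from (?P<pool>\S+) .*?"
    r"pool size = (?P<size>\d+), active threads = (?P<active>\d+), "
    r"queued tasks = (?P<queued>\d+)"
)

def thrift_pool_saturated(line):
    """Return (pool_name, pool_size) if the line shows a fully busy pool, else None."""
    m = POOL_STATE.search(line)
    if m is None:
        return None
    size, active = int(m["size"]), int(m["active"])
    # Every worker is busy, so newly accepted Thrift tasks get rejected.
    if active >= size:
        return m["pool"], size
    return None

sample = (
    "Task com.starrocks.common.SRTThreadPoolServer$WorkerProcess@5362a7df "
    "rejected from thrift-server-pool "
    "java.util.concurrent.ThreadPoolExecutor@6a6406a6[Running, pool size = 4096, "
    "active threads = 4096, queued tasks = 0, completed tasks = 4558444]"
)
print(thrift_pool_saturated(sample))
```

If the reported pool size equals the configured thrift_server_max_worker_threads, the workaround listed for this item is the one to try.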
  1. Crash when using the replace function

*** Aborted at 1688752969 (unix time) try "date -d @1688752969" if you are using GNU date ***
PC: @     0x2baa64381387 __GI_raise
*** SIGABRT (@0x1229c) received by PID 74396 (TID 0x2bac0a06c700) from PID 74396; stack trace: ***
    @          0x596c182 google::(anonymous namespace)::FailureSignalHandler()
    @     0x2baa63a30630 (unknown)
    @     0x2baa64381387 __GI_raise
    @     0x2baa64382a78 __GI_abort
    @          0x2af5006 _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
    @          0x7dfef76 __cxxabiv1::__terminate()
    @          0x7dfefe1 std::terminate()
    @          0x7dff134 __cxa_throw
    @          0x2af6ce7 std::__throw_length_error()
    @          0x507c233 starrocks::vectorized::regexp_replace_use_hyperscan()
    @          0x50855d6 starrocks::vectorized::StringFunctions::regexp_replace()
    @          0x3e843c7 starrocks::vectorized::VectorizedFunctionCallExpr::evaluate()
    @          0x386b7c7 starrocks::ExprContext::evaluate()
    @          0x3e83f9c starrocks::vectorized::VectorizedFunctionCallExpr::evaluate()
    @          0x386b7c7 starrocks::ExprContext::evaluate()
    @          0x3e83f9c starrocks::vectorized::VectorizedFunctionCallExpr::evaluate()
    @          0x3e4e60f starrocks::vectorized::VectorizedIfExpr<>::evaluate()
    @          0x3e4e646 starrocks::vectorized::VectorizedIfExpr<>::evaluate()
    @          0x3e4e646 starrocks::vectorized::VectorizedIfExpr<>::evaluate()
    @          0x386b85e starrocks::ExprContext::evaluate()
    @          0x2ebde44 starrocks::vectorized::ProjectNode::get_next()
    @          0x4891463 starrocks::PlanFragmentExecutor::_get_next_internal_vectorized()
    @          0x4891850 starrocks::PlanFragmentExecutor::_open_internal_vectorized()
    @          0x48939ed starrocks::PlanFragmentExecutor::open()
    @          0x47e4f4b starrocks::FragmentExecState::execute()
    @          0x47eb1d3 starrocks::FragmentMgr::exec_actual()
    @          0x49888b2 starrocks::ThreadPool::dispatch_thread()
    @          0x49833aa starrocks::Thread::supervise_thread()
    @     0x2baa63a28ea5 start_thread
    @     0x2baa6444996d __clone
    @                0x0 (unknown)
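Each crash header in this post carries a Unix timestamp, which is handy for lining the crash up with FE logs and query logs. A small sketch for hosts without GNU date, assuming the glog header format shown here; `crash_time_utc` is a hypothetical helper name.

```python
import re
from datetime import datetime, timezone

# The glog failure handler prints "*** Aborted at <epoch> (unix time) ..." first.
ABORT_LINE = re.compile(r"\*\*\* Aborted at (\d+) \(unix time\)")

def crash_time_utc(trace):
    """Return the crash time from a glog failure header as an aware UTC datetime, or None."""
    m = ABORT_LINE.search(trace)
    if m is None:
        return None
    return datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)

header = (
    '*** Aborted at 1688752969 (unix time) try "date -d @1688752969" '
    "if you are using GNU date ***"
)
print(crash_time_utc(header).isoformat())
```

The result is in UTC; convert it to the server's time zone before comparing it with FE log timestamps.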
  1. Thrift RPC allocates a large amount of memory

W0726 14:07:03.697324 16174 mem_hook.cpp:247] large memory alloc: 1347571780 bytes, stack:
    @          0x31a4d83  malloc
    @          0x8191535  operator new()
    @          0x27b6d7e  std::__cxx11::basic_string<>::_M_mutate()
    @          0x30af6b7  apache::thrift::protocol::TBinaryProtocolT<>::readStringBody<>()
    @          0x30af84c  apache::thrift::protocol::TVirtualProtocol<>::readMessageBegin_virt()
    @          0x3318599  apache::thrift::TDispatchProcessor::process()
    @          0x5f0a058  apache::thrift::server::TConnectedClient::run()
    @          0x5f02554  apache::thrift::server::TThreadedServer::TConnectedClientRunner::run()
    @          0x5f04d5d  apache::thrift::concurrency::Thread::threadMain()
    @          0x5eea4c6  std::thread::_State_impl<>::_M_run()
    @          0x820a430  execute_native_thread_routine
    @     0x7f77e8ebeea5  start_thread
    @     0x7f77e84d9b0d  __clone
    @              (nil)  (unknown)
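In this trace the allocation happens in readStringBody, called from readMessageBegin, so the requested size comes from a length field read off the incoming connection. Looking at that number as raw bytes can hint at what the peer actually sent. The sketch below is a triage idea only; the post does not state a root cause, and the one-byte adjustment is an assumption about how the string buffer is sized.

```python
import re

LARGE_ALLOC = re.compile(r"large memory alloc: (\d+) bytes")

def large_alloc_bytes(log_text):
    """Return the sizes (in bytes) of all 'large memory alloc' warnings in log_text."""
    return [int(n) for n in LARGE_ALLOC.findall(log_text)]

def as_wire_bytes(n):
    """Render n as the 4 big-endian bytes a Thrift binary i32 would occupy."""
    return n.to_bytes(4, "big")

warning = (
    "W0726 14:07:03.697324 16174 mem_hook.cpp:247] "
    "large memory alloc: 1347571780 bytes, stack:"
)
size = large_alloc_bytes(warning)[0]
print(round(size / 2**30, 2), "GiB")
# Assumption: the string allocation is one byte larger than the length on the wire.
print(as_wire_bytes(size - 1))
```

Here the four bytes are printable ASCII (`PRPC`), which would be consistent with a non-Thrift client writing to the Thrift port, though that is an inference and not something the post confirms.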
  • Github Issue:
  • Github Fix PR:
  • Jira
  • Affected versions:
    • 2.2.0 ~ latest
    • 2.3.0 ~ 2.3.14
    • 2.4.0 ~ latest
    • 2.5.0 ~ 2.5.9
    • 3.0.0 ~ 3.0.4
  • Fixed versions:
    • 2.2: not fixed
    • 2.3.15+
    • 2.4: not fixed
    • 2.5.10+
    • 3.0.5+
  • Temporary workaround:
  • Root cause:
  1. Duplicate RPC causes use-after-free

*** Aborted at 1689305620 (unix time) try "date -d @1689305620" if you are using GNU date ***
PC: @     0x7f548e0ace1f (unknown)
*** SIGABRT (@0x1300e) received by PID 77838 (TID 0x7f53e3e026c0) from PID 77838; stack trace: ***
    @          0x6240182 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f548e060fc0 (unknown)
    @     0x7f548e0ace1f (unknown)
    @     0x7f548e060f16 gsignal
    @     0x7f548e04c47f abort
    @          0x2e62ca8 _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
    @          0x89c1436 __cxxabiv1::__terminate()
    @          0x89c14a1 std::terminate()
    @          0x89c1b9f __cxa_pure_virtual
    @          0x56ff27b starrocks::pipeline::PipelineDriverPoller::run_internal()
    @          0x5065b1a starrocks::Thread::supervise_thread()
    @     0x7f548e0ab32a (unknown)
    @     0x7f548e129a60 (unknown)
    @                0x0 (unknown)
  • Github Issue:
  • Github Fix PR:
  • Jira
  • Affected versions:
    • 2.3.0 ~ latest
    • 2.4.0 ~ latest
    • 2.5.0 ~ 2.5.9
    • 3.0.0 ~ 3.0.4
  • Fixed versions:
    • 2.3: not fixed
    • 2.4: not fixed
    • 2.5.10+
    • 3.0.5+
  • Temporary workaround:
  • Root cause:
  1. FE metadata directory bloat

  • Github Issue:
  • Github Fix PR:
  • Jira
  • Affected versions:
    • 2.0.4 ~ 2.0.7
    • 2.1.5 ~ 2.1.10
    • 2.2.0 ~ 2.2.2
  • Fixed versions:
    • 2.0.8+
    • 2.1.11+
    • 2.2.3+
  • Temporary workaround:
    • Replace starrocks-bdb-je-7.3.8.jar in the fe/lib directory with http://starrocks-public.oss-cn-zhangjiakou.aliyuncs.com/je-7.3.7.jar, then restart the FE
  • Root cause:
  1. Primary Key table compaction crash

query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
*** Aborted at 1667573352 (unix time) try "date -d @1667573352" if you are using GNU date ***
PC: @          0x1e72eb0 starrocks::TabletUpdates::_apply_compaction_commit()
*** SIGSEGV (@0x0) received by PID 40683 (TID 0x7efd5f069700) from PID 0; stack trace: ***
    @          0x4820332 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7efddbe1e630 (unknown)
    @          0x1e72eb0 starrocks::TabletUpdates::_apply_compaction_commit()
    @          0x1e7425d starrocks::TabletUpdates::do_apply()
    @          0x2681635 starrocks::ThreadPool::dispatch_thread()
    @          0x267ca6a starrocks::Thread::supervise_thread()
    @     0x7efddbe16ea5 start_thread
    @     0x7efddb431b0d __clone
    @                0x0 (unknown)
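Most entries in this list are identified by the first StarRocks function in the stack, so extracting that frame gives a quick signature to search for. A minimal sketch, assuming the glog frame layout used in these traces; `first_starrocks_frame` is a hypothetical helper, and mangled symbols (those starting with `_Z`) are not handled.

```python
import re

# A glog stack frame looks like:
#   "    @          0x1e72eb0 starrocks::TabletUpdates::do_apply()"
FRAME = re.compile(r"^\s*@\s+(?:0x[0-9a-f]+|\(nil\))\s+(?P<symbol>.+?)\s*$")

def first_starrocks_frame(trace):
    """Return the first stack frame symbol in the starrocks namespace, or None."""
    for line in trace.splitlines():
        m = FRAME.match(line)
        if m and m["symbol"].startswith("starrocks::"):
            return m["symbol"]
    return None

trace = """\
    @          0x4820332 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7efddbe1e630 (unknown)
    @          0x1e72eb0 starrocks::TabletUpdates::_apply_compaction_commit()
    @          0x1e7425d starrocks::TabletUpdates::do_apply()
"""
print(first_starrocks_frame(trace))
```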
  1. FE follower memory leak

INSERT, INSERT INTO SELECT, or materialized view refresh causes a memory leak on FE followers.

grep LoadLabelCleaner fe.log*

If there is no log output, or the most recent output is very old, the issue has already been triggered.
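The grep check can be automated by finding the newest LoadLabelCleaner line and measuring its age. This is a sketch under assumptions: FE log lines start with a timestamp like `2023-07-23 06:49:00,883` (as in the FE logs quoted in this post), and "very old" is taken to mean more than 24 hours, a threshold the post does not specify. The sample lines are hypothetical.

```python
import re
from datetime import datetime, timedelta

# FE log lines start with a timestamp such as "2023-07-23 06:49:00,883".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}")

def last_cleaner_run(lines):
    """Return the timestamp of the newest line mentioning LoadLabelCleaner, or None."""
    latest = None
    for line in lines:
        if "LoadLabelCleaner" not in line:
            continue
        m = TS.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            if latest is None or ts > latest:
                latest = ts
    return latest

def cleaner_looks_stuck(lines, now, max_age=timedelta(hours=24)):
    """True if LoadLabelCleaner never logged, or last logged more than max_age before now."""
    latest = last_cleaner_run(lines)
    return latest is None or now - latest > max_age

# Hypothetical log lines, for illustration only.
sample = [
    "2023-07-20 01:00:00,000 INFO (LoadLabelCleaner|1) [hypothetical] cleaned labels",
    "2023-07-23 06:49:00,883 WARN (thrift-server-accept|85) unrelated line",
]
print(cleaner_looks_stuck(sample, now=datetime(2023, 7, 23, 7, 0, 0)))
```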

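  1. After a schema change on a Primary Key table, the tablet state is not persisted; after a restart, compaction is no longer triggered, which leads to "Too many versions"

After running show tablet and fetching the tablet meta with curl, the tablet is found to stay in the NOT_READY state: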

"tablet_state": "PB_NOTREADY",

From 2.5 onward, the following SQL shows which tablets are affected:

select be_id, state, count(*) from information_schema.be_tablets group by be_id, state;
73474502        NOTREADY        130
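When the query returns many rows, the NOTREADY counts can be tallied per BE to decide where to run the repair script first. A minimal sketch that works on the plain-text result as pasted in this post; `not_ready_by_be` is a hypothetical helper and the second sample row is invented for illustration.

```python
def not_ready_by_be(result_text):
    """Map be_id -> tablet count for rows whose state is NOTREADY."""
    counts = {}
    for row in result_text.splitlines():
        fields = row.split()
        if len(fields) != 3:
            continue  # skip headers, blank lines, or anything unexpected
        be_id, state, count = fields
        if state == "NOTREADY" and count.isdigit():
            counts[be_id] = counts.get(be_id, 0) + int(count)
    return counts

# The first row is from the post; the second is a hypothetical healthy row.
output = """\
73474502        NOTREADY        130
73474503        RUNNING         4096
"""
print(not_ready_by_be(output))
```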
admin execute on 10004 '
for (info in StorageEngine.get_tablet_infos(xxx, yyy)) {
    if (info.state == 0) {
        var t = StorageEngine.get_tablet(info.tablet_id)
        if (t != null) {
            t.set_tablet_state_as_int(0)
            t.save_meta()
            System.print("fix table %(info.table_id) tablet %(info.tablet_id)")
        }
    }
}
';

xxx: tablet_id

yyy: partition_id

  1. Aggregation convert_hash_set_to_chunk crash

query_id:8b0c470a-245d-11ee-873d-00163e0782a2, fragment_instance:8b0c470a-245d-11ee-873d-00163e0782b7
*** Aborted at 1689569467 (unix time) try "date -d @1689569467" if you are using GNU date ***
PC: @ 0x25f3304 starrocks::vectorized::NullableColumn::deserialize_and_append_batch()
*** SIGSEGV (@0x0) received by PID 3038 (TID 0x7f73c94e9700) from PID 0; stack trace: ***
 @ 0x3f91c22 google::(anonymous namespace)::FailureSignalHandler()
 @ 0x7f744a943235 os::Linux::chained_handler()
 @ 0x7f744a948031 JVM_handle_linux_signal
 @ 0x7f744a93b0c8 signalHandler()
 @ 0x7f7449df2630 (unknown)
 @ 0x25f3304 starrocks::vectorized::NullableColumn::deserialize_and_append_batch()
 @ 0x26d00e3 starrocks::Aggregator::convert_hash_set_to_chunk<>()
 @ 0x2a0e3b3 starrocks::pipeline::AggregateDistinctBlockingSourceOperator::pull_chunk()
 @ 0x29e0633 starrocks::pipeline::PipelineDriver::process()
 @ 0x29d6cde starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
 @ 0x2236199 starrocks::ThreadPool::dispatch_thread()
 @ 0x2231d4a starrocks::Thread::supervise_thread()
 @ 0x7f7449deaea5 start_thread
 @ 0x7f7449405b0d __clone
 @ 0x0 (unknown)
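  1. Java UDTF memory leak

  1. After loading data into a Primary Key table, select count(*) from xxx; returns fluctuating results

Writes to the Primary Key table leave the replicas with inconsistent data.

  1. Crash while cleaning up expired rowsets on a Primary Key table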

*** Aborted at 1655286911 (unix time) try "date -d @1655286911" if you are using GNU date ***
PC: @          0x1a6f2cc starrocks::TabletUpdates::_debug_version_info()
*** SIGSEGV (@0x0) received by PID 5538 (TID 0x7ff9db8a8700) from PID 0; stack trace: ***
    @          0x3f6fad2 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7ffa892a0630 (unknown)
    @          0x1a6f2cc starrocks::TabletUpdates::_debug_version_info()
    @          0x1a79d23 starrocks::TabletUpdates::remove_expired_versions()
    @          0x1a43950 starrocks::TabletManager::start_trash_sweep()
    @          0x1a15267 starrocks::StorageEngine::_start_trash_sweep()
    @          0x1bf4e79 starrocks::StorageEngine::_garbage_sweeper_thread_callback()
    @          0x59ed4d0 execute_native_thread_routine
    @     0x7ffa89298ea5 start_thread
    @     0x7ffa888b38dd __clone
    @                0x0 (unknown)
  1. After "tablet updates is in error state", clone on the Primary Key table keeps failing and consumes a large amount of disk space

tablet updates is in error state
  • Github Issue:
  • Github Fix PR:
  • Affected versions:
    • 2.2.0 ~ latest
    • 2.3.0 ~ latest
    • 2.4.0 ~ latest
    • 2.5.0 ~ 2.5.10
    • 3.0.0 ~ 3.0.4
  • Fixed versions:
    • 2.2: not fixed
    • 2.3: not fixed
    • 2.4: not fixed
    • 2.5.11+
    • 3.0.5+
  • Temporary fix:
    • Use meta_tool.sh to delete the problematic tablet replicas
  1. FE fails to start: failed to load journal type 118

2023-08-16 09:11:47,262 WARN (leaderCheckpointer|130) [GlobalStateMgr.replayJournalInner():2012] catch exception when replaying 9748222,
com.starrocks.journal.JournalInconsistentException: failed to load journal type 118
        at com.starrocks.persist.EditLog.loadJournal(EditLog.java:981) ~[starrocks-fe.jar:?]
        at com.starrocks.server.GlobalStateMgr.replayJournalInner(GlobalStateMgr.java:2001) [starrocks-fe.jar:?]
        at com.starrocks.server.GlobalStateMgr.replayJournal(GlobalStateMgr.java:1953) [starrocks-fe.jar:?]
        at com.starrocks.leader.Checkpoint.replayAndGenerateGlobalStateMgrImage(Checkpoint.java:215) [starrocks-fe.jar:?]
        at com.starrocks.leader.Checkpoint.runAfterCatalogReady(Checkpoint.java:106) [starrocks-fe.jar:?]
        at com.starrocks.common.util.LeaderDaemon.runOneCycle(LeaderDaemon.java:73) [starrocks-fe.jar:?]
        at com.starrocks.common.util.Daemon.run(Daemon.java:115) [starrocks-fe.jar:?]
Caused by: java.lang.NullPointerException
        at com.starrocks.lake.StarOSAgent.getServiceId(StarOSAgent.java:101) ~[starrocks-fe.jar:?]
        at com.starrocks.lake.StarOSAgent.prepare(StarOSAgent.java:94) ~[starrocks-fe.jar:?]
        at com.starrocks.lake.StarOSAgent.getShardReplicas(StarOSAgent.java:393) ~[starrocks-fe.jar:?]
        at com.starrocks.lake.StarOSAgent.getBackendIdsByShard(StarOSAgent.java:444) ~[starrocks-fe.jar:?]
        at com.starrocks.lake.LakeTablet.getBackendIds(LakeTablet.java:88) ~[starrocks-fe.jar:?]
        at com.starrocks.server.LocalMetastore.truncateTableInternal(LocalMetastore.java:4833) ~[starrocks-fe.jar:?]
        at com.starrocks.server.LocalMetastore.replayTruncateTable(LocalMetastore.java:4862) ~[starrocks-fe.jar:?]
        at com.starrocks.server.GlobalStateMgr.replayTruncateTable(GlobalStateMgr.java:3520) ~[starrocks-fe.jar:?]
        at com.starrocks.persist.EditLog.loadJournal(EditLog.java:574) ~[starrocks-fe.jar:?]
        ... 6 more
  1. SinkBuffer::_try_to_send_rpc crash

PC: @          0x3fbffa2 starrocks::pipeline::SinkBuffer::_try_to_send_rpc()
*** SIGSEGV (@0x0) received by PID 769 (TID 0x7f489efff700) from PID 0; stack trace: ***
    @          0x487c722 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f49c5619630 (unknown)
    @          0x3fbffa2 starrocks::pipeline::SinkBuffer::_try_to_send_rpc()
    @          0x3fc0742 starrocks::pipeline::SinkBuffer::add_request()
    @          0x3fb4357 starrocks::pipeline::ExchangeSinkOperator::Channel::send_one_chunk()
    @          0x3fb4cb4 starrocks::pipeline::ExchangeSinkOperator::Channel::_close_internal()
    @          0x3fb4d46 starrocks::pipeline::ExchangeSinkOperator::Channel::close()
    @          0x3fb5137 starrocks::pipeline::ExchangeSinkOperator::set_finishing()
    @          0x1e5c1a7 starrocks::pipeline::PipelineDriver::_mark_operator_finishing()
    @          0x1e5cdc5 starrocks::pipeline::PipelineDriver::process()
    @          0x3ce8805 starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x37adbf5 starrocks::ThreadPool::dispatch_thread()
    @          0x37a8a9a starrocks::Thread::supervise_thread()
    @     0x7f49c5611ea5 start_thread
    @     0x7f49c4c2cb0d __clone
    @                0x0 (unknown)
  1. Materialized view refresh causes FE startup failure

2023-08-21 21:48:10,983 ERROR (UNKNOWN 10.8.1.81_9010_1678173058506(-1)|1) [StarRocksFE.start():170] StarRocksFE start failed
java.lang.IllegalArgumentException: capacity < 0: (-2038667263 < 0)
        at java.nio.Buffer.createCapacityException(Buffer.java:256) ~[?:?]
        at java.nio.CharBuffer.allocate(CharBuffer.java:347) ~[?:?]
        at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:807) ~[?:?]
        at com.starrocks.common.io.Text.decode(Text.java:342) ~[starrocks-fe.jar:?]
        at com.starrocks.common.io.Text.decode(Text.java:321) ~[starrocks-fe.jar:?]
        at com.starrocks.common.io.Text.readString(Text.java:396) ~[starrocks-fe.jar:?]
        at com.starrocks.scheduler.TaskManager.loadTasks(TaskManager.java:518) ~[starrocks-fe.jar:?]
        at com.starrocks.server.GlobalStateMgr.loadImage(GlobalStateMgr.java:1331) ~[starrocks-fe.jar:?]
        at com.starrocks.server.GlobalStateMgr.initialize(GlobalStateMgr.java:955) ~[starrocks-fe.jar:?]
        at com.starrocks.StarRocksFE.start(StarRocksFE.java:116) ~[starrocks-fe.jar:?]
        at com.starrocks.StarRocksFE.main(StarRocksFE.java:68) ~[starrocks-fe.jar:?]
  1. Broker Load Crash

Using expression functions in Broker Load causes a crash.

*** Aborted at 1692242665 (unix time) try "date -d @1692242665" if you are using GNU date ***
PC: @     0x7f5c1f10158e __memcpy_ssse3_back
*** SIGSEGV (@0x7f3ab324a000) received by PID 841389 (TID 0x7f46650f7700) from PID 18446744072420106240; stack trace: ***
    @          0x576f1a2 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f5c205bee4b os::Linux::chained_handler()
    @     0x7f5c205c3a3d JVM_handle_linux_signal
    @     0x7f5c205b67e8 signalHandler()
    @     0x7f5c1fa96630 (unknown)
    @     0x7f5c1f10158e __memcpy_ssse3_back
    @          0x2c2c1e2 std::vector<>::_M_range_insert<>()
    @          0x2c2fca4 starrocks::vectorized::BinaryColumnBase<>::append()
    @          0x4d13272 starrocks::vectorized::Chunk::append()
    @          0x4036a69 starrocks::ChunkPipelineAccumulator::push()
    @          0x2f132c1 starrocks::pipeline::ConnectorChunkSource::_read_chunk()
    @          0x2efd23c starrocks::pipeline::ChunkSource::buffer_next_batch_chunks_blocking()
    @          0x2c7c274 _ZZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS_12RuntimeStateEiENKUlvE_clEv
    @          0x2c8d2dd starrocks::workgroup::ScanExecutor::worker_thread()
    @          0x4817fdd starrocks::ThreadPool::dispatch_thread()
    @          0x4812d6a starrocks::Thread::supervise_thread()
    @     0x7f5c1fa8eea5 start_thread
    @     0x7f5c1f0a9b0d __clone
    @                0x0 (unknown)
*** Aborted at 1693273861 (unix time) try "date -d @1693273861" if you are using GNU date ***
PC: @          0x489c426 starrocks::BitmapValue::BitmapValue()
*** SIGSEGV (@0x0) received by PID 45 (TID 0x7f0903bec700) from PID 0; stack trace: ***
    @          0x593cd22 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f0a32a407fb os::Linux::chained_handler()
    @     0x7f0a32a454bd JVM_handle_linux_signal
    @     0x7f0a32a37e78 signalHandler()
    @     0x7f0a31ec8630 (unknown)
    @          0x489c426 starrocks::BitmapValue::BitmapValue()
    @          0x4e9b627 std::vector<>::_M_realloc_insert<>()
    @          0x4e9b732 starrocks::vectorized::ObjectColumn<>::append()
    @          0x4e9b8b5 starrocks::vectorized::ObjectColumn<>::append_selective()
    @          0x4e7416a starrocks::vectorized::Chunk::append_selective()
    @          0x4f7c224 starrocks::pipeline::ExchangeSinkOperator::Channel::add_rows_selective()
    @          0x4f7dac7 starrocks::pipeline::ExchangeSinkOperator::push_chunk()
    @          0x2d0fe86 starrocks::pipeline::PipelineDriver::process()
    @          0x4f6fc13 starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x49656a2 starrocks::ThreadPool::dispatch_thread()
    @          0x496019a starrocks::Thread::supervise_thread()
    @     0x7f0a31ec0ea5 start_thread
    @     0x7f0a314dbb0d __clone
    @                0x0 (unknown)
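  1. Conditions such as NOT LIKE / LIKE split the range into too many fragments, producing incorrect query results

  1. FE reports a duplicate key error on startup (caused by materialized views)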

com.google.gson.JsonSyntaxException: duplicate key: 7417179
        at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:190) ~[spark-dpp-1.0.0.jar:?]
        at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:145) ~[spark-dpp-1.0.0.jar:?]
        at com.starrocks.persist.gson.GsonUtils$ProcessHookTypeAdapterFactory$1.read(GsonUtils.java:641) ~[starrocks-fe.jar:?]
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:131) ~[spark-dpp-1.0.0.jar:?]
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:222) ~[spark-dpp-1.0.0.jar:?]
        at com.starrocks.persist.gson.GsonUtils$ProcessHookTypeAdapterFactory$1.read(GsonUtils.java:641) ~[starrocks-fe.jar:?]
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:131) ~[spark-dpp-1.0.0.jar:?]
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:222) ~[spark-dpp-1.0.0.jar:?]
        at com.starrocks.persist.gson.GsonUtils$ProcessHookTypeAdapterFactory$1.read(GsonUtils.java:641) ~[starrocks-fe.jar:?]
        at com.google.gson.Gson.fromJson(Gson.java:963) ~[spark-dpp-1.0.0.jar:?]
        at com.google.gson.Gson.fromJson(Gson.java:928) ~[spark-dpp-1.0.0.jar:?]
        at com.google.gson.Gson.fromJson(Gson.java:877) ~[spark-dpp-1.0.0.jar:?]
        at com.google.gson.Gson.fromJson(Gson.java:848) ~[spark-dpp-1.0.0.jar:?]
        at com.starrocks.persist.ChangeMaterializedViewRefreshSchemeLog.read(ChangeMaterializedViewRefreshSchemeLog.java:71) ~[starrocks-fe.jar:?]
        at com.starrocks.journal.JournalEntity.readFields(JournalEntity.java:358) ~[starrocks-fe.jar:?]
        at com.starrocks.journal.bdbje.BDBJournalCursor.deserializeData(BDBJournalCursor.java:251) ~[starrocks-fe.jar:?]
        at com.starrocks.journal.bdbje.BDBJournalCursor.next(BDBJournalCursor.java:295) ~[starrocks-fe.jar:?]
        at com.starrocks.server.GlobalStateMgr.replayJournalInner(GlobalStateMgr.java:2137) ~[starrocks-fe.jar:?]
        at com.starrocks.server.GlobalStateMgr.replayJournal(GlobalStateMgr.java:2097) ~[starrocks-fe.jar:?]
        at com.starrocks.server.GlobalStateMgr.transferToLeader(GlobalStateMgr.java:1142) ~[starrocks-fe.jar:?]
        at com.starrocks.server.GlobalStateMgr.access$100(GlobalStateMgr.java:324) ~[starrocks-fe.jar:?]
        at com.starrocks.server.GlobalStateMgr$1.transferToLeader(GlobalStateMgr.java:721) ~[starrocks-fe.jar:?]
        at com.starrocks.ha.StateChangeExecutor.runOneCycle(StateChangeExecutor.java:103) ~[starrocks-fe.jar:?]
        at com.starrocks.common.util.Daemon.run(Daemon.java:115) ~[starrocks-fe.jar:?]
  1. BE memory surges during a gradual (canary) upgrade

W0821 21:18:12.877478 64468 mem_hook.cpp:254] large memory alloc: 1988991648 bytes, stack:
    @          0x48a322b  malloc
    @          0x7df6b05  operator new()
    @          0x4822dd6  starrocks::QueryStatistics::merge_pb()
    @          0x4822f6f  starrocks::QueryStatisticsRecvr::insert()
    @          0x479ca34  starrocks::DataStreamMgr::transmit_chunk()
    @          0x52130d8  starrocks::PInternalServiceImplBase<>::transmit_chunk()
    @          0x5b8a2ad  brpc::policy::ProcessRpcRequest()
    @          0x5c6aa57  brpc::ProcessInputMessage()
    @          0x5ab4bef  bthread::TaskGroup::task_runner()
    @          0x5bf9151  bthread_make_fcontext
W0821 21:17:30.210508 62349 mem_hook.cpp:254] large memory alloc: 1091033952 bytes, stack:
    @          0x48a322b  malloc
    @          0x7df6b05  operator new()
    @          0x48229f6  starrocks::QueryStatistics::merge()
    @          0x4822ad2  starrocks::QueryStatisticsRecvr::aggregate()
    @          0x2d016b4  starrocks::pipeline::QueryContext::intermediate_query_statistic()
    @          0x47d124d  starrocks::RuntimeState::intermediate_query_statistic()
    @          0x4f9d87b  starrocks::pipeline::ExchangeSinkOperator::Channel::send_one_chunk()
    @          0x4f9e397  starrocks::pipeline::ExchangeSinkOperator::Channel::_close_internal()
    @          0x4f9e46c  starrocks::pipeline::ExchangeSinkOperator::Channel::close()
    @          0x4f9e849  starrocks::pipeline::ExchangeSinkOperator::set_finishing()
    @          0x2d1c9e9  starrocks::pipeline::PipelineDriver::_mark_operator_finishing()
    @          0x2d1da2f  starrocks::pipeline::PipelineDriver::process()
    @          0x4f91993  starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x4983a52  starrocks::ThreadPool::dispatch_thread()
    @          0x497e54a  starrocks::Thread::supervise_thread()
    @     0x7f36e1153ea5  start_thread
    @     0x7f36e076eb0d  __clone
    @              (nil)  (unknown)
  • Github Issue:
  • Github Fix PR:
  • Affected versions:
    • 2.3.0 ~ 2.3.16
    • 2.4.0 ~ 2.4.5
    • 2.5.10
    • 3.0.0 ~ 3.0.5
    • 3.1.0 ~ 3.1.1
  • Fixed versions:
    • 2.3.17+
    • 2.4.6+
    • 2.5.11+
    • 3.0.6+
    • 3.1.2+
  • Temporary fix:
    • Recovers once all BEs have been upgraded
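The version lists in this post can be checked programmatically, as long as versions are compared numerically and not as strings (as strings, "2.3.9" sorts after "2.3.16"). A minimal sketch using the ranges of the item above; the helper names are hypothetical, and an open-ended range such as "2.2.0 ~ latest" would need its own handling.

```python
def parse_version(text):
    """Turn '2.5.10' into (2, 5, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))

# Affected ranges for the BE memory-surge item, as inclusive (low, high) pairs.
AFFECTED = [
    ("2.3.0", "2.3.16"),
    ("2.4.0", "2.4.5"),
    ("2.5.10", "2.5.10"),
    ("3.0.0", "3.0.5"),
    ("3.1.0", "3.1.1"),
]

def is_affected(version, ranges=AFFECTED):
    """True if version falls inside any inclusive (low, high) range."""
    v = parse_version(version)
    return any(parse_version(lo) <= v <= parse_version(hi) for lo, hi in ranges)

print(is_affected("2.5.10"), is_affected("2.5.11"), is_affected("3.0.6"))
```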