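To help us locate your problem faster, please provide the following information. Thank you.
【Details】The BE started normally at first, then suddenly crashed. There are no obvious error logs, but be.out contains some abnormal log entries. Restarting the BE afterwards succeeds.
【Background】The cluster was scaled out and then scaled in. The machine that crashed is one that was recently added during the scale-out.
【Business impact】
【Shared-data (storage-compute separation)?】
【StarRocks version】2.5.13
【Cluster size】3 FE (1 follower + 2 observers) + 5 BE
【Machine info】CPU vCores / memory / NIC. FE: 8C/32G/10GbE, BE: 64C/128G/10GbE
【Contact】So that we can reach you promptly for log information while working on the problem, please add your contact details: 18069783113
【Attachments】
be.out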
*** Check failure stack trace: ***
2.5.13 RELEASE (build a3b58a0)
query_id:c4e60154-c59d-11ef-a938-fa163e82da79, fragment_instance:c4e60154-c59d-11ef-a938-fa163e82da8f
tracker:process consumption: 36604806696
tracker:query_pool consumption: 2102827318
tracker:load consumption: 286560
tracker:metadata consumption: 450611319
tracker:tablet_metadata consumption: 81043581
tracker:rowset_metadata consumption: 42589056
tracker:segment_metadata consumption: 56213748
tracker:column_metadata consumption: 270764934
tracker:tablet_schema consumption: 1809133
tracker:segment_zonemap consumption: 37111574
tracker:short_key_index consumption: 15485916
tracker:column_zonemap_index consumption: 96521454
tracker:ordinal_index consumption: 104766520
tracker:bitmap_index consumption: 2400
tracker:bloom_filter_index consumption: 118256
tracker:compaction consumption: 0
tracker:schema_change consumption: 0
tracker:column_pool consumption: 5254649054
tracker:page_cache consumption: 24031125312
tracker:update consumption: 540380
tracker:chunk_allocator consumption: 2146419080
tracker:clone consumption: 0
tracker:consistency consumption: 0
*** Check failure stack trace: ***
*** Aborted at 1735446687 (unix time) try "date -d @1735446687" if you are using GNU date ***
PC: @ 0x7f100b65b387 __GI_raise
*** SIGABRT (@0xd368) received by PID 54120 (TID 0x7f0f4c63d700) from PID 54120; stack trace: ***
@ 0x5b97b22 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f100c110630 (unknown)
@ 0x7f100b65b387 __GI_raise
@ 0x7f100b65ca78 __GI_abort
@ 0x2cf47be starrocks::failure_function()
@ 0x5b8b4fd google::LogMessage::Fail()
@ 0x5b8d96f google::LogMessage::SendToLog()
@ 0x5b8b04e google::LogMessage::Flush()
@ 0x5b8df79 google::LogMessageFatal::~LogMessageFatal()
@ 0x50bc2a2 _ZN9starrocks20type_dispatch_columnINS_10vectorized13ColumnBuilderEJNS_14TypeDescriptorEmEEEDaNS_13PrimitiveTypeET_DpT0_
@ 0x50b976b starrocks::vectorized::ColumnHelper::create_column()
@ 0x53fbc3e starrocks::serde::ProtobufChunkDeserializer::deserialize()
@ 0x4a90f72 starrocks::DataStreamRecvr::SenderQueue::_deserialize_chunk()
@ 0x4a93e5a starrocks::DataStreamRecvr::PipelineSenderQueue::get_chunk()
@ 0x4a09593 starrocks::DataStreamRecvr::get_chunk_for_pipeline()
@ 0x302793a starrocks::pipeline::ExchangeSourceOperator::pull_chunk()
@ 0x2d906c0 starrocks::pipeline::PipelineDriver::process()
@ 0x51add6a starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0x4b968f2 starrocks::ThreadPool::dispatch_thread()
@ 0x4b9138a starrocks::supervise_thread()
@ 0x7f100c108ea5 start_thread
@ 0x7f100b723b0d __clone
@ 0x0 (unknown)
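For anyone reading a be.out like the one above, here is a minimal Python sketch that turns the abort timestamp into a readable UTC time and lists the memory trackers in GiB, largest first. The helper names (`parse_trackers`, `abort_time_utc`, `to_gib`) are made up for illustration, the excerpt holds only a few of the tracker lines from the log, and the script makes no assumption about how the trackers nest, so it does not sum them.

```python
import re
from datetime import datetime, timezone

# A few lines copied from the be.out excerpt above (not the full log).
BE_OUT = """\
tracker:process consumption: 36604806696
tracker:query_pool consumption: 2102827318
tracker:column_pool consumption: 5254649054
tracker:page_cache consumption: 24031125312
tracker:chunk_allocator consumption: 2146419080
*** Aborted at 1735446687 (unix time) try "date -d @1735446687" if you are using GNU date ***
"""

# Matches lines such as "tracker:page_cache consumption: 24031125312".
TRACKER_RE = re.compile(r"^tracker:(\w+) consumption: (\d+)$", re.MULTILINE)
# Matches the glog abort banner and captures the unix timestamp.
ABORT_RE = re.compile(r"\*\*\* Aborted at (\d+) \(unix time\)")


def parse_trackers(text):
    """Return {tracker_name: bytes} for every tracker line in the text."""
    return {name: int(value) for name, value in TRACKER_RE.findall(text)}


def abort_time_utc(text):
    """Return the abort time as a timezone-aware UTC datetime, or None."""
    match = ABORT_RE.search(text)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


def to_gib(num_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    print("aborted at (UTC):", abort_time_utc(BE_OUT))
    trackers = parse_trackers(BE_OUT)
    # Largest consumers first.
    for name, used in sorted(trackers.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<16} {to_gib(used):8.2f} GiB")
```

To use it on a real log, replace `BE_OUT` with the contents of the file, for example `open("be.out").read()`.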
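This is a known bug that has already been fixed in the latest 2.5 release. We recommend upgrading to the latest 2.5 or 3.1 release. Temporary workaround:
set global disable_join_reorder=true; Also, with 3 FEs we recommend configuring all 3 as followers.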
With this set globally, what impact does it have on business usage? For example, are there kinds of SQL we should avoid?