BE node drops offline unexpectedly

Can this problem be solved with a configuration parameter in 3.2.16?

Which problem?

The cluster I operate has the same issue: when network traffic is heavy, some queries fail with: The server is overcrowded transmit chunk rpc failed
The SR version is 3.2.16.
Error log:
transmit chunk rpc failed [dest_instance_id=677fa1df-d62f-11f0-96fa-005056ab3771] [dest=xx.xx.xx.xx:8060] detail:brpc failed, error=The server is overcrowded, error_text=[E1011]The server is overcrowded @xx.xx.xx.xx:8060 [R1][E1011]The server is overcrowded @xx.xx.xx.xx:8060 [R2][E1011]The server is overcrowded @xx.xx.xx.xx:8060 [R3][E1011]The server is overcrowded @xx.xx.xx.xx:8060

Setting brpc_query_ignore_overcrowded=true in be.conf works around this problem.
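
For anyone applying this workaround, here is a minimal Python sketch for checking whether the option is actually set in a BE's be.conf. Only the option name comes from the reply above; the sample file content and the `read_be_conf` helper are hypothetical. Since be.conf is read when the BE starts, the BE presumably needs a restart before the change applies (the thread does not say).

```python
def read_be_conf(text: str) -> dict[str, str]:
    """Parse `key = value` lines from be.conf-style text, skipping comments."""
    conf: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


# Hypothetical excerpt; in practice pass the contents of the real be.conf.
SAMPLE_BE_CONF = """
# hypothetical excerpt of be.conf
brpc_port = 8060
brpc_query_ignore_overcrowded = true
"""

conf = read_be_conf(SAMPLE_BE_CONF)
enabled = conf.get("brpc_query_ignore_overcrowded", "").lower() == "true"
print("brpc_query_ignore_overcrowded enabled:", enabled)
```

Running this against each BE's file is a quick way to confirm the setting is consistent across nodes.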

Experts, I need urgent help. I just upgraded directly to 3.5.12, but a BE node crashed again at noon. The contents of be.out:
3.5.12 RELEASE (build a2e4b58 distro centos arch x86_64)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000, plan_node_id:-1
*** Aborted at 1772514117 (unix time) try "date -d @1772514117" if you are using GNU date ***
PC: @ 0x84bf7ac starrocks::RowsetWriter::build()
*** SIGSEGV (@0x100) received by PID 29083 (TID 0x153195dfd700) LWP(29251) from PID 256; stack trace: ***
@ 0x1531b0f4120b __pthread_once_slow
@ 0xbe2b194 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
@ 0x1531b0f4a630 (/usr/lib64/libpthread-2.17.so+0xf62f)
@ 0x84bf7ac starrocks::RowsetWriter::build()
@ 0x84c8ea4 starrocks::HorizontalRowsetWriter::build()
@ 0x7b489f6 starrocks::DeltaWriter::commit()
@ 0x7767cc3 starrocks::SegmentFlushTask::run()
@ 0x45d2a07 starrocks::ThreadPool::dispatch_thread()
@ 0x45c8ef0 starrocks::thread::supervise_thread(void*)
@ 0x1531b0f42ea5 start_thread
@ 0x1531aeb1fb0d __clone
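
To line the crash up with the be.warning entries, the unix time in the "Aborted at" line can be converted to wall-clock time. A minimal Python sketch follows; the UTC+8 offset is an assumption about the BE host's time zone and is not stated in the thread.

```python
from datetime import datetime, timedelta, timezone

# Taken from the "*** Aborted at ... (unix time)" line in be.out.
ABORT_TS = 1772514117

utc_time = datetime.fromtimestamp(ABORT_TS, tz=timezone.utc)
# Assumption: the BE host logs in UTC+8; adjust the offset for other time zones.
local_time = utc_time.astimezone(timezone(timedelta(hours=8)))

print(utc_time.isoformat())    # 2026-03-03T05:01:57+00:00
print(local_time.isoformat())  # 2026-03-03T13:01:57+08:00
```

Under that assumption the abort falls at 13:01:57, the same second as the last "Connection reset by peer" entries in be.warning.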

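Also, since the upgrade to 3.5.12, most of the existing data synchronization tasks have started lagging; monitoring shows the throughput is far too low.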


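Before the upgrade, the data synchronization tasks all ran normally with no lag. The tasks extract the full MySQL database in real time and load it into SR via Stream Load.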

Log entries in be.warning leading up to the crash:
W20260303 12:44:57.612310 23297845151488 input_messenger.cpp:377] Fail to read from Socket{id=515396076439 fd=3194 addr=x.x.x.151:54004:8060} (0x15319f477040): Connection reset by peer [104]
W20260303 12:44:57.612329 23297899783936 input_messenger.cpp:377] Fail to read from Socket{id=592705496742 fd=1057 addr=x.x.x.151:53992:8060} (0x152e4307f700): Connection reset by peer [104]
W20260303 12:44:57.716589 23302712055552 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=515396076439 fd=3194 addr=x.x.x.151:54004:8060} (0x15319f477040): Connection reset by peer [104]
W20260303 12:44:57.716603 23302705755904 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=592705496742 fd=1057 addr=x.x.x.151:53992:8060} (0x152e4307f700): Connection reset by peer [104]
E20260303 12:44:57.716659 23302712055552 segment_flush_executor.cpp:111] failed to flush segment, txn_id: 101676621, tablet id: 85841324, status: Internal error: Fail to close delta writer. tablet_id: 85841324, state: kCommitted
W20260303 12:45:22.733132 23299457353472 mem_hook.cpp:117] large memory alloc, query_id:c32cbb02-16bb-11f1-a9cb-a0369fd8ced8 instance: c32cbb02-16bb-11f1-a9cb-a0369fd8cfc9 acquire:1713746268 bytes, is_bad_alloc_caught: 1, stack:
@ 0x45a6f71 starrocks::get_stack_trace[abi:cxx11]()
@ 0x44fa690 malloc
@ 0x43d42eb starrocks::AllocatorFactory<starrocks::Allocator, starrocks::MemHookAllocator>::checked_alloc(unsigned long)
@ 0x4ddd156 void std::vector<unsigned char, starrocks::raw::RawAllocator<unsigned char, 16ul, starrocks::ColumnAllocator > >::_M_range_insert<unsigned char const*>(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, starrocks::raw::R@
@ 0x4df13f0 starrocks::BinaryColumnBase::append(starrocks::Column const&, unsigned long, unsigned long)
@ 0x500b505 starrocks::JoinHashTable::append_chunk(std::shared_ptr<starrocks::Chunk> const&, std::vector<starrocks::Cow<starrocks::Column>::ImmutPtr<starrocks::Column>, std::allocator<starrocks::Cow<starrocks::Column>::ImmutPtr<starrocks::Column> > > const&)
@ 0x57bdcd2 starrocks::SingleHashJoinBuilder::do_append_chunk(std::shared_ptr<starrocks::Chunk> const&)
@ 0x57be00c starrocks::AdaptivePartitionHashJoinBuilder::do_append_chunk(std::shared_ptr<starrocks::Chunk> const&)
@ 0x57a9b8c starrocks::HashJoiner::append_chunk_to_ht(std::shared_ptr<starrocks::Chunk> const&)
@ 0x5448f5b starrocks::pipeline::HashJoinBuildOperator::push_chunk(starrocks::RuntimeState*, std::shared_ptr<starrocks::Chunk> const&)
@ 0x53e99ba starrocks::pipeline::PipelineDriver::process(starrocks::RuntimeState*, int)
@ 0x58839ed starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0x45d2117 starrocks::ThreadPool::dispatch_thread()
@ 0x45c8ef0 starrocks::thread::supervise_thread(void*)
@ 0x1531b0f42ea5 start_thread
@ 0x1531aeb1fb0d __clone
W20260303 12:45:58.370167 23297864062720 input_messenger.cpp:377] Fail to read from Socket{id=352187324809 fd=6191 addr=x.x.x.217:49656:8060} (0x15241d10fd40): Connection reset by peer [104]
W20260303 12:45:58.377517 23299463657216 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=352187324809 fd=6191 addr=x.x.x.217:49656:8060} (0x15241d10fd40): Connection reset by peer [104]
W20260303 12:52:57.806128 23298057897728 input_messenger.cpp:377] Fail to read from Socket{id=317827590203 fd=3489 addr=x.x.x.151:5226:8060} (0x150c598cb100): Connection reset by peer [104]
W20260303 12:52:57.806298 23298047391488 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=317827590203 fd=3489 addr=x.x.x.151:5226:8060} (0x150c598cb100): Connection reset by peer [104]
W20260303 12:54:43.592517 23157617084160 agent_server.cpp:603] fail to make_snapshot. tablet_id:13966543 msg:Not found: get_rowsets_for_snapshot: no version to clone tablet:13966543 #version:147 [333777 333890@146 333890] #pending:0 request_version:333891,
W20260303 12:55:03.828380 23297868265216 input_messenger.cpp:377] Fail to read from Socket{id=300647736179 fd=4057 addr=x.x.x.151:8060:6970} (0x150971337f40): Connection timed out [110]
W20260303 12:55:03.828394 23297864062720 input_messenger.cpp:377] Fail to read from Socket{id=481036344752 fd=4343 addr=x.x.x.151:8060:7266} (0x15263ba09f80): Connection timed out [110]
W20260303 12:55:22.982182 23299478365952 mem_hook.cpp:117] large memory alloc, query_id:29037b06-16bd-11f1-a9cb-a0369fd8ced8 instance: 29037b06-16bd-11f1-a9cb-a0369fd8cfca acquire:1547213724 bytes, is_bad_alloc_caught: 1, stack:
@ 0x45a6f71 starrocks::get_stack_trace[abi:cxx11]()
@ 0x44fa690 malloc
@ 0x43d42eb starrocks::AllocatorFactory<starrocks::Allocator, starrocks::MemHookAllocator>::checked_alloc(unsigned long)
@ 0x4ddd156 void std::vector<unsigned char, starrocks::raw::RawAllocator<unsigned char, 16ul, starrocks::ColumnAllocator > >::_M_range_insert<unsigned char const*>(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, starrocks::raw::R@
@ 0x4df13f0 starrocks::BinaryColumnBase::append(starrocks::Column const&, unsigned long, unsigned long)
@ 0x500b505 starrocks::JoinHashTable::append_chunk(std::shared_ptr<starrocks::Chunk> const&, std::vector<starrocks::Cow<starrocks::Column>::ImmutPtr<starrocks::Column>, std::allocator<starrocks::Cow<starrocks::Column>::ImmutPtr<starrocks::Column> > > const&)
@ 0x57bdcd2 starrocks::SingleHashJoinBuilder::do_append_chunk(std::shared_ptr<starrocks::Chunk> const&)
@ 0x57be00c starrocks::AdaptivePartitionHashJoinBuilder::do_append_chunk(std::shared_ptr<starrocks::Chunk> const&)
@ 0x57a9b8c starrocks::HashJoiner::append_chunk_to_ht(std::shared_ptr<starrocks::Chunk> const&)
@ 0x5448f5b starrocks::pipeline::HashJoinBuildOperator::push_chunk(starrocks::RuntimeState*, std::shared_ptr<starrocks::Chunk> const&)
@ 0x53e99ba starrocks::pipeline::PipelineDriver::process(starrocks::RuntimeState*, int)
@ 0x58839ed starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0x45d2117 starrocks::ThreadPool::dispatch_thread()
@ 0x45c8ef0 starrocks::thread::supervise_thread(void*)
@ 0x1531b0f42ea5 start_thread
@ 0x1531aeb1fb0d __clone
W20260303 12:55:42.836359 23297914492672 input_messenger.cpp:377] Fail to read from Socket{id=352187337669 fd=3243 addr=x.x.x.217:8060:52708} (0x151a4b0b6040): Connection timed out [110]
W20260303 12:55:42.836359 23297855657728 input_messenger.cpp:377] Fail to read from Socket{id=214748392364 fd=4246 addr=x.x.x.217:8060:52902} (0x151debc38580): Connection timed out [110]
W20260303 12:55:42.844364 23297857758976 input_messenger.cpp:377] Fail to read from Socket{id=369367210696 fd=4551 addr=x.x.x.217:8060:52526} (0x15044713cb40): Connection timed out [110]
W20260303 12:55:42.844379 23297834645248 input_messenger.cpp:377] Fail to read from Socket{id=206158454411 fd=4092 addr=x.x.x.217:8060:52108} (0x151529e5fc00): Connection timed out [110]
W20260303 12:55:42.844391 23297880872704 input_messenger.cpp:377] Fail to read from Socket{id=240518193621 fd=4481 addr=x.x.x.217:8060:53070} (0x14ec54cb89c0): Connection timed out [110]
W20260303 12:55:42.844369 23298062100224 input_messenger.cpp:377] Fail to read from Socket{id=292057795694 fd=4322 addr=x.x.x.217:8060:52944} (0x151982a58c40): Connection timed out [110]
W20260303 12:55:42.852361 23297885075200 input_messenger.cpp:377] Fail to read from Socket{id=154618845618 fd=4283 addr=x.x.x.217:8060:52926} (0x151be3173cc0): Connection timed out [110]
W20260303 12:55:42.860354 23297887176448 input_messenger.cpp:377] Fail to read from Socket{id=472446407348 fd=4177 addr=x.x.x.217:8060:52190} (0x152d15d05ec0): Connection timed out [110]
W20260303 12:55:42.900395 23297901885184 input_messenger.cpp:377] Fail to read from Socket{id=266287991386 fd=3854 addr=x.x.x.217:8060:51848} (0x150d2bd87700): Connection timed out [110]
W20260303 12:55:42.908430 23297897682688 input_messenger.cpp:377] Fail to read from Socket{id=584115559175 fd=4268 addr=x.x.x.217:8060:52914} (0x15297aff1700): Connection timed out [110]
W20260303 12:55:42.909350 23298066302720 input_messenger.cpp:377] Fail to read from Socket{id=309237671034 fd=4448 addr=x.x.x.217:8060:53038} (0x151077cc1280): Connection timed out [110]
W20260303 12:55:42.909368 23297906087680 input_messenger.cpp:377] Fail to read from Socket{id=206158457037 fd=4125 addr=x.x.x.217:8060:52822} (0x151427e55200): Connection timed out [110]
W20260303 12:55:42.916371 23297899783936 input_messenger.cpp:377] Fail to read from Socket{id=575525631668 fd=3630 addr=x.x.x.217:8060:51620} (0x15193ab61ec0): Connection timed out [110]
W20260303 12:55:42.916377 23298051593984 input_messenger.cpp:377] Fail to read from Socket{id=128849048412 fd=4514 addr=x.x.x.217:8060:52502} (0x1523226211c0): Connection timed out [110]
W20260303 12:55:42.924357 23298057897728 input_messenger.cpp:377] Fail to read from Socket{id=352187341061 fd=4391 addr=x.x.x.217:8060:52410} (0x1515276f5740): Connection timed out [110]
W20260303 12:55:42.932360 23297857758976 input_messenger.cpp:377] Fail to read from Socket{id=292057795089 fd=4392 addr=x.x.x.217:8060:52982} (0x14ec91a1f040): Connection timed out [110]
W20260303 12:55:42.957356 23297855657728 input_messenger.cpp:377] Fail to read from Socket{id=481036346112 fd=4564 addr=x.x.x.217:8060:53088} (0x151d0b681600): Connection timed out [110]
W20260303 12:55:42.957368 23297914492672 input_messenger.cpp:377] Fail to read from Socket{id=360777266711 fd=4105 addr=x.x.x.217:8060:52806} (0x1510a5dcb0c0): Connection timed out [110]
W20260303 12:55:42.964368 23297840948992 input_messenger.cpp:377] Fail to read from Socket{id=274877931950 fd=3005 addr=x.x.x.217:8060:62876} (0x14ec54cb2840): Connection timed out [110]
W20260303 12:55:42.964394 23297864062720 input_messenger.cpp:377] Fail to read from Socket{id=300647727327 fd=3777 addr=x.x.x.217:8060:51774} (0x1517a29c4600): Connection timed out [110]
W20260303 12:55:42.964380 23297910290176 input_messenger.cpp:377] Fail to read from Socket{id=292057802652 fd=3188 addr=x.x.x.217:8060:62888} (0x150e20356f00): Connection timed out [110]
W20260303 12:55:42.964374 23297874568960 input_messenger.cpp:377] Fail to read from Socket{id=326417537766 fd=3151 addr=x.x.x.217:8060:62886} (0x150447187040): Connection timed out [110]
W20260303 12:55:42.973374 23297874568960 input_messenger.cpp:377] Fail to read from Socket{id=300647730602 fd=3650 addr=x.x.x.217:8060:62906} (0x150bacdc7e00): Connection timed out [110]
W20260303 12:55:43.020360 23298066302720 input_messenger.cpp:377] Fail to read from Socket{id=335007461293 fd=4463 addr=x.x.x.217:8060:53054} (0x150c621e01c0): Connection timed out [110]
W20260303 12:55:43.028521 23297878771456 input_messenger.cpp:377] Fail to read from Socket{id=687194771163 fd=3205 addr=x.x.x.217:8060:62890} (0x1528568e82c0): Connection timed out [110]
W20260303 12:55:43.047843 23298064201472 input_messenger.cpp:377] Fail to read from Socket{id=231928258238 fd=4636 addr=x.x.x.217:8060:52638} (0x151529e67b80): Connection timed out [110]
W20260303 12:55:43.092355 23297845151488 input_messenger.cpp:377] Fail to read from Socket{id=292057800132 fd=3797 addr=x.x.x.217:8060:51790} (0x151529b3f440): Connection timed out [110]
W20260303 12:55:43.100408 23297845151488 input_messenger.cpp:377] Fail to read from Socket{id=652835037624 fd=4229 addr=x.x.x.217:8060:52244} (0x1523ebecb780): Connection timed out [110]
W20260303 12:55:43.101353 23297868265216 input_messenger.cpp:377] Fail to read from Socket{id=463856474781 fd=4235 addr=x.x.x.217:8060:52890} (0x15286dfa3180): Connection timed out [110]
W20260303 12:55:43.101354 23297859860224 input_messenger.cpp:377] Fail to read from Socket{id=601295429017 fd=4608 addr=x.x.x.217:8060:52582} (0x15263ba06600): Connection timed out [110]
W20260303 12:55:43.108371 23297845151488 input_messenger.cpp:377] Fail to read from Socket{id=377957144004 fd=4378 addr=x.x.x.217:8060:52392} (0x151adfdd7400): Connection timed out [110]
W20260303 12:55:43.116382 23297838847744 input_messenger.cpp:377] Fail to read from Socket{id=180388649018 fd=3755 addr=x.x.x.217:8060:52996} (0x151f921c3040): Connection timed out [110]
W20260303 12:55:43.172358 23297910290176 input_messenger.cpp:377] Fail to read from Socket{id=163208774466 fd=4563 addr=x.x.x.217:8060:53086} (0x1521cc981380): Connection timed out [110]
W20260303 12:55:43.220371 23298066302720 input_messenger.cpp:377] Fail to read from Socket{id=627065233369 fd=3990 addr=x.x.x.217:8060:52000} (0x15292910c940): Connection timed out [110]
W20260303 12:55:43.284363 23297872467712 input_messenger.cpp:377] Fail to read from Socket{id=214748390870 fd=4575 addr=x.x.x.217:8060:52544} (0x151e0dca4b00): Connection timed out [110]
W20260303 12:55:43.348361 23297882973952 input_messenger.cpp:377] Fail to read from Socket{id=335007464939 fd=4339 addr=x.x.x.217:8060:52362} (0x14f52fc3a400): Connection timed out [110]
W20260303 12:55:43.412383 23297874568960 input_messenger.cpp:377] Fail to read from Socket{id=506806154487 fd=4213 addr=x.x.x.217:8060:52224} (0x150d4c4b69c0): Connection timed out [110]
W20260303 12:55:43.476369 23297843050240 input_messenger.cpp:377] Fail to read from Socket{id=223338325628 fd=4153 addr=x.x.x.217:8060:52846} (0x150580aaf400): Connection timed out [110]
W20260303 12:55:56.876829 23297851455232 input_messenger.cpp:377] Fail to read from Socket{id=420906814584 fd=4089 addr=x.x.x.217:52294:8060} (0x151982a5a540): Connection reset by peer [104]
W20260303 12:55:56.876991 23298049492736 input_messenger.cpp:377] Fail to read from Socket{id=180388655060 fd=4202 addr=x.x.x.217:53294:8060} (0x15146f5a1b40): Connection reset by peer [104]
W20260303 12:55:56.877001 23298055796480 input_messenger.cpp:377] Fail to read from Socket{id=386547077876 fd=4468 addr=x.x.x.217:53390:8060} (0x15158c821a40): Connection reset by peer [104]
W20260303 12:55:56.877001 23297834645248 input_messenger.cpp:377] Fail to read from Socket{id=687194773047 fd=4221 addr=x.x.x.217:53296:8060} (0x1521ef1f0b40): Connection reset by peer [104]
W20260303 12:55:56.877205 23298064201472 input_messenger.cpp:377] Fail to read from Socket{id=652835039960 fd=4186 addr=x.x.x.217:52350:8060} (0x151d16b7a3c0): Connection reset by peer [104]
W20260303 12:55:56.877230 23297889277696 input_messenger.cpp:377] Fail to read from Socket{id=747324322880 fd=3976 addr=x.x.x.217:53194:8060} (0x1516cfceb240): Connection reset by peer [104]
W20260303 12:55:56.877232 23297870366464 input_messenger.cpp:377] Fail to read from Socket{id=369367204401 fd=4108 addr=x.x.x.217:53270:8060} (0x1517b2064800): Connection reset by peer [104]
W20260303 12:55:56.878070 23299411126016 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=420906814584 fd=4089 addr=x.x.x.217:52294:8060} (0x151982a5a540): Connection reset by peer [104]
W20260303 12:55:56.884694 23297876670208 input_messenger.cpp:377] Fail to read from Socket{id=300647734395 fd=4179 addr=x.x.x.217:53306:8060} (0x1518aabc20c0): Connection reset by peer [104]
W20260303 12:55:56.885519 23297897682688 input_messenger.cpp:377] Fail to read from Socket{id=188978588592 fd=5063 addr=x.x.x.217:3464:8060} (0x151debc38f80): Connection reset by peer [104]
W20260303 12:55:56.889109 23299501479680 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=747324322880 fd=3976 addr=x.x.x.217:53194:8060} (0x1516cfceb240): Connection reset by peer [104]
W20260303 12:55:56.890538 23299406923520 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=300647734395 fd=4179 addr=x.x.x.217:53306:8060} (0x1518aabc20c0): Connection reset by peer [104]
W20260303 12:55:56.891371 23299474163456 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=369367204401 fd=4108 addr=x.x.x.217:53270:8060} (0x1517b2064800): Connection reset by peer [104]
W20260303 12:55:56.901785 23299434239744 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=180388655060 fd=4202 addr=x.x.x.217:53294:8060} (0x15146f5a1b40): Connection reset by peer [104]
W20260303 12:55:56.917555 23299434239744 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=687194773047 fd=4221 addr=x.x.x.217:53296:8060} (0x1521ef1f0b40): Connection reset by peer [104]
W20260303 13:00:12.596358 23297908188928 input_messenger.cpp:377] Fail to read from Socket{id=360777256238 fd=4169 addr=x.x.x.217:8060:64356} (0x152cce50c740): Connection timed out [110]
W20260303 13:01:57.598826 23297845151488 input_messenger.cpp:377] Fail to read from Socket{id=601295433848 fd=4500 addr=x.x.x.217:6366:8060} (0x15150a17b300): Connection reset by peer [104]
W20260303 13:01:57.598852 23298051593984 input_messenger.cpp:377] Fail to read from Socket{id=343597408689 fd=3258 addr=x.x.x.151:21578:8060} (0x14ec54cb2fc0): Connection reset by peer [104]
W20260303 13:01:57.598968 23298038986496 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=601295433848 fd=4500 addr=x.x.x.217:6366:8060} (0x15150a17b300): Connection reset by peer [104]
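
With this many socket warnings it can be easier to read a per-minute summary than the raw lines. Below is a rough Python sketch that groups the brpc warnings in the be.warning excerpt above by minute, peer host, and error. The regular expression is inferred from the lines shown here, not from any documented log format, and the `summarize` helper is hypothetical.

```python
import re
from collections import Counter

# Lines copied from the be.warning excerpt above; in practice, read the real file.
SAMPLE_LOG = """\
W20260303 12:44:57.612310 23297845151488 input_messenger.cpp:377] Fail to read from Socket{id=515396076439 fd=3194 addr=x.x.x.151:54004:8060} (0x15319f477040): Connection reset by peer [104]
W20260303 12:44:57.716589 23302712055552 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=515396076439 fd=3194 addr=x.x.x.151:54004:8060} (0x15319f477040): Connection reset by peer [104]
E20260303 12:44:57.716659 23302712055552 segment_flush_executor.cpp:111] failed to flush segment, txn_id: 101676621, tablet id: 85841324, status: Internal error: Fail to close delta writer. tablet_id: 85841324, state: kCommitted
W20260303 12:55:03.828380 23297868265216 input_messenger.cpp:377] Fail to read from Socket{id=300647736179 fd=4057 addr=x.x.x.151:8060:6970} (0x150971337f40): Connection timed out [110]
W20260303 12:55:42.836359 23297914492672 input_messenger.cpp:377] Fail to read from Socket{id=352187337669 fd=3243 addr=x.x.x.217:8060:52708} (0x151a4b0b6040): Connection timed out [110]
W20260303 12:55:42.836359 23297855657728 input_messenger.cpp:377] Fail to read from Socket{id=214748392364 fd=4246 addr=x.x.x.217:8060:52902} (0x151debc38580): Connection timed out [110]
"""

# Pattern inferred from the lines above (an assumption, not a documented format).
LINE_RE = re.compile(
    r"^W\d{8} (?P<minute>\d{2}:\d{2}):\S+ \d+ \S+\] "
    r"Fail to (?:read from|write into) Socket\{.*?addr=(?P<peer>[^:}]+):[^}]*\} "
    r"\(0x[0-9a-f]+\): (?P<error>.+?) \[\d+\]$"
)


def summarize(log_text: str) -> Counter:
    """Count brpc socket warnings per (minute, peer host, error text)."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a brpc socket warning, e.g. the segment flush error
        counts[(m["minute"], m["peer"], m["error"])] += 1
    return counts


summary = summarize(SAMPLE_LOG)
for (minute, peer, error), n in sorted(summary.items()):
    print(f"{minute}  {peer:<12} {error:<26} x{n}")
```

A summary like this makes it easier to see which peer each burst of timeouts or resets involves and how close it is to the crash time.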