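To help locate your problem faster, please provide the following information. Thank you.
[Details] Detailed description of the problem
[Background] What operations have you performed?
[Business impact]
[Shared-data (storage-compute separation) mode?]
[StarRocks version] e.g.: 3.3.0
[Cluster size] e.g.: 3 FE (1 follower + 2 observers) + 3 BE (FE and BE co-located)
[Machine info] CPU virtual cores/memory/NIC, e.g.: 16C/64G/10GbE
[Contact] Community group 15, redscarf
[Attachments]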
- be.INFO
```
I0816 19:12:18.226403 1042 compaction_task.cpp:39] start compaction. task_id:22416, tablet:2629792, algorithm:VERTICAL_COMPACTION, compaction_type:cumulative, compaction_score:59.0758, output_version:[54768-54785], input rowsets size:18
I0816 19:12:18.226452 1044 compaction_manager.cpp:87] submit task to compaction pool, task_id:22418, tablet_id:2629804, compaction_type:cumulative, compaction_score:57.258 for round:22554, candidates_size:0
E0816 19:12:18.226548 1041 threadpool.cpp:455] Thread pool failed to create thread: Runtime error: Could not create thread: Resource temporarily unavailable
I0816 19:12:18.226768 123893 compaction_task.cpp:39] start compaction. task_id:22417, tablet:2629798, algorithm:VERTICAL_COMPACTION, compaction_type:cumulative, compaction_score:66.522, output_version:[54768-54785], input rowsets size:18
I0816 19:12:18.226778 1040 size_tiered_compaction_policy.cpp:353] pick tablet 2629810 for size-tiered compaction rowset version=54769-54785 score=40 level_size=66604 total_size=1081409 segment_num=17 force_base_compaction=0 reached_max_versions=0
I0816 19:12:18.226843 1044 compaction_manager.cpp:87] submit task to compaction pool, task_id:22419, tablet_id:2629810, compaction_type:cumulative, compaction_score:71.6143 for round:22555, candidates_size:0
I0816 19:12:18.313393 1153 stream_load.cpp:243] new income streaming load request.id=e6480452682071e3-2374d5ca100363bc, job_id=-1, txn_id: -1, label=flinkx_connector_20240816_191210_9f6da15928414e73a54c57f522e0d4a7, db=clog, db=clog, tbl=containers_log
I0816 19:12:18.314041 1150 stream_load.cpp:243] new income streaming load request.id=2145c51841d732d1-b88453f4cf7fd8b4, job_id=-1, txn_id: -1, label=flinkx_connector_20240816_191210_ed58e0cd6e644bdfa2c096e574763c7a, db=clog, db=clog, tbl=containers_log
I0816 19:12:18.314802 1078 local_tablets_channel.cpp:711] LocalTabletsChannel txn_id: 1564623 load_id: 88401629-4aeb-2747-3fd6-8b5343d278b1 open 8 delta writers, 0 failed_tablets: _num_remaining_senders: 1
I0816 19:12:18.317302 1152 stream_load.cpp:243] new income streaming load request.id=e1411c820c4bc9f7-7fe8e3219ff36f9a, job_id=-1, txn_id: -1, label=flinkx_connector_20240816_191211_19163d60242c48afaebb16a15686bb0a, db=clog, db=clog, tbl=containers_log
I0816 19:12:18.318394 1153 stream_load_executor.cpp:77] begin to execute job. label=flinkx_connector_20240816_191210_9f6da15928414e73a54c57f522e0d4a7, txn_id: 1564625, query_id=e6480452-6820-71e3-2374-d5ca100363bc
I0816 19:12:18.318485 1153 plan_fragment_executor.cpp:83] Prepare(): query_id=e6480452-6820-71e3-2374-d5ca100363bc fragment_instance_id=e6480452-6820-71e3-2374-d5ca100363bd backend_num=0
I0816 19:12:18.319118 1090 local_tablets_channel.cpp:711] LocalTabletsChannel txn_id: 1564624 load_id: 9c4ed4a8-dfc8-fa9b-893a-cf58dbee709c open 8 delta writers, 0 failed_tablets: _num_remaining_senders: 1
I0816 19:12:18.319233 1150 stream_load_executor.cpp:77] begin to execute job. label=flinkx_connector_20240816_191210_ed58e0cd6e644bdfa2c096e574763c7a, txn_id: 1564626, query_id=2145c518-41d7-32d1-b884-53f4cf7fd8b4
I0816 19:12:18.319335 1150 plan_fragment_executor.cpp:83] Prepare(): query_id=2145c518-41d7-32d1-b884-53f4cf7fd8b4 fragment_instance_id=2145c518-41d7-32d1-b884-53f4cf7fd8b5 backend_num=0
I0816 19:12:18.319592 747 plan_fragment_executor.cpp:192] Open(): fragment_instance_id=e6480452-6820-71e3-2374-d5ca100363bd
I0816 19:12:18.320350 749 plan_fragment_executor.cpp:192] Open(): fragment_instance_id=2145c518-41d7-32d1-b884-53f4cf7fd8b5
I0816 19:12:18.320616 1100 local_tablets_channel.cpp:711] LocalTabletsChannel txn_id: 1564625 load_id: e6480452-6820-71e3-2374-d5ca100363bc open 8 delta writers, 0 failed_tablets: _num_remaining_senders: 1
I0816 19:12:18.321417 1105 local_tablets_channel.cpp:711] LocalTabletsChannel txn_id: 1564626 load_id: 2145c518-41d7-32d1-b884-53f4cf7fd8b4 open 8 delta writers, 0 failed_tablets: _num_remaining_senders: 1
E0816 19:12:18.321874 747 threadpool.cpp:455] Thread pool failed to create thread: Runtime error: Could not create thread: Resource temporarily unavailable
E0816 19:12:18.322134 749 threadpool.cpp:455] Thread pool failed to create thread: Runtime error: Could not create thread: Resource temporarily unavailable
```
- BE crash
- be.out
```
start time: Sun Aug 18 22:46:25 CST 2024, server uptime: 22:46:25 up 158 days, 8:08, 0 users, load average: 154.89, 52.10, 22.27
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
3.3.0 RELEASE (build 19a3f66)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
tracker:process consumption: 2561198336
tracker:query_pool consumption: 0
tracker:query_pool/connector_scan consumption: 0
tracker:load consumption: 0
tracker:metadata consumption: 30574239
tracker:tablet_metadata consumption: 10644196
tracker:rowset_metadata consumption: 11687850
tracker:segment_metadata consumption: 1472779
tracker:column_metadata consumption: 6769414
tracker:tablet_schema consumption: 544252
tracker:segment_zonemap consumption: 546681
tracker:short_key_index consumption: 804869
tracker:column_zonemap_index consumption: 1172670
tracker:ordinal_index consumption: 1792736
tracker:bitmap_index consumption: 15600
tracker:bloom_filter_index consumption: 30840
tracker:compaction consumption: 0
tracker:schema_change consumption: 0
tracker:column_pool consumption: 0
tracker:page_cache consumption: 883585312
tracker:jit_cache consumption: 7624
tracker:update consumption: 2847660
tracker:chunk_allocator consumption: 43188984
tracker:clone consumption: 0
tracker:consistency consumption: 0
tracker:datacache consumption: 0
tracker:replication consumption: 0
*** Aborted at 1724039747 (unix time) try "date -d @1724039747" if you are using GNU date ***
PC: @ 0x7f22800979fc pthread_kill
*** SIGABRT (@0x18) received by PID 24 (TID 0x7f2156683640) from PID 24; stack trace: ***
@ 0x990848a google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f2280043520 (unknown)
@ 0x7f22800979fc pthread_kill
@ 0x7f2280043476 raise
@ 0x7f22800297f3 abort
@ 0xe787763 _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
@ 0xe785e1c __cxxabiv1::__terminate()
@ 0xe785e87 std::terminate()
@ 0xe785fe9 __cxa_throw
@ 0xe820cc8 std::__throw_system_error()
@ 0xe820f3d std::thread::_M_start_thread()
@ 0x98e8c6f apache::thrift::server::TThreadedServer::onClientConnected()
@ 0x98e45d3 apache::thrift::server::TServerFramework::serve()
@ 0x98e9432 apache::thrift::server::TThreadedServer::serve()
@ 0x8309ac3 starrocks::ThriftServer::ThriftServerEventProcessor::supervise()
@ 0xe820e54 execute_native_thread_routine
@ 0x7f2280095ac3 (unknown)
@ 0x7f2280126a04 clone
@ 0x0 (unknown)
```
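Both the be.INFO errors and the be.out abort report `Resource temporarily unavailable`, which is the `strerror` text for `EAGAIN`. In be.out it surfaces as an uncaught `std::system_error` thrown from `std::thread::_M_start_thread` inside `TThreadedServer::onClientConnected`, which ends in `std::terminate` and `SIGABRT`. The Linux `pthread_create(3)` man page lists `RLIMIT_NPROC`, `/proc/sys/kernel/threads-max` and `/proc/sys/kernel/pid_max` as limits that produce `EAGAIN`. The following is a minimal diagnostic sketch, assuming a Linux host with `/proc` mounted; the script and its helpers (`read_int`, `max_processes`, `thread_limits`) are hypothetical and not part of StarRocks. It prints a process's current thread count next to those limits.

```python
import errno
import os
import sys


def read_int(path):
    """Return the integer stored in a /proc file, or None if it is unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def max_processes(pid):
    """Return (soft, hard) from the 'Max processes' (RLIMIT_NPROC) row of /proc/<pid>/limits."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max processes"):
                # Row format: "Max processes   <soft>   <hard>   processes"
                soft, hard = line[len("Max processes"):].split()[:2]
                return soft, hard
    return None


def thread_limits(pid):
    """Collect the thread count of `pid` and the limits pthread_create(3) cites for EAGAIN."""
    return {
        "threads in process": len(os.listdir(f"/proc/{pid}/task")),
        "RLIMIT_NPROC (soft, hard)": max_processes(pid),
        "kernel.threads-max": read_int("/proc/sys/kernel/threads-max"),
        "kernel.pid_max": read_int("/proc/sys/kernel/pid_max"),
    }


if __name__ == "__main__":
    # Pass the BE pid as the first argument; defaults to this script's own pid.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    # "Resource temporarily unavailable" is the strerror() text for EAGAIN.
    print(f"EAGAIN = {os.strerror(errno.EAGAIN)!r}")
    for name, value in thread_limits(pid).items():
        print(f"{name}: {value}")
```

Usage, assuming the file is saved as `check_thread_limits.py`: `python3 check_thread_limits.py <BE_PID>`. Note that `RLIMIT_NPROC` applies to all threads owned by the process's real user ID, not only the BE, so the meaningful comparison is against the total thread count for that user.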
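All 3 BEs have restarted to varying degrees, and the number of thrift_server threads keeps growing.
Summary:
1. Symptom 1: after the number of thrift_server threads exceeds 3,000 (after roughly 12 hours), the BE restarts.
2. Symptom 2: when the total number of BE threads reaches about 30,000, the BE restarts.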
pstack output:
pstack-10002.txt (40.9 KB)
pstack-10215.txt (41.8 KB)
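Both symptoms in the summary are thread-count thresholds (over 3,000 `thrift_server` threads, about 30,000 threads in total). The sketch below is a hypothetical helper, assuming Linux `/proc`, that tallies a process's threads by their kernel thread name (`/proc/<pid>/task/<tid>/comm`) so the `thrift_server` count can be tracked between pstack snapshots. Whether the BE's Thrift threads appear under exactly that name is an assumption based on the observation above; Linux also truncates thread names to 15 characters.

```python
import collections
import os
import sys


def thread_names(pid):
    """Yield the kernel name (comm) of every thread of `pid` from /proc/<pid>/task."""
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/comm") as f:
                yield f.read().strip()
        except OSError:
            # The thread exited between listdir() and open(); skip it.
            continue


def count_by_name(pid):
    """Return a Counter mapping thread name -> number of threads with that name."""
    return collections.Counter(thread_names(pid))


if __name__ == "__main__":
    # Pass the BE pid as the first argument; defaults to this script's own pid.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    counts = count_by_name(pid)
    print(f"total threads: {sum(counts.values())}")
    for name, n in counts.most_common(10):
        print(f"{n:6d}  {name}")
```

For example, `python3 count_threads.py <BE_PID>` prints the total and the ten most common thread names. Running it periodically (for instance from cron) would show whether the `thrift_server` count rises steadily between restarts, as the summary describes.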