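[Background] One node's disk failed. After the disk was repaired, we dropped the node and then added it back to the cluster.
Data is now being rebalanced automatically, and the BE goes down almost every hour.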
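[Business impact]
[StarRocks version] e.g.: 2.5.0
[Cluster size] e.g.: 3 FE + 4 BE (FE and BE co-located on the same hosts)
[Machine info] 24 cores / 300 GB RAM / 10 GbE network
[Attachments]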
be.out:
start time: Wed Feb 1 15:32:37 CST 2023
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
*** Aborted at 1675237414 (unix time) try "date -d @1675237414" if you are using GNU date ***
PC: @ 0x2afb2dd0a207 __GI_raise
*** SIGABRT (@0x7d000007d54) received by PID 32084 (TID 0x2afc2bc1d700) from PID 32084; stack trace: ***
@ 0x59fe902 google::(anonymous namespace)::FailureSignalHandler()
@ 0x2afb2d3b95d0 (unknown)
@ 0x2afb2dd0a207 __GI_raise
@ 0x2afb2dd0b8f8 __GI_abort
@ 0x2b00334 _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
@ 0x7d22ef6 __cxxabiv1::__terminate()
@ 0x7d22f61 std::terminate()
@ 0x7d230b4 __cxa_throw
@ 0x2b0024c _Znwm.cold
@ 0x4aad87f std::vector<>::_M_default_append()
@ 0x4a998db starrocks::TOlapTablePartitionParam::read()
@ 0x4a69ba0 starrocks::TOlapTableSink::read()
@ 0x4a7786c starrocks::TDataSink::read()
@ 0x4b5ad06 starrocks::TPlanFragment::read()
@ 0x4972bc5 starrocks::TExecPlanFragmentParams::read()
@ 0x49ca5bc starrocks::TStreamLoadPutResult::read()
@ 0x498a2c9 starrocks::FrontendService_streamLoadPut_presult::read()
@ 0x49a0bce starrocks::FrontendServiceClient::recv_streamLoadPut()
@ 0x48625a7 starrocks::ThriftRpcHelper::rpc<>()
@ 0x4d302e2 starrocks::StreamLoadAction::_process_put()
@ 0x4d313fb starrocks::StreamLoadAction::_on_header()
@ 0x4d321aa starrocks::StreamLoadAction::on_header()
@ 0x4d16b88 starrocks::EvHttpServer::on_header()
@ 0x5a87d2a evhttp_read_header
@ 0x5a8a6e3 bufferevent_readcb
@ 0x5a76e22 event_process_active_single_queue
@ 0x5a7755f event_base_loop
@ 0x4d15884 _ZZN9starrocks12EvHttpServer5startEvENKUlvE_clEv
@ 0x7d9c8e0 execute_native_thread_routine
@ 0x2afb2d3b1dd5 start_thread
@ 0x2afb2ddd1ead __clone
@ 0x0 (unknown)
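The abort line above suggests running `date -d @1675237414`. The same conversion in Python, with the result compared against the "start time" line at the top of be.out, shows how long this BE process had been up before it aborted. This is a minimal sketch; it assumes CST here means China Standard Time (UTC+8).

```python
from datetime import datetime, timedelta, timezone

# Values copied from be.out above.
ABORT_UNIX_TIME = 1675237414
CST = timezone(timedelta(hours=8), "CST")  # assumption: CST = China Standard Time, UTC+8

# Equivalent of `date -d @1675237414`, rendered in CST.
abort_at = datetime.fromtimestamp(ABORT_UNIX_TIME, tz=CST)

# "start time: Wed Feb 1 15:32:37 CST 2023"
start_at = datetime(2023, 2, 1, 15, 32, 37, tzinfo=CST)

# How long the BE ran between start and abort.
uptime = abort_at - start_at

print(abort_at.isoformat())  # 2023-02-01T15:43:34+08:00
print(uptime)                # 0:10:57
```

By this calculation, the abort recorded in this be.out happened about 11 minutes after that start.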
be.WARNING:
W0201 16:47:16.863003 17881 exec_state_reporter.cpp:129] Retrying ReportExecStatus: No more data to read.
W0201 16:47:20.313288 24523 engine_clone_task.cpp:342] Fail to make snapshot from 10.161.24.42: Not found: get_rowsets_for_snapshot: too many rowsets for incremental clone #rowset:26 #rowset_for_full_clone:1 tablet:259754468 #version:1 [125.1 125.1@0 125.1] #pending:0 tablet:259754468
W0201 16:47:20.313349 24523 engine_clone_task.cpp:293] Fail to clone tablet. tablet_id:259754468, schema_hash:334661471, signature:259754468, version:99, expected_version: 125
W0201 16:47:20.313369 24523 agent_task.cpp:321] clone failed. signature: 259754468
W0201 16:47:21.500051 24523 engine_clone_task.cpp:342] Fail to make snapshot from 10.161.24.43: Not found: get_rowsets_for_snapshot: too many rowsets for incremental clone #rowset:26 #rowset_for_full_clone:1 tablet:259754532 #version:1 [125.1 125.1@0 125.1] #pending:0 tablet:259754532
W0201 16:47:21.500068 24524 engine_clone_task.cpp:342] Fail to make snapshot from 10.161.24.43: Not found: get_rowsets_for_snapshot: too many rowsets for incremental clone #rowset:75 #rowset_for_full_clone:1 tablet:259745021 #version:1 [608.1 608.1@0 608.1] #pending:0 tablet:259745021
W0201 16:47:21.500113 24523 engine_clone_task.cpp:293] Fail to clone tablet. tablet_id:259754532, schema_hash:334661471, signature:259754532, version:99, expected_version: 125
W0201 16:47:21.500115 24524 engine_clone_task.cpp:293] Fail to clone tablet. tablet_id:259745021, schema_hash:816410294, signature:259745021, version:533, expected_version: 608
W0201 16:47:21.500124 24523 agent_task.cpp:321] clone failed. signature: 259754532
W0201 16:47:21.500133 24524 agent_task.cpp:321] clone failed. signature: 259745021
W0201 16:47:23.852612 17881 exec_state_reporter.cpp:129] Retrying ReportExecStatus: No more data to read.
W0201 16:47:26.867254 17881 exec_state_reporter.cpp:129] Retrying ReportExecStatus: No more data to read.
W0201 16:47:31.496527 24327 binary_converter.cpp:50] Column [binary]'s length exceed max varchar length.
W0201 16:47:31.828539 24328 binary_converter.cpp:50] Column [binary]'s length exceed max varchar length.
W0201 16:47:32.084055 24330 binary_converter.cpp:50] Column [binary]'s length exceed max varchar length.
W0201 16:47:32.972941 24335 binary_converter.cpp:50] Column [binary]'s length exceed max varchar length.
W0201 16:47:35.482699 24274 runtime_profile.cpp:932] find non-isomorphic children, profile_name=ChunkSource0, children_names=[FileScanner], another profile_name=ChunkSource1, another children_names=[]
W0201 16:47:36.366183 24716 thrift_rpc_helper.cpp:63] retrying call frontend service after 100 ms, address=TNetworkAddress(hostname=10.161.24.43, port=9020), reason=No more data to read.
W0201 16:47:37.023480 24176 exec_state_reporter.cpp:129] Retrying ReportExecStatus: No more data to read.
W0201 16:47:37.059504 24176 exec_state_reporter.cpp:129] Retrying ReportExecStatus: No more data to read.
W0201 16:47:43.806771 24539 utils.cpp:96] master client, retry finishTask: No more data to read.
W0201 16:47:44.765103 24247 runtime_profile.cpp:932] find non-isomorphic children, profile_name=ChunkSource0, children_names=[FileScanner], another profile_name=ChunkSource1, another children_names=[]
W0201 16:47:44.766335 24176 exec_state_reporter.cpp:129] Retrying ReportExecStatus: write() send(): Broken pipe
W0201 16:47:44.807842 24539 utils.cpp:96] master client, retry finishTask: No more data to read.
W0201 16:47:45.809010 24539 utils.cpp:96] master client, retry finishTask: No more data to read.
W0201 16:47:46.347380 24525 utils.cpp:56] master client, retry finishTask: No more data to read.
W0201 16:47:46.810117 24539 utils.cpp:96] master client, retry finishTask: No more data to read.
W0201 16:47:46.825505 24525 utils.cpp:56] master client, retry finishTask: write() send(): Broken pipe
W0201 16:47:46.862044 24726 thrift_rpc_helper.cpp:63] retrying call frontend service after 100 ms, address=TNetworkAddress(hostname=10.161.24.43, port=9020), reason=No more data to read.
W0201 16:47:47.105060 24176 exec_state_reporter.cpp:129] Retrying ReportExecStatus: No more data to read.
W0201 16:47:47.116035 24176 exec_state_reporter.cpp:129] Retrying ReportExecStatus: No more data to read.
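To see which tablets keep failing to clone during the rebalance, the "Fail to clone tablet" lines in be.WARNING can be tallied with a small script. This is a minimal sketch that assumes the warning lines keep exactly the format shown above. `summarize_clone_failures` is a hypothetical helper, not part of StarRocks.

```python
import re

# Matches the engine_clone_task.cpp "Fail to clone tablet" warnings (format as in the log above).
CLONE_FAIL = re.compile(
    r"Fail to clone tablet\. tablet_id:(\d+), schema_hash:(\d+), "
    r"signature:\d+, version:(\d+), expected_version: (\d+)"
)


def summarize_clone_failures(lines):
    """Hypothetical helper: map tablet_id -> (local version, expected version)."""
    failures = {}
    for line in lines:
        m = CLONE_FAIL.search(line)
        if m:
            tablet_id, _schema_hash, version, expected = m.groups()
            failures[int(tablet_id)] = (int(version), int(expected))
    return failures


# The three clone failures copied from the be.WARNING excerpt above.
sample = [
    "W0201 16:47:20.313349 24523 engine_clone_task.cpp:293] Fail to clone tablet. "
    "tablet_id:259754468, schema_hash:334661471, signature:259754468, version:99, expected_version: 125",
    "W0201 16:47:21.500113 24523 engine_clone_task.cpp:293] Fail to clone tablet. "
    "tablet_id:259754532, schema_hash:334661471, signature:259754532, version:99, expected_version: 125",
    "W0201 16:47:21.500115 24524 engine_clone_task.cpp:293] Fail to clone tablet. "
    "tablet_id:259745021, schema_hash:816410294, signature:259745021, version:533, expected_version: 608",
]

result = summarize_clone_failures(sample)
for tablet_id, (version, expected) in sorted(result.items()):
    print(f"{tablet_id}: {version} -> {expected} (behind by {expected - version} versions)")
```

To run it on a real log, pass an open file instead of `sample`, e.g. `with open("be.WARNING") as f: summarize_clone_failures(f)`.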