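To help us locate your problem faster, please provide the following information. Thank you.
【Details】Broker Load fails with: [E1011]The server is overcrowded
【Background】No recent changes
【Business impact】
【StarRocks version】e.g. 2.5.10
【Cluster size】3 FE + 5 BE (FE and BE co-located)
【Machine info】vCPUs/memory/NIC: 48C/256G/10 GbE
【Table model】Update model (Unique Key table)
【Import or export method】Broker Load
【Contact】13556416203@163.com
【Attachments】
Screenshots could not be uploaded, so the FE log is pasted below: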
2023-09-11 02:20:53,669 INFO (Load job scheduler|43) [DatabaseTransactionMgr.beginTransaction():309] begin transaction: txn_id: 4092327 with label ods_waybill_rt_20230911_dn20230911022000000_10 from coordinator FE: 10.153.164.31, listner id: 4738303
2023-09-11 04:21:08,999 INFO (loading_load_task_scheduler_priority_pool-4|311608) [DatabaseTransactionMgr.abortTransaction():1288] transaction:[TransactionState. txn_id: 4092327, label: ods_waybill_rt_20230911_dn20230911022000000_10, db id: 10163, table id list: 3139088, callback id: 4738303, coordinator: FE: 10.153.164.31, transaction status: ABORTED, error replicas num: 0, replica ids: , prepare time: 1694370053669, commit time: -1, finish time: 1694377268998, total cost: 7215329ms, reason: Commit failed. txn: 4092327 table: ods_waybill_rt tablet: 3141742 quorum: 0<2 errorReplicas: 3141744:{be:10003 10.153.164.30 st:NORMAL V:102749 LFV:-1},3141745:{be:10004 10.153.164.31 st:NORMAL V:102749 LFV:-1},3141743:{be:10006 10.153.164.33 st:NORMAL V:102749 LFV:-1}, attachment: LoadJobEndOperation{id=4738303, loadingStatus=EtlStatus{state=CANCELLED, trackingUrl=’\N’, stats={STARROCKS_LOAD_STATISTIC={“counterTbl”:{“clazz”:“HashBasedTable”,“rowKeys”:[“087b4461-8b63-4df7-9180-a6e8a0790e12”],“columnKeys”:[“087b4461-8b63-4df7-9180-a6e8a0790e13”,“087b4461-8b63-4df7-9180-a6e8a0790e14”,“087b4461-8b63-4df7-9180-a6e8a0790e15”,“087b4461-8b63-4df7-9180-a6e8a0790e16”,“087b4461-8b63-4df7-9180-a6e8a0790e17”],“cells”:[0,0,5960604,0,1,5496817,0,2,5616101,0,3,5730683,0,4,6088859]},“unfinishedBackendIds”:{“087b4461-8b63-4df7-9180-a6e8a0790e12”:[]},“allBackendIds”:{“087b4461-8b63-4df7-9180-a6e8a0790e12”:[10007,10004,10005,10006,10003]},“fileNum”:252,“totalFileSizeB”:29221565055,“sinkBytesCounterTbl”:{“clazz”:“HashBasedTable”,“rowKeys”:[“087b4461-8b63-4df7-9180-a6e8a0790e12”],“columnKeys”:[“087b4461-8b63-4df7-9180-a6e8a0790e13”,“087b4461-8b63-4df7-9180-a6e8a0790e14”,“087b4461-8b63-4df7-9180-a6e8a0790e15”,“087b4461-8b63-4df7-9180-a6e8a0790e16”,“087b4461-8b63-4df7-9180-a6e8a0790e17”],“cells”:[0,0,1511194362,0,1,1393558456,0,2,1423847346,0,3,1452902948,0,4,1543713265]},“sourceRowsCounterTbl”:{“clazz”:“HashBasedTable”,“rowKeys”:[“087b4461-8b63-4df7-9180-a6e8a0790e12”],“columnKeys”:[“087b4461-8b63-4
df7-9180-a6e8a0790e13”,“087b4461-8b63-4df7-9180-a6e8a0790e14”,“087b4461-8b63-4df7-9180-a6e8a0790e15”,“087b4461-8b63-4df7-9180-a6e8a0790e16”,“087b4461-8b63-4df7-9180-a6e8a0790e17”],“cells”:[0,0,5960604,0,1,5496817,0,2,5616101,0,3,5730683,0,4,6088859]},“sourceBytesCounterTbl”:{“clazz”:“HashBasedTable”,“rowKeys”:[“087b4461-8b63-4df7-9180-a6e8a0790e12”],“columnKeys”:[“087b4461-8b63-4df7-9180-a6e8a0790e13”,“087b4461-8b63-4df7-9180-a6e8a0790e14”,“087b4461-8b63-4df7-9180-a6e8a0790e15”,“087b4461-8b63-4df7-9180-a6e8a0790e16”,“087b4461-8b63-4df7-9180-a6e8a0790e17”],“cells”:[0,0,1510907706,0,1,1393294264,0,2,1423577010,0,3,1452627620,0,4,1543420657]},“loadFinish”:true}}, counters={unselected.rows=0, dpp.abnorm.ALL=0, dpp.norm.ALL=28893064}, tableCounters={3139088={table_load_bytes=7325216377, table_load_finished=1, table_load_rows=28893064}}, fileMap={}, progress=0, failMsg=’’, dppResult=‘null’}, progress=99, loadStartTimestamp=1694370053754, finishTimestamp=-1, jobState=LOADING, failMsg=null}] successfully rollback
Searching the BE logs for the FE txn_id 4092327 turned up the following entries:
be.INFO.log.20230911-020056:W0911 02:21:28.301134 491189 segment_replicate_executor.cpp:152] Failed to send rpc to SyncChannnel [host: 10.153.164.33, port: 8060, load_id: 087b4461-8b63-4df7-9180-a6e8a0790e12, tablet_id: 3141766, txn_id: 4092327] err=Internal error: [E1011]The server is overcrowded @10.153.164.33:8060 [R1][E1011]The server is overcrowded @10.153.164.33:8060 [R2][E1011]The server is overcrowded @10.153.164.33:8060 [R3][E1011]The server is overcrowded @10.153.164.33:8060
be.INFO.log.20230911-020056:W0911 02:21:28.301162 491189 segment_replicate_executor.cpp:279] Failed to sync segment SyncChannnel [host: 10.153.164.33, port: 8060, load_id: 087b4461-8b63-4df7-9180-a6e8a0790e12, tablet_id: 3141766, txn_id: 4092327] err Internal error: [E1011]The server is overcrowded @10.153.164.33:8060 [R1][E1011]The server is overcrowded @10.153.164.33:8060 [R2][E1011]The server is overcrowded @10.153.164.33:8060 [R3][E1011]The server is overcrowded @10.153.164.33:8060
In addition, the BE WARNING log contains a large number of entries like these:
W0911 13:13:09.446166 492313 runtime_filter_worker.cpp:272] brpc failed, error=RPC call is timed out, error_text=[E1008]Reached timeout=400ms @10.153.164.32:8060
W0911 13:13:09.488420 492313 runtime_filter_worker.cpp:272] brpc failed, error=RPC call is timed out, error_text=[E1008]Reached timeout=400ms @10.153.164.30:8060
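To see which BE is rejecting or timing out the most RPCs, the log lines above can be tallied by brpc error code and target host. This is only a rough sketch: `summarize_brpc_errors` and its regex are hypothetical helpers written against the line format shown in this post, not part of StarRocks, and they assume each error appears as `[E<code>]<message> @<ip>:<port>`.

```python
import re
from collections import Counter

# Matches brpc error tags such as
# "[E1011]The server is overcrowded @10.153.164.33:8060".
# Retry markers like "[R1]" are skipped because they do not start with "E".
BRPC_ERROR = re.compile(r"\[(E\d+)\][^@\[]*@(\d+\.\d+\.\d+\.\d+):(\d+)")


def summarize_brpc_errors(lines, txn_id=None):
    """Count (error_code, host) pairs found in BE log lines.

    If txn_id is given, only lines containing "txn_id: <txn_id>" are counted.
    """
    txn_pattern = None
    if txn_id is not None:
        # \b keeps 4092327 from also matching 40923270.
        txn_pattern = re.compile(rf"txn_id: {txn_id}\b")

    counts = Counter()
    for line in lines:
        if txn_pattern is not None and not txn_pattern.search(line):
            continue
        for code, host, _port in BRPC_ERROR.findall(line):
            counts[(code, host)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (assumed): python summarize.py be.INFO.log.20230911-020056
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for (code, host), n in summarize_brpc_errors(f).most_common():
            print(f"{code}\t{host}\t{n}")
```

Passing `txn_id=4092327` restricts the count to the failed load, while omitting it also picks up the `[E1008]` runtime filter timeouts, which makes it easier to check whether both symptoms concentrate on the same hosts.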