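【Details】Stream Load jobs frequently fail with the error Message : [E1008]Reached timeout=120000ms @x.x.x.3:8060
【Background】More than 30 Stream Load jobs import data into more than 50 tables in real time. The data volumes are small (a few hundred thousand rows per table, with a daily increment of tens to tens of thousands of rows). Each job collects data from MySQL once a minute and writes to StarRocks only if there is new data. When one job writes to several tables, it writes the collected data to each table in turn.
【Business impact】
【StarRocks version】2.3.1
【Cluster size】e.g., 3 FE (3 followers) + 4 BE (FE and BE co-located)
【Machine info】CPU vCores / memory / NIC, e.g., 16C 64G
【Table model】Mostly Primary Key tables, with a few Duplicate Key tables
【Import or export method】Stream Load from Java
【Contact】Community group 6 - 春江
【Attachments】
-Error message returned by the failed Stream Load job: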
Status : Fail
BeginTxnTimeMs : 364
Message : [E1008]Reached timeout=120000ms @x.x.x.3:8060
NumberUnselectedRows : 0
CommitAndPublishTimeMs : 0
Label : load_xxx_1673393499144
LoadBytes : 5223
StreamLoadPutTimeMs : 1524
NumberTotalRows : 0
WriteDataTimeMs : 120006
TxnId : 14805062
LoadTimeMs : 121895
ReadDataTimeMs : 0
NumberLoadedRows : 0
NumberFilteredRows : 0
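The timing fields in the response above can be compared to see which phase used up the load time. The following is a minimal Java sketch that only uses the values reported above; the class and method names (`StreamLoadPhases`, `slowestPhase`, `reportedPhases`) are hypothetical and not part of any StarRocks client.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: finds the phase with the largest reported duration.
class StreamLoadPhases {

    // Returns the name of the phase whose duration is the largest.
    static String slowestPhase(Map<String, Long> phases) {
        String slowest = null;
        long max = -1;
        for (Map.Entry<String, Long> e : phases.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                slowest = e.getKey();
            }
        }
        return slowest;
    }

    // Per-phase timings copied from the failed Stream Load response above.
    static Map<String, Long> reportedPhases() {
        Map<String, Long> phases = new LinkedHashMap<>();
        phases.put("BeginTxnTimeMs", 364L);
        phases.put("StreamLoadPutTimeMs", 1524L);
        phases.put("ReadDataTimeMs", 0L);
        phases.put("WriteDataTimeMs", 120006L);
        phases.put("CommitAndPublishTimeMs", 0L);
        return phases;
    }

    public static void main(String[] args) {
        Map<String, Long> phases = reportedPhases();
        String slowest = slowestPhase(phases);
        long loadTimeMs = 121895L; // LoadTimeMs from the same response
        System.out.println(slowest + " = " + phases.get(slowest)
                + " ms of " + loadTimeMs + " ms total");
    }
}
```

With these values, WriteDataTimeMs (120006 ms) accounts for almost all of LoadTimeMs (121895 ms) and is nearly equal to the 120000 ms timeout in the error, while LoadBytes is only 5223. This suggests the job was waiting in the write phase rather than moving a large amount of data.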
-be.warning.log
W0111 09:09:03.500874 450 fragment_context.cpp:19] [Driver] Canceled, query_id=897a29b7-914c-11ed-a698-525400342343, instance_id=897a29b7-914c-11ed-a698-525400342346, reason=Cancelled: LimitReach
W0111 09:09:48.828512 32683 tablet_sink.cpp:1021] close channel failed. channel_name=NodeChannel[24707746-10038], load_info=load_id=e442709b-c31d-b1e8-c6e0-0186056fa3a1, txn_id: 14806962, parallel=1, compress_type=2, error_msg=[E1008]Reached timeout=120000ms @x.x.x.20:8060
W0111 09:09:48.828569 32683 tablet_sink.cpp:1021] close channel failed. channel_name=NodeChannel[24707746-10034], load_info=load_id=e442709b-c31d-b1e8-c6e0-0186056fa3a1, txn_id: 14806962, parallel=1, compress_type=2, error_msg=[E1008]Reached timeout=120000ms @x.x.x.4:8060
W0111 09:09:48.828577 32683 tablet_sink.cpp:1021] close channel failed. channel_name=NodeChannel[24707746-10002], load_info=load_id=e442709b-c31d-b1e8-c6e0-0186056fa3a1, txn_id: 14806962, parallel=1, compress_type=2, error_msg=[E1008]Reached timeout=120000ms @x.x.x.2:8060
W0111 09:09:48.828584 32683 tablet_sink.cpp:1021] close channel failed. channel_name=NodeChannel[24707746-10028], load_info=load_id=e442709b-c31d-b1e8-c6e0-0186056fa3a1, txn_id: 14806962, parallel=1, compress_type=2, error_msg=[E1008]Reached timeout=120000ms @x.x.x.3:8060
W0111 09:09:48.828896 32683 plan_fragment_executor.cpp:185] Fail to open fragment, instance_id=e442709b-c31d-b1e8-c6e0-0186056fa3a2, status=Internal error: [E1008]Reached timeout=120000ms @x.x.x.3:8060
/root/starrocks/be/src/exec/tablet_sink.cpp:425 _wait_request(closure)
/root/starrocks/be/src/exec/tablet_sink.cpp:522 _wait_all_prev_request()
W0111 09:09:48.828958 32683 fragment_mgr.cpp:180] Fail to open fragment e442709b-c31d-b1e8-c6e0-0186056fa3a2: Internal error: [E1008]Reached timeout=120000ms @x.x.x.3:8060
/root/starrocks/be/src/exec/tablet_sink.cpp:425 _wait_request(closure)
/root/starrocks/be/src/exec/tablet_sink.cpp:522 _wait_all_prev_request()
W0111 09:09:48.829569 32683 stream_load_executor.cpp:89] fragment execute failed, query_id=e442709bc31db1e8-c6e00186056fa3a1, err_msg=[E1008]Reached timeout=120000ms @x.x.x.3:8060, id=e442709bc31db1e8-c6e00186056fa3a1, job_id=-1, txn_id: 14806962, label=load_ods_user_answer_info_1673399267519, db=sl03_pro
W0111 09:09:48.829612 504 stream_load.cpp:133] Fail to handle streaming load, id=e442709bc31db1e8-c6e00186056fa3a1 errmsg=[E1008]Reached timeout=120000ms @x.x.x.3:8060
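To see which BE endpoints the timeouts point at, the `[E1008]` messages in be.warning.log can be collected with a regular expression. This is a minimal Java sketch under the assumption that the error text always has the shape seen in the log above; `E1008LogScan` and `timedOutEndpoints` are hypothetical names, and the sample lines are the `error_msg` fragments from that log.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the distinct endpoints named in E1008 timeout messages.
class E1008LogScan {

    // Matches text such as "[E1008]Reached timeout=120000ms @x.x.x.3:8060".
    // Group 1 = timeout in ms, group 2 = host, group 3 = port.
    private static final Pattern E1008 =
            Pattern.compile("\\[E1008\\]Reached timeout=(\\d+)ms @([^\\s,:]+):(\\d+)");

    static Set<String> timedOutEndpoints(List<String> lines) {
        Set<String> endpoints = new TreeSet<>();
        for (String line : lines) {
            Matcher m = E1008.matcher(line);
            while (m.find()) {
                endpoints.add(m.group(2) + ":" + m.group(3));
            }
        }
        return endpoints;
    }

    public static void main(String[] args) {
        // In practice the lines could come from Files.readAllLines(pathToBeWarningLog).
        List<String> sample = List.of(
                "error_msg=[E1008]Reached timeout=120000ms @x.x.x.20:8060",
                "error_msg=[E1008]Reached timeout=120000ms @x.x.x.4:8060",
                "error_msg=[E1008]Reached timeout=120000ms @x.x.x.2:8060",
                "error_msg=[E1008]Reached timeout=120000ms @x.x.x.3:8060",
                "err_msg=[E1008]Reached timeout=120000ms @x.x.x.3:8060, id=e442709bc31db1e8-c6e00186056fa3a1");
        for (String endpoint : timedOutEndpoints(sample)) {
            System.out.println(endpoint);
        }
    }
}
```

For the log above, the scan yields four distinct endpoints (x.x.x.2, x.x.x.3, x.x.x.4 and x.x.x.20, all on port 8060), so the timeout is not confined to a single BE node.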
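Is this problem still occurring?
With help from community experts, the preliminary diagnosis is that there are too many tablets: the FE holds too much metadata, uses too much memory, and runs into long GC pauses. We have increased the FE memory and adjusted the bucket count of some tables, and are continuing to monitor.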
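As a rough illustration of why bucket counts matter here: a table has one tablet per bucket per partition, and each tablet is stored once per replica, so the number of objects the FE tracks grows multiplicatively. The Java sketch below shows that arithmetic; `TabletEstimate` is a hypothetical helper, and the partition, bucket and replica numbers are made-up inputs, not values from this cluster.

```java
// Hypothetical helper: back-of-the-envelope tablet and replica counts.
class TabletEstimate {

    // Tablets in one table = partitions x buckets per partition.
    static long tablets(long partitions, long bucketsPerPartition) {
        return partitions * bucketsPerPartition;
    }

    // Replicas in one table = tablets x replication factor.
    static long replicas(long partitions, long bucketsPerPartition, long replicationNum) {
        return tablets(partitions, bucketsPerPartition) * replicationNum;
    }

    public static void main(String[] args) {
        // All inputs below are assumptions chosen for illustration only.
        long tables = 50;          // the post mentions 50+ tables
        long partitions = 30;      // assumed partitions per table
        long buckets = 16;         // assumed buckets per partition
        long replicationNum = 3;   // assumed replication factor

        long totalTablets = tables * tablets(partitions, buckets);
        long totalReplicas = tables * replicas(partitions, buckets, replicationNum);
        System.out.println("tablets=" + totalTablets + ", replicas=" + totalReplicas);
    }
}
```

Under these assumed inputs, 50 small tables already produce 24000 tablets and 72000 replicas, which is why lowering the bucket count on small tables reduces FE metadata.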
Yes, it still occurs. Monitoring the FE shows that its memory usage is not high, so I don't know what is causing it.