To help us locate your problem faster, please provide the following information. Thank you.
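【Details】 Every day at around 1 a.m. and again at around 9 to 10 a.m., the routine load job's state changes from RUNNING to PAUSED.
【Background】 When the cluster was first deployed, the metadata and data directories were not placed on the data disk. We recently migrated the FE and BE metadata and data directories to it. The migration was done on May 26, and the problem started on May 28: every day at around 1 a.m. and around 9 to 10 a.m., the job's state changes from RUNNING to PAUSED.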
【Business impact】 We have received complaints.
【Shared-data (storage-compute separation)?】 Shared-nothing, co-located deployment
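【StarRocks version】 3.4.1-2f78e09
【Cluster size】 3 FE (1 leader, 2 followers) + 3 BE (FE and BE co-located)
【Machine info】 CPU virtual cores/memory/NIC, e.g. 6C/64G/10 Gigabit
【Contact】 Community group 26, 张宇辉; email: 654561513@qq.com. Thank you.
【Attachments】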
ReasonOfStateChanged:
Error 1: ErrorReason{errCode = 104, msg='be 10003 abort task with reason: kafka consume failed, err: fetch failed due to requested offset not available on the broker: Broker: Offset out of range (broker 3)'}
Error 2: ErrorReason{errCode = 104, msg='FE aborts the task with reason: failed to check task ready to execute, err: Consume offset: 885003 is greater than the latest offset: 883238 in kafka partition: 1. You can modify 'kafka_offsets' property through ALTER ROUTINE LOAD and RESUME the job'}
LatestSourcePosition: {"0":"889833","1":"883238","2":"890369"}
Progress: {"0":"889832","1":"885002","2":"890369"}
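To make the mismatch in Error 2 easier to see, here is a minimal Python sketch that compares the two maps exactly as they are displayed above. The helper name `partitions_ahead_of_source` is my own, and the sketch does not assume anything about off-by-one semantics between the two fields; it only reports partitions where the displayed Progress is larger than the displayed LatestSourcePosition.

```python
import json

# Values copied verbatim from the job status above.
latest_source_position = json.loads('{"0":"889833","1":"883238","2":"890369"}')
progress = json.loads('{"0":"889832","1":"885002","2":"890369"}')


def partitions_ahead_of_source(progress, latest):
    """Return {partition: gap} for partitions whose displayed Progress
    is larger than the displayed LatestSourcePosition."""
    ahead = {}
    for partition, consumed in progress.items():
        gap = int(consumed) - int(latest[partition])
        if gap > 0:
            ahead[partition] = gap
    return ahead


print(partitions_ahead_of_source(progress, latest_source_position))
# prints {'1': 1764}
```

Only partition 1 is reported, which is the partition named in Error 2.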
Kafka offsets:
[root@ bin]# ./kafka-consumer-groups.sh --bootstrap-server 10.120.7.102:9092 --describe --group group_starrock_20250529
Consumer group 'group_starrock_20250529' has no active members.
GROUP                    TOPIC                       PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG   CONSUMER-ID  HOST  CLIENT-ID
group_starrock_20250529  equipment_signal_log_topic  2          890370          898518          8148  -            -     -
group_starrock_20250529  equipment_signal_log_topic  1          885003          893150          8147  -            -     -
group_starrock_20250529  equipment_signal_log_topic  0          889833          898013          8180  -            -     -
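Kafka is healthy and Progress is {"0":"889832","1":"885002","2":"890369"}, so why is LatestSourcePosition abnormal?
Looking at the task assignment, BeId is -1, yet the tasks are in fact still running and data is being synchronized normally.
FE is healthy.
BE is healthy.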
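As a rough illustration of why LatestSourcePosition looks wrong to me, the sketch below (Python, with a hypothetical helper name `broker_minus_fe`) subtracts the reported LatestSourcePosition from the LOG-END-OFFSET that the broker reports for each partition. The two outputs were probably not captured at the same instant, so only the relative differences between partitions are meaningful.

```python
# LOG-END-OFFSET per partition, copied from the kafka-consumer-groups.sh output.
broker_log_end = {0: 898013, 1: 893150, 2: 898518}
# LatestSourcePosition per partition, as reported for the routine load job.
latest_source_position = {0: 889833, 1: 883238, 2: 890369}


def broker_minus_fe(broker, fe_view):
    """Return {partition: broker LOG-END-OFFSET - LatestSourcePosition}."""
    return {p: broker[p] - fe_view[p] for p in sorted(broker)}


print(broker_minus_fe(broker_log_end, latest_source_position))
# prints {0: 8180, 1: 9912, 2: 8149}
```

Partitions 0 and 2 differ from the broker by roughly the same amount, while partition 1 is a further 1,700 or so offsets behind, so the reported latest offset for partition 1 looks stale relative to the other two.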