【Details】Upgraded from 2.4.4 to 2.5.3
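【Background】After the upgrade, some Routine Load jobs were in the Paused state. Running show routine load showed that the custom where conditions and columns processing functions of some jobs were no longer visible (the where condition was gone and columns was *). I also noticed that, before the upgrade, these Routine Load jobs had the same names under different databases. Since we still have the original statements for these routine loads on the business side, I chose to recreate them: first stop the problematic routine loads; then, in one database, recreate the routine load under a new name and confirm that data is imported normally; then, in the other database, recreate the routine load under the old name and again confirm that data is imported normally.
The remaining problem is that when the checkpoint thread reads the log, the routine loads that were created earlier cannot be parsed, so error logs keep being printed.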
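【Business impact】
【StarRocks version】2.5.3
【Cluster size】e.g. 3 FE (1 leader + 2 followers) + 3 BE
【Contact】jimokanghanchao@gmail.com
【Attachments】
- fe.log logs
- Logs from right after the upgrade, when the problem first appeared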
2023-04-01 02:24:37,429 ERROR (UNKNOWN 10.100.8.178_9010_1676709855154(-1)|1) [CreateRoutineLoadStmt.getLoadDesc():368] error happens when parsing create/alter routine load stmt: create routine load abc.routine_load_ods_wls_user_suggestion_c1 on ods_wls_user_suggestion columns ( xxxx, xxxx, cdate = date(create_time) ),where cdate>= DATE(CURRENT_TIMESTAMP() + INTERVAL -7 DAY) PROPERTIES ( “format”=“json”, “desired_concurrent_number” = “1”, “json_root”="$.data", “strip_outer_array” =“true”, “max_batch_interval” = “5”, “max_error_number”=“0” ) FROM KAFKA ( “kafka_broker_list”= “xxxx”, “property.group.id” = “xxxxx”, “property.kafka_default_offsets” = “OFFSET_BEGINNING” );
2023-04-01 02:24:37,513 ERROR (UNKNOWN 10.100.8.178_9010_1676709855154(-1)|1) [CreateRoutineLoadStmt.getLoadDesc():368] error happens when parsing create/alter routine load stmt: CREATE ROUTINE LOAD abcd.routine_load_ods_wls_user_suggestion_reply_c1 ON ods_wls_user_suggestion_reply
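=== Here routine_load_ods_wls_user_suggestion_reply_c1 is a routine load that has the same name under different databases (abc and abcd). Some business information has been replaced with xxx; the error does not appear to be related to it.
=== After seeing this error, I recreated the routine loads (first stopped the routine loads under abc and abcd, then recreated them under changed names).
=== After confirming that the routine loads themselves work normally, I found that the FE checkpoint thread keeps printing the same error, for example: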
ERROR (leaderCheckpointer|124) [CreateRoutineLoadStmt.getLoadDesc():368] error happens when parsing create/alter routine load stmt: create routine load abc.routine_load_ods_wls_user_suggestion_reply_c1 on ods_wls_user_suggestion_reply columns (xxx,xxx, cdate = date(create_time) ),where cdate>= DATE(CURRENT_TIMESTAMP() + INTERVAL -7 DAY) PROPERTIES ( “format”=“json”, “desired_concurrent_number” = “1”, “json_root”="$.data", “strip_outer_array” =“true”, “max_batch_interval” = “5”, “max_error_number”=“0” ) FROM KAFKA ( “kafka_broker_list”= “xxx”, “property.group.id” = “xxxxx”, “property.kafka_default_offsets” = “OFFSET_BEGINNING” );
2023-04-01 03:27:49,437 ERROR (leaderCheckpointer|124) [CreateRoutineLoadStmt.getLoadDesc():368] error happens when parsing create/alter routine load stmt: CREATE ROUTINE LOAD abcd.routine_lods_ods_wls_user_suggestion_c1 ON ods_wls_user_suggestion
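==== The FE leader has not changed in the past 2 days, and the errors above are still being printed. I checked the logs for image creation and for its successful completion, and they look normal.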
2023-04-03 11:46:18,351 INFO (leaderCheckpointer|124) [Checkpoint.replayAndGenerateGlobalStateMgrImage():197] begin to generate new image: image.19170484
2023-04-03 11:46:33,847 INFO (leaderCheckpointer|124) [Checkpoint.replayAndGenerateGlobalStateMgrImage():210] checkpoint finished save image.19170484
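
To see which routine load names the checkpoint thread keeps failing on, the error lines above can be tallied per thread and job name. Below is a minimal sketch, assuming fe.log lines follow the format of the samples pasted above; the regular expression and the function name `count_parse_errors` are illustrative and not part of StarRocks.

```python
import re
from collections import Counter

# Pattern inferred from the sample error lines in this post (an assumption,
# not an official log format): captures the thread name and the job name
# that follows "create routine load".
PARSE_ERROR = re.compile(
    r"ERROR \((?P<thread>[^|]+)\|\d+\) "
    r"\[CreateRoutineLoadStmt\.getLoadDesc\(\):\d+\] "
    r"error happens when parsing create/alter routine load stmt: "
    r"create routine load (?P<job>[\w.]+)",
    re.IGNORECASE,
)


def count_parse_errors(lines):
    """Count parse-error lines per (thread name, db.job name) pair."""
    counts = Counter()
    for line in lines:
        match = PARSE_ERROR.search(line)
        if match:
            counts[(match.group("thread"), match.group("job"))] += 1
    return counts


# Example usage against a local copy of the log (the path is hypothetical):
# with open("fe.log", encoding="utf-8", errors="replace") as f:
#     for (thread, job), n in count_parse_errors(f).most_common():
#         print(n, thread, job)
```

Grouping by thread separates the errors logged during replay right after the upgrade (thread `UNKNOWN ...`) from the ones repeated by `leaderCheckpointer`.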