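[StarRocks version] 1.19.0
[Cluster size] Expanding a single FE to a 3-FE cluster
[Machine info] Test environment, 4C8G
[Details] Deploying an FE cluster with the 1.19.0 open-source release fails with an error
[Background] The single FE + 3 BE setup runs normally; I am testing FE expansion to a 3-FE cluster.
Steps:
1. Deploy the new FE with the same configuration file as the running single FE.
2. Start the new FE with --helper, specifying the running FE as the helper: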
sh start_fe.sh --helper 192.168.40.81:9010 --daemon
3. Check the new FE's logs.
fe.warning.log contains the following:
2021-11-16 10:38:26,686 WARN (main|1) [Catalog.getClusterIdAndRole():925] current node is not added to the group. please add it first. sleep 5 seconds and retry, current helper nodes: [192.168.40.81:9010]
2021-11-16 10:38:31,694 WARN (main|1) [Catalog.getFeNodeTypeAndNameFromHelpers():1048] failed to get fe node type from helper node: 192.168.40.81:9010. response code: 400
2021-11-16 10:38:31,695 WARN (main|1) [Catalog.getClusterIdAndRole():925] current node is not added to the group. please add it first. sleep 5 seconds and retry, current helper nodes: [192.168.40.81:9010]
2021-11-16 10:38:36,708 WARN (main|1) [Catalog.getFeNodeTypeAndNameFromHelpers():1048] failed to get fe node type from helper node: 192.168.40.81:9010. response code: 400
2021-11-16 10:38:36,709 WARN (main|1) [Catalog.getClusterIdAndRole():925] current node is not added to the group. please add it first. sleep 5 seconds and retry, current helper nodes: [192.168.40.81:9010]
2021-11-16 10:38:41,720 WARN (main|1) [Catalog.getFeNodeTypeAndNameFromHelpers():1048] failed to get fe node type from helper node: 192.168.40.81:9010. response code: 400
2021-11-16 10:38:41,721 WARN (main|1) [Catalog.getClusterIdAndRole():925] current node is not added to the group. please add it first. sleep 5 seconds and retry, current helper nodes: [192.168.40.81:9010]
2021-11-16 10:38:46,731 WARN (main|1) [Catalog.getFeNodeTypeAndNameFromHelpers():1048] failed to get fe node type from helper node: 192.168.40.81:9010. response code: 400
2021-11-16 10:38:46,732 WARN (main|1) [Catalog.getClusterIdAndRole():925] current node is not added to the group. please add it first. sleep 5 seconds and retry, current helper nodes: [192.168.40.81:9010]
2021-11-16 10:38:51,740 WARN (main|1) [Catalog.getFeNodeTypeAndNameFromHelpers():1048] failed to get fe node type from helper node: 192.168.40.81:9010. response code: 400
2021-11-16 10:38:51,741 WARN (main|1) [Catalog.getClusterIdAndRole():925] current node is not added to the group. please add it first. sleep 5 seconds and retry, current helper nodes: [192.168.40.81:9010]
fe.log contains the following:
2021-11-16 10:38:11,184 INFO (main|1) [DorisDbFe.start():102] DorisDb FE starting...
2021-11-16 10:38:11,191 INFO (main|1) [FrontendOptions.analyzePriorityCidrs():121] configured prior_cidrs value: 192.168.40.0/24
2021-11-16 10:38:11,203 INFO (main|1) [FrontendOptions.init():89] local address: /192.168.40.98.
2021-11-16 10:38:11,308 INFO (main|1) [ConsistencyChecker.initWorkTime():106] consistency checker will work from 23:00 to 4:00
2021-11-16 10:38:11,611 INFO (main|1) [Catalog.getHelperNodes():1151] get helper nodes: [192.168.40.81:9010]
2021-11-16 10:38:11,651 WARN (main|1) [Catalog.getFeNodeTypeAndNameFromHelpers():1048] failed to get fe node type from helper node: 192.168.40.81:9010. response code: 400
2021-11-16 10:38:11,652 WARN (main|1) [Catalog.getClusterIdAndRole():925] current node is not added to the group. please add it first. sleep 5 seconds and retry, current helper nodes: [192.168.40.81:9010]
2021-11-16 10:38:16,661 WARN (main|1) [Catalog.getFeNodeTypeAndNameFromHelpers():1048] failed to get fe node type from helper node: 192.168.40.81:9010. response code: 400
2021-11-16 10:38:16,662 WARN (main|1) [Catalog.getClusterIdAndRole():925] current node is not added to the group. please add it first. sleep 5 seconds and retry, current helper nodes: [192.168.40.81:9010]
2021-11-16 10:38:21,671 WARN (main|1) [Catalog.getFeNodeTypeAndNameFromHelpers():1048] failed to get fe node type from helper node: 192.168.40.81:9010. response code: 400
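The two WARN messages above repeat every 5 seconds, so the excerpts are easier to read once collapsed. Below is a minimal sketch that strips the timestamp, level, and thread fields and counts each distinct message. The function name summarize_fe_log is hypothetical, and the sketch assumes every entry starts with the "date time LEVEL (thread)" prefix seen in these excerpts.

```shell
#!/usr/bin/env bash
# Collapse repeated FE log lines into "count message" rows, most frequent first.
# Assumption: each entry begins with "YYYY-MM-DD HH:MM:SS,mmm LEVEL (thread) ".
# summarize_fe_log is a hypothetical helper, not part of StarRocks.
summarize_fe_log() {
  # Remove the date, time, level, and thread prefix, then count duplicates.
  sed -E 's/^[0-9-]+ [0-9:,]+ [A-Z]+ \([^)]*\) //' "$@" | sort | uniq -c | sort -rn
}

# Demo on three lines shortened from the excerpt above.
summarize_fe_log <<'EOF'
2021-11-16 10:38:26,686 WARN (main|1) [Catalog.getClusterIdAndRole():925] current node is not added to the group. please add it first.
2021-11-16 10:38:31,694 WARN (main|1) [Catalog.getFeNodeTypeAndNameFromHelpers():1048] failed to get fe node type from helper node: 192.168.40.81:9010. response code: 400
2021-11-16 10:38:31,695 WARN (main|1) [Catalog.getClusterIdAndRole():925] current node is not added to the group. please add it first.
EOF
```

The function also accepts file names as arguments, so it can be pointed at fe.warning.log directly; the path to that file depends on the installation.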
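At this point the newly started FE process is still running. After the add operation below, the FE stops on its own.
4. Connect to the original running FE with the mysql client and run the following commands:
MySQL [(none)]> SHOW PROC '/frontends'\G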
*************************** 1. row ***************************
Name: 192.168.40.81_9010_1630033959965
IP: 192.168.40.81
HostName: db-testware-01.novalocal
EditLogPort: 9010
HttpPort: 8030
QueryPort: 9030
RpcPort: 9020
Role: FOLLOWER
IsMaster: true
ClusterId: 720115122
Join: true
Alive: true
ReplayedJournalId: 2118732
LastHeartbeat: 2021-11-16 10:36:32
IsHelper: true
ErrMsg:
1 row in set (0.043 sec)
MySQL [(none)]> alter system add follower "192.168.40.98:9010";
Query OK, 0 rows affected (0.013 sec)
MySQL [(none)]> SHOW PROC '/frontends'\G
*************************** 1. row ***************************
Name: 192.168.40.81_9010_1630033959965
IP: 192.168.40.81
HostName: db-testware-01.novalocal
EditLogPort: 9010
HttpPort: 8030
QueryPort: 9030
RpcPort: 9020
Role: FOLLOWER
IsMaster: true
ClusterId: 720115122
Join: true
Alive: true
ReplayedJournalId: 2118778
LastHeartbeat: 2021-11-16 10:39:03
IsHelper: true
ErrMsg:
*************************** 2. row ***************************
Name: 192.168.40.98_9010_1637030333352
IP: 192.168.40.98
HostName: 192.168.40.98
EditLogPort: 9010
HttpPort: 8030
QueryPort: 0
RpcPort: 0
Role: FOLLOWER
IsMaster: false
ClusterId: 720115122
Join: false
Alive: false
ReplayedJournalId: 0
LastHeartbeat: NULL
IsHelper: true
ErrMsg: got exception
2 rows in set (0.053 sec)
As shown, Join and Alive are both false for the newly added FE, and ErrMsg is "got exception".
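The same check can be done without reading every row by hand. Below is a minimal sketch that reads the vertical (\G) output of SHOW PROC '/frontends' and prints only the frontends whose Join or Alive is not true. The function name unhealthy_frontends is hypothetical; the field names (IP, Join, Alive, ErrMsg) are the ones in the output above.

```shell
#!/usr/bin/env bash
# List frontends whose Join or Alive field is not "true".
# Input: vertical (\G) output of SHOW PROC '/frontends', on stdin or from files.
# unhealthy_frontends is a hypothetical helper, not part of StarRocks.
unhealthy_frontends() {
  awk '
    # Print the row collected so far if it is unhealthy, then reset the fields.
    function report() {
      if (ip != "" && (joined != "true" || alive != "true"))
        print ip, "Join=" joined, "Alive=" alive, "ErrMsg=" errmsg
      ip = joined = alive = errmsg = ""
    }
    # A "*** N. row ***" separator starts a new frontend.
    /^\*+ [0-9]+\. row \*+/ { report(); next }
    {
      # Split "Key: value" at the first colon and trim both sides.
      pos = index($0, ":")
      if (pos == 0) next
      key = substr($0, 1, pos - 1)
      val = substr($0, pos + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == "IP") ip = val
      else if (key == "Join") joined = val
      else if (key == "Alive") alive = val
      else if (key == "ErrMsg") errmsg = val
    }
    END { report() }
  ' "$@"
}
```

One way to feed it, assuming the mysql client and a root login on the query port (9030 in the output above; the user name is an assumption): mysql -h 192.168.40.81 -P 9030 -uroot -e "SHOW PROC '/frontends'\G" | unhealthy_frontends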
5. Now check the new FE's logs.
New lines in fe.log:
2021-11-16 10:38:51,740 WARN (main|1) [Catalog.getFeNodeTypeAndNameFromHelpers():1048] failed to get fe node type from helper node: 192.168.40.81:9010. response code: 400
2021-11-16 10:38:51,741 WARN (main|1) [Catalog.getClusterIdAndRole():925] current node is not added to the group. please add it first. sleep 5 seconds and retry, current helper nodes: [192.168.40.81:9010]
2021-11-16 10:38:56,757 INFO (main|1) [Catalog.getFeNodeTypeAndNameFromHelpers():1078] get fe node type FOLLOWER, name 192.168.40.98_9010_1637030333352 from 192.168.40.81:8030
2021-11-16 10:38:57,006 INFO (main|1) [Catalog.getClusterIdAndRole():1022] finished to get cluster id: 720115122, role: FOLLOWER and node name: 192.168.40.98_9010_1637030333352
2021-11-16 10:38:57,047 INFO (main|1) [Catalog.loadImage():1482] start load image from /mnt/dorisdb/fe/doris-meta/image/image.2099283. is ckpt: false
2021-11-16 10:38:57,047 INFO (main|1) [Catalog.loadHeader():1623] finished replay header from image
2021-11-16 10:38:57,051 INFO (main|1) [Catalog.loadMasterInfo():1634] finished replay masterInfo from image
2021-11-16 10:38:57,261 INFO (main|1) [Catalog.loadDb():1677] finished replay databases from image
2021-11-16 10:38:57,275 INFO (main|1) [Catalog.loadLoadJob():1708] finished replay loadJob from image
2021-11-16 10:38:57,297 INFO (main|1) [Catalog.loadAlterJob():1740] finished replay alterJob from image
2021-11-16 10:38:57,297 INFO (main|1) [Catalog.loadRecycleBin():1882] finished replay recycleBin from image
2021-11-16 10:38:57,314 INFO (main|1) [Catalog.loadGlobalVariable():2192] finished replay globalVariable from image
2021-11-16 10:38:57,318 INFO (main|1) [Catalog.loadCluster():6349] finished replay cluster from image
2021-11-16 10:38:57,328 INFO (main|1) [Catalog.loadBrokers():6446] finished replay brokerMgr from image
2021-11-16 10:38:57,330 INFO (main|1) [Catalog.loadResources():1914] finished replay resources from image
2021-11-16 10:38:57,349 INFO (main|1) [Catalog.loadExportJob():1725] finished replay exportJob from image
2021-11-16 10:38:57,354 INFO (main|1) [Catalog.loadBackupHandler():1827] finished replay backupHandler from image
New lines in fe.out:
java.io.IOException: failed read PrivTable
at org.apache.doris.mysql.privilege.PrivTable.read(PrivTable.java:223)
at org.apache.doris.mysql.privilege.Auth.readFields(Auth.java:1407)
at org.apache.doris.catalog.Catalog.loadAuth(Catalog.java:1852)
at org.apache.doris.catalog.Catalog.loadImage(Catalog.java:1508)
at org.apache.doris.catalog.Catalog.initialize(Catalog.java:800)
at org.apache.doris.DorisDbFe.start(DorisDbFe.java:108)
at org.apache.doris.DorisDbFe.main(DorisDbFe.java:63)
Caused by: java.lang.ClassNotFoundException: com.starrocks.mysql.privilege.UserPrivTable
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.doris.mysql.privilege.PrivTable.read(PrivTable.java:213)
... 6 more
The newly added FE then shuts down, and ps no longer shows an FE process.
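In the fe.out trace, the failing frames belong to org.apache.doris classes, while the class that cannot be found is com.starrocks.mysql.privilege.UserPrivTable. One possible reading, which is an assumption and not something confirmed here, is that the image was written by a build that uses com.starrocks package names while the new FE is running a build that uses org.apache.doris names. Below is a minimal sketch that pulls both pieces out of a trace so they can be compared. The function names missing_class and frame_package_roots are hypothetical.

```shell
#!/usr/bin/env bash
# Both helpers are hypothetical; they only parse text and do not touch the FE.

# Print the class named on a ClassNotFoundException line, if any.
missing_class() {
  sed -nE 's/^.*java\.lang\.ClassNotFoundException: ([A-Za-z0-9_.$]+).*$/\1/p' "$@"
}

# Print the distinct first two package segments of the non-JDK stack frames.
frame_package_roots() {
  sed -nE 's/^[[:space:]]*at ([a-z0-9_]+\.[a-z0-9_]+)\..*$/\1/p' "$@" \
    | grep -Ev '^(java|javax|sun|jdk)\.' | sort -u
}

# Demo on lines taken from the fe.out excerpt above.
trace='java.io.IOException: failed read PrivTable
at org.apache.doris.mysql.privilege.PrivTable.read(PrivTable.java:223)
at org.apache.doris.catalog.Catalog.loadAuth(Catalog.java:1852)
Caused by: java.lang.ClassNotFoundException: com.starrocks.mysql.privilege.UserPrivTable
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)'

echo "missing class:  $(printf '%s\n' "$trace" | missing_class)"
echo "frame packages: $(printf '%s\n' "$trace" | frame_package_roots)"
```

If the missing class and the frame packages start with different roots, as they do here, comparing the binaries deployed on the new FE with those on the running FE would be a reasonable next step.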
[Attachments]