notify new FE type transfer: UNKNOWN

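[Details]
After restarting the FE nodes (3 FE nodes in total), one of them fails to start. Error: notify new FE type transfer: UNKNOWN
[Background] I retried the failing node twice without success, then shut down the FE on that node.
[Business impact]
[StarRocks version] Server version: 5.1.0 StarRocks version 1.19.3
[Cluster size] e.g. 3 FE (1 follower + 2 observers) + 10 BE (FE and BE deployed on the same hosts)
[Attachments]

  • fe.warn.log / be.warn.log / relevant screenshots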
    mysql> SHOW PROC '/frontends';
    +--------------------------------+-------------+---------------+-------------+----------+-----------+---------+----------+----------+------------+------+-------+-------------------+---------------+----------+---------------+
    | Name                           | IP          | HostName      | EditLogPort | HttpPort | QueryPort | RpcPort | Role     | IsMaster | ClusterId  | Join | Alive | ReplayedJournalId | LastHeartbeat | IsHelper | ErrMsg        |
    +--------------------------------+-------------+---------------+-------------+----------+-----------+---------+----------+----------+------------+------+-------+-------------------+---------------+----------+---------------+
    | xxx.xx.8.12_9010_1632279278328 | xxx.xx.8.12 | pdorisanebd02 | 9010        | 8030     | 9030      | 9020    | FOLLOWER | false    | 1202495291 | true | true  | 290917118         | NULL          | true     |               |
    | xxx.xx.8.11_9010_1632279142597 | xxx.xx.8.11 | pdorisanebd01 | 9010        | 8030     | 9030      | 9020    | FOLLOWER | false    | 1202495291 | true | false | 290909581         | NULL          | true     | Unknown error |
    | xxx.xx.8.13_9010_1645852054447 | xxx.xx.8.13 | pdorisanebd03 | 9010        | 8030     | 9030      | 9020    | FOLLOWER | true     | 1202495291 | true | true  | 290917091         | NULL          | true     |               |
    +--------------------------------+-------------+---------------+-------------+----------+-----------+---------+----------+----------+------------+------+-------+-------------------+---------------+----------+---------------+

mysql> SHOW PROC '/backends';
+-----------+-----------------+-------------+---------------+---------------+--------+----------+----------+---------------+---------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------+---------+----------------------------------------+-------------------+-------------+
| BackendId | Cluster         | IP          | HostName      | HeartbeatPort | BePort | HttpPort | BrpcPort | LastStartTime | LastHeartbeat | Alive | SystemDecommissioned | ClusterDecommissioned | TabletNum | DataUsedCapacity | AvailCapacity | TotalCapacity | UsedPct | MaxDiskUsedPct | ErrMsg | Version | Status                                 | DataTotalCapacity | DataUsedPct |
+-----------+-----------------+-------------+---------------+---------------+--------+----------+----------+---------------+---------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------+---------+----------------------------------------+-------------------+-------------+
| 12089     | default_cluster | xxx.xx.8.14 | pdorisanebd04 | 9050          | 9060   | 8040     | 8060     | NULL          | NULL          | true  | false                | false                 | 174488    | .000             | 3.824 TB      | 4.029 TB      | 5.09 %  | 5.09 %         |        |         | {"lastSuccessReportTabletsTime":"N/A"} | 3.824 TB          | 0.00 %      |
| 12090     | default_cluster | xxx.xx.8.15 | pdorisanebd05 | 9050          | 9060   | 8040     | 8060     | NULL          | NULL          | true  | false                | false                 | 175302    | .000             | 3.824 TB      | 4.029 TB      | 5.09 %  | 5.09 %         |        |         | {"lastSuccessReportTabletsTime":"N/A"} | 3.824 TB          | 0.00 %      |
| 12091     | default_cluster | xxx.xx.8.16 | pdorisanebd06 | 9050          | 9060   | 8040     | 8060     | NULL          | NULL          | true  | false                | false                 | 175309    | .000             | 3.824 TB      | 4.029 TB      | 5.09 %  | 5.09 %         |        |         | {"lastSuccessReportTabletsTime":"N/A"} | 3.824 TB          | 0.00 %      |
| 12092     | default_cluster | xxx.xx.8.17 | pdorisanebd07 | 9050          | 9060   | 8040     | 8060     | NULL          | NULL          | true  | false                | false                 | 174532    | .000             | 3.824 TB      | 4.029 TB      | 5.09 %  | 5.09 %         |        |         | {"lastSuccessReportTabletsTime":"N/A"} | 3.824 TB          | 0.00 %      |
| 12093     | default_cluster | xxx.xx.8.18 | pdorisanebd08 | 9050          | 9060   | 8040     | 8060     | NULL          | NULL          | true  | false                | false                 | 175267    | .000             | 3.824 TB      | 4.029 TB      | 5.09 %  | 5.09 %         |        |         | {"lastSuccessReportTabletsTime":"N/A"} | 3.824 TB          | 0.00 %      |
| 12094     | default_cluster | xxx.xx.8.19 | pdorisanebd09 | 9050          | 9060   | 8040     | 8060     | NULL          | NULL          | true  | false                | false                 | 175269    | .000             | 3.824 TB      | 4.029 TB      | 5.09 %  | 5.09 %         |        |         | {"lastSuccessReportTabletsTime":"N/A"} | 3.824 TB          | 0.00 %      |
| 12095     | default_cluster | xxx.xx.8.20 | pdorisanebd10 | 9050          | 9060   | 8040     | 8060     | NULL          | NULL          | true  | false                | false                 | 175275    | .000             | 3.824 TB      | 4.029 TB      | 5.09 %  | 5.09 %         |        |         | {"lastSuccessReportTabletsTime":"N/A"} | 3.824 TB          | 0.00 %      |
| 1028353   | default_cluster | xxx.xx.8.21 | pdorisanebd11 | 9050          | 9060   | 8040     | 8060     | NULL          | NULL          | true  | false                | false                 | 169315    | .000             | 3.824 TB      | 4.029 TB      | 5.09 %  | 5.09 %         |        |         | {"lastSuccessReportTabletsTime":"N/A"} | 3.824 TB          | 0.00 %      |
| 1028464   | default_cluster | xxx.xx.8.22 | pdorisanebd12 | 9050          | 9060   | 8040     | 8060     | NULL          | NULL          | true  | false                | false                 | 170463    | .000             | 3.824 TB      | 4.029 TB      | 5.09 %  | 5.09 %         |        |         | {"lastSuccessReportTabletsTime":"N/A"} | 3.824 TB          | 0.00 %      |
| 1028466   | default_cluster | xxx.xx.8.23 | pdorisanebd13 | 9050          | 9060   | 8040     | 8060     | NULL          | NULL          | true  | false                | false                 | 171415    | .000             | 3.824 TB      | 4.029 TB      | 5.09 %  | 5.09 %         |        |         | {"lastSuccessReportTabletsTime":"N/A"} | 3.824 TB          | 0.00 %      |
+-----------+-----------------+-------------+---------------+---------------+--------+----------+----------+---------------+---------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+--------+---------+----------------------------------------+-------------------+-------------+
10 rows in set (0.03 sec)

fe.log

2022-08-30 15:12:45,101 WARN (UNKNOWN xxx.xx.8.11_9010_1632279142597(-1)|1) [Catalog.notifyNewFETypeTransfer():2314] notify new FE type transfer: UNKNOWN
2022-08-30 15:16:57,083 INFO (main|1) [StarRocksFE.start():104] StarRocks FE starting...
2022-08-30 15:16:57,088 INFO (main|1) [FrontendOptions.analyzePriorityCidrs():123] configured prior_cidrs value: xxx.xx.8.0/24;
2022-08-30 15:16:57,092 INFO (main|1) [FrontendOptions.init():91] local address: /xxx.xx.8.10.
2022-08-30 15:16:57,290 INFO (main|1) [Catalog.getHelperNodes():1173] get helper nodes: [xxx.xx.8.10:9010]
2022-08-30 15:16:57,304 INFO (main|1) [Catalog.getClusterIdAndRole():1060] finished to get cluster id: 1202495291, role: FOLLOWER and node name: xxx.xx.8.11_9010_1632279142597
2022-08-30 15:16:57,314 INFO (main|1) [Catalog.loadImage():1468] start load image from /doris/fe_9010/doris-meta/image/image.290845373. is ckpt: false
2022-08-30 15:16:57,314 INFO (main|1) [Catalog.loadHeader():1611] finished replay header from image
2022-08-30 15:16:57,315 INFO (main|1) [Catalog.loadMasterInfo():1622] finished replay masterInfo from image
2022-08-30 15:16:59,631 INFO (main|1) [Catalog.loadDb():1665] finished replay databases from image
2022-08-30 15:17:01,706 INFO (main|1) [Catalog.loadLoadJob():1696] finished replay loadJob from image
2022-08-30 15:17:01,706 INFO (main|1) [Catalog.loadAlterJob():1728] finished replay alterJob from image
2022-08-30 15:17:01,885 INFO (main|1) [Catalog.loadRecycleBin():1870] finished replay recycleBin from image
2022-08-30 15:17:01,901 INFO (main|1) [Catalog.loadGlobalVariable():2180] finished replay globalVariable from image
2022-08-30 15:17:01,904 INFO (main|1) [Catalog.loadCluster():6589] finished replay cluster from image
2022-08-30 15:17:01,906 INFO (main|1) [Catalog.loadBrokers():6786] finished replay brokerMgr from image
2022-08-30 15:17:01,908 INFO (main|1) [Catalog.loadResources():1902] finished replay resources from image
2022-08-30 15:17:01,908 INFO (main|1) [Catalog.loadExportJob():1713] finished replay exportJob from image
2022-08-30 15:17:01,908 INFO (main|1) [Catalog.loadBackupHandler():1815] finished replay backupHandler from image
2022-08-30 15:17:01,913 INFO (main|1) [Catalog.loadAuth():1842] finished replay auth from image
2022-08-30 15:17:14,948 INFO (main|1) [Catalog.loadTransactionState():1851] finished replay transactionState from image
2022-08-30 15:17:14,950 INFO (main|1) [Catalog.loadColocateTableIndex():1878] finished replay colocateTableIndex from image
2022-08-30 15:17:14,997 INFO (main|1) [TxnStateCallbackFactory.addCallback():41] add callback of txn state : 872865. current callback size: 1
2022-08-30 15:17:15,012 INFO (main|1) [TxnStateCallbackFactory.addCallback():41] add callback of txn state : 280795. current callback size: 2
2022-08-30 15:17:15,013 INFO (main|1) [Catalog.loadRoutineLoadJobs():1886] finished replay routineLoadJobs from image
2022-08-30 15:17:19,387 INFO (main|1) [Catalog.loadLoadJobsV2():1894] finished replay loadJobsV2 from image
2022-08-30 15:17:19,388 INFO (main|1) [Catalog.loadSmallFiles():1910] finished replay smallFiles from image
2022-08-30 15:17:19,388 INFO (main|1) [Catalog.loadPlugins():7262] finished replay plugins from image
2022-08-30 15:17:19,451 INFO (main|1) [Catalog.loadDeleteHandler():1828] finished replay deleteHandler from image
2022-08-30 15:17:19,455 INFO (main|1) [Catalog.loadAnalyze():7298] finished replay analyze job from image
2022-08-30 15:17:19,455 INFO (main|1) [Catalog.loadImage():1518] finished to load image in 22141 ms
2022-08-30 15:17:20,279 INFO (UNKNOWN xxx.xx.8.11_9010_1632279142597(-1)|1) [BDBEnvironment.setup():168] add helper[xxx.xx.8.10:9010] as ReplicationGroupAdmin
2022-08-30 15:17:20,284 WARN (UNKNOWN xxx.xx.8.11_9010_1632279142597(-1)|1) [Catalog.notifyNewFETypeTransfer():2314] notify new FE type transfer: UNKNOWN

fe.out
[2022-08-30 15:12:45] notify new FE type transfer: UNKNOWN
com.sleepycat.je.EnvironmentFailureException: (JE 7.3.7) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.3.7) xxx.xx.8.11_9010_1632279142597(1):/doris/fe_9010/doris-meta/bdb Feeder: xxx.xx.8.13_9010_1645852054447(5). Conflicting hostnames for replica id: xxx.xx.8.11_9010_1632279142597(1) Feeder thinks it is: xxx.xx.8.11 Replica is configured to use: xxx.xx.8.10 HANDSHAKE_ERROR: Error during the handshake between two nodes. Some validity or compatibility check failed, preventing further communication between the nodes. Environment is invalid and must be closed.
at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228)
at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766)
at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:827)
at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:781)
at com.sleepycat.je.log.LogManager.getLogEntryHandleFileNotFound(LogManager.java:932)
at com.sleepycat.je.dbi.DiskOrderedScanner.fetchEntry(DiskOrderedScanner.java:2062)
at com.sleepycat.je.dbi.DiskOrderedScanner.fetchAndProcessBINs(DiskOrderedScanner.java:1634)
at com.sleepycat.je.dbi.DiskOrderedScanner.scanSerial(DiskOrderedScanner.java:783)
at com.sleepycat.je.dbi.DiskOrderedScanner.scan(DiskOrderedScanner.java:703)
at com.sleepycat.je.dbi.DatabaseImpl.count(DatabaseImpl.java:2266)
at com.sleepycat.je.Database.count(Database.java:1910)
at com.starrocks.journal.bdbje.BDBJEJournal.getMaxJournalId(BDBJEJournal.java:223)
at com.starrocks.journal.bdbje.BDBJEJournal.open(BDBJEJournal.java:304)
at com.starrocks.persist.EditLog.open(EditLog.java:839)
at com.starrocks.catalog.Catalog.initialize(Catalog.java:837)
at com.starrocks.StarRocksFE.start(StarRocksFE.java:110)
at com.starrocks.StarRocksFE.main(StarRocksFE.java:65)
Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.3.7) xxx.xx.8.11_9010_1632279142597(1):/doris/fe_9010/doris-meta/bdb Feeder: xxx.xx.8.13_9010_1645852054447(5). Conflicting hostnames for replica id: xxx.xx.8.11_9010_1632279142597(1) Feeder thinks it is: xxx.xx.8.11 Replica is configured to use: xxx.xx.8.10 HANDSHAKE_ERROR: Error during the handshake between two nodes. Some validity or compatibility check failed, preventing further communication between the nodes. Environment is invalid and must be closed.
at com.sleepycat.je.rep.stream.ReplicaFeederHandshake.verifyMembership(ReplicaFeederHandshake.java:334)
at com.sleepycat.je.rep.stream.ReplicaFeederHandshake.execute(ReplicaFeederHandshake.java:259)
at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:691)
at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:474)
at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:409)
at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1873)
using java version 8
-Xmx65536m -XX:+UseMembar -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -Xloggc:/doris/fe_9010/log/fe.gc.log.20220830-151655
Tue Aug 30 15:16:56 CST 2022

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/doris/fe_9010/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/doris/fe_9010/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[2022-08-30 15:17:20] notify new FE type transfer: UNKNOWN
com.sleepycat.je.EnvironmentFailureException: (JE 7.3.7) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.3.7) xxx.xx.8.11_9010_1632279142597(1):/doris/fe_9010/doris-meta/bdb Feeder: xxx.xx.8.13_9010_1645852054447(5). Conflicting hostnames for replica id: xxx.xx.8.11_9010_1632279142597(1) Feeder thinks it is: xxx.xx.8.11 Replica is configured to use: xxx.xx.8.10 HANDSHAKE_ERROR: Error during the handshake between two nodes. Some validity or compatibility check failed, preventing further communication between the nodes. Environment is invalid and must be closed.
at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228)
at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766)
at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:827)
at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:781)
at com.sleepycat.je.log.LogManager.getLogEntryHandleFileNotFound(LogManager.java:932)
at com.sleepycat.je.dbi.DiskOrderedScanner.fetchEntry(DiskOrderedScanner.java:2062)
at com.sleepycat.je.dbi.DiskOrderedScanner.fetchAndProcessBINs(DiskOrderedScanner.java:1634)
at com.sleepycat.je.dbi.DiskOrderedScanner.scanSerial(DiskOrderedScanner.java:783)
at com.sleepycat.je.dbi.DiskOrderedScanner.scan(DiskOrderedScanner.java:703)
at com.sleepycat.je.dbi.DatabaseImpl.count(DatabaseImpl.java:2266)
at com.sleepycat.je.Database.count(Database.java:1910)
at com.starrocks.journal.bdbje.BDBJEJournal.getMaxJournalId(BDBJEJournal.java:223)
at com.starrocks.journal.bdbje.BDBJEJournal.open(BDBJEJournal.java:304)
at com.starrocks.persist.EditLog.open(EditLog.java:839)
at com.starrocks.catalog.Catalog.initialize(Catalog.java:837)
at com.starrocks.StarRocksFE.start(StarRocksFE.java:110)
at com.starrocks.StarRocksFE.main(StarRocksFE.java:65)
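
To make the address mismatch in the handshake error easier to spot, here is a minimal Python sketch that pulls the two addresses out of the message. It assumes the message format shown in the fe.out excerpt above; the name `find_hostname_conflict` and the `SAMPLE` string are illustrative only and are not part of StarRocks or BDB JE.

```python
import re

# Relevant part of the error message, copied from the fe.out excerpt above.
SAMPLE = (
    "Conflicting hostnames for replica id: xxx.xx.8.11_9010_1632279142597(1) "
    "Feeder thinks it is: xxx.xx.8.11 Replica is configured to use: xxx.xx.8.10 "
    "HANDSHAKE_ERROR: Error during the handshake between two nodes."
)

# Assumed message layout: node name, then the address the feeder has on record,
# then the address the restarting replica is configured with.
PATTERN = re.compile(
    r"Conflicting hostnames for replica id: (?P<node>\S+)\s+"
    r"Feeder thinks it is: (?P<feeder>\S+)\s+"
    r"Replica is configured to use: (?P<replica>\S+)"
)


def find_hostname_conflict(text):
    """Return (node, feeder_address, replica_address), or None if no conflict is reported."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group("node"), match.group("feeder"), match.group("replica")


if __name__ == "__main__":
    print(find_hostname_conflict(SAMPLE))
```

On the excerpt above, this gives xxx.xx.8.11 as the address the feeder has on record and xxx.xx.8.10 as the address the replica is configured to use; the latter is the same address reported by the `local address: /xxx.xx.8.10` line in fe.log.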

fe.warn.log

2022-08-30T15:17:17.933+0800: 21.453: [CMS-concurrent-sweep: 1.756/3.093 secs] [Times: user=34.51 sys=1.59, real=3.09 secs]
2022-08-30T15:17:17.937+0800: 21.457: [CMS-concurrent-reset-start]
2022-08-30T15:17:17.981+0800: 21.501: [GC (Allocation Failure) 2022-08-30T15:17:17.981+0800: 21.501: [ParNew: 629120K->69888K(629120K), 0.3227240 secs] 6256507K->6005379K(10008100K), 0.3228394 secs] [Times: user=12.49 sys=0.10, real=0.32 secs]
2022-08-30T15:17:18.374+0800: 21.894: [CMS-concurrent-reset: 0.113/0.437 secs] [Times: user=12.72 sys=0.10, real=0.44 secs]
2022-08-30T15:17:18.654+0800: 22.174: [GC (Allocation Failure) 2022-08-30T15:17:18.654+0800: 22.174: [ParNew: 629120K->69888K(629120K), 0.3013505 secs] 6564611K->6313481K(10008100K), 0.3014510 secs] [Times: user=11.66 sys=0.12, real=0.30 secs]
2022-08-30T15:17:19.456+0800: 22.976: [GC (Allocation Failure) 2022-08-30T15:17:19.456+0800: 22.976: [ParNew: 629120K->69888K(629120K), 0.3217757 secs] 6872713K->6628710K(10008100K), 0.3218803 secs] [Times: user=12.59 sys=0.11, real=0.33 secs]
2022-08-30T15:17:19.778+0800: 23.298: [GC (CMS Initial Mark) [1 CMS-initial-mark: 6558822K(9378980K)] 6628710K(10008100K), 0.0273049 secs] [Times: user=0.08 sys=0.00, real=0.02 secs]
2022-08-30T15:17:19.805+0800: 23.325: [CMS-concurrent-mark-start]
Heap
par new generation total 629120K, used 293146K [0x00007f8028000000, 0x00007f8052aa0000, 0x00007f813b990000)
eden space 559232K, 39% used [0x00007f8028000000, 0x00007f8035a068e8, 0x00007f804a220000)
from space 69888K, 100% used [0x00007f804a220000, 0x00007f804e660000, 0x00007f804e660000)
to space 69888K, 0% used [0x00007f804e660000, 0x00007f804e660000, 0x00007f8052aa0000)
concurrent mark-sweep generation total 9378980K, used 6558822K [0x00007f813b990000, 0x00007f83780b9000, 0x00007f9028000000)
Metaspace used 31802K, capacity 32380K, committed 32768K, reserved 32768K

Please help me figure out what is causing this.