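To help us locate your problem faster, please provide the following information. Thank you.
【Details】Detailed description of the problem
Connecting to the FE fails with the error `Could not determine master from helpers`. We then found that two of the FE nodes had gone down.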
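【Background】What operations were performed?
【Business impact】
【Shared-data (storage-compute separation) mode?】
Three nodes in a mixed (co-located) deployment: 1 leader, 2 followers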
【StarRocks version】3.0.8
【Attachments】
- fe.log / be.INFO / relevant screenshots
fe.log
2024-06-18 00:00:23,768 WARN (thrift-server-pool-20533139|20634633) [FrontendServiceImpl.loadTxnCommit():1230] failed to commit txn_id: 84330352: get database write lock timeout, database=fin_test, timeoutMillis=15000
2024-06-18 00:00:24,505 WARN (starrocks-mysql-nio-pool-114|20538036) [AcceptListener.lambda$handleEvent$1():116] connect processor exception because
2024-06-18 00:00:24,944 WARN (starrocks-mysql-nio-pool-114|20538036) [AcceptListener.lambda$handleEvent$1():116] connect processor exception because
2024-06-18 00:00:26,527 WARN (JournalWriter|22188) [BDBJEJournal.rebuildCurrentTransaction():444] transaction is invalid, rebuild the txn with 2 kvs
2024-06-18 00:00:29,508 WARN (starrocks-mysql-nio-pool-114|20538036) [AcceptListener.lambda$handleEvent$1():116] connect processor exception because
2024-06-18 00:00:29,945 WARN (starrocks-mysql-nio-pool-114|20538036) [AcceptListener.lambda$handleEvent$1():116] connect processor exception because
2024-06-18 00:00:32,845 WARN (es repository|42) [EsRepository.runAfterCatalogReady():104] Thread es repository: Exception happens when fetch index [journal_info] meta data from remote es cluster. Table info: [Table [id=2754732, name=journal_info, type=ELASTICSEARCH]]
2024-06-18 00:00:34,511 WARN (starrocks-mysql-nio-pool-114|20538036) [AcceptListener.lambda$handleEvent$1():116] connect processor exception because
2024-06-18 00:00:34,947 WARN (starrocks-mysql-nio-pool-114|20538036) [AcceptListener.lambda$handleEvent$1():116] connect processor exception because
2024-06-18 00:00:36,528 ERROR (JournalWriter|22188) [BDBJEJournal.batchWriteCommit():422] failed to commit journal after retried 2 times! txn[] db[CloseSafeDatabase{db=202876799}]
2024-06-18 00:00:38,599 WARN (thrift-server-pool-20533250|20634745) [Database.logTryLockFailureEvent():175] try db lock failed. type: writeLock, current owner id: 20634436, owner name: thrift-server-pool-20532944, owner stack: dump thread: thrift-server-pool-20532944, id: 20634436
2024-06-18 00:00:38,599 WARN (thrift-server-pool-20533250|20634745) [FrontendServiceImpl.loadTxnCommit():1230] failed to commit txn_id: 84330333: get database write lock timeout, database=fin_test, timeoutMillis=15000
2024-06-18 00:00:38,600 WARN (thrift-server-pool-20533252|20634747) [Database.logTryLockFailureEvent():175] try db lock failed. type: writeLock, current owner id: 20634436, owner name: thrift-server-pool-20532944, owner stack: dump thread: thrift-server-pool-20532944, id: 20634436
2024-06-18 00:00:38,600 WARN (thrift-server-pool-20533252|20634747) [FrontendServiceImpl.loadTxnCommit():1230] failed to commit txn_id: 84330334: get database write lock timeout, database=fin_test, timeoutMillis=15000
2024-06-18 00:00:38,601 WARN (thrift-server-pool-20533253|20634748) [Database.logTryLockFailureEvent():175] try db lock failed. type: writeLock, current owner id: 20634436, owner name: thrift-server-pool-20532944, owner stack: dump thread: thrift-server-pool-20532944, id: 20634436
2024-06-18 00:00:38,601 WARN (thrift-server-pool-20533251|20634746) [Database.logTryLockFailureEvent():175] try db lock failed. type: writeLock, current owner id: 20634436, owner name: thrift-server-pool-20532944, owner stack: dump thread: thrift-server-pool-20532944, id: 20634436
2024-06-18 00:00:38,601 WARN (thrift-server-pool-20533253|20634748) [FrontendServiceImpl.loadTxnCommit():1230] failed to commit txn_id: 84330337: get database write lock timeout, database=fin_test, timeoutMillis=15000
2024-06-18 00:00:38,601 WARN (thrift-server-pool-20533251|20634746) [FrontendServiceImpl.loadTxnCommit():1230] failed to commit txn_id: 84330332: get database write lock timeout, database=fin_test, timeoutMillis=15000
2024-06-18 00:00:38,603 WARN (thrift-server-pool-20533254|20634749) [Database.logTryLockFailureEvent():175] try db lock failed. type: writeLock, current owner id: 20634436, owner name: thrift-server-pool-20532944, owner stack: dump thread: thrift-server-pool-20532944, id: 20634436
2024-06-18 00:00:38,603 WARN (thrift-server-pool-20533254|20634749) [FrontendServiceImpl.loadTxnCommit():1230] failed to commit txn_id: 84330330: get database write lock timeout, database=fin_test, timeoutMillis=15000
2024-06-18 00:00:39,512 WARN (starrocks-mysql-nio-pool-114|20538036) [AcceptListener.lambda$handleEvent$1():116] connect processor exception because
2024-06-18 00:00:39,623 WARN (thrift-server-pool-20533261|20634756) [Database.logTryLockFailureEvent():175] try db lock failed. type: writeLock, current owner id: 20634436, owner name: thrift-server-pool-20532944, owner stack: dump thread: thrift-server-pool-20532944, id: 20634436
2024-06-18 00:00:39,624 WARN (thrift-server-pool-20533261|20634756) [FrontendServiceImpl.loadTxnCommit():1230] failed to commit txn_id: 84330336: get database write lock timeout, database=fin_test, timeoutMillis=15000
2024-06-18 00:00:39,949 WARN (starrocks-mysql-nio-pool-114|20538036) [AcceptListener.lambda$handleEvent$1():116] connect processor exception because
2024-06-18 00:00:40,135 WARN (thrift-server-pool-20533268|20634763) [Database.logTryLockFailureEvent():175] try db lock failed. type: writeLock, current owner id: 20634436, owner name: thrift-server-pool-20532944, owner stack: dump thread: thrift-server-pool-20532944, id: 20634436
2024-06-18 00:00:40,135 WARN (thrift-server-pool-20533267|20634762) [Database.logTryLockFailureEvent():175] try db lock failed. type: writeLock, current owner id: 20634436, owner name: thrift-server-pool-20532944, owner stack: dump thread: thrift-server-pool-20532944, id: 20634436
2024-06-18 00:00:40,135 WARN (thrift-server-pool-20533268|20634763) [FrontendServiceImpl.loadTxnCommit():1230] failed to commit txn_id: 84330339: get database write lock timeout, database=fin_test, timeoutMillis=15000
2024-06-18 00:00:40,136 WARN (thrift-server-pool-20533267|20634762) [FrontendServiceImpl.loadTxnCommit():1230] failed to commit txn_id: 84330338: get database write lock timeout, database=fin_test, timeoutMillis=15000
2024-06-18 00:00:41,529 WARN (JournalWriter|22188) [BDBJEJournal.rebuildCurrentTransaction():444] transaction is invalid, rebuild the txn with 2 kvs
2024-06-18 00:00:42,182 WARN (thrift-server-pool-20533284|20634779) [Database.logTryLockFailureEvent():175] try db lock failed. type: writeLock, current owner id: 20634436, owner name: thrift-server-pool-20532944, owner stack: dump thread: thrift-server-pool-20532944, id: 20634436
2024-06-18 00:00:42,182 WARN (thrift-server-pool-20533284|20634779) [FrontendServiceImpl.loadTxnCommit():1230] failed to commit txn_id: 84330340: get database write lock timeout, database=fin_test, timeoutMillis=15000
2024-06-18 00:00:43,079 WARN (es repository|42) [EsRepository.runAfterCatalogReady():104] Thread es repository: Exception happens when fetch index [journal_info] meta data from remote es cluster. Table info: [Table [id=2754732, name=journal_info, type=ELASTICSEARCH]]
2024-06-18 00:00:44,514 WARN (starrocks-mysql-nio-pool-114|20538036) [AcceptListener.lambda$handleEvent$1():116] connect processor exception because
2024-06-18 00:00:44,951 WARN (starrocks-mysql-nio-pool-114|20538036) [AcceptListener.lambda$handleEvent$1():116] connect processor exception because
2024-06-18 00:00:49,516 WARN (starrocks-mysql-nio-pool-114|20538036) [AcceptListener.lambda$handleEvent$1():116] connect processor exception because
2024-06-18 00:00:49,953 WARN (starrocks-mysql-nio-pool-114|20538036) [AcceptListener.lambda$handleEvent$1():116] connect processor exception because
2024-06-18 00:00:51,530 ERROR (JournalWriter|22188) [BDBJEJournal.batchWriteCommit():422] failed to commit journal after retried 3 times! txn[] db[CloseSafeDatabase{db=202876799}]
2024-06-18 00:00:51,530 WARN (JournalWriter|22188) [JournalWriter.writeOneBatch():133] failed to commit batch, will abort current 2 journals.
2024-06-18 00:00:51,531 WARN (JournalWriter|22188) [BDBJEJournal.batchWriteAbort():480] failed to abort transaction because no running transaction, will just ignore and return.
2024-06-18 00:00:51,531 ERROR (JournalWriter|22188) [JournalWriter.abortJournalTask():176] failed to commit journal after retried 3 times! txn[] db[CloseSafeDatabase{db=202876799}]
fe.out
WARNING: correlationId:5923464 timeout with bound channel =>[id: 0x7847afe0, L:/10.118.1.183:42162 - R:/10.118.1.180:8060]
[2024-06-18 00:00:51] failed to commit journal after retried 3 times! txn[] db[CloseSafeDatabase{db=202876799}]
using java version 8
-Xmx16384m -XX:+UseMembar -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -Xloggc:/home/qateadmin/StarRocks-3.1.4/fe/log/fe.gc.log.20240618-102010ll
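
To make the fe.log excerpt above easier to read, here is a rough sketch of a script that tallies which thread holds the database write lock, how many `loadTxnCommit` calls timed out per database, and how many journal commit failures were logged. The regular expressions are assumptions based only on the line shapes in this excerpt, the function name `summarize` is made up for illustration, and the embedded sample holds just three lines copied from the log. It only summarizes the symptoms and does not diagnose the root cause.

```python
import re
from collections import Counter

# Three lines copied verbatim from the fe.log excerpt in this post.
SAMPLE = """\
2024-06-18 00:00:38,599 WARN (thrift-server-pool-20533250|20634745) [Database.logTryLockFailureEvent():175] try db lock failed. type: writeLock, current owner id: 20634436, owner name: thrift-server-pool-20532944, owner stack: dump thread: thrift-server-pool-20532944, id: 20634436
2024-06-18 00:00:38,599 WARN (thrift-server-pool-20533250|20634745) [FrontendServiceImpl.loadTxnCommit():1230] failed to commit txn_id: 84330333: get database write lock timeout, database=fin_test, timeoutMillis=15000
2024-06-18 00:00:51,530 ERROR (JournalWriter|22188) [BDBJEJournal.batchWriteCommit():422] failed to commit journal after retried 3 times! txn[] db[CloseSafeDatabase{db=202876799}]
"""

# Patterns inferred from the excerpt only; other FE versions may log differently.
LOCK_OWNER = re.compile(r"try db lock failed\. type: \w+, current owner id: \d+, owner name: ([^,]+),")
TXN_TIMEOUT = re.compile(r"failed to commit txn_id: \d+: get database write lock timeout, database=(\w+)")
JOURNAL_FAIL = re.compile(r"failed to commit journal after retried \d+ times")


def summarize(text):
    """Count lock owners, per-database commit timeouts, and journal commit failures."""
    owners = Counter()
    timeouts = Counter()
    journal_failures = 0
    for line in text.splitlines():
        owner = LOCK_OWNER.search(line)
        if owner:
            owners[owner.group(1)] += 1
        timeout = TXN_TIMEOUT.search(line)
        if timeout:
            timeouts[timeout.group(1)] += 1
        if JOURNAL_FAIL.search(line):
            journal_failures += 1
    return {
        "lock_owners": dict(owners),
        "timeouts_by_db": dict(timeouts),
        "journal_commit_failures": journal_failures,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To run it against the full log, replace `SAMPLE` with the contents of fe.log, for example `summarize(open("fe.log", encoding="utf-8").read())`.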