BE nodes report java.net.ConnectException: 拒绝连接 (Connection refused)

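【Details】BE nodes report java.net.ConnectException: 拒绝连接 (Connection refused)
【Background】Creating a table fails with an error saying that 3 BE nodes are required. Checking the BE nodes with show backends shows java.net.ConnectException: 拒绝连接 (Connection refused):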
| 10002 | default_cluster | 192.168.2.231 | 9051 | 9061 | 8043 | 8061 | 2022-10-27 13:23:10 | 2022-12-01 13:42:54 | false | false | false | 930 | 95.806 GB | 237.442 GB | 849.585 GB | 72.05 % | 72.05 % | java.net.ConnectException: 拒绝连接 (Connection refused) | 2.2.2-e27e2aa | {"lastSuccessReportTabletsTime":"2022-12-01 13:42:05"} | 333.248 GB | 28.75 % | 32 |
| 10003 | default_cluster | 192.168.2.232 | 9051 | 9061 | 8043 | 8061 | 2022-10-27 16:39:38 | 2022-12-01 13:42:54 | false | false | false | 773 | 49.683 GB | 211.333 GB | 849.585 GB | 75.13 % | 75.13 % | java.net.ConnectException: 拒绝连接 (Connection refused) | 2.2.2-e27e2aa | {"lastSuccessReportTabletsTime":"2022-12-01 13:42:11"} | 261.016 GB | 19.03 % | 32 |
| 10004 | default_cluster | 192.168.2.233 | 9051 | 9061 | 8043 | 8061 | 2022-10-27 16:39:43 | 2022-12-01 15:35:50 | true | false | false | 928 | 103.703 GB | 286.651 GB | 849.585 GB | 66.26 % | 66.26 % | | 2.2.2-e27e2aa | {"lastSuccessReportTabletsTime":"2022-12-01 15:35:28"} | 390.354 GB | 26.57 % | 32 |
| 10005 | default_cluster | 192.168.2.220 | 9051 | 9061 | 8043 | 8061 | 2022-10-27 16:39:43 | 2022-12-01 15:35:50 | true | false | false | 922 | 89.190 GB | 442.851 GB | 1.037 TB | 58.29 % | 58.29 % | | 2.2.2-e27e2aa | {"lastSuccessReportTabletsTime":"2022-12-01 15:34:55"} | 532.041 GB | 16.76 % | 32 |
【Business impact】
【StarRocks version】2.2.2
【Cluster size】2 FE + 4 BE
【Attachments】
be.out
start time: 2022年 10月 26日 星期三 16:05:46 CST
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1026 16:05:47.638027 238373 options.cpp:69] path can not be canonicalized. may be not exist. path=/home/apps/StarRocks-2.2.2/be/be/storage
W1026 16:05:47.638314 238373 options.cpp:139] fail to parse storage_root_path config. value=[/home/apps/StarRocks-2.2.2/be/be/storage]
F1026 16:05:47.638329 238373 starrocks_main.cpp:156] parse config storage path failed, path=/home/apps/StarRocks-2.2.2/be/be/storage
*** Check failure stack trace: ***
start time: 2022年 10月 26日 星期三 17:58:10 CST
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1026 17:58:11.369169 75279 utils.cpp:90] Fail to get master client from cache. host= port=0 code=THRIFT_RPC_ERROR
W1026 17:58:11.369247 75279 task_worker_pool.cpp:1355] Fail to report task to :0, err=-1
start time: 2022年 10月 27日 星期四 11:20:08 CST
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1027 11:20:10.563994 100676 utils.cpp:90] Fail to get master client from cache. host= port=0 code=THRIFT_RPC_ERROR
W1027 11:20:10.564064 100676 task_worker_pool.cpp:1355] Fail to report task to :0, err=-1
start time: 2022年 10月 27日 星期四 13:22:26 CST
start time: 2022年 10月 27日 星期四 13:23:06 CST
*** Check failure stack trace: ***
*** Check failure stack trace: ***
*** Check failure stack trace: ***
@ 0x3ca312d google::LogMessage::Fail()
@ 0x3ca312d google::LogMessage::Fail()
@ 0x3ca312d google::LogMessage::Fail()
@ 0x3ca559f google::LogMessage::SendToLog()
@ 0x3ca559f google::LogMessage::SendToLog()
@ 0x3ca559f google::LogMessage::SendToLog()
@ 0x3ca2c7e google::LogMessage::Flush()
@ 0x3ca2c7e google::LogMessage::Flush()
@ 0x3ca2c7e google::LogMessage::Flush()
@ 0x3ca5ba9 google::LogMessageFatal::~LogMessageFatal()
@ 0x3ca5ba9 google::LogMessageFatal::~LogMessageFatal()
@ 0x3ca5ba9 google::LogMessageFatal::~LogMessageFatal()
@ 0x19811c9 starrocks::TabletMeta::_save_meta()
@ 0x19811c9 starrocks::TabletMeta::_save_meta()
@ 0x19811c9 starrocks::TabletMeta::_save_meta()
@ 0x198127d starrocks::TabletMeta::save_meta()
@ 0x198127d starrocks::TabletMeta::save_meta()
@ 0x198127d starrocks::TabletMeta::save_meta()
@ 0x195b22b starrocks::Tablet::save_meta()
@ 0x195b22b starrocks::Tablet::save_meta()
@ 0x195b22b starrocks::Tablet::save_meta()
@ 0x1973cc8 starrocks::TabletManager::_add_tablet_unlocked()
@ 0x1973cc8 starrocks::TabletManager::_add_tablet_unlocked()
@ 0x1973cc8 starrocks::TabletManager::_add_tablet_unlocked()
@ 0x19793c6 starrocks::TabletManager::_internal_create_tablet_unlocked()
@ 0x19793c6 starrocks::TabletManager::_internal_create_tablet_unlocked()
@ 0x19793c6 starrocks::TabletManager::_internal_create_tablet_unlocked()
@ 0x1979b9e starrocks::TabletManager::create_tablet()
@ 0x1979b9e starrocks::TabletManager::create_tablet()
@ 0x1979b9e starrocks::TabletManager::create_tablet()
@ 0x194886f starrocks::StorageEngine::create_tablet()
@ 0x194886f starrocks::StorageEngine::create_tablet()
@ 0x194886f starrocks::StorageEngine::create_tablet()
@ 0x22bff9b starrocks::TaskWorkerPool::_create_tablet_worker_thread_callback()
@ 0x22bff9b starrocks::TaskWorkerPool::_create_tablet_worker_thread_callback()
@ 0x22bff9b starrocks::TaskWorkerPool::_create_tablet_worker_thread_callback()
@ 0x56e7590 execute_native_thread_routine
@ 0x56e7590 execute_native_thread_routine
@ 0x56e7590 execute_native_thread_routine
@ 0x7ff3cc1eeea5 start_thread
@ 0x7ff3cc1eeea5 start_thread
@ 0x7ff3cc1eeea5 start_thread
@ 0x7ff3cb80996d __clone
@ 0x7ff3cb80996d __clone
@ 0x7ff3cb80996d __clone
@ (nil) (unknown)
@ (nil) (unknown)
@ (nil) (unknown)

After starting this BE node from the command line,
ss -antpl|grep -E '9051|9061|8043|8061'
shows none of these ports listening.
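
The same port check can be run from another machine, which also tells you whether the FE host can reach the BE at all. The following is a minimal sketch, not a StarRocks tool: the port numbers come from the show backends rows in this thread, and the function names `port_is_open` and `check_ports` are made up for illustration.

```python
import socket

# Ports taken from the `show backends` rows in this thread.
BE_PORTS = (9051, 9061, 8043, 8061)


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports=BE_PORTS):
    """Map each port to True (accepting connections) or False."""
    return {port: port_is_open(host, port) for port in ports}
```

Calling `check_ports("192.168.2.231")` returns one entry per port. If every entry is False while the process exists, the BE has probably not finished (or has aborted) startup, so be.out and be.warn are the next place to look.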

show proc "/backends";
Use this to check the status of the BEs.
If a BE is down, start it.
Please also provide the be.warn and be.out logs.

The Connection refused error is what show proc "/backends"; returns:
| 10002 | default_cluster | 192.168.2.231 | 9051 | 9061 | 8043 | 8061 | 2022-10-27 13:23:10 | 2022-12-01 13:42:54 | false | false | false | 930 | 95.806 GB | 237.442 GB | 849.585 GB | 72.05 % | 72.05 % | java.net.ConnectException: 拒绝连接 (Connection refused) | 2.2.2-e27e2aa | {"lastSuccessReportTabletsTime":"2022-12-01 13:42:05"} | 333.248 GB | 28.75 % | 32 |
| 10003 | default_cluster | 192.168.2.232 | 9051 | 9061 | 8043 | 8061 | 2022-10-27 16:39:38 | 2022-12-01 13:42:54 | false | false | false | 773 | 49.683 GB | 211.333 GB | 849.585 GB | 75.13 % | 75.13 % | java.net.ConnectException: 拒绝连接 (Connection refused) | 2.2.2-e27e2aa | {"lastSuccessReportTabletsTime":"2022-12-01 13:42:11"} | 261.016 GB | 19.03 % | 32 |
| 10004 | default_cluster | 192.168.2.233 | 9051 | 9061 | 8043 | 8061 | 2022-10-27 16:39:43 | 2022-12-01 15:35:50 | true | false | false | 928 | 103.703 GB | 286.651 GB | 849.585 GB | 66.26 % | 66.26 % | | 2.2.2-e27e2aa | {"lastSuccessReportTabletsTime":"2022-12-01 15:35:28"} | 390.354 GB | 26.57 % | 32 |
| 10005 | default_cluster | 192.168.2.220 | 9051 | 9061 | 8043 | 8061 | 2022-10-27 16:39:43 | 2022-12-01 15:35:50 | true | false | false | 922 | 89.190 GB | 442.851 GB | 1.037 TB | 58.29 % | 58.29 % | | 2.2.2-e27e2aa | {"lastSuccessReportTabletsTime":"2022-12-01 15:35:55"} | 532.041 GB | 16.76 % | 32 |
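
With wide rows like these it is easy to miss which BE is down. Below is a rough sketch that pulls the not-alive backends out of pasted output. The column positions (id at 0, IP at 2, Alive at 9, ErrMsg at 18) are an assumption read off the rows in this thread and may differ in other StarRocks versions; `parse_backend_row` and `dead_backends` are illustrative names only.

```python
# Assumed 0-based column positions, inferred from the rows in this thread.
COL_ID, COL_IP, COL_ALIVE, COL_ERRMSG = 0, 2, 9, 18


def parse_backend_row(line):
    """Split one pipe-delimited row into the few fields we care about."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) <= COL_ERRMSG:
        return None  # header, separator, or truncated row
    return {
        "backend_id": cells[COL_ID],
        "ip": cells[COL_IP],
        "alive": cells[COL_ALIVE] == "true",
        "err_msg": cells[COL_ERRMSG],
    }


def dead_backends(text):
    """Return the parsed rows whose Alive column is not 'true'."""
    rows = []
    for line in text.splitlines():
        if line.strip().startswith("|"):
            row = parse_backend_row(line)
            if row is not None and not row["alive"]:
                rows.append(row)
    return rows
```

Fed the four rows from this thread, it should report 10002 and 10003, both carrying the Connection refused message.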

W1201 13:42:14.610126 218130 compaction.cpp:106] fail to do base compaction. res=Internal error: reader get_next error: Not found: /home/apps/StarRocks-2.2.2/be/storage/data/441/62115/1364751263/02000000026971e60f48760ca1fe1f6d3d65e1b39cba22b6_0.dat: No such file or directory
/root/starrocks/be/src/storage/fs/file_block_manager.cpp:404 value_or_err_L404
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:315 _opts.block_mgr->open_block(_segment->file_name(), &_rblock)
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:630 _init()
/root/starrocks/be/src/storage/vectorized/merge_iterator.cpp:141 fill(i)
/root/starrocks/be/src/storage/vectorized/merge_iterator.cpp:185 init()
/root/starrocks/be/src/storage/vectorized/tablet_reader.cpp:92 _collect_iter->get_next(chunk, source_masks), tablet=62115, output_version=0-25606
W1201 13:42:14.610157 218130 storage_engine.cpp:656] failed to init vectorized base compaction. res=Internal error: reader get_next error: Not found: /home/apps/StarRocks-2.2.2/be/storage/data/441/62115/1364751263/02000000026971e60f48760ca1fe1f6d3d65e1b39cba22b6_0.dat: No such file or directory
/root/starrocks/be/src/storage/fs/file_block_manager.cpp:404 value_or_err_L404
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:315 _opts.block_mgr->open_block(_segment->file_name(), &_rblock)
/root/starrocks/be/src/storage/rowset/vectorized/segment_iterator.cpp:630 _init()
/root/starrocks/be/src/storage/vectorized/merge_iterator.cpp:141 fill(i)
/root/starrocks/be/src/storage/vectorized/merge_iterator.cpp:185 init()
/root/starrocks/be/src/storage/vectorized/tablet_reader.cpp:92 _collect_iter->get_next(chunk, source_masks)
/root/starrocks/be/src/storage/vectorized/base_compaction.cpp:38 do_compaction(), table=62115.1364751263.4d485481fd2f88bc-917dbaee94c968b2
W1201 13:42:58.365137 218139 kv_store.cpp:165] IO error: No such file or directory: While open a file for appending: /home/apps/StarRocks-2.2.2/be/storage/meta/003548.log: No such file or directory
W1201 13:42:58.365450 218138 kv_store.cpp:165] IO error: No such file or directory: While open a file for appending: /home/apps/StarRocks-2.2.2/be/storage/meta/003548.log: No such file or directory
F1201 13:42:58.365469 218139 tablet_meta.cpp:396] fail to save tablet meta:IO error: IO error: No such file or directory: While open a file for appending: /home/apps/StarRocks-2.2.2/be/storage/meta/003548.log: No such file or directory. tablet_id=105197, schema_hash=1150333759
F1201 13:42:58.365844 218138 tablet_meta.cpp:396] fail to save tablet meta:IO error: IO error: No such file or directory: While open a file for appending: /home/apps/StarRocks-2.2.2/be/storage/meta/003548.log: No such file or directory. tablet_id=105189, schema_hash=1150333759
W1201 13:42:58.365954 218137 kv_store.cpp:165] IO error: No such file or directory: While open a file for appending: /home/apps/StarRocks-2.2.2/be/storage/meta/003548.log: No such file or directory
F1201 13:42:58.365844 218138 tablet_meta.cpp:396] fail to save tablet meta:IO error: IO error: No such file or directory: While open a file for appending: /home/apps/StarRocks-2.2.2/be/storage/meta/003548.log: No such file or directory. tablet_id=105189, schema_hash=1150333759
F1201 13:42:58.469576 218137 tablet_meta.cpp:396] fail to save tablet meta:IO error: IO error: No such file or directory: While open a file for appending: /home/apps/StarRocks-2.2.2/be/storage/meta/003548.log: No such file or directory. tablet_id=105193, schema_hash=1150333759
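
The lines starting with F are the fatal ones that abort the BE process; the W lines are warnings. To pick the fatal entries out of a long be.warn, something like the sketch below may help. It assumes the usual glog prefix (severity letter, mmdd, time, thread id, file:line) as seen in the lines of this thread; `fatal_lines` is a hypothetical helper name.

```python
import re

# Prefix shape as seen in this thread, e.g.
# "F1201 13:42:58.365469 218139 tablet_meta.cpp:396] message"
GLOG_RE = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+(\d+) ([^:\s]+):(\d+)\] (.*)$"
)


def fatal_lines(log_text):
    """Return one dict per log line whose severity letter is F."""
    found = []
    for line in log_text.splitlines():
        m = GLOG_RE.match(line)
        if m and m.group(1) == "F":
            found.append({
                "time": m.group(3),
                "thread": int(m.group(4)),
                "source": m.group(5) + ":" + m.group(6),
                "message": m.group(7),
            })
    return found
```

Stack-trace lines and the continuation lines of multi-line warnings do not match the prefix and are skipped.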

After startup the process is present, but ss -antpl|grep -E '9051|9061|8043|8061'
shows none of these ports listening.

Two of the four BEs have failed. Judging from the errors, was the data directory deleted?

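Originally storage_root_path was not set, so the default location was used. Later I set it and moved the storage folder, but the errors still report the original path. The files were changed on all BE nodes.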

Can you share the exact steps?

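I first shut down all nodes, then added the storage_root_path setting to be.conf on every node, then moved the storage folder under be to the configured path, and then started all nodes.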
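
Before starting the nodes again after a change like this, it can be worth confirming that the path in be.conf really exists on each machine; the earlier be.out in this thread shows the BE aborting when it could not resolve its storage path. The sketch below is only an illustration. It assumes be.conf uses plain `key = value` lines with `#` comments and that storage_root_path may hold several `;`-separated entries with an optional `,medium:...` suffix; the function names are made up.

```python
import os


def read_storage_root_paths(conf_text):
    """Extract the directory part of each storage_root_path entry."""
    paths = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() != "storage_root_path":
            continue
        # Assumed format: /path1,medium:hdd;/path2 (a later line overrides).
        paths = [
            entry.split(",", 1)[0].strip()
            for entry in value.split(";")
            if entry.strip()
        ]
    return paths


def missing_paths(paths):
    """Return the configured paths that are not existing directories."""
    return [p for p in paths if not os.path.isdir(p)]
```

Reading be.conf into a string and passing it through both functions on every BE host gives a quick list of misconfigured nodes. An empty result from `read_storage_root_paths` means the setting is absent and the BE would use its default location.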

Wasn't the FE shut down during the operation?

Both FEs and all 4 BEs were restarted, and the be files on all 6 machines were changed.

Both FE and BE have to be shut down before storage_root_path is changed, and only then started again.

They were all stopped. I misspoke: they were shut down, not just restarted.

Two BEs work and two do not, so is there any difference between them? Can the cluster be used normally now?

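The configuration is identical on all of them. After I deleted the storage folder it worked, so I guess something was lost while moving the files.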
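
If files really were lost during the move, one way to catch that next time is to copy first, compare the two trees, and delete the old folder only once they match. A minimal sketch of such a comparison follows. It compares relative paths and file sizes only, not contents, and the function names are hypothetical.

```python
import os


def tree_manifest(root):
    """Map each file's path relative to root to its size in bytes."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = os.path.getsize(full)
    return manifest


def diff_trees(src, dst):
    """Return (files missing from dst, files whose sizes differ)."""
    s, d = tree_manifest(src), tree_manifest(dst)
    missing = sorted(set(s) - set(d))
    size_mismatch = sorted(k for k in set(s) & set(d) if s[k] != d[k])
    return missing, size_mismatch
```

Both lists being empty is a necessary check rather than a proof of an intact copy; a checksum comparison would be stricter but slower on a storage directory of this size.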

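Of the 4 BEs, 2 were lost. If the tables use 3 replicas, one replica should still be left, so after you clear the storage_root_path directory the BE will pull the data from the other BEs and recover.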