Creating and querying tables times out on a manually deployed Docker 3.2.7 cluster

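【Details】The deployment environment enforces strict image checks, and the official image does not pass the scan. I therefore built a custom image by adding the official starrocks-3.2.7 package to a JDK 11 base image that does pass the scan, and used environment variables to start 1 FE node and 2 CN nodes. The monitoring graphs show that all nodes have joined the cluster, and an external DBeaver client connects normally, but querying and creating tables always time out. Creating a table from inside the container times out as well.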
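【Background】Normal deployment
【Business impact】
【Shared-data (storage-compute separation)?】
【StarRocks version】e.g. shared-data 3.2.7
【Cluster size】e.g. 1 FE + 2 CN
【Machine info】vCPUs/memory/NIC, e.g. 48C/64G/10GbE
【Contact】
【Attachments】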

2024-06-18 09:06:54.669Z WARN (starrocks-mysql-nio-pool-12|263) [ExecuteExceptionHandler.handleRpcException():96] Query cancelled by crash of backends or RpcException, [QueryId=1ba4fc13-2d52-11ef-a652-ce855df501a7] [SQL=/* ApplicationName=DBeaver 24.0.4 - Metadata */ SELECT * FROM information_schema.TABLES t
WHERE
t.TABLE_SCHEMA = 'information_schema'
AND t.TABLE_NAME = 'CHECK_CONSTRAINTS'] [Plan=PLAN FRAGMENT 0(F00)
Output Exprs:1: TABLE_CATALOG | 2: TABLE_SCHEMA | 3: TABLE_NAME | 4: TABLE_TYPE | 5: ENGINE | 6: VERSION | 7: ROW_FORMAT | 8: TABLE_ROWS | 9: AVG_ROW_LENGTH | 10: DATA_LENGTH | 11: MAX_DATA_LENGTH | 12: INDEX_LENGTH | 13: DATA_FREE | 14: AUTO_INCREMENT | 15: CREATE_TIME | 16: UPDATE_TIME | 17: CHECK_TIME | 18: TABLE_COLLATION | 19: CHECKSUM | 20: CREATE_OPTIONS | 21: TABLE_COMMENT
Input Partition: UNPARTITIONED
RESULT SINK

0:SCAN SCHEMA
cardinality: -1
column statistics:
]
com.starrocks.rpc.RpcException: Couldn't open transport for starrocks-fe1-0.starrocks-fe.srdocker.svc.cluster.local:9020 (Could not resolve host for client socket.), host: unknown
at com.starrocks.qe.DefaultCoordinator.getNext(DefaultCoordinator.java:779) ~[starrocks-fe.jar:?]
at com.starrocks.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:1100) ~[starrocks-fe.jar:?]
at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:606) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:413) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:608) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:915) ~[starrocks-fe.jar:?]
at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:69) ~[starrocks-fe.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
2024-06-18 09:06:54.669Z WARN (starrocks-mysql-nio-pool-12|263) [StmtExecutor.execute():621] retry 1 times. stmt: /* ApplicationName=DBeaver 24.0.4 - Metadata */ SELECT * FROM information_schema.TABLES t
WHERE
t.TABLE_SCHEMA = 'information_schema'
AND t.TABLE_NAME = 'CHECK_CONSTRAINTS'
2024-06-18 09:06:54.677Z WARN (starrocks-mysql-nio-pool-12|263) [DefaultCoordinator.getNext():756] get next fail, need cancel. status errorCode THRIFT_RPC_ERROR Couldn't open transport for starrocks-fe1-0.starrocks-fe.srdocker.svc.cluster.local:9020 (Could not resolve host for client socket.), query id: 5ac52152-18bd-4ead-b012-4640ad9bdd82
2024-06-18 09:06:54.677Z WARN (starrocks-mysql-nio-pool-12|263) [DefaultCoordinator.updateStatus():731] one instance report fail throw updateStatus(), need cancel. job id: -1, query id: 5ac52152-18bd-4ead-b012-4640ad9bdd82, instance id: NaN
2024-06-18 09:06:54.678Z WARN (starrocks-mysql-nio-pool-12|263) [StmtExecutor.execute():706] execute IOException
com.starrocks.rpc.RpcException: Couldn't open transport for starrocks-fe1-0.starrocks-fe.srdocker.svc.cluster.local:9020 (Could not resolve host for client socket.), host: unknown
at com.starrocks.qe.DefaultCoordinator.getNext(DefaultCoordinator.java:779) ~[starrocks-fe.jar:?]
at com.starrocks.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:1100) ~[starrocks-fe.jar:?]
at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:606) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:413) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:608) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:915) ~[starrocks-fe.jar:?]
at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:69) ~[starrocks-fe.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]

The hostname starrocks-fe1-0.starrocks-fe.srdocker.svc.cluster.local:9020 looks odd to me. This wasn't deployed by the Operator, was it?

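No. I took the custom-built image described above and deployed it manually on k8s as a StatefulSet. The cluster is configured by hand just as it would be on plain hosts; the only difference is that the nodes are started by k8s. Port 9020 should only be accessed inside the cluster, right? It doesn't need to be exposed to clients outside the k8s environment, does it?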

I haven't come across a case where 9020 is used.

After troubleshooting, I found that the Service I had created was misconfigured, which caused DNS resolution to fail.
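
For anyone hitting the same "Could not resolve host for client socket" error, here is a minimal Python sketch for telling a DNS failure apart from a port that is simply unreachable. The thread does not say what exactly was wrong with the Service, so this only confirms the symptom. `check_endpoint` is a hypothetical helper name, and the host and port you pass in are whatever appears in your own FE log.

```python
import socket


def check_endpoint(host: str, port: int, timeout: float = 3.0) -> str:
    """Return 'dns_failure', 'connect_failure', or 'ok' for host:port."""
    try:
        # Step 1: name resolution. This is the step that failed in the log above.
        addrs = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    # Step 2: try a TCP connection to each resolved address in turn.
    for family, socktype, proto, _canonname, sockaddr in addrs:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            # Refused, timed out, or unreachable: try the next address.
            continue
    return "connect_failure"
```

Run from inside an FE or CN pod (assuming Python is available in the image), `check_endpoint("starrocks-fe1-0.starrocks-fe.srdocker.svc.cluster.local", 9020)` returning `"dns_failure"` would point at the Service or DNS setup rather than at StarRocks itself, while `"connect_failure"` would suggest the name resolves but the port is not reachable.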