REFRESH EXTERNAL TABLE fails with an error

We are currently on version 2.2.2.

OK, we'll look into it.

Hello, we upgraded to 2.2.8 and the problem is still there. Has the investigation turned up anything?

Got it. We're still investigating and will get back to you as soon as we have a result.

Is the metadata being refreshed fairly large? Before running the SQL, you can raise query_timeout with set query_timeout=xxx; the unit is seconds.
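For reference, the suggestion above could be applied roughly as follows. This is only a sketch: the helper name refresh_sql, the host fe_host, the user root, and the table hive_db.hive_tbl are placeholders that do not come from this thread, and port 9030 assumes the FE query port was left at its default. Both statements are sent in one client call so that the session-level timeout applies to the refresh.

```shell
# Sketch only: refresh_sql is a hypothetical helper; host, user and table
# names used in the example below are placeholders, not values from the thread.
refresh_sql() {
  # $1 = timeout in seconds, $2 = external table name
  printf 'SET query_timeout = %s; REFRESH EXTERNAL TABLE %s;' "$1" "$2"
}

# Example, commented out because it needs a reachable FE
# (9030 assumes the default FE query port):
# mysql -h fe_host -P 9030 -u root -e "$(refresh_sql 3000 hive_db.hive_tbl)"
```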

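The metadata is quite large. Previously, though, restarting the FE node fixed it, and a refresh took only about 3s. Since the upgrade, even restarting the node no longer helps. We are now on 2.2.8.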

Then try raising query_timeout and see whether that helps. Which version did you upgrade from?

I raised it tenfold to 3000 and it made no difference. We upgraded from 2.2.2.

2022-10-27 06:36:15,025 WARN (Thread-188|10376) [FrontendServiceProxy.call():29] call frontend thrift rpc failed, addr: TNetworkAddress(hostname:...228, port:9020), retried: 0
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:455) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:354) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:243) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) ~[libthrift-0.13.0.jar:0.13.0]
at com.starrocks.thrift.FrontendService$Client.recv_refreshTable(FrontendService.java:620) ~[starrocks-fe.jar:?]
at com.starrocks.thrift.FrontendService$Client.refreshTable(FrontendService.java:607) ~[starrocks-fe.jar:?]
at com.starrocks.catalog.Catalog.lambda$null$4(Catalog.java:6852) ~[starrocks-fe.jar:?]
at com.starrocks.rpc.FrontendServiceProxy.call(FrontendServiceProxy.java:25) ~[starrocks-fe.jar:?]
at com.starrocks.catalog.Catalog.lambda$refreshOtherFesTable$5(Catalog.java:6849) ~[starrocks-fe.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_202]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_202]
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_202]
at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_202]
at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_202]
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[?:1.8.0_202]
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[?:1.8.0_202]
at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[?:1.8.0_202]
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:125) ~[libthrift-0.13.0.jar:0.13.0]
… 12 more
2022-10-27 06:36:15,030 WARN (Thread-188|10376) [Catalog.lambda$refreshOtherFesTable$5():6855] call fe TNetworkAddress(hostname:10.
.***.228, port:9020) refreshTable rpc method failed
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:455) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:354) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:243) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) ~[libthrift-0.13.0.jar:0.13.0]
at com.starrocks.thrift.FrontendService$Client.recv_refreshTable(FrontendService.java:620) ~[starrocks-fe.jar:?]
at com.starrocks.thrift.FrontendService$Client.refreshTable(FrontendService.java:607) ~[starrocks-fe.jar:?]
at com.starrocks.catalog.Catalog.lambda$null$4(Catalog.java:6852) ~[starrocks-fe.jar:?]
at com.starrocks.rpc.FrontendServiceProxy.call(FrontendServiceProxy.java:25) ~[starrocks-fe.jar:?]
at com.starrocks.catalog.Catalog.lambda$refreshOtherFesTable$5(Catalog.java:6849) ~[starrocks-fe.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_202]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_202]
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_202]
at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_202]
at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_202]
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[?:1.8.0_202]
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[?:1.8.0_202]
at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[?:1.8.0_202]
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:125) ~[libthrift-0.13.0.jar:0.13.0]
… 12 more

This is the latest error.

Is it always the 228 host that fails? If so, take a jstack dump while the refresh is running: jstack -l pid

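Hello, not necessarily. For a while it was always host 300. What does the error call frontend thrift rpc failed, addr: TNetworkAddress(hostname:10.10.226.228, port:9020) mean? Usually, restarting the node named in the error makes refresh work again, but the problem keeps coming back. Is something wrong with access to the RPC port? And how do I run jstack -l pid?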

The next time it happens, run jstack -l $pid on the Linux host, where $pid is the FE process ID. We'll analyze further from the stack dump.
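A minimal sketch of how the dump could be captured is below. Taking several dumps a few seconds apart is an addition of ours, not something requested in this thread; it can help tell a thread that is truly stuck from one that is merely slow. The function names, the default count and interval, and the output file naming are all assumptions.

```shell
# Sketch only: dump_file and capture_jstack are hypothetical helpers.

# Build the output file name for one dump: <prefix>_<pid>_<n>.log
dump_file() {
  printf '%s_%s_%s.log' "$1" "$2" "$3"
}

# Take COUNT thread dumps of process PID, INTERVAL seconds apart.
# $1 = FE process ID, $2 = count (default 3),
# $3 = interval in seconds (default 5), $4 = file prefix (default fe_jstack)
capture_jstack() {
  pid=$1
  count=${2:-3}
  interval=${3:-5}
  prefix=${4:-fe_jstack}
  i=1
  while [ "$i" -le "$count" ]; do
    # jstack -l also prints lock information for each thread
    jstack -l "$pid" > "$(dump_file "$prefix" "$pid" "$i")" || return 1
    if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    i=$((i + 1))
  done
}

# Example, run while REFRESH EXTERNAL TABLE is hanging:
# capture_jstack "$pid"
```

Run it as the same OS user that owns the FE process, otherwise jstack may fail to attach.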

fe_jstack.log (1.0 MB)
Hello, this is the jstack log captured during the latest failure.

fe_jstack.log (1.0 MB)
Hello, this is the jstack log captured at the time of the failure.

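OK. My guess is that the table has too many partitions and the FE is getting stuck while accessing the Hive metastore. How many partitions does this table have?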

This table isn't partitioned. It has a little over 17 million rows.

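Your table definition has PARTITION BY (etl_dt, etl_hour), so the data is partitioned by hour and there should be quite a few partitions. Please check the Hive table again.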

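Ah, I was looking at the wrong table. This one does have a lot of partitions. But some of the other tables are unpartitioned and hold only around ten million rows, and they fail with the same error.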

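Let's analyze this table first. How many partitions does it have? I suspect this is a known issue. For the other failing tables, please also capture jstack -l $pid during the refresh.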

Partitions: 18,166. Rows: 4219795127. That's the situation for this table. Is the failure because there are too many partitions and too much data?