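Got it. We're looking into it and will get back to you as soon as we have results.
Could it be that the metadata being refreshed is large? You can set query_timeout before running the SQL: set query_timeout=xxx, where the unit is seconds.
The metadata is fairly large, but before, restarting the FE node would fix it, and a refresh only took about 3 s. Since the upgrade, restarting the node no longer helps. We've now upgraded to 2.2.8.
Then could you try increasing query_timeout? Which version did you upgrade from?
I increased it tenfold, to 3000, and it had no effect. We upgraded from 2.2.2.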
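A minimal sketch of the suggestion above, written in Python only for illustration: it builds the two statements to run in one client session, since query_timeout is a session variable measured in seconds. The table name hive_db.hive_tbl is a placeholder, and REFRESH EXTERNAL TABLE is assumed to be the statement used here to refresh the Hive external table's cached metadata; check the docs for your release.

```python
from typing import List


def refresh_statements(table: str, timeout_s: int) -> List[str]:
    """Return the SQL to run in ONE session: raise query_timeout, then refresh.

    query_timeout is a session variable in seconds, so the SET only affects
    statements issued later in the same connection. `table` is a placeholder
    such as "hive_db.hive_tbl" (hypothetical name).
    """
    if timeout_s <= 0:
        raise ValueError("query_timeout must be a positive number of seconds")
    return [
        f"SET query_timeout = {timeout_s};",
        # Assumed statement for refreshing a Hive external table's metadata cache.
        f"REFRESH EXTERNAL TABLE {table};",
    ]


for stmt in refresh_statements("hive_db.hive_tbl", 3000):
    print(stmt)
```

The printed statements can be pasted into the MySQL client connected to the FE; both must go through the same connection for the timeout to apply.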
2022-10-27 06:36:15,025 WARN (Thread-188|10376) [FrontendServiceProxy.call():29] call frontend thrift rpc failed, addr: TNetworkAddress(hostname:...228, port:9020), retried: 0
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:455) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:354) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:243) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) ~[libthrift-0.13.0.jar:0.13.0]
at com.starrocks.thrift.FrontendService$Client.recv_refreshTable(FrontendService.java:620) ~[starrocks-fe.jar:?]
at com.starrocks.thrift.FrontendService$Client.refreshTable(FrontendService.java:607) ~[starrocks-fe.jar:?]
at com.starrocks.catalog.Catalog.lambda$null$4(Catalog.java:6852) ~[starrocks-fe.jar:?]
at com.starrocks.rpc.FrontendServiceProxy.call(FrontendServiceProxy.java:25) ~[starrocks-fe.jar:?]
at com.starrocks.catalog.Catalog.lambda$refreshOtherFesTable$5(Catalog.java:6849) ~[starrocks-fe.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_202]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_202]
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_202]
at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_202]
at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_202]
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[?:1.8.0_202]
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[?:1.8.0_202]
at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[?:1.8.0_202]
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:125) ~[libthrift-0.13.0.jar:0.13.0]
… 12 more
2022-10-27 06:36:15,030 WARN (Thread-188|10376) [Catalog.lambda$refreshOtherFesTable$5():6855] call fe TNetworkAddress(hostname:10..***.228, port:9020) refreshTable rpc method failed
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:455) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:354) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:243) ~[libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) ~[libthrift-0.13.0.jar:0.13.0]
at com.starrocks.thrift.FrontendService$Client.recv_refreshTable(FrontendService.java:620) ~[starrocks-fe.jar:?]
at com.starrocks.thrift.FrontendService$Client.refreshTable(FrontendService.java:607) ~[starrocks-fe.jar:?]
at com.starrocks.catalog.Catalog.lambda$null$4(Catalog.java:6852) ~[starrocks-fe.jar:?]
at com.starrocks.rpc.FrontendServiceProxy.call(FrontendServiceProxy.java:25) ~[starrocks-fe.jar:?]
at com.starrocks.catalog.Catalog.lambda$refreshOtherFesTable$5(Catalog.java:6849) ~[starrocks-fe.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_202]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_202]
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_202]
at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_202]
at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_202]
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[?:1.8.0_202]
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[?:1.8.0_202]
at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[?:1.8.0_202]
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:125) ~[libthrift-0.13.0.jar:0.13.0]
… 12 more
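This is the latest error.
Is it always the .228 node that reports the error? If so, capture a thread dump during the refresh with jstack -l pid.
Hi, not necessarily. For a while before, it was always the 300 host. What does the error call frontend thrift rpc failed, addr: TNetworkAddress(hostname:10.10.226.228, port:9020) mean? Usually, when refresh fails, restarting the node that reports the error makes it work again, but this problem happens often. Is it because there's a problem reaching the RPC port? And how do I run jstack -l pid?
Next time the problem occurs, run jstack -l $pid on the Linux host, where $pid is the FE process ID. We'll analyze further based on the stack information.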
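To answer "is it always the same node?" from the logs rather than from memory, a small script can tally the refreshTable failures per target FE. This is a sketch that assumes the WARN line format shown in the log above (call fe TNetworkAddress(hostname:..., port:9020) refreshTable rpc method failed); which log file you pass on the command line is up to you, and the script name is hypothetical.

```python
import re
import sys
from collections import Counter

# Matches the "refreshTable rpc method failed" WARN line from the FE log, e.g.
# "... call fe TNetworkAddress(hostname:10.0.0.228, port:9020) refreshTable rpc method failed"
FAILED_REFRESH = re.compile(
    r"call fe TNetworkAddress\(hostname:(?P<host>[^,]+), port:(?P<port>\d+)\) "
    r"refreshTable rpc method failed"
)


def count_failed_refresh_targets(lines):
    """Count refreshTable RPC failures per target FE, keyed by (host, port)."""
    counts = Counter()
    for line in lines:
        m = FAILED_REFRESH.search(line)
        if m:
            counts[(m.group("host"), int(m.group("port")))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_refresh_failures.py <path-to-fe-log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for (host, port), n in count_failed_refresh_targets(f).most_common():
            print(f"{host}:{port}\t{n}")
```

Only the second WARN line of each pair is counted, so each failed call is counted once rather than twice.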
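A sketch of the jstack step, assuming the JDK's jstack is on PATH and the script runs as the same user as the FE process (jstack generally needs that to attach). Taking a few dumps a few seconds apart while a refresh is stuck shows whether the same threads stay blocked. The FE pid can be looked up with jps -l or ps; the function name and output file naming are hypothetical.

```python
import subprocess
import time
from datetime import datetime
from pathlib import Path


def capture_jstacks(pid, count=3, interval_s=5.0, out_dir=".", runner=subprocess.run):
    """Take `count` thread dumps of the FE JVM with `jstack -l <pid>`, `interval_s` apart.

    Each dump is written to its own timestamped file and the paths are returned.
    `runner` defaults to subprocess.run; it is a parameter only so the function
    can be exercised without a live JVM.
    """
    paths = []
    for i in range(count):
        result = runner(["jstack", "-l", str(pid)],
                        capture_output=True, text=True, check=True)
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        # The index keeps file names unique even if two dumps land in the same second.
        path = Path(out_dir) / f"fe-jstack-{pid}-{stamp}-{i}.txt"
        path.write_text(result.stdout)
        paths.append(path)
        if i < count - 1:
            time.sleep(interval_s)
    return paths
```

For example, capture_jstacks(12345) while the refresh hangs, then send the resulting files along with fe.log.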
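OK. My guess is that the table has too many partitions and the FE got stuck accessing the Hive metastore. How many partitions does this table have?
This table has no partitions. It has a bit over 17 million rows.
Oops, I was looking at the wrong table; this table does have a lot of partitions. But some of the other tables have no partitions and only tens of millions of rows, and they report the same error too.
Let's analyze this table first. How many partitions does it have? I suspect it's a known issue. For the other tables that report errors, also capture jstack -l $pid during the refresh.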
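Partitions: 18,166; rows: 4,219,795,127. That's the situation for this table. Is it because there are too many partitions or too much data?
There are too many partitions: when the FE accesses the Hive metastore, it holds the database read lock for a long time. You can upgrade to 2.2.9, which adds an optimization: a session variable controls how many partitions are accessed when fetching table-level statistics (default 5000). In 2.5, after the refactor, partition-level statistics were added, so the first query no longer pulls statistics for the whole table.
OK, thanks. We'll find time to upgrade and try it.
Just to confirm: this table has over 4 billion rows and 18,166 partitions, right?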
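Rough arithmetic for this table, assuming the 2.2.9 session variable simply caps the number of partitions visited for table-level statistics at its default of 5000. The variable's name is not given in this thread, so the code uses a generic, hypothetical limit parameter.

```python
def partitions_for_table_stats(total_partitions: int, limit: int = 5000) -> int:
    """Partitions contacted for table-level stats under a cap (assumed min() semantics)."""
    if total_partitions < 0 or limit <= 0:
        raise ValueError("need total_partitions >= 0 and limit > 0")
    return min(total_partitions, limit)


total = 18_166          # partition count reported above
rows = 4_219_795_127    # row count reported above

touched = partitions_for_table_stats(total)
print(touched)                                   # 5000
print(f"{touched / total:.1%} of partitions")    # 27.5% of partitions
print(f"{rows / 1e9:.2f} billion rows")          # 4.22 billion rows
print(f"~{rows / total:,.0f} rows per partition on average")  # ~232,291 rows per partition on average
```

Under that assumption, a refresh of this table touches roughly a quarter of its partitions instead of all 18,166.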
