FE proxy times out while sending data: how do I change its nginx config so the change takes effect?

starrocks搬运工 · April 15, 2024 06:38

Shared-data cluster, version 3.1.7, deployed with starrocks-operator.

Default custom resource definition: kubectl apply -f https://raw.githubusercontent.com/StarRocks/starrocks-kubernetes-operator/main/deploy/starrocks.com_starrocksclusters.yaml

Default operator: kubectl apply -f https://raw.githubusercontent.com/StarRocks/starrocks-kubernetes-operator/main/deploy/operator.yaml

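We are currently hitting timeouts while sending data. I need to increase nginx's proxy_send_timeout, but when I edited the nginx config directly, my change was reset. Is there a way to modify the nginx config inside fe-proxy?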

nginx error log:

2024/04/15 03:11:30 [error] 22#22: *597153 upstream timed out (110: Connection timed out) while sending request to upstream, client: 172.x.201.x, server: , request: "PUT /api/link_restore/hit_risk_detail_dk/_stream_load HTTP/1.1", upstream: "http://172.x.0.x:8040/api/link_restore/hit_risk_detail_dk/_stream_load", host: "x"
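The phrase after "upstream timed out" tells you which proxy timeout fired. The sketch below maps that phrase to the ngx_http_proxy_module directive that governs the phase and pulls out the upstream URL; for the line above it points at proxy_send_timeout and the backend on port 8040. The function and table names are illustrative, and only the four common phase strings are covered.

```python
import re
from typing import Optional, Tuple

# Phase text nginx appends to "upstream timed out" -> directive that controls it.
PHASE_TO_DIRECTIVE = {
    "while connecting to upstream": "proxy_connect_timeout",
    "while sending request to upstream": "proxy_send_timeout",
    "while reading response header from upstream": "proxy_read_timeout",
    "while reading upstream": "proxy_read_timeout",
}

# Matches the quoted upstream URL field, e.g. upstream: "http://host:port/path"
UPSTREAM_RE = re.compile(r'upstream: "([^"]+)"')


def classify_timeout(line: str) -> Optional[Tuple[Optional[str], Optional[str]]]:
    """Return (directive, upstream_url) for an nginx upstream-timeout error line.

    Returns None when the line is not an upstream timeout. Either tuple element
    may be None if the phase or upstream field is not recognized.
    """
    if "upstream timed out" not in line:
        return None
    directive = None
    for phase, name in PHASE_TO_DIRECTIVE.items():
        if phase in line:
            directive = name
            break
    match = UPSTREAM_RE.search(line)
    upstream = match.group(1) if match else None
    return directive, upstream
```

Note that the regex expects straight double quotes, as nginx writes them.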

fe-proxy

starRocksFeProxySpec:
  replicas: 2
  limits:
    cpu: 4
    memory: 8Gi
  requests:
    cpu: 4
    memory: 8Gi
  service:
    type: NodePort
    ports:
      - containerPort: 8080
        name: http-port
        port: 8080
        nodePort: 31102
  resolver: "coredns-coredns.kube-system.svc.cluster.local"
  podLabels:
    app: starrocks-fe-proxy

alias · April 16, 2024 08:47

Modifying nginx.conf is not supported yet. You need to fork the operator project, change the file's contents in feproxy_configmap.go, and then rebuild the container image.
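The change in the fork amounts to making the proxy timeouts parameters instead of fixed text. The operator itself is written in Go, and the real template in feproxy_configmap.go is not shown in this thread, so the Python below is only a hypothetical illustration of the idea: `LOCATION_TEMPLATE`, `render_location`, and the `location /api/` block are all made up, with each timeout defaulting to nginx's own 60s.

```python
from string import Template

# HYPOTHETICAL template: not the operator's real nginx.conf, just a location
# block whose proxy timeouts are filled in from parameters.
LOCATION_TEMPLATE = Template("""\
location /api/ {
    proxy_pass http://${backend};
    proxy_connect_timeout ${connect}s;
    proxy_send_timeout ${send}s;
    proxy_read_timeout ${read}s;
}
""")


def render_location(backend: str, send_timeout_s: int = 60,
                    read_timeout_s: int = 60, connect_timeout_s: int = 60) -> str:
    """Render the hypothetical location block; defaults match nginx's 60s."""
    for name, value in (("send", send_timeout_s),
                        ("read", read_timeout_s),
                        ("connect", connect_timeout_s)):
        if value <= 0:
            raise ValueError(f"{name} timeout must be positive, got {value}")
    return LOCATION_TEMPLATE.substitute(
        backend=backend,
        connect=connect_timeout_s,
        send=send_timeout_s,
        read=read_timeout_s,
    )
```

For example, `render_location("127.0.0.1:8030", send_timeout_s=600)` raises only the send timeout and leaves the other two at 60s. Exposing such values through the StarRocksCluster spec, rather than rebuilding the image, is what the follow-up question below asks for.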

starrocks搬运工 · April 16, 2024 08:47

I solved it by rebuilding the operator. Could modifying nginx.conf be supported later, though? Rebuilding is quite a hassle.

alias · April 16, 2024 09:18

The current 60s should normally be enough. https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_send_timeout

Sets a timeout for transmitting a request to the proxied server. The timeout is set only between two
successive write operations, not for the transmission of the whole request. If the proxied server
does not receive anything within this time, the connection is closed.
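Because the timer restarts on every successful write, a long upload only fails if a single gap between writes exceeds the limit, for example when the upstream stops draining its socket for longer than that. The sketch below is a simplified model of that rule: `send_times_out` is an illustrative name, and it assumes you already have the timestamps of successive writes.

```python
from typing import Sequence


def send_times_out(write_times_s: Sequence[float], timeout_s: float) -> bool:
    """Simplified model of proxy_send_timeout.

    write_times_s: timestamps (seconds) of successive successful writes to the
    upstream. Only the gap between two consecutive writes is compared with the
    timeout; the total duration of the request does not matter.
    """
    return any(later - earlier > timeout_s
               for earlier, later in zip(write_times_s, write_times_s[1:]))
```

A 10-minute upload that writes every 5 seconds never trips a 60s timeout, while a single 90-second stall between two writes does.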

Could you describe your ingestion scenario? Under what circumstances does this happen?

starrocks搬运工 · April 16, 2024 11:32

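We ingest with Flink, and the traffic is fairly heavy. The deployment is K8S + Ceph, using Ceph object storage. The target table has only two columns, id and data; data is of JSON type, averaging a few tens of KB per value and possibly reaching 3 MB. Daily volume is around 2 billion. Current Flink settings: flush.max-bytes=3G, flush.max-rows=800000, flush-interval-ms=800000ms, with a checkpoint interval of 1200s.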
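With large JSON values, the byte limit may be reached long before the row limit or the interval, which determines how much data each Stream Load request carries. The steady-rate model below estimates which threshold fires first. It assumes one sink buffer receives the whole stream at a constant rate and a hypothetical average row size of 30 KiB; with several Flink sink subtasks, each buffer sees a lower rate. `first_flush_trigger` is an illustrative name, not a connector API.

```python
def first_flush_trigger(rows_per_s: float, avg_row_bytes: float,
                        max_bytes: float, max_rows: float,
                        interval_s: float):
    """Return (trigger, seconds_until_flush) under a steady-rate model.

    Each flush condition is converted to the time it takes to reach it; the
    smallest time wins.
    """
    candidates = {
        "max-bytes": max_bytes / (rows_per_s * avg_row_bytes),
        "max-rows": max_rows / rows_per_s,
        "interval": interval_s,
    }
    trigger = min(candidates, key=candidates.get)
    return trigger, candidates[trigger]


# Hypothetical inputs: 2 billion rows/day, 30 KiB average row, settings above.
rate = 2e9 / 86400  # about 23,148 rows per second
print(first_flush_trigger(rate, 30 * 1024, 3 * 1024**3, 800_000, 800))
```

Under these assumptions, the byte threshold fires after roughly 4.5 seconds, well before the row limit (about 35 seconds) or the 800-second interval, so each request would carry close to the full 3G buffer.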

We also have other tables at this traffic level (around 2 billion a day) that have no large fields, and their Flink ingestion is fairly stable.

alias · April 16, 2024 11:47

Got it. I see you've already opened an issue for this, and we'll evaluate it later. Thanks for the suggestion.