CREATE REPOSITORY导致fe发生重启

【详述】chart版本1.6.1安装的服务,远程仓库类型为oss,计划通过 CREATE REPOSITORY备份数据,备份数据,但在执行命令时会导致fe重启,并且fe和be都查不到任何报错,查看监控指标也都正常,k8s events显示liveness检测失败:
91s Warning Unhealthy pod/kube-starrocks-fe-0 Readiness probe failed: Get “http://10.100.80.31:8030/api/health”: read tcp 10.100.80.13:56778->10.100.80.31:8030: read: connection reset by peer
所执行命令:
CREATE REPOSITORY backup_repo
WITH BROKER
ON LOCATION “oss://my-bucket/starrocks_backup”
PROPERTIES(
“fs.oss.accessKeyId” = “my-ak”,
“fs.oss.accessKeySecret” = “my-sk”,
“fs.oss.endpoint” = “oss-cn-shenzhen-internal.aliyuncs.com”,
“timeout” = “3600”
);
【背景】通过helm安装的starrocks-fe和starrocks-be,安装时chart版本为1.6.1,根据业务实际情况自定义了部分values内容,由于该版本没有初始化密码的设置,导致存在一定的安全隐患,所以改用1.7.1版本chart新建了一套集群,但是1.7.1更改了pvc的相关机制,原pvc无法继续在新集群使用,只能进行备份和恢复的方式,1.6.1版本的values.yaml如下:

# Default values for kube-starrocks.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
global:
  rbac:
    create: true
  # set the starrockscluster CRD version. from operator v1.6 we have supported v1 CRD and as default CRD version. but the v1alpha1 also supported.
  # in future v1alpha1 as verify CRD, when the ability is stable will be added into v1.
  # supported value: v1, v1alpha1
  crdVersion: v1

# the starrockscluster crd' name.
nameOverride: ""
starrocksOperator:
  # provide a name for operator deployment.
  enabled: true
  namespaceOverride: ""
  image:
    # image sliced by "repository:tag"
    repository: starrocks/operator
    tag: latest
  resources:
    limits:
      cpu: 500m
      memory: 200Mi
    requests:
      cpu: 500m
      memory: 200Mi


# deploy a starrocks cluster
starrocksCluster:
  # Provide a name for starprocks cluster
  name: ""
  namespace: ""
  # annotations for starrocks cluster
  annotations: {}
  labels:
    object: starrocks
  # specify the cn deploy or not.
  enabledCn: false


# spec to deploy fe.
starrocksFESpec:
  # Number of replicas to deploy for a fe statefulset.
  replicas: 1
  image:
    # image sliced by "repository:tag"
    repository: starrocks/fe-ubuntu
    tag: 2.5.9

  # add annotations for fe pods. example, if you want to config monitor for datadog, you can config the annotations. etc...
  annotations: {}

  # A special supplemental group that applies to all containers in a pod. Some volume types allow the Kubelet to change the ownership of that volume to be owned by the pod:
  # Ref: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods
  fsGroup: 0

  # specify the service name and port config and serviceType
  # the service type refer https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types
  service:
    # the fe service type, only supported ClusterIP, NodePort, LoadBalancer, ExternalName
    type: "NodePort"
    targetPort: 9030
    # the loadbalancerIP for static ip config when the type=LoadBalancer and loadbalancerIp is not empty.
    # loadbalancerIP: ""

  # contains the secret name
  imagePullSecrets: []
  # - name: "image-pull-secret"

  # serviceAccount for fe access cloud service.
  serviceAccount: ""

  # If specified, the pod's nodeSelector,displayName="Map of nodeSelectors to match when scheduling pods on nodes"
  # Ref: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector
  nodeSelector: {}
  #  beta.kubernetes.io/arch: amd64
  #  beta.kubernetes.io/os: linux

  # the pod labels for user select or classify pods.
  podLabels: {}

  ## hostAliases allows adding entries to /etc/hosts inside the containers
  hostAliases: []
  # - ip: "127.0.0.1"
  #   hostnames:
  #   - "example.com"

  # Additional fe container environment variables
  # You specify this manually like you would a raw deployment manifest.
  # This means you can bind in environment variables from secrets.
  # Ref: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/
  feEnvVars: []
  # e.g. static environment variable:
  #  - name: DEMO_GREETING
  #    value: "Hello from the environment"
  #
  # e.g. secret environment variable:
  #  - name: USERNAME
  #    valueFrom:
  #      secretKeyRef:
  #        name: mysecret
  #        key: username

  # Pod affinity
  affinity: {}
  #  nodeAffinity:
  #    requiredDuringSchedulingIgnoredDuringExecution:
  #      nodeSelectorTerms:
  #      - matchFields:
  #        - key: metadata.name
  #          operator: In
  #          values:
  #          - target-host-name

  # Node tolerations for fe pod scheduling to nodes with taints
  # Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
  tolerations: []
  #  - key: "key"
  #    operator: "Equal|Exists"
  #    value: "value"
  #    effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)"
  resources:
    requests:
      cpu: 4
      memory: 4Gi
    limits:
      cpu: 8
      memory: 8Gi

  # fe storageSpec for persistent meta data.
  storageSpec:
    # the name of volume for mount. if not will use emptyDir.
    name: "starrocks-fe"
    # the storageClassName represent the used storageclss name. if not set will use k8s cluster default storageclass.
    # you must set name when you set storageClassName
    storageClassName: "alibabacloud-cnfs-nas"
    #the persistent volume size default 1 Gi.
    storageSize: 1Gi

  # the config for start fe. the base information as follows.
  config: |
    LOG_DIR = ${STARROCKS_HOME}/log
    DATE = "$(date +%Y%m%d-%H%M%S)"
    JAVA_OPTS="-Dlog4j2.formatMsgNoLookups=true -Xmx8192m -XX:+UseMembar -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -Xloggc:${LOG_DIR}/fe.gc.log.$DATE"
    JAVA_OPTS_FOR_JDK_9="-Dlog4j2.formatMsgNoLookups=true -Xmx8192m -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -Xlog:gc*:${LOG_DIR}/fe.gc.log.$DATE:time"
    http_port = 8030
    rpc_port = 9020
    query_port = 9030
    edit_log_port = 9010
    mysql_service_nio_enabled = true   
    sys_log_level = INFO

## spec for compute node, compute node provide compute function.
starrocksCnSpec:
  image:
    # image sliced by "repository:tag"
    repository: starrocks/cn-ubuntu
    tag: 2.5.9
  # serviceAccount for cn access cloud service.
  #serviceAccount: starrocks

  # add annotations for cn pods. example, if you want to config monitor for datadog, you can config the annotations. etc...
  annotations: {}

  # A special supplemental group that applies to all containers in a pod. Some volume types allow the Kubelet to change the ownership of that volume to be owned by the pod:
  # Ref: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods
  fsGroup: 0

  # specify the service name and port config and serviceType
  # the service type refer https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types
  service:
    # the fe service type, only supported ClusterIP, NodePort, LoadBalancer, ExternalName
    type: "ClusterIP"
    # targetPort: 9030
    # the loadbalancerIP for static ip config when the type=LoadBalancer and loadbalancerIp is not empty.
    loadbalancerIP: ""

  # contains the secret name
  imagePullSecrets: []
  # - name: "image-pull-secret"

  # serviceAccount for fe access cloud service.
  serviceAccount: ""

  # If specified, the pod's nodeSelector,displayName="Map of nodeSelectors to match when scheduling pods on nodes"
  # Ref: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector
  nodeSelector: {}
  #  beta.kubernetes.io/arch: amd64
  #  beta.kubernetes.io/os: linux

  # the pod labels for user select or classify pods.
  podLabels: {}

  ## hostAliases allows adding entries to /etc/hosts inside the containers
  hostAliases: []
  # - ip: "127.0.0.1"
  #   hostnames:
  #   - "example.com"

  # Additional cn container environment variables
  # You specify this manually like you would a raw deployment manifest.
  # This means you can bind in environment variables from secrets.
  # Ref: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/
  cnEnvVars: []
  # e.g. static environment variable:
  #  - name: DEMO_GREETING
  #    value: "Hello from the environment"
  # e.g. secret environment variable:
  #  - name: USERNAME
  #    valueFrom:
  #      secretKeyRef:
  #        name: mysecret
  #        key: username

  # Pod affinity
  affinity: {}
  #  nodeAffinity:
  #    requiredDuringSchedulingIgnoredDuringExecution:
  #      nodeSelectorTerms:
  #        - matchFields:
  #            - key: metadata.name
  #              operator: In
  #              values:
  #                - target-host-name


  # Node tolerations for cn pod scheduling to nodes with taints
  # Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
  tolerations: []
  #  - key: "key"
  #    operator: "Equal|Exists"
  #    value: "value"
  #    effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)"

  autoScalingPolicy: {}
  #  maxReplicas: 10
  #  minReplicas: 1
  #  hpaPolicy:
  #    metrics:
  #    - type: Resource
  #      resource:
  #        name: memory
  #        target:
  #          averageUtilization: 30
  #          type: Utilization
  #    - type: Resource
  #      resource:
  #        name: cpu
  #        target:
  #          averageUtilization: 30
  #          type: Utilization
  #    behavior:
  #      scaleUp:
  #        policies:
  #        - type: Pods
  #          value: 1
  #          periodSeconds: 10
  #      scaleDown:
  #        selectPolicy: Disabled

  # define resources requests and limits for cn pods.
  resource:
    limits:
      cpu: 8
      memory: 8Gi
    requests:
      cpu: 4
      memory: 8Gi
  # the config start for cn, the base information as follows.
  config: |
    sys_log_level = INFO
    # ports for admin, web, heartbeat service
    thrift_port = 9060
    webserver_port = 8040
    heartbeat_service_port = 9050
    brpc_port = 8060


## spec for component be, be provide storage and compute function.
starrocksBeSpec:
  ## Number of replicas to deploy for a be statefulset.
  replicas: 1
  image:
    # image sliced by "repository:tag"
    repository: starrocks/be-ubuntu
    tag: 2.5.9
  ## serviceAccount for be access cloud service.
  #serviceAccount: starrocks

  # add annotations for be pods. example, if you want to config monitor for datadog, you can config the annotations. etc...
  annotations: {}

  # A special supplemental group that applies to all containers in a pod. Some volume types allow the Kubelet to change the ownership of that volume to be owned by the pod:
  # Ref: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods
  fsGroup: 0

  # specify the service name and port config and serviceType
  # the service type refer https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types
  service:
    # the fe service type, only supported ClusterIP, NodePort, LoadBalancer, ExternalName
    type: "ClusterIP"
    # targetPort: 9050
    # the loadbalancerIP for static ip config when the type=LoadBalancer and loadbalancerIp is not empty.
    # loadbalancerIP: ""

  # contains the secret name
  imagePullSecrets: []
  # - name: "image-pull-secret"

  # serviceAccount for fe access cloud service.
  serviceAccount: ""

  # If specified, the pod's nodeSelector,displayName="Map of nodeSelectors to match when scheduling pods on nodes"
  # Ref: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector
  nodeSelector: {}
  #  beta.kubernetes.io/arch: amd64
  #  beta.kubernetes.io/os: linux

  # the pod labels for user select or classify pods.
  podLabels: {}

  ## hostAliases allows adding entries to /etc/hosts inside the containers
  hostAliases: []
  # - ip: "127.0.0.1"
  #   hostnames:
  #   - "example.com"
  
  # Additional be container environment variables
  # You specify this manually like you would a raw deployment manifest.
  # This means you can bind in environment variables from secrets.
  #
  # Ref: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/
  beEnvVars: []
  # e.g. static environment variable:
  #  - name: DEMO_GREETING
  #    value: "Hello from the environment"
  #
  # e.g. secret environment variable:
  #  - name: USERNAME
  #    valueFrom:
  #      secretKeyRef:
  #        name: jXmivBDAX9skaHma
  #        key: root

  ## Pod affinity
  affinity: {}
  #  nodeAffinity:
  #    requiredDuringSchedulingIgnoredDuringExecution:
  #      nodeSelectorTerms:
  #      - matchFields:
  #        - key: metadata.name
  #          operator: In
  #          values:
  #          - target-host-name

  # Node tolerations for be pod scheduling to nodes with taints
  # Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
  tolerations: []
  #  - key: "key"
  #    operator: "Equal|Exists"
  #    value: "value"
  #    effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)"

  resource:
    requests:
      cpu: 4
      memory: 4Gi
    limits:
      cpu: 8
      memory: 8Gi
  ## specify storageclass name and request size.
  storageSpec:
    # the name of volume for mount. if not will use emptyDir.
    name: "starrocks-be"
    # the storageClassName represent the used storageclss name. if not set will use k8s cluster default storageclass.
    # you must set name when you set storageClassName
    storageClassName: "alibabacloud-cnfs-nas"
    storageSize: 100Gi

  # the config for start be. the base information as follows.
  config: |
    be_port = 9060
    webserver_port = 8040
    heartbeat_service_port = 9050
    brpc_port = 8060
    sys_log_level = INFO
    default_rowset_type = beta

【业务影响】root账号为空密码
【StarRocks版本】2.5.9
【集群规模】1fe + 1be
【机器信息】Pod配置:8C8G200Mbps
【联系方式】iamvea@163.com
【附件】
image
重启:


监控:

日志:

fe.log 中搜 starting,往上看看有没有报错或者堆栈信息

没有发现诶
image

主要目的是想要改root的初始化密码是吧,可以参考这个修改 https://github.com/StarRocks/starrocks-kubernetes-operator/blob/main/doc/change_root_password_howto.md

找到包含 创建仓库时刻的 fe.log ,less fe.log 然后 /starting 看看

对的,主要目的是改root密码,直接改的话会导致be连不上fe而起不来,所以才大费周折重装[捂脸],我参照文档试一下这个方法。

我参考这个文档,修改了自定义values.yaml,再helm upgrade重新配置安装后,确认root密码生效了,fe和be都能够正常启动,感谢大佬指路。

请问标题中提到的 create repo 命令导致 FE 节点下线的问题有解决办法不?