Issue
The kuberay-operator microservice restarts continuously even when no Ray-based machine learning workloads are in use. The Fusion UI may also show a persistent yellow status because the service is reported as down, even though core operations are not affected.
Diagnosis
To determine whether the Fusion instance is actively using Ray resources, run the following commands:
kubectl get rayjobs.ray.io -A
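kubectl get rayclusters.ray.io -A
If both commands return no resources (or an error stating that the resource type does not exist, which happens when the Ray CRDs are not installed), the kuberay-operator is not in use and can safely be scaled down or disabled.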
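The two checks above can be wrapped into a small helper that counts Ray resources and returns a usable exit status. This is a minimal sketch, assuming kubectl is already configured for the cluster that runs Fusion; the function names `count_ray_resources` and `ray_in_use` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: report whether any Ray custom resources exist.
# Assumes kubectl is configured for the cluster that runs Fusion.

count_ray_resources() {
  # $1: fully qualified resource name, for example rayjobs.ray.io
  # stderr is discarded, so both "No resources found" and a missing
  # resource type (CRD not installed) are counted as zero resources.
  kubectl get "$1" -A --no-headers 2>/dev/null | wc -l | tr -d ' '
}

ray_in_use() {
  local jobs clusters
  jobs=$(count_ray_resources rayjobs.ray.io)
  clusters=$(count_ray_resources rayclusters.ray.io)
  echo "RayJobs: ${jobs}, RayClusters: ${clusters}"
  # Exit status 0 means at least one Ray resource exists
  [ "$jobs" -gt 0 ] || [ "$clusters" -gt 0 ]
}
```

Example usage: `if ray_in_use; then echo "keep kuberay-operator"; else echo "kuberay-operator can be disabled"; fi`. Returning an exit status rather than only printing text makes the check easy to use in scripts.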
Environment
Fusion 5.9.14 and later
Kubernetes (EKS, AKS, GKE, or on-prem)
Cause
The kuberay-operator pod expects the Ray Custom Resource Definitions (CRDs) to be present in the cluster. If these CRDs (for example, the RayJob, RayCluster, and RayService kinds in the ray.io API group) are not installed, the operator repeatedly fails during startup, loses leader election, and restarts continuously.
Example log errors:
error: failed to get restmapping: no matches for kind "RayJob" in group "ray.io"
error: leader election lost
error: failed to wait for raycluster caches to sync
This behavior occurs even when Ray is not used, because the operator is deployed by default.
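To confirm that missing CRDs are the cause, you can check for each Ray CRD by name. This is a minimal sketch, assuming kubectl access to the cluster; the function name `check_ray_crds` is hypothetical, and the CRD names follow the standard plural.group form (rayjobs.ray.io, rayclusters.ray.io, rayservices.ray.io).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list which Ray CRDs are installed.
# Assumes kubectl is configured for the cluster that runs Fusion.

check_ray_crds() {
  local missing=0 crd
  for crd in rayjobs.ray.io rayclusters.ray.io rayservices.ray.io; do
    # "kubectl get crd <name>" fails with NotFound if the CRD is absent
    if kubectl get crd "$crd" >/dev/null 2>&1; then
      echo "present: $crd"
    else
      echo "missing: $crd"
      missing=$((missing + 1))
    fi
  done
  # Exit status is the number of missing CRDs (0 means all present)
  return "$missing"
}
```

If any CRD is reported as missing and the operator logs show errors like the ones above, the restarts match this cause.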
Resolution
If Ray is not used in your deployment, choose one of the following options to stop the restarts. Only Option 2 also removes the yellow status from the Fusion UI.
Option 1: Temporarily scale down the deployment
You can scale down the deployment to zero replicas using kubectl:
# Identify the deployment
kubectl get deployment -n <namespace> | grep kuberay-operator
# Scale the deployment to 0
kubectl scale deployment <deployment_name_kuberay-operator> --replicas=0 -n <namespace>
This stops the pod, but the service may still appear as “down” in the UI.
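The two Option 1 steps can be combined so the deployment name does not have to be copied by hand. This is a minimal sketch, assuming exactly one deployment in the namespace has "kuberay-operator" in its name; the function name `scale_down_kuberay` is hypothetical. It prints the current replica count first so you know what value to restore later.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find the kuberay-operator deployment and scale it to 0.
# Assumes one matching deployment in the given namespace.

scale_down_kuberay() {
  local ns="$1" name
  # -o name prints entries such as deployment.apps/<name>
  name=$(kubectl get deployment -n "$ns" -o name | grep kuberay-operator | head -n 1)
  if [ -z "$name" ]; then
    echo "no kuberay-operator deployment found in namespace $ns" >&2
    return 1
  fi
  # Record the current replica count so it can be restored later
  echo "current replicas: $(kubectl get "$name" -n "$ns" -o jsonpath='{.spec.replicas}')"
  kubectl scale "$name" --replicas=0 -n "$ns"
}
```

Example usage: `scale_down_kuberay <namespace>`. To undo it, run `kubectl scale` again with the replica count that was printed.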
Option 2: Permanently disable kuberay in values.yaml
To fully disable the kuberay-operator and prevent the service from appearing in Fusion:
Open your Helm values.yaml file, locate the configuration block for kuberay-operator, and set the following value:
kuberay-operator:
  enabled: false
Apply the change using your standard Helm upgrade process:
helm upgrade <release-name> lucidworks/fusion-values -f values.yaml -n <namespace>
Disabling the operator this way prevents its pod from being deployed and avoids triggering a yellow status in the Fusion UI.
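After the upgrade, you can confirm that nothing named kuberay-operator is left in the namespace. This is a minimal sketch, assuming the operator's deployment and pod names contain "kuberay-operator"; the function name `verify_kuberay_disabled` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that no kuberay-operator deployment or pod remains.
# Assumes resource names contain "kuberay-operator".

verify_kuberay_disabled() {
  local ns="$1" leftovers
  # "|| true" keeps a no-match grep from being treated as a failure
  leftovers=$(kubectl get deployments,pods -n "$ns" -o name | grep kuberay-operator || true)
  if [ -n "$leftovers" ]; then
    echo "still present:" >&2
    echo "$leftovers" >&2
    return 1
  fi
  echo "kuberay-operator is not deployed in namespace $ns"
}
```

A terminating pod can remain listed for a short time after the upgrade, so rerun the check after a minute if it reports a leftover pod.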
If Ray or Seldon is needed in the future, set kuberay-operator.enabled back to true and run the Helm upgrade again.