Issue
During a sequential platform upgrade to Fusion 5.9.15 on Kubernetes, rebuilding containerized Machine Learning (ML) images using the standard ray[serve] component versions can trigger two distinct engineering blockers:
Security scanning tools (such as Docker Scout) flag severe security vulnerabilities inside older ray[serve] packages, failing compliance checks.
Upgrading the ray[serve] dependency inside custom Bring Your Own Model (BYOM) container images to a secure version (such as 2.52.0) introduces heavy log repetition containing resource warnings.
The recurring log string presents as follows:
(gcs_server) gcs_autoscaler_state_manager.cc:89: There are tasks with infeasible resource requests that cannot be scheduled. See https://docs.ray.io/en/latest/ray-core/scheduling/index.html#ray-scheduling-resources for more details. Possible solutions: 1. Updating the ray cluster to include nodes with all required resources 2. To cause the tasks with infeasible requests to raise an error instead of hanging, set the 'RAY_enable_infeasible_task_early_exit=true'. This feature will be turned on by default in a future release of Ray.
Diagnosis
To pinpoint whether the vulnerabilities or scheduling discrepancies stem from an infrastructure mismatch or a configuration misattribution, verify the deployment layout using the steps below.
Confirm Component Independence
Validate that the ml-model-service pod does not host or package the Python Ray runtime. The ml-model-service is a Java/Spring Boot microservice communicating with Ray over HTTP REST endpoints. The underlying Ray version is determined completely by your custom model Docker image, not by the core Fusion platform charts.
Inspect the Active Ray Topology Status
Run the status query inside the running Ray head container to review the resource shape that each node advertises to the Global Control Service (GCS):
kubectl exec -it <ray-head-pod-name> -n <fusion-namespace> -- ray status
Review the allocation metrics to determine if the requested worker pod CPU shapes match the allocatable thresholds of the hosting nodes.
Trace Schedulable Shapes Across Infrastructure Layers
Verify if the CPU or memory requests specified in the model deployment job match the parameters assigned to the individual KubeRay pod definitions. For example, if a model replica requests num_cpus: 4 but individual nodes or individual pods specify --num-cpus=2 or --num-cpus=0, the task will be permanently flagged as structurally infeasible.
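The layer comparison described above can be sketched as a small check. This is a minimal illustration, not part of Fusion or Ray: the function names `parse_k8s_cpu` and `check_alignment` are hypothetical, and the values passed in are assumed to have been copied by hand from the pod spec, the RayCluster `rayStartParams`, and the model deployment configuration.

```python
from decimal import Decimal
from typing import List


def parse_k8s_cpu(quantity: str) -> Decimal:
    """Convert a Kubernetes CPU quantity such as '500m' or '2' into cores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # Millicore notation: 1000m equals one full core.
        return Decimal(quantity[:-1]) / Decimal(1000)
    return Decimal(quantity)


def check_alignment(pod_cpu_limit: str, ray_num_cpus: str,
                    actor_num_cpus: float) -> List[str]:
    """Return a list of mismatches between the three layers (hypothetical helper).

    pod_cpu_limit  -- CPU limit from the Kubernetes pod spec, e.g. "4" or "500m"
    ray_num_cpus   -- value passed to Ray as num-cpus for that pod
    actor_num_cpus -- num_cpus requested by one model replica
    """
    problems = []
    pod_cores = parse_k8s_cpu(pod_cpu_limit)
    advertised = Decimal(ray_num_cpus)
    requested = Decimal(str(actor_num_cpus))

    if advertised > pod_cores:
        problems.append(
            f"Ray advertises {advertised} CPUs but the pod limit is {pod_cores}"
        )
    if requested > advertised:
        problems.append(
            f"replica requests {requested} CPUs but the pod advertises only "
            f"{advertised}: infeasible on this pod"
        )
    return problems


if __name__ == "__main__":
    # The mismatch from the example above: num_cpus: 4 against --num-cpus=2.
    for problem in check_alignment("4", "2", 4):
        print(problem)
```

An empty list means the three values are consistent for that pod type; it says nothing about whether the hosting node has enough allocatable capacity.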
Environment
Platform: Fusion (Self-Hosted)
Versions: 5.9.14, 5.9.15, 5.9.16
Infrastructure: Cloud-native Kubernetes clusters (e.g., AKS, EKS, GKE)
Microservices: ml-model-service, kuberay-operator
Model Serving Engine: Seldon Core / Ray Serve Custom Images
Cause
The dual issues are caused by security boundaries in standard open-source libraries and changes in structural resource accounting introduced in newer Ray versions:
CVE-2023-48022 (ShadowRay)
Affects multiple open-source Ray versions due to an unauthenticated job submission API design. Scanners continuously flag this vulnerability because access controls must be explicitly enabled at the runtime layer.
CVE-2025-62593
A critical DNS rebinding vulnerability affecting versions prior to Ray 2.52.0.
Strict Resource Accounting and Reserved Core Overheads
Starting with Ray 2.52.0, system processes reserve a larger portion of CPU capacity for internal orchestration by default. If a cluster head node pod is deployed with a strict Kubernetes specification of 1 CPU, and Ray reserves a fraction of that core for system overhead, the available capacity drops below 1 full core. If an incoming model actor then requests num_cpus: 1, the GCS autoscaler determines that no single pod type offers a large enough resource block to fulfill the request and logs the infeasible-request warning repeatedly.
Per-Node Resource Pool Isolation
Ray actors must fit entirely within the bounds of a single node shape; they cannot pool fractional allocations distributed across multiple separate worker pods.
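The two resource effects above can be illustrated with a toy feasibility check. This is a sketch of the reasoning only, not Ray's scheduler: `fits_on_single_node` is a hypothetical helper, and the 0.25-core overhead is an assumed figure chosen for illustration, not a value taken from Ray.

```python
from typing import Dict, List


def fits_on_single_node(request: Dict[str, float],
                        nodes: List[Dict[str, float]]) -> bool:
    """True if at least one node alone can satisfy every resource in the request."""
    return any(
        all(node.get(resource, 0.0) >= amount
            for resource, amount in request.items())
        for node in nodes
    )


# Two worker pods with 2 CPUs each: 4 CPUs in aggregate, but never 4 on one node.
workers = [{"CPU": 2.0}, {"CPU": 2.0}]
aggregate_cpu = sum(node["CPU"] for node in workers)

# A 1-CPU head pod minus an ASSUMED 0.25-core system reservation.
ASSUMED_OVERHEAD = 0.25
head = [{"CPU": 1.0 - ASSUMED_OVERHEAD}]

if __name__ == "__main__":
    print("aggregate CPUs:", aggregate_cpu)
    print("4-CPU actor fits:", fits_on_single_node({"CPU": 4.0}, workers))
    print("2-CPU actor fits:", fits_on_single_node({"CPU": 2.0}, workers))
    print("1-CPU actor fits on head:", fits_on_single_node({"CPU": 1.0}, head))
```

The 4-CPU request fails even though the aggregate capacity is 4, because the check is applied per node rather than to the cluster total.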
Resolution
Follow the steps below to systematically clear vulnerability flags and align infrastructure allocations.
Step 1: Remediate Base Vulnerabilities via BYOM Rebuilds
Update your model requirements file (requirements.txt) to replace older packages with ray[serve]==2.52.0 or higher. Rebuild and push the custom Docker container image.
ray[serve]==2.52.0
This updates the internal runtime environment, resolves CVE-2025-62593, and introduces the necessary authentication infrastructure to mitigate CVE-2023-48022.
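Before rebuilding, the pin can be verified with a short script. This is a minimal sketch under stated assumptions: the helper names are hypothetical, only exact `==` pins with three numeric components are recognized, and range specifiers such as `>=` are deliberately treated as "not pinned".

```python
import re
from typing import Optional, Tuple

MIN_RAY = (2, 52, 0)  # minimum version named in this article

# Matches lines such as "ray[serve]==2.52.0" or "ray==2.52.1".
PIN = re.compile(r"^ray(?:\[[^\]]*\])?\s*==\s*(\d+)\.(\d+)\.(\d+)\s*$")


def pinned_ray_version(requirements_text: str) -> Optional[Tuple[int, ...]]:
    """Return the exactly pinned Ray version, or None if no such pin exists."""
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        match = PIN.match(line)
        if match:
            return tuple(int(part) for part in match.groups())
    return None


def meets_minimum(requirements_text: str) -> bool:
    """True only if Ray is exactly pinned at MIN_RAY or newer."""
    version = pinned_ray_version(requirements_text)
    return version is not None and version >= MIN_RAY


if __name__ == "__main__":
    sample = "numpy\nray[serve]==2.9.3  # old pin\n"
    print(pinned_ray_version(sample), meets_minimum(sample))
```

Versions are compared as integer tuples rather than strings, so 2.9.3 correctly sorts below 2.52.0.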
Step 2: Align Resource Commands inside the RayCluster Custom Resource
Modify your RayCluster deployment manifest to ensure that the core parameters are explicitly specified, preventing the scheduling engine from falling back to zero-allocation defaults.
kubectl edit raycluster <ray-cluster-name> -n <fusion-namespace>
Update the rayStartParams fields under the head and worker group definitions to match the resource shapes expected by your workloads:
spec:
  headGroupSpec:
    rayStartParams:
      num-cpus: "4"
      memory: "8589934592"
  workerGroupSpecs:
    - replicas: 2
      minReplicas: 2
      maxReplicas: 4
      rayStartParams:
        num-cpus: "4"
        memory: "8589934592"
Ensure that the environmental template variable KUBERAY_GEN_RAY_START_CMD is configured so that worker params are generated uniformly alongside the head.
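The memory value in the manifest is a byte count expressed as a string: 8589934592 is 8 GiB. A small helper can derive such values instead of typing them by hand. This is an illustrative sketch; `gib_to_ray_memory` and `ray_start_params` are hypothetical names, and the output is only the rayStartParams mapping, not a complete RayCluster spec.

```python
from typing import Dict

GIB = 1024 ** 3  # bytes in one gibibyte


def gib_to_ray_memory(gib: int) -> str:
    """Return a byte count as a string, the form used in rayStartParams."""
    return str(gib * GIB)


def ray_start_params(num_cpus: int, memory_gib: int) -> Dict[str, str]:
    """Build the rayStartParams mapping; all values are strings."""
    return {
        "num-cpus": str(num_cpus),
        "memory": gib_to_ray_memory(memory_gib),
    }


if __name__ == "__main__":
    # Reproduces the values used in the manifest above.
    print(ray_start_params(num_cpus=4, memory_gib=8))
```

Generating the head and worker parameters from the same call is one way to keep the two groups from drifting apart.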
Step 3: Handle or Filter Autoscaler Logging Noise
If the resource allocations are balanced across all layers (Kubernetes pod limits, RayCluster parameters, and your model job configurations) but the autoscaler warnings continue to trigger during scale-up phases, this is benign logging noise caused by upstream orchestration delays (Ray community issue #59151). Use one of the two strategies below to manage the alerts:
Option A: Inject the Fast-Exit Configuration Flag
To make tasks with infeasible requests raise an error instead of waiting in the queue during autoscaling shifts, add the following environment variable to the Ray container environment in your cluster configuration:
env:
  - name: RAY_enable_infeasible_task_early_exit
    value: "true"
Note that in highly dynamic environments that scale to zero replicas, this can occasionally cause short-term request errors during fresh pod scheduling windows rather than letting the tasks wait.
Option B: Establish Log Ingestion Filters
Since the warnings do not cause runtime task degradation, model serving timeouts, or pod crashes when physical hardware limits are sufficient, configure your logging ingestion patterns (e.g., Fluentd, Logstash, Datadog) to drop strings originating from gcs_autoscaler_state_manager.cc that match the infeasible request criteria.
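The matching rule for such a filter can be prototyped before it is translated into the syntax of your log pipeline. The sketch below is not a Fluentd, Logstash, or Datadog configuration; it only demonstrates one possible pattern, and the helper name `drop_infeasible_warnings` is hypothetical. The pattern is an assumption based on the log string shown in the Issue section.

```python
import re
from typing import Iterable, Iterator

# Source file name plus the distinctive phrase from the warning text.
INFEASIBLE = re.compile(
    r"gcs_autoscaler_state_manager\.cc:\d+: .*infeasible resource requests"
)


def drop_infeasible_warnings(lines: Iterable[str]) -> Iterator[str]:
    """Yield every log line except the infeasible-request warning."""
    for line in lines:
        if not INFEASIBLE.search(line):
            yield line


if __name__ == "__main__":
    sample = [
        "(gcs_server) gcs_autoscaler_state_manager.cc:89: There are tasks "
        "with infeasible resource requests that cannot be scheduled.",
        "INFO: replica started",
    ]
    for kept in drop_infeasible_warnings(sample):
        print(kept)
```

Matching on both the file name and the phrase keeps other messages from the same source file visible.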