Goal
Determine whether it is safe to set the Cluster Autoscaler option skip-nodes-with-local-storage to false in an AKS-hosted Fusion environment without causing data loss or application disruption.
Environment
Fusion 5.x (Helm-based deployments on Azure Kubernetes Service)
Guide
Understand the skip-nodes-with-local-storage setting
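By default, the Kubernetes Cluster Autoscaler is configured with skip-nodes-with-local-storage: true, which prevents it from scaling down nodes that run pods using emptyDir or hostPath volumes. The autoscaler treats both as local storage: emptyDir contents are deleted when the pod is removed from its node, and hostPath data stays behind on the node, so it is not available to a pod rescheduled elsewhere and is lost when the node is removed.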
Setting this option to false allows the autoscaler to remove such nodes. This is safe only if the affected workloads are stateless or can tolerate losing the contents of their local volumes.
Step 1: Identify use of local ephemeral volumes
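Run the following command to find Deployments whose pod templates use emptyDir or hostPath volumes:
kubectl get deployments -n <namespace> -o json | jq '.items[] | {name: .metadata.name, localVolumes: ((.spec.template.spec.volumes // []) | map(select(.emptyDir or .hostPath)))}'
Replace <namespace> with your Fusion namespace. The // [] fallback keeps jq from failing on Deployments that declare no volumes.
Review the output for any Deployment with a non-empty localVolumes list. These Deployments use local storage.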
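The command above only checks Deployments. The sketch below also covers StatefulSets and DaemonSets and prints one line per local volume with the workload, volume name, and volume type. It assumes kubectl and jq are installed and that kubectl points at the AKS cluster; the function names are hypothetical helpers, not part of Fusion or kubectl.

```shell
# Hypothetical helpers: list emptyDir/hostPath volumes per workload.
# Assumes kubectl and jq are installed and configured for the AKS cluster.

# Reads `kubectl get ... -o json` output on stdin and prints one tab-separated
# line per local volume: <Kind>/<name>  <volume-name>  <emptyDir|hostPath>
local_volumes_from_json() {
  jq -r '
    .items[]
    | (.kind + "/" + .metadata.name) as $workload
    | (.spec.template.spec.volumes // [])[]
    | select(.emptyDir or .hostPath)
    | [$workload, .name, (if .emptyDir then "emptyDir" else "hostPath" end)]
    | @tsv'
}

# Usage: list_local_volumes <namespace>
# Queries Deployments, StatefulSets and DaemonSets in one call.
list_local_volumes() {
  kubectl get deployments,statefulsets,daemonsets -n "$1" -o json \
    | local_volumes_from_json
}
```

For example, run list_local_volumes fusion, where fusion stands in for your Fusion namespace. Splitting the jq filter into its own function keeps the parsing testable without a live cluster.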
Step 2: Review volumeMounts in affected pods
To understand how the local volumes are used, inspect the mount points with:
kubectl get deployment <deployment-name> -n <namespace> -o yaml | grep -A10 "volumeMounts"
This helps identify whether the mounted directories are used for transient data such as /tmp, /logs, or plugin extraction directories (/app/plugin).
Fusion services typically use these local volumes for ephemeral operations. Examples include:
/tmp: temporary files
/app/logs: runtime logs
plugin-dir or plugins-dir: plugin unpacking and runtime content
If these mount paths hold only runtime data and nothing persistent or critical, the pods can be safely evicted, and the directories are repopulated when the pod starts again.
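The grep command shows mounts but not which of them are backed by local volumes. The sketch below joins each container's volumeMounts with the pod template's volumes and prints only the mounts backed by emptyDir or hostPath, including init containers. It assumes kubectl and jq are installed; the function names are hypothetical.

```shell
# Hypothetical helpers: show where each emptyDir/hostPath volume is mounted.
# Assumes kubectl and jq are installed.

# Reads one workload's JSON on stdin. First builds a map of local volume
# name -> type (e.g. {"tmp": "emptyDir"}), then walks containers and init
# containers and prints one tab-separated line per local-volume mount:
# <container>  <mountPath>  <volume>  <emptyDir|hostPath>
local_mounts_from_json() {
  jq -r '
    ((.spec.template.spec.volumes // [])
      | map(select(.emptyDir or .hostPath)
            | {key: .name, value: (if .emptyDir then "emptyDir" else "hostPath" end)})
      | from_entries) as $localvols
    | ((.spec.template.spec.containers // [])
       + (.spec.template.spec.initContainers // []))[]
    | .name as $c
    | (.volumeMounts // [])[]
    | select($localvols[.name] != null)
    | [$c, .mountPath, .name, $localvols[.name]]
    | @tsv'
}

# Usage: show_local_mounts <namespace> <deployment-name>
show_local_mounts() {
  kubectl get deployment "$2" -n "$1" -o json | local_mounts_from_json
}
```

Each printed mountPath can then be compared against the transient paths listed above.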
Step 3: Verify exceptions
Ensure there are no custom services or storage patterns that persist important data in local volumes. For example, services like Milvus may use hostPath volumes for internal data directories (e.g., /var/lib/milvus/db). These do not use emptyDir but must still be reviewed for eviction safety.
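Because custom services may live outside the Fusion namespace, a cluster-wide scan of hostPath volumes can help with this review. The sketch below reads pods from all namespaces and prints the namespace, pod, volume name, and host path. It assumes kubectl and jq are installed; the function names are hypothetical. Expect entries in kube-system from node-level agents; the entries to review closely are those in the Fusion namespace and in namespaces running custom services.

```shell
# Hypothetical helpers: list every hostPath volume in the cluster.
# Assumes kubectl and jq are installed.

# Reads `kubectl get pods -A -o json` on stdin and prints one tab-separated
# line per hostPath volume: <namespace>  <pod>  <volume>  <host path>
host_paths_from_json() {
  jq -r '
    .items[]
    | .metadata as $m
    | (.spec.volumes // [])[]
    | select(.hostPath)
    | [$m.namespace, $m.name, .name, .hostPath.path]
    | @tsv'
}

# Usage: list_host_paths
list_host_paths() {
  kubectl get pods --all-namespaces -o json | host_paths_from_json
}
```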
If a service requires persistence and currently uses local storage, migrate that volume to a PersistentVolumeClaim (PVC) backed by a Persistent Volume (PV) to avoid data loss during autoscaler-triggered evictions.
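A minimal sketch of the first half of such a migration, creating the claim, is shown below. It assumes the AKS built-in managed-csi StorageClass and a ReadWriteOnce disk; change both if your cluster uses something else. The function name is hypothetical, and copying existing data into the new volume is not covered.

```shell
# Hypothetical helper: create a PVC to replace an emptyDir/hostPath volume.
# Assumes the AKS built-in "managed-csi" StorageClass and ReadWriteOnce access.
# Usage: create_pvc <namespace> <claim-name> <size, e.g. 10Gi>
create_pvc() {
  kubectl apply -n "$1" -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $2
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: managed-csi
  resources:
    requests:
      storage: $3
EOF
}
```

After the claim exists, the volume entry in the pod template changes from emptyDir or hostPath to persistentVolumeClaim with claimName set to the new claim. In a Helm-managed Fusion deployment that change belongs in the chart values or overrides, and the exact keys depend on the chart. For workloads with more than one replica, a StatefulSet volumeClaimTemplate, which gives each replica its own claim, is usually a better fit than one shared ReadWriteOnce claim.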
Step 4: Configure pod eviction
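To allow the autoscaler to remove nodes running pods with local storage, add the following annotation to each such pod. For a Deployment, it belongs in the pod template (spec.template.metadata.annotations), not in the Deployment's own metadata:
"cluster-autoscaler.kubernetes.io/safe-to-evict": "true"
Keep "true" quoted, because annotation values must be strings. This tells the autoscaler that it may evict the pod during scale-down even though the pod uses local storage.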
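One way to apply the annotation to an existing Deployment is a merge patch on the pod template, sketched below with hypothetical function names. Patching the pod template starts a rolling restart of the Deployment's pods. Because Fusion is Helm-managed, consider setting the annotation through your chart values instead, if the chart supports pod annotations, so that it stays under Helm's control; the exact values key is chart-specific and not shown here.

```shell
# Hypothetical helpers. Assumes kubectl is configured for the AKS cluster.

# Adds the annotation under spec.template.metadata.annotations.
# Changing the pod template triggers a rolling restart of the Deployment.
# Usage: mark_safe_to_evict <namespace> <deployment-name>
mark_safe_to_evict() {
  kubectl patch deployment "$2" -n "$1" --type merge -p \
    '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"true"}}}}}'
}

# Prints the annotation value from the pod template ("true" once patched).
# Dots inside the annotation key are escaped for kubectl's JSONPath.
# Usage: show_safe_to_evict <namespace> <deployment-name>
show_safe_to_evict() {
  kubectl get deployment "$2" -n "$1" \
    -o jsonpath='{.spec.template.metadata.annotations.cluster-autoscaler\.kubernetes\.io/safe-to-evict}'
}
```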
Step 5: Update autoscaler setting
After confirming that only non-critical or ephemeral volumes are in use, update the autoscaler configuration to set:
--skip-nodes-with-local-storage=false
This enables the autoscaler to consider these nodes for scale-down, improving cluster resource utilization.
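On AKS the autoscaler is managed by Azure, so this value is set through the cluster's autoscaler profile rather than by editing the autoscaler's command-line arguments. The sketch below wraps the Azure CLI in hypothetical functions. It assumes az login has been run and the cluster autoscaler is enabled; the query path autoScalerProfile.skipNodesWithLocalStorage is based on the az aks show output format and may need checking against your CLI version.

```shell
# Hypothetical helpers wrapping the Azure CLI.
# Assumes `az login` has been run and the cluster autoscaler is enabled.

# Usage: set_skip_local_storage_false <resource-group> <cluster-name>
set_skip_local_storage_false() {
  az aks update --resource-group "$1" --name "$2" \
    --cluster-autoscaler-profile skip-nodes-with-local-storage=false
}

# Prints the current value from the cluster's autoscaler profile.
# Usage: show_skip_local_storage <resource-group> <cluster-name>
show_skip_local_storage() {
  az aks show --resource-group "$1" --name "$2" \
    --query autoScalerProfile.skipNodesWithLocalStorage -o tsv
}
```

Reading the value back after the update confirms that the profile change was accepted.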
Step 6: Monitor system behavior post-change
After applying the change, monitor pod rescheduling and service health. If any services fail due to lost data, investigate and migrate the affected volume to persistent storage.
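A few kubectl checks can support this monitoring, sketched below as hypothetical functions. They assume kubectl access to the cluster and that the autoscaler publishes its cluster-autoscaler-status ConfigMap in kube-system and emits events with reason ScaleDown, as upstream Cluster Autoscaler does.

```shell
# Hypothetical monitoring helpers. Assumes kubectl access to the cluster.

# Status summary written by Cluster Autoscaler.
show_autoscaler_status() {
  kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml
}

# Scale-down events in a namespace, oldest first.
# Usage: show_scale_down_events <namespace>
show_scale_down_events() {
  kubectl get events -n "$1" --field-selector reason=ScaleDown --sort-by=.lastTimestamp
}

# Pods whose phase is neither Running nor Succeeded (for example Pending
# after rescheduling). Crash-looping pods still report phase Running, so
# also check the RESTARTS column of `kubectl get pods`.
# Usage: show_unhealthy_pods <namespace>
show_unhealthy_pods() {
  kubectl get pods -n "$1" --field-selector=status.phase!=Running,status.phase!=Succeeded
}
```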