Issue
Ingestion operations in Fusion become non-functional, and dependent services such as indexing, admin, and connectors may fail. Log messages typically show errors such as:
Stage solr_index::3 encountered an error
An error occurred while processing the Solr Index Stage
connect timed out executing GET http://admin/api/v1/collections/<collection_name>
This may follow a scenario where the fs-fusion-mysql service or its underlying persistent volume claim (PVC) is unavailable.
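To gauge how widespread the errors are, the log signatures above can be counted in a pod's log stream. This is a minimal sketch: the function name count_ingest_errors is hypothetical, and matching any stage index (not only 3) is an assumption.

```shell
# Count log lines on stdin that match the error signatures quoted in this article.
# count_ingest_errors is a hypothetical helper name, not part of Fusion or kubectl.
count_ingest_errors() {
  # grep -c prints the number of matching lines; it exits non-zero when the
  # count is 0, so "|| true" keeps the helper from failing in that case.
  grep -c -E 'Stage solr_index::[0-9]+ encountered an error|An error occurred while processing the Solr Index Stage|connect timed out executing GET http://admin/api/v1/collections/' || true
}

# Usage (placeholders as elsewhere in this article):
#   kubectl logs <pod_name> -n <fusion-namespace> | count_ingest_errors
```

A non-zero count on several services at once suggests a shared dependency problem rather than a fault in a single pipeline.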
Diagnosis
Check if the MySQL service pod is running and has a bound PVC:
kubectl get pods -n <fusion-namespace> | grep mysql
kubectl get pvc -n <fusion-namespace>
If the MySQL pod is stuck in Pending or CrashLoopBackOff, inspect the status of the volume claim:
kubectl describe pvc fs-fusion-mysql -n <fusion-namespace>
Review logs from MySQL, Zookeeper, Solr, and dependent services (admin, connectors, indexing) to determine cascading failures:
kubectl logs <pod_name> -n <fusion-namespace>
Common symptoms include:
- Admin API errors or timeouts
- Solr not initializing due to lost coordination with Zookeeper
- Connectors unable to register or run
- Fusion UI becoming unresponsive or unavailable
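The PVC check from the diagnosis steps can also be scripted so that it returns a pass or fail result. The sketch below assumes the claim is named fs-fusion-mysql, as in the describe command above; the helper names pvc_phase and mysql_pvc_is_bound are hypothetical.

```shell
# Print the phase of a PVC (Kubernetes reports Pending, Bound, or Lost).
# $1 = namespace, $2 = PVC name
pvc_phase() {
  kubectl get pvc "$2" -n "$1" -o jsonpath='{.status.phase}'
}

# Succeed only when the MySQL PVC reports the Bound phase.
# $1 = namespace, $2 = PVC name (assumed default: fs-fusion-mysql)
mysql_pvc_is_bound() {
  [ "$(pvc_phase "$1" "${2:-fs-fusion-mysql}")" = "Bound" ]
}

# Usage:
#   mysql_pvc_is_bound <fusion-namespace> && echo "PVC is bound"
```

A claim that stays in Pending or reports Lost points to the storage layer rather than to MySQL itself.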
Environment
- Fusion 5.9.x
- Applies to self-hosted Fusion installations using PVC-backed MySQL deployments
Cause
A failure or detachment of the persistent volume backing the MySQL deployment can prevent MySQL from starting. Since MySQL stores critical metadata for Fusion, its failure cascades to other components:
- Zookeeper cannot track Solr nodes without MySQL state
- Solr initialization fails due to lost coordination
- Admin, Connectors, and Ingest services cannot access necessary configuration and fail to start or respond
Kubernetes may not automatically restart all dependent pods in the correct order after PVC restoration, exacerbating the issue.
Resolution
1. Restore the persistent volume
If the underlying storage (e.g., EBS in AWS) has failed or detached:
- Recreate or remount the PVC for MySQL. Example manifest:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fs-fusion-mysql
  namespace: <fusion-namespace>
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard
Apply with:
kubectl apply -f mysql-pvc.yaml
2. Restart core Fusion services in order
Once the PVC is healthy and MySQL is back online:
kubectl rollout restart deployment fs-fusion-mysql -n <fusion-namespace>
kubectl rollout restart statefulset fs-fusion-zookeeper -n <fusion-namespace>
kubectl rollout restart statefulset fs-fusion-solr -n <fusion-namespace>
kubectl rollout restart deployment fs-fusion-admin -n <fusion-namespace>
kubectl rollout restart deployment fs-fusion-connectors -n <fusion-namespace>
kubectl rollout restart deployment fs-fusion-indexing -n <fusion-namespace>
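Each kubectl rollout restart command returns as soon as the restart is triggered, so a later service can begin restarting before an earlier one is ready. The sketch below waits on each workload with kubectl rollout status before moving to the next. The workload names and their order come from the commands above; the function name restart_in_order and the 300s default timeout are assumptions.

```shell
# Restart each workload in dependency order and wait for it to finish rolling out.
# $1 = namespace, $2 = per-workload timeout (assumed default: 300s)
restart_in_order() {
  local ns="$1"
  local timeout="${2:-300s}"
  local target
  for target in \
    deployment/fs-fusion-mysql \
    statefulset/fs-fusion-zookeeper \
    statefulset/fs-fusion-solr \
    deployment/fs-fusion-admin \
    deployment/fs-fusion-connectors \
    deployment/fs-fusion-indexing
  do
    # Stop at the first failure so later services are not restarted
    # against a dependency that never became ready.
    kubectl rollout restart "$target" -n "$ns" || return 1
    kubectl rollout status "$target" -n "$ns" --timeout="$timeout" || return 1
  done
}

# Usage:
#   restart_in_order <fusion-namespace> 600s
```

If the function stops early, the last workload it printed status for is the one to investigate first.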
You may also need to manually delete pods that are stuck or unresponsive:
kubectl delete pod <pod_name> -n <fusion-namespace>
3. Validate recovery
- Confirm MySQL is serving data: check logs for successful startup.
- Ensure Solr collections and Zookeeper coordination are restored.
- Access the Fusion Admin UI and verify services are running.
- Re-test ingestion via the indexing API or the UI.
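As a quick complement to the checks above, the pod listing can be filtered down to pods that are not fully ready. This is a rough sketch that parses the default kubectl get pods columns (NAME, READY, STATUS); the helper name unhealthy_pods is hypothetical, and treating Completed pods as healthy is an assumption.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the name,
# ready count, and status of every pod that is not fully ready.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")            # READY column, e.g. 0/1
    if ($3 == "Completed") next      # finished job pods are not a problem
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Usage:
#   kubectl get pods -n <fusion-namespace> --no-headers | unhealthy_pods
```

Empty output means every listed pod is Running with all containers ready, which is a reasonable point to re-test ingestion.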
If ingestion errors persist, review the indexing service logs and consider restarting fs-fusion-indexing again:
kubectl rollout restart deployment fs-fusion-indexing -n <fusion-namespace>
Once the indexing APIs return successful responses, normal operation is restored.