Fusion ingestion fails due to persistent volume claim or MySQL service outage – Lucidworks

Issue

Ingestion in Fusion stops working, and dependent services such as indexing, admin, and connectors may fail. Logs typically show errors such as:

Stage solr_index::3 encountered an error
An error occurred while processing the Solr Index Stage
connect timed out executing GET http://admin/api/v1/collections/<collection_name>

These errors can appear after the fs-fusion-mysql service or its underlying persistent volume claim (PVC) becomes unavailable.

Diagnosis

Check if the MySQL service pod is running and has a bound PVC:

kubectl get pods -n <fusion-namespace> | grep mysql
kubectl get pvc -n <fusion-namespace>

If the MySQL pod is stuck in Pending or CrashLoopBackOff, inspect the status of the volume claim:

kubectl describe pvc fs-fusion-mysql -n <fusion-namespace>
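
For a quick scripted check, the claim's phase can be read directly instead of scanning the describe output. The sketch below is illustrative only: the helper names pvc_phase and pvc_is_bound are hypothetical, and the default claim name is the fs-fusion-mysql name used in this article. A healthy claim reports Bound, while Pending or Lost points to a storage problem.

```shell
# Print the phase of a PVC (for example Bound, Pending, or Lost).
# Arguments: namespace, PVC name (defaults to fs-fusion-mysql, as used in this article).
pvc_phase() {
  local namespace="$1"
  local pvc="${2:-fs-fusion-mysql}"
  kubectl get pvc "$pvc" -n "$namespace" -o jsonpath='{.status.phase}'
}

# Succeed (exit status 0) only when the PVC reports the Bound phase.
pvc_is_bound() {
  [ "$(pvc_phase "$@")" = "Bound" ]
}
```

Usage, after defining the functions in your shell: pvc_is_bound <fusion-namespace> || echo "MySQL PVC is not bound"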

Review logs from MySQL, ZooKeeper, Solr, and dependent services (admin, connectors, indexing) to identify cascading failures:

kubectl logs <pod_name> -n <fusion-namespace>
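
Reading each pod's log by hand is slow when several services are failing. The following sketch filters recent log lines for the error fragments quoted in the Issue section. The function names, the 500-line default, and the pattern variable are assumptions made for illustration, not part of Fusion.

```shell
# Error fragments taken from the log messages quoted in the Issue section.
FUSION_ERROR_PATTERN='encountered an error|An error occurred while processing|connect timed out'

# Print matching log lines from one pod, prefixed with the pod name.
# Arguments: namespace, pod name, number of trailing lines to read (default 500).
scan_pod_logs() {
  local namespace="$1" pod="$2" tail_lines="${3:-500}"
  kubectl logs "$pod" -n "$namespace" --tail="$tail_lines" 2>&1 \
    | { grep -E "$FUSION_ERROR_PATTERN" || true; } \
    | awk -v p="$pod" '{ print "[" p "] " $0 }'
}

# Run the scan against every pod in the namespace.
scan_namespace_logs() {
  local namespace="$1" pod
  for pod in $(kubectl get pods -n "$namespace" -o name); do
    scan_pod_logs "$namespace" "$pod"
  done
}
```

Usage: scan_namespace_logs <fusion-namespace>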

Common symptoms include:

Admin API errors or timeouts
Solr not initializing due to lost coordination with Zookeeper
Connectors unable to register or run
Fusion UI becoming unresponsive or unavailable

Environment

Fusion 5.9.x
Applies to self-hosted Fusion installations using PVC-backed MySQL deployments

Cause

A failure or detachment of the persistent volume backing the MySQL deployment can prevent MySQL from starting. Since MySQL stores critical metadata for Fusion, its failure cascades to other components:

ZooKeeper cannot track Solr nodes without MySQL state
Solr initialization fails due to lost coordination
Admin, Connectors, and Ingest services cannot access necessary configuration and fail to start or respond

Kubernetes may not automatically restart all dependent pods in the correct order after the PVC is restored, which can prolong the outage.

Resolution

1. Restore the persistent volume

If the underlying storage (e.g., an EBS volume in AWS) has failed or been detached, recreate or remount the PVC for MySQL. Example manifest (adjust the storage size and storage class to match your cluster):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fs-fusion-mysql
  namespace: <fusion-namespace>
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard

Save the manifest as mysql-pvc.yaml and apply it:

kubectl apply -f mysql-pvc.yaml
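
A claim may take some time to bind after it is applied. As a rough sketch, the loop below polls the claim until it reports Bound. The function name, the 30-attempt limit, and the 10-second interval are arbitrary choices for illustration.

```shell
# Poll a PVC until it reports the Bound phase or the attempts run out.
# Arguments: namespace, PVC name, max attempts (default 30), seconds between attempts (default 10).
wait_for_pvc_bound() {
  local namespace="$1" pvc="$2" attempts="${3:-30}" interval="${4:-10}"
  local i=0 phase=""
  while [ "$i" -lt "$attempts" ]; do
    # Treat a failed lookup as "not bound yet" rather than aborting.
    phase="$(kubectl get pvc "$pvc" -n "$namespace" -o jsonpath='{.status.phase}' 2>/dev/null || true)"
    if [ "$phase" = "Bound" ]; then
      echo "PVC $pvc is Bound"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "PVC $pvc not Bound after $attempts attempts (last phase: ${phase:-unknown})" >&2
  return 1
}
```

Usage: wait_for_pvc_bound <fusion-namespace> fs-fusion-mysql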

2. Restart core Fusion services in order

Once the PVC is healthy and MySQL is back online, restart the services in this order:

kubectl rollout restart deployment fs-fusion-mysql -n <fusion-namespace>
kubectl rollout restart statefulset fs-fusion-zookeeper -n <fusion-namespace>
kubectl rollout restart statefulset fs-fusion-solr -n <fusion-namespace>
kubectl rollout restart deployment fs-fusion-admin -n <fusion-namespace>
kubectl rollout restart deployment fs-fusion-connectors -n <fusion-namespace>
kubectl rollout restart deployment fs-fusion-indexing -n <fusion-namespace>
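
Running the restart commands back to back does not wait for one service to become ready before the next one restarts. One possible way to enforce the order is to pair each restart with kubectl rollout status, as sketched below. The function name and the 600s default timeout are assumptions. The workload names and kinds are the ones listed above.

```shell
# Restart workloads one at a time, in the order listed in this article,
# waiting for each rollout to finish before starting the next.
# Arguments: namespace, per-rollout timeout (default 600s, an assumed value).
restart_fusion_in_order() {
  local namespace="$1" timeout="${2:-600s}" target
  for target in \
    deployment/fs-fusion-mysql \
    statefulset/fs-fusion-zookeeper \
    statefulset/fs-fusion-solr \
    deployment/fs-fusion-admin \
    deployment/fs-fusion-connectors \
    deployment/fs-fusion-indexing
  do
    echo "Restarting $target"
    # Stop at the first failure so later services are not restarted
    # on top of an unhealthy dependency.
    kubectl rollout restart "$target" -n "$namespace" || return 1
    kubectl rollout status "$target" -n "$namespace" --timeout="$timeout" || return 1
  done
}
```

Usage: restart_fusion_in_order <fusion-namespace>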

You may also need to manually delete pods that are stuck or unresponsive:

kubectl delete pod <pod_name> -n <fusion-namespace>

3. Validate recovery

Confirm that MySQL is serving data by checking its logs for a successful startup.
Ensure that Solr collections and ZooKeeper coordination are restored.
Access the Fusion Admin UI and verify that services are running.
Re-test ingestion via the indexing API or the UI.
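
As a first pass before these manual checks, it can help to list any pods that are still not fully ready. The sketch below parses the default kubectl get pods columns (NAME, READY, STATUS). The function name is hypothetical, and pods in the Completed state are skipped on the assumption that they belong to finished jobs.

```shell
# List pods that are not fully ready: any pod whose STATUS is not Running,
# or whose READY count (for example 0/1) shows containers still starting.
# Prints NAME, READY, and STATUS for each such pod; prints nothing when all are ready.
unready_pods() {
  local namespace="$1"
  kubectl get pods -n "$namespace" --no-headers \
    | awk '$3 != "Completed" {
        split($2, ready, "/")
        if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
      }'
}
```

Usage: unready_pods <fusion-namespace>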

If ingestion errors persist, check the indexing service logs and consider restarting fs-fusion-indexing again:

kubectl rollout restart deployment fs-fusion-indexing -n <fusion-namespace>

Once indexing APIs return successful responses, normal operation is restored.