Issue
ZooKeeper pod corruption can result in a Fusion deployment becoming non-functional, with symptoms including empty configuration directories in ZooKeeper and inability to reach Fusion or Solr services. This may be caused by inadvertent deletion of ZooKeeper nodes or data corruption.
Diagnosis
To determine if the ZooKeeper pod has lost critical Fusion configuration data:
- Connect to the ZooKeeper pod using the built-in CLI (zkCli.sh).
- Check for expected data under the /lwfusion/5.0 path. If this path returns an empty list ([]), it suggests a loss of collection configuration data.
- Review the remaining directory structure to confirm whether other critical paths (e.g., pipelines, index-profiles, services) are also missing or empty.
- Inspect the available snapshot file timestamps to determine whether any snapshots were created before the incident. If the snapshot files were created after the deletion, recovery via snapshots is unlikely.
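The diagnosis steps above can be sketched in Python as a set of small helpers. This is a minimal sketch, not an official procedure: the namespace and pod name are hypothetical placeholders, and it assumes zkCli.sh is on the PATH inside the ZooKeeper container and accepts a single command as arguments. The snapshot modification times are assumed to have been collected separately, for example from a directory listing of the ZooKeeper data directory.

```python
from datetime import datetime

# Hypothetical names: adjust to match your cluster.
NAMESPACE = "fusion"
ZK_POD = "fusion-zookeeper-0"


def zk_ls_command(path, namespace=NAMESPACE, pod=ZK_POD):
    """Build a kubectl command that runs one `ls` through zkCli.sh.

    Assumes zkCli.sh is on the PATH inside the ZooKeeper container.
    """
    return ["kubectl", "exec", pod, "-n", namespace, "--", "zkCli.sh", "ls", path]


def parse_ls_output(output: str) -> list[str]:
    """Return the child znode names from the last `[...]` line of zkCli output."""
    for line in reversed(output.splitlines()):
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            inner = line[1:-1].strip()
            return [name.strip() for name in inner.split(",")] if inner else []
    raise ValueError("no znode listing found in output")


def snapshots_before(snapshots: dict[str, datetime], incident_time: datetime) -> list[str]:
    """Return the names of snapshot files last modified before the incident.

    `snapshots` maps a snapshot file name to its modification time.
    """
    return sorted(name for name, mtime in snapshots.items() if mtime < incident_time)
```

Pass the list from `zk_ls_command("/lwfusion/5.0")` to `subprocess.run(..., capture_output=True, text=True)` and feed its standard output to `parse_ls_output`. An empty result corresponds to the `[]` case described above, and an empty result from `snapshots_before` means no snapshot predates the deletion.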
Environment
Fusion 5.x
Cause
Fusion stores configuration and state information in ZooKeeper. Direct interaction with zkCli.sh can lead to accidental deletion of data if commands are entered incorrectly.
ZooKeeper has no undo mechanism; once critical znodes are deleted and new snapshots are written, the data is generally unrecoverable.
Resolution
If critical configuration data (e.g., /lwfusion/5.0/core/collections) is missing and cannot be restored from a valid snapshot, follow these steps:
Rebuild the environment using Helm
- Delete the existing Fusion release.
- Reinstall the release using your stored values file or Helm configuration.
- Verify that all services are running.
- Reimport application data:
  - Reload pipelines, collections, jobs, and any other configuration, either from backup files or by re-running initial configuration scripts.
  - Restore data feeds or indexing jobs as appropriate.
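As a rough illustration of the rebuild steps, the following sketch builds the Helm and kubectl command lines in order and checks pod readiness from JSON output. The release name, namespace, chart reference, and values file name are hypothetical placeholders, not values taken from this article. The readiness check is one possible interpretation of "all services are running".

```python
import json

# Hypothetical values: replace with your own release, namespace, chart, and values file.
RELEASE = "fusion"
NAMESPACE = "fusion"
CHART = "lucidworks/fusion"
VALUES_FILE = "fusion_values.yaml"


def rebuild_commands(release=RELEASE, namespace=NAMESPACE, chart=CHART, values_file=VALUES_FILE):
    """Return the delete, reinstall, and verify commands in the order they should run."""
    return [
        ["helm", "uninstall", release, "-n", namespace],
        ["helm", "install", release, chart, "-n", namespace, "-f", values_file],
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
    ]


def pods_not_ready(pods_json: str) -> list[str]:
    """List pods from `kubectl get pods -o json` output that are not fully ready.

    Pods in the Succeeded phase (completed jobs) are ignored.
    """
    not_ready = []
    for pod in json.loads(pods_json).get("items", []):
        status = pod.get("status", {})
        phase = status.get("phase")
        if phase == "Succeeded":
            continue
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready") for c in containers)
        if phase != "Running" or not all_ready:
            not_ready.append(pod["metadata"]["name"])
    return not_ready
```

Run each command with `subprocess.run(cmd, check=True)` and review the output before moving to the next one. Passing the output of the last command to `pods_not_ready` gives a list to watch until it is empty before reimporting application data.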
Additional guidance
- Validate that the zkSnapshotRecursiveSummaryToolkit.sh tool is present before relying on it for snapshot analysis. In some builds, this utility is not included.
- Differences in ZooKeeper log output after an upgrade may reflect changes in logging configuration or Helm chart defaults. If older environments show more detailed logs, compare the running images and Helm chart versions to ensure consistency.
- ZooKeeper version mismatches between chart expectations and deployed containers (e.g., 3.7.1 instead of the expected 3.9.1) should be reviewed and corrected if necessary.
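One way to spot a version mismatch is to compare the image tags of the running ZooKeeper pods with the version the chart is expected to deploy. The sketch below assumes that the image tag reflects the ZooKeeper version, which may not hold for every image, and that the pod list was obtained with `kubectl get pods -o json` limited to the ZooKeeper pods.

```python
import json


def image_tag(image: str) -> str:
    """Return the tag portion of a container image reference, or "" if it has none."""
    # Keep only the last path segment so a registry port is not mistaken for a tag.
    last = image.rsplit("/", 1)[-1]
    # Drop a trailing digest such as @sha256:...
    last = last.split("@", 1)[0]
    return last.split(":", 1)[1] if ":" in last else ""


def mismatched_images(pods_json: str, expected_tag: str) -> dict[str, list[str]]:
    """Map each pod name to the images whose tag differs from expected_tag."""
    mismatches: dict[str, list[str]] = {}
    for pod in json.loads(pods_json).get("items", []):
        for container in pod.get("spec", {}).get("containers", []):
            image = container.get("image", "")
            if image_tag(image) != expected_tag:
                mismatches.setdefault(pod["metadata"]["name"], []).append(image)
    return mismatches
```

An empty dictionary from `mismatched_images(output, "3.9.1")` means every container image carries the expected tag. Any entries identify the pods to review against the Helm chart version.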
Notes
ZooKeeper should not be used interactively in production unless explicitly required and the effect of each change is fully understood. Consider implementing RBAC, using Kubernetes secrets or config maps, and maintaining regular external backups of Fusion applications for disaster recovery.