Issue
Zookeeper data directory is filling up rapidly due to excessive snapshot and transaction log file generation, eventually exhausting disk space.
Diagnosis
The issue can typically be confirmed by reviewing the contents of the Zookeeper version-2 data directory. If the directory contains an unusually high number of snapshot.* and log.* files, it may indicate that snapshots are being created too frequently due to a low snapCount value or frequent service restarts.
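The directory review described above can be scripted. The following is a minimal sketch, not an official tool: the function name `summarize_data_dir` and the example path are hypothetical, and it only counts files whose names begin with `snapshot.` or `log.`.

```python
from collections import Counter
from pathlib import Path


def summarize_data_dir(version_dir):
    """Return {"snapshot": (count, bytes), "log": (count, bytes)} for a directory."""
    counts = Counter()
    sizes = Counter()
    for entry in Path(version_dir).iterdir():
        if not entry.is_file():
            continue
        # File names look like "snapshot.<suffix>" or "log.<suffix>".
        prefix, dot, _suffix = entry.name.partition(".")
        if dot and prefix in ("snapshot", "log"):
            counts[prefix] += 1
            sizes[prefix] += entry.stat().st_size
    return {kind: (counts[kind], sizes[kind]) for kind in ("snapshot", "log")}


# Example usage (the path is an assumption; substitute your own data directory):
# print(summarize_data_dir("/data/version-2"))
```

What counts as "unusually high" depends on the deployment, so compare the numbers against a healthy node rather than a fixed threshold.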
Check the current configuration values for the following properties in the Zookeeper configuration file or startup environment:
snapCount=10
preAllocSize=1000
autopurge.purgeInterval=1
autopurge.snapRetainCount=3
If snapCount is significantly lower than the default (100,000), snapshots will be triggered after a small number of transactions, increasing disk write activity and file accumulation.
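As a rough illustration of this check, the sketch below parses `key=value` lines and flags a low snapCount. The helper names and the 10% cutoff (`ratio=0.1`) are assumptions chosen for the example, not values defined by Zookeeper.

```python
DEFAULT_SNAP_COUNT = 100_000  # default stated in this article


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_snap_count(props, default=DEFAULT_SNAP_COUNT, ratio=0.1):
    """Return (effective snapCount, True if it is below ratio * default).

    The ratio is an arbitrary illustrative cutoff, not a Zookeeper rule.
    """
    value = int(props.get("snapCount", default))
    return value, value < default * ratio


# Example usage with the values listed in this section:
# props = parse_properties("snapCount=10\nautopurge.purgeInterval=1\n")
# check_snap_count(props)  # snapCount=10 is flagged as too low
```

If the setting is supplied through the startup environment instead of a file, feed the effective value into `check_snap_count` directly.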
Environment
Zookeeper 3.8.3 and compatible Fusion deployments using embedded Zookeeper in Kubernetes.
Cause
A non-default snapCount setting that is too low (e.g., snapCount=10) causes Zookeeper to take a snapshot roughly every 10 transactions or fewer. This generates excessive snapshot and transaction log files, especially in systems with frequent transaction activity or restarts. Each restart can also start a new log file without a corresponding snapshot, further increasing disk usage.
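To get a feel for the scale of the problem, the arithmetic can be sketched as follows. The transaction rate of 50 per second is a made-up figure for illustration, and the result is only an order-of-magnitude estimate because the exact point at which Zookeeper triggers a snapshot varies.

```python
def snapshots_per_hour(tx_per_second, snap_count):
    """Estimate snapshots per hour as (transactions per hour) / snapCount."""
    return tx_per_second * 3600 / snap_count


# Hypothetical workload of 50 transactions per second:
low = snapshots_per_hour(50, 10)           # snapCount=10     -> 18000.0 per hour
default = snapshots_per_hour(50, 100_000)  # snapCount=100000 -> 1.8 per hour
```

Under this assumed load, the low setting produces about four orders of magnitude more snapshots than the default.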
Resolution
Review and update Zookeeper settings
To reduce snapshot frequency and avoid overwhelming disk usage, either:
- Remove the snapCount configuration entirely to revert to the default (100000), or
- Set it to a higher, more reasonable value that matches the cluster’s transaction throughput, such as snapCount=50000.
Also review and adjust the following related properties as needed:
autopurge.purgeInterval=1
autopurge.snapRetainCount=3
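With these settings, the purge task runs every hour (autopurge.purgeInterval is expressed in hours) and keeps the three most recent snapshots together with their corresponding transaction logs; older snapshot and transaction log files are deleted automatically.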
Restart considerations
Zookeeper creates a new transaction log on each restart. If the cluster restarts frequently, before enough transactions are processed to trigger a snapshot (based on snapCount), autopurge will not remove these logs because no newer snapshots exist. The log files then accumulate.
Ensure the cluster is stable and restarts are minimized to allow snapshot creation and purging to function correctly.
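One way to spot the restart pattern described above is to list transaction logs that are newer than the most recent snapshot. This is a sketch under the assumption that the suffix of each `snapshot.*` and `log.*` file name is a hexadecimal transaction ID (zxid); the function names are hypothetical.

```python
from pathlib import Path


def zxid_of(name):
    """Return the zxid from a 'snapshot.<hex>' or 'log.<hex>' name, else None."""
    prefix, _, suffix = name.partition(".")
    if prefix not in ("snapshot", "log"):
        return None
    try:
        return int(suffix, 16)
    except ValueError:
        return None


def logs_after_latest_snapshot(version_dir):
    """List log files whose zxid is greater than the newest snapshot's zxid."""
    names = [p.name for p in Path(version_dir).iterdir() if p.is_file()]
    snap_ids = [zxid_of(n) for n in names if n.startswith("snapshot.")]
    latest = max((z for z in snap_ids if z is not None), default=-1)
    logs = [(zxid_of(n), n) for n in names if n.startswith("log.")]
    newer = sorted((z, n) for z, n in logs if z is not None and z > latest)
    return [n for _, n in newer]
```

A long list suggests that logs are piling up between snapshots, which is consistent with frequent restarts, although it is not proof on its own.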
Reference
For additional guidance, refer to the Zookeeper log and snapshot maintenance documentation:
Zookeeper Log Snapshot Maintenance.