Introduction to Log Files and Snapshots:
ZooKeeper servers use local storage to persist transactions. The transactions are logged to transaction logs, similar to the approach of sequential append-only log files used in database systems.
The servers in the ZooKeeper service also periodically save point-in-time copies (snapshots) of the ZooKeeper tree, or namespace, to the local filesystem.
The ZooKeeper snapshot files and transaction logs enable recovery of data after a catastrophic failure or user error. The data directory is specified by the dataDir parameter in the ZooKeeper configuration file.
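To check which directory a server is actually using, the dataDir value can be read straight from the configuration file. The sketch below is only an illustration: zk_conf_value is a hypothetical helper name, and the demo writes a throwaway file with made-up values instead of touching a real configuration (on a real server the file is usually conf/zoo.cfg under the installation directory).

```shell
# Hypothetical helper: print the value of a key from a key=value
# configuration file. Commented-out lines are ignored, and if the key
# appears more than once the last definition is printed.
zk_conf_value() {
  # $1 = path to the configuration file, $2 = key name (for example dataDir)
  sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Demonstration against a scratch file (values are assumptions for the demo).
demo_cfg=$(mktemp)
printf '%s\n' \
  'tickTime=2000' \
  '#dataDir=/tmp/ignored' \
  'dataDir=/var/zookeeper-3.4.6/zoodata' \
  'clientPort=2181' > "$demo_cfg"

zk_conf_value "$demo_cfg" dataDir
```

The same helper can be pointed at the real configuration file and asked for dataLogDir when the transaction logs are kept in a separate directory.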
A sample listing of the data directory looks similar to this:
-rw-r--r-- 1 solr solr 65M Aug 23 14:09 log.17858cf
-rw-r--r-- 1 solr solr 65M Aug 24 10:11 log.1786897
-rw-r--r-- 1 solr solr 65M Aug 25 08:46 log.1787824
-rw-r--r-- 1 solr solr 61M Feb 5 2016 snapshot.149945d
-rw-r--r-- 1 solr solr 61M Feb 5 2016 snapshot.14aa871
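Before purging, it can help to know how many snapshot and log files have accumulated. The following is a minimal sketch, not part of ZooKeeper: count_zk_files is a hypothetical helper, and the demo recreates the file names from the listing above as empty files in a scratch directory.

```shell
# Hypothetical helper: count snapshot.* and log.* files directly inside
# the given directory (subdirectories are not searched).
count_zk_files() {
  # $1 = directory that holds the snapshot.* and log.* files
  snaps=$(find "$1" -maxdepth 1 -type f -name 'snapshot.*' | wc -l)
  logs=$(find "$1" -maxdepth 1 -type f -name 'log.*' | wc -l)
  # $(( )) strips the padding that wc adds on some platforms
  echo "snapshots=$((snaps)) logs=$((logs))"
}

# Demonstration with empty files named like the sample listing.
demo_dir=$(mktemp -d)
touch "$demo_dir/log.17858cf" "$demo_dir/log.1786897" "$demo_dir/log.1787824" \
      "$demo_dir/snapshot.149945d" "$demo_dir/snapshot.14aa871"

count_zk_files "$demo_dir"
```

For the five files in the sample listing this reports two snapshots and three logs.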
Purging ZooKeeper Log and Snapshot Files:
Starting with ZooKeeper 3.4.0, additional parameters can be added to the ZooKeeper configuration file to automate purging of the log and snapshot files.
You can add the following two parameters to the ZooKeeper configuration file:
autopurge.snapRetainCount -- When enabled, the ZooKeeper auto purge feature retains the autopurge.snapRetainCount most recent snapshots and the corresponding transaction logs in the dataDir and dataLogDir respectively, and deletes the rest. Defaults to 3. Minimum value is 3.
autopurge.purgeInterval -- The time interval, in hours, at which the purge task is triggered. Set to a positive integer (1 and above) to enable auto purging. Defaults to 0.
You will need to restart ZooKeeper after adding these two values to the configuration file. This prevents ZooKeeper logs and snapshots from building up over time in your setup.
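As a rough illustration, the two settings could be appended to the configuration file as shown below. The values (3 snapshots, a 24-hour interval) are example choices, and ZK_CFG is a hypothetical variable that defaults to a scratch file so the snippet can be tried safely. Point it at the real configuration file only after reviewing the values.

```shell
# Hypothetical variable: path of the configuration file to edit.
# Defaults to a scratch file so nothing real is modified by accident.
ZK_CFG=${ZK_CFG:-$(mktemp)}

# Append the auto purge settings (example values, adjust as needed).
cat >> "$ZK_CFG" <<'EOF'
# Keep the 3 most recent snapshots and their transaction logs
autopurge.snapRetainCount=3
# Run the purge task every 24 hours (0 disables auto purging)
autopurge.purgeInterval=24
EOF

# Show the resulting settings for a quick visual check.
grep '^autopurge\.' "$ZK_CFG"
```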
More details can be found in the ZooKeeper Administrator's Guide.
Using the PurgeTxnLog Utility from ZooKeeper to Purge Logs:
A ZooKeeper server will not remove old snapshots and log files when using the default configuration; this is the responsibility of the operator.
The PurgeTxnLog utility implements a simple retention policy that administrators can use:
PurgeTxnLog dataLogDir [snapDir] -n count
dataLogDir -- path to the txn log directory
snapDir -- path to the snapshot directory (optional; in most setups dataLogDir and snapDir are the same)
count -- the number of old snaps/logs you want to keep (minimum 3)
1. Change to the ZooKeeper installation directory (referred to here as $ZK; this example assumes /var/zookeeper-3.4.6).
This location contains zookeeper-3.4.6.jar and the lib directory, both of which are needed to run the PurgeTxnLog command.
2. Execute the PurgeTxnLog command:
java -cp /var/zookeeper-3.4.6/zookeeper-3.4.6.jar:/var/zookeeper-3.4.6/lib/*:/var/zookeeper-3.4.6/conf org.apache.zookeeper.server.PurgeTxnLog /var/zookeeper-3.4.6/zoodata/ -n 3
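For repeated use, the command above could be wrapped in a small script. The sketch below makes several assumptions: the variable names (ZK_HOME, ZK_VERSION, ZK_DATA, RETAIN, DRY_RUN) and the run_purge function are hypothetical, and the defaults mirror the example paths used in this article. By default it only prints the command; setting DRY_RUN=0 would run it.

```shell
# Hypothetical settings; the defaults mirror the example in this article.
ZK_HOME=${ZK_HOME:-/var/zookeeper-3.4.6}
ZK_VERSION=${ZK_VERSION:-3.4.6}
ZK_DATA=${ZK_DATA:-$ZK_HOME/zoodata}
RETAIN=${RETAIN:-3}

run_purge() {
  # The retention count must be at least 3.
  if [ "$RETAIN" -lt 3 ]; then
    echo "RETAIN must be 3 or more" >&2
    return 1
  fi
  # Quoting keeps lib/* literal so that java expands the wildcard itself.
  cp="$ZK_HOME/zookeeper-$ZK_VERSION.jar:$ZK_HOME/lib/*:$ZK_HOME/conf"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run: print the command instead of executing it.
    echo "java -cp $cp org.apache.zookeeper.server.PurgeTxnLog $ZK_DATA -n $RETAIN"
  else
    java -cp "$cp" org.apache.zookeeper.server.PurgeTxnLog "$ZK_DATA" -n "$RETAIN"
  fi
}

run_purge
```

Keeping the dry run as the default is a deliberate choice here, since the utility deletes files and the printed command can be reviewed first.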