Issue
Kafka’s Persistent Volume Claim (PVC) is filling up and log files are not being auto-deleted, resulting in Kafka running out of disk space. The logs continue to grow uncontrollably, causing poor performance and potentially halting Kafka’s ability to process messages.
Diagnosis
The issue is generally caused by improper log retention settings, resulting in log files being kept for too long or occupying too much space. To confirm if this is the case:
- Check the current disk usage for Kafka’s PVC.
- Review Kafka’s log retention settings, such as log.retention.bytes and log.retention.hours.
Environment
- Version: Lucidworks Fusion 5.9.2+ (but applicable to any Kafka setup)
- Environment: Kubernetes cluster, PVC-backed storage
- Relevant Components: StatefulSet configuration for Kafka
Cause
This issue occurs when Kafka's log retention settings are misconfigured, causing Kafka to retain logs for too long or beyond the PVC’s capacity. In particular:
- Log retention time (log.retention.hours) may be set too high, so logs are not deleted frequently enough.
- Log size limits (log.retention.bytes) may be set too high, causing Kafka to retain more data than the available disk space allows.
- log.cleanup.policy may not be set to delete, which prevents automatic deletion of old log segments.
Resolution
Before modifying Kafka's configuration, check the current disk usage to assess how much space is being consumed.
kubectl exec -it KAFKA-POD -- df -h /bitnami/kafka/data
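To see how much headroom the volume itself has, you can also compare against the PVC's provisioned capacity (NAMESPACE below is a placeholder; adjust to your deployment):

```shell
# List Kafka's PVCs; compare the CAPACITY column with the usage
# reported by df -h inside the pod.
kubectl get pvc -n NAMESPACE
```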
Review Kafka's retention settings
Kafka’s log retention settings dictate how long logs are retained and how much space they consume. If the values are set too high, Kafka will retain logs longer than necessary.
kubectl exec -it KAFKA-POD -- cat /bitnami/kafka/config/server.properties | grep -i "log.retention"
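The /bitnami/kafka paths above suggest a Bitnami-based image, which typically derives server.properties values from KAFKA_CFG_* environment variables. Assuming that image family, it is worth checking those variables as well:

```shell
# On Bitnami images, retention settings may be injected as env vars;
# check both places to see the effective configuration.
kubectl exec -it KAFKA-POD -- env | grep -i "KAFKA_CFG_LOG_RETENTION"
```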
Modify Kafka retention settings
Based on disk usage, adjust retention policies to avoid filling the PVC.
Command to set the topic retention size to 1 GB (newer Kafka releases removed the --zookeeper flag from kafka-configs.sh; on older ZooKeeper-based clusters, replace --bootstrap-server localhost:9092 with --zookeeper zk-service:2181):
kubectl exec -it KAFKA-POD -- kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name NAME-OF-TOPIC --alter --add-config retention.bytes=1073741824
Command to set the topic retention time to 72 hours (use --zookeeper zk-service:2181 instead on older ZooKeeper-based clusters):
kubectl exec -it KAFKA-POD -- kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name NAME-OF-TOPIC --alter --add-config retention.ms=259200000
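The two values used above can be sanity-checked with shell arithmetic:

```shell
# retention.bytes: 1 GiB expressed in bytes (1024^3)
echo $((1024 * 1024 * 1024))    # prints 1073741824
# retention.ms: 72 hours expressed in milliseconds (72 * 3600 * 1000)
echo $((72 * 3600 * 1000))      # prints 259200000
```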
Ensure log deletion is enabled
Ensure that Kafka is set to delete logs rather than compacting them.
Add log.cleanup.policy
to StatefulSet:
kubectl edit statefulset STS-NAME
Add this env variable:
- name: KAFKA_CFG_LOG_CLEANUP_POLICY
value: "delete"
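If you prefer a non-interactive alternative to kubectl edit, the same variable can be set in one command (assuming the Bitnami-style KAFKA_CFG_* convention used above):

```shell
# Sets the env var on the StatefulSet's pod template; the change
# triggers a rolling restart of the Kafka pods.
kubectl set env statefulset/STS-NAME KAFKA_CFG_LOG_CLEANUP_POLICY=delete
```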
Check log cleanup frequency
Kafka checks for logs to delete at regular intervals defined by log.retention.check.interval.ms. Reducing this interval ensures that Kafka deletes eligible logs more frequently.
Modify log.retention.check.interval.ms:
kubectl edit statefulset STS-NAME
Add or modify the following env variable:
- name: KAFKA_CFG_LOG_RETENTION_CHECK_INTERVAL_MS
value: "120000"
120000 ms is equal to 2 minutes.
Restart Kafka pods to apply changes
After modifying Kafka’s configuration, restart the Kafka pods to apply the new settings.
kubectl rollout restart statefulset STS-NAME
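You can wait for the rollout to finish before re-checking disk usage:

```shell
# Blocks until all pods in the StatefulSet have restarted and are Ready.
kubectl rollout status statefulset STS-NAME
```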
Monitor disk usage and log cleanup
Continue monitoring Kafka's disk usage and confirm that logs are being deleted as expected.
Monitor disk usage:
kubectl exec -it KAFKA-POD -- df -h /bitnami/kafka/data
Verify log deletion:
kubectl exec -it KAFKA-POD -- ls /bitnami/kafka/data | grep -i NAME-OF-TOPIC
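To see which topics still hold the most data, per-directory sizes can help (the sh -c wrapper is needed so the glob expands inside the container):

```shell
# Summarize disk usage per topic-partition directory in the data dir.
kubectl exec -it KAFKA-POD -- sh -c 'du -sh /bitnami/kafka/data/*'
```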
Increase PVC size
If Kafka’s disk usage remains high despite optimizing retention settings, consider expanding the PVC to provide more storage for logs.
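A sketch of an in-place expansion, assuming the PVC's StorageClass has allowVolumeExpansion: true (PVC-NAME and the 50Gi target are placeholders; choose a size that fits your workload):

```shell
# Request a larger volume; the resize is carried out by the storage
# provisioner once the request is accepted.
kubectl patch pvc PVC-NAME -p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'
```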