If any of your collections have signals enabled and the associated _signals collection contains documents, a default click signal aggregation job is scheduled to run on those signals every 2 minutes. When running in cluster mode (that is, you've started the spark-master and spark-worker processes), the aggregation job creates a copy of the Fusion shaded JAR (~250MB) whenever the driver starts. By default, these JARs are cleaned up only once every 24 hours. As a result, they can consume a large amount of disk space under apps/spark-dist/work.
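To get a feel for the scale, the figures above can be turned into a rough upper bound. This is a back-of-the-envelope sketch, not a measurement: it assumes one driver start (and so one JAR copy) per scheduled run, a single signals-enabled collection, and that no copies are removed before the 24-hour cleanup. The function name `worst_case_usage_mb` is made up for this illustration.

```python
# Figures quoted in the text: ~250 MB per JAR copy, a run every 2 minutes,
# cleanup once every 24 hours (86400 seconds).
JAR_SIZE_MB = 250
SCHEDULE_INTERVAL_SECS = 2 * 60
CLEANUP_TTL_SECS = 86400


def worst_case_usage_mb(jar_size_mb, schedule_interval_secs, cleanup_ttl_secs):
    """Upper bound on disk space held by JAR copies between cleanups.

    Assumption: every scheduled run starts a driver and leaves one JAR copy.
    """
    copies = cleanup_ttl_secs // schedule_interval_secs
    return copies * jar_size_mb


if __name__ == "__main__":
    usage_mb = worst_case_usage_mb(
        JAR_SIZE_MB, SCHEDULE_INTERVAL_SECS, CLEANUP_TTL_SECS
    )
    # 720 runs per day * 250 MB per copy
    print(f"{usage_mb} MB (~{usage_mb / 1000:.0f} GB)")  # 180000 MB (~180 GB)
```

Under these assumptions a single collection could accumulate on the order of 180 GB before the first cleanup; actual usage depends on how often the driver really restarts.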
This issue will be addressed in Fusion 3.0.1. It applies only to users with signals enabled on Fusion 3.0.0 who are running the spark-master and spark-worker processes. To work around this issue, you have a few options:
- Avoid running the spark-master and spark-worker processes for single-node deployments or when you have a small number of signals; local mode works well for a small number of signals (<10M).
- Update the schedule for the signals aggregation job to run less frequently, such as once per hour.
- Clean up worker data more frequently by setting the spark.worker.cleanup.appDataTtl configuration property (value in seconds); the default is once per 24 hours (86400 seconds).
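For the last option, the sketch below renders the cleanup settings as JVM system properties. It rests on assumptions that go beyond the text above: in stock Apache Spark standalone mode, worker properties are passed as -D options (typically through SPARK_WORKER_OPTS), and spark.worker.cleanup.enabled and spark.worker.cleanup.interval are companion Spark properties not mentioned here. Where Fusion 3.0.0 actually reads worker options from is not stated, so check your installation before applying anything. The helper name `worker_cleanup_opts` and the one-hour TTL are purely illustrative.

```python
def worker_cleanup_opts(app_data_ttl_secs, interval_secs=1800, enabled=True):
    """Render Spark standalone worker cleanup properties as -D JVM options.

    The interval default of 1800 seconds mirrors stock Apache Spark;
    a Fusion distribution may ship different defaults.
    """
    if app_data_ttl_secs <= 0 or interval_secs <= 0:
        raise ValueError("TTL and interval must be positive numbers of seconds")
    props = {
        "spark.worker.cleanup.enabled": str(enabled).lower(),
        "spark.worker.cleanup.interval": str(interval_secs),
        "spark.worker.cleanup.appDataTtl": str(app_data_ttl_secs),
    }
    # Dicts preserve insertion order, so the options come out as listed above.
    return " ".join(f"-D{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # One hour is an example value, not a recommendation from the release note.
    print(worker_cleanup_opts(app_data_ttl_secs=3600))
```

A shorter TTL reduces how many JAR copies accumulate between cleanups, at the cost of removing application work directories sooner.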