Issue
When triggering a signal aggregation job (such as click signals), the job fails with one or more of the following errors:
FileNotFoundException related to a missing temporary JAR file.
Kubernetes 422 error due to executor CPU request exceeding namespace limits.
Forbidden error on shutdown hook when listing persistent volume claims (PVCs).
Diagnosis
Check the Spark driver logs for the following patterns:
Missing JAR file reference in spark.jars or spark.repl.local.jars.
Pod creation error:
Invalid value: "X": must be less than or equal to cpu limit of Y
Forbidden error in shutdown hook:
persistentvolumeclaims is forbidden: User "system:serviceaccount:<namespace>:<service-account>" cannot list resource "persistentvolumeclaims"
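As a rough aid for triaging a saved driver log, the three patterns above can be matched with a short script. This is a minimal sketch, not part of Fusion: the function name, the category names, and the regular expressions are assumptions based only on the messages quoted above, and real log lines may be worded differently.

```python
import re

# Hypothetical patterns derived from the messages quoted in this article.
# Adjust them if the driver log words the errors differently.
FAILURE_PATTERNS = {
    "missing_jar": re.compile(r"FileNotFoundException.*\.jar", re.IGNORECASE),
    "cpu_limit": re.compile(r"must be less than or equal to cpu limit", re.IGNORECASE),
    "pvc_forbidden": re.compile(r"persistentvolumeclaims is forbidden", re.IGNORECASE),
}


def classify_driver_log(log_text: str) -> set:
    """Return the names of the failure patterns found anywhere in the log text."""
    found = set()
    for line in log_text.splitlines():
        for name, pattern in FAILURE_PATTERNS.items():
            if pattern.search(line):
                found.add(name)
    return found


if __name__ == "__main__":
    # Illustrative log lines only; the path and values are made up.
    sample = "\n".join([
        "java.io.FileNotFoundException: /tmp/example.jar (No such file or directory)",
        'Invalid value: "3": must be less than or equal to cpu limit of 1',
    ])
    print(sorted(classify_driver_log(sample)))
```

A job can hit more than one of these failures in the same run, which is why the function returns a set rather than stopping at the first match.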
Use the following command to inspect job-related Spark properties:
kubectl -n <namespace> logs <driver-pod-name> | grep -i 'spark.jars\|repl.local.jars'
Environment
Fusion 5.9.7
Kubernetes (EKS, version 1.29)
Self-hosted deployment
Cause
A stale or missing JAR reference left over from a previous job attempt.
Spark executor CPU request exceeds the namespace’s resource limit policy.
The Spark job's service account does not have permission to list or delete PVCs during shutdown.
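The CPU cause comes down to a numeric comparison between the executor's request and the namespace limit. The sketch below illustrates that comparison for the two most common Kubernetes CPU notations, whole or fractional cores ("1", "0.5") and millicores ("500m"). The function names are hypothetical, and other quantity suffixes are deliberately not handled.

```python
from decimal import Decimal


def parse_cpu(quantity: str) -> Decimal:
    """Convert a CPU quantity such as "500m" or "1" to a number of cores.

    Assumption: only plain decimals and the millicore suffix "m" are supported.
    """
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / Decimal(1000)
    return Decimal(quantity)


def request_fits(request: str, limit: str) -> bool:
    """Return True when the requested CPU does not exceed the limit."""
    return parse_cpu(request) <= parse_cpu(limit)


if __name__ == "__main__":
    # A request of 3 cores against a 1-core limit is the kind of case
    # that produces the 422 error described in this article.
    print(request_fits("3", "1"))
    print(request_fits("500m", "1"))
```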
Resolution
1. Remove stale or invalid JAR references
In the job's Spark configuration, clear the following fields if they reference non-existent JARs:
spark.jars
spark.repl.local.jars
Allow Fusion to manage the classpath internally instead of referencing specific uploaded JARs.
2. Align executor CPU requests with namespace policy
If the namespace enforces a CPU limit (e.g., 1 core), configure the job's executor to match:
spark.kubernetes.executor.request.cores=1
spark.executor.cores=1
spark.executor.instances=2
Alternatively, if more CPU per executor is required, first inspect the namespace limit:
kubectl -n <namespace> describe limitranges
Then, explicitly configure both the request and limit in the job:
spark.kubernetes.executor.request.cores=3
spark.kubernetes.executor.limit.cores=3
spark.executor.cores=3
3. Grant RBAC permissions to clean up PVCs
If the Spark job fails on shutdown with a PVC access error, ensure the service account has the correct permissions:
kubectl -n <namespace> create role spark-pvc-cleanup \
--verb=get,list,watch,delete \
--resource=persistentvolumeclaims
kubectl -n <namespace> create rolebinding spark-pvc-cleanup-binding \
--role=spark-pvc-cleanup \
--serviceaccount=<namespace>:<job-launcher-service-account>
4. Review SQL rollup fields
If the final rollup query references a field not present in the aggregated dataset, either:
Remove the field from the query.
Ensure it is included in the first aggregation's SELECT and GROUP BY clauses so it carries through.
Example of valid rollup SQL if params_dv_mv_partners_ss is unnecessary:
SELECT concat_ws('|', query_s, doc_id_s, filters_s) as id,
query_s,
query_s as query_t,
doc_id_s,
filters_s,
first(aggr_type_s) AS aggr_type_s,
SPLIT(filters_s, ' \\$ ') AS filters_ss,
SUM(weight_d) AS weight_d,
SUM(aggr_count_i) AS aggr_count_i
FROM my_signals_aggr
GROUP BY query_s, doc_id_s, filters_s
If the field is required, it must be included in both aggregation layers.
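The rule in this step, that every field the rollup references must already be output by the first aggregation, can be checked before running the job. The following is a minimal sketch with a hypothetical helper name. It compares two lists of field names supplied by hand and does not parse SQL, and the column lists are illustrative, taken from the example query above.

```python
def missing_rollup_fields(first_stage_columns, rollup_fields):
    """Return the rollup fields that the first aggregation does not output.

    The result keeps the order in which the fields appear in rollup_fields.
    """
    available = set(first_stage_columns)
    return [field for field in rollup_fields if field not in available]


if __name__ == "__main__":
    # Columns assumed to be produced by the first aggregation (illustrative).
    first_stage = [
        "query_s", "doc_id_s", "filters_s",
        "aggr_type_s", "weight_d", "aggr_count_i",
    ]
    # Fields a rollup query might reference, including one that is missing.
    rollup = ["query_s", "doc_id_s", "filters_s", "params_dv_mv_partners_ss"]
    print(missing_rollup_fields(first_stage, rollup))
    # prints ['params_dv_mv_partners_ss']
```

Any field this returns must either be removed from the rollup query or added to the first aggregation.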