Issue
A job remains in a persistent "running" state and cannot be stopped using the Fusion UI or API. Attempts to abort the job through the API (for example, from Postman) return errors such as:
Job couldn't be aborted
This issue commonly affects scheduled or recurring jobs such as refresh-autocomplete.
Diagnosis
This issue may be confirmed by the following observations:
Job status remains in "running" beyond its expected execution time window.
Restarting the job does not change its state.
Fusion logs (job-config pod) show Kafka publish errors, such as:
WARN [kafka-producer-network-thread | job.kafka.event.publisher-<ID>] - Error connecting to node <kafka-host>:9092
Or:
WARN [KafkaEventPublisher@44] - Fusion CRUD event JobStatusEvent(...) failed to be published to fusion.system.job.event
These errors suggest a failure to update job state in Kafka, resulting in Fusion being unable to reconcile the job’s state.
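The log check above can be sketched as a small shell helper. This is a minimal sketch, not an official diagnostic: the function name count_kafka_publish_errors is hypothetical, and the two patterns are taken only from the sample messages quoted above, so other Kafka failures would not be counted.

```shell
# Count log lines that match the two Kafka publish error messages quoted above.
# Reads log text on stdin and prints the number of matching lines.
# (Hypothetical helper; the patterns come only from the sample log lines.)
count_kafka_publish_errors() {
  # grep -c exits non-zero when nothing matches; "|| true" keeps the
  # function's exit status clean while still printing 0.
  grep -c -E 'Error connecting to node|failed to be published to fusion\.system\.job\.event' || true
}

# Example usage (not run here). With a label selector, kubectl logs returns
# only the last 10 lines per pod by default, so --tail is set explicitly:
#   kubectl logs -l app=job-config -n <fusion-namespace> --tail=1000 | count_kafka_publish_errors
```

A non-zero count is consistent with the publish failure described here, but it does not by itself prove that a particular job is stuck.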
Environment
Fusion 5.9.x and above, running on Kubernetes.
Cause
This issue is typically caused by the job-config microservice failing to publish job status updates to Kafka due to a temporary networking or internal cluster error. When job status updates cannot be communicated, the job remains stuck in a "running" state, and Fusion is unable to abort or reconcile its state properly.
Resolution
Step 1: Restart the job-config pod
Use kubectl to restart the job-config pod:
kubectl delete pod -l app=job-config -n <fusion-namespace>
This forces a restart and may resolve the stuck state if Kafka connectivity is restored.
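After deleting a pod, it can help to wait for the replacement to become Ready before re-checking the job. The following is a minimal sketch under assumptions: restart_and_wait is a hypothetical helper, the pod is assumed to be managed by a controller that recreates it, and the 180s default timeout is arbitrary. If the replacement pod has not been created when kubectl wait runs, the wait can fail and may need to be retried.

```shell
# Delete pods by app label, then block until the replacement pods report Ready.
# Set KUBECTL=echo to print the commands instead of running them (dry run).
# (Hypothetical helper; assumes a controller recreates the deleted pod.)
restart_and_wait() {
  local app="$1"
  local namespace="$2"
  local timeout="${3:-180s}"
  local kubectl="${KUBECTL:-kubectl}"

  "$kubectl" delete pod -l "app=${app}" -n "$namespace" || return 1
  "$kubectl" wait --for=condition=Ready pod -l "app=${app}" -n "$namespace" --timeout="$timeout"
}

# Example usage (not run here):
#   restart_and_wait job-config <fusion-namespace>
```

The helper takes the app label as a parameter, so it can be reused for any pod that is selected by an app label.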
Step 2: If unsuccessful, restart the job-launcher pod
If the job remains stuck, restart the job-launcher pod:
kubectl delete pod -l app=job-launcher -n <fusion-namespace>
The job-launcher manages orchestration and job lifecycle reconciliation.
Step 3: Manually clear the system job history (if still stuck)
If restarting pods does not clear the job, use Fusion's internal endpoints to delete the stuck job's entry from system_job_history.
Warning: Do this only after verifying that the job is not actively doing work and that its history entry is safe to remove.
DELETE /api/system/jobs/history/<job-id>?job-name=<job-name>
You can retrieve the job ID via:
GET /api/system/jobs/history?job-name=<job-name>
After deletion, re-trigger the job as needed to confirm successful execution.
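The two requests above can be sketched with curl. This is a sketch under stated assumptions: the base URL in FUSION_URL, the FUSION_USER and FUSION_PASS variables, and the use of basic authentication are placeholders not taken from this article, and the function names are hypothetical. Only the endpoint paths come from the text above. The job name is assumed to be URL-safe (for example, refresh-autocomplete).

```shell
# Placeholder base URL (assumption); override it for your environment.
FUSION_URL="${FUSION_URL:-https://fusion.example.com}"

# Build the history URL for a job name, optionally scoped to a single job ID.
history_url() {
  local job_name="$1"
  local job_id="${2:-}"
  if [ -n "$job_id" ]; then
    printf '%s/api/system/jobs/history/%s?job-name=%s\n' "$FUSION_URL" "$job_id" "$job_name"
  else
    printf '%s/api/system/jobs/history?job-name=%s\n' "$FUSION_URL" "$job_name"
  fi
}

# List history entries for a job so the stuck entry's ID can be found.
# Basic auth via FUSION_USER / FUSION_PASS is an assumption of this sketch.
list_job_history() {
  curl -sS -f -u "${FUSION_USER}:${FUSION_PASS}" "$(history_url "$1")"
}

# Delete one history entry. Run only after confirming the job is idle.
delete_job_history() {
  curl -sS -f -u "${FUSION_USER}:${FUSION_PASS}" -X DELETE "$(history_url "$1" "$2")"
}

# Example usage (not run here):
#   list_job_history refresh-autocomplete
#   delete_job_history refresh-autocomplete <job-id>
```

Keeping URL construction in its own function lets you print and inspect the exact request before sending a DELETE.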