Goal:
This article aims to simplify the process of checking the status of multiple jobs created in different Fusion instances using a script written in Python. It provides an automated way to track job status and receive alerts, eliminating the need for manual checks.
Environment:
Fusion 4.2.6 and above.
Guide:
Fusion has several Job APIs that allow you to check status, define schedules and configure parameters for different types of jobs such as datasources, tasks, and Spark jobs.
Datasource
A job to ingest data according to the specified datasource configuration, such as datasource:movie-db. Datasources are created using the Connector Datasources API or the Fusion UI.
Spark
A Spark job to process data, such as spark:dailyMetricsRollup-counters. Spark jobs are created using the Spark Jobs API or the Fusion UI.
Task
A job to perform an HTTP call or log cleanup, such as task:delete-old-system-logs. Tasks are created using the Tasks API or the Fusion UI.
A few sample API endpoints that we use in this script to check the job status are given below.
http://35.230.114.182:6764/api/jobs/spark:milvus_test_collection_create
http://35.230.114.182:6764/api/jobs/task:delete-old-job-history
http://35.230.114.182:6764/api/jobs/datasource:web_test
As you can see in the examples, /api/jobs/ is the URL path, and datasource:web_test is the type and name of the job whose status we want to retrieve. The job type (spark, task or datasource) must be specified in the API call in order to get a valid response from these endpoints.
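The URL pattern described above can be sketched as a small helper. This is only an illustration: build_job_url and JOB_TYPES are hypothetical names that are not part of Fusion, and the base URL is the sample host used in this article.

```python
# Hypothetical helper (not part of Fusion): builds a Jobs API URL
# from a base URL, a job type, and a job name.
JOB_TYPES = ("datasource", "spark", "task")  # job types named in this article


def build_job_url(base_url, job_type, job_name):
    if job_type not in JOB_TYPES:
        raise ValueError("Unknown job type: " + job_type)
    # Resulting path: /api/jobs/<type>:<name>
    return base_url.rstrip("/") + "/api/jobs/" + job_type + ":" + job_name


print(build_job_url("http://35.230.114.182:6764", "datasource", "web_test"))
```

Rejecting unknown job types early gives a clearer error than an invalid response from the endpoint.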
A sample Python script that takes a list of job URLs, iterates through it to get the status of each job, and can send an email alert is given below for your reference.
# Import required modules
import smtplib
import requests
import json
from requests.auth import HTTPBasicAuth
# smtpServer: Stores the SMTP server address for sending emails.
# fromaddr: Defines the sender email address for the notification.
# toaddr: Lists the recipient email addresses for the notification.
# subj: Specifies the subject line of the email notification.
smtpServer = '*****.****.lucidworks.com' # SMTP server that handles email requests
fromaddr = 'Job Status'
toaddr=['******@lucidworks.com']
subj = 'Sachin_Dev Job Status'
# checkJobStatus iterates through a list of job URLs, makes a GET request with basic authentication and loads the JSON response.
# It extracts the job status and last end time, and adds "counter.output" (the document count of datasource jobs) when that key is present.
# It prints the collected status lines; uncomment the email call to also send them to the list of email addresses.
def checkJobStatus(joburl):
    response2 = []
    for x in joburl:
        response = requests.get(x, auth=HTTPBasicAuth('*****', '*******'))
        data = json.loads(response.text)
        # The last URL segment is the job identifier, e.g. datasource:web_test
        last_part = x.rsplit("/", 1)[-1]
        # Build one string per job (a stray comma previously turned this into a tuple)
        response1 = "Status of the job " + last_part + " is: " + data["status"] + " Last run completed on: " + data["lastEndTime"]
        if 'counter.output' in data.get("extra", {}):
            response1 += " Counter Output: " + str(data["extra"]["counter.output"])
        response2.append(response1)
    print("\n".join(response2))
    #email(toaddr, fromaddr, subj, response2, smtpServer)
# The email function builds an email message, connects to the SMTP server and sends the status lines to the recipients.
def email(toaddr, fromaddr, subj, response2, smtpServer):
    # Join the recipient list for the To header; a blank line separates the headers from the body
    msg = "From: %s\r\nTo: %s\r\nSubject: %s\r\n\r\n" % (fromaddr, ", ".join(toaddr), subj)
    server = smtplib.SMTP(smtpServer)
    # response2 is a list of status lines, so join it into a single body string
    server.sendmail(fromaddr, toaddr, msg + "\r\n".join(response2))
    server.quit()
# The main block calls checkJobStatus with the list of endpoints to check
if __name__ == '__main__':
    listOfJobs = ['http://35.230.114.182:6764/api/jobs/spark:milvus_test_collection_create',
                  'http://35.230.114.182:6764/api/jobs/task:delete-old-job-history',
                  'http://35.230.114.182:6764/api/jobs/datasource:web_test']
    checkJobStatus(listOfJobs)
Output
Status of the job spark:milvus_test_collection_create is: success Last run completed on: 2023-11-15T13:28:23.960Z
Status of the job task:delete-old-job-history is: success Last run completed on: 2023-12-03T04:25:00.239Z
Status of the job datasource:web_test is: success Last run completed on: 2023-11-16T17:04:00.888Z Counter Output: 48
listOfJobs is a Python list to which you can add any number of job URLs. The script prints the status of every job in the list and, once the email call is enabled, sends a single email alert covering all of them, using the SMTP settings defined in the global variables (smtpServer, fromaddr, toaddr and subj).
The checkJobStatus function iterates through the list of job URLs, makes a GET request with basic authentication for each one, and loads the JSON response. For each job it extracts the relevant details (currently the status, the last end time, and the document counter) and stores them in the variable response1. All the status lines are then collected in response2, which forms the email body.
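The extraction step can be looked at in isolation, without calling Fusion. The sketch below is a hypothetical helper (format_status is not in the original script) that reads the same keys as the script; the sample payload is hand-written to mirror those keys and is not a complete Jobs API response.

```python
# Hypothetical helper: turns one parsed job response (a dict) into a status line.
# It reads only the keys used by the script in this article:
# status, lastEndTime and extra["counter.output"].
def format_status(job_id, data):
    line = "Status of the job " + job_id + " is: " + data["status"]
    line += " Last run completed on: " + data["lastEndTime"]
    extra = data.get("extra", {})
    if "counter.output" in extra:
        # Only datasource jobs are expected to report a document counter
        line += " Counter Output: " + str(extra["counter.output"])
    return line


# Hand-written sample payload (assumption), shaped after the fields above
sample = {
    "status": "success",
    "lastEndTime": "2023-11-16T17:04:00.888Z",
    "extra": {"counter.output": 48},
}
print(format_status("datasource:web_test", sample))
```

Keeping the formatting in a pure function makes it possible to test the output without network access or credentials.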
NOTE: The call to the email function is commented out because the SMTP settings must first be changed to match your environment.
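Before enabling the email call, it can help to build the message and inspect it without sending anything. The following is a minimal sketch using the standard library's EmailMessage class; build_alert and the example.com addresses are placeholders introduced here, not values from the original script.

```python
from email.message import EmailMessage


# Hypothetical helper: builds the alert message but does not send it.
def build_alert(fromaddr, toaddr, subj, status_lines):
    msg = EmailMessage()
    msg["From"] = fromaddr
    msg["To"] = ", ".join(toaddr)  # toaddr is a list of recipient addresses
    msg["Subject"] = subj
    msg.set_content("\n".join(status_lines))  # one status line per row
    return msg


# Placeholder addresses and status line (assumptions for illustration)
msg = build_alert(
    "alerts@example.com",
    ["ops@example.com"],
    "Job Status",
    ["Status of the job task:demo is: success"],
)
print(msg)
```

Once the SMTP settings are correct for your environment, a message built this way could be sent with smtplib.SMTP(smtpServer).send_message(msg).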
You can further customise the script to suit your requirements and schedule it to run at specific intervals that match the schedule of the Fusion jobs.
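One possible customisation is to alert only when a job needs attention. The sketch below assumes that any status other than "success" (the value shown in the sample output) is worth reporting; jobs_needing_attention is a hypothetical helper, and the "failed" value in the sample data is made up for illustration rather than taken from the Fusion documentation.

```python
# Hypothetical customisation: report only jobs whose status is not "success".
def jobs_needing_attention(statuses):
    # statuses maps a job identifier to the "status" value returned for it
    return sorted(job for job, status in statuses.items() if status != "success")


# Made-up sample data; "failed" is an assumed status value
sample = {"spark:demo": "success", "task:demo": "failed"}
print(jobs_needing_attention(sample))
```

With a filter like this, the email could be skipped entirely when the returned list is empty.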
Automating Script at Scheduled Intervals
Linux
Here's how to schedule a Python script with Cron:
- Open the crontab editor:
crontab -e
- Choose an editor (e.g., nano) and add a new line specifying the schedule and command.
- For example, to run your script every hour at 10 minutes past:
10 * * * * python /path/to/your/script.py
- Save and exit the editor.
More details here: https://cronitor.io/guides/cron-jobs
Windows
Here's how to schedule a Python script with Task Scheduler:
- Open Task Scheduler: Search for "Task Scheduler" in the Start menu.
- Click "Create Basic Task..." in the right pane.
- Give your task a name and description.
- Choose a trigger to set your desired schedule (e.g., daily or weekly, at a specific time) and click Next.
- Choose "Start a program" for the action and click Next.
- Browse to your Python executable file (e.g., python.exe).
- In the "Add arguments" field, enter the path to your Python script.
- Click Finish to create the task.
More details here: https://www.windowscentral.com/how-create-automated-task-using-task-scheduler-windows-10