The Databricks Jobs API follows the guiding principles of the REST (Representational State Transfer) architecture. You can authenticate to the Databricks REST API with either a Databricks personal access token or a username and password. The Databricks Jobs API 2.1 supports jobs with multiple tasks.
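As a quick illustration, here is a minimal Python sketch of how a personal access token is typically passed as a bearer token when calling the Jobs API with the requests library. The workspace URL and token are placeholders, not real values.

import requests

# Placeholder values: replace with your workspace URL and personal access token
DATABRICKS_INSTANCE = "https://<databricks-instance>"
TOKEN = "<personal-access-token>"

# The REST API accepts the personal access token as a bearer token
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# Simple connectivity check: list the jobs in the workspace
response = requests.get(f"{DATABRICKS_INSTANCE}/api/2.1/jobs/list", headers=HEADERS)
print(response.status_code, response.json())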
The main Databricks Jobs API endpoints are described below:
Creating a New Job
Users can send a request to the server to create a new job. The create endpoint uses the HTTP POST method, and its request body schema is as follows:
| Field | Data Type | Description |
| --- | --- | --- |
| name | String | An optional name for the job. The default name is "Untitled". |
| job_clusters | Array | A list of job cluster specifications. Libraries cannot be declared on a shared job cluster; declare them in the task settings instead. |
| tags | Object | A map of tags associated with the job. The default is an empty map ({}). A maximum of 25 tags can be added to a job. |
| tasks | Array | A list of task specifications to be executed by the job. |
| email_notifications | Object | A set of email addresses to notify when the job run succeeds or fails. |
| timeout_seconds | Integer | An optional timeout applied to each run of the job. The default behavior is to have no timeout. |
| schedule | Object | An optional schedule for running the job at a user-defined time. |
| access_control_list | Array | A list of permissions to set on the job. |
A request to the Databricks Jobs API results in one of the following four responses:
- 200: Indicates that the job was successfully created.
- 400: Indicates that the request was malformed.
- 401: Indicates that the request was unauthorized.
- 500: Indicates that the request was not handled correctly due to a server error.
Example -
URL - https://<databricks-instance>/api/2.1/jobs/create
{
"name": "test_job",
"email_notifications": {
"no_alert_for_skipped_runs": false
},
"webhook_notifications": {},
"timeout_seconds": 0,
"max_concurrent_runs": 1,
"tasks": [
{
"task_key": "test_notebook",
"notebook_task": {
"notebook_path": "/aws/test",
"source": "WORKSPACE"
},
"job_cluster_key": "test_cluster",
"timeout_seconds": 36000,
"email_notifications": {}
}
],
"job_clusters": [
{
"job_cluster_key": "test_cluster",
"new_cluster": {
"cluster_name": "",
"spark_version": "11.3.x-scala2.12",
"spark_conf":
"spark.databricks.delta.formatCheck.enabled": "false"
},
"aws_attributes": {
"first_on_demand": 6,
"availability": "SPOT_WITH_FALLBACK",
"zone_id": "auto",
"spot_bid_price_percent": 100,
"ebs_volume_type": "GENERAL_PURPOSE_SSD",
"ebs_volume_count": 1,
"ebs_volume_size": 100
},
"node_type_id": "r6g.8xlarge",
"custom_tags": {
"Function": "sparkcluster",
"CreatedBy": "autocluster",
"ManagingTeamEmail": "xyz@gmail.com,
"CodeMaturity": "dev"
},
"enable_elastic_disk": true,
"runtime_engine": "STANDARD",
"autoscale": {
"min_workers": 3,
"max_workers": 10
}
}
}
],
"format": "MULTI_TASK"
}
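The request body above can be submitted with any HTTP client. Below is a minimal, hedged Python sketch of how such a create call might be made; the workspace URL, token, and the create_job.json file name are placeholders standing in for the JSON shown above.

import json
import requests

DATABRICKS_INSTANCE = "https://<databricks-instance>"  # placeholder workspace URL
HEADERS = {"Authorization": "Bearer <personal-access-token>"}  # placeholder token

# Assume the JSON request body shown above has been saved to create_job.json
with open("create_job.json") as f:
    job_payload = json.load(f)

response = requests.post(
    f"{DATABRICKS_INSTANCE}/api/2.1/jobs/create",
    headers=HEADERS,
    json=job_payload,
)

if response.status_code == 200:
    # A successful create returns the job_id of the new job
    print("Job created with job_id:", response.json()["job_id"])
else:
    print("Request failed:", response.status_code, response.text)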
Listing All the Jobs
The Jobs API provides a GET endpoint for listing all jobs. The result is returned in JSON format and includes the details of each job.
Example
URL - https://<databricks-instance>/api/2.1/jobs/list
{
"jobs": [
{
"job_id": 12589654,
"creator_user_name": "xyz@gmail.com",
"settings": {
"name": "test_job",
"tags": {
"cost-center": "engineering",
"team": "jobs"
},
"tasks": [
{
"task_key": "test1",
"description": "Running test1 job",
"depends_on": [],
"existing_cluster_id": "0923-164208-meows279",
"spark_jar_task": {
"main_class_name": "com.databricks.test",
},
"libraries": [
{
"jar": "dbfs:/mnt/test/databricks/jdbc.jar"
}
],
"timeout_seconds": 86400,
"max_retries": 3,
"min_retry_interval_millis": 2000,
"retry_on_timeout": false
},
{
"task_key": "test2",
"description": "Running test2 job",
"depends_on": [],
"job_cluster_key": "auto_scaling_cluster",
},
"libraries": [
{
"jar": "dbfs:/mnt/databricks/jdbc.jar"
}
],
"timeout_seconds": 86400,
"max_retries": 3,
"min_retry_interval_millis": 2000,
"retry_on_timeout": false
},
{
"task_key": "test3",
"description": "Matches orders with user sessions",
"depends_on": [
{
"task_key": "test1"
},
{
"task_key": "test2"
}
],
"new_cluster": {
"spark_version": "7.3.x-scala2.12",
"node_type_id": "i3.xlarge",
"spark_conf": {
"spark.speculation": true
},
"aws_attributes": {
"availability": "SPOT",
"zone_id": "us-west-2a"
},
"autoscale": {
"min_workers": 2,
"max_workers": 16
}
},
"notebook_task": {
"notebook_path": "/Users/test/test_notebook",
"source": "WORKSPACE",
"base_parameters": {
"name": "John Doe",
"age": "35"
}
},
"timeout_seconds": 86400,
"max_retries": 3,
"min_retry_interval_millis": 2000,
"retry_on_timeout": false
}
],
"job_clusters": [
{
"job_cluster_key": "auto_scaling_cluster",
"new_cluster": {
"spark_version": "7.3.x-scala2.12",
"node_type_id": "i3.xlarge",
"spark_conf": {
"spark.speculation": true
},
"aws_attributes": {
"availability": "SPOT",
"zone_id": "us-west-2a"
},
"autoscale": {
"min_workers": 2,
"max_workers": 16
}
}
}
],
"timeout_seconds": 86400,
"schedule": {
"quartz_cron_expression": "20 30 * * * ?",
"timezone_id": "Europe/London",
"pause_status": "PAUSED"
},
"max_concurrent_runs": 10,
"format": "MULTI_TASK"
},
"created_time": 1601370337343
}
],
"has_more": false
}
You can also fetch the details of a single job through its job_id. Simply pass the job_id as a parameter to the Jobs Get endpoint (https://<databricks-instance>/api/2.1/jobs/get) and you will receive the details of that particular job.
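As a rough sketch (with the same placeholder workspace URL and token as before), listing jobs and then fetching one of them by job_id might look like this:

import requests

DATABRICKS_INSTANCE = "https://<databricks-instance>"  # placeholder workspace URL
HEADERS = {"Authorization": "Bearer <personal-access-token>"}  # placeholder token

# List all jobs in the workspace
jobs = requests.get(f"{DATABRICKS_INSTANCE}/api/2.1/jobs/list", headers=HEADERS).json()
for job in jobs.get("jobs", []):
    print(job["job_id"], job["settings"]["name"])

# Fetch a single job by passing its job_id as a query parameter
detail = requests.get(
    f"{DATABRICKS_INSTANCE}/api/2.1/jobs/get",
    headers=HEADERS,
    params={"job_id": 12589654},  # job_id taken from the listing response above
)
print(detail.json())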
Updating and Resetting Jobs
The Databricks Jobs API provides two endpoints for changing an existing job: update and reset. The update endpoint lets you add, change, or remove specific settings of an existing job, while the reset endpoint overwrites all of a job's settings. The example below uses the update endpoint.
Both are POST HTTP requests; the body schema for the update endpoint is defined below:
| Field | Data Type | Description |
| --- | --- | --- |
| job_id | Integer | The unique identifier assigned to the job when it was created. This field is mandatory when updating a job. |
| new_settings | Object | The new settings you want to apply to the job. |
| fields_to_remove | Array | Top-level fields to remove from the job settings. Removing nested fields is not supported. This field is optional. |
Example
URL - https://<databricks-instance>/api/2.1/jobs/update
{
"job_id": 123456789,
"creator_user_name": "databricks_user",
"run_as_user_name": "databricks_user",
"run_as_owner": true,
"new_settings": {
"name": "test_job",
"email_notifications": {
"no_alert_for_skipped_runs": false
},
"timeout_seconds": 0,
"max_concurrent_runs": 2,
"tasks": [
{
"task_key": "test",
"notebook_task": {
"notebook_path": "/User/test_notebook",
"source": "WORKSPACE"
},
"job_cluster_key": "test_job_cluster",
"timeout_seconds": 0,
"email_notifications": {}
}
],
"job_clusters": [
{
"job_cluster_key": "test_job_cluster",
"new_cluster": {
"cluster_name": "",
"spark_version": "7.3.x-scala2.12",
"aws_attributes": {
"first_on_demand": 1,
"availability": "SPOT_WITH_FALLBACK",
"zone_id": "auto",
"spot_bid_price_percent": 100,
"ebs_volume_type": "GENERAL_PURPOSE_SSD",
"ebs_volume_count": 3,
"ebs_volume_size": 100
},
"node_type_id": "r5.4xlarge",
"enable_elastic_disk": false,
"autoscale": {
"min_workers": 30,
"max_workers": 70
}
}
}
],
"format": "MULTI_TASK",
"access_control_list": [
{
"user_name": "xyz@gmail.com",
"permission_level": "CAN_MANAGE"
}
]
}
}
As a result, you will receive one of the four responses mentioned above.
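The request body above is sent to the update endpoint in the same way as the create request. The sketch below (placeholder URL, token, and job_id) also shows the optional fields_to_remove field, which the example above does not use; here it is assumed, purely for illustration, that you want to drop the job-level schedule.

import requests

DATABRICKS_INSTANCE = "https://<databricks-instance>"  # placeholder workspace URL
HEADERS = {"Authorization": "Bearer <personal-access-token>"}  # placeholder token

update_payload = {
    "job_id": 123456789,  # placeholder job_id
    # Only the settings listed here are changed; all other settings are kept
    "new_settings": {
        "name": "test_job",
        "max_concurrent_runs": 2,
    },
    # Optional: remove top-level settings, e.g. an existing schedule
    "fields_to_remove": ["schedule"],
}

response = requests.post(
    f"{DATABRICKS_INSTANCE}/api/2.1/jobs/update",
    headers=HEADERS,
    json=update_payload,
)
print(response.status_code, response.text)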
Deleting a Job
To delete a job, Databricks provides a POST endpoint that deletes the job identified by its job_id. The request body only needs to contain the job_id in JSON format, and the call returns one of the four responses mentioned above.
Example
URL - https://<databricks-instance>/api/2.1/jobs/delete
{
"job_id": 123456789
}
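A minimal sketch of the delete call, again with a placeholder URL, token, and job_id:

import requests

DATABRICKS_INSTANCE = "https://<databricks-instance>"  # placeholder workspace URL
HEADERS = {"Authorization": "Bearer <personal-access-token>"}  # placeholder token

response = requests.post(
    f"{DATABRICKS_INSTANCE}/api/2.1/jobs/delete",
    headers=HEADERS,
    json={"job_id": 123456789},  # placeholder job_id
)
print(response.status_code)  # 200 indicates the job was deleted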
Benefits of using the Databricks Jobs API
- The Databricks Jobs API lets you create, modify, list, and delete jobs and check job runs through API requests, without touching the UI.
- It can be integrated with any language that can make HTTP requests.
- Integrating the Databricks Jobs API with other tools enables event-based triggers for Databricks jobs, creating more efficient runs.
Conclusion
In this blog, you learned about Databricks and the basic operations of the Databricks Jobs API: how to create, list, update, and delete jobs, and the request body and parameters required for each call.
