Know more about Jobs in Kubernetes

As we know, controllers such as Deployments, ReplicaSets, and DaemonSets are responsible for starting pods and keeping them running continuously. Each one, in its own way, ensures that its pods always stay online. But what if we need to run a single task in the cluster just once, or run that task periodically? This is where two additional controller types come into the picture: the Job and the CronJob. A Job in Kubernetes is responsible for creating one or more pods and seeing them through to completion.

The main purpose of a Job is to ensure that the pods it creates complete their tasks and terminate successfully. That does not mean every pod the Job creates must succeed before the Job is considered done, because failed pods can be replaced.
A Job is considered complete as soon as the specified number of successful completions is reached.

NOTE :

Pods are just container-based applications that run code. So the work of a Job is to run a program in a container until it completes, and it is the job controller's responsibility to ensure that the specified number of pods complete successfully.

A Job is meant to run pods that do not run indefinitely.
When the required number of pods complete successfully, the Job is marked as completed.
Deleting a Job also deletes all the pods it created.
If a Job has to run a specified number of times and a pod fails during one of those runs, the Job starts a new pod to replace it, so that the required number of successful runs is still reached.
For example,
if a Job has to run 5 times and the third run fails after the first two succeed, a replacement pod is started for that run. The total number of runs becomes six, while the number of successful runs stays at five.
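The replacement rule above can be sketched as a small simulation. This is a hypothetical model, not the job controller's code. The helper `pods_needed` and its `outcomes` list are assumptions. Pods are started one at a time (parallelism 1, restartPolicy Never), and backoffLimit is ignored.

```python
def pods_needed(outcomes, completions):
    """Count pods started until `completions` successful runs are reached.

    `outcomes` is a hypothetical sequence of pod results (True = succeeded,
    False = failed), consumed in order as pods are started one at a time.
    """
    successes = 0
    started = 0
    for ok in outcomes:
        if successes == completions:
            break  # the Job is already complete; no more pods are started
        started += 1
        if ok:
            successes += 1
    if successes < completions:
        raise ValueError("not enough outcomes to reach the completion count")
    return started


# Five required completions; the third pod fails and is replaced.
print(pods_needed([True, True, False, True, True, True], completions=5))  # → 6
```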

Job Spec

The Job spec includes :

apiVersion (batch/v1)
kind (Job)
metadata
the pod template (.spec.template), which is the only required field of .spec
a pod selector (.spec.selector), which is optional
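As an illustration of the fields listed above, the sketch below builds a minimal Job manifest as a Python dict and prints it as JSON, which kubectl also accepts (for example `kubectl apply -f job.json`). The helper `make_job`, the name `pi-once`, the `busybox` image and the command are all hypothetical choices. The selector is left out, so Kubernetes generates one.

```python
import json


def make_job(name, image, command, restart_policy="Never"):
    """Build a minimal batch/v1 Job manifest as a plain dict."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # The pod template is the only required field of the Job spec;
            # .spec.selector is omitted and generated by Kubernetes.
            "template": {
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                    "restartPolicy": restart_policy,
                }
            }
        },
    }


job = make_job("pi-once", "busybox", ["sh", "-c", "echo done"])
print(json.dumps(job, indent=2))
```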

RestartPolicy

The RestartPolicy can be set to Never or OnFailure.
A container in a pod can fail for any number of reasons. In that case, if restartPolicy in the spec.template.spec section is set to "OnFailure", the pod stays on the node and the container is restarted.

If restartPolicy is set to "Never" instead, the job controller starts a new pod.

A pod's restartPolicy defaults to Always. Job pods can't use that default, because they're not meant to run indefinitely. Therefore, the restart policy must be set explicitly to either OnFailure or Never.
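The rule above can be expressed as a small check. The API server already performs this validation itself, and this sketch only mirrors it. The function name and constant are hypothetical. A missing restartPolicy is treated as the pod default, Always.

```python
ALLOWED_JOB_RESTART_POLICIES = {"Never", "OnFailure"}


def check_job_restart_policy(pod_spec):
    """Return the effective restartPolicy, rejecting values a Job cannot use."""
    # A pod spec without restartPolicy falls back to the pod default, "Always".
    policy = pod_spec.get("restartPolicy", "Always")
    if policy not in ALLOWED_JOB_RESTART_POLICIES:
        raise ValueError(
            f"Job pods need restartPolicy Never or OnFailure, got {policy!r}"
        )
    return policy


print(check_job_restart_policy({"restartPolicy": "OnFailure"}))  # → OnFailure
```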

Parallel Jobs

Jobs come in a few variations :

Non-parallel jobs :

Normally only one pod is started. A new pod is started only if that pod fails.
The job is complete as soon as its pod terminates successfully.

Running multiple pod instances in a Job

Use of completions and parallelism fields :

Jobs can be configured to create more than one pod instance and run them either sequentially or in parallel.

completions :

The spec.completions field is set to a positive integer.
To run a Job more than once, set completions to the number of times you want the Job's pod to run successfully. By default, these pods run one after another.

parallelism :

The parallelism property of the Job spec specifies how many pods may run in parallel.

For example, setting parallelism to 2 makes the Job create two pods and run them in parallel.
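The combined effect of completions and parallelism can be sketched as follows. The function name `batches` is hypothetical, and the model is deliberately simplified. It assumes every pod succeeds and that all pods in a wave finish at the same moment. The real controller starts a new pod as soon as any running pod finishes, so real timelines overlap more. It also never runs more pods than there are remaining completions.

```python
def batches(completions, parallelism):
    """Split `completions` pod runs into waves of at most `parallelism` pods.

    Simplifying assumptions: every pod succeeds, and all pods in a wave
    finish together, so the next wave starts only after the previous one.
    """
    if completions < 1 or parallelism < 1:
        raise ValueError("completions and parallelism must be positive")
    waves = []
    remaining = completions
    while remaining > 0:
        wave = min(parallelism, remaining)  # never more pods than work left
        waves.append(wave)
        remaining -= wave
    return waves


print(batches(completions=5, parallelism=2))  # → [2, 2, 1]
```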

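The parallelism of a Job can be changed even while the Job is running. Older kubectl versions did this with the scale command:
kubectl scale job job-name --replicas=count
If your kubectl version no longer supports scaling Jobs, patch the field directly instead:
kubectl patch job job-name -p '{"spec":{"parallelism":count}}'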

A few extra fields that can be used with jobs :

backoffLimit :

This is useful when we want a job to be considered failed after a certain number of retries, for example because of a logical error in the program.
Use .spec.backoffLimit to specify the number of retries before considering a Job as failed. The back-off limit defaults to 6.
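According to the Kubernetes documentation, failed pods are recreated with an exponential back-off delay (10s, 20s, 40s, ...) capped at six minutes. The sketch below lists that schedule, assuming one delay per retry and a simple doubling. `backoff_delays` is a hypothetical helper, not part of any Kubernetes API.

```python
def backoff_delays(backoff_limit=6, base=10, cap=360):
    """Delays in seconds before each retry, doubling from `base` up to `cap`."""
    delays = []
    delay = base
    for _ in range(backoff_limit):
        delays.append(min(delay, cap))  # never wait longer than six minutes
        delay *= 2
    return delays


print(backoff_delays())  # → [10, 20, 40, 80, 160, 320]
```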

activeDeadlineSeconds :

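The total running time of a Job can be limited by setting the Job's .spec.activeDeadlineSeconds field. It applies to the duration of the Job as a whole, no matter how many pods are created. If the Job runs longer than that, the system terminates all of its running pods and marks the Job as failed. (A pod spec also has its own activeDeadlineSeconds field, which limits a single pod.)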

Note :

Note that a Job’s .spec.activeDeadlineSeconds takes precedence over its .spec.backoffLimit.
Even after a job completes, its pods are not deleted. They are kept so that we can still view their logs.
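The precedence described in the note can be sketched as a function that decides the state of a Job that has not yet completed. This is a hypothetical, simplified model. The failure reasons `DeadlineExceeded` and `BackoffLimitExceeded` are the ones Kubernetes reports, but the exact failure-counting rule used here (failed pods greater than the limit) is an assumption.

```python
def job_outcome(elapsed_seconds, failed_pods, active_deadline_seconds=None,
                backoff_limit=6):
    """Return "Running" or "Failed: <reason>" for an unfinished Job."""
    # The deadline is checked first, so it wins even if retries remain.
    if (active_deadline_seconds is not None
            and elapsed_seconds >= active_deadline_seconds):
        return "Failed: DeadlineExceeded"
    # Otherwise the Job fails once failures go past the back-off limit.
    if failed_pods > backoff_limit:
        return "Failed: BackoffLimitExceeded"
    return "Running"


print(job_outcome(120, failed_pods=1, active_deadline_seconds=60))
# → Failed: DeadlineExceeded
```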

ttlSecondsAfterFinished
This is another way of cleaning up finished Jobs (either Complete or Failed). The .spec.ttlSecondsAfterFinished field specifies how long a finished Job persists in the system before it is deleted. If the field is set to zero, the Job is deleted as soon as it finishes. If the field is unset, the Job won't be cleaned up by the TTL controller.

Important Note :

To delete a Job but leave its pods running, run:

kubectl delete job job-name --cascade=orphan

(Older kubectl versions use --cascade=false instead.)
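Going back to ttlSecondsAfterFinished, the cleanup timing can be sketched as a small function. `ttl_expiry` and the finish time are hypothetical. The sketch only computes when a finished Job becomes eligible for deletion. It does not model how quickly the TTL controller actually acts.

```python
from datetime import datetime, timedelta, timezone


def ttl_expiry(finished_at, ttl_seconds_after_finished):
    """When a finished Job becomes eligible for deletion, or None if never."""
    if ttl_seconds_after_finished is None:
        # Field unset: the TTL controller leaves the Job alone.
        return None
    return finished_at + timedelta(seconds=ttl_seconds_after_finished)


finished = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(ttl_expiry(finished, 100))   # → 2024-01-01 12:01:40+00:00
print(ttl_expiry(finished, 0))     # eligible as soon as the Job finishes
print(ttl_expiry(finished, None))  # → None
```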

Use cases of jobs :

In a message-queuing system (a producer/consumer setup), the producer can send a message and exit, but each message still has to be processed by a consumer. This can be done by creating a Job for each consumer.
One-time initialisation of resources such as databases.
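One way to realise the "one Job per consumer" idea is to generate a Job manifest for each work item. This is a minimal sketch under stated assumptions. The helper `job_for_item`, the item names, the `jobgroup` label and the `busybox` command are hypothetical. Item names must also be valid lowercase DNS labels, because they become part of the Job names.

```python
import json


def job_for_item(item, image="busybox"):
    """Build one Job manifest that processes a single (hypothetical) work item."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": f"process-{item}",
            # Hypothetical label so all consumer Jobs can be listed together.
            "labels": {"jobgroup": "queue-consumer"},
        },
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": "consumer",
                        "image": image,
                        "command": ["sh", "-c", f"echo processing {item}"],
                    }],
                    "restartPolicy": "Never",
                }
            }
        },
    }


items = ["apple", "banana", "cherry"]
manifests = [job_for_item(i) for i in items]
print(json.dumps(manifests[0], indent=2))
```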

CronJobs

Jobs are classified as :

Runs to completion ( JOBS )
Scheduled jobs ( CRONJOBS )

A CronJob creates Jobs that repeat on a schedule. A cron job in Kubernetes is configured by creating a CronJob resource, and the schedule for running the job is specified in the well-known cron format.
By default, all CronJob schedule times are based on the timezone of the kube-controller-manager (newer Kubernetes versions also let you set .spec.timeZone on the CronJob).
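A minimal CronJob manifest, built as a Python dict and printed as JSON, shows how the schedule and the Job template fit together. The name, image, command and schedule are hypothetical. The `batch/v1` CronJob API assumes a reasonably recent cluster, since older clusters served CronJob under `batch/v1beta1`.

```python
import json

cronjob = {
    "apiVersion": "batch/v1",
    "kind": "CronJob",
    "metadata": {"name": "nightly-report"},
    "spec": {
        # Five-field cron expression: every day at 02:30.
        "schedule": "30 2 * * *",
        # Each scheduled run creates a Job from this template.
        "jobTemplate": {
            "spec": {
                "template": {
                    "spec": {
                        "containers": [{
                            "name": "report",
                            "image": "busybox",
                            "command": ["sh", "-c", "echo generating report"],
                        }],
                        "restartPolicy": "OnFailure",
                    }
                }
            }
        },
    },
}

print(json.dumps(cronjob, indent=2))
```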

The cron schedule format
From left to right, the schedule contains the following five entries:

Minute (0-59)
Hour (0-23)
Day of month (1-31)
Month (1-12)
Day of week (0-6, Sunday to Saturday)
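To show how the five fields are read together, here is a small matcher that checks whether a given minute is selected by a schedule. It is a hypothetical sketch, not the parser Kubernetes uses. It supports only `*`, `*/n`, single values, ranges `a-b`, `a-b/n` and comma lists, with no names like `MON` and no `@hourly` macros. It follows the classic cron rule that when both day fields are restricted, a match on either one is enough.

```python
from datetime import datetime


def _field_matches(field, value, low, high):
    """Check one cron field; supports *, */n, a, a-b, a-b/n and comma lists."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start, end = (int(x) for x in part.split("-"))
        else:
            start = end = int(part)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def cron_matches(expr, dt):
    """Return True if the minute `dt` is selected by the 5-field cron `expr`."""
    minute, hour, dom, month, dow = expr.split()
    # Python: Monday == 0 ... Sunday == 6; cron: Sunday == 0 ... Saturday == 6.
    cron_dow = (dt.weekday() + 1) % 7
    if not (_field_matches(minute, dt.minute, 0, 59)
            and _field_matches(hour, dt.hour, 0, 23)
            and _field_matches(month, dt.month, 1, 12)):
        return False
    dom_ok = _field_matches(dom, dt.day, 1, 31)
    dow_ok = _field_matches(dow, cron_dow, 0, 6)
    # Classic cron rule: when both day fields are restricted, either may match.
    if dom != "*" and dow != "*":
        return dom_ok or dow_ok
    return dom_ok and dow_ok


print(cron_matches("30 2 * * *", datetime(2024, 1, 1, 2, 30)))  # → True
print(cron_matches("0 9 * * 1-5", datetime(2024, 1, 6, 9, 0)))  # → False (a Saturday)
```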

Ending Note :
Pods that perform a batch task should be created through a Kubernetes Job resource, not directly or through a ReplicationController, etc.
Jobs that need to run at a later time, or repeatedly on a schedule, can be created through CronJob resources.