Deploy modes in Apache Spark

Apache Spark is an open-source, high-speed, easy-to-use engine for big data processing and analysis. Spark has built-in modules for graph processing, machine learning, streaming, SQL, and more. Its execution engine supports in-memory computation and cyclic data flow, which makes it fast; it can run in cluster mode or standalone mode, and it can access diverse data sources such as HBase, HDFS, and Cassandra.
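
To make the later examples concrete, here is a minimal sketch of a Spark application written in Scala; the object name SimpleCountApp and the input path are hypothetical, and the master URL and deploy mode are deliberately not hard-coded so the same jar can be submitted in either mode.

    import org.apache.spark.sql.SparkSession

    // A minimal Spark job: counts the lines of a text file.
    // The master URL and deploy mode are supplied at submission time,
    // so the same jar runs unchanged in client or cluster mode.
    object SimpleCountApp {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SimpleCountApp")
          .getOrCreate()

        val lines = spark.read.textFile(args(0)) // input path passed as an argument
        println(s"Line count: ${lines.count()}")

        spark.stop()
      }
    }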

Spark deploy modes

When we submit a Spark job for execution, whether locally or on a cluster, its behaviour depends entirely on one thing: the “Driver” component. Where the driver of a Spark job resides defines the deploy mode of that job (a small sketch for reading the active mode back at runtime follows the list below).

There are two deployment modes in Apache Spark:

  1. Client mode
  2. Cluster mode
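
A running application can check which of these modes it was launched in. The sketch below is a small self-contained Scala example (the app name DeployModeCheck is a placeholder); SparkContext.deployMode returns "client" or "cluster" for the current application.

    import org.apache.spark.sql.SparkSession

    object DeployModeCheck {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("DeployModeCheck").getOrCreate()
        // deployMode returns "client" or "cluster" for the running application
        println(s"Deploy mode: ${spark.sparkContext.deployMode}")
        spark.stop()
      }
    }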

1. Client mode

In this mode, the Spark driver runs on the machine from which we submit the job. Client mode supports both interactive shells (such as spark-shell) and job submission via spark-submit.
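
As one illustration of a client-mode submission, here is a hedged sketch using Spark's org.apache.spark.launcher.SparkLauncher API; the jar name, class name, input path, and YARN master are assumptions for the example, and the equivalent spark-submit command appears in the comment.

    import org.apache.spark.launcher.SparkLauncher

    object SubmitClientMode {
      def main(args: Array[String]): Unit = {
        // Equivalent CLI (assumed paths):
        //   spark-submit --master yarn --deploy-mode client \
        //     --class SimpleCountApp simple-count.jar /data/input.txt
        val process = new SparkLauncher()
          .setAppResource("simple-count.jar") // hypothetical application jar
          .setMainClass("SimpleCountApp")
          .setMaster("yarn")
          .setDeployMode("client")            // driver runs on the submitting machine
          .addAppArgs("/data/input.txt")      // hypothetical input path
          .launch()                           // returns a java.lang.Process
        process.waitFor()                     // block until the job finishes
      }
    }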

The main disadvantages of client mode are:

  • If the submitting machine fails, the entire job fails.
  • Performance is comparatively poor, since the driver may sit outside the cluster and every exchange with the executors crosses the network; this mode is not preferred in production environments.

To overcome these problems, we use cluster mode for deployment in production environments.

2. Cluster mode

A deploy mode is said to be cluster mode if the driver component of the Spark job does not run on the machine from which the job was submitted.

  • The job launches the driver component inside the cluster, as a sub-process of the ApplicationMaster.
  • In this mode, we deploy with the spark-submit command; interactive shell mode is not supported (see the sketch after this list).
  • If the driver program fails, it can be re-instantiated, because it runs inside the ApplicationMaster.
  • In this mode, a dedicated cluster manager (such as standalone, YARN, Apache Mesos, or Kubernetes) allocates the resources the job needs.
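
Below is the matching sketch for cluster mode, again via SparkLauncher with assumed names and paths; the only change from the client-mode example is the deploy mode, which moves the driver into the ApplicationMaster on the cluster.

    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    object SubmitClusterMode {
      def main(args: Array[String]): Unit = {
        // Equivalent CLI (assumed paths):
        //   spark-submit --master yarn --deploy-mode cluster \
        //     --class SimpleCountApp simple-count.jar /data/input.txt
        val handle: SparkAppHandle = new SparkLauncher()
          .setAppResource("simple-count.jar") // hypothetical application jar
          .setMainClass("SimpleCountApp")
          .setMaster("yarn")
          .setDeployMode("cluster")           // driver runs inside the cluster
          .addAppArgs("/data/input.txt")
          .startApplication()                 // returns a handle to monitor the job

        // The submitting machine is now free; the driver keeps running on the cluster.
        println(s"Application state: ${handle.getState}")
      }
    }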

Conclusion

In this blog, we got to know the deployment modes in Apache Spark. Which deploy mode is best for us depends on our goals: client mode suits interactive work and quick testing, while cluster mode suits production jobs.

Written by 

Rakhi Pareek is a Software Consultant at Knoldus. She believes in continuous learning with new technologies. Her current practice area is Scala. She loves to maintain a diary to put down her thoughts daily. Her hobby is doodle art.