In one of our previous blogs, Setup a Apache Spark Cluster in your Single Standalone Machine, we showed how to set up a standalone cluster for running Spark applications. However, we never discussed how to deploy an application on that cluster. So, in this blog, we will see how to deploy a Spark application on a cluster and use the cluster to run our Spark jobs.
To deploy a Spark application on a cluster, we need to use Spark's spark-submit script, which can be found in the bin folder of the Spark distribution. But before we dive into the details of spark-submit, let's take a look at a sample command that deploys a Spark application on a cluster.
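A minimal invocation, assembled from the arguments explained below, looks like this (MainClass and spark-application.jar are placeholder names, to be replaced with your application's actual main class and JAR):

```shell
# Submit a Spark application to a standalone cluster
# (run from the root of the Spark distribution)
./bin/spark-submit \
  --class MainClass \
  --master spark://127.0.0.1:7077 \
  spark-application.jar
```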
Now that we have looked at the command to deploy a Spark application, let's dive into its details.
- The first thing we see is the spark-submit command itself. We need this command to submit our application to the cluster.
- Next is the --class MainClass argument, which specifies the main class to run when the application is submitted to the cluster.
- After that we have the --master spark://127.0.0.1:7077 argument, through which we provide the URL of the Master node of the Spark cluster.
- At last we have spark-application.jar, which is the JAR of the Spark application that we want to deploy on the Spark cluster.
The above-mentioned command contains the minimum number of arguments required for deploying a Spark application on a cluster. But spark-submit supports many other options too, which we can learn about using the command spark-submit --help.
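For instance, a submission using a few of these additional options might look like the sketch below. The memory and core values are purely illustrative; tune them to your own cluster:

```shell
# Illustrative example of extra spark-submit options
./bin/spark-submit \
  --class MainClass \
  --master spark://127.0.0.1:7077 \
  --deploy-mode client \          # run the driver locally (default); "cluster" runs it on a worker
  --executor-memory 2G \          # memory per executor
  --total-executor-cores 4 \      # total cores across all executors (standalone mode)
  spark-application.jar
```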
Now that we have understood the command for deploying a Spark application on a cluster, let's see it working with an example.
- Download a sample Spark application from the following link: https://github.com/knoldus/spark-spray-starter
- Create a JAR of the downloaded Spark application using the command:
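Assuming the starter is a standard Scala/sbt project (as Knoldus starters typically are), the JAR can be built like this:

```shell
# Build the application JAR with sbt
cd spark-spray-starter
sbt package   # the JAR is produced under target/scala-<version>/
```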
- Then deploy the application using the following command:
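A sketch of the submission, following the same pattern as before. The class name and JAR path below are placeholders, not taken from the project; substitute the application's actual main class and the JAR path produced by your build:

```shell
# Deploy the sample application to the local standalone cluster
# (class name and JAR path are placeholders)
./bin/spark-submit \
  --class com.example.MainClass \
  --master spark://127.0.0.1:7077 \
  target/scala-2.11/spark-spray-starter.jar
```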
And we are done! Our Spark application is deployed on the cluster. We can verify the application's status on the Master web UI at http://localhost:8080/