Deploy a Spark Application on a Cluster


In one of our previous blogs, Setup a Apache Spark Cluster in your Single Standalone Machine, we showed how to set up a standalone cluster for running Spark applications. But we never discussed how to deploy our Spark applications on that cluster. So, in this blog, we will see how to deploy a Spark application on a cluster and use it to run our Spark jobs.

To deploy our Spark application on a cluster, we need to use Spark's spark-submit script, which can be found in the bin folder of the Spark distribution. But before we dive into the details of spark-submit, let's take a look at a sample command for deploying a Spark application on the cluster.

spark-submit --class MainClass --master spark://127.0.0.1:7077 spark-application.jar

Now that we have looked at the command for deploying the Spark application, let's dive into its details.

  • The first thing we see is the spark-submit command itself, which submits our application to the cluster.
  • Next is the --class MainClass argument, which specifies the main class to run when the application is submitted to the cluster (a minimal sketch of such a class follows this list).
  • After that we have the --master spark://127.0.0.1:7077 argument, through which we provide the URL of the master node of the Spark cluster.
  • At last, we have spark-application.jar, which is the jar of the Spark application that we want to deploy on the cluster.
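
For reference, here is a minimal sketch of what such a main class might look like in Scala. The object name MainClass matches the sample command above; the app name and the small counting job are purely illustrative assumptions, and any driver program with a main method will do:

     import org.apache.spark.{SparkConf, SparkContext}

     object MainClass {
       def main(args: Array[String]): Unit = {
         // The master URL comes from spark-submit's --master flag,
         // so we do not hard-code it here.
         val conf = new SparkConf().setAppName("Spark Application")
         val sc = new SparkContext(conf)

         // Illustrative job: count the elements of a small RDD.
         val count = sc.parallelize(1 to 1000).count()
         println(s"Count: $count")

         sc.stop()
       }
     }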

The above-mentioned command contains the minimum number of arguments required for deploying a Spark application on a cluster. But spark-submit supports other options too, which we can learn about using the command spark-submit --help.
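
For example, a few commonly used options control where the driver program runs and how many cluster resources the application receives. The flag names are standard spark-submit options; the values below are illustrative assumptions, not requirements:

     # --deploy-mode: "client" runs the driver locally, "cluster" runs it on a worker
     # --executor-memory: memory to allocate per executor
     # --total-executor-cores: total cores for the application (standalone mode)
     spark-submit --class MainClass --master spark://127.0.0.1:7077 \
       --deploy-mode client --executor-memory 2G --total-executor-cores 4 \
       spark-application.jar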

Now that we have understood the command for deploying our Spark application on a cluster, let's see it working with an example.

  • Download the sample Spark application from the following link: https://github.com/knoldus/spark-spray-starter
  • Create a JAR of the downloaded Spark application using the following command (see the build sketch after this list):
     sbt clean assembly 
  • Then deploy the application using the following command:
     spark-submit --class com.knoldus.sprayservices.StartSpark --master spark://localhost:7077 path/to/spark-spray-starter/target/scala-2.10/spark-spray-starter-assembly-1.0.jar 
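
The sbt clean assembly step works because the project uses the sbt-assembly plugin to package the application and its dependencies into a single fat jar. If you are packaging your own application instead, a build along these lines should work; the library and plugin versions below are assumptions and may differ from what spark-spray-starter actually uses:

     // build.sbt
     name := "spark-application"

     version := "1.0"

     scalaVersion := "2.10.4"

     // Spark is marked "provided" because the cluster already supplies it at
     // runtime; this keeps the assembled jar small.
     libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"

     // project/plugins.sbt
     addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")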

And we are done! Our Spark application is deployed on the cluster. We can verify the application's status on the Spark master's web UI at http://localhost:8080/
