How to run an application on a standalone cluster in Spark?


In this post, we will run an application on a Spark standalone cluster.

[Image: cluster overview]

Steps
1. Launch the cluster.
2. Create a package of the application.
3. Run the spark-submit command to launch the application.

Step-1:

To run an application on a standalone cluster, we first need a cluster running on our machine. For that, refer to this blog:

Click Here
Your master and all slaves should be alive. In my case, I have 3 slave instances with 1024 MB memory.

Step-2:

First, we will create a package of the application. The package is a jar file of the application.
To create the package, run these commands:

$ cd <path-of-application>    # Move to the directory of the application.
$ sbt package                 # Create a package (jar) of the application.

The package will be created at “target/scala-2.11/<application-name>.jar” and is ready to use.
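To confirm that packaging produced a jar, a small helper along these lines can list whatever sits under the target directory. This is a minimal sketch: the function name list_app_jars is hypothetical, and it assumes the sbt layout described above (target/scala-<version>/). sbt usually derives the jar name from the project name, Scala version, and project version, as in the spark-1-3-0_2.11-1.0.jar used later in this post.

```shell
# Hypothetical helper: print the jar files found under <app-dir>/target/scala-*/.
# Assumes the default sbt output layout; prints nothing if no target directory exists.
list_app_jars() {
  app_dir="${1:-.}"                       # application directory, defaults to the current one
  [ -d "$app_dir/target" ] || return 0    # nothing has been built yet
  find "$app_dir/target" -path '*/scala-*/*.jar' -type f
}

# Example: list the jars of the application in the current directory.
list_app_jars .
```

Any path it prints can be passed to spark-submit as the application jar.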

Step-3:

To run the application, we use the “bin/spark-submit” script. It takes the following options:

--class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
--master: The master URL for the cluster (e.g. spark://knoldus-vostro-3546:7077)
--deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
--conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces, wrap “key=value” in quotes.
application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.
application-arguments: Arguments passed to the main method of your main class, if any
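The order of these pieces matters: options first, then the application jar, then the application's own arguments. The sketch below assembles such a command line and prints it instead of running it, so the ordering can be checked without a cluster. The function name build_submit_cmd and the path /path/to/app.jar are placeholders for illustration, not part of Spark.

```shell
# Hypothetical dry-run helper: prints a spark-submit command line, does not run it.
# Usage: build_submit_cmd <main-class> <master-url> <application-jar> [application-arguments...]
build_submit_cmd() {
  [ "$#" -ge 3 ] || { echo "usage: build_submit_cmd <class> <master> <jar> [args...]" >&2; return 1; }
  main_class="$1"    # value for --class
  master_url="$2"    # value for --master
  app_jar="$3"       # path to the application jar
  shift 3            # whatever remains is passed to the application itself

  # Options come first, then the jar, then the application arguments.
  # --deploy-mode client is the default; it is spelled out here only for illustration.
  set -- ./bin/spark-submit \
    --class "$main_class" \
    --master "$master_url" \
    --deploy-mode client \
    "$app_jar" "$@"
  printf '%s\n' "$*"
}

# Example with the sample values from the option list above.
build_submit_cmd org.apache.spark.examples.SparkPi \
  spark://knoldus-vostro-3546:7077 \
  /path/to/app.jar 10
```

Printing the command first is a cheap way to catch a misplaced option before submitting to the cluster.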

To run this command, we have to be in the Spark folder, i.e. the location of your Spark directory (e.g. my Spark is located in “/home/knoldus/Softwares/spark-1.3.0/”).


$ cd <path to spark folder>        # e.g. cd /home/knoldus/Softwares/spark-1.3.0/
$ ./bin/spark-submit \
--master spark://knoldus-Vostro-3546:7077 \
--class sparkSql.SparkSQLjson \
/home/knoldus/Desktop/spark-1-3-0_2.11-1.0.jar        # Here we use the spark-submit command with some options.

Here, --master takes spark://IP:PORT, which you can copy from your Spark master UI page (http://localhost:8080/).
--class takes the class you want to run, i.e. the main class.

Finally, give the path of your application package (the jar file).

Note: the class name sparkSql.SparkSQLjson is specific to my jar file; use the one that matches your own package.
If your class file is at sparkSql/SparkSQLjson.class inside the jar, write it as sparkSql.SparkSQLjson.
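The mapping from a jar entry to the value for --class can be sketched as a small shell function. The name class_name_from_entry is hypothetical. The entries themselves can be listed with the JDK's `jar tf <jar-file>`; a Scala object typically shows up twice, once as Name.class and once as Name$.class, and both correspond to the same class name.

```shell
# Hypothetical helper: turn a jar entry path into the dotted class name for --class.
class_name_from_entry() {
  entry=${1%.class}       # drop the ".class" suffix
  entry=${entry%'$'}      # drop the trailing "$" of a Scala object's companion entry
  printf '%s\n' "$entry" | tr '/' '.'   # directory separators become package dots
}

# Example with the entry from the note above.
class_name_from_entry 'sparkSql/SparkSQLjson$.class'
```

For the jar in this post, both sparkSql/SparkSQLjson.class and sparkSql/SparkSQLjson$.class map to sparkSql.SparkSQLjson.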

You can open your Spark UI at http://localhost:8080 to check the Running/Completed Applications.

This is how you can run an application on a cluster.

About pushpendupurkait

I am RHCE, DB2, and OpenStack certified, working on Scala and AngularJS for Knoldus.
This entry was posted in Scala.

Responses to How to run an application on a standalone cluster in Spark?

  Priyank D says:

    Is it possible to run an application without spark-submit or spark-shell? I have a Scala app, and I would like it to connect to the Spark master, submit a job, and get the results.
