In this blog, we will run an application on a standalone Spark cluster.
Steps
1. Launch the cluster.
2. Create a package of the application.
3. Run the spark-submit command to launch the application.
Step-1:
To run an application on a standalone cluster, we first need a running cluster on our standalone machine. For that, refer to this blog.
Click Here
Your master and all slaves should be alive. In my case, I have 3 slave instances with 1024 MB memory each.
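If the cluster is not up yet, Spark's own standalone scripts can bring it up. A minimal sketch (the 1024 MB worker memory matches this setup, and conf/slaves must list your worker hosts):

```shell
# conf/spark-env.sh — config fragment: give each worker 1024 MB
export SPARK_WORKER_MEMORY=1024m

# From the Spark directory:
./sbin/start-master.sh    # master UI comes up at http://localhost:8080
./sbin/start-slaves.sh    # starts a worker on every host listed in conf/slaves
```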
Step-2:
First of all, we will create a package of the application. The package is a jar file of the application.
To create the package, we will follow these commands:
$ cd <path-of-application> //It will take us to the directory of the application.
$ sbt package //It will create a package (jar) of the application.
The package will be created at “target/scala-2.11/<application-name>.jar”. It is ready to use, as you can see in the picture.
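For reference, a minimal build.sbt that produces a jar at that path might look like this (a sketch; the name, version, and Scala/Spark versions are assumptions inferred from the 2.11 paths used in this blog):

```scala
// build.sbt — config fragment, not runnable on its own
name := "spark-1-3-0"   // yields target/scala-2.11/spark-1-3-0_2.11-1.0.jar
version := "1.0"
scalaVersion := "2.11.6"

// "provided" keeps Spark itself out of the packaged jar,
// since the cluster already supplies it at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
```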
Step-3:
To run an application, we use the “spark-submit” command, which invokes the “bin/spark-submit” script. It takes some options:
--class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
--master: The master URL for the cluster (e.g. spark://knoldus-vostro-3546:7077)
--deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
--conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces, wrap “key=value” in quotes.
application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.
application-arguments: Arguments passed to the main method of your main class, if any.
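Putting these options together, the general form of the command (as given in the Spark documentation) looks like this; the angle-bracket values are placeholders:

```shell
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  <application-jar> \
  [application-arguments]
```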
To run this command, we have to go to the Spark folder; remember, this is the location of your Spark directory (e.g. my Spark is located at “/home/knoldus/Softwares/spark-1.3.0/”).
$ cd <path to spark folder> //e.g. cd /home/knoldus/Softwares/spark-1.3.0/
$ ./bin/spark-submit \
--master spark://knoldus-Vostro-3546:7077 \
--class sparkSql.SparkSQLjson \
/home/knoldus/Desktop/spark-1-3-0_2.11-1.0.jar //here we are using the spark-submit command with some options.
Here, --master spark://IP:PORT (you can take this from your Spark master UI page, http://localhost:8080/)
--class (the class which you want to run, i.e. the main class)
At last, you have to give the path of your application package (jar file).
Note: the sparkSql.SparkSQLjson$.class entry is according to my jar file; you have to give it according to your package.
If your class file is at sparkSql/SparkSQLjson.class inside the jar, you have to write it like this: sparkSql.SparkSQLjson
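The mapping from the entry's path inside the jar to the value of --class is mechanical: strip the .class suffix and replace slashes with dots. A tiny helper sketch makes this concrete (entry_to_class is a hypothetical name, not a Spark command; the entry path can come from `jar tf <app>.jar`):

```shell
# Convert a .class entry path from a jar listing into the
# fully-qualified name expected by spark-submit's --class option.
entry_to_class() {
  local entry="$1"
  entry="${entry%.class}"            # strip the .class suffix
  printf '%s\n' "${entry//\//.}"     # replace every / with .
}

entry_to_class "sparkSql/SparkSQLjson.class"   # prints sparkSql.SparkSQLjson
```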
You can take a look at your Spark UI, http://localhost:8080, to check Running/Completed Applications.
This is how you can run an application on a cluster.