Start/Deploy Apache Spark application programmatically using spark launcher


Sometimes we need to start our Spark application from another Scala/Java application, and for that we can use SparkLauncher. In this post we build a small Spark application and then start it from a separate Scala application.

Let's see our Spark application code.


import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object SparkApp extends App {
  // Run locally on all available cores
  val conf = new SparkConf().setMaster("local[*]").setAppName("spark-app")
  val sc = new SparkContext(conf)

  // Create a small RDD and save it as a text file in the "result" directory
  val rdd = sc.parallelize(Array(2, 3, 2, 1))
  rdd.saveAsTextFile("result")

  sc.stop()
}

This is our simple Spark application. Make a fat jar of it using sbt assembly.
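A minimal build setup for the assembly could look like the sketch below. The project name, Scala version and plugin version here are assumptions, so adapt them to your own project; marking Spark as "provided" keeps it out of the fat jar, since spark-submit supplies the Spark classes at runtime.

// project/plugins.sbt (assumed sbt-assembly version)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

// build.sbt
name := "spark_launcher"
version := "1.0"
scalaVersion := "2.10.5"

// "provided": Spark classes come from the Spark installation at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0" % "provided"

Running sbt assembly should then produce a jar such as target/scala-2.10/spark_launcher-assembly-1.0.jar. Now we make a Scala application through which we start this Spark application as follows: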

import org.apache.spark.launcher.SparkLauncher

object Launcher extends App {

  // launch() builds a spark-submit command and starts it as a child process
  val spark = new SparkLauncher()
    .setSparkHome("/home/knoldus/spark-1.4.0-bin-hadoop2.6")
    .setAppResource("/home/knoldus/spark_launcher-assembly-1.0.jar")
    .setMainClass("SparkApp")
    .setMaster("local[*]")
    .launch()

  // Block until the spark-submit process exits
  spark.waitFor()

}

In the above code we create a SparkLauncher object and set values on it:

.setSparkHome("/home/knoldus/spark-1.4.0-bin-hadoop2.6") sets the Spark home, which is used internally to call spark-submit.

.setAppResource("/home/knoldus/spark_launcher-assembly-1.0.jar") specifies the jar of our Spark application.

.setMainClass("SparkApp") sets the entry point of the Spark program, i.e. the driver program.

.setMaster("local[*]") sets the address of the master; here we run it on the local machine.

.launch() simply starts our Spark application.
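In the Launcher above, spark.waitFor() blocks until spark-submit finishes. Since launch() returns a plain java.lang.Process, we can also capture the exit code and stream the spark-submit logs. The sketch below is our own small extension of the example (the logging thread and the object name LauncherWithOutput are not part of the original code):

import org.apache.spark.launcher.SparkLauncher

import scala.io.Source

object LauncherWithOutput extends App {

  val process = new SparkLauncher()
    .setSparkHome("/home/knoldus/spark-1.4.0-bin-hadoop2.6")
    .setAppResource("/home/knoldus/spark_launcher-assembly-1.0.jar")
    .setMainClass("SparkApp")
    .setMaster("local[*]")
    .launch()

  // spark-submit writes its logs to stderr; print them as they arrive so the child process never blocks on a full buffer
  new Thread(new Runnable {
    def run(): Unit =
      Source.fromInputStream(process.getErrorStream).getLines().foreach(println)
  }).start()

  // waitFor() returns the exit code of the spark-submit process (0 means success)
  val exitCode = process.waitFor()
  println(s"spark-submit exited with code $exitCode")
}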

These are just the minimal settings; you can also set many other options, such as passing application arguments, adding extra jars, or setting Spark configuration properties, as shown in the sketch below.
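For illustration, a launcher with a few more settings might look like this; the deploy mode, argument values, extra jar path, executor memory and the object name LauncherWithOptions are made-up placeholders:

import org.apache.spark.launcher.SparkLauncher

object LauncherWithOptions extends App {

  val spark = new SparkLauncher()
    .setSparkHome("/home/knoldus/spark-1.4.0-bin-hadoop2.6")
    .setAppResource("/home/knoldus/spark_launcher-assembly-1.0.jar")
    .setMainClass("SparkApp")
    .setMaster("local[*]")
    .setDeployMode("client")                       // or "cluster" when running on a cluster manager
    .addAppArgs("arg1", "arg2")                    // placeholder arguments passed to SparkApp's main method
    .addJar("/home/knoldus/libs/extra-lib.jar")    // placeholder path of an extra jar for the classpath
    .setConf(SparkLauncher.EXECUTOR_MEMORY, "2g")  // any spark.* property can be set with setConf
    .setVerbose(true)                              // make spark-submit print the full command it runs
    .launch()

  spark.waitFor()
}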

For the source code you can check out the following git repo:

spark_launcher is our Spark application.

launcher_app is our Scala application which starts the Spark application.

Change the paths according to your setup, make a jar of spark_launcher, and run launcher_app; you should then see the result RDD saved as a text file in the result directory.

https://github.com/phalodi/Spark-launcher



2 Responses to Start/Deploy Apache Spark application programmatically using spark launcher

  1. RW says:

    Can this be used with py Spark? (Its a weird use case but I am trying to submit a PySpark job from a Scala Application).
