Spark with Spray Starter Kit


Over the last few months, Spark has gained a lot of momentum in the Big Data world. It has won several competitions and recognitions, like the Daytona GraySort 100TB benchmark, and has become a top-level Apache project. As a fast, general engine for large-scale data processing, Spark has found its use everywhere.

The best part about Spark is that it can be combined with almost any tool, whether to perform data crunching, to build REST services, or more.

The starter kit, which is the main subject of this post, is a template that accepts incoming REST requests, routes them to the corresponding Spark server, which produces a response, and then carries that response back to the user in a non-blocking, performant way.

We chose Spray as the REST service integration tool for this template, as Spray has become a de facto standard in the industry for building REST services. Spray also offers an asynchronous, actor-based, fast, lightweight, and modular way to connect our Scala applications to the world.

The main components of this example are:
1. SparkServices, which is responsible for communication to and from the Spark server.
2. The Spark server, which interacts with SparkServices.
3. The client, which calls a REST service on SparkServices and awaits a response.

Let us see how the interactions happen. For simplicity, we will assume that the client makes a GET call from the browser with something like

http://localhost:8000/spark/version
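
or, equivalently, from the command line (assuming the service is running locally on port 8000, as it will be once we start it below):

curl http://localhost:8000/spark/version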

Now the call lands on SparkServices, which exposes a REST service. The HTTP server in Spray is started like this:

import akka.actor.{ActorSystem, Props}
import akka.io.IO
import akka.util.Timeout
import spray.can.Http
import scala.concurrent.duration._

object StartSpark extends App {

 // we need an ActorSystem to host our application in
 implicit val actorSystem = ActorSystem("spark-services")
 implicit val timeout = Timeout(30.seconds)

 // create and start our service actor
 val service = actorSystem.actorOf(Props[SparkServices], "spark-services")

 // start a new HTTP server on port 8000 with our service actor as the handler
 IO(Http) ! Http.Bind(service, interface = "localhost", port = 8000)

}

Of course, we can externalize a few things, like the host and port, by placing them in configuration (.conf) files.
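
For example, here is a minimal sketch using Typesafe Config, which both Akka and Spark already pull in. The file layout and key names below are our own assumptions, not part of the original template; this snippet would replace the hard-coded values inside StartSpark:

import com.typesafe.config.ConfigFactory

// assumed application.conf layout:
//   spark-services {
//     host = "localhost"
//     port = 8000
//   }
val config = ConfigFactory.load()
val host = config.getString("spark-services.host")
val port = config.getInt("spark-services.port")

// bind using the externalized values instead of hard-coded ones
IO(Http) ! Http.Bind(service, interface = host, port = port)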

If you notice, we are binding to the service called SparkServices. This is what wraps our routes.

class SparkServices extends Actor with SparkService {
 // connect the service's routing DSL to the enclosing actor's context
 def actorRefFactory: ActorContext = context
 // the actor's only job is to run the routes defined in SparkService
 def receive: Actor.Receive = runRoute(sparkRoutes)
}

and the sparkRoutes are defined in the SparkService trait as

import org.apache.spark.{SparkConf, SparkContext}
import spray.http.HttpResponse
import spray.http.StatusCodes.OK
import spray.routing.HttpService

trait SparkService extends HttpService {

  // configure and start a SparkContext for this template
  val sparkConf: SparkConf = new SparkConf().setAppName("spark-spray-starter").setMaster("local")
  val sc: SparkContext = new SparkContext(sparkConf)

  val sparkRoutes =
    path("spark" / "version") {
      get {
        complete {
          HttpResponse(OK, "Spark version in this template is: " + sc.version)
        }
      }
    }
}

Here, we are accepting requests as per the routes defined above. A GET request of the form /spark/version returns the Spark version we are using in this template.
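
The same pattern extends to routes that actually run a Spark job. As a minimal sketch, the /spark/count endpoint and the sample data below are our own assumptions, not part of the template; this route would live alongside sparkRoutes in the SparkService trait and could be combined with it using the ~ operator, as in sparkRoutes ~ countRoute:

// a hypothetical route that runs a small Spark job per request
val countRoute =
  path("spark" / "count") {
    get {
      complete {
        // build a small sample RDD and count its elements on the SparkContext
        val count = sc.parallelize(1 to 1000).count()
        HttpResponse(OK, "Count of sample RDD elements: " + count)
      }
    }
  }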

Now comes the tricky part, i.e., how to make a JAR of our REST application and run it via spark-submit on a Spark server.

Obviously, to build the JAR file we can use sbt-assembly, but there are some complexities in using it, since both Spark and Spray depend on some common files which need a proper merge strategy. So, to build the JAR file properly using sbt-assembly, we can add the following merge strategy to the build.sbt file along with the other settings.

mergeStrategy in assembly := {
  case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
  case "reference.conf" => MergeStrategy.concat
  case _ => MergeStrategy.first
}
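
The "reference.conf" case matters more than it may look: Akka, Spray, and Spark each ship their own reference.conf, and if these files are discarded instead of concatenated, the application can fail at startup with unresolved-substitution errors such as "Could not resolve substitution to a value: ${spray.version}". It is also worth knowing that, when deploying exclusively through spark-submit, the Spark dependency can be marked as "provided" so it is not bundled into the assembly, since the cluster already supplies it. A sketch of the relevant build.sbt lines follows, where the version numbers are assumptions to adjust for your own setup:

// dependency versions here are assumptions; pick the ones matching your Scala version
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.3.0" % "provided",
  "io.spray" %% "spray-can" % "1.3.3",
  "io.spray" %% "spray-routing" % "1.3.3",
  "com.typesafe.akka" %% "akka-actor" % "2.3.9"
)

Note that with Spark marked as "provided", sbt run will no longer find the Spark classes on the classpath, so keep the default compile scope if you want to run the application locally with sbt run.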

Now we can create a JAR file of our application using the sbt assembly command and deploy it on a Spark cluster using Spark's spark-submit script. Or, we can simply run it on our local machine using sbt run.
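
For reference, a minimal spark-submit invocation might look like the following, where the assembly JAR path is an assumption based on typical sbt-assembly defaults:

spark-submit \
  --class StartSpark \
  --master local[2] \
  target/scala-2.10/spark-spray-starter-assembly-1.0.jar

Keep in mind that a master set programmatically in SparkConf, as in our setMaster("local") above, takes precedence over the --master flag, so remove the setMaster call if you want spark-submit to control where the job runs.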

The entire starter kit is present here.


10 Responses to Spark with Spray Starter Kit

  1. Tony says:

    I am still new to Spark and when I run the JAR, I am getting an “akka.jvm-exit-on-fatal-error”. Would you happen to know what the cause is?

  2. Pingback: Using Spark , Spray and Couchbase for lightening fast REST Api’s | Knoldus

  3. Venkat says:

    Hi,
    I am getting “com.typesafe.config.ConfigException$UnresolvedSubstitution: reference.conf: 194: Could not resolve substitution to a value: ${spray.version}”. Can somebody help?

