Spark with Spray Starter Kit

Over the last few months, Spark has gained a lot of momentum in the Big Data world. It has picked up several distinctions, such as winning the Daytona GraySort 100 TB competition and becoming a top-level Apache project. As a fast, general engine for large-scale data processing, Spark has found use in a wide range of applications.

One of the best things about Spark is that it can be combined with other tools, whether to crunch data or to build REST services.

The Spray Starter Kit, the main subject of this post, is a template that accepts incoming REST requests, routes them to the corresponding Spark server, and carries the response from Spark back to the caller in a non-blocking, performant way.

We chose Spray as the REST integration layer for this template because it has become a de facto standard in the industry for building REST services. Spray also offers an asynchronous, actor-based, fast, lightweight and modular way to connect Scala applications to the outside world.

The main characters of this example are:
1. SparkServices, which is responsible for communication to and from the Spark server.
2. The Spark server, which interacts with SparkServices.
3. The client, which calls a REST service on SparkServices and awaits a response.

Let us see how the interactions happen. For simplicity, we assume that the client makes a GET call from the browser to a URL such as

http://localhost:8000/spark/version
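
The same request can also be issued from Scala instead of the browser. The following is only a minimal sketch using the standard library; the object name VersionClient and its method names are made up for illustration, and the call blocks, so it is meant for a quick manual check once the server is running.

```scala
import scala.io.Source

// Hypothetical helper, not part of the starter kit.
object VersionClient {

  // Host and port mirror the URL used in this post; adjust them for your setup.
  def versionUrl(host: String = "localhost", port: Int = 8000): String =
    s"http://$host:$port/spark/version"

  // Blocking GET that returns the response body as a String.
  def fetchVersion(url: String = versionUrl()): String = {
    val source = Source.fromURL(url)
    try source.mkString finally source.close()
  }
}
```

With the server up, println(VersionClient.fetchVersion()) should print the message produced by the /spark/version route.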

The call lands on SparkServices, which exposes a REST service. The Spray HTTP server is started like this:

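import akka.actor.{ActorSystem, Props}
import akka.io.IO
import akka.util.Timeout
import spray.can.Http
import scala.concurrent.duration._

object StartSpark extends App {

  // we need an ActorSystem to host our application in
  implicit val actorSystem = ActorSystem("spark-services")
  // timeout for ask-style calls, wrapped in akka.util.Timeout so it can be picked up implicitly
  implicit val timeout: Timeout = Timeout(30.seconds)

  // create and start our service actor
  val service = actorSystem.actorOf(Props[SparkServices], "spark-services")

  // start a new HTTP server on port 8000 with our service actor as the handler
  IO(Http) ! Http.Bind(service, interface = "localhost", port = 8000)

}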

Of course, we can externalize settings such as the host and port by placing them in configuration (.conf) files.
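
One way to do this is with the Typesafe Config library that Akka already depends on. The sketch below is an assumption about how it could look, not the starter kit's actual code: the object name ServiceSettings and the keys spark-services.host and spark-services.port are hypothetical. Inline defaults keep the example self-contained, and an application.conf on the classpath, if present, overrides them.

```scala
import com.typesafe.config.{Config, ConfigFactory}

// Hypothetical settings holder; the key names are assumptions.
object ServiceSettings {

  // Fallback values matching the ones hard-coded in StartSpark.
  private val defaults: Config = ConfigFactory.parseString(
    """
      |spark-services {
      |  host = "localhost"
      |  port = 8000
      |}
    """.stripMargin)

  // Values from application.conf (if any) take precedence over the defaults.
  private val config: Config = ConfigFactory.load().withFallback(defaults)

  val host: String = config.getString("spark-services.host")
  val port: Int = config.getInt("spark-services.port")
}
```

The bind call would then read Http.Bind(service, interface = ServiceSettings.host, port = ServiceSettings.port).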

Notice that we bind to the service actor called SparkServices. This is what wraps our routes.

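import akka.actor.{Actor, ActorContext}

// the actor that runs the routes defined in the SparkService trait
class SparkServices extends Actor with SparkService {
  def actorRefFactory: ActorContext = context
  def receive: Actor.Receive = runRoute(sparkRoutes)
}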

and the sparkRoutes are defined as

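import org.apache.spark.{SparkConf, SparkContext}
import spray.http.HttpResponse
import spray.http.StatusCodes.OK
import spray.routing.HttpService

// assumed to extend HttpService, as implied by the runRoute call in SparkServices
trait SparkService extends HttpService {

  val sparkConf: SparkConf = new SparkConf().setAppName("spark-spray-starter").setMaster("local")
  val sc: SparkContext = new SparkContext(sparkConf)

  // GET /spark/version replies with the version of the running SparkContext
  val sparkRoutes =
    path("spark" / "version") {
      get {
        complete {
          HttpResponse(OK, "Spark version in this template is: " + sc.version)
        }
      }
    }
}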

Here, we accept requests according to the routes defined. The only route is a GET on /spark/version, which returns the Spark version used in this template.
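
A route that actually runs a Spark job can follow the same shape. The sketch below is hypothetical and not part of the starter kit as described here: the trait SparkJobRoutes, the helper sumUpTo and the path /spark/sum are invented for illustration. It assumes Spray's support for completing a request with a Future, so the actor handling the request is not blocked while the job runs.

```scala
import scala.concurrent.{ExecutionContext, Future}

import org.apache.spark.SparkContext
import spray.routing.{HttpService, Route}

// Hypothetical trait, meant to be mixed in next to SparkService.
trait SparkJobRoutes extends HttpService {

  // Assumed to be the SparkContext created alongside sparkRoutes.
  def sc: SparkContext

  // Run Futures on the dispatcher of the surrounding actor system.
  implicit def executionContext: ExecutionContext = actorRefFactory.dispatcher

  // A small Spark job: sum the integers 1..n.
  def sumUpTo(n: Int): Int = sc.parallelize(1 to n).reduce(_ + _)

  // GET /spark/sum is completed when the Future holding the job result finishes.
  def sparkJobRoutes: Route =
    path("spark" / "sum") {
      get {
        complete {
          Future {
            "Sum of 1 to 100 computed by Spark: " + sumUpTo(100)
          }
        }
      }
    }
}
```

If SparkServices also mixed in this trait, both sets of routes could be served with runRoute(sparkRoutes ~ sparkJobRoutes). For long-running jobs a dedicated dispatcher would probably be a better choice than the default one.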

Now comes the tricky part: how to build a JAR of our REST application and run it via spark-submit on a Spark server.

To build the JAR we can use sbt-assembly, but there is a complication: Spark and Spray ship some files with the same paths, so the assembly needs a proper merge strategy. To build the JAR correctly, we can add the following merge strategy to the build.sbt file along with the other settings.

mergeStrategy in assembly := {
  case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
  case "reference.conf" => MergeStrategy.concat
  case _ => MergeStrategy.first
}
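
To see which rule a given path falls under, the same pattern match can be tried outside sbt. The sketch below copies the four cases into a plain Scala function; the object name MergeRules is made up, and the returned strings merely stand in for the sbt-assembly MergeStrategy values.

```scala
// Hypothetical stand-in for the build.sbt rules; strings replace MergeStrategy values.
object MergeRules {

  def strategyFor(path: String): String = path match {
    // JAR manifests are dropped; sbt-assembly writes its own
    case m if m.toLowerCase.endsWith("manifest.mf") => "discard"
    // signature files under META-INF are dropped as well
    case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => "discard"
    // every library's reference.conf is concatenated into one file
    case "reference.conf" => "concat"
    // any other duplicate keeps the first copy found
    case _ => "first"
  }
}
```

For example, MergeRules.strategyFor("META-INF/MANIFEST.MF") yields "discard", while an ordinary class file path falls through to "first". The cases are tried in order, so the catch-all must stay last.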

Now we can create a JAR of our application with the sbt assembly command and deploy it on a Spark cluster using Spark's spark-submit script. Alternatively, we can simply run it on our local machine with sbt run.

The entire starter kit is present here.
