Spark with Spray Starter Kit

Over the last few months, Spark has gained a lot of momentum in the Big Data world. Its achievements include winning the Daytona GraySort 100TB competition and becoming a top-level Apache project. As a fast, general engine for large-scale data processing, Spark has found use in a wide range of applications.

The best part about Spark is that it can be combined with other tools, whether to crunch data or to build REST services.

The Spray Starter Kit, which is the main subject of this post, is a template that accepts incoming REST requests, routes them to the corresponding Spark server, and carries the server's response back to the user in a non-blocking, performant way.

We chose Spray as the REST integration tool for this template because it has become a de facto standard in the industry for building REST services. Spray also offers an asynchronous, actor-based, fast, lightweight and modular way to connect our Scala applications to the world.

The main characters of this example are:
1. SparkServices, which is responsible for communication to and from the Spark server.
2. The Spark server, which interacts with SparkServices.
3. The client, which calls a REST service on SparkServices and awaits a response.

Let us see how the interactions happen. For simplicity, assume that the client makes a GET call from the browser to a URL such as:

http://localhost:8000/spark/version

The call lands on SparkServices, which exposes a REST service on an HTTP server started with Spray.
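A minimal sketch of how such a server could be started is given here. It assumes Spray 1.3.x (spray-can and spray-routing) on Akka 2.3.x. The object name Boot, the actor system name, the 5-second timeout and the placeholder route are illustrative choices, not taken from the starter kit.

```scala
import akka.actor.{Actor, ActorRefFactory, ActorSystem, Props}
import akka.io.IO
import akka.pattern.ask
import akka.util.Timeout
import spray.can.Http
import spray.routing.{HttpService, Route}

import scala.concurrent.duration._

// Service actor that handles incoming HTTP requests.
// The route is a placeholder so that this sketch compiles on its own.
class SparkServices extends Actor with HttpService {
  // HttpService needs an ActorRefFactory; inside an actor the context serves.
  def actorRefFactory: ActorRefFactory = context

  val sparkRoutes: Route =
    path("spark" / "version") {
      get {
        complete("placeholder response")
      }
    }

  // runRoute turns the route into the actor's message handler.
  def receive: Receive = runRoute(sparkRoutes)
}

object Boot extends App {
  // An ActorSystem hosts the service actor and the HTTP layer.
  implicit val system: ActorSystem = ActorSystem("spark-services")

  val service = system.actorOf(Props[SparkServices], "spark-services")

  // Timeout for the bind request sent with `?` (assumed value).
  implicit val timeout: Timeout = Timeout(5.seconds)

  // Bind the service actor to localhost:8000, matching the URL above.
  IO(Http) ? Http.Bind(service, interface = "localhost", port = 8000)
}
```

With this running, a GET on http://localhost:8000/spark/version would be handled by the SparkServices actor.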

Of course, we can externalize settings such as the host and port by placing them in configuration (.conf) files.
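One way to do this, sketched under the assumption that the Typesafe Config library (already a dependency of Akka and Spray) is used. The ServerSettings class and the spark-services.host and spark-services.port keys are hypothetical names chosen for illustration.

```scala
import com.typesafe.config.{Config, ConfigFactory}

// Hypothetical holder for the externalized server settings.
final case class ServerSettings(host: String, port: Int)

object ServerSettings {
  // Assumed layout of application.conf (HOCON):
  //   spark-services {
  //     host = "localhost"
  //     port = 8000
  //   }
  def fromConfig(config: Config): ServerSettings =
    ServerSettings(
      host = config.getString("spark-services.host"),
      port = config.getInt("spark-services.port")
    )

  // ConfigFactory.load() reads application.conf from the classpath.
  def load(): ServerSettings = fromConfig(ConfigFactory.load())
}
```

The loaded host and port could then be passed to the bind call instead of hard-coded values.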

The server binds to a service called SparkServices, which wraps our routes. The routes themselves are defined as sparkRoutes.
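A route definition along these lines could look as follows. This is a sketch assuming spray-routing 1.3.x and Spark 1.1 or later; the trait name SparkRoutes, the application name and the response text are assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import spray.routing.{HttpService, Route}

// Routes trait, meant to be mixed into the service actor.
trait SparkRoutes extends HttpService {

  // setIfMissing keeps a master supplied by spark-submit and falls back
  // to local mode otherwise (for example under `sbt run`).
  lazy val sparkContext: SparkContext = {
    val conf = new SparkConf()
      .setAppName("spark-spray-starter") // hypothetical application name
      .setIfMissing("spark.master", "local[*]")
    new SparkContext(conf)
  }

  // GET /spark/version responds with the Spark version as plain text.
  val sparkRoutes: Route =
    path("spark" / "version") {
      get {
        complete(s"Spark version in this template is: ${sparkContext.version}")
      }
    }
}
```

Keeping the routes in a trait separate from the actor is a common Spray pattern, since it lets the routes be exercised without starting an HTTP server.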

The routes accept a GET request of the form /spark/version, which returns the Spark version used in this template.

Now comes the tricky part: how to build a JAR of our REST application and run it via spark-submit on a Spark server.

To build the JAR we can use sbt-assembly, but there is a complication: Spark and Spray ship some files with identical paths, so the build needs a proper merge strategy. To build the JAR correctly with sbt-assembly, we can add a merge strategy to the build.sbt file along with the other settings.

Now we can create a JAR file of our application using the sbt assembly command and deploy it on a Spark cluster using Spark's spark-submit script. Alternatively, we can simply run it on our local machine using sbt run.
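A merge strategy for build.sbt might look like the fragment below. It is a sketch, not the starter kit's exact settings. It assumes the sbt-assembly plugin is already added in project/plugins.sbt and uses the `in assembly` syntax from the sbt 0.13 era (newer sbt versions write `assembly / assemblyMergeStrategy`).

```scala
// build.sbt fragment (assumes the sbt-assembly plugin is enabled)
assemblyMergeStrategy in assembly := {
  // Each library ships its own Typesafe Config defaults; concatenate them
  // so that none of the defaults are lost.
  case "reference.conf" => MergeStrategy.concat
  // Drop manifests and signature files. This is a blunt choice that also
  // discards META-INF/services entries.
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  // For any other clash, keep the first copy found on the classpath.
  case _ => MergeStrategy.first
}
```

Concatenating reference.conf matters here: if only one library's copy survives, substitutions defined in another copy, such as ${spray.version}, can fail to resolve at startup.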

The entire starter kit is present here.

Written by 

Himanshu Gupta is a lead consultant with more than 4 years of experience. He is always keen to learn new technologies. He likes not only programming languages but also data analytics. He has sound knowledge of "Machine Learning" and "Pattern Recognition". He believes that the best results come when everyone works as a team. He likes coding, listening to music, watching movies, and reading science fiction books in his free time.

10 thoughts on “Spark with Spray Starter Kit”

  1. I am still new to Spark, and when I run the JAR I get an “akka.jvm-exit-on-fatal-error”. Would you happen to know what the cause is?

  2. Hi,
    i am getting “com.typesafe.config.ConfigException$UnresolvedSubstitution: reference.conf: 194: Could not resolve substitution to a value: ${spray.version}”. Can somebody help?

  3. Hi ,
    I am getting com.typesafe.config.ConfigException$UnresolvedSubstitution: reference.conf: 194: Could not resolve substitution to a value: ${spray.version}. Somebody help.
