Over the last few months, Spark has gained a lot of momentum in the Big Data world. It has reached milestones such as winning the Daytona Gray Sort 100TB competition and becoming a top-level Apache project. Billed as a fast, general engine for large-scale data processing, Spark has found use almost everywhere.
The best part about Spark is that it can be combined with a wide range of other tools to crunch data, build REST services, and so on.
The Spray Starter Kit, which is the focus of this post, is a template that accepts incoming REST requests, routes them to the corresponding Spark server, and carries the server's response back to the user in a non-blocking, performant way.
We chose Spray as the REST integration layer for this template because it has become a de facto industry standard for building REST services. Spray also offers an asynchronous, actor-based, fast, lightweight and modular way to connect our Scala applications to the world.
The main characters of this example are:
1. SparkServices, which is responsible for communication to and from the Spark server.
2. The Spark server, which interacts with SparkServices.
3. The client, which calls a REST service on SparkServices and awaits a response.
Let us see how the interactions happen. For simplicity, we assume that the client makes a GET call from the browser to the service's /spark/version route.
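The same request can also be issued from code rather than a browser. The following is a minimal sketch, assuming the service listens on a hypothetical localhost:8080. Only the /spark/version path comes from this post; the host, the port and the `VersionClient` helper are illustrative.

```scala
import scala.io.Source

object VersionClient {
  // Host and port are assumptions; only the /spark/version path comes from the post.
  def versionUrl(host: String = "localhost", port: Int = 8080): String =
    s"http://$host:$port/spark/version"

  // Performs a blocking GET and returns the response body as a string.
  def fetchVersion(url: String): String = {
    val source = Source.fromURL(url)
    try source.mkString finally source.close()
  }
}
```

With the service running, `VersionClient.fetchVersion(VersionClient.versionUrl())` would return whatever text the route replies with.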
The call then lands on SparkServices, which exposes a REST service. The Spray HTTP server is started by binding that service to a host and port.
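A minimal sketch of such a bootstrap, using spray-can's `Http.Bind` message, might look like the following. The object name `Boot`, the actor system name, the localhost:8080 address and the placeholder /ping route are all assumptions made so that the snippet compiles on its own; the starter kit's real SparkServices actor runs the Spark-backed routes instead.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.io.IO
import spray.can.Http
import spray.routing.HttpService

// Placeholder service so this sketch is self-contained (hypothetical /ping route).
class SparkServices extends Actor with HttpService {
  def actorRefFactory = context
  def receive = runRoute(path("ping") { get { complete("pong") } })
}

object Boot extends App {
  // Actor system that hosts both the service actor and the HTTP layer
  implicit val system = ActorSystem("spray-spark-starter")

  // The actor that handles incoming HTTP requests
  val service = system.actorOf(Props[SparkServices], "spark-services")

  // Ask spray-can to start an HTTP server and bind it to the service actor.
  // Host and port are hard-coded here purely for illustration.
  IO(Http) ! Http.Bind(service, interface = "localhost", port = 8080)
}
```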
Of course, we can externalize settings such as the host and port by placing them in configuration (.conf) files.
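One way to do this is with the Typesafe Config library that Akka and Spray already depend on. The sketch below is illustrative only: the key names `http.host` and `http.port` and the `HttpSettings` case class are assumptions, not names taken from the starter kit.

```scala
import com.typesafe.config.{Config, ConfigFactory}

// Hypothetical holder for the externalized settings
final case class HttpSettings(host: String, port: Int)

object HttpSettings {
  // Reads the assumed keys http.host and http.port from a Config
  def from(config: Config): HttpSettings =
    HttpSettings(config.getString("http.host"), config.getInt("http.port"))
}

object ConfigDemo extends App {
  // Inline HOCON for illustration; an application would normally call
  // ConfigFactory.load() to read application.conf from the classpath.
  val config = ConfigFactory.parseString(
    """
      |http {
      |  host = "localhost"
      |  port = 8080
      |}
    """.stripMargin)

  val settings = HttpSettings.from(config)
  println(s"${settings.host}:${settings.port}") // prints localhost:8080
}
```

The resulting `settings.host` and `settings.port` can then be passed to the bind call in place of literal values.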
Notice that the server is bound to the service called SparkServices, which wraps our routes.
The sparkRoutes define which requests the service accepts.
Here, we accept a GET request of the form /spark/version, which tells us the Spark version used in this template.
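The service and its routes can be sketched as follows. This is a minimal illustration rather than the starter kit's exact code: the trait name `SparkService`, the application name, the response text and the `local[*]` fallback master are assumptions.

```scala
import akka.actor.Actor
import org.apache.spark.{SparkConf, SparkContext}
import spray.routing.HttpService

trait SparkService extends HttpService {
  // Created lazily on first use. spark-submit normally supplies the master;
  // local[*] is only an assumed fallback for running with `sbt run`.
  lazy val sc: SparkContext = new SparkContext(
    new SparkConf()
      .setAppName("spray-spark-starter")
      .setIfMissing("spark.master", "local[*]"))

  // GET /spark/version replies with the version of the running Spark
  val sparkRoutes =
    path("spark" / "version") {
      get {
        complete(s"Spark version: ${sc.version}")
      }
    }
}

// The actor the HTTP server is bound to; it simply runs the routes above.
class SparkServices extends Actor with SparkService {
  def actorRefFactory = context
  def receive = runRoute(sparkRoutes)
}
```

Keeping the routes in a trait separate from the actor is a common Spray pattern, since it lets the routes be exercised without starting an actor.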
Now comes the tricky part: how to build a JAR of our REST application and run it via spark-submit on a Spark server.
To build the JAR we can use sbt-assembly, but there is a complication: Spark and Spray ship some files at the same paths, and these need a proper merge strategy. To assemble the JAR correctly, we add a merge strategy to the build.sbt file along with the other settings.
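A minimal sketch of such a merge strategy is shown below, assuming the sbt-assembly plugin is already enabled in project/plugins.sbt. The specific rules are an assumption about what a Spark plus Spray build typically needs, not the starter kit's exact settings. Newer sbt versions write the key as `assembly / assemblyMergeStrategy`.

```scala
// build.sbt (fragment)
assemblyMergeStrategy in assembly := {
  // Akka and Spray each ship a reference.conf; concatenate them so no defaults are lost
  case "reference.conf" => MergeStrategy.concat
  // Discard everything under META-INF (manifests, signatures and the like).
  // This is blunt: it also drops META-INF/services entries, so refine it if needed.
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  // For any other duplicate path, keep the first copy found on the classpath
  case _ => MergeStrategy.first
}
```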
Now we can create a JAR of our application with the sbt assembly command and deploy it on a Spark cluster using Spark's spark-submit script. Alternatively, we can simply run it on our local machine with sbt run.
The entire starter kit is present here.