Scala, Couchbase, Spark and Akka-http: A combinatory tutorial for starters

Couchbase and Apache Spark are two of the strongest options available today for in-memory computation. I am using akka-http because it is the newcomer among Scala HTTP libraries. If you are not a big fan of akka-http and don’t think it is ready for production yet, you can take a look at the post "Using Spark, Spray and Couchbase" (listed in the References), which shows how to do the same task using Spray.

If you are new to all these technologies and they sound like a bunch of weird names 😉, do not worry. We will walk through them step by step, and at the end you will be able to build a REST API that can be deployed on a Spark cluster with Couchbase as the database.

So, first things first:

What is Couchbase?

Couchbase is a NoSQL document database with a memory-first, distributed architecture built for performance, scalability, and availability. It comes with a user-friendly web UI for managing the database and is available in both an Enterprise and a Community edition. If you do not have a Couchbase installation and want to get started, refer to the Couchbase Installation link in the References 🙂

What is Spark?

Apache Spark™ is a fast and general engine for large-scale data processing. It is written mainly in Scala, with the RDD (Resilient Distributed Dataset) as its fundamental abstraction. It provides APIs in Scala, Java, Python and R.

What is Akka-http?

Akka HTTP is made for building integration layers based on HTTP and as such tries to “stay on the sidelines”. Therefore you normally don’t build your application “on top of” Akka HTTP, but you build your application on top of whatever makes sense and use Akka HTTP merely for the HTTP integration needs. For more information, take a look at the Akka-http Documentation listed in the References.

How to connect Couchbase and Spark?

To bring Couchbase into the Spark world we will use the Couchbase Spark connector, which is provided by Couchbase itself.

Pre-requisites:

Now comes the nice and easy code. Before that, I presume you have Couchbase 4.5 and Spark 1.6 installed; if not, refer to the Couchbase Installation and Spark Documentation links in the References. If you do not want to deploy this application on a cluster and only want to run it on your local machine, you do not need a Spark installation, because Spark can run embedded in the application in local mode. In that case you can skip it.

Code:

If you want to jump straight to the code, it is in the code repository linked at the end of this post.

The repository contains a sample guide on how to build a Spark and akka-http application with Couchbase as the backend, along with a README file that explains it all.

Your build.sbt determines which versions of akka-http, the Couchbase Spark connector and Apache Spark are used, so please pay attention when specifying the versions: they have to be compatible with each other and with your Scala version.

build.sbt:
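As a rough sketch, a build.sbt for this stack could look like the one below. The project name and every version number are assumptions picked to match Spark 1.6 and Scala 2.11, not the exact values from the repository, so check them against the README before copying.

```scala
// Hypothetical project name
name := "spark-couchbase-akka-http-starter"

version := "1.0"

// Assumption: Scala 2.11, because akka-http 2.4.x is not published for Scala 2.10
scalaVersion := "2.11.7"

libraryDependencies ++= Seq(
  // Apache Spark core (1.6.x line, as in the prerequisites)
  "org.apache.spark"     %% "spark-core"             % "1.6.1",
  // Couchbase Spark connector (assumed 1.2.x line for Spark 1.6)
  "com.couchbase.client" %% "spark-connector"        % "1.2.1",
  // akka-http, still published as "experimental" in the 2.4.x line
  "com.typesafe.akka"    %% "akka-http-experimental" % "2.4.8"
)
```

Spark 1.6 brings its own, older Akka dependency, so it is worth checking sbt's eviction warnings after the first build.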

Factories:

This is the part where we interact with the database with the help of the Spark connector. The factories perform the CRUD operations against Couchbase through the connector.
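A minimal sketch of such a factory is shown below. The object name `CouchbaseFactory`, the bucket `userBucket` and the `name`/`email` fields are hypothetical; only `saveToCouchbase()` and `couchbaseGet()` come from the connector. Delete is left out of this sketch.

```scala
import com.couchbase.client.java.document.JsonDocument
import com.couchbase.client.java.document.json.JsonObject
import com.couchbase.spark._
import org.apache.spark.{SparkConf, SparkContext}

import scala.util.Try

// Hypothetical factory: bucket name, node address and field names are assumptions.
object CouchbaseFactory {

  private lazy val conf = new SparkConf()
    .setAppName("spark-couchbase-akka-http-starter")
    .setMaster("local[*]")                      // run Spark inside this JVM
    .set("com.couchbase.nodes", "127.0.0.1")    // Couchbase node to bootstrap from
    .set("com.couchbase.bucket.userBucket", "") // bucket name -> bucket password

  // Created lazily, on the first call that actually talks to Spark
  private lazy val sc = new SparkContext(conf)

  /** Pure helper: builds the document that will be stored. */
  def toDocument(id: String, name: String, email: String): JsonDocument =
    JsonDocument.create(id, JsonObject.create().put("name", name).put("email", email))

  /** Create or update: saveToCouchbase() upserts by default. */
  def upsert(id: String, name: String, email: String): Boolean =
    Try(sc.parallelize(Seq(toDocument(id, name, email))).saveToCouchbase()).isSuccess

  /** Read: returns the document content as a JSON string, if the id exists. */
  def get(id: String): Option[String] =
    sc.couchbaseGet[JsonDocument](Seq(id)).collect().headOption.map(_.content().toString)
}
```

Because the master is set to `local[*]`, this runs without a separate Spark installation; on a cluster you would drop `setMaster` and pass the master through spark-submit instead.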

 

Routes:

Next we define the routes that expose REST endpoints, so that users can perform the CRUD operations over HTTP.
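The routes could be sketched roughly as follows. The `UserStore` trait is a hypothetical stand-in for the database layer, and the paths and query parameters are assumptions chosen for illustration, not the repository's actual endpoints.

```scala
import akka.http.scaladsl.model.StatusCodes
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route

// Hypothetical storage interface, so the routes do not depend on Spark directly.
trait UserStore {
  def upsert(id: String, name: String, email: String): Boolean
  def get(id: String): Option[String]
}

// Hypothetical routes: the paths and query parameters are assumptions.
class UserRoutes(store: UserStore) {

  val routes: Route =
    path("users" / Segment) { id =>
      get {
        // GET /users/{id} returns the stored JSON, or 404 if the id is unknown
        store.get(id) match {
          case Some(json) => complete(json)
          case None       => complete(StatusCodes.NotFound, s"No document with id $id")
        }
      } ~
      put {
        // PUT /users/{id}?name=...&email=... creates or updates the document
        parameters("name", "email") { (name, email) =>
          if (store.upsert(id, name, email)) complete(s"Stored $id")
          else complete(StatusCodes.InternalServerError, s"Could not store $id")
        }
      }
    }
}
```

Keeping the storage behind a small trait means the routes can be exercised with an in-memory stub, without starting Spark or Couchbase.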

 

Server:

The last piece is the HTTP server, which akka-http starts by binding the routes to a host and port.
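A minimal sketch of the server, assuming akka-http 2.4.x. The actor system name, the host, the port and the placeholder `/health` route are all assumptions; in the real application the CRUD routes would be bound instead.

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route
import akka.stream.ActorMaterializer

import scala.io.StdIn

object Server extends App {
  // akka-http needs an actor system and a materializer to run its streams
  implicit val system = ActorSystem("spark-couchbase-akka-http")
  implicit val materializer = ActorMaterializer()
  import system.dispatcher // ExecutionContext for the futures below

  // Placeholder route (assumption); the CRUD routes would go here
  val routes: Route = path("health") { get { complete("OK") } }

  // Bind the routes to host and port (both assumptions)
  val binding = Http().bindAndHandle(routes, "localhost", 8080)
  println("Server online at http://localhost:8080/ - press ENTER to stop")

  // Keep the server running until the user presses ENTER, then shut down cleanly
  StdIn.readLine()
  binding.flatMap(_.unbind()).onComplete(_ => system.terminate())
}
```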

The methods saveToCouchbase(), couchbaseGet(), couchbaseView() and couchbaseQuery() are provided by the Couchbase Spark connector: saveToCouchbase() writes an RDD of documents to Couchbase, while the other three read from Couchbase into RDDs. This is a basic implementation of CRUD operations on Couchbase using Spark.
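To illustrate the two read methods that the CRUD code does not need, here is a small sketch of couchbaseQuery() and couchbaseView(). The bucket `userBucket`, the design document `users` and the view `by_name` are hypothetical, and the N1QL statement assumes that a primary index exists on the bucket.

```scala
import com.couchbase.client.java.query.N1qlQuery
import com.couchbase.client.java.view.ViewQuery
import com.couchbase.spark._
import org.apache.spark.{SparkConf, SparkContext}

object QueryAndViewExample extends App {
  val conf = new SparkConf()
    .setAppName("couchbase-query-view-example")
    .setMaster("local[*]")
    .set("com.couchbase.bucket.userBucket", "") // hypothetical bucket, empty password
  val sc = new SparkContext(conf)

  // couchbaseQuery(): run a N1QL statement; each row carries a JsonObject in `value`
  sc.couchbaseQuery(N1qlQuery.simple("SELECT name, email FROM `userBucket` LIMIT 10"))
    .map(_.value.toString)
    .collect()
    .foreach(println)

  // couchbaseView(): read the rows of a view (hypothetical design document and view)
  sc.couchbaseView(ViewQuery.from("users", "by_name").limit(10))
    .map(row => s"${row.id} -> ${row.key}")
    .collect()
    .foreach(println)

  sc.stop()
}
```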

In upcoming blogs we will discuss how to use the SQL queries provided by Spark to query Couchbase, and how to use Apache Spark 2.0.0, the latest release, with Couchbase.

So stay tuned, and till then, happy hAKKAing!

Code Repository: Spark-Akka Couchbase guides

References: 

  1. Akka-http Documentation
  2. Spark Documentation
  3. Couchbase Installation
  4. Couchbase-spark connector
  5. Spark with Couchbase to Electrify Your Data Processing: Couchbase Connect 2015 by Michael Nitschinger
  6. Using Spark, Spray and Couchbase
