Spark – IoT : Combining Big Data Analysis with IoT

Welcome back, folks!

Time for a new gig! I think the last series, Scala – IoT, was pretty amazing. It got an overwhelming response from you all, which pumped up the idea for this new web series, Spark – IoT.

So let's get started.

What was the motivation?

I have been active in the IoT community here, and I found a gap in it. As far as I have seen here in Delhi/NCR, India, the IoT community is basically divided into two sets:

  1. The ones who understand hardware, sensors, protocols, and sensor data.
  2. The ones who understand software: what to analyse, how to analyse, and so on.

Maybe you can relate this to your own community 😉.

I am the guy who understands software very well, and we have been working on the analysis part for quite a while. Hence I thought, why not combine the two fields and apply big data analysis to the streaming data generated by an IoT device?

Follow-up!

If you do not know anything about IoT and its protocols, I would suggest you check out the Scala – IoT series first, so that you get a basic understanding of the protocols, MQTT and all. Here are the links to all the Scala – IoT blogs:

  1. Scala-IOT : Introduction to Internet Of Things.

  2. Scala-IOT : What is Mqtt ? How it is lightweight ?

  3. Scala-IOT: Getting started with RaspberryPi without Monitor or Screen.

  4. Scala – IOT : First basic IOT application using Scala on RaspberryPi

If you know only a little about Apache Spark, I would suggest going through these so that you have a basic understanding of what it is, and then you can move on to the official documentation. Here is the link.

What will this web series be about?

This web series will be about how to analyse the data generated by an IoT device. In this case, that device is a Raspberry Pi. I have structured the series like this:

  1. Spark – IoT: Setting up Apache Spark Cluster on RaspberryPi.

    In this post, we will talk about how to set up a basic Spark cluster on Raspberry Pi, in which one Raspberry Pi is the master and the other is a worker (slave). We will also run the Spark shell on it.

  2. Spark – IoT: Developing your first basic Streaming Application for analysis.

    In this post, we will develop a basic Spark Streaming application for data analysis. The data producer will be the Raspberry Pi itself, as in the first Scala – IoT application.

  3. Spark – IoT: Deploying your application on RaspberryPi-Spark Cluster.

    In this post, we will deploy the application on the RaspberryPi-Spark cluster for data processing.

This is the overall idea, but things can change, so if you have any suggestions please drop them in the comments and maybe we can work on them along with you 🙂

If you are interested in this web series, please let us know! You can subscribe to our newsletter here!

And if you have any questions, please feel free to contact me here or on Twitter: @shiv4nsh

Till then!

Happy hAKKAing! 🙂

Posted in apache spark, IOT, Scala, Spark

Streaming with Apache Spark Custom Receiver

Hello, inquisitors. In the previous blog we looked at the predefined stream receivers of Spark.

In this blog we are going to discuss the custom receiver of Spark, so that we can source the data from any source.

If we want to use a custom receiver, we should first know that we are not going to use SparkSession as the entry point unless some use case requires it.

Posted in apache spark, big data, Scala

Streaming with Apache Spark 2.0

Hello geeks, we discussed Apache Spark 2.0 with Hive in an earlier blog.

Now I am going to describe how we can use Spark to stream data.

First, we need to understand the new Spark Streaming architecture.

Posted in apache spark, big data, Scala

Kick-Off Java 9 (Project Jigsaw & ServiceLoader) Part – II

Java comes with major changes and new surprises for developers. Major changes came in Java 5 and Java 8, and now Java 9 arrives with new mysteries and changes. In the last post we discussed Java 9 with Jigsaw, and in another post we used ServiceLoader in Java instead of dependency injection.

Java 9 gives us a clean way to use ServiceLoader to maintain DI in an application through modularization. For details, please visit our previous posts mentioned above. Today we are creating a Greetings example using Jigsaw and ServiceLoader. Below is the directory structure of our sample application.

Posted in Scala

Play With Java ServiceLoader forget about Dependency Injection(DI) Frameworks

In most applications we use Dependency Injection to keep our code loosely coupled. Sometimes we require only simple DI and nothing else, yet for that we still need to include a DI framework such as Spring or Google Guice. This makes our project jar heavy and adds some unnecessary classes as well. For such cases, Java itself has a ServiceLoader class for injecting dependencies at run time. The underlying service-provider mechanism was introduced in JDK 1.3, but it was used only for internal purposes. In JDK 6 it became public as java.util.ServiceLoader, but it is still a final class, so we are not able to extend its functionality.
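To make the lookup idea concrete, here is a minimal sketch of how ServiceLoader can stand in for a DI framework. It is not the Greetings application from the full post: the names GreetingsDemo, Greeting, DefaultGreeting and lookup are hypothetical and chosen only for illustration. It assumes providers are registered the classic way, through a file under META-INF/services named after the interface's binary name (here GreetingsDemo$Greeting) that lists the implementing classes.

```java
import java.util.ServiceLoader;

public class GreetingsDemo {

    /** The service contract (hypothetical); providers implement this interface. */
    public interface Greeting {
        String greet(String name);
    }

    /** Fallback used when no provider is registered on the classpath. */
    public static class DefaultGreeting implements Greeting {
        @Override
        public String greet(String name) {
            return "Hello, " + name + "!";
        }
    }

    /** Returns the first provider that ServiceLoader discovers, or the fallback. */
    public static Greeting lookup() {
        // ServiceLoader scans META-INF/services entries lazily while we iterate.
        ServiceLoader<Greeting> loader = ServiceLoader.load(Greeting.class);
        for (Greeting greeting : loader) {
            return greeting;
        }
        return new DefaultGreeting();
    }

    public static void main(String[] args) {
        // With no provider file on the classpath this uses DefaultGreeting.
        System.out.println(lookup().greet("Knoldus"));
    }
}
```

The calling code depends only on the Greeting interface. Swapping the implementation then means dropping a different provider jar on the classpath, with no framework involved.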

ServiceLoader will play an important role in JDK 9, which we will discuss in our next post. Today, we are creating a simple Greetings application using ServiceLoader as below:

Posted in Scala

KnolX: Introduction to Apache Spark 2.0

Knoldus organized a KnolX session on Friday, 23 September 2016. In that one-hour session we got an introduction to Apache Spark 2.0 and its APIs.

Spark 2.0 is a major release of Apache Spark. This release brought many changes to the APIs and libraries of Spark. In this KnolX, we looked at some of the improvements made in Spark 2.0 and got an introduction to new features such as the SparkSession API and Structured Streaming.

The slides for the session are as follows:


Below is the Youtube video for the session.



Posted in Scala, Spark

Introduction to Apache Hadoop: The Need

In this blog we will read about the Hadoop fundamentals. After reading it, we will be able to understand why we need Apache Hadoop. So let's start with the problem.

What's the Problem :- The problem is simple: while the storage capacities of hard drives have increased massively over the years, access speeds (the rate at which data can be read from drives) have not kept up.

One typical drive from 1990 could store 1,370 MB of data and had a transfer speed of 4.4 MB/s, so you could read all the data from a full drive in around five minutes.

Over 20 years later, 1-terabyte drives are the norm, but the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data off the disk.

Proposed Solution :- This is a long time to read all the data on a single drive, and writing is even slower. The obvious way to reduce the time is to read from multiple disks at once. Imagine if we had 100 drives, each holding one hundredth of the data. Working in parallel, we could read the data in under two minutes.
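The arithmetic behind these figures can be checked with a few lines of Java. This is only a back-of-the-envelope sketch: the class and method names are made up for illustration, 1 terabyte is taken as 1,000,000 MB, and the data is assumed to be split evenly across drives that are read fully in parallel.

```java
public class DiskReadTime {

    /**
     * Seconds needed to read capacityMb of data at speedMbPerSec per drive,
     * when the data is split evenly across the given number of drives.
     */
    public static double readSeconds(double capacityMb, double speedMbPerSec, int drives) {
        return capacityMb / drives / speedMbPerSec;
    }

    public static void main(String[] args) {
        double drive1990 = readSeconds(1_370, 4.4, 1);        // roughly 311 seconds
        double drive1Tb = readSeconds(1_000_000, 100, 1);     // 10,000 seconds
        double hundredDrives = readSeconds(1_000_000, 100, 100); // 100 seconds

        System.out.println("1990 drive, minutes: " + drive1990 / 60);
        System.out.println("1 TB drive, hours: " + drive1Tb / 3600);
        System.out.println("100 drives in parallel, seconds: " + hundredDrives);
    }
}
```

So the 1990 drive takes a little over five minutes, the single 1 TB drive about 2.8 hours, and the 100 parallel drives 100 seconds, which matches the "under two minutes" above.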

This is what Hadoop provides. Now, let's know more about

Posted in Scala

Introduction To Hadoop Map Reduce

In this blog we will be reading about Hadoop MapReduce. As we all know, to process data faster we need to process it in parallel. That is what Hadoop MapReduce provides.

MapReduce :- MapReduce is a programming model for data processing. MapReduce programs are inherently parallel, thus putting very large-scale data analysis into the hands of anyone with enough machines at their disposal. MapReduce works by breaking the processing into two phases:

  • The map phase, and
  • The reduce phase

Each phase has key-value pairs as input and output, the types of which may be chosen by the programmer. The programmer also specifies two functions:

  • The map function, and
  • The reduce function

The Map Function :- The input key is the offset of the beginning of the line from the beginning of the file. The map function sets up the data in such a way that the reduce function can do its work on it. The map function is also a good place to drop bad records, so generally we keep only the data that we need to process. To provide the body of the map function, we need to extend the Mapper class. To understand the map function better, let's take an example.
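As a warm-up, the two functions can be imitated in plain Java without any Hadoop dependency. The sketch below is not the Hadoop Mapper API and not the code from the full post: the class MaxTemperatureSketch, its methods, and the "year,temperature" record format are all assumptions made for illustration, and the input records are made up. The map step turns each line into a (year, temperature) pair and drops bad records, the pairs are grouped by key, and the reduce step picks the maximum per year.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class MaxTemperatureSketch {

    /** Map: (line offset, line) -> (year, temperature); empty for a bad record. */
    public static Optional<Map.Entry<String, Integer>> map(long offset, String line) {
        // The offset is the input key; like many real map functions, we ignore it.
        String[] fields = line.split(",");
        if (fields.length != 2) {
            return Optional.empty();
        }
        try {
            Map.Entry<String, Integer> pair = new AbstractMap.SimpleEntry<>(
                    fields[0].trim(), Integer.parseInt(fields[1].trim()));
            return Optional.of(pair);
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    /** Reduce: (year, all temperatures seen for that year) -> maximum temperature. */
    public static int reduce(String year, List<Integer> temperatures) {
        return Collections.max(temperatures);
    }

    /** Runs both phases in memory, grouping the map output by key in between. */
    public static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        long offset = 0;
        for (String line : lines) {
            map(offset, line).ifPresent(pair ->
                    grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                           .add(pair.getValue()));
            offset += line.length() + 1; // +1 for the newline that ended the line
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((year, temps) -> result.put(year, reduce(year, temps)));
        return result;
    }

    public static void main(String[] args) {
        // Made-up records, not real NCDC data.
        List<String> lines = Arrays.asList(
                "1950,0", "1950,22", "bad record", "1949,111", "1949,78");
        System.out.println(run(lines)); // prints {1949=111, 1950=22}
    }
}
```

In real Hadoop, the grouping between the two phases is done by the framework's shuffle step, and the map and reduce calls run on different machines.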

Example :- Here we consider the NCDC raw data. This is the sample:

Posted in apache spark, Scala, Spark

Akka with Java

Hello friends,

Over the last few days I was working on a project with Akka using Java. It was a really amazing experience with Akka.

Here we will discuss how to use Akka in Java and how to write a test case for it.

The Akka documentation extends a class named UntypedActor to create an actor. Here, however, we will discuss AbstractActor, which needs less code and hence looks neat and concise.

First, add the following dependency for Akka and the test case:

"com.typesafe.akka" %% "akka-slf4j" % "2.4.8"

Now, to create an actor, we first need to create Props:

public static Props props() {
    return Props.create(HappyBdayActor.class);
}

After that, we write the responsibility of the actor as follows:

Posted in Akka, Java, Scala

Apache Spark 2.0 with Hive

Hello geeks, we have discussed how to start programming with Spark in Scala.

In this blog we will discuss how we can use Hive with Spark 2.0.

When you start to work with Hive, you first need a HiveContext (which inherits from SQLContext), plus core-site.xml, hdfs-site.xml and hive-site.xml for Spark. If you do not configure hive-site.xml, the context automatically creates metastore_db in the current directory and creates the warehouse directory indicated by HiveConf (which defaults to /user/hive/warehouse).

Posted in apache spark, Scala, Spark