Streaming

Streaming in Spark, Flink and Kafka

Reading Time: 7 minutes There is a lot of debate about when to use Spark, when to use Flink, and when to use Kafka. Both Spark Streaming and Flink provide an exactly-once guarantee: every record is processed exactly once, which eliminates duplicates. Both provide very high throughput compared to other processing systems such as Storm, and the overhead of Continue Reading

Introducing Kafka Streams: Processing made easy

Reading Time: 2 minutes If you are working with a huge amount of data, you might have heard about Kafka. At a very high level, Kafka is a fault-tolerant, distributed publish-subscribe messaging system that is designed for fast processing of data and the ability to handle hundreds of thousands of messages. What is Stream Processing? Stream processing is the real-time processing of data continuously, concurrently, and in a record-by-record Continue Reading

Twitter’s tweets analysis using Lambda Architecture

Reading Time: 3 minutes Hello folks, in this blog I will explain the analysis of Twitter’s tweets with Lambda Architecture. First we need to understand what Lambda Architecture is, along with its components and usage. According to Wikipedia, Lambda architecture is a data processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream processing methods. Now let us look at the components of Lambda Architecture in detail.

Lambda Architecture with Spark

Reading Time: < 1 minute Hello folks, Knoldus organized a KnolX session on the topic: Lambda Architecture with Spark. The presentation covers Lambda Architecture and its implementation with Spark. In the presentation we discuss the components of Lambda Architecture, namely the batch layer, speed layer and serving layer. We also discuss its advantages and benefits with Spark. The video and slides of the presentation are available with the post. Thanks!

Meetup: Stream Processing Using Spark & Kafka

Reading Time: < 1 minute Knoldus organized a Meetup on Friday, 9 September 2016. The topics covered in this meetup were: an overview of Spark Streaming; fault-tolerance semantics and performance tuning; and Spark Streaming integration with Kafka. The meetup code sample is available here. The real-time stream processing engine application code is available here.

Building Analytics Engine Using Akka, Kafka & ElasticSearch

Reading Time: 5 minutes In this blog, I will share my experience of building a scalable, distributed and fault-tolerant analytics engine using Scala, Akka, Play, Kafka and ElasticSearch. I would like to take you through the journey of building an analytics engine which was primarily used for text analysis. The inputs were structured, unstructured and semi-structured data, and we were doing a lot of data crunching with it. The Analytics Continue Reading

Getting close to Apache Flink, albeit in a Träge manner – 3

Reading Time: 6 minutes In the last two blogs on Flink, I hope to have been able to underline the primacy of Windows in the scheme of things of Apache Flink’s streaming. I have shared my understanding of two types of Windows that can be attached to a stream of Events, namely (a) CountWindow and (b) TimeWindow. Variations of these types are offered too; for example, one can put Continue Reading

Getting close to Apache Flink, albeit in a Träge manner – 2

Reading Time: 6 minutes From the preceding post in this series: In the last blog, we took a look at Flink’s CountWindow feature. Here’s a quick recap: As a stream of events enters a Flink-based application, we can apply a CountWindow transformation to it (Flink offers us many such transformations; we will meet them as we go). CountWindow allows us to create a Continue Reading

Getting close to Apache Flink, albeit in a Träge manner – 1

Reading Time: 7 minutes Of late, I have begun to read about Apache Flink. Apache Flink (just Flink hereafter) is an ‘open source platform for distributed stream and batch data processing’, to quote from the homepage. What has caught my interest is Flink’s idea that the ability to operate on a unit of data as it streams in gives one the flexibility to decide what constitutes a batch: count of events or events Continue Reading