Reading Time: 2 minutes If you work with huge amounts of data, you have probably heard of Kafka. At a very high level, Kafka is a fault-tolerant, distributed publish-subscribe messaging system designed for fast data processing and able to handle hundreds of thousands of messages. What is Stream Processing? Stream processing is the real-time processing of data continuously, concurrently, and in a record-by-record Continue Reading
Reading Time: < 1 minute Hello! Knoldus organized a half-hour session on Structured Streaming, covering the API changes, how it differs from the earlier stream computation paradigm (DStreams), and an example API demonstration. We hope you enjoy it. Below are the slides and video from the session. Slides: Video:
Reading Time: 3 minutes Hello folks, in this blog I will explain the analysis of Twitter’s tweets with lambda architecture. First we need to understand what lambda architecture is, along with its components and usage. According to Wikipedia, lambda architecture is a data processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream processing methods. Now let us look at the components of lambda architecture in detail.
Reading Time: < 1 minute Hello folks, Knoldus organized a KnolX session on the topic: Lambda Architecture with Spark. The presentation covers lambda architecture and its implementation with Spark. In the presentation we discuss the components of lambda architecture, namely the batch layer, speed layer and serving layer. We also discuss its advantages and benefits with Spark. You can watch the video of the presentation: Here you can check the slides: Thanks!
Reading Time: < 1 minute Knoldus organized a Meetup on Friday, 9 September 2016. The topics covered in this meetup were: Overview of Spark Streaming; Fault-tolerance Semantics & Performance Tuning; Spark Streaming Integration with Kafka. The meetup code sample is available here. The real-time stream processing engine application code is available here.
Reading Time: 5 minutes In this blog, I will share my experience of building a scalable, distributed and fault-tolerant analytics engine using Scala, Akka, Play, Kafka and ElasticSearch. I would like to take you through the journey of building an analytics engine which was primarily used for text analysis. The inputs were structured, unstructured and semi-structured data, and we did a lot of data crunching with it. The Analytics Continue Reading
Reading Time: 6 minutes In the last two blogs on Flink, I hope I was able to underline the primacy of Windows in Apache Flink’s approach to streaming. I have shared my understanding of two types of Windows that can be attached to a stream of Events, namely (a) CountWindow and (b) TimeWindow. Variations of these types are offered too; for example, one can put Continue Reading
Reading Time: 6 minutes From the preceding post in this series: In the last blog, we took a look at Flink’s CountWindow feature. Here’s a quick recap: as a stream of events enters a Flink-based application, we can apply a CountWindow transformation to it (Flink offers many such transformations; we will meet them as we go). CountWindow allows us to create a Continue Reading
Reading Time: 7 minutes Of late, I have begun to read about Apache Flink. Apache Flink (just Flink hereafter) is an ‘open source platform for distributed stream and batch data processing’, to quote from the homepage. What has caught my interest is Flink’s idea that the ability to operate on each unit of data as it streams in gives one the flexibility to decide what constitutes a batch: count of events or events Continue Reading