Streaming Solutions

kafka with spark

Integrating Kafka With Spark Structure Streaming

Reading Time: 2 minutes Kafka is a messaging broker system which facilitates the passing of messages between producer and consumer whereas Spark Structure streaming consumes static and streaming data from various sources like kafka, flume, twitter or any other socket which can be processed and analysed using high level algorithm for machine learning and finally pushed the result out to external storage system. The main advantage of structured streaming Continue Reading

Spark Structured Streaming: A Simple Definition

Reading Time: 3 minutes “Structured Streaming”, nowadays we are hearing this term in Apache Spark ecosystem quite a lot, as it is being preached as next big thing in scalable big data world. Although, we all know that Structured Streaming means a stream having structured data in it, but very few of us knows what exactly it is and where we can use it. So, in this blog post Continue Reading

Exploring Spark Structured Streaming

Reading Time: 6 minutes Hello Spark Enthusiasts, Streaming apps are growing more complex. And it is getting difficult to do with current distributed streaming engines. Why streaming is hard ? Streaming computations don’t run in isolation. Data arriving out of time order is a problem for batch-processed processing. Writing stream processing operations from scratch is not easy. Problem with DStreams: Processing with event-time: dealing with late data. Interoperate streaming Continue Reading

Spark Streaming vs Kafka Stream

Reading Time: 4 minutes The demand for stream processing is increasing a lot these days. The reason is that often processing big volumes of data is not enough. Data has to be processed fast, so that a firm can react to changing business conditions in real time. Stream processing is the real-time processing of data continuously and concurrently. Streaming processing” is the ideal platform to process data streams or Continue Reading

Streaming in Spark, Flink and Kafka

Reading Time: 7 minutes There is a lot of buzz going on between when to use use spark, when to use flink, and when to use Kafka. Both spark streaming and flink provides exactly once guarantee that every record will be processed exactly once thereby eliminating any duplicates that might be available. Both provide very high throughput compared to any other processing system like storm, and the overhead of Continue Reading

Introducing Kafka Streams: Processing made easy

Reading Time: 2 minutes If you are working on huge amount of data, you might have heard about Kafka. At a very high level, Kafka is a fault tolerant, distributed publish-subscribe messaging system that is designed for fast processing of data and the ability to handle hundreds of thousands of messages. What is Stream Processing Stream processing is the real-time processing of data continuously, concurrently, and in a record-by-record Continue Reading

Introduction to Structured Streaming

Reading Time: < 1 minute Hello!!  Knoldus had organized half an hour session on Structured Streaming briefing about the API changes, how it is different from the early Stream Computation paradigm (DStreams) and example API demonstration. Hope you will enjoy. Below are the slides and Video from the session. Slide: Video:

Twitter’s tweets analysis using Lambda Architecture

Reading Time: 3 minutes Hello Folks, In this blog i will explain  twitter’s tweets analysis with lambda architecture. So first we need to understand  what is lambda architecture,about its component and usage. According to Wikipedia, Lambda architecture is a data processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream processing methods. Now let us see  lambda architecture components and its detail.

Tableau: Getting into Tableau Public

Reading Time: 2 minutes Big Data visualization and Business Intelligence got so easy using Tableau, millions and billions of records can be analyzed in just one go whether your data format is excel, csv, text or database, Tableau make it easy for you. So finally you have make your mind to generate visualizations using Tableau and want to know what are the heights of Tableau in visualizations?. You are Continue Reading

Business Intelligence-Data Visualization: Tableau

Reading Time: 3 minutes Spark, Bigdata, NoSQL, Hadoop are some of the most using and top in charts technologies that we frequently use in Knoldus, when these terms used than one thing comes into picture is ‘Huge Data, millions/billions of records’ Knoldus developers use these terms frequently, managing (and managing means here- storing data, rectifying data, normalizing it, cleaning it and much more) such amount of data is really Continue Reading

Lambda Architecture with Spark

Reading Time: < 1 minute Hello folks, Knoldus  organized a knolx session on the topic : Lambda Architecture with Spark. The presentation covers lambda architecture and implementation with spark.In the presentaion we will discuss components of lambda architecure like batch layer,speed layer and serving layer.We will also discuss it’s advantages and benefits with spark. You can watch the video of presentation : Here you can check slide :   Thanks !!

Meetup: Stream Processing Using Spark & Kafka

Reading Time: < 1 minute Knoldus organized a Meetup on Friday, 9 September 2016. Topics which were covered in this meetup are: Overview of Spark Streaming. Fault-tolerance Semantics & Performance Tuning. Spark Streaming Integration with  Kafka. Meetup code sample available here Real time stream processing engine application code available here

Building Analytics Engine Using Akka, Kafka & ElasticSearch

Reading Time: 5 minutes In this blog , I will share my experience on building scalable, distributed and fault-tolerant  Analytics engine using Scala, Akka, Play, Kafka and ElasticSearch. I would like to take you through the journey of  building an analytics engine which was primarily used for text analysis. The inputs were structured, unstructured and semi-structured data and we were doing a lot of data crunching using it. The Analytics Continue Reading