Apache Kafka

Unit Testing Of Kafka

Apache Kafka is a distributed publish-subscribe messaging system and a robust queue that can handle a high volume of data and pass messages from one endpoint to another. Generally, data is published to a topic via the Producer API, and consumers read data from the topics they subscribe to via the Consumer API. In this blog, we will see how to unit test Kafka code. Unit testing your Kafka …
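A common way to unit test Kafka-facing code is to hide the producer behind a small interface and swap in an in-memory fake during tests. Kafka's Java client ships `MockProducer` and `MockConsumer` for this purpose. The sketch below shows the same idea in plain Python without any Kafka library. `Producer`, `InMemoryProducer`, `OrderPublisher`, and the `orders` topic are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Protocol, Tuple


class Producer(Protocol):
    """Minimal producer interface assumed for this sketch (not a real Kafka API)."""

    def send(self, topic: str, key: Optional[str], value: str) -> None: ...


@dataclass
class InMemoryProducer:
    """Test double that records every send() call instead of contacting a broker."""

    sent: List[Tuple[str, Optional[str], str]] = field(default_factory=list)

    def send(self, topic: str, key: Optional[str], value: str) -> None:
        self.sent.append((topic, key, value))


class OrderPublisher:
    """Hypothetical application code under test: publishes one record per order."""

    def __init__(self, producer: Producer, topic: str = "orders") -> None:
        self.producer = producer
        self.topic = topic

    def publish(self, order_id: str, amount: float) -> None:
        # Use the order id as the key; in Kafka, equal keys map to the same partition.
        self.producer.send(self.topic, order_id, f"{order_id}:{amount:.2f}")


# Usage in a test: no running broker is needed, only the fake.
fake = InMemoryProducer()
OrderPublisher(fake).publish("o-1", 9.5)
# fake.sent now holds [("orders", "o-1", "o-1:9.50")]
```

Because the publisher depends only on the `send` method, the same class can be handed a real producer wrapper in production and the fake in tests.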

Setting It Up: KAFKA Multi-Broker System

In this blog, I am going to cover the leftovers of my last blog, “A Beginners Approach To KAFKA”. In that post I explained the details of Kafka, such as its terminology and advantages, and demonstrated how to set up the Kafka environment, bring up a single-broker cluster, and test that it works. So the main thing I am going to cover here is how …
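When several brokers run on one machine, each copy of `server.properties` needs at least its own `broker.id`, listener port, and log directory (all brokers still point at the same `zookeeper.connect`). The sketch below generates those per-broker overrides in Python. The base port 9092 and the `/tmp/kafka-logs-N` paths mirror the layout of the older Kafka quickstart but are only example values, and the helper function names are hypothetical.

```python
from typing import Dict, List


def broker_overrides(broker_id: int, base_port: int = 9092,
                     log_root: str = "/tmp/kafka-logs") -> Dict[str, str]:
    """Settings that must differ per broker when all brokers share one host."""
    return {
        "broker.id": str(broker_id),                           # unique id in the cluster
        "listeners": f"PLAINTEXT://:{base_port + broker_id}",  # unique port per broker
        "log.dirs": f"{log_root}-{broker_id}",                 # separate data directory
    }


def to_properties(settings: Dict[str, str]) -> str:
    """Render settings in Java .properties syntax, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


def cluster_configs(broker_count: int) -> List[str]:
    """One properties snippet per broker, for brokers 0..broker_count-1."""
    return [to_properties(broker_overrides(i)) for i in range(broker_count)]


configs = cluster_configs(3)
# configs[1] is:
# broker.id=1
# listeners=PLAINTEXT://:9093
# log.dirs=/tmp/kafka-logs-1
```

Each snippet would be merged into its own copy of the base `server.properties` before starting that broker.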

Meetup: Stream processing using Kafka

Knoldus organized a meetup on Friday, 7th April 2017 at 4:00 PM, presented by Himani Arora and me (Prabhat Kashyap). Topics covered in this meetup: what stream processing is, advantages of stream processing, types of stream processing, what KStreams are, use cases of KStreams, and an overview of Kafka Connect.


Integrating Kafka With Spark Structured Streaming

Kafka is a message broker system that facilitates passing messages between producers and consumers, whereas Spark Structured Streaming consumes static and streaming data from various sources, such as Kafka, Flume, Twitter, or any other socket. This data can be processed and analysed using high-level machine learning algorithms, and the results are finally pushed out to an external storage system. The main advantage of Structured Streaming …
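Structured Streaming treats a stream as an unbounded table to which each micro-batch appends rows, and a query's result table is updated incrementally. The plain-Python model below imitates that behaviour for a running word count over Kafka-style message values. It does not use Spark, and `RunningWordCount` is a hypothetical name. In real Spark code, the Kafka source is opened with `spark.readStream.format("kafka")`, and its `value` column arrives as binary that has to be cast to a string, which is why the sketch decodes bytes.

```python
from collections import Counter
from typing import Dict, Iterable, List


class RunningWordCount:
    """Toy model of a streaming aggregation whose state survives across micro-batches."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def process_batch(self, values: Iterable[bytes]) -> Dict[str, int]:
        # Kafka delivers message values as raw bytes, so decode before splitting.
        for raw in values:
            self.counts.update(raw.decode("utf-8").split())
        # Like "complete" output mode: return the whole updated result table.
        return dict(self.counts)


query = RunningWordCount()
batches: List[List[bytes]] = [[b"kafka spark", b"spark"], [b"kafka"]]
results = [query.process_batch(batch) for batch in batches]
# results[0] == {"kafka": 1, "spark": 2}; results[1] == {"kafka": 2, "spark": 2}
```

The second batch contains only one new word, yet the emitted table reflects everything seen so far, which is the incremental-query model in miniature.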

Spark Streaming vs Kafka Streams

The demand for stream processing is increasing a lot these days, because processing big volumes of data is often not enough: data has to be processed fast, so that a firm can react to changing business conditions in real time. Stream processing is the continuous, concurrent processing of data in real time. Stream processing is the ideal platform to process data streams or …
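One difference usually drawn between the two is the processing model: Spark Streaming groups incoming records into small batches (micro-batches) and processes each batch as a unit, while Kafka Streams processes one record at a time as it arrives. The toy functions below contrast the two models on a list of events. They are illustrations under those assumptions, not either library's API. Real Spark Streaming forms batches by time interval rather than by record count; a count-based batch just keeps the sketch deterministic.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def record_at_a_time(events: List[T], handle: Callable[[T], R]) -> List[R]:
    """Kafka Streams style: every event produces an output as soon as it arrives."""
    return [handle(event) for event in events]


def micro_batches(events: List[T], batch_size: int,
                  handle_batch: Callable[[List[T]], R]) -> List[R]:
    """Spark Streaming style: events are grouped, and each group yields one output."""
    return [handle_batch(events[i:i + batch_size])
            for i in range(0, len(events), batch_size)]


readings = [3, 5, 4, 8, 6]
per_record = record_at_a_time(readings, lambda x: x * 2)  # one output per event
per_batch = micro_batches(readings, 2, sum)              # one output per batch
```

The per-record model can react to each event immediately, while the micro-batch model trades a little latency for processing several events together.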

Streaming in Spark, Flink and Kafka

There is a lot of buzz about when to use Spark, when to use Flink, and when to use Kafka. Both Spark Streaming and Flink provide an exactly-once guarantee: every record is processed exactly once, which eliminates any duplicates that might otherwise occur. Both provide very high throughput compared to other processing systems such as Storm, and the overhead of …
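Exactly-once in this sense is usually about the effect on state: records may be delivered again after a failure, but each one is applied to the result only once. Spark and Flink achieve this with checkpointing combined with replayable sources and transactional or idempotent sinks. The sketch below shows only the underlying idea, assuming a single partition with increasing offsets: a hypothetical processor stores the last applied offset together with its state and ignores replayed records.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class ExactlyOnceSum:
    """Running total plus the offset of the last record already applied."""

    total: int = 0
    last_offset: int = -1

    def apply(self, records: Iterable[Tuple[int, int]]) -> None:
        """Apply (offset, value) pairs, skipping any offset seen before."""
        for offset, value in records:
            if offset <= self.last_offset:
                continue  # replayed after a failure: already counted, so skip it
            self.total += value
            # A real system must persist total and last_offset atomically.
            self.last_offset = offset


processor = ExactlyOnceSum()
processor.apply([(0, 10), (1, 20)])
processor.apply([(1, 20), (2, 5)])  # offset 1 is redelivered and ignored
```

After both calls the total is 35 rather than 55, even though the record at offset 1 arrived twice.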

Twitter’s tweets analysis using Lambda Architecture

Hello folks, in this blog I will explain Twitter tweet analysis with the lambda architecture. First, we need to understand what the lambda architecture is, along with its components and usage. According to Wikipedia, the lambda architecture is a data processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream processing methods. Now let us look at the lambda architecture's components in detail.
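The lambda architecture is commonly described as three layers: a batch layer that periodically recomputes views over the full master dataset, a speed layer that covers data arriving after the last batch run, and a serving layer that answers queries by merging both. A minimal in-memory sketch for counting hashtags follows; the function names and sample tweets are hypothetical, and a real deployment would use distributed storage and processing for each layer.

```python
from collections import Counter
from typing import Dict, List


def count_hashtags(tweets: List[str]) -> Counter:
    """Count words starting with '#' across a list of tweet texts."""
    return Counter(word for tweet in tweets
                   for word in tweet.split() if word.startswith("#"))


def batch_view(master_dataset: List[str]) -> Counter:
    """Batch layer: recompute counts from scratch over all stored tweets."""
    return count_hashtags(master_dataset)


def realtime_view(recent_tweets: List[str]) -> Counter:
    """Speed layer: counts only for tweets that arrived after the last batch run."""
    return count_hashtags(recent_tweets)


def query(batch: Counter, realtime: Counter) -> Dict[str, int]:
    """Serving layer: merge both views to answer a query."""
    return dict(batch + realtime)


stored = ["loving #kafka", "#spark and #kafka"]
recent = ["new #kafka release"]
result = query(batch_view(stored), realtime_view(recent))
# result == {"#kafka": 3, "#spark": 1}
```

When the next batch run absorbs the recent tweets into the master dataset, the speed layer's view for that period can be discarded.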

Kafka – Sending Object as a message

Kafka lets us publish and subscribe to streams of records, and the records can be of any type: JSON, String, POJO, etc. Kafka gives users the ability to create their own serializers and deserializers so that different data types can be transmitted through it. In this blog, I will demonstrate how to create a custom serializer and deserializer, but first let's understand …
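Kafka's Java client defines `Serializer<T>` and `Deserializer<T>` interfaces whose `serialize(topic, data)` and `deserialize(topic, data)` methods convert between objects and byte arrays. The Python sketch below mirrors that shape for a hypothetical `User` object using JSON. It is a stand-alone illustration of the idea, not the Kafka client API, and JSON is only one possible wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class User:
    """Hypothetical message payload (the object being sent)."""
    name: str
    age: int


class UserSerializer:
    def serialize(self, topic: str, data: Optional[User]) -> Optional[bytes]:
        # Kafka's built-in serializers return null for a null object; mirror that.
        if data is None:
            return None
        return json.dumps(asdict(data)).encode("utf-8")


class UserDeserializer:
    def deserialize(self, topic: str, data: Optional[bytes]) -> Optional[User]:
        if data is None:
            return None
        return User(**json.loads(data.decode("utf-8")))


payload = UserSerializer().serialize("users", User("Asha", 30))
restored = UserDeserializer().deserialize("users", payload)
# payload == b'{"name": "Asha", "age": 30}' and restored == User("Asha", 30)
```

The `topic` argument is unused here, but keeping it matches the interface shape and leaves room for per-topic formats.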

A Short Interview With the SMACK Tech Stack!

Hello guys, today we conducted a short interview with SMACK about its architecture and its uses. Let's start with some introductions. Interviewer: How would you describe yourself? SMACK: I am SMACK (Spark, Mesos, Akka, Cassandra and Kafka), and I am made up entirely of open source technologies. A collaboration between Mesosphere and Cisco bundles these technologies together into a product called Infinity, which is used to solve …

2017 – Year of FAST Data

As we approach 2017, there is a strong focus on Fast Data: a combination of data at rest and data in motion, where the speed has to be remarkably fast. In the deck that follows, we at Knoldus present how we implemented a complex multi-scale solution for a large bank based on the Fast Data Architecture philosophy. As we partner …

Knoldus Partners with Confluent to Power Real Time Streams

Knoldus is pleased to announce a consulting and system integrator partnership with Confluent, the company founded by the creators of Apache Kafka™. Confluent, creators of the first streaming platform based on Apache Kafka™, provides the most complete platform to build enterprise-scale streaming pipelines using Apache Kafka and to simplify the development of stream processing applications. Through rapid adoption in the Fortune 500, Apache Kafka is quickly emerging as …

Meetup: Stream Processing Using Spark & Kafka

Knoldus organized a meetup on Friday, 9 September 2016. Topics covered in this meetup: an overview of Spark Streaming, fault-tolerance semantics and performance tuning, and Spark Streaming integration with Kafka. The meetup code sample is available here, and the real-time stream processing engine application code is available here.

Introduction to Kafka Connect

Knoldus organized a half-hour session on 29 July 2016 at 4:00 PM. It gives a brief introduction to Apache Kafka Connect, with insights into its benefits and use cases. It also covers the motivation behind building Kafka Connect and an introduction to its architecture. Here is the video of the session.
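Connectors are usually configured rather than coded. A Connect worker in distributed mode accepts a JSON body on its REST API (`POST /connectors`) containing a connector `name` and a `config` map whose values are strings. The sketch below only builds such a body for the example `FileStreamSource` connector bundled with Kafka. The file path and topic name are placeholders, the helper name is hypothetical, and nothing is sent over the network.

```python
import json
from typing import Dict


def file_source_connector(name: str, path: str, topic: str,
                          tasks_max: int = 1) -> str:
    """Build a JSON body for POST /connectors on a Kafka Connect worker."""
    config: Dict[str, str] = {
        "connector.class": "FileStreamSource",  # reads a file, one record per line
        "tasks.max": str(tasks_max),            # Connect expects string values
        "file": path,
        "topic": topic,
    }
    return json.dumps({"name": name, "config": config}, indent=2)


body = file_source_connector("local-file-source", "/tmp/test.txt", "connect-test")
```

The resulting body could be sent with any HTTP client to a worker's REST endpoint, which listens on port 8083 by default.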