kafka

Introduction To Apache Kafka

Reading Time: 6 minutes Introduction Apache Kafka is a framework implementation of a software bus using stream processing. It is an open-source platform developed by the Apache Software Foundation and written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Kafka can connect to external systems (for data import/export) via Kafka Connect and provides Kafka Streams, a Java stream processing library. Apache Continue Reading

Kafka Kerberos Authentication

Reading Time: 2 minutes In this article we will start looking into Kerberos authentication, focusing on the client-side configuration required to authenticate with clusters configured to use Kerberos. Kafka supports four different communication protocols between Consumers, Producers, and Brokers. Each protocol covers different security aspects, with PLAINTEXT being the old, insecure communication protocol: PLAINTEXT (non-authenticated, non-encrypted); SSL (SSL authentication, encrypted); PLAINTEXT+SASL (authentication, non-encrypted); SSL+SASL (encrypted authentication, encrypted Continue Reading
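As a rough illustration of the client-side configuration this excerpt refers to, the sketch below renders a properties file for a client that authenticates with Kerberos. In Kafka's `security.protocol` setting, the two SASL variants listed above are spelled `SASL_PLAINTEXT` and `SASL_SSL`. The function name, keytab path, principal, and the `kafka` service name are assumptions chosen for the example, not values from the article.

```python
# Sketch: render client-side properties for a Kerberos (SASL/GSSAPI) connection.
# The keytab path and principal passed in are hypothetical placeholders.

def kerberos_client_properties(keytab: str, principal: str, encrypted: bool = True) -> str:
    """Return Java-properties text for a Kafka client authenticating via Kerberos."""
    jaas = (
        "com.sun.security.auth.module.Krb5LoginModule required "
        f'useKeyTab=true storeKey=true keyTab="{keytab}" principal="{principal}";'
    )
    props = {
        # SASL_SSL = authenticated and encrypted; SASL_PLAINTEXT = authenticated only.
        "security.protocol": "SASL_SSL" if encrypted else "SASL_PLAINTEXT",
        # GSSAPI is the SASL mechanism used for Kerberos.
        "sasl.mechanism": "GSSAPI",
        # Assumed here: the brokers run under a service principal named "kafka".
        "sasl.kerberos.service.name": "kafka",
        "sasl.jaas.config": jaas,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    print(kerberos_client_properties("/etc/security/keytabs/client.keytab", "client@EXAMPLE.COM"))
```

The resulting text could be saved as a client properties file and passed to a producer or consumer; the exact values depend on how the cluster and the Kerberos realm are set up.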

Comparing Data Streaming Frameworks | Scala

Reading Time: 4 minutes In this era of technology, the amount of data is growing exponentially and every bit of data holds value. According to some reports, the number of bytes generated and stored in the world so far has already exceeded the number of stars in the sky. Because every bit is useful, it is very important to store data without losing any of it. When Continue Reading

Kafka Producer Internals

Reading Time: 3 minutes Hello everyone, I know there are a lot of blogs on Kafka that you can go through. So rather than explaining the basic concepts and architecture of Kafka, here we will look into the Kafka producer internals. We will see what happens internally when a producer sends a message to a topic. We will also see what happens when a consumer consumes messages. Imagine….. Let’s suppose a producer wants to Continue Reading

Apache Kafka for beginners

Reading Time: 4 minutes Introduction One of the biggest challenges associated with big data is analyzing the data. But before we get to that part, the data first has to be collected, and for a system to process it reliably, it must be able to ingest the data and make it available to users. This is where Apache Kafka comes in handy. Let’s briefly look at how Kafka came into existence. Continue Reading

How to delete record from Kafka Topic : Tombstone

Reading Time: 4 minutes Hello reader, here we will see how we can delete records from a Kafka topic (compacted as well as non-compacted). Problem: GDPR: The General Data Protection Regulation is a regulation that requires businesses to protect the personal data and privacy of EU citizens for transactions that occur within EU member states. CCPA: The California Consumer Privacy Act is a state-wide data privacy law that regulates Continue Reading

DevOps Shorts: How to increase the replication factor for a Kafka topic

Reading Time: 2 minutes Have you ever faced a situation where you had to increase the replication factor for a topic? It turns out that it’s really easy to do. In this super short blog, let’s try to do just that. We’ll start by creating a topic, one, with a replication factor of just 1, and then work through creating the increase.json file and actually triggering the plan. Step 1: Create Continue Reading
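To give a feel for what an increase.json file might contain, here is a minimal sketch that builds a reassignment plan in the JSON layout accepted by Kafka's partition reassignment tool (`version` plus a list of `topic`/`partition`/`replicas` entries). The function name, the topic name `my-topic`, the broker ids, and the round-robin replica placement are assumptions made for illustration. The sketch also ignores the topic's current assignment, which a real plan would normally preserve.

```python
import json


def build_reassignment_plan(topic, partition_count, broker_ids, replication_factor):
    """Build a reassignment plan that gives every partition `replication_factor` replicas."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    partitions = []
    for partition in range(partition_count):
        # Rotate the starting broker so the first (preferred) replica is spread across brokers.
        replicas = [
            broker_ids[(partition + offset) % len(broker_ids)]
            for offset in range(replication_factor)
        ]
        partitions.append({"topic": topic, "partition": partition, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


if __name__ == "__main__":
    # Hypothetical cluster: three brokers with ids 0, 1, 2 and a three-partition topic.
    plan = build_reassignment_plan("my-topic", 3, [0, 1, 2], 3)
    print(json.dumps(plan, indent=2))
```

The printed JSON could be saved as increase.json and handed to the `kafka-reassign-partitions` tool via `--reassignment-json-file` together with `--execute`; the connection options depend on the Kafka version in use.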

Using Apache Flink for Kinesis to Kafka Connect

Reading Time: 3 minutes In this blog, we are going to use Kinesis as a source and Kafka as a consumer. Let’s get started. Step 1: Apache Flink provides the Kinesis and Kafka connector dependencies. Let’s add them to our build.sbt: Step 2: The next step is to create a pointer to the environment on which this program runs. Step 3: Setting a parallelism of x here will cause all Continue Reading

Comparison between different streaming engines

Reading Time: 5 minutes Distributed stream processing engines have been on the rise in the last few years. First Hadoop became popular as a batch processing engine; then the focus shifted towards stream processing engines. Stream processing engines can make the job of processing data that comes in via a stream easier than ever before and, by using clustering, can enable processing larger data sets in a timely manner. Continue Reading

Apache Kafka: What & Why?

Reading Time: 6 minutes What is Apache Kafka? Apache Kafka is a well-known name in the world of Big Data. It is one of the most used distributed streaming platforms. Kafka is not just a messaging queue but a full-fledged event streaming platform. It is a framework for storing, reading and analyzing streaming data. It is a publish-subscribe-based durable messaging system that exchanges data between processes, applications, and servers. Continue Reading

Understanding data persistence in Lagom

Reading Time: 4 minutes When we create any microservice, or in general any service, one of the biggest tasks is to manage data persistence. Lagom supports various databases for this task. By default, Lagom uses Cassandra to persist data.

Kafka Timestamp Extractor

Reading Time: 3 minutes Hi folks, I hope you’re all doing well. If you have landed here, you are probably looking for a timestamp extractor for Kafka Streams. So what is all the buzz about? In this blog we are going to look at what it is and explore it, so buckle up. The Timestamp Extractor As per the docs, a timestamp extractor extracts a timestamp from an Continue Reading

Custom Partitioner in Kafka: Let’s Take Quick Tour!

Reading Time: 5 minutes In this blog, we are going to explore the Kafka partitioner. We will try to understand why the default partitioner is not always enough and when you might need a custom partitioner. We will also look at a use case and write the code for a custom partitioner. I assume that you have a sound knowledge of Kafka. Let’s understand the behavior of the default partitioner. The default Continue Reading