Apache Kafka

Set Up a Kafka Cluster on GCP

In this article, we are going to create a Kafka cluster on Google Cloud Platform (GCP). There are various ways to do it, such as uploading the Kafka directory to GCP, creating multiple ZooKeeper instances, or creating multiple copies of the server.properties file. In this article, however, we take a simpler route: creating a Kafka cluster with replication. Let's start… What is GCP? GCP …

Fault Tolerance and Resiliency in Apache Kafka

Kafka is known for its performance, resiliency, and fault tolerance. In this article, we'll see which configuration changes help achieve fault tolerance and resilience to meet architectural requirements. Before starting, you should have a basic knowledge of Kafka, or you can go through the documentation. Apache Kafka is a distributed system, and the term fault tolerance is very …

How to Delete Records from a Kafka Topic: Tombstone

Hello reader, here we will see how to delete records from a Kafka topic (both compacted and non-compacted topics). Problem: GDPR: The General Data Protection Regulation requires businesses to protect the personal data and privacy of EU citizens for transactions that occur within EU member states. CCPA: The California Consumer Privacy Act is a state-wide data privacy law that regulates …
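On a compacted topic, a tombstone is a record whose value is null for a given key. The sketch below is a toy, in-memory model of that idea and is not Kafka code. It keeps only the latest record per key and then drops keys whose latest value is a tombstone (`None` here). It simplifies heavily: real compaction runs in the background on closed log segments, records keep their original offsets, and brokers retain tombstones for `delete.retention.ms` before removing them. The keys and values are hypothetical.

```python
from typing import Optional

# A record is a (key, value) pair; a value of None stands in for a tombstone.
Record = tuple[str, Optional[str]]

def compact(log: list[Record]) -> list[Record]:
    """Toy compaction: keep the latest record per key, then drop tombstoned keys.

    Simplification: real Kafka keeps tombstones around for delete.retention.ms
    so consumers can observe the deletion before it disappears.
    """
    latest: dict[str, int] = {}
    for position, (key, _) in enumerate(log):
        latest[key] = position  # later records for the same key win
    return [
        log[position]
        for position in sorted(latest.values())  # preserve log order
        if log[position][1] is not None          # drop keys whose latest value is a tombstone
    ]

log = [
    ("user-1", "alice@example.com"),
    ("user-2", "bob@example.com"),
    ("user-1", "alice@new.example.com"),
    ("user-2", None),  # tombstone: request to delete user-2's data
]
print(compact(log))  # [('user-1', 'alice@new.example.com')]
```

After compaction, no record for `user-2` survives. This is why tombstones are the usual way to erase a key from a compacted topic.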

A Quick Demo: Kafka to Flink to Cassandra

Hi folks! In this blog, we are going to learn how to integrate Flink with Kafka and Cassandra to build a simple streaming data pipeline. Apache Flink is a framework and distributed processing engine used for stateful computations over unbounded and bounded data streams. Kafka is a scalable, high-performance, low-latency platform that allows reading and writing streams of data like a messaging system. Cassandra: a distributed, wide-column …

DevOps Shorts: How to increase the replication factor for a Kafka topic

Have you ever had to increase the replication factor for a topic? It turns out to be really easy. In this super short blog, let's do just that. We'll start by creating a topic with a replication factor of just 1, then create the increase.json file, and finally trigger the reassignment plan. Step 1: Create …
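The increase.json file is a partition reassignment plan. It uses the JSON shape that `kafka-reassign-partitions.sh` reads: a `version` field plus a list of `{topic, partition, replicas}` entries. The following is a minimal sketch for generating such a plan. The topic name `my-topic`, the broker IDs, and the round-robin placement are all hypothetical choices, not the article's exact file. In practice you would usually keep the broker that already hosts each partition in its replica list to avoid unnecessary data movement.

```python
import json

def build_reassignment(topic: str, num_partitions: int,
                       broker_ids: list[int], replication_factor: int) -> dict:
    """Build a reassignment plan in the JSON shape read by kafka-reassign-partitions.sh.

    Replicas are placed round-robin over broker_ids, so the first replica
    (the preferred leader) differs per partition. All inputs are hypothetical.
    """
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    partitions = []
    for p in range(num_partitions):
        replicas = [broker_ids[(p + i) % len(broker_ids)] for i in range(replication_factor)]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}

# Hypothetical cluster: brokers 0, 1, 2; raise "my-topic" (3 partitions) to RF 3.
plan = build_reassignment("my-topic", num_partitions=3, broker_ids=[0, 1, 2], replication_factor=3)
print(json.dumps(plan, indent=2))  # redirect to increase.json
```

Redirect the output to increase.json. In recent Kafka versions, you can then apply it with `kafka-reassign-partitions.sh --bootstrap-server <broker> --reassignment-json-file increase.json --execute` and check progress with `--verify`.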

Creating a Data Pipeline with Spark Streaming, Kafka and Cassandra

Hi folks! In this blog, we are going to learn how to integrate Spark Structured Streaming with Kafka and Cassandra to build a simple data pipeline. Spark Structured Streaming is a component of the Apache Spark framework that enables scalable, high-throughput, fault-tolerant processing of data streams. Apache Kafka is a scalable, high-performance, low-latency platform that allows reading and writing streams of data …


Set Up a Kafka Cluster Using a Kubernetes StatefulSet

Hi readers, in this blog we will set up a Kafka StatefulSet cluster using Kubernetes and get a basic understanding of StatefulSets. StatefulSet: A StatefulSet is the workload API object used to manage stateful applications. It manages the deployment and scaling of a set of Pods and provides guarantees about the ordering and uniqueness of these Pods. Kafka: Apache Kafka is an open-source stream-processing software platform developed by …

Comparison between different streaming engines

Distributed stream processing engines have been on the rise in the last few years. Hadoop first became popular as a batch processing engine, and then the focus shifted towards stream processing engines. Stream processing engines make it easier than ever to process data that arrives as a stream, and clustering lets them process larger data sets in a timely manner.

Lagom: Let's Stream Kafka Messages and Process Them Using an Akka Actor

Lagom is an open-source framework for building reactive applications in Java or Scala. It is built on Akka and Play, well-known technologies that run in production in some of the most performance-centric and scalable application systems. Lagom has continually proven itself a user-friendly and convenient framework for designing and developing scalable microservices. However, microservices can be based either on orchestration, …

Serialization in Kafka

Serialization is the process of converting an object into a stream of bytes for transmission. Kafka stores and transmits these byte arrays in its queue. Deserialization, as the name suggests, does the opposite: it converts byte arrays back into the desired data type.
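To make the round trip concrete, here is a plain-Python sketch rather than a Kafka client API. It follows the shape of Kafka's Java `Serializer.serialize(topic, data)` and `Deserializer.deserialize(topic, data)` interfaces, and uses JSON as the wire format. The `User` type and the `users` topic name are hypothetical.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class User:  # hypothetical payload type
    id: int
    name: str

class UserSerializer:
    """Object -> bytes, mirroring the shape of Kafka's Serializer.serialize(topic, data)."""
    def serialize(self, topic: str, data: User) -> bytes:
        return json.dumps(asdict(data)).encode("utf-8")

class UserDeserializer:
    """Bytes -> object, mirroring the shape of Kafka's Deserializer.deserialize(topic, data)."""
    def deserialize(self, topic: str, data: bytes) -> User:
        return User(**json.loads(data.decode("utf-8")))

payload = UserSerializer().serialize("users", User(id=1, name="alice"))
print(payload)  # b'{"id": 1, "name": "alice"}'
restored = UserDeserializer().deserialize("users", payload)
print(restored == User(id=1, name="alice"))  # True
```

The broker only ever sees `payload` as opaque bytes. Producer and consumer must agree on the format, which is why they are configured with matching serializer and deserializer classes.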

Rebalancing: What's All the Fuss About?

Apache Kafka rules the world of Big Data. It is not just a messaging queue but a full-fledged event streaming platform. We have looked at the basic idea of Kafka and what makes it faster than other messaging queues; you can read about it in my previous blog. We also looked at partitions, replicas, and ISR. We are now ready for our …


Apache Kafka: Topic Partitions, Replicas & ISR

In earlier blogs, we went through the basic terminology of Kafka and went one step deeper into ZooKeeper. Now let's talk in detail about topic partitions and replicas. Topic Partitions: A topic is a placeholder for your data in Kafka. Data in a topic is further divided into partitions. Each partition is an ordered, immutable sequence of records that is continually appended to a …
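The partition model can be illustrated with a toy in-memory topic. Each partition is an append-only list of immutable records, and each record gets the next sequential offset. Records with the same key always land in the same partition. This is only a sketch: the CRC32-based partitioner is a stand-in (Kafka's default partitioner hashes keys with murmur2), and the class names, key, and values are hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)  # frozen: a record cannot be modified once written
class Record:
    offset: int
    key: str
    value: Any

class Partition:
    """Ordered, append-only sequence of records with sequential offsets."""
    def __init__(self) -> None:
        self._records: list[Record] = []

    def append(self, key: str, value: Any) -> int:
        offset = len(self._records)  # next offset in this partition
        self._records.append(Record(offset, key, value))
        return offset

    def read(self, from_offset: int) -> list[Record]:
        return self._records[from_offset:]  # a copy; callers cannot rewrite history

class Topic:
    def __init__(self, num_partitions: int) -> None:
        self.partitions = [Partition() for _ in range(num_partitions)]

    def send(self, key: str, value: Any) -> tuple[int, int]:
        # Stand-in partitioner (CRC32, not Kafka's murmur2): same key -> same partition.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return p, self.partitions[p].append(key, value)

topic = Topic(num_partitions=3)
for i in range(6):
    p, offset = topic.send("order-42", f"event-{i}")
print([r.offset for r in topic.partitions[p].read(0)])  # [0, 1, 2, 3, 4, 5]
```

Ordering is guaranteed only within a partition. Here that holds for every `order-42` event, because the key pins them all to one partition.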