Apache Kafka

Creating a Data Pipeline with Spark Streaming, Kafka and Cassandra

Reading Time: 3 minutes Hi Folks!! In this blog, we are going to learn how we can integrate Spark Structured Streaming with Kafka and Cassandra to build a simple data pipeline. Spark Structured Streaming is a component of the Apache Spark framework that enables scalable, high-throughput, fault-tolerant processing of data streams. Apache Kafka is a scalable, high-performance, low-latency platform that allows reading and writing streams of data Continue Reading
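
As a rough sketch of what such a pipeline could look like, assuming the spark-sql-kafka and spark-cassandra-connector dependencies are on the classpath and using a hypothetical local broker, `events` topic, and `demo_ks.events` Cassandra table, the read-from-Kafka, write-to-Cassandra loop might be wired up like this:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object KafkaToCassandra extends App {
  val spark = SparkSession.builder()
    .appName("kafka-to-cassandra")
    .master("local[*]")                               // local run for the demo
    .getOrCreate()

  // Read the raw stream from Kafka; broker address and topic name are assumptions.
  val events = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
    .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

  // Write each micro-batch to Cassandra through the Spark Cassandra Connector.
  val writeToCassandra: (DataFrame, Long) => Unit = (batch, _) =>
    batch.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "demo_ks", "table" -> "events")) // hypothetical keyspace/table
      .mode("append")
      .save()

  events.writeStream
    .foreachBatch(writeToCassandra)
    .option("checkpointLocation", "/tmp/kafka-to-cassandra-checkpoint")
    .start()
    .awaitTermination()
}
```

Doing the Cassandra write inside foreachBatch keeps it a plain batch write, so the same connector code works whether the source is streaming or static.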


Set up a Kafka Cluster using Kubernetes StatefulSet

Reading Time: 3 minutes Hi readers, in this blog, we will be setting up a Kafka cluster as a Kubernetes StatefulSet and also getting a basic knowledge of StatefulSets. StatefulSet: A StatefulSet is the workload API object used to manage stateful applications. It manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods. Kafka: Apache Kafka is an open-source stream-processing software platform developed by Continue Reading

Comparison between different streaming engines

Reading Time: 5 minutes Distributed stream processing engines have been on the rise in the last few years. First, Hadoop became popular as a batch processing engine; then the focus shifted towards stream processing engines. Stream processing engines can make the job of processing data that comes in via a stream easier than ever before, and by using clustering they can process larger data sets in a timely manner. Continue Reading

Lagom: Let's Stream Kafka Messages and Process Them Using an Akka Actor

Reading Time: 5 minutes Lagom is an open-source framework for building reactive applications using Java or Scala. It is built on Akka and Play, well-known technologies performing in production in some of the most performance-centric and scalable application systems. Lagom has continuously proven itself to be a user-friendly and convenient framework for designing and developing scalable microservices. However, microservices can either be based on orchestration, Continue Reading
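
The post itself uses Lagom's message broker API; as a loose illustration of the same consume-and-hand-to-an-actor pattern, here is a sketch using plain Alpakka Kafka (akka-stream-kafka) instead, where the broker address, topic, and consumer group id are all assumptions:

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.kafka.scaladsl.Consumer
import akka.stream.scaladsl.Sink
import org.apache.kafka.common.serialization.StringDeserializer

// A simple actor that "processes" every Kafka record value it receives.
class MessageProcessor extends Actor {
  def receive: Receive = {
    case value: String => println(s"Processing: $value")
  }
}

object KafkaToActor extends App {
  implicit val system: ActorSystem = ActorSystem("kafka-to-actor")
  val processor = system.actorOf(Props[MessageProcessor](), "processor")

  val consumerSettings =
    ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092")   // assumed local broker
      .withGroupId("lagom-demo-group")          // hypothetical consumer group

  // Stream records off the topic and hand each value to the actor.
  Consumer
    .plainSource(consumerSettings, Subscriptions.topics("greetings")) // hypothetical topic
    .map(_.value())
    .runWith(Sink.foreach[String](processor ! _))
}
```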

Serialization in Kafka

Reading Time: 2 minutes Serialization is the process of converting an object into a stream of bytes that is used for transmission. Kafka stores and transmits these byte arrays in its queue. Deserialization, as the name suggests, does the opposite of serialization: we convert byte arrays back into the desired data type. Apache Kafka both stores and transmits these byte arrays in its queue. Continue Reading
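
A minimal sketch of what a custom serializer/deserializer pair can look like with a recent kafka-clients API (where only serialize/deserialize need overriding); the User type and its comma-separated encoding are purely illustrative:

```scala
import java.nio.charset.StandardCharsets
import org.apache.kafka.common.serialization.{Deserializer, Serializer}

// Hypothetical domain type used only for illustration.
case class User(id: Int, name: String)

// Serializer: turns a User into the byte array Kafka stores and transmits.
class UserSerializer extends Serializer[User] {
  override def serialize(topic: String, user: User): Array[Byte] =
    s"${user.id},${user.name}".getBytes(StandardCharsets.UTF_8)
}

// Deserializer: turns the byte array back into a User on the consuming side.
class UserDeserializer extends Deserializer[User] {
  override def deserialize(topic: String, bytes: Array[Byte]): User = {
    val Array(id, name) = new String(bytes, StandardCharsets.UTF_8).split(",", 2)
    User(id.toInt, name)
  }
}
```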

Rebalancing: What's All the Fuss About?

Reading Time: 4 minutes Apache Kafka is ruling the world of Big Data. It is not just a messaging queue but a full-fledged event streaming platform. We have looked through the basic idea of Kafka and what makes it faster than any other messaging queue; you can read about it in my previous blog. We also looked through partitions, replicas, and ISR. We are now ready for our Continue Reading
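
As a small illustration of where rebalancing becomes visible to application code, a consumer can register a ConsumerRebalanceListener to see partitions being revoked and reassigned whenever the group membership changes; the broker address, group id, and topic name below are assumptions:

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRebalanceListener, KafkaConsumer}
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer

object RebalanceWatcher extends App {
  val props = new Properties()
  props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")   // assumed local broker
  props.put(ConsumerConfig.GROUP_ID_CONFIG, "rebalance-demo")            // hypothetical group
  props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
  props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)

  val consumer = new KafkaConsumer[String, String](props)

  // The listener fires every time the group coordinator revokes or assigns partitions,
  // i.e. whenever consumers join or leave the group and a rebalance takes place.
  consumer.subscribe(Collections.singletonList("orders"), new ConsumerRebalanceListener {
    override def onPartitionsRevoked(partitions: java.util.Collection[TopicPartition]): Unit =
      println(s"Revoked:  ${partitions.asScala.mkString(", ")}")
    override def onPartitionsAssigned(partitions: java.util.Collection[TopicPartition]): Unit =
      println(s"Assigned: ${partitions.asScala.mkString(", ")}")
  })

  while (true) consumer.poll(Duration.ofMillis(500))  // keep polling so the consumer stays in the group
}
```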


Apache Kafka: Topic Partitions, Replicas & ISR

Reading Time: 6 minutes In earlier blogs, we have gone through the basic terminologies of Kafka and taken one step deeper into Zookeeper. Now let's talk in detail about topic partitions and replicas. Topic Partitions: A topic is a placeholder for your data in Kafka. Data in a topic is further divided into partitions. Each partition is an ordered, immutable sequence of records that is continually appended to a Continue Reading
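
A quick sketch of how partitions and replicas are declared when creating a topic with the Kafka AdminClient; the topic name, partition count, and replication factor here are illustrative:

```scala
import java.util.{Collections, Properties}
import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig, NewTopic}

object CreatePartitionedTopic extends App {
  val props = new Properties()
  props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // assumed local broker

  val admin = AdminClient.create(props)

  // A topic split into 3 partitions, each kept on 2 replicas; the in-sync replicas (ISR)
  // are the subset of those replicas currently caught up with the partition leader.
  val topic = new NewTopic("orders", 3, 2.toShort) // hypothetical topic name
  admin.createTopics(Collections.singletonList(topic)).all().get()

  admin.close()
}
```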

Streaming from Kafka to PostgreSQL through Spark Structured Streaming

Reading Time: 3 minutes Hello everyone, in this blog we are going to learn how to do structured streaming in Spark with Kafka and PostgreSQL on our local system. We will be doing all this using Scala, so without any further pause, let's begin. Setting up the necessities first: 1. Dependencies: Set up the required dependencies for Scala, Spark, Kafka, and PostgreSQL. 2. PostgreSQL setup: Let's start fresh by Continue Reading
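
A condensed sketch of where this ends up, assuming the spark-sql-kafka package and the PostgreSQL JDBC driver are on the classpath; the topic, database, table, and credentials are placeholders:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object KafkaToPostgres extends App {
  val spark = SparkSession.builder()
    .appName("kafka-to-postgres")
    .master("local[*]")                                  // local run, as in the post's setup
    .getOrCreate()

  // Read the raw Kafka stream; broker address and topic are assumptions.
  val messages = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "demo-topic")
    .load()
    .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

  // Write each micro-batch to PostgreSQL over JDBC; connection details are placeholders.
  val writeToPostgres: (DataFrame, Long) => Unit = (batch, _) =>
    batch.write
      .format("jdbc")
      .option("url", "jdbc:postgresql://localhost:5432/demo_db")
      .option("driver", "org.postgresql.Driver")
      .option("dbtable", "public.kafka_messages")
      .option("user", "postgres")
      .option("password", "postgres")
      .mode("append")
      .save()

  messages.writeStream
    .foreachBatch(writeToPostgres)
    .option("checkpointLocation", "/tmp/kafka-to-postgres-checkpoint")
    .start()
    .awaitTermination()
}
```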

Apache Zookeeper: Does Kafka need it?

Reading Time: 3 minutes In my previous blog, we started with what Kafka is and what makes Kafka fast. If you haven't read it already, you should give it a read. We also talked briefly about Zookeeper. We know that Zookeeper keeps track of the status of the Kafka cluster nodes, and it also keeps track of Kafka topics, partitions, etc. But what else? In this blog, we will learn more Continue Reading

Apache Kafka: What & Why?

Reading Time: 6 minutes What is Apache Kafka? Apache Kafka is a well-known name in the world of Big Data. It is one of the most used distributed streaming platforms. Kafka is not just a messaging queue but a full-fledged event streaming platform. It is a framework for storing, reading, and analyzing streaming data, and a publish-subscribe-based durable messaging system for exchanging data between processes, applications, and servers. Continue Reading

Data Lake – Build it in Phases

Reading Time: 3 minutes Data Lake – how to build a data lake and the phases involved in building one.

Big Data Landscape explained

Reading Time: 5 minutes Big Data has now evolved into a buzzword, and it seems everyone is either working on it or wants to work on it. However, most people associate Big Data with some of the popular toolsets like Hadoop, Spark, and NoSQL databases like Hive, Cassandra, HBase, etc. HDFS made Big Data popular as it gave us an option to distribute the data Continue Reading