Apache Kafka

Kafka Connect Concepts

Reading Time: 3 minutes Kafka Connect is a framework for streaming data into and out of Apache Kafka®. The Confluent Platform ships with a few built-in connectors that can be used to stream data to or from commonly used systems such as relational databases or HDFS. To effectively discuss the internal functionality of Kafka Connect, it is helpful to establish a few key concepts: Connectors, Tasks, Workers, Converters, and Transforms. Continue Reading
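
As a quick taste of the Connector and Task concepts, here is a minimal sketch, in Scala, of the configuration a standalone Connect worker would load for the built-in FileStreamSource connector; the connector name, file path, and topic are illustrative assumptions, not values from the post.

```scala
import java.util.Properties

// Minimal sketch: the settings a standalone Connect worker reads from a
// connector properties file. Names and paths are illustrative assumptions.
object FileSourceConnectorConfig {
  def props: Properties = {
    val p = new Properties()
    p.put("name", "local-file-source")           // unique connector name
    p.put("connector.class", "FileStreamSource") // built-in file source connector
    p.put("tasks.max", "1")                      // how many Tasks the Connector may spawn
    p.put("file", "/tmp/input.txt")              // source file to tail
    p.put("topic", "connect-demo")               // Kafka topic to write records to
    p
  }
}
```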

Apache Camel vs Apache Kafka

Reading Time: 4 minutes An overview of Camel Apache Camel is an open-source integration framework for connecting different systems. Camel is a routing engine, more precisely a routing-engine builder. It allows you to define your own routing rules, decide from which sources to accept messages, and determine how to process and send those messages to other destinations. For its Routes, Camel uses a set Continue Reading
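
As a taste of what those routing rules look like, here is a minimal sketch of a Camel RouteBuilder written in Scala against Camel's Java DSL; the endpoint URIs and the transformation are illustrative assumptions.

```scala
import org.apache.camel.builder.RouteBuilder
import org.apache.camel.impl.DefaultCamelContext

// Minimal sketch of a Camel route: pick up files from an input directory,
// transform the body, and hand the result to another endpoint.
// The endpoint URIs ("file:inbox", "file:outbox") are illustrative assumptions.
object CamelRouteDemo extends App {
  val context = new DefaultCamelContext()
  context.addRoutes(new RouteBuilder() {
    override def configure(): Unit = {
      from("file:inbox")                          // source endpoint
        .transform().simple("Processed: ${body}") // simple message transformation
        .to("file:outbox")                        // destination endpoint
    }
  })
  context.start()
  Thread.sleep(5000)                              // let the route run briefly
  context.stop()
}
```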

Security & SSL Setup in Confluent Kafka

Reading Time: 2 minutes What is SSL? Secure Sockets Layer (SSL) is a security protocol for the transport layer. In the SSL protocol, data is divided into fragments. The fragments are compressed, and a Message Authentication Code (MAC), generated by algorithms like the Secure Hash Algorithm (SHA) and MD5 (Message Digest), is appended. SSL is the predecessor of Transport Layer Security (TLS). After encryption of the data, finally, the SSL header is appended Continue Reading
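
For a rough idea of the client side of such a setup, here is a minimal sketch of the SSL settings a Kafka client adds on top of its usual configuration; the listener port, store paths, and passwords are illustrative assumptions for a local setup.

```scala
import java.util.Properties

// Minimal sketch of SSL client configuration for Kafka. Paths and passwords
// are illustrative assumptions; brokers must expose an SSL listener.
object SslClientConfig {
  def sslProps: Properties = {
    val p = new Properties()
    p.put("security.protocol", "SSL")                                    // switch from PLAINTEXT to SSL/TLS
    p.put("ssl.truststore.location", "/tmp/kafka.client.truststore.jks") // trusts the broker's certificate
    p.put("ssl.truststore.password", "changeit")
    p.put("ssl.keystore.location", "/tmp/kafka.client.keystore.jks")     // only needed when brokers require client auth
    p.put("ssl.keystore.password", "changeit")
    p.put("ssl.key.password", "changeit")
    p
  }
}
```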

Introduction To Apache Kafka

Reading Time: 6 minutes Introduction Apache Kafka is a framework implementation of a software bus using stream processing. It is an open-source platform, developed by the Apache Software Foundation and written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Kafka can connect to external systems (for data import/export) via Kafka Connect and provides Kafka Streams, a Java stream-processing library. Apache Continue Reading
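
Since the post mentions Kafka Streams, here is a minimal sketch of a Streams topology written in Scala against the Java API, assuming a recent kafka-streams artifact; the topic names and the uppercase transformation are illustrative assumptions.

```scala
import java.util.Properties
import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder, StreamsConfig}
import org.apache.kafka.streams.kstream.{KStream, ValueMapper}

// Minimal sketch: read from one topic, uppercase each value, write to another.
object StreamsDemo extends App {
  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "intro-demo")
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, classOf[Serdes.StringSerde])
  props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, classOf[Serdes.StringSerde])

  val builder = new StreamsBuilder()
  val lines: KStream[String, String] = builder.stream[String, String]("raw-events")
  val toUpper: ValueMapper[String, String] = new ValueMapper[String, String] {
    override def apply(v: String): String = v.toUpperCase
  }
  lines.mapValues(toUpper).to("upper-events") // trivial stateless transformation

  val streams = new KafkaStreams(builder.build(), props)
  streams.start()
  sys.addShutdownHook(streams.close())
}
```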

Apache Kafka Use Cases and Applications

Reading Time: 3 minutes The amount of data generated has multiplied manyfold due to the dominance of the digital age. Thus, enterprises that wish to stay in business and remain relevant, today and in the future, must learn how to manage large volumes of data on a strong, scalable, and flexible platform. Apache Kafka is one of the means to achieve this. Apache Kafka, Continue Reading

A Quick Demo: Kafka to Flink to Cassandra

Reading Time: 3 minutes Hi Folks!! In this blog, we are going to learn how we can integrate Flink with Kafka and Cassandra to build a simple streaming data pipeline. Apache Flink is a framework and distributed processing engine used for stateful computations over unbounded and bounded data streams. Kafka is a scalable, high-performance, low-latency platform that allows reading and writing streams of data like a messaging system. Cassandra is a distributed, wide-column Continue Reading
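
A rough sketch of that pipeline follows, using the older FlinkKafkaConsumer API (newer Flink versions replace it with KafkaSource) and the flink-connector-cassandra artifact; the broker address, topic, keyspace, and table are illustrative assumptions.

```scala
import java.util.Properties
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.cassandra.CassandraSink
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

// Rough sketch: Kafka source -> map to tuples -> Cassandra sink.
object KafkaFlinkCassandra extends App {
  val env = StreamExecutionEnvironment.getExecutionEnvironment

  val props = new Properties()
  props.setProperty("bootstrap.servers", "localhost:9092")
  props.setProperty("group.id", "flink-demo")

  // Source: consume string messages from a Kafka topic.
  val messages: DataStream[String] =
    env.addSource(new FlinkKafkaConsumer[String]("events", new SimpleStringSchema(), props))

  // Map each message to a tuple matching the Cassandra table's columns.
  val rows: DataStream[(String, Long)] = messages.map(m => (m, System.currentTimeMillis()))

  // Sink: write each tuple with a parameterized CQL insert.
  CassandraSink.addSink(rows)
    .setQuery("INSERT INTO demo.events (message, ts) VALUES (?, ?);")
    .setHost("127.0.0.1")
    .build()

  env.execute("kafka-flink-cassandra")
}
```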

Creating Data Pipeline with Spark streaming, Kafka and Cassandra

Reading Time: 3 minutes Hi Folks!! In this blog, we are going to learn how we can integrate Spark Structured Streaming with Kafka and Cassandra to build a simple data pipeline. Spark Structured Streaming is a component of the Apache Spark framework that enables scalable, high-throughput, fault-tolerant processing of data streams. Apache Kafka is a scalable, high-performance, low-latency platform that allows reading and writing streams of data Continue Reading
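
A minimal sketch of such a pipeline, assuming Spark 3.x with the spark-sql-kafka and spark-cassandra-connector packages on the classpath; the topic, keyspace, table, and column names are illustrative assumptions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Minimal sketch: stream from Kafka, write each micro-batch to Cassandra.
object KafkaToCassandra extends App {
  val spark = SparkSession.builder().appName("kafka-to-cassandra").master("local[*]").getOrCreate()

  val lines = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
    .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload")

  val query = lines.writeStream
    .foreachBatch { (batch: DataFrame, _: Long) =>
      batch.write
        .format("org.apache.spark.sql.cassandra")          // spark-cassandra-connector data source
        .options(Map("keyspace" -> "demo", "table" -> "events"))
        .mode("append")
        .save()
    }
    .option("checkpointLocation", "/tmp/kafka-cassandra-ckpt")
    .start()

  query.awaitTermination()
}
```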

Knime Meets Apache Kafka

Reading Time: 4 minutes Knime taps into the power of Apache Kafka’s pub-sub mechanism by introducing the Kafka extension. It adds nodes to connect to, read from, and publish to a Kafka cluster.

Serialization in Kafka

Reading Time: 2 minutes Serialization is the process of converting an object into a stream of bytes that can be used for transmission. Kafka stores and transmits these byte arrays in its queue. Deserialization, as the name suggests, does the opposite of serialization: it converts byte arrays back into the desired data type. Continue Reading
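
To make this concrete, here is a minimal sketch of a custom Serializer/Deserializer pair, assuming a Kafka clients version (2.x or later) where configure and close have default implementations; the User type and its naive CSV encoding are illustrative assumptions.

```scala
import java.nio.charset.StandardCharsets
import org.apache.kafka.common.serialization.{Deserializer, Serializer}

// Illustrative record type for the sketch.
case class User(name: String, age: Int)

// Turns a User into the byte array Kafka stores and transmits.
class UserSerializer extends Serializer[User] {
  override def serialize(topic: String, data: User): Array[Byte] =
    if (data == null) null
    else s"${data.name},${data.age}".getBytes(StandardCharsets.UTF_8) // naive CSV encoding
}

// Turns the byte array back into the desired data type.
class UserDeserializer extends Deserializer[User] {
  override def deserialize(topic: String, bytes: Array[Byte]): User =
    if (bytes == null) null
    else {
      val Array(name, age) = new String(bytes, StandardCharsets.UTF_8).split(",", 2)
      User(name, age.toInt)
    }
}
```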

Rebalancing: What the fuss is all about?

Reading Time: 4 minutes Apache Kafka is ruling the world of Big Data. It is not just a messaging queue but a full-fledged event streaming platform. We have looked through the basic idea of Kafka and what makes it faster than any other messaging queue; you can read about it in my previous blog. We also looked through Partitions, Replicas, and ISR. We are now ready for our Continue Reading
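
As a preview of where rebalancing surfaces in client code, here is a minimal sketch of a consumer that logs partition assignment changes through a ConsumerRebalanceListener; the broker address, topic, and group id are illustrative assumptions.

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.{ConsumerRebalanceListener, KafkaConsumer}
import org.apache.kafka.common.TopicPartition

// Minimal sketch: observe the two sides of a rebalance from a consumer.
object RebalanceDemo extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("group.id", "rebalance-demo")
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(Collections.singletonList("events"), new ConsumerRebalanceListener {
    // Fired before partitions are taken away; a natural place to commit offsets.
    override def onPartitionsRevoked(partitions: java.util.Collection[TopicPartition]): Unit =
      println(s"Revoked ${partitions.size()} partition(s): $partitions")
    // Fired once the group coordinator hands this consumer its new assignment.
    override def onPartitionsAssigned(partitions: java.util.Collection[TopicPartition]): Unit =
      println(s"Assigned ${partitions.size()} partition(s): $partitions")
  })

  while (true) consumer.poll(Duration.ofMillis(500)).forEach(r => println(r.value()))
}
```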

Streaming from Kafka to PostgreSQL through Spark Structured Streaming

Reading Time: 3 minutes Hello everyone, in this blog we are going to learn how to do structured streaming in Spark with Kafka and PostgreSQL in our local system. We will be doing all this using Scala, so without any further pause, let's begin. Setting up the necessities first: 1. Dependencies: set up the required dependencies for Scala, Spark, Kafka, and PostgreSQL. 2. PostgreSQL setup: let's start fresh by Continue Reading
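
A minimal sketch of the finished pipeline, assuming Spark 3.x with the spark-sql-kafka package and the PostgreSQL JDBC driver on the classpath; the database, credentials, topic, and table names are illustrative assumptions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Minimal sketch: stream from Kafka, write each micro-batch to PostgreSQL.
object KafkaToPostgres extends App {
  val spark = SparkSession.builder().appName("kafka-to-postgres").master("local[*]").getOrCreate()

  val messages = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
    .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

  val query = messages.writeStream
    .foreachBatch { (batch: DataFrame, _: Long) =>
      batch.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://localhost:5432/demo")
        .option("dbtable", "public.events")
        .option("user", "postgres")
        .option("password", "postgres")
        .mode("append")
        .save()
    }
    .option("checkpointLocation", "/tmp/kafka-postgres-ckpt")
    .start()

  query.awaitTermination()
}
```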

Apache Kafka: What & Why?

Reading Time: 6 minutes What is Apache Kafka? Apache Kafka is a well-known name in the world of Big Data. It is one of the most used distributed streaming platforms. Kafka is not just a messaging queue but a full-fledged event streaming platform: a framework for storing, reading, and analyzing streaming data. It is a publish-subscribe-based durable messaging system that exchanges data between processes, applications, and servers. Continue Reading
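
As a small illustration of the publish side of that publish-subscribe model, here is a minimal sketch of a producer appending a record to a topic's durable log; the broker address, topic, key, and value are illustrative assumptions.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Minimal sketch of the publish side of Kafka's pub-sub model.
object SimplePublisher extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  val producer = new KafkaProducer[String, String](props)
  producer.send(new ProducerRecord("events", "user-1", "signed-in")) // durable append to the topic's log
  producer.close()
}
```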

Using Vertica with Spark-Kafka: Write using Structured Streaming

Reading Time: 3 minutes In two previous blogs, we explored Vertica and how it can be connected to Apache Spark. The first blog in this mini-series was about reading data from Vertica using Spark and saving that data into Kafka. The next blog explained the reverse flow, i.e. reading data from Kafka and writing it to Vertica, but in batch mode, i.e. reading data from Kafka Continue Reading
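
The streaming pipeline changes little from the PostgreSQL sketch above when the target is Vertica, so here is a minimal sketch of just the sink side, written for a foreachBatch like those pipelines; the JDBC URL, credentials, and table name are illustrative assumptions, and the posts themselves may use Vertica's dedicated Spark connector instead of plain JDBC.

```scala
import org.apache.spark.sql.DataFrame

// Minimal sketch of the sink side only: write one micro-batch to Vertica over
// JDBC, for use inside a Structured Streaming foreachBatch. URL, credentials,
// and table name are illustrative assumptions.
object VerticaSink {
  def writeBatch(batch: DataFrame): Unit =
    batch.write
      .format("jdbc")
      .option("url", "jdbc:vertica://localhost:5433/demo")
      .option("driver", "com.vertica.jdbc.Driver")
      .option("dbtable", "public.events")
      .option("user", "dbadmin")
      .option("password", "")
      .mode("append")
      .save()
}
```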