kafka

Using Apache Flink for Kinesis to Kafka Connect

Reading Time: 3 minutes In this blog, we are going to use Kinesis as a source and Kafka as a sink. Let’s get started. Step 1: Apache Flink provides the Kinesis and Kafka connector dependencies. Let’s add them to our build.sbt: Step 2: The next step is to create a pointer to the environment in which this program runs. Step 3: Setting parallelism of x here will cause all Continue Reading

Comparison between different streaming engines

Reading Time: 5 minutes Distributed stream processing engines have been on the rise in the last few years. First Hadoop became popular as a batch processing engine; then the focus shifted towards stream processing engines. Stream processing engines make the job of processing data that arrives via a stream easier than ever before and, through clustering, can process larger data sets in a timely manner. Continue Reading

Apache Kafka: What & Why?

Reading Time: 6 minutes What is Apache Kafka? Apache Kafka is a well-known name in the world of Big Data. It is one of the most widely used distributed streaming platforms. Kafka is not just a messaging queue but a full-fledged event streaming platform. It is a framework for storing, reading, and analyzing streaming data. It is a durable publish-subscribe messaging system for exchanging data between processes, applications, and servers. Continue Reading

Understanding data persistence in Lagom

Reading Time: 4 minutes When we create any microservice, or in general any service, one of the biggest tasks is to manage data persistence. Lagom supports various databases for this task. By default, Lagom uses Cassandra to persist data.

Kafka Timestamp Extractor

Reading Time: 3 minutes Hi folks, I hope you are all doing well. If you have landed here, you are probably looking for the Timestamp Extractor for Kafka Streams. So what is all the buzz about? In this blog we are going to look at what it is and explore it as well, so buckle up. The Timestamp Extractor As per the docs, a timestamp extractor extracts a timestamp from an Continue Reading

Custom Partitioner in Kafka: Let’s Take Quick Tour!

Reading Time: 5 minutes In this blog, we are going to explore the Kafka partitioner. We will try to understand why the default partitioner is not enough and when you might need a custom partitioner. We will also look at a use case and write code for the custom partitioner. I assume that you have a sound knowledge of Kafka. Let’s understand the behavior of the default partitioner. The default Continue Reading

Diving deeper into Delta Lake

Reading Time: 6 minutes Delta Lake is an open-source storage layer that brings reliability to data lakes. It has numerous reliability features including ACID transactions, scalable metadata handling, and unified streaming and batch data processing.

Using Vertica with Spark-Kafka: Writing

Reading Time: 4 minutes In the previous blog of this series, we took a glance at the basic definitions of Spark and Vertica. We also did a code overview for reading data from Vertica using Spark as a DataFrame and saving the data into Kafka. In this blog we will work through the reverse flow, i.e., reading the data from Kafka as a DataFrame and writing that DataFrame into Continue Reading

Using Vertica with Spark-Kafka: Reading

Reading Time: 4 minutes We live in a world of big data, where the volume of data is huge even for small results. This is the result of the rapid increase in data collection in the modern world. This sheer volume of data calls for tools that can work on such large amounts of data. I am pretty sure that you guys Continue Reading

Take a deep dive into Kafka – Producer API

Reading Time: 4 minutes I am going to start a series of blogs on the Kafka API, and this blog is part of that series. In this blog, we are going to learn about the Producer API. If you are new to Kafka, I recommend that you first get a basic idea of Kafka from kafka-quickstart. There are many reasons an application might Continue Reading

Knolx: Alpakka-Connecting Kafka & ElasticSearch to Akka Streams

Reading Time: < 1 minute Hi all, Knoldus organized a 30-minute session on 1st March 2019 at 3:30 PM. The topic was Alpakka – Connecting Kafka and ElasticSearch to Akka Streams. Many people joined and enjoyed the session. I am going to share the slides here. Please let me know if you have any questions related to the linked slides or video. The slides of the KnolX are here: And Continue Reading

Flinkathon: What makes Flink better than Kafka Streams?

Reading Time: 2 minutes Initially, I would like you all to focus on a few questions before comparing the frameworks: 1. Is there any comparison or similarity between Flink and Kafka? 2. What could be better in Flink than in Kafka? 3. Is it the problem or the system requirement that decides the use of one over the other? Before talking about Flink's advantages and use cases over Kafka, let’s first understand their Continue Reading