Reading Time: 6 minutes Delta Lake is an open-source storage layer that brings reliability to data lakes. It has numerous reliability features including ACID transactions, scalable metadata handling, and unified streaming and batch data processing.
Reading Time: 2 minutes Kafka Streams is a Client library where the input and output data are stored in an Apache Kafka cluster. It combines the simplicity of building and deploying Java and Scala processing applications with Kafka topics on the client side with the benefits of Kafka’s server-side cluster technology. When working with Kafka Streams, there are times when the stream processing application requires integration with data external Continue Reading
Reading Time: 2 minutes Initially, I would like you all to focus on a few questions before comparing the frameworks:1. Is there any comparison or similarity between Flink and the Kafka?2. What could be better in Flink over the Kafka?3. Is it the problem or system requirement to use one over the other? Before talking about the Flink betterment and use cases over the Kafka, let’s first understand their Continue Reading
Reading Time: 3 minutes By now you must be familiar with KSQL and how to get started with it. If not, check out the Part1 KSQL: Getting started with Streaming SQL for Apache Kafka of this series. In this blog, we’ll move one step forward to get an understanding of the Dual streaming model to see what abstractions does KSQL use to process the data. All the data that we Continue Reading
Reading Time: 4 minutes Apache Kafka v0.10 introduced a new feature Kafka Streams API – a client library which can be used for building applications and microservices, where the input and output data can be stored in Kafka clusters. Kafka Streams provides state stores, which can be used by stream processing applications to store and query data. Every task in Kafka Streams uses one or more state stores which Continue Reading
Reading Time: < 1 minute Hello everyone, Knoldus organized a session on 22nd September 2017. The topic was “Learning Kafka Streams with Scala”. Many people attended and enjoyed the session. In this blog post, I am going to share the slides & video of the session. Slides: Video: If you have any query, then please feel free to comment below.
Reading Time: 3 minutes Introduction to core concepts: Apache Kafka is a distributed streaming platform which enables you to publish and subscribe to a stream of records also letting you process this stream of records as it occurs. Kafka Streams is a client library used for building applications and microservices, where the input and output data are stored in Kafka clusters. Interface KStream<K, V> is an abstraction of Continue Reading
Reading Time: 2 minutes In our previous blog – Self-Learning Kafka Streams with Scala – #1, we saw how to create a simple KStream in Scala. In this blog, we will see how to transform a KStream and create a new Stream from it. But, before we get into the details of the KStream transformations, let’s take a look at the code:
Reading Time: 3 minutes Kafka Streams is a powerful API. In Kafka, we can only store our data for consumers to consume. But we always needed a processor with which we can process the data without going to an external tool like Spark, Storm etc. To know more about this and for a quick start you can check out the first blog of this series. The Need Now here Continue Reading
Reading Time: 2 minutes A few days ago, I came across a situation where I wanted to do a stateful operation on the streaming data. So, I started finding possible solutions for it. I came across many solutions which were using different technologies like Spark Structured Streaming, Apache Flink, Kafka Streams, etc. All the solutions solved my problem, but I selected Kafka Streams because it met most of my Continue Reading
Reading Time: 5 minutes Whenever we hear the word Kafka, all we think about it as a messaging system with a publisher-subscriber model that we use for our streaming applications as a source and a sink. So we can say that Kafka is just a dumb storage system that stores the data provided by a producer for a long time (configurable) and it can provide it to some consumer whenever Continue Reading