Streaming

Flinkathon: Guide to setting up a Local Flink Custer

In our previous blog post, Flinkathon: First Step towards Flink’s DataStream API, we created our first streaming application using Apache Flink. It was easy, clean, and concise. However, the real power of Apache Flink is seen on a cluster, where data is processed in a distributed manner, with the advantage of multi-core/multi-memory systems. So, in this blog post, we will see how to set up Continue Reading

Determine Kafka broker health using Kafka stream application’s JMX metrics and setup Grafana alert

As we all know, Kafka exposes the JMX metrics whether it is Kafka broker, connectors or Kafka applications. A few days ago, I got the scenario where I needed to determine Kafka broker health with the help of Kafka stream application’s JMX metrics. It looks bit awkward, right? I should use the broker’s JMX metrics to do this, why am I looking to application JMX Continue Reading

Knolx: Alpakka – Connecting Kafka and ElasticSearch to Akka Streams

Hi all, Knoldus has organized a 30 min session on 1st  March 2019 at 3:30 PM. The topic was Alpakka – Connecting Kafka and ElasticSearch to Akka Streams.  Many people have joined and enjoyed the session. I am going to share the slides here. Please let me know if you have any question related to linked slides or video. The slides of the KnolX are here: And Continue Reading

Flinkathon: First Step towards Flink’s DataStream API

In our previous blog posts: Flinkathon: Why Flink is better for Stateful Streaming applications? Flinkathon: What makes Flink better than Kafka Streams? We saw why Apache Flink is a better choice for streaming applications. In this blog post, we will explore how easy it is to express a streaming application using Apache Flink’s DataStream API. DataStream API DataStream API is used to develop regular programs Continue Reading

Flinkathon: What makes Flink better than Kafka Streams?

Initially, I would like you all to focus on a few questions before comparing the frameworks:1. Is there any comparison or similarity between Flink and the Kafka?2. What could be better in Flink over the Kafka?3. Is it the problem or system requirement to use one over the other? Before talking about the Flink betterment and use cases over the Kafka, let’s first understand their Continue Reading

Kafka: Consumer – Push vs Pull approach

Have you ever thought about the Push vs Pull approach for the system, which one suits or solves which problem? Another Question why did Kafka choose Pull over Push design for Consumers? Before talking about the Kafka approach, whether the Broker should push the data to consumer or consumer should pull from Kafka? Let’s first understand both of the approaches, as each one has its Continue Reading

Working with Project Reactor: Reactive Streams

The words “Reactive” and “Streams” often go hand in hand. The streams API of Java 8 is a great tool for making your projects Reactive. But that’s not the only stream you can have. In this blog, I’d like to talk about this awesome project called Project Reactor.

Reactivate your streams with Reactive Streams!!

As you all might have known by now that one of the hot topics for quite some time has been streaming of big data. Day after day, we see tons of streaming technologies out there competing with one another. The obvious reason for that, processing big volumes of data is not enough. We need real-time processing of data, especially when we need to handle continuously increasing Continue Reading

Spark Streaming vs. Structured Streaming

Fan of Apache Spark? I am too. The reason is simple. Interesting APIs to work with, fast and distributed processing, unlike map-reduce no I/O overhead, fault tolerance and many more. With this much, you can do a lot in this world of Big data and Fast data. From “processing huge chunks of data” to “working on streaming data”, Spark works flawlessly in all. In this Continue Reading

Is Apache Flink the future of Real-time Streaming?

In our last blog, we had a discussion about the latest version of Spark i.e 2.4 and the new features that it has come up with. While trying to come up with various approaches to improve our performance, we got the chance to explore one of the major contenders in the race, Apache Flink. Apache Flink is an open source platform which is a streaming Continue Reading

kafka with spark

Apache Spark 2.4: Adding a little more Spark to your code

Continuing with the objectives to make Spark faster, easier, and smarter, Apache Spark recently released its fifth release in the 2.x version line i.e Spark 2.4. We were lucky enough to experiment with it so soon in one of our projects. Today we will try to highlight the major changes in this version that we explored as well as experienced in our project. In our Continue Reading

Alpakka – Connecting Kafka and ElasticSearch to Akka streams

In our previous blog, we had a look at what Akka streams are and how they are different from the other streaming mechanisms we have. In this blog, we will be taking a little step forward into the world of Akka Streams. In order to work with Akka streams, we need a mechanism to connect Akka Streams to the existing system components. That is where Alpakka Continue Reading

Akka Streams: Is it a Solution to Your Streaming Problems?

A few days earlier, in our project, we were using Spark streaming and initially, it worked like a charm. But as we were very close to completion of our use case, the unexpected occurred. Spark does have a lot of interesting features, but we had some more custom needs such as running a ton of varying jobs with different actors/flows. Also, we needed something which Continue Reading

Knoldus Pune Careers - Hiring Freshers

Get a head start on your career at Knoldus. Join us!