Flink

Flinkathon: Guide to setting up a Local Flink Cluster

Reading Time: 3 minutes

In our previous blog post, "Flinkathon: First Step towards Flink’s DataStream API", we created our first streaming application using Apache Flink. It was easy, clean, and concise. However, the real power of Apache Flink shows on a cluster, where data is processed in a distributed manner, taking advantage of systems with multiple cores and large amounts of memory. So, in this blog post, we will see how to set up …

Flinkathon: First Step towards Flink’s DataStream API

Reading Time: 3 minutes

In our previous blog posts, "Flinkathon: Why Flink is better for Stateful Streaming applications?" and "Flinkathon: What makes Flink better than Kafka Streams?", we saw why Apache Flink is a better choice for streaming applications. In this blog post, we will explore how easy it is to express a streaming application using Apache Flink’s DataStream API. DataStream API: The DataStream API is used to develop regular programs …

Flinkathon: What makes Flink better than Kafka Streams?

Reading Time: 2 minutes

Before comparing the frameworks, I would like you all to focus on a few questions:
1. Is there any comparison or similarity between Flink and Kafka?
2. What could Flink do better than Kafka?
3. Is it the problem or the system requirements that decide which one to use?

Before talking about Flink’s advantages and use cases over Kafka, let’s first understand their …

Is Apache Flink the future of Real-time Streaming?

Reading Time: 5 minutes

In our last blog, we discussed the latest version of Spark, i.e., 2.4, and the new features it has come up with. While trying out various approaches to improve our performance, we got the chance to explore one of the major contenders in the race, Apache Flink. Apache Flink is an open-source platform which is a streaming …

Structured Streaming: What is it?

Reading Time: 3 minutes

With the advent of streaming frameworks like Spark Streaming, Flink, and Storm, developers stopped worrying about common issues in streaming applications, such as fault tolerance (i.e., zero data loss) and real-time processing of data, and started focusing only on solving business challenges. The reason is that these frameworks provided built-in support for all of them. For example, in Spark Streaming, by just adding …

Another Apache Flink tutorial, following Hortonworks’ Big Data series

Reading Time: 7 minutes

Background: A couple of weeks back, I was discussing online training materials on Apache Spark with a friend of mine. Of the sites that I mentioned, the Hadoop tutorial from Hortonworks came up. This was primarily because I liked the way they organized the content: it was clearly meant to encourage newcomers to try things hands-on, banishing the …

Getting close to Apache Flink, albeit in a Träge manner – 2

Reading Time: 6 minutes

From the preceding post in this series: In the last blog, we took a look at Flink’s CountWindow feature. Here’s a quick recap: as a stream of events enters a Flink-based application, we can apply a CountWindow transformation to it (Flink offers many such transformations; we will meet them as we go). CountWindow allows us to create a …

Getting close to Apache Flink, albeit in a Träge manner – 1

Reading Time: 7 minutes

Of late, I have begun to read about Apache Flink. Apache Flink (just Flink hereafter) is an ‘open source platform for distributed stream and batch data processing’, to quote from the homepage. What has caught my interest is Flink’s idea that the ability to operate on each unit of data as it streams in gives one the flexibility to decide what constitutes a batch: a count of events or events …