Apache Flink

Reading Avro files using Apache Flink

Reading Time: 2 minutes In this blog, we will see how to read the Avro files using Flink. Before reading the files, let’s get an overview of Flink. There are two types of processing – batch and real-time. Batch Processing: Processing based on the data collected over time. Real-time Processing: Processing based on immediate data for an instant result. Real-time processing is in demand and Apache Flink is the Continue Reading

Using Apache Flink for Kinesis to Kafka Connect

Reading Time: 3 minutes In this blog, we are going to use kinesis as a source and kafka as a consumer. Let’s get started. Step 1: Apache Flink provides the kinesis and kafka connector dependencies. Let’s add them in our build.sbt: Step 2: The next step is to create a pointer to the environment on which this program runs. Step 3: Setting parallelism of x here will cause all Continue Reading

Comparison between different streaming engines

Reading Time: 5 minutes Distributed stream processing engines have been on the rise in the last few years, first Hadoop became popular as a batch processing engine, then focus shifted towards stream processing engines. Stream processing engines can make the job of processing data that comes in via a stream easier than ever before and by using clustering can enable processing data in larger sets in a timely manner. Continue Reading

Flink on Kubernetes

Reading Time: 3 minutes Introduction Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. The design of Flink is such as to run in all common cluster environments, perform computations at in-memory speed and at any scale. There are two Flink’s clusters: Flink session cluster and Flink job cluster. A job cluster is a dedicated cluster that runs a single job. The job is part of Continue Reading

Big Data Landscape explained

Reading Time: 5 minutes Big Data has now evolved into a buzz word and it seems everyone is either working on it or want to work on it. However, most of the people associate Big Data with some of the popular tool sets like Hadoop, Spark, NoSql databases like Hive, Cassandra , HBase etc. HDFS made Big Data popular as it gave us an option to distribute the data Continue Reading

Flinkathon: Guide to setting up a Local Flink Custer

Reading Time: 3 minutes In our previous blog post, Flinkathon: First Step towards Flink’s DataStream API, we created our first streaming application using Apache Flink. It was easy, clean, and concise. However, the real power of Apache Flink is seen on a cluster, where data is processed in a distributed manner, with the advantage of multi-core/multi-memory systems. So, in this blog post, we will see how to set up Continue Reading

Flinkathon: First Step towards Flink’s DataStream API

Reading Time: 3 minutes In our previous blog posts: Flinkathon: Why Flink is better for Stateful Streaming applications? Flinkathon: What makes Flink better than Kafka Streams? We saw why Apache Flink is a better choice for streaming applications. In this blog post, we will explore how easy it is to express a streaming application using Apache Flink’s DataStream API. DataStream API DataStream API is used to develop regular programs Continue Reading

Flinkathon: What makes Flink better than Kafka Streams?

Reading Time: 2 minutes Initially, I would like you all to focus on a few questions before comparing the frameworks:1. Is there any comparison or similarity between Flink and the Kafka?2. What could be better in Flink over the Kafka?3. Is it the problem or system requirement to use one over the other? Before talking about the Flink betterment and use cases over the Kafka, let’s first understand their Continue Reading

Flinkathon: Why Flink is better for Stateful Streaming applications?

Reading Time: 2 minutes Stream processing is a way to query a continuous stream of data and draw conclusions from it within the boundaries of a real-time scenario. For example, receiving an alert as soon as a fraudulent transaction is done via a credit/debit card. The 2 main types of stream processing done are: Stateless: Where every event is handled completely independent from the preceding events. Stateful: Where a Continue Reading

Is Apache Flink the future of Real-time Streaming?

Reading Time: 5 minutes In our last blog, we had a discussion about the latest version of Spark i.e 2.4 and the new features that it has come up with. While trying to come up with various approaches to improve our performance, we got the chance to explore one of the major contenders in the race, Apache Flink. Apache Flink is an open source platform which is a streaming Continue Reading

Structured Streaming: What is it?

Reading Time: 3 minutes With the advent of streaming frameworks like Spark Streaming, Flink, Storm etc. developers stopped worrying about issues related to a streaming application, like – Fault Tolerance, i.e., zero data loss, Real-time processing of data, etc. and started focussing only on solving business challenges. The reason is, the frameworks (the ones mentioned above) provided inbuilt support for all of them. For example: In Spark Streaming, by just adding Continue Reading

Streaming in Spark, Flink and Kafka

Reading Time: 7 minutes There is a lot of buzz going on between when to use use spark, when to use flink, and when to use Kafka. Both spark streaming and flink provides exactly once guarantee that every record will be processed exactly once thereby eliminating any duplicates that might be available. Both provide very high throughput compared to any other processing system like storm, and the overhead of Continue Reading

Introduction To HADOOP !

Reading Time: 4 minutes Here I am to going to  write a blog on Hadoop! “Bigdata is not about data! The value in Bigdata [is in] the analytics. ” -Harvard Prof. Gary King So the Hadoop came into Introduction! Hadoop is an open source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It is part of the Apache Continue Reading