Author: Niraj Kumar

Flink Architecture And Cluster Deployment

Reading Time: 4 minutes In this blog, we will discuss the Flink architecture and its core components. Introduction Flink is a distributed system and requires effective allocation and management of compute resources in order to execute streaming applications. It integrates with all common cluster resource managers, such as Hadoop YARN and Kubernetes. In addition, it can run as a standalone cluster or even as a library. Components of a Flink Cluster The Flink Architecture Continue Reading

Stateful processing with Apache Beam

Reading Time: 6 minutes Overview Beam lets us process unbounded, out-of-order, global-scale data with portable high-level pipelines. Stateful processing is a new feature of the Beam model that expands the capabilities of Beam. With this feature, we can unlock new use cases and new efficiencies. Quick Recap In Beam, a big data processing pipeline is a directed acyclic graph of parallel operations called PTransforms processing data from PCollections. The boxes are PTransforms and the edges Continue Reading

ProtoBuf: New way of Serialization

Reading Time: 5 minutes In this blog, we will learn how to use ProtoBuf with Java and how it compares with JSON. Overview Protobuf is short for protocol buffers, which are language- and platform-neutral mechanisms for serializing structured data for use in communications protocols, data storage, and more. Think XML, but smaller, faster, and simpler. The method involves an interface description language that describes the structure of some data and a program Continue Reading

Apache Beam Overview

Reading Time: 2 minutes This blog gives an overview of Apache Beam. What is Apache Beam? Apache Beam is an open-source, unified model for defining both batch and streaming data-parallel processing pipelines. Moreover, the open-source Beam SDKs help us easily build a program for our pipeline. Apache Flink, Apache Spark, and Cloud Dataflow are some of the possible runners to run the program. Why use Continue Reading