Apache Beam

Apache Beam: Ways to join PCollections

Reading Time: 4 minutes Joining multiple sets of data into a single entity is a common task when working with data pipelines. In this blog, we will cover how to perform join operations between datasets in Apache Beam. There are different ways to join PCollections in Apache Beam: extension-based joins, group-by-key-based joins, and joins using side input. Let's understand each of these approaches with examples. Continue Reading
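The group-by-key family of joins corresponds to Beam's CoGroupByKey transform, which gathers the values from several keyed PCollections under each common key. As a rough illustration, here is a minimal plain-Python sketch of that grouping semantics (no Beam SDK required); the function name and sample data are assumptions for illustration, not code from the post.

```python
from collections import defaultdict

def cogroup_by_key(emails, phones):
    """Group two lists of (key, value) pairs by key, mimicking the
    result shape of Beam's CoGroupByKey over two tagged inputs."""
    grouped = defaultdict(lambda: {"emails": [], "phones": []})
    for key, value in emails:
        grouped[key]["emails"].append(value)
    for key, value in phones:
        grouped[key]["phones"].append(value)
    return dict(grouped)

# Illustrative keyed datasets (hypothetical sample data).
emails = [("amy", "amy@example.com"), ("carl", "carl@example.com")]
phones = [("amy", "111-222"), ("james", "333-444")]

joined = cogroup_by_key(emails, phones)
# Keys present in only one input still appear, with an empty list for
# the missing side, just as CoGroupByKey yields empty iterables.
```

In a real pipeline the same shape is produced by `{'emails': emails, 'phones': phones} | beam.CoGroupByKey()`, and the extension-based joins build inner/outer join logic on top of this grouping.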

Apache Beam: Side input Pattern

Reading Time: 3 minutes Apache Beam is a unified programming model for defining both batch and streaming data-parallel processing pipelines. It is a modern way of defining data processing pipelines and offers a rich set of APIs and mechanisms for solving complex use cases. In some use cases, such as streaming analytics applications, a pipeline needs to use some additional inputs alongside its main input. Continue Reading
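The side input pattern boils down to computing a small auxiliary value and broadcasting it to the per-element processing of the main input. A minimal plain-Python sketch of that idea, assuming a hypothetical "label each value against the mean" use case (in Beam itself the mean would be passed to a ParDo via something like `beam.pvalue.AsSingleton`):

```python
def compute_mean(values):
    """The side computation: a small summary of one dataset."""
    return sum(values) / len(values)

def label_against_mean(value, mean):
    """Per-element processing of the main input; every element sees
    the same broadcast side input value (the mean)."""
    return (value, "above" if value > mean else "not_above")

main_input = [3, 7, 10, 2]
side_input = compute_mean(main_input)  # 5.5 for this sample data
labeled = [label_against_mean(v, side_input) for v in main_input]
```

The key property the sketch shows is that the side input is materialized once and then read by every element's processing step, rather than being part of the main element stream.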

Apache Beam Overview

Reading Time: 2 minutes This blog gives an overview of Apache Beam. What is Apache Beam? Apache Beam is an open-source, unified model for defining both batch and streaming data-parallel processing pipelines. The available open-source Beam SDKs make it easy to build a program for our pipeline. Apache Flink, Apache Spark, and Cloud Dataflow are some of the possible runners that can execute the program. Continue Reading
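Conceptually, a Beam program is a chain of transforms applied to immutable collections, and the runner decides how to execute that chain. Here is a plain-Python sketch of that model (the real SDK would use `beam.Pipeline`, `beam.Create`, `beam.Map`, `beam.Filter`, and so on; the helper names and sample data below are illustrative assumptions):

```python
def create(elements):
    """Stand-in for a source transform like beam.Create."""
    return list(elements)

def pmap(collection, fn):
    """Stand-in for a Map transform: apply fn to every element."""
    return [fn(x) for x in collection]

def pfilter(collection, pred):
    """Stand-in for a Filter transform: keep matching elements."""
    return [x for x in collection if pred(x)]

lines = create(["apache beam", "unified model", "batch and streaming"])
words = pmap(lines, str.split)                 # split each line into words
flat = [w for ws in words for w in ws]         # FlatMap-style flattening
long_words = pfilter(flat, lambda w: len(w) > 5)
```

In the actual SDK the same chain is written with the pipe operator (`p | beam.Create(...) | beam.FlatMap(str.split) | beam.Filter(...)`), and the same program can run unchanged on the Direct, Flink, Spark, or Dataflow runner.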