Apache-Beam

Stateful processing with Apache Beam

Reading Time: 6 minutes

Overview

Beam lets us process unbounded, out-of-order, global-scale data with portable high-level pipelines. Stateful processing is a new feature of the Beam model that expands the capabilities of Beam. With it, we can unlock new use cases and new efficiencies.

Quick Recap

In Beam, a big data processing pipeline is a directed acyclic graph of parallel operations called PTransforms, which process data from PCollections. The boxes are PTransforms and the edges …

Debugging Apache Beam Pipeline

Reading Time: 2 minutes

Overview

Apache Beam is one of the most widely used frameworks for stream and batch processing in distributed environments, and it provides some distinctive features. It is an open-source, unified data processing framework whose SDKs let pipelines run on different processing engines, called runners.

Apache Beam runners: Spark, Flink, Apex, Google Cloud Dataflow, DirectRunner.

A …