stateful streaming

Stateful processing with Apache Beam

Reading Time: 6 minutes Overview Beam lets us process unbounded, out-of-order, global-scale data with portable high-level pipelines. Stateful processing is a new feature of the Beam model that expands the capabilities of Beam. With these new features, we can unlock newer use cases and newer efficiencies Quick Recap In Beam, a big data processing pipeline is a directed, acyclic graph of parallel operations called PTransforms processing data from PCollections. The boxes are PTransforms and the edges Continue Reading

Stateful stream processing with Apache Flink(part 1): An introduction

Reading Time: 4 minutes Apache Flink, a 4th generation Big Data processing framework provides robust stateful stream processing capabilities. So, in a few parts of the blogs, we will learn what is Stateful stream processing. And how we can use Flink to write a stateful streaming application. What is stateful stream processing? In general, stateful stream processing is an application design pattern for processing an unbounded stream of events. Continue Reading

Spark Structured Streaming (Part 4) – Handling Late Data

Reading Time: 3 minutes Welcome back folks to this blog series of Spark Structured Streaming. This blog is the continuation of the earlier blog “Understanding Stateful Streaming“. And this blog pertains to Handling Late Arriving Data in Spark Structured Streaming. So let’s get started. Handling Late Data With window aggregates (discussed in the previous blog) Spark automatically takes cares of late data. Every aggregate window is like a bucket Continue Reading

Spark Structured Streaming (Part 3) – Stateful Streaming

Reading Time: 4 minutes Welcome back folks to this blog series of Spark Structured Streaming. This blog is the continuation of the earlier blog “Internals of Structured Streaming“. And this blog pertains to Stateful Streaming in Spark Structured Streaming. So let’s get started. Let’s start from the very basic understanding of what is Stateful Stream Processing. But to understand that, let’s first understand what Stateless Stream Processing is. In Continue Reading

Stateful Streaming in Spark

Reading Time: 4 minutes Apache Spark is a fast and general-purpose cluster computing system. In Spark, we can do the batch processing and stream processing as well. It does near real-time processing. It means that it processes the data in micro-batches. I have discussed more Spark Streaming in my previous blog. Now in this blog, I’ll discuss Stateful Streaming in Spark. So let’s start !! What is Stateful Streaming? Continue Reading