Spark Structured Streaming

Structured Streaming: Philosophy behind it

In our previous blogs: Structured Streaming: What is it? & Structured Streaming: How it works? We got to know 2 major points about Structured Streaming – It is a fast, scalable, fault-tolerant, end-to-end, exactly-once stream processing API that helps users in building streaming applications. It treats the live data stream as a table that is being continuously appended/updated which allows us to express our streaming computation as Continue Reading

Structured Streaming: How it works?

In our previous blog post – Structured Streaming: What is it? we got to know that Structured Streaming is a fast, scalable, fault-tolerant, end-to-end, exactly-once stream processing API that helps users in building streaming applications. Now it’s time to learn  – How it works? So, in this blog post, we will look at the working of a structured stream via an example. So, let’s take a Continue Reading

Structured Streaming: What is it?

With the advent of streaming frameworks like Spark Streaming, Flink, Storm etc. developers stopped worrying about issues related to a streaming application, like – Fault Tolerance, i.e., zero data loss, Real-time processing of data, etc. and started focussing only on solving business challenges. The reason is, the frameworks (the ones mentioned above) provided inbuilt support for all of them. For example: In Spark Streaming, by just adding Continue Reading

KnolX: Understanding Spark Structured Streaming

Hello everyone, Knoldus organized a session on 05th January 2018. The topic was “Understanding Spark Structured Streaming”. Many people attended and enjoyed the session. In this blog post, I am going to share the slides & video of the session. Slides:

fetching data from different sources using Spark 2.1

What’s new in Apache Spark 2.2

Apache recently released a newer version of Spark i.e Apache Spark 2.2. The new version comes with new improvements as well as the addition of new functionalities. The major addition to this release is Structured Streaming. It has been marked as production ready and its experimental tag has been removed. Some of the high-level changes and improvements : Production ready Structured Streaming Expanding SQL functionalities New Continue Reading

Having Issue How To Order Streamed Dataframe ?

A few days ago, i have to perform aggregation on streaming dataframe. And the moment, i apply groupBy for aggregation, data gets shuffled. Now the situation arises how to maintain order? Yes, i can use orderBy with streaming dataframe using Spark Structured Streaming, but only in complete mode. There is no way of doing ordering of streaming data in append mode and update mode. I Continue Reading

Exploring Spark Structured Streaming

Hello Spark Enthusiasts, Streaming apps are growing more complex. And it is getting difficult to do with current distributed streaming engines. Why streaming is hard ? Streaming computations don’t run in isolation. Data arriving out of time order is a problem for batch-processed processing. Writing stream processing operations from scratch is not easy. Problem with DStreams: Processing with event-time: dealing with late data. Interoperate streaming Continue Reading

%d bloggers like this: