In our previous blog post – Structured Streaming: What is it? we got to know that Structured Streaming is a fast, scalable, fault-tolerant, end-to-end, exactly-once stream processing API that helps users in building streaming applications.

Now it’s time to learn  – How it works? So, in this blog post, we will look at the working of a structured stream via an example.

So, let’s take a look at the example:

Above is an example of a structured stream which has Socket as the source & Console as the sink. It has 3 major sections:

  1. Source – The first part is the source, which is represented by lines variable. It is nothing but a DataFrame created from a streaming source.
  2. Operations – Since, we get the same DataFrame API, calculating Word-Count has the same code as it is for a batch data processing.
  3. Sink – The last part is the Sink which is represented by query variable. It is nothing but a streaming sink where results are sent.

Now, when we have seen the code and its anatomy in words it’s time to take a look at it’s working via an illustration:


In the above illustration, the query variable represents a DataFrame with the unlimited number of rows. Every push to the Socket will append new rows or update old rows in the DataFrame, which eventually sends the data to the sink, i.e., Console.

So, we have seen how a Structured Streaming works, i.e., it acts like a table with an unlimited number of rows. I hope you liked this blog. Please feel free to suggest or comment.

We will be back with more blogs on Structured Streaming. Till then stay tuned 🙂


One comment

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.