Reactivate your streams with Reactive Streams!!

Reading Time: 5 minutes

As you probably know by now, streaming of big data has been a hot topic for quite some time. Day after day, we see tons of streaming technologies competing with one another. The obvious reason: merely processing big volumes of data is no longer enough. We need real-time processing, especially when we have to handle continuously growing volumes of data while processing and maintaining it.

 

We ourselves have tried our hands at many of these technologies, like Apache Flink, Akka Streams, Spark Streaming and many more. In case you want to know more about them, you can go through my earlier blogs:

But after going through these, I thought about backtracking a little. I asked myself: how are these streaming options different from each other, and what makes them similar? While doing this, I came across the term Reactive Streams, which astonished me. So in this blog, I will take you on a tour to explore what Reactive Streams are and what makes them so special. But before coming to Reactive Streams, we must know what streams actually are, and we will also recall the Reactive Manifesto. Going by the definition,

Streams are continuously flowing data that begins at a Source and ends at a Sink, passing through multiple flows in between.


 

Now that you have some idea about streams, let's recall the Reactive Manifesto and its 4 principles: a reactive system should be responsive, resilient, elastic, and message-driven.


To read more about the Reactive Manifesto, you can go through its official documentation. Now that we understand the basic terminology, we will keep our focus on Reactive Streams.

Definition:

Reactive Streams is an initiative to provide a standard for asynchronous stream processing with non-blocking back pressure.

There are 3 points to note here: stream processing, asynchronous processing, and non-blocking back pressure.
We have already discussed what stream processing is. Let’s focus more on the other 2 points.
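Concretely, the specification boils down to just four interfaces, which the JDK has mirrored since Java 9 under `java.util.concurrent.Flow`. The sketch below reproduces their shape (package declarations and access modifiers omitted for brevity):

```java
// The four interfaces of the Reactive Streams specification
// (mirrored by org.reactivestreams and, since JDK 9, java.util.concurrent.Flow).

interface Publisher<T> {
    void subscribe(Subscriber<? super T> s); // a source of elements
}

interface Subscriber<T> {
    void onSubscribe(Subscription s); // receives the demand channel
    void onNext(T t);                 // receives one element
    void onError(Throwable t);        // terminal failure signal
    void onComplete();                // terminal success signal
}

interface Subscription {
    void request(long n); // the subscriber signals demand for n more elements
    void cancel();        // the subscriber stops the stream
}

interface Processor<T, R> extends Subscriber<T>, Publisher<R> {} // an in-between stage
```

Notice that the back pressure lives entirely in `Subscription.request(n)`: nothing flows until the subscriber asks for it.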

Asynchronous Processing

Asynchronous processing is required to:

  • ensure the non-blocking flow of data
  • enable the parallel use of computing resources
  • coordinate work across network hosts, or across multiple CPU cores within a single machine
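As a small illustration of the non-blocking, parallel use of computing resources, here is a sketch using plain `CompletableFuture` (the values 20 and 22 are arbitrary):

```java
import java.util.concurrent.CompletableFuture;

public class AsyncDemo {
    public static void main(String[] args) {
        // Both stages run on the common ForkJoinPool without blocking the
        // main thread; thenCombine joins the results once both complete.
        CompletableFuture<Integer> a = CompletableFuture.supplyAsync(() -> 20);
        CompletableFuture<Integer> b = CompletableFuture.supplyAsync(() -> 22);

        int result = a.thenCombine(b, Integer::sum).join();
        System.out.println(result); // 42
    }
}
```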

 

Back-pressured Processing

 


Back-pressured processing is required to:

  • communicate workload levels so that data passing never faces bottlenecks on either side.
  • manage the rate of the operations of the source and sink.
  • make sure the data producer doesn’t overwhelm the data consumer and potentially bring down the entire system.

Let’s try to understand the backpressure part in detail.  

 

Understanding Backpressure

 

Consider a scenario where there is a fast publisher publishing at a speed of 100 ops/sec and a slow subscriber consuming at a speed of only 1 op/sec.

 

 

The subscriber usually has some kind of buffer to manage incoming messages, but that buffer has limited space, which keeps filling up because the publisher is much faster than the subscriber.

 


As the buffer keeps filling, at some point it will surely overflow, since the publisher is continuously sending data to the subscriber via the flow.

Now how can we handle this scenario?  

 


 

Option 1: Increase the buffer size. One option is to increase the subscriber's buffer size. This is possible, but not always realistic: memory is limited, so we will eventually run out of it.

 


 

Option 2: Bounded buffer + drop messages. Another option is to simply drop elements. Working in stream processing gives us some flexibility for dropping what can't be handled, but it's not always appropriate: dropped elements may have to be produced and sent again, increasing the load on the publisher.
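For intuition, the "bounded buffer + drop" policy can be sketched as a toy queue that discards the oldest element when full (a "drop head" overflow strategy; the class name is purely illustrative):

```java
import java.util.ArrayDeque;

// A toy bounded buffer that drops the OLDEST element when full,
// instead of overflowing. Real streaming libraries offer such
// overflow strategies as built-in buffer options.
class DroppingBuffer<T> {
    private final ArrayDeque<T> queue = new ArrayDeque<>();
    private final int capacity;

    DroppingBuffer(int capacity) { this.capacity = capacity; }

    void offer(T element) {
        if (queue.size() == capacity) {
            queue.removeFirst(); // drop the oldest element to make room
        }
        queue.addLast(element);
    }

    T poll() { return queue.pollFirst(); }
    int size() { return queue.size(); }
}
```

Offering the elements 1 through 5 to a `DroppingBuffer` of capacity 3 leaves only 3, 4, 5: the first two were silently lost, which is exactly the trade-off of this option.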

 

Optimal Solution: What we need is a bi-directional flow of data: elements emitted downstream from the publisher to the subscriber, and a signal for demand emitted upstream from the subscriber to the publisher. If the subscriber is in charge of signaling demand, the publisher is free to safely push up to the number of elements demanded. This also prevents wasting resources: because demand is signaled asynchronously, a subscriber can send many requests for more work before actually receiving any.
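This demand-signaling loop is exactly what `Subscription.request(n)` provides. Here is a minimal, runnable sketch using JDK 9+'s `java.util.concurrent.Flow` and `SubmissionPublisher` (the class name `BackpressureDemo` is just for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class BackpressureDemo {
    public static void main(String[] args) throws InterruptedException {
        List<Integer> received = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(1);

        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new Flow.Subscriber<Integer>() {
                private Flow.Subscription subscription;

                @Override public void onSubscribe(Flow.Subscription s) {
                    this.subscription = s;
                    s.request(1);            // signal demand for ONE element
                }
                @Override public void onNext(Integer item) {
                    received.add(item);
                    subscription.request(1); // ask for the next element only when ready
                }
                @Override public void onError(Throwable t) { done.countDown(); }
                @Override public void onComplete() { done.countDown(); }
            });

            for (int i = 1; i <= 5; i++) publisher.submit(i);
        } // close() triggers onComplete once buffered items have drained

        done.await();
        System.out.println(received); // [1, 2, 3, 4, 5]
    }
}
```

Because the subscriber only ever requests one element at a time, the publisher can never flood it; a real subscriber would typically request larger batches to amortize the signaling cost.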

 

 

Reactive Streams Implementations

 

Implementations of the Reactive Streams specification are compatible with each other, which is an obvious benefit of choosing a Reactive Streams compliant library. You can even integrate different stream processing systems together that use different libraries as long as they comply with the specification.

 


 

A few of the Reactive Streams implementations are listed below:

  • Akka Streams
  • RxJava
  • Project Reactor
  • Vert.x

 

Need for Reactive Streams?

 

The goals of the Reactive Streams specification are:

  • to govern the exchange of stream data across an asynchronous boundary, like passing elements on to another thread or thread pool
  • to create a standard for achieving statically typed, high-performance, low-latency, asynchronous streams of data with built-in non-blocking back pressure

Handling streams of data, especially "live" data whose volume is not predetermined, requires special care in an asynchronous system. The most prominent issue is that resource consumption needs to be controlled so that a fast data source does not overwhelm the stream destination, i.e. maintaining the backpressure.

In this blog, we tried to cover what Reactive Streams are and why they are so important. I hope that you now have some idea of what Reactive Streams are and what makes a stream reactive. So when it comes to choosing a streaming approach, be sure to make a choice that suits your use case well. To know more about Reactive Streams, you can go through the following links:

  Hope this helps. Stay tuned for more interesting blogs. 🙂

 


 

 


Written by 

Anmol Sarna is a software consultant having more than 1.5 years of experience. He likes to explore new technologies and trends in the IT world. His hobbies include playing football, watching Hollywood movies and he also loves to travel and explore new places. Anmol is familiar with programming languages such as Java, Scala, C, C++, HTML and he is currently working on reactive technologies like Scala, Cassandra, Lagom, Akka, Spark and Kafka.
