Whenever we come across the words “Reactive Streams”, two things come to mind. The first is asynchronous stream processing and the second is non-blocking backpressure. In this blog, we are going to learn about the latter.
Understanding back pressure
According to the English dictionary, back pressure means “resistance or force opposing the desired flow of fluid through pipes”. To define it in our software context, we can replace the flow of fluid with the flow of data: “resistance or force opposing the desired flow of data through software”.
Let us see what role backpressure plays in Akka Streams.
Imagine you’ve built an app and it’s working just how you wanted, but when the number of users increases, the app crashes because it was not designed to handle that amount of pressure. We wouldn’t want that for any system or application. This is where the need for handling backpressure comes in.
There are many solutions to this problem and various implementations, one of which is Akka Streams.
Let us understand the problem in terms of Streaming Applications
Consider these two people as the two endpoints of a data flow. One is the source of data, referred to as the Publisher in Reactive Streams, who sends the data. The other, referred to as the Subscriber in Reactive Streams, is responsible for receiving and buffering the data coming from the Publisher. These terms come from the Reactive Streams specification. In Akka Streams, these concepts are implemented as Source (Publisher), Flow (referred to as Processor in Reactive Streams) and Sink (Subscriber).
The mode in which Reactive Streams backpressure works can be described as “dynamic push/pull mode”, since it switches between push-based and pull-based backpressure models depending on whether the downstream can cope with the upstream production rate.
To understand the problem and see how backpressure handles it, let us look at two scenarios.
1. Fast Publisher and Slow Subscriber
Here, the Publisher is fast and can send 100 requests per second, whereas the Subscriber is slow and can handle only 1 request per second. Every Subscriber has some kind of buffer where pending requests can be stored. The problem is: what happens when the buffer overflows?
Solution: Pull-based backpressure
In pull-based backpressure, the slow Subscriber has a known amount of buffer space, that is, a bounded buffer. The Subscriber tells the Publisher how much free space it has in its buffer for incoming data, and the Publisher may send only as many elements as the Subscriber has allowed.
As in the picture above, the slow Subscriber has asked for only 3 elements because it has space left for only 3, and so the Publisher sends only 3 elements to the Subscriber. Also, since the two sides communicate asynchronously in both directions, we get a nice pipelining effect: while those 3 elements are in flight to the Subscriber, the elements already in the Subscriber's buffer are being processed, so the Subscriber can request more elements asynchronously.
As we can see, this scenario effectively means that the Subscriber will pull the elements from the Publisher, hence this mode of operation is known as pull-based back-pressure.
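The demand signaling described above can be sketched with the JDK's own `java.util.concurrent.Flow` interfaces, which mirror the Reactive Streams Publisher/Subscriber contract. This is only a minimal illustration and not Akka Streams code: the class `PullBasedDemo`, the `BoundedSubscriber` type and the batch size of 3 are hypothetical names and values chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

class PullBasedDemo {

    /** Hypothetical subscriber that requests elements in small batches, like free buffer slots. */
    static final class BoundedSubscriber implements Flow.Subscriber<Integer> {
        final List<Integer> received = new ArrayList<>();
        int requestCalls = 0;           // how many times demand was signalled upstream
        boolean demandExceeded = false; // set if an element arrives without outstanding demand

        private final int batchSize;
        private final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;
        private int outstanding = 0;    // elements requested but not yet delivered

        BoundedSubscriber(int batchSize) { this.batchSize = batchSize; }

        private void requestBatch() {
            outstanding += batchSize;
            requestCalls++;
            subscription.request(batchSize); // "I have room for batchSize more elements"
        }

        @Override public void onSubscribe(Flow.Subscription subscription) {
            this.subscription = subscription;
            requestBatch();
        }

        @Override public void onNext(Integer item) {
            if (outstanding <= 0) demandExceeded = true;
            outstanding--;
            received.add(item);
            if (outstanding == 0) requestBatch(); // batch consumed, ask for the next one
        }

        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }

        void awaitDone() {
            try {
                if (!done.await(5, TimeUnit.SECONDS)) throw new IllegalStateException("timed out");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    /** Publishes 1..items to a subscriber that pulls batchSize elements at a time. */
    static BoundedSubscriber run(int items, int batchSize) {
        BoundedSubscriber subscriber = new BoundedSubscriber(batchSize);
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            for (int i = 1; i <= items; i++) publisher.submit(i);
        } // close() completes the stream once buffered elements are delivered
        subscriber.awaitDone();
        return subscriber;
    }

    public static void main(String[] args) {
        BoundedSubscriber s = run(10, 3);
        System.out.println("received = " + s.received);
        System.out.println("request calls = " + s.requestCalls);
        System.out.println("demand exceeded = " + s.demandExceeded);
    }
}
```

With 10 elements and a batch size of 3, the subscriber signals demand four times, and the publisher never delivers an element that was not requested. In Akka Streams this bookkeeping is done internally by the operators, so the sketch is only meant to make the protocol visible.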
2. Slow Publisher and Fast Subscriber
This scenario is not much of a problem, since you don’t need to slow down the Publisher. However, the signaling rates of the Publisher and Subscriber are rarely constant, and the situation may turn into a fast Publisher and slow Subscriber at any moment. To guard against this, the backpressure protocol must still be enabled in this scenario. The protocol works the same way as discussed in pull-based backpressure: the Publisher will never send more elements than the Subscriber has demanded. But since in this case the Subscriber is faster, it requests elements at a higher rate. This means the Publisher never has to wait to publish its incoming elements.
As we can see, in this scenario we effectively operate in so-called push mode, since the Publisher can continue producing elements as fast as it can and the pending demand is recovered just in time while it is emitting elements. With push-based streams, the producer is in control and pushes data to the consumer when it’s available. When dealing with user input, we mostly prefer push streams, as they model the producer accurately: you can’t control the user.
Users of the Akka Streams library do not have to write any explicit backpressure handling code. The protocol is built in and handled automatically by all of the provided Akka Streams operators. However, it is possible to add explicit buffer operators with overflow strategies to influence the behavior of the stream.
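To give a feel for what an overflow strategy decides, here is a small plain-Java sketch of a bounded buffer that reacts differently when it is full. It does not use the Akka Streams API: `OverflowSketch`, `BoundedBuffer` and the `Strategy` names are hypothetical, loosely modeled on the kind of choices such strategies offer (drop the oldest element, drop the incoming element, or fail).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class OverflowSketch {

    /** Hypothetical strategies applied when the buffer is already full. */
    enum Strategy { DROP_HEAD, DROP_NEW, FAIL }

    static final class BoundedBuffer<T> {
        private final Deque<T> queue = new ArrayDeque<>();
        private final int capacity;
        private final Strategy strategy;

        BoundedBuffer(int capacity, Strategy strategy) {
            this.capacity = capacity;
            this.strategy = strategy;
        }

        void offer(T element) {
            if (queue.size() < capacity) {
                queue.addLast(element);
                return;
            }
            switch (strategy) {
                case DROP_HEAD:
                    queue.removeFirst();   // discard the oldest buffered element
                    queue.addLast(element);
                    break;
                case DROP_NEW:
                    break;                 // discard the incoming element
                case FAIL:
                    throw new IllegalStateException("buffer overflow");
            }
        }

        List<T> snapshot() { return new ArrayList<>(queue); }
    }

    /** Offers 1..count to a buffer of the given capacity and returns what is left in it. */
    static List<Integer> fill(Strategy strategy, int capacity, int count) {
        BoundedBuffer<Integer> buffer = new BoundedBuffer<>(capacity, strategy);
        for (int i = 1; i <= count; i++) buffer.offer(i);
        return buffer.snapshot();
    }

    public static void main(String[] args) {
        System.out.println("DROP_HEAD: " + fill(Strategy.DROP_HEAD, 3, 5));
        System.out.println("DROP_NEW:  " + fill(Strategy.DROP_NEW, 3, 5));
    }
}
```

A strategy that applies backpressure instead of dropping would simply stop accepting elements until space frees up, which is the default behavior described throughout this post.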