
What is Back Pressure in Stream Processing?


Welcome to the blog! If you are dealing with applications and services that take input from a source at some rate, process it, and emit output to another destination, you have heard or will soon hear the term backpressure. It is a problem you encounter when you scale your application to handle a certain volume and, somewhere down the line, its responsiveness degrades. We will start with a short introduction to Reactive systems, but if you are here only to understand backpressure, you can skip ahead to the section on back pressure.

What are Reactive Systems?

As businesses go global and applications are used by distributed yet globally connected users, software development needs principles that provide scale and resiliency. According to the Reactive Manifesto, Reactive systems are Responsive, Resilient, Elastic, and Message Driven. The Reactive Manifesto is the set of principles that formally defines these properties.

What is Back Pressure?

Here is how the Reactive Manifesto glossary describes it:

This back-pressure is an important feedback mechanism that allows systems to gracefully respond to load rather than collapse under it. The back-pressure may bubble all the way up to the user, at which point responsiveness may degrade, but this mechanism will ensure that the system is resilient under load, and will provide information that may allow the system itself to apply other resources to help distribute the load.

Let us understand it with an example: a service consumes input from a publisher (the source), processes it, and writes the result to some output.

Let us assume that the source produces 8 inputs per second, while the service is able to process only 4 of them in the same amount of time.

After 10 Seconds

If the service's processing rate remains the same, the source will have produced 80 inputs by this point while the service has processed only 40. The remaining 40 inputs need a buffer to hold them until the service can pick them up, so that no input is lost. The buffer keeps growing for as long as the source outpaces the service; eventually it can exhaust memory, and the service might stop responding.
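The arithmetic above can be sketched as a small simulation. This is only an illustration under the rates assumed in this example (8 inputs produced and 4 processed per second); the class and method names are hypothetical.

```java
public class BacklogSimulation {

    /** Number of inputs waiting in the buffer after the given number of seconds. */
    static long backlogAfter(int producedPerSecond, int processedPerSecond, int seconds) {
        long backlog = 0;
        for (int s = 0; s < seconds; s++) {
            // The source pushes at its own rate, regardless of the service's state.
            backlog += producedPerSecond;
            // The service drains only as much as it can process in one second.
            backlog -= Math.min(backlog, processedPerSecond);
        }
        return backlog;
    }

    public static void main(String[] args) {
        for (int t : new int[] {1, 10, 60}) {
            System.out.println("after " + t + "s: " + backlogAfter(8, 4, t) + " inputs buffered");
        }
    }
}
```

With these rates the backlog grows by 4 inputs every second, so it never shrinks unless the source slows down or the service speeds up.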

What if the service could control the source speed?

Imagine that the service consuming from the source is able to control the speed at which the source hands over data. There would be no need for an ever-growing buffer, because the service would take only as much input as it can process.
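A minimal sketch of this idea, assuming the source can be modeled as a lazy iterator that produces an input only when asked for one. The names `PullBasedService` and `pullBatch` are hypothetical, and the capacity of 4 matches the rate used in the example earlier.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.IntStream;

public class PullBasedService {

    /** Pulls at most `capacity` inputs; the source produces nothing it was not asked for. */
    static List<Integer> pullBatch(Iterator<Integer> source, int capacity) {
        List<Integer> batch = new ArrayList<>(capacity);
        while (batch.size() < capacity && source.hasNext()) {
            batch.add(source.next());
        }
        return batch;
    }

    public static void main(String[] args) {
        // Hypothetical source: 20 inputs, generated lazily as they are pulled.
        Iterator<Integer> source = IntStream.rangeClosed(1, 20).boxed().iterator();
        int tick = 0;
        while (source.hasNext()) {
            // The service asks only for what it can process in one tick.
            List<Integer> batch = pullBatch(source, 4);
            tick++;
            System.out.println("tick " + tick + ": processed " + batch);
        }
    }
}
```

Because the service decides how many inputs to take, nothing piles up between the source and the service; the trade-off is that the source has to be able to wait.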

Is backpressure good or bad?

Backpressure becomes a problem when the service cannot cope with the speed at which the upstream source is publishing. It would be better to use a pull-based approach in which the service requests only as many records as it can process. Reactive Streams is one approach to this problem, with a defined set of rules called the Reactive Streams specification.

What are Reactive Streams?

Here is the official definition:

Reactive Streams is an initiative to provide a standard for asynchronous stream processing with non-blocking back pressure. This encompasses efforts aimed at runtime environments (JVM and JavaScript) as well as network protocols.

The above definition is agnostic of language and runtime, but as developers we still need tools to turn these ideas into production code. If you work in the Java ecosystem, you can refer to the Reactive Streams specification for the JVM on the site listed in the references below.

In summary, Reactive Streams is a standard and specification for Stream-oriented libraries for the JVM that

  • process a potentially unbounded number of elements
  • in sequence,
  • asynchronously passing elements between components,
  • with mandatory non-blocking backpressure.
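As a rough illustration of these points, here is a sketch using `java.util.concurrent.Flow`, the JDK 9+ interfaces that mirror the Reactive Streams ones, together with the JDK's `SubmissionPublisher`. The `BoundedSubscriber` class, the batch size of 4, and the `run` helper are assumptions made for this example and are not part of the specification.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class FlowBackpressureDemo {

    /** Hypothetical subscriber that never has more than `batchSize` elements requested at once. */
    static class BoundedSubscriber implements Flow.Subscriber<Integer> {
        private final int batchSize;
        private final List<Integer> received = new CopyOnWriteArrayList<>();
        private final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;
        private int remainingInBatch;

        BoundedSubscriber(int batchSize) { this.batchSize = batchSize; }

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            remainingInBatch = batchSize;
            s.request(batchSize);                 // initial demand: at most one batch
        }

        @Override public void onNext(Integer item) {
            received.add(item);                   // stand-in for real processing
            if (--remainingInBatch == 0) {        // batch consumed: signal more demand
                remainingInBatch = batchSize;
                subscription.request(batchSize);
            }
        }

        @Override public void onError(Throwable t) { done.countDown(); }

        @Override public void onComplete() { done.countDown(); }
    }

    /** Publishes 1..count and returns what the subscriber received, in order. */
    static List<Integer> run(int count, int batchSize) throws InterruptedException {
        BoundedSubscriber subscriber = new BoundedSubscriber(batchSize);
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            for (int i = 1; i <= count; i++) {
                publisher.submit(i);              // blocks only if the subscriber's buffer is full
            }
        }                                         // close() signals onComplete after pending items
        subscriber.done.await();
        return subscriber.received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(20, 4));
    }
}
```

The subscriber's demand signal (`request`) never blocks. In this sketch the publisher side uses `submit`, which may block when the buffer is full; `SubmissionPublisher` also provides `offer` for callers that would rather drop or retry.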

We can choose Reactive Streams to overcome the backpressure problem described above. In the next blog, we will talk about the Reactive Streams specification in detail, along with the libraries that implement it and their components.

References: https://www.reactive-streams.org/

Written by 

Manish Mishra is a Lead Software Consultant with more than 7 years of experience. His primary development technology was Java. He fell for the Scala language and finds it innovative, interesting, and fun to code with. He has also co-authored a journal paper titled: Economy Driven Real Time Deadline Based Scheduling. His interests include learning cloud computing products and technologies, and algorithm design. He finds books and literature to be favorite companions in solitude, and his favorite stories are spiritual fiction and time-travel fiction.
