Akka Streams: Produce, Process, and Consume the data

Reading Time: 5 minutes

Introduction of Akka Streams

Akka Streams enables the consumption of streaming data in a fully non-blocking and asynchronous manner. Akka Streams is a part of Akka, specifically the part exposing a user-friendly API (a set of classes and methods) to handle, consume produce streams easily. It is all about the “how” of pushing streaming data around and making that easier for a user of Akka.

Akka is primarily based on the Actor Model which basically allows the developer to express algorithms in a distributed manner.

Akka Streams is a design and attempt to tackle a classical problem in computer science related to the idea of data streams.

What are Streams?

Streams are sequences of data, divided up into individual elements. The size of the stream may not be known or it may, in fact, be infinite often these streams are too large to fit in memory basically what this means is that you have this stream of data whatever that data maybe and you don’t know without consuming the data how much data is going to be there so it may be infinite or it may not be infinite but if it’s too large to fit in memory you can think of that as kind of functionally infinite because from the memory perspective it might as well be it can’t fit.

Reactive Streams Component

Publisher -> Publishes data into the stream

Subscriber -> Costumes data from the stream

Processor -> Act as both a producer and a subscriber, obeying the contract for each.

Subscription -> Connects a subscriber to a publisher in order to initiate the message flow.

Linear Streams

Image showing the flow of data in Linear Streaming

In a linear stream, we have a file, we have a source that is consuming from that file. We have a flow that is performing some map operations on the data from the file. And then we have a sink which is then written to the database.

  • Source – The source of the “data” in the sttream.
  • Sinks – The destination for the “data” in the stream.
  • Flows – Transformations to the “data” in the stream.
  • Runnable Graphs – A akka streams where all inputs and outputs are connected.

Backpressure

  • Backpressure implemented using a pull/push mechanism.
  • Subscribers signal demand. Demand is sent upstream via subscription.
  • Publisher receives demand and push data(if available) downstream.
  • Publishers are forbidden from pushing more than the demand

How Backpressure happens?

The publisher is gonna send data downstream. We have a publisher, a processor, and a subscriber the processor is going to receive that data and then send it on to the subscriber, in the meantime, the subscriber is sending requests upstream to the processor and eventually to the publisher signaling demand. So what happens is a subscriber will signal the demand that gets sent all the way upstream through the subscription then the publisher is going to receive the demand and if has any data it will push it back downstream. And again publishers are forbidden from pushing more than the demand and that’s what creates our backpressure. If the subscriber has not signaled that it is ready for more data then the publisher is not allowed to push it.

Source

A source is a stage with a single output, think of it as an input to a stream. It could be receiving data from a file, a database, a Rest API, collection, etc. The amount of data is not predetermined. It may be infinite.

Source is the entry point of the stream and gives exactly single output

Demand in Sources

Sources receive Demand from Downstream. A source can push data downstream, as long as there is demand. If there is no demand, then the source is forbidden from pushing data. The source will have to deal with incoming data until demand resumes.

Source receives demand from downstream

Sources from single elements

val source: Source[String, NotUsed]= Source.single("Scala is Awesome")
val source: Source[String, NotUsed]= Source.repeat("Scala is Awesome")

single -> Create a source with the single arbitrary element. Push the element and complete

repeat -> Similar to Source.single but the same element is infinitely to be push whenever there is demand

val source: Source[String, Cancellable] = Source.tick(
    initialDelay= 1.second
    interval = 5.seconds
    tick = "Hello World"
)

tick -> Similar to Source.repeat except the element is pushed on a time schedule. If there is no demand(i.e. back pressure), no tick will be push. That tick will be lost.

Sources from Iterables

val source: Source[Int, NotUsed]= Source(1 to 10)

Create a source from a collection. immutable.Iterable. Elements are taken from the Iterable and push to downstream whenever there is demand. The stream is complete when there is no more data in the iterable.

Sink

A Sink is a stage with a single input. You can think of it as the output of the stream. The Sink could be writing data to a file, database, Rest API, collection, etc. Sink creates backpressure by controlling demand.

Sink receives the data elements from the stream.There must be at least one sink in every akka stream.

Demand in Sink

Sink send Demand upstream. A Sink should only send Demand when it is ready to receive more data. If a sink can not keep up with incoming data, demand will stop, and the data flow will cease.

Sink sends demand upstream

Sink that materialize a single element

val head: Sink[Int, Future[Int]] = Sink.head[Int]
val last: Sink[Int, Future[Int]] = Sink.last[Int]
val headOption: Sink[Int, Future[Option[Int]]] = Sink.headOption[Int]
val lastOption: Sink[Int, Future[Option[Int]]] = Sink.lastOption[Int]

head/last -> Pulls until it finds the first or last element in the stream and materializes it. Fails with NoSuchElementException if the stream is empty.

headOption/lastOption -> It is similar to head/last except it returns None if the stream is empty.

Flow

A Flow is a graph stage with a single input and a single output. Flow is use to move data from a Source into a Sink and manipulate that data in some way(transforming, filtering, etc). A Flow acts as both a Source and a Sink, obeying the rules for both.

Flow takes the elements from the source, apply some transformation in it and transfer it to the Sink.

Demand in Flow

A Flow receives a demand from downstream and propagates it to upstream. Like a Source, if there is no downstream demand, the FLow must stop. Flow can propagate backpressure upstream by reducing and stopping the demand. Alternatively, Flow can also be drop and buffer data.

Flow receive a demand from downstream and propagates it to upstream

Flow to map elements

val double: Flow[Int, Int, Notused] = Flow[Int].map(_ * 2)
val double: Flow[Int].mapAsync(parallelism = 4) { i => Future { i * 2 } }

map -> Transform the stream by applying the given function to each element.

mapAsync -> Accepts a function that returns a future but still guarantees ordering. Parallelism defines the amount of parallelism to use when resolving the futures.

Conclusion

Akka’s stream aim is to address one of the most critical yet problematic challenges in asynchronous processing, which is the ability to align the processing speeds between producers and consumers of messages while allowing for efficient use of system resources. It allows applications to more easily exploit a limited form of parallel processing. The streaming process is consider to be the backbone of live streaming applications.

References

https://doc.akka.io/docs/akka/current/stream/index.html

https://developer.lightbend.com/docs/akka-platform-guide/concepts/reactive-streams.html

Written by 

Gulshan Singh is a Software Consultant at Knoldus Inc. Software Developers are forever students. He is passionate about Programming.