Protobuf Serialization in Akka


Before diving into Protobuf, let's have a look at the role serialization plays in Akka.

The messages that Akka actors send to each other are JVM objects (e.g. instances of Scala case classes). Message passing between actors that live on the same JVM is straightforward. It is simply done via reference passing. However, messages that have to escape the JVM to reach an actor running on a different host have to undergo some form of serialization (i.e. the objects have to be converted to and from byte arrays).

Akka itself uses Protocol Buffers to serialize internal messages (e.g. cluster gossip messages).

However, the serialization mechanism in Akka allows you to write custom serializers and to define which serializer to use for what.

But why should we use Protobuf serialization in the first place?

As we all know, Java serialization is the default in Akka, which is bad for us for the following reasons:

  • It is very slow.

Java provides a mechanism called object serialization, which allows an object to be represented as a sequence of bytes that includes the object’s data as well as information about the object’s type and the types of data stored in the object. The sequence of bytes can be used to deserialize the object graph by using the type information and bytes that represent the object and its data to recreate the object in memory.
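As a rough illustration of that mechanism, here is a minimal sketch of a Java-serialization round trip in Scala (the Person class is hypothetical, used only for this example):

import java.io.{ ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream }

// Hypothetical message class; Scala case classes already implement java.io.Serializable
case class Person(name: String, age: Int)

// Serialize: the bytes carry the data plus class metadata for Person and everything it references
val bos = new ByteArrayOutputStream()
val oos = new ObjectOutputStream(bos)
oos.writeObject(Person("Ada", 36))
oos.close()
val bytes: Array[Byte] = bos.toByteArray

// Deserialize: the embedded type information is used to rebuild the object graph in memory
val ois = new ObjectInputStream(new ByteArrayInputStream(bytes))
val restored = ois.readObject().asInstanceOf[Person]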

Java serialization is considered slow because:

It doesn't serialize just the data; it also serializes the entire class definition and the definitions of all referenced classes.

It was designed to be simple to use, to serialize almost anything, and to work across different JVMs, not to be fast.

The main reason we want to avoid Java serialization is its poor performance.

So, what are the alternatives to Java serialization?

Well, we have quite a number of options: JSON, Kryo serialization, Protobuf, and so on.

What is Protobuf?

Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler.

You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.

How do they work?

  1. Create a .proto file describing the structured data that needs to be serialized.
  2. Include the ScalaPB dependencies to generate Scala case classes from the .proto file.
  3. Create your own Protobuf serializer.
  4. Inform Akka about which serializer you want to use and which messages are bound to which serializer.

Step 1:

You specify how you want the information you’re serializing to be structured by defining protocol buffer message types in .proto files. Each protocol buffer message is a small logical record of information, containing a series of name-value pairs. Here’s a very basic example of a .proto file that defines messages containing information about an add and a subtract operation:

syntax = "proto3";

message Added {
  int32 nbr1 = 1;
  int32 nbr2 = 2;
}

message Subtracted {
  int32 nbr1 = 1;
  int32 nbr2 = 2;
}

Step 2:

The .proto file created above will be compiled to generate the required case classes, and for that we will use ScalaPB.

So what is ScalaPB?

ScalaPB is a protocol buffer compiler (protoc) plugin for Scala. It will generate Scala case classes, parsers and serializers for your protocol buffers.

ScalaPB generates case classes that can co-exist in the same project alongside the Java-generated code for Protocol Buffers.

This makes it easy to gradually migrate an existing project from the Java version of protocol buffers to Scala.

To automatically generate the Scala case classes, just add the ScalaPB dependencies and sbt plugin to your build.

Running the compile command in sbt will both generate Scala sources from your protos and compile them.
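For reference, a minimal setup might look like the following (the artifact coordinates are the standard ScalaPB ones, but the version numbers here are assumptions; check the ScalaPB documentation for the current ones). In project/plugins.sbt:

addSbtPlugin("com.thesamet" % "sbt-protoc" % "1.0.6")
libraryDependencies += "com.thesamet.scalapb" %% "compilerplugin" % "0.11.13"

and in build.sbt:

Compile / PB.targets := Seq(
  scalapb.gen() -> (Compile / sourceManaged).value / "scalapb"
)
libraryDependencies += "com.thesamet.scalapb" %% "scalapb-runtime" % scalapb.compiler.Version.scalapbVersion % "protobuf"

With this in place, .proto files placed under src/main/protobuf are picked up automatically when you run compile.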

Step 3:

Now that we have the case classes for the messages to be serialized, we need to write a custom serializer that uses those case classes for serialization and deserialization.

Your custom serializer has to extend akka.serialization.SerializerWithStringManifest (a variant of akka.serialization.Serializer that carries a string manifest alongside the bytes) and can be defined like the following:

package sample.remote.serializer

import akka.serialization.SerializerWithStringManifest
import sample.remote.models.{ Added, Subtracted }

class MyOwnProtobufSerializer extends SerializerWithStringManifest {

  // Pick a unique identifier for your serializer; 0-40 is reserved by Akka itself
  override def identifier: Int = 101110116

  // Manifests are written alongside the bytes so fromBinary knows which type to parse
  final val AddedManifest = classOf[Added].getName
  final val SubtractedManifest = classOf[Subtracted].getName

  override def manifest(o: AnyRef): String = o.getClass.getName

  // fromBinary deserializes the given array back to an object, based on the manifest
  override def fromBinary(bytes: Array[Byte], manifest: String): AnyRef = {
    println("inside fromBinary " + manifest)
    manifest match {
      case AddedManifest      => Added.parseFrom(bytes)
      case SubtractedManifest => Subtracted.parseFrom(bytes)
      case _ =>
        throw new IllegalArgumentException(s"Unknown manifest [$manifest]")
    }
  }

  // toBinary serializes the given object to an array of bytes
  override def toBinary(o: AnyRef): Array[Byte] = {
    println("inside toBinary")
    o match {
      case a: Added      => a.toByteArray
      case s: Subtracted => s.toByteArray
      case _ =>
        throw new IllegalArgumentException(s"Cannot serialize object of type [${o.getClass.getName}]")
    }
  }
}
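A quick round trip through this serializer (a sketch, using the ScalaPB-generated Added case class from Step 2) would look like:

val serializer = new MyOwnProtobufSerializer

val added = Added(nbr1 = 3, nbr2 = 4)
val bytes = serializer.toBinary(added)                          // compact Protobuf wire format
val restored = serializer.fromBinary(bytes, serializer.manifest(added))
// restored is again Added(3, 4)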

Step 4:

So, rather than letting Akka fall back to the slow Java serializer, we need to tell it to use our Protobuf serializer.

For Akka to know which serializer to use for what, you need to edit your configuration: in the akka.actor.serializers section you bind names to the implementations of akka.serialization.Serializer you wish to use, like this:

akka {
  actor {
    serializers {
      java = "akka.serialization.JavaSerializer"
      proto = "sample.remote.serializer.MyOwnProtobufSerializer"
    }
  }
}

After you’ve bound names to the different implementations of Serializer, you need to specify which classes should be serialized using which serializer; this is done in the akka.actor.serialization-bindings section:

akka {
  actor {
    serialization-bindings {
      "java.io.Serializable" = none
      "com.google.protobuf.Message" = proto
      "sample.remote.models.Added" = proto
      "sample.remote.models.Subtracted" = proto
    }
  }
}
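If you want to verify these bindings early, classic Akka can also be told to serialize messages even between local actors, which makes binding mistakes show up immediately in tests (a standard toggle; double-check it against your Akka version):

akka {
  actor {
    serialize-messages = on
  }
}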

A simple working demo where two actors send messages to each other using Protobuf serialization can be found here.
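To give a flavour of what such a demo looks like, here is a minimal local sketch (the Calculator actor, Demo object and system name are hypothetical; in the real demo the actors run on different nodes, and crossing the JVM boundary is what triggers the Protobuf serializer via the bindings above):

import akka.actor.{ Actor, ActorSystem, Props }
import sample.remote.models.{ Added, Subtracted }

// Handles the ScalaPB-generated messages; over remoting/cluster these are
// serialized with MyOwnProtobufSerializer according to the bindings above.
class Calculator extends Actor {
  def receive: Receive = {
    case m: Added      => println(s"${m.nbr1} + ${m.nbr2} = ${m.nbr1 + m.nbr2}")
    case m: Subtracted => println(s"${m.nbr1} - ${m.nbr2} = ${m.nbr1 - m.nbr2}")
  }
}

object Demo extends App {
  val system = ActorSystem("calculator-system")
  val calculator = system.actorOf(Props[Calculator], "calculator")

  calculator ! Added(nbr1 = 3, nbr2 = 4)
  calculator ! Subtracted(nbr1 = 9, nbr2 = 5)
}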


