ProtoBuf: A New Way of Serialization

In this blog, we will learn how to use ProtoBuf with Java and how it compares with JSON.

Overview

  • Protobuf is short for protocol buffers, which are language- and platform-neutral mechanisms for serializing structured data for use in communications protocols, data storage, and more. Think XML, but smaller, faster, and simpler.
  • The method involves an interface description language that describes the structure of some data, and a program that generates source code from that description for generating or parsing a stream of bytes that represents the structured data.
  • It supports many popular languages such as C++, C#, Dart, Go, Java and Python. Unofficial third-party add-ons support other languages, such as C.

ProtoBuf History

  • Google developed Protocol Buffers for internal use and provided a code generator for multiple languages under an open-source license.
  • The design goals for Protocol Buffers emphasized simplicity and performance. In particular, they were designed to be smaller and faster than XML.
  • Google uses Protocol Buffers for storing and interchanging all kinds of structured information. The method serves as a basis for a custom remote procedure call (RPC) system used in nearly all inter-machine communication at Google.
  • Protocol Buffers are similar to Apache Thrift (used by Facebook and Evernote), Ion (created by Amazon), and Microsoft Bond. They also come with a concrete RPC protocol stack for defined services, called gRPC.

How Google uses Protobuf

  • Protocol buffers are Google’s lingua franca for structured data. They are used in RPC systems like gRPC and its Google-internal predecessor Stubby, for persistent storage of data in a variety of storage systems, and in areas ranging from data analysis pipelines to mobile clients. Practically every project in Google uses protocol buffers.

But Why not XML?

  • Why another language and serialization mechanism if we can use something already available, like XML? The main answer is performance.
  • Protobuf has several advantages for serialization over XML. It allows you to write a simpler description of the data than XML does. Even for small messages, XML becomes hard for human eyes to read once several messages are nested.
  • Another advantage is size: because the Protobuf format is simpler, the output can be up to 10 times smaller than XML. The biggest benefit is speed: thanks to its optimized mechanism, Protobuf can be up to 100 times faster than standard XML serialization. In addition to size and speed, Protobuf has a compiler that processes a .proto file and generates source code for each of the supported languages, unlike the traditional approach, where the same structure has to be written by hand in the source files of every language.

Protobuf vs JSON

  • Protobuf is a binary data-interchange format developed by Google, whereas JSON is a human-readable data-interchange format. JSON derives from JavaScript but, despite its name, is not limited to JavaScript.
  • Protobuf serializes to a compact binary format, whereas JSON serializes to plain text.
  • JSON is useful for common tasks and limits itself to a few types of data (strings, numbers, booleans, null, arrays and objects), so it cannot directly serialize and deserialize every object of a language such as Python. Protobuf covers a wider variety of data types, including enumerations, and its service definitions can also describe RPC methods.
  • Both Protocol Buffers and JSON are language-interoperable, but Protobuf is limited to the programming languages its tooling supports, whereas JSON is accepted almost everywhere.
  • JSON contains only the message and no schema, whereas Protobuf has not only messages but also a set of rules and schemas that define these messages.
  • Protobuf is mostly useful for internal services whereas JSON is mostly useful for web applications.
  • Prior knowledge of the schema is essential for decoding Protobuf messages, while JSON data can be decoded or parsed fairly easily without knowing the schema in advance.
  • The benchmark behind this comparison measured the average performance of a set of browsers on 50 subsequent GET requests to each of two endpoints, one serving Protobuf and one serving JSON. The 50 requests per endpoint were issued twice: first with the Spring Boot application running with compression turned on, and then with compression turned off. So, in the end, each browser made 200 requests for the same data set of 50 thousand people.
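
To make the points about schema and size more concrete, here is a minimal sketch that walks through raw Protobuf bytes by hand, using only the Java standard library. The message layout (a string name in field 1 and a plain int32 age in field 2) and the class name SchemaLessDecodeSketch are assumptions made for illustration, and the reader handles only the two wire types this example needs. It is not the protobuf-java API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustration only: a hand-written reader for two protobuf wire types.
final class SchemaLessDecodeSketch {

  // Bytes of a hypothetical message { string name = 1; int32 age = 2; }
  // holding name = "sam" and age = 10.
  static final byte[] PROTO_BYTES = {0x0A, 0x03, 's', 'a', 'm', 0x10, 0x0A};

  // The same data as JSON text, which carries the field names in every message.
  static final String JSON_TEXT = "{\"name\":\"sam\",\"age\":10}";

  private final byte[] data;
  private int pos;

  SchemaLessDecodeSketch(byte[] data) {
    this.data = data;
  }

  // Base-128 varint: 7 payload bits per byte, high bit set while more bytes follow.
  private long readVarint() {
    long result = 0;
    int shift = 0;
    while (true) {
      byte b = data[pos++];
      result |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
      shift += 7;
    }
  }

  // Lists what can be learned without a schema: field numbers, wire types and raw values.
  List<String> describeFields() {
    List<String> fields = new ArrayList<>();
    while (pos < data.length) {
      long key = readVarint();
      int fieldNumber = (int) (key >>> 3);
      int wireType = (int) (key & 0x7);
      if (wireType == 0) {
        fields.add("field " + fieldNumber + " = varint " + readVarint());
      } else if (wireType == 2) {
        int length = (int) readVarint();
        // Without the schema this could be a string, raw bytes or a nested message;
        // the sketch simply guesses that it is UTF-8 text.
        String text = new String(data, pos, length, StandardCharsets.UTF_8);
        pos += length;
        fields.add("field " + fieldNumber + " = " + length + " bytes \"" + text + "\"");
      } else {
        throw new IllegalArgumentException("wire type " + wireType + " is not handled here");
      }
    }
    return fields;
  }

  public static void main(String[] args) {
    System.out.println("protobuf bytes: " + PROTO_BYTES.length); // 7
    System.out.println("JSON bytes: " + JSON_TEXT.getBytes(StandardCharsets.UTF_8).length); // 23
    new SchemaLessDecodeSketch(PROTO_BYTES).describeFields().forEach(System.out::println);
  }
}
```

The decoded output contains only "field 1" and "field 2"; the names name and age exist only in the .proto file, which is why the receiver needs the schema, and also part of why the binary form is shorter than the JSON text.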

Maven Dependency

  • To use protocol buffers in Java, we need to add a Maven dependency on protobuf-java.
<!-- https://mvnrepository.com/artifact/com.google.protobuf/protobuf-java -->
<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>4.0.0-rc-2</version>
</dependency>

Defining a Protocol

Let’s take an example and define a protocol in protobuf format.

syntax = "proto3";
package common;
option java_multiple_files = true;
option java_package = "com.knoldus.models";

message Address {
  int32 postbox = 1;
  optional string street = 2;
  string city = 3;
}
  • This is a protocol of a simple message of Address type that has three fields.
  • The required keyword, which makes building or parsing a message without that field fail with an exception, exists only in proto2. proto3 removed it, so a proto3 file that uses required does not compile, and a field that is not set simply takes its default value (0, an empty string, and so on).
  • Creating a field with the optional keyword means that this field doesn’t need to be set, and the generated class gets a hasStreet() method to check whether it was. In proto3 this needs a recent protoc (enabled by default from version 3.15). The repeated keyword declares a list of values of variable size.
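  • All fields are numbered. The field number, not the field name, identifies the field in the binary encoding, so it must not change once the message type is in use. Serializers normally write fields in field-number order, so the field marked with 1 comes first, the field marked with 2 next, and so on, but parsers accept fields in any order. Field numbers 1 to 15 need only one byte for the field key, which makes them a good choice for frequently used fields.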
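
As a rough illustration of how field numbers end up in the binary output, the following sketch encodes the Address example by hand with the Java standard library. The class FieldTagSketch and its methods are hypothetical names used only here; the real encoding is done by the protobuf runtime, and unlike this sketch a proto3 serializer skips non-optional fields that hold their default value.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hand-rolled illustration of the protobuf wire format; not the protobuf-java API.
final class FieldTagSketch {

  static final int WIRE_VARINT = 0;           // int32, int64, bool, enum
  static final int WIRE_LENGTH_DELIMITED = 2; // string, bytes, nested messages

  // Base-128 varint: 7 payload bits per byte, high bit set on every byte except the last.
  static void writeVarint(ByteArrayOutputStream out, long value) {
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7F) | 0x80));
      value >>>= 7;
    }
    out.write((int) value);
  }

  // A field key is the field number shifted left by 3 bits, OR-ed with the wire type.
  static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
    writeVarint(out, ((long) fieldNumber << 3) | wireType);
  }

  static void writeInt32(ByteArrayOutputStream out, int fieldNumber, int value) {
    writeTag(out, fieldNumber, WIRE_VARINT);
    writeVarint(out, value); // a negative int32 is sign-extended and takes 10 bytes
  }

  static void writeString(ByteArrayOutputStream out, int fieldNumber, String value) {
    byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
    writeTag(out, fieldNumber, WIRE_LENGTH_DELIMITED);
    writeVarint(out, utf8.length); // length prefix, then the raw bytes
    out.write(utf8, 0, utf8.length);
  }

  // Writes the three Address fields in field-number order.
  static byte[] encodeAddress(int postbox, String street, String city) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeInt32(out, 1, postbox);
    writeString(out, 2, street);
    writeString(out, 3, city);
    return out.toByteArray();
  }

  // Number of bytes the field key occupies for a given field number.
  static int tagSize(int fieldNumber) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeTag(out, fieldNumber, WIRE_VARINT);
    return out.size();
  }

  public static void main(String[] args) {
    byte[] bytes = encodeAddress(123, "main street", "Atlanta");
    System.out.println("encoded size: " + bytes.length); // 24
    System.out.println("key size for field 15: " + tagSize(15)); // 1
    System.out.println("key size for field 16: " + tagSize(16)); // 2
  }
}
```

The first two bytes of the result are 0x08 (field 1, varint) and 0x7B (the value 123), which shows that only the number 1 travels with the data, never the name postbox.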

Generating Java Code From a Protobuf File

Once we define a file, we can generate code from it.

Firstly, we need to install protobuf on our machine. Once we do this, we can generate code by executing a protoc command:

protoc -I=. --java_out=. address.proto
  • The protoc command will generate Java output files from our address.proto file. The -I option specifies the directory in which to look for .proto files. The --java_out option specifies the directory in which the generated classes are stored.
  • The generated classes have getters and builders (with setters) for our defined messages; the message objects themselves are immutable. They also have utility methods for writing messages in binary format and for parsing them back into Java objects, such as writeTo(), toByteArray() and parseFrom().

Creating an Instance of Protobuf Defined Messages

We can easily use the generated code to create a Java instance of the Address class.

Address address =
    Address.newBuilder().setPostbox(123).setStreet("main street").setCity("Atlanta").build();
// JUnit's assertEquals takes the expected value first, then the actual one.
assertEquals(123, address.getPostbox());
assertEquals("main street", address.getStreet());
assertEquals("Atlanta", address.getCity());
  • We can create a fluent builder by calling the newBuilder() method on the desired message type. After setting all the fields we need, we can call the build() method to create an instance of the Address class.
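
For readers who have not seen the builder pattern before, here is a simplified, hand-written stand-in for what such a class looks like. AddressSketch is a hypothetical name and this is not the code protoc produces, which is much larger and also handles serialization; the sketch only shows the immutable message plus fluent builder shape.

```java
import java.util.Objects;

// Hand-written stand-in for a generated message class; real protoc output differs.
final class AddressSketch {
  private final int postbox;
  private final String street;
  private final String city;

  // Instances are created only through the builder, so they never change afterwards.
  private AddressSketch(Builder builder) {
    this.postbox = builder.postbox;
    this.street = builder.street;
    this.city = builder.city;
  }

  static Builder newBuilder() {
    return new Builder();
  }

  // Copies this message into a new builder, so a modified copy can be built.
  Builder toBuilder() {
    return new Builder().setPostbox(postbox).setStreet(street).setCity(city);
  }

  int getPostbox() {
    return postbox;
  }

  String getStreet() {
    return street;
  }

  String getCity() {
    return city;
  }

  static final class Builder {
    // Unset fields keep their defaults: 0 for numbers, "" for strings.
    private int postbox;
    private String street = "";
    private String city = "";

    // Each setter returns the builder itself, which is what makes the calls chainable.
    Builder setPostbox(int value) {
      this.postbox = value;
      return this;
    }

    Builder setStreet(String value) {
      this.street = Objects.requireNonNull(value);
      return this;
    }

    Builder setCity(String value) {
      this.city = Objects.requireNonNull(value);
      return this;
    }

    AddressSketch build() {
      return new AddressSketch(this);
    }
  }

  public static void main(String[] args) {
    AddressSketch address =
        AddressSketch.newBuilder().setPostbox(123).setStreet("main street").setCity("Atlanta").build();
    AddressSketch moved = address.toBuilder().setCity("Boston").build();
    System.out.println(address.getCity() + " -> " + moved.getCity()); // Atlanta -> Boston
  }
}
```

Because the message has no setters of its own, changing a value means going through toBuilder() and building a new object, which keeps instances safe to share between threads.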

Code for Performance Test with JSON

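import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.protobuf.Int32Value;
import com.google.protobuf.InvalidProtocolBufferException;
import java.io.IOException;

// JPerson is a plain Java bean and Person is the protoc-generated class; both hold a name
// and an age and are assumed to be in the same package as this test.
public class PerformanceTest {

  // Assumed values: one round trip is too short to time in milliseconds, so each timed run
  // repeats it many times, and several runs give the JIT compiler time to warm up.
  private static final int RUNS = 5;
  private static final int ITERATIONS = 1_000_000;

  public static void main(String[] args) throws IOException {
    // json
    JPerson person = new JPerson();
    person.setName("sam");
    person.setAge(10);
    ObjectMapper objectMapper = new ObjectMapper();
    // Print the serialized size once, outside the timed code.
    System.out.println("JSON bytes: " + objectMapper.writeValueAsBytes(person).length);
    Runnable json =
        () -> {
          try {
            // Serialize to bytes, then parse back.
            byte[] bytes = objectMapper.writeValueAsBytes(person);
            JPerson person1 = objectMapper.readValue(bytes, JPerson.class);
          } catch (IOException e) {
            e.printStackTrace();
          }
        };

    // protobuf
    Person sam =
        Person.newBuilder()
            .setName("sam")
            .setAge(Int32Value.newBuilder().setValue(10).build())
            .build();
    System.out.println("PROTO bytes: " + sam.toByteArray().length);
    Runnable proto =
        () -> {
          try {
            // Serialize to bytes, then parse back.
            byte[] bytes = sam.toByteArray();
            Person sam1 = Person.parseFrom(bytes);
          } catch (InvalidProtocolBufferException e) {
            e.printStackTrace();
          }
        };
    for (int i = 0; i < RUNS; i++) {
      runPerformanceTest(json, "JSON");
      runPerformanceTest(proto, "PROTO");
    }
  }

  // Times ITERATIONS executions of the given task and prints the elapsed milliseconds.
  private static void runPerformanceTest(Runnable runnable, String method) {
    long time1 = System.currentTimeMillis();
    for (int i = 0; i < ITERATIONS; i++) {
      runnable.run();
    }
    long time2 = System.currentTimeMillis();

    System.out.println(method + ":" + (time2 - time1) + "ms");
  }
}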
  • Upon running the code, protocol buffers should take noticeably less time than JSON for the same serialize and parse round trips, although the exact numbers depend on the machine and the JVM.
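
Timings taken on the JVM are sensitive to class loading and JIT compilation, so it can help to run the task a number of times before starting the clock. The following is a minimal sketch of that idea using only the standard library; TimingSketch and averageNanos are hypothetical names, the iteration counts are arbitrary, and for serious measurements a dedicated harness such as JMH is the safer choice.

```java
import java.nio.charset.StandardCharsets;

// Minimal timing helper with a warm-up phase; a sketch, not a rigorous benchmark.
final class TimingSketch {

  // Written by the sample task so the JIT cannot discard its work as unused.
  static volatile byte[] sink;

  // Runs the task untimed first (warm-up), then returns the average nanoseconds per call.
  static double averageNanos(Runnable task, int warmupIterations, int measuredIterations) {
    for (int i = 0; i < warmupIterations; i++) {
      task.run();
    }
    long start = System.nanoTime();
    for (int i = 0; i < measuredIterations; i++) {
      task.run();
    }
    long elapsed = System.nanoTime() - start;
    return (double) elapsed / measuredIterations;
  }

  public static void main(String[] args) {
    // Placeholder workload; in the performance test this would be the json or proto Runnable.
    Runnable task = () -> sink = "{\"name\":\"sam\",\"age\":10}".getBytes(StandardCharsets.UTF_8);
    double nanos = averageNanos(task, 100_000, 1_000_000);
    System.out.println("average ns per call: " + nanos);
  }
}
```

System.nanoTime() is used instead of System.currentTimeMillis() because it is meant for measuring elapsed time and has a finer resolution.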

Written by 

I am a technology enthusiast with 3+ years of experience. I have worked on Core Java, Apache Flink, Apache Beam, AWS, GCP, Kafka, Spark, and MySQL. I am curious about learning new technologies.
