Let's learn about protocol buffers

This post is a mirror of a post I wrote on my own blog.

Protocol Buffers, or “Protobufs”, is a term often thrown around the rooms of big tech companies when designing application systems. Application systems can contain hundreds of thousands of machines all communicating with each other. At that scale, many companies try to optimize in any way possible, and Protocol Buffers is a tool you can use to send data between your applications at high speeds.

In this article, I’ll be shedding some light on protocol buffers and showing you how to use them!

Protobufs are often paired with gRPC (a Remote Procedure Call framework), which is a topic of its own. I’ll try to cover it in a few weeks.

The Gist

Protobufs is an interface definition language and binary serialization format used to build applications and transport data between them. It accomplishes this by enforcing a common data structure in the sections of code where data will be transmitted between applications. These data structures are defined in .proto files. A command-line tool, protoc, uses those .proto files to generate class files that you then use to write your applications.

These classes come with a few helper functions that convert the data held in a class instance to a binary format, which is then used to transmit data between two servers.

Protobufs can be compared to JSON. The two main differences are:

1. You need to pre-define what your structure looks like in .proto files.
2. The data stored in protobufs is read and modified through helper functions provided by the classes autogenerated from those .proto files.

Any time you transmit JSON between two servers, you could send a protobuf binary instead. Sending data as protobuf binaries can reduce download times by anywhere from 4% to 78%, depending on the situation (I discuss this more in Tradeoffs and Benefits).
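To get a feel for where the size difference comes from, here is a rough sketch that hand-encodes one record in the protobuf wire format and compares it with compact JSON. It uses only the Python standard library. The helper names (`encode_varint`, `encode_tag`, `encode_customer`) and the customer record with field numbers name = 1, id = 2, email = 3, isNewCustomer = 4 are my own assumptions for illustration; real code would rely on the generated classes rather than anything like this.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A tag packs the field number and the wire type into one varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_customer(name: str, customer_id: int, email: str, is_new: bool) -> bytes:
    """Hand-encode a hypothetical customer; negative ids are not handled here."""
    out = bytearray()
    # proto3 leaves default values ("", 0, False) off the wire entirely
    if name:
        data = name.encode("utf-8")
        out += encode_tag(1, 2) + encode_varint(len(data)) + data  # wire type 2: length-delimited
    if customer_id:
        out += encode_tag(2, 0) + encode_varint(customer_id)  # wire type 0: varint
    if email:
        data = email.encode("utf-8")
        out += encode_tag(3, 2) + encode_varint(len(data)) + data
    if is_new:
        out += encode_tag(4, 0) + encode_varint(1)
    return bytes(out)


as_protobuf = encode_customer("Rui", 1, "rui@too.com", True)
as_json = json.dumps(
    {"name": "Rui", "id": 1, "email": "rui@too.com", "isNewCustomer": True},
    separators=(",", ":"),
).encode("utf-8")

print(len(as_protobuf), "bytes as protobuf,", len(as_json), "bytes as compact JSON")
```

On this one record the hand-encoded bytes are noticeably smaller than the JSON text, mostly because each field name is replaced by a one-byte tag. How much that matters in practice depends on your data and on whether the JSON is compressed in transit.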

In my mind, there are two processes when developing with protobufs: the development process and the implementation process. The development process deals with creating and managing protobufs. The implementation process is the use of protobuf classes to build our applications/servers/services.

Let's look at these processes by example. Let's say we're developing an application that returns a list of our company's customers.

Our development process looks like the following:

1. A developer writes some data structures called CustomerList and Customer in a customerlist.proto file.
2. A command-line tool that comes with the protobuf library, called protoc, reads .proto files and generates classes in the programming language of the developer's choice.
3. The developer commits the .proto file and the generated code into their codebase.
4. If any changes are needed to that data structure, we start again at step one.

The generated code in our case is the classes CustomerList and Customer. We can now use these classes to build our application.

When the time comes to send data between two systems, we can invoke a helper function attached to these classes to serialize our class data into a binary string. A REST/gRPC/etc. call passes this data to another service. The listener on the other service can then use the same classes to deserialize the binary string back into data the language can work with.
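For a simplified picture of what the receiving side does, the sketch below walks a protobuf byte string and rebuilds a plain dictionary from it. This is not how you would write real code (the generated classes do it for you). The function names, the sample bytes, and the assumed field layout (name = 1, id = 2, email = 3, isNewCustomer = 4) are all mine, and only the two wire types this example needs are handled.

```python
def read_varint(data: bytes, pos: int):
    """Read one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def decode_customer(data: bytes) -> dict:
    """Decode a hypothetical customer message into a dict."""
    # Fields missing from the wire keep their proto3 default values
    customer = {"name": "", "id": 0, "email": "", "isNewCustomer": False}
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")

        if field_number == 1:
            customer["name"] = value.decode("utf-8")
        elif field_number == 2:
            customer["id"] = value
        elif field_number == 3:
            customer["email"] = value.decode("utf-8")
        elif field_number == 4:
            customer["isNewCustomer"] = bool(value)
        # any other field number is unknown to this reader and is skipped
    return customer


# Sample bytes as they might arrive over the network (made up for this example)
received = b"\x0a\x03Rui" + b"\x10\x01" + b"\x1a\x0brui@too.com" + b"\x20\x01"
print(decode_customer(received))
```

Because both sides agree on the field numbers ahead of time, the bytes never need to carry field names. That shared agreement is exactly what the .proto file provides.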

Implementing protobufs

Let’s build a system that transports a list of customers from our Python application server to a Node.js web server and shows us that list in a table.

This application is a bit complicated, so I have provided a GitHub link below for you to follow along:

4shub / protobufs-example

The file structure of our application should look like the following:

// @language-override:Our folder
application_root
|_src
   |_ generated
   |_ protos

First, let’s build a customerlist.proto in src/protos:

// @language-override:proto3
syntax = "proto3";

message Customer {
  // proto3 has no `required` keyword; unset fields fall back to default values
  string name = 1;
  int32 id = 2;
  string email = 3;
  bool isNewCustomer = 4;
}

message CustomerList {
  repeated Customer customer = 1;
}

Above, I created our data structure using the proto3 syntax.

Then we need to run the following command in our application root:

// @language-override:Terminal
protoc --python_out=src/generated --js_out=import_style=commonjs,binary:src/generated src/protos/customerlist.proto -I src/protos

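This command will generate our classes in files named customerlist_pb2.py and customerlist_pb.js in the generated folder. Depending on your protoc version, the --js_out flag may also need the separate protoc-gen-js plugin installed.

Now let’s build our Python server: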
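Since this command has to be rerun every time the .proto file changes, it can be handy to keep it in a small script. Here is a minimal sketch; the function name `build_protoc_command` and its default arguments are my own, and it only uses the flags from the command above. It builds and prints the command without running it.

```python
import shlex
from pathlib import PurePosixPath


def build_protoc_command(proto_file, proto_dir="src/protos", out_dir="src/generated"):
    """Return the protoc argument list for one .proto file (nothing is executed)."""
    return [
        "protoc",
        f"--python_out={out_dir}",
        f"--js_out=import_style=commonjs,binary:{out_dir}",
        str(PurePosixPath(proto_dir) / proto_file),
        "-I",
        proto_dir,
    ]


command = build_protoc_command("customerlist.proto")
print(shlex.join(command))

# To actually run it (requires protoc on your PATH):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems if a path ever contains spaces.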

# @language-override:Python + Flask
import flask
from generated import customerlist_pb2

app = flask.Flask(__name__)

# creating our "database"
customer1 = customerlist_pb2.Customer(name='Shubham', id=0, email='shub@shub.club')
customer2 = customerlist_pb2.Customer(name='Rui', id=1, email='rui@too.com', isNewCustomer=True)

customer_list = customerlist_pb2.CustomerList()
customer_list.customer.append(customer1)
customer_list.customer.append(customer2)


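@app.route('/customer-list')
def get_customer_list():
    # `SerializeToString` is a helper function that serializes customer_list to a binary format
    payload = customer_list.SerializeToString()
    # label the body as raw bytes so clients don't treat it as text
    return flask.Response(payload, mimetype='application/octet-stream')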

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=3001)

In the code above, I instantiate the class CustomerList and populate it with some customer data. Then I convert that data into a protobuf binary and pass it to anyone who requests /customer-list.

Our Node server will act as our receiving server. It will host an HTML page containing a button that requests the customer list stored on the Python server. The Node.js server will make the request on behalf of the client to get that data.

// @language-override:Node.js + Express
const path = require('path');
const axios = require('axios');
const express = require('express');
const app = express();
const port = 3000;

const { CustomerList } = require('./generated/customerlist_pb');
const PYTHON_SERVER_URL = 'http://localhost:3001';

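app.get('/customers', async (req, res) => {
    try {
        // ask axios for raw bytes; by default it decodes the body as text, which can corrupt binary data
        const response = await axios.get(`${PYTHON_SERVER_URL}/customer-list`, {
            responseType: 'arraybuffer',
        });

        // `deserializeBinary` accepts a Uint8Array of protobuf bytes
        const customerList = CustomerList.deserializeBinary(new Uint8Array(response.data));

        // convert to a plain object, which Express sends as JSON
        res.send(customerList.toObject());
    } catch (e) {
        console.error(e);
        // the upstream request or the decoding failed; `res.send(status)` is deprecated, so use `sendStatus`
        res.sendStatus(500);
    }
});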

app.get('/', (req, res) => res.sendFile(path.join(__dirname, './index.html')));

app.listen(port, () => console.log(`Example app listening at http://localhost:${port}`))

Here, CustomerList's helper function deserializeBinary converts the binary data into a usable CustomerList object. We then use toObject to turn that object into a plain JavaScript object, which Express sends to the client as JSON.

Tradeoffs and Benefits

Not everything you build requires protobufs!

Sometimes it’s easier and more efficient not to deal with sophisticated methods for sending data. In a study by Auth0 [0] comparing JSON and protobuf binary performance, protobufs significantly improved data transmission rates for Java server to Java server communication (a 78% reduction in download time), while Java server to client communication saw only a 4% reduction in download time.

Auth0 also ran a second test from a Java server to the client in an “uncompressed” environment, where download time improved by 21%. Based on these results, if your only goal is to improve performance, you may be better off just compressing your JSON data and skipping protobufs.

Outside of optimizations, protobufs provide a way of documenting and enforcing a data structure. This is super useful for keeping data consistent across multiple programming languages and multiple teams.
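To see what "just compress your JSON" can look like, here is a small sketch using Python's built-in gzip module on a made-up customer list. The records and the list size of 200 are assumptions chosen for illustration, and the exact savings will vary with your data. In a real deployment the web server or framework usually handles this for you through the Content-Encoding header.

```python
import gzip
import json

# Made-up sample data: 200 customers with the same four fields
customers = [
    {
        "name": f"Customer {i}",
        "id": i,
        "email": f"customer{i}@example.com",
        "isNewCustomer": i % 2 == 0,
    }
    for i in range(200)
]

raw = json.dumps(customers, separators=(",", ":")).encode("utf-8")
compressed = gzip.compress(raw)

print(len(raw), "bytes of JSON,", len(compressed), "bytes after gzip")

# The receiver decompresses and parses as usual
restored = json.loads(gzip.decompress(compressed))
```

Repeated field names are exactly the kind of redundancy that gzip removes well, which is part of why compressed JSON narrows the gap with protobufs.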

What do tradeoffs and benefits mean for you, the developer? It means that sometimes a tool you could use in one part of your application system might not be useful elsewhere. Or it could mean that maybe the additional development time to enforce protobufs on your whole application is worth it. In the end, it's up to you as a developer to see if a solution is viable for your product or use-case.

Conclusion

Building an application ecosystem can be daunting, but with protobufs in your toolkit you have one more way to speed up the communication between your services. Companies like Square, Google and Netflix use them every day in their systems. Maybe you can try and build something cool with them too. As always, let me know what you’ve built with protobufs.

[0] https://auth0.com/blog/beating-json-performance-with-protobuf/