DEV Community

Madhav Jha

Protobuf Quick Refresher

Protocol Buffers (protobuf), for those needing a quick refresher, is a language-neutral binary serialization format developed by Google. It is used to structure, store, and exchange data, and is popular for its compact encoding and fast serialization and deserialization.

In our day-to-day work, we use protobufs to define message structures and services, and the protobuf compiler protoc generates data access classes for us in our language of choice, which in our case is Python.

Protobuf Schema Definition (.proto files)

Let's start by writing a .proto file, which will be the blueprint for our data.

syntax = "proto3";

package tutorial;

message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    string number = 1;
    PhoneType type = 2;
  }

  repeated PhoneNumber phones = 4;
}

In the example above, Person is a message type with fields such as name, id, and email. Notice that each field is assigned a number. These numbers identify the fields in the binary encoding, so they should not be changed once the message type is in use.

We've also defined an enum PhoneType and a nested message PhoneNumber within Person, which is then used in the repeated field phones (equivalent to a list in Python).
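To see why field numbers matter, it helps to look at the bytes. The following is a minimal sketch that hand-encodes the name and id fields of Person using only the standard library. The helpers encode_varint and encode_tag are hypothetical names introduced for illustration, not part of the protobuf package, and the sketch covers non-negative integers only.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A field's key on the wire is (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


VARINT, LENGTH_DELIMITED = 0, 2  # wire types used by int32 and string

name = "John Doe".encode("utf-8")
encoded = (
    encode_tag(1, LENGTH_DELIMITED) + encode_varint(len(name)) + name  # name = 1
    + encode_tag(2, VARINT) + encode_varint(1234)                      # id = 2
)
print(encoded.hex(" "))  # prints: 0a 08 4a 6f 68 6e 20 44 6f 65 10 d2 09
```

Only the field numbers (1 and 2) appear in the output, never the field names. That is why renaming a field is harmless while renumbering it is not.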

Generating Python Code

Once we have our .proto file, we can generate Python code using the protoc compiler.

protoc --python_out=. ./person.proto

This generates a person_pb2.py file containing Python classes for the messages and enums we defined.

Serializing and Deserializing Data

Next, let's create, serialize, and then deserialize a Person message in Python.

import person_pb2

# Create a new Person
person = person_pb2.Person()
person.name = "John Doe"
person.id = 1234
person.email = "jdoe@example.com"

phone = person.phones.add()
phone.number = "555-4321"
phone.type = person_pb2.Person.HOME

# Serialize to a byte string
person_bytes = person.SerializeToString()

# Deserialize to a new person
new_person = person_pb2.Person()
new_person.ParseFromString(person_bytes)

Here, we've created an instance of Person, filled in its fields, then serialized it to a byte string using SerializeToString(). Then, we've taken that byte string and deserialized it back into a Person object using ParseFromString().

Reading and Writing to Files

You can also write these byte strings directly to files or read them from files:

# Write to a file
with open("person.bin", "wb") as fd:
    fd.write(person_bytes)

# Read from a file
with open("person.bin", "rb") as fd:
    new_person = person_pb2.Person()
    new_person.ParseFromString(fd.read())
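The snippet above stores exactly one message per file. A serialized protobuf message does not record its own length, so storing several messages in one file needs some framing around them. Here is a minimal sketch, assuming a 4-byte big-endian length prefix. That prefix is a convention picked here for illustration, not something protobuf mandates, and write_framed and read_framed are hypothetical helpers.

```python
import io
import struct


def write_framed(stream, payload: bytes) -> None:
    """Write a 4-byte big-endian length, then the payload itself."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def read_framed(stream):
    """Yield each payload in turn until the stream is exhausted."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack(">I", header)
        payload = stream.read(size)
        if len(payload) < size:
            raise ValueError("truncated payload")
        yield payload


# Stand-ins for the output of SerializeToString(); any bytes work here.
messages = [b"\x0a\x08John Doe", b"\x0a\x08Jane Doe"]

buffer = io.BytesIO()  # a file opened in binary mode can be used the same way
for message in messages:
    write_framed(buffer, message)

buffer.seek(0)
restored = list(read_framed(buffer))
```

Each element of restored can then be passed to ParseFromString() on a fresh message object.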

Optional Fields and Default Values

In proto3, all fields are optional, and a field that is not set reads as its default value. For scalar types (like int32, bool, string), the default is the "zero value" for the type: 0 for integers, false for bools, and the empty string for strings. Enum fields default to the first value listed in the enum definition, which must be 0.

Here's an example:

syntax = "proto3";

message Weather {
  string location = 1;
  int32 temperature = 2;
  bool is_raining = 3;
}

In Python:

import weather_pb2

weather = weather_pb2.Weather()
weather.location = "San Francisco"

print(weather.temperature)  # prints: 0
print(weather.is_raining)  # prints: False

In the example, we didn't set temperature or is_raining, but they still have default values.
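One consequence worth knowing: proto3 does not write default-valued scalar fields to the wire, so a reader cannot tell a temperature of 0 apart from a temperature that was never set. The sketch below imitates that rule with a hand-rolled encoder for the Weather message. encode_weather is a hypothetical stand-in written for illustration, not the generated class.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint; negative int32 values are sign-extended to 64 bits."""
    value &= 0xFFFFFFFFFFFFFFFF
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_weather(location: str = "", temperature: int = 0,
                   is_raining: bool = False) -> bytes:
    """Illustrative stand-in for Weather.SerializeToString()."""
    out = bytearray()
    if location:     # field 1, length-delimited: key byte 0x0a
        data = location.encode("utf-8")
        out += b"\x0a" + encode_varint(len(data)) + data
    if temperature:  # field 2, varint: key byte 0x10
        out += b"\x10" + encode_varint(temperature)
    if is_raining:   # field 3, varint: key byte 0x18
        out += b"\x18\x01"
    return bytes(out)


unset = encode_weather(location="San Francisco")
explicit_zero = encode_weather(location="San Francisco", temperature=0)
print(unset == explicit_zero)  # prints: True
```

If the distinction matters, newer protoc releases let you declare a proto3 field with the optional keyword, which adds presence tracking that Python exposes through HasField().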

Upgradeability

Protobuf shines when it comes to upgradeability, i.e., the ability to evolve your data structures over time without breaking existing systems. Essentially, you can add new fields to your messages or stop using old ones, and your protobuf data will remain backwards and forwards compatible.

Here's a simple example of a Person message:

Version 1

Let's say we start with this message definition:

syntax = "proto3";

message Person {
  string name = 1;
}

And in our Python code, we create a Person and serialize it:

import person_pb2

person = person_pb2.Person()
person.name = "John Doe"
person_bytes = person.SerializeToString()

Version 2

Then, at a later time, we update our message to include a new field age:

syntax = "proto3";

message Person {
  string name = 1;
  int32 age = 2;  // New field
}

Code generated from the version 2 schema can still deserialize the bytes written by version 1. The age field is absent from those bytes, so it reads as its default value:

old_person = person_pb2.Person()
old_person.ParseFromString(person_bytes)
print(old_person.name)  # prints: John Doe
print(old_person.age)  # prints: 0

This example illustrates backward compatibility: newer software can read messages written with the older schema. The reverse also holds. Code generated from version 1 can parse a version 2 message and simply ignores the age field it doesn't recognize.
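How a reader gets past a field it doesn't recognize can be illustrated without the protobuf library. Every field on the wire starts with a key that carries its number and wire type, and the wire type alone tells the parser how many bytes to skip. Below is a minimal sketch of a reader that knows only the version 1 schema. read_varint and parse_person_v1 are hypothetical helpers, and the input bytes are hand-written for the example.

```python
def read_varint(data: bytes, pos: int):
    """Return (value, next_position) for the varint starting at pos."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_person_v1(data: bytes) -> dict:
    """Decode only field 1 (name); skip every other field by wire type."""
    person = {"name": ""}
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint
            _, pos = read_varint(data, pos)
        elif wire_type == 1:  # fixed 64-bit
            pos += 8
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(data, pos)
            if field_number == 1:
                person["name"] = data[pos:pos + length].decode("utf-8")
            pos += length
        elif wire_type == 5:  # fixed 32-bit
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return person


# Version 2 bytes: name = "John Doe" (field 1) and age = 30 (field 2).
v2_bytes = b"\x0a\x08John Doe\x10\x1e"
print(parse_person_v1(v2_bytes))  # prints: {'name': 'John Doe'}
```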

Version 3

Let's say we then decide to remove the age field:

syntax = "proto3";

message Person {
  string name = 1;  // 'age' field removed
}

And we serialize a new Person object:

new_person = person_pb2.Person()
new_person.name = "Jane Doe"
new_person_bytes = new_person.SerializeToString()

The Python code from version 1 (which expects the name field and doesn't know about age) can still read this:

v1_person = person_pb2.Person()
v1_person.ParseFromString(new_person_bytes)
print(v1_person.name)  # prints: Jane Doe

This illustrates forward compatibility: older software can read messages written with a newer schema.

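Key takeaway: Whenever you change a message type by adding or removing fields, never reuse an old field number for a different field. Data written earlier still carries that number, and a reader would interpret it as the new field. Marking removed numbers with the reserved keyword (for example, reserved 2;) makes protoc reject accidental reuse. This keeps your updated code compatible with older versions.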
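What goes wrong when a number is reused can be shown with a deliberately tiny sketch. It assumes a two-byte message whose key and value each fit in a single byte. The decode helper and the zip_code field are hypothetical, introduced only for this illustration.

```python
def decode(data: bytes, schema: dict) -> dict:
    """Decode one single-byte varint field using a {field_number: name} schema."""
    tag, value = data[0], data[1]
    field_number = tag >> 3
    name = schema.get(field_number)
    return {name: value} if name else {}  # unknown numbers are ignored


# Bytes written long ago, when field 2 was `int32 age` (age = 30).
old_bytes = b"\x10\x1e"

# A later schema that reuses number 2 for an unrelated field misreads the data.
print(decode(old_bytes, {1: "name", 2: "zip_code"}))  # prints: {'zip_code': 30}

# Retiring number 2 and giving the new field a fresh number avoids the problem.
print(decode(old_bytes, {1: "name", 3: "zip_code"}))  # prints: {}
```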

Remote Procedure Calls (RPCs) and Services in Protobuf

In addition to defining message types, Protobuf also allows us to define services, which specify a set of RPCs (Remote Procedure Calls) that can be implemented on a server and called on a client. This makes Protobuf a great choice for designing the interface of microservices.

Here's how we could define a simple service:

syntax = "proto3";

message HelloRequest {
  string greeting = 1;
}

message HelloResponse {
  string reply = 1;
}

service Greeter {
  rpc SayHello (HelloRequest) returns (HelloResponse);
}

In this .proto file, we define a Greeter service with a single RPC SayHello. This RPC takes a HelloRequest message and returns a HelloResponse message.

To generate service interfaces and bindings for Python, we need the grpc_tools package (installed with pip install grpcio-tools), which bundles protoc together with the gRPC plugin for Python.

python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. ./greeter.proto

This will generate a greeter_pb2.py file (as before), but also a greeter_pb2_grpc.py file, which contains the gRPC-specific code: the GreeterStub class for clients and the GreeterServicer base class for servers.

Now we can implement this service on the server side:

import grpc
from concurrent import futures
import greeter_pb2
import greeter_pb2_grpc

class Greeter(greeter_pb2_grpc.GreeterServicer):
    def SayHello(self, request, context):
        return greeter_pb2.HelloResponse(reply='Hello, ' + request.greeting)

server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
greeter_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
server.add_insecure_port('[::]:50051')
server.start()
server.wait_for_termination()  # block so the process keeps serving requests

And call it from the client side:

import grpc
import greeter_pb2
import greeter_pb2_grpc

channel = grpc.insecure_channel('localhost:50051')
stub = greeter_pb2_grpc.GreeterStub(channel)
response = stub.SayHello(greeter_pb2.HelloRequest(greeting='world'))
print(response.reply)  # prints: Hello, world

This way, we can define both the structure of our data and the interface of our services using Protobuf, keeping our microservices clean, consistent, and easy to work with.

Conclusion

You can refer to the Protocol Buffers Language Guide for more detailed information.
