Protocol Buffers, for those needing a quick refresher, is a language-agnostic binary serialization format developed by Google. It is used to structure and store data, and is popular for its compact wire size and fast serialization/deserialization. In our day-to-day work, we use protobufs to define message structures and services, and the protobuf compiler `protoc` generates data access classes for us in our language of choice, which in our case is Python.
## Protobuf Schema Definition (.proto files)

Let's start by writing a `.proto` file, which will be the blueprint for our data.
```proto
syntax = "proto3";

package tutorial;

message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    string number = 1;
    PhoneType type = 2;
  }

  repeated PhoneNumber phones = 4;
}
```
In the example above, `Person` is a message type with fields such as `name`, `id`, and `email`. Notice the number assigned to each field: these numbers identify your fields in the binary wire format, and once a message type is in use they should never be changed or reused. We've also defined an enum `PhoneType` and a nested message `PhoneNumber` within `Person`, which is then used in the repeated field `phones` (equivalent to a list in Python).
## Generating Python Code

Once we have our `.proto` file, we can generate Python code using the `protoc` compiler:

```shell
protoc --python_out=. ./person.proto
```
This generates a `person_pb2.py` file which includes Python classes for our defined messages and enums. (Service code requires an extra plugin, which we'll cover below.)
## Serializing and Deserializing Data

Next, let's create, serialize, and then deserialize a `Person` message in Python.
```python
import person_pb2

# Create a new Person
person = person_pb2.Person()
person.name = "John Doe"
person.id = 1234
person.email = "jdoe@example.com"

phone = person.phones.add()
phone.number = "555-4321"
phone.type = person_pb2.Person.HOME

# Serialize to a byte string
person_bytes = person.SerializeToString()

# Deserialize into a new Person
new_person = person_pb2.Person()
new_person.ParseFromString(person_bytes)
```
Here, we've created an instance of `Person`, filled in its fields, and serialized it to a byte string using `SerializeToString()`. We've then taken that byte string and deserialized it back into a `Person` object using `ParseFromString()`.
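Under the hood, much of protobuf's compactness comes from varint encoding: integers are stored in as few bytes as needed, seven bits at a time, with the high bit of each byte marking whether more bytes follow. Here's a minimal sketch of the idea (the helper names are our own, not part of the protobuf library):

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf-style varint."""
    out = bytearray()
    while True:
        bits = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(bits)         # high bit clear: this is the last byte
            return bytes(out)

def decode_varint(data: bytes) -> int:
    """Decode a protobuf-style varint back into an integer."""
    result = shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return result

# The id 1234 from the example above fits in just two bytes:
encoded = encode_varint(1234)
print(len(encoded))            # prints: 2
print(decode_varint(encoded))  # prints: 1234
```

This is why small integer values cost so little on the wire: a field like `id = 1234` takes one tag byte plus two varint bytes, regardless of being declared `int32`.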
## Reading and Writing to Files

You can also write these byte strings directly to files, or read them back:
```python
# Write to a file
with open("person.bin", "wb") as fd:
    fd.write(person_bytes)

# Read from a file
with open("person.bin", "rb") as fd:
    new_person = person_pb2.Person()
    new_person.ParseFromString(fd.read())
```
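One caveat: the output of `SerializeToString()` is not self-delimiting, so if you concatenate several messages into one file, the parser can't tell where one message ends and the next begins. A common approach is to length-prefix each record. Here's a minimal sketch using raw byte payloads (the helper names and the 4-byte big-endian prefix are our own choices, not a protobuf standard):

```python
import io
import struct

def write_delimited(stream, payload: bytes) -> None:
    # Prefix each record with its length as a 4-byte big-endian integer.
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)

def read_delimited(stream):
    # Read the 4-byte length header, then exactly that many payload bytes.
    header = stream.read(4)
    if len(header) < 4:
        return None  # end of stream
    (size,) = struct.unpack(">I", header)
    return stream.read(size)

# Round-trip two records through an in-memory buffer:
buf = io.BytesIO()
write_delimited(buf, b"first message bytes")
write_delimited(buf, b"second message bytes")
buf.seek(0)
print(read_delimited(buf))  # prints: b'first message bytes'
```

In practice you would pass `person.SerializeToString()` as the payload; on the read side, feed each record to `ParseFromString()`.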
## Optional Fields and Default Values

In proto3, all fields are optional, and fields that are not set take their default values. For scalar types (like `int32`, `bool`, `string`), the default is the "zero value" for the type: 0 for integers, false for bools, and the empty string for strings. An enum field defaults to the first value listed in the enum definition (which must be 0 in proto3).
Here's an example:
```proto
syntax = "proto3";

message Weather {
  string location = 1;
  int32 temperature = 2;
  bool is_raining = 3;
}
```
In Python:
```python
import weather_pb2

weather = weather_pb2.Weather()
weather.location = "San Francisco"

print(weather.temperature)  # prints: 0
print(weather.is_raining)   # prints: False
```
In the example, we didn't set `temperature` or `is_raining`, but they still have default values.
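A consequence of this is that for plain proto3 scalar fields you can't distinguish "never set" from "explicitly set to the default". Since protoc 3.15, proto3 supports the `optional` keyword to track field presence explicitly:

```proto
syntax = "proto3";

message Weather {
  string location = 1;
  optional int32 temperature = 2;  // presence is now tracked
  bool is_raining = 3;
}
```

With this change, `weather.HasField("temperature")` returns `False` until the field is assigned, even if the value assigned is 0.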
## Upgradeability

Protobuf shines when it comes to upgradeability, i.e., the ability to evolve your data structures over time without breaking existing systems. Essentially, you can add new fields to your messages or stop using old ones, and your protobuf data will remain backwards and forwards compatible.
Here's a simple example using a `Person` message.

### Version 1

Let's say we start with this message definition:
```proto
syntax = "proto3";

message Person {
  string name = 1;
}
```
And in our Python code, we create a `Person` and serialize it:

```python
import person_pb2

person = person_pb2.Person()
person.name = "John Doe"
person_bytes = person.SerializeToString()
```
### Version 2

Then, at a later time, we update our message to include a new field, `age`:
```proto
syntax = "proto3";

message Person {
  string name = 1;
  int32 age = 2;  // New field
}
```
After regenerating `person_pb2` from the new schema, our updated Python code can still deserialize the `Person` bytes we serialized earlier, even though they were written before `age` existed:

```python
old_person = person_pb2.Person()
old_person.ParseFromString(person_bytes)
print(old_person.name)  # prints: John Doe
print(old_person.age)   # prints: 0 (default value)
```

This illustrates backward compatibility: newer software can read messages written under an older schema, with missing fields simply taking their default values. The reverse also holds: if code built against the version 1 schema receives bytes that contain `age`, it simply ignores the field it doesn't recognize.
### Version 3

Let's say we then decide to remove the `age` field:
```proto
syntax = "proto3";

message Person {
  string name = 1;  // 'age' field removed
}
```
And we serialize a new `Person` object:

```python
new_person = person_pb2.Person()
new_person.name = "Jane Doe"
new_person_bytes = new_person.SerializeToString()
```
The Python code from version 1 (which expects the `name` field and never knew about `age`) can still read this:

```python
v1_person = person_pb2.Person()
v1_person.ParseFromString(new_person_bytes)
print(v1_person.name)  # prints: Jane Doe
```

This illustrates forward compatibility: software built against an older schema can read messages produced under a newer one, skipping any fields it doesn't recognize.
Key takeaway: whenever you change a message type by adding or removing fields, never reuse old field numbers. This ensures your updated code stays compatible with data written by older versions.
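Protobuf can enforce this rule for you: mark retired field numbers (and, optionally, names) as `reserved`, and `protoc` will refuse to compile any future revision that reuses them:

```proto
syntax = "proto3";

message Person {
  string name = 1;
  reserved 2;      // the old 'age' field number can never be reused
  reserved "age";  // reserving the name too avoids confusion in JSON output
}
```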
## Remote Procedure Calls (RPCs) and Services in Protobuf

In addition to defining message types, Protobuf also allows us to define services, which specify a set of RPCs (Remote Procedure Calls) that can be implemented on a server and called from a client. This makes Protobuf a great choice for designing the interface of microservices.
Here's how we could define a simple service:
```proto
syntax = "proto3";

message HelloRequest {
  string greeting = 1;
}

message HelloResponse {
  string reply = 1;
}

service Greeter {
  rpc SayHello (HelloRequest) returns (HelloResponse);
}
```
In this `.proto` file, we define a `Greeter` service with a single RPC, `SayHello`, which takes a `HelloRequest` message and returns a `HelloResponse` message.
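Beyond the unary (single request, single response) RPC shown here, gRPC services can also declare streaming RPCs using the `stream` keyword. For example, a hypothetical server-streaming variant (the method name here is ours, for illustration):

```proto
service Greeter {
  rpc SayHello (HelloRequest) returns (HelloResponse);
  // Server-streaming: one request, a stream of responses.
  rpc SayHelloRepeatedly (HelloRequest) returns (stream HelloResponse);
}
```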
To generate service interfaces and stubs for Python, we use the `grpc_tools` package (from the `grpcio-tools` distribution), which wraps `protoc` with gRPC-specific plugins:

```shell
python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. ./greeter.proto
```
This will generate a `greeter_pb2.py` file (as before), plus a `greeter_pb2_grpc.py` file containing the gRPC-specific code.
Now we can implement this service on the server side:

```python
import grpc
from concurrent import futures

import greeter_pb2
import greeter_pb2_grpc

class Greeter(greeter_pb2_grpc.GreeterServicer):
    def SayHello(self, request, context):
        return greeter_pb2.HelloResponse(reply='Hello, ' + request.greeting)

server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
greeter_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
server.add_insecure_port('[::]:50051')
server.start()
server.wait_for_termination()  # block until the server is stopped
```
And call it from the client side:

```python
import grpc

import greeter_pb2
import greeter_pb2_grpc

channel = grpc.insecure_channel('localhost:50051')
stub = greeter_pb2_grpc.GreeterStub(channel)
response = stub.SayHello(greeter_pb2.HelloRequest(greeting='world'))
print(response.reply)  # prints: Hello, world
```
This way, we can define both the structure of our data and the interface of our services using Protobuf, keeping our microservices clean, consistent, and easy to work with.
## Conclusion

You can refer to the Protocol Buffers Language Guide for more detailed information.