As part of my work at Anthropic, I open sourced two Rust crates that fill a gap in the RPC ecosystem: buffa, a pure-Rust Protocol Buffers implementation with first-class editions support and zero-copy message views, and connect-rust, a Tower-based ConnectRPC implementation that speaks Connect, gRPC, and gRPC-Web on the same handlers. We're nominating connect-rust as the canonical Rust implementation of ConnectRPC — if you're using Connect from Go, TypeScript, or Kotlin, this is intended to be the peer implementation for Rust. This code is already in production at Anthropic.
Both crates pass their full upstream conformance suites — Google's protobuf binary and JSON conformance for buffa, and all ~12,800 ConnectRPC server, client, and TLS tests for connect-rust — though as I'll cover later, a green conformance run turned out to be necessary but far from sufficient for production. They were built in six weeks with Claude Opus 4.6 doing most of the work under my direction — an experiment in specification-driven development for performance- and correctness-sensitive library code.
This post covers the Rust-specific design decisions: how protobuf editions map to codegen, why zero-copy views need an OwnedView escape hatch, the type-level choices for mapping protobuf's semantics onto Rust, and what the conformance suites didn't catch. A separate post on the AI-assisted development process will follow.
## Why another protobuf crate?
The short answer: editions, and leaning into the specific capabilities of Rust.
The schism caused by proto2/proto3 semantic divergence is being healed by switching to a feature-flag-driven approach to the wire format, defined by editions. Each edition specifies a default feature set. Messages defined in files from older editions (e.g. proto2) can be used from newer editions. If you are defining new message types, these details are mostly irrelevant, but if you are porting legacy systems from the proto2 era, this is likely to make your migration significantly easier.
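For concreteness, here is a hedged sketch of what an editions-era schema looks like (the message and field names are invented for illustration):

```protobuf
// Editions replace the proto2/proto3 syntax declaration entirely.
edition = "2023";

package demo.v1;

message LogRecord {
  // Edition 2023 defaults to explicit field presence (proto2-style).
  string message = 1;
  // Individual features can be overridden per field, restoring
  // proto3-style implicit presence for this one:
  int32 severity = 2 [features.field_presence = IMPLICIT];
}
```

Because the differences between proto2 and proto3 are expressed as feature defaults rather than separate languages, a legacy file can be migrated one feature override at a time.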
The Rust ecosystem hasn't caught up. Prost is the de facto standard, and it's excellent at what it does — but it targets binary proto3, with JSON bolted on via pbjson, and the library is now only passively maintained. Google's official Rust implementation (protobuf v4) supports editions but is built around upb, so it needs a C compiler and there is not yet an RPC layer implementation above it.
Buffa treats editions as the core abstraction, and is also designed to work well with the current best available tooling: the buf CLI for language-agnostic code generation (protoc is of course also supported), a buffa-build crate for `build.rs` integration for those who prefer cargo-oriented build pipelines, and careful crate-feature and codegen design that lets the library be used in `no_std` or trimmed to the features that matter to your use case (e.g. excluding JSON support).
## Zero-copy message views
Rust provides an interesting opportunity that does not exist for implementations in other languages: we can support message "views" where data does not need to be copied from an input buffer to be used, reducing allocation cost.
The need for this wasn't purely speculative. In an early prototype of connect-rust that used prost, profiling showed that per-field String allocation and HashMap construction for map fields significantly contributed to allocator pressure. For string and bytes fields, copying data is avoidable and safe with Rust's borrow checker, referencing the content directly in the input buffer.
Buffa generates two types per message: `MyMessage` (owned, heap-allocated, similar to what you'd expect in most implementations) and `MyMessageView<'a>` (borrowing directly from the wire buffer). The view type's string fields are `&'a str`, its bytes fields are `&'a [u8]`, and its map fields decode to a flat `Vec<(K, V)>`, so there is no hashing on the decode path.
```rust
// Owned decode - allocates per string field
let msg = LogRecord::decode_from_slice(&bytes)?;
println!("{}", msg.message); // String

// View decode - zero-copy
let view = LogRecordView::decode_view(&bytes)?;
println!("{}", view.message); // &str, borrowed from `bytes`
```
The catch with views is correctly handling lifetimes. A `FooView<'a>` can't cross an `.await` point if the buffer it borrows from doesn't live long enough — which is exactly the situation in an async RPC handler. `OwnedView<V>` solves this by bundling a view with its backing `Bytes` buffer:
```rust
// 'static + Send, still zero-copy
let owned = OwnedView::<LogRecordView>::decode(bytes.into())?;
tokio::spawn(async move {
    println!("{}", owned.message); // &str, borrowed from the owned Bytes
});
```
This is what connect-rust provides to service handlers. On a decode-heavy workload — 50 structured log records per request, ~22 KB batches with varints, strings, nested messages, and map entries — it's about 33% faster than tonic+prost at high concurrency, with allocator pressure at 3.6% of CPU versus 9.6%.
## Configurable safety controls
Some aspects of protobuf can be unsafe or enable attacks when exposed through an RPC framework, and these deserve special consideration. Depending on your use case and environment, it is useful to be able to tune the safety controls around them.
Buffa provides a `DecodeOptions` type to control both recursion limits and message size. Prost enforces a fixed recursion limit of 100 nested messages; buffa uses the same default but allows overriding it via `with_recursion_limit(n)` to a smaller or larger value as needed. For message length, prost does not apply a limit (this is handled within tonic for RPC considerations), while buffa provides control at the protobuf level, with a default that matches the protobuf spec (2 GiB). The connect-rust library applies a 4 MiB default limit for messages and HTTP bodies, which is more typical for HTTP servers.
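The recursion limit matters because protobuf messages nest, and attacker-controlled nesting depth translates directly into stack depth. Here is a minimal self-contained sketch of the pattern using a toy wire format (this is not buffa's real API): byte `0x01` opens one more level of nesting, and any other byte terminates.

```rust
#[derive(Debug, PartialEq)]
enum DecodeError {
    RecursionLimitExceeded,
    Truncated,
}

fn decode_nested(input: &[u8], limit: u32) -> Result<u32, DecodeError> {
    match input.split_first() {
        Some((&0x01, rest)) => {
            if limit == 0 {
                // Fail loudly instead of risking a stack overflow.
                return Err(DecodeError::RecursionLimitExceeded);
            }
            // Recurse with one less unit of depth budget.
            decode_nested(rest, limit - 1).map(|depth| depth + 1)
        }
        Some(_) => Ok(0), // terminator reached
        None => Err(DecodeError::Truncated),
    }
}

fn main() {
    assert_eq!(decode_nested(&[0x01, 0x01, 0x00], 100), Ok(2));

    // 200 levels of attacker-controlled nesting trips the limit of 100:
    let deep: Vec<u8> = std::iter::repeat(0x01).take(200).chain([0x00]).collect();
    assert_eq!(decode_nested(&deep, 100), Err(DecodeError::RecursionLimitExceeded));
}
```

The real decoder threads the same kind of budget through each nested message and length-delimited group.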
Rust `String` / `&str` values must be valid UTF-8, whereas proto2 strings have no such restriction, and later editions provide an opt-out for UTF-8 verification. Regardless, the natural user expectation is that a string field should be a `String` in the Rust struct, so buffa performs UTF-8 validation for all strings by default. The library also provides an opt-out that changes string fields with `utf8_validation = NONE` (all proto2 strings by default, or editions fields that explicitly opt out) to `Vec<u8>` / `&[u8]` instead, allowing validation to be bypassed during decode without misleading the user about the safety of the content. The user can then call `from_utf8` or `from_utf8_unchecked` as they see fit, taking responsibility for the decision.
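As a quick illustration of what that opt-out hands back to the caller, using only the standard library: a field surfaced as bytes is decoded explicitly, so the failure mode is a visible, deliberate choice.

```rust
fn main() {
    let valid: &[u8] = b"caf\xc3\xa9"; // well-formed UTF-8 for "café"
    let broken: &[u8] = b"caf\xc3";    // truncated multi-byte sequence

    // Strict: surface the error to the caller.
    assert_eq!(core::str::from_utf8(valid), Ok("café"));
    assert!(core::str::from_utf8(broken).is_err());

    // Lossy: accept replacement characters instead.
    assert_eq!(String::from_utf8_lossy(broken), "caf\u{FFFD}");
}
```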
## Ergonomics
Protobuf makes some very opinionated choices around message semantics, which can be quite different from the typical behavior of primitive data types in most languages. Two examples of this semantic mismatch that require careful resolution in Rust are optional message fields and enums.
Message fields have default-value semantics that, combined with recursive message types, can be difficult to represent cleanly. Prost uses `Option<M>` or `Option<Box<M>>` for message fields, depending on whether the message type is recursive. This results in some awkward code when dereferencing or assigning to those fields:
```rust
let name = msg.address.as_ref().unwrap().street.as_str();
msg.address = Some(Address {
    street: "123 Main St".into(),
    ..Default::default()
});
```
Buffa instead defines a `MessageField<T>` wrapper used for all message fields, with `Deref` and `From` implementations. This produces more natural field access:
```rust
let name = &msg.address.street;
msg.address = Address {
    street: "123 Main St".into(),
    ..Default::default()
}.into();
```
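To show why this shape works, here is a simplified, self-contained sketch of the idea (not buffa's exact definition, and the `Address`/`Person` types are invented): boxing keeps recursive message types a finite size, `Default` supplies protobuf's "absent field reads as default" semantics, and `Deref`/`From` hide the wrapper at use sites.

```rust
use std::ops::Deref;

// Simplified stand-in for the generated wrapper type.
#[derive(Default)]
struct MessageField<T>(Box<T>);

impl<T> Deref for MessageField<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}

impl<T> From<T> for MessageField<T> {
    fn from(value: T) -> Self {
        MessageField(Box::new(value))
    }
}

// Hypothetical generated message types for illustration:
#[derive(Default)]
struct Address {
    street: String,
}

#[derive(Default)]
struct Person {
    address: MessageField<Address>,
}

fn main() {
    let mut msg = Person::default();
    // Assignment goes through From<Address>:
    msg.address = Address { street: "123 Main St".into() }.into();
    // Reads go through Deref, so the wrapper disappears:
    assert_eq!(msg.address.street, "123 Main St");
    // An unset field reads as the default message:
    assert_eq!(Person::default().address.street, "");
}
```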
Protobuf enums in the current editions are "open", because future evolutions of the enum definition can introduce values the receiver doesn't know about. Prost uses raw `i32` for enum values; buffa defines `EnumValue<T>` as a proper Rust enum while preserving unknown values for round-trip fidelity:
```rust
use buffa::EnumValue;

pub struct Contact {
    pub phone_type: EnumValue<PhoneType>,
    // ...
}

// Match directly - the type carries the known/unknown distinction:
match contact.phone_type {
    EnumValue::Known(PhoneType::MOBILE) => { /* ... */ }
    EnumValue::Known(PhoneType::HOME) => { /* ... */ }
    EnumValue::Known(PhoneType::WORK) => { /* ... */ }
    EnumValue::Unknown(v) => { /* v is the raw i32 from the wire */ }
}

// Or compare directly (PartialEq<E> is implemented):
if contact.phone_type == PhoneType::MOBILE { /* ... */ }
```
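The decode path behind this can be sketched in plain Rust (simplified, with variant names abbreviated; this is not buffa's actual generated code): wire values that match a known variant become `Known`, anything else is preserved as `Unknown` so re-encoding recovers the original value.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum PhoneType {
    Mobile = 0,
    Home = 1,
    Work = 2,
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum EnumValue<T> {
    Known(T),
    Unknown(i32),
}

impl From<i32> for EnumValue<PhoneType> {
    fn from(raw: i32) -> Self {
        match raw {
            0 => EnumValue::Known(PhoneType::Mobile),
            1 => EnumValue::Known(PhoneType::Home),
            2 => EnumValue::Known(PhoneType::Work),
            other => EnumValue::Unknown(other),
        }
    }
}

impl EnumValue<PhoneType> {
    // Re-encoding recovers the original wire value either way.
    fn to_wire(self) -> i32 {
        match self {
            EnumValue::Known(t) => t as i32,
            EnumValue::Unknown(v) => v,
        }
    }
}

fn main() {
    assert_eq!(EnumValue::<PhoneType>::from(1), EnumValue::Known(PhoneType::Home));
    // A value from a future schema revision survives the round trip:
    assert_eq!(EnumValue::<PhoneType>::from(7), EnumValue::Unknown(7));
    assert_eq!(EnumValue::<PhoneType>::from(7).to_wire(), 7);
}
```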
For closed enums (from proto2), fields are directly the enum type, with no intermediate `EnumValue<T>` layer.
## Supporting no_std
The core runtime is `no_std` + `alloc`, with optional JSON serialization via serde. Enabling `std` adds `std::io` integration and `std::time` conversions, but the wire format, views, and JSON all work without it. Rust is well suited to embedded systems and constrained environments, and I believe that protobufs can also be beneficial in such scenarios. The encoding is efficient, and makes it easier for these systems to integrate with the broader ecosystem. While we have not yet pushed this to the logical conclusion of a partial ConnectRPC implementation that works with embassy, reqwless, and/or picoserve, the door is open for others to implement this.
There are some small ergonomic consequences when using `no_std`: the `JsonParseOptions` that are normally scoped via a thread-local for deserialization (serde offers no way to thread a deserialization context through an entire operation) become a global `OnceBox` instead. This is usually fine, as most applications do not vary the parse options over the lifetime of the process, but it is a loss of flexibility compared to `std`.
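The set-once behavior can be sketched with the standard library's `OnceLock` (the option struct and its field are invented for illustration; buffa's `no_std` build uses a `OnceBox` rather than this exact type):

```rust
use std::sync::OnceLock;

// Illustrative stand-in for global parse options in a set-once world:
// the first initialization wins, and every later read sees that value.
#[derive(Debug)]
struct JsonParseOptions {
    ignore_unknown_fields: bool, // hypothetical field
}

static OPTIONS: OnceLock<JsonParseOptions> = OnceLock::new();

fn parse_options() -> &'static JsonParseOptions {
    // Reads fall back to defaults if nothing was installed.
    OPTIONS.get_or_init(|| JsonParseOptions { ignore_unknown_fields: false })
}

fn main() {
    // Install custom options once, before any parsing happens:
    let installed = OPTIONS.set(JsonParseOptions { ignore_unknown_fields: true }).is_ok();
    assert!(installed);
    assert!(parse_options().ignore_unknown_fields);

    // A second set is rejected: the options cannot vary over the process
    // lifetime, which is exactly the flexibility loss noted above.
    assert!(OPTIONS.set(JsonParseOptions { ignore_unknown_fields: false }).is_err());
}
```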
## connect-rust: the RPC layer
Connect-rust is a Tower-based implementation of the ConnectRPC protocol, including support for handling gRPC and gRPC-Web requests, and JSON/binary encoded messages, all from the same handler, as the ConnectRPC specification intends. Unary and all three streaming RPC types (client streaming, server streaming, and bidirectional) are supported for both clients and servers. The client transports can use HTTP/1.1 and HTTP/2, with or without TLS as appropriate.
The architecture is straightforward: codegen emits a monomorphic `FooServiceServer<T>` per service, with a compile-time match on the method name. No `Arc<dyn Handler>` vtable or per-request allocation is required for dispatch. It drops into any Tower-compatible HTTP framework like Axum, or you can use the built-in standalone server that uses hyper directly:
```rust
impl GreetService for MyGreetService {
    async fn greet(
        &self,
        ctx: Context,
        request: OwnedView<GreetRequestView<'static>>,
    ) -> Result<(GreetResponse, Context), ConnectError> {
        let response = GreetResponse {
            greeting: format!("Hello, {}!", request.name),
            ..Default::default()
        };
        Ok((response, ctx))
    }
}

let service = Arc::new(MyGreetService);
let router = service.register(Router::new());
Server::new(router).serve("127.0.0.1:8080".parse()?).await?;
```
There are some known ergonomics issues here: I prioritized shipping a 0.x release for feedback over polishing every API. Threading the context in and out of the handler (returning `Ok((response, ctx))`) is awkward, and the request type `OwnedView<ReqView<'static>>` is overly explicit. This will likely change to `ConnectRequest<Req>` and `ConnectResponse<Resp>` types in a future release, where the request context and response options are separated and the lifetime is implicit.
Client code for interacting with services is also what you would expect:
```rust
let http = HttpClient::plaintext();
let config = ClientConfig::new("http://localhost:8080".parse()?);
let client = GreetServiceClient::new(http, config);

let response = client.greet(GreetRequest {
    name: "World".into(),
    ..Default::default()
}).await?;
```
It is worth noting one small security ergonomics decision here: the transport constructors have no bare `new()`; instead, you must explicitly choose between `plaintext()` and `with_tls(config)`, and these enforce the appropriate URL scheme (`http` and `https` respectively). This is an intentional choice to make the decision to use plaintext explicit and consequential; burying this detail in options for `new()` is how security incidents are born.
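The enforcement itself is simple. A hedged sketch of the idea (not connect-rust's actual types): the constructor name forces a choice, and each choice validates the URL scheme.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Transport {
    Plaintext, // what plaintext() would select
    Tls,       // what with_tls(config) would select
}

fn check_scheme(transport: Transport, url: &str) -> Result<(), String> {
    let expected = match transport {
        Transport::Plaintext => "http://",
        Transport::Tls => "https://",
    };
    if url.starts_with(expected) {
        Ok(())
    } else {
        Err(format!("transport {:?} requires a {} URL", transport, expected))
    }
}

fn main() {
    assert!(check_scheme(Transport::Plaintext, "http://localhost:8080").is_ok());
    // Mixing a TLS transport with an http:// URL is rejected up front,
    // instead of silently downgrading or silently upgrading:
    assert!(check_scheme(Transport::Tls, "http://localhost:8080").is_err());
}
```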
## What conformance tests failed to catch
Both crates passed the full conformance suites for protobuf and ConnectRPC weeks before I would have called them ready for consumption. Conformance exercises protocol correctness. It does not exercise adversarial resource bounds — nobody writes a conformance test that sends you a gzip bomb.
Four real issues made it past green conformance and surfaced during security review:
- The server enforced a size limit on incoming request bodies; the client did not, calling `.collect().await` on whatever the server sent back. The safe pattern had been applied asymmetrically.
- `CompressionProvider::decompress_with_limit` had a default implementation that decompressed fully and checked the size afterwards. The gzip/zstd implementations overrode this behavior correctly, but a custom provider using the default would be vulnerable to decompression bombs.
- The TLS handshake had no timeout. A client that connects but never sends a `ClientHello` would hold the connection forever.
- `grpc-timeout: 18446744073709551615S` parsed to `Duration::from_secs(u64::MAX)`, which panics when added to `Instant::now()`. The code had a comment saying the spec limits this to 8 digits. The code did not match the comment.
These were all fixed, but the themes generalize beyond this project: asymmetric client/server defenses, unsafe trait defaults inherited by custom impls, parse-site leniency trusted at the use site, comments that claim enforcement without enforcing it. If you're building an RPC crate, that's a decent checklist.
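The `grpc-timeout` bug is a good example of enforcing what the comment only claimed: the gRPC spec caps the timeout value at 8 ASCII digits, which bounds the resulting duration and makes the later deadline arithmetic safe. A sketch of a spec-respecting parser (illustrative, not connect-rust's actual code):

```rust
use std::time::Duration;

// Parse a grpc-timeout header value like "10S" or "500m".
// The gRPC spec: 1-8 digits followed by a single unit character.
fn parse_grpc_timeout(header: &str) -> Option<Duration> {
    if !header.is_ascii() {
        return None;
    }
    let (digits, unit) = header.split_at(header.len().checked_sub(1)?);
    if digits.is_empty() || digits.len() > 8 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None; // enforce the 8-digit limit instead of only commenting on it
    }
    let value: u64 = digits.parse().ok()?;
    match unit {
        "H" => Some(Duration::from_secs(value * 3600)),
        "M" => Some(Duration::from_secs(value * 60)),
        "S" => Some(Duration::from_secs(value)),
        "m" => Some(Duration::from_millis(value)),
        "u" => Some(Duration::from_micros(value)),
        "n" => Some(Duration::from_nanos(value)),
        _ => None,
    }
}

fn main() {
    assert_eq!(parse_grpc_timeout("10S"), Some(Duration::from_secs(10)));
    // The u64::MAX payload from the incident is rejected rather than trusted:
    assert_eq!(parse_grpc_timeout("18446744073709551615S"), None);
}
```

With at most 8 digits, the largest representable timeout is 99999999 hours, which adds to `Instant::now()` without panicking.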
## Where the spec runs out
The protobuf spec carefully defines what happens when an unknown value arrives for a closed enum in a singular field, a repeated field, and a map value — but says nothing about a closed enum inside a oneof. Java treats it like the singular case. Go doesn't implement closed-enum semantics at all and still passes conformance, because conformance doesn't test closed enums. For buffa, we chose to follow Java's precedent.
Similarly, the spec doesn't say whether overflow bits in the 10th byte of a varint should be rejected or silently discarded. C++ and prost discard; buffa rejects varints with these bits set. Both are defensible choices, but the conformance tests neither exercise nor prefer either one. Claude did a fantastic job of finding these issues, but only when specifically prompted to compare the end product of spec × tests × code against other gold-standard implementations to find possible gaps and inconsistencies.
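For the varint case, the strict choice is easy to state in code. A 64-bit varint's 10th byte can only legally contribute its low bit (bit 63); this self-contained sketch of a decoder (illustrative, not buffa's implementation) rejects payloads whose 10th byte sets the overflow bits that lenient decoders silently discard.

```rust
#[derive(Debug, PartialEq)]
enum VarintError {
    Truncated,
    TooLong,
    OverflowBits,
}

fn decode_varint_strict(input: &[u8]) -> Result<(u64, usize), VarintError> {
    let mut value: u64 = 0;
    for (i, &byte) in input.iter().enumerate() {
        if i == 9 {
            // 10th byte: the continuation bit must be clear, and only the
            // lowest payload bit fits into a u64.
            if byte & 0x80 != 0 {
                return Err(VarintError::TooLong);
            }
            if byte & 0x7E != 0 {
                return Err(VarintError::OverflowBits);
            }
            value |= u64::from(byte & 0x01) << 63;
            return Ok((value, i + 1));
        }
        value |= u64::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            return Ok((value, i + 1));
        }
    }
    Err(VarintError::Truncated)
}

fn main() {
    // 300 encodes as [0xAC, 0x02]:
    assert_eq!(decode_varint_strict(&[0xAC, 0x02]), Ok((300, 2)));

    // u64::MAX: nine continuation bytes of full payload, then 0x01.
    let max = [0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x01];
    assert_eq!(decode_varint_strict(&max), Ok((u64::MAX, 10)));

    // Same value with a junk overflow bit in the 10th byte: rejected here,
    // silently accepted by a lenient decoder.
    let junk = [0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x03];
    assert_eq!(decode_varint_strict(&junk), Err(VarintError::OverflowBits));
}
```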
## Performance
I want to be careful here, as benchmark numbers are the part most likely to be misread. Connect-rust is not meaningfully faster than tonic for real services. In realistic workloads, like a handler that interacts with a database or upstream services, the optimizations in buffa and connect-rust increase throughput by around 4%. On decode-heavy workloads where buffa's views pay off, it's further ahead: 33% more throughput at high concurrency on the log-ingest benchmark.
What actually moves the needle:
- Zero-copy views. Allocator pressure is 3.6% of server CPU versus 9.6% for tonic+prost on string-heavy payloads.
- Monomorphic dispatch. Compile-time `match` beats dyn-dispatch by a small but real margin when there's nothing else in the request path.
- Connect framing. On unary RPCs, Connect's protocol is genuinely cheaper than gRPC — no envelope header, no trailing HEADERS frame. At 200k+ req/s, gRPC's trailer is ~200k extra h2 HEADERS encodes per second. The gap is ~5% at low concurrency, ~23% at c=256.
The buffa and connect-rust repositories contain the benchmark code and result snapshots — as always, take synthetic benchmarks with a grain of salt. More performance optimizations are possible in the future, but the gains are likely marginal for all but the most performance-focused and tuned services.
## The future
I hope you will try buffa and connect-rust, and provide feedback! While I have tried to make the code readable, ergonomic, and correct, there will inevitably be issues with something as complex as a full protobuf and ConnectRPC implementation primarily built by AI in six weeks. I am committed to improving these libraries, to show that such AI-assisted development can be both fast and high quality.
There are also features we have yet to add but plan to work on soon:

- Message extensions — this is a necessary feature to implement many plugins and middleware, like protovalidate.
- Reflection — handling unknown message types via runtime-provided descriptors is commonly used as part of implementing middleware and plugins.
- Textproto and protoyaml — while I had initially decided not to bother supporting textproto as it is fairly old, I've become convinced that it is a useful addition to help facilitate migrations of proto2-era C/C++ services that may still depend on it. Similarly, YAML is a de facto standard for configuration files, and I'd love to support it with an IDL and protovalidate to enforce correctness.
There are likely many other features that you might want in these implementations — please let us know by opening issues on the repositories, and comment on the ConnectRPC RFC!