<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Iain McGinniss</title>
    <description>The latest articles on DEV Community by Iain McGinniss (@iainmcgin).</description>
    <link>https://dev.to/iainmcgin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F100513%2F4e6978c9-0226-4507-8504-fecc13a8fb48.jpg</url>
      <title>DEV Community: Iain McGinniss</title>
      <link>https://dev.to/iainmcgin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/iainmcgin"/>
    <language>en</language>
    <item>
      <title>Zero-copy protobuf and ConnectRPC for Rust</title>
      <dc:creator>Iain McGinniss</dc:creator>
      <pubDate>Wed, 25 Mar 2026 20:19:53 +0000</pubDate>
      <link>https://dev.to/iainmcgin/zero-copy-protobuf-and-connectrpc-for-rust-1m3e</link>
      <guid>https://dev.to/iainmcgin/zero-copy-protobuf-and-connectrpc-for-rust-1m3e</guid>
      <description>&lt;p&gt;As part of my work at Anthropic, I open sourced two Rust crates that fill a gap in the RPC ecosystem: &lt;a href="https://crates.io/crates/buffa" rel="noopener noreferrer"&gt;&lt;strong&gt;buffa&lt;/strong&gt;&lt;/a&gt;, a pure-Rust Protocol Buffers implementation with first-class editions support and zero-copy message views, and &lt;a href="https://crates.io/crates/connectrpc" rel="noopener noreferrer"&gt;&lt;strong&gt;connect-rust&lt;/strong&gt;&lt;/a&gt;, a Tower-based ConnectRPC implementation that speaks Connect, gRPC, and gRPC-Web on the same handlers. We're nominating connect-rust as the &lt;a href="https://github.com/connectrpc/connectrpc.com/pull/334" rel="noopener noreferrer"&gt;canonical Rust implementation&lt;/a&gt; of ConnectRPC — if you're using Connect from Go, TypeScript, or Kotlin, this is intended to be the peer implementation for Rust. This code is already in production at Anthropic.&lt;/p&gt;

&lt;p&gt;Both crates pass their full upstream conformance suites — Google's protobuf binary and JSON conformance for buffa, and all ~12,800 ConnectRPC server, client, and TLS tests for connect-rust — though as I'll cover later, a green conformance run turned out to be necessary but far from sufficient for production. They were built in six weeks with Claude Opus 4.6 doing most of the work under my direction — an experiment in specification-driven development for performance- and correctness-sensitive library code.&lt;/p&gt;

&lt;p&gt;This post covers the Rust-specific design decisions: how protobuf editions map to codegen, why zero-copy views need an &lt;code&gt;OwnedView&lt;/code&gt; escape hatch, the type-level choices for mapping protobuf's semantics onto Rust, and what the conformance suites didn't catch. A separate post on the AI-assisted development process will follow.&lt;/p&gt;

&lt;h1&gt;
  
  
  Why another protobuf crate?
&lt;/h1&gt;

&lt;p&gt;The short answer: &lt;strong&gt;editions&lt;/strong&gt;, and leaning into the specific capabilities of Rust.&lt;/p&gt;

&lt;p&gt;The schism caused by the semantic divergence of proto2 and proto3 is being healed by &lt;a href="https://protobuf.dev/editions/overview/" rel="noopener noreferrer"&gt;editions&lt;/a&gt;, which replace the hard version split with a feature-flag-driven approach to the wire format. Each edition specifies a default feature set, and messages defined in files from older editions (e.g. proto2) can be used from newer ones. If you are defining new message types, these details are mostly irrelevant; if you are porting legacy systems from the proto2 era, they are likely to make your migration significantly easier.&lt;/p&gt;

&lt;p&gt;The Rust ecosystem hasn't caught up. Prost is the de facto standard, and it's excellent at what it does — but it targets binary proto3, with JSON bolted on via pbjson, and the library is now only passively maintained. Google's official Rust implementation (protobuf v4) supports editions but is built around upb, so it needs a C compiler and there is not yet an RPC layer implementation above it.&lt;/p&gt;

&lt;p&gt;Buffa treats editions as the core abstraction, and is also designed to work well with the current best available tooling: &lt;a href="https://buf.build/product/cli" rel="noopener noreferrer"&gt;buf CLI&lt;/a&gt; for language-agnostic code generation (though &lt;code&gt;protoc&lt;/code&gt; is of course also supported), a &lt;code&gt;buffa-build&lt;/code&gt; crate for &lt;code&gt;build.rs&lt;/code&gt; integration for those who prefer cargo-oriented build pipelines, and careful definition of crate features and generated code that allow the library to be used in &lt;code&gt;no_std&lt;/code&gt;, or to select the features that matter to your use case (e.g. excluding JSON support).&lt;/p&gt;

&lt;h2&gt;
  
  
  Zero-copy message views
&lt;/h2&gt;

&lt;p&gt;Rust provides an opportunity that is hard to exploit safely in most other languages: message "views" whose data is read in place from the input buffer rather than copied, reducing allocation cost.&lt;/p&gt;

&lt;p&gt;The need for this wasn't purely speculative. In an early prototype of connect-rust that used prost, profiling showed that per-field &lt;code&gt;String&lt;/code&gt; allocation and &lt;code&gt;HashMap&lt;/code&gt; construction for map fields significantly contributed to allocator pressure. For string and bytes fields, copying data is avoidable &lt;em&gt;and&lt;/em&gt; safe with Rust's borrow checker, referencing the content directly in the input buffer.&lt;/p&gt;

&lt;p&gt;Buffa generates two types per message: &lt;code&gt;MyMessage&lt;/code&gt; (owned, heap-allocated, similar to what you'd expect in most implementations) and &lt;code&gt;MyMessageView&amp;lt;'a&amp;gt;&lt;/code&gt; (borrows directly from the wire buffer). The view type's string fields are &lt;code&gt;&amp;amp;'a str&lt;/code&gt;, its bytes fields are &lt;code&gt;&amp;amp;'a [u8]&lt;/code&gt;, and its map fields are a flat &lt;code&gt;Vec&amp;lt;(K, V)&amp;gt;&lt;/code&gt; scan — no hashing on the decode path.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Owned decode - allocates per string field&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;LogRecord&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;decode_from_slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="py"&gt;.message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// String&lt;/span&gt;

&lt;span class="c1"&gt;// View decode - zero-copy&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;view&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;LogRecordView&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;decode_view&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="py"&gt;.message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// &amp;amp;str, borrowed from `bytes`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
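&lt;p&gt;Mechanically, a zero-copy string field is just a borrow. A minimal sketch of the idea, independent of buffa's real decoder (&lt;code&gt;read_len_prefixed_str&lt;/code&gt; is hypothetical, and the one-byte length prefix is illustrative — real protobuf uses a varint):&lt;/p&gt;

```rust
// Decode a length-prefixed string field without copying. The returned
// &str borrows directly from `buf`, so the borrow checker guarantees it
// cannot outlive the input buffer.
fn read_len_prefixed_str(buf: &[u8]) -> Result<&str, &'static str> {
    let (&len, rest) = buf.split_first().ok_or("empty buffer")?;
    let bytes = rest.get(..len as usize).ok_or("length out of range")?;
    // UTF-8 validation is the only work performed: no allocation, no copy.
    std::str::from_utf8(bytes).map_err(|_| "invalid UTF-8")
}
```

&lt;p&gt;Calling this on &lt;code&gt;[5, b'h', b'e', b'l', b'l', b'o']&lt;/code&gt; yields a &lt;code&gt;&amp;amp;str&lt;/code&gt; pointing one byte into the original buffer; nothing moves to the heap.&lt;/p&gt;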



&lt;p&gt;The catch with views is correctly handling lifetimes. A &lt;code&gt;FooView&amp;lt;'a&amp;gt;&lt;/code&gt; can't cross an &lt;code&gt;.await&lt;/code&gt; point if the buffer it borrows from doesn't live long enough — which is exactly the situation in an async RPC handler. &lt;code&gt;OwnedView&amp;lt;V&amp;gt;&lt;/code&gt; solves this by bundling a view with its backing &lt;code&gt;Bytes&lt;/code&gt; buffer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// 'static + Send, still zero-copy&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;owned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;OwnedView&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;LogRecordView&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;spawn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;owned&lt;/span&gt;&lt;span class="py"&gt;.message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// &amp;amp;str, borrowed from the owned Bytes&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
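&lt;p&gt;One safe way to build such a type — sketched here with hypothetical names and a &lt;code&gt;Vec&amp;lt;u8&amp;gt;&lt;/code&gt; standing in for &lt;code&gt;Bytes&lt;/code&gt;, as an illustration of the pattern rather than buffa's actual internals — is to hold the owning buffer plus byte offsets recorded during decode, instead of a &lt;code&gt;&amp;amp;'a str&lt;/code&gt; into a borrowed buffer, and re-slice on access:&lt;/p&gt;

```rust
// An owned, 'static view over one string field: the buffer moves in,
// and accessors re-slice it using offsets validated at decode time.
pub struct OwnedStrView {
    buf: Vec<u8>, // stands in for bytes::Bytes
    start: usize,
    end: usize,
}

impl OwnedStrView {
    pub fn decode(buf: Vec<u8>) -> Result<Self, &'static str> {
        // One-byte length prefix, as in the earlier sketch.
        let len = *buf.first().ok_or("empty buffer")? as usize;
        let bytes = buf.get(1..1 + len).ok_or("length out of range")?;
        std::str::from_utf8(bytes).map_err(|_| "invalid UTF-8")?;
        Ok(OwnedStrView { buf, start: 1, end: 1 + len })
    }

    pub fn message(&self) -> &str {
        // Validated during decode, so this cannot fail.
        std::str::from_utf8(&self.buf[self.start..self.end]).unwrap()
    }
}
```

&lt;p&gt;Because the view owns its buffer, it is &lt;code&gt;'static&lt;/code&gt; and can move into &lt;code&gt;tokio::spawn&lt;/code&gt; or across &lt;code&gt;.await&lt;/code&gt; points while &lt;code&gt;message()&lt;/code&gt; still reads the original bytes in place.&lt;/p&gt;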



&lt;p&gt;This is what connect-rust provides to service handlers. On a decode-heavy workload — 50 structured log records per request, ~22 KB batches with varints, strings, nested messages, and map entries — it's about 33% faster than tonic+prost at high concurrency, with allocator pressure at 3.6% of CPU versus 9.6%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configurable safety controls
&lt;/h2&gt;

&lt;p&gt;Some aspects of protobuf deserve special consideration because they can be unsafe, or enable attacks, when exposed through an RPC framework. Depending on your use case and environment, it is useful to be able to tune the safety controls around these issues.&lt;/p&gt;

&lt;p&gt;Buffa provides a &lt;code&gt;DecodeOptions&lt;/code&gt; type to control both recursion limits and message size. Prost enforces a fixed recursion limit of 100 nested messages; buffa uses the same default, but allows overriding it via &lt;code&gt;with_recursion_limit(n)&lt;/code&gt;. For message length, Prost applies no limit (Tonic handles this at the RPC layer), while buffa provides control at the protobuf level, with a default matching the protobuf spec's 2 GiB maximum. The &lt;code&gt;connect-rust&lt;/code&gt; library applies a 4 MiB default limit for messages and HTTP bodies, which is more typical for HTTP servers.&lt;/p&gt;
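&lt;p&gt;To make the recursion limit concrete, here is a sketch of the check such a limit performs (&lt;code&gt;check_depth&lt;/code&gt; is hypothetical, not buffa's API; for brevity it parses single-byte field keys and treats every length-delimited field as a sub-message, which a real decoder only does when the schema says so):&lt;/p&gt;

```rust
// Walk a simplified wire format, counting nesting depth. Wire type 0 is
// a varint value; wire type 2 is a length-delimited payload we recurse
// into. Without `max_depth`, a hostile message nested thousands of
// levels deep could overflow the decoder's call stack.
fn check_depth(mut buf: &[u8], depth: u32, max_depth: u32) -> Result<(), &'static str> {
    while let Some((&key, rest)) = buf.split_first() {
        buf = rest;
        match key & 0x07 {
            0 => {
                // Skip a varint value: consume until the continuation bit clears.
                while let Some((&b, rest)) = buf.split_first() {
                    buf = rest;
                    if b & 0x80 == 0 {
                        break;
                    }
                }
            }
            2 => {
                let (&len, rest) = buf.split_first().ok_or("truncated")?;
                let payload = rest.get(..len as usize).ok_or("truncated")?;
                if depth + 1 > max_depth {
                    return Err("recursion limit exceeded");
                }
                check_depth(payload, depth + 1, max_depth)?;
                buf = &rest[len as usize..];
            }
            _ => return Err("wire type not handled in this sketch"),
        }
    }
    Ok(())
}
```

&lt;p&gt;buffa's default of 100 corresponds to &lt;code&gt;max_depth = 100&lt;/code&gt; here.&lt;/p&gt;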

&lt;p&gt;Rust &lt;code&gt;String&lt;/code&gt; / &lt;code&gt;&amp;amp;str&lt;/code&gt; values &lt;em&gt;must&lt;/em&gt; be valid UTF-8, whereas proto2 strings do not have this restriction and later editions provide an opt-out for UTF-8 verification. Regardless, the natural user expectation is that a &lt;code&gt;string&lt;/code&gt; field &lt;em&gt;should&lt;/em&gt; be a &lt;code&gt;String&lt;/code&gt; in the Rust struct, so buffa chooses to perform UTF-8 validation for all strings by default. The library also provides an opt-out that changes &lt;code&gt;string&lt;/code&gt; fields with &lt;code&gt;utf8_validation = NONE&lt;/code&gt; (all proto2 strings by default, or editions fields that explicitly opt out) to &lt;code&gt;Vec&amp;lt;u8&amp;gt;&lt;/code&gt; / &lt;code&gt;&amp;amp;[u8]&lt;/code&gt; instead, allowing validation during decode to be bypassed without misleading the user as to the safety of the content. The user can then call &lt;code&gt;from_utf8&lt;/code&gt; or &lt;code&gt;from_utf8_unchecked&lt;/code&gt; as they deem fit, taking responsibility for the decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ergonomics
&lt;/h2&gt;

&lt;p&gt;Protobuf makes some very opinionated choices around message semantics, which can be quite different from the typical behavior of primitive data types in most languages. Two examples of this semantic mismatch that require careful resolution in Rust are optional message fields and enums.&lt;/p&gt;

&lt;p&gt;Message fields have default-value semantics which, combined with recursive message types, can be difficult to represent cleanly. Prost uses &lt;code&gt;Option&amp;lt;M&amp;gt;&lt;/code&gt; or &lt;code&gt;Option&amp;lt;Box&amp;lt;M&amp;gt;&amp;gt;&lt;/code&gt; for optional message fields, depending on whether the message type is recursive. This results in some awkward code when dereferencing or assigning to those fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="py"&gt;.address&lt;/span&gt;&lt;span class="nf"&gt;.as_ref&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="py"&gt;.street&lt;/span&gt;&lt;span class="nf"&gt;.as_str&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="py"&gt;.address&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Address&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;street&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"123 Main St"&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="nn"&gt;Default&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Buffa instead defines a single &lt;strong&gt;&lt;code&gt;MessageField&amp;lt;T&amp;gt;&lt;/code&gt;&lt;/strong&gt; wrapper for all message fields, which provides &lt;code&gt;Deref&lt;/code&gt; and &lt;code&gt;From&lt;/code&gt; trait implementations. This produces more natural field interaction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="py"&gt;.address.street&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="py"&gt;.address&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Address&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;street&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"123 Main St"&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="nn"&gt;Default&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
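&lt;p&gt;A toy model of how &lt;code&gt;Deref&lt;/code&gt; and &lt;code&gt;From&lt;/code&gt; can produce this ergonomics (an assumed shape, not buffa's real representation): always box the value so recursive message types have a fixed size, track presence separately, and let the trait impls do the lifting.&lt;/p&gt;

```rust
use std::ops::Deref;

// Toy stand-in for MessageField<T>: boxed value plus a presence flag.
pub struct MessageField<T> {
    value: Box<T>,
    set: bool,
}

impl<T> MessageField<T> {
    pub fn is_set(&self) -> bool {
        self.set
    }
}

impl<T: Default> Default for MessageField<T> {
    fn default() -> Self {
        // An unset field dereferences to the default instance, matching
        // protobuf's default-value semantics for message fields.
        MessageField { value: Box::new(T::default()), set: false }
    }
}

impl<T> Deref for MessageField<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.value
    }
}

impl<T> From<T> for MessageField<T> {
    fn from(value: T) -> Self {
        MessageField { value: Box::new(value), set: true }
    }
}

#[derive(Default)]
pub struct Address { pub street: String }

#[derive(Default)]
pub struct Person { pub address: MessageField<Address> }
```

&lt;p&gt;With this shape, &lt;code&gt;msg.address.street&lt;/code&gt; reads through &lt;code&gt;Deref&lt;/code&gt; with no &lt;code&gt;unwrap&lt;/code&gt;, and assignment goes through &lt;code&gt;.into()&lt;/code&gt; without naming the wrapper.&lt;/p&gt;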



&lt;p&gt;Protobuf enums in the current editions are "open", due to the possibility of unknown enum values from future evolutions of the enum definition. Prost uses raw &lt;code&gt;i32&lt;/code&gt; for enum values; for buffa we define &lt;strong&gt;&lt;code&gt;EnumValue&amp;lt;T&amp;gt;&lt;/code&gt;&lt;/strong&gt; as a proper Rust &lt;code&gt;enum&lt;/code&gt;, while preserving unknown values for round-trip fidelity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;buffa&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;EnumValue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;Contact&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;phone_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;EnumValue&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;PhoneType&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Match directly - the type carries the known/unknown distinction:&lt;/span&gt;
&lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="n"&gt;contact&lt;/span&gt;&lt;span class="py"&gt;.phone_type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nn"&gt;EnumValue&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;Known&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;PhoneType&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;MOBILE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nn"&gt;EnumValue&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;Known&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;PhoneType&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;HOME&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nn"&gt;EnumValue&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;Known&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;PhoneType&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;WORK&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nn"&gt;EnumValue&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;Unknown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* v is the raw i32 from the wire */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Or compare directly (PartialEq&amp;lt;E&amp;gt; is implemented):&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;contact&lt;/span&gt;&lt;span class="py"&gt;.phone_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nn"&gt;PhoneType&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;MOBILE&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For closed enums (from proto2), fields are directly the enum type, with no middle &lt;code&gt;EnumValue&amp;lt;T&amp;gt;&lt;/code&gt; layer.&lt;/p&gt;
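&lt;p&gt;The open-enum representation can be modeled in a few lines (an assumed shape, not buffa's exact API): known values wrap the generated Rust enum, and unknown wire values keep their raw &lt;code&gt;i32&lt;/code&gt; so re-encoding is lossless.&lt;/p&gt;

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnumValue<E> {
    Known(E),
    Unknown(i32),
}

// Allows `value == PhoneType::Mobile` without unwrapping.
impl<E: PartialEq> PartialEq<E> for EnumValue<E> {
    fn eq(&self, other: &E) -> bool {
        matches!(self, EnumValue::Known(e) if e == other)
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PhoneType { Mobile = 0, Home = 1, Work = 2 }

pub fn phone_type_from_wire(v: i32) -> EnumValue<PhoneType> {
    match v {
        0 => EnumValue::Known(PhoneType::Mobile),
        1 => EnumValue::Known(PhoneType::Home),
        2 => EnumValue::Known(PhoneType::Work),
        other => EnumValue::Unknown(other),
    }
}

pub fn phone_type_to_wire(v: EnumValue<PhoneType>) -> i32 {
    match v {
        EnumValue::Known(e) => e as i32,
        EnumValue::Unknown(raw) => raw, // preserved for round-trip fidelity
    }
}
```

&lt;p&gt;A value from a future schema revision survives a decode/encode round trip unchanged, which is the property the wrapper exists to protect.&lt;/p&gt;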

&lt;h2&gt;
  
  
  Supporting &lt;code&gt;no_std&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The core runtime is &lt;code&gt;no_std&lt;/code&gt; + &lt;code&gt;alloc&lt;/code&gt;, with optional JSON serialization via serde. Enabling &lt;code&gt;std&lt;/code&gt; adds &lt;code&gt;std::io&lt;/code&gt; integration and &lt;code&gt;std::time&lt;/code&gt; conversions, but the wire format, views, and JSON all work without it. Rust is well suited to embedded systems and constrained environments, and I believe that protobufs can also be beneficial in such scenarios. The encoding is efficient, and makes it easier for these systems to integrate with the broader ecosystem. While we have not yet pushed this to the logical conclusion of a partial ConnectRPC implementation that works with &lt;a href="https://embassy.dev/" rel="noopener noreferrer"&gt;embassy&lt;/a&gt;, &lt;a href="https://github.com/drogue-iot/reqwless" rel="noopener noreferrer"&gt;reqwless&lt;/a&gt;, and/or &lt;a href="https://github.com/sammhicks/picoserve" rel="noopener noreferrer"&gt;picoserve&lt;/a&gt;, the door is open for others to implement this.&lt;/p&gt;

&lt;p&gt;There are some small ergonomic consequences when using &lt;code&gt;no_std&lt;/code&gt;: the &lt;code&gt;JsonParseOptions&lt;/code&gt; that are normally scoped via a thread-local for deserialization (as serde offers no way to thread a deserialization context through an entire operation) are instead a global &lt;code&gt;OnceBox&lt;/code&gt;. This is usually fine, as most applications do not vary the parse options over the lifetime of the process, but it is a loss of flexibility compared to &lt;code&gt;std&lt;/code&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  connect-rust: the RPC layer
&lt;/h1&gt;

&lt;p&gt;Connect-rust is a &lt;a href="https://docs.rs/tower" rel="noopener noreferrer"&gt;Tower&lt;/a&gt;-based implementation of the &lt;a href="https://connectrpc.com/" rel="noopener noreferrer"&gt;ConnectRPC&lt;/a&gt; protocol, including support for handling gRPC and gRPC-Web requests, and JSON/binary encoded messages, all from the same handler, as the ConnectRPC specification intends. Unary and all three streaming RPC types (client streaming, server streaming, and bidirectional) are supported for both clients and servers. The client transports can use HTTP/1.1 and HTTP/2, with or without TLS as appropriate.&lt;/p&gt;

&lt;p&gt;The architecture is straightforward: codegen emits a monomorphic &lt;code&gt;FooServiceServer&amp;lt;T&amp;gt;&lt;/code&gt; per service, with a compile-time &lt;code&gt;match&lt;/code&gt; on the method name. No &lt;code&gt;Arc&amp;lt;dyn Handler&amp;gt;&lt;/code&gt; vtable or per-request allocation is required for dispatch. It drops into any Tower-compatible HTTP framework like Axum, or you can use the built-in standalone server that uses hyper directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;GreetService&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;MyGreetService&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OwnedView&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;GreetRequestView&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;'static&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GreetResponse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;ConnectError&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GreetResponse&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;greeting&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Hello, {}!"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="py"&gt;.name&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="nn"&gt;Default&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
        &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Arc&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MyGreetService&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="nf"&gt;.register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Router&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="nn"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"127.0.0.1:8080"&lt;/span&gt;&lt;span class="nf"&gt;.parse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are some known ergonomics issues here: I prioritized shipping a release for feedback over attempting to achieve perfection for a &lt;code&gt;0.x&lt;/code&gt; release. Threading the context in and out of the handler (returning &lt;code&gt;Ok((response, ctx))&lt;/code&gt;) is awkward, and the request type &lt;code&gt;OwnedView&amp;lt;ReqView&amp;lt;'static&amp;gt;&amp;gt;&lt;/code&gt; is overly explicit. This will likely change to &lt;code&gt;ConnectRequest&amp;lt;Req&amp;gt;&lt;/code&gt; and &lt;code&gt;ConnectResponse&amp;lt;Resp&amp;gt;&lt;/code&gt; types in a future release, where the request context and response options are separated and the lifetime is implicit.&lt;/p&gt;
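&lt;p&gt;The monomorphic dispatch described earlier can be sketched as follows (an assumed shape for the generated code, reduced to synchronous &lt;code&gt;&amp;amp;str&lt;/code&gt;-to-&lt;code&gt;String&lt;/code&gt; handlers): the service type parameter is concrete, so routing a method is a plain &lt;code&gt;match&lt;/code&gt;, with no trait-object table and no per-request allocation for dispatch.&lt;/p&gt;

```rust
pub trait GreetService {
    fn greet(&self, name: &str) -> String;
}

// Codegen emits one concrete server type per service; T is the user's
// handler implementation, statically known at compile time.
pub struct GreetServiceServer<T> {
    pub inner: T,
}

impl<T: GreetService> GreetServiceServer<T> {
    pub fn dispatch(&self, path: &str, body: &str) -> Result<String, &'static str> {
        match path {
            "/greet.v1.GreetService/Greet" => Ok(self.inner.greet(body)),
            _ => Err("unimplemented"),
        }
    }
}

pub struct MyGreetService;

impl GreetService for MyGreetService {
    fn greet(&self, name: &str) -> String {
        format!("Hello, {name}!")
    }
}
```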

&lt;p&gt;Client code for interacting with services is also what you would expect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;HttpClient&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;plaintext&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;ClientConfig&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"http://localhost:8080"&lt;/span&gt;&lt;span class="nf"&gt;.parse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;GreetServiceClient&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="nf"&gt;.greet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GreetRequest&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"World"&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="nn"&gt;Default&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is worth noting one small security ergonomics decision here: the transport constructors have no bare &lt;code&gt;new()&lt;/code&gt;; instead, one must explicitly choose between &lt;code&gt;plaintext()&lt;/code&gt; and &lt;code&gt;with_tls(config)&lt;/code&gt;, and these enforce the appropriate URL scheme (&lt;code&gt;http&lt;/code&gt; and &lt;code&gt;https&lt;/code&gt; respectively). This is an intentional choice to make the decision to use plaintext explicit and consequential; burying this detail in options passed to &lt;code&gt;new()&lt;/code&gt; is how security incidents are born.&lt;/p&gt;
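&lt;p&gt;The constructor names above are from the release; the internals below are an assumed sketch of how such scheme enforcement can work:&lt;/p&gt;

```rust
// No bare new(): construction forces a visible choice of transport
// security, and each variant accepts only its matching URL scheme.
pub struct HttpClient {
    tls: bool,
}

impl HttpClient {
    pub fn plaintext() -> Self {
        HttpClient { tls: false }
    }

    pub fn with_tls(/* config: TlsConfig elided */) -> Self {
        HttpClient { tls: true }
    }

    pub fn check_url(&self, url: &str) -> Result<(), &'static str> {
        let scheme = url.split_once("://").map(|(s, _)| s);
        match (self.tls, scheme) {
            (true, Some("https")) | (false, Some("http")) => Ok(()),
            _ => Err("URL scheme does not match transport security"),
        }
    }
}
```

&lt;p&gt;A plaintext client refuses an &lt;code&gt;https&lt;/code&gt; URL rather than silently downgrading, and vice versa.&lt;/p&gt;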

&lt;h1&gt;
  
  
  What conformance tests failed to catch
&lt;/h1&gt;

&lt;p&gt;Both crates passed the full conformance suites for protobuf and ConnectRPC weeks before I would have called them ready for consumption. Conformance exercises &lt;em&gt;protocol correctness&lt;/em&gt;. It does not exercise &lt;em&gt;adversarial resource bounds&lt;/em&gt; — nobody writes a conformance test that sends you a gzip bomb.&lt;/p&gt;

&lt;p&gt;Four real issues made it past green conformance, surfaced during security review:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The server enforced a size limit on incoming request bodies; the client did not, calling &lt;code&gt;.collect().await&lt;/code&gt; on whatever the server sent back. The safe pattern had been applied asymmetrically.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CompressionProvider::decompress_with_limit&lt;/code&gt; had a default implementation that decompressed fully and checked the size &lt;em&gt;afterwards&lt;/em&gt;. The gzip/zstd implementations overrode this behavior correctly, but a custom provider using the default would be vulnerable to decompression bombs.&lt;/li&gt;
&lt;li&gt;The TLS handshake had no timeout. A client that connects but never sends a &lt;code&gt;ClientHello&lt;/code&gt; would hold the connection forever.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;grpc-timeout: 18446744073709551615S&lt;/code&gt; parsed to &lt;code&gt;Duration::from_secs(u64::MAX)&lt;/code&gt;, which panics when added to &lt;code&gt;Instant::now()&lt;/code&gt;. The code had a &lt;em&gt;comment&lt;/em&gt; saying the spec limits this to 8 digits. The code did not match the comment.&lt;/li&gt;
&lt;/ul&gt;
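&lt;p&gt;Two of these fixes reduce to small, mechanical patterns. Here is a sketch of each (hypothetical helpers, not the crates' actual code): bounding decompressed output &lt;em&gt;while&lt;/em&gt; it streams, and parsing &lt;code&gt;grpc-timeout&lt;/code&gt; with the spec's 8-digit limit actually enforced:&lt;/p&gt;

```rust
use std::io::{Error, ErrorKind, Read, Result as IoResult};
use std::time::Duration;

/// Bound decompressed output *as it streams*, rather than decompressing
/// fully and checking afterwards. `reader` stands in for a streaming
/// decompressor such as a gzip or zstd decoder.
fn read_with_limit(reader: impl Read, limit: u64) -> IoResult<Vec<u8>> {
    let mut out = Vec::new();
    // Ask for one byte more than the limit: if we receive it, the source is
    // over budget and we fail without buffering the rest of the bomb.
    reader.take(limit + 1).read_to_end(&mut out)?;
    if out.len() as u64 > limit {
        return Err(Error::new(ErrorKind::InvalidData, "decompressed size exceeds limit"));
    }
    Ok(out)
}

/// Parse a grpc-timeout header value, enforcing the spec's limit of at most
/// 8 ASCII digits so the result can never overflow `Instant::now() + d`.
fn parse_grpc_timeout(value: &str) -> Option<Duration> {
    if !value.is_ascii() || value.len() < 2 {
        return None;
    }
    let (digits, unit) = value.split_at(value.len() - 1);
    if digits.len() > 8 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let n: u64 = digits.parse().ok()?;
    match unit {
        "H" => Some(Duration::from_secs(n * 3600)),
        "M" => Some(Duration::from_secs(n * 60)),
        "S" => Some(Duration::from_secs(n)),
        "m" => Some(Duration::from_millis(n)),
        "u" => Some(Duration::from_micros(n)),
        "n" => Some(Duration::from_nanos(n)),
        _ => None,
    }
}
```

&lt;p&gt;The &lt;code&gt;take(limit + 1)&lt;/code&gt; trick means a decompression bomb fails after buffering at most one byte past the limit, and the 8-digit cap bounds the parsed timeout to well under &lt;code&gt;u64::MAX&lt;/code&gt; seconds.&lt;/p&gt;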

&lt;p&gt;These were all fixed, but the themes generalize past this project: asymmetric client/server defenses, unsafe trait defaults inherited by custom impls, parse-site leniency trusted at the use-site, comments that claim enforcement without enforcing. If you're building an RPC crate, that's a decent checklist.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the spec runs out
&lt;/h2&gt;

&lt;p&gt;The protobuf spec carefully defines what happens when an unknown value arrives for a closed enum in a singular field, a repeated field, and a map value — but says nothing about a closed enum inside a oneof. Java treats it like the singular case. Go doesn't implement closed-enum semantics at all and still passes conformance, because conformance doesn't test closed enums. For buffa, we chose to follow Java's precedent.&lt;/p&gt;

&lt;p&gt;Similarly, the spec doesn't say whether overflow bits in the 10th byte of a varint should be rejected or silently discarded. C++ and prost discard them, whereas for buffa we reject varints with these bits set. Both are defensible choices, and the conformance tests exercise neither. Claude did a fantastic job of finding these issues, but only when specifically prompted to cross-check the spec, the tests, and the code against other gold-standard implementations, looking for gaps and inconsistencies.&lt;/p&gt;
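&lt;p&gt;For the varint case, the strict behavior is easy to state in code. A sketch of a decoder that rejects overflow bits in the 10th byte (illustrative only, not buffa's actual implementation):&lt;/p&gt;

```rust
/// Decode a protobuf varint as a u64, rejecting encodings whose 10th byte
/// sets bits that cannot fit in 64 bits.
fn decode_varint_strict(buf: &[u8]) -> Option<u64> {
    let mut value: u64 = 0;
    for (i, &b) in buf.iter().enumerate() {
        if i >= 10 {
            return None; // a u64 varint is never longer than 10 bytes
        }
        let payload = (b & 0x7f) as u64;
        // Bytes 0..=8 contribute 7 bits each (63 bits total); the 10th byte
        // may only contribute the single remaining bit. Reject overflow bits.
        if i == 9 && payload > 1 {
            return None;
        }
        value |= payload << (7 * i);
        if b & 0x80 == 0 {
            return Some(value);
        }
    }
    None // input ended with the continuation bit still set
}
```

&lt;p&gt;A lenient decoder would instead mask the 10th byte's payload down to one bit and accept the same input; both behaviors round-trip every valid encoding, which is why conformance cannot distinguish them.&lt;/p&gt;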

&lt;h1&gt;
  
  
  Performance
&lt;/h1&gt;

&lt;p&gt;I want to be careful here, as benchmark numbers are the part most likely to be misread. Connect-rust is &lt;strong&gt;not&lt;/strong&gt; meaningfully faster than tonic for real services. In realistic workloads, like a handler that interacts with a database or upstream services, the optimizations in buffa and connect-rust increase throughput by around 4%. On decode-heavy workloads where buffa's views pay off, it's further ahead: 33% more throughput at high concurrency on the log-ingest benchmark.&lt;/p&gt;

&lt;p&gt;What actually moves the needle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero-copy views.&lt;/strong&gt; Allocator pressure is 3.6% of server CPU versus 9.6% for tonic+prost on string-heavy payloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monomorphic dispatch.&lt;/strong&gt; Compile-time &lt;code&gt;match&lt;/code&gt; beats dyn-dispatch by a small but real margin when there's nothing else in the request path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connect framing.&lt;/strong&gt; On unary RPCs, Connect's protocol is genuinely cheaper than gRPC — no envelope header, no trailing HEADERS frame. At 200k+ req/s, gRPC's trailer is ~200k extra h2 HEADERS encodes per second. The gap is ~5% at low concurrency, ~23% at c=256.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;a href="https://github.com/anthropics/buffa" rel="noopener noreferrer"&gt;buffa&lt;/a&gt; and &lt;a href="https://github.com/anthropics/connect-rust" rel="noopener noreferrer"&gt;connect-rust&lt;/a&gt; repositories contain the benchmark code and result snapshots — as always, take synthetic benchmarks with a grain of salt. More performance optimizations are possible in the future, but the gains are likely marginal for all but the most performance-focused and tuned services.&lt;/p&gt;

&lt;h1&gt;
  
  
  The future
&lt;/h1&gt;

&lt;p&gt;I hope you will try buffa and connect-rust, and provide feedback! While I have tried to make the code readable, ergonomic, and correct, there will inevitably be issues with something as complex as a full protobuf and ConnectRPC implementation primarily built by AI in six weeks. I am committed to improving these libraries, to show that AI-assisted development can be both fast &lt;em&gt;and&lt;/em&gt; high quality.&lt;/p&gt;

&lt;p&gt;There are also features we have yet to add, but plan to work on soon:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Message extensions — this is a necessary feature to implement many plugins and middleware, like &lt;a href="https://protovalidate.com/" rel="noopener noreferrer"&gt;protovalidate&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reflection — handling unknown message types via runtime-provided descriptors is commonly needed when implementing middleware and plugins.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Textproto and protoyaml — while I had initially decided to not bother supporting textproto as it is fairly old, I've become convinced that it is a useful addition to help facilitate migrations of proto2-era C/C++ services that may still depend on this. Similarly, YAML is a de facto standard for configuration files, and I'd love to be able to support that with an IDL and protovalidate to enforce correctness.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are likely many other features that you might want in these implementations — please let us know by opening issues on the repositories, and comment on &lt;a href="https://github.com/connectrpc/connectrpc.com/pull/334" rel="noopener noreferrer"&gt;the ConnectRPC RFC&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>protobuf</category>
      <category>connectrpc</category>
      <category>rust</category>
      <category>anthropic</category>
    </item>
    <item>
      <title>Authenticated Docker Hub image pulls in Kubernetes</title>
      <dc:creator>Iain McGinniss</dc:creator>
      <pubDate>Sat, 22 Apr 2023 23:20:00 +0000</pubDate>
      <link>https://dev.to/iainmcgin/authenticated-docker-hub-image-pulls-in-kubernetes-k57</link>
      <guid>https://dev.to/iainmcgin/authenticated-docker-hub-image-pulls-in-kubernetes-k57</guid>
      <description>&lt;p&gt;I recently stumbled over the Docker Hub &lt;a href="https://docs.docker.com/docker-hub/download-rate-limit/" rel="noopener noreferrer"&gt;image pull rate limit&lt;/a&gt; in one of my Kubernetes clusters. A pod failed to start due to being unable to pull an image, with a &lt;code&gt;429 Too Many Requests&lt;/code&gt; error response. The Docker Hub documentation says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For anonymous users, the rate limit is set to 100 pulls per 6 hours per IP address. For &lt;a href="https://docs.docker.com/docker-hub/download-rate-limit/#how-do-i-authenticate-pull-requests" rel="noopener noreferrer"&gt;authenticated&lt;/a&gt; users, it’s 200 pulls per 6 hour period. Users with a paid &lt;a href="https://www.docker.com/pricing" rel="noopener noreferrer"&gt;Docker subscription&lt;/a&gt; get up to 5000 pulls per day.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;With a small cluster that doesn't change frequently, the rate limit is typically not an issue. However, as your cluster grows and pods are started or replaced more frequently, the likelihood of image pulls failing due to hitting the rate limit increases.&lt;/p&gt;

&lt;p&gt;So, what can be done to avoid this? There are a few options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Authenticate your Docker Hub image pulls. This seems like the obvious answer, but as we will discuss, this can be more complex than you might expect.&lt;/li&gt;
&lt;li&gt;Operate a pull-through cache registry, like &lt;a href="https://jfrog.com/artifactory/" rel="noopener noreferrer"&gt;Artifactory&lt;/a&gt; or the &lt;a href="https://docs.docker.com/registry/" rel="noopener noreferrer"&gt;open source reference Docker registry&lt;/a&gt;. This will allow you to pull images from Docker Hub less frequently, improving your chances of staying under the anonymous usage limit.&lt;/li&gt;
&lt;li&gt;Use images from repositories directly controlled by your organization. For example, you could exclusively use images stored in a registry provided by your cloud provider (e.g. AWS Elastic Container Registry).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Options 2 and 3 are worth considering for reasons beyond the scope of this article - reduced data transfer fees, more visibility into the images you deploy, and better options for &lt;a href="https://blog.aquasec.com/supply-chain-threats-using-container-images" rel="noopener noreferrer"&gt;mitigating supply chain attacks&lt;/a&gt;. However, they are not always practical options - the overheads of configuring, operating, and monitoring a private registry can be substantial. Additionally, you will likely need to change all of your image references - a default image reference like &lt;code&gt;busybox:1.36&lt;/code&gt; is implicitly referencing Docker Hub, and would need to be changed to something else like &lt;code&gt;my-image-registry.example/busybox:1.36&lt;/code&gt;. If you are using Helm charts to manage the install of common services in your cluster, such overrides are not always possible - the chart may hard-code image references.&lt;/p&gt;

&lt;p&gt;So, how can we authenticate our Docker Hub image pulls? If you have control over the underlying operating system of your Kubernetes nodes (e.g. through a custom &lt;a href="https://www.dmtf.org/standards/ovf" rel="noopener noreferrer"&gt;virtual machine image&lt;/a&gt;, or &lt;a href="https://cloudinit.readthedocs.io/en/latest/index.html" rel="noopener noreferrer"&gt;cloud-init&lt;/a&gt; configuration), you can provide Docker Hub credentials directly in the &lt;a href="https://github.com/containerd/containerd/blob/main/docs/cri/config.md#registry-configuration" rel="noopener noreferrer"&gt;containerd registry configuration&lt;/a&gt;. It may also be possible to use a &lt;a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-credential-provider/" rel="noopener noreferrer"&gt;kubelet credential provider&lt;/a&gt;, though this interface is primarily designed for &lt;em&gt;dynamic&lt;/em&gt; credential generation or retrieval, whereas Docker Hub credentials are currently static. I could not find a credential provider for this interface that could supply either static credentials or those sourced from a credential vault.&lt;/p&gt;

&lt;p&gt;Staying within the bounds of what Kubernetes offers at the conceptual layer, we can declaratively configure authenticated image pulls using &lt;a href="https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/" rel="noopener noreferrer"&gt;image pull secrets&lt;/a&gt;. We will go through the details of this approach, then discuss some of the complexities that arise in larger clusters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating and using image pull secrets
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Creating a Docker Hub credential
&lt;/h3&gt;

&lt;p&gt;First, we need a Docker Hub user and password for pulling images. I recommend creating a Docker Hub account specifically for this purpose, separate from that of any specific person in your organization. This will allow you to independently manage the lifecycle and security of this account. If you have a &lt;a href="https://docs.docker.com/docker-hub/orgs/" rel="noopener noreferrer"&gt;Docker organization&lt;/a&gt;, it is best to create this &lt;a href="https://docs.docker.com/docker-hub/service-accounts/" rel="noopener noreferrer"&gt;"service account"&lt;/a&gt; under that organization, which has the added benefit of giving the account a significantly higher image pull rate limit (16x what you get with unauthenticated pulls). If you need even more, and you still don't want to operate a pull-through cache or private registry, you can pay Docker for &lt;a href="https://docs.docker.com/docker-hub/service-accounts/#enhanced-service-account-add-on-pricing" rel="noopener noreferrer"&gt;even higher limits&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Within your chosen account, you can &lt;a href="https://docs.docker.com/docker-hub/access-tokens/" rel="noopener noreferrer"&gt;create a personal access token&lt;/a&gt; that can be used as the "password" for authenticated image pulls. I recommend configuring this token to be "Read-only" or "Public Repo Read-only" to limit exposure. A "Read-only" token will allow pulls of images from private Docker Hub repositories that the account has access to, which may be desirable if you are also using Docker Hub as your primary store for private images. If you only intend to use public images from Docker Hub, "Public Repo Read-only" is sufficient.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating an image pull secret
&lt;/h3&gt;

&lt;p&gt;To use the personal access token from your Docker Hub account for image pulls in a Kubernetes cluster, we must create a &lt;a href="https://kubernetes.io/docs/concepts/configuration/secret/" rel="noopener noreferrer"&gt;secret object&lt;/a&gt; with type &lt;code&gt;kubernetes.io/dockerconfigjson&lt;/code&gt; to hold the credentials. The credentials are embedded in a JSON object with the following structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"auths"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"https://index.docker.io/v1/"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"my-robot-account-1234"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"password"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dckr_pat_asd-fghjklqwertyuiopZXCVBNM"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The secret object embeds the Base64-encoded form of that JSON object. It will look something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Secret&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dockerhub-image-pull-secret&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;kubernetes.io/dockerconfigjson&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;.dockerconfigjson&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;ewogICJhdXRocyI6IHsKICAgICJodHRwczovL2luZGV4LmRvY2tlci5pby92  &lt;/span&gt;
    &lt;span class="s"&gt;MS8iOiB7CiAgICAgICJ1c2VybmFtZSI6ICJteS1yb2JvdC1hY2NvdW50LTEy  &lt;/span&gt;
    &lt;span class="s"&gt;MzQiLAogICAgICAicGFzc3dvcmQiOiAiZGNrcl9wYXRfYXNkLWZnaGprbHF3  &lt;/span&gt;
    &lt;span class="s"&gt;ZXJ0eXVpb3BaWENWQk5NIgogICAgfQogIH0KfQo=&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is important to note that secrets are &lt;em&gt;namespaced&lt;/em&gt;. This means they can only be referenced by other Kubernetes resources in the same namespace, unless you set up specific &lt;a href="https://kubernetes.io/docs/reference/access-authn-authz/rbac/" rel="noopener noreferrer"&gt;role-based access control rules&lt;/a&gt; to allow cross-namespace access to the secret.&lt;/p&gt;
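&lt;p&gt;If you prefer not to hand-encode the JSON, the value for the &lt;code&gt;.dockerconfigjson&lt;/code&gt; field can be produced on the command line (the username and token below are the same placeholders as above):&lt;/p&gt;

```shell
# Write the dockerconfigjson payload, then print its Base64 form for the
# Secret's .dockerconfigjson data field.
cat > dockerconfig.json <<'EOF'
{
  "auths": {
    "https://index.docker.io/v1/": {
      "username": "my-robot-account-1234",
      "password": "dckr_pat_asd-fghjklqwertyuiopZXCVBNM"
    }
  }
}
EOF
base64 -w0 dockerconfig.json
```

&lt;p&gt;Alternatively, &lt;code&gt;kubectl create secret docker-registry dockerhub-image-pull-secret --docker-server=https://index.docker.io/v1/ --docker-username=my-robot-account-1234 --docker-password=dckr_pat_... --dry-run=client -o yaml&lt;/code&gt; generates the complete Secret manifest, Base64 encoding included.&lt;/p&gt;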

&lt;h3&gt;
  
  
  Using an image pull secret
&lt;/h3&gt;

&lt;p&gt;With an image pull secret defined, we have two main options for using it. First, we can reference the secret in our &lt;a href="https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#podspec-v1-core" rel="noopener noreferrer"&gt;pod specifications&lt;/a&gt; under the &lt;code&gt;imagePullSecrets&lt;/code&gt; field:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx:1.23.4&lt;/span&gt;
  &lt;span class="na"&gt;imagePullSecrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dockerhub-image-pull-secret&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, it can be tedious and error-prone to specify this reference across many different pod specifications. The second option is to reference the secret as part of a &lt;a href="https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#serviceaccount-v1-core" rel="noopener noreferrer"&gt;service account object&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;imagePullSecrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dockerhub-image-pull-secret&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unless explicitly changed, pods use the &lt;code&gt;default&lt;/code&gt; service account of their namespace, so this service account acts as a shared location for defining image pull secrets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problems at scale
&lt;/h2&gt;

&lt;p&gt;Configuring and using a Docker Hub image pull secret for a single namespace is relatively straightforward. However, repeating this work for tens or hundreds of namespaces is tedious and error-prone:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You must clone the secret into every namespace, or ensure that one secret is accessible in all namespaces. Both require per-namespace configuration, and if you are dynamically creating namespaces, you will need associated automation.&lt;/li&gt;
&lt;li&gt;You must reference the secret in all relevant places, whether those are pod specifications or service accounts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fortunately, tools exist that can help automate these tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  imagepullsecret-patcher
&lt;/h3&gt;

&lt;p&gt;The problem of using image pull secrets in larger clusters has been known for quite some time. TitanSoft decided to do something about it in 2019, releasing the &lt;a href="https://github.com/titansoft-pte-ltd/imagepullsecret-patcher" rel="noopener noreferrer"&gt;imagepullsecret-patcher&lt;/a&gt; tool. This executes within your cluster and does two things every 10 seconds for every namespace:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Checks to see if an image pull secret exists; if it does not exist or has stale contents, it is cloned from a primary secret.&lt;/li&gt;
&lt;li&gt;Checks to ensure the &lt;code&gt;default&lt;/code&gt; service account has an &lt;code&gt;imagePullSecrets&lt;/code&gt; reference to the cloned secret in that namespace. If it does not, the service account is patched to include the reference. This can also be optionally applied to &lt;em&gt;all&lt;/em&gt; service accounts, not just the default service account.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This does exactly what we would want, in the absence of a more official mechanism provided by Kubernetes itself. It appears to have worked well for many people, at least based on the popularity of the GitHub repository. However, there are some risks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The tool requires cluster-wide read-write access to all secrets and service accounts, via a ClusterRole and ClusterRoleBinding. Any potential vulnerabilities in the tool could be leveraged to gain access to all of your secrets, including service account secrets, which may provide access into other parts of your infrastructure (e.g. via tokens issued by &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html" rel="noopener noreferrer"&gt;AWS IAM Roles for Service Accounts&lt;/a&gt; or equivalents in other cloud providers).&lt;/li&gt;
&lt;li&gt;The tool has not been updated since October 2020 - this isn't necessarily an issue, as what the tool does is relatively simple. However, it does mean that the last release was compiled against relatively old versions of the Go standard library and other dependencies, increasing the risk that known vulnerabilities in those dependencies could be exploited.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tool is simple and effective, and at only ~1k lines of Go code, it is entirely feasible for a small devops team to maintain a fork for updates or tweaks if desired. For my purposes, I was interested to see if other tools existed that are more actively maintained and could solve the same problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cluster-wide secrets with External Secrets Operator
&lt;/h3&gt;

&lt;p&gt;When you are dealing with larger clusters, it is also likely that your organization is using a centralized secret store like &lt;a href="https://www.hashicorp.com/products/vault" rel="noopener noreferrer"&gt;Hashicorp Vault&lt;/a&gt;, or cloud-specific solutions like &lt;a href="https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html" rel="noopener noreferrer"&gt;AWS Secrets Manager&lt;/a&gt;. If you are doing this, you may also be using the &lt;a href="https://external-secrets.io/" rel="noopener noreferrer"&gt;external secrets operator&lt;/a&gt; to import secrets from those environments into Kubernetes. This operator supports defining a &lt;a href="https://external-secrets.io/v0.8.1/api/clusterexternalsecret/" rel="noopener noreferrer"&gt;ClusterExternalSecret&lt;/a&gt;, which allows an external secret to be imported into multiple namespaces. The definition will look something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;external-secrets.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterExternalSecret&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dockerhub-image-pull-secret&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# instantiate the secrets in _every_ namespace&lt;/span&gt;
  &lt;span class="na"&gt;namespaceSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
  &lt;span class="na"&gt;externalSecretSpec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;secretStoreRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cluster-secret-store&lt;/span&gt;
      &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterSecretStore&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kubernetes.io/dockerconfigjson&lt;/span&gt;
        &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;.dockerconfigjson&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
            &lt;span class="s"&gt;{&lt;/span&gt;
              &lt;span class="s"&gt;"auths": {&lt;/span&gt;
                &lt;span class="s"&gt;"https://index.docker.io/v1": {&lt;/span&gt;
                  &lt;span class="s"&gt;"username": "{{ .username }}"&lt;/span&gt;
                  &lt;span class="s"&gt;"password": "{{ .password }}"&lt;/span&gt;
                &lt;span class="s"&gt;}&lt;/span&gt;
              &lt;span class="s"&gt;}&lt;/span&gt;
            &lt;span class="s"&gt;}&lt;/span&gt;
        &lt;span class="na"&gt;dataFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;extract&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dockerhub-account&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This definition imports a secret with external name "dockerhub-account" from a &lt;a href="https://external-secrets.io/v0.8.1/api/clustersecretstore/" rel="noopener noreferrer"&gt;ClusterSecretStore&lt;/a&gt;. We extract from this secret the fields "username" and "password", and inject those values into the expected image pull secret structure using a &lt;a href="https://external-secrets.io/v0.8.1/guides/templating/" rel="noopener noreferrer"&gt;template&lt;/a&gt;. The external secrets operator will create an &lt;a href="https://external-secrets.io/v0.8.1/api/externalsecret/" rel="noopener noreferrer"&gt;ExternalSecret&lt;/a&gt; with the same name in every namespace (as they all implicitly match the &lt;code&gt;namespaceSelector&lt;/code&gt;). Those ExternalSecrets will produce Secrets with the same name and the rendered template. The end result is that a Secret named &lt;code&gt;dockerhub-image-pull-secret&lt;/code&gt; will exist in every namespace, ready to be referenced as needed.&lt;/p&gt;

&lt;p&gt;Additional configuration may be required for your specific environment and needs - see the &lt;a href="https://external-secrets.io" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;, and in particular consider changing the default &lt;a href="https://external-secrets.io/v0.8.1/api/externalsecret/#update-behavior" rel="noopener noreferrer"&gt;refreshInterval&lt;/a&gt; - the default is &lt;em&gt;one hour&lt;/em&gt;. While your Docker Hub credentials are not likely to change frequently, you may wish to ensure that when you rotate them in your central system, the change propagates to your clusters quickly and without manual intervention.&lt;/p&gt;

&lt;h3&gt;
  
  
  Patching service accounts with RedHat's patch-operator
&lt;/h3&gt;

&lt;p&gt;The general problem of patching resource definitions that are not fully under your control has also been recognized for some time. This is true of default resources created and updated by cluster maintenance tools (e.g. &lt;a href="https://kops.sigs.k8s.io/" rel="noopener noreferrer"&gt;kOps&lt;/a&gt;), or by public helm charts that you use to install common services and operators (e.g. &lt;a href="https://artifacthub.io/packages/helm/ingress-nginx/ingress-nginx" rel="noopener noreferrer"&gt;nginx-ingress&lt;/a&gt;, &lt;a href="https://artifacthub.io/packages/helm/cert-manager/cert-manager" rel="noopener noreferrer"&gt;cert-manager&lt;/a&gt;, and so on). High-quality charts will allow you to override the configuration of important components such as service account references, but some simpler charts offer much less configuration.&lt;/p&gt;

&lt;p&gt;Red Hat's &lt;a href="https://github.com/redhat-cop/patch-operator" rel="noopener noreferrer"&gt;patch-operator&lt;/a&gt; is designed to allow you to declare patches to target resources in your cluster. We can use this to patch service accounts to include references to our image pull secrets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redhatcop.redhat.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Patch&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dockerhub-image-pull-secret-patch&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;patches&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;patching-service-account&lt;/span&gt;
  &lt;span class="na"&gt;patches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;service-account-patch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;targetObjectRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
        &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
        &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
      &lt;span class="na"&gt;patchType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;application/strategic-merge-patch+json&lt;/span&gt;
      &lt;span class="na"&gt;patchTemplate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
        &lt;span class="s"&gt;imagePullSecrets:&lt;/span&gt;
          &lt;span class="s"&gt;- name: dockerhub-image-pull-secret&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This Patch custom resource declares that we would like to add the &lt;code&gt;dockerhub-image-pull-secret&lt;/code&gt; reference to the &lt;code&gt;imagePullSecrets&lt;/code&gt; list of all service accounts. The &lt;code&gt;targetObjectRef&lt;/code&gt; is not restricted to a particular namespace or name, so it will match all service accounts (as described in the documentation for the operator).&lt;/p&gt;

&lt;p&gt;This, combined with the external secrets operator to produce the secrets we need in each namespace, allows us to configure image pull secrets across all service accounts. The permissions required to apply this patch are attached to the referenced service account, &lt;code&gt;patching-service-account&lt;/code&gt;. This allows us to isolate different permission sets for different patches. For this patch, we need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;patching-service-account&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;patches&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterRole&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service-account-modifier&lt;/span&gt;
&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;apiGroups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;serviceaccounts"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;verbs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;watch"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;patch"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterRoleBinding&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service-account-modifier-binding&lt;/span&gt;
&lt;span class="na"&gt;subjects&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;patching-service-account&lt;/span&gt;
    &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;patches&lt;/span&gt;
&lt;span class="na"&gt;roleRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterRole&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service-account-modifier&lt;/span&gt;
  &lt;span class="na"&gt;apiGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is one significant issue with this approach, however: there is no declared &lt;a href="https://kubernetes.io/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch/#notes-on-the-strategic-merge-patch" rel="noopener noreferrer"&gt;patch strategy&lt;/a&gt; for &lt;code&gt;imagePullSecrets&lt;/code&gt; on service accounts. Without this, the default behavior is to &lt;em&gt;replace&lt;/em&gt; the list - so if you had any existing image pull secret references in your service account, these would be removed. See &lt;a href="https://github.com/kubernetes/kubernetes/issues/72475" rel="noopener noreferrer"&gt;this Kubernetes GitHub issue from 2019&lt;/a&gt;, which describes the problem in more detail and why it has not been fixed (tl;dr: specifying a patch strategy now would break backwards compatibility, and there has not yet been any desire to introduce a &lt;code&gt;v2&lt;/code&gt; of the ServiceAccount object kind, so we're stuck with this behavior).&lt;/p&gt;

&lt;p&gt;In my situation, I did not have any service accounts with specifically configured lists of image pull secrets, so the patch is replacing an empty list in every service account with the single reference in the patch. However, the situation in your cluster may differ, and you may want to change the patch &lt;code&gt;targetObjectRef&lt;/code&gt; to target a more specific set of service accounts, such as just the default service accounts. It is also possible to use a &lt;code&gt;labelSelector&lt;/code&gt; or &lt;code&gt;annotationSelector&lt;/code&gt; with a &lt;code&gt;matchExpressions&lt;/code&gt; list to avoid modifying service accounts with specific labels or annotations, e.g.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;labelSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;matchExpressions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;do-not-patch-image-pull-secrets&lt;/span&gt;
    &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DoesNotExist&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This selector will exclude any service account with a label &lt;code&gt;do-not-patch-image-pull-secrets&lt;/code&gt;, which you could specifically add to the service accounts that the patch would break. This would have to be communicated to all engineers that define service accounts in your cluster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Managing authenticated image pulls to Docker Hub in a large cluster is surprisingly difficult. It was likely not anticipated that Docker Hub would introduce such strict rate limits for unauthenticated requests in November 2020 - this now essentially &lt;em&gt;requires&lt;/em&gt; that all cluster operators know how to configure authenticated image pulls, as you will need these whether you continue to use Docker Hub or migrate to a pull-through cache registry or privately managed registry with clones of your essential images.&lt;/p&gt;

&lt;p&gt;With control over the virtual machines that your kubelets run on, it is possible to configure the necessary credentials for authenticated image pulls to Docker Hub via the containerd config, or by writing a custom &lt;a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-credential-provider/" rel="noopener noreferrer"&gt;credential provider for your kubelets&lt;/a&gt;. This can avoid a lot of additional configuration in your Kubernetes resources, but is not necessarily a desirable or available option for all cluster operators. The alternative is to create and use image pull secrets via service accounts. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/titansoft-pte-ltd/imagepullsecret-patcher" rel="noopener noreferrer"&gt;TitanSoft's imagepullsecret-patcher&lt;/a&gt; is a single-binary solution to replicating and using an image pull secret across all namespaces. It is not actively maintained, but the tool is simple enough that a small team should be able to patch and maintain a fork if needed. If you want to stick to other maintained open source tools, a reasonable solution can also be put together using &lt;a href="https://external-secrets.io/" rel="noopener noreferrer"&gt;external secrets operator&lt;/a&gt;. If you are operating a cluster at scale, you may already be using this. Red Hat's &lt;a href="https://github.com/redhat-cop/patch-operator" rel="noopener noreferrer"&gt;patch-operator&lt;/a&gt; can be used to attach the imported secrets to your service accounts across all namespaces, though there are some quirks to be wary of, due to the lack of a defined patch strategy for &lt;code&gt;imagePullSecrets&lt;/code&gt; on service accounts.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Request routing for horizontally scaled services</title>
      <dc:creator>Iain McGinniss</dc:creator>
      <pubDate>Sun, 08 Aug 2021 21:24:16 +0000</pubDate>
      <link>https://dev.to/iainmcgin/request-routing-for-horizontally-scaled-services-5c3f</link>
      <guid>https://dev.to/iainmcgin/request-routing-for-horizontally-scaled-services-5c3f</guid>
      <description>&lt;p&gt;Networked systems engineering is a fundamental aspect of modern software engineering. The double-edged sword of internet-connected services is the opportunity for your service to be utilized by anyone (growth! impact! profit!), but success can result in extremely unpredictable load spikes and overall growth in resource requirements to keep things running smoothly. First, we shall discuss the options for handling variable and increasing load, and then focus on how we can effectively route requests across a &lt;em&gt;horizontally scaled&lt;/em&gt; service (a term we shall define momentarily). Through exploring this topic, we will also touch on some more advanced tools and strategies, such as API gateways and service meshes. I hope you enjoy the journey, and that it helps you make some informed choices in your next system design.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vertical and horizontal scaling
&lt;/h2&gt;

&lt;p&gt;In the early stages of your service's life, &lt;em&gt;over-provisioning&lt;/em&gt; is the simplest strategy to handle variable load. You estimate the peak load based on some service-specific characteristics and a wet finger held to the breeze, and ensure that you have sufficient capacity to handle that peak load. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fvert_service_initial.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fvert_service_initial.jpg" alt="in the happy early days of your service, users generate a manageable amount of load on your service, and your server still has some unused resources that can be claimed by the service as needed" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If an unexpected spike in load arrives, or you just can't keep up with demand, bad things will happen.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fvert_service_overloaded.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fvert_service_overloaded.jpg" alt="more users arrive, forming an unexpected ravenous mob. Your service now requires more resources to operate than the poor server can supply, and it begins to fail" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As this estimated peak grows over time, one strategy to keep up is to &lt;em&gt;vertically scale&lt;/em&gt;, which entails the ability to use physical or virtual machines with more resources, such as more compute cores, memory, persistent storage space, or network bandwidth. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fvert_service_rescaled.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fvert_service_rescaled.jpg" alt="the server is replaced by a new one with more compute resources, and you can now satiate the desires of the mob, with some additional headroom for future growth" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To make use of these additional resources, the service must typically make use of a variety of techniques, such as &lt;a href="https://en.wikipedia.org/wiki/Fork_(system_call)" rel="noopener noreferrer"&gt;process forking&lt;/a&gt; or &lt;a href="https://en.wikipedia.org/wiki/Multithreading_(computer_architecture)" rel="noopener noreferrer"&gt;multi-threading&lt;/a&gt;, bigger in-memory caches, and RAID configurations to increase disk I/O throughput and bandwidth.&lt;/p&gt;

&lt;p&gt;The key distinction with vertical scaling is that the service can handle additional load &lt;em&gt;without&lt;/em&gt; the need to spread across multiple computers - it can maximize utilization of available resources on a single computer. A service that &lt;em&gt;horizontally scales&lt;/em&gt;, in contrast, utilizes additional computers and a network to distribute additional load. This involves a very different implementation strategy, and raises an important question: how should requests be distributed across a dynamic set of instances?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fhoriz_service.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fhoriz_service.jpg" alt="with a horizontally scaled service, requests from users are distributed across multiple instances of the service - but how should these requests be distributed?" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most services written before the era of cloud computing utilized vertical scaling, as this was typically the only viable option - provisioning of resources to support horizontal scaling was not economically viable for organizations with small, on-premise data centers. It could take weeks to provision and install new hardware for use by a service, so system architects had to plan ahead, and building services to vertically scale on big, over-provisioned servers was simpler. A client I worked with many years ago utilized huge IBM mainframes that cost millions of dollars to provision and install into their on-premise data center, all because their service was monolithic and unable to horizontally scale. The client was pushing the limits of hardware that could be managed as a single machine, and once that path was exhausted, there would be no choice but to re-architect their system to horizontally scale.&lt;/p&gt;

&lt;p&gt;Cloud computing providers support vertically scaling services through infrastructure-as-a-service (IaaS) products that offer a variety of virtual machine sizes, from cheap single-core and low-memory instances (e.g. Amazon's &lt;code&gt;t2.nano&lt;/code&gt; instance type with 1 vCPU and 0.5GB of RAM), all the way up to monstrous instances with hundreds of cores and terabytes of memory (e.g. Google's &lt;code&gt;m2-ultramem-416&lt;/code&gt; instance type, with 416 vCPUs and 11.7TB of RAM). You will, of course, pay a steep price for such vertical scaling capability - pricing increases are linear in vCPU and memory to a point, then become bespoke and negotiated when you reach truly specialized hardware. The &lt;code&gt;m2-ultramem-416&lt;/code&gt; instance costs $50.91 &lt;em&gt;per hour&lt;/em&gt; with a one year reservation (~$438k/yr), whereas a more typical &lt;code&gt;n2-standard-16&lt;/code&gt; instance with 16 vCPU and 64GB of RAM costs $0.79 per hour (~$7k/yr). If a service can horizontally scale, and more efficiently follow load, your maximum cost of using commodity instances like &lt;code&gt;n2-standard-16&lt;/code&gt; will often be an order of magnitude lower.&lt;/p&gt;

&lt;h2&gt;
  
  
  Freedom through constraint - PaaS and FaaS
&lt;/h2&gt;

&lt;p&gt;Cloud computing also introduced other ways to think about service development, via platform-as-a-service (PaaS) or function-as-a-service (FaaS) offerings, e.g. &lt;a href="https://cloud.google.com/appengine" rel="noopener noreferrer"&gt;Google App Engine&lt;/a&gt;, &lt;a href="https://aws.amazon.com/elasticbeanstalk/" rel="noopener noreferrer"&gt;AWS Elastic Beanstalk&lt;/a&gt;, &lt;a href="https://aws.amazon.com/lambda/" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt;, &lt;a href="https://azure.microsoft.com/en-us/services/functions/#overview" rel="noopener noreferrer"&gt;Azure Functions&lt;/a&gt;, and many others. With PaaS/FaaS systems, the compute infrastructure running your service ceases to be your concern, allowing you to focus on the higher level semantics of your service. From the developer's perspective, there is no "server", or alternatively, just one logical server with theoretically unlimited scaling. In reality, the limitations imposed on how a service is implemented by these technologies ensures that your service &lt;em&gt;horizontally scales&lt;/em&gt; across multiple instances, in a way that is managed by the cloud provider. The restrictions may also allow for &lt;em&gt;multi-tenancy&lt;/em&gt;, where multiple services (potentially even from multiple customers) can run on the same hardware at the same time, yielding resource utilization improvements and cost savings for the cloud provider, and maybe even for you.&lt;/p&gt;

&lt;p&gt;The loss of implementation freedom from using a PaaS or FaaS framework may not be acceptable for all services, certainly not without a broad re-think of how the service is implemented. Many organizations will instead choose to stick with vertical scaling utilizing the instance types made available by cloud providers, for as long as possible. With enough growth, a service will inevitably hit a point where it cannot utilize bigger servers effectively, or where bigger servers are simply not available.&lt;/p&gt;

&lt;p&gt;Implementing services to horizontally scale on dynamically provisioned IaaS cloud resources is the middle ground that many organizations choose. This provides them with more direct control over when and how the system scales, but comes with a significant complexity cost. If your organization is using Docker Swarm or Kubernetes, you are likely self-managing horizontally scaled services, and may be immersed in the overhead and complexity of doing this safely and effectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other good reasons to horizontally scale
&lt;/h2&gt;

&lt;p&gt;Building a service to support some degree of horizontal scaling is a very complex topic in its own right. However, it is increasingly becoming a requirement of contemporary software engineering, for good reasons beyond just scalability. Horizontal scaling can also support reliability, maintainability, and efficiency goals.&lt;/p&gt;

&lt;p&gt;Rather than making a single server the sole entry point to the service - and a single point of failure - we can utilize multiple servers and work to ensure that fail-over between them is transparent. This fail-over can be "active-passive", where a single server is still responsible for all traffic, but when it fails we have a "warm" backup server that is available to take over within minutes. This type of configuration is common with traditional relational databases such as MySQL. Even better, an "active-active" configuration means that all servers are "hot" and capable of handling requests in short order, supporting recovery in under 30 seconds, and often &lt;em&gt;immediately&lt;/em&gt;. In an active-active configuration, it is also possible to send requests to any server, and have the set of servers behave as a predictable, single "logical" service.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Factive_failover.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Factive_failover.jpg" alt="with an active-active configuration, requests are sent to all instances, and when a server dies, clients can simply reconnect to another instance and recover" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Aside from this reliability and recovery advantage, the ability to have multiple active-active servers and replace them at will also facilitates transparent maintenance. Rather than requiring downtime to roll out new releases of our service, as was required when replacing the deployment of a service on fixed hardware, we can instead introduce some new servers with a new version and migrate requests from the old set to the new set. We have flexibility in how this can be done, allowing for clever rollout strategies such as &lt;a href="https://martinfowler.com/bliki/CanaryRelease.html" rel="noopener noreferrer"&gt;canaries&lt;/a&gt;, where we send a small percentage of traffic to the new service and ensure it performs acceptably before proceeding to a full rollout. Related to this, a &lt;a href="https://martinfowler.com/bliki/BlueGreenDeployment.html" rel="noopener noreferrer"&gt;blue-green&lt;/a&gt; deployment allows us to incrementally move from the old release to the new release, while retaining the ability to roll back quickly if any undesirable behavior is detected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fblue_green.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fblue_green.jpg" alt="with an blue-green deployment, we can gradually migrate clients from the " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, by utilizing multiple smaller machines, the capacity we provision can more closely align to the actual load our service is experiencing at any point in time, resulting in more efficient utilization of resources. For services with a day-night load cycle (i.e. your service is interactive, and all your customers are within a narrow band of time zones, so you see significantly less load at night than during the day) you then have the opportunity to scale up and down periodically, potentially saving a significant amount of money compared to over-provisioning to be capable of handling your estimated peak load at all times. This type of dynamic scaling is a huge advantage of cloud infrastructure, and can also be automated by utilizing real-time aggregate metrics (e.g. CPU usage, network throughput, etc.) to decide how many instances should be active at any given time.&lt;/p&gt;
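&lt;p&gt;The metrics-driven scaling decision described above can be sketched as a simple proportional rule - a simplified version of what systems like the Kubernetes Horizontal Pod Autoscaler apply: scale the instance count by the ratio of observed to target utilization, clamped to configured bounds. The function name and parameters here are illustrative, not from any particular API:&lt;/p&gt;

```python
import math

def desired_instances(current: int, avg_cpu: float, target_cpu: float = 0.6,
                      min_n: int = 2, max_n: int = 20) -> int:
    """Proportional autoscaling: if observed utilization is 50% above target,
    propose 50% more instances, rounded up and clamped to [min_n, max_n]."""
    proposed = math.ceil(current * (avg_cpu / target_cpu))
    return max(min_n, min(max_n, proposed))
```

&lt;p&gt;For example, four instances averaging 90% CPU against a 60% target would scale to six; the clamp prevents runaway scaling from a transient spike in the aggregate metrics.&lt;/p&gt;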

&lt;p&gt;So, how can we implement a service to horizontally scale? This question is so system-dependent that a succinct answer cannot be provided that covers all cases. One broad exception is in the case of "stateless" services - those which handle requests in an isolated, predictable way, with no side effects that are &lt;em&gt;local&lt;/em&gt; to the service. A typical stateless service will utilize a data store with &lt;a href="https://en.wikipedia.org/wiki/ACID" rel="noopener noreferrer"&gt;ACID properties&lt;/a&gt;, and will process requests based entirely on the contents of the request and manipulations of that data store. This service can be horizontally scaled through simple replication - the number of instances required is typically linearly dependent on request throughput. This attractive characteristic is why so much emphasis is placed on utilizing stateless services wherever possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  DNS and client-side load balancing
&lt;/h2&gt;

&lt;p&gt;So, you have a service that can be horizontally scaled, meaning that there are multiple service instances available to process requests. How can we effectively and evenly direct requests from clients to service instances?&lt;/p&gt;

&lt;p&gt;At the simplest level, DNS records can map the canonical name for a service to multiple IP addresses of service instances that implement that service. This basic abstraction allows clients to "find" the service using a durable identifier, while allowing the service maintainer to change the set of instances handling requests over time, as needed.&lt;/p&gt;

&lt;p&gt;A DNS record is a very flexible way to map the abstract to the real - it can map a name to multiple addresses (A/AAAA records), or map to another DNS record (CNAME records). Combined with a reasonable "time to live" (TTL) value for the record, we have a reliable mechanism to propagate changes to our abstract-to-real mapping to users of a service in an efficient way.&lt;/p&gt;

&lt;p&gt;A client can choose randomly between the available addresses - in aggregate, this will evenly distribute clients across server IP addresses. This is referred to as client-side load balancing, where clients possess sufficient intelligence to satisfy our goals, either through coordination (e.g. in &lt;a href="https://cloud.google.com/traffic-director/docs/proxyless-overview" rel="noopener noreferrer"&gt;gRPC Traffic Director&lt;/a&gt;) or as an emergent property of independent behavior in aggregate.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fdns_client_lb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fdns_client_lb.jpg" alt="clients perform a DNS lookup for the service address, then randomly select between the available addresses" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the ability to add and remove IP addresses from the DNS mapping as needed, client-side load balancing can support both our scalability and efficiency goals. If we can create new instances of our service at will, we can dynamically auto-scale, and escape the limitations of vertical scaling (big, over-provisioned servers) in favor of horizontal scaling (small, cheap, easily replaced servers).&lt;/p&gt;

&lt;p&gt;Client-side load balancing can work well if clients are &lt;em&gt;uniform&lt;/em&gt;, meaning that the behavior of each client is roughly equivalent in terms of the demands they place on the system. This is often not the case, however - clients with a 1Gbps fiber connection can place significantly more load on file servers than those with a mediocre cellular connection, for instance. The types of requests that clients make may also result in significant variability in load, depending on the data associated with each user's requests. So, some careful evaluation must be made of whether client-side load balancing will work for your service or not. In gRPC and other client stacks, the client may attempt to self-distribute the load it generates by opening multiple connections to different server instances, and performing client-side round-robin distribution of requests across these servers. Even this can be problematic: if the client opens only a small number of connections (typically three), it may still concentrate its load on a small subset of all available server instances.&lt;/p&gt;

&lt;p&gt;There are also other limitations to this DNS-based approach to meeting our reliability, scalability, and efficiency goals. Clients must do what we desire and expect - this is fine when we also control the client, but can be problematic when interacting with third-party software like web browsers, or clients implemented by other teams or organizations. Changes to our DNS records propagate erratically, depending on which DNS service our clients use. If we use a TTL of 1 minute, we can expect many (perhaps most) of our clients to see the change within a minute, but some may take significantly longer, due to configuration details of infrastructure that may be completely out of your control. There are also some practical limits to how many IP addresses we can reference with our DNS records; managing hundreds of addresses per name may be feasible, but thousands or more is unrealistic - DNS servers are not guaranteed to respond with all mappings for a name. When using UDP for DNS lookups, we are limited to what can fit in a packet. When using TCP, a DNS server may limit the number of responses to prevent slow-down for other clients.&lt;/p&gt;

&lt;h2&gt;
  
  
  Service-side load balancing
&lt;/h2&gt;

&lt;p&gt;So, if we cannot rely on DNS and client-side behavior for fast, reliable changes to our service routing, what else can we do? Like most problems in computing science, we can add a layer of indirection! Load balancers provide a more adaptable approach - point your DNS records at a TCP or HTTP &lt;em&gt;load balancer&lt;/em&gt;, then manage a more dynamic "target set" behind that load balancer. This leaves the DNS records in a much more static configuration, while giving you immediate and localized control over where requests are routed to behind that load balancer. Even if you're not using the load balancer for auto-scaling of your service, this is still a very effective tool for handling rolling restarts of services and general maintenance of your service, without worrying about client-side effects.&lt;/p&gt;

&lt;p&gt;Load balancers often only handle traffic at layer 3 or 4 in the &lt;a href="https://en.wikipedia.org/wiki/OSI_model" rel="noopener noreferrer"&gt;OSI model&lt;/a&gt; - that is, they are packet- or connection-oriented, evenly distributing traffic across the target set at the &lt;a href="https://en.wikipedia.org/wiki/Internet_Protocol" rel="noopener noreferrer"&gt;IP&lt;/a&gt;, &lt;a href="https://en.wikipedia.org/wiki/User_Datagram_Protocol" rel="noopener noreferrer"&gt;UDP&lt;/a&gt;, or &lt;a href="https://en.wikipedia.org/wiki/Transmission_Control_Protocol" rel="noopener noreferrer"&gt;TCP&lt;/a&gt; levels. An inbound packet or connection arrives, and the load balancer decides which service instance to forward it to. This can be as simple as round-robin distribution, or a weighted strategy based on metrics reported by the service instances.&lt;/p&gt;
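&lt;p&gt;As a rough illustration (not any particular load balancer's implementation), both strategies can be sketched in a few lines of Python; the addresses and weights here are hypothetical:&lt;/p&gt;

```python
import itertools
import random

# Hypothetical target set: (address, weight) pairs. Weights might be derived
# from metrics reported by the service instances themselves.
BACKENDS = [("10.0.0.1", 3), ("10.0.0.2", 1)]

def round_robin(backends):
    # Plain round-robin: cycle through the targets in order, ignoring weights.
    return itertools.cycle(addr for addr, _ in backends)

def weighted_pick(backends):
    # Weighted random selection: a target with weight 3 receives roughly
    # three times the traffic of a target with weight 1.
    addrs = [addr for addr, _ in backends]
    weights = [weight for _, weight in backends]
    return random.choices(addrs, weights=weights, k=1)[0]

rr = round_robin(BACKENDS)
first_four = [next(rr) for _ in range(4)]
```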

&lt;p&gt;L3/L4 load balancing was fine for most systems prior to the advent of HTTP/2, as HTTP/1.1 and other common internet application protocols are connection oriented. Requests are serviced serially on each connection, and each connection belongs to just one client. To achieve more concurrency in request handling, more connections were used. This is ultimately wasteful of bandwidth, with significantly more packets required for maintenance of TCP connection state. It also results in higher average and P99 request processing latency, and can exhaust the operating system's connection handling resources. In an environment where connections are created by clients over the internet (e.g. browsers), these relatively low throughput but highly variable connections can place an uneven load on the servers behind the load balancer, despite your best efforts.&lt;/p&gt;

&lt;p&gt;Protocols such as HTTP/1.1 are effectively stateless, meaning that each request carries everything required to process it, and there is no explicit relationship between requests or expectation that requests must be processed in the order they are received. This opens the possibility of decoupling the set of connections into a load balancer from the set of connections out of the load balancer. Thousands of low-throughput inbound connections can be transformed into a much smaller number of high-throughput connections to the service instances, or alternatively, we can ensure that each request from the load balancer to a server uses its own connection, to prevent head-of-line blocking of request processing.&lt;/p&gt;

&lt;p&gt;When a load balancer is capable of doing this type of request-level processing, we typically classify it as a layer 7 (a.k.a. application layer) load balancer. Requests from multiple clients with separate connections may be multiplexed onto a single connection to a server. When using a protocol such as HTTP/2, which allows for requests to be processed and responded to out-of-order, this can result in a significant decrease in connection maintenance waste - or, equivalently, much higher utilization of available resources.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reverse proxies
&lt;/h2&gt;

&lt;p&gt;Once you recognize the utility of a layer 7 load balancer in your architecture, many other possibilities become apparent. As HTTP requests carry information on the client-side view of the service intended to process a request (via the Host header), we can potentially use a single load balancer for multiple logical services. Going further, we could inspect the path, request method, query parameters, or perhaps even the body in making request routing decisions. With these features, we now have what many would call a &lt;em&gt;reverse proxy&lt;/em&gt; - a service with a flexible configuration language that allows for more sophisticated routing decisions than blindly distributing requests across a set of servers known to the load balancer only by their IP addresses.&lt;/p&gt;

&lt;p&gt;One of the most commonly used open source reverse proxies is &lt;a href="https://www.nginx.com/" rel="noopener noreferrer"&gt;nginx&lt;/a&gt;, though cloud vendors also typically provide their own managed options, such as &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html" rel="noopener noreferrer"&gt;AWS Application Load Balancer&lt;/a&gt;, &lt;a href="https://cloud.google.com/load-balancing" rel="noopener noreferrer"&gt;Google Cloud Load Balancing&lt;/a&gt;, etc.&lt;/p&gt;

&lt;p&gt;These systems typically allow for dynamic configuration of the reverse proxy through an API or a JSON/TOML/YAML configuration language, allowing the routing rules to be changed without any disruption to the currently active request processing. Reverse proxies typically have higher resource requirements and overhead than L3/L4 load balancers, but are still highly optimized and capable of handling upwards of 10k requests per second per instance on commodity hardware, and are also usually horizontally scalable to hundreds of instances and millions of requests per second. The meta-problem of scaling the proxies themselves at millions of requests per second is usually handled by having multiple L4 load balancers directing connections to the reverse proxies, which in turn direct requests to your service instances.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Flb_proxy_scaling.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Flb_proxy_scaling.jpg" alt="requests are routed from clients to the L4 load balancer, which divides them across the reverse proxy instances, which in turn divide them across the service instances according to their L7 rules" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With a reverse proxy, we can start to implement some more sophisticated request routing patterns, such as routing based on request &lt;em&gt;type&lt;/em&gt;. With browser-based web applications we must often define all our request endpoints on the same domain, in order to comply with the web application model where cookies and related security controls will not permit certain types of cross-domain requests. Our reverse proxy can maintain this illusion for the front-end, while splitting requests of different types, typically distinguished by path, to different upstream services that handle those requests. For example, our GraphQL API endpoint (maybe "/api/graphql" from the client perspective) may be serviced by an &lt;a href="https://www.apollographql.com/docs/federation/gateway/" rel="noopener noreferrer"&gt;Apollo Gateway&lt;/a&gt;, while other API endpoints (&lt;code&gt;^\/api\/(?!graphql).*$&lt;/code&gt;) might be handled by a service we implement, and everything else (&lt;code&gt;^\/(?!api\/).*$&lt;/code&gt;) is handled by a static content server.&lt;/p&gt;
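&lt;p&gt;In nginx configuration terms, this kind of path-based split might look something like the following sketch. The upstream names and addresses are placeholders; note that nginx's own matching rules (an exact &lt;code&gt;location =&lt;/code&gt; match beats any prefix match) let us avoid the negative-lookahead regexes entirely:&lt;/p&gt;

```nginx
# Sketch only: upstream names and addresses are illustrative placeholders.
upstream graphql_gateway { server 10.0.1.10:4000; }
upstream api_service     { server 10.0.2.10:8080; }
upstream static_content  { server 10.0.3.10:8080; }

server {
    listen 80;
    server_name app.example.com;

    # GraphQL requests go to the Apollo Gateway (exact match wins).
    location = /api/graphql {
        proxy_pass http://graphql_gateway;
    }

    # All other /api/ requests go to our own API service.
    location /api/ {
        proxy_pass http://api_service;
    }

    # Everything else is handled by the static content server.
    location / {
        proxy_pass http://static_content;
    }
}
```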

&lt;p&gt;Deeper request inspection could also allow us to do things like route &lt;em&gt;expensive&lt;/em&gt; requests to a separate set of servers, so that we can independently manage the auto-scaling for that request type from other requests. This can also be an effective tool to ensure that these potentially problematic requests do not impact the performance expectations of the other requests; if they are all handled by the same pool of servers, expensive requests may impact the latency and jitter of others through resource contention.&lt;/p&gt;

&lt;h2&gt;
  
  
  API gateways
&lt;/h2&gt;

&lt;p&gt;So far, we have discussed request routing middleware that is primarily tasked with routing requests efficiently across our service instances, but otherwise does not concern itself with the implementation details of how responses are formulated for requests. However, once we introduce middleware that inspects the contents of requests as part of routing decisions, it is not a great conceptual leap from there to middleware that is also responsible for some common request processing tasks. For example, the middleware could be responsible for ensuring that requests carry valid authentication information, such as valid cookies or request signatures. The middleware could also perform tasks such as content encoding transformations, changing uncompressed responses to &lt;a href="https://en.wikipedia.org/wiki/Brotli" rel="noopener noreferrer"&gt;Brotli&lt;/a&gt; compressed responses, resulting in lower bandwidth utilization when communicating with browser clients. Conceptually, &lt;em&gt;any&lt;/em&gt; in-line transformation of requests or responses could be handled by the middleware. I refer to request routing middleware with this capability as an &lt;em&gt;API gateway&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;API gateways go beyond the capabilities of reverse proxies by providing an extension mechanism that allows for custom code to be executed as part of the request processing pipeline. &lt;a href="https://traefik.io/" rel="noopener noreferrer"&gt;Traefik Proxy&lt;/a&gt; is a good example of this, as it has a plugin mechanism that allows for custom Go code to influence request processing decisions. &lt;a href="https://konghq.com/" rel="noopener noreferrer"&gt;Kong&lt;/a&gt; also deserves a mention here, with the ability to write plugins in Lua, or integrate with external binaries written in practically any language. Most available API gateways provide a set of standard request processing plugins to handle authentication, rate limiting, content type transforms, and so on. In general, they provide a useful way to enforce some consistent request processing standards across all services, that can be implemented in one place, rather than requiring re-implementation across multiple services - particularly if those services are implemented in different languages.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;The myriad of request processing middleware does not end here - there is also the very trendy topic of &lt;em&gt;service meshes&lt;/em&gt; that we could cover, but I choose to leave that as an exercise for interested readers, as it is a rapidly evolving and complex space (see: &lt;a href="https://istio.io/" rel="noopener noreferrer"&gt;Istio&lt;/a&gt;, &lt;a href="https://linkerd.io/" rel="noopener noreferrer"&gt;linkerd&lt;/a&gt;, &lt;a href="https://www.consul.io/" rel="noopener noreferrer"&gt;Consul&lt;/a&gt;, &lt;a href="https://www.vmware.com/products/tanzu-service-mesh.html" rel="noopener noreferrer"&gt;Tanzu&lt;/a&gt;, etc).&lt;/p&gt;

&lt;p&gt;So, what should you use in your own architecture? If you are writing something from scratch, I would strongly recommend looking at PaaS/FaaS options to avoid all of this complexity for as long as possible - the less time you have to spend thinking about auto-scaling and request processing, the more time you have to build out the value-providing aspects of your service. If you maintain existing services that are incompatible with a PaaS/FaaS approach, your cloud provider's managed load balancer / reverse proxy is likely the most straightforward option. If you find that you need a little more flexibility, an API gateway such as Traefik or Kong can be an excellent option; just be prepared to think much more deeply about the network layer of your application.&lt;/p&gt;

</description>
      <category>networking</category>
      <category>scaling</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Debugging protocol buffer compilation</title>
      <dc:creator>Iain McGinniss</dc:creator>
      <pubDate>Fri, 08 May 2020 21:46:44 +0000</pubDate>
      <link>https://dev.to/iainmcgin/debugging-protocol-buffer-compilation-jd1</link>
      <guid>https://dev.to/iainmcgin/debugging-protocol-buffer-compilation-jd1</guid>
      <description>&lt;p&gt;In some recent work I have been trying to generate &lt;a href="https://kotlinlang.org/" rel="noopener noreferrer"&gt;Kotlin&lt;/a&gt; extensions to the standard Java code that is generated by the &lt;a href="https://developers.google.com/protocol-buffers" rel="noopener noreferrer"&gt;protocol buffer&lt;/a&gt; compiler, using the excellent &lt;a href="https://github.com/marcoferrer/kroto-plus" rel="noopener noreferrer"&gt;kroto-plus plugin&lt;/a&gt;. For those who have not had the pleasure of working with the protocol buffer compiler, &lt;code&gt;protoc&lt;/code&gt;, it can be a frustratingly opaque tool to work with. In the process of attempting to understand why compilation was not working as expected, I ended up learning a lot more about how the compiler works.&lt;/p&gt;

&lt;p&gt;My problem arose when attempting to move my protocol buffer compilation out of the safe confines of my own project into a shared definition repository that my employer uses, so that I could generate client bindings and server stubs for multiple languages. In making the move, all of a sudden the expected Kotlin extension code was not being generated &lt;em&gt;at all&lt;/em&gt;, with no error messages or warnings. The mistake turned out to be trivial, but getting to a position where I could &lt;em&gt;identify&lt;/em&gt; the mistake was frustrating.&lt;/p&gt;

&lt;h2&gt;
  
  
  Invoking protoc
&lt;/h2&gt;

&lt;p&gt;When working with a single language, there are many tools that wrap &lt;code&gt;protoc&lt;/code&gt;, such that you never have to understand its interface. For my Kotlin project, the &lt;a href="https://github.com/google/protobuf-gradle-plugin" rel="noopener noreferrer"&gt;Gradle protobuf plugin &lt;/a&gt; follows the typical Gradle idiom: place proto files in a standard directory, specify what plugins to use, and voilà, you have generated code. However, in a polyglot environment, you often have to go a little deeper and at least understand how to invoke protoc directly.&lt;/p&gt;

&lt;p&gt;An invocation can look something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;protoc &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-I&lt;/span&gt; /opt/include &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-I&lt;/span&gt; /path/to/project/protos &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--java_out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/path/to/project/gen/java &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--grpc_out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/path/to/project/gen/java &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--plugin&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;protoc-gen-grpc&lt;span class="o"&gt;=&lt;/span&gt;/usr/local/bin/protoc-gen-grpc-java &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--kroto_out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;ConfigPath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/path/to/project/kroto.yml:/path/to/project/gen/java &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--plugin&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;protoc-gen-kroto&lt;span class="o"&gt;=&lt;/span&gt;/usr/local/bin/protoc-gen-kroto-plus &lt;span class="se"&gt;\&lt;/span&gt;
    /path/to/project/protos/service.proto &lt;span class="se"&gt;\&lt;/span&gt;
    /path/to/project/protos/internals.proto &lt;span class="se"&gt;\&lt;/span&gt;
    ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Breaking this down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;-I&lt;/code&gt; flag specifies an "include" directory, where protobuf files that are imported can be found. There is no module system to speak of for protobuf compilation, so there are just some loose conventions around namespacing using package names that correspond to directory structures. For the "well known" types like &lt;a href="https://developers.google.com/protocol-buffers/docs/proto3#any" rel="noopener noreferrer"&gt;google.protobuf.Any&lt;/a&gt;, these will typically be sourced from some common include path - in the example above, &lt;code&gt;/opt/include&lt;/code&gt;, which exists within a docker container I'm using. Multiple import paths can be specified, and imports are searched for in all the specified include directories, using relative paths. For those familiar with traditional C compilers, this is a very common pattern for compilation in the era before versioned package management.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;--java_out=&lt;/code&gt; flag specifies two things: that we want to generate Java code, and where we want that generated code to go. Java code generation is a built-in feature of &lt;code&gt;protoc&lt;/code&gt;, so this is all that's required in this case. The built-in generators for protoc are &lt;code&gt;cpp&lt;/code&gt;, &lt;code&gt;csharp&lt;/code&gt;, &lt;code&gt;java&lt;/code&gt;, &lt;code&gt;js&lt;/code&gt;, &lt;code&gt;objc&lt;/code&gt;, &lt;code&gt;php&lt;/code&gt;, &lt;code&gt;python&lt;/code&gt; and &lt;code&gt;ruby&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;--grpc_out=&lt;/code&gt; flag similarly specifies that we want to generate "grpc", and where to generate to. However, what does "grpc" mean here, as it is not a built-in generator type? By default, the compiler will look for a &lt;em&gt;plugin&lt;/em&gt; on the &lt;code&gt;PATH&lt;/code&gt;, with name &lt;code&gt;protoc-gen-grpc&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;--plugin=protoc-gen-grpc=...&lt;/code&gt; flag explicitly tells the compiler where to find the plugin executable for "grpc". In this case, we're pointing it to a version of &lt;a href="https://github.com/grpc/grpc-java/tree/master/compiler" rel="noopener noreferrer"&gt;protoc-gen-grpc-java&lt;/a&gt;. Effectively, we aliased "grpc-java" to "grpc"; we could have instead specified a "--grpc-java_out=" flag without specifying the explicit plugin reference, as long as &lt;code&gt;protoc-gen-grpc-java&lt;/code&gt; could be found on the &lt;code&gt;PATH&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Observant readers will notice something slightly different about the flag specified for &lt;code&gt;kroto&lt;/code&gt;: it embeds a &lt;em&gt;parameter&lt;/em&gt; to be passed to the plugin. The compiler's somewhat awkward syntax for this embeds the parameter before the output path: &lt;code&gt;--gen_out=param:/gen/path&lt;/code&gt;. Only a &lt;a href="https://github.com/protocolbuffers/protobuf/blob/c781df3d212c0d30a072ed98de00d1ba2fea22b9/src/google/protobuf/compiler/plugin.proto#L75" rel="noopener noreferrer"&gt;single string parameter&lt;/a&gt; can be specified, but the full string between the first &lt;code&gt;=&lt;/code&gt; and the &lt;code&gt;:&lt;/code&gt; is treated as that parameter value. I have seen plugins use various conventions here to allow specifying multiple params, like &lt;code&gt;key1=value1,key2=value2,...,keyN=valueN&lt;/code&gt;. Some instead use the parameter to point to an external configuration file, which is what the &lt;a href="https://github.com/marcoferrer/kroto-plus" rel="noopener noreferrer"&gt;kroto-plus plugin&lt;/a&gt; does.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finally, a list of proto files to be parsed and sent to the generators is provided.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
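&lt;p&gt;For plugins that adopt the comma-separated convention, unpacking the single parameter string takes only a few lines. This is a hypothetical sketch, not how kroto-plus or any specific plugin actually parses its parameter:&lt;/p&gt;

```python
def parse_plugin_parameter(parameter: str) -> dict:
    # Split the single opaque parameter string using the
    # "key1=value1,key2=value2" convention; this convention is
    # plugin-specific, not part of protoc itself.
    result = {}
    for part in parameter.split(","):
        if not part:
            continue
        key, _, value = part.partition("=")
        result[key] = value
    return result

params = parse_plugin_parameter("ConfigPath=/path/to/kroto.yml,debug=true")
```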

&lt;p&gt;While you &lt;em&gt;can&lt;/em&gt; use relative paths rather than absolute paths when invoking &lt;code&gt;protoc&lt;/code&gt;, I have stumbled over problems with mixing relative paths and import directives. Using absolute paths with &lt;code&gt;protoc&lt;/code&gt; helps keep me sane: it makes it very clear where everything is coming from, irrespective of the current working directory.&lt;/p&gt;

&lt;p&gt;Docker containers like those &lt;a href="https://github.com/namely/docker-protoc" rel="noopener noreferrer"&gt;provided by Namely&lt;/a&gt; try to help out in a polyglot environment by hiding some of the details of protoc and plugin invocation behind a more uniform contract. I recommend trying these out to see if they fit your needs before implementing your own solution, but I have found that a basic understanding of the protocol buffer compiler and plugins is essential to success.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are plugins, really?
&lt;/h2&gt;

&lt;p&gt;Protobuf compiler plugins are standalone executables that interpret &lt;a href="https://github.com/protocolbuffers/protobuf/blob/c781df3d212c0d30a072ed98de00d1ba2fea22b9/src/google/protobuf/compiler/plugin.proto#L68" rel="noopener noreferrer"&gt;CodeGeneratorRequest&lt;/a&gt; protobufs from stdin, and produce &lt;a href="https://github.com/protocolbuffers/protobuf/blob/c781df3d212c0d30a072ed98de00d1ba2fea22b9/src/google/protobuf/compiler/plugin.proto#L99" rel="noopener noreferrer"&gt;CodeGeneratorResponse&lt;/a&gt; protobufs to stdout. The main protobuf compiler executable produces these requests, embedding a set of &lt;a href="https://github.com/protocolbuffers/protobuf/blob/c781df3d212c0d30a072ed98de00d1ba2fea22b9/java/compatibility_tests/v2.5.0/more_protos/src/proto/google/protobuf/descriptor.proto#L56" rel="noopener noreferrer"&gt;FileDescriptorProto&lt;/a&gt; instances for the parsed proto files. The response from the plugins embeds instructions on source files to be generated and their contents.&lt;/p&gt;

&lt;p&gt;Plugins can therefore be implemented using any technology that can serialize and deserialize protobufs. Some are implemented in C++, some in Java, some in Go. It's a very flexible system, if a rather opaque one from the user's perspective when attempting to diagnose a problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Intercepting plugin requests and responses
&lt;/h2&gt;

&lt;p&gt;As plugins just need to be something that the &lt;code&gt;protoc&lt;/code&gt; process can invoke and interact with using stdin and stdout, we can wrap virtually any plugin in a shell script to see what is being provided and returned, using &lt;code&gt;tee&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/sh&lt;/span&gt;
&lt;span class="nb"&gt;tee&lt;/span&gt; /tmp/input.pb.bin | /usr/local/bin/kroto-plus | &lt;span class="nb"&gt;tee&lt;/span&gt; /tmp/output.pb.bin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While these files are binary encoded protobufs, they are dominated by text content, as you will see if you open them in a text editor. However, the &lt;code&gt;protoc&lt;/code&gt; binary can also decode binary protobufs to its "text proto" format. If we have a clone of the &lt;a href="https://github.com/protocolbuffers/protobuf" rel="noopener noreferrer"&gt;protobuf repo&lt;/a&gt; in &lt;code&gt;~/protobuf&lt;/code&gt;, we can run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;protoc &lt;span class="nt"&gt;--decode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;google.protobuf.compiler.CodeGeneratorRequest &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-I&lt;/span&gt; ~/protobuf/src ~/protobuf/src/google/protobuf/compiler/plugin.proto &lt;span class="se"&gt;\&lt;/span&gt;
    &amp;lt; /tmp/input.pb.bin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will output the text format of the proto to stdout, making the contents a little easier to read. Similarly, you can do this for the output of the plugin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;protoc &lt;span class="nt"&gt;--decode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;google.protobuf.compiler.CodeGeneratorResponse &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-I&lt;/span&gt; ~/protobuf/src ~/protobuf/src/google/protobuf/compiler/plugin.proto &lt;span class="se"&gt;\&lt;/span&gt;
    &amp;lt; /tmp/output.pb.bin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How did this help me?
&lt;/h2&gt;

&lt;p&gt;As mentioned earlier, when attempting to use the kroto-plus plugin manually, it was not producing any Kotlin output. This was weird, as it was producing Kotlin output just fine in my separate Gradle-based build environment.&lt;/p&gt;

&lt;p&gt;I couldn't see what I was doing wrong: I was using the same version of &lt;code&gt;protoc&lt;/code&gt;, the same plugins, and the same source files, though moved around to fit the location conventions in my docker build container. I scrutinized the paths and everything looked correct, but I missed one small detail.&lt;/p&gt;

&lt;p&gt;The kroto-plus plugin, as mentioned earlier, requires a parameter to be passed of the form &lt;code&gt;ConfigPath=/path/to/config&lt;/code&gt;. I had transcribed this incorrectly as &lt;code&gt;ConfigFile=/path/to/config&lt;/code&gt; - that four-character difference caused all of my problems. I had expected a mistake like this to cause an error, as I had seen errors emitted by the kroto-plus plugin before when pointed at an invalid path for the configuration. However, with an incorrect property name rather than an incorrect path, the plugin does &lt;em&gt;nothing&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I was able to see the difference with the help of my little interception script: by recording the input to the plugin in the working environment and the broken environment, and then performing a diff, the mistake becomes readily apparent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; diff in_working.proto.txt in_broken.proto.txt
23c23
&amp;lt; parameter: "ConfigPath=/path/to/kroto-config.yml"
---
&amp;gt; parameter: "ConfigFile=/path/to/kroto-config.yml"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After making a dent in my desk with my face, the fix was trivial, and the expected Kotlin output finally emerged.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;protoc&lt;/code&gt; tool is mysterious, and in many respects, poorly documented. During my time at Google using the internal version of &lt;a href="https://bazel.build/" rel="noopener noreferrer"&gt;Bazel&lt;/a&gt;, all the details of correctly compiling protocol buffers to usable code were hidden under several layers of abstraction. For those of us now outside the &lt;a href="https://www.quora.com/Why-is-Google-also-referred-to-as-the-Chocolate-Factory" rel="noopener noreferrer"&gt;Chocolate Factory&lt;/a&gt;, we are mostly left to fend for ourselves in figuring out how to use this complex tool, or must accept being disintermediated by other tools that may not do what we need. &lt;/p&gt;

&lt;p&gt;The approach presented above can help diagnose more complex problems than just typos: through the ability to observe the full input and output to plugins, differences in compiler versions, input source, paths and annotations can be easily observed.&lt;/p&gt;

&lt;p&gt;Over time, I believe we will build a community knowledge base and consistent patterns for the usage of protobufs and gRPC. Tools like &lt;a href="https://buf.build/" rel="noopener noreferrer"&gt;buf&lt;/a&gt; show promise in this regard, and wrappers like &lt;a href="https://github.com/namely/docker-protoc" rel="noopener noreferrer"&gt;Namely's docker containers&lt;/a&gt; can provide a good reference for using protoc where the documentation is lacking - take a look at their &lt;a href="https://github.com/namely/docker-protoc/blob/master/all/entrypoint.sh" rel="noopener noreferrer"&gt;protoc wrapping script&lt;/a&gt; for a real world usage of protoc for polyglot builds.&lt;/p&gt;

&lt;p&gt;I hope at least one person out there finds this useful. This is my first foray into public technical writing in a few years, and it feels good to share what I have learned beyond my immediate colleagues again. If you have any questions, feel free to contact me: &lt;code&gt;iainmcgin-at-gmail-dot-com&lt;/code&gt;.&lt;/p&gt;

</description>
      <category>protobuf</category>
      <category>grpc</category>
      <category>todayilearned</category>
    </item>
  </channel>
</rss>
