<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Iain McGinniss</title>
    <description>The latest articles on DEV Community by Iain McGinniss (@iainmcgin).</description>
    <link>https://dev.to/iainmcgin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F100513%2F4e6978c9-0226-4507-8504-fecc13a8fb48.jpg</url>
      <title>DEV Community: Iain McGinniss</title>
      <link>https://dev.to/iainmcgin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/iainmcgin"/>
    <language>en</language>
    <item>
      <title>Zero-copy protobuf and ConnectRPC for Rust</title>
      <dc:creator>Iain McGinniss</dc:creator>
      <pubDate>Wed, 25 Mar 2026 20:19:53 +0000</pubDate>
      <link>https://dev.to/iainmcgin/zero-copy-protobuf-and-connectrpc-for-rust-1m3e</link>
      <guid>https://dev.to/iainmcgin/zero-copy-protobuf-and-connectrpc-for-rust-1m3e</guid>
      <description>&lt;p&gt;As part of my work at Anthropic, I open sourced two Rust crates that fill a gap in the RPC ecosystem: &lt;a href="https://crates.io/crates/buffa" rel="noopener noreferrer"&gt;&lt;strong&gt;buffa&lt;/strong&gt;&lt;/a&gt;, a pure-Rust Protocol Buffers implementation with first-class editions support and zero-copy message views, and &lt;a href="https://crates.io/crates/connectrpc" rel="noopener noreferrer"&gt;&lt;strong&gt;connect-rust&lt;/strong&gt;&lt;/a&gt;, a Tower-based ConnectRPC implementation that speaks Connect, gRPC, and gRPC-Web on the same handlers. We're nominating connect-rust as the &lt;a href="https://github.com/connectrpc/connectrpc.com/pull/334" rel="noopener noreferrer"&gt;canonical Rust implementation&lt;/a&gt; of ConnectRPC — if you're using Connect from Go, TypeScript, or Kotlin, this is intended to be the peer implementation for Rust. This code is already in production at Anthropic.&lt;/p&gt;

&lt;p&gt;Both crates pass their full upstream conformance suites — Google's protobuf binary and JSON conformance for buffa, and all ~12,800 ConnectRPC server, client, and TLS tests for connect-rust — though as I'll cover later, a green conformance run turned out to be necessary but far from sufficient for production. They were built in six weeks with Claude Opus 4.6 doing most of the work under my direction — an experiment in specification-driven development for performance- and correctness-sensitive library code.&lt;/p&gt;

&lt;p&gt;This post covers the Rust-specific design decisions: how protobuf editions map to codegen, why zero-copy views need an &lt;code&gt;OwnedView&lt;/code&gt; escape hatch, the type-level choices for mapping protobuf's semantics onto Rust, and what the conformance suites didn't catch. A separate post on the AI-assisted development process will follow.&lt;/p&gt;

&lt;h1&gt;
  
  
  Why another protobuf crate?
&lt;/h1&gt;

&lt;p&gt;The short answer: &lt;strong&gt;editions&lt;/strong&gt;, and leaning into the specific capabilities of Rust.&lt;/p&gt;

&lt;p&gt;The schism caused by the semantic divergence of proto2 and proto3 is being healed by &lt;a href="https://protobuf.dev/editions/overview/" rel="noopener noreferrer"&gt;editions&lt;/a&gt;, which replace the hard version split with a feature-flag-driven approach to the wire format. Each edition specifies a default feature set, and messages defined in files from older editions (e.g. proto2) can be used from newer ones. If you are defining new message types, these details are mostly irrelevant; if you are porting legacy systems from the proto2 era, they are likely to make your migration significantly easier.&lt;/p&gt;

&lt;p&gt;The Rust ecosystem hasn't caught up. Prost is the de facto standard, and it's excellent at what it does — but it targets binary proto3, with JSON bolted on via pbjson, and the library is now only passively maintained. Google's official Rust implementation (protobuf v4) supports editions but is built around upb, so it needs a C compiler and there is not yet an RPC layer implementation above it.&lt;/p&gt;

&lt;p&gt;Buffa treats editions as the core abstraction, and is also designed to work well with the current best available tooling: &lt;a href="https://buf.build/product/cli" rel="noopener noreferrer"&gt;buf CLI&lt;/a&gt; for language-agnostic code generation (though &lt;code&gt;protoc&lt;/code&gt; is of course also supported), a &lt;code&gt;buffa-build&lt;/code&gt; crate for &lt;code&gt;build.rs&lt;/code&gt; integration for those who prefer cargo-oriented build pipelines, and careful definition of crate features and generated code that allow the library to be used in &lt;code&gt;no_std&lt;/code&gt;, or to select the features that matter to your use case (e.g. excluding JSON support).&lt;/p&gt;

&lt;h2&gt;
  
  
  Zero-copy message views
&lt;/h2&gt;

&lt;p&gt;Rust provides an opportunity that is hard to exploit safely in most other languages: message "views" whose data is read in place from the input buffer rather than copied, reducing allocation cost.&lt;/p&gt;

&lt;p&gt;The need for this wasn't purely speculative. In an early prototype of connect-rust that used prost, profiling showed that per-field &lt;code&gt;String&lt;/code&gt; allocation and &lt;code&gt;HashMap&lt;/code&gt; construction for map fields significantly contributed to allocator pressure. For string and bytes fields, copying data is avoidable &lt;em&gt;and&lt;/em&gt; safe with Rust's borrow checker, referencing the content directly in the input buffer.&lt;/p&gt;

&lt;p&gt;Buffa generates two types per message: &lt;code&gt;MyMessage&lt;/code&gt; (owned, heap-allocated, similar to what you'd expect in most implementations) and &lt;code&gt;MyMessageView&amp;lt;'a&amp;gt;&lt;/code&gt; (borrows directly from the wire buffer). The view type's string fields are &lt;code&gt;&amp;amp;'a str&lt;/code&gt;, its bytes fields are &lt;code&gt;&amp;amp;'a [u8]&lt;/code&gt;, and its map fields are a flat &lt;code&gt;Vec&amp;lt;(K, V)&amp;gt;&lt;/code&gt; scan — no hashing on the decode path.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Owned decode - allocates per string field&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;LogRecord&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;decode_from_slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="py"&gt;.message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// String&lt;/span&gt;

&lt;span class="c1"&gt;// View decode - zero-copy&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;view&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;LogRecordView&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;decode_view&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="py"&gt;.message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// &amp;amp;str, borrowed from `bytes`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
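&lt;p&gt;Mechanically, a zero-copy string field is just a borrow. A minimal sketch of the idea, independent of buffa's real decoder (&lt;code&gt;read_len_prefixed_str&lt;/code&gt; is hypothetical, and the one-byte length prefix is illustrative — real protobuf uses a varint):&lt;/p&gt;

```rust
// Decode a length-prefixed string field without copying. The returned
// &str borrows directly from `buf`, so the borrow checker guarantees it
// cannot outlive the input buffer.
fn read_len_prefixed_str(buf: &[u8]) -> Result<&str, &'static str> {
    let (&len, rest) = buf.split_first().ok_or("empty buffer")?;
    let bytes = rest.get(..len as usize).ok_or("length out of range")?;
    // UTF-8 validation is the only work performed: no allocation, no copy.
    std::str::from_utf8(bytes).map_err(|_| "invalid UTF-8")
}
```

&lt;p&gt;Calling this on &lt;code&gt;[5, b'h', b'e', b'l', b'l', b'o']&lt;/code&gt; yields a &lt;code&gt;&amp;amp;str&lt;/code&gt; pointing one byte into the original buffer; nothing moves to the heap.&lt;/p&gt;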



&lt;p&gt;The catch with views is correctly handling lifetimes. A &lt;code&gt;FooView&amp;lt;'a&amp;gt;&lt;/code&gt; can't cross an &lt;code&gt;.await&lt;/code&gt; point if the buffer it borrows from doesn't live long enough — which is exactly the situation in an async RPC handler. &lt;code&gt;OwnedView&amp;lt;V&amp;gt;&lt;/code&gt; solves this by bundling a view with its backing &lt;code&gt;Bytes&lt;/code&gt; buffer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// 'static + Send, still zero-copy&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;owned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;OwnedView&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;LogRecordView&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;spawn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;owned&lt;/span&gt;&lt;span class="py"&gt;.message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// &amp;amp;str, borrowed from the owned Bytes&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
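&lt;p&gt;One safe way to build such a type — sketched here with hypothetical names and a &lt;code&gt;Vec&amp;lt;u8&amp;gt;&lt;/code&gt; standing in for &lt;code&gt;Bytes&lt;/code&gt;, as an illustration of the pattern rather than buffa's actual internals — is to hold the owning buffer plus byte offsets recorded during decode, instead of a &lt;code&gt;&amp;amp;'a str&lt;/code&gt; into a borrowed buffer, and re-slice on access:&lt;/p&gt;

```rust
// An owned, 'static view over one string field: the buffer moves in,
// and accessors re-slice it using offsets validated at decode time.
pub struct OwnedStrView {
    buf: Vec<u8>, // stands in for bytes::Bytes
    start: usize,
    end: usize,
}

impl OwnedStrView {
    pub fn decode(buf: Vec<u8>) -> Result<Self, &'static str> {
        // One-byte length prefix, as in the earlier sketch.
        let len = *buf.first().ok_or("empty buffer")? as usize;
        let bytes = buf.get(1..1 + len).ok_or("length out of range")?;
        std::str::from_utf8(bytes).map_err(|_| "invalid UTF-8")?;
        Ok(OwnedStrView { buf, start: 1, end: 1 + len })
    }

    pub fn message(&self) -> &str {
        // Validated during decode, so this cannot fail.
        std::str::from_utf8(&self.buf[self.start..self.end]).unwrap()
    }
}
```

&lt;p&gt;Because the view owns its buffer, it is &lt;code&gt;'static&lt;/code&gt; and can move into &lt;code&gt;tokio::spawn&lt;/code&gt; or across &lt;code&gt;.await&lt;/code&gt; points while &lt;code&gt;message()&lt;/code&gt; still reads the original bytes in place.&lt;/p&gt;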



&lt;p&gt;This is what connect-rust provides to service handlers. On a decode-heavy workload — 50 structured log records per request, ~22 KB batches with varints, strings, nested messages, and map entries — it's about 33% faster than tonic+prost at high concurrency, with allocator pressure at 3.6% of CPU versus 9.6%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configurable safety controls
&lt;/h2&gt;

&lt;p&gt;Some aspects of protobuf deserve special consideration because they can be unsafe, or enable attacks, when exposed through an RPC framework. Depending on your use case and environment, it is useful to be able to tune the safety controls around these issues.&lt;/p&gt;

&lt;p&gt;Buffa provides a &lt;code&gt;DecodeOptions&lt;/code&gt; type to control both recursion limits and message size. Prost enforces a fixed recursion limit of 100 nested messages; buffa uses the same default, but allows overriding it via &lt;code&gt;with_recursion_limit(n)&lt;/code&gt;. For message length, Prost applies no limit (Tonic handles this at the RPC layer), while buffa provides control at the protobuf level, with a default matching the protobuf spec's 2 GiB maximum. The &lt;code&gt;connect-rust&lt;/code&gt; library applies a 4 MiB default limit for messages and HTTP bodies, which is more typical for HTTP servers.&lt;/p&gt;
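&lt;p&gt;To make the recursion limit concrete, here is a sketch of the check such a limit performs (&lt;code&gt;check_depth&lt;/code&gt; is hypothetical, not buffa's API; for brevity it parses single-byte field keys and treats every length-delimited field as a sub-message, which a real decoder only does when the schema says so):&lt;/p&gt;

```rust
// Walk a simplified wire format, counting nesting depth. Wire type 0 is
// a varint value; wire type 2 is a length-delimited payload we recurse
// into. Without `max_depth`, a hostile message nested thousands of
// levels deep could overflow the decoder's call stack.
fn check_depth(mut buf: &[u8], depth: u32, max_depth: u32) -> Result<(), &'static str> {
    while let Some((&key, rest)) = buf.split_first() {
        buf = rest;
        match key & 0x07 {
            0 => {
                // Skip a varint value: consume until the continuation bit clears.
                while let Some((&b, rest)) = buf.split_first() {
                    buf = rest;
                    if b & 0x80 == 0 {
                        break;
                    }
                }
            }
            2 => {
                let (&len, rest) = buf.split_first().ok_or("truncated")?;
                let payload = rest.get(..len as usize).ok_or("truncated")?;
                if depth + 1 > max_depth {
                    return Err("recursion limit exceeded");
                }
                check_depth(payload, depth + 1, max_depth)?;
                buf = &rest[len as usize..];
            }
            _ => return Err("wire type not handled in this sketch"),
        }
    }
    Ok(())
}
```

&lt;p&gt;buffa's default of 100 corresponds to &lt;code&gt;max_depth = 100&lt;/code&gt; here.&lt;/p&gt;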

&lt;p&gt;Rust &lt;code&gt;String&lt;/code&gt; / &lt;code&gt;&amp;amp;str&lt;/code&gt; values &lt;em&gt;must&lt;/em&gt; be valid UTF-8, whereas proto2 strings do not have this restriction and later editions provide an opt-out for UTF-8 verification. Regardless, the natural user expectation is that a &lt;code&gt;string&lt;/code&gt; field &lt;em&gt;should&lt;/em&gt; be a &lt;code&gt;String&lt;/code&gt; in the Rust struct, so buffa chooses to perform UTF-8 validation for all strings by default. The library also provides an opt-out that changes &lt;code&gt;string&lt;/code&gt; fields with &lt;code&gt;utf8_validation = NONE&lt;/code&gt; (all proto2 strings by default, or editions fields that explicitly opt out) to &lt;code&gt;Vec&amp;lt;u8&amp;gt;&lt;/code&gt; / &lt;code&gt;&amp;amp;[u8]&lt;/code&gt; instead, allowing validation during decode to be bypassed without misleading the user as to the safety of the content. The user can then call &lt;code&gt;from_utf8&lt;/code&gt; or &lt;code&gt;from_utf8_unchecked&lt;/code&gt; as they deem fit, taking responsibility for the decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ergonomics
&lt;/h2&gt;

&lt;p&gt;Protobuf makes some very opinionated choices around message semantics, which can be quite different from the typical behavior of primitive data types in most languages. Two examples of this semantic mismatch that require careful resolution in Rust are optional message fields and enums.&lt;/p&gt;

&lt;p&gt;Message fields have default-value semantics which, combined with recursive message types, can be difficult to represent cleanly. Prost uses &lt;code&gt;Option&amp;lt;M&amp;gt;&lt;/code&gt; or &lt;code&gt;Option&amp;lt;Box&amp;lt;M&amp;gt;&amp;gt;&lt;/code&gt; for optional message fields, depending on whether the message type is recursive. This results in some awkward code when dereferencing or assigning to those fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="py"&gt;.address&lt;/span&gt;&lt;span class="nf"&gt;.as_ref&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="py"&gt;.street&lt;/span&gt;&lt;span class="nf"&gt;.as_str&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="py"&gt;.address&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Address&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;street&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"123 Main St"&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="nn"&gt;Default&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Buffa instead defines a single &lt;strong&gt;&lt;code&gt;MessageField&amp;lt;T&amp;gt;&lt;/code&gt;&lt;/strong&gt; wrapper for all message fields, which provides &lt;code&gt;Deref&lt;/code&gt; and &lt;code&gt;From&lt;/code&gt; trait implementations. This produces more natural field interaction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="py"&gt;.address.street&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="py"&gt;.address&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Address&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;street&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"123 Main St"&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="nn"&gt;Default&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
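&lt;p&gt;A toy model of how &lt;code&gt;Deref&lt;/code&gt; and &lt;code&gt;From&lt;/code&gt; can produce this ergonomics (an assumed shape, not buffa's real representation): always box the value so recursive message types have a fixed size, track presence separately, and let the trait impls do the lifting.&lt;/p&gt;

```rust
use std::ops::Deref;

// Toy stand-in for MessageField<T>: boxed value plus a presence flag.
pub struct MessageField<T> {
    value: Box<T>,
    set: bool,
}

impl<T> MessageField<T> {
    pub fn is_set(&self) -> bool {
        self.set
    }
}

impl<T: Default> Default for MessageField<T> {
    fn default() -> Self {
        // An unset field dereferences to the default instance, matching
        // protobuf's default-value semantics for message fields.
        MessageField { value: Box::new(T::default()), set: false }
    }
}

impl<T> Deref for MessageField<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.value
    }
}

impl<T> From<T> for MessageField<T> {
    fn from(value: T) -> Self {
        MessageField { value: Box::new(value), set: true }
    }
}

#[derive(Default)]
pub struct Address { pub street: String }

#[derive(Default)]
pub struct Person { pub address: MessageField<Address> }
```

&lt;p&gt;With this shape, &lt;code&gt;msg.address.street&lt;/code&gt; reads through &lt;code&gt;Deref&lt;/code&gt; with no &lt;code&gt;unwrap&lt;/code&gt;, and assignment goes through &lt;code&gt;.into()&lt;/code&gt; without naming the wrapper.&lt;/p&gt;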



&lt;p&gt;Protobuf enums in the current editions are "open", due to the possibility of unknown enum values from future evolutions of the enum definition. Prost uses raw &lt;code&gt;i32&lt;/code&gt; for enum values; for buffa we define &lt;strong&gt;&lt;code&gt;EnumValue&amp;lt;T&amp;gt;&lt;/code&gt;&lt;/strong&gt; as a proper Rust &lt;code&gt;enum&lt;/code&gt;, while preserving unknown values for round-trip fidelity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;buffa&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;EnumValue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;Contact&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;phone_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;EnumValue&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;PhoneType&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Match directly - the type carries the known/unknown distinction:&lt;/span&gt;
&lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="n"&gt;contact&lt;/span&gt;&lt;span class="py"&gt;.phone_type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nn"&gt;EnumValue&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;Known&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;PhoneType&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;MOBILE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nn"&gt;EnumValue&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;Known&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;PhoneType&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;HOME&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nn"&gt;EnumValue&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;Known&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;PhoneType&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;WORK&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nn"&gt;EnumValue&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;Unknown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* v is the raw i32 from the wire */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Or compare directly (PartialEq&amp;lt;E&amp;gt; is implemented):&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;contact&lt;/span&gt;&lt;span class="py"&gt;.phone_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nn"&gt;PhoneType&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;MOBILE&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For closed enums (from proto2), fields are directly the enum type, with no middle &lt;code&gt;EnumValue&amp;lt;T&amp;gt;&lt;/code&gt; layer.&lt;/p&gt;
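&lt;p&gt;The open-enum representation can be modeled in a few lines (an assumed shape, not buffa's exact API): known values wrap the generated Rust enum, and unknown wire values keep their raw &lt;code&gt;i32&lt;/code&gt; so re-encoding is lossless.&lt;/p&gt;

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnumValue<E> {
    Known(E),
    Unknown(i32),
}

// Allows `value == PhoneType::Mobile` without unwrapping.
impl<E: PartialEq> PartialEq<E> for EnumValue<E> {
    fn eq(&self, other: &E) -> bool {
        matches!(self, EnumValue::Known(e) if e == other)
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PhoneType { Mobile = 0, Home = 1, Work = 2 }

pub fn phone_type_from_wire(v: i32) -> EnumValue<PhoneType> {
    match v {
        0 => EnumValue::Known(PhoneType::Mobile),
        1 => EnumValue::Known(PhoneType::Home),
        2 => EnumValue::Known(PhoneType::Work),
        other => EnumValue::Unknown(other),
    }
}

pub fn phone_type_to_wire(v: EnumValue<PhoneType>) -> i32 {
    match v {
        EnumValue::Known(e) => e as i32,
        EnumValue::Unknown(raw) => raw, // preserved for round-trip fidelity
    }
}
```

&lt;p&gt;A value from a future schema revision survives a decode/encode round trip unchanged, which is the property the wrapper exists to protect.&lt;/p&gt;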

&lt;h2&gt;
  
  
  Supporting &lt;code&gt;no_std&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The core runtime is &lt;code&gt;no_std&lt;/code&gt; + &lt;code&gt;alloc&lt;/code&gt;, with optional JSON serialization via serde. Enabling &lt;code&gt;std&lt;/code&gt; adds &lt;code&gt;std::io&lt;/code&gt; integration and &lt;code&gt;std::time&lt;/code&gt; conversions, but the wire format, views, and JSON all work without it. Rust is well suited to embedded systems and constrained environments, and I believe that protobufs can also be beneficial in such scenarios. The encoding is efficient, and makes it easier for these systems to integrate with the broader ecosystem. While we have not yet pushed this to the logical conclusion of a partial ConnectRPC implementation that works with &lt;a href="https://embassy.dev/" rel="noopener noreferrer"&gt;embassy&lt;/a&gt;, &lt;a href="https://github.com/drogue-iot/reqwless" rel="noopener noreferrer"&gt;reqwless&lt;/a&gt;, and/or &lt;a href="https://github.com/sammhicks/picoserve" rel="noopener noreferrer"&gt;picoserve&lt;/a&gt;, the door is open for others to implement this.&lt;/p&gt;

&lt;p&gt;There are some small ergonomic consequences when using &lt;code&gt;no_std&lt;/code&gt;: the &lt;code&gt;JsonParseOptions&lt;/code&gt; that are normally scoped via a thread-local for deserialization (as serde offers no way to thread a deserialization context through an entire operation) are instead a global &lt;code&gt;OnceBox&lt;/code&gt;. This is usually fine, as most applications do not vary the parse options over the lifetime of the process, but it is a loss of flexibility compared to &lt;code&gt;std&lt;/code&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  connect-rust: the RPC layer
&lt;/h1&gt;

&lt;p&gt;Connect-rust is a &lt;a href="https://docs.rs/tower" rel="noopener noreferrer"&gt;Tower&lt;/a&gt;-based implementation of the &lt;a href="https://connectrpc.com/" rel="noopener noreferrer"&gt;ConnectRPC&lt;/a&gt; protocol, including support for handling gRPC and gRPC-Web requests, and JSON/binary encoded messages, all from the same handler, as the ConnectRPC specification intends. Unary and all three streaming RPC types (client streaming, server streaming, and bidirectional) are supported for both clients and servers. The client transports can use HTTP/1.1 and HTTP/2, with or without TLS as appropriate.&lt;/p&gt;

&lt;p&gt;The architecture is straightforward: codegen emits a monomorphic &lt;code&gt;FooServiceServer&amp;lt;T&amp;gt;&lt;/code&gt; per service, with a compile-time &lt;code&gt;match&lt;/code&gt; on the method name. No &lt;code&gt;Arc&amp;lt;dyn Handler&amp;gt;&lt;/code&gt; vtable or per-request allocation is required for dispatch. It drops into any Tower-compatible HTTP framework like Axum, or you can use the built-in standalone server that uses hyper directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;GreetService&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;MyGreetService&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OwnedView&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;GreetRequestView&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;'static&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GreetResponse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;ConnectError&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GreetResponse&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;greeting&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Hello, {}!"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="py"&gt;.name&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="nn"&gt;Default&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
        &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Arc&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MyGreetService&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="nf"&gt;.register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Router&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="nn"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"127.0.0.1:8080"&lt;/span&gt;&lt;span class="nf"&gt;.parse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are some known ergonomics issues here: I prioritized shipping a release for feedback over attempting to achieve perfection for a &lt;code&gt;0.x&lt;/code&gt; release. Threading the context in and out of the handler (returning &lt;code&gt;Ok((response, ctx))&lt;/code&gt;) is awkward, and the request type &lt;code&gt;OwnedView&amp;lt;ReqView&amp;lt;'static&amp;gt;&amp;gt;&lt;/code&gt; is overly explicit. This will likely change to &lt;code&gt;ConnectRequest&amp;lt;Req&amp;gt;&lt;/code&gt; and &lt;code&gt;ConnectResponse&amp;lt;Resp&amp;gt;&lt;/code&gt; types in a future release, where the request context and response options are separated and the lifetime is implicit.&lt;/p&gt;
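&lt;p&gt;The monomorphic dispatch described earlier can be sketched as follows (an assumed shape for the generated code, reduced to synchronous &lt;code&gt;&amp;amp;str&lt;/code&gt;-to-&lt;code&gt;String&lt;/code&gt; handlers): the service type parameter is concrete, so routing a method is a plain &lt;code&gt;match&lt;/code&gt;, with no trait-object table and no per-request allocation for dispatch.&lt;/p&gt;

```rust
pub trait GreetService {
    fn greet(&self, name: &str) -> String;
}

// Codegen emits one concrete server type per service; T is the user's
// handler implementation, statically known at compile time.
pub struct GreetServiceServer<T> {
    pub inner: T,
}

impl<T: GreetService> GreetServiceServer<T> {
    pub fn dispatch(&self, path: &str, body: &str) -> Result<String, &'static str> {
        match path {
            "/greet.v1.GreetService/Greet" => Ok(self.inner.greet(body)),
            _ => Err("unimplemented"),
        }
    }
}

pub struct MyGreetService;

impl GreetService for MyGreetService {
    fn greet(&self, name: &str) -> String {
        format!("Hello, {name}!")
    }
}
```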

&lt;p&gt;Client code for interacting with services is also what you would expect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;HttpClient&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;plaintext&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;ClientConfig&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"http://localhost:8080"&lt;/span&gt;&lt;span class="nf"&gt;.parse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;GreetServiceClient&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="nf"&gt;.greet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GreetRequest&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"World"&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="nn"&gt;Default&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is worth noting one small security ergonomics decision here: the transport constructors have no bare &lt;code&gt;new()&lt;/code&gt;; instead, one must explicitly choose between &lt;code&gt;plaintext()&lt;/code&gt; and &lt;code&gt;with_tls(config)&lt;/code&gt;, and these enforce the appropriate URL scheme (&lt;code&gt;http&lt;/code&gt; and &lt;code&gt;https&lt;/code&gt; respectively). This is an intentional choice to make the decision to use plaintext explicit and consequential; burying this detail in options passed to &lt;code&gt;new()&lt;/code&gt; is how security incidents are born.&lt;/p&gt;
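&lt;p&gt;The constructor names above are from the release; the internals below are an assumed sketch of how such scheme enforcement can work:&lt;/p&gt;

```rust
// No bare new(): construction forces a visible choice of transport
// security, and each variant accepts only its matching URL scheme.
pub struct HttpClient {
    tls: bool,
}

impl HttpClient {
    pub fn plaintext() -> Self {
        HttpClient { tls: false }
    }

    pub fn with_tls(/* config: TlsConfig elided */) -> Self {
        HttpClient { tls: true }
    }

    pub fn check_url(&self, url: &str) -> Result<(), &'static str> {
        let scheme = url.split_once("://").map(|(s, _)| s);
        match (self.tls, scheme) {
            (true, Some("https")) | (false, Some("http")) => Ok(()),
            _ => Err("URL scheme does not match transport security"),
        }
    }
}
```

&lt;p&gt;A plaintext client refuses an &lt;code&gt;https&lt;/code&gt; URL rather than silently downgrading, and vice versa.&lt;/p&gt;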

&lt;h1&gt;
  
  
  What conformance tests failed to catch
&lt;/h1&gt;

&lt;p&gt;Both crates passed the full conformance suites for protobuf and ConnectRPC weeks before I would have called them ready for consumption. Conformance exercises &lt;em&gt;protocol correctness&lt;/em&gt;. It does not exercise &lt;em&gt;adversarial resource bounds&lt;/em&gt; — nobody writes a conformance test that sends you a gzip bomb.&lt;/p&gt;

&lt;p&gt;Four real issues made it past green conformance, surfaced during security review:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The server enforced a size limit on incoming request bodies; the client did not, calling &lt;code&gt;.collect().await&lt;/code&gt; on whatever the server sent back. The safe pattern had been applied asymmetrically.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CompressionProvider::decompress_with_limit&lt;/code&gt; had a default implementation that decompressed fully and checked the size &lt;em&gt;afterwards&lt;/em&gt;. The gzip/zstd implementations overrode this behavior correctly, but a custom provider using the default would be vulnerable to decompression bombs.&lt;/li&gt;
&lt;li&gt;The TLS handshake had no timeout. A client that connects but never sends a &lt;code&gt;ClientHello&lt;/code&gt; would hold the connection forever.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;grpc-timeout: 18446744073709551615S&lt;/code&gt; parsed to &lt;code&gt;Duration::from_secs(u64::MAX)&lt;/code&gt;, which panics when added to &lt;code&gt;Instant::now()&lt;/code&gt;. The code had a &lt;em&gt;comment&lt;/em&gt; saying the spec limits this to 8 digits. The code did not match the comment.&lt;/li&gt;
&lt;/ul&gt;
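&lt;p&gt;Two of these fixes reduce to small, mechanical patterns. Here is a sketch of each (hypothetical helpers, not the crates' actual code): bounding decompressed output &lt;em&gt;while&lt;/em&gt; it streams, and parsing &lt;code&gt;grpc-timeout&lt;/code&gt; with the spec's 8-digit limit actually enforced:&lt;/p&gt;

```rust
use std::io::{Error, ErrorKind, Read, Result as IoResult};
use std::time::Duration;

/// Bound decompressed output *as it streams*, rather than decompressing
/// fully and checking afterwards. `reader` stands in for a streaming
/// decompressor such as a gzip or zstd decoder.
fn read_with_limit(reader: impl Read, limit: u64) -> IoResult<Vec<u8>> {
    let mut out = Vec::new();
    // Ask for one byte more than the limit: if we receive it, the source is
    // over budget and we fail without buffering the rest of the bomb.
    reader.take(limit + 1).read_to_end(&mut out)?;
    if out.len() as u64 > limit {
        return Err(Error::new(ErrorKind::InvalidData, "decompressed size exceeds limit"));
    }
    Ok(out)
}

/// Parse a grpc-timeout header value, enforcing the spec's limit of at most
/// 8 ASCII digits so the result can never overflow `Instant::now() + d`.
fn parse_grpc_timeout(value: &str) -> Option<Duration> {
    if !value.is_ascii() || value.len() < 2 {
        return None;
    }
    let (digits, unit) = value.split_at(value.len() - 1);
    if digits.len() > 8 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let n: u64 = digits.parse().ok()?;
    match unit {
        "H" => Some(Duration::from_secs(n * 3600)),
        "M" => Some(Duration::from_secs(n * 60)),
        "S" => Some(Duration::from_secs(n)),
        "m" => Some(Duration::from_millis(n)),
        "u" => Some(Duration::from_micros(n)),
        "n" => Some(Duration::from_nanos(n)),
        _ => None,
    }
}
```

&lt;p&gt;The &lt;code&gt;take(limit + 1)&lt;/code&gt; trick means a decompression bomb fails after buffering at most one byte past the limit, and the 8-digit cap bounds the parsed timeout to well under &lt;code&gt;u64::MAX&lt;/code&gt; seconds.&lt;/p&gt;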

&lt;p&gt;These were all fixed, but the themes generalize past this project: asymmetric client/server defenses, unsafe trait defaults inherited by custom impls, parse-site leniency trusted at the use-site, comments that claim enforcement without enforcing. If you're building an RPC crate, that's a decent checklist.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the spec runs out
&lt;/h2&gt;

&lt;p&gt;The protobuf spec carefully defines what happens when an unknown value arrives for a closed enum in a singular field, a repeated field, and a map value — but says nothing about a closed enum inside a oneof. Java treats it like the singular case. Go doesn't implement closed-enum semantics at all and still passes conformance, because conformance doesn't test closed enums. For buffa, we chose to follow Java's precedent.&lt;/p&gt;

&lt;p&gt;Similarly, the spec doesn't say whether overflow bits in the 10th byte of a varint should be rejected or silently discarded. C++ and prost discard them, whereas for buffa we reject varints with these bits set. Both are defensible choices, and the conformance tests exercise neither. Claude did a fantastic job of finding these issues, but only when specifically prompted to cross-check the spec, the tests, and the code against other gold-standard implementations, looking for gaps and inconsistencies.&lt;/p&gt;
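&lt;p&gt;For the varint case, the strict behavior is easy to state in code. A sketch of a decoder that rejects overflow bits in the 10th byte (illustrative only, not buffa's actual implementation):&lt;/p&gt;

```rust
/// Decode a protobuf varint as a u64, rejecting encodings whose 10th byte
/// sets bits that cannot fit in 64 bits.
fn decode_varint_strict(buf: &[u8]) -> Option<u64> {
    let mut value: u64 = 0;
    for (i, &b) in buf.iter().enumerate() {
        if i >= 10 {
            return None; // a u64 varint is never longer than 10 bytes
        }
        let payload = (b & 0x7f) as u64;
        // Bytes 0..=8 contribute 7 bits each (63 bits total); the 10th byte
        // may only contribute the single remaining bit. Reject overflow bits.
        if i == 9 && payload > 1 {
            return None;
        }
        value |= payload << (7 * i);
        if b & 0x80 == 0 {
            return Some(value);
        }
    }
    None // input ended with the continuation bit still set
}
```

&lt;p&gt;A lenient decoder would instead mask the 10th byte's payload down to one bit and accept the same input; both behaviors round-trip every valid encoding, which is why conformance cannot distinguish them.&lt;/p&gt;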

&lt;h1&gt;
  
  
  Performance
&lt;/h1&gt;

&lt;p&gt;I want to be careful here, as benchmark numbers are the part most likely to be misread. Connect-rust is &lt;strong&gt;not&lt;/strong&gt; meaningfully faster than tonic for real services. In realistic workloads, like a handler that interacts with a database or upstream services, the optimizations in buffa and connect-rust increase throughput by around 4%. On decode-heavy workloads where buffa's views pay off, it's further ahead: 33% more throughput at high concurrency on the log-ingest benchmark.&lt;/p&gt;

&lt;p&gt;What actually moves the needle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero-copy views.&lt;/strong&gt; Allocator pressure is 3.6% of server CPU versus 9.6% for tonic+prost on string-heavy payloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monomorphic dispatch.&lt;/strong&gt; Compile-time &lt;code&gt;match&lt;/code&gt; beats dyn-dispatch by a small but real margin when there's nothing else in the request path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connect framing.&lt;/strong&gt; On unary RPCs, Connect's protocol is genuinely cheaper than gRPC — no envelope header, no trailing HEADERS frame. At 200k+ req/s, gRPC's trailer is ~200k extra h2 HEADERS encodes per second. The gap is ~5% at low concurrency, ~23% at c=256.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;a href="https://github.com/anthropics/buffa" rel="noopener noreferrer"&gt;buffa&lt;/a&gt; and &lt;a href="https://github.com/anthropics/connect-rust" rel="noopener noreferrer"&gt;connect-rust&lt;/a&gt; repositories contain the benchmark code and result snapshots — as always, take synthetic benchmarks with a grain of salt. More performance optimizations are possible in the future, but the gains are likely marginal for all but the most performance-focused and tuned services.&lt;/p&gt;

&lt;h1&gt;
  
  
  The future
&lt;/h1&gt;

&lt;p&gt;I hope you will try buffa and connect-rust, and provide feedback! While I have tried to make the code readable, ergonomic, and correct, there will inevitably be issues with something as complex as a full protobuf and ConnectRPC implementation primarily built by AI in six weeks. I am committed to improving these libraries, to show that AI-assisted development can be both fast &lt;em&gt;and&lt;/em&gt; high quality.&lt;/p&gt;

&lt;p&gt;There are also features we have yet to add, but plan to work on soon:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Message extensions — this is a necessary feature to implement many plugins and middleware, like &lt;a href="https://protovalidate.com/" rel="noopener noreferrer"&gt;protovalidate&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reflection — handling unknown message types via runtime-provided descriptors is commonly needed when implementing middleware and plugins.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Textproto and protoyaml — while I had initially decided to not bother supporting textproto as it is fairly old, I've become convinced that it is a useful addition to help facilitate migrations of proto2-era C/C++ services that may still depend on this. Similarly, YAML is a de facto standard for configuration files, and I'd love to be able to support that with an IDL and protovalidate to enforce correctness.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are likely many other features that you might want in these implementations — please let us know by opening issues on the repositories, and comment on &lt;a href="https://github.com/connectrpc/connectrpc.com/pull/334" rel="noopener noreferrer"&gt;the ConnectRPC RFC&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>protobuf</category>
      <category>connectrpc</category>
      <category>rust</category>
      <category>anthropic</category>
    </item>
    <item>
      <title>Authenticated Docker Hub image pulls in Kubernetes</title>
      <dc:creator>Iain McGinniss</dc:creator>
      <pubDate>Sat, 22 Apr 2023 23:20:00 +0000</pubDate>
      <link>https://dev.to/iainmcgin/authenticated-docker-hub-image-pulls-in-kubernetes-k57</link>
      <guid>https://dev.to/iainmcgin/authenticated-docker-hub-image-pulls-in-kubernetes-k57</guid>
      <description>&lt;p&gt;I recently stumbled over the Docker Hub &lt;a href="https://docs.docker.com/docker-hub/download-rate-limit/" rel="noopener noreferrer"&gt;image pull rate limit&lt;/a&gt; in one of my Kubernetes clusters. A pod failed to start due to being unable to pull an image, with a &lt;code&gt;429 Too Many Requests&lt;/code&gt; error response. The Docker Hub documentation says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For anonymous users, the rate limit is set to 100 pulls per 6 hours per IP address. For &lt;a href="https://docs.docker.com/docker-hub/download-rate-limit/#how-do-i-authenticate-pull-requests" rel="noopener noreferrer"&gt;authenticated&lt;/a&gt; users, it’s 200 pulls per 6 hour period. Users with a paid &lt;a href="https://www.docker.com/pricing" rel="noopener noreferrer"&gt;Docker subscription&lt;/a&gt; get up to 5000 pulls per day.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;With a small cluster that doesn't change frequently, the rate limit is typically not an issue. However, as your cluster grows and pods are started or replaced more frequently, the likelihood of image pulls failing due to hitting the rate limit increases.&lt;/p&gt;

&lt;p&gt;So, what can be done to avoid this? There are a few options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Authenticate your Docker Hub image pulls. This seems like the obvious answer, but as we will discuss, this can be more complex than you might expect.&lt;/li&gt;
&lt;li&gt;Operate a pull-through cache registry, like &lt;a href="https://jfrog.com/artifactory/" rel="noopener noreferrer"&gt;Artifactory&lt;/a&gt; or the &lt;a href="https://docs.docker.com/registry/" rel="noopener noreferrer"&gt;open source reference Docker registry&lt;/a&gt;. This will allow you to pull images from Docker Hub less frequently, improving your chances of staying under the anonymous usage limit.&lt;/li&gt;
&lt;li&gt;Use images from repositories directly controlled by your organization. For example, you could exclusively use images stored in a registry provided by your cloud provider (e.g. AWS Elastic Container Registry).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Options 2 and 3 are worth considering for reasons beyond the scope of this article - reduced data transfer fees, more visibility into the images you deploy, and better options for &lt;a href="https://blog.aquasec.com/supply-chain-threats-using-container-images" rel="noopener noreferrer"&gt;mitigating supply chain attacks&lt;/a&gt;. However, they are not always practical options - the overheads of configuring, operating, and monitoring a private registry can be substantial. Additionally, you will likely need to change all of your image references - a default image reference like &lt;code&gt;busybox:1.36&lt;/code&gt; is implicitly referencing Docker Hub, and would need to be changed to something else like &lt;code&gt;my-image-registry.example/busybox:1.36&lt;/code&gt;. If you are using Helm charts to manage the install of common services in your cluster, such overrides are not always possible - the chart may hard-code image references.&lt;/p&gt;

&lt;p&gt;So, how can we authenticate our Docker Hub image pulls? If you have control over the underlying operating system of your Kubernetes nodes (e.g. through a custom &lt;a href="https://www.dmtf.org/standards/ovf" rel="noopener noreferrer"&gt;virtual machine image&lt;/a&gt;, or &lt;a href="https://cloudinit.readthedocs.io/en/latest/index.html" rel="noopener noreferrer"&gt;cloud-init&lt;/a&gt; configuration), you can provide Docker Hub credentials directly in the &lt;a href="https://github.com/containerd/containerd/blob/main/docs/cri/config.md#registry-configuration" rel="noopener noreferrer"&gt;containerd registry configuration&lt;/a&gt;. It may also be possible to use a &lt;a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-credential-provider/" rel="noopener noreferrer"&gt;kubelet credential provider&lt;/a&gt;, though this interface is primarily designed for &lt;em&gt;dynamic&lt;/em&gt; credential generation or retrieval, whereas Docker Hub credentials are currently static. I could not find a credential provider for this interface that could supply either static credentials or those sourced from a credential vault.&lt;/p&gt;

&lt;p&gt;Staying within the bounds of what Kubernetes offers at the conceptual layer, we can declaratively configure authenticated image pulls using &lt;a href="https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/" rel="noopener noreferrer"&gt;image pull secrets&lt;/a&gt;. We will go through the details of this approach, then discuss some of the complexities that arise in larger clusters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating and using image pull secrets
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Creating a Docker Hub credential
&lt;/h3&gt;

&lt;p&gt;First, we need a Docker Hub user and password for pulling images. I recommend creating a Docker Hub account specifically for this purpose, separate from that of any specific person in your organization. This will allow you to independently manage the lifecycle and security of this account. If you have a &lt;a href="https://docs.docker.com/docker-hub/orgs/" rel="noopener noreferrer"&gt;Docker organization&lt;/a&gt;, it is best to create this &lt;a href="https://docs.docker.com/docker-hub/service-accounts/" rel="noopener noreferrer"&gt;"service account"&lt;/a&gt; under that organization, which has the added benefit of giving the account a significantly higher image pull rate limit (16x what you get with unauthenticated pulls). If you need even more, and you still don't want to operate a pull-through cache or private registry, you can pay Docker for &lt;a href="https://docs.docker.com/docker-hub/service-accounts/#enhanced-service-account-add-on-pricing" rel="noopener noreferrer"&gt;even higher limits&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Within your chosen account, you can &lt;a href="https://docs.docker.com/docker-hub/access-tokens/" rel="noopener noreferrer"&gt;create a personal access token&lt;/a&gt; that can be used as the "password" for authenticated image pulls. I recommend configuring this token to be "Read-only" or "Public Repo Read-only" to limit exposure. A "Read-only" token will allow pulls of images from private Docker Hub repositories that the account has access to, which may be desirable if you are also using Docker Hub as your primary store for private images. If you only intend to use public images from Docker Hub, "Public Repo Read-only" is sufficient.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating an image pull secret
&lt;/h3&gt;

&lt;p&gt;To use the personal access token from your Docker Hub account for image pulls in a Kubernetes cluster, we must create a &lt;a href="https://kubernetes.io/docs/concepts/configuration/secret/" rel="noopener noreferrer"&gt;secret object&lt;/a&gt; with type &lt;code&gt;kubernetes.io/dockerconfigjson&lt;/code&gt; to hold the credentials. The credentials are embedded in a JSON object with the following structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"auths"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"https://index.docker.io/v1/"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"my-robot-account-1234"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"password"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dckr_pat_asd-fghjklqwertyuiopZXCVBNM"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The secret object embeds the Base64-encoded form of that JSON object. It will look something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Secret&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dockerhub-image-pull-secret&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;kubernetes.io/dockerconfigjson&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;.dockerconfigjson&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;ewogICJhdXRocyI6IHsKICAgICJodHRwczovL2luZGV4LmRvY2tlci5pby92  &lt;/span&gt;
    &lt;span class="s"&gt;MS8iOiB7CiAgICAgICJ1c2VybmFtZSI6ICJteS1yb2JvdC1hY2NvdW50LTEy  &lt;/span&gt;
    &lt;span class="s"&gt;MzQiLAogICAgICAicGFzc3dvcmQiOiAiZGNrcl9wYXRfYXNkLWZnaGprbHF3  &lt;/span&gt;
    &lt;span class="s"&gt;ZXJ0eXVpb3BaWENWQk5NIgogICAgfQogIH0KfQo=&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is important to note that secrets are &lt;em&gt;namespaced&lt;/em&gt;. This means they can only be referenced by other Kubernetes resources in the same namespace, unless you set up specific &lt;a href="https://kubernetes.io/docs/reference/access-authn-authz/rbac/" rel="noopener noreferrer"&gt;role-based access control rules&lt;/a&gt; to allow cross-namespace access to the secret.&lt;/p&gt;
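&lt;p&gt;If you prefer not to hand-encode the JSON, the value for the &lt;code&gt;.dockerconfigjson&lt;/code&gt; field can be produced on the command line (the username and token below are the same placeholders as above):&lt;/p&gt;

```shell
# Write the dockerconfigjson payload, then print its Base64 form for the
# Secret's .dockerconfigjson data field.
cat > dockerconfig.json <<'EOF'
{
  "auths": {
    "https://index.docker.io/v1/": {
      "username": "my-robot-account-1234",
      "password": "dckr_pat_asd-fghjklqwertyuiopZXCVBNM"
    }
  }
}
EOF
base64 -w0 dockerconfig.json
```

&lt;p&gt;Alternatively, &lt;code&gt;kubectl create secret docker-registry dockerhub-image-pull-secret --docker-server=https://index.docker.io/v1/ --docker-username=my-robot-account-1234 --docker-password=dckr_pat_... --dry-run=client -o yaml&lt;/code&gt; generates the complete Secret manifest, Base64 encoding included.&lt;/p&gt;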

&lt;h3&gt;
  
  
  Using an image pull secret
&lt;/h3&gt;

&lt;p&gt;With an image pull secret defined, we have two main options for using it. First, we can reference the secret in our &lt;a href="https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#podspec-v1-core" rel="noopener noreferrer"&gt;pod specifications&lt;/a&gt; under the &lt;code&gt;imagePullSecrets&lt;/code&gt; field:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx:1.23.4&lt;/span&gt;
  &lt;span class="na"&gt;imagePullSecrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dockerhub-image-pull-secret&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, it can be tedious and error-prone to specify this reference across many different pod specifications. The second option is to reference the secret as part of a &lt;a href="https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#serviceaccount-v1-core" rel="noopener noreferrer"&gt;service account object&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;imagePullSecrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dockerhub-image-pull-secret&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unless explicitly changed, pods use the &lt;code&gt;default&lt;/code&gt; service account of their namespace, so this service account acts as a shared location for defining image pull secrets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problems at scale
&lt;/h2&gt;

&lt;p&gt;Configuring and using a Docker Hub image pull secret for a single namespace is relatively straightforward. However, repeating this work for tens or hundreds of namespaces is tedious and error-prone:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You must clone the secret into every namespace, or ensure that one secret is accessible in all namespaces. Both require per-namespace configuration, and if you are dynamically creating namespaces, you will need associated automation.&lt;/li&gt;
&lt;li&gt;You must reference the secret in all relevant places, whether those are pod specifications or service accounts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fortunately, tools exist that can help automate these tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  imagepullsecret-patcher
&lt;/h3&gt;

&lt;p&gt;The problem of using image pull secrets in larger clusters has been known for quite some time. TitanSoft decided to do something about it in 2019, releasing the &lt;a href="https://github.com/titansoft-pte-ltd/imagepullsecret-patcher" rel="noopener noreferrer"&gt;imagepullsecret-patcher&lt;/a&gt; tool. This executes within your cluster and does two things every 10 seconds for every namespace:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Checks to see if an image pull secret exists; if it does not exist or has stale contents, it is cloned from a primary secret.&lt;/li&gt;
&lt;li&gt;Checks to ensure the &lt;code&gt;default&lt;/code&gt; service account has an &lt;code&gt;imagePullSecrets&lt;/code&gt; reference to the cloned secret in that namespace. If it does not, the service account is patched to include the reference. This can also be optionally applied to &lt;em&gt;all&lt;/em&gt; service accounts, not just the default service account.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This does exactly what we would want, in the absence of a more official mechanism provided by Kubernetes itself. It appears to have worked well for many people, at least based on the popularity of the GitHub repository. However, there are some risks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The tool requires cluster-wide read-write access to all secrets and service accounts, via a ClusterRole and ClusterRoleBinding. Any potential vulnerabilities in the tool could be leveraged to gain access to all of your secrets, including service account secrets, which may provide access into other parts of your infrastructure (e.g. via tokens issued by &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html" rel="noopener noreferrer"&gt;AWS IAM Roles for Service Accounts&lt;/a&gt; or equivalents in other cloud providers).&lt;/li&gt;
&lt;li&gt;The tool has not been updated since October 2020 - this isn't necessarily an issue, as what the tool does is relatively simple. However, it does mean that the last release was compiled against relatively old versions of the Go standard library and other dependencies, increasing the risk that known vulnerabilities in those dependencies could be exploited.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tool is simple and effective, and at only ~1k lines of Go code, it is entirely feasible for a small devops team to maintain a fork for updates or tweaks if desired. For my purposes, I was interested to see if other tools existed that are more actively maintained and could solve the same problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cluster-wide secrets with External Secrets Operator
&lt;/h3&gt;

&lt;p&gt;When you are dealing with larger clusters, it is also likely that your organization is using a centralized secret store like &lt;a href="https://www.hashicorp.com/products/vault" rel="noopener noreferrer"&gt;Hashicorp Vault&lt;/a&gt;, or cloud-specific solutions like &lt;a href="https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html" rel="noopener noreferrer"&gt;AWS Secrets Manager&lt;/a&gt;. If you are doing this, you may also be using the &lt;a href="https://external-secrets.io/" rel="noopener noreferrer"&gt;external secrets operator&lt;/a&gt; to import secrets from those environments into Kubernetes. This operator supports defining a &lt;a href="https://external-secrets.io/v0.8.1/api/clusterexternalsecret/" rel="noopener noreferrer"&gt;ClusterExternalSecret&lt;/a&gt;, which allows an external secret to be imported into multiple namespaces. The definition will look something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;external-secrets.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterExternalSecret&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dockerhub-image-pull-secret&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# instantiate the secrets in _every_ namespace&lt;/span&gt;
  &lt;span class="na"&gt;namespaceSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
  &lt;span class="na"&gt;externalSecretSpec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;secretStoreRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cluster-secret-store&lt;/span&gt;
      &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterSecretStore&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kubernetes.io/dockerconfigjson&lt;/span&gt;
        &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;.dockerconfigjson&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
            &lt;span class="s"&gt;{&lt;/span&gt;
              &lt;span class="s"&gt;"auths": {&lt;/span&gt;
                &lt;span class="s"&gt;"https://index.docker.io/v1": {&lt;/span&gt;
                  &lt;span class="s"&gt;"username": "{{ .username }}"&lt;/span&gt;
                  &lt;span class="s"&gt;"password": "{{ .password }}"&lt;/span&gt;
                &lt;span class="s"&gt;}&lt;/span&gt;
              &lt;span class="s"&gt;}&lt;/span&gt;
            &lt;span class="s"&gt;}&lt;/span&gt;
        &lt;span class="na"&gt;dataFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;extract&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dockerhub-account&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This definition imports a secret with external name "dockerhub-account" from a &lt;a href="https://external-secrets.io/v0.8.1/api/clustersecretstore/" rel="noopener noreferrer"&gt;ClusterSecretStore&lt;/a&gt;. We extract from this secret the fields "username" and "password", and inject those values into the expected image pull secret structure using a &lt;a href="https://external-secrets.io/v0.8.1/guides/templating/" rel="noopener noreferrer"&gt;template&lt;/a&gt;. The external secrets operator will create an &lt;a href="https://external-secrets.io/v0.8.1/api/externalsecret/" rel="noopener noreferrer"&gt;ExternalSecret&lt;/a&gt; with the same name in every namespace (as they all implicitly match the &lt;code&gt;namespaceSelector&lt;/code&gt;). Those ExternalSecrets will produce Secrets with the same name and the rendered template. The end result is that a Secret named &lt;code&gt;dockerhub-image-pull-secret&lt;/code&gt; will exist in every namespace, ready to be referenced as needed.&lt;/p&gt;

&lt;p&gt;Additional configuration may be required for your specific environment and needs - see the &lt;a href="https://external-secrets.io" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;, and in particular consider changing the default &lt;a href="https://external-secrets.io/v0.8.1/api/externalsecret/#update-behavior" rel="noopener noreferrer"&gt;refreshInterval&lt;/a&gt; - the default is &lt;em&gt;one hour&lt;/em&gt;. While your Docker Hub credentials are not likely to change frequently, you may wish to ensure that when you rotate them in your central system, the change propagates to your clusters quickly and without manual intervention.&lt;/p&gt;

&lt;h3&gt;
  
  
  Patching service accounts with RedHat's patch-operator
&lt;/h3&gt;

&lt;p&gt;The general problem of patching resource definitions that are not fully under your control has also been recognized for some time. This is true of default resources created and updated by cluster maintenance tools (e.g. &lt;a href="https://kops.sigs.k8s.io/" rel="noopener noreferrer"&gt;kOps&lt;/a&gt;), or by public helm charts that you use to install common services and operators (e.g. &lt;a href="https://artifacthub.io/packages/helm/ingress-nginx/ingress-nginx" rel="noopener noreferrer"&gt;nginx-ingress&lt;/a&gt;, &lt;a href="https://artifacthub.io/packages/helm/cert-manager/cert-manager" rel="noopener noreferrer"&gt;cert-manager&lt;/a&gt;, and so on). High-quality charts will allow you to override the configuration of important components such as service account references, but some simpler charts offer much less configuration.&lt;/p&gt;

&lt;p&gt;Red Hat's &lt;a href="https://github.com/redhat-cop/patch-operator" rel="noopener noreferrer"&gt;patch-operator&lt;/a&gt; is designed to allow you to declare patches to target resources in your cluster. We can use this to patch service accounts to include references to our image pull secrets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redhatcop.redhat.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Patch&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dockerhub-image-pull-secret-patch&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;patches&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;patching-service-account&lt;/span&gt;
  &lt;span class="na"&gt;patches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;service-account-patch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;targetObjectRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
        &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
        &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
      &lt;span class="na"&gt;patchType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;application/strategic-merge-patch+json&lt;/span&gt;
      &lt;span class="na"&gt;patchTemplate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
        &lt;span class="s"&gt;imagePullSecrets:&lt;/span&gt;
          &lt;span class="s"&gt;- name: dockerhub-image-pull-secret&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This Patch custom resource declares that we would like to add the &lt;code&gt;dockerhub-image-pull-secret&lt;/code&gt; reference to the &lt;code&gt;imagePullSecrets&lt;/code&gt; list of all service accounts. The &lt;code&gt;targetObjectRef&lt;/code&gt; is not restricted to a particular namespace or name, so it will match all service accounts (as described in the documentation for the operator).&lt;/p&gt;

&lt;p&gt;This, combined with the external secrets operator to produce the secrets we need in each namespace, allows us to configure image pull secrets across all service accounts. The permissions required to apply this patch are attached to the referenced service account, &lt;code&gt;patching-service-account&lt;/code&gt;. This allows us to isolate different permission sets for different patches. For this patch, we need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;patching-service-account&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;patches&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterRole&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service-account-modifier&lt;/span&gt;
&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;apiGroups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;serviceaccounts"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;verbs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;watch"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;patch"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterRoleBinding&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service-account-modifier-binding&lt;/span&gt;
&lt;span class="na"&gt;subjects&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;patching-service-account&lt;/span&gt;
    &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;patches&lt;/span&gt;
&lt;span class="na"&gt;roleRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterRole&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service-account-modifier&lt;/span&gt;
  &lt;span class="na"&gt;apiGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is one significant issue with this approach, however: there is no declared &lt;a href="https://kubernetes.io/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch/#notes-on-the-strategic-merge-patch" rel="noopener noreferrer"&gt;patch strategy&lt;/a&gt; for &lt;code&gt;imagePullSecrets&lt;/code&gt; on service accounts. Without this, the default behavior is to &lt;em&gt;replace&lt;/em&gt; the list - so if you had any existing image pull secret references in your service account, these would be removed. See &lt;a href="https://github.com/kubernetes/kubernetes/issues/72475" rel="noopener noreferrer"&gt;this Kubernetes GitHub issue from 2019&lt;/a&gt;, which describes the problem in more detail and why it has not been fixed (tl;dr: specifying a patch strategy now would break backwards compatibility, and there has not yet been any desire to introduce a &lt;code&gt;v2&lt;/code&gt; of the ServiceAccount object kind, so we're stuck with this behavior).&lt;/p&gt;

&lt;p&gt;In my situation, I did not have any service accounts with specifically configured lists of image pull secrets, so the patch is replacing an empty list in every service account with the single reference in the patch. However, the situation in your cluster may differ, and you may want to change the patch &lt;code&gt;targetObjectRef&lt;/code&gt; to target a more specific set of service accounts, such as just the default service accounts. It is also possible to use a &lt;code&gt;labelSelector&lt;/code&gt; or &lt;code&gt;annotationSelector&lt;/code&gt; with a &lt;code&gt;matchExpressions&lt;/code&gt; list to avoid modifying service accounts with specific labels or annotations, e.g.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;labelSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;matchExpressions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;do-not-patch-image-pull-secrets&lt;/span&gt;
    &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DoesNotExist&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This selector will exclude any service account with a label &lt;code&gt;do-not-patch-image-pull-secrets&lt;/code&gt;, which you could specifically add to the service accounts that the patch would break. This would have to be communicated to all engineers that define service accounts in your cluster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Managing authenticated image pulls to Docker Hub in a large cluster is surprisingly difficult. It was likely not anticipated that Docker Hub would introduce such strict rate limits for unauthenticated requests in November 2020 - this now essentially &lt;em&gt;requires&lt;/em&gt; that all cluster operators know how to configure authenticated image pulls, as you will need these whether you continue to use Docker Hub or migrate to a pull-through cache registry or privately managed registry with clones of your essential images.&lt;/p&gt;

&lt;p&gt;With control over the virtual machines that your kubelets run on, it is possible to configure the necessary credentials for authenticated image pulls to Docker Hub via the containerd config, or by writing a custom &lt;a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-credential-provider/" rel="noopener noreferrer"&gt;credential provider for your kubelets&lt;/a&gt;. This can avoid a lot of additional configuration in your Kubernetes resources, but is not necessarily a desirable or available option for all cluster operators. The alternative is to create and use image pull secrets via service accounts. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/titansoft-pte-ltd/imagepullsecret-patcher" rel="noopener noreferrer"&gt;TitanSoft's imagepullsecret-patcher&lt;/a&gt; is a single-binary solution to replicating and using an image pull secret across all namespaces. It is not actively maintained, but the tool is simple enough that a small team should be able to patch and maintain a fork if needed. If you want to stick to other maintained open source tools, a reasonable solution can also be put together using &lt;a href="https://external-secrets.io/" rel="noopener noreferrer"&gt;external secrets operator&lt;/a&gt;. If you are operating a cluster at scale, you may already be using this. Red Hat's &lt;a href="https://github.com/redhat-cop/patch-operator" rel="noopener noreferrer"&gt;patch-operator&lt;/a&gt; can be used to attach the imported secrets to your service accounts across all namespaces, though there are some quirks to be wary of, due to the lack of a defined patch strategy for &lt;code&gt;imagePullSecrets&lt;/code&gt; on service accounts.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Request routing for horizontally scaled services</title>
      <dc:creator>Iain McGinniss</dc:creator>
      <pubDate>Sun, 08 Aug 2021 21:24:16 +0000</pubDate>
      <link>https://dev.to/iainmcgin/request-routing-for-horizontally-scaled-services-5c3f</link>
      <guid>https://dev.to/iainmcgin/request-routing-for-horizontally-scaled-services-5c3f</guid>
      <description>&lt;p&gt;Networked systems engineering is a fundamental aspect of modern software engineering. The double-edged sword of internet-connected services is the opportunity for your service to be utilized by anyone (growth! impact! profit!), but success can result in extremely unpredictable load spikes and overall growth in resource requirements to keep things running smoothly. First, we shall discuss the options for handling variable and increasing load, and then focus on how we can effectively route requests across a &lt;em&gt;horizontally scaled&lt;/em&gt; service (a term we shall define momentarily). Through exploring this topic, we will also touch on some more advanced tools and strategies, such as API gateways and service meshes. I hope you enjoy the journey, and that it helps you make some informed choices in your next system design.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vertical and horizontal scaling
&lt;/h2&gt;

&lt;p&gt;In the early stages of your service's life, &lt;em&gt;over-provisioning&lt;/em&gt; is the simplest strategy to handle variable load. You estimate the peak load based on some service-specific characteristics and a wet finger held to the breeze, and ensure that you have sufficient capacity to handle that peak load. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fvert_service_initial.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fvert_service_initial.jpg" alt="in the happy early days of your service, users generate a manageable amount of load on your service, and your server still has some unused resources that can be claimed by the service as needed" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If an unexpected spike in load arrives, or you just can't keep up with demand, bad things will happen.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fvert_service_overloaded.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fvert_service_overloaded.jpg" alt="more users arrive, forming an unexpected ravenous mob. Your service now requires more resources to operate than the poor server can supply, and it begins to fail" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As this estimated peak grows over time, one strategy to keep up is to &lt;em&gt;vertically scale&lt;/em&gt;, which entails the ability to use physical or virtual machines with more resources, such as more compute cores, memory, persistent storage space, or network bandwidth. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fvert_service_rescaled.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fvert_service_rescaled.jpg" alt="the server is replaced by a new one with more compute resources, and you can now satiate the desires of the mob, with some additional headroom for future growth" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To make use of these additional resources, the service must typically make use of a variety of techniques, such as &lt;a href="https://en.wikipedia.org/wiki/Fork_(system_call)" rel="noopener noreferrer"&gt;process forking&lt;/a&gt; or &lt;a href="https://en.wikipedia.org/wiki/Multithreading_(computer_architecture)" rel="noopener noreferrer"&gt;multi-threading&lt;/a&gt;, bigger in-memory caches, and RAID configurations to increase disk I/O throughput and bandwidth.&lt;/p&gt;

&lt;p&gt;The key distinction with vertical scaling is that the service can handle additional load &lt;em&gt;without&lt;/em&gt; the need to spread across multiple computers - it can maximize utilization of available resources on a single computer. A service that &lt;em&gt;horizontally scales&lt;/em&gt;, in contrast, utilizes additional computers and a network to distribute additional load. This involves a very different implementation strategy, and raises an important question: how should requests be distributed across a dynamic set of instances?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fhoriz_service.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fhoriz_service.jpg" alt="with a horizontally scaled service, requests from users are distributed across multiple instances of the service - but how should these requests be distributed?" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most services written before the era of cloud computing utilized vertical scaling, as this was typically the only viable option - provisioning of resources to support horizontal scaling was not economically viable for organizations with small, on-premise data centers. It could take weeks to provision and install new hardware for use by a service, so system architects had to plan ahead, and building services to vertically scale on big, over-provisioned servers was simpler. A client I worked with many years ago utilized huge IBM mainframes that cost millions of dollars to provision and install into their on-premise data center, all because their service was monolithic and unable to horizontally scale. The client was pushing the limits of hardware that could be managed as a single machine, and once that path was exhausted, there would be no choice but to re-architect their system to horizontally scale.&lt;/p&gt;

&lt;p&gt;Cloud computing providers support vertically scaling services through infrastructure-as-a-service (IaaS) products that offer a variety of virtual machine sizes, from cheap single-core and low-memory instances (e.g. Amazon's &lt;code&gt;t2.nano&lt;/code&gt; instance type with 1 vCPU and 0.5GB of RAM), all the way up to monstrous instances with hundreds of cores and terabytes of memory (e.g. Google's &lt;code&gt;m2-ultramem-416&lt;/code&gt; instance type, with 416 vCPUs and 11.7TB of RAM). You will, of course, pay a steep price for such vertical scaling capability - pricing increases are linear in vCPU and memory to a point, then become bespoke and negotiated when you reach truly specialized hardware. The &lt;code&gt;m2-ultramem-416&lt;/code&gt; instance costs $50.91 &lt;em&gt;per hour&lt;/em&gt; with a one year reservation (~$438k/yr), whereas a more typical &lt;code&gt;n2-standard-16&lt;/code&gt; instance with 16 vCPU and 64GB of RAM costs $0.79 per hour (~$7k/yr). If a service can horizontally scale, and more efficiently follow load, your maximum cost of using commodity instances like &lt;code&gt;n2-standard-16&lt;/code&gt; will often be an order of magnitude lower.&lt;/p&gt;

&lt;h2&gt;
  
  
  Freedom through constraint - PaaS and FaaS
&lt;/h2&gt;

&lt;p&gt;Cloud computing also introduced other ways to think about service development, via platform-as-a-service (PaaS) or function-as-a-service (FaaS) offerings, e.g. &lt;a href="https://cloud.google.com/appengine" rel="noopener noreferrer"&gt;Google App Engine&lt;/a&gt;, &lt;a href="https://aws.amazon.com/elasticbeanstalk/" rel="noopener noreferrer"&gt;AWS Elastic Beanstalk&lt;/a&gt;, &lt;a href="https://aws.amazon.com/lambda/" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt;, &lt;a href="https://azure.microsoft.com/en-us/services/functions/#overview" rel="noopener noreferrer"&gt;Azure Functions&lt;/a&gt;, and many others. With PaaS/FaaS systems, the compute infrastructure running your service ceases to be your concern, allowing you to focus on the higher level semantics of your service. From the developer's perspective, there is no "server", or alternatively, just one logical server with theoretically unlimited scaling. In reality, the limitations imposed on how a service is implemented by these technologies ensures that your service &lt;em&gt;horizontally scales&lt;/em&gt; across multiple instances, in a way that is managed by the cloud provider. The restrictions may also allow for &lt;em&gt;multi-tenancy&lt;/em&gt;, where multiple services (potentially even from multiple customers) can run on the same hardware at the same time, yielding resource utilization improvements and cost savings for the cloud provider, and maybe even for you.&lt;/p&gt;

&lt;p&gt;The loss of implementation freedom from using a PaaS or FaaS framework may not be acceptable for all services, certainly not without a broad re-think of how the service is implemented. Many organizations will instead choose to stick with vertical scaling utilizing the instance types made available by cloud providers, for as long as possible. With enough growth, a service will inevitably hit a point where it cannot utilize bigger servers effectively, or where bigger servers are simply not available.&lt;/p&gt;

&lt;p&gt;Implementing services to horizontally scale on dynamically provisioned IaaS cloud resources is the middle ground that many organizations choose. This provides them with more direct control over when and how the system scales, but comes with a significant complexity cost. If your organization is using Docker Swarm or Kubernetes, you are likely self-managing horizontally scaled services, and may be immersed in the overhead and complexity of doing this safely and effectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other good reasons to horizontally scale
&lt;/h2&gt;

&lt;p&gt;Building a service to support some degree of horizontal scaling is a very complex topic in its own right. However, it is increasingly becoming a requirement of contemporary software engineering, for good reasons beyond just scalability. Horizontal scaling can also support reliability, maintainability, and efficiency goals.&lt;/p&gt;

&lt;p&gt;Rather than making a single server the sole entry point to the service - and a single point of failure - we can utilize multiple servers and work to ensure that fail-over between them is transparent. This fail-over can be "active-passive", where a single server is still responsible for all traffic, but when it fails we have a "warm" backup server that is available to take over within minutes. This type of configuration is common with traditional relational databases such as MySQL. Even better, an "active-active" configuration means that all servers are "hot" and capable of handling requests in short order, supporting recovery in under 30 seconds, and often &lt;em&gt;immediately&lt;/em&gt;. In an active-active configuration, it is also possible to send requests to any server, and have the set of servers behave as a predictable, single "logical" service.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Factive_failover.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Factive_failover.jpg" alt="with an active-active configuration, requests are sent to all instances, and when a server dies, clients can simply reconnect to another instance and recover" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Aside from this reliability and recovery advantage, the ability to have multiple active-active servers and replace them at will also facilitates transparent maintenance. Rather than requiring downtime to roll out new releases of our service, as was required when replacing the deployment of a service on fixed hardware, we can instead introduce some new servers with a new version and migrate requests from the old set to the new set. We have flexibility in how this can be done, allowing for clever rollout strategies such as &lt;a href="https://martinfowler.com/bliki/CanaryRelease.html" rel="noopener noreferrer"&gt;canaries&lt;/a&gt;, where we send a small percentage of traffic to the new service and ensure it performs acceptably before proceeding to a full rollout. Related to this, a &lt;a href="https://martinfowler.com/bliki/BlueGreenDeployment.html" rel="noopener noreferrer"&gt;blue-green&lt;/a&gt; deployment allows us to incrementally move from the old release to the new release, while retaining the ability to roll back quickly if any undesirable behavior is detected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fblue_green.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fblue_green.jpg" alt="with an blue-green deployment, we can gradually migrate clients from the " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, by utilizing multiple smaller machines, the capacity we provision can more closely align to the actual load our service is experiencing at any point in time, resulting in more efficient utilization of resources. For services with a day-night load cycle (i.e. your service is interactive, and all your customers are within a narrow band of time zones, so you see significantly less load at night than during the day) you then have the opportunity to scale up and down periodically, potentially saving a significant amount of money compared to over-provisioning to be capable of handling your estimated peak load at all times. This type of dynamic scaling is a huge advantage of cloud infrastructure, and can also be automated by utilizing real-time aggregate metrics (e.g. CPU usage, network throughput, etc.) to decide how many instances should be active at any given time.&lt;/p&gt;
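&lt;p&gt;The metrics-driven scaling decision described above can be sketched as a simple proportional rule - a simplified version of what systems like the Kubernetes Horizontal Pod Autoscaler apply: scale the instance count by the ratio of observed to target utilization, clamped to configured bounds. The function name and parameters here are illustrative, not from any particular API:&lt;/p&gt;

```python
import math

def desired_instances(current: int, avg_cpu: float, target_cpu: float = 0.6,
                      min_n: int = 2, max_n: int = 20) -> int:
    """Proportional autoscaling: if observed utilization is 50% above target,
    propose 50% more instances, rounded up and clamped to [min_n, max_n]."""
    proposed = math.ceil(current * (avg_cpu / target_cpu))
    return max(min_n, min(max_n, proposed))
```

&lt;p&gt;For example, four instances averaging 90% CPU against a 60% target would scale to six; the clamp prevents runaway scaling from a transient spike in the aggregate metrics.&lt;/p&gt;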

&lt;p&gt;So, how can we implement a service to horizontally scale? This question is so system-dependent that a succinct answer cannot be provided that covers all cases. One broad exception is in the case of "stateless" services - those which handle requests in an isolated, predictable way, with no side effects that are &lt;em&gt;local&lt;/em&gt; to the service. A typical stateless service will utilize a data store with &lt;a href="https://en.wikipedia.org/wiki/ACID" rel="noopener noreferrer"&gt;ACID properties&lt;/a&gt;, and will process requests based entirely on the contents of the request and manipulations of that data store. This service can be horizontally scaled through simple replication - the number of instances required is typically linearly dependent on request throughput. This attractive characteristic is why so much emphasis is placed on utilizing stateless services wherever possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  DNS and client-side load balancing
&lt;/h2&gt;

&lt;p&gt;So, you have a service that can be horizontally scaled, meaning that there are multiple service instances available to process requests. How can we effectively and evenly direct requests from clients to service instances?&lt;/p&gt;

&lt;p&gt;At the simplest level, DNS records can map the canonical name for a service to multiple IP addresses of service instances that implement that service. This basic abstraction allows clients to "find" the service using a durable identifier, while allowing the service maintainer to change the set of instances handling requests over time, as needed.&lt;/p&gt;

&lt;p&gt;A DNS record is a very flexible way to map the abstract to the real - it can map a name to multiple addresses (A/AAAA records), or map to another DNS record (CNAME records). Combined with a reasonable "time to live" (TTL) value for the record, we have a reliable mechanism to propagate changes to our abstract-to-real mapping to users of a service in an efficient way.&lt;/p&gt;

&lt;p&gt;A client can choose randomly between the available addresses - in aggregate, this will evenly distribute clients across server IP addresses. This is referred to as client-side load balancing, where clients possess sufficient intelligence to satisfy our goals, either through coordination (e.g. in &lt;a href="https://cloud.google.com/traffic-director/docs/proxyless-overview" rel="noopener noreferrer"&gt;gRPC Traffic Director&lt;/a&gt;) or as an emergent property of independent behavior in aggregate.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fdns_client_lb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Fdns_client_lb.jpg" alt="clients perform a DNS lookup for the service address, then randomly select between the available addresses" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the ability to add and remove IP addresses from the DNS mapping as needed, client-side load balancing can support both our scalability and efficiency goals. If we can create new instances of our service at will, we can dynamically auto-scale, and escape the limitations of vertical scaling (big, over-provisioned servers) in favor of horizontal scaling (small, cheap, easily replaced servers).&lt;/p&gt;

&lt;p&gt;Client-side load balancing can work well if clients are &lt;em&gt;uniform&lt;/em&gt;, meaning that the behavior of each client is roughly equivalent in terms of the demands they place on the system. This is often not the case, however - clients with a 1Gbps fiber connection can place significantly more load on file servers than those with a mediocre cellular connection, for instance. The types of requests that clients make may also result in significant variability in load, depending on the data associated with each user's requests. So, some careful evaluation must be made of whether client-side load balancing will work for your service or not. In gRPC and other client stacks, the client may attempt to self-distribute the load it generates by opening multiple connections to different server instances, and performing client-side round-robin distribution of requests across these servers. Even this can be problematic: if the client opens only a small number of connections (typically three), it may still concentrate its load on a small subset of all available server instances.&lt;/p&gt;

&lt;p&gt;There are also other limitations to this DNS-based approach to meeting our reliability, scalability, and efficiency goals. Clients must do what we desire and expect - this is fine when we also control the client, but can be problematic when interacting with third-party software like web browsers, or clients implemented by other teams or organizations. Changes to our DNS records propagate erratically, depending on which DNS service our clients use. If we use a TTL of 1 minute, we can expect many (perhaps most) of our clients to see the change within a minute, but some may take significantly longer, due to configuration details of infrastructure that may be completely out of your control. There are also some practical limits to how many IP addresses we can reference with our DNS records; managing hundreds of addresses per name may be feasible, but thousands or more is unrealistic - DNS servers are not guaranteed to respond with all mappings for a name. When using UDP for DNS lookups, we are limited to what can fit in a packet. When using TCP, a DNS server may limit the number of responses to prevent slow-down for other clients.&lt;/p&gt;

&lt;h2&gt;
  
  
  Service-side load balancing
&lt;/h2&gt;

&lt;p&gt;So, if we cannot rely on DNS and client-side behavior for fast, reliable changes to our service routing, what else can we do? Like most problems in computing science, we can add a layer of indirection! Load balancers provide a more adaptable approach - point your DNS records at a TCP or HTTP &lt;em&gt;load balancer&lt;/em&gt;, then manage a more dynamic "target set" behind that load balancer. This leaves the DNS records in a much more static configuration, while giving you immediate and localized control over where requests are routed to behind that load balancer. Even if you're not using the load balancer for auto-scaling of your service, this is still a very effective tool for handling rolling restarts of services and general maintenance of your service, without worrying about client-side effects.&lt;/p&gt;

&lt;p&gt;Load balancers often only handle traffic at layer 3 or 4 in the &lt;a href="https://en.wikipedia.org/wiki/OSI_model" rel="noopener noreferrer"&gt;OSI model&lt;/a&gt; - that is, they are packet- or connection-oriented, evenly distributing traffic across the target set at the &lt;a href="https://en.wikipedia.org/wiki/Internet_Protocol" rel="noopener noreferrer"&gt;IP&lt;/a&gt;, &lt;a href="https://en.wikipedia.org/wiki/User_Datagram_Protocol" rel="noopener noreferrer"&gt;UDP&lt;/a&gt;, or &lt;a href="https://en.wikipedia.org/wiki/Transmission_Control_Protocol" rel="noopener noreferrer"&gt;TCP&lt;/a&gt; levels. An inbound packet or connection arrives, and the load balancer decides which service instance to forward it to. This can be as simple as round-robin distribution, or a weighted strategy based on metrics reported by the service instances.&lt;/p&gt;
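&lt;p&gt;As a rough illustration (not any particular load balancer's implementation), both strategies can be sketched in a few lines of Python; the addresses and weights here are hypothetical:&lt;/p&gt;

```python
import itertools
import random

# Hypothetical target set: (address, weight) pairs. Weights might be derived
# from metrics reported by the service instances themselves.
BACKENDS = [("10.0.0.1", 3), ("10.0.0.2", 1)]

def round_robin(backends):
    # Plain round-robin: cycle through the targets in order, ignoring weights.
    return itertools.cycle(addr for addr, _ in backends)

def weighted_pick(backends):
    # Weighted random selection: a target with weight 3 receives roughly
    # three times the traffic of a target with weight 1.
    addrs = [addr for addr, _ in backends]
    weights = [weight for _, weight in backends]
    return random.choices(addrs, weights=weights, k=1)[0]

rr = round_robin(BACKENDS)
first_four = [next(rr) for _ in range(4)]
```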

&lt;p&gt;L3/L4 load balancing was fine for most systems prior to the advent of HTTP/2, as HTTP/1.1 and other common internet application protocols are connection oriented. Requests are serviced serially on each connection, and each connection belongs to just one client. To achieve more concurrency in request handling, more connections were used. This is ultimately wasteful of bandwidth, with significantly more packets required for maintenance of TCP connection state. It also results in higher average and P99 request processing latency, and can exhaust the operating system's connection handling resources. In an environment where connections are created by clients over the internet (e.g. browsers), these relatively low throughput but highly variable connections can place an uneven load on the servers behind the load balancer, despite your best efforts.&lt;/p&gt;

&lt;p&gt;Protocols such as HTTP/1.1 are effectively stateless, meaning that each request carries everything required to process it, and there is no explicit relationship between requests or expectation that requests must be processed in the order they are received. This opens the possibility of decoupling the set of connections into a load balancer from the set of connections out of the load balancer. Thousands of low-throughput inbound connections can be transformed into a much smaller number of high-throughput connections to the service instances, or alternatively, we can ensure that each request from the load balancer to a server uses its own connection, to prevent head-of-line blocking of request processing.&lt;/p&gt;

&lt;p&gt;When a load balancer is capable of doing this type of request-level processing, we typically classify it as a layer 7 (a.k.a. application layer) load balancer. Requests from multiple clients with separate connections may be multiplexed onto a single connection to a server. When using a protocol such as HTTP/2, which allows for requests to be processed and responded to out-of-order, this can result in a significant decrease in connection maintenance waste - or, equivalently, much higher utilization of available resources.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reverse proxies
&lt;/h2&gt;

&lt;p&gt;Once you recognize the utility of a layer 7 load balancer in your architecture, many other possibilities become apparent. As HTTP requests carry information on the client-side view of the service intended to process a request (via the Host header), we can potentially use a single load balancer for multiple logical services. Going further, we could inspect the path, request method, query parameters, or perhaps even the body in making request routing decisions. With these features, we now have what many would call a &lt;em&gt;reverse proxy&lt;/em&gt; - a service with a flexible configuration language that allows for more sophisticated routing decisions than blindly distributing requests across a set of servers known to the load balancer only by their IP addresses.&lt;/p&gt;

&lt;p&gt;One of the most commonly used open source reverse proxies is &lt;a href="https://www.nginx.com/" rel="noopener noreferrer"&gt;nginx&lt;/a&gt;, though cloud vendors also typically provide their own managed options, such as &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html" rel="noopener noreferrer"&gt;AWS Application Load Balancer&lt;/a&gt;, &lt;a href="https://cloud.google.com/load-balancing" rel="noopener noreferrer"&gt;Google Cloud Load Balancing&lt;/a&gt;, etc.&lt;/p&gt;

&lt;p&gt;These systems typically allow for dynamic configuration of the reverse proxy through an API or a JSON/TOML/YAML configuration language, allowing the routing rules to be changed without any disruption to the currently active request processing. Reverse proxies typically have higher resource requirements and overhead than L3/L4 load balancers, but are still highly optimized and capable of handling upwards of 10k requests per second per instance on commodity hardware, and are also usually horizontally scalable to hundreds of instances and millions of requests per second. The meta-problem of scaling the proxies themselves at millions of requests per second is usually handled by having multiple L4 load balancers directing connections to the reverse proxies, which in turn direct requests to your service instances.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Flb_proxy_scaling.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.iainmcgin.co.uk%2Farticle-res%2Frequest-routing%2Flb_proxy_scaling.jpg" alt="requests are routed from clients to the L4 load balancer, which divides them across the reverse proxy instances, which in turn divide them across the service instances according to their L7 rules" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With a reverse proxy, we can start to implement some more sophisticated request routing patterns, such as routing based on request &lt;em&gt;type&lt;/em&gt;. With browser-based web applications we must often define all our request endpoints on the same domain, in order to comply with the web application model where cookies and related security controls will not permit certain types of cross-domain requests. Our reverse proxy can maintain this illusion for the front-end, while splitting requests of different types, typically distinguished by path, to different upstream services that handle those requests. For example, our GraphQL API endpoint (maybe "/api/graphql" from the client perspective) may be serviced by an &lt;a href="https://www.apollographql.com/docs/federation/gateway/" rel="noopener noreferrer"&gt;Apollo Gateway&lt;/a&gt;, while other API endpoints (&lt;code&gt;^\/api\/(?!graphql).*$&lt;/code&gt;) might be handled by a service we implement, and everything else (&lt;code&gt;^\/(?!api\/).*$&lt;/code&gt;) is handled by a static content server.&lt;/p&gt;
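&lt;p&gt;In nginx configuration terms, this kind of path-based split might look something like the following sketch. The upstream names and addresses are placeholders; note that nginx's own matching rules (an exact &lt;code&gt;location =&lt;/code&gt; match beats any prefix match) let us avoid the negative-lookahead regexes entirely:&lt;/p&gt;

```nginx
# Sketch only: upstream names and addresses are illustrative placeholders.
upstream graphql_gateway { server 10.0.1.10:4000; }
upstream api_service     { server 10.0.2.10:8080; }
upstream static_content  { server 10.0.3.10:8080; }

server {
    listen 80;
    server_name app.example.com;

    # GraphQL requests go to the Apollo Gateway (exact match wins).
    location = /api/graphql {
        proxy_pass http://graphql_gateway;
    }

    # All other /api/ requests go to our own API service.
    location /api/ {
        proxy_pass http://api_service;
    }

    # Everything else is handled by the static content server.
    location / {
        proxy_pass http://static_content;
    }
}
```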

&lt;p&gt;Deeper request inspection could also allow us to do things like route &lt;em&gt;expensive&lt;/em&gt; requests to a separate set of servers, so that we can independently manage the auto-scaling for that request type from other requests. This can also be an effective tool to ensure that these potentially problematic requests do not impact the performance expectations of the other requests; if they are all handled by the same pool of servers, expensive requests may impact the latency and jitter of others through resource contention.&lt;/p&gt;

&lt;h2&gt;
  
  
  API gateways
&lt;/h2&gt;

&lt;p&gt;So far, we have discussed request routing middleware that is primarily tasked with routing requests efficiently across our service instances, but otherwise does not concern itself with the implementation details of how responses are formulated for requests. However, once we introduce middleware that inspects the contents of requests as part of routing decisions, it is not a great conceptual leap from there to middleware that is also responsible for some common request processing tasks. For example, the middleware could be responsible for ensuring that requests carry valid authentication information, such as valid cookies or request signatures. The middleware could also perform tasks such as content encoding transformations, changing uncompressed responses to &lt;a href="https://en.wikipedia.org/wiki/Brotli" rel="noopener noreferrer"&gt;Brotli&lt;/a&gt; compressed responses, resulting in lower bandwidth utilization when communicating with browser clients. Conceptually, &lt;em&gt;any&lt;/em&gt; in-line transformation of requests or responses could be handled by the middleware. I refer to request routing middleware with this capability as an &lt;em&gt;API gateway&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;API gateways go beyond the capabilities of reverse proxies by providing an extension mechanism that allows for custom code to be executed as part of the request processing pipeline. &lt;a href="https://traefik.io/" rel="noopener noreferrer"&gt;Traefik Proxy&lt;/a&gt; is a good example of this, as it has a plugin mechanism that allows for custom Go code to influence request processing decisions. &lt;a href="https://konghq.com/" rel="noopener noreferrer"&gt;Kong&lt;/a&gt; also deserves a mention here, with the ability to write plugins in Lua, or integrate with external binaries written in practically any language. Most available API gateways provide a set of standard request processing plugins to handle authentication, rate limiting, content type transforms, and so on. In general, they provide a useful way to enforce some consistent request processing standards across all services, that can be implemented in one place, rather than requiring re-implementation across multiple services - particularly if those services are implemented in different languages.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;The myriad of request processing middleware does not end here - there is also the very trendy topic of &lt;em&gt;service meshes&lt;/em&gt; that we could cover, but I choose to leave that as an exercise for interested readers, as it is a rapidly evolving and complex space (see: &lt;a href="https://istio.io/" rel="noopener noreferrer"&gt;Istio&lt;/a&gt;, &lt;a href="https://linkerd.io/" rel="noopener noreferrer"&gt;linkerd&lt;/a&gt;, &lt;a href="https://www.consul.io/" rel="noopener noreferrer"&gt;Consul&lt;/a&gt;, &lt;a href="https://www.vmware.com/products/tanzu-service-mesh.html" rel="noopener noreferrer"&gt;Tanzu&lt;/a&gt;, etc).&lt;/p&gt;

&lt;p&gt;So, what should you use in your own architecture? If you are writing something from scratch, I would strongly recommend looking at PaaS/FaaS options to avoid all of this complexity for as long as possible - the less time you have to spend thinking about auto-scaling and request processing, the more time you have to build out the value-providing aspects of your service. If you maintain existing services that are incompatible with a PaaS/FaaS approach, your cloud provider's managed load balancer / reverse proxy is likely the most straightforward option. If you find that you need a little more flexibility, an API gateway such as Traefik or Kong can be an excellent option; just be prepared to think much more deeply about the network layer of your application.&lt;/p&gt;

</description>
      <category>networking</category>
      <category>scaling</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Debugging protocol buffer compilation</title>
      <dc:creator>Iain McGinniss</dc:creator>
      <pubDate>Fri, 08 May 2020 21:46:44 +0000</pubDate>
      <link>https://dev.to/iainmcgin/debugging-protocol-buffer-compilation-jd1</link>
      <guid>https://dev.to/iainmcgin/debugging-protocol-buffer-compilation-jd1</guid>
      <description>&lt;p&gt;In some recent work I have been trying to generate &lt;a href="https://kotlinlang.org/" rel="noopener noreferrer"&gt;Kotlin&lt;/a&gt; extensions to the standard Java code that is generated by the &lt;a href="https://developers.google.com/protocol-buffers" rel="noopener noreferrer"&gt;protocol buffer&lt;/a&gt; compiler, using the excellent &lt;a href="https://github.com/marcoferrer/kroto-plus" rel="noopener noreferrer"&gt;kroto-plus plugin&lt;/a&gt;. For those who have not had the pleasure of working with the protocol buffer compiler, &lt;code&gt;protoc&lt;/code&gt;, it can be a frustratingly opaque tool to work with. In the process of attempting to understand why compilation was not working as expected, I ended up learning a lot more about how the compiler works.&lt;/p&gt;

&lt;p&gt;My problem arose when attempting to move my protocol buffer compilation out of the safe confines of my own project into a shared definition repository that my employer uses, so that I could generate client bindings and server stubs for multiple languages. In making the move, all of a sudden the expected Kotlin extension code was not being generated &lt;em&gt;at all&lt;/em&gt;, with no error messages or warnings. The mistake turned out to be trivial, but getting to a position where I could &lt;em&gt;identify&lt;/em&gt; the mistake was frustrating.&lt;/p&gt;

&lt;h2&gt;
  
  
  Invoking protoc
&lt;/h2&gt;

&lt;p&gt;When working with a single language, there are many tools that wrap &lt;code&gt;protoc&lt;/code&gt;, such that you never have to understand its interface. For my Kotlin project, the &lt;a href="https://github.com/google/protobuf-gradle-plugin" rel="noopener noreferrer"&gt;Gradle protobuf plugin &lt;/a&gt; follows the typical Gradle idiom: place proto files in a standard directory, specify what plugins to use, and voilà, you have generated code. However, in a polyglot environment, you often have to go a little deeper and at least understand how to invoke protoc directly.&lt;/p&gt;

&lt;p&gt;An invocation can look something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;protoc &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-I&lt;/span&gt; /opt/include &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-I&lt;/span&gt; /path/to/project/protos &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--java_out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/path/to/project/gen/java &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--grpc_out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/path/to/project/gen/java &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--plugin&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;protoc-gen-grpc&lt;span class="o"&gt;=&lt;/span&gt;/usr/local/bin/protoc-gen-grpc-java &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--kroto_out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;ConfigPath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/path/to/project/kroto.yml:/path/to/project/gen/java &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--plugin&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;protoc-gen-kroto&lt;span class="o"&gt;=&lt;/span&gt;/usr/local/bin/protoc-gen-kroto-plus &lt;span class="se"&gt;\&lt;/span&gt;
    /path/to/project/protos/service.proto &lt;span class="se"&gt;\&lt;/span&gt;
    /path/to/project/protos/internals.proto &lt;span class="se"&gt;\&lt;/span&gt;
    ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Breaking this down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;-I&lt;/code&gt; flag specifies an "include" directory, where protobuf files that are imported can be found. There is no module system to speak of for protobuf compilation, so there are just some loose conventions around namespacing using package names that correspond to directory structures. For the "well known" types like &lt;a href="https://developers.google.com/protocol-buffers/docs/proto3#any" rel="noopener noreferrer"&gt;google.protobuf.Any&lt;/a&gt;, these will typically be sourced from some common include path - in the example above, &lt;code&gt;/opt/include&lt;/code&gt;, which exists within a docker container I'm using. Multiple import paths can be specified, and imports are searched for in all the specified include directories, using relative paths. For those familiar with traditional C compilers, this is a very common pattern for compilation in the era before versioned package management.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;--java_out=&lt;/code&gt; flag specifies two things: that we want to generate Java code, and where we want that generated code to go. Java code generation is a built-in feature of &lt;code&gt;protoc&lt;/code&gt;, so this is all that's required in this case. The built-in generators for protoc are &lt;code&gt;cpp&lt;/code&gt;, &lt;code&gt;csharp&lt;/code&gt;, &lt;code&gt;java&lt;/code&gt;, &lt;code&gt;js&lt;/code&gt;, &lt;code&gt;objc&lt;/code&gt;, &lt;code&gt;php&lt;/code&gt;, &lt;code&gt;python&lt;/code&gt; and &lt;code&gt;ruby&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;--grpc_out=&lt;/code&gt; flag similarly specifies that we want to generate "grpc", and where to generate to. However, what does "grpc" mean here, as it is not a built-in generator type? By default, the compiler will look for a &lt;em&gt;plugin&lt;/em&gt; on the &lt;code&gt;PATH&lt;/code&gt;, with name &lt;code&gt;protoc-gen-grpc&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;--plugin=protoc-gen-grpc=...&lt;/code&gt; flag explicitly tells the compiler where to find the plugin executable for "grpc". In this case, we're pointing it to a version of &lt;a href="https://github.com/grpc/grpc-java/tree/master/compiler" rel="noopener noreferrer"&gt;protoc-gen-grpc-java&lt;/a&gt;. Effectively, we aliased "grpc-java" to "grpc"; we could have instead specified a "--grpc-java_out=" flag without specifying the explicit plugin reference, as long as &lt;code&gt;protoc-gen-grpc-java&lt;/code&gt; could be found on the &lt;code&gt;PATH&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Observant readers will notice something slightly different about the flag specified for &lt;code&gt;kroto&lt;/code&gt;: it embeds a &lt;em&gt;parameter&lt;/em&gt; to be passed to the plugin. The compiler's somewhat awkward syntax for this embeds the parameter before the output path: &lt;code&gt;--gen_out=param:/gen/path&lt;/code&gt;. Only a &lt;a href="https://github.com/protocolbuffers/protobuf/blob/c781df3d212c0d30a072ed98de00d1ba2fea22b9/src/google/protobuf/compiler/plugin.proto#L75" rel="noopener noreferrer"&gt;single string parameter&lt;/a&gt; can be specified, but the full string between the first &lt;code&gt;=&lt;/code&gt; and the &lt;code&gt;:&lt;/code&gt; is treated as that parameter value. I have seen plugins use various conventions here to allow specifying multiple params, like &lt;code&gt;key1=value1,key2=value2,...,keyN=valueN&lt;/code&gt;. Some instead use the parameter to point to an external configuration file, which is what the &lt;a href="https://github.com/marcoferrer/kroto-plus" rel="noopener noreferrer"&gt;kroto-plus plugin&lt;/a&gt; does.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finally, a list of proto files to be parsed and sent to the generators is provided.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
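&lt;p&gt;For plugins that adopt the comma-separated convention, unpacking the single parameter string takes only a few lines. This is a hypothetical sketch, not how kroto-plus or any specific plugin actually parses its parameter:&lt;/p&gt;

```python
def parse_plugin_parameter(parameter: str) -> dict:
    # Split the single opaque parameter string using the
    # "key1=value1,key2=value2" convention; this convention is
    # plugin-specific, not part of protoc itself.
    result = {}
    for part in parameter.split(","):
        if not part:
            continue
        key, _, value = part.partition("=")
        result[key] = value
    return result

params = parse_plugin_parameter("ConfigPath=/path/to/kroto.yml,debug=true")
```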

&lt;p&gt;While you &lt;em&gt;can&lt;/em&gt; use relative paths rather than absolute paths when invoking &lt;code&gt;protoc&lt;/code&gt;, I have stumbled over problems with mixing relative paths and import directives. Using absolute paths with &lt;code&gt;protoc&lt;/code&gt; helps keep me sane: it makes it very clear where everything is coming from, irrespective of the current working directory.&lt;/p&gt;

&lt;p&gt;Docker containers like those &lt;a href="https://github.com/namely/docker-protoc" rel="noopener noreferrer"&gt;provided by Namely&lt;/a&gt; try to help out in a polyglot environment by hiding some of the details of protoc and plugin invocation behind a more uniform contract. I recommend trying these out to see if they fit your needs before implementing your own solution, but I have found that a basic understanding of the protocol buffer compiler and plugins is essential to success.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are plugins, really?
&lt;/h2&gt;

&lt;p&gt;Protobuf compiler plugins are standalone executables that interpret &lt;a href="https://github.com/protocolbuffers/protobuf/blob/c781df3d212c0d30a072ed98de00d1ba2fea22b9/src/google/protobuf/compiler/plugin.proto#L68" rel="noopener noreferrer"&gt;CodeGeneratorRequest&lt;/a&gt; protobufs from stdin, and produce &lt;a href="https://github.com/protocolbuffers/protobuf/blob/c781df3d212c0d30a072ed98de00d1ba2fea22b9/src/google/protobuf/compiler/plugin.proto#L99" rel="noopener noreferrer"&gt;CodeGeneratorResponse&lt;/a&gt; protobufs to stdout. The main protobuf compiler executable produces these requests, embedding a set of &lt;a href="https://github.com/protocolbuffers/protobuf/blob/c781df3d212c0d30a072ed98de00d1ba2fea22b9/java/compatibility_tests/v2.5.0/more_protos/src/proto/google/protobuf/descriptor.proto#L56" rel="noopener noreferrer"&gt;FileDescriptorProto&lt;/a&gt; instances for the parsed proto files. The response from the plugins embeds instructions on source files to be generated and their contents.&lt;/p&gt;

&lt;p&gt;Plugins can therefore be implemented using any technology that can serialize and deserialize protobufs. Some are implemented in C++, some in Java, some in Go. It's a very flexible system, if a rather opaque one from the user's perspective when attempting to diagnose a problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Intercepting plugin requests and responses
&lt;/h2&gt;

&lt;p&gt;As plugins just need to be something that the &lt;code&gt;protoc&lt;/code&gt; process can invoke and interact with using stdin and stdout, we can wrap virtually any plugin in a shell script to see what is being provided and returned, using &lt;code&gt;tee&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/sh&lt;/span&gt;
&lt;span class="nb"&gt;tee&lt;/span&gt; /tmp/input.pb.bin | /usr/local/bin/kroto-plus | &lt;span class="nb"&gt;tee&lt;/span&gt; /tmp/output.pb.bin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While these files are binary encoded protobufs, they are dominated by text content, as you will see if you open them in a text editor. However, the &lt;code&gt;protoc&lt;/code&gt; binary can also decode binary protobufs to its "text proto" format. If we have a clone of the &lt;a href="https://github.com/protocolbuffers/protobuf" rel="noopener noreferrer"&gt;protobuf repo&lt;/a&gt; in &lt;code&gt;~/protobuf&lt;/code&gt;, we can run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;protoc &lt;span class="nt"&gt;--decode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;google.protobuf.compiler.CodeGeneratorRequest &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-I&lt;/span&gt; ~/protobuf/src ~/protobuf/src/google/protobuf/compiler/plugin.proto &lt;span class="se"&gt;\&lt;/span&gt;
    &amp;lt; /tmp/input.pb.bin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will output the text format of the proto to stdout, making the contents a little easier to read. Similarly, you can do this for the output of the plugin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;protoc &lt;span class="nt"&gt;--decode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;google.protobuf.compiler.CodeGeneratorResponse &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-I&lt;/span&gt; ~/protobuf/src ~/protobuf/src/google/protobuf/compiler/plugin.proto &lt;span class="se"&gt;\&lt;/span&gt;
    &amp;lt; /tmp/output.pb.bin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How did this help me?
&lt;/h2&gt;

&lt;p&gt;As mentioned earlier, when attempting to use the kroto-plus plugin manually, it was not producing any Kotlin output. This was weird, as it was producing Kotlin output just fine in my separate Gradle-based build environment.&lt;/p&gt;

&lt;p&gt;I couldn't see what I was doing wrong: I was using the same version of &lt;code&gt;protoc&lt;/code&gt;, the same plugins, and the same source files, though moved around to fit the location conventions in my docker build container. I scrutinized the paths and everything looked correct, but I missed one small detail.&lt;/p&gt;

&lt;p&gt;The kroto-plus plugin, as mentioned earlier, requires a parameter to be passed of the form &lt;code&gt;ConfigPath=/path/to/config&lt;/code&gt;. I had transcribed this incorrectly as &lt;code&gt;ConfigFile=/path/to/config&lt;/code&gt; - that four-character difference caused all of my problems. I had expected a mistake like this to cause an error, as I had seen errors emitted by the kroto-plus plugin before when pointed at an invalid path for the configuration. However, with an incorrect property name rather than an incorrect path, the plugin does &lt;em&gt;nothing&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I was able to see the difference with the help of my little interception script: by recording the input to the plugin in the working environment and the broken environment, and then performing a diff, the mistake becomes readily apparent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; diff in_working.proto.txt in_broken.proto.txt
23c23
&amp;lt; parameter: "ConfigPath=/path/to/kroto-config.yml"
---
&amp;gt; parameter: "ConfigFile=/path/to/kroto-config.yml"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After making a dent in my desk with my face, the fix was trivial, and the expected Kotlin output finally emerged.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;protoc&lt;/code&gt; tool is mysterious, and in many respects, poorly documented. During my time at Google using the internal version of &lt;a href="https://bazel.build/" rel="noopener noreferrer"&gt;Bazel&lt;/a&gt;, all the details of correctly compiling protocol buffers to usable code were hidden under several layers of abstraction. For those of us now outside the &lt;a href="https://www.quora.com/Why-is-Google-also-referred-to-as-the-Chocolate-Factory" rel="noopener noreferrer"&gt;Chocolate Factory&lt;/a&gt;, we are mostly left to fend for ourselves in figuring out how to use this complex tool, or must accept being disintermediated by other tools that may not do what we need. &lt;/p&gt;

&lt;p&gt;The approach presented above can help diagnose more complex problems than just typos: through the ability to observe the full input and output to plugins, differences in compiler versions, input source, paths and annotations can be easily observed.&lt;/p&gt;

&lt;p&gt;Over time, I believe we will build a community knowledge base and consistent patterns for the usage of protobufs and gRPC. Tools like &lt;a href="https://buf.build/" rel="noopener noreferrer"&gt;buf&lt;/a&gt; show promise in this regard, and wrappers like &lt;a href="https://github.com/namely/docker-protoc" rel="noopener noreferrer"&gt;Namely's docker containers&lt;/a&gt; can provide a good reference for using protoc where the documentation is lacking - take a look at their &lt;a href="https://github.com/namely/docker-protoc/blob/master/all/entrypoint.sh" rel="noopener noreferrer"&gt;protoc wrapping script&lt;/a&gt; for a real world usage of protoc for polyglot builds.&lt;/p&gt;

&lt;p&gt;I hope at least one person out there finds this useful. This is my first foray into public technical writing in a few years, and it feels good to share what I have learned beyond my immediate colleagues again. If you have any questions, feel free to contact me: &lt;code&gt;iainmcgin-at-gmail-dot-com&lt;/code&gt;.&lt;/p&gt;

</description>
      <category>protobuf</category>
      <category>grpc</category>
      <category>todayilearned</category>
    </item>
  </channel>
</rss>
