<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Harrison Guo</title>
    <description>The latest articles on DEV Community by Harrison Guo (@harrisonsec).</description>
    <link>https://dev.to/harrisonsec</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3809272%2F593698c5-7201-4bb0-898e-055cdbc0a2d2.png</url>
      <title>DEV Community: Harrison Guo</title>
      <link>https://dev.to/harrisonsec</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/harrisonsec"/>
    <language>en</language>
    <item>
      <title>Channels Aren't Message Passing — How Parked Goroutines OOM-Killed a Pod</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Thu, 14 May 2026 05:26:27 +0000</pubDate>
      <link>https://dev.to/harrisonsec/channels-arent-message-passing-how-parked-goroutines-oom-killed-a-pod-4ijf</link>
      <guid>https://dev.to/harrisonsec/channels-arent-message-passing-how-parked-goroutines-oom-killed-a-pod-4ijf</guid>
      <description>&lt;p&gt;It's 3am. The Kafka consumer pod that's been running cleanly for six weeks gets OOM-killed. Kubernetes restarts it. Five minutes later: OOM-killed again. Restart. OOM-killed a third time. By the fourth restart I've shelved the dashboard and started reading &lt;code&gt;runtime/chan.go&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The code that died fit on one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I want to tell you that line is the bug. It isn't. An unbuffered channel will happily backpressure a &lt;em&gt;single&lt;/em&gt; producer: every send has to rendezvous with a receiver, so the producer cannot run ahead. The channel did exactly what it was designed to do.&lt;/p&gt;

&lt;p&gt;What I had built &lt;em&gt;around&lt;/em&gt; it didn't. The Kafka consumer loop wrapped &lt;code&gt;events &amp;lt;- parseEvent(msg)&lt;/code&gt; inside a &lt;code&gt;go func(msg) { ... }(msg)&lt;/code&gt;, spawning a fresh goroutine per inbound message. Every one of those goroutines blocked on send, parked on the channel's &lt;code&gt;sendq&lt;/code&gt; list, and kept its stack and the parsed event alive in memory. The channel was the gravestone. The unbounded &lt;code&gt;go func&lt;/code&gt; fan-out was what filled it.&lt;/p&gt;

&lt;p&gt;This is the story of what a Go channel actually is at the runtime level, why "channels are message passing" is one of the most expensive lies in the Go ecosystem, and why the most common channel bug isn't &lt;em&gt;in&lt;/em&gt; the channel — it's in the code that calls into it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — A Go channel is not a queue and not a message bus. It's a heap-allocated &lt;code&gt;hchan&lt;/code&gt; struct containing a mutex, a ring buffer, and two parked-goroutine lists. The send operation is a &lt;code&gt;memcpy&lt;/code&gt; under a lock, not a transmission. &lt;strong&gt;Channels only deliver backpressure if the producer side is bounded.&lt;/strong&gt; The OOM that started this story came not from &lt;code&gt;make(chan Event)&lt;/code&gt; — that was working as designed — but from an unbounded &lt;code&gt;go func(msg)&lt;/code&gt; fan-out parking thousands of goroutines on &lt;code&gt;sendq&lt;/code&gt;, each retaining a 10KB payload. The fix isn't a buffer size. It's making backpressure part of the producer contract: a single long-lived producer with &lt;code&gt;select&lt;/code&gt;-based backoff, plus a bounded queue as a safety net. The same architectural mistake shows up at every layer where engineers reach for an "in-process queue" — including the inbound queue of your AI agent.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Mental Model That Killed The Pod
&lt;/h2&gt;

&lt;p&gt;Here is what I thought a channel did, and I suspect most Go engineers carry some version of this picture:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A channel is like a Kafka topic in-process. Producers push messages onto it. Consumers pull messages off it. The runtime handles ordering and delivery. It's CSP — Communicating Sequential Processes — Hoare's thing, basically a typed pipe."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every word of that sentence is wrong in a way that matters. There is no topic. Nothing is being pushed anywhere. The runtime is not a broker. The word &lt;em&gt;passing&lt;/em&gt; — borrowed from message-passing concurrency, where independent processes communicate across an isolation boundary — is the most misleading part. In a Go channel, there is no isolation boundary. There is one struct on the heap, and both goroutines reach in and mutate it.&lt;/p&gt;

&lt;p&gt;I held the message-passing model long enough that when the Kafka consumer started ingesting a 12-hour upstream replay at full throttle, I never stopped to ask whether &lt;em&gt;the messages were going somewhere bounded&lt;/em&gt;. They weren't. They were piling up behind the channel, each one held by a parked goroutine on a wait list that has no size limit at all.&lt;/p&gt;




&lt;h2&gt;
  
  
  What A Channel Actually Is
&lt;/h2&gt;

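&lt;p&gt;Crack open &lt;a href="https://github.com/golang/go/blob/master/src/runtime/chan.go" rel="noopener noreferrer"&gt;runtime/chan.go&lt;/a&gt; in the Go source tree and you'll find this (abridged to the core fields, whose layout has been stable since Go 1.7 and was checked against Go 1.21–1.25; newer releases add bookkeeping fields such as the &lt;code&gt;timer&lt;/code&gt; pointer introduced in Go 1.23):&lt;br&gt;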
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;hchan&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;qcount&lt;/span&gt;   &lt;span class="kt"&gt;uint&lt;/span&gt;           &lt;span class="c"&gt;// total data in the queue&lt;/span&gt;
    &lt;span class="n"&gt;dataqsiz&lt;/span&gt; &lt;span class="kt"&gt;uint&lt;/span&gt;           &lt;span class="c"&gt;// size of the circular queue&lt;/span&gt;
    &lt;span class="n"&gt;buf&lt;/span&gt;      &lt;span class="n"&gt;unsafe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Pointer&lt;/span&gt; &lt;span class="c"&gt;// points to an array of dataqsiz elements&lt;/span&gt;
    &lt;span class="n"&gt;elemsize&lt;/span&gt; &lt;span class="kt"&gt;uint16&lt;/span&gt;
    &lt;span class="n"&gt;closed&lt;/span&gt;   &lt;span class="kt"&gt;uint32&lt;/span&gt;
    &lt;span class="n"&gt;elemtype&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;_type&lt;/span&gt;
    &lt;span class="n"&gt;sendx&lt;/span&gt;    &lt;span class="kt"&gt;uint&lt;/span&gt;           &lt;span class="c"&gt;// send index&lt;/span&gt;
    &lt;span class="n"&gt;recvx&lt;/span&gt;    &lt;span class="kt"&gt;uint&lt;/span&gt;           &lt;span class="c"&gt;// receive index&lt;/span&gt;
    &lt;span class="n"&gt;recvq&lt;/span&gt;    &lt;span class="n"&gt;waitq&lt;/span&gt;          &lt;span class="c"&gt;// list of recv waiters&lt;/span&gt;
    &lt;span class="n"&gt;sendq&lt;/span&gt;    &lt;span class="n"&gt;waitq&lt;/span&gt;          &lt;span class="c"&gt;// list of send waiters&lt;/span&gt;
    &lt;span class="n"&gt;lock&lt;/span&gt;     &lt;span class="n"&gt;mutex&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. That's the channel. A struct with a mutex, a pointer to a circular array of element slots, two indices to track read/write positions in the ring, and two intrusive linked lists holding the parked goroutines that are waiting to send or receive.&lt;/p&gt;

&lt;p&gt;When you write &lt;code&gt;ch &amp;lt;- value&lt;/code&gt;, the runtime calls &lt;a href="https://github.com/golang/go/blob/master/src/runtime/chan.go#L160" rel="noopener noreferrer"&gt;&lt;code&gt;chansend&lt;/code&gt;&lt;/a&gt;, which does roughly this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Take the lock&lt;/strong&gt; (&lt;code&gt;lock(&amp;amp;c.lock)&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check &lt;code&gt;recvq&lt;/code&gt;&lt;/strong&gt; — is there a goroutine already parked waiting to receive? If yes, copy &lt;code&gt;value&lt;/code&gt; &lt;em&gt;directly&lt;/em&gt; from the sender into the memory the receiver is waiting on (usually a slot on its stack) via &lt;code&gt;sendDirect&lt;/code&gt;, release the lock, mark the receiver runnable with &lt;code&gt;goready&lt;/code&gt;, and return. The runtime deliberately drops the channel lock before readying the other goroutine. The ring buffer is never touched on this path. (In normal operation a buffered channel can't simultaneously have queued data and parked receivers; if &lt;code&gt;recvq&lt;/code&gt; has a waiter, the buffer is empty.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Otherwise, check buffer space&lt;/strong&gt; — if &lt;code&gt;qcount &amp;lt; dataqsiz&lt;/code&gt;, copy &lt;code&gt;value&lt;/code&gt; into &lt;code&gt;buf[sendx]&lt;/code&gt;, advance &lt;code&gt;sendx&lt;/code&gt;, increment &lt;code&gt;qcount&lt;/code&gt;, release the lock, return.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Otherwise, park the sender&lt;/strong&gt; — append the sender's goroutine to &lt;code&gt;sendq&lt;/code&gt;, release the lock, and call &lt;code&gt;gopark&lt;/code&gt; to suspend execution until a receiver wakes it up.&lt;/li&gt;
&lt;/ol&gt;
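&lt;p&gt;To make the steps above concrete, here is a toy model of the decision a send makes. It is a sketch for intuition only, not the runtime's code: &lt;code&gt;toyChan&lt;/code&gt; and its fields are invented for this example, the two wait lists are reduced to counters, and nothing is actually parked or woken.&lt;/p&gt;

<![CDATA[
```go
package main

import (
	"fmt"
	"sync"
)

// toyChan is a teaching model of hchan, not the runtime's implementation.
// It keeps the same moving parts: a lock, a fixed-size ring, and two wait
// lists (modelled as plain counters because nothing is really parked here).
type toyChan struct {
	mu          sync.Mutex
	buf         []int // ring buffer; len(buf) plays the role of dataqsiz
	qcount      int   // elements currently buffered
	sendx       int   // next slot to write
	recvWaiters int   // stand-in for recvq
	sendWaiters int   // stand-in for sendq; note that nothing caps it
}

// send reports which path a real send would take for this channel state.
func (c *toyChan) send(v int) string {
	c.mu.Lock() // step 1: take the lock
	defer c.mu.Unlock()

	if c.recvWaiters > 0 { // step 2: hand off to a parked receiver
		c.recvWaiters--
		return "direct handoff"
	}
	if c.qcount < len(c.buf) { // step 3: room in the ring
		c.buf[c.sendx] = v
		c.sendx = (c.sendx + 1) % len(c.buf)
		c.qcount++
		return "buffered"
	}
	c.sendWaiters++ // step 4: the real runtime would park the sender here
	return "parked"
}

func main() {
	// One receiver already waiting, two buffer slots, five sends.
	c := &toyChan{buf: make([]int, 2), recvWaiters: 1}
	for i := 0; i < 5; i++ {
		fmt.Println(i, c.send(i))
	}
	fmt.Println("parked senders:", c.sendWaiters)
}
```
]]>

&lt;p&gt;With one receiver already waiting and a two-slot ring, the first send is handed off, the next two are buffered, and every send after that takes the parking path. Nothing in the model limits how large that last counter can get, which mirrors the real &lt;code&gt;sendq&lt;/code&gt;.&lt;/p&gt;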

&lt;p&gt;Receive is the mirror image, calling &lt;code&gt;chanrecv&lt;/code&gt; with &lt;code&gt;sendq&lt;/code&gt; and &lt;code&gt;recvq&lt;/code&gt; swapped.&lt;/p&gt;

&lt;p&gt;Here is the shape of it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggVEQKICAgIHN1YmdyYXBoIFNlbmRlciBbU2VuZGVyIGdvcm91dGluZV0KICAgICAgICBTMVsiY2ggJmx0Oy0gdmFsdWUiXQogICAgZW5kCgogICAgc3ViZ3JhcGggQ2hhbm5lbCBbaGNoYW4gc3RydWN0IG9uIGhlYXBdCiAgICAgICAgTFttdXRleCBsb2NrXQogICAgICAgIEJbInJpbmcgYnVmZmVyPGJyLz5kYXRhcXNpeiBzbG90cyJdCiAgICAgICAgUlFbcmVjdnE6IHBhcmtlZCByZWNlaXZlcnNdCiAgICAgICAgU1Fbc2VuZHE6IHBhcmtlZCBzZW5kZXJzXQogICAgZW5kCgogICAgc3ViZ3JhcGggUmVjZWl2ZXIgW1JlY2VpdmVyIGdvcm91dGluZV0KICAgICAgICBSMVsidiA6PSAmbHQ7LWNoIl0KICAgIGVuZAoKICAgIFMxIC0tPnwiMS4gYWNxdWlyZSBsb2NrInwgTAogICAgTCAtLT58IjIuIHJlY3ZxIGVtcHR5PyJ8IEIKICAgIEwgLS0-fCIyLiByZWN2cSBoYXMgd2FpdGVyInwgUlEKICAgIFJRIC0tPnwiZGlyZWN0IGNvcHksIG5vIGJ1ZmZlciJ8IFIxCiAgICBCIC0tPnwiY29weSB0byBidWYgaWYgc3BhY2UifCBSMQogICAgTCAtLT58ImJ1ZmZlciBmdWxsLCBwYXJrIHNlbmRlciJ8IFNR" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggVEQKICAgIHN1YmdyYXBoIFNlbmRlciBbU2VuZGVyIGdvcm91dGluZV0KICAgICAgICBTMVsiY2ggJmx0Oy0gdmFsdWUiXQogICAgZW5kCgogICAgc3ViZ3JhcGggQ2hhbm5lbCBbaGNoYW4gc3RydWN0IG9uIGhlYXBdCiAgICAgICAgTFttdXRleCBsb2NrXQogICAgICAgIEJbInJpbmcgYnVmZmVyPGJyLz5kYXRhcXNpeiBzbG90cyJdCiAgICAgICAgUlFbcmVjdnE6IHBhcmtlZCByZWNlaXZlcnNdCiAgICAgICAgU1Fbc2VuZHE6IHBhcmtlZCBzZW5kZXJzXQogICAgZW5kCgogICAgc3ViZ3JhcGggUmVjZWl2ZXIgW1JlY2VpdmVyIGdvcm91dGluZV0KICAgICAgICBSMVsidiA6PSAmbHQ7LWNoIl0KICAgIGVuZAoKICAgIFMxIC0tPnwiMS4gYWNxdWlyZSBsb2NrInwgTAogICAgTCAtLT58IjIuIHJlY3ZxIGVtcHR5PyJ8IEIKICAgIEwgLS0-fCIyLiByZWN2cSBoYXMgd2FpdGVyInwgUlEKICAgIFJRIC0tPnwiZGlyZWN0IGNvcHksIG5vIGJ1ZmZlciJ8IFIxCiAgICBCIC0tPnwiY29weSB0byBidWYgaWYgc3BhY2UifCBSMQogICAgTCAtLT58ImJ1ZmZlciBmdWxsLCBwYXJrIHNlbmRlciJ8IFNR" alt="graph TD" width="792" height="628"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Three things are worth burning into memory:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One — there is no transport.&lt;/strong&gt; The "message" never leaves the process's address space. The sender writes bytes; the receiver reads bytes; the lock arbitrates. This is shared-memory synchronization with the &lt;em&gt;appearance&lt;/em&gt; of message passing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two — the buffer is just a ring of typed slots.&lt;/strong&gt; &lt;code&gt;dataqsiz&lt;/code&gt; is set exactly once, at &lt;code&gt;make(chan T, N)&lt;/code&gt; time. If you write &lt;code&gt;make(chan T)&lt;/code&gt;, &lt;code&gt;dataqsiz&lt;/code&gt; is zero and there is no buffer at all — every send must rendezvous with a receiver or park.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three — &lt;code&gt;sendq&lt;/code&gt; is unbounded.&lt;/strong&gt; This is the part nobody talks about. The ring buffer has a fixed size. The list of &lt;em&gt;parked senders waiting to write into the ring buffer&lt;/em&gt; does not. If a thousand goroutines all hit a full channel, the runtime parks all thousand of them on &lt;code&gt;sendq&lt;/code&gt; and each one keeps its stack and any data it was about to send alive in memory.&lt;/p&gt;

&lt;p&gt;That third point is what shaped the OOM I hit. The memory was never in the channel's buffer; it was in the senders parked behind it.&lt;/p&gt;
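&lt;p&gt;The unbounded wait list is easy to observe from user code. The following is a minimal sketch, assuming nobody is receiving yet: &lt;code&gt;parkSenders&lt;/code&gt; is a hypothetical helper written for this illustration, and the 50ms sleep is just a crude way to let every helper run up to its blocking send.&lt;/p&gt;

<![CDATA[
```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// parkSenders starts n goroutines that each try to send on an unbuffered
// channel with no receiver, then reports how many extra goroutines exist.
// The returned release function drains the channel so the helpers can exit.
func parkSenders(n int) (extra int, release func()) {
	ch := make(chan int) // unbuffered: every send needs a receiver
	before := runtime.NumGoroutine()

	for i := 0; i < n; i++ {
		go func(v int) { ch <- v }(i) // blocks; the goroutine waits on sendq
	}

	time.Sleep(50 * time.Millisecond) // let the helpers reach the send
	extra = runtime.NumGoroutine() - before

	release = func() {
		for i := 0; i < n; i++ {
			<-ch // each receive wakes exactly one parked sender
		}
	}
	return extra, release
}

func main() {
	extra, release := parkSenders(10000)
	fmt.Println("goroutines stuck behind the channel:", extra)
	release()
}
```
]]>

&lt;p&gt;The runtime accepts every one of those helpers without complaint. The only limit on &lt;code&gt;n&lt;/code&gt; is the memory available to hold their stacks and whatever they are about to send.&lt;/p&gt;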




&lt;h2&gt;
  
  
  The Incident, Mechanism By Mechanism
&lt;/h2&gt;

&lt;p&gt;The pod that died had a goroutine topology that looked like this — and the bug is &lt;em&gt;not&lt;/em&gt; the &lt;code&gt;make(chan Event)&lt;/code&gt; line. Watch the outer loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// Consumer — slow.&lt;/span&gt;
&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ev&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ev&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// ~3ms per event&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}()&lt;/span&gt;

&lt;span class="c"&gt;// THE ACTUAL BUG: outer loop spawns a fresh goroutine per inbound message.&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;kafkaConsumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Messages&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="n"&gt;kafka&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;parseEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// every blocked send parks on sendq&lt;/span&gt;
    &lt;span class="p"&gt;}(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you replace the inner &lt;code&gt;go func(msg) { ... }(msg)&lt;/code&gt; with a direct &lt;code&gt;events &amp;lt;- parseEvent(msg)&lt;/code&gt;, the outer loop &lt;em&gt;itself&lt;/em&gt; becomes the producer, and the unbuffered channel correctly backpressures it — the loop simply doesn't advance until the consumer is ready. No OOM.&lt;/p&gt;
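&lt;p&gt;That claim can be checked with a small experiment. This is a sketch rather than the production code: &lt;code&gt;maxLead&lt;/code&gt; is a made-up helper, the integer payload stands in for &lt;code&gt;Event&lt;/code&gt;, and the 1ms sleep stands in for &lt;code&gt;process(ev)&lt;/code&gt;. It measures how far a single producer loop can get ahead of a slow consumer over an unbuffered channel.&lt;/p&gt;

<![CDATA[
```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// maxLead runs one producer loop and one slow consumer over an unbuffered
// channel. It returns the largest gap ever observed between the number of
// completed sends and the number of fully processed events.
func maxLead(total int) int64 {
	events := make(chan int) // unbuffered, as in the incident
	var processed atomic.Int64
	done := make(chan struct{})

	go func() { // slow consumer
		defer close(done)
		for range events {
			time.Sleep(time.Millisecond) // stand-in for process(ev)
			processed.Add(1)
		}
	}()

	var lead int64
	for sent := int64(1); sent <= int64(total); sent++ {
		events <- int(sent) // the loop itself blocks until the consumer is ready
		if gap := sent - processed.Load(); gap > lead {
			lead = gap
		}
	}
	close(events)
	<-done
	return lead
}

func main() {
	fmt.Println("largest producer lead:", maxLead(100))
}
```
]]>

&lt;p&gt;The lead can never exceed one event, the one the consumer is currently working on, because the consumer only accepts the next send after it has finished the previous event. That is the backpressure the unbuffered channel provides when the sender is the loop itself.&lt;/p&gt;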

&lt;p&gt;But because each message is dispatched to a fresh helper goroutine, the outer loop never blocks. It keeps spawning. Each helper goroutine reaches the send, finds no waiting receiver, and parks on &lt;code&gt;sendq&lt;/code&gt;. Now &lt;code&gt;sendq&lt;/code&gt; is the unbounded thing. Here is what actually happened, in order:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Sustained baseline: rendezvous works
&lt;/h3&gt;

&lt;p&gt;At 1K msg/sec inbound and ~3ms per &lt;code&gt;process&lt;/code&gt; call (~333/sec consumer throughput), the consumer is already behind by 3x at steady state. For weeks this didn't OOM because the Kafka client's own internal buffer absorbed the gap, and lag built up on the broker side — visible in Grafana, ignored by me.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Replay: the producer detaches from the consumer's pace
&lt;/h3&gt;

&lt;p&gt;When upstream re-emitted 12 hours of events, the Kafka client's internal pre-fetch buffer filled to capacity (default &lt;code&gt;fetch.message.max.bytes&lt;/code&gt; × partition count = several hundred MB) and started backing up &lt;em&gt;Kafka-side&lt;/em&gt; without applying backpressure to the consumer goroutine, because the client library was configured with a large internal queue.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The actual heap growth: parked sender goroutines
&lt;/h3&gt;

&lt;p&gt;Each call to &lt;code&gt;events &amp;lt;- parseEvent(msg)&lt;/code&gt; on the unbuffered channel would either rendezvous (rare during replay) or park. When it parked, the sender goroutine held:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Its own stack (goroutine stacks start at 2KB and grow on demand; ~8KB is a fair working figure under load)&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Event&lt;/code&gt; value it was about to send (~10KB per event with strings, headers, payload)&lt;/li&gt;
&lt;li&gt;A reference into the Kafka message it was parsing (another ~10KB)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multiply by the number of in-flight parsing goroutines — which kept being spawned by an outer loop that didn't apply backpressure to itself — and you arrive at the 12GB heap. The channel's &lt;code&gt;sendq&lt;/code&gt; was the proximate memory sink, not the buffer (which was zero-sized).&lt;/p&gt;
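&lt;p&gt;To put rough numbers on that multiplication, here is a back-of-the-envelope calculation using the approximate per-goroutine figures from the list above. Treating KB as KiB and assuming every parked goroutine retains all three items are simplifications, and &lt;code&gt;goroutinesToFill&lt;/code&gt; is a name invented for this sketch. The two limits are the 12GB heap figure above and the 4GB cgroup limit mentioned later in this post.&lt;/p&gt;

<![CDATA[
```go
package main

import "fmt"

// Approximate per-goroutine retention, taken from the list above.
const (
	stackBytes   = 8 << 10  // ~8KB goroutine stack
	eventBytes   = 10 << 10 // ~10KB parsed Event
	messageBytes = 10 << 10 // ~10KB Kafka message kept reachable
	perGoroutine = stackBytes + eventBytes + messageBytes
)

// goroutinesToFill returns how many parked senders fit into limit bytes
// if each one retains perGoroutine bytes.
func goroutinesToFill(limit uint64) uint64 {
	return limit / perGoroutine
}

func main() {
	fmt.Println("bytes retained per parked sender:", perGoroutine)
	fmt.Println("parked senders to fill 4 GiB: ", goroutinesToFill(4<<30))
	fmt.Println("parked senders to fill 12 GiB:", goroutinesToFill(12<<30))
}
```
]]>

&lt;p&gt;On these assumptions each parked sender pins about 28KB, so roughly 150,000 of them are enough to fill 4 GiB. A fan-out that gains even a few hundred goroutines per second gets there in minutes, not hours.&lt;/p&gt;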

&lt;p&gt;The goroutine lifecycle for each parsing goroutine looked like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fc3RhdGVEaWFncmFtLXYyCiAgICBbKl0gLS0-IFJ1bm5pbmc6IGdvIGZ1bmMoKQogICAgUnVubmluZyAtLT4gUGFya2VkX29uX3NlbmRxOiBjaCA8LSB2YWx1ZSAobm8gcmVjZWl2ZXIpCiAgICBQYXJrZWRfb25fc2VuZHEgLS0-IFJ1bm5hYmxlOiByZWNlaXZlciB3YWtlcyBtZQogICAgUnVubmFibGUgLS0-IFJ1bm5pbmc6IHNjaGVkdWxlciBwaWNrcyBtZQogICAgUnVubmluZyAtLT4gWypdOiBmdW5jdGlvbiByZXR1cm5zCgogICAgbm90ZSByaWdodCBvZiBQYXJrZWRfb25fc2VuZHEKICAgICAgICBTdGFjayByZXRhaW5lZC4KICAgICAgICBFdmVudCBwYXlsb2FkIHJldGFpbmVkLgogICAgICAgIEthZmthIG1zZyByZWZlcmVuY2UgcmV0YWluZWQuCiAgICAgICAgc2VuZHEgaGFzIE5PIHNpemUgYm91bmQuCiAgICBlbmQgbm90ZQ%3D%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fc3RhdGVEaWFncmFtLXYyCiAgICBbKl0gLS0-IFJ1bm5pbmc6IGdvIGZ1bmMoKQogICAgUnVubmluZyAtLT4gUGFya2VkX29uX3NlbmRxOiBjaCA8LSB2YWx1ZSAobm8gcmVjZWl2ZXIpCiAgICBQYXJrZWRfb25fc2VuZHEgLS0-IFJ1bm5hYmxlOiByZWNlaXZlciB3YWtlcyBtZQogICAgUnVubmFibGUgLS0-IFJ1bm5pbmc6IHNjaGVkdWxlciBwaWNrcyBtZQogICAgUnVubmluZyAtLT4gWypdOiBmdW5jdGlvbiByZXR1cm5zCgogICAgbm90ZSByaWdodCBvZiBQYXJrZWRfb25fc2VuZHEKICAgICAgICBTdGFjayByZXRhaW5lZC4KICAgICAgICBFdmVudCBwYXlsb2FkIHJldGFpbmVkLgogICAgICAgIEthZmthIG1zZyByZWZlcmVuY2UgcmV0YWluZWQuCiAgICAgICAgc2VuZHEgaGFzIE5PIHNpemUgYm91bmQuCiAgICBlbmQgbm90ZQ%3D%3D" alt="stateDiagram-v2" width="612" height="532"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every goroutine sitting in &lt;code&gt;Parked_on_sendq&lt;/code&gt; is reachable (it's on the runtime's wait queue, which is rooted in the &lt;code&gt;hchan&lt;/code&gt; struct, which is rooted by both the producer and consumer goroutines). Reachable means non-collectible. The longer the consumer falls behind, the longer the queue grows.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. GC can't help
&lt;/h3&gt;

&lt;p&gt;Go's GC can only reclaim unreachable memory. Every goroutine parked on &lt;code&gt;sendq&lt;/code&gt; is still live: the channel's wait queue points at it, and the runtime scans its stack as a root. Every &lt;code&gt;Event&lt;/code&gt; it's holding is therefore reachable too. The GC ran, found nothing to free, and the heap continued growing until the kernel OOM-killer fired.&lt;/p&gt;
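&lt;p&gt;This is straightforward to check on a laptop. The sketch below parks senders that each hold a 10 KiB slice, forces a collection with &lt;code&gt;runtime.GC()&lt;/code&gt;, and compares live heap before and after. The helper names and the payload size are choices made for this example, not taken from the production code.&lt;/p&gt;

<![CDATA[
```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

const payloadSize = 10 << 10 // 10 KiB, roughly the Event size in the incident

// heapAfterGC forces a full collection and returns the live heap in bytes.
func heapAfterGC() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

// retainedByParked parks n senders that each hold one payload, forces a GC,
// and returns how much live heap remains above the starting point.
// release unblocks the senders so they can exit.
func retainedByParked(n int) (grew uint64, release func()) {
	ch := make(chan []byte) // unbuffered, no receiver yet
	base := heapAfterGC()

	var allocated sync.WaitGroup
	allocated.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			p := make([]byte, payloadSize)
			allocated.Done()
			ch <- p // blocks; p stays reachable from this goroutine's stack
		}()
	}
	allocated.Wait()

	if after := heapAfterGC(); after > base {
		grew = after - base
	}
	release = func() {
		for i := 0; i < n; i++ {
			<-ch
		}
	}
	return grew, release
}

func main() {
	grew, release := retainedByParked(2000)
	fmt.Printf("live heap still held after GC: %d KiB\n", grew>>10)
	release()
}
```
]]>

&lt;p&gt;The collector runs to completion and the payloads are all still there, because each one is referenced by a goroutine that is merely waiting, not finished.&lt;/p&gt;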

&lt;h3&gt;
  
  
  5. The cgroup hammer drops
&lt;/h3&gt;

&lt;p&gt;cgroup memory limit was 4GB. Heap crossed 4GB. OOM kill. Kubernetes restarted the pod. The replay was still in progress on the broker side, so the same sequence ran again. And again.&lt;/p&gt;

&lt;h3&gt;
  
  
  What this looks like in pprof
&lt;/h3&gt;

&lt;p&gt;You don't have to take my word for the mechanism — it reproduces in under a minute. I built a minimal demo at &lt;a href="https://github.com/harrison001/channels-oom-demo" rel="noopener noreferrer"&gt;&lt;code&gt;harrison001/channels-oom-demo&lt;/code&gt;&lt;/a&gt; (&lt;a href="https://github.com/harrison001/channels-oom-demo/blob/main/cmd/bug/main.go" rel="noopener noreferrer"&gt;&lt;code&gt;cmd/bug&lt;/code&gt;&lt;/a&gt;) that runs the same workload shape on a laptop. The output of the bug version over 22 seconds, captured with &lt;code&gt;runtime.NumGoroutine()&lt;/code&gt; and &lt;code&gt;runtime.MemStats.HeapAlloc&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;t=   1s  goroutines=   497  heap_alloc=     5 MB
t=   5s  goroutines=  2462  heap_alloc=    28 MB
t=  10s  goroutines=  4915  heap_alloc=    61 MB
t=  15s  goroutines=  7369  heap_alloc=    89 MB
t=  20s  goroutines=  9828  heap_alloc=   109 MB
t=  22s  goroutines= 10813  heap_alloc=   125 MB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Goroutine count grows linearly at roughly 490 per second, about one new parked goroutine every two milliseconds. Heap grows at 5-6MB/sec, dominated by the 10KB Event payload each parked goroutine is holding. Extrapolate to a 12-hour replay at production volume and you arrive at the original 12GB OOM.&lt;/p&gt;

&lt;p&gt;For comparison, the fix version (&lt;a href="https://github.com/harrison001/channels-oom-demo/blob/main/cmd/fix/main.go" rel="noopener noreferrer"&gt;&lt;code&gt;cmd/fix&lt;/code&gt;&lt;/a&gt;) on the same workload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;t=   1s  goroutines=     3  heap_alloc=     3 MB  chan_len= 256
t=  10s  goroutines=     3  heap_alloc=     4 MB  chan_len= 256
t=  20s  goroutines=     3  heap_alloc=     5 MB  chan_len= 256
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three goroutines (producer, consumer, pprof listener). Heap essentially flat at 3-5 MB. Channel pinned at its 256-slot bound, meaning the producer is constantly blocked on send and applying backpressure upstream, which is exactly what we want.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Fix, And Why It Works
&lt;/h2&gt;

&lt;p&gt;The visible code change was one parameter. The real fix was making backpressure part of the producer contract — two changes, working together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;256&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// (1) bounded queue as safety net&lt;/span&gt;

&lt;span class="c"&gt;// (2) single long-lived producer goroutine with select-based backoff —&lt;/span&gt;
&lt;span class="c"&gt;// NO outer loop spawning fresh goroutines per message.&lt;/span&gt;
&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;kafkaConsumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Messages&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;parseEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
            &lt;span class="c"&gt;// sent — loop continues at consumer speed when channel fills&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key word in change (2) is &lt;strong&gt;single&lt;/strong&gt;. There is exactly one goroutine reading from Kafka and writing to the channel. When the channel fills, that goroutine blocks on send; the &lt;code&gt;for msg := range&lt;/code&gt; loop stops pulling from &lt;code&gt;Messages()&lt;/code&gt;; the Kafka client's internal pre-fetch queue stops draining; consumer lag accumulates broker-side; the broker simply retains messages until we come back. No &lt;code&gt;go func(msg)&lt;/code&gt; helpers. Nothing piling up on &lt;code&gt;sendq&lt;/code&gt;. Memory stays bounded because the &lt;em&gt;producer&lt;/em&gt; is bounded. The buffer is only a safety net to absorb short bursts.&lt;/p&gt;
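&lt;p&gt;Here is a self-contained version of that shape that runs without Kafka. It is a sketch under stated assumptions: the &lt;code&gt;source&lt;/code&gt; channel is a hypothetical stand-in for &lt;code&gt;kafkaConsumer.Messages()&lt;/code&gt;, &lt;code&gt;Event&lt;/code&gt; is reduced to a single field, and &lt;code&gt;produce&lt;/code&gt; and &lt;code&gt;run&lt;/code&gt; are names chosen for this example.&lt;/p&gt;

<![CDATA[
```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Event is a stand-in for the real parsed event type.
type Event struct{ ID int }

// produce is the single long-lived producer. source is a hypothetical
// stand-in for kafkaConsumer.Messages().
func produce(ctx context.Context, source <-chan int, events chan<- Event) {
	defer close(events)
	for msg := range source {
		select {
		case events <- Event{ID: msg}: // blocks once the buffer is full
		case <-ctx.Done():
			return
		}
	}
}

// run feeds total messages through a channel of the given capacity into a
// deliberately slow consumer. It returns how many events were processed and
// the deepest backlog the consumer ever observed.
func run(total, capacity int) (processed, maxBacklog int) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	source := make(chan int)
	go func() { // upstream that would happily run ahead if allowed to
		defer close(source)
		for i := 0; i < total; i++ {
			select {
			case source <- i:
			case <-ctx.Done():
				return
			}
		}
	}()

	events := make(chan Event, capacity)
	go produce(ctx, source, events)

	for range events { // slow consumer
		if n := len(events); n > maxBacklog {
			maxBacklog = n
		}
		time.Sleep(100 * time.Microsecond) // stand-in for process(ev)
		processed++
	}
	return processed, maxBacklog
}

func main() {
	processed, maxBacklog := run(500, 16)
	fmt.Println("processed:", processed, "max backlog:", maxBacklog)
}
```
]]>

&lt;p&gt;However fast the upstream is, the backlog cannot exceed the channel's capacity, and the goroutine count stays fixed. The upstream goroutine simply waits, which is the in-process equivalent of lag accumulating on the broker.&lt;/p&gt;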

&lt;p&gt;What this changes, mechanically:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Before (unbounded &lt;code&gt;go func&lt;/code&gt; fan-out + &lt;code&gt;make(chan Event)&lt;/code&gt;)&lt;/th&gt;
&lt;th&gt;After (single producer + &lt;code&gt;make(chan Event, 256)&lt;/code&gt;)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;One goroutine per inbound message&lt;/td&gt;
&lt;td&gt;One long-lived producer goroutine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;sendq&lt;/code&gt; grows unboundedly with parked helpers&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;sendq&lt;/code&gt; holds at most one parked goroutine: the producer itself, the sole sender&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No signal to upstream — outer loop never blocks&lt;/td&gt;
&lt;td&gt;Producer blocks on send; outer loop runs at consumer speed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kafka client keeps pre-fetching, lag invisible&lt;/td&gt;
&lt;td&gt;Kafka client's internal queue fills, consumer stops polling, broker-side lag accumulates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OOM&lt;/td&gt;
&lt;td&gt;Bounded heap, bounded latency, Kafka rebalances cleanly when behind&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A bounded channel buffer alone does not prevent this OOM. If you applied change (1) without change (2), you'd merely delay the OOM by a few milliseconds: the outer &lt;code&gt;go func(msg)&lt;/code&gt; fan-out would keep spawning, the buffer would fill almost immediately, and helpers would pile up on &lt;code&gt;sendq&lt;/code&gt; exactly as before. Backpressure is not a property of any one component. It is a property of the entire chain having no unbounded buffer (and no unbounded fan-out) anywhere in it.&lt;/p&gt;
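&lt;p&gt;A quick way to convince yourself of this is to keep the fan-out and add only the buffer. The sketch below does that with no consumer running, so the numbers are deterministic. &lt;code&gt;fanOutInto&lt;/code&gt; is a hypothetical helper for this demonstration, and the counters exist only to observe which sends completed.&lt;/p&gt;

<![CDATA[
```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// fanOutInto spawns n helpers that each send once into a channel of the
// given capacity while no consumer is running. It reports how many sends
// landed in the buffer and how many helpers are stuck behind it.
func fanOutInto(n, capacity int) (buffered, stuck int, release func()) {
	ch := make(chan int, capacity)
	var started, finished atomic.Int64

	for i := 0; i < n; i++ {
		go func(v int) {
			started.Add(1)
			ch <- v // completes only while the buffer has room
			finished.Add(1)
		}(i)
	}

	room := int64(capacity)
	if int64(n) < room {
		room = int64(n)
	}
	// Wait until every helper has started and the buffer has absorbed all it can.
	for started.Load() < int64(n) || finished.Load() < room {
		time.Sleep(time.Millisecond)
	}

	buffered = int(finished.Load())
	stuck = n - buffered
	release = func() { // drain so the stuck helpers can exit
		for i := 0; i < n; i++ {
			<-ch
		}
	}
	return buffered, stuck, release
}

func main() {
	buffered, stuck, release := fanOutInto(1000, 256)
	fmt.Println("sends absorbed by the buffer:", buffered)
	fmt.Println("helpers stuck behind it:", stuck)
	release()
}
```
]]>

&lt;p&gt;The buffer absorbs exactly its capacity and every helper beyond that ends up waiting, just as with the unbuffered channel. A larger buffer only changes how many messages it takes to reach that point.&lt;/p&gt;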

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggTFIKICAgIEFbS2Fma2EgYnJva2VyXSAtLT58ZmV0Y2h8IEJbS2Fma2EgY2xpZW50PGJyLz5wcmUtZmV0Y2ggYnVmZmVyXQogICAgQiAtLT4gQ1tQcm9kdWNlcjxici8-Z29yb3V0aW5lXQogICAgQyAtLT58c2VsZWN0fCBEWyJldmVudHM8YnIvPmNoYW4gVCwgMjU2Il0KICAgIEQgLS0-IEVbQ29uc3VtZXI8YnIvPmdvcm91dGluZV0KICAgIEUgLS0-IEZbKERhdGFiYXNlKV0KCiAgICBGIC0uIHNsb3cgLi0-IEUKICAgIEUgLS4gc2xvdyBkcmFpbiAuLT4gRAogICAgRCAtLiBmdWxsIC4tPiBDCiAgICBDIC0uIGJsb2NrcyBvbiBzZW5kIC4tPiBCCiAgICBCIC0uIHF1ZXVlIGZpbGxzLCBmZXRjaCBzbG93cyAuLT4gQQogICAgQSAtLiBicm9rZXIgcmV0YWlucyBtc2dzPGJyLz5jb25zdW1lciBsYWcgZ3Jvd3MgLi0-IEEKCiAgICBjbGFzc0RlZiBib3VuZCBmaWxsOiNjZmUsc3Ryb2tlOiMwODAKICAgIGNsYXNzIEQgYm91bmQ%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggTFIKICAgIEFbS2Fma2EgYnJva2VyXSAtLT58ZmV0Y2h8IEJbS2Fma2EgY2xpZW50PGJyLz5wcmUtZmV0Y2ggYnVmZmVyXQogICAgQiAtLT4gQ1tQcm9kdWNlcjxici8-Z29yb3V0aW5lXQogICAgQyAtLT58c2VsZWN0fCBEWyJldmVudHM8YnIvPmNoYW4gVCwgMjU2Il0KICAgIEQgLS0-IEVbQ29uc3VtZXI8YnIvPmdvcm91dGluZV0KICAgIEUgLS0-IEZbKERhdGFiYXNlKV0KCiAgICBGIC0uIHNsb3cgLi0-IEUKICAgIEUgLS4gc2xvdyBkcmFpbiAuLT4gRAogICAgRCAtLiBmdWxsIC4tPiBDCiAgICBDIC0uIGJsb2NrcyBvbiBzZW5kIC4tPiBCCiAgICBCIC0uIHF1ZXVlIGZpbGxzLCBmZXRjaCBzbG93cyAuLT4gQQogICAgQSAtLiBicm9rZXIgcmV0YWlucyBtc2dzPGJyLz5jb25zdW1lciBsYWcgZ3Jvd3MgLi0-IEEKCiAgICBjbGFzc0RlZiBib3VuZCBmaWxsOiNjZmUsc3Ryb2tlOiMwODAKICAgIGNsYXNzIEQgYm91bmQ%3D" alt="graph LR" width="1521" height="188"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every link in this chain is bounded — the database has connection pool limits, the consumer is rate-limited by &lt;code&gt;process()&lt;/code&gt; latency, the channel buffer is 256, the Kafka client's internal queue has a configured max, and the broker simply retains messages on disk when its consumer falls behind. When ANY downstream link slows, the pressure propagates back up by the consumer ceasing to pull; the broker doesn't need to be told anything. The whole system runs at the rate of its slowest component.&lt;/p&gt;

&lt;p&gt;If any link in that chain has an unbounded buffer, the chain has no backpressure. That link will absorb the load until it OOMs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bounded Buffers Are Not About Channels
&lt;/h2&gt;

&lt;p&gt;The lesson is not "use buffered channels." The lesson is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Any in-process queue without a capacity bound is a latent OOM.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This applies identically across runtimes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Runtime&lt;/th&gt;
&lt;th&gt;The footgun&lt;/th&gt;
&lt;th&gt;The fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;Unbounded goroutine fan-out parked on sends (&lt;code&gt;go func(msg) { ch &amp;lt;- ... }(msg)&lt;/code&gt;); oversized buffered channels&lt;/td&gt;
&lt;td&gt;Single long-lived producer + &lt;code&gt;select&lt;/code&gt; + bounded buffer as safety net&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rust (Tokio)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mpsc::unbounded_channel()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mpsc::channel(N)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python (asyncio)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;asyncio.Queue()&lt;/code&gt; with no &lt;code&gt;maxsize&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;code&gt;asyncio.Queue(maxsize=N)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Node.js&lt;/td&gt;
&lt;td&gt;Unbounded array of in-flight Promises&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;p-limit&lt;/code&gt;, &lt;code&gt;Sema&lt;/code&gt;, or explicit pool&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Erlang/Elixir&lt;/td&gt;
&lt;td&gt;Process mailbox grows unboundedly when selective receive can't keep up&lt;/td&gt;
&lt;td&gt;Demand-driven flow control: &lt;a href="https://hexdocs.pm/gen_stage/GenStage.html" rel="noopener noreferrer"&gt;&lt;code&gt;GenStage&lt;/code&gt;&lt;/a&gt; / &lt;code&gt;Flow&lt;/code&gt; for pipelines, or explicit ack-based protocols in &lt;code&gt;gen_statem&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every one of these reaches for the same shape — an in-process queue — and every one of them OOMs the same way when the shape is unbounded.&lt;/p&gt;




&lt;h2&gt;
  
  
  When Channels Are The Right Tool
&lt;/h2&gt;

&lt;p&gt;I want to be careful not to overcorrect. Channels are not a mistake. They are an excellent primitive used incorrectly. Cases where reaching for a channel is the right call:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cancellation signaling&lt;/strong&gt; — &lt;code&gt;ctx.Done()&lt;/code&gt; returns a &lt;code&gt;&amp;lt;-chan struct{}&lt;/code&gt;. This is canonical.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fan-out work distribution with a worker pool&lt;/strong&gt; — a bounded channel feeding N worker goroutines caps both concurrency and queued work. Buffer size = pool size or a small multiple of it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Producer-consumer with a known throughput ratio&lt;/strong&gt; — yes, with a bounded buffer sized to the latency budget.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error aggregation from concurrent goroutines&lt;/strong&gt; — small buffered channel, drain on goroutine completion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handoff between pipeline stages&lt;/strong&gt; — bounded, with explicit close semantics on the upstream stage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cases where reaching for a channel is the wrong call:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cross-process messaging&lt;/strong&gt; — use a real broker (NATS, Kafka, Redis Streams). Channels do not survive a pod restart.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistence&lt;/strong&gt; — channels live only in the process's heap. If your pod dies, the in-flight data is gone. If you need "at least once" across restarts, you need a real queue.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bursty load with unknown shape&lt;/strong&gt; — if you cannot put a meaningful upper bound on the buffer, you have not understood the load. Adding a channel does not give you understanding; it postpones the OOM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anything that wants to be a message bus&lt;/strong&gt; — that's not a channel. That's a message bus. They are different categories of system.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Same Bug, Different Layer: AI Agent Inbound Queues
&lt;/h2&gt;

&lt;p&gt;The reason this post lives in the SecurityLab track and not just "Go tips" is that the exact same mistake is now happening, at scale, in LLM agent infrastructure. I've seen the pattern repeatedly in recent AI backends — same architectural shape, different runtime.&lt;/p&gt;

&lt;p&gt;The pattern: an agent backend exposes an HTTP endpoint. Each inbound request is dispatched to a worker pool via an in-process queue.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The bug, in a different language
&lt;/span&gt;&lt;span class="n"&gt;request_queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# unbounded
&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;http_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;request_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# never blocks
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;queued&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;request_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 8 seconds, sometimes 30
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Steady state looks fine: requests arrive slightly faster than they're processed, the queue grows slowly, latency creeps up, and nobody notices because the HTTP layer keeps returning 200.&lt;/p&gt;

&lt;p&gt;Then a launch happens. Or a viral tweet. Or a marketing email goes out. Inbound rate spikes 50x for 20 minutes. The queue accepts everything (it's unbounded). The worker pool can't keep up — LLM calls are inelastic, you can't parallelize past your token-per-minute quota. The queue grows to 200K items. Each item holds a request payload (~50KB with conversation history) and a future. 10GB of heap. OOM. Pod restart. All 200K requests lost. Users see 500s instead of the explicit "rate-limited, try again in 30s" they would have seen with proper backpressure.&lt;/p&gt;

&lt;p&gt;The fix is identical to the Go fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;request_queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maxsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;http_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;request_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_nowait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QueueFull&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Retry-After&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;queued&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;503 is a feature. It is the system telling the client &lt;em&gt;we're at capacity, retry in 30 seconds&lt;/em&gt;. It is honest. It is bounded. It is the difference between a system that degrades gracefully and one that dies silently.&lt;/p&gt;
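
&lt;p&gt;The arithmetic behind the two outcomes is worth writing down. This is a back-of-the-envelope sketch in Go, using the ~50KB payload and the queue sizes from the scenario above; &lt;code&gt;worstCaseBytes&lt;/code&gt; is a hypothetical helper, and it ignores per-item overhead such as futures and allocator slack.&lt;/p&gt;

```go
package main

import "fmt"

// worstCaseBytes is the most payload memory a queue can pin:
// capacity times per-item size. Overhead per item is ignored.
func worstCaseBytes(capacity, itemBytes int64) int64 {
	return capacity * itemBytes
}

func main() {
	const itemBytes = 50_000 // ~50KB per request, from the scenario above

	// The unbounded queue at the moment it died: 200K queued requests.
	unbounded := worstCaseBytes(200_000, itemBytes)
	// The bounded queue from the fix: maxsize=100.
	bounded := worstCaseBytes(100, itemBytes)

	fmt.Printf("200K queued items: %.1f GB\n", float64(unbounded)/1e9)
	fmt.Printf("maxsize=100:       %.1f MB\n", float64(bounded)/1e6)
}
```

&lt;p&gt;The bound turns the worst case into a number you can read off before deploying, instead of one you discover in the OOM-killer log.&lt;/p&gt;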




&lt;h2&gt;
  
  
  Reproducing This Yourself
&lt;/h2&gt;

&lt;p&gt;The numbers in this post come from a minimal Go demo whose commands each fit in under 100 lines. The repo lives at:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/harrison001/channels-oom-demo" rel="noopener noreferrer"&gt;github.com/harrison001/channels-oom-demo&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/harrison001/channels-oom-demo.git
&lt;span class="nb"&gt;cd &lt;/span&gt;channels-oom-demo

&lt;span class="c"&gt;# Watch goroutine count + heap climb every second&lt;/span&gt;
go run ./cmd/bug

&lt;span class="c"&gt;# Switch to the fix — flat at 3 goroutines, 5 MB heap&lt;/span&gt;
go run ./cmd/fix
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each program exposes pprof on &lt;code&gt;localhost:6060&lt;/code&gt;. While the bug version is running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Confirm 10K+ goroutines parked in runtime.chansend (reached via runtime.chansend1)&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s1"&gt;'http://localhost:6060/debug/pprof/goroutine?debug=1'&lt;/span&gt; | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-20&lt;/span&gt;

&lt;span class="c"&gt;# Confirm the heap is dominated by Event payloads, not the channel itself&lt;/span&gt;
go tool pprof &lt;span class="nt"&gt;-text&lt;/span&gt; http://localhost:6060/debug/pprof/heap
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The bug demo has a hard cap at 20,000 goroutines so it won't actually OOM your laptop. Remove the cap if you want to see the kernel finish the job.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Wish I'd Known
&lt;/h2&gt;

&lt;p&gt;If I could send one note back to myself eighteen months before the OOM:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When you reach for an in-process queue, you are choosing a backpressure boundary. The buffer size is not a performance tuning knob. It is a contract: &lt;em&gt;under sustained load greater than my consumer's throughput, this is how much memory I am willing to lose before I tell the producer to stop.&lt;/em&gt; If you don't pick a number, the runtime picks one for you, and the number is &lt;em&gt;whatever fits in RAM right before the kernel kills the process.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Channels in Go look like message-passing because the syntax was deliberately borrowed from CSP, a model where independent processes communicate by passing values across an isolation boundary. In Go there is no isolation boundary. The channel is a struct in shared memory, the goroutines are coroutines on the same scheduler, and the entire setup is synchronization plumbing in CSP clothing.&lt;/p&gt;

&lt;p&gt;Once you see the &lt;code&gt;hchan&lt;/code&gt; struct, you can't un-see it. Every channel decision after that is a synchronization decision, not a transport decision. And synchronization decisions always have a capacity bound — you just have to choose whether to pick it explicitly or have the OOM-killer pick it for you.&lt;/p&gt;




&lt;h3&gt;
  
  
  Keep going
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Code&lt;/strong&gt;: &lt;a href="https://github.com/harrison001/channels-oom-demo" rel="noopener noreferrer"&gt;&lt;code&gt;harrison001/channels-oom-demo&lt;/code&gt;&lt;/a&gt; — reproduce both versions, capture your own pprof&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Next piece&lt;/strong&gt;: &lt;em&gt;Goroutines Are Cheap — Until Backpressure Is Missing&lt;/em&gt; — coming next. The producer side of the same mistake: why "just spawn a goroutine" is the second half of the bug.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subscribe&lt;/strong&gt;: I write one of these monthly on runtime mechanics, distributed systems postmortems, and the security implications of getting them wrong. &lt;a href="https://buttondown.com/harrisonsec" rel="noopener noreferrer"&gt;Newsletter&lt;/a&gt; · &lt;a href="https://harrisonsec.com/track/securitylab/" rel="noopener noreferrer"&gt;SecurityLab track&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;If you've hit this bug — or its cousin in a different runtime — I'd genuinely like to hear about it. The Erlang and Node.js shapes especially: I have hunches but not enough scars. Reply to the newsletter or open an issue on the demo repo.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>go</category>
      <category>concurrency</category>
      <category>performance</category>
      <category>backend</category>
    </item>
    <item>
      <title>How I Improved an AI Agent from 40% to 60% — With A/B Test Data</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Tue, 12 May 2026 15:49:19 +0000</pubDate>
      <link>https://dev.to/harrisonsec/how-i-improved-an-ai-agent-from-40-to-60-with-ab-test-data-4f2i</link>
      <guid>https://dev.to/harrisonsec/how-i-improved-an-ai-agent-from-40-to-60-with-ab-test-data-4f2i</guid>
      <description>&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I was optimizing an AI agent for a production system — a creator agent that handles user requests like "make this character fiercer" or "rename this entity." The agent runs a 5-layer pipeline: Perceive → Cognate → Decide → Act → Express, with real LLM calls at each step.&lt;/p&gt;

&lt;p&gt;Quality was bad. Not "it doesn't work" bad — "it works 40% of the time" bad. The remaining 60% were wrong entity targeting, infinite reasoning loops, and silent failures.&lt;/p&gt;

&lt;p&gt;I ran 5 standardized test cases, each repeated 5 times (LLMs are non-deterministic), measuring pass rate:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;th&gt;Baseline&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;QL-001&lt;/td&gt;
&lt;td&gt;Create 4 entities + 1 relationship in one message&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QL-002&lt;/td&gt;
&lt;td&gt;Classify user intent correctly&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QL-003&lt;/td&gt;
&lt;td&gt;Update the right entity in a world with 6 characters + 4 locations&lt;/td&gt;
&lt;td&gt;40%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QL-004&lt;/td&gt;
&lt;td&gt;Maintain context across long conversation&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QL-005&lt;/td&gt;
&lt;td&gt;Simple rename ("Ember" → "Infernia")&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Overall: 40% pass rate.&lt;/strong&gt; The model (GPT-4 class) was plenty capable. Something else was wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Diagnosis: Context Was the Problem
&lt;/h2&gt;

&lt;h3&gt;
  
  
  QL-003: Why the Agent Confused Entities (40% → 80%)
&lt;/h3&gt;

&lt;p&gt;The user says: "Make Ember more fierce and give her fire breath."&lt;/p&gt;

&lt;p&gt;The world has 10 entities: 6 characters (Ember, Luna, Grak, Roland, Mira, Pip) and 4 locations. The agent's &lt;code&gt;BuildChatCompletionMessages&lt;/code&gt; function dumped ALL entity data into the prompt — every character's backstory, every location's description.&lt;/p&gt;

&lt;p&gt;The LLM had to find Ember in a wall of irrelevant text. Sometimes it picked Luna. Sometimes it referenced the wrong character's traits. Not because the model was stupid — because the context was noisy.&lt;/p&gt;

&lt;h3&gt;
  
  
  QL-005: Why Simple Rename Failed (20% → 80%)
&lt;/h3&gt;

&lt;p&gt;"Rename Ember to Infernia." One entity, one operation. Should be trivial.&lt;/p&gt;

&lt;p&gt;Two problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;No round limit — the agent sometimes looped 15+ times on a rename, reasoning tools firing endlessly&lt;/li&gt;
&lt;li&gt;When a tool failed, the LLM got: &lt;code&gt;{"error": true, "message": "This tool is temporarily unavailable."}&lt;/code&gt; — no context on what to do next&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The model gave up or produced responses that didn't contain "Infernia."&lt;/p&gt;

&lt;h3&gt;
  
  
  QL-001: Why Multi-Step Creation Was Impossible (0% → 0%)
&lt;/h3&gt;

&lt;p&gt;"Create a dragon named Ember who lives in Crystal Caves. Ember has a rivalry with Sir Roland who guards the village gate."&lt;/p&gt;

&lt;p&gt;This requires creating 4 entities + 1 relationship. The 5-layer pipeline processes entities sequentially, each in isolation. The relationship creation doesn't know the knight was just created — there's no shared state between action steps.&lt;/p&gt;

&lt;p&gt;Both baseline and improved scored 0%. This is an architectural problem, not a context problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fixes: 8 Changes, 7 Pure Code
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Fix 1: PlanExecution (the only LLM call)
&lt;/h3&gt;

&lt;p&gt;One API call before the main loop. The LLM generates a plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Goal: Update Ember's properties
Steps: 1. Identify Ember entity  2. Apply personality changes
Tools needed: updateCharacter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This plan gets injected into the cognition layer's context. The intent classifier now sees a roadmap, not just raw entity data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; ~$0.003 per request, 3-5s latency. The only fix that uses an LLM call.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix 2: PrioritizeContext (pure code)
&lt;/h3&gt;

&lt;p&gt;Sort context items by salience score. Higher-relevance items go first. Low-relevance items are dropped when the token budget is exceeded.&lt;/p&gt;

&lt;p&gt;When the user says "Make Ember fiercer," Ember's data gets priority. Luna's backstory gets dropped. The LLM sees signal, not noise.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Salience&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Salience&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
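&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;tokenBudget&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c1"&gt;// guard: slicing past len(items) would panic&lt;/span&gt;
    &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tokenBudget&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;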
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Zero. Pure sort + filter.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix 3: CompressContext (pure code)
&lt;/h3&gt;

&lt;p&gt;Old conversation rounds get summarized extractively — find tool names, find CONCLUSION markers, truncate the rest. No LLM needed for this level of compression.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Zero. String operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix 4: Preserve Conclusions (pure code)
&lt;/h3&gt;

&lt;p&gt;When reasoning text is truncated at 4,000 characters, the truncation used to cut wherever it landed. If the LLM decided "I need to rename Ember to Infernia" in round 1 but that conclusion was at character 4,100, round 2 forgot the decision.&lt;/p&gt;

&lt;p&gt;Fix: &lt;code&gt;truncateReasoningPreservingConclusions()&lt;/code&gt; finds CONCLUSION/DECISION markers and keeps them even when truncating.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Zero. String search.&lt;/p&gt;
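
&lt;p&gt;A minimal sketch of what such a function might look like. The body is an assumption, not the production code: it assumes markers appear at the start of a line as &lt;code&gt;CONCLUSION:&lt;/code&gt; or &lt;code&gt;DECISION:&lt;/code&gt;, cuts on a line boundary so a marker is never split, and appends any marker lines that fell in the discarded tail.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed marker prefixes; the exact marker format is a guess.
var markers = []string{"CONCLUSION:", "DECISION:"}

// truncatePreservingConclusions keeps roughly the first limit bytes of text,
// then appends marker lines found in the part that was cut off. The result
// can exceed limit by the length of the preserved lines.
// Byte-based for brevity; limit is assumed to be non-negative.
func truncatePreservingConclusions(text string, limit int) string {
	if limit >= len(text) {
		return text
	}
	// Move the cut back to the previous newline so no line is split in two.
	if i := strings.LastIndex(text[:limit], "\n"); i >= 0 {
		limit = i
	}
	head, tail := text[:limit], text[limit:]

	var kept []string
	for _, line := range strings.Split(tail, "\n") {
		trimmed := strings.TrimSpace(line)
		for _, m := range markers {
			if strings.HasPrefix(trimmed, m) {
				kept = append(kept, trimmed)
				break
			}
		}
	}
	if len(kept) == 0 {
		return head
	}
	return head + "\n[...truncated...]\n" + strings.Join(kept, "\n")
}

func main() {
	text := "thinking about entities...\n" +
		strings.Repeat("filler reasoning\n", 5) +
		"CONCLUSION: rename Ember to Infernia\n"
	fmt.Println(truncatePreservingConclusions(text, 40))
}
```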

&lt;h3&gt;
  
  
  Fix 5: Max Rounds Cap (pure code)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;DefaultMaxRounds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;roundCount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;DefaultMaxRounds&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Previously unlimited. The agent sometimes looped 15+ rounds on a trivial task. Now it stops at 10 and produces its best result.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Zero. One if-statement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix 6: Structured Tool Errors (pure code)
&lt;/h3&gt;

&lt;p&gt;Before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"tool_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"updateCharacter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"This tool is temporarily unavailable."&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"tool_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"updateCharacter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"This tool is temporarily unavailable."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"error_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"timeout"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"retryable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;code&gt;retryable: true&lt;/code&gt;, the LLM knows to try again instead of giving up. With &lt;code&gt;error_type: "timeout"&lt;/code&gt;, it knows the issue is transient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Zero. String classification.&lt;/p&gt;
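
&lt;p&gt;The classification step can be sketched roughly like this. The struct mirrors the JSON shown above; the substrings being matched and every category other than &lt;code&gt;timeout&lt;/code&gt; are assumptions for illustration, as are the &lt;code&gt;classify&lt;/code&gt; and &lt;code&gt;newToolError&lt;/code&gt; names.&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ToolError mirrors the "after" JSON shape shown above.
type ToolError struct {
	Error     bool   `json:"error"`
	ToolName  string `json:"tool_name"`
	Message   string `json:"message"`
	ErrorType string `json:"error_type"`
	Retryable bool   `json:"retryable"`
}

// classify maps a raw error string to an error type and a retry hint.
// The matched substrings and non-timeout categories are assumptions.
func classify(raw string) (errorType string, retryable bool) {
	lower := strings.ToLower(raw)
	switch {
	case strings.Contains(lower, "timeout"), strings.Contains(lower, "deadline exceeded"):
		return "timeout", true
	case strings.Contains(lower, "not found"):
		return "not_found", false
	case strings.Contains(lower, "invalid"):
		return "validation", false
	default:
		return "unknown", false
	}
}

// newToolError builds the structured error the LLM sees for a failed tool call.
func newToolError(tool, message, raw string) ToolError {
	errorType, retryable := classify(raw)
	return ToolError{
		Error:     true,
		ToolName:  tool,
		Message:   message,
		ErrorType: errorType,
		Retryable: retryable,
	}
}

func main() {
	te := newToolError("updateCharacter", "This tool is temporarily unavailable.", "context deadline exceeded")
	out, err := json.Marshal(te)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

&lt;p&gt;Defaulting unknown errors to non-retryable is a design choice in this sketch: it keeps the agent from spending rounds retrying something that will never succeed.&lt;/p&gt;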

&lt;h3&gt;
  
  
  Fix 7: Circuit Breaker (pure code)
&lt;/h3&gt;

&lt;p&gt;Count failures per LLM provider. After 3 consecutive failures, skip that provider and try the fallback. Prevents the agent from burning through 120 seconds of timeout on a dead provider.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Zero. Counter + threshold.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix 8: HTTP Client Reuse (pure code)
&lt;/h3&gt;

&lt;p&gt;Store &lt;code&gt;*http.Client&lt;/code&gt; on the provider struct, reuse across calls. Previously each call created a new client, a new TCP connection, a new TLS handshake.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Zero. Struct field.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;Baseline&lt;/th&gt;
&lt;th&gt;After Fix&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;th&gt;What Fixed It&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;QL-001&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;=&lt;/td&gt;
&lt;td&gt;Needs pipeline architecture change&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QL-002&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;td&gt;=&lt;/td&gt;
&lt;td&gt;Already working&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QL-003&lt;/td&gt;
&lt;td&gt;40%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;80%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+40%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PrioritizeContext + PlanExecution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QL-004&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;=&lt;/td&gt;
&lt;td&gt;Already working&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QL-005&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;80%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+60%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Max rounds + structured errors + conclusion preservation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Overall: 40% → 60%. Same model. Better input.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Latency went from 26s to 43s due to the PlanExecution LLM call (~3-5s per test). The HTTP reuse and circuit breaker savings show up under concurrent load, not in a 5-test sequential run.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Didn't Improve — And Why
&lt;/h2&gt;

&lt;p&gt;QL-001 (multi-step creation) stayed at 0%. This isn't a context problem — it's a pipeline architecture problem. Each entity is created in isolation, and the IDs returned by each step are discarded before the next step runs:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBVWyJVc2VyOiBDcmVhdGUgRW1iZXIgKyBSb2xhbmQ8YnIvPisgcml2YWxyeSBiZXR3ZWVuIHRoZW0iXQogICAgVSAtLT4gUzFbIlN0ZXAgMTxici8-Y3JlYXRlQ2hhcmFjdGVyKEVtYmVyKTxici8-4oaSIHJldHVybnMgZHJhZ29uX2lkXzQyIl0KICAgIFMxIC0uIHN0YXRlIGRpc2NhcmRlZCAuLT4gUzJbIlN0ZXAgMjxici8-Y3JlYXRlQ2hhcmFjdGVyKFJvbGFuZCk8YnIvPuKGkiByZXR1cm5zIGtuaWdodF9pZF83NyJdCiAgICBTMiAtLiBzdGF0ZSBkaXNjYXJkZWQgLi0-IFMzWyJTdGVwIDM8YnIvPmNyZWF0ZVJlbGF0aW9uc2hpcCg_LCA_KTxici8-bm8gSURzIGF2YWlsYWJsZSJdCiAgICBTMyAtLXggRlsiUmVsYXRpb25zaGlwIGZhaWxzPGJyLz5RTC0wMDE6IDAlIHBhc3MgcmF0ZSJd" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBVWyJVc2VyOiBDcmVhdGUgRW1iZXIgKyBSb2xhbmQ8YnIvPisgcml2YWxyeSBiZXR3ZWVuIHRoZW0iXQogICAgVSAtLT4gUzFbIlN0ZXAgMTxici8-Y3JlYXRlQ2hhcmFjdGVyKEVtYmVyKTxici8-4oaSIHJldHVybnMgZHJhZ29uX2lkXzQyIl0KICAgIFMxIC0uIHN0YXRlIGRpc2NhcmRlZCAuLT4gUzJbIlN0ZXAgMjxici8-Y3JlYXRlQ2hhcmFjdGVyKFJvbGFuZCk8YnIvPuKGkiByZXR1cm5zIGtuaWdodF9pZF83NyJdCiAgICBTMiAtLiBzdGF0ZSBkaXNjYXJkZWQgLi0-IFMzWyJTdGVwIDM8YnIvPmNyZWF0ZVJlbGF0aW9uc2hpcCg_LCA_KTxici8-bm8gSURzIGF2YWlsYWJsZSJdCiAgICBTMyAtLXggRlsiUmVsYXRpb25zaGlwIGZhaWxzPGJyLz5RTC0wMDE6IDAlIHBhc3MgcmF0ZSJd" alt="mermaid diagram" width="1607" height="118"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fixing this requires collapsing the 5-layer pipeline into a unified agent with cross-step state — a larger architectural change, not a context fix.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The lesson:&lt;/strong&gt; Context optimization has a ceiling. Past that ceiling, you need architecture changes. But the ceiling is higher than most people think: we still had 20 percentage points of improvement available before hitting it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Still Missing
&lt;/h2&gt;

&lt;p&gt;Three pieces of infrastructure were built but not wired:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;Gap&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;VerifyOutput&lt;/td&gt;
&lt;td&gt;Logs quality issues&lt;/td&gt;
&lt;td&gt;Doesn't retry on failure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ScoreMemoryUsage&lt;/td&gt;
&lt;td&gt;Computes relevance scores&lt;/td&gt;
&lt;td&gt;Scores never applied to future retrieval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PlanExecution&lt;/td&gt;
&lt;td&gt;Generates plan before loop&lt;/td&gt;
&lt;td&gt;Plan not tracked during execution&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All three are &lt;strong&gt;open loops&lt;/strong&gt;. The infrastructure detects problems but doesn't act on them. Closing these loops is the next 20% — getting from 60% to 80%+.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Better input → better output. The LLM is the same.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your agent is underperforming, check the context before blaming the model. In our case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;7 out of 8 fixes were pure code&lt;/li&gt;
&lt;li&gt;Zero additional LLM cost (except one planning call at $0.003)&lt;/li&gt;
&lt;li&gt;A 20-point gain in pass rate without changing the model&lt;/li&gt;
&lt;li&gt;The model was always capable — the context was holding it back&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The highest-ROI investment in any agent system is context management. It's not glamorous. It's sort, filter, compress, truncate, prioritize. But it's the difference between 40% and 60% — and the foundation for everything else.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of the AI Agent Architecture series. See also: &lt;a href="https://harrisonsec.com/blog/ai-agent-90-percent-problem/" rel="noopener noreferrer"&gt;The 90% Problem&lt;/a&gt; for the broader framework, and &lt;a href="https://harrisonsec.com/blog/claude-code-context-engineering-compression-pipeline/" rel="noopener noreferrer"&gt;Claude Code Deep Dive Part 3&lt;/a&gt; for how Anthropic solves context at scale.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>contextmanagement</category>
      <category>engineering</category>
      <category>testing</category>
    </item>
    <item>
      <title>Consistency in Distributed Systems: Scenarios, Trade-offs, and What Actually Works</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Wed, 06 May 2026 15:50:16 +0000</pubDate>
      <link>https://dev.to/harrisonsec/consistency-in-distributed-systems-scenarios-trade-offs-and-what-actually-works-42fd</link>
      <guid>https://dev.to/harrisonsec/consistency-in-distributed-systems-scenarios-trade-offs-and-what-actually-works-42fd</guid>
      <description>&lt;p&gt;There's an impulse, when someone first learns about consistency models in distributed systems, to want to classify the taxonomy into neat drawers. Strong here. Eventual there. Linearizable above it. Read-your-writes below. Study the diagram, pass the interview.&lt;/p&gt;

&lt;p&gt;That taxonomy is real, but it's not useful the way people think. Production systems don't pick a consistency model and run with it. They pick a different model per feature, often per &lt;em&gt;type of operation&lt;/em&gt; within a feature, and spend most of their engineering effort on the gaps between what the model provides and what users actually expect. The taxonomy is the menu. The interesting question is which dish each scenario needs.&lt;/p&gt;

&lt;p&gt;This is a working engineer's walk through ten real consistency scenarios — from the obvious ones (money transfers need strong) to the less obvious (collaborative editing, notification feeds, analytic dashboards) — with the specific engineering that makes each one work.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — Consistency is not a global system property; it's a per-operation property. A well-designed distributed system picks different consistency levels for different operations based on what users actually notice, what the business actually requires, and what latency budget each operation has. The CAP-theorem framing ("pick 2 of 3") is a caricature; real systems use PACELC (which adds the latency trade-off during normal operation) and pick per-feature.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Frames That Matter
&lt;/h2&gt;

&lt;p&gt;Before scenarios, three frames you actually use in practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CAP&lt;/strong&gt; (Consistency, Availability, Partition tolerance, pick 2). Useful as a first-week mental model. Misleading if taken literally, because (a) you can't give up partition tolerance in a real network, and (b) the choice isn't binary — you can tune per operation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PACELC&lt;/strong&gt;: if there's a Partition, pick A (availability) or C (consistency). Else, pick L (latency) or C (consistency). Adds the latency trade-off you pay during normal operation, which is where 99% of design decisions actually live. A system that's "consistent when no partition" but pays 50ms of cross-region round-trip for every write has made a latency-vs-consistency call, not a CAP call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Consistency models, from strongest to weakest&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Linearizable&lt;/strong&gt;: operations appear to happen instantaneously, in a total order consistent with real time. The strongest practical model. Expensive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sequential&lt;/strong&gt;: operations appear in a single total order that respects each client's own program order, but that order is not necessarily aligned with real time. Slightly weaker, slightly cheaper.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Causal&lt;/strong&gt;: if event A causally precedes event B, every observer sees A before B. Preserves the "this reply should appear after the comment it replied to" property.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read-your-writes&lt;/strong&gt;: you see the effects of your own operations, even if other users don't yet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monotonic read&lt;/strong&gt;: once you see a value, you won't see an older value later.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eventual&lt;/strong&gt;: if writes stop, replicas eventually converge. No ordering guarantees during the transient.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don't need to memorize these. You need to recognize which one each feature actually needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBTdHJvbmdbIlN0cm9uZ2VyIMK3IG1vcmUgZXhwZW5zaXZlIl0KICAgICAgICBMWyJMaW5lYXJpemFibGU8YnIvPk1vbmV5IHRyYW5zZmVyIMK3IGRpc3RyaWJ1dGVkIGxvY2tzIl0KICAgICAgICBTUVsiU2VxdWVudGlhbDxici8-TXVsdGktbGVhZGVyIHdpdGggY2xvY2sgc3luYyJdCiAgICBlbmQKCiAgICBzdWJncmFwaCBNaWRbIk1pZGRsZSBncm91bmQgwrcgdXN1YWxseSB0aGUgcmlnaHQgYW5zd2VyIl0KICAgICAgICBDQVsiQ2F1c2FsPGJyLz5Tb2NpYWwgZmVlZCDCtyByZXBsaWVzIGFmdGVyIGNvbW1lbnRzIl0KICAgICAgICBSWVsiUmVhZC15b3VyLXdyaXRlczxici8-VXNlciBwcm9maWxlIMK3IHNldHRpbmdzIl0KICAgICAgICBNUlsiTW9ub3RvbmljIHJlYWQ8YnIvPlBhZ2luYXRpb24gwrcgZGFzaGJvYXJkcyJdCiAgICBlbmQKCiAgICBzdWJncmFwaCBXZWFrWyJXZWFrZXIgwrcgY2hlYXBlciBhbmQgZmFzdGVyIl0KICAgICAgICBFQ1siRXZlbnR1YWw8YnIvPkNvdW50ZXJzIMK3IGFuYWx5dGljcyDCtyBDRE4iXQogICAgZW5kCgogICAgU3Ryb25nIC0tPiBNaWQgLS0-IFdlYWsKCiAgICBjbGFzc0RlZiBzdHJvbmcgZmlsbDojZmVkN2Q3LHN0cm9rZTojYzUzMDMwCiAgICBjbGFzc0RlZiBtaWQgZmlsbDojZmVmNWU3LHN0cm9rZTojYjc3OTFmCiAgICBjbGFzc0RlZiB3ZWFrIGZpbGw6I2YwZmZmNCxzdHJva2U6IzJmODU1YQogICAgY2xhc3MgU3Ryb25nIHN0cm9uZwogICAgY2xhc3MgTWlkIG1pZAogICAgY2xhc3MgV2VhayB3ZWFr" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBTdHJvbmdbIlN0cm9uZ2VyIMK3IG1vcmUgZXhwZW5zaXZlIl0KICAgICAgICBMWyJMaW5lYXJpemFibGU8YnIvPk1vbmV5IHRyYW5zZmVyIMK3IGRpc3RyaWJ1dGVkIGxvY2tzIl0KICAgICAgICBTUVsiU2VxdWVudGlhbDxici8-TXVsdGktbGVhZGVyIHdpdGggY2xvY2sgc3luYyJdCiAgICBlbmQKCiAgICBzdWJncmFwaCBNaWRbIk1pZGRsZSBncm91bmQgwrcgdXN1YWxseSB0aGUgcmlnaHQgYW5zd2VyIl0KICAgICAgICBDQVsiQ2F1c2FsPGJyLz5Tb2NpYWwgZmVlZCDCtyByZXBsaWVzIGFmdGVyIGNvbW1lbnRzIl0KICAgICAgICBSWVsiUmVhZC15b3VyLXdyaXRlczxici8-VXNlciBwcm9maWxlIMK3IHNldHRpbmdzIl0KICAgICAgICBNUlsiTW9ub3RvbmljIHJlYWQ8YnIvPlBhZ2luYXRpb24gwrcgZGFzaGJvYXJkcyJdCiAgICBlbmQKCiAgICBzdWJncmFwaCBXZWFrWyJXZWFrZXIgwrcgY2hlYXBlciBhbmQgZmFzdGVyIl0KICAgICAgICBFQ1siRXZlbnR1YWw8YnIvPkNvdW50ZXJzIMK3IGFuYWx5dGljcyDCtyBDRE4iXQogICAgZW5kCgogICAgU3Ryb25nIC0tPiBNaWQgLS0-IFdlYWsKCiAgICBjbGFzc0RlZiBzdHJvbmcgZmlsbDojZmVkN2Q3LHN0cm9rZTojYzUzMDMwCiAgICBjbGFzc0RlZiBtaWQgZmlsbDojZmVmNWU3LHN0cm9rZTojYjc3OTFmCiAgICBjbGFzc0RlZiB3ZWFrIGZpbGw6I2YwZmZmNCxzdHJva2U6IzJmODU1YQogICAgY2xhc3MgU3Ryb25nIHN0cm9uZwogICAgY2xhc3MgTWlkIG1pZAogICAgY2xhc3MgV2VhayB3ZWFr" alt="flowchart LR" width="1904" height="189"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Moving left to right: cheaper, faster, less coordinated — and more work you do in application code to close the gap between what the model gives you and what users expect.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ten Scenarios
&lt;/h2&gt;

&lt;p&gt;Quick reference — each row is expanded into its own section below.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Consistency Model&lt;/th&gt;
&lt;th&gt;Key Technique&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Money transfer between accounts&lt;/td&gt;
&lt;td&gt;Linearizable&lt;/td&gt;
&lt;td&gt;Synchronous quorum + idempotency keys&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Inventory decrement, hot key&lt;/td&gt;
&lt;td&gt;Strong w/ sharding&lt;/td&gt;
&lt;td&gt;Reserved-inventory buckets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;User profile update&lt;/td&gt;
&lt;td&gt;Read-your-writes&lt;/td&gt;
&lt;td&gt;Session timestamp + sticky read&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Social media feed&lt;/td&gt;
&lt;td&gt;Causal&lt;/td&gt;
&lt;td&gt;Version vectors / Lamport timestamps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Collaborative document editing&lt;/td&gt;
&lt;td&gt;Eventual + CRDT/OT&lt;/td&gt;
&lt;td&gt;Conflict-free operations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Ad click counter&lt;/td&gt;
&lt;td&gt;Eventual&lt;/td&gt;
&lt;td&gt;Local shard + async aggregation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Multi-region primary/secondary&lt;/td&gt;
&lt;td&gt;Eventual + RYW on demand&lt;/td&gt;
&lt;td&gt;Primary routing per write&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Distributed lock / leader election&lt;/td&gt;
&lt;td&gt;Linearizable&lt;/td&gt;
&lt;td&gt;Raft/Paxos consensus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Analytics dashboard&lt;/td&gt;
&lt;td&gt;Append-only / none&lt;/td&gt;
&lt;td&gt;Stream → warehouse ETL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Cross-service orchestration&lt;/td&gt;
&lt;td&gt;Saga&lt;/td&gt;
&lt;td&gt;Local txns + compensations&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  1. Money Transfer Between Accounts
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: strict linearizability. No double-spend. No lost updates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: transactional database with serializable isolation, or a strongly-consistent coordination layer (Paxos/Raft quorum). Typical implementation: single-region primary Postgres with synchronous replication, or a distributed SQL database (Spanner, CockroachDB, YugabyteDB) with linearizable reads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you give up&lt;/strong&gt;: latency (especially cross-region), availability during partitions. This is the right trade — a bank doesn't tolerate double-spend to save 30ms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key engineering&lt;/strong&gt;: idempotency keys on every request, deduplication at the persistence layer, well-audited transaction boundaries. Strong consistency at the DB isn't enough if your retry logic double-writes.&lt;/p&gt;
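&lt;p&gt;A minimal sketch of the idempotency-key idea, assuming an in-memory ledger where a mutex stands in for a serializable transaction. The &lt;code&gt;Ledger&lt;/code&gt; type, the &lt;code&gt;Transfer&lt;/code&gt; signature, and the key format are all hypothetical; a real system would store the key and the balance change in the same database transaction.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrInsufficientFunds is returned when the source account cannot cover the transfer.
var ErrInsufficientFunds = errors.New("insufficient funds")

// Ledger is a toy in-memory stand-in for a transactional store.
// The mutex plays the role of a serializable transaction.
type Ledger struct {
	mu       sync.Mutex
	balances map[string]int64 // account ID to balance in cents
	applied  map[string]error // idempotency key to recorded outcome
}

// NewLedger builds a ledger from starting balances.
func NewLedger(balances map[string]int64) *Ledger {
	l := new(Ledger)
	l.balances = balances
	l.applied = make(map[string]error)
	return l
}

// Transfer moves amount cents between accounts at most once per key.
// A retry that reuses the key gets the recorded outcome and moves no money.
func (l *Ledger) Transfer(key, from, to string, amount int64) error {
	l.mu.Lock()
	defer l.mu.Unlock()

	if outcome, seen := l.applied[key]; seen {
		return outcome
	}
	var outcome error
	if amount > l.balances[from] {
		outcome = ErrInsufficientFunds
	} else {
		l.balances[from] -= amount
		l.balances[to] += amount
	}
	l.applied[key] = outcome
	return outcome
}

// Balance returns the current balance of one account.
func (l *Ledger) Balance(account string) int64 {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.balances[account]
}

func main() {
	l := NewLedger(map[string]int64{"alice": 10000, "bob": 0})
	// The client times out and retries twice with the same key.
	for _, attempt := range []int{1, 2, 3} {
		err := l.Transfer("req-42", "alice", "bob", 2500)
		fmt.Println("attempt", attempt, "err:", err)
	}
	fmt.Println("alice:", l.Balance("alice"), "bob:", l.Balance("bob"))
}
```

&lt;p&gt;Three attempts with one key move the money once: alice ends at 7500 and bob at 2500. The point of the sketch is that the dedup record and the balance change commit together; splitting them across two stores reopens the double-write window.&lt;/p&gt;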

&lt;h3&gt;
  
  
  2. Inventory Decrement with High Contention
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: "no overselling" without blocking every request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: the classic "hot key" problem. Options in ascending sophistication:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pessimistic locking&lt;/strong&gt; — &lt;code&gt;SELECT ... FOR UPDATE&lt;/code&gt; on the inventory row. Works; serializes hot items. Under peak traffic on Black Friday, this queues up and tail latencies explode.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimistic concurrency&lt;/strong&gt; — read version, decrement, compare-and-swap. Retries on conflict. Better tail latency at moderate contention, worse at very high contention (retry storms).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reserved-inventory buckets&lt;/strong&gt;: split the available inventory into N "shards" and route each request to a random shard, so lock contention is spread across N rows instead of piling onto one. The cost is a small risk of false "out of stock" responses (if shard A has 0 left but shard B has 5, a user routed to A might be told "out of stock" while 5 remain in total), traded for huge throughput wins.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best-effort with async reconciliation&lt;/strong&gt; — accept orders optimistically, reconcile at a background worker, cancel overbooks with apology emails. Used by event-ticketing sites for popular drops.&lt;/li&gt;
&lt;/ul&gt;
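&lt;p&gt;The bucket option can be sketched roughly like this, assuming in-process buckets guarded by mutexes in place of database rows. &lt;code&gt;Inventory&lt;/code&gt;, &lt;code&gt;Reserve&lt;/code&gt;, and the routing &lt;code&gt;hint&lt;/code&gt; (standing in for a random or hashed shard choice) are illustrative names, not a real library.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// bucket holds one slice of the total stock behind its own lock.
type bucket struct {
	mu   sync.Mutex
	left int
}

// Inventory splits stock across buckets so concurrent buyers rarely share a lock.
type Inventory struct {
	buckets []*bucket
}

// NewInventory spreads total units over n buckets (n must be positive).
func NewInventory(total, n int) *Inventory {
	inv := new(Inventory)
	for i := 0; i != n; i++ {
		b := new(bucket)
		b.left = total / n
		if total%n > i {
			b.left++ // spread the remainder over the first buckets
		}
		inv.buckets = append(inv.buckets, b)
	}
	return inv
}

// Reserve takes one unit from the bucket selected by hint, for example a hash
// of the request ID. This sketch never oversells, but it can report
// "out of stock" while other buckets still hold units.
func (inv *Inventory) Reserve(hint uint) bool {
	b := inv.buckets[hint%uint(len(inv.buckets))]
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.left == 0 {
		return false
	}
	b.left--
	return true
}

// Remaining sums what is left across all buckets.
func (inv *Inventory) Remaining() int {
	total := 0
	for _, b := range inv.buckets {
		b.mu.Lock()
		total += b.left
		b.mu.Unlock()
	}
	return total
}

func main() {
	inv := NewInventory(10, 4) // buckets hold 3, 3, 2, 2
	var wg sync.WaitGroup
	var mu sync.Mutex
	sold := 0
	for req := uint(0); req != 40; req++ {
		wg.Add(1)
		go func(req uint) {
			defer wg.Done()
			if inv.Reserve(req) {
				mu.Lock()
				sold++
				mu.Unlock()
			}
		}(req)
	}
	wg.Wait()
	fmt.Println("sold:", sold, "remaining:", inv.Remaining())
}
```

&lt;p&gt;Whether a request that hits an empty bucket should try a second bucket before giving up is a product decision; the sketch leaves that out on purpose.&lt;/p&gt;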

&lt;p&gt;The right choice depends on business rules. If overselling by even 1% is unacceptable, use pessimistic locking. If overselling by 0.1% is tolerable and user experience matters more, use shards or asynchronous reconciliation.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. User Profile Update
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: read-your-writes. After I save my display name, I see it on next page load.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: sticky reads. Either the session pins to the write replica for a short window, or the application tracks a "last write timestamp" per user and refuses to serve reads from a replica that hasn't caught up.&lt;/p&gt;

&lt;p&gt;The naive alternative — "eventually consistent, just retry" — breaks user expectations immediately. "I updated my name and it didn't save" is one of the most expensive support tickets on a per-incident basis, because the user has no way to distinguish "didn't save" from "saved but replication is lagging."&lt;/p&gt;

&lt;p&gt;The engineering is not glamorous. A session cookie that carries &lt;code&gt;last_write_ts&lt;/code&gt;, a read path that asserts &lt;code&gt;replica.latest_ts &amp;gt;= last_write_ts&lt;/code&gt;, and a fallback to the primary if the assertion fails. Most frameworks don't give you this for free; you build it.&lt;/p&gt;
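&lt;p&gt;As a rough sketch of that read path, assuming each node exposes the commit timestamp it has applied up to. The &lt;code&gt;Node&lt;/code&gt; and &lt;code&gt;Session&lt;/code&gt; types and the &lt;code&gt;pickReader&lt;/code&gt; helper are hypothetical; only the comparison between the replica timestamp and &lt;code&gt;last_write_ts&lt;/code&gt; comes from the description above.&lt;/p&gt;

```go
package main

import "fmt"

// Node is a hypothetical view of a database node: the commit timestamp it has
// applied up to, and the profile data it would serve.
type Node struct {
	Name      string
	AppliedTS int64
	Profiles  map[string]string
}

// Session carries the timestamp of the user's most recent write,
// for example in a signed cookie.
type Session struct {
	LastWriteTS int64
}

// pickReader returns the replica when it has caught up to the session's last
// write, and falls back to the primary otherwise.
func pickReader(s Session, primary, replica Node) Node {
	if replica.AppliedTS >= s.LastWriteTS {
		return replica
	}
	return primary
}

func main() {
	primary := Node{Name: "primary", AppliedTS: 105, Profiles: map[string]string{"u1": "Harrison G."}}
	replica := Node{Name: "replica", AppliedTS: 100, Profiles: map[string]string{"u1": "Harrison"}}

	justWrote := Session{LastWriteTS: 105} // saved a new display name at T=105
	idle := Session{LastWriteTS: 90}       // last write replicated long ago

	for _, s := range []Session{justWrote, idle} {
		n := pickReader(s, primary, replica)
		fmt.Println(n.Name, "serves", n.Profiles["u1"])
	}
}
```

&lt;p&gt;The user who just wrote is routed to the primary and sees the new name; the idle user reads the replica. Only sessions with a recent write pay the primary round-trip.&lt;/p&gt;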

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fc2VxdWVuY2VEaWFncmFtCiAgICBhdXRvbnVtYmVyCiAgICBwYXJ0aWNpcGFudCBVIGFzIFVzZXIKICAgIHBhcnRpY2lwYW50IEFwcCBhcyBBcHAgU2VydmVyCiAgICBwYXJ0aWNpcGFudCBQIGFzIFByaW1hcnkKICAgIHBhcnRpY2lwYW50IFIgYXMgUmVwbGljYQoKICAgIFUtPj5BcHA6IFVwZGF0ZSBwcm9maWxlCiAgICBBcHAtPj5QOiBXcml0ZSAocmV0dXJucyBjb21taXRfdHMgVDEpCiAgICBQLS0-PkFwcDogb2sKICAgIEFwcC0tPj5VOiBSZXNwb25zZSArIGNvb2tpZSB7bGFzdF93cml0ZV90czogVDF9CgogICAgTm90ZSBvdmVyIFUsQXBwOiBOZXh0IHBhZ2UgbG9hZCDigJQgdXNlciBoYXMgY29va2llIFQxCiAgICBVLT4-QXBwOiBSZWFkIHByb2ZpbGUgKGNvb2tpZSBUMSkKICAgIEFwcC0-PlI6IHJlcGxpY2FfdHMgPj0gVDEgPwoKICAgIGFsdCBSZXBsaWNhIGNhdWdodCB1cAogICAgICAgIFItLT4-QXBwOiB5ZXMKICAgICAgICBBcHAtLT4-VTogUHJvZmlsZSAoZnJvbSByZXBsaWNhIMK3IGZhc3QpCiAgICBlbHNlIFJlcGxpY2EgbGFnZ2luZwogICAgICAgIFItLT4-QXBwOiBubywgcmVwbGljYV90cyA8IFQxCiAgICAgICAgQXBwLT4-UDogUmVhZCBmcm9tIHByaW1hcnkKICAgICAgICBQLS0-PkFwcDogUHJvZmlsZSBkYXRhCiAgICAgICAgQXBwLS0-PlU6IFByb2ZpbGUgKGZyb20gcHJpbWFyeSDCtyBjb3JyZWN0KQogICAgZW5k" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fc2VxdWVuY2VEaWFncmFtCiAgICBhdXRvbnVtYmVyCiAgICBwYXJ0aWNpcGFudCBVIGFzIFVzZXIKICAgIHBhcnRpY2lwYW50IEFwcCBhcyBBcHAgU2VydmVyCiAgICBwYXJ0aWNpcGFudCBQIGFzIFByaW1hcnkKICAgIHBhcnRpY2lwYW50IFIgYXMgUmVwbGljYQoKICAgIFUtPj5BcHA6IFVwZGF0ZSBwcm9maWxlCiAgICBBcHAtPj5QOiBXcml0ZSAocmV0dXJucyBjb21taXRfdHMgVDEpCiAgICBQLS0-PkFwcDogb2sKICAgIEFwcC0tPj5VOiBSZXNwb25zZSArIGNvb2tpZSB7bGFzdF93cml0ZV90czogVDF9CgogICAgTm90ZSBvdmVyIFUsQXBwOiBOZXh0IHBhZ2UgbG9hZCDigJQgdXNlciBoYXMgY29va2llIFQxCiAgICBVLT4-QXBwOiBSZWFkIHByb2ZpbGUgKGNvb2tpZSBUMSkKICAgIEFwcC0-PlI6IHJlcGxpY2FfdHMgPj0gVDEgPwoKICAgIGFsdCBSZXBsaWNhIGNhdWdodCB1cAogICAgICAgIFItLT4-QXBwOiB5ZXMKICAgICAgICBBcHAtLT4-VTogUHJvZmlsZSAoZnJvbSByZXBsaWNhIMK3IGZhc3QpCiAgICBlbHNlIFJlcGxpY2EgbGFnZ2luZwogICAgICAgIFItLT4-QXBwOiBubywgcmVwbGljYV90cyA8IFQxCiAgICAgICAgQXBwLT4-UDogUmVhZCBmcm9tIHByaW1hcnkKICAgICAgICBQLS0-PkFwcDogUHJvZmlsZSBkYXRhCiAgICAgICAgQXBwLS0-PlU6IFByb2ZpbGUgKGZyb20gcHJpbWFyeSDCtyBjb3JyZWN0KQogICAgZW5k" alt="sequenceDiagram" width="1026" height="866"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Social Media Feed
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: causal consistency for comments and replies. Eventual consistency everywhere else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: two-tier. Posts and likes are written to a local region with async replication. Replies are linked to their parent post with an explicit cause-precedes relationship — the reply's store won't surface the reply until the parent has propagated.&lt;/p&gt;

&lt;p&gt;The CRDT-adjacent pattern (version vectors, Lamport timestamps) sits underneath, but you don't usually expose it to the application. What you expose is "here's the list of replies, in causally-consistent order." What the user sees: "I replied to a comment, and my reply appears under it" — which is exactly the mental model they expect.&lt;/p&gt;
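&lt;p&gt;A minimal sketch of the "don't surface the reply before its parent" rule, assuming each reply carries an explicit parent ID. This uses plain dependency tracking in place of version vectors, and &lt;code&gt;RegionStore&lt;/code&gt; and its methods are made-up names for illustration.&lt;/p&gt;

```go
package main

import "fmt"

// Message is a post (empty ParentID) or a reply to another message.
type Message struct {
	ID       string
	ParentID string
	Body     string
}

// RegionStore is a hypothetical per-region store that only surfaces a message
// once its causal parent is visible.
type RegionStore struct {
	visible map[string]bool
	order   []Message            // messages in the order they became visible
	pending map[string][]Message // parent ID to replies waiting on it
}

// NewRegionStore returns an empty store.
func NewRegionStore() *RegionStore {
	s := new(RegionStore)
	s.visible = make(map[string]bool)
	s.pending = make(map[string][]Message)
	return s
}

// Receive handles a replicated message, which may arrive before its parent.
func (s *RegionStore) Receive(m Message) {
	if s.visible[m.ID] {
		return // duplicate delivery
	}
	if m.ParentID != "" {
		if !s.visible[m.ParentID] {
			s.pending[m.ParentID] = append(s.pending[m.ParentID], m)
			return
		}
	}
	s.surface(m)
}

// surface makes m visible, then releases any replies that were waiting on it.
func (s *RegionStore) surface(m Message) {
	s.visible[m.ID] = true
	s.order = append(s.order, m)
	waiting := s.pending[m.ID]
	delete(s.pending, m.ID)
	for _, r := range waiting {
		s.surface(r)
	}
}

// Feed lists visible message IDs in causally consistent order.
func (s *RegionStore) Feed() []string {
	ids := make([]string, 0, len(s.order))
	for _, m := range s.order {
		ids = append(ids, m.ID)
	}
	return ids
}

func main() {
	s := NewRegionStore()
	// Replication delivers the reply before the post it answers.
	s.Receive(Message{ID: "reply-1", ParentID: "post-1", Body: "Agreed."})
	fmt.Println("before parent:", s.Feed())
	s.Receive(Message{ID: "post-1", Body: "Consistency is per-operation."})
	fmt.Println("after parent: ", s.Feed())
}
```

&lt;p&gt;The feed stays empty until the post arrives, then shows the post followed by its reply. The application only ever sees the ordered list, which is the point made above.&lt;/p&gt;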

&lt;p&gt;What you save by not using strong consistency everywhere: low write latency (local region only), high availability during partitions, and the ability to handle massive fan-out (a celebrity's post propagating to 40M followers doesn't need to wait on a single coordinator).&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Collaborative Document Editing
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: offline-first, multi-user concurrent edits, always-eventually-converge, no lost updates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: CRDT (conflict-free replicated data type) or OT (operational transformation). This is one of the few spots where CRDTs genuinely shine. The underlying math guarantees that any two replicas will converge to the same state, regardless of the order operations arrive in, as long as all operations eventually reach all replicas.&lt;/p&gt;
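&lt;p&gt;To make the convergence property concrete, here is a sketch of a grow-only set, one of the simplest CRDTs. It is far simpler than anything a real document editor uses (those need sequence CRDTs or OT), and the &lt;code&gt;GSet&lt;/code&gt; type is written for this example, but it shows why merge order and duplicate delivery stop mattering.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
)

// GSet is a grow-only set: elements can be added but never removed,
// and merge is set union.
type GSet map[string]struct{}

// Add inserts one element into the local replica.
func (s GSet) Add(v string) { s[v] = struct{}{} }

// Merge folds other into s. Union is commutative, associative, and idempotent,
// which is what makes delivery order and duplicates irrelevant.
func (s GSet) Merge(other GSet) {
	for v := range other {
		s[v] = struct{}{}
	}
}

// Values returns the elements in sorted order for stable comparison.
func (s GSet) Values() []string {
	out := make([]string, 0, len(s))
	for v := range s {
		out = append(out, v)
	}
	sort.Strings(out)
	return out
}

func main() {
	laptop, phone := GSet{}, GSet{}
	laptop.Add("comment:intro-too-long") // added offline
	phone.Add("comment:fix-typo")        // added offline, concurrently
	phone.Add("comment:add-diagram")

	// Sync in both directions, in either order, possibly more than once.
	laptop.Merge(phone)
	phone.Merge(laptop)
	phone.Merge(laptop)

	fmt.Println(laptop.Values())
	fmt.Println(phone.Values())
}
```

&lt;p&gt;Both replicas print the same three comments. Supporting deletes and ordered text is where the subtlety mentioned below comes in.&lt;/p&gt;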

&lt;p&gt;Google Docs uses a version of OT. Figma uses a CRDT-inspired design built on last-writer-wins registers for individual object properties, with a central server ordering updates. Notion uses a mix. The common property: any user can edit while offline, sync when reconnected, and the final document reflects all edits.&lt;/p&gt;

&lt;p&gt;What you give up: simplicity. CRDT implementations are subtle, and naive "last-write-wins" semantics are almost never what the user wants (their previous sentence vanished, not merged).&lt;/p&gt;

&lt;p&gt;What you gain: offline support without an ugly "you've been offline, your changes may conflict" modal.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Ad Click Counter
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: eventual consistency, very high write throughput, lossy-okay for a tiny fraction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: local counter per shard, periodic aggregation to central store. Writes are fire-and-forget to a stream (Kafka, Kinesis). Reads come from a precomputed aggregate that's a few seconds stale.&lt;/p&gt;
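&lt;p&gt;A single-process sketch of the same shape, assuming per-shard atomic counters in place of a stream and an explicit &lt;code&gt;Aggregate&lt;/code&gt; call in place of the periodic job. &lt;code&gt;ClickCounter&lt;/code&gt; and its methods are illustrative names; &lt;code&gt;atomic.Int64&lt;/code&gt; needs Go 1.19 or later.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// ClickCounter keeps one atomic counter per shard so the write path never
// takes a shared lock. Readers see a snapshot that Aggregate refreshes.
type ClickCounter struct {
	shards   []atomic.Int64
	snapshot atomic.Int64
}

// NewClickCounter creates a counter with n shards (n must be positive).
func NewClickCounter(n int) *ClickCounter {
	c := new(ClickCounter)
	c.shards = make([]atomic.Int64, n)
	return c
}

// Record is the fire-and-forget write path. shard must be non-negative.
func (c *ClickCounter) Record(shard int) {
	c.shards[shard%len(c.shards)].Add(1)
}

// Aggregate sums the shards into the snapshot. In a real system this would
// run on a timer, or be replaced by a stream consumer.
func (c *ClickCounter) Aggregate() {
	var total int64
	for i := range c.shards {
		total += c.shards[i].Load()
	}
	c.snapshot.Store(total)
}

// Read returns the last aggregated value, which may lag the true count.
func (c *ClickCounter) Read() int64 {
	return c.snapshot.Load()
}

func main() {
	c := NewClickCounter(8)
	var wg sync.WaitGroup
	for worker := 0; worker != 8; worker++ {
		wg.Add(1)
		go func(shard int) {
			defer wg.Done()
			for i := 0; i != 1000; i++ {
				c.Record(shard)
			}
		}(worker)
	}
	wg.Wait()
	fmt.Println("before aggregation:", c.Read()) // stale by design
	c.Aggregate()
	fmt.Println("after aggregation: ", c.Read())
}
```

&lt;p&gt;Reads never touch the shards, so a dashboard polling &lt;code&gt;Read&lt;/code&gt; adds no contention to the write path; it just sees a number that is one aggregation interval old.&lt;/p&gt;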

&lt;p&gt;Why this works: no advertiser is going to detect the difference between "47,312 clicks" and "47,318 clicks" in their dashboard. Counting-with-precision across a global distributed system is ten times harder than counting approximately. Do the latter.&lt;/p&gt;

&lt;p&gt;What's non-obvious: the system should be &lt;em&gt;designed&lt;/em&gt; for approximate counts, with explicit tolerance in the SLA ("counts are accurate to within 0.01% and updated every 30 seconds"). If you don't say that upfront, someone will eventually ask "why don't our counts match the backend logs exactly" and you'll be in a two-week project to eliminate errors that never mattered.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Multi-Region Primary / Secondary
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: fast reads in every region, writes can live in one region.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: primary-in-region-A, async replication to regions B/C/D. Reads in B/C/D may lag the primary by milliseconds to seconds. Writes always route to A.&lt;/p&gt;

&lt;p&gt;Consistency model you're serving: &lt;strong&gt;eventual, with read-your-writes available on demand&lt;/strong&gt; (see scenario 3). Reads from the primary are strongly consistent; reads from secondaries are lagged but fast.&lt;/p&gt;

&lt;p&gt;Key engineering: the client SDK should know which operations need primary reads (after a recent write, for "show me the thing I just wrote" operations) and which can hit secondaries (dashboards, history views, anything time-insensitive).&lt;/p&gt;

&lt;p&gt;This is where most backend systems actually live. The bulk of reads go to secondaries — cheap, fast. A small percentage route to primary for freshness. Latency and availability both win.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Distributed Lock / Leader Election
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: exactly one leader, no split-brain, sometimes-unavailable-is-okay.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: a consensus system (Zookeeper, etcd, Consul, all built on Raft or a Paxos-style protocol). Acquire the lock or lease, renew it, do the work, release. If you end up on the minority side of a network partition, you can't renew, the lease expires, and the other side can safely elect a new holder.&lt;/p&gt;

&lt;p&gt;The classic failure: leader election on top of Redis. Redis is not a consensus system. Redlock has well-documented failure modes and is not safe for correctness-critical locking. Use etcd. Use Zookeeper. Use a real consensus system. The tempting shortcut will, eventually, bite.&lt;/p&gt;

&lt;p&gt;What consensus buys you: guaranteed linearizability for operations on the lock/lease. What it costs: every operation is a quorum round-trip. That's fine for leader election (infrequent). It's not fine for a hot write path (use a different mechanism).&lt;/p&gt;

&lt;h3&gt;
  
  
  9. Analytics Dashboard
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: all the data, eventually, in a queryable form. No urgency on freshness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: stream writes to a durable log (Kafka), have an ETL job populate a columnar warehouse (BigQuery, ClickHouse, Snowflake) on a schedule. Dashboards query the warehouse. Data is minutes to hours stale.&lt;/p&gt;

&lt;p&gt;Consistency model: none, in the traditional sense. You have an append-only log and a materialized view. The view is eventually consistent with the log, and that's the whole contract.&lt;/p&gt;

&lt;p&gt;This is simple but worth calling out because people sometimes try to do analytics against the operational database directly ("we'll run these queries on the primary, it'll be fine"). It will not be fine. Analytic queries are different workload shapes — they want columnar storage, aggressive parallelism, no transactional overhead. Put them in a warehouse.&lt;/p&gt;

&lt;h3&gt;
  
  
  10. Cross-Service Orchestration (Saga)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: multi-step business flow across services — create order, reserve inventory, charge payment, schedule shipment. Each step might fail. The system should end up in a consistent state either way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: Saga. Each step is a local transaction in its own service. For each step, you also define a &lt;em&gt;compensating&lt;/em&gt; step that undoes it. If a step fails partway through, you run compensations for the earlier steps in reverse:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fc2VxdWVuY2VEaWFncmFtCiAgICBhdXRvbnVtYmVyCiAgICBwYXJ0aWNpcGFudCBPIGFzIE9yY2hlc3RyYXRvcgogICAgcGFydGljaXBhbnQgT3JkZXIgYXMgT3JkZXIgU2VydmljZQogICAgcGFydGljaXBhbnQgSW52IGFzIEludmVudG9yeQogICAgcGFydGljaXBhbnQgUGF5IGFzIFBheW1lbnQKCiAgICBPLT4-T3JkZXI6IENyZWF0ZSBvcmRlcgogICAgT3JkZXItLT4-Tzogb2sKICAgIE8tPj5JbnY6IFJlc2VydmUgaW52ZW50b3J5CiAgICBJbnYtLT4-Tzogb2sKICAgIE8tPj5QYXk6IENoYXJnZSBwYXltZW50CiAgICBQYXktLT4-TzogZmFpbGVkCgogICAgTm90ZSBvdmVyIE8sUGF5OiBSdW4gY29tcGVuc2F0aW9ucyBpbiByZXZlcnNlCiAgICBPLT4-SW52OiBSZWxlYXNlIGludmVudG9yeQogICAgSW52LS0-Pk86IG9rCiAgICBPLT4-T3JkZXI6IENhbmNlbCBvcmRlcgogICAgT3JkZXItLT4-Tzogb2s%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fc2VxdWVuY2VEaWFncmFtCiAgICBhdXRvbnVtYmVyCiAgICBwYXJ0aWNpcGFudCBPIGFzIE9yY2hlc3RyYXRvcgogICAgcGFydGljaXBhbnQgT3JkZXIgYXMgT3JkZXIgU2VydmljZQogICAgcGFydGljaXBhbnQgSW52IGFzIEludmVudG9yeQogICAgcGFydGljaXBhbnQgUGF5IGFzIFBheW1lbnQKCiAgICBPLT4-T3JkZXI6IENyZWF0ZSBvcmRlcgogICAgT3JkZXItLT4-Tzogb2sKICAgIE8tPj5JbnY6IFJlc2VydmUgaW52ZW50b3J5CiAgICBJbnYtLT4-Tzogb2sKICAgIE8tPj5QYXk6IENoYXJnZSBwYXltZW50CiAgICBQYXktLT4-TzogZmFpbGVkCgogICAgTm90ZSBvdmVyIE8sUGF5OiBSdW4gY29tcGVuc2F0aW9ucyBpbiByZXZlcnNlCiAgICBPLT4-SW52OiBSZWxlYXNlIGludmVudG9yeQogICAgSW52LS0-Pk86IG9rCiAgICBPLT4-T3JkZXI6IENhbmNlbCBvcmRlcgogICAgT3JkZXItLT4-Tzogb2s%3D" alt="sequenceDiagram" width="850" height="664"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Not all compensations are symmetric — you can't un-send an email, you can't un-refund a payment. But for most business flows, you can design compensations that leave the system in a consistent-enough state.&lt;/p&gt;
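&lt;p&gt;The orchestrated flow in the diagram can be sketched as a small runner, assuming each step is a pair of in-process functions. &lt;code&gt;Step&lt;/code&gt;, &lt;code&gt;RunSaga&lt;/code&gt;, and &lt;code&gt;errCardDeclined&lt;/code&gt; are hypothetical names; a production orchestrator would also persist progress so it can resume after a crash.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
)

// Step pairs a local transaction with the compensation that undoes it.
type Step struct {
	Name       string
	Do         func() error
	Compensate func() error
}

// RunSaga executes steps in order. On the first failure it compensates the
// steps that already succeeded, in reverse, and returns the original error.
func RunSaga(steps []Step) error {
	for i, step := range steps {
		err := step.Do()
		if err == nil {
			continue
		}
		var failed []string
		for j := i - 1; j >= 0; j-- {
			if cerr := steps[j].Compensate(); cerr != nil {
				// A failed compensation needs retries or human attention;
				// this sketch only reports it.
				failed = append(failed, steps[j].Name)
			}
		}
		if len(failed) > 0 {
			return fmt.Errorf("step %q failed: %w (compensation failed for %v)", step.Name, err, failed)
		}
		return fmt.Errorf("step %q failed: %w", step.Name, err)
	}
	return nil
}

var errCardDeclined = errors.New("card declined")

func main() {
	var log []string
	record := func(event string) func() error {
		return func() error {
			log = append(log, event)
			return nil
		}
	}
	steps := []Step{
		{Name: "create order", Do: record("order created"), Compensate: record("order cancelled")},
		{Name: "reserve inventory", Do: record("inventory reserved"), Compensate: record("inventory released")},
		{Name: "charge payment", Do: func() error { return errCardDeclined }, Compensate: record("payment refunded")},
	}
	err := RunSaga(steps)
	fmt.Println("saga result:", err)
	for _, event := range log {
		fmt.Println(" -", event)
	}
}
```

&lt;p&gt;With the payment step failing, the log reads: order created, inventory reserved, inventory released, order cancelled. The failed step itself is not compensated, because it never committed anything.&lt;/p&gt;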

&lt;p&gt;The alternative — 2PC (two-phase commit) across all services — is real but rarely used. 2PC requires every participant to support the protocol, holds locks while waiting, and blocks the whole transaction if any participant is slow or down. For services owned by different teams on different storage engines, 2PC doesn't scale.&lt;/p&gt;

&lt;p&gt;Saga engineering concerns: saga orchestrators (a coordinator service that runs the state machine) vs saga choreography (each service emits events that trigger the next). Orchestrators are simpler to reason about. Choreography scales further but can produce spaghetti.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Meta-Rule
&lt;/h2&gt;

&lt;p&gt;Walking through those ten: the choice isn't really "which consistency model is best for my system." It's "which consistency model does this specific operation need, given what users expect to see."&lt;/p&gt;

&lt;p&gt;Most production systems use all of the following, in different places:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strong/linearizable consistency for anything money-related.&lt;/li&gt;
&lt;li&gt;Read-your-writes for user-visible writes that users need to see immediately.&lt;/li&gt;
&lt;li&gt;Causal consistency for feed-like data where ordering matters.&lt;/li&gt;
&lt;li&gt;Eventual consistency for counters, analytics, and anything where approximate-and-fast beats exact-and-slow.&lt;/li&gt;
&lt;li&gt;CRDTs (narrowly) for collaborative editing and specific offline-first features.&lt;/li&gt;
&lt;li&gt;Saga for cross-service business flows.&lt;/li&gt;
&lt;li&gt;Consensus (Zookeeper/etcd) for the very few things that actually need leader election or distributed locks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The engineering decision is not "pick a consistency level for the whole system." It's "for this specific feature, what consistency level does the user need to experience, what trade-offs does the stronger version cost, and can we engineer the weaker version to feel as good?"&lt;/p&gt;

&lt;p&gt;That last clause matters. A read-your-writes layer on top of eventual consistency often &lt;em&gt;feels&lt;/em&gt; strongly consistent to users while actually being cheap to operate. Users don't experience consistency models; they experience whether their updates show up, whether their comments appear in order, whether their refund matches what they expected. Engineering consistency is about closing the gap between the model you can afford and the experience the user requires.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Anti-Patterns
&lt;/h2&gt;

&lt;p&gt;A few shapes that show up repeatedly in code reviews:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Retry until consistent."&lt;/strong&gt; Seen in code that does a write, then reads from a secondary and loops until it sees the write. Works on the happy path, spins indefinitely during a partition, and creates unbounded retry storms under load. Use read-your-writes through a session token instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"We'll use eventual consistency for speed."&lt;/strong&gt; Used as a justification for skipping engineering. Yes, eventual is faster. The engineering to make it &lt;em&gt;feel&lt;/em&gt; correct (causal ordering, conflict resolution, read-your-writes fallback) is what you're skipping — and users will notice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Just use Redis for leader election."&lt;/strong&gt; Already mentioned. Redlock is not safe. If you're doing anything correctness-critical with leader election, use a real consensus system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Saga with no compensations."&lt;/strong&gt; "What happens if step 3 fails?" "Oh, we'll fix it manually." That's a saga you haven't designed. It's a half-finished state machine waiting to corrupt data. Design the compensations before you ship.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Strong consistency everywhere, for safety."&lt;/strong&gt; Default-safe sounds responsible. It also means your read latency is 50ms minimum, you can't serve a region during a partition, and the cost per query is high. Users rarely need strong consistency everywhere. They need it in a few specific places.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Senior Move
&lt;/h2&gt;

&lt;p&gt;Consistency is a user-experience feature, not a system property. The right question at design time isn't "what consistency model does our database provide" — it's "what does the user need to see, in what order, with what freshness, with what tolerance for partial failure."&lt;/p&gt;

&lt;p&gt;Most of the work in a well-designed distributed system is engineering &lt;em&gt;around&lt;/em&gt; the consistency model the storage layer provides: sticky reads, session tokens, version vectors, compensating actions, explicit ordering, user-visible "your change is saved" confirmations. The model is the floor; the engineering lifts the experience to what users actually expect.&lt;/p&gt;

&lt;p&gt;The difference between senior and junior distributed-systems work often shows up here. Junior picks a model and fights everything else to conform. Senior picks the model per-feature, builds the engineering scaffolding that closes the gap, and ships something that feels right to users — even though underneath, ten different operations run on five different consistency levels.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/fail-fast-bounded-resilience-distributed-systems/" rel="noopener noreferrer"&gt;Why Your "Fail-Fast" Strategy is Killing Your Distributed System&lt;/a&gt; — another "the system property is not the user experience" essay.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/rpc-vs-nats-who-owns-completion/" rel="noopener noreferrer"&gt;RPC vs NATS: It's Not About Sync vs Async — It's About Who Owns Completion&lt;/a&gt; — who owns completion is one of the things consistency models don't address.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/four-pillars-modern-concurrency-locks-to-actors/" rel="noopener noreferrer"&gt;From Locks to Actors: The Four Pillars of Modern Concurrency&lt;/a&gt; — the concurrency side of the same general question.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/nats-kafka-mqtt-same-category-different-jobs/" rel="noopener noreferrer"&gt;NATS vs Kafka vs MQTT: Same Category, Very Different Jobs&lt;/a&gt; — the durability choice that enables some of the consistency patterns here.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>distributedsystems</category>
      <category>consistency</category>
      <category>saga</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Don't Pick One AI. Run Three Against Each Other.</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Sun, 03 May 2026 19:18:57 +0000</pubDate>
      <link>https://dev.to/harrisonsec/dont-pick-one-ai-run-three-against-each-other-3d27</link>
      <guid>https://dev.to/harrisonsec/dont-pick-one-ai-run-three-against-each-other-3d27</guid>
      <description>&lt;h2&gt;
  
  
  The Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;AI can write code, generate content, analyze data, design systems, and manage projects. It's getting better every month. The natural question: what's left for humans?&lt;/p&gt;

&lt;p&gt;The wrong answer: "AI will replace us."&lt;br&gt;
The other wrong answer: "AI is just a tool, nothing changes."&lt;/p&gt;

&lt;p&gt;The right answer is uncomfortable: stop picking the best AI. Run multiple AIs in competition, and become the judge.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Tournament Model
&lt;/h2&gt;

&lt;p&gt;Three rules, learned the hard way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multiple advisors, competing opinions.&lt;/strong&gt; Don't bind to one AI — its bias becomes yours. Three models running the same task surface blind spots no single model catches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You decide.&lt;/strong&gt; After the AIs argue, you make the call. Not the smartest model — you. The one with context they don't have.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Results judge everyone.&lt;/strong&gt; Did the call work? Keep it. Did it fail? Learn and move on. Never blame the AI — you chose to follow that advice.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the operating system for the AI age.&lt;/p&gt;
&lt;h2&gt;
  
  
  In Practice: Three AIs in One Window
&lt;/h2&gt;

&lt;p&gt;Theory is cheap. The reason most people don't run multiple AIs is friction — opening three terminals, signing in to three CLIs, and pasting the same prompt three times kills the loop after one day.&lt;/p&gt;

&lt;p&gt;So I wrote two small scripts and one tmux config to remove the friction. They live in this repo: &lt;strong&gt;&lt;a href="https://github.com/harrison001/ai-tournament" rel="noopener noreferrer"&gt;harrison001/ai-tournament&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;script&lt;/th&gt;
&lt;th&gt;what it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;prj&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;one command opens a tmux window with &lt;strong&gt;codex&lt;/strong&gt;, &lt;strong&gt;claude&lt;/strong&gt;, and &lt;strong&gt;gemini&lt;/strong&gt; running side-by-side, plus a shell pane&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tmx&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;fzf picker to switch between tournament sessions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tmux.conf&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;binds &lt;code&gt;&amp;lt;prefix&amp;gt; b&lt;/code&gt; to broadcast — type once, all three AIs receive it&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ prj ai-tournament       # opens tmux: codex | claude | gemini | shell
&amp;lt;prefix&amp;gt; b                # broadcast ON, status bar turns red
&amp;lt;type the question&amp;gt;       # all three receive the same prompt
&amp;lt;prefix&amp;gt; b                # broadcast OFF, read each answer, pick the winner
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  A real run
&lt;/h3&gt;

&lt;p&gt;To test the loop I asked all three the same dumb question I had asked myself five minutes earlier:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I want to wash my car. The car wash is 30 meters from my house. Should I drive or walk?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here is what came back, unedited:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fddkoqng29q5ybem8rcx6.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fddkoqng29q5ybem8rcx6.webp" alt="Three AIs answering the same question in tiled tmux panes: codex and claude both say walk, gemini says drive" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;codex&lt;/strong&gt;: &lt;em&gt;"Walk. For 30 meters, driving adds unnecessary startup, parking, and maneuvering, and it is barely faster unless you need to move supplies or the car itself into a wash bay."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;claude&lt;/strong&gt;: &lt;em&gt;"Walk. At 30 meters, driving takes longer once you factor in starting the car, and you'd just be sitting in a freshly washed car driving it home through whatever's on the road. Walk over, drive back clean."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gemini&lt;/strong&gt;: &lt;em&gt;"You should drive. Although 30 meters is a very short distance to walk, your car needs to be physically present at the facility to be cleaned."&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two out of three confidently gave a wrong answer. They optimized the surface question (&lt;em&gt;short distance, walking is fine&lt;/em&gt;) and missed the function of a car wash. If I had asked only one of the two that agreed, I would have walked over to wait in line for a service that requires a car.&lt;/p&gt;

&lt;p&gt;Only &lt;strong&gt;gemini&lt;/strong&gt; caught the obvious thing: the car has to be there.&lt;/p&gt;

&lt;p&gt;This is what the tournament model is for. It is not "three AIs are smarter than one." Two of them were less smart than one. The point is &lt;strong&gt;the divergence becomes visible&lt;/strong&gt;, and the human is the one who picks. With a single AI, you never see the disagreement — you just inherit whichever bias that model happened to have.&lt;/p&gt;

&lt;p&gt;The car wash is a toy example. Replace it with &lt;em&gt;"should we go gRPC, NATS, or HTTP for service-to-service?"&lt;/em&gt; and the same pattern holds — except the cost of picking the confident-but-wrong answer is no longer a wasted afternoon.&lt;/p&gt;
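
&lt;p&gt;A minimal TypeScript sketch of "the divergence becomes visible", not the code in the repo. The three &lt;code&gt;ask&lt;/code&gt; functions are hypothetical stubs that return the verdicts quoted above instead of calling a real CLI, and &lt;code&gt;runTournament&lt;/code&gt; is an illustrative name. It sends one prompt to every advisor, groups the advisors by verdict, and reports whether they disagreed.&lt;/p&gt;

```typescript
// Hypothetical stand-ins for the three CLIs. Each stub returns the verdict
// quoted in the run above; a real version would call the model instead.
const advisors = [
  { name: "codex", ask: async (_prompt: string) => "walk" },
  { name: "claude", ask: async (_prompt: string) => "walk" },
  { name: "gemini", ask: async (_prompt: string) => "drive" },
];

// Send the same prompt to every advisor and group advisor names by verdict.
async function runTournament(prompt: string) {
  const answers = await Promise.all(
    advisors.map(async (a) => ({ name: a.name, verdict: await a.ask(prompt) })),
  );

  const byVerdict: { [verdict: string]: string[] } = {};
  for (const { name, verdict } of answers) {
    const names = byVerdict[verdict] ?? [];
    names.push(name);
    byVerdict[verdict] = names;
  }

  // More than one distinct verdict means there is a disagreement to judge.
  const diverged = Object.keys(byVerdict).length > 1;
  return { byVerdict, diverged };
}

runTournament("Should I drive or walk to the car wash?").then((result) => {
  console.log(result.diverged, result.byVerdict);
});
```

&lt;p&gt;The sketch stops at surfacing the split on purpose: it does not pick the majority, because in this run the majority was wrong. Choosing between the groups stays with the human.&lt;/p&gt;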

&lt;h2&gt;
  
  
  The Five Principles
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Use Multiple AIs — Don't Bind to One
&lt;/h3&gt;

&lt;p&gt;Claude, Gemini, GPT, Codex — they're all advisors. Each has strengths. Each has blind spots. Using only one AI is like having only one advisor: you inherit all their biases.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;One AI:     The model's bias becomes your bias
Three AIs:  Biases become visible, blind spots are more likely to get covered
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I write content using three AI models simultaneously. Same task, three outputs. I don't ask them to divide the work — I ask them to &lt;strong&gt;compete&lt;/strong&gt;. The best output wins. The others get discarded.&lt;/p&gt;

&lt;p&gt;This is not "AI-assisted writing." This is a tournament where AI models compete and the human judges.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Compete, Don't Divide
&lt;/h3&gt;

&lt;p&gt;Most people who use multiple AIs assign each one a role: "Claude for writing, GPT for coding, Gemini for research." That's division of labor. It's a planned economy.&lt;/p&gt;

&lt;p&gt;The tournament model is a &lt;strong&gt;market economy&lt;/strong&gt;: same task to all, let results determine who's best.&lt;/p&gt;

&lt;p&gt;Why competition beats division:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Division relies on your judgment of which AI is better at what, and that judgment goes stale every time a model updates&lt;/li&gt;
&lt;li&gt;Competition is self-correcting — if GPT suddenly gets better at writing, it starts winning writing tasks. No reconfiguration needed&lt;/li&gt;
&lt;li&gt;You don't need to solve the impossible problem of "which AI is best" — let them prove it through results&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. The Human Decides — Judgment Is Not Outsourceable
&lt;/h3&gt;

&lt;p&gt;AI can analyze. AI can generate options. AI can evaluate tradeoffs. What AI cannot do: &lt;strong&gt;decide which tradeoff matters in this specific context for this specific person with these specific constraints.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three capabilities make human judgment irreplaceable:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Insight&lt;/strong&gt; — Knowing what question to ask. AI can answer any question, but it can't know which question matters right now. Insight comes from understanding the problem deeply enough to ask the question that unlocks everything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Critical Thinking&lt;/strong&gt; — Knowing when AI is wrong. AI gives confident, articulate answers regardless of accuracy. The human must evaluate: does this make sense? Is this consistent with what I know? Is there a blind spot?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result Evaluation&lt;/strong&gt; — Knowing if the outcome is good enough. AI can generate a technically correct solution that's wrong for your context. Only the human who understands the full picture — users, business constraints, team dynamics, market timing — can judge whether the output actually serves the goal.&lt;/p&gt;

&lt;p&gt;These three form a loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Insight → Ask the right question
  ↓
AI gives analysis
  ↓
Critical Thinking → Is this analysis trustworthy?
  ↓
Choose and execute
  ↓
Result Evaluation → Did it work?
  ↓
Insight → Why did it work / not work? → Better questions next time
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. No Blind Faith, No Emotions — Results Are the Only Standard
&lt;/h3&gt;

&lt;p&gt;Two temptations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI agrees with you → "See, I was right." (Confirmation bias)&lt;/li&gt;
&lt;li&gt;AI disagrees with you → "AI doesn't understand my situation." (Emotional rejection)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tournament model rejects both:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI agrees with me    → Good, but does the result confirm it?
AI disagrees with me → Interesting. Let me verify before judging.
Made a choice        → Own the outcome. Right? Improve. Wrong? Learn. Never blame the AI.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Practice as the sole test of truth. Not who said it. Not how confident it sounded. Did it work?&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Human Drives AI, Not the Other Way Around
&lt;/h3&gt;

&lt;p&gt;AI is an amplifier. The question is: amplifying what?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;No insight + good AI tools = efficiently producing mediocrity
Good insight + no AI tools = good ideas, slow execution
Good insight + tournament model = insight amplified 10x
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The human provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Direction&lt;/strong&gt; — what to work on (insight)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality standard&lt;/strong&gt; — what "good" looks like (evaluation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context&lt;/strong&gt; — the constraints AI doesn't see (judgment)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accountability&lt;/strong&gt; — willingness to own the outcome (leadership)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Speed&lt;/strong&gt; — generate options fast&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Breadth&lt;/strong&gt; — consider more possibilities than a human can&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistency&lt;/strong&gt; — apply the same standard across large volumes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge&lt;/strong&gt; — access more information than any person can hold&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The human's role isn't to do AI's job slowly. It's to do the job AI can't do at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  Applied to Real Work
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Content Creation
&lt;/h3&gt;

&lt;p&gt;The temptation: let AI generate content and publish automatically. Maximum output, minimum effort.&lt;/p&gt;

&lt;p&gt;The result: a flood of mediocre, AI-flavored content. No differentiation. No personal perspective. Platforms and audiences both learn to ignore it.&lt;/p&gt;

&lt;p&gt;The tournament approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Three AI models generate competing drafts on the same topic&lt;/li&gt;
&lt;li&gt;The human evaluates: which captured the insight? Which missed the point?&lt;/li&gt;
&lt;li&gt;The winning draft gets refined — the human adds what AI can't: personal experience, controversial opinion, industry context&lt;/li&gt;
&lt;li&gt;Publication decision: is this good enough to attach my name to?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The output isn't "AI content." It's human content, produced at AI speed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Technical Decisions
&lt;/h3&gt;

&lt;p&gt;The temptation: ask one AI "should I use vector databases for agent memory?" and follow its recommendation.&lt;/p&gt;

&lt;p&gt;The result: you inherit that model's training bias. Claude might favor simplicity (it was trained by Anthropic, which chose plain Markdown files for Claude Code's memory). GPT might favor complexity (it may lean toward enterprise patterns).&lt;/p&gt;

&lt;p&gt;The tournament approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ask all three: "What are the tradeoffs between Markdown files, SQLite + vectors, and self-evolving skills for agent memory?"&lt;/li&gt;
&lt;li&gt;Each gives a different analysis weighted by its own biases&lt;/li&gt;
&lt;li&gt;The human evaluates against the &lt;strong&gt;actual constraints&lt;/strong&gt;: deployment model, team size, user count, latency requirements&lt;/li&gt;
&lt;li&gt;The decision accounts for context that no AI has — your specific situation&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Career Strategy
&lt;/h3&gt;

&lt;p&gt;The temptation: "AI will replace developers, I need to switch careers."&lt;/p&gt;

&lt;p&gt;The reality: AI replaces tasks, not roles. The question is which tasks become your competitive advantage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;For employees:  Agent engineering skills (the 90% problem) — because companies 
                have data and scenarios, but need people who can build reliable agents

For founders:   Data + scenario moats — because agent engineering can be hired,
                but proprietary data and deep domain knowledge can't
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In both cases, the competitive advantage is &lt;strong&gt;insight&lt;/strong&gt; — understanding what matters in your specific domain well enough to direct AI effectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Anti-Patterns
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Anti-Pattern&lt;/th&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Tournament Alternative&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Only use one AI&lt;/td&gt;
&lt;td&gt;Single advisor's bias = your bias&lt;/td&gt;
&lt;td&gt;Multiple AIs competing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Follow AI blindly&lt;/td&gt;
&lt;td&gt;Lose judgment over time&lt;/td&gt;
&lt;td&gt;AI advises, human decides&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reject AI when it disagrees&lt;/td&gt;
&lt;td&gt;Miss good ideas out of ego&lt;/td&gt;
&lt;td&gt;No emotions, evaluate by results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automate everything&lt;/td&gt;
&lt;td&gt;No quality control, garbage output&lt;/td&gt;
&lt;td&gt;Human at quality gates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Treat AI as just a tool&lt;/td&gt;
&lt;td&gt;Waste AI's analytical capability&lt;/td&gt;
&lt;td&gt;Treat AIs as competing advisors&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The Test
&lt;/h2&gt;

&lt;p&gt;Here's how to know if you're using AI well:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bad sign:&lt;/strong&gt; You can't explain why you chose AI's suggestion over the alternatives.&lt;br&gt;
&lt;strong&gt;Good sign:&lt;/strong&gt; You can articulate the tradeoff — what you gained and what you gave up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bad sign:&lt;/strong&gt; You use the same AI for everything.&lt;br&gt;
&lt;strong&gt;Good sign:&lt;/strong&gt; You use different AIs for the same task and pick the best output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bad sign:&lt;/strong&gt; You haven't disagreed with AI in the past week.&lt;br&gt;
&lt;strong&gt;Good sign:&lt;/strong&gt; You regularly override AI when your insight says otherwise — and you're right more than you're wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bad sign:&lt;/strong&gt; You can't tell the difference between AI output and human output.&lt;br&gt;
&lt;strong&gt;Good sign:&lt;/strong&gt; You use AI for speed and breadth, then add what only you can: context, judgment, and accountability.&lt;/p&gt;

&lt;h2&gt;
  
  
  One Sentence
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;In the AI age, run AIs like a tournament: many compete, you decide, results judge everyone. Your insight is the one thing that scales with AI instead of being replaced by it.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of the AI Agent Architecture series. For the technical deep dive behind these ideas: &lt;a href="https://harrisonsec.com/blog/ai-agent-90-percent-problem/" rel="noopener noreferrer"&gt;The 90% Problem&lt;/a&gt; and &lt;a href="https://harrisonsec.com/blog/claude-code-context-engineering-compression-pipeline/" rel="noopener noreferrer"&gt;Claude Code Deep Dive&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aiagents</category>
      <category>productivity</category>
      <category>career</category>
    </item>
    <item>
      <title>Node Turns Waiting Into Events. Go Moves Context Switching Into User Space.</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Tue, 28 Apr 2026 18:01:12 +0000</pubDate>
      <link>https://dev.to/harrisonsec/node-turns-waiting-into-events-go-moves-context-switching-into-user-space-58ik</link>
      <guid>https://dev.to/harrisonsec/node-turns-waiting-into-events-go-moves-context-switching-into-user-space-58ik</guid>
      <description>&lt;p&gt;Most discussions of TypeScript/Node vs Go concurrency stop at the surface: &lt;em&gt;Node is async, Go is threaded.&lt;/em&gt; That framing isn't wrong — it just isn't deep enough to be useful when you're picking a runtime, debugging a tail-latency problem, or explaining to your team why one of the services keeps falling over under CPU load.&lt;/p&gt;

&lt;p&gt;The real difference is not async vs threaded. It's a question about where, in the system, suspended work lives — and what shape it takes when it's resumed.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — Both Node and Go refuse to let the CPU sit idle while a request waits on I/O. They disagree on the unit of scheduling. Node's unit is the &lt;em&gt;continuation&lt;/em&gt; — the tail of an async function captured as a heap closure. Go's unit is the &lt;em&gt;goroutine&lt;/em&gt; — a full call stack the runtime can suspend and resume in user space. That single decision cascades into every other property of each runtime.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Wrong Question
&lt;/h2&gt;

&lt;p&gt;"Async vs threaded" is the wrong frame because it makes you think the choice is between paradigms. It isn't. Both runtimes have already made the &lt;em&gt;same&lt;/em&gt; fundamental decision: do not block an OS thread waiting for slow external work. The interesting choice is &lt;em&gt;how&lt;/em&gt; they implement that.&lt;/p&gt;

&lt;p&gt;The actually useful question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When a request is waiting for I/O — for a database, an HTTP call, a Redis round-trip, a file read — &lt;strong&gt;what does the CPU do, and where does the suspended state of that request live?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once you frame it that way, Node and Go aren't opposites. They're two answers to the same question — and each answer cascades into a different language shape, a different library style, and a different failure mode under load.&lt;/p&gt;

&lt;p&gt;The naive blocking model answers the question with "an OS thread waits for the syscall to return." That model collapses around a few thousand concurrent connections — memory per thread, scheduler overhead, kernel context-switch cost. By 40,000 connections you're out of RAM, not CPU. Node and Go both refuse to do this. They diverge on &lt;em&gt;which resource gets freed up&lt;/em&gt; and &lt;em&gt;how the suspended work is captured for later resumption.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Node's Answer: Turn Waiting Into an Event
&lt;/h2&gt;

&lt;p&gt;Node's model can be summarized in one line: &lt;strong&gt;the JS main thread only executes code that's already ready to run.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Look at this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It reads as if the function is paused, blocking on the database. It isn't. Here's what V8 actually does at the bytecode level when it compiles an &lt;code&gt;async&lt;/code&gt; function: it rewrites the body into a state machine, with each &lt;code&gt;await&lt;/code&gt; becoming a state transition.&lt;/p&gt;

&lt;p&gt;The function above gets transformed into something equivalent to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;asyncFn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;promise&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;closure&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;                  &lt;span class="c1"&gt;// heap object holding locals&lt;/span&gt;

    &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
          &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
          &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;step&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;     &lt;span class="c1"&gt;// await → register continuation&lt;/span&gt;
          &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                         &lt;span class="c1"&gt;// ← function POPS here&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
          &lt;span class="nx"&gt;closure&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;           &lt;span class="c1"&gt;// resume: locals live in closure&lt;/span&gt;
          &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;closure&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;promise&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things to notice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;await&lt;/code&gt; is not a pause.&lt;/strong&gt; It's the point at which V8 returns from the function and pops the JS stack frame. The "rest of the function" is captured as a continuation registered on the awaited Promise via &lt;code&gt;.then&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local variables move to the heap.&lt;/strong&gt; Because the stack frame is gone, locals (&lt;code&gt;user&lt;/code&gt; here) live in a heap closure, accessible only when the state machine resumes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Each &lt;code&gt;await&lt;/code&gt; slices the function into another state.&lt;/strong&gt; A function with two &lt;code&gt;await&lt;/code&gt;s runs as three separate segments, each on a freshly pushed JS frame, with all live state stored in heap closures between them. When each &lt;code&gt;await&lt;/code&gt; waits on real I/O, those segments land in three different event-loop turns.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That third point is the most non-obvious. A single &lt;code&gt;async&lt;/code&gt; function is &lt;strong&gt;not&lt;/strong&gt; one unit of execution — it's a sequence of fresh frames separated by event-loop turns:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fc2VxdWVuY2VEaWFncmFtCiAgICBhdXRvbnVtYmVyCiAgICBwYXJ0aWNpcGFudCBFTCBhcyBFdmVudCBMb29wIChsaWJ1dikKICAgIHBhcnRpY2lwYW50IEpTIGFzIEpTIE1haW4gVGhyZWFkIChWOCkKICAgIHBhcnRpY2lwYW50IEggYXMgSGVhcCAoY2xvc3VyZXMpCiAgICBwYXJ0aWNpcGFudCBLIGFzIEtlcm5lbCAvIEkvTwoKICAgIHJlY3QgcmdiKDI1NCwgMjQzLCAxOTkpCiAgICBOb3RlIG92ZXIgRUwsSzogVHVybiAxCiAgICBFTC0-PkpTOiBkaXNwYXRjaCBoYW5kbGVyKCkKICAgIGFjdGl2YXRlIEpTCiAgICBOb3RlIG92ZXIgSlM6IGNvbnN0IGEgPSAxCiAgICBKUy0-PkpTOiBjYWxsIGNvbXB1dGUxKCkg4oaSIHJldHVybnMgUHJvbWlzZQogICAgSlMtPj5IOiBWOCBzdG9yZXMgY2xvc3VyZSB7c3RhdGU6MSwgYX0KICAgIEpTLT4-SDogcmVnaXN0ZXIgc3RlcCBhcyAudGhlbiBoYW5kbGVyCiAgICBKUy0tPj5FTDogaGFuZGxlciBmcmFtZSBQT1BQRUQsIHJldHVybnMgUHJvbWlzZQogICAgZGVhY3RpdmF0ZSBKUwogICAgZW5kCgogICAgRUwtPj5LOiBlcG9sbF93YWl0IChubyBtaWNyb3Rhc2tzKQogICAgTm90ZSBvdmVyIEVMLEs6IC4uLiB0aW1lIHBhc3NlcywgT1MgdGhyZWFkIHBhcmtlZCAuLi4KICAgIEstLT4-RUw6IEkvTyByZWFkeSAoY29tcHV0ZTEgcmVzb2x2ZWQpCiAgICBFTC0-PkVMOiBlbnF1ZXVlIHN0ZXAgaW4gVjggbWljcm90YXNrIHF1ZXVlCgogICAgcmVjdCByZ2IoMjE5LCAyMzQsIDI1NCkKICAgIE5vdGUgb3ZlciBFTCxLOiBUdXJuIDIKICAgIEVMLT4-SlM6IGludm9rZSBzdGVwKHZhbHVlKSDigJQgTkVXIGZyYW1lCiAgICBhY3RpdmF0ZSBKUwogICAgSlMtPj5IOiBsb2FkIGNsb3N1cmUge3N0YXRlOjEsIGF9CiAgICBOb3RlIG92ZXIgSlM6IHggPSB2YWx1ZSwgc3RhdGUg4oaSIDIKICAgIEpTLT4-SlM6IGNhbGwgY29tcHV0ZTIoKSDihpIgcmV0dXJucyBQcm9taXNlCiAgICBKUy0-Pkg6IHJlZ2lzdGVyIHN0ZXAgKG5leHQgc3RhdGUpCiAgICBKUy0tPj5FTDogZnJhbWUgUE9QUEVEIGFnYWluCiAgICBkZWFjdGl2YXRlIEpTCiAgICBlbmQKCiAgICBLLS0-PkVMOiBjb21wdXRlMiByZXNvbHZlZAogICAgRUwtPj5FTDogZW5xdWV1ZSBzdGVwCgogICAgcmVjdCByZ2IoMjIwLCAyNTIsIDIzMSkKICAgIE5vdGUgb3ZlciBFTCxLOiBUdXJuIDMKICAgIEVMLT4-SlM6IGludm9rZSBzdGVwKHZhbHVlKSDigJQgeWV0IGFub3RoZXIgbmV3IGZyYW1lCiAgICBhY3RpdmF0ZSBKUwogICAgSlMtPj5IOiBsb2FkIGNsb3N1cmUge3N0YXRlOjIsIGEsIHh9CiAgICBOb3RlIG92ZXIgSlM6IHkgPSB2YWx1ZSwgc3RhdGUg4oaSIGRvbmUKICAgIEpTLT4-SlM6IHJlcy5qc29uKGEgKyB4ICsgeSkKICAgIEpTLS0-PkV
MOiBoYW5kbGVyJ3MgUHJvbWlzZSByZXNvbHZlZAogICAgZGVhY3RpdmF0ZSBKUwogICAgZW5k" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fc2VxdWVuY2VEaWFncmFtCiAgICBhdXRvbnVtYmVyCiAgICBwYXJ0aWNpcGFudCBFTCBhcyBFdmVudCBMb29wIChsaWJ1dikKICAgIHBhcnRpY2lwYW50IEpTIGFzIEpTIE1haW4gVGhyZWFkIChWOCkKICAgIHBhcnRpY2lwYW50IEggYXMgSGVhcCAoY2xvc3VyZXMpCiAgICBwYXJ0aWNpcGFudCBLIGFzIEtlcm5lbCAvIEkvTwoKICAgIHJlY3QgcmdiKDI1NCwgMjQzLCAxOTkpCiAgICBOb3RlIG92ZXIgRUwsSzogVHVybiAxCiAgICBFTC0-PkpTOiBkaXNwYXRjaCBoYW5kbGVyKCkKICAgIGFjdGl2YXRlIEpTCiAgICBOb3RlIG92ZXIgSlM6IGNvbnN0IGEgPSAxCiAgICBKUy0-PkpTOiBjYWxsIGNvbXB1dGUxKCkg4oaSIHJldHVybnMgUHJvbWlzZQogICAgSlMtPj5IOiBWOCBzdG9yZXMgY2xvc3VyZSB7c3RhdGU6MSwgYX0KICAgIEpTLT4-SDogcmVnaXN0ZXIgc3RlcCBhcyAudGhlbiBoYW5kbGVyCiAgICBKUy0tPj5FTDogaGFuZGxlciBmcmFtZSBQT1BQRUQsIHJldHVybnMgUHJvbWlzZQogICAgZGVhY3RpdmF0ZSBKUwogICAgZW5kCgogICAgRUwtPj5LOiBlcG9sbF93YWl0IChubyBtaWNyb3Rhc2tzKQogICAgTm90ZSBvdmVyIEVMLEs6IC4uLiB0aW1lIHBhc3NlcywgT1MgdGhyZWFkIHBhcmtlZCAuLi4KICAgIEstLT4-RUw6IEkvTyByZWFkeSAoY29tcHV0ZTEgcmVzb2x2ZWQpCiAgICBFTC0-PkVMOiBlbnF1ZXVlIHN0ZXAgaW4gVjggbWljcm90YXNrIHF1ZXVlCgogICAgcmVjdCByZ2IoMjE5LCAyMzQsIDI1NCkKICAgIE5vdGUgb3ZlciBFTCxLOiBUdXJuIDIKICAgIEVMLT4-SlM6IGludm9rZSBzdGVwKHZhbHVlKSDigJQgTkVXIGZyYW1lCiAgICBhY3RpdmF0ZSBKUwogICAgSlMtPj5IOiBsb2FkIGNsb3N1cmUge3N0YXRlOjEsIGF9CiAgICBOb3RlIG92ZXIgSlM6IHggPSB2YWx1ZSwgc3RhdGUg4oaSIDIKICAgIEpTLT4-SlM6IGNhbGwgY29tcHV0ZTIoKSDihpIgcmV0dXJucyBQcm9taXNlCiAgICBKUy0-Pkg6IHJlZ2lzdGVyIHN0ZXAgKG5leHQgc3RhdGUpCiAgICBKUy0tPj5FTDogZnJhbWUgUE9QUEVEIGFnYWluCiAgICBkZWFjdGl2YXRlIEpTCiAgICBlbmQKCiAgICBLLS0-PkVMOiBjb21wdXRlMiByZXNvbHZlZAogICAgRUwtPj5FTDogZW5xdWV1ZSBzdGVwCgogICAgcmVjdCByZ2IoMjIwLCAyNTIsIDIzMSkKICAgIE5vdGUgb3ZlciBFTCxLOiBUdXJuIDMKICAgIEVMLT4-SlM6IGludm9rZSBzdGVwKHZhbHVlKSDigJQgeWV0IGFub3RoZXIgbmV3IGZyYW1lCiAgICBhY3RpdmF0ZSBKUwogICAgSlMtPj5IOiBsb2FkIGNsb3N1cmUge3N0YXRlOjIsIGEsIHh9CiAgICBOb3RlIG
92ZXIgSlM6IHkgPSB2YWx1ZSwgc3RhdGUg4oaSIGRvbmUKICAgIEpTLT4-SlM6IHJlcy5qc29uKGEgKyB4ICsgeSkKICAgIEpTLS0-PkVMOiBoYW5kbGVyJ3MgUHJvbWlzZSByZXNvbHZlZAogICAgZGVhY3RpdmF0ZSBKUwogICAgZW5k" alt="sequenceDiagram" width="1103" height="1604"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There is no "paused" function. There are only &lt;em&gt;captured continuations&lt;/em&gt; and &lt;em&gt;fresh frames that resume them&lt;/em&gt;. The event loop is the dispatcher: it watches for I/O readiness via libuv, for resolved Promises (via V8's microtask queue), for timers — and pulls the corresponding continuation onto the JS thread when it's ready to run. One thread can manage tens of thousands of concurrent connections, because at any moment only a handful of them have work that's actually ready.&lt;/p&gt;

&lt;p&gt;This is event-driven concurrency in its precise sense — the runtime turns "waiting" into a registered event, and only resumes the captured continuation when the event fires.&lt;/p&gt;
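
&lt;p&gt;One way to observe that &lt;code&gt;await&lt;/code&gt; returns to the caller rather than pausing in place is to record the order in which lines run. The sketch below is illustrative only: &lt;code&gt;trace&lt;/code&gt; and &lt;code&gt;handler&lt;/code&gt; are made-up names, and the awaited Promises are already resolved, so the continuations resume as microtasks rather than after real I/O. The slicing into segments is the same either way.&lt;/p&gt;

```typescript
const trace: string[] = [];

async function handler() {
  trace.push("handler: start");
  await Promise.resolve("a"); // first suspension point: the frame is popped here
  trace.push("handler: after first await");
  await Promise.resolve("b"); // second suspension point
  trace.push("handler: after second await");
}

// handler() runs only up to its first await, then control comes straight back.
const pending = handler();
trace.push("caller: handler() already returned");

pending.then(() => {
  console.log(trace.join("\n"));
  // handler: start
  // caller: handler() already returned
  // handler: after first await
  // handler: after second await
});
```

&lt;p&gt;The caller's line lands between the first and second segments of &lt;code&gt;handler&lt;/code&gt;, which is only possible if the function had already returned at its first &lt;code&gt;await&lt;/code&gt;.&lt;/p&gt;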

&lt;h3&gt;
  
  
  The Visible Side Effect: Function Color
&lt;/h3&gt;

&lt;p&gt;Because the suspension point has to be marked statically in the source, async-ness becomes part of the function's &lt;em&gt;type&lt;/em&gt;. A function that does I/O returns &lt;code&gt;Promise&amp;lt;T&amp;gt;&lt;/code&gt;. Its callers must &lt;code&gt;await&lt;/code&gt; it. Once they &lt;code&gt;await&lt;/code&gt;, they must be marked &lt;code&gt;async&lt;/code&gt; and return a &lt;code&gt;Promise&lt;/code&gt; of their own. The "color" propagates up the call stack until you hit an async-aware entry point, typically the top of an HTTP handler or the event loop itself.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/" rel="noopener noreferrer"&gt;Bob Nystrom named this the function color problem&lt;/a&gt; in 2015. It's not a notation choice — it's a &lt;strong&gt;logical consequence of the stackless coroutine model&lt;/strong&gt;. V8 cannot save and restore arbitrary JS call stacks. The only way to express suspension is "return a Promise and be marked &lt;code&gt;async&lt;/code&gt;," and once one function does that, every function on the way up has to do the same.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBOb2RlWyI8Yj5Ob2RlIOKAlCBDb2xvciBDYXNjYWRlcyBVcCB0aGUgQ2FsbCBTdGFjazwvYj4iXQogICAgICAgIGRpcmVjdGlvbiBUQgogICAgICAgIG4xWyI8Yj5yZWFkRnJvbURCKCk8L2I-IPCfn6U8YnIvPuKGkiBQcm9taXNlJmx0O0RhdGEmZ3Q7PGJyLz48Yj5kb2VzIEkvTzwvYj4iXQogICAgICAgIG4yWyI8Yj5mZXRjaFVzZXIoKTwvYj4g8J-fpTxici8-4oaSIFByb21pc2UmbHQ7VXNlciZndDs8YnIvPjxiPm11c3QgYXdhaXQgcmVhZEZyb21EQjwvYj4iXQogICAgICAgIG4zWyI8Yj5oYW5kbGVSZXF1ZXN0KCk8L2I-IPCfn6U8YnIvPuKGkiBQcm9taXNlJmx0O1Jlc3BvbnNlJmd0Ozxici8-PGI-bXVzdCBhd2FpdCBmZXRjaFVzZXI8L2I-Il0KICAgICAgICBuNFsiPGI-cm91dGUoJy91c2VyJywgaGFuZGxlcik8L2I-IPCfn6U8YnIvPjxiPm11c3QgYWNjZXB0IFByb21pc2UgcmV0dXJuPC9iPiJdCiAgICAgICAgbjVbIjxiPm1haW4oKTwvYj4g8J-fpTxici8-4oaSIFByb21pc2UmbHQ7dm9pZCZndDs8YnIvPjxiPnRvcC1sZXZlbCBuZWVkcyBhd2FpdDwvYj4iXQogICAgICAgIG4xIC0uY29sb3IgaW5mZWN0cy4tPiBuMgogICAgICAgIG4yIC0uY29sb3IgaW5mZWN0cy4tPiBuMwogICAgICAgIG4zIC0uY29sb3IgaW5mZWN0cy4tPiBuNAogICAgICAgIG40IC0uY29sb3IgaW5mZWN0cy4tPiBuNQogICAgZW5kCgogICAgc3ViZ3JhcGggR29bIjxiPkdvIOKAlCBObyBDb2xvciwgTm8gQ2FzY2FkZTwvYj4iXQogICAgICAgIGRpcmVjdGlvbiBUQgogICAgICAgIGcxWyI8Yj5yZWFkRnJvbURCKCk8L2I-IOKsnDxici8-4oaSIERhdGE8YnIvPjxiPmJsb2NrcyBvbiBJL08gaW50ZXJuYWxseTwvYj4iXQogICAgICAgIGcyWyI8Yj5mZXRjaFVzZXIoKTwvYj4g4qycPGJyLz7ihpIgVXNlcjxici8-PGI-cGxhaW4gY2FsbDwvYj4iXQogICAgICAgIGczWyI8Yj5oYW5kbGVSZXF1ZXN0KCk8L2I-IOKsnDxici8-4oaSIFJlc3BvbnNlPGJyLz48Yj5wbGFpbiBjYWxsPC9iPiJdCiAgICAgICAgZzRbIjxiPnJvdXRlKCcvdXNlcicsIGhhbmRsZXIpPC9iPiDirJw8YnIvPjxiPmhhbmRsZXIgaXMgYSBwbGFpbiBmdW5jPC9iPiJdCiAgICAgICAgZzVbIjxiPm1haW4oKTwvYj4g4qycPGJyLz48Yj5wbGFpbiBmdW5jPC9iPiJdCiAgICAgICAgZzEgLS0-IGcyCiAgICAgICAgZzIgLS0-IGczCiAgICAgICAgZzMgLS0-IGc0CiAgICAgICAgZzQgLS0-IGc1CiAgICBlbmQKCiAgICBOb2RlIH5-fiBHbwoKICAgIGNsYXNzRGVmIHJlZENsYXNzIGZpbGw6I2ZlZTJlMixzdHJva2U6I2RjMjYyNixzdHJva2Utd2lkdGg6MnB4LGNvbG9yOiM3ZjFkMWQKICAgIGNsYXNzRGVmIHBsYWluQ2xhc3MgZmlsbDojZjN
mNGY2LHN0cm9rZTojNmI3MjgwLHN0cm9rZS13aWR0aDoycHgsY29sb3I6IzExMTgyNwoKICAgIGNsYXNzIG4xLG4yLG4zLG40LG41IHJlZENsYXNzCiAgICBjbGFzcyBnMSxnMixnMyxnNCxnNSBwbGFpbkNsYXNz" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBOb2RlWyI8Yj5Ob2RlIOKAlCBDb2xvciBDYXNjYWRlcyBVcCB0aGUgQ2FsbCBTdGFjazwvYj4iXQogICAgICAgIGRpcmVjdGlvbiBUQgogICAgICAgIG4xWyI8Yj5yZWFkRnJvbURCKCk8L2I-IPCfn6U8YnIvPuKGkiBQcm9taXNlJmx0O0RhdGEmZ3Q7PGJyLz48Yj5kb2VzIEkvTzwvYj4iXQogICAgICAgIG4yWyI8Yj5mZXRjaFVzZXIoKTwvYj4g8J-fpTxici8-4oaSIFByb21pc2UmbHQ7VXNlciZndDs8YnIvPjxiPm11c3QgYXdhaXQgcmVhZEZyb21EQjwvYj4iXQogICAgICAgIG4zWyI8Yj5oYW5kbGVSZXF1ZXN0KCk8L2I-IPCfn6U8YnIvPuKGkiBQcm9taXNlJmx0O1Jlc3BvbnNlJmd0Ozxici8-PGI-bXVzdCBhd2FpdCBmZXRjaFVzZXI8L2I-Il0KICAgICAgICBuNFsiPGI-cm91dGUoJy91c2VyJywgaGFuZGxlcik8L2I-IPCfn6U8YnIvPjxiPm11c3QgYWNjZXB0IFByb21pc2UgcmV0dXJuPC9iPiJdCiAgICAgICAgbjVbIjxiPm1haW4oKTwvYj4g8J-fpTxici8-4oaSIFByb21pc2UmbHQ7dm9pZCZndDs8YnIvPjxiPnRvcC1sZXZlbCBuZWVkcyBhd2FpdDwvYj4iXQogICAgICAgIG4xIC0uY29sb3IgaW5mZWN0cy4tPiBuMgogICAgICAgIG4yIC0uY29sb3IgaW5mZWN0cy4tPiBuMwogICAgICAgIG4zIC0uY29sb3IgaW5mZWN0cy4tPiBuNAogICAgICAgIG40IC0uY29sb3IgaW5mZWN0cy4tPiBuNQogICAgZW5kCgogICAgc3ViZ3JhcGggR29bIjxiPkdvIOKAlCBObyBDb2xvciwgTm8gQ2FzY2FkZTwvYj4iXQogICAgICAgIGRpcmVjdGlvbiBUQgogICAgICAgIGcxWyI8Yj5yZWFkRnJvbURCKCk8L2I-IOKsnDxici8-4oaSIERhdGE8YnIvPjxiPmJsb2NrcyBvbiBJL08gaW50ZXJuYWxseTwvYj4iXQogICAgICAgIGcyWyI8Yj5mZXRjaFVzZXIoKTwvYj4g4qycPGJyLz7ihpIgVXNlcjxici8-PGI-cGxhaW4gY2FsbDwvYj4iXQogICAgICAgIGczWyI8Yj5oYW5kbGVSZXF1ZXN0KCk8L2I-IOKsnDxici8-4oaSIFJlc3BvbnNlPGJyLz48Yj5wbGFpbiBjYWxsPC9iPiJdCiAgICAgICAgZzRbIjxiPnJvdXRlKCcvdXNlcicsIGhhbmRsZXIpPC9iPiDirJw8YnIvPjxiPmhhbmRsZXIgaXMgYSBwbGFpbiBmdW5jPC9iPiJdCiAgICAgICAgZzVbIjxiPm1haW4oKTwvYj4g4qycPGJyLz48Yj5wbGFpbiBmdW5jPC9iPiJdCiAgICAgICAgZzEgLS0-IGcyCiAgICAgICAgZzIgLS0-IGczCiAgICAgICAgZzMgLS0-IGc0CiAgICAgICAgZzQgLS
0-IGc1CiAgICBlbmQKCiAgICBOb2RlIH5-fiBHbwoKICAgIGNsYXNzRGVmIHJlZENsYXNzIGZpbGw6I2ZlZTJlMixzdHJva2U6I2RjMjYyNixzdHJva2Utd2lkdGg6MnB4LGNvbG9yOiM3ZjFkMWQKICAgIGNsYXNzRGVmIHBsYWluQ2xhc3MgZmlsbDojZjNmNGY2LHN0cm9rZTojNmI3MjgwLHN0cm9rZS13aWR0aDoycHgsY29sb3I6IzExMTgyNwoKICAgIGNsYXNzIG4xLG4yLG4zLG40LG41IHJlZENsYXNzCiAgICBjbGFzcyBnMSxnMixnMyxnNCxnNSBwbGFpbkNsYXNz" alt="flowchart LR" width="714" height="997"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Hard Limit
&lt;/h3&gt;

&lt;p&gt;The model fails the moment your code stops waiting. A single CPU-bound operation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* heavy work */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…holds the JS main thread, and &lt;em&gt;every other request on this process is dead&lt;/em&gt; until it returns. The event loop has nowhere else to go. Worker threads, child processes, or splitting CPU work into a separate service are real fixes, but they're escape hatches: they exist because the core model has exactly one thread executing JS.&lt;/p&gt;




&lt;h2&gt;
  
  
  Go's Answer: Move Context Switching Into User Space
&lt;/h2&gt;

&lt;p&gt;In Go, you write code that reads as synchronous:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sendResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is no &lt;code&gt;await&lt;/code&gt;. There is no callback. The function looks like it blocks on the database. And yet the program scales to hundreds of thousands of concurrent operations on modest hardware.&lt;/p&gt;

&lt;p&gt;The trick is that the &lt;em&gt;scheduling boundary has been moved.&lt;/em&gt; Where Node has the programmer mark the suspension point with &lt;code&gt;await&lt;/code&gt; and the runtime captures a continuation, Go lets the programmer write straight-line code and has the &lt;em&gt;runtime&lt;/em&gt; suspend the entire goroutine when it hits a blocking I/O call.&lt;/p&gt;

&lt;p&gt;This is the central insight, and the cleanest one-line statement of Go's concurrency model:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Go's essence is the user-space-ification of context switching.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A goroutine isn't an OS thread. It's a small (initially 2 KB) growable stack and a register snapshot, managed by the Go runtime. The runtime maps a large number of goroutines (G) onto a small number of OS threads (M) using scheduling contexts (P). This is the GMP model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
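&lt;strong&gt;G&lt;/strong&gt;: a goroutine. The unit of scheduling. Cheap to create, cheap to suspend.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;M&lt;/strong&gt;: an OS thread. At most &lt;code&gt;GOMAXPROCS&lt;/code&gt; of them execute Go code at any moment; extra Ms can exist while others are blocked in syscalls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P&lt;/strong&gt;: a scheduling context. There are &lt;code&gt;GOMAXPROCS&lt;/code&gt; of them; each holds a local run queue of Gs, and an M must hold a P to run Go code.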
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;many G  →  Go scheduler  →  few M  →  CPU cores
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a goroutine blocks on a channel, a mutex, a timer, or a network read (which the runtime routes through its netpoller), the Go runtime parks it: it saves a few registers such as the program counter and stack pointer, leaves the stack where it is, detaches the G from the current M, and schedules another runnable goroutine onto that M. When the original goroutine's wait completes, it's marked runnable again, and some M eventually picks it up and resumes execution from the suspension point. &lt;strong&gt;The switch itself never enters the kernel.&lt;/strong&gt; No &lt;code&gt;clone(2)&lt;/code&gt;, no kernel-mediated thread switch, no kernel scheduler queue. The bookkeeping is all in user space. (A truly blocking syscall is the exception: the thread really does block in the kernel, and the runtime hands its P to another M.)&lt;/p&gt;

&lt;p&gt;That's the user-space-ification. The CPU still has to switch contexts when work shifts between goroutines, but the cost is roughly a function call plus a stack swap — not a kernel-mediated thread switch.&lt;/p&gt;

&lt;p&gt;The key contrast with Node's model is in &lt;em&gt;where the suspended state lives:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBOb2RlWyI8Yj5Ob2RlIOKAlCBTdGFja2xlc3MgQ29yb3V0aW5lPC9iPiJdCiAgICAgICAgZGlyZWN0aW9uIFRCCiAgICAgICAgblN0YWNrWyI8Yj5KUyBDYWxsIFN0YWNrPC9iPjxici8-KG9uZSBmcmFtZSBhdCBhIHRpbWUpPGJyLz7ilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIE8YnIvPuKaoCA8Yj5jdXJyZW50bHkgZW1wdHk8L2I-PGJyLz4oYWxsIGFzeW5jIGZucyBwb3BwZWQsPGJyLz53YWl0aW5nIGluIGV2ZW50IGxvb3ApIl0KICAgICAgICBuSGVhcFsiPGI-SGVhcDwvYj4iXQogICAgICAgIG5DMVsiPGI-Y29udGludWF0aW9uICMxPC9iPjxici8-eyBzdGF0ZTogMSw8YnIvPiZuYnNwOyZuYnNwO2xvY2Fsczoge3JlcSwgcmVzLCBhfSw8YnIvPiZuYnNwOyZuYnNwO3N0ZXA6IGZuIHB0ciB9Il0KICAgICAgICBuQzJbIjxiPmNvbnRpbnVhdGlvbiAjMjwvYj48YnIvPnsgc3RhdGU6IDAsIC4uLiB9Il0KICAgICAgICBuQzNbIjxiPmNvbnRpbnVhdGlvbiAjMzwvYj48YnIvPnsgc3RhdGU6IDIsIC4uLiB9Il0KICAgICAgICBuSGVhcCAtLT4gbkMxCiAgICAgICAgbkhlYXAgLS0-IG5DMgogICAgICAgIG5IZWFwIC0tPiBuQzMKICAgICAgICBuTm90ZVsiPGI-RWFjaCA8Y29kZT5hd2FpdDwvY29kZT4gcG9wcyB0aGUgZnJhbWUuPC9iPjxici8-U3RhdGUgbGl2ZXMgb25seSBpbiBoZWFwIGNsb3N1cmVzLjxici8-U3RhY2sgaXMgcmV1c2VkIGFjcm9zcyBhbGwgdHVybnMuIl0KICAgIGVuZAoKICAgIHN1YmdyYXBoIEdvWyI8Yj5HbyDigJQgU3RhY2tmdWwgQ29yb3V0aW5lPC9iPiJdCiAgICAgICAgZGlyZWN0aW9uIFRCCiAgICAgICAgZ01bIjxiPk9TIFRocmVhZCAoTSk8L2I-PGJyLz5jdXJyZW50bHkgcnVubmluZyBHMyDilrYiXQogICAgICAgIGdIZWFwWyI8Yj5IZWFwPC9iPiJdCiAgICAgICAgZ0cxWyI8Yj5nb3JvdXRpbmUgRzE8L2I-ICgyIEtCIHN0YWNrKTxici8-4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSBPGJyLz5wcm9jZXNzKCk8YnIvPiZuYnNwOyZuYnNwO-KGsyBzbG93RG91YmxlKCk8YnIvPiZuYnNwOyZuYnNwOyZuYnNwOyZuYnNwO-KGsyB0aW1lLlNsZWVwKCkg4piFcGFya2VkIl0KICAgICAgICBnRzJbIjxiPmdvcm91dGluZSBHMjwvYj4gKDIgS0Igc3RhY2spPGJyLz7ilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIE8YnIvPmhhbmRsZXIoKTxici8-Jm5ic3A7Jm5ic3A74oazIGRiLlF1ZXJ5KCkg4piFcGFya2VkIl0KICAgICAgICBnRzNbIjxiPmdvcm91dGluZSBHMzwvYj4gKDIgS0Igc3RhY2spPGJyLz7ilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIE8YnIvPmN1cnJlbnRseSBvbiBNIOKWtiJ
dCiAgICAgICAgZ0hlYXAgLS0-IGdHMQogICAgICAgIGdIZWFwIC0tPiBnRzIKICAgICAgICBnSGVhcCAtLT4gZ0czCiAgICAgICAgZ05vdGVbIjxiPkVhY2ggZ29yb3V0aW5lIG93bnMgaXRzIGZ1bGwgc3RhY2suPC9iPjxici8-UnVudGltZSBzYXZlcy9yZXN0b3JlcyBlbnRpcmUgc3RhY2s8YnIvPm9uIHN1c3BlbmQuIE5vIGZyYW1lIHBvcCBuZWVkZWQuIl0KICAgIGVuZAoKICAgIE5vZGUgfn5-IEdvCgogICAgY2xhc3NEZWYgbm9kZUFsZXJ0IGZpbGw6I2ZlZTJlMixzdHJva2U6I2RjMjYyNixzdHJva2Utd2lkdGg6M3B4LGNvbG9yOiM3ZjFkMWQKICAgIGNsYXNzRGVmIG5vZGVDbGFzcyBmaWxsOiNmZWYzYzcsc3Ryb2tlOiNkOTc3MDYsY29sb3I6IzExMTgyNwogICAgY2xhc3NEZWYgZ29DbGFzcyBmaWxsOiNkYmVhZmUsc3Ryb2tlOiMyNTYzZWIsY29sb3I6IzExMTgyNwogICAgY2xhc3NEZWYgbm90ZUNsYXNzIGZpbGw6I2ZmZmZmZixzdHJva2U6IzM3NDE1MSxzdHJva2Utd2lkdGg6MS41cHgsY29sb3I6IzExMTgyNwoKICAgIGNsYXNzIG5TdGFjayBub2RlQWxlcnQKICAgIGNsYXNzIG5IZWFwLG5DMSxuQzIsbkMzIG5vZGVDbGFzcwogICAgY2xhc3MgZ00sZ0hlYXAsZ0cxLGdHMixnRzMgZ29DbGFzcwogICAgY2xhc3Mgbk5vdGUsZ05vdGUgbm90ZUNsYXNz" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBOb2RlWyI8Yj5Ob2RlIOKAlCBTdGFja2xlc3MgQ29yb3V0aW5lPC9iPiJdCiAgICAgICAgZGlyZWN0aW9uIFRCCiAgICAgICAgblN0YWNrWyI8Yj5KUyBDYWxsIFN0YWNrPC9iPjxici8-KG9uZSBmcmFtZSBhdCBhIHRpbWUpPGJyLz7ilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIE8YnIvPuKaoCA8Yj5jdXJyZW50bHkgZW1wdHk8L2I-PGJyLz4oYWxsIGFzeW5jIGZucyBwb3BwZWQsPGJyLz53YWl0aW5nIGluIGV2ZW50IGxvb3ApIl0KICAgICAgICBuSGVhcFsiPGI-SGVhcDwvYj4iXQogICAgICAgIG5DMVsiPGI-Y29udGludWF0aW9uICMxPC9iPjxici8-eyBzdGF0ZTogMSw8YnIvPiZuYnNwOyZuYnNwO2xvY2Fsczoge3JlcSwgcmVzLCBhfSw8YnIvPiZuYnNwOyZuYnNwO3N0ZXA6IGZuIHB0ciB9Il0KICAgICAgICBuQzJbIjxiPmNvbnRpbnVhdGlvbiAjMjwvYj48YnIvPnsgc3RhdGU6IDAsIC4uLiB9Il0KICAgICAgICBuQzNbIjxiPmNvbnRpbnVhdGlvbiAjMzwvYj48YnIvPnsgc3RhdGU6IDIsIC4uLiB9Il0KICAgICAgICBuSGVhcCAtLT4gbkMxCiAgICAgICAgbkhlYXAgLS0-IG5DMgogICAgICAgIG5IZWFwIC0tPiBuQzMKICAgICAgICBuTm90ZVsiPGI-RWFjaCA8Y29kZT5hd2FpdDwvY29kZT4gcG9wcyB0aGUgZnJhbWUuPC9iPjxici8-U3RhdGUgbGl2ZX
Mgb25seSBpbiBoZWFwIGNsb3N1cmVzLjxici8-U3RhY2sgaXMgcmV1c2VkIGFjcm9zcyBhbGwgdHVybnMuIl0KICAgIGVuZAoKICAgIHN1YmdyYXBoIEdvWyI8Yj5HbyDigJQgU3RhY2tmdWwgQ29yb3V0aW5lPC9iPiJdCiAgICAgICAgZGlyZWN0aW9uIFRCCiAgICAgICAgZ01bIjxiPk9TIFRocmVhZCAoTSk8L2I-PGJyLz5jdXJyZW50bHkgcnVubmluZyBHMyDilrYiXQogICAgICAgIGdIZWFwWyI8Yj5IZWFwPC9iPiJdCiAgICAgICAgZ0cxWyI8Yj5nb3JvdXRpbmUgRzE8L2I-ICgyIEtCIHN0YWNrKTxici8-4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSBPGJyLz5wcm9jZXNzKCk8YnIvPiZuYnNwOyZuYnNwO-KGsyBzbG93RG91YmxlKCk8YnIvPiZuYnNwOyZuYnNwOyZuYnNwOyZuYnNwO-KGsyB0aW1lLlNsZWVwKCkg4piFcGFya2VkIl0KICAgICAgICBnRzJbIjxiPmdvcm91dGluZSBHMjwvYj4gKDIgS0Igc3RhY2spPGJyLz7ilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIE8YnIvPmhhbmRsZXIoKTxici8-Jm5ic3A7Jm5ic3A74oazIGRiLlF1ZXJ5KCkg4piFcGFya2VkIl0KICAgICAgICBnRzNbIjxiPmdvcm91dGluZSBHMzwvYj4gKDIgS0Igc3RhY2spPGJyLz7ilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIE8YnIvPmN1cnJlbnRseSBvbiBNIOKWtiJdCiAgICAgICAgZ0hlYXAgLS0-IGdHMQogICAgICAgIGdIZWFwIC0tPiBnRzIKICAgICAgICBnSGVhcCAtLT4gZ0czCiAgICAgICAgZ05vdGVbIjxiPkVhY2ggZ29yb3V0aW5lIG93bnMgaXRzIGZ1bGwgc3RhY2suPC9iPjxici8-UnVudGltZSBzYXZlcy9yZXN0b3JlcyBlbnRpcmUgc3RhY2s8YnIvPm9uIHN1c3BlbmQuIE5vIGZyYW1lIHBvcCBuZWVkZWQuIl0KICAgIGVuZAoKICAgIE5vZGUgfn5-IEdvCgogICAgY2xhc3NEZWYgbm9kZUFsZXJ0IGZpbGw6I2ZlZTJlMixzdHJva2U6I2RjMjYyNixzdHJva2Utd2lkdGg6M3B4LGNvbG9yOiM3ZjFkMWQKICAgIGNsYXNzRGVmIG5vZGVDbGFzcyBmaWxsOiNmZWYzYzcsc3Ryb2tlOiNkOTc3MDYsY29sb3I6IzExMTgyNwogICAgY2xhc3NEZWYgZ29DbGFzcyBmaWxsOiNkYmVhZmUsc3Ryb2tlOiMyNTYzZWIsY29sb3I6IzExMTgyNwogICAgY2xhc3NEZWYgbm90ZUNsYXNzIGZpbGw6I2ZmZmZmZixzdHJva2U6IzM3NDE1MSxzdHJva2Utd2lkdGg6MS41cHgsY29sb3I6IzExMTgyNwoKICAgIGNsYXNzIG5TdGFjayBub2RlQWxlcnQKICAgIGNsYXNzIG5IZWFwLG5DMSxuQzIsbkMzIG5vZGVDbGFzcwogICAgY2xhc3MgZ00sZ0hlYXAsZ0cxLGdHMixnRzMgZ29DbGFzcwogICAgY2xhc3Mgbk5vdGUsZ05vdGUgbm90ZUNsYXNz" alt="flowchart LR" width="1904" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In Node, the JS call stack is shared and almost always near-empty — every async function in flight has already popped, with its state sitting in a heap closure. In Go, every goroutine owns its full call chain on its own heap-allocated stack; suspended goroutines look like frozen frames waiting for the runtime to resume them on some OS thread.&lt;/p&gt;

&lt;p&gt;This is also why neither language can simply borrow the other's model. &lt;strong&gt;Node runs on V8&lt;/strong&gt;, which shipped in 2008 as a browser JS engine: single call stack, synchronous semantics, no concept of saving stacks across yields. Adding stackful coroutines would mean rewriting the engine, which is roughly what Java's Project Loom did to the JVM at huge cost. &lt;strong&gt;Go was designed from scratch&lt;/strong&gt; with a runtime that owns stacks, can grow them, and can switch between them. The choice is locked in by runtime architecture, not language taste.&lt;/p&gt;




&lt;h2&gt;
  
  
  What "User-Space" Actually Buys You
&lt;/h2&gt;

&lt;p&gt;The slogan only matters if user-space context switching is meaningfully cheaper than the kernel-mediated kind. It is — by more than an order of magnitude.&lt;/p&gt;

&lt;p&gt;Two goroutines pinned to one OS thread (&lt;code&gt;GOMAXPROCS=1&lt;/code&gt;), ping-ponging via &lt;code&gt;runtime.Gosched()&lt;/code&gt; and via an unbuffered channel. Two pthreads pinned to one core (&lt;code&gt;taskset -c 0&lt;/code&gt;), ping-ponging via &lt;code&gt;pthread_mutex&lt;/code&gt; + &lt;code&gt;pthread_cond&lt;/code&gt;. (Reproduction code at the end of the post.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Measured on Intel N100, Ubuntu 24.04 (kernel 6.8.0), Go 1.23.4, gcc 13.3:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;ns / switch&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Goroutine yield (&lt;code&gt;runtime.Gosched&lt;/code&gt;, GOMAXPROCS=1)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~102 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Goroutine round-trip via unbuffered channel&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~436 ns&lt;/strong&gt; (≈218 ns per G-switch + channel coordination)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pthread switch (mutex+cond ping-pong, single core)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~2,900 ns&lt;/strong&gt; (range 2,818–3,611 across 5 runs of 2M iterations)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Ratio: roughly &lt;strong&gt;28× cheaper&lt;/strong&gt; for the bare scheduler yield (≈2,900 ns vs ≈102 ns), &lt;strong&gt;~13× cheaper&lt;/strong&gt; for the apples-to-apples synchronized switch (≈2,900 ns vs ≈218 ns per G-switch).&lt;/p&gt;

&lt;p&gt;Where the gap comes from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mode switch.&lt;/strong&gt; The user → kernel → user round-trip alone is ~100 ns of entry/exit and ABI-mandated register save/restore. A goroutine switch never crosses that line.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduler work in kernel space.&lt;/strong&gt; Linux's fair scheduler (CFS, replaced by EEVDF in kernel 6.6, so EEVDF on the 6.8 kernel measured here) keeps runnable threads in a red-black tree on locked per-CPU runqueues and balances load across CPUs. The Go scheduler does the same job in user space with per-P local runqueues and lock-free fast paths, and skips the kernel locks entirely.&lt;/li&gt;
&lt;li&gt;
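&lt;strong&gt;Cache and TLB effects.&lt;/strong&gt; A kernel scheduler may migrate a thread to a different core, costing you cold L1/L2 and an instruction-cache reload. Goroutines normally stay on the same M, so the cache stays warm. (The pthread benchmark above is pinned to one core, so migration does not contribute to the measured gap; this effect shows up in unpinned production workloads.)&lt;/li&gt;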
&lt;/ul&gt;

&lt;p&gt;What the model does &lt;em&gt;not&lt;/em&gt; buy you: a goroutine that makes a real blocking syscall still pays for a real OS thread switch. The M blocks in the kernel along with its G, and the runtime hands that M's P to another M (starting one if necessary) so the rest of the goroutines keep running. Async preemption (Go 1.14+, signal-based) is the runtime's answer to tight loops that never yield, and it has its own cost. Once you saturate &lt;code&gt;GOMAXPROCS&lt;/code&gt;, the user-space runqueue itself starts to show up in profiles.&lt;/p&gt;

&lt;p&gt;The "user-space-ification" buys you &lt;strong&gt;cheap G-to-G switching on a hot M.&lt;/strong&gt; That's where the order-of-magnitude lives. The syscalls, the M-to-M handoffs, the actual kernel work — those are still as expensive as they always were. The model wins by making the &lt;em&gt;common case&lt;/em&gt; — many concurrent goroutines, mostly waiting, occasionally running — almost free.&lt;/p&gt;

&lt;p&gt;(The N100 is a low-power Alder Lake-N part with four E-cores and no P-cores; absolute numbers will be smaller on a server-class Xeon or EPYC, but the ratio is expected to hold.)&lt;/p&gt;




&lt;h2&gt;
  
  
  The Unit of Scheduling
&lt;/h2&gt;

&lt;p&gt;The cleanest comparison is to ask what each runtime actually schedules:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Node / TypeScript&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Go&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Unit of scheduling&lt;/td&gt;
&lt;td&gt;callback / Promise continuation&lt;/td&gt;
&lt;td&gt;goroutine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;What's captured at suspension&lt;/td&gt;
&lt;td&gt;tail of an async function as a heap closure&lt;/td&gt;
&lt;td&gt;full call stack + registers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;How code looks&lt;/td&gt;
&lt;td&gt;explicit &lt;code&gt;async&lt;/code&gt;/&lt;code&gt;await&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;straight-line synchronous&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Suspension marked by&lt;/td&gt;
&lt;td&gt;the programmer (&lt;code&gt;await&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;the runtime (any blocking op)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Suspended state lives in&lt;/td&gt;
&lt;td&gt;V8 microtask queue + heap closure&lt;/td&gt;
&lt;td&gt;goroutine stack on the user-space heap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kernel involvement&lt;/td&gt;
&lt;td&gt;epoll/kqueue/IOCP via libuv&lt;/td&gt;
&lt;td&gt;epoll/kqueue/IOCP via netpoller&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU parallelism&lt;/td&gt;
&lt;td&gt;one main JS thread; needs workers/cluster for cores&lt;/td&gt;
&lt;td&gt;M:N scheduler runs goroutines across cores natively&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Function color&lt;/td&gt;
&lt;td&gt;yes (Promise infects up the call stack)&lt;/td&gt;
&lt;td&gt;no (any function may block)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;What breaks under CPU load&lt;/td&gt;
&lt;td&gt;the entire event loop&lt;/td&gt;
&lt;td&gt;only the busy core: the scheduler runs other Gs on other Ms and preempts long-running ones&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The two columns describe deeply different mental models, but they belong to the same family. They are both &lt;em&gt;user-space concurrency runtimes that avoid kernel thread-per-request.&lt;/em&gt; They differ in where the suspension is captured (the language vs. the call stack) and how broad the scheduler's mandate is.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where the Boundaries Diverge: CPU-Bound Work
&lt;/h2&gt;

&lt;p&gt;Node and Go look interchangeable on I/O-bound workloads. They diverge sharply the moment CPU work enters the picture.&lt;/p&gt;

&lt;p&gt;Node's event loop has one job: dispatch ready callbacks onto a single JS thread. If a callback runs for 200 ms doing JSON parsing or hashing, the loop is &lt;em&gt;frozen&lt;/em&gt; for those 200 ms. Every other suspended continuation has to wait. Throughput collapses.&lt;/p&gt;

&lt;p&gt;Go's runtime has a different mandate. It doesn't only manage waiting — it also manages execution. If you spawn:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;task1&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;task2&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;task3&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…the scheduler is happy to put each goroutine on a different M, run them on different cores in true parallel, and preempt long-running goroutines so they don't starve the rest of the runtime. CPU-bound goroutines aren't a special case to work around. They're just goroutines.&lt;/p&gt;

&lt;p&gt;That's why Go's concurrency model covers more ground:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Node's model mainly solves non-CPU-bound concurrency — network I/O, database waits, downstream API calls. Go's model solves I/O waiting &lt;em&gt;and&lt;/em&gt; CPU parallelism with the same primitive.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This isn't a knock on Node. The event loop is brilliant at what it's designed for: lots of slow waits, light per-request CPU. It's the natural shape of API gateways, BFFs, websocket hubs, real-time aggregation, and most of the JSON-shuffling that makes up modern web backends. But sustained CPU work, mixed CPU + I/O pipelines, long-lived infrastructure services — those are workloads where Go's scheduler-driven model has more headroom built in.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Answers to the Same Question
&lt;/h2&gt;

&lt;p&gt;Strip away the implementation details and the two runtimes are answering the same question with different abstractions:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Concurrency at scale is the problem of what to do with the CPU while a request waits on I/O.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Node's answer: turn the wait into an event, capture the rest of the function as a continuation, resume the continuation when the event fires. One thread cycling through ready continuations.&lt;/p&gt;

&lt;p&gt;Go's answer: run the request on a goroutine, suspend the goroutine in user space when it blocks, schedule another runnable goroutine onto the OS thread, resume the original when its wait completes.&lt;/p&gt;

&lt;p&gt;Two ways of solving the same waste. One state-machines it. The other lowers the cost of context switching far enough that you can afford to keep one execution flow per request.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Two answers to one question: one is events, implemented as a state machine. The other is low-cost user-space context switching.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But there's a deeper layer worth surfacing. The two answers also disagree about &lt;em&gt;whether suspension should be visible in the type system.&lt;/em&gt; Node says yes — &lt;code&gt;Promise&amp;lt;T&amp;gt;&lt;/code&gt; is part of the signature, &lt;code&gt;async&lt;/code&gt; is part of the contract, function color propagates. Go says no — any function may block, and the type doesn't carry that information.&lt;/p&gt;

&lt;p&gt;This visibility-vs-uniformity trade-off shows up far beyond Node and Go. It's the same shape as Haskell's monadic IO vs the implicit side effects of most other languages, checked vs unchecked exceptions in Java, capability-based security vs ambient authority. Each pair makes the same trade: composable static reasoning vs ergonomic uniform code. Node and Go are picking sides of a much bigger question.&lt;/p&gt;

&lt;p&gt;You see the consequence in the libraries. Node libraries publish &lt;code&gt;fs.readFile&lt;/code&gt; &lt;em&gt;and&lt;/em&gt; &lt;code&gt;fs.readFileSync&lt;/code&gt;, two retry helpers (one for sync ops, one for async), &lt;code&gt;p-limit&lt;/code&gt;-style bounded-concurrency wrappers around &lt;code&gt;Promise.all&lt;/code&gt;. Go libraries publish &lt;code&gt;os.ReadFile&lt;/code&gt; (one function), one &lt;code&gt;Retry(op func() error, n int) error&lt;/code&gt;, twenty lines of &lt;code&gt;chan&lt;/code&gt; + &lt;code&gt;WaitGroup&lt;/code&gt; for bounded concurrency. The Go versions aren't simpler because Go developers are smarter — they're simpler because the runtime hides the same complexity that Node's type system insists on exposing.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Closing Line
&lt;/h2&gt;

&lt;p&gt;If you remember one thing from this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Node turns waiting into events. Go turns execution flows into schedulable units. Both refuse to let the CPU sit idle while I/O blocks — they just disagree on what the unit of scheduling should be.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Or, if you want the deeper layer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Node makes "this function might suspend" visible at the type level. Go makes it invisible.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's the whole story. Everything else — &lt;code&gt;await&lt;/code&gt; vs &lt;code&gt;go&lt;/code&gt;, libuv vs the netpoller, V8's microtask queue vs GMP, single-thread bottleneck vs CPU-bound resilience, libraries that look complicated vs libraries that look simple — falls out of that one disagreement.&lt;/p&gt;




&lt;h2&gt;
  
  
  Appendix: Reproduce the Benchmark
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;goroutine_switch_test.go&lt;/code&gt;&lt;/strong&gt; — &lt;code&gt;GOMAXPROCS=1 go test -bench=. -benchtime=5s -count=5&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;bench&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"runtime"&lt;/span&gt;
    &lt;span class="s"&gt;"sync"&lt;/span&gt;
    &lt;span class="s"&gt;"testing"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// Channel ping-pong: each iter is a full round-trip = 2 G-switches.&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;BenchmarkGoroutineSwitchChannel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="n"&gt;done&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;done&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ch&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{}{}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResetTimer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{}{}&lt;/span&gt;
        &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ch&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StopTimer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nb"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;done&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Bare scheduler yield. Each iter ≈ 1 G-switch.&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;BenchmarkGoroutineSwitchGosched&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WaitGroup&lt;/span&gt;
    &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;half&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;half&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Gosched&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResetTimer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;half&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Gosched&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Wait&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;pthread_switch.c&lt;/code&gt;&lt;/strong&gt; — &lt;code&gt;gcc -O2 -o pthread_switch pthread_switch.c -lpthread &amp;amp;&amp;amp; taskset -c 0 ./pthread_switch 2000000&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#define _GNU_SOURCE
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;pthread.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdlib.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;time.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdint.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;pthread_mutex_t&lt;/span&gt; &lt;span class="n"&gt;mu&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PTHREAD_MUTEX_INITIALIZER&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;pthread_cond_t&lt;/span&gt;  &lt;span class="n"&gt;cv&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PTHREAD_COND_INITIALIZER&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;volatile&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;    &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt;            &lt;span class="n"&gt;iters&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nf"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;my_turn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="kt"&gt;intptr_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;pthread_mutex_lock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;iters&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;my_turn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;pthread_cond_wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;cv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;my_turn&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;pthread_cond_broadcast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;cv&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;pthread_mutex_unlock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="nf"&gt;now_ns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;timespec&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;clock_gettime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CLOCK_MONOTONIC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;double&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tv_sec&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;1e9&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;double&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tv_nsec&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;argc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;iters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argc&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="n"&gt;atol&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000000L&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;pthread_t&lt;/span&gt; &lt;span class="n"&gt;t0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;now_ns&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;pthread_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;t0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="kt"&gt;intptr_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;pthread_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="kt"&gt;intptr_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;pthread_join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;pthread_join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;now_ns&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ns / switch: %.1f&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;iters&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;GOMAXPROCS=1&lt;/code&gt; limits the runtime to a single P (so only one goroutine executes Go code at a time), which means we measure pure G-to-G switching, not cross-core migration. &lt;code&gt;taskset -c 0&lt;/code&gt; pins both pthreads to one CPU so they actually have to context-switch (otherwise they run in parallel on two cores and there is nothing to measure). Both benches do the simplest possible synchronized hand-off, with no I/O and no real work, so what is left is the cost of the switch itself.&lt;/p&gt;

</description>
      <category>go</category>
      <category>node</category>
      <category>concurrency</category>
      <category>javascript</category>
    </item>
    <item>
      <title>gRPC Interceptors in Production: Design Patterns That Survive Real Load</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Mon, 20 Apr 2026 17:02:20 +0000</pubDate>
      <link>https://dev.to/harrisonsec/grpc-interceptors-in-production-design-patterns-that-survive-real-load-372h</link>
      <guid>https://dev.to/harrisonsec/grpc-interceptors-in-production-design-patterns-that-survive-real-load-372h</guid>
      <description>&lt;p&gt;gRPC interceptors are the middleware pattern, specialized for gRPC. If you've written HTTP middleware before, the shape is familiar — a function that wraps a call, can observe or modify the request, pass to the next handler, then observe or modify the response. The difference: gRPC's type system makes the flavors (unary, server-stream, client-stream, bidi) explicit, and chain ordering matters more than most people realize.&lt;/p&gt;

&lt;p&gt;Most online examples show a single toy interceptor. Production systems stack five to ten of them per service. Getting the composition right — ordering, concern separation, testability — is half of running a gRPC-based microservice well.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — gRPC interceptors are middleware with more explicit types. Chain them outside-in: observability wraps everything, then throttling, then auth, then retry, then the actual service. Keep each interceptor focused on one concern; the moment an interceptor does two things you're writing coupled middleware. Stream interceptors are trickier than unary — don't copy-paste unary logic into stream without thinking. Test the chain composition with bufconn, not just each interceptor in isolation.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Four Interceptor Types
&lt;/h2&gt;

&lt;p&gt;gRPC has four interceptor signatures, two for client, two for server:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unary server interceptor&lt;/strong&gt;: wraps a single request → single response call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stream server interceptor&lt;/strong&gt;: wraps streaming RPCs (server-stream, client-stream, bidi).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unary client interceptor&lt;/strong&gt;: wraps the client side of a unary call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stream client interceptor&lt;/strong&gt;: wraps the client side of a streaming call.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unary interceptors are easy. Stream interceptors are harder because you're wrapping a long-lived stream of messages in one or both directions, not a single call.&lt;/p&gt;

&lt;p&gt;Example unary server interceptor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;loggingInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;
    &lt;span class="n"&gt;info&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryServerInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryHandler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"method=%s duration=%s err=%v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FullMethod&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Since&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loggingInterceptor&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Straightforward. Now stack five of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Chaining and Order
&lt;/h2&gt;

&lt;p&gt;Real services need multiple interceptors. grpc-go itself gives you &lt;code&gt;grpc.ChainUnaryInterceptor(...)&lt;/code&gt; (since v1.28); on older versions you can use the chaining helpers in &lt;code&gt;github.com/grpc-ecosystem/go-grpc-middleware&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ChainUnaryInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;observabilityInterceptor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c"&gt;// outermost&lt;/span&gt;
        &lt;span class="n"&gt;rateLimitInterceptor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;authInterceptor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;validationInterceptor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;businessLogicContextInterceptor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c"&gt;// innermost&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Chain order matters &lt;em&gt;enormously&lt;/em&gt;. Interceptors execute outside-in on the way to the handler, inside-out on the way back. Put the wrong interceptor outside the wrong one and you get bugs that are hard to debug.&lt;/p&gt;

&lt;p&gt;Canonical order I use:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBDbGllbnQoW2dSUEMgY2xpZW50XSkgLS0-IEkxCiAgICBJMVsiT2JzZXJ2YWJpbGl0eTxici8-dHJhY2luZyDCtyBtZXRyaWNzIMK3IGxvZ2dpbmciXSAtLT4gSTIKICAgIEkyWyJSYXRlIGxpbWl0aW5nIC8gcXVvdGEiXSAtLT4gSTMKICAgIEkzWyJBdXRoPGJyLz5hdXRobiDCtyBhdXRoeiJdIC0tPiBJNAogICAgSTRbIlZhbGlkYXRpb24iXSAtLT4gSTUKICAgIEk1WyJSZXRyeSAvIGlkZW1wb3RlbmN5Il0gLS0-IEk2CiAgICBJNlsiQ29udGV4dCBlbnJpY2htZW50Il0gLS0-IEhhbmRsZXJ7eyJCdXNpbmVzcyBoYW5kbGVyIn19CgogICAgY2xhc3NEZWYgb3V0ZXIgZmlsbDojZmVmNWU3LHN0cm9rZTojYjc3OTFmCiAgICBjbGFzc0RlZiBtaWQgZmlsbDojZThmNGY4LHN0cm9rZTojMmM1MjgyCiAgICBjbGFzc0RlZiBpbm5lciBmaWxsOiNmMGZmZjQsc3Ryb2tlOiMyZjg1NWEKICAgIGNsYXNzIEkxIG91dGVyCiAgICBjbGFzcyBJMixJMyxJNCBtaWQKICAgIGNsYXNzIEk1LEk2IGlubmVy" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBDbGllbnQoW2dSUEMgY2xpZW50XSkgLS0-IEkxCiAgICBJMVsiT2JzZXJ2YWJpbGl0eTxici8-dHJhY2luZyDCtyBtZXRyaWNzIMK3IGxvZ2dpbmciXSAtLT4gSTIKICAgIEkyWyJSYXRlIGxpbWl0aW5nIC8gcXVvdGEiXSAtLT4gSTMKICAgIEkzWyJBdXRoPGJyLz5hdXRobiDCtyBhdXRoeiJdIC0tPiBJNAogICAgSTRbIlZhbGlkYXRpb24iXSAtLT4gSTUKICAgIEk1WyJSZXRyeSAvIGlkZW1wb3RlbmN5Il0gLS0-IEk2CiAgICBJNlsiQ29udGV4dCBlbnJpY2htZW50Il0gLS0-IEhhbmRsZXJ7eyJCdXNpbmVzcyBoYW5kbGVyIn19CgogICAgY2xhc3NEZWYgb3V0ZXIgZmlsbDojZmVmNWU3LHN0cm9rZTojYjc3OTFmCiAgICBjbGFzc0RlZiBtaWQgZmlsbDojZThmNGY4LHN0cm9rZTojMmM1MjgyCiAgICBjbGFzc0RlZiBpbm5lciBmaWxsOiNmMGZmZjQsc3Ryb2tlOiMyZjg1NWEKICAgIGNsYXNzIEkxIG91dGVyCiAgICBjbGFzcyBJMixJMyxJNCBtaWQKICAgIGNsYXNzIEk1LEk2IGlubmVy" alt="Client([gRPC client]) --&amp;gt; I1" width="1784" height="94"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Observability must wrap everything so that it sees every rejection, every rate-limit hit, and every failed auth; otherwise you have operational blind spots. Details:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Observability (tracing + metrics + logging)&lt;/strong&gt; — outermost. You want to see every request, including the ones that get rejected by later interceptors. If observability is inside auth, unauth'd attempts are invisible — a security-relevant blind spot.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rate limiting / quota&lt;/strong&gt; — before auth. Why? Because auth involves token verification (DB lookup, JWT parsing, external identity service), and you don't want unauthenticated requests to cost you CPU. Rate-limit first, authenticate second.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Auth (authentication + authorization)&lt;/strong&gt; — before business logic. Reject unauthenticated/unauthorized requests early.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Validation (request shape, basic sanity)&lt;/strong&gt; — before business logic. Catches malformed requests before they hit service code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Retry / idempotency handling&lt;/strong&gt; — closer to business. Only retry what actually made it through auth.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Request context enrichment (trace IDs, user metadata)&lt;/strong&gt; — innermost. Populate context with validated data for the service to use.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Inverted order produces real bugs. I've seen auth outside observability (auth failures weren't logged). Retry outside rate limiter (a retry storm blew through the rate limit). Validation outside observability (validation failures invisible in metrics). Each one a real incident.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keeping Interceptors Focused
&lt;/h2&gt;

&lt;p&gt;The rule: &lt;strong&gt;one concern per interceptor&lt;/strong&gt;. The moment you have an "auth-and-logging" interceptor, you're coupling concerns that should evolve separately.&lt;/p&gt;

&lt;p&gt;Concretely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don't: single "observability" interceptor that does tracing, metrics, and logging in one function.&lt;/li&gt;
&lt;li&gt;Do: three interceptors (&lt;code&gt;tracingInterceptor&lt;/code&gt;, &lt;code&gt;metricsInterceptor&lt;/code&gt;, &lt;code&gt;loggingInterceptor&lt;/code&gt;), chained.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cost: three function-call overheads instead of one. Marginal.&lt;/p&gt;

&lt;p&gt;Benefit: you can swap tracing backends without touching logging. You can disable metrics in tests without disabling tracing. Each interceptor is testable in isolation.&lt;/p&gt;

&lt;p&gt;This is the same argument that favors Unix pipes over monolithic commands. Composition beats monoliths.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Interceptor Recipes
&lt;/h2&gt;

&lt;p&gt;Real interceptors I've written variants of many times:&lt;/p&gt;

&lt;h3&gt;
  
  
  Tracing (OpenTelemetry)
&lt;/h3&gt;

&lt;p&gt;Use the &lt;code&gt;otelgrpc&lt;/code&gt; integration from &lt;code&gt;go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc&lt;/code&gt;. Don't write your own — the ecosystem is mature. Current idiomatic setup uses a &lt;code&gt;StatsHandler&lt;/code&gt;, which hooks deeper than the interceptor chain and captures stream events correctly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"&lt;/span&gt;

&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatsHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;otelgrpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewServerHandler&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
    &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ChainUnaryInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="c"&gt;/* your app interceptors */&lt;/span&gt; &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Older codebases still use &lt;code&gt;otelgrpc.UnaryServerInterceptor()&lt;/code&gt; and &lt;code&gt;otelgrpc.StreamServerInterceptor()&lt;/code&gt;. Those are deprecated, and recent otelgrpc releases may no longer ship them, so check the version you pin. Migrate when convenient; don't rewrite in a panic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Metrics
&lt;/h3&gt;

&lt;p&gt;Prometheus histogram of request duration per method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;reqDuration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;promauto&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewHistogramVec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;prometheus&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HistogramOpts&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
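            &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"grpc_server_request_duration_seconds"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Help&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Duration of unary gRPC requests in seconds, by method and status code."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;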
            &lt;span class="n"&gt;Buckets&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prometheus&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DefBuckets&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;metricsInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;
    &lt;span class="n"&gt;info&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryServerInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryHandler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;reqDuration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithLabelValues&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FullMethod&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Observe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Since&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Seconds&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: cardinality of &lt;code&gt;method&lt;/code&gt; is bounded (you know your service's methods). Cardinality of &lt;code&gt;code&lt;/code&gt; is bounded (gRPC codes are a fixed enum). Don't add user-id or request-id as labels — that's cardinality-explosion territory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Auth
&lt;/h3&gt;

&lt;p&gt;Extract bearer token from metadata, verify, inject user context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;authInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;
    &lt;span class="n"&gt;info&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryServerInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryHandler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// Public methods skip authentication, so check before requiring a token&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;isPublic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FullMethod&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FromIncomingContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;codes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unauthenticated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"no metadata"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"authorization"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;codes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unauthenticated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"no auth token"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;claims&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;verifyToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;codes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unauthenticated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"invalid token"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userCtxKey&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;claims&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key detail: add the user context here, near the boundary. Service code reads it from context, so you don't pass claims as an argument through every service method.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rate limiting
&lt;/h3&gt;

&lt;p&gt;Token bucket, keyed per caller or per method in real deployments; this minimal version shares one limiter across all calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;rateLimitInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limiter&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Limiter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryServerInterceptor&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;
        &lt;span class="n"&gt;info&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryServerInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryHandler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;limiter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Allow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;codes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResourceExhausted&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"rate limited"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Production rate limiting is fancier — per-tenant, distributed state in Redis, burst capacity — but the shape is the same. Reject with &lt;code&gt;ResourceExhausted&lt;/code&gt; before doing work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Retry (client-side)
&lt;/h3&gt;

&lt;p&gt;Client interceptor that retries on transient errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;retryClientInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attempts&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryClientInterceptor&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reply&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;
        &lt;span class="n"&gt;cc&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ClientConn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;invoker&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryInvoker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opts&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CallOption&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;attempts&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;invoker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
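            &lt;span class="c"&gt;// Stop on non-retryable errors, and don't sleep after the final attempt&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;isRetryable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;attempts&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;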
            &lt;span class="n"&gt;backoff&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Millisecond&lt;/span&gt;
            &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;After&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;backoff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Retry is one of the most dangerous interceptors. Get it wrong (missing idempotency keys, retrying non-idempotent operations, feeding a retry storm during an outage) and it causes more production incidents than it prevents. Prefer &lt;a href="https://github.com/grpc-ecosystem/go-grpc-middleware" rel="noopener noreferrer"&gt;&lt;code&gt;grpc-middleware/retry&lt;/code&gt;&lt;/a&gt; over a hand-rolled loop if you can; it's battle-tested.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stream Interceptor Trap
&lt;/h2&gt;

&lt;p&gt;Stream interceptors are harder. The interceptor signature gives you a &lt;code&gt;grpc.ServerStream&lt;/code&gt;, which is a bidirectional channel. Logging becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;loggingStreamInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;srv&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;ss&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServerStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;info&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StreamServerInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StreamHandler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;srv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ss&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"stream=%s duration=%s err=%v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FullMethod&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Since&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This only logs at stream-end, not per message. If you want per-message observability, you need to wrap the &lt;code&gt;ServerStream&lt;/code&gt; itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;observedStream&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServerStream&lt;/span&gt;
    &lt;span class="n"&gt;sent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recv&lt;/span&gt; &lt;span class="kt"&gt;int64&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;observedStream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;SendMsg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServerStream&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SendMsg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c"&gt;// Count only successful sends, matching RecvMsg below&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;atomic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddInt64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;observedStream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;RecvMsg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServerStream&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RecvMsg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;atomic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddInt64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;recv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



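&lt;p&gt;Then pass the wrapper to the handler in place of the original stream, for example &lt;code&gt;handler(srv, &amp;amp;observedStream{ServerStream: ss})&lt;/code&gt;. This is the pattern for any stream interceptor that needs per-message visibility.&lt;/p&gt;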

&lt;p&gt;Common mistakes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Forgetting to propagate context to the wrapper.&lt;/strong&gt; The wrapped stream's &lt;code&gt;Context()&lt;/code&gt; should be the enriched context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-message overhead blows up long streams.&lt;/strong&gt; A message-level log line is fine at 100 msgs/sec. At 100K msgs/sec, it's your dominant cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wrapper state that isn't thread-safe.&lt;/strong&gt; One goroutine can be sending while another is receiving, so the &lt;code&gt;Send&lt;/code&gt; and &lt;code&gt;Recv&lt;/code&gt; sides of a stream can run concurrently. Protect counters.&lt;/li&gt;
&lt;/ul&gt;
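
&lt;p&gt;The first and third points can be sketched together: a wrapper that overrides &lt;code&gt;Context()&lt;/code&gt; and counts with atomics. To keep the sketch compilable without the gRPC module, &lt;code&gt;serverStream&lt;/code&gt; below is a local stand-in for the subset of &lt;code&gt;grpc.ServerStream&lt;/code&gt; methods used here, and &lt;code&gt;intercept&lt;/code&gt;, &lt;code&gt;fakeStream&lt;/code&gt;, and the context value are hypothetical. In real code you would embed &lt;code&gt;grpc.ServerStream&lt;/code&gt; and use the &lt;code&gt;grpc.StreamServerInterceptor&lt;/code&gt; signature shown earlier.&lt;/p&gt;

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

// serverStream is a local stand-in for the subset of grpc.ServerStream
// this sketch needs; it is NOT the real gRPC interface.
type serverStream interface {
	Context() context.Context
	SendMsg(m any) error
	RecvMsg(m any) error
}

// wrappedStream carries an enriched context and concurrency-safe counters.
type wrappedStream struct {
	serverStream
	ctx  context.Context
	sent atomic.Int64
	recv atomic.Int64
}

// Context overrides the embedded method so handlers see the enriched context.
func (w *wrappedStream) Context() context.Context { return w.ctx }

func (w *wrappedStream) SendMsg(m any) error {
	err := w.serverStream.SendMsg(m)
	if err == nil {
		w.sent.Add(1)
	}
	return err
}

func (w *wrappedStream) RecvMsg(m any) error {
	err := w.serverStream.RecvMsg(m)
	if err == nil {
		w.recv.Add(1)
	}
	return err
}

type ctxKey struct{}

// intercept mirrors the shape of a stream interceptor: enrich the context,
// wrap the stream, and hand the wrapper (not the original) to the handler.
func intercept(ss serverStream, handler func(serverStream) error) (sent, recv int64, err error) {
	w := new(wrappedStream)
	w.serverStream = ss
	w.ctx = context.WithValue(ss.Context(), ctxKey{}, "request-123")
	err = handler(w)
	return w.sent.Load(), w.recv.Load(), err
}

// fakeStream is a no-op stream used only to exercise the wrapper.
type fakeStream struct{}

func (fakeStream) Context() context.Context { return context.Background() }
func (fakeStream) SendMsg(any) error        { return nil }
func (fakeStream) RecvMsg(any) error        { return nil }

func main() {
	sent, recv, err := intercept(fakeStream{}, func(s serverStream) error {
		fmt.Println("value seen by handler:", s.Context().Value(ctxKey{}))
		_ = s.RecvMsg(nil)
		_ = s.SendMsg(nil)
		_ = s.SendMsg(nil)
		return nil
	})
	fmt.Println("sent:", sent, "recv:", recv, "err:", err)
}
```

&lt;p&gt;Without the &lt;code&gt;Context()&lt;/code&gt; override, the embedded stream's original context would be returned and anything the interceptor added would silently disappear.&lt;/p&gt;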

&lt;h2&gt;
  
  
  Testing Interceptor Chains
&lt;/h2&gt;

&lt;p&gt;Unit test each interceptor in isolation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestAuthInterceptor_NoToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c"&gt;// no metadata&lt;/span&gt;
    &lt;span class="n"&gt;info&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryServerInfo&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;FullMethod&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"/my.Service/Method"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"handler should not be called"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;authInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;require&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;codes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unauthenticated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Integration-test the chain end-to-end using &lt;code&gt;bufconn&lt;/code&gt; (in-memory connection):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestChain_Ordering&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;lis&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;bufconn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="m"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;lis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ChainUnaryInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;observability&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;business&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RegisterMyServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;realImpl&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lis&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Stop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bufnet"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithContextDialer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;lis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DialContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}),&lt;/span&gt;
        &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithTransportCredentials&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;insecure&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewCredentials&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewMyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c"&gt;// assert on behavior end-to-end&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Integration tests catch bugs that unit tests don't: metadata propagation, interceptor ordering, context enrichment visible to the handler. Don't skip them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Patterns That Save Time
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;grpc-middleware/v2&lt;/code&gt;&lt;/strong&gt; (&lt;code&gt;github.com/grpc-ecosystem/go-grpc-middleware/v2&lt;/code&gt;) for chain helpers, recovery, and batteries-included interceptors. Don't reinvent every wheel.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep error semantics consistent&lt;/strong&gt;. Every interceptor should return &lt;code&gt;status.Error(code, msg)&lt;/code&gt; for failures. Don't return raw Go errors: gRPC reports them to the client as &lt;code&gt;codes.Unknown&lt;/code&gt;, so callers can't branch on the status code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skip-list for public methods.&lt;/strong&gt; Auth and rate-limit interceptors often need to skip the health-check and reflection endpoints. Keep the skip list in one place.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-service vs global interceptors&lt;/strong&gt;. Most interceptors are global (tracing, metrics, auth). A few might be per-service (e.g., a bespoke rate limiter for a specific hot endpoint). Compose accordingly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Panic recovery at the outermost layer&lt;/strong&gt;. A panic in a handler shouldn't kill the server. Use the &lt;code&gt;recovery&lt;/code&gt; middleware from &lt;code&gt;grpc-middleware&lt;/code&gt; or write your own, and put it first in the chain.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Discipline That Makes This Work
&lt;/h2&gt;

&lt;p&gt;Interceptors are the right tool for cross-cutting concerns — the things every RPC needs but the service code shouldn't have to think about. The discipline is: one concern per interceptor, careful ordering, consistent error semantics, tested end-to-end.&lt;/p&gt;

&lt;p&gt;The services I've seen do this well have clean business logic (because the cross-cutting stuff is outside it) and reliable operational behavior (because the interceptor chain is tested as a unit, not just piece-by-piece). The services that do it poorly have auth logic sprinkled through their handlers, tracing that randomly misses requests, and rate limiters that certain code paths bypass entirely.&lt;/p&gt;

&lt;p&gt;Interceptor order is one of those details that looks tactical and turns out to be architectural. Get it right once; the service's behavior improves every release.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-context-distributed-systems-production/" rel="noopener noreferrer"&gt;Go Context in Distributed Systems: What Actually Works in Production&lt;/a&gt; — the context that flows through every interceptor.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/rpc-vs-nats-who-owns-completion/" rel="noopener noreferrer"&gt;RPC vs NATS: It's Not About Sync vs Async — It's About Who Owns Completion&lt;/a&gt; — the shape of gRPC calls as one side of the bigger messaging picture.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/observability-cost-attribution-dual-path-architecture/" rel="noopener noreferrer"&gt;Observability and Cost Attribution: Why One Pipeline Isn't Enough&lt;/a&gt; — why tracing interceptors alone aren't enough for business attribution.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>grpc</category>
      <category>interceptors</category>
    </item>
    <item>
      <title>Go Generics, One Year In: Which Promises Held, Which Didn't</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Mon, 20 Apr 2026 16:30:05 +0000</pubDate>
      <link>https://dev.to/harrisonsec/go-generics-one-year-in-which-promises-held-which-didnt-44m7</link>
      <guid>https://dev.to/harrisonsec/go-generics-one-year-in-which-promises-held-which-didnt-44m7</guid>
      <description>&lt;p&gt;Go 1.18 shipped generics in March 2022. The two years before that were dominated by hopeful blog posts ("finally, a real type system!") and the two years after by the predictable backlash ("why did we even bother, Go was simpler"). I've written production Go before and after. The honest answer is somewhere in the middle and closer to "useful for a narrower set of problems than we expected."&lt;/p&gt;

&lt;p&gt;This is a look back from someone who has shipped generic code in anger and reviewed a lot more of it. What held up. What didn't. What habits to adopt and which to avoid.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt;: Go generics are genuinely valuable for &lt;strong&gt;parametric operations on container-shaped types&lt;/strong&gt;: slices, maps, channels, any-key lookup tables, min/max/sum utilities. They are less valuable for "clever abstractions" that dress up control flow as type magic. The clearest gains are in the standard library itself (&lt;code&gt;slices&lt;/code&gt;, &lt;code&gt;maps&lt;/code&gt;) and in domain-specific utility packages. Most application code didn't need generics before and doesn't need them after. Using generics is not the mistake; using them for things interfaces already handled fine is.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Generics Actually Are
&lt;/h2&gt;

&lt;p&gt;Go generics are &lt;strong&gt;type parameters on functions and types&lt;/strong&gt;. A function like &lt;code&gt;slices.Contains&lt;/code&gt; can be written once, work for any slice element type, and still be type-checked at compile time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;Contains&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;S&lt;/span&gt; &lt;span class="err"&gt;~&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;E&lt;/span&gt; &lt;span class="n"&gt;comparable&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three features you should know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Type parameters&lt;/strong&gt;: the &lt;code&gt;[E any]&lt;/code&gt; or &lt;code&gt;[E comparable]&lt;/code&gt; in brackets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Constraints&lt;/strong&gt;: tell the compiler what operations the type parameter supports. &lt;code&gt;any&lt;/code&gt;, &lt;code&gt;comparable&lt;/code&gt;, or custom interfaces like &lt;code&gt;constraints.Ordered&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approximate constraints&lt;/strong&gt;: &lt;code&gt;~[]E&lt;/code&gt; means "any type whose underlying type is &lt;code&gt;[]E&lt;/code&gt;" — lets you be flexible about named slice types.&lt;/li&gt;
&lt;/ul&gt;
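
&lt;p&gt;To make the third point concrete, here is a minimal sketch of why &lt;code&gt;~[]E&lt;/code&gt; matters when a function returns the slice. &lt;code&gt;Filter&lt;/code&gt; and &lt;code&gt;UserIDs&lt;/code&gt; are hypothetical names invented for this illustration, not standard-library APIs:&lt;/p&gt;

```go
package main

import "fmt"

// UserIDs is a named slice type; its underlying type is []int.
// (Hypothetical type, used only for this illustration.)
type UserIDs []int

// Filter is a hypothetical helper, not part of the standard library.
// Because S is constrained as ~[]E, the result keeps the caller's named type
// instead of decaying to a plain []E.
func Filter[S ~[]E, E any](s S, keep func(E) bool) S {
	var out S
	for _, x := range s {
		if keep(x) {
			out = append(out, x)
		}
	}
	return out
}

func main() {
	ids := UserIDs{1, 2, 3, 4}
	even := Filter(ids, func(n int) bool { return n%2 == 0 })
	fmt.Printf("%T %v\n", even, even) // main.UserIDs [2 4]
}
```

&lt;p&gt;Had the signature been &lt;code&gt;Filter[E any](s []E, ...) []E&lt;/code&gt;, the call would still compile, but the result would be a plain &lt;code&gt;[]int&lt;/code&gt; and any methods defined on &lt;code&gt;UserIDs&lt;/code&gt; would be lost.&lt;/p&gt;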

&lt;p&gt;What they aren't: Java-style wildcards, C++ SFINAE, or anything that mimics variance. The design is deliberately narrower than in most prior languages. It's more like Rust's generics, minus the trait system's complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Generics Clearly Win
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Standard-library style container and utility functions
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;slices&lt;/code&gt; and &lt;code&gt;maps&lt;/code&gt; packages in the standard library are the canonical example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;slices&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"alice"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;slices&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;numbers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;maps&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;maps&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before generics, these were either hand-written per-type (tedious, error-prone), done via &lt;code&gt;interface{}&lt;/code&gt; (type-unsafe, slow), or done via &lt;code&gt;reflect&lt;/code&gt; (slow and error-prone). Generics are strictly better for these.&lt;/p&gt;

&lt;p&gt;The same pattern shows up in third-party libraries: &lt;code&gt;samber/lo&lt;/code&gt; (lodash-style utilities) and many domain-specific ones, which largely replace reflection-based predecessors such as &lt;code&gt;thoas/go-funk&lt;/code&gt;. If you reach for lodash-style helpers in JavaScript, you'll want something similar in Go, and generics made that workable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Concurrency helpers
&lt;/h3&gt;

&lt;p&gt;Generic worker pools, futures, result types — these all benefit from generics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Future&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;done&lt;/span&gt; &lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;val&lt;/span&gt;  &lt;span class="n"&gt;T&lt;/span&gt;
    &lt;span class="n"&gt;err&lt;/span&gt;  &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Future&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;done&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before generics, you'd have had an &lt;code&gt;interface{}&lt;/code&gt; return and a type assertion at the call site. Now you can express "this future produces a T" in the type. Cleaner at the boundary, safer at the call site.&lt;/p&gt;

&lt;h3&gt;
  
  
  Typed collections
&lt;/h3&gt;

&lt;p&gt;If your system has a genuinely typed container use case — say, an ordered map keyed by a domain ID — generics let you write it once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;OrderedMap&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt; &lt;span class="n"&gt;comparable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt;  &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="n"&gt;V&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a rare case where "custom generic container" is the right tool. The majority of code doesn't need this. But when you do need it, the generics version is much better than the &lt;code&gt;interface{}&lt;/code&gt; alternative.&lt;/p&gt;
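
&lt;p&gt;As a rough sketch of what the rest of such a type might look like, the following fills in a constructor and three methods. Only the struct comes from the snippet above; &lt;code&gt;NewOrderedMap&lt;/code&gt;, &lt;code&gt;Set&lt;/code&gt;, &lt;code&gt;Get&lt;/code&gt;, and &lt;code&gt;Keys&lt;/code&gt; are assumed names, and deletion and concurrency safety are deliberately left out:&lt;/p&gt;

```go
package main

import "fmt"

// OrderedMap mirrors the struct above: a map plus a slice recording insertion order.
type OrderedMap[K comparable, V any] struct {
	order []K
	data  map[K]V
}

// NewOrderedMap is a hypothetical constructor; the original shows only the struct.
func NewOrderedMap[K comparable, V any]() *OrderedMap[K, V] {
	m := new(OrderedMap[K, V])
	m.data = make(map[K]V)
	return m
}

// Set stores v under k, recording k's position the first time it is seen.
func (m *OrderedMap[K, V]) Set(k K, v V) {
	if _, ok := m.data[k]; !ok {
		m.order = append(m.order, k)
	}
	m.data[k] = v
}

// Get returns the value for k and whether it was present.
func (m *OrderedMap[K, V]) Get(k K) (V, bool) {
	v, ok := m.data[k]
	return v, ok
}

// Keys returns a copy of the keys in insertion order.
func (m *OrderedMap[K, V]) Keys() []K {
	return append([]K(nil), m.order...)
}

func main() {
	m := NewOrderedMap[string, int]()
	m.Set("b", 2)
	m.Set("a", 1)
	m.Set("b", 3) // overwrites the value, keeps the original position

	fmt.Println(m.Keys()) // [b a]
	v, ok := m.Get("b")
	fmt.Println(v, ok) // 3 true
}
```

&lt;p&gt;The call sites never need a type assertion: &lt;code&gt;Get&lt;/code&gt; returns an &lt;code&gt;int&lt;/code&gt; here because the map was instantiated as &lt;code&gt;OrderedMap[string, int]&lt;/code&gt;.&lt;/p&gt;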

&lt;h3&gt;
  
  
  Numerical / algorithmic code
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;constraints.Ordered&lt;/code&gt; (or &lt;code&gt;cmp.Ordered&lt;/code&gt;, its standard-library replacement since Go 1.21) is the key constraint for code that works with any ordered type (integers, floats, strings):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;Max&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="n"&gt;cmp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Ordered&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



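&lt;p&gt;Math helpers, min/max, sum, average: all cleanly generic. Readable, type-safe, performant. (Go 1.21 also added built-in &lt;code&gt;min&lt;/code&gt; and &lt;code&gt;max&lt;/code&gt;, so a hand-written &lt;code&gt;Max&lt;/code&gt; is now mostly illustrative.)&lt;/p&gt;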
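
&lt;p&gt;For the sum and average cases, a sketch along these lines is usually enough. The &lt;code&gt;Number&lt;/code&gt; constraint, the &lt;code&gt;Cents&lt;/code&gt; type, and the choice to return 0 for an empty slice are all assumptions made for the example:&lt;/p&gt;

```go
package main

import "fmt"

// Number is a hypothetical constraint for this sketch; widen it as needed.
type Number interface {
	~int | ~int64 | ~float64
}

// Cents is a hypothetical named type, included to show that ~int64 admits it.
type Cents int64

// Sum adds every element using the element type's own arithmetic.
func Sum[T Number](xs []T) T {
	var total T
	for _, x := range xs {
		total += x
	}
	return total
}

// Average returns 0 for an empty slice (an arbitrary choice for this sketch).
func Average[T Number](xs []T) float64 {
	if len(xs) == 0 {
		return 0
	}
	return float64(Sum(xs)) / float64(len(xs))
}

func main() {
	fmt.Println(Sum([]int{1, 2, 3}))          // 6
	fmt.Println(Average([]float64{1.5, 2.5})) // 2
	fmt.Println(Sum([]Cents{250, 199}))       // 449
}
```

&lt;p&gt;Note that &lt;code&gt;Sum&lt;/code&gt; over &lt;code&gt;[]Cents&lt;/code&gt; returns a &lt;code&gt;Cents&lt;/code&gt;, not a bare &lt;code&gt;int64&lt;/code&gt;, so the domain type survives the helper.&lt;/p&gt;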

&lt;h2&gt;
  
  
  Where Generics Don't Help, Or Hurt
&lt;/h2&gt;

&lt;h3&gt;
  
  
  "Generic services" and similar framework-y code
&lt;/h3&gt;

&lt;p&gt;I've seen codebases where someone wrote a generic "repository" type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Repository&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Repository&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="n"&gt;FindByID&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Repository&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="n"&gt;Save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The instinct — "all repositories do the same thing" — is mostly wrong. Real repositories differ in query shape, error cases, caching rules, transaction boundaries. Forcing them behind a generic interface either (a) produces a lowest-common-denominator API that doesn't fit any actual use, or (b) gets so many type parameters that readability collapses.&lt;/p&gt;

&lt;p&gt;The Go idiom is usually better: one non-generic &lt;code&gt;UserRepository&lt;/code&gt;, one &lt;code&gt;OrderRepository&lt;/code&gt;, etc. Each concrete, each tuned to its domain.&lt;/p&gt;

&lt;h3&gt;
  
  
  Over-constrained helpers
&lt;/h3&gt;

&lt;p&gt;If your "generic" function has five type parameters with custom constraints each, readability dies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;Complicated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="n"&gt;comparable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;K&lt;/span&gt; &lt;span class="n"&gt;Hashable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;V&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;F&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;M&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt; &lt;span class="n"&gt;M&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is technically legal. Reading it, you realize it's a glorified map-with-cache-and-error. Interfaces or function types would have been clearer. Generics don't make complex APIs simple; they just let you make them complex in a type-checked way.&lt;/p&gt;

&lt;h3&gt;
  
  
  Behavioral polymorphism
&lt;/h3&gt;

&lt;p&gt;Interfaces are still the right tool when different types have &lt;strong&gt;different behavior&lt;/strong&gt;. A generic &lt;code&gt;Process[T any](x T) error&lt;/code&gt; doesn't help if you actually want different logic per type. You want an interface with a method.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Good use of interface&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Processor&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Bad use of generics&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;ProcessGeneric&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// can't actually differentiate behavior&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The separation: &lt;strong&gt;generics for parametric operations (same logic, any type), interfaces for polymorphic behavior (different logic per type).&lt;/strong&gt;&lt;/p&gt;
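
&lt;p&gt;The two tools compose rather than compete. In the sketch below, a generic helper handles the type-independent loop while an interface handles the per-type behavior. Every name here (&lt;code&gt;Map&lt;/code&gt;, &lt;code&gt;Notifier&lt;/code&gt;, &lt;code&gt;Email&lt;/code&gt;, &lt;code&gt;Pager&lt;/code&gt;) is made up for the illustration:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// Map is parametric: the same loop works for every In/Out pair.
func Map[In, Out any](items []In, f func(In) Out) []Out {
	out := make([]Out, 0, len(items))
	for _, it := range items {
		out = append(out, f(it))
	}
	return out
}

// Notifier is polymorphic: each implementation carries its own logic.
type Notifier interface {
	Notify(msg string) string
}

type Email struct{ Addr string }
type Pager struct{ Team string }

func (e Email) Notify(msg string) string { return "mail to " + e.Addr + ": " + msg }
func (p Pager) Notify(msg string) string { return "PAGE " + p.Team + ": " + strings.ToUpper(msg) }

func main() {
	targets := []Notifier{Email{Addr: "ops@example.com"}, Pager{Team: "sre"}}
	// The generic helper does the iteration; the interface does the dispatch.
	lines := Map(targets, func(n Notifier) string { return n.Notify("disk full") })
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

&lt;p&gt;Trying to fold the &lt;code&gt;Notify&lt;/code&gt; variations into a single generic function would mean a type switch on &lt;code&gt;T&lt;/code&gt;, which is the interface pattern with extra steps.&lt;/p&gt;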

&lt;h2&gt;
  
  
  Performance: Usually a Wash
&lt;/h2&gt;

&lt;p&gt;The performance story is more nuanced than either "generics are slow" or "generics are free."&lt;/p&gt;

&lt;p&gt;Go's current generic implementation uses &lt;strong&gt;GCShape stenciling&lt;/strong&gt;: one compiled version per "GC shape" (roughly, per memory layout), plus a runtime dictionary for the type-specific details. This sits between full monomorphization (one version per type, like Rust) and type erasure (one version total, like Java).&lt;/p&gt;

&lt;p&gt;Practical implications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
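&lt;strong&gt;Non-pointer types (int, int64, a given struct type)&lt;/strong&gt; get their own instantiation per distinct underlying type. Competitive with hand-written code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pointer types (&lt;code&gt;*User&lt;/code&gt;, &lt;code&gt;*Order&lt;/code&gt;)&lt;/strong&gt; all share one instantiation, with a runtime dictionary supplying the type-specific details. Usually close to hand-written code, but method calls made through the type parameter take an extra indirection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Call overhead is usually similar to a plain function call&lt;/strong&gt;, not interface dispatch. The exception is the shared pointer-shape path, where calls through the dictionary are harder for the compiler to inline or devirtualize.&lt;/li&gt;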
&lt;li&gt;
&lt;strong&gt;Compile times increase&lt;/strong&gt;, especially for libraries with many instantiations. This is the real cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Benchmarks I've seen: generic versions are within 5% of hand-written equivalents, and consistently faster than &lt;code&gt;interface{}&lt;/code&gt;-based alternatives. Performance is almost never the deciding factor — readability and design fit matter more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Idioms That Emerged
&lt;/h2&gt;

&lt;p&gt;Over the years since 1.18, a few conventions have stuck:&lt;/p&gt;

&lt;h3&gt;
  
  
  Prefer &lt;code&gt;any&lt;/code&gt; to &lt;code&gt;interface{}&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;any&lt;/code&gt; is a type alias for &lt;code&gt;interface{}&lt;/code&gt; added in 1.18. Shorter, clearer. Use it everywhere.&lt;/p&gt;

&lt;h3&gt;
  
  
  Single-letter type parameters for simple cases, descriptive for complex
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;T&lt;/code&gt;, &lt;code&gt;K&lt;/code&gt;, &lt;code&gt;V&lt;/code&gt; for the obvious cases. More descriptive when the role is specific:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;Reduce&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;In&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Out&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;In&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;In&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;initial&lt;/span&gt; &lt;span class="n"&gt;Out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Out&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
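
&lt;p&gt;The signature above has no body in the original; one straightforward way to fill it in, offered as a sketch and not as the only reasonable implementation, is a left fold:&lt;/p&gt;

```go
package main

import "fmt"

// Reduce folds items from left to right, threading an accumulator through f.
// The body is an assumed implementation; the original shows only the signature.
func Reduce[In any, Out any](items []In, f func(Out, In) Out, initial Out) Out {
	acc := initial
	for _, item := range items {
		acc = f(acc, item)
	}
	return acc
}

func main() {
	words := []string{"go", "generics", "work"}
	// In is inferred as string, Out as int.
	total := Reduce(words, func(n int, w string) int { return n + len(w) }, 0)
	fmt.Println(total) // 14
}
```

&lt;p&gt;With &lt;code&gt;In&lt;/code&gt; and &lt;code&gt;Out&lt;/code&gt; as names, the callback type &lt;code&gt;func(Out, In) Out&lt;/code&gt; reads as "accumulator first, element second" without a comment, which is harder to see with &lt;code&gt;T&lt;/code&gt; and &lt;code&gt;U&lt;/code&gt;.&lt;/p&gt;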



&lt;h3&gt;
  
  
  Put constraints in a dedicated package
&lt;/h3&gt;

&lt;p&gt;If you have several custom constraints, group them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;constraints&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Ordered&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="err"&gt;~&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="err"&gt;~&lt;/span&gt;&lt;span class="kt"&gt;int64&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="err"&gt;~&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="err"&gt;~&lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Numeric&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="err"&gt;~&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="err"&gt;~&lt;/span&gt;&lt;span class="kt"&gt;int64&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="err"&gt;~&lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The experimental &lt;code&gt;golang.org/x/exp/constraints&lt;/code&gt; package (and later &lt;code&gt;cmp.Ordered&lt;/code&gt; in the Go 1.21 standard library) set the pattern.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use &lt;code&gt;~T&lt;/code&gt; approximations for flexibility
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;~[]E&lt;/code&gt; includes named slice types. &lt;code&gt;~int&lt;/code&gt; includes &lt;code&gt;type MyInt int&lt;/code&gt;. This is almost always the right choice for generic parametric code: it accepts named types built on the listed underlying types while still rejecting everything else.&lt;/p&gt;
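&lt;p&gt;A minimal sketch of the difference &lt;code&gt;~&lt;/code&gt; makes. The names &lt;code&gt;Integer&lt;/code&gt;, &lt;code&gt;UserID&lt;/code&gt;, and &lt;code&gt;Sum&lt;/code&gt; are made up for illustration and do not come from any library:&lt;/p&gt;

```go
package main

import "fmt"

// Integer is a hypothetical constraint. The ~ admits any type whose
// underlying type is int or int64, not just int and int64 themselves.
type Integer interface {
	~int | ~int64
}

// UserID is a named type with underlying type int.
type UserID int

// Sum adds the elements of any slice whose element type satisfies Integer.
func Sum[T Integer](xs []T) T {
	var total T
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	// Accepted only because Integer is written with ~int.
	fmt.Println(Sum([]UserID{1, 2, 3}))
	fmt.Println(Sum([]int64{10, 20}))
}
```

&lt;p&gt;If the constraint were written as &lt;code&gt;int | int64&lt;/code&gt;, the first call would stop compiling, because &lt;code&gt;UserID&lt;/code&gt; is a distinct named type.&lt;/p&gt;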

&lt;h3&gt;
  
  
  Never overload generic helpers to do too much
&lt;/h3&gt;

&lt;p&gt;Each generic function should do one parametric thing. Generic helpers that try to be many things at once collapse under type-parameter weight.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Standard Library Won
&lt;/h2&gt;

&lt;p&gt;The clearest vindication of Go generics is what happened to the standard library. &lt;code&gt;slices&lt;/code&gt;, &lt;code&gt;maps&lt;/code&gt;, &lt;code&gt;cmp.Ordered&lt;/code&gt; — these additions are uncontroversially better than the pre-1.18 alternatives. A lot of code that used to be hand-rolled or based on &lt;code&gt;sort.Interface&lt;/code&gt; has cleaner replacements.&lt;/p&gt;

&lt;p&gt;The user-land picture is more mixed. Libraries that benefit from generics genuinely use them well (&lt;code&gt;samber/lo&lt;/code&gt;, &lt;code&gt;kelindar/column&lt;/code&gt;, many others). Libraries that don't need them mostly haven't been retrofitted with them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Do Now
&lt;/h2&gt;

&lt;p&gt;A few simple rules I apply:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Prefer standard library generic helpers over hand-rolled.&lt;/strong&gt; &lt;code&gt;slices.Contains&lt;/code&gt;, &lt;code&gt;slices.Sort&lt;/code&gt;, &lt;code&gt;maps.Keys&lt;/code&gt; — use them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write a generic helper only when I have at least two concrete use cases for it.&lt;/strong&gt; One use case is a pattern waiting to be born, not necessarily a generic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prefer functions to methods on generic types&lt;/strong&gt; when possible. Generic methods have more friction (can't overload by type, can't add methods outside the defining package).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep constraints simple.&lt;/strong&gt; &lt;code&gt;any&lt;/code&gt;, &lt;code&gt;comparable&lt;/code&gt;, &lt;code&gt;cmp.Ordered&lt;/code&gt;, and domain-specific single-type-union constraints cover 95% of cases. More complex constraints usually mean the abstraction is wrong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never turn interfaces into generics just because you can.&lt;/strong&gt; If the types have genuinely different behavior, an interface is right.&lt;/li&gt;
&lt;/ol&gt;
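&lt;p&gt;Rules 1 and 4 in practice, as a rough sketch assuming Go 1.21 or later. The &lt;code&gt;slices&lt;/code&gt; calls are standard library; &lt;code&gt;Clamp&lt;/code&gt; is a hypothetical helper, shown only to illustrate how far a single &lt;code&gt;cmp.Ordered&lt;/code&gt; constraint goes:&lt;/p&gt;

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// Clamp is a hypothetical helper: one parametric job, one simple constraint.
// The built-in min and max accept any ordered type, including type parameters
// constrained by cmp.Ordered.
func Clamp[T cmp.Ordered](v, lo, hi T) T {
	return min(max(v, lo), hi)
}

func main() {
	names := []string{"carol", "alice", "bob"}

	// Standard library replacements for hand-rolled contains/indexOf.
	fmt.Println(slices.Contains(names, "bob"))
	fmt.Println(slices.Index(names, "alice"))

	// Sorts in place; no sort.Interface boilerplate needed.
	slices.Sort(names)
	fmt.Println(names)

	// The same helper works for ints and strings.
	fmt.Println(Clamp(15, 0, 10), Clamp("m", "a", "f"))
}
```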

&lt;h2&gt;
  
  
  Where Generics Actually Sit Now
&lt;/h2&gt;

&lt;p&gt;Generics were oversold before they landed ("Go finally becomes a real language!") and overapplied in the aftermath ("generics everywhere!"). The truth is narrower and more boring: they're a useful addition for a specific class of problems, mostly centered on parametric operations over containers and numerics. They improved the standard library. They haven't changed the shape of most Go code.&lt;/p&gt;

&lt;p&gt;If you've been writing Go and wondering whether you're missing out by not using generics, the answer is almost certainly no. Code without them is still idiomatic. Code with them, when the use case fits, is cleaner. Neither is dominant. Both are fine.&lt;/p&gt;

&lt;p&gt;The one concrete thing I'd say: &lt;strong&gt;learn the generic parts of the standard library&lt;/strong&gt;. &lt;code&gt;slices&lt;/code&gt;, &lt;code&gt;maps&lt;/code&gt;, &lt;code&gt;cmp.Ordered&lt;/code&gt;. Use them reflexively. Stop hand-rolling &lt;code&gt;indexOf&lt;/code&gt; and &lt;code&gt;contains&lt;/code&gt;. Everything else can wait until you have a real problem that generics solve.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-profiling-pprof-escape-analysis-inlining/" rel="noopener noreferrer"&gt;Go Profiling in Anger: pprof, Escape Analysis, and Inlining Without Magic&lt;/a&gt; — the performance toolchain that tells you whether your generic code actually matches the hand-written version.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-sync-pool-buffer-reuse-when-it-helps/" rel="noopener noreferrer"&gt;sync.Pool in Go: When It Actually Helps, and When It Quietly Hurts&lt;/a&gt; — another feature most commonly misapplied.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/scale-up-scale-out-every-language-wins-somewhere/" rel="noopener noreferrer"&gt;Scale-Up vs Scale-Out: Why Every Language Wins Somewhere&lt;/a&gt; — the meta-question behind every language-feature debate.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>generics</category>
      <category>typeparameters</category>
    </item>
    <item>
      <title>Go Profiling in Anger: pprof, Escape Analysis, and Inlining Without Magic</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Mon, 20 Apr 2026 16:29:23 +0000</pubDate>
      <link>https://dev.to/harrisonsec/go-profiling-in-anger-pprof-escape-analysis-and-inlining-without-magic-3ij</link>
      <guid>https://dev.to/harrisonsec/go-profiling-in-anger-pprof-escape-analysis-and-inlining-without-magic-3ij</guid>
      <description>&lt;p&gt;Go's performance culture has a ritual quality. "Use sync.Pool." "Avoid interface boxing." "Preallocate slices." Copy-pasted from blog posts and applied without measurement. Sometimes helpful. Often hollow.&lt;/p&gt;

&lt;p&gt;The honest answer is that Go performance work is mostly &lt;strong&gt;just profiling&lt;/strong&gt;. Good profiling tells you what's actually slow. Bad profiling — or no profiling — leaves you guessing. The toolchain that Go ships with is genuinely excellent; more engineers should use it, and fewer should follow checklist optimizations they haven't measured.&lt;/p&gt;

&lt;p&gt;This is a practical, end-to-end guide to pprof, escape analysis, and inlining — the three Go-specific tools that answer most performance questions.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — Start every Go perf investigation with a CPU pprof of the hot path under realistic load. 80% of issues are obvious in the flame graph. For the remaining 20%, add a heap profile and look for allocation pressure driving GC. Only after you've localized the problem with real data should you reach for micro-optimizations: escape analysis via &lt;code&gt;-gcflags='-m'&lt;/code&gt;, inlining hints, and targeted benchmark-driven rewrites. Skip the profile step, and you are optimizing the wrong thing.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Investigation Flow
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBTdGFydChbUGVyZm9ybWFuY2UgY29uY2Vybl0pIC0tPiBDUFVbVGFrZSBDUFUgcHJvZmlsZTxici8-LWh0dHAgcHByb2YgwrcgMzBzIHVuZGVyIGxvYWRdCiAgICBDUFUgLS0-IEhvdHtIb3QgY29kZTxici8-b2J2aW91cz99CiAgICBIb3QgLS0-fFllc3wgRml4MVtGaXggdGhlIGhvdCBwYXRoIMK3IHJlLW1lYXN1cmVdCiAgICBIb3QgLS0-fE5vIMK3IEdDIGhpZ2h8IEhlYXBbVGFrZSBoZWFwIC8gYWxsb2MgcHJvZmlsZV0KICAgIEhlYXAgLS0-IEFsbG9jU2l0ZXtTcGVjaWZpYzxici8-YWxsb2Mgc2l0ZT99CiAgICBBbGxvY1NpdGUgLS0-fFllc3wgRXNjYXBlW0NoZWNrIC1nY2ZsYWdzPSctbSc8YnIvPmZvciB0aGF0IGZ1bmN0aW9uXQogICAgQWxsb2NTaXRlIC0tPnxOb3wgQmVuY2hNaWNyb1tJc29sYXRlIGluIGJlbmNobWFyazxici8-LWJlbmNobWVtIMK3IC1jb3VudD01XQogICAgRXNjYXBlIC0tPiBGaXgyW0ZpeCBhbGxvYyDCtyByZS1tZWFzdXJlXQogICAgQmVuY2hNaWNybyAtLT4gRml4M1tPcHRpbWl6ZSBvciBhY2NlcHRdCiAgICBGaXgxIC0tPiBWZXJpZnlbUHJvZmlsZSBhZ2FpbiDCtyBjb25maXJtXQogICAgRml4MiAtLT4gVmVyaWZ5CiAgICBGaXgzIC0tPiBWZXJpZnkKCiAgICBjbGFzc0RlZiBzdGFydCBmaWxsOiNlOGY0Zjgsc3Ryb2tlOiMyYzUyODIKICAgIGNsYXNzRGVmIGFjdGlvbiBmaWxsOiNmMGZmZjQsc3Ryb2tlOiMyZjg1NWEKICAgIGNsYXNzRGVmIHZlcmlmeSBmaWxsOiNmZWY1ZTcsc3Ryb2tlOiNiNzc5MWYKICAgIGNsYXNzIFN0YXJ0IHN0YXJ0CiAgICBjbGFzcyBGaXgxLEZpeDIsRml4MyBhY3Rpb24KICAgIGNsYXNzIFZlcmlmeSB2ZXJpZnk%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBTdGFydChbUGVyZm9ybWFuY2UgY29uY2Vybl0pIC0tPiBDUFVbVGFrZSBDUFUgcHJvZmlsZTxici8-LWh0dHAgcHByb2YgwrcgMzBzIHVuZGVyIGxvYWRdCiAgICBDUFUgLS0-IEhvdHtIb3QgY29kZTxici8-b2J2aW91cz99CiAgICBIb3QgLS0-fFllc3wgRml4MVtGaXggdGhlIGhvdCBwYXRoIMK3IHJlLW1lYXN1cmVdCiAgICBIb3QgLS0-fE5vIMK3IEdDIGhpZ2h8IEhlYXBbVGFrZSBoZWFwIC8gYWxsb2MgcHJvZmlsZV0KICAgIEhlYXAgLS0-IEFsbG9jU2l0ZXtTcGVjaWZpYzxici8-YWxsb2Mgc2l0ZT99CiAgICBBbGxvY1NpdGUgLS0-fFllc3wgRXNjYXBlW0NoZWNrIC1nY2ZsYWdzPSctbSc8YnIvPmZvciB0aGF0IGZ1bmN0aW9uXQogICAgQWxsb2NTaXRlIC0tPnxOb3wgQmVuY2hNaWNyb1tJc29sYXRlIGluIGJlbmNobWFyazxici8-LWJlbmNobWVtIMK3IC1jb3VudD01XQogICAgRXNjYXBlIC0tPiBGaXgyW0ZpeCBhbGxvYyDCtyByZS1tZWFzdXJlXQogICAgQmVuY2hNaWNybyAtLT4gRml4M1tPcHRpbWl6ZSBvciBhY2NlcHRdCiAgICBGaXgxIC0tPiBWZXJpZnlbUHJvZmlsZSBhZ2FpbiDCtyBjb25maXJtXQogICAgRml4MiAtLT4gVmVyaWZ5CiAgICBGaXgzIC0tPiBWZXJpZnkKCiAgICBjbGFzc0RlZiBzdGFydCBmaWxsOiNlOGY0Zjgsc3Ryb2tlOiMyYzUyODIKICAgIGNsYXNzRGVmIGFjdGlvbiBmaWxsOiNmMGZmZjQsc3Ryb2tlOiMyZjg1NWEKICAgIGNsYXNzRGVmIHZlcmlmeSBmaWxsOiNmZWY1ZTcsc3Ryb2tlOiNiNzc5MWYKICAgIGNsYXNzIFN0YXJ0IHN0YXJ0CiAgICBjbGFzcyBGaXgxLEZpeDIsRml4MyBhY3Rpb24KICAgIGNsYXNzIFZlcmlmeSB2ZXJpZnk%3D" alt="Start([Performance concern]) --&amp;gt; CPU[Take CPU profile&amp;lt;br/&amp;gt;-http pprof · 30s under load]" width="803" height="1086"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  CPU Profiling: The First Thing, Always
&lt;/h2&gt;

&lt;p&gt;Every Go binary can expose a pprof HTTP endpoint in two lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="s"&gt;"net/http/pprof"&lt;/span&gt;
&lt;span class="c"&gt;// later&lt;/span&gt;
&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ListenAndServe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"localhost:6060"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under load, grab a CPU profile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;go tool pprof &lt;span class="nt"&gt;-http&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;:9999 http://localhost:6060/debug/pprof/profile?seconds&lt;span class="o"&gt;=&lt;/span&gt;30
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This opens a flame graph in your browser. The wide blocks are where CPU time is spent. Usually the answer is immediate — "oh, JSON encoding is 40% of my CPU; let me switch to a faster encoder." Or "regex compilation is in the hot path because someone forgot to pre-compile."&lt;/p&gt;

&lt;p&gt;A few things that look surprising on first profile but shouldn't:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;runtime.mallocgc&lt;/code&gt; taking 10%+&lt;/strong&gt; is allocation pressure, which in turn drives GC work. You're allocating a lot. Look at the heap profile next.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;runtime.schedule&lt;/code&gt; or &lt;code&gt;runtime.findrunnable&lt;/code&gt; taking 5%+&lt;/strong&gt; means you have too many goroutines churning. Check if you're spawning per-request.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;syscall.Syscall&lt;/code&gt; high&lt;/strong&gt; means you're system-call-heavy — usually I/O. Either buffer/batch, or consider epoll-direct if it's in your hot path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;sync.(*Mutex).Lock&lt;/code&gt; visible&lt;/strong&gt; means contention. Either shrink the lock hold time or shard the lock.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Don't guess your way through these. Click into each, read the stack, find the user code that caused it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Heap Profiling: When CPU Points to GC
&lt;/h2&gt;

&lt;p&gt;If &lt;code&gt;runtime.mallocgc&lt;/code&gt; shows up in your CPU profile as a non-trivial chunk, heap profile tells you why:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;go tool pprof &lt;span class="nt"&gt;-http&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;:9999 http://localhost:6060/debug/pprof/heap
&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;go tool pprof &lt;span class="nt"&gt;-http&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;:9999 http://localhost:6060/debug/pprof/allocs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;heap&lt;/code&gt; shows current memory usage. &lt;code&gt;allocs&lt;/code&gt; shows cumulative allocations since program start — this is usually what you want to optimize.&lt;/p&gt;

&lt;p&gt;In the flame graph, look for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Specific allocation sites taking disproportionate share.&lt;/strong&gt; A single line of code creating 50% of allocations is an obvious target.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calls to &lt;code&gt;makeslice&lt;/code&gt;, &lt;code&gt;makemap&lt;/code&gt;, &lt;code&gt;newobject&lt;/code&gt;&lt;/strong&gt; with known-size inputs. If you know the size, preallocate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interface boxing in hot paths.&lt;/strong&gt; Every time you pass a concrete type through an &lt;code&gt;interface{}&lt;/code&gt; argument in a tight loop, the runtime may heap-allocate the boxed value.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;String concatenation with &lt;code&gt;+&lt;/code&gt; in a loop.&lt;/strong&gt; This is the textbook preventable allocation: each iteration copies the whole string built so far. Use &lt;code&gt;strings.Builder&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal isn't "zero allocations" — that's usually not practical. The goal is "allocations per operation in a tight, repeated path are bounded and understood."&lt;/p&gt;
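&lt;p&gt;The string concatenation item is easy to check for yourself. A rough sketch, assuming a made-up input of 64 small chunks; &lt;code&gt;joinConcat&lt;/code&gt; and &lt;code&gt;joinBuilder&lt;/code&gt; are hypothetical names, and the allocation counts it prints will vary with Go version and input size:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink is package-level so the compiler cannot discard the results.
var sink string

// joinConcat builds the result with +=, copying the accumulated string each time.
func joinConcat(parts []string) string {
	out := ""
	for _, p := range parts {
		out += p
	}
	return out
}

// joinBuilder sizes one buffer up front and writes every part into it.
func joinBuilder(parts []string) string {
	total := 0
	for _, p := range parts {
		total += len(p)
	}
	var sb strings.Builder
	sb.Grow(total)
	for _, p := range parts {
		sb.WriteString(p)
	}
	return sb.String()
}

func main() {
	parts := make([]string, 0, 64)
	for i := 0; i != 64; i++ {
		parts = append(parts, "chunk")
	}
	// testing.AllocsPerRun reports the average allocations per call.
	concat := testing.AllocsPerRun(100, func() { sink = joinConcat(parts) })
	builder := testing.AllocsPerRun(100, func() { sink = joinBuilder(parts) })
	fmt.Println("same output:", joinConcat(parts) == joinBuilder(parts))
	fmt.Println("allocs/op with +=:", concat)
	fmt.Println("allocs/op with Builder:", builder)
}
```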

&lt;h2&gt;
  
  
  Escape Analysis: The Compiler's Story
&lt;/h2&gt;

&lt;p&gt;Go's compiler decides at compile time whether a variable lives on the stack (essentially free, reclaimed when the function returns) or the heap (allocated, GC-tracked). This is called escape analysis.&lt;/p&gt;

&lt;p&gt;To see the analysis for your code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;go build &lt;span class="nt"&gt;-gcflags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'-m'&lt;/span&gt; ./...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./foo.go:12:6: can inline hotFunction
./foo.go:15:10: &amp;amp;Thing{} escapes to heap
./foo.go:18:14: make([]int, 100) does not escape
./foo.go:22:6: leaking param: x
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key things to read for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;escapes to heap&lt;/code&gt;&lt;/strong&gt;: this value is heap-allocated. If it's in a hot path, investigate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;does not escape&lt;/code&gt;&lt;/strong&gt;: stack-allocated, free. You want most short-lived locals to do this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;leaking param&lt;/code&gt;&lt;/strong&gt;: the function lets this parameter outlive the call (it stores, returns, or forwards the reference), so whatever the caller passes in may have to live on the heap. Often fixable by taking a copy or not storing a reference.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most common surprise: &lt;strong&gt;passing a value to a function that eventually hands it to &lt;code&gt;interface{}&lt;/code&gt; causes the value to escape&lt;/strong&gt;. A pattern like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;handleRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"got request"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// req.ID boxes to interface{} and may escape&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;req.ID&lt;/code&gt; escapes because of the &lt;code&gt;...interface{}&lt;/code&gt; argument. In a tight path, this is measurable. Fix: use a typed logger that takes concrete types, or accept the cost, because the logging call is rarely what dominates a hot path.&lt;/p&gt;

&lt;p&gt;Escape analysis is one of those things where reading the output a few times is worth it. You start seeing your code differently.&lt;/p&gt;
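&lt;p&gt;A small file to practice on, as a sketch with made-up names (&lt;code&gt;point&lt;/code&gt;, &lt;code&gt;sumLocal&lt;/code&gt;, &lt;code&gt;newPoint&lt;/code&gt;). Building it with &lt;code&gt;go build -gcflags='-m'&lt;/code&gt; should report the &lt;code&gt;new(point)&lt;/code&gt; in &lt;code&gt;newPoint&lt;/code&gt; as escaping, though the exact wording and the inlining decisions around it vary across compiler versions:&lt;/p&gt;

```go
package main

import "fmt"

type point struct{ x, y int }

// sumLocal keeps p inside the function, so the compiler is free to
// place it on the stack.
func sumLocal() int {
	p := point{x: 1, y: 2}
	return p.x + p.y
}

// newPoint returns a pointer that outlives the call, so the pointed-to
// value has to live on the heap.
func newPoint() *point {
	p := new(point)
	p.x, p.y = 1, 2
	return p
}

func main() {
	fmt.Println(sumLocal(), newPoint().y)
}
```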

&lt;h2&gt;
  
  
  Inlining: When the Compiler Eliminates the Call
&lt;/h2&gt;

&lt;p&gt;Go's compiler inlines small functions to avoid call overhead. Seeing what got inlined:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;go build &lt;span class="nt"&gt;-gcflags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'-m'&lt;/span&gt; ./... 2&amp;gt;&amp;amp;1 | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'can inline|cannot inline'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./foo.go:12:6: can inline hotFunction
./foo.go:18:6: cannot inline bigFunction: function too complex: cost 117 exceeds budget 80
./foo.go:22:6: cannot inline interfacingFunction: call to unknown method
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The default budget is a cost of 80, roughly one unit per AST node. Hard blockers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Calls through interfaces.&lt;/strong&gt; The compiler doesn't know what concrete method gets called. No inlining.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calls to functions that contain loops with &lt;code&gt;for range&lt;/code&gt; over a channel.&lt;/strong&gt; Historically blocked, though the mid-stack inliner has improved this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recursive functions.&lt;/strong&gt; Obvious.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Functions over the budget.&lt;/strong&gt; Refactor smaller if the call is hot.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When to care:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Never in normal code. Go inlines what it can; your code runs.&lt;/li&gt;
&lt;li&gt;Sometimes in tight hot loops where the call overhead is 10%+ of the total work. Benchmark shows it.&lt;/li&gt;
&lt;li&gt;Occasionally when you control an interface boundary and can replace it with a concrete type on a hot path.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Don't structure your code around inlining. Code readability beats hypothetical call-overhead wins in nearly every case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmarks: The Ground Truth
&lt;/h2&gt;

&lt;p&gt;Every perf claim should be backed by a benchmark. &lt;code&gt;testing.B&lt;/code&gt; is the tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;BenchmarkEncodeResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;newResponse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReportAllocs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResetTimer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-bench&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;BenchmarkEncode &lt;span class="nt"&gt;-benchmem&lt;/span&gt; &lt;span class="nt"&gt;-count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;-count=5&lt;/code&gt; runs each bench 5 times, so you can compare variance. Don't trust a single run. Hardware, OS scheduling, thermals — all add noise.&lt;/p&gt;

&lt;p&gt;For comparing two implementations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-bench&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;BenchmarkEncodeResponse &lt;span class="nt"&gt;-benchmem&lt;/span&gt; &lt;span class="nt"&gt;-count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10 ./... &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; old.txt
&lt;span class="gp"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;change code&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-bench&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;BenchmarkEncodeResponse &lt;span class="nt"&gt;-benchmem&lt;/span&gt; &lt;span class="nt"&gt;-count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10 ./... &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; new.txt
&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;benchstat old.txt new.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;benchstat&lt;/code&gt; (&lt;code&gt;golang.org/x/perf/cmd/benchstat&lt;/code&gt;) gives you statistical significance. If the difference isn't statistically meaningful, you didn't actually improve anything — you just rolled the dice differently.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 80/20 of Go Performance
&lt;/h2&gt;

&lt;p&gt;After enough of this work, a few patterns dominate the real wins:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Query shape, not language.&lt;/strong&gt; A slow endpoint is usually doing 10 DB queries when it could do 1. Go is almost never the bottleneck; the data layer is.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network hop count.&lt;/strong&gt; Every inter-service call adds latency. Merging two small services or co-locating tight integrations beats any language-level optimization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caching at the right layer.&lt;/strong&gt; A well-placed LRU cache saves more than micro-optimizing the uncached path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preallocating known-size slices/maps.&lt;/strong&gt; &lt;code&gt;make([]int, 0, n)&lt;/code&gt; when you know n is almost free. The default &lt;code&gt;make([]int, 0)&lt;/code&gt; reallocates as you append.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoiding interface boxing in loops.&lt;/strong&gt; This is the one micro-optimization that regularly shows up in real profiles.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Everything else — &lt;code&gt;sync.Pool&lt;/code&gt;, escape analysis hand-tuning, loop unrolling — is a long-tail optimization. Worth it when profiling tells you it is. Premature otherwise.&lt;/p&gt;
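&lt;p&gt;Item 4 in the list above is cheap to verify. The sketch below counts how often &lt;code&gt;append&lt;/code&gt; has to move to a bigger backing array, using a change in &lt;code&gt;cap&lt;/code&gt; as the signal. &lt;code&gt;build&lt;/code&gt; is a hypothetical helper written for this illustration:&lt;/p&gt;

```go
package main

import "fmt"

// build appends n squares and counts how many times the capacity changed,
// which is a proxy for backing-array reallocations.
func build(n int, prealloc bool) (out []int, grows int) {
	if prealloc {
		out = make([]int, 0, n) // one allocation, sized for the final length
	}
	for i := 0; i != n; i++ {
		before := cap(out)
		out = append(out, i*i)
		if cap(out) != before {
			grows++
		}
	}
	return out, grows
}

func main() {
	_, withPrealloc := build(1000, true)
	_, withoutPrealloc := build(1000, false)
	fmt.Println("reallocations with preallocation:", withPrealloc)
	fmt.Println("reallocations without:", withoutPrealloc)
}
```

&lt;p&gt;The preallocated version never grows. The exact count for the other version depends on the runtime's growth policy, so treat the number as indicative only.&lt;/p&gt;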

&lt;h2&gt;
  
  
  A Habit I Recommend
&lt;/h2&gt;

&lt;p&gt;Before adding any optimization, do exactly three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Take a profile with the optimization off. Save it.&lt;/li&gt;
&lt;li&gt;Apply the optimization.&lt;/li&gt;
&lt;li&gt;Take a profile with the optimization on. Compare.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the comparison doesn't show clear improvement on the metric you cared about, revert. Do not add complexity without evidence.&lt;/p&gt;

&lt;p&gt;This sounds obvious. Almost nobody does it. Most perf work in Go codebases accumulates dead optimizations that add nothing or actively hurt — but nobody knows which, because nobody benchmarked.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Habit That Compounds
&lt;/h2&gt;

&lt;p&gt;Go's performance tooling is better than Go's performance culture gives it credit for. pprof, escape analysis, inlining diagnostics, and benchmarks are built in. They're precise. They tell you the truth.&lt;/p&gt;

&lt;p&gt;The reason most Go code isn't as fast as it could be isn't that Go is slow (it isn't). It's that engineers copy-paste optimizations they haven't measured, call the work done, and move on. The few engineers who profile first and optimize second write code that's actually fast — and usually simpler than the ritual-heavy version.&lt;/p&gt;

&lt;p&gt;Profile first. Everything else follows.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-sync-pool-buffer-reuse-when-it-helps/" rel="noopener noreferrer"&gt;sync.Pool in Go: When It Actually Helps, and When It Quietly Hurts&lt;/a&gt; — the one Go optimization most likely to be misapplied.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-millions-connections-user-space-context-switching/" rel="noopener noreferrer"&gt;Why Go Handles Millions of Connections: User-Space Context Switching, Explained&lt;/a&gt; — understanding the runtime is the prerequisite to understanding profiles.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/testing-real-world-go-backends/" rel="noopener noreferrer"&gt;Testing Real-World Go Backends Isn't What Many People Think&lt;/a&gt; — benchmarking is the last mile of testing.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>performance</category>
      <category>pprof</category>
    </item>
    <item>
      <title>sync.Pool in Go: When It Actually Helps, and When It Quietly Hurts</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Mon, 20 Apr 2026 16:28:42 +0000</pubDate>
      <link>https://dev.to/harrisonsec/syncpool-in-go-when-it-actually-helps-and-when-it-quietly-hurts-2676</link>
      <guid>https://dev.to/harrisonsec/syncpool-in-go-when-it-actually-helps-and-when-it-quietly-hurts-2676</guid>
      <description>&lt;p&gt;&lt;code&gt;sync.Pool&lt;/code&gt; is one of those Go features that shows up prominently in "how to write fast Go" blog posts and then gets applied to everything. The result is a codebase sprinkled with pools that don't help and sometimes hurt. Most Go code I review does not need &lt;code&gt;sync.Pool&lt;/code&gt;. The code that does need it often uses it wrong.&lt;/p&gt;

&lt;p&gt;This is a working engineer's take on when pooling actually helps, when it's wasted effort, and the specific traps it creates.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — &lt;code&gt;sync.Pool&lt;/code&gt; is a GC-pressure reducer for workloads that allocate large-ish, short-lived objects at high frequency. It is not a general-purpose optimization. The cases where it clearly helps: per-request buffers in HTTP handlers, encoder/decoder instances, JSON buffers, protocol frame buffers. The cases where it hurts or is wasted: small objects, infrequent allocations, long-lived state, and any code that forgets to reset pooled items. Benchmark before and after — always.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What sync.Pool Actually Does
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;sync.Pool&lt;/code&gt; is a free-list for objects that the GC can clear. You &lt;code&gt;Get()&lt;/code&gt; an object (fresh or recycled). You use it. You &lt;code&gt;Put()&lt;/code&gt; it back. The runtime tries to give you a recycled one next time, but reserves the right to drop the whole pool on GC.&lt;/p&gt;

&lt;p&gt;Key properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GC empties pools.&lt;/strong&gt; This is crucial. Since Go 1.13, items left unused at one GC move to a victim cache and are dropped at the next, so nothing idle survives two cycles. Pools are not a long-term cache; they're a hint to the runtime that "if you're going to collect these, wait a moment in case they get reused first."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-P (per logical processor) local storage.&lt;/strong&gt; Most &lt;code&gt;Get()&lt;/code&gt;/&lt;code&gt;Put()&lt;/code&gt; calls hit the pool local to the current P with no contention. Scaling across cores is nearly free.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No guarantees.&lt;/strong&gt; A &lt;code&gt;Get()&lt;/code&gt; might return a fresh object. An object you &lt;code&gt;Put()&lt;/code&gt; might be dropped by the GC before anyone reuses it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This design is exactly right for "reusable scratch space." It's wrong for "cached resources I need to stay around" (use a real cache instead).&lt;/p&gt;

&lt;h2&gt;
  
  
  Should You Pool This?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBTdGFydChbQ29uc2lkZXJpbmcgc3luYy5Qb29sP10pIC0tPiBRMXtIYXZlIHlvdTxici8-YmVuY2htYXJrZWQ8YnIvPi1iZW5jaG1lbT99CiAgICBRMSAtLT58Tm98IFNraXAxW0JlbmNobWFyayBmaXJzdC48YnIvPk1vc3QgY29kZSBkb2Vzbid0IG5lZWQgdGhpcy5dCiAgICBRMSAtLT58WWVzfCBRMntPYmplY3Qgc2l6ZTxici8-PiAxIEtCP30KICAgIFEyIC0tPnxObyDCtyBzbWFsbCBvYmplY3R8IFNraXAyW1Bvb2wgb3ZlcmhlYWQgZXhjZWVkczxici8-YWxsb2MgY29zdC4gVXNlICduZXcnLl0KICAgIFEyIC0tPnxZZXN8IFEze0FsbG9jYXRpb25zPGJyLz5mcmVxdWVudD88YnIvPjEwMDBzL3NlY30KICAgIFEzIC0tPnxObyDCtyByYXJlfCBTa2lwM1tHQyBoYW5kbGVzIHRoaXMgZmluZS48YnIvPlNraXAuXQogICAgUTMgLS0-fFllc3wgUTR7U2hvcnQtbGl2ZWQ8YnIvPmFuZCBlYXNpbHk8YnIvPnJlc2V0P30KICAgIFE0IC0tPnxObyDCtyBsb25nLWxpdmVkfCBTa2lwNFtVc2UgYSByZWFsIGNhY2hlPGJyLz5vciByZXNvdXJjZSBwb29sLl0KICAgIFE0IC0tPnxZZXN8IFVzZVtVc2Ugc3luYy5Qb29sLjxici8-QWx3YXlzIFJlc2V0IG9uIEdldCBhbmQgUHV0Ll0KCiAgICBjbGFzc0RlZiBza2lwIGZpbGw6I2ZlZDdkNyxzdHJva2U6I2M1MzAzMAogICAgY2xhc3NEZWYgdXNlIGZpbGw6I2YwZmZmNCxzdHJva2U6IzJmODU1YQogICAgY2xhc3MgU2tpcDEsU2tpcDIsU2tpcDMsU2tpcDQgc2tpcAogICAgY2xhc3MgVXNlIHVzZQ%3D%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBTdGFydChbQ29uc2lkZXJpbmcgc3luYy5Qb29sP10pIC0tPiBRMXtIYXZlIHlvdTxici8-YmVuY2htYXJrZWQ8YnIvPi1iZW5jaG1lbT99CiAgICBRMSAtLT58Tm98IFNraXAxW0JlbmNobWFyayBmaXJzdC48YnIvPk1vc3QgY29kZSBkb2Vzbid0IG5lZWQgdGhpcy5dCiAgICBRMSAtLT58WWVzfCBRMntPYmplY3Qgc2l6ZTxici8-PiAxIEtCP30KICAgIFEyIC0tPnxObyDCtyBzbWFsbCBvYmplY3R8IFNraXAyW1Bvb2wgb3ZlcmhlYWQgZXhjZWVkczxici8-YWxsb2MgY29zdC4gVXNlICduZXcnLl0KICAgIFEyIC0tPnxZZXN8IFEze0FsbG9jYXRpb25zPGJyLz5mcmVxdWVudD88YnIvPjEwMDBzL3NlY30KICAgIFEzIC0tPnxObyDCtyByYXJlfCBTa2lwM1tHQyBoYW5kbGVzIHRoaXMgZmluZS48YnIvPlNraXAuXQogICAgUTMgLS0-fFllc3wgUTR7U2hvcnQtbGl2ZWQ8YnIvPmFuZCBlYXNpbHk8YnIvPnJlc2V0P30KICAgIFE0IC0tPnxObyDCtyBsb25nLWxpdmVkfCBTa2lwNFtVc2UgYSByZWFsIGNhY2hlPGJyLz5vciByZXNvdXJjZSBwb29sLl0KICAgIFE0IC0tPnxZZXN8IFVzZVtVc2Ugc3luYy5Qb29sLjxici8-QWx3YXlzIFJlc2V0IG9uIEdldCBhbmQgUHV0Ll0KCiAgICBjbGFzc0RlZiBza2lwIGZpbGw6I2ZlZDdkNyxzdHJva2U6I2M1MzAzMAogICAgY2xhc3NEZWYgdXNlIGZpbGw6I2YwZmZmNCxzdHJva2U6IzJmODU1YQogICAgY2xhc3MgU2tpcDEsU2tpcDIsU2tpcDMsU2tpcDQgc2tpcAogICAgY2xhc3MgVXNlIHVzZQ%3D%3D" alt="Decision flowchart for sync.Pool: benchmarked with -benchmem? Object larger than 1 KB? Thousands of allocations per second? Short-lived and easily reset? Only a yes at every step leads to sync.Pool." width="918" height="1217"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most paths in real code exit this flow long before hitting "use". That's correct.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Pooling Helps: Per-Request Buffers
&lt;/h2&gt;

&lt;p&gt;The canonical case: an HTTP handler serializes a response into a buffer, writes the buffer out, and moves on. The next request does the same thing. Without pooling, every request allocates a fresh buffer that the GC later has to collect. With pooling, the buffer is reused:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;bufferPool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewBuffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;buf&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;bufferPool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;bufferPool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;

    &lt;span class="n"&gt;writeResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Bytes&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under realistic load (thousands of requests per second), this typically reduces allocation pressure by 20-40% and measurably lowers GC pause times. The exact number depends on your allocation pattern, but the principle holds: &lt;strong&gt;large, frequent, short-lived allocations are exactly what pooling is for&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;What makes this the canonical case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Buffers are big enough (4KB initial) that the allocation actually matters.&lt;/li&gt;
&lt;li&gt;They're frequent — thousands per second.&lt;/li&gt;
&lt;li&gt;Short-lived — used within one request.&lt;/li&gt;
&lt;li&gt;Easy to reset — &lt;code&gt;buf.Reset()&lt;/code&gt; clears it cleanly.&lt;/li&gt;
&lt;li&gt;Same shape every time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you see a request-scoped buffer that fits all five, pooling almost always pays.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Pooling Is Wasted Effort
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Small objects.&lt;/strong&gt; Pooling a 24-byte struct with three fields is almost never worth it. The pool's own overhead (per-P lookup, interface boxing) is larger than the allocation. Benchmark to confirm — you'll see &lt;code&gt;allocs/op&lt;/code&gt; go down but &lt;code&gt;ns/op&lt;/code&gt; stay the same or go up.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Not worth it:&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Small&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;smallPool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Small&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;

&lt;span class="c"&gt;// Just use new(Small) or &amp;amp;Small{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Infrequent allocations.&lt;/strong&gt; If your code path runs once an hour, pooling saves nothing meaningful. The GC handles a handful of allocations just fine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Long-lived state.&lt;/strong&gt; Connection objects, database handles, caches. These shouldn't be in &lt;code&gt;sync.Pool&lt;/code&gt; — they should be in a proper cache or connection pool (like &lt;code&gt;*sql.DB&lt;/code&gt;, which internally manages connections without &lt;code&gt;sync.Pool&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anything you can't reliably reset.&lt;/strong&gt; If an object has state that needs to be "returned to zero," and you can forget to zero it, you're one typo away from data leaking between requests.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Reset Trap
&lt;/h2&gt;

&lt;p&gt;The single most dangerous mistake with &lt;code&gt;sync.Pool&lt;/code&gt;: forgetting to reset the object before putting it back, or reusing it before clearing whatever was in it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Wrong:&lt;/span&gt;
&lt;span class="n"&gt;buf&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;responseData&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// might not start empty&lt;/span&gt;
&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Bytes&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// buf still has data; next caller might see it&lt;/span&gt;

&lt;span class="c"&gt;// Right:&lt;/span&gt;
&lt;span class="n"&gt;buf&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c"&gt;// ← explicit&lt;/span&gt;
&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;responseData&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Bytes&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This has caused real production incidents. Pooled buffers across request handlers have leaked bearer tokens, user PII, and password reset codes when a reset was missed. The runtime doesn't help — there's no "enforce reset" mechanism. You have to do it.&lt;/p&gt;
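&lt;p&gt;Since nothing in the runtime enforces the reset, one option is to hide &lt;code&gt;sync.Pool&lt;/code&gt; behind a small type that resets on both sides. The sketch below is a minimal version of that idea; &lt;code&gt;BufferPool&lt;/code&gt;, &lt;code&gt;NewBufferPool&lt;/code&gt;, and &lt;code&gt;leakCheck&lt;/code&gt; are hypothetical names, not part of the standard library.&lt;/p&gt;

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// BufferPool is a hypothetical wrapper: callers never touch sync.Pool
// directly, so they cannot skip the reset.
type BufferPool struct {
	pool sync.Pool
}

// NewBufferPool builds a pool whose fresh buffers start with initialCap bytes.
func NewBufferPool(initialCap int) *BufferPool {
	bp := new(BufferPool)
	bp.pool.New = func() any {
		return bytes.NewBuffer(make([]byte, 0, initialCap))
	}
	return bp
}

// Get returns an empty buffer, whether it is fresh or recycled.
func (bp *BufferPool) Get() *bytes.Buffer {
	buf := bp.pool.Get().(*bytes.Buffer)
	buf.Reset()
	return buf
}

// Put clears the buffer before handing it back.
func (bp *BufferPool) Put(buf *bytes.Buffer) {
	buf.Reset()
	bp.pool.Put(buf)
}

// leakCheck writes a secret, returns the buffer, and reports how many
// bytes the next Get exposes.
func leakCheck() int {
	bp := NewBufferPool(4096)
	first := bp.Get()
	first.WriteString("Bearer secret-token")
	bp.Put(first)

	second := bp.Get()
	return second.Len()
}

func main() {
	fmt.Println("bytes visible to next caller:", leakCheck()) // prints 0
}
```

&lt;p&gt;The double reset costs a few nanoseconds and removes a whole class of bugs at call sites. Note that &lt;code&gt;Reset()&lt;/code&gt; only sets the length to zero; the old bytes stay in the backing array until they are overwritten.&lt;/p&gt;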

&lt;p&gt;Habits that reduce the risk:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Always pair &lt;code&gt;Get&lt;/code&gt; with a &lt;code&gt;defer Reset+Put&lt;/code&gt;&lt;/strong&gt; at the top of the function.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reset at both ends&lt;/strong&gt; (on Get and on Put) — paranoid but effective.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For byte slices, truncate before return&lt;/strong&gt;: &lt;code&gt;buf.Reset()&lt;/code&gt; on a &lt;code&gt;bytes.Buffer&lt;/code&gt; resets length but keeps capacity, which is usually what you want. For a raw &lt;code&gt;[]byte&lt;/code&gt;, reslice with &lt;code&gt;buf[:0]&lt;/code&gt;, which likewise keeps the backing array.&lt;/li&gt;
&lt;li&gt;
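&lt;strong&gt;Make your &lt;code&gt;New&lt;/code&gt; function return an object in the same state &lt;code&gt;Reset()&lt;/code&gt; leaves it in.&lt;/strong&gt; Callers should never be able to tell a fresh object from a recycled one, and should never assume what they got is "fresh."&lt;/li&gt;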
&lt;/ul&gt;
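&lt;p&gt;The raw &lt;code&gt;[]byte&lt;/code&gt; habit has one more wrinkle: putting a slice value into a pool boxes its header into an interface, which allocates on every &lt;code&gt;Put&lt;/code&gt;. A common workaround is to pool a pointer to the slice instead. The sketch below assumes that approach; &lt;code&gt;slicePool&lt;/code&gt; and &lt;code&gt;frame&lt;/code&gt; are hypothetical names, and the length-prefix format is just stand-in work.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// slicePool stores *[]byte rather than []byte, so Put does not have to
// allocate a box for the slice header on every call.
var slicePool = sync.Pool{
	New: func() any {
		p := new([]byte)
		*p = make([]byte, 0, 4096)
		return p
	},
}

// frame builds "length:payload" in pooled scratch space and copies the
// result out before the scratch goes back to the pool.
func frame(payload string) string {
	p := slicePool.Get().(*[]byte)
	scratch := (*p)[:0] // truncate: length 0, capacity kept

	scratch = strconv.AppendInt(scratch, int64(len(payload)), 10)
	scratch = append(scratch, ':')
	scratch = append(scratch, payload...)
	out := string(scratch) // copy out before the scratch is recycled

	*p = scratch[:0] // keep any growth, hide the old contents
	slicePool.Put(p)
	return out
}

func main() {
	fmt.Println(frame("hello")) // 5:hello
	fmt.Println(frame("hi"))    // 2:hi
}
```

&lt;p&gt;The second call may well reuse the first call's backing array, but because both ends truncate to length zero, none of the earlier bytes show up in its output.&lt;/p&gt;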

&lt;h2&gt;
  
  
  The Alloc Benchmark Methodology
&lt;/h2&gt;

&lt;p&gt;The only honest way to know if pooling is helping is &lt;code&gt;go test -bench=. -benchmem&lt;/code&gt;. Here's what a useful benchmark looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;BenchmarkWithoutPool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReportAllocs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResetTimer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;buf&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewBuffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;writeResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exampleRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Bytes&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;BenchmarkWithPool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReportAllocs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResetTimer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;buf&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;bufferPool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;writeResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exampleRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Bytes&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;bufferPool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-bench&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;-benchmem&lt;/span&gt;
&lt;span class="go"&gt;BenchmarkWithoutPool-10    200000    8431 ns/op    4352 B/op    3 allocs/op
BenchmarkWithPool-10       500000    3214 ns/op     128 B/op    1 allocs/op
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Look for two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;allocs/op&lt;/code&gt; drops significantly&lt;/strong&gt; (here: 3 → 1).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ns/op&lt;/code&gt; drops or stays flat&lt;/strong&gt; (here: 8431 → 3214).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If &lt;code&gt;allocs/op&lt;/code&gt; drops but &lt;code&gt;ns/op&lt;/code&gt; goes up, pooling is adding overhead without saving enough GC pressure to justify itself. That's the "wasted effort" signal.&lt;/p&gt;
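&lt;p&gt;The benchmarks above run on a single goroutine, while handlers run on many. Since the pool's per-P design only matters under concurrency, it can be worth running a parallel variant too. The sketch below uses &lt;code&gt;testing.Benchmark&lt;/code&gt; and &lt;code&gt;b.RunParallel&lt;/code&gt; from the standard library so it can run as a plain program; &lt;code&gt;benchPooled&lt;/code&gt; and the 2 KB &lt;code&gt;payload&lt;/code&gt; are assumptions standing in for the real &lt;code&gt;writeResponse&lt;/code&gt; work.&lt;/p&gt;

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"testing"
)

var bufferPool = sync.Pool{
	New: func() any { return bytes.NewBuffer(make([]byte, 0, 4096)) },
}

// payload stands in for whatever the real handler would serialize.
var payload = bytes.Repeat([]byte("x"), 2048)

// benchPooled runs the pooled path from many goroutines at once,
// one per GOMAXPROCS by default.
func benchPooled() testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				buf := bufferPool.Get().(*bytes.Buffer)
				buf.Reset()
				buf.Write(payload)
				_ = buf.Bytes()
				bufferPool.Put(buf)
			}
		})
	})
}

func main() {
	r := benchPooled()
	fmt.Println(r.String(), r.MemString())
}
```

&lt;p&gt;Swap the loop body for the unpooled path to get the comparison, and make sure the unpooled buffer actually escapes to the heap the way it does in your handler. Otherwise the compiler may keep it on the stack and the baseline will look better than production ever will.&lt;/p&gt;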

&lt;p&gt;The benchmark alone isn't enough, though — you also need production evidence. pprof heap profiles before and after deployment should show reduced allocation. If the prod numbers don't match the benchmark, you're measuring the wrong thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Pattern That Actually Works: Scoped Pools
&lt;/h2&gt;

&lt;p&gt;One pattern I've found useful: &lt;strong&gt;scope the pool to the type of work it serves&lt;/strong&gt;. Don't have one giant pool that everything pulls from.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// JSON response buffer pool&lt;/span&gt;
&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;jsonBufPool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewBuffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Protocol frame buffer pool (different typical size)&lt;/span&gt;
&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;frameBufPool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewBuffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="m"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why separate pools matter: if you have one shared pool, you might &lt;code&gt;Get()&lt;/code&gt; a 64KB buffer when you needed a 4KB one and waste memory. Or worse, you might &lt;code&gt;Get()&lt;/code&gt; a 4KB one for a 64KB job and grow it (defeating pooling's purpose).&lt;/p&gt;

&lt;p&gt;Separate pools stay close to their intended sizes. Each pool's items are homogeneous. The initial capacity set in each pool's &lt;code&gt;New&lt;/code&gt; function reflects the typical workload.&lt;/p&gt;
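&lt;p&gt;One caveat: a &lt;code&gt;bytes.Buffer&lt;/code&gt; keeps its capacity after it grows, so a single unusually large response can leave an oversized buffer circulating in a pool meant for small ones. A possible guard, sketched below, is to skip the &lt;code&gt;Put&lt;/code&gt; for outliers. &lt;code&gt;putJSONBuf&lt;/code&gt;, &lt;code&gt;grown&lt;/code&gt;, and the 64 KB &lt;code&gt;maxRetainedCap&lt;/code&gt; threshold are assumptions for illustration, not tuned values.&lt;/p&gt;

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// maxRetainedCap is an assumed threshold, not a recommended value:
// buffers that grew past it are left for the GC instead of being pooled.
const maxRetainedCap = 64 * 1024

var jsonBufPool = sync.Pool{
	New: func() any { return bytes.NewBuffer(make([]byte, 0, 4096)) },
}

// putJSONBuf returns buf to the pool unless it has outgrown the pool's
// intended size. It reports whether the buffer was kept.
func putJSONBuf(buf *bytes.Buffer) bool {
	if buf.Cap() > maxRetainedCap {
		return false // drop the outlier
	}
	buf.Reset()
	jsonBufPool.Put(buf)
	return true
}

// grown takes a buffer from the pool and makes room for n more bytes,
// simulating a response of that size.
func grown(n int) *bytes.Buffer {
	buf := jsonBufPool.Get().(*bytes.Buffer)
	buf.Grow(n)
	return buf
}

func main() {
	fmt.Println("typical buffer kept:", putJSONBuf(grown(512)))
	fmt.Println("oversized buffer kept:", putJSONBuf(grown(1024*1024)))
}
```

&lt;p&gt;The first buffer goes back to the pool and the 1 MB one does not. The threshold is a trade-off: set it too low and the pool stops absorbing normal variation, too high and it pins memory for sizes that rarely recur.&lt;/p&gt;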

&lt;h2&gt;
  
  
  The Big Thing &lt;code&gt;sync.Pool&lt;/code&gt; Isn't
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;sync.Pool&lt;/code&gt; is not a replacement for bounded resource pools (database connections, HTTP clients, goroutine worker pools). Those need explicit lifecycle management, health checks, and non-discardable state. Use a real pool library for them.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sync.Pool&lt;/code&gt; is also not a cache. A cache holds items you want to find again. &lt;code&gt;sync.Pool&lt;/code&gt; holds items you might reuse if one's convenient, and discards them otherwise. Different primitive for a different problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Matters
&lt;/h2&gt;

&lt;p&gt;Most Go code is fast enough without pooling. Before adding &lt;code&gt;sync.Pool&lt;/code&gt; to your hot path, ask:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Have I actually benchmarked this with &lt;code&gt;-benchmem&lt;/code&gt;?&lt;/li&gt;
&lt;li&gt;Are the objects I'd pool both large and frequent?&lt;/li&gt;
&lt;li&gt;Can I reliably reset them?&lt;/li&gt;
&lt;li&gt;Is GC pressure in pprof profiles actually a problem?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If any answer is no, skip the pool. The simpler code is almost always the better code.&lt;/p&gt;

&lt;p&gt;The cases where pooling pays are real but narrower than internet wisdom suggests. Per-request buffers, protocol frame buffers, encoder/decoder state, crypto scratch space. Beyond that, the pool usually adds more lines of code than it saves nanoseconds — and each of those lines is one more place where a missing &lt;code&gt;Reset()&lt;/code&gt; can leak bytes between requests.&lt;/p&gt;

&lt;p&gt;Measure. Then decide.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-chan-context-structure-not-speed/" rel="noopener noreferrer"&gt;Go's Concurrency Is About Structure, Not Speed&lt;/a&gt; — the bigger principle: Go optimizes for correct structure, not raw speed.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/testing-real-world-go-backends/" rel="noopener noreferrer"&gt;Testing Real-World Go Backends Isn't What Many People Think&lt;/a&gt; — how to actually benchmark and prove a pool helps.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>performance</category>
      <category>syncpool</category>
    </item>
    <item>
      <title>IronSys: A Production Blueprint for Modern Concurrency</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Mon, 20 Apr 2026 16:28:01 +0000</pubDate>
      <link>https://dev.to/harrisonsec/ironsys-a-production-blueprint-for-modern-concurrency-8c9</link>
      <guid>https://dev.to/harrisonsec/ironsys-a-production-blueprint-for-modern-concurrency-8c9</guid>
      <description>&lt;p&gt;In the last post I walked through the four concurrency pillars — shared memory + locks, CSP, actors, STM — and argued that real systems mix them on purpose. Someone reasonably asked: &lt;em&gt;okay, but what does that actually look like?&lt;/em&gt; Fair question. Abstract taxonomy is less useful than a worked example.&lt;/p&gt;

&lt;p&gt;IronSys is that worked example. It's a composite blueprint — not a real service, but representative of a class of services I've designed, helped design, or debugged in production. Let's say it's a mid-sized backend system: public API, stateful user sessions, streaming data in, aggregation and reporting out. The kind of thing that appears in the middle of any serious platform.&lt;/p&gt;

&lt;p&gt;The interesting part isn't the features. It's which concurrency primitive shows up where, and why.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — IronSys is a composite production blueprint: a multi-service Go backend with stateful user sessions, streaming ingest, and usage aggregation. It uses CSP channels for pipelines and coordination, a goroutine-per-entity actor pattern for stateful sessions, mutexes and atomics for hot shared counters, and durable queues for cross-service handoff. Each primitive is picked for a specific failure mode. The pattern is not "mix for variety"; it's "match the primitive to the work."&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The System Shape
&lt;/h2&gt;

&lt;p&gt;Before deciding on concurrency primitives, sketch the work shapes. IronSys has four:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Public API&lt;/strong&gt; — request/response, modest concurrency, latency-sensitive. The classic HTTP backend.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live sessions&lt;/strong&gt; — stateful, long-lived per-user entities. Think multiplayer game server, collaborative editor, real-time dashboard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming ingest&lt;/strong&gt; — high-throughput events arriving over Kafka/NATS, fanned out to workers for processing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch aggregation&lt;/strong&gt; — periodic rollup jobs that read from storage, compute, write back.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Four shapes, four concurrency patterns. The wrong design would apply the same primitive to all four. The right design picks each separately.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBTaGFwZXNbIldvcmsgc2hhcGVzIl0KICAgICAgICBTMVsiMS4gUHVibGljIEFQSTxici8-c3RhdGVsZXNzIMK3IHJlcXVlc3QvcmVzcG9uc2UiXQogICAgICAgIFMyWyIyLiBMaXZlIHNlc3Npb25zPGJyLz5zdGF0ZWZ1bCDCtyBsb25nLWxpdmVkIl0KICAgICAgICBTM1siMy4gU3RyZWFtaW5nIGluZ2VzdDxici8-aGlnaCB0aHJvdWdocHV0IMK3IHN0YXRlbGVzcyJdCiAgICAgICAgUzRbIjQuIEJhdGNoIGFnZ3JlZ2F0aW9uPGJyLz5waXBlbGluZSDCtyBzY2hlZHVsZWQiXQogICAgZW5kCgogICAgc3ViZ3JhcGggUHJpbWl0aXZlc1siQ29uY3VycmVuY3kgcHJpbWl0aXZlcyJdCiAgICAgICAgUDFbIkdvcm91dGluZSArIG11dGV4PGJyLz5wZXItcmVxdWVzdCBoYW5kbGVyIl0KICAgICAgICBQMlsiR29yb3V0aW5lLXBlci1lbnRpdHk8YnIvPmFjdG9yLWxpa2UgwrcgcHJpdmF0ZSBzdGF0ZSJdCiAgICAgICAgUDNbIkJvdW5kZWQgY2hhbm5lbCArIHdvcmtlciBwb29sPGJyLz5DU1AgwrcgYmFja3ByZXNzdXJlIl0KICAgICAgICBQNFsiQ1NQIHBpcGVsaW5lICsgZXJyZ3JvdXA8YnIvPnN0YWdlZCDCtyBjYW5jZWxsYWJsZSJdCiAgICBlbmQKCiAgICBTMSAtLT4gUDEKICAgIFMyIC0tPiBQMgogICAgUzMgLS0-IFAzCiAgICBTNCAtLT4gUDQKCiAgICBjbGFzc0RlZiBzaGFwZSBmaWxsOiNlOGY0Zjgsc3Ryb2tlOiMyYzUyODIKICAgIGNsYXNzRGVmIHByaW0gZmlsbDojZjBmZmY0LHN0cm9rZTojMmY4NTVhCiAgICBjbGFzcyBTaGFwZXMgc2hhcGUKICAgIGNsYXNzIFByaW1pdGl2ZXMgcHJpbQ%3D%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBTaGFwZXNbIldvcmsgc2hhcGVzIl0KICAgICAgICBTMVsiMS4gUHVibGljIEFQSTxici8-c3RhdGVsZXNzIMK3IHJlcXVlc3QvcmVzcG9uc2UiXQogICAgICAgIFMyWyIyLiBMaXZlIHNlc3Npb25zPGJyLz5zdGF0ZWZ1bCDCtyBsb25nLWxpdmVkIl0KICAgICAgICBTM1siMy4gU3RyZWFtaW5nIGluZ2VzdDxici8-aGlnaCB0aHJvdWdocHV0IMK3IHN0YXRlbGVzcyJdCiAgICAgICAgUzRbIjQuIEJhdGNoIGFnZ3JlZ2F0aW9uPGJyLz5waXBlbGluZSDCtyBzY2hlZHVsZWQiXQogICAgZW5kCgogICAgc3ViZ3JhcGggUHJpbWl0aXZlc1siQ29uY3VycmVuY3kgcHJpbWl0aXZlcyJdCiAgICAgICAgUDFbIkdvcm91dGluZSArIG11dGV4PGJyLz5wZXItcmVxdWVzdCBoYW5kbGVyIl0KICAgICAgICBQMlsiR29yb3V0aW5lLXBlci1lbnRpdHk8YnIvPmFjdG9yLWxpa2UgwrcgcHJpdmF0ZSBzdGF0ZSJdCiAgICAgICAgUDNbIkJvdW5kZWQgY2hhbm5lbCArIHdvcmtlciBwb29sPGJyLz5DU1AgwrcgYmFja3ByZXNzdXJlIl0KICAgICAgICBQNFsiQ1NQIHBpcGVsaW5lICsgZXJyZ3JvdXA8YnIvPnN0YWdlZCDCtyBjYW5jZWxsYWJsZSJdCiAgICBlbmQKCiAgICBTMSAtLT4gUDEKICAgIFMyIC0tPiBQMgogICAgUzMgLS0-IFAzCiAgICBTNCAtLT4gUDQKCiAgICBjbGFzc0RlZiBzaGFwZSBmaWxsOiNlOGY0Zjgsc3Ryb2tlOiMyYzUyODIKICAgIGNsYXNzRGVmIHByaW0gZmlsbDojZjBmZmY0LHN0cm9rZTojMmY4NTVhCiAgICBjbGFzcyBTaGFwZXMgc2hhcGUKICAgIGNsYXNzIFByaW1pdGl2ZXMgcHJpbQ%3D%3D" alt="Diagram mapping the four work shapes to concurrency primitives: public API to goroutine plus mutex, live sessions to goroutine-per-entity, streaming ingest to bounded channel plus worker pool, batch aggregation to CSP pipeline plus errgroup." width="686" height="596"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The API Handlers
&lt;/h2&gt;

&lt;p&gt;Nothing fancy. Stock Go HTTP server. Each in-flight request runs on its own goroutine (the &lt;code&gt;net/http&lt;/code&gt; server does this automatically). Shared state (rate limiters, cache, config) is protected by mutexes or atomics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;RateLimiter&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;mu&lt;/span&gt;      &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Mutex&lt;/span&gt;
    &lt;span class="n"&gt;buckets&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;RateLimiter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Allow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;buckets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;newBucket&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;buckets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;allow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Obvious choice. The contention is bounded by request rate, the state is small, a mutex is the simplest possible tool. Over-engineering here — sharded maps, lock-free data structures — buys nothing.&lt;/p&gt;

&lt;p&gt;What IronSys does here that many teams miss: &lt;strong&gt;every handler is context-aware from request entry&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;HandleFoo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Foo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parseReq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;writeResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Context flows everywhere downstream. The handler layer is boring; that's the point.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Live Sessions — Actor Pattern in Go
&lt;/h2&gt;

&lt;p&gt;Each active user session is a long-lived goroutine with an inbox channel. I call this the &lt;strong&gt;goroutine-per-entity pattern&lt;/strong&gt; — it's Erlang actors without the runtime, built from Go primitives.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Session&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;       &lt;span class="n"&gt;SessionID&lt;/span&gt;
    &lt;span class="n"&gt;mailbox&lt;/span&gt;  &lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;SessionCmd&lt;/span&gt;  &lt;span class="c"&gt;// the "actor" inbox&lt;/span&gt;
    &lt;span class="n"&gt;shutdown&lt;/span&gt; &lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt;    &lt;span class="n"&gt;sessionState&lt;/span&gt;      &lt;span class="c"&gt;// private to this goroutine&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;SessionCmd&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;op&lt;/span&gt;     &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt;   &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;reply&lt;/span&gt;  &lt;span class="k"&gt;chan&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;SessionReply&lt;/span&gt; &lt;span class="c"&gt;// optional reply channel&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;runSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="nb"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mailbox&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mailbox&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shutdown&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c"&gt;// persist final state&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why this pattern, not "session is a struct with a mutex"?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;State is private to one goroutine&lt;/strong&gt;. No sharing, no locks, no lock-ordering bugs. The session state is accessed by exactly one execution context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serial message processing&lt;/strong&gt;. Commands process one at a time, in FIFO order. Business invariants hold naturally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Natural location for cross-session coordination&lt;/strong&gt;. Each session is a message destination. Broadcasting to all sessions, or routing a command to a specific session, is just "send on its inbox."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clean lifecycle&lt;/strong&gt;. The goroutine runs until &lt;code&gt;shutdown&lt;/code&gt; or &lt;code&gt;ctx.Done&lt;/code&gt;. State is flushed once, on exit. No race between "is this session still alive" and "did we finish writing its state."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The manager that creates and routes to sessions looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;SessionManager&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;mu&lt;/span&gt;       &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RWMutex&lt;/span&gt;
    &lt;span class="n"&gt;sessions&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SessionID&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Session&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;SessionManager&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;SessionID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RLock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RUnlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;SessionManager&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;SessionID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Session&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;newSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;runSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// supervisor goroutine&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the mixing: the manager uses a mutex-protected map (shared state with a clear owner), individual sessions use the actor pattern (isolated state, message-passing). Two primitives, picked per-job.&lt;/p&gt;

&lt;p&gt;This pattern scales to hundreds of thousands of sessions per pod because goroutines are cheap: each one starts with a stack of only a few kilobytes. I've seen this exact pattern serve 400K concurrent sessions on a single pod.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Streaming Ingest — Bounded Worker Pool (CSP)
&lt;/h2&gt;

&lt;p&gt;Kafka consumer feeding a worker pool. Canonical CSP territory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;runConsumer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cons&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;kafka&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Consumer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;256&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WaitGroup&lt;/span&gt;

    &lt;span class="c"&gt;// Fixed worker pool&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;workerCount&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;return&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// Producer&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="nb"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;cons&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;

    &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Wait&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The bounded channel is the concurrency clamp. When the workers fall behind, the channel fills, the producer blocks on its send, and it stops calling &lt;code&gt;ReadMessage&lt;/code&gt;. Unread events stay in Kafka as consumer lag instead of piling up in the pod's memory.&lt;/p&gt;

&lt;p&gt;Why not actors here? Because the work items are stateless — you're processing events, not maintaining per-entity state. The overhead of an actor (mailbox, dispatch, ownership) is unjustified. CSP is the right fit.&lt;/p&gt;

&lt;p&gt;Why not mutex + a worker loop? You could, but the channel primitive is exactly the right shape — bounded capacity + safe cross-goroutine handoff + graceful shutdown — without needing to build those three features yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Batch Aggregation — Pipelines + errgroup
&lt;/h2&gt;

&lt;p&gt;Nightly rollup: read from storage, compute per-account aggregates, write back.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;runRollup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gctx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;errgroup&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// Stage 1: parse&lt;/span&gt;
    &lt;span class="n"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;ParsedEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Go&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="nb"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;parseStage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="c"&gt;// Stage 2: aggregate (keyed by account)&lt;/span&gt;
    &lt;span class="n"&gt;agged&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Aggregate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Go&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="nb"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agged&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;aggregateStage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agged&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="c"&gt;// Stage 3: persist&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Go&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;persistStage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agged&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Wait&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three stages in a pipeline. Each stage is a goroutine, connected by bounded channels. &lt;code&gt;errgroup&lt;/code&gt; ties them together: first error cancels the whole pipeline.&lt;/p&gt;

&lt;p&gt;The aggregation stage keeps its per-account map behind a mutex even though only one goroutine touches it today. The lock is uncontended, so it costs almost nothing, and it keeps the stage safe if a future change adds more goroutines.&lt;/p&gt;
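&lt;p&gt;A sketch of what that guarded map might look like. The &lt;code&gt;Aggregate&lt;/code&gt; fields and the &lt;code&gt;aggregator&lt;/code&gt; type with its &lt;code&gt;add&lt;/code&gt;/&lt;code&gt;snapshot&lt;/code&gt; methods are assumptions for illustration; the real &lt;code&gt;aggregateStage&lt;/code&gt; isn't shown in this post:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// Aggregate reuses the article's type name; these fields are assumed.
type Aggregate struct {
	Account string
	Count   int
	Total   float64
}

// aggregator owns the per-account map and guards it with a mutex.
type aggregator struct {
	mu     sync.Mutex
	totals map[string]Aggregate
}

func newAggregator() *aggregator {
	a := new(aggregator)
	a.totals = make(map[string]Aggregate)
	return a
}

// add folds one amount into the running aggregate for an account.
func (a *aggregator) add(account string, amount float64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	agg := a.totals[account] // zero value if the account is new
	agg.Account = account
	agg.Count++
	agg.Total += amount
	a.totals[account] = agg
}

// snapshot returns a copy, so callers never touch the guarded map.
func (a *aggregator) snapshot() map[string]Aggregate {
	a.mu.Lock()
	defer a.mu.Unlock()
	out := make(map[string]Aggregate, len(a.totals))
	for k, v := range a.totals {
		out[k] = v
	}
	return out
}

func main() {
	a := newAggregator()
	var wg sync.WaitGroup
	// Several writers on purpose: this is the "future change" case.
	for _, acct := range []string{"acme", "acme", "globex"} {
		wg.Add(1)
		go func(acct string) {
			defer wg.Done()
			a.add(acct, 2.5)
		}(acct)
	}
	wg.Wait()
	fmt.Println(a.snapshot())
}
```

&lt;p&gt;With a single caller the lock never blocks. With several, as in &lt;code&gt;main&lt;/code&gt; here, the same code stays correct without any changes.&lt;/p&gt;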

&lt;p&gt;This is textbook CSP: &lt;em&gt;the topology of channels is the architecture&lt;/em&gt;. Read the code and the shape of the computation is obvious.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cross-Service Handoff — Durable Queues
&lt;/h2&gt;

&lt;p&gt;IronSys talks to two other services: a billing service (async, eventually consistent) and an auth service (sync, immediate).&lt;/p&gt;

&lt;p&gt;For billing: a dedicated NATS JetStream subject with at-least-once delivery. Usage events go in one end; the billing service reads them. The emission codepath has a local write-ahead log so that if NATS is briefly down, events buffer on disk and replay when the connection recovers.&lt;/p&gt;

&lt;p&gt;For auth: gRPC with tight timeouts. Caller owns completion. If auth is slow, the API handler's deadline fires and the request fails fast.&lt;/p&gt;

&lt;p&gt;Two different ownership models for two different shapes of work. See: &lt;a href="https://harrisonsec.com/blog/rpc-vs-nats-who-owns-completion/" rel="noopener noreferrer"&gt;RPC vs NATS: Who Owns Completion&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the Primitives Map
&lt;/h2&gt;

&lt;p&gt;Summarizing which primitive serves which job in IronSys:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Work shape&lt;/th&gt;
&lt;th&gt;Primitive&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;HTTP request handling&lt;/td&gt;
&lt;td&gt;Stock &lt;code&gt;net/http&lt;/code&gt; + goroutine per request&lt;/td&gt;
&lt;td&gt;Language default; the right fit for stateless work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hot shared state (rate limiter, cache)&lt;/td&gt;
&lt;td&gt;Mutex / atomic&lt;/td&gt;
&lt;td&gt;Simplest primitive that works&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stateful user sessions&lt;/td&gt;
&lt;td&gt;Goroutine-per-entity (actor-like)&lt;/td&gt;
&lt;td&gt;Isolated state, message-passing, serial processing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session directory&lt;/td&gt;
&lt;td&gt;RWMutex-protected map&lt;/td&gt;
&lt;td&gt;Shared lookup, read-heavy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Streaming event processing&lt;/td&gt;
&lt;td&gt;Bounded channel + worker pool (CSP)&lt;/td&gt;
&lt;td&gt;Backpressure, parallelism, graceful shutdown&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-stage data pipeline&lt;/td&gt;
&lt;td&gt;CSP pipeline + errgroup&lt;/td&gt;
&lt;td&gt;Stage topology = architecture; first-error cancels all&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Async cross-service handoff&lt;/td&gt;
&lt;td&gt;Durable queue (NATS JetStream / Kafka)&lt;/td&gt;
&lt;td&gt;Receiver owns completion, at-least-once delivery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sync cross-service call&lt;/td&gt;
&lt;td&gt;gRPC with ctx timeout&lt;/td&gt;
&lt;td&gt;Caller owns completion, fast failure&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Notice: &lt;strong&gt;three of the four concurrency pillars show up&lt;/strong&gt;. Mutexes in the rate limiter. CSP in the event pipeline. Actors (in pattern) in the session runtime. STM is the one that's missing; it would show up if I were doing this in Clojure or Haskell.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Architecture Gets Wrong
&lt;/h2&gt;

&lt;p&gt;Every architecture has weaknesses. IronSys's are real:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The actor pattern isn't real actors.&lt;/strong&gt; Go has no Erlang-style supervision: if a session goroutine panics and nothing recovers it, the runtime crashes the whole process and takes every other session with it. Adding panic recovery per-session is easy but not free. In practice, most teams hit this six months in, add a recovery wrapper, and move on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bounded channels can mask a slow downstream.&lt;/strong&gt; If a channel fills up and the producer blocks, that's backpressure, which is exactly what you want. But if the buffer is too large, a lot of work piles up in memory before anyone notices that downstream is slow. Tune buffer sizes with measurements, not guesses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Goroutine-per-entity has a per-session baseline cost.&lt;/strong&gt; Cheap but not free. A million sessions is ~2.5GB of goroutine stacks. For services where most entities are inactive, a lazy pattern (spin up on activity, suspend to disk on idle) is better.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mixing paradigms has a cognitive cost.&lt;/strong&gt; New engineers have to learn four patterns instead of one. The productivity hit is real for the first two weeks; the payoff is in the next two years.&lt;/li&gt;
&lt;/ul&gt;
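&lt;p&gt;The recovery wrapper from the first bullet might look something like this. &lt;code&gt;runSession&lt;/code&gt; and the session IDs are hypothetical; the only Go mechanism relied on is &lt;code&gt;recover&lt;/code&gt; inside a deferred function:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// runSession runs one session's work and converts a panic into an error,
// so a single bad session cannot crash the whole process.
func runSession(id string, work func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("session %s panicked: %v", id, r)
		}
	}()
	work()
	return nil
}

func main() {
	jobs := []func(){
		func() {},                     // healthy session
		func() { panic("bad state") }, // session that blows up
	}
	errs := make([]error, len(jobs))

	var wg sync.WaitGroup
	for i, job := range jobs {
		wg.Add(1)
		go func(i int, job func()) {
			defer wg.Done()
			errs[i] = runSession(fmt.Sprintf("s%d", i), job)
		}(i, job)
	}
	wg.Wait()

	fmt.Println(errs[0] == nil) // prints "true"
	fmt.Println(errs[1])        // prints "session s1 panicked: bad state"
}
```

&lt;p&gt;This is containment, not supervision: nothing here restarts the failed session or rebuilds its state, which is the part Erlang gives you and Go leaves to you.&lt;/p&gt;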

&lt;h2&gt;
  
  
  What This Blueprint Is Really Selling
&lt;/h2&gt;

&lt;p&gt;A system with four work shapes should have four concurrency patterns, not one stretched to cover everything. The four pillars aren't theoretical; they map to real design decisions, and production Go services that use them deliberately are easier to reason about than those that don't.&lt;/p&gt;

&lt;p&gt;What IronSys is really selling is &lt;strong&gt;intentional heterogeneity&lt;/strong&gt;. Every primitive is there for a reason. Every reason is traceable to a specific failure mode you want to prevent. The architecture should be legible — a new engineer reading the code should understand why a channel is there instead of a mutex, why a session has its own goroutine instead of being a struct in a shared map, why billing goes through a durable queue instead of a gRPC call.&lt;/p&gt;

&lt;p&gt;If you can't answer "why this primitive here," the code isn't finished. It's just working, for now.&lt;/p&gt;

&lt;p&gt;Blueprints are useful precisely because they're generic. The specifics of your system will be different. But the decision framework — what's the work shape, what's the failure mode, what's the right primitive — is the same every time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/four-pillars-modern-concurrency-locks-to-actors/" rel="noopener noreferrer"&gt;From Locks to Actors: The Four Pillars of Modern Concurrency&lt;/a&gt; — the taxonomy behind the choices in IronSys.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-chan-context-structure-not-speed/" rel="noopener noreferrer"&gt;Go's Concurrency Is About Structure, Not Speed&lt;/a&gt; — chan and context as the glue across all of these.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/rpc-vs-nats-who-owns-completion/" rel="noopener noreferrer"&gt;RPC vs NATS: It's Not About Sync vs Async — It's About Who Owns Completion&lt;/a&gt; — the cross-service handoff choices.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/testing-real-world-go-backends/" rel="noopener noreferrer"&gt;Testing Real-World Go Backends Isn't What Many People Think&lt;/a&gt; — how you verify a system like this actually holds up.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>concurrency</category>
      <category>systemdesign</category>
      <category>go</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Docker Kubernetes: What They Really Changed (It's Not What You Think)</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Mon, 20 Apr 2026 07:13:19 +0000</pubDate>
      <link>https://dev.to/harrisonsec/docker-x-kubernetes-what-they-really-changed-its-not-what-you-think-1972</link>
      <guid>https://dev.to/harrisonsec/docker-x-kubernetes-what-they-really-changed-its-not-what-you-think-1972</guid>
      <description>&lt;p&gt;"A Docker container is basically a lightweight VM, right?" No. That sentence alone causes more architectural misunderstandings than any other in modern backend engineering. A VM virtualizes hardware. A container is a set of Linux kernel features (namespaces, cgroups, overlay filesystems) wrapped in a nicer CLI. Every container shares the host kernel, and with it the same attack surface if the kernel has a bug. The marketing that says otherwise has cost teams real money in misconfigured production.&lt;/p&gt;

&lt;p&gt;Kubernetes gets the same treatment. "It's a tool for running containers." Also not really. Kubernetes is a distributed scheduler, service discovery layer, declarative control plane, and reconciliation engine. Containers are one of the things it happens to run. Treating Kubernetes as "container orchestration" produces systems that break in predictable, frustrating ways, because the team never learned that the reconciliation loop, not the container, is the thing that actually matters.&lt;/p&gt;

&lt;p&gt;This is a working engineer's re-read of what Docker and Kubernetes actually changed. Not the marketing story. The underneath-the-hood story that tells you when to reach for them and when they're overkill.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — Docker didn't invent Linux namespaces, cgroups, or filesystem layering; it packaged them into a developer-friendly workflow. That workflow is what changed. Kubernetes didn't invent distributed scheduling, service discovery, or rolling deployments; it standardized the declarative, reconciliation-loop pattern for all of them. That pattern is what changed. Understanding these primitives (namespaces + cgroups + reconciliation loops) tells you when to reach for the tools and when the tools are overkill.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Docker Actually Is
&lt;/h2&gt;

&lt;p&gt;Docker is a set of Linux kernel features wrapped in a nice CLI and an image format. The features existed before Docker; they just weren't accessible.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Linux namespaces&lt;/strong&gt;: PID, mount, network, IPC, UTS, user, cgroup. Each namespace gives a process its own view of that resource. When the main process in your container sees itself as PID 1, that really is its PID inside its own PID namespace; the host's init is invisible from there.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cgroups (v1/v2)&lt;/strong&gt;: resource accounting and limits. How much CPU, memory, and I/O bandwidth a group of processes can use. This is why a container running without limits can eat a host's memory and take everything else down.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Union / overlay filesystems&lt;/strong&gt;: the thing that lets you stack "base image" + "layer 1" + "layer 2" without copying. OverlayFS on modern kernels.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image format (OCI)&lt;/strong&gt;: a standard way to package a root filesystem plus metadata into something reproducible.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Docker's innovation was not inventing any of this. It was making them &lt;strong&gt;accessible&lt;/strong&gt;. &lt;code&gt;docker run -p 8080:80 nginx&lt;/code&gt; hides a beautiful horror of namespace creation, iptables rules, virtual ethernet pairs, overlay mounts, and cgroup assignment. Before Docker, you'd have spent a week reading &lt;code&gt;unshare(2)&lt;/code&gt; and &lt;code&gt;ip netns add&lt;/code&gt; to reproduce this. After Docker, you did it in a workshop afternoon.&lt;/p&gt;

&lt;p&gt;What actually changed: &lt;strong&gt;deployments became reproducible&lt;/strong&gt;. The image you built on your laptop contained everything needed to run — OS libraries, Python version, environment. "Works on my machine" stopped being a coping mechanism and started being a legitimate development artifact. That's the Docker revolution. Not containers. Reproducible, portable environments.&lt;/p&gt;

&lt;p&gt;The thing that is &lt;em&gt;not&lt;/em&gt; true, despite the marketing: Docker containers are not VMs. They share the host kernel. A kernel exploit in one container can reach the host and other containers. Containers are a soft isolation — good enough for most production multi-tenant workloads, not good enough for hostile tenants.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Kubernetes Actually Is
&lt;/h2&gt;

&lt;p&gt;Kubernetes is a declarative control plane built on the &lt;strong&gt;reconciliation loop&lt;/strong&gt; pattern. This is the single most important idea to internalize.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You write a manifest describing the &lt;strong&gt;desired state&lt;/strong&gt;: "three replicas of this deployment, exposed through this service, attached to this config."&lt;/li&gt;
&lt;li&gt;You hand the manifest to the control plane: "make it so."&lt;/li&gt;
&lt;li&gt;Kubernetes runs an unending loop: observe the current state, compare to desired, take actions to close the gap.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything Kubernetes does follows this pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Deployment&lt;/code&gt; controllers manage ReplicaSets: on a change they create a new ReplicaSet and scale the old one down to roll pods over.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ReplicaSet&lt;/code&gt; controllers ensure N identical pods exist.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;kube-proxy&lt;/code&gt; (or an eBPF-based CNI) watches &lt;code&gt;Service&lt;/code&gt; objects and their endpoints and maintains the iptables / IPVS / eBPF rules that route virtual IPs.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Ingress&lt;/code&gt; controllers watch Ingress resources and configure the edge proxy.&lt;/li&gt;
&lt;li&gt;The scheduler watches for unscheduled pods and binds them to nodes.&lt;/li&gt;
&lt;li&gt;
The &lt;code&gt;Node&lt;/code&gt; controller watches node health and evicts pods from unhealthy nodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your application is just the &lt;strong&gt;data&lt;/strong&gt; in the reconciliation loop. The loops run forever, closing gaps. That's Kubernetes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBHaXRbKEdpdCDCtyBtYW5pZmVzdHM8YnIvPnNvdXJjZSBvZiB0cnV0aCldIC0tPiBBUElbS3ViZXJuZXRlcyBBUEkgc2VydmVyXQoKICAgIHN1YmdyYXBoIExvb3BbIlJlY29uY2lsaWF0aW9uIGxvb3AgwrcgZm9yZXZlciJdCiAgICAgICAgRGVzaXJlZFsiRGVzaXJlZCBzdGF0ZTxici8-ZnJvbSBtYW5pZmVzdCJdIC0tPiBDb21wYXJle01hdGNoP30KICAgICAgICBPYnNlcnZlZFsiT2JzZXJ2ZWQgc3RhdGU8YnIvPmZyb20gY2x1c3RlciJdIC0tPiBDb21wYXJlCiAgICAgICAgQ29tcGFyZSAtLT58Tm8gwrcgYWN0fCBBY3Rpb25bIkNvbnRyb2xsZXIgdGFrZXMgYWN0aW9uPGJyLz5zY2FsZSDCtyBzY2hlZHVsZSDCtyBldmljdCDCtyByb3V0ZSJdCiAgICAgICAgQWN0aW9uIC0tPiBPYnNlcnZlZAogICAgICAgIENvbXBhcmUgLS0-fFllcyDCtyB3YWl0fCBPYnNlcnZlZAogICAgZW5kCgogICAgQVBJIC0tPiBEZXNpcmVkCiAgICBBUEkgLS0-IE9ic2VydmVkCgogICAgVXNlcihbWW91IMK3IGt1YmVjdGwgYXBwbHldKSAtLT58dXBkYXRlIG1hbmlmZXN0fCBHaXQKCiAgICBjbGFzc0RlZiBsb29wIGZpbGw6I2U4ZjRmOCxzdHJva2U6IzJjNTI4MixzdHJva2Utd2lkdGg6MnB4CiAgICBjbGFzcyBMb29wIGxvb3A%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBHaXRbKEdpdCDCtyBtYW5pZmVzdHM8YnIvPnNvdXJjZSBvZiB0cnV0aCldIC0tPiBBUElbS3ViZXJuZXRlcyBBUEkgc2VydmVyXQoKICAgIHN1YmdyYXBoIExvb3BbIlJlY29uY2lsaWF0aW9uIGxvb3AgwrcgZm9yZXZlciJdCiAgICAgICAgRGVzaXJlZFsiRGVzaXJlZCBzdGF0ZTxici8-ZnJvbSBtYW5pZmVzdCJdIC0tPiBDb21wYXJle01hdGNoP30KICAgICAgICBPYnNlcnZlZFsiT2JzZXJ2ZWQgc3RhdGU8YnIvPmZyb20gY2x1c3RlciJdIC0tPiBDb21wYXJlCiAgICAgICAgQ29tcGFyZSAtLT58Tm8gwrcgYWN0fCBBY3Rpb25bIkNvbnRyb2xsZXIgdGFrZXMgYWN0aW9uPGJyLz5zY2FsZSDCtyBzY2hlZHVsZSDCtyBldmljdCDCtyByb3V0ZSJdCiAgICAgICAgQWN0aW9uIC0tPiBPYnNlcnZlZAogICAgICAgIENvbXBhcmUgLS0-fFllcyDCtyB3YWl0fCBPYnNlcnZlZAogICAgZW5kCgogICAgQVBJIC0tPiBEZXNpcmVkCiAgICBBUEkgLS0-IE9ic2VydmVkCgogICAgVXNlcihbWW91IMK3IGt1YmVjdGwgYXBwbHldKSAtLT58dXBkYXRlIG1hbmlmZXN0fCBHaXQKCiAgICBjbGFzc0RlZiBsb29wIGZpbGw6I2U4ZjRmOCxzdHJva2U6IzJjNTI4MixzdHJva2Utd2lkdGg6MnB4CiAgICBjbGFzcyBMb29wIGxvb3A%3D" alt="Git[(Git · manifests&amp;lt;br/&amp;gt;source of truth)] --&amp;gt; API[Kubernetes API server]" width="1723" height="308"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every Kubernetes feature — Deployments, Services, Ingresses, HPAs, CronJobs, StatefulSets — is some controller running this exact pattern. Once you see it, the platform stops being magic.&lt;/p&gt;
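&lt;p&gt;Stripped of everything Kubernetes-specific, the observe / compare / act cycle can be sketched in a few lines. The &lt;code&gt;state&lt;/code&gt; type, &lt;code&gt;reconcile&lt;/code&gt;, and the event names are invented for illustration, and the built-in &lt;code&gt;max&lt;/code&gt; assumes Go 1.21 or newer; a real controller is driven by watch events from the API server, not a fixed slice:&lt;/p&gt;

```go
package main

import "fmt"

// state is a hypothetical stand-in for what a controller tracks.
type state struct {
	replicas int
}

// reconcile compares desired with observed and returns the actions that
// close the gap: how many replicas to start and how many to stop.
func reconcile(desired, observed state) (start, stop int) {
	start = max(desired.replicas-observed.replicas, 0)
	stop = max(observed.replicas-desired.replicas, 0)
	return start, stop
}

func main() {
	desired := state{replicas: 3}
	observed := state{replicas: 1}

	// A real loop never ends; three passes are enough to show the shape.
	for _, event := range []string{"initial", "pod crashed", "steady"} {
		if event == "pod crashed" {
			observed.replicas-- // the world changed behind the controller's back
		}
		start, stop := reconcile(desired, observed)
		observed.replicas += start - stop
		fmt.Printf("%s: start=%d stop=%d running=%d\n", event, start, stop, observed.replicas)
	}
}
```

&lt;p&gt;Every pass ends with three replicas running, whether the gap came from the initial deploy or from a crash. The controller never asks why the gap exists; it only closes it.&lt;/p&gt;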

&lt;p&gt;What actually changed because of this: &lt;strong&gt;the operational model became shared across companies&lt;/strong&gt;. Before Kubernetes, every engineering team had a bespoke orchestration system: a collection of Chef/Puppet/Ansible recipes, some custom scripts, a deploy button, and a few senior engineers who knew which knobs to turn during incidents. Different at every company. Opaque to new hires. Sensitive to key-person risk.&lt;/p&gt;

&lt;p&gt;Kubernetes is many things, but the single biggest thing it did was replace a hundred bespoke orchestration glues with one standard. It's not the best tool for every problem — Nomad is simpler, ECS is more managed, Cloud Run hides the thing entirely — but it's the standard, and "it's the standard" has real value: hires know it, vendors build for it, books exist, the job market is liquid.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Mental Model Most People Miss
&lt;/h2&gt;

&lt;p&gt;Once you see "reconciliation loop," you stop asking questions Kubernetes doesn't answer.&lt;/p&gt;

&lt;p&gt;"How do I deploy?" You don't. You update a manifest. A controller observes the change and reconciles.&lt;/p&gt;

&lt;p&gt;"How do I roll back?" You don't. You update the manifest back. A controller observes the change and reconciles in the other direction.&lt;/p&gt;

&lt;p&gt;"Why did my pod get killed?" Because a controller decided the current state (this pod is here, on this node) didn't match the desired state (node is draining, or pod is over its memory limit, or a replica count decreased). It closed the gap.&lt;/p&gt;

&lt;p&gt;"Why can't I SSH in and hand-edit things?" Because the next reconcile loop will undo your edit. The manifest is the source of truth. If you want to change behavior, change the manifest.&lt;/p&gt;

&lt;p&gt;This is a shift from imperative ops ("run these commands to deploy") to declarative ops ("the system should look like this; make it so"). Git becomes the history of what your infrastructure should be. Time travel works. Change review works. Disaster recovery becomes "re-apply the manifests to a new cluster." When it clicks, you stop fighting the platform.&lt;/p&gt;

&lt;p&gt;Until it clicks, the platform feels maddening. "I just want to run a container" — yes, but the platform doesn't care what you want to do once. It cares about the continuous state. Every action through kubectl apply is a statement of desired state, not an imperative command.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed in Practice
&lt;/h2&gt;

&lt;p&gt;Concretely, what looks different on a team that's moved from "SSH into the box and &lt;code&gt;systemctl restart&lt;/code&gt;" to a reconciled-state model:&lt;/p&gt;

&lt;h3&gt;
  
  
  Deployment became a git push
&lt;/h3&gt;

&lt;p&gt;Before: log into the bastion, pull the latest build, restart the service, watch the log.&lt;br&gt;
After: merge to main, CI pushes image to registry, ArgoCD/Flux observes the manifest change, the Deployment controller updates the ReplicaSet, pods roll gradually.&lt;/p&gt;

&lt;p&gt;Benefits: change review, audit trail, rollback by git revert, consistent deploys across teams.&lt;br&gt;
Costs: debugging a broken deploy requires understanding the CD pipeline, the manifest, and the controller that's reconciling. The failure mode surface is wider.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scaling became a number in a file
&lt;/h3&gt;

&lt;p&gt;Before: write a script that watches metrics, calls the cloud API, hopes for the best.&lt;br&gt;
After: &lt;code&gt;replicas: 10&lt;/code&gt; in a manifest, or an HPA (Horizontal Pod Autoscaler) that watches metrics and adjusts the Deployment.&lt;/p&gt;

&lt;p&gt;Benefits: declarative, versioned, reproducible.&lt;br&gt;
Costs: HPA behavior is subtle — wrong thresholds cause thrashing, wrong metrics cause over/underscaling. Many teams never invest in tuning.&lt;/p&gt;
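&lt;p&gt;To see where thrashing comes from, it helps to look at the core scaling rule from the HPA documentation: desired replicas = ceil(current replicas × current metric / target metric). The sketch below implements only that rule; it deliberately leaves out the tolerance band, stabilization windows, and min/max bounds that a real HPA applies, and the CPU readings are made up:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the basic HPA ratio rule and nothing else.
func desiredReplicas(current int, metric, target float64) int {
	return int(math.Ceil(float64(current) * metric / target))
}

func main() {
	replicas := 10
	// Hypothetical CPU utilization readings, in percent, against a 50% target.
	for _, cpu := range []float64{80, 30, 75} {
		next := desiredReplicas(replicas, cpu, 50)
		fmt.Printf("cpu=%.0f%% replicas %d to %d\n", cpu, replicas, next)
		replicas = next
	}
}
```

&lt;p&gt;The replica count swings from 10 to 16, back to 10, then up to 15 across three readings. A noisy metric or a badly chosen target turns directly into that kind of churn, which is why the tuning knobs exist.&lt;/p&gt;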

&lt;h3&gt;
  
  
  Service discovery became DNS
&lt;/h3&gt;

&lt;p&gt;Before: register each service in a registry such as Consul and look peers up in its catalog. Or hardcode IPs.&lt;br&gt;
After: &lt;code&gt;my-service.my-namespace.svc.cluster.local&lt;/code&gt; resolves to a stable virtual IP. kube-proxy or the CNI plugin load-balances to healthy pods.&lt;/p&gt;

&lt;p&gt;Benefits: services don't need to know how other services run. Standard DNS.&lt;br&gt;
Costs: the DNS / networking layer is one of the hardest parts of Kubernetes to debug. When service discovery breaks, you're reading iptables or eBPF maps, not a Consul dashboard.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuration became a manifest
&lt;/h3&gt;

&lt;p&gt;Before: environment variables, .env files, maybe Consul KV.&lt;br&gt;
After: ConfigMaps and Secrets, mounted as env vars or volumes.&lt;/p&gt;

&lt;p&gt;Benefits: versioned, reviewed, separate from code.&lt;br&gt;
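Costs: changing a ConfigMap doesn't automatically restart pods, and values injected as env vars are only read at container start. You have to change the Deployment's pod template (an annotation is the usual trick) or use something like Reloader. New users get bitten by this constantly.&lt;/p&gt;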
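&lt;p&gt;One common form of the annotation trick is to stamp the pod template with a checksum of the config, so any config change also changes the template and triggers a rollout. A sketch of the checksum half, assuming a hypothetical &lt;code&gt;configChecksum&lt;/code&gt; helper and an annotation key of &lt;code&gt;checksum/config&lt;/code&gt; (a convention, not something Kubernetes defines):&lt;/p&gt;

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// configChecksum hashes config data in sorted key order, so the result
// depends only on the content and not on map iteration order.
func configChecksum(data map[string]string) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%q=%q\n", k, data[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	before := map[string]string{"LOG_LEVEL": "info", "PORT": "8080"}
	after := map[string]string{"LOG_LEVEL": "debug", "PORT": "8080"}

	// The value a deploy tool would write into the pod template, for
	// example under the assumed annotation key "checksum/config".
	fmt.Println("before:", configChecksum(before))
	fmt.Println("after: ", configChecksum(after))
	fmt.Println("rollout needed:", configChecksum(before) != configChecksum(after))
}
```

&lt;p&gt;Because the annotation lives in the pod template, the Deployment controller sees a template change and rolls the pods through its normal reconciliation, with no special restart logic.&lt;/p&gt;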

&lt;h2&gt;
  
  
  When Kubernetes Is Overkill
&lt;/h2&gt;

&lt;p&gt;I'll say it directly: most teams adopting Kubernetes for the first time don't need it.&lt;/p&gt;

&lt;p&gt;Rules of thumb:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Two or three services, one team&lt;/strong&gt;: you don't need Kubernetes. ECS, Nomad, Cloud Run, or even systemd + Ansible will do. The operational overhead of Kubernetes exceeds its benefit at this scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ten to twenty services, small team&lt;/strong&gt;: Kubernetes starts breaking even if you pick a managed service (EKS, GKE, AKS). Don't run your own control plane.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fifty+ services, multiple teams, serious release engineering needs&lt;/strong&gt;: Kubernetes is probably the right call. The cost of complexity is amortized over the benefits of a shared declarative platform.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The dangerous zone is 5-15 services on a small team. At that scale, Kubernetes often wins the resume-driven-development vote and loses the actual-outcomes vote. Pick a simpler tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Kubernetes Is the Right Answer
&lt;/h2&gt;

&lt;p&gt;The jobs where Kubernetes genuinely shines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-service, multi-team engineering orgs&lt;/strong&gt; where consistency matters more than per-service optimality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scale-out workloads with heterogeneous shapes&lt;/strong&gt; — web apps, job runners, ML batch jobs, stateful databases, all on one platform.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Teams that want declarative infrastructure&lt;/strong&gt; — GitOps via ArgoCD/Flux, infra PRs reviewed like code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workloads with nontrivial scheduling&lt;/strong&gt; — affinity rules, taints, GPU allocation, spot instances.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operators ecosystem&lt;/strong&gt; — Kubernetes operators (Prometheus operator, cert-manager, etc.) let you extend the same reconciliation model to application-specific concerns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notice the pattern: Kubernetes wins when you want the platform's primitives — declarative state, reconciliation, operators — beyond just container scheduling. If you only want "run my container," you're buying a jumbo jet to fly to the next town.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Tell a Team Starting Fresh
&lt;/h2&gt;

&lt;p&gt;Two concrete takeaways I'd hand to engineers thinking about Docker and Kubernetes.&lt;/p&gt;

&lt;p&gt;For Docker: the image isn't the point. Reproducibility is. An image built on your laptop that runs unchanged in CI and production — that's the contract you got. Break it (say, by mutating state inside the running container) and you lose the value. The container is a delivery mechanism for a reproducible environment.&lt;/p&gt;

&lt;p&gt;For Kubernetes: the manifest is the source of truth. Every piece of your infrastructure — deployments, services, secrets, ingresses, policies — lives in git. Every change is a git change. Every rollback is a git revert. If you find yourself running &lt;code&gt;kubectl edit&lt;/code&gt; on production, something is wrong with your workflow, not with Kubernetes.&lt;/p&gt;

&lt;p&gt;Both tools won because they codified patterns that were already emerging in sophisticated shops. They didn't invent the patterns. They made them accessible, portable, and standard. That's the fifteen-year revolution. Not containers. Not YAML. The standardization of patterns that used to require a senior infrastructure team to implement from scratch at every company.&lt;/p&gt;

&lt;p&gt;When you work with the grain of the pattern — reproducible environments for Docker, reconciled declarative state for Kubernetes — both tools get out of the way. When you fight the grain, they fight back.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-millions-connections-user-space-context-switching/" rel="noopener noreferrer"&gt;Why Go Handles Millions of Connections&lt;/a&gt; — Linux primitives that Docker is built on, seen from the language side.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/observability-cost-attribution-dual-path-architecture/" rel="noopener noreferrer"&gt;Observability and Cost Attribution: Why One Pipeline Isn't Enough&lt;/a&gt; — what happens to operational complexity when you have dozens of services.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/scale-up-scale-out-every-language-wins-somewhere/" rel="noopener noreferrer"&gt;Scale-Up vs Scale-Out: Why Every Language Wins Somewhere&lt;/a&gt; — the architectural decision that drives whether you need Kubernetes at all.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>docker</category>
      <category>kubernetes</category>
      <category>containers</category>
      <category>devops</category>
    </item>
    <item>
      <title>Observability and Cost Attribution: Why One Pipeline Isn't Enough</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Mon, 20 Apr 2026 07:13:17 +0000</pubDate>
      <link>https://dev.to/harrisonsec/observability-and-cost-attribution-why-one-pipeline-isnt-enough-1283</link>
      <guid>https://dev.to/harrisonsec/observability-and-cost-attribution-why-one-pipeline-isnt-enough-1283</guid>
      <description>&lt;p&gt;A team I worked with tried to build their billing system on top of their tracing pipeline. The idea was clean: every operation already generates a span; spans already have duration and attributes; adding &lt;code&gt;user_id&lt;/code&gt; and &lt;code&gt;billable_units&lt;/code&gt; to each span lets finance query the trace store to compute invoices. One pipeline, less infrastructure. Beautiful.&lt;/p&gt;

&lt;p&gt;Six weeks before the first billing cycle, the wheels came off. The tracing system was sampling at 10% because full-capture was too expensive. The sampler was head-based, meaning whether a trace got kept was decided at request entry, long before the code knew whether the request was billable. Some users got charged for 10% of their actual usage; others got free service. Nobody's invoice agreed with the other team's report.&lt;/p&gt;

&lt;p&gt;The workaround — "don't sample billable traces" — sounded reasonable, broke the tracing pipeline's cost model immediately, and created a dozen new edge cases around which requests counted as "billable." Within a month the team was reluctantly building a second pipeline for billing. They still had the first one for traces. Now they had two pipelines that disagreed with each other.&lt;/p&gt;

&lt;p&gt;The postmortem landed on a single sentence: &lt;strong&gt;observability and cost attribution aren't the same problem, and pretending they are is expensive twice.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — Tracing and metrics optimize for signal-to-noise — you want the interesting outliers, sampling is OK, dropping data is tolerable. Billing optimizes for completeness and auditability — every event must be captured and durably recorded, end of story. The two pipelines have opposite trade-offs on sampling, retention, schema evolution, and cost. Building them as one pipeline forces one of the two to lose. Build them as two, share primitives where possible, let each specialize where it must.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why They Look Alike
&lt;/h2&gt;

&lt;p&gt;Observability pipelines and billing pipelines do look eerily similar from a distance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Both capture events from production systems.&lt;/li&gt;
&lt;li&gt;Both attach metadata to those events.&lt;/li&gt;
&lt;li&gt;Both aggregate events over time windows.&lt;/li&gt;
&lt;li&gt;Both export to a query layer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's tempting — especially to engineers who like clean architecture — to say &lt;em&gt;these are the same problem&lt;/em&gt; and build one system. The similarity is surface. The constraints are opposite.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Axis&lt;/th&gt;
&lt;th&gt;Observability&lt;/th&gt;
&lt;th&gt;Billing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Loss tolerance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High (sampling is fine)&lt;/td&gt;
&lt;td&gt;Zero&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency tolerance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Seconds to minutes&lt;/td&gt;
&lt;td&gt;Minutes to hours is fine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Retention&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Days to weeks&lt;/td&gt;
&lt;td&gt;Years&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Schema evolution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fast, frequent&lt;/td&gt;
&lt;td&gt;Slow, with audit trail&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cardinality profile&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low cardinality on hot dims&lt;/td&gt;
&lt;td&gt;Arbitrary (per user, per resource)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consumers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SRE, engineering, on-call&lt;/td&gt;
&lt;td&gt;Finance, legal, customer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Failure mode&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Blind spot in a dashboard&lt;/td&gt;
&lt;td&gt;Wrong invoice, legal exposure&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The one that really matters: &lt;strong&gt;loss tolerance&lt;/strong&gt;. Everything else follows from it.&lt;/p&gt;

&lt;p&gt;A tracing pipeline that drops 10% of spans is fine. You still see the outliers. You still find the slow paths. The system does its job.&lt;/p&gt;

&lt;p&gt;A billing pipeline that drops 10% of events is a disaster. Some users underpay. Some users overpay. Finance reconciliation fails. You end up manually auditing transactions for weeks.&lt;/p&gt;

&lt;p&gt;The moment one pipeline has to guarantee zero loss while the other can happily drop 90% of its data, you have two different systems, whether you planned for one or two.&lt;/p&gt;
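&lt;p&gt;A toy model makes the mismatch concrete. The sampler below decides from the trace ID alone, the way a head-based sampler does, and the tally compares real usage with what a sampled store would report. The user names, trace IDs, and the FNV hash are all assumptions picked for the example, not how any particular tracing backend samples:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// keep is a head-based sampler: it decides from the trace ID alone, at
// request entry, before anyone knows whether the request is billable.
func keep(traceID string) bool {
	h := fnv.New32a()
	h.Write([]byte(traceID))
	return h.Sum32()%10 == 0 // keep roughly 1 trace in 10
}

// tally replays n one-unit requests (n must be non-negative) split across
// two hypothetical users. It returns usage as it really happened and as
// the sampled trace store saw it.
func tally(n int) (actual, sampled map[string]int) {
	users := []string{"alice", "bob"}
	actual = map[string]int{}
	sampled = map[string]int{}
	for i := 0; i != n; i++ {
		user := users[i%len(users)]
		actual[user]++
		if keep(fmt.Sprintf("trace-%d", i)) {
			sampled[user]++
		}
	}
	return actual, sampled
}

func main() {
	actual, sampled := tally(1000)
	for _, u := range []string{"alice", "bob"} {
		fmt.Printf("%s: used %d units, trace store saw %d\n", u, actual[u], sampled[u])
	}
}
```

&lt;p&gt;Both users consumed 500 units, yet the sampled counts are a fraction of that and need not match each other. Multiplying by ten does not rescue the invoice either, because the error is per user and nothing in the trace store records what was dropped.&lt;/p&gt;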

&lt;h2&gt;
  
  
  The Dual-Path Architecture
&lt;/h2&gt;

&lt;p&gt;The design I keep coming back to is straightforward: &lt;strong&gt;two pipelines, shared ingest, separate durability and query paths&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBBcHBbQXBwbGljYXRpb24gY29kZV0gLS0-IHxPVExQIHNwYW5zfCBUcmFjZUNvbAogICAgQXBwIC0tPiB8c3RydWN0dXJlZCB1c2FnZSBldmVudHM8YnIvPnVuc2FtcGxlZHwgVXNhZ2VRCgogICAgc3ViZ3JhcGggVHJhY2VQYXRoWyJUcmFjaW5nIHBhdGgg4oCUIGxvc3MtdG9sZXJhbnQsIGZhc3QiXQogICAgICAgIFRyYWNlQ29sWyJUcmFjaW5nIGNvbGxlY3RvciJdIC0tPiBTYW1wbGVyWyJTYW1wbGVyPGJyLz5oZWFkIG9yIHRhaWwgwrcgfjEwJSJdCiAgICAgICAgU2FtcGxlciAtLT4gSG90U3RvcmVbIkhvdCB0cmFjZSBzdG9yZTxici8-VGVtcG8gLyBKYWVnZXI8YnIvPmRheXMgcmV0ZW50aW9uIl0KICAgIGVuZAoKICAgIHN1YmdyYXBoIEJpbGxpbmdQYXRoWyJCaWxsaW5nIHBhdGgg4oCUIHplcm8tbG9zcywgYXVkaXRhYmxlIl0KICAgICAgICBVc2FnZVFbIkR1cmFibGUgcXVldWU8YnIvPkthZmthIC8gTkFUUyBKZXRTdHJlYW08YnIvPldBTC1kdXJhYmxlIl0gLS0-IFdhcmVob3VzZVsiQ29sdW1uYXIgd2FyZWhvdXNlPGJyLz5CaWdRdWVyeSAvIFNub3dmbGFrZSAvIENsaWNrSG91c2U8YnIvPnllYXJzIHJldGVudGlvbiJdCiAgICBlbmQKCiAgICBjbGFzc0RlZiB0cmFjZSBmaWxsOiNmZWY1ZTcsc3Ryb2tlOiNiNzc5MWYKICAgIGNsYXNzRGVmIGJpbGwgZmlsbDojZjBmZmY0LHN0cm9rZTojMmY4NTVhLHN0cm9rZS13aWR0aDoycHgKICAgIGNsYXNzIFRyYWNlUGF0aCB0cmFjZQogICAgY2xhc3MgQmlsbGluZ1BhdGggYmlsbA%3D%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBBcHBbQXBwbGljYXRpb24gY29kZV0gLS0-IHxPVExQIHNwYW5zfCBUcmFjZUNvbAogICAgQXBwIC0tPiB8c3RydWN0dXJlZCB1c2FnZSBldmVudHM8YnIvPnVuc2FtcGxlZHwgVXNhZ2VRCgogICAgc3ViZ3JhcGggVHJhY2VQYXRoWyJUcmFjaW5nIHBhdGgg4oCUIGxvc3MtdG9sZXJhbnQsIGZhc3QiXQogICAgICAgIFRyYWNlQ29sWyJUcmFjaW5nIGNvbGxlY3RvciJdIC0tPiBTYW1wbGVyWyJTYW1wbGVyPGJyLz5oZWFkIG9yIHRhaWwgwrcgfjEwJSJdCiAgICAgICAgU2FtcGxlciAtLT4gSG90U3RvcmVbIkhvdCB0cmFjZSBzdG9yZTxici8-VGVtcG8gLyBKYWVnZXI8YnIvPmRheXMgcmV0ZW50aW9uIl0KICAgIGVuZAoKICAgIHN1YmdyYXBoIEJpbGxpbmdQYXRoWyJCaWxsaW5nIHBhdGgg4oCUIHplcm8tbG9zcywgYXVkaXRhYmxlIl0KICAgICAgICBVc2FnZVFbIkR1cmFibGUgcXVldWU8YnIvPkthZmthIC8gTkFUUyBKZXRTdHJlYW08YnIvPldBTC1kdXJhYmxlIl0gLS0-IFdhcmVob3VzZVsiQ29sdW1uYXIgd2FyZWhvdXNlPGJyLz5CaWdRdWVyeSAvIFNub3dmbGFrZSAvIENsaWNrSG91c2U8YnIvPnllYXJzIHJldGVudGlvbiJdCiAgICBlbmQKCiAgICBjbGFzc0RlZiB0cmFjZSBmaWxsOiNmZWY1ZTcsc3Ryb2tlOiNiNzc5MWYKICAgIGNsYXNzRGVmIGJpbGwgZmlsbDojZjBmZmY0LHN0cm9rZTojMmY4NTVhLHN0cm9rZS13aWR0aDoycHgKICAgIGNsYXNzIFRyYWNlUGF0aCB0cmFjZQogICAgY2xhc3MgQmlsbGluZ1BhdGggYmlsbA%3D%3D" alt="App[Application code] --&amp;gt; |OTLP spans| TraceCol" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two emission paths from the application. Two pipelines behind them. Each tuned for its job.&lt;/p&gt;

&lt;h3&gt;
  
  
  The tracing path
&lt;/h3&gt;

&lt;p&gt;Stays conventional. OpenTelemetry SDK emits spans. Collector applies head-based or tail-based sampling. Hot store (Tempo, Jaeger, Grafana Cloud) gets 10-20% of the volume. Retention a few days to a few weeks. Query layer is for engineers debugging incidents.&lt;/p&gt;

&lt;p&gt;What I optimize for here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost per span&lt;/strong&gt; — you're keeping billions; every byte matters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query latency&lt;/strong&gt; — on-call wants answers in seconds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-instrumentation coverage&lt;/strong&gt; — the fewer things you have to manually instrument, the better.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What I don't care about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full capture. Sampling is fine.&lt;/li&gt;
&lt;li&gt;Long retention. You're debugging last Tuesday, not last fiscal year.&lt;/li&gt;
&lt;li&gt;Per-user accuracy. If a single user's trace got dropped, nobody cares.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The usage-event path
&lt;/h3&gt;

&lt;p&gt;The dedicated billing pipeline. Every billable operation emits a &lt;strong&gt;usage event&lt;/strong&gt; — a small, structured record with everything finance needs and nothing it doesn't.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ue_01HFNGR..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"occurred_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-02-14T18:22:30.145Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"account_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"acc_12345"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resource_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"res_6789"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"operation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"api.request"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dimensions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"region"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"standard"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"units"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"requests"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cpu_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;147&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"egress_bytes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8342&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"idempotency_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"req_abc_20260214182230"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The rules on this path:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unsampled.&lt;/strong&gt; Every billable operation emits exactly one event. No head sampling. No tail sampling. No "approximate."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Durable writes.&lt;/strong&gt; Emitter has a local write-ahead log or durable queue. If the downstream is down, events buffer locally until delivery. No dropped events under partial failure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotency keys.&lt;/strong&gt; Every event has a unique ID (or composite key) so downstream dedup is trivial. This lets you retry safely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema versioned and immutable.&lt;/strong&gt; Once an event shape is shipped, it doesn't mutate. New fields add a new version. Old versions keep working until you intentionally deprecate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long retention.&lt;/strong&gt; Years, usually. Auditors ask for 2023's data in 2027.&lt;/li&gt;
&lt;/ul&gt;
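
&lt;p&gt;A minimal sketch of the durable-write and idempotency rules above, assuming a single-threaded emitter, a plain append-only file standing in for the write-ahead log, and a hypothetical &lt;code&gt;send&lt;/code&gt; callable that returns &lt;code&gt;True&lt;/code&gt; once the downstream acknowledges an event. &lt;code&gt;UsageEmitter&lt;/code&gt; and &lt;code&gt;DedupSink&lt;/code&gt; are illustrative names, not part of any library.&lt;/p&gt;

```python
import json
import os


class UsageEmitter:
    """Append every event to a local log before any delivery attempt."""

    def __init__(self, wal_path, send):
        self.wal_path = wal_path
        self.send = send  # hypothetical callable: returns True once downstream acks

    def emit(self, event):
        # Durable write first, so the event survives a crash or an outage.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps(event) + "\n")
            wal.flush()
            os.fsync(wal.fileno())

    def flush(self):
        """Deliver buffered events; keep the ones that were not acknowledged."""
        if not os.path.exists(self.wal_path):
            return 0
        with open(self.wal_path, encoding="utf-8") as wal:
            pending = [json.loads(line) for line in wal if line.strip()]
        unacked = [event for event in pending if not self.send(event)]
        tmp_path = self.wal_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as out:
            out.writelines(json.dumps(event) + "\n" for event in unacked)
            out.flush()
            os.fsync(out.fileno())
        os.replace(tmp_path, self.wal_path)  # atomic swap of the log file
        return len(pending) - len(unacked)


class DedupSink:
    """Downstream side: redelivery is harmless because keys are unique."""

    def __init__(self):
        self.events = {}

    def accept(self, event):
        # First write wins; a retried event with the same key is ignored.
        self.events.setdefault(event["idempotency_key"], event)
        return True
```

&lt;p&gt;If the process dies between a successful send and the log rewrite, the event is delivered again on the next flush. That is at-least-once delivery, and the idempotency key is what makes it safe.&lt;/p&gt;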

&lt;p&gt;The downstream infrastructure matches: &lt;strong&gt;Kafka or NATS JetStream with high replication factor&lt;/strong&gt; for ingest, &lt;strong&gt;columnar warehouse&lt;/strong&gt; (BigQuery, Snowflake, ClickHouse) for aggregation and query, &lt;strong&gt;separate auth and access control&lt;/strong&gt; from engineering-facing tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the two paths share
&lt;/h3&gt;

&lt;p&gt;Not nothing. They share:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The trace/request ID.&lt;/strong&gt; Usage events include the trace ID of the request that generated them. This is the &lt;em&gt;one&lt;/em&gt; cross-pipeline link that matters — when finance escalates "this user says they were charged for X requests but they swear they only made Y," you want to be able to find the traces of those Y requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenTelemetry as the emission library.&lt;/strong&gt; OTel can emit both spans and custom events. Using it for both keeps the instrumentation codepaths uniform. But the pipelines behind the emitter are different.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The application's definition of an "operation."&lt;/strong&gt; Both pipelines have opinions about what counts as one operation. Keep that definition single-source.&lt;/li&gt;
&lt;/ul&gt;
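
&lt;p&gt;One way to carry the trace ID link is to stamp it on the event at build time. The sketch below assumes the caller already has the request ID and trace ID in hand; &lt;code&gt;build_usage_event&lt;/code&gt; and its parameters are hypothetical, and the key derivation (a hash of request ID plus operation) is just one possible composite key.&lt;/p&gt;

```python
import hashlib
from datetime import datetime, timezone


def build_usage_event(account_id, operation, units, request_id, trace_id):
    """Build one usage event for one billable operation."""
    # Same request and operation always produce the same key, so retries dedupe.
    key_material = f"{request_id}:{operation}".encode("utf-8")
    return {
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "account_id": account_id,
        "operation": operation,
        "units": dict(units),
        "trace_id": trace_id,  # the cross-pipeline link back to the trace store
        "idempotency_key": hashlib.sha256(key_material).hexdigest(),
    }
```

&lt;p&gt;The trace may have been sampled away by the time finance asks, so treat &lt;code&gt;trace_id&lt;/code&gt; as a best-effort pointer, never as the record of use.&lt;/p&gt;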

&lt;h2&gt;
  
  
  Why Head-Sampling Kills Billing
&lt;/h2&gt;

&lt;p&gt;Worth dwelling on the specific thing that breaks when you try to unify.&lt;/p&gt;

&lt;p&gt;Head-based sampling decides whether to record a trace at entry, based on trace ID. It's O(1), stateless, and fair across traffic shapes — the standard default.&lt;/p&gt;

&lt;p&gt;The failure: &lt;strong&gt;at entry time, the system has no idea whether this request will be billable.&lt;/strong&gt; The sampler doesn't know if the user is on a paid plan, if the request will succeed, if it will hit a billable feature. It just picks randomly.&lt;/p&gt;
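
&lt;p&gt;A small simulation of that failure, assuming a hash-of-trace-ID sampler as a stand-in for a real head sampler and a synthetic request list in which every third request is billable. The function names and the 10% ratio are illustrative and not taken from any SDK.&lt;/p&gt;

```python
import hashlib


def head_sampled(trace_id, ratio):
    """Decide at entry, from the trace ID alone, whether to record the trace."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return ratio > bucket


def billable_counts(requests, ratio):
    """Return (true billable count, billable count visible in sampled traces)."""
    actual = sum(1 for r in requests if r["billable"])
    seen = sum(
        1 for r in requests
        if r["billable"] and head_sampled(r["trace_id"], ratio)
    )
    return actual, seen


# Synthetic traffic: the sampler never looks at the "billable" flag.
requests = [
    {"trace_id": f"trace-{i}", "billable": i % 3 == 0} for i in range(30000)
]
actual, seen = billable_counts(requests, ratio=0.10)
print(actual, seen)
```

&lt;p&gt;On this input the sampled view contains roughly a tenth of the 10,000 billable requests. Scaling the sampled count back up by the inverse of the ratio gives an estimate, not an invoice.&lt;/p&gt;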

&lt;p&gt;Tail-based sampling fixes part of this — you decide after the fact, based on span attributes. Now you can keep all errors, all slow requests, all requests from paid users. Better, but still subject to buffering limits. Heavy tail-based samplers sit in front of your trace ingest pipeline and drop spans when buffers fill, which still gives you lossy billing during traffic bursts.&lt;/p&gt;

&lt;p&gt;The only sampler that's correct for billing is "capture everything." And "capture everything" is what the tracing pipeline tries to avoid, because that's what makes it expensive.&lt;/p&gt;

&lt;p&gt;You can do "capture everything for billable operations, sample everything else" in one pipeline. It works. It also ends up being the most complex sampler you've ever written, with an exception branch that duplicates the decision logic from your actual billing code. The dedicated usage-event path is simpler.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cardinality and the Per-User Problem
&lt;/h2&gt;

&lt;p&gt;A related anti-pattern: attaching user ID as a Prometheus label.&lt;/p&gt;

&lt;p&gt;Prometheus (like most metrics systems) stores one time series per label combination. Add a &lt;code&gt;user_id&lt;/code&gt; label to a metric that ten thousand users hit, and you just created ten thousand time series. Add a &lt;code&gt;request_type&lt;/code&gt; label alongside, and that's ten thousand × request-type-count. Cardinality explodes. Your metrics storage bill goes with it.&lt;/p&gt;
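
&lt;p&gt;To put numbers on it: the worst-case series count for one metric is the product of the distinct values of each label. The label names and cardinalities below are made up for illustration.&lt;/p&gt;

```python
from math import prod


def series_count(label_cardinalities):
    """Upper bound on time series for one metric: product of distinct values."""
    return prod(label_cardinalities.values())


base = series_count({"request_type": 8, "status": 5})
with_user = series_count({"request_type": 8, "status": 5, "user_id": 10_000})
print(base, with_user)  # 40 400000
```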

&lt;p&gt;The instinct ("I want to track per-user throughput") is fine; the mechanism is wrong. Metrics with high-cardinality labels are the square peg. Usage events are the round hole. Emit a usage event with &lt;code&gt;account_id&lt;/code&gt; as a dimension, and aggregate per-user in the warehouse at query time.&lt;/p&gt;

&lt;p&gt;Rule I use: &lt;strong&gt;metrics for engineering-facing dashboards, events for business-facing attribution&lt;/strong&gt;. If the label cardinality could exceed ~1,000 distinct values, it belongs in an event, not a label.&lt;/p&gt;
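
&lt;p&gt;The query-time aggregation is a plain group-by. In a warehouse it would be SQL; the sketch below does the same thing in memory, assuming events shaped like the JSON example earlier in this post (&lt;code&gt;account_id&lt;/code&gt; plus a &lt;code&gt;units&lt;/code&gt; map). &lt;code&gt;usage_by_account&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
from collections import defaultdict


def usage_by_account(events):
    """Sum every unit type per account across a batch of usage events."""
    totals = defaultdict(lambda: defaultdict(int))
    for event in events:
        for unit, amount in event["units"].items():
            totals[event["account_id"]][unit] += amount
    # Convert nested defaultdicts to plain dicts for callers.
    return {account: dict(units) for account, units in totals.items()}
```

&lt;p&gt;The account ID is a column here, not a label, so ten thousand accounts cost ten thousand rows in a result set instead of ten thousand live time series.&lt;/p&gt;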

&lt;h2&gt;
  
  
  The Boring Operational Details
&lt;/h2&gt;

&lt;p&gt;Where the two pipelines actually differ in day-to-day ops:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retention&lt;/strong&gt;. Tracing: a few weeks, maybe. Billing store: years. Partitioning the warehouse by date and &lt;code&gt;account_id&lt;/code&gt; makes multi-year queries practical. Archive older partitions to object storage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Access control&lt;/strong&gt;. Traces: engineers. Billing events: accounting + support + an audit-only read path for legal. Not the same principals, not the same ACL model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema governance&lt;/strong&gt;. Traces: OTel semantic conventions, loose. Billing events: your own schema with a proto or Avro definition, version bumps tracked in a migration log, additive only.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reconciliation&lt;/strong&gt;. Billing needs to agree with itself. Daily reconciliation job that asserts "yesterday's event count per user equals the sum of the per-hour counts" catches silent drops early. No equivalent makes sense for tracing.&lt;/p&gt;
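
&lt;p&gt;A sketch of that reconciliation check, assuming the daily and hourly counts come from two independently computed rollups. The input shapes (&lt;code&gt;{account_id: count}&lt;/code&gt; and &lt;code&gt;{(account_id, hour): count}&lt;/code&gt;) and the function name are assumptions made for the example.&lt;/p&gt;

```python
from collections import Counter


def reconcile(daily_counts, hourly_counts):
    """Return {account_id: (daily, summed_hourly)} for accounts that disagree."""
    summed = Counter()
    for (account_id, _hour), count in hourly_counts.items():
        summed[account_id] += count
    mismatches = {}
    # Check accounts present in either rollup, so a missing side is flagged too.
    for account_id in set(daily_counts) | set(summed):
        daily = daily_counts.get(account_id, 0)
        if daily != summed[account_id]:
            mismatches[account_id] = (daily, summed[account_id])
    return mismatches


daily = {"acc_1": 120, "acc_2": 75}
hourly = {
    ("acc_1", 9): 70, ("acc_1", 10): 50,
    ("acc_2", 9): 40, ("acc_2", 10): 30,
}
print(reconcile(daily, hourly))  # {'acc_2': (75, 70)}
```

&lt;p&gt;An empty result means the two rollups agree. Anything else is worth an alert before the invoice run, not after.&lt;/p&gt;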

&lt;p&gt;&lt;strong&gt;Replay&lt;/strong&gt;. When a billing bug is discovered, you need to replay historical events through a fixed pipeline. Kafka's offset model makes this natural; NATS JetStream consumers can likewise start from a chosen sequence number or timestamp. The tracing pipeline rarely needs replay — if the last two weeks of traces have a bug, you shrug and fix forward.&lt;/p&gt;

&lt;h2&gt;
  
  
  When You Can Get Away With One
&lt;/h2&gt;

&lt;p&gt;Small workloads with no audit requirement, usage-based pricing below ~$1/user, and a team of three — one pipeline is fine. Add user attributes to spans, store them all, build a nightly aggregation job, call it billing. It works.&lt;/p&gt;

&lt;p&gt;The threshold where it stops working is somewhere around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Revenue per customer exceeds the cost of a mistake.&lt;/strong&gt; At $10k/month per customer, dropped events can put a $10k invoice in dispute.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The first auditor asks "show me exactly what this customer used in March 2024."&lt;/strong&gt; Unsampled, durable, retrievable, signed: those are table stakes for audit-grade billing, and sampled traces can't meet any of them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Engineering starts wanting cheaper traces.&lt;/strong&gt; When the tracing pipeline outgrows your budget and someone proposes "let's sample more aggressively," you're about to break billing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When any of those lights up, separate the pipelines. The cheapest time to separate is before you've built tools on top of the unified one.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Invest in Splitting the Pipelines
&lt;/h2&gt;

&lt;p&gt;Observability and cost attribution are adjacent problems that optimize for opposite things. A tracing pipeline that compromises on completeness becomes a bad billing pipeline. A billing pipeline that compromises on cardinality and retention becomes a bad tracing pipeline. Building one system to satisfy both usually produces one system that satisfies neither.&lt;/p&gt;

&lt;p&gt;The dual-path design isn't more complex. It's just &lt;em&gt;honest&lt;/em&gt; about the constraints. Same emission library, same operation definition, two paths behind the emitter, each tuned for its job.&lt;/p&gt;

&lt;p&gt;If you're about to launch usage-based pricing and you're planning to compute invoices from your trace store, rethink it now. The sooner you split, the cheaper the split.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/nats-kafka-mqtt-same-category-different-jobs/" rel="noopener noreferrer"&gt;NATS vs Kafka vs MQTT: Same Category, Very Different Jobs&lt;/a&gt; — why the durability choice on the billing path matters so much.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/rpc-vs-nats-who-owns-completion/" rel="noopener noreferrer"&gt;RPC vs NATS: It's Not About Sync vs Async — It's About Who Owns Completion&lt;/a&gt; — completion ownership applies to the emit path, too.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>observability</category>
      <category>billing</category>
      <category>costattribution</category>
      <category>opentelemetry</category>
    </item>
  </channel>
</rss>
