<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Harrison Guo</title>
    <description>The latest articles on DEV Community by Harrison Guo (@harrisonsec).</description>
    <link>https://dev.to/harrisonsec</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3809272%2F593698c5-7201-4bb0-898e-055cdbc0a2d2.png</url>
      <title>DEV Community: Harrison Guo</title>
      <link>https://dev.to/harrisonsec</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/harrisonsec"/>
    <language>en</language>
    <item>
      <title>Agent Retrieval Above the Crossover: A First-Principles Read of CodeGraph</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Mon, 08 Jun 2026 19:49:41 +0000</pubDate>
      <link>https://dev.to/harrisonsec/agent-retrieval-above-the-crossover-a-first-principles-read-of-codegraph-1gd7</link>
      <guid>https://dev.to/harrisonsec/agent-retrieval-above-the-crossover-a-first-principles-read-of-codegraph-1gd7</guid>
      <description>&lt;p&gt;The prior post in this series, &lt;em&gt;&lt;a href="https://harrisonsec.com/blog/agent-retrieval-cost-curve-claude-code-grep-vs-rag/" rel="noopener noreferrer"&gt;Agent Retrieval Is a Cost Curve Problem&lt;/a&gt;&lt;/em&gt;, argued that a viable LLM-symbol-graph would need to satisfy six specific conditions — and that no existing tool had hit all six. The post went live on 2026-05-25; seven days earlier, &lt;a href="https://github.com/colbymchenry/codegraph" rel="noopener noreferrer"&gt;CodeGraph&lt;/a&gt; had hit GitHub trending with exactly those six properties satisfied.&lt;/p&gt;

&lt;p&gt;That's the easy version of the update: framework predicted it, someone shipped it, here's the existence proof. The companion piece (&lt;em&gt;&lt;a href="https://harrisonsec.com/blog/i-tested-codegraph-on-hono-benchmark/" rel="noopener noreferrer"&gt;I Tested CodeGraph on Hono. The Tool-Call Savings Reproduce — the Cost Savings Don't.&lt;/a&gt;&lt;/em&gt;) handles the empirical half — 40 verified-connected runs, a decision matrix, the install-or-not call. Short version of that post: the tool-call savings reproduce on an independent repo (−55%), the &lt;strong&gt;cost&lt;/strong&gt; savings from the vendor benchmark don't (+7% at Hono's size). Fewer steps, not fewer dollars, until your repo is big enough.&lt;/p&gt;

&lt;p&gt;This post is the harder version of the update.&lt;/p&gt;

&lt;p&gt;The interesting question isn't whether CodeGraph works. The interesting question is &lt;strong&gt;why are its specific architectural choices right, and where does the abstraction inevitably leak?&lt;/strong&gt; Answering it gives you the lens for evaluating the next CodeGraph-class tool that ships — and there will be many — without redoing the benchmark each time.&lt;/p&gt;

&lt;p&gt;To answer it concretely rather than abstractly, I read CodeGraph against its own artifact: the SQLite database it writes to &lt;code&gt;.codegraph/codegraph.db&lt;/code&gt;. Every structural claim below is checked against the index it actually built for Hono (CodeGraph v0.9.7: 362 files, 4,128 nodes, 8,225 edges, a 7.4 MB database). The schema turns out to be the clearest statement of the architecture the tool's README never makes.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — CodeGraph's architecture is right for three reasons that aren't obvious from the feature list, and all three are visible in its SQLite schema. (1) &lt;strong&gt;The AST extraction boundary&lt;/strong&gt;: tree-sitter takes what &lt;em&gt;syntax&lt;/em&gt; tells you (4,128 nodes across 13 kinds, 8,225 edges across 7 kinds) and leaves the rest to the LLM. The boundary is literal — references syntax can't resolve go into an &lt;code&gt;unresolved_refs&lt;/code&gt; table instead of becoming fake edges. (2) &lt;strong&gt;SQLite + FTS5, not a vector DB&lt;/strong&gt;: the index is plain relational tables plus a full-text table over symbol names. Zero embedding columns. The queries are exact lookups that B-tree indexes answer in log time; vector search would be solving a harder problem the workload never asks. This is the prior post's cost curve, recursed onto the index tool itself. (3) &lt;strong&gt;The abstraction leaks where syntax diverges from runtime semantics&lt;/strong&gt; — macros, metaprogramming, codegen, JIT binding. CodeGraph tags its few guessed edges with a &lt;code&gt;heuristic&lt;/code&gt; provenance flag (7 of 8,225 on Hono), which is honest; but what tree-sitter can't see at all gets no edge and no flag. Knowing that boundary is what separates a tool you trust from one you cargo-cult.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why this is a first-principles question, not a tool review
&lt;/h2&gt;

&lt;p&gt;Most coverage of CodeGraph reads like &lt;em&gt;"19k stars in a week, here's the install script."&lt;/em&gt; That's news; it isn't analysis. The same coverage will get written for every CodeGraph-class tool that ships in the next 18 months, because the pattern — tree-sitter + local index + MCP server + an instruction snippet that routes the agent to it — is now demonstrated and the ingredients are well known.&lt;/p&gt;

&lt;p&gt;The durable question isn't &lt;em&gt;"is CodeGraph good?"&lt;/em&gt; It's &lt;em&gt;"what makes this class of tool architecturally correct, and how do I evaluate the next one?"&lt;/em&gt; That's what a first-principles read produces. The benchmark in the companion post is one data point; this post is the lens for reading all future data points in the same space.&lt;/p&gt;

&lt;p&gt;If you're deciding on CodeGraph specifically, read the companion. If you're thinking about LLM retrieval as a discipline — or about to bet on, or build, a similar tool — read this.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recap: the six conditions, in 30 seconds
&lt;/h2&gt;

&lt;p&gt;The prior post argued any viable LLM-symbol-graph needed:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No-compile parsing&lt;/strong&gt; — cold start in seconds, not minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Language portability&lt;/strong&gt; — one binary for many languages, not one server per stack&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM-shaped API&lt;/strong&gt; — flat, recordy output the model can digest, not nested LSP hierarchies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Broad enough coverage&lt;/strong&gt; — code-as-structure plus a text-search fallback for everything else&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live update without reindex&lt;/strong&gt; — file-watcher-driven, no manual rebuild&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero-config install&lt;/strong&gt; — single binary, configures the agent automatically&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;CodeGraph hits all six (the field-by-field mapping is near the end of this post). Taking the mapping as established, the interesting move is to ask: &lt;strong&gt;of the design choices CodeGraph made to hit those six, which were forced and which could have gone the other way?&lt;/strong&gt; The forced ones are good engineering. The ones that &lt;em&gt;weren't&lt;/em&gt; forced — where CodeGraph picked something specific over a live alternative — are where the architecture is making a claim, and where the first-principles content lives.&lt;/p&gt;

&lt;p&gt;Three of those choices repay a deep read. The other three (file-watcher update, single-binary distribution, instruction-snippet routing) are well understood in their own fields: OS notifications, package distribution, and prompt engineering, respectively. They amount to "do the obvious thing well." The three that were real decisions are the ones this post takes apart, each against the actual index.&lt;/p&gt;




&lt;h2&gt;
  
  
  Section 1 — The AST extraction boundary: an information-theoretic case
&lt;/h2&gt;

&lt;p&gt;CodeGraph parses source with tree-sitter and extracts a specific subset of the syntax into its graph. You don't have to take the README's word for what that subset is — it's enumerable straight out of the &lt;code&gt;nodes&lt;/code&gt; and &lt;code&gt;edges&lt;/code&gt; tables. On Hono, the 4,128 nodes break down like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Node kind&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Node kind&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;import&lt;/td&gt;
&lt;td&gt;1,033&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;method&lt;/td&gt;
&lt;td&gt;240&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;route&lt;/td&gt;
&lt;td&gt;873&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;interface&lt;/td&gt;
&lt;td&gt;187&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;function&lt;/td&gt;
&lt;td&gt;569&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;property&lt;/td&gt;
&lt;td&gt;169&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;file&lt;/td&gt;
&lt;td&gt;362&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;class&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;type_alias&lt;/td&gt;
&lt;td&gt;358&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;enum_member&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;constant&lt;/td&gt;
&lt;td&gt;247&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;variable / enum&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;And the 8,225 edges, which are the actually interesting part:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Edge kind&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;What it encodes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;contains&lt;/td&gt;
&lt;td&gt;2,874&lt;/td&gt;
&lt;td&gt;structural nesting (file → class → method)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;calls&lt;/td&gt;
&lt;td&gt;2,230&lt;/td&gt;
&lt;td&gt;the call graph&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;references&lt;/td&gt;
&lt;td&gt;1,955&lt;/td&gt;
&lt;td&gt;symbol used here, defined there&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;imports&lt;/td&gt;
&lt;td&gt;1,033&lt;/td&gt;
&lt;td&gt;module dependency edges&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;instantiates&lt;/td&gt;
&lt;td&gt;124&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;new X()&lt;/code&gt; sites&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;extends&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;class/interface inheritance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;implements&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;interface implementation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
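&lt;p&gt;Both breakdowns are a single &lt;code&gt;GROUP BY&lt;/code&gt; each. Here is a minimal sketch in Python, run against a toy in-memory database rather than the real &lt;code&gt;.codegraph/codegraph.db&lt;/code&gt;. The &lt;code&gt;kind&lt;/code&gt; column on &lt;code&gt;nodes&lt;/code&gt; and the &lt;code&gt;source&lt;/code&gt; column on &lt;code&gt;edges&lt;/code&gt; are assumed names, the helper &lt;code&gt;kind_counts&lt;/code&gt; is hypothetical, and the rows are made up for illustration.&lt;/p&gt;

```python
import sqlite3


def kind_counts(conn, table):
    """Return {kind: row count} for a table that has a `kind` column."""
    # Table names cannot be bound as SQL parameters, so whitelist them.
    if table not in ("nodes", "edges"):
        raise ValueError(table)
    rows = conn.execute(f"SELECT kind, COUNT(*) FROM {table} GROUP BY kind")
    return dict(rows.fetchall())


# Toy stand-in for the real index; column names here are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, kind TEXT, name TEXT)")
conn.execute("CREATE TABLE edges (source INTEGER, target INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO nodes (kind, name) VALUES (?, ?)",
    [
        ("function", "fetch"),
        ("function", "dispatch"),
        ("method", "match"),
        ("file", "hono-base.ts"),
    ],
)
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [(4, 1, "contains"), (4, 2, "contains"), (1, 2, "calls"), (2, 3, "calls")],
)

node_kinds = kind_counts(conn, "nodes")
edge_kinds = kind_counts(conn, "edges")
print(node_kinds)
print(edge_kinds)
```

&lt;p&gt;Pointing &lt;code&gt;sqlite3.connect&lt;/code&gt; at a real index file should reproduce the two tables, provided the column names match the assumptions.&lt;/p&gt;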

&lt;p&gt;Now look at what is &lt;strong&gt;not&lt;/strong&gt; there. No "type" nodes. No generic-instantiation edges. No data-flow edges. No "this dynamic dispatch resolves to that concrete method" edges. CodeGraph extracts &lt;code&gt;calls&lt;/code&gt;, &lt;code&gt;references&lt;/code&gt;, &lt;code&gt;extends&lt;/code&gt;, &lt;code&gt;implements&lt;/code&gt; — relationships that are &lt;em&gt;locally apparent in the syntax&lt;/em&gt; — and stops. The first-order reading of this is "because tree-sitter doesn't resolve types." True, but circular. The deeper reading is &lt;strong&gt;why this division of labor is correct for an LLM consumer.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The information-theoretic case
&lt;/h3&gt;

&lt;p&gt;A type-checker (or full LSP) does work the LLM cannot easily redo: resolving &lt;code&gt;obj.method()&lt;/code&gt; to the actual method given the static type of &lt;code&gt;obj&lt;/code&gt;, propagating types through generics, walking an inheritance chain to the method actually invoked. That requires the full compilation context — every transitive import, every type definition, every generic instantiation. The cost is high (a build environment, slow cold start, breaks when the build breaks) and the benefit is narrow: precise semantic resolution that's genuinely hard to reconstruct from local context.&lt;/p&gt;

&lt;p&gt;A syntactic extractor does &lt;em&gt;different&lt;/em&gt; work. It makes the structure of the source queryable, but only the structure that's locally apparent: "function &lt;code&gt;dispatch&lt;/code&gt; defined at &lt;code&gt;hono-base.ts:406&lt;/code&gt;, calls &lt;code&gt;match&lt;/code&gt; here, imported from &lt;code&gt;router&lt;/code&gt;." No types, no generics, no runtime binding — but no compilation either.&lt;/p&gt;

&lt;p&gt;The information-theoretic question is: &lt;strong&gt;given an LLM that's good at semantic reasoning but bad at structural enumeration, what's the right split between what the index provides and what the LLM provides?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CodeGraph's answer: hand the LLM the &lt;em&gt;structural skeleton&lt;/em&gt; — what calls what, what's defined where, what imports what — because enumerating that across thousands of files is exactly the part the LLM is bad at and would burn dozens of tool calls trying to do by hand. Leave the &lt;em&gt;semantic resolution&lt;/em&gt; — what does this call actually invoke at runtime under dynamic dispatch? — to the LLM, because the LLM is reasonable at that once the relevant code is in its context, and baking a type resolver into the index would multiply the build cost for a recovery the LLM mostly doesn't need.&lt;/p&gt;

&lt;p&gt;The clean way to see this boundary is the &lt;code&gt;contains&lt;/code&gt; + &lt;code&gt;calls&lt;/code&gt; + &lt;code&gt;references&lt;/code&gt; edges (7,059 of the 8,225) versus the things that &lt;em&gt;aren't&lt;/em&gt; edges at all. When the companion benchmark's Q1 asked how a &lt;code&gt;GET /users/:id&lt;/code&gt; request reaches its handler, what CodeGraph gave Claude Code was the call chain — &lt;code&gt;fetch&lt;/code&gt; → &lt;code&gt;dispatch&lt;/code&gt; → &lt;code&gt;match&lt;/code&gt; — as graph edges. What it did &lt;em&gt;not&lt;/em&gt; give, and didn't try to, was which concrete &lt;code&gt;match&lt;/code&gt; implementation runs given Hono's &lt;code&gt;SmartRouter&lt;/code&gt; picking &lt;code&gt;RegExpRouter&lt;/code&gt; at runtime. The graph located the players; the LLM read the three files and resolved the dispatch. That's the split working as designed: enumeration from the index, resolution from the model.&lt;/p&gt;

&lt;h3&gt;
  
  
  The boundary is a literal table
&lt;/h3&gt;

&lt;p&gt;Here's the detail that turns this from an argument into an observation. When tree-sitter sees a reference it cannot statically resolve to a definition, CodeGraph does not invent an edge. It writes a row to a separate &lt;code&gt;unresolved_refs&lt;/code&gt; table — name, location, the node it came from, no target. The schema has a first-class place for &lt;em&gt;"I saw a use here, I could not prove what it binds to."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;On Hono, &lt;code&gt;unresolved_refs&lt;/code&gt; has zero rows, and it was just as empty on every other repo I indexed to check it (Section 3 has that result, and it's not the one I expected). The empty table isn't the interesting part; the table &lt;em&gt;existing&lt;/em&gt; is the architecture stating its own boundary. A tool that faked those edges, guessing a target to make the graph look complete, would be lying to the LLM in exactly the way that produces confident wrong answers. CodeGraph's choice to record the unresolved reference &lt;em&gt;as unresolved&lt;/em&gt; is the same discipline a good cache has when it marks an entry stale instead of serving it: the honest move is to represent "don't know," not to paper over it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why this matters beyond CodeGraph
&lt;/h3&gt;

&lt;p&gt;This boundary — &lt;em&gt;syntactic graph for the index, semantic reasoning for the LLM&lt;/em&gt; — is the line the next generation of LLM-coding tools will either hold or violate. The violations are predictable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Too far toward semantics in the index&lt;/strong&gt;: a tool that tries to be a full LSP-plus for the LLM. High build cost, slow cold start, fragile on broken builds, marginal benefit because the LLM can do that resolution from local context anyway.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Too far toward raw text in the index&lt;/strong&gt;: a tool that's just "grep with nicer indexing" — fast and broad, but it doesn't hand the LLM the structural skeleton it actually needs. That's the position grep+loop already occupies; an index there adds little.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CodeGraph sits in the middle, and that position is right for current LLM capability. As models get better at semantic resolution the line will move one way; as tool-loop iteration gets cheaper it will move the other. But the &lt;em&gt;principle&lt;/em&gt; — that there's an information-theoretic boundary worth picking, and that picking it requires modeling the LLM's real strengths and weaknesses — is the durable take. The right way to evaluate any new LLM-retrieval tool starts here: what does it choose to extract, what does it leave for the LLM, and is that split calibrated for what an LLM is actually good at?&lt;/p&gt;




&lt;h2&gt;
  
  
  Section 2 — SQLite + FTS5 vs vector DB: the cost curve, recursed
&lt;/h2&gt;

&lt;p&gt;CodeGraph stores its symbol graph in a local SQLite database. Not Chroma. Not Pinecone. Not Weaviate. Not Qdrant. The full table list from Hono's index:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nodes              edges              files
unresolved_refs    nodes_fts          schema_versions
project_metadata   (+ FTS5 shadow tables: nodes_fts_data/idx/docsize/config)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;nodes&lt;/code&gt; and &lt;code&gt;edges&lt;/code&gt; are plain relational tables. &lt;code&gt;nodes_fts&lt;/code&gt; is an FTS5 virtual table. Searching the whole schema for an embedding column, a vector type, a float array — anything ANN-shaped — returns nothing. The only &lt;code&gt;BLOB&lt;/code&gt; columns are FTS5's own internal segment storage (&lt;code&gt;nodes_fts_data&lt;/code&gt;), not vectors. &lt;strong&gt;There are no embeddings in CodeGraph.&lt;/strong&gt; That's not an omission; it's the architecture, and it's the same call the prior post made one level down.&lt;/p&gt;
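&lt;p&gt;That schema search is easy to repeat on any SQLite index. A sketch, assuming nothing about CodeGraph beyond the table names listed above: the helper &lt;code&gt;blob_or_vector_columns&lt;/code&gt; is hypothetical, the name heuristics ("embed", "vector") are my own, and the toy schema below only imitates the shape of the real one. The one &lt;code&gt;BLOB&lt;/code&gt; it finds mirrors the way an FTS5 shadow table stores its segments.&lt;/p&gt;

```python
import sqlite3


def blob_or_vector_columns(conn):
    """List (table, column, declared_type) for anything that could hold an embedding."""
    hits = []
    tables = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, name, decl_type, *_rest in conn.execute(f'PRAGMA table_info("{table}")'):
            looks_vector = any(word in name.lower() for word in ("embed", "vector"))
            if decl_type.upper() == "BLOB" or looks_vector:
                hits.append((table, name, decl_type))
    return hits


# Toy schema: two relational tables plus one FTS5-style shadow table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, kind TEXT, name TEXT, signature TEXT)")
conn.execute("CREATE TABLE edges (source INTEGER, target INTEGER, kind TEXT, provenance TEXT)")
conn.execute("CREATE TABLE nodes_fts_data (id INTEGER PRIMARY KEY, block BLOB)")

hits = blob_or_vector_columns(conn)
print(hits)
```

&lt;p&gt;Every hit still needs a human look: a &lt;code&gt;BLOB&lt;/code&gt; is only a candidate, and here the single candidate is full-text segment storage, not a vector.&lt;/p&gt;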

&lt;h3&gt;
  
  
  The cost-curve frame, recursed
&lt;/h3&gt;

&lt;p&gt;The prior post argued vector RAG over a codebase pays a build cost (chunk + embed every file), a maintain cost (re-embed on change, reconcile cross-chunk references), and a low per-query cost (ANN search + rerank) — and that for most repos this loses to grep+loop's &lt;em&gt;(zero build, zero maintain, per-query round-trips)&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Apply that exact frame to CodeGraph's own storage. If CodeGraph used a vector DB for its symbols, it would pay: embed every symbol's signature and body on index; re-embed on every file save (the file-watcher would have to fire embedding calls); ANN search per query. That's the same curve the prior post argued &lt;em&gt;against&lt;/em&gt; — and CodeGraph's &lt;em&gt;workload&lt;/em&gt; doesn't justify it, because the queries it serves are &lt;strong&gt;exact lookups&lt;/strong&gt;, not similarity searches. The schema proves the queries are exact by the indexes it builds for them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"Find symbol &lt;code&gt;getUserById&lt;/code&gt;"&lt;/strong&gt; → &lt;code&gt;idx_nodes_name&lt;/code&gt;, and &lt;code&gt;idx_nodes_lower_name&lt;/code&gt; for case-insensitive matches. A B-tree probe, microseconds. FTS5 (&lt;code&gt;nodes_fts&lt;/code&gt; over &lt;code&gt;name, qualified_name, docstring, signature&lt;/code&gt;) handles the fuzzier "name contains" variants. No similarity math.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Who calls &lt;code&gt;Context.set&lt;/code&gt;?"&lt;/strong&gt; → &lt;code&gt;idx_edges_target_kind&lt;/code&gt; (a reverse-edge index on &lt;code&gt;(target, kind)&lt;/code&gt;). Reverse adjacency lookup, deterministic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"What does &lt;code&gt;dispatch&lt;/code&gt; call?"&lt;/strong&gt; → &lt;code&gt;idx_edges_source_kind&lt;/code&gt; (the forward-edge index). Forward adjacency, deterministic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Trace &lt;code&gt;fetch&lt;/code&gt; → &lt;code&gt;db_query&lt;/code&gt;"&lt;/strong&gt; → repeated forward-edge hops over those same indexed edges. Graph traversal on stored adjacency, no vectors anywhere in the loop.&lt;/li&gt;
&lt;/ul&gt;
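&lt;p&gt;The four query shapes above can be sketched against a toy database. The index names are the ones the post reads out of the schema; everything else (the reduced column set, the sample rows, and the helpers &lt;code&gt;callers&lt;/code&gt;, &lt;code&gt;callees&lt;/code&gt;, and &lt;code&gt;trace&lt;/code&gt;) is an assumption for illustration, not CodeGraph's implementation.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE edges (source INTEGER, target INTEGER, kind TEXT);
CREATE INDEX idx_nodes_name ON nodes (name);
CREATE INDEX idx_edges_source_kind ON edges (source, kind);
CREATE INDEX idx_edges_target_kind ON edges (target, kind);
INSERT INTO nodes (id, name) VALUES (1, 'fetch'), (2, 'dispatch'), (3, 'match'), (4, 'compose');
INSERT INTO edges VALUES (1, 2, 'calls'), (2, 3, 'calls'), (2, 4, 'calls'), (1, 3, 'references');
""")


def callers(conn, name):
    """Reverse adjacency: who has a `calls` edge pointing at `name`."""
    sql = """SELECT src.name FROM nodes AS dst
             JOIN edges AS e ON e.target = dst.id AND e.kind = 'calls'
             JOIN nodes AS src ON src.id = e.source
             WHERE dst.name = ? ORDER BY src.name"""
    return [row[0] for row in conn.execute(sql, (name,))]


def callees(conn, name):
    """Forward adjacency: what `name` has a `calls` edge to."""
    sql = """SELECT dst.name FROM nodes AS src
             JOIN edges AS e ON e.source = src.id AND e.kind = 'calls'
             JOIN nodes AS dst ON dst.id = e.target
             WHERE src.name = ? ORDER BY dst.name"""
    return [row[0] for row in conn.execute(sql, (name,))]


def trace(conn, start, goal):
    """Breadth-first search over forward `calls` edges; one shortest path or None."""
    paths, seen = [[start]], {start}
    while paths:
        path = paths.pop(0)
        if path[-1] == goal:
            return path
        for nxt in callees(conn, path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                paths.append(path + [nxt])
    return None


# Ask SQLite how it would answer the reverse-edge lookup.
plan = " ".join(
    row[3]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT source FROM edges WHERE target = ? AND kind = 'calls'",
        (3,),
    )
)

print(callers(conn, "match"))
print(callees(conn, "dispatch"))
print(trace(conn, "fetch", "match"))
print(plan)
```

&lt;p&gt;The query plan is the useful part: it names the reverse-edge index, which is the schema-level evidence that "who calls this" is an indexed probe and not a scan or a similarity search.&lt;/p&gt;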

&lt;p&gt;Those forward and reverse edge indexes are the whole ballgame. Callers and callees — the queries a code-intelligence tool exists to answer — are a single indexed adjacency lookup in each direction. Vector search cannot do this &lt;em&gt;better&lt;/em&gt;; it can only do it &lt;em&gt;fuzzier and more expensively&lt;/em&gt;, because "who calls this function" has an exact answer that an approximate-nearest-neighbor index would blur.&lt;/p&gt;

&lt;p&gt;The only queries where vector search genuinely helps are semantic ones with no symbol to anchor on — &lt;em&gt;"show me the code that does authentication."&lt;/em&gt; CodeGraph doesn't serve those. The LLM does, by issuing a sequence of exact structural queries and reasoning across the results. The division is the same one from Section 1: the index answers the exact-lookup questions deterministically; the LLM answers the fuzzy-intent questions by orchestrating exact lookups. Neither needs an embedding.&lt;/p&gt;

&lt;h3&gt;
  
  
  The recursion as a design principle
&lt;/h3&gt;

&lt;p&gt;What's elegant — and worth surfacing for its own sake — is that &lt;strong&gt;CodeGraph's storage choice is consistent with the retrieval philosophy from the prior post, one level up.&lt;/strong&gt; Both arguments are the same sentence: &lt;em&gt;exact-lookup workloads should use exact-lookup tools; approximation overhead is paid only where approximation pays back.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If CodeGraph had reached for Chroma over FTS5, it would have violated its own retrieval philosophy — paying embedding and ANN cost to answer questions that have exact answers. That it didn't, that the designer recognized the symbol-graph workload is exact-lookup-shaped and picked the cheapest exact-lookup storage available, is what makes the architecture coherent across layers rather than just locally clever.&lt;/p&gt;

&lt;p&gt;The next tool in this class will face the same fork, and most will reach for a vector DB by default, because "AI tooling = vector store" is the reflex. CodeGraph's choice is the corrective: ask what your &lt;em&gt;workload&lt;/em&gt; needs, not what the category's fashion suggests. That's the cost-curve frame functioning as a meta-design tool — every time you add a layer to an LLM stack, ask which side of the curve the new layer's workload sits on, and pick storage and algorithm from the answer, not the trend.&lt;/p&gt;




&lt;h2&gt;
  
  
  Section 3 — Where CodeGraph's abstraction leaks
&lt;/h2&gt;

&lt;p&gt;Every index lies a little. The question is &lt;em&gt;where&lt;/em&gt; it lies and &lt;em&gt;whether you can tell when it does.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;CodeGraph's graph is built from syntactic extraction, so &lt;strong&gt;anywhere the runtime semantics diverge from the syntactic structure, the graph is incomplete in a way that's hard to detect from the index alone.&lt;/strong&gt; The leak isn't a bug; it's the abstraction working as designed, at a layer that structurally cannot see certain phenomena. There's a tell for it in the schema, and there's a part the schema can't tell you about — and the difference between those two is the whole point.&lt;/p&gt;

&lt;h3&gt;
  
  
  The honest part: the provenance column
&lt;/h3&gt;

&lt;p&gt;CodeGraph stamps every edge with a &lt;code&gt;provenance&lt;/code&gt; value. On Hono, 8,218 of the 8,225 edges have empty provenance — meaning &lt;em&gt;direct from the syntax tree&lt;/em&gt; — and exactly &lt;strong&gt;7&lt;/strong&gt; carry the value &lt;code&gt;heuristic&lt;/code&gt;. Those seven are edges CodeGraph's framework adapters &lt;em&gt;inferred&lt;/em&gt; from a recognized pattern rather than read off the AST: route registrations, framework binding conventions, the handful of cases where a tool that "supports Hono / Flask / Spring" pattern-matches a known idiom and synthesizes an edge the raw syntax doesn't spell out.&lt;/p&gt;

&lt;p&gt;That &lt;code&gt;heuristic&lt;/code&gt; tag is the architecture being honest. It is, in the vocabulary of the memory post in this series, an &lt;em&gt;arrow&lt;/em&gt;: every edge points back to &lt;em&gt;how it was derived&lt;/em&gt;, and the seven guessed edges are flagged as guesses. A consumer that cared could treat heuristic edges with less trust than syntactic ones. That's good cache hygiene — the index records the confidence of its own entries instead of presenting all of them as equally certain.&lt;/p&gt;
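&lt;p&gt;A consumer that wants to act on that flag only has to split on it. A minimal sketch, assuming a &lt;code&gt;provenance&lt;/code&gt; column that is empty for syntactic edges and holds the string &lt;code&gt;heuristic&lt;/code&gt; for inferred ones. The helper &lt;code&gt;split_by_trust&lt;/code&gt;, the text-valued &lt;code&gt;source&lt;/code&gt; and &lt;code&gt;target&lt;/code&gt; columns, and the sample rows are hypothetical.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (source TEXT, target TEXT, kind TEXT, provenance TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?, ?)",
    [
        ("fetch", "dispatch", "calls", ""),
        ("dispatch", "match", "calls", ""),
        ("app.get", "handler", "references", "heuristic"),  # inferred by an adapter
    ],
)


def split_by_trust(conn):
    """Separate edges read off the syntax tree from edges an adapter inferred."""
    trusted, guessed = [], []
    query = "SELECT source, target, kind, provenance FROM edges ORDER BY rowid"
    for source, target, kind, provenance in conn.execute(query):
        # Anything not explicitly tagged `heuristic` (empty string or NULL) is syntactic.
        bucket = guessed if provenance == "heuristic" else trusted
        bucket.append((source, kind, target))
    return trusted, guessed


trusted, guessed = split_by_trust(conn)
print(len(trusted), "syntactic edges")
print(guessed)
```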

&lt;h3&gt;
  
  
  The part the schema can't tell you about
&lt;/h3&gt;

&lt;p&gt;Here's the catch, and it's the one that matters: &lt;strong&gt;the provenance column only flags edges that exist.&lt;/strong&gt; The dangerous leak isn't a guessed edge that's marked as guessed. It's the edge that &lt;em&gt;should&lt;/em&gt; exist and isn't there at all — because the relationship lives in a layer tree-sitter cannot see, so there's nothing to extract, nothing to tag, and nothing to warn you. The four big zones where this happens:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Macro-heavy code.&lt;/strong&gt; In Rust, &lt;code&gt;vec![1, 2, 3]&lt;/code&gt; expands at compile time into code the AST never contains; the graph shows a &lt;code&gt;vec!&lt;/code&gt; invocation, not the standard-library allocation and conversion calls the expansion actually runs. For procedural macros (&lt;code&gt;#[derive(...)]&lt;/code&gt;, attribute macros), the &lt;em&gt;generated implementation&lt;/em&gt; is what executes, and CodeGraph can't see into it without running the compiler, which would forfeit the no-compile property that Section 1 showed is the whole point. Same shape in C/C++ preprocessor-heavy code, Lisp/Clojure macros, Elixir compile-time metaprogramming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metaprogramming.&lt;/strong&gt; Python decorators routinely rewrite functions: &lt;code&gt;@dataclass&lt;/code&gt; synthesizes &lt;code&gt;__init__&lt;/code&gt;/&lt;code&gt;__repr__&lt;/code&gt;/&lt;code&gt;__eq__&lt;/code&gt;; &lt;code&gt;@app.route("/users")&lt;/code&gt; registers a handler with a router. Tree-sitter sees the decorator and the function as adjacent syntax, not the synthesis or the registration. CodeGraph's framework adapters catch the &lt;em&gt;common&lt;/em&gt; cases — and that's literally what the 7 heuristic edges on Hono are — but arbitrary user-defined decorators that mutate behavior are invisible. Ruby &lt;code&gt;method_missing&lt;/code&gt;, Python &lt;code&gt;__getattr__&lt;/code&gt;, Java reflection: same story. The graph confidently returns "no callers" for a method invoked entirely through reflection, and the LLM, trusting structured output, may hand you a confidently wrong blast radius.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generated code.&lt;/strong&gt; Protobuf, GraphQL codegen, OpenAPI clients, ORM model generation (Prisma, SQLAlchemy declarative), JSX/Svelte compilation — the code the runtime executes isn't the code in source control. It lives in &lt;code&gt;build/&lt;/code&gt;, &lt;code&gt;dist/&lt;/code&gt;, &lt;code&gt;.cache/&lt;/code&gt;, places &lt;code&gt;.gitignore&lt;/code&gt; excludes. CodeGraph indexes what's checked in; the generated layer is outside the boundary. "Who implements &lt;code&gt;UserService&lt;/code&gt;?" returns the hand-written interface, not the generated stub that implements it on the wire. Any source-only index has this; it's worth naming because it interacts badly with the user's instinct that &lt;em&gt;an "AST graph" must be complete.&lt;/em&gt; It's complete over the source it indexed — and the generated layer was never in that source.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JIT and runtime-registered bindings.&lt;/strong&gt; DI containers (Spring, Guice, Dagger, ASP.NET service collection), &lt;code&gt;FastAPI Depends&lt;/code&gt;, plugin systems with runtime registration, and — the one the companion benchmark hit directly — &lt;strong&gt;middleware chains composed at app startup.&lt;/strong&gt; Hono's &lt;code&gt;app.use(...)&lt;/code&gt; builds the middleware array at runtime; tree-sitter sees the &lt;code&gt;use&lt;/code&gt; call sites and the handler as unconnected syntax. When the benchmark's Q2 asked Claude Code to trace the middleware call stack, what &lt;code&gt;codegraph_trace&lt;/code&gt; could return was the &lt;em&gt;syntactic&lt;/em&gt; call chain through &lt;code&gt;compose()&lt;/code&gt; — accurate as far as it goes, and genuinely fewer steps than baseline grep — but the actual runtime ordering of middlewares is assembled by &lt;code&gt;app.use&lt;/code&gt; calls scattered across the app, which the graph doesn't compose. The trace looked authoritative and was structurally real; it just wasn't the runtime composition, and only someone who knew the leak zone would know to check.&lt;/p&gt;

&lt;h3&gt;
  
  
  The empirical check, and the null result that sharpens it
&lt;/h3&gt;

&lt;p&gt;I expected &lt;code&gt;unresolved_refs&lt;/code&gt; to be where this shows up — index a macro-heavy repo, watch the table fill. So I indexed three to test it: Hono (TypeScript), &lt;a href="https://github.com/pallets/click" rel="noopener noreferrer"&gt;click&lt;/a&gt; (Python, decorator-heavy), and &lt;a href="https://github.com/ron-rs/ron" rel="noopener noreferrer"&gt;ron&lt;/a&gt; (a Rust crate leaning on &lt;code&gt;derive&lt;/code&gt; macros and serde). &lt;code&gt;unresolved_refs&lt;/code&gt; was &lt;strong&gt;zero on all three&lt;/strong&gt;; heuristic edges were 7, 0, and 0. The null result &lt;em&gt;is&lt;/em&gt; the finding. A &lt;code&gt;#[derive(Serialize)]&lt;/code&gt; impl never appears as an unresolved reference, because nothing in the source ever wrote a reference to it to leave dangling — the impl only exists after macro expansion. &lt;code&gt;codegraph callers serialize&lt;/code&gt; on ron returns its seven real syntactic callers and silently omits whatever the derive generates, with no flag and no empty-table warning, because from the index's point of view nothing is missing. And that is the trap. &lt;strong&gt;An empty &lt;code&gt;unresolved_refs&lt;/code&gt; table reads like a clean bill of health, but on derive-heavy or reflection-heavy code it means the opposite of "everything resolved" — it means the thing that didn't resolve never left a trace to flag.&lt;/strong&gt; The table catches references it can't resolve; it cannot catch code that was never written down to reference. That's the leak that costs you: not the guess that gets flagged, but the absence that looks exactly like completeness. It's the same failure shape as the memory post's &lt;em&gt;"could" stored as "did"&lt;/em&gt; — the dangerous error is always the one that wears the face of a correct answer.&lt;/p&gt;
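&lt;p&gt;The same failure shape fits in a few lines. In this sketch Python's built-in &lt;code&gt;ast&lt;/code&gt; module stands in for tree-sitter, and &lt;code&gt;syntactic_callers&lt;/code&gt; is a hypothetical toy extractor, not CodeGraph's. The sample source has one method called both directly and through &lt;code&gt;getattr&lt;/code&gt;; only the direct call is visible to syntax, yet both run.&lt;/p&gt;

```python
import ast

SOURCE = '''
class Service:
    def audit(self):
        return "audited"

def direct(svc):
    return svc.audit()

def reflective(svc, action):
    return getattr(svc, action)()
'''


def syntactic_callers(source, callee):
    """Functions whose bodies contain a call spelled `callee(...)` or `x.callee(...)`."""
    found = set()
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                target = node.func
                # Attribute calls carry `.attr`, bare-name calls carry `.id`;
                # a call on the result of another call carries neither.
                name = getattr(target, "attr", None) or getattr(target, "id", None)
                if name == callee:
                    found.add(func.name)
    return sorted(found)


callers = syntactic_callers(SOURCE, "audit")

# Execute the same source to show the reflective path really reaches `audit`.
namespace = {}
exec(SOURCE, namespace)
runtime_result = namespace["reflective"](namespace["Service"](), "audit")

print(callers)
print(runtime_result)
```

&lt;p&gt;The extractor reports one caller and raises nothing about the second, because the reflective call never wrote the name &lt;code&gt;audit&lt;/code&gt; anywhere a parser could find it.&lt;/p&gt;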

&lt;h3&gt;
  
  
  Why mapping the leaks matters
&lt;/h3&gt;

&lt;p&gt;A tool you trust everywhere is a tool you stop checking. &lt;strong&gt;The four zones above are where the LLM, trusting the graph, gives you confidently wrong answers&lt;/strong&gt; — and those are the failures that cost real engineering time, because the answer &lt;em&gt;looks right&lt;/em&gt; and you have no reason to second-guess it.&lt;/p&gt;

&lt;p&gt;The practical rule is small. Inside one of these zones (heavy macros, reflection/DI, codegen-heavy projects, runtime-composed bindings) CodeGraph is still a fine &lt;em&gt;starting point&lt;/em&gt;, but the LLM's answer has to be cross-checked against the runtime, not against the graph. Outside them, which covers most application code in most languages and most of what most people query, the graph is enough. The provenance column tells you which &lt;em&gt;present&lt;/em&gt; edges were guessed; nothing tells you which &lt;em&gt;absent&lt;/em&gt; edges were never seen. That asymmetry is the actual trust boundary, and it's the thing to internalize before you wire any syntactic index into an agent's decision loop. Joel Spolsky named this pattern in 2002 as the Law of Leaky Abstractions: every non-trivial abstraction leaks to some degree, and you pay for the leak precisely when you've forgotten the abstraction is there. CodeGraph is the latest data point in a very old series.&lt;/p&gt;




&lt;h2&gt;
  
  
  Mapping CodeGraph to the six conditions
&lt;/h2&gt;

&lt;p&gt;Field-by-field, how CodeGraph hits each condition from &lt;em&gt;Agent Retrieval Is a Cost Curve Problem&lt;/em&gt;. Compressed; the prior post defines the conditions, the companion post applies them empirically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. No-compile parsing.&lt;/strong&gt; Tree-sitter parses source into an AST with no build invocation, no dependency resolution, no language environment. On Hono, 362 files indexed to 4,128 nodes and 8,225 edges in &lt;strong&gt;1.7 seconds&lt;/strong&gt;; the published 7-repo benchmark reports first-index on the order of minutes for VS Code-scale (~30k files), all subsequent updates incremental. LSP needs &lt;code&gt;tsc&lt;/code&gt; / &lt;code&gt;cargo check&lt;/code&gt; / &lt;code&gt;mvn&lt;/code&gt;; CodeGraph reads raw text. &lt;strong&gt;Met.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Language portability.&lt;/strong&gt; ~19 languages via tree-sitter, plus framework adapters for route-aware extraction (Hono's 873 &lt;code&gt;route&lt;/code&gt; nodes come from one of them). One binary, no per-language server. &lt;strong&gt;Met.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. LLM-shaped API.&lt;/strong&gt; Here the scaffold version of this post — and a lot of the casual coverage — gets a fact wrong worth correcting precisely. The CLI exposes a dozen commands (&lt;code&gt;query&lt;/code&gt;, &lt;code&gt;callers&lt;/code&gt;, &lt;code&gt;callees&lt;/code&gt;, &lt;code&gt;impact&lt;/code&gt;, &lt;code&gt;affected&lt;/code&gt;, &lt;code&gt;context&lt;/code&gt;, …). But the &lt;strong&gt;MCP server exposes exactly five tools&lt;/strong&gt; to the agent: &lt;code&gt;codegraph_search&lt;/code&gt; (locations only), &lt;code&gt;codegraph_context&lt;/code&gt; (described in its own schema as the &lt;em&gt;PRIMARY tool, call FIRST for any how-does-X-work question&lt;/em&gt;), &lt;code&gt;codegraph_node&lt;/code&gt; (one symbol plus its callers/callees trail), &lt;code&gt;codegraph_explore&lt;/code&gt; (several related symbols in one capped call), and &lt;code&gt;codegraph_trace&lt;/code&gt; (the call path between two symbols). The narrowing is the design: the human CLI gets &lt;code&gt;impact&lt;/code&gt; and &lt;code&gt;affected&lt;/code&gt; as separate verbs; the agent gets a &lt;em&gt;context-first&lt;/em&gt; surface of five flat tools, each returning &lt;code&gt;{symbol, file, line, snippet, related[]}&lt;/code&gt;-shaped records, with the instruction snippet steering it to &lt;code&gt;codegraph_context&lt;/code&gt; before anything else. Ten tools would be worse for an LLM than five; CodeGraph picked five. &lt;strong&gt;Met, deliberately.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Coverage breadth.&lt;/strong&gt; Symbol graph for structure; FTS5 over &lt;code&gt;name, qualified_name, docstring, signature&lt;/code&gt; for text-fallback; Claude Code's native Grep stays enabled for everything outside the index. &lt;strong&gt;Partially met — the correct partial.&lt;/strong&gt;&lt;/p&gt;
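
&lt;p&gt;As a rough illustration of the text-fallback layer, here is a minimal FTS5 table over the same four fields using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module. The table name &lt;code&gt;symbols&lt;/code&gt;, the sample rows, and the helper names are assumptions made for the sketch, not CodeGraph's actual schema, and it assumes the local SQLite build has FTS5 compiled in.&lt;/p&gt;

```python
import sqlite3

# Hypothetical rows: (name, qualified_name, docstring, signature)
ROWS = [
    ("compose", "hono.compose", "Compose middleware into one handler", "compose(middleware)"),
    ("match", "router.match", "Match a path against registered routes", "match(method, path)"),
]

def build_index(rows):
    """Create an in-memory FTS5 table and load the symbol rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE symbols USING fts5(name, qualified_name, docstring, signature)"
    )
    conn.executemany("INSERT INTO symbols VALUES (?, ?, ?, ?)", rows)
    return conn

def search(conn, term):
    """Return symbol names whose indexed text matches the term, best match first."""
    cur = conn.execute(
        "SELECT name FROM symbols WHERE symbols MATCH ? ORDER BY rank", (term,)
    )
    return [row[0] for row in cur]

conn = build_index(ROWS)
print(search(conn, "middleware"))
print(search(conn, "routes"))
```

&lt;p&gt;A keyword that appears only in a docstring still finds its symbol, which is the job of the fallback: cover what the structural edges do not.&lt;/p&gt;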

&lt;p&gt;&lt;strong&gt;5. Live update without reindex.&lt;/strong&gt; OS file-watcher with a short debounce; a save re-parses the touched file and re-resolves dependents' import edges. &lt;strong&gt;Met.&lt;/strong&gt;&lt;/p&gt;
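
&lt;p&gt;The debounce step can be sketched as follows. This is a generic pattern under assumed names (&lt;code&gt;Debouncer&lt;/code&gt;, &lt;code&gt;record&lt;/code&gt;, &lt;code&gt;due&lt;/code&gt;) and an assumed 0.2-second delay, not CodeGraph's watcher code. Timestamps are passed in explicitly so the logic can be exercised without a real file-watcher.&lt;/p&gt;

```python
class Debouncer:
    """Coalesce bursts of save events so each file is re-parsed once per burst."""

    def __init__(self, delay_s):
        self.delay_s = delay_s
        self.last_seen = {}  # path -> timestamp of the most recent event

    def record(self, path, now):
        """Note a file event; a later event for the same path resets its timer."""
        self.last_seen[path] = now

    def due(self, now):
        """Return paths that have been quiet for at least delay_s, and forget them."""
        ready = sorted(p for p, t in self.last_seen.items() if now - t >= self.delay_s)
        for p in ready:
            del self.last_seen[p]
        return ready

d = Debouncer(delay_s=0.2)
d.record("src/hono-base.ts", now=0.00)
d.record("src/hono-base.ts", now=0.05)   # editor writes twice during one save
d.record("src/context.ts", now=0.30)
first = d.due(now=0.31)    # only the file that has been quiet long enough
second = d.due(now=0.60)   # the other file becomes due later
print(first, second)
```

&lt;p&gt;Two writes to the same file inside the window trigger a single re-parse, which is what keeps a save from turning into a reindex storm.&lt;/p&gt;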

&lt;p&gt;&lt;strong&gt;6. Zero-config install.&lt;/strong&gt; Single binary, one-line install, auto-detects the agent, writes the MCP config and the instruction snippet, then &lt;code&gt;codegraph init -i&lt;/code&gt; builds the index. About ten minutes from curiosity to a working setup on repos under ~1,000 files. &lt;strong&gt;Met.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Six for six. The architecture the prior post argued was theoretically right but practically missing exists, in production, with a working installer — and, read against its own schema, the choices hold up under inspection rather than just on the landing page.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this says about LLM retrieval as a discipline
&lt;/h2&gt;

&lt;p&gt;Three things, in increasing order of generality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The right LLM-index design is not a copy of human-IDE design.&lt;/strong&gt; Sourcegraph and LSP were built for a human reading one precise answer; an LLM reads many cheap rounds and reasons across them. The architectures should differ, and CodeGraph's choices — tree-sitter not LSP, five flat MCP tools not a nested LSP API, FTS5 not vectors — are evidence of someone designing for the actual consumer instead of porting an existing design. The framework predicts the design space, and the interesting variation between the tools that will fill it is not in the six conditions (those are now the table stakes) but in the &lt;em&gt;ranking&lt;/em&gt; layer — how each one orders the symbols a query surfaces. That's where the next tool will try to win, and where the next benchmark should aim.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The cost-curve frame is recursive.&lt;/strong&gt; It applies to every layer of an LLM stack, including the tools that wrap the LLM. CodeGraph's FTS5-not-Chroma choice is the same shape as the original grep-not-RAG choice. Use it as a meta-design tool: at every layer, ask which side of the curve the workload sits on, and let that pick the storage and the algorithm.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The abstraction leaks are the trust boundary — and trust, in the end, has to terminate at the source.&lt;/strong&gt; This is the thread that runs through the whole series. CodeGraph's graph is a &lt;em&gt;derived view&lt;/em&gt; of the source: a cache. Its &lt;code&gt;heuristic&lt;/code&gt; provenance tags and its &lt;code&gt;unresolved_refs&lt;/code&gt; table are the parts where it keeps an arrow back to that source and is honest about what it did and didn't see. But a syntactic graph is still a lossy projection of a running program, and the leak zones are exactly where the projection drops information that only exists at runtime. The discipline that falls out of this is the same one the retrieval post and the memory post arrived at from their own directions: &lt;strong&gt;a derived artifact is trustworthy only where you can check it against the source that produced it.&lt;/strong&gt; CodeGraph is fast and exact in the 80% of code where syntax determines structure, and quietly incomplete in the 20% where it doesn't — and the only way to stay out of the failure modes is to remember the graph is a cache and keep the real code, the actual runtime, as the thing that wins every conflict.&lt;/p&gt;

&lt;p&gt;The bigger move CodeGraph represents — &lt;em&gt;third-party MCP tools filling the retrieval gap the foundation model's main agent doesn't fill&lt;/em&gt; — is the ecosystem direction the feature-flag analysis in the prior post suggested Anthropic is hedging toward. Whether Anthropic eventually builds tree-sitter symbol-graph functionality natively or leaves it to the CodeGraph-class ecosystem is a product call. The technical case for "let MCP fill it" is strong: the design space is still settling, and locking one approach into Claude Code spends option value the ecosystem is currently pricing for free.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing — the mini-series arc
&lt;/h2&gt;

&lt;p&gt;This is the third of a three-part Lab series on Claude Code's retrieval and memory architectures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;&lt;a href="https://harrisonsec.com/blog/agent-retrieval-cost-curve-claude-code-grep-vs-rag/" rel="noopener noreferrer"&gt;Agent Retrieval Is a Cost Curve Problem&lt;/a&gt;&lt;/em&gt; (2026-05-25) — why grep+loop, not RAG, for most projects&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;a href="https://harrisonsec.com/blog/agent-memory-is-a-cache-coherence-problem/" rel="noopener noreferrer"&gt;Agent Memory Is a Cache Coherence Problem&lt;/a&gt;&lt;/em&gt; (2026-05-28) — why hand-curated Markdown, not lossy vector recall, for cross-session memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;This post&lt;/strong&gt; (2026-06-08) — what lives above the cost-curve crossover: CodeGraph as the architecturally coherent symbol-graph companion the first post argued was missing, read first-principles against its own index for what its choices say about the discipline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Read together, the three describe one stance on agent retrieval and memory: choose &lt;strong&gt;lossless and exact&lt;/strong&gt; by default; expose &lt;strong&gt;MCP&lt;/strong&gt; as the integration substrate; let third-party tools fill the gaps you don't want to own; and keep an arrow back to the source everywhere, because every derived view is a cache and the source is the only thing that can't drift from itself. The cost-curve frame is the math, the cache-coherence frame is the failure taxonomy, and the first-principles read of CodeGraph is what the architecture, looked at carefully, says about where LLM retrieval is going.&lt;/p&gt;

&lt;p&gt;If you're building agent retrieval, the three frames are now in your toolkit. The companion empirical post gives you the install-or-not decision; this one gives you the lens for the next ten tools that ship in the same space.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Companion piece 1 (this is the third in a 3-post Lab series): &lt;strong&gt;&lt;a href="https://harrisonsec.com/blog/agent-retrieval-cost-curve-claude-code-grep-vs-rag/" rel="noopener noreferrer"&gt;Agent Retrieval Is a Cost Curve Problem: Why Claude Code Doesn't Use RAG&lt;/a&gt;&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Companion piece 2: &lt;strong&gt;&lt;a href="https://harrisonsec.com/blog/agent-memory-is-a-cache-coherence-problem/" rel="noopener noreferrer"&gt;Agent Memory Is a Cache Coherence Problem&lt;/a&gt;&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Empirical pair on the Operator track: &lt;strong&gt;&lt;a href="https://harrisonsec.com/blog/i-tested-codegraph-on-hono-benchmark/" rel="noopener noreferrer"&gt;I Tested CodeGraph on Hono. The Tool-Call Savings Reproduce — the Cost Savings Don't.&lt;/a&gt;&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Background: &lt;strong&gt;&lt;a href="https://harrisonsec.com/blog/consistency-scenarios-and-approaches-production/" rel="noopener noreferrer"&gt;Consistency in Distributed Systems: Scenarios, Trade-offs, and What Actually Works&lt;/a&gt;&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;CodeGraph repo: &lt;strong&gt;&lt;a href="https://github.com/colbymchenry/codegraph" rel="noopener noreferrer"&gt;https://github.com/colbymchenry/codegraph&lt;/a&gt;&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>sqlite</category>
      <category>programming</category>
    </item>
    <item>
      <title>I Tested CodeGraph on Hono. The Tool-Call Savings Reproduce — the Cost Savings Don't.</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Mon, 01 Jun 2026 22:39:59 +0000</pubDate>
      <link>https://dev.to/harrisonsec/i-tested-codegraph-on-hono-the-tool-call-savings-reproduce-the-cost-savings-dont-389p</link>
      <guid>https://dev.to/harrisonsec/i-tested-codegraph-on-hono-the-tool-call-savings-reproduce-the-cost-savings-dont-389p</guid>
      <description>&lt;p&gt;Two weeks ago CodeGraph hit GitHub trending — tree-sitter + SQLite/FTS5 + MCP for Claude Code, 19k+ stars in a week. The team published a benchmark on 7 repos showing &lt;strong&gt;35% cheaper, 57% fewer tokens, 46% faster, 71% fewer tool calls&lt;/strong&gt; vs. baseline.&lt;/p&gt;

&lt;p&gt;Those are big numbers. They're also numbers from a benchmark designed by the team that built the tool, on repos they chose. Designer bias is the #1 risk in any retrieval benchmark — when you pick the test repos and write the ground truth, you'll consciously or unconsciously favor your own tool's strengths.&lt;/p&gt;

&lt;p&gt;So I ran an independent test on an 8th repo — &lt;strong&gt;Hono&lt;/strong&gt; (TypeScript, ~280 source files, in neither CodeGraph's published 7-repo suite nor any other published benchmark I could find). 5 architectural questions covering different retrieval shapes, with a deliberate control case (Q5) where the tool should not win. Two conditions (baseline grep+Read+Glob+Explore vs. CodeGraph active), &lt;strong&gt;4 repeats&lt;/strong&gt; per question per condition. 40 runs on Claude Opus 4.8 — and, critically, &lt;strong&gt;every CodeGraph run was verified to have connected, and actual &lt;code&gt;codegraph_*&lt;/code&gt; tool usage was recorded per run&lt;/strong&gt; (more on why that sentence exists below).&lt;/p&gt;

&lt;p&gt;The result splits in a way the single published headline number hides — and the split is the useful part.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr:&lt;/strong&gt; On Hono, CodeGraph delivers a &lt;strong&gt;large, consistent reduction in tool calls (−55%, 14.0 → 6.3 avg) and a smaller latency win (−20%)&lt;/strong&gt;, so the published 7-repo direction reproduces here. But &lt;strong&gt;cost is a wash: +6.8%&lt;/strong&gt;, not the published −35%. On narrow-scope questions (route lookup, middleware trace) CodeGraph is actually &lt;strong&gt;22-43% more expensive&lt;/strong&gt;, because each structural lookup loads a big chunk of graph context that costs more in cached tokens than the grep round-trips it replaces. The cost win only appears on broad multi-file navigation (Q3 multi-runtime adapters: &lt;strong&gt;−29% cost, −80% tool calls, −53% latency&lt;/strong&gt;). A second finding: baseline grep+Read has &lt;strong&gt;high variance&lt;/strong&gt;. The agent occasionally spiraled to 47-52 tool calls on the broad questions, while CodeGraph never exceeded 16. &lt;strong&gt;Net at Hono's size: CodeGraph makes the agent take fewer steps and finish faster, but not for fewer dollars.&lt;/strong&gt; Total cost of the 40 valid runs: ~$14 of Opus 4.8 calls. Raw per-run CSV and the 5 verbatim prompts are below.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What "tool calls down, cost flat" actually means
&lt;/h2&gt;

&lt;p&gt;CodeGraph's published 7-repo suite (VS Code, Excalidraw, Django, Tokio, OkHttp, Gin, Alamofire) skews larger and more architecturally complex than Hono. Hono is ~280 TypeScript source files (362 files indexed by CodeGraph, including tests and configs), 16MB on disk — small enough that a thoughtful agent with grep + Read can finish most architectural questions in a handful of tool calls.&lt;/p&gt;

&lt;p&gt;The interesting result is that the &lt;em&gt;axes come apart&lt;/em&gt;. CodeGraph replaces several grep+Read round-trips with one or two structural lookups — so &lt;strong&gt;step count drops hard (-55%)&lt;/strong&gt;. But each &lt;code&gt;codegraph_context&lt;/code&gt; / &lt;code&gt;codegraph_explore&lt;/code&gt; call returns a sizeable chunk of graph context, which then rides along in the conversation cache and gets re-read every turn. At Hono's size, the dollar cost of carrying that cached payload roughly equals the dollar cost of the grep round-trips it replaced — so &lt;strong&gt;dollars stay flat (+7%) even as steps fall by more than half&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That's not a contradiction of the cost-curve thesis from &lt;a href="https://harrisonsec.com/blog/agent-retrieval-cost-curve-claude-code-grep-vs-rag/" rel="noopener noreferrer"&gt;the prior post in this mini-series&lt;/a&gt; — it's a sharper reading of it. Hono sits &lt;strong&gt;above&lt;/strong&gt; the step-count crossover (the index already saves tool calls) but &lt;strong&gt;below&lt;/strong&gt; the dollar crossover (it doesn't yet save money). On a much bigger repo, the grep path churns through far more files and the index pays back on dollars too. Hono just happens to land in the gap between the two crossovers.&lt;/p&gt;
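
&lt;p&gt;A toy model makes the gap between the two crossovers concrete. It assumes each tool call appends one chunk to the context and that every later turn re-reads everything accumulated so far. It ignores output tokens and real pricing, and the chunk sizes and step counts are invented to show the shape, not measured.&lt;/p&gt;

```python
def reread_tokens(steps, chunk_tokens):
    """Tokens re-read from cache over a run: after step t the context holds t chunks."""
    return sum(t * chunk_tokens for t in range(1, steps + 1))

# Invented numbers: grep adds small chunks over many steps,
# the index adds large chunks over few steps.
small_grep = reread_tokens(steps=14, chunk_tokens=1_000)
small_index = reread_tokens(steps=6, chunk_tokens=5_000)

# On a larger repo the grep path needs far more steps; the index path barely grows.
large_grep = reread_tokens(steps=50, chunk_tokens=1_000)
large_index = reread_tokens(steps=8, chunk_tokens=5_000)

print(small_grep, small_index)
print(large_grep, large_index)
```

&lt;p&gt;With these inputs the small-repo case ties on carried tokens even though steps fall by more than half, and the large-repo case favors the index by a wide margin. That is the two-crossover picture in miniature: steps cross over first, carried context later.&lt;/p&gt;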

&lt;p&gt;A useful complementary benchmark answers three things the published one doesn't:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cross-validation on a repo not chosen by the tool's team&lt;/strong&gt; — do the published advantages generalize?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Within-repo variance across question types&lt;/strong&gt; — does the win concentrate on certain question shapes? (It does — heavily.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A control case where the tool shouldn't win&lt;/strong&gt; — Q5 (text search) tests whether the agent correctly &lt;em&gt;declines&lt;/em&gt; to use the structural engine when grep is the right tool.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Setup — install CodeGraph, ~10 minutes
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# install (downloads a single binary, no Node/npm required)&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/colbymchenry/codegraph/main/install.sh | sh

&lt;span class="c"&gt;# clone the test repo + index it&lt;/span&gt;
git clone https://github.com/honojs/hono.git ~/tmp/hono
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/tmp/hono
codegraph init &lt;span class="nt"&gt;-i&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Index build time on Hono (362 files, 4,128 nodes, 8,225 edges): &lt;strong&gt;1.7 seconds.&lt;/strong&gt; On-disk index: 7.1 MB.&lt;/p&gt;

&lt;p&gt;Per-condition setup for the two arms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Baseline (control):&lt;/strong&gt; a clean copy of Hono via &lt;code&gt;rsync -a --exclude='.codegraph/'&lt;/code&gt; to a separate directory so Claude couldn't accidentally grep into the index. No MCP servers registered. Agent uses native Glob + Grep + Read + Explore + Task.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CodeGraph active:&lt;/strong&gt; original Hono directory with &lt;code&gt;.codegraph/&lt;/code&gt; present, MCP server registered:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"codegraph"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"codegraph"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"serve"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--mcp"&lt;/span&gt;&lt;span class="p"&gt;]}}}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both arms run &lt;code&gt;claude --print --output-format stream-json --model opus&lt;/code&gt; so the model and the rest of the agent loop are identical; the only varying input is whether the CodeGraph MCP server is in the loop. Each run is a fresh session with no prior context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Verifying the tool actually ran (this is not optional)
&lt;/h3&gt;

&lt;p&gt;A retrieval-tool benchmark is only valid if the tool is actually in the loop, and I learned that the hard way. My first pass at this benchmark silently ran with CodeGraph's MCP server &lt;strong&gt;never connected&lt;/strong&gt;: the config was missing the &lt;code&gt;--mcp&lt;/code&gt; flag, and Claude Code carries on without a server that fails to complete its handshake in time instead of erroring out. Every "CodeGraph" run was really just grep+Read. The comparison was noise, and the numbers looked plausibly small, which is exactly how a broken benchmark slips through.&lt;/p&gt;

&lt;p&gt;So for the data here, every run is instrumented:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--strict-mcp-config&lt;/code&gt;&lt;/strong&gt; — only the server under test is loaded, with no contamination from other globally-registered MCP servers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-warmed daemon + &lt;code&gt;MCP_TIMEOUT=30000&lt;/code&gt;&lt;/strong&gt; — CodeGraph's stdio server attaches to a warm daemon and finishes its handshake &lt;em&gt;before&lt;/em&gt; the agent loop starts. (MCP connection is async; on a fast question the agent can otherwise finish before a cold server is ready.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A per-run assertion&lt;/strong&gt; that CodeGraph reached &lt;code&gt;connected&lt;/code&gt; status, plus a record of whether the agent actually invoked a &lt;code&gt;codegraph_*&lt;/code&gt; tool. Runs that didn't connect were discarded.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All 20 CodeGraph runs in this post connected. The agent invoked CodeGraph on Q1-Q4 (4/4 repeats each) and — correctly — chose grep on the Q5 control (0/4). Most vendor benchmarks never report this check. After watching mine fail it silently, I won't publish a retrieval benchmark without it.&lt;/p&gt;
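
&lt;p&gt;The per-run assertion could look roughly like this. The event shape is an assumption about the &lt;code&gt;stream-json&lt;/code&gt; transcript (an init event listing &lt;code&gt;mcp_servers&lt;/code&gt; with a &lt;code&gt;status&lt;/code&gt;, and &lt;code&gt;tool_use&lt;/code&gt; blocks inside assistant messages), so check it against a real transcript before relying on it. &lt;code&gt;check_run&lt;/code&gt; and &lt;code&gt;SAMPLE&lt;/code&gt; are illustrative names.&lt;/p&gt;

```python
import json

def check_run(stream_lines, server="codegraph"):
    """Return (connected, used_tool) for one transcript, given one JSON event per line."""
    connected = False
    used_tool = False
    for line in stream_lines:
        event = json.loads(line)
        if event.get("type") == "system" and event.get("subtype") == "init":
            # Assumed shape: the init event lists each MCP server with a status.
            for srv in event.get("mcp_servers", []):
                if srv.get("name") == server and srv.get("status") == "connected":
                    connected = True
        elif event.get("type") == "assistant":
            # Assumed shape: tool invocations appear as tool_use content blocks.
            for block in event.get("message", {}).get("content", []):
                if block.get("type") == "tool_use" and "codegraph_" in block.get("name", ""):
                    used_tool = True
    return connected, used_tool

# Hand-built sample transcript, not real output.
SAMPLE = [
    json.dumps({"type": "system", "subtype": "init",
                "mcp_servers": [{"name": "codegraph", "status": "connected"}]}),
    json.dumps({"type": "assistant", "message": {"content": [
        {"type": "tool_use", "name": "mcp__codegraph__codegraph_context", "input": {}}]}}),
]
print(check_run(SAMPLE))
```

&lt;p&gt;Keeping the two flags separate matters: a connected server the agent never calls (the Q5 control) is a valid run, while a server that never connected is not.&lt;/p&gt;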

&lt;h2&gt;
  
  
  The 5 questions
&lt;/h2&gt;

&lt;p&gt;Full verbatim prompts in the Appendix. Brief overview:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;What it tests&lt;/th&gt;
&lt;th&gt;Hypothesis for CodeGraph&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Route resolution: &lt;code&gt;GET /users/:id&lt;/code&gt; → handler&lt;/td&gt;
&lt;td&gt;Route-aware extraction&lt;/td&gt;
&lt;td&gt;Strong win&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Middleware chain trace through &lt;code&gt;app.use&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Dynamic dispatch tracing&lt;/td&gt;
&lt;td&gt;Decisive win via structural lookup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multi-runtime adapter architecture&lt;/td&gt;
&lt;td&gt;Cross-file abstraction&lt;/td&gt;
&lt;td&gt;Mid-strong win&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Refactor impact: add mandatory &lt;code&gt;requestId&lt;/code&gt; to &lt;code&gt;Context&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Impact propagation + completeness&lt;/td&gt;
&lt;td&gt;Strong win (what &lt;code&gt;codegraph_impact&lt;/code&gt; is built for)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Text search: every literal &lt;code&gt;'Content-Type'&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Keyword search baseline&lt;/td&gt;
&lt;td&gt;~Parity; agent should &lt;em&gt;decline&lt;/em&gt; the tool&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Q5 is the &lt;strong&gt;CONTROL&lt;/strong&gt; — the tool should not win here, and whether the agent even reaches for it is itself a signal.&lt;/p&gt;

&lt;h2&gt;
  
  
  The data
&lt;/h2&gt;

&lt;p&gt;Each row averaged over 4 repeats. Cost is Claude's own &lt;code&gt;total_cost_usd&lt;/code&gt; (the API's authoritative figure, not my own multiplication); wall latency from request to final token; tool calls counted from unique &lt;code&gt;tool_use&lt;/code&gt; blocks in the transcript; tokens are input + output (cache tokens tracked separately in the &lt;a href="https://harrisonsec.com/codegraph-hono-benchmark-data.csv" rel="noopener noreferrer"&gt;CSV&lt;/a&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost / tokens
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Q&lt;/th&gt;
&lt;th&gt;Baseline cost&lt;/th&gt;
&lt;th&gt;CodeGraph cost&lt;/th&gt;
&lt;th&gt;Δ cost&lt;/th&gt;
&lt;th&gt;Baseline tokens&lt;/th&gt;
&lt;th&gt;CodeGraph tokens&lt;/th&gt;
&lt;th&gt;Δ tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Q1 route&lt;/td&gt;
&lt;td&gt;$0.321&lt;/td&gt;
&lt;td&gt;$0.393&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;+22.5%&lt;/strong&gt; ❌&lt;/td&gt;
&lt;td&gt;10,115&lt;/td&gt;
&lt;td&gt;6,045&lt;/td&gt;
&lt;td&gt;−40.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q2 middleware&lt;/td&gt;
&lt;td&gt;$0.212&lt;/td&gt;
&lt;td&gt;$0.303&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;+43.4%&lt;/strong&gt; ❌&lt;/td&gt;
&lt;td&gt;7,233&lt;/td&gt;
&lt;td&gt;5,649&lt;/td&gt;
&lt;td&gt;−21.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q3 multi-runtime&lt;/td&gt;
&lt;td&gt;$0.490&lt;/td&gt;
&lt;td&gt;$0.348&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;−28.9%&lt;/strong&gt; ✓✓&lt;/td&gt;
&lt;td&gt;11,582&lt;/td&gt;
&lt;td&gt;7,048&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−39.1%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q4 refactor&lt;/td&gt;
&lt;td&gt;$0.402&lt;/td&gt;
&lt;td&gt;$0.509&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;+26.5%&lt;/strong&gt; ❌&lt;/td&gt;
&lt;td&gt;9,119&lt;/td&gt;
&lt;td&gt;8,567&lt;/td&gt;
&lt;td&gt;−6.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q5 text (ctrl)&lt;/td&gt;
&lt;td&gt;$0.267&lt;/td&gt;
&lt;td&gt;$0.253&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−5.3%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8,874&lt;/td&gt;
&lt;td&gt;8,998&lt;/td&gt;
&lt;td&gt;+1.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Aggregate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.338&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.361&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+6.8%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9,385&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7,261&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−22.6%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Tool calls / wall latency
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Q&lt;/th&gt;
&lt;th&gt;Baseline calls&lt;/th&gt;
&lt;th&gt;CodeGraph calls&lt;/th&gt;
&lt;th&gt;Δ calls&lt;/th&gt;
&lt;th&gt;Baseline latency&lt;/th&gt;
&lt;th&gt;CodeGraph latency&lt;/th&gt;
&lt;th&gt;Δ latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Q1 route&lt;/td&gt;
&lt;td&gt;7.8&lt;/td&gt;
&lt;td&gt;6.8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−12.9%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;49.8s&lt;/td&gt;
&lt;td&gt;51.2s&lt;/td&gt;
&lt;td&gt;+2.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q2 middleware&lt;/td&gt;
&lt;td&gt;5.0&lt;/td&gt;
&lt;td&gt;4.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−20.0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;41.2s&lt;/td&gt;
&lt;td&gt;43.1s&lt;/td&gt;
&lt;td&gt;+4.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q3 multi-runtime&lt;/td&gt;
&lt;td&gt;35.2&lt;/td&gt;
&lt;td&gt;7.0&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;−80.1%&lt;/strong&gt; ✓✓&lt;/td&gt;
&lt;td&gt;123.7s&lt;/td&gt;
&lt;td&gt;58.4s&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;−52.8%&lt;/strong&gt; ✓✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q4 refactor&lt;/td&gt;
&lt;td&gt;19.8&lt;/td&gt;
&lt;td&gt;11.8&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;−40.5%&lt;/strong&gt; ✓&lt;/td&gt;
&lt;td&gt;87.9s&lt;/td&gt;
&lt;td&gt;85.9s&lt;/td&gt;
&lt;td&gt;−2.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q5 text (ctrl)&lt;/td&gt;
&lt;td&gt;2.2&lt;/td&gt;
&lt;td&gt;2.0&lt;/td&gt;
&lt;td&gt;−11.1%&lt;/td&gt;
&lt;td&gt;51.2s&lt;/td&gt;
&lt;td&gt;43.5s&lt;/td&gt;
&lt;td&gt;−15.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Aggregate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;14.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6.3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−55.0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;70.8s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;56.4s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−20.3%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two headline rows: &lt;strong&gt;−55% tool calls&lt;/strong&gt; (real and consistent — CodeGraph used fewer tools on every single question) and &lt;strong&gt;+6.8% cost&lt;/strong&gt; (CodeGraph is &lt;em&gt;not&lt;/em&gt; cheaper on Hono). The latency win (−20%) is real but concentrated: almost all of it is Q3; on Q1/Q2/Q4 latency is within ±5%.&lt;/p&gt;
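
&lt;p&gt;The aggregate rows can be re-derived from the per-question means in the two tables. The inputs below are the rounded values as printed, so the results can differ from the published aggregates by a rounding step. &lt;code&gt;pct_change&lt;/code&gt; is a helper written for this check.&lt;/p&gt;

```python
from statistics import mean

# Per-question means (Q1..Q5) copied from the tables above, already rounded.
baseline = {
    "calls": [7.8, 5.0, 35.2, 19.8, 2.2],
    "cost": [0.321, 0.212, 0.490, 0.402, 0.267],
}
codegraph = {
    "calls": [6.8, 4.0, 7.0, 11.8, 2.0],
    "cost": [0.393, 0.303, 0.348, 0.509, 0.253],
}

def pct_change(metric):
    """Percent change of the unweighted mean across the five questions."""
    before, after = mean(baseline[metric]), mean(codegraph[metric])
    return round(100 * (after - before) / before, 1)

print(pct_change("calls"))
print(pct_change("cost"))
```

&lt;p&gt;Both figures land within a tenth of a percentage point of the aggregate rows. The aggregate is an unweighted mean over questions, so the single broad question (Q3) moves it a lot.&lt;/p&gt;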

&lt;h3&gt;
  
  
  The variance story — CodeGraph bounds the worst case
&lt;/h3&gt;

&lt;p&gt;Averages hide the most interesting result. Tool-call counts per repeat on the two broad questions, for both arms:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Q&lt;/th&gt;
&lt;th&gt;Baseline (4 repeats)&lt;/th&gt;
&lt;th&gt;CodeGraph (4 repeats)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Q3 multi-runtime&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;14, 23, 52, 52&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5, 6, 8, 9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q4 refactor&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9, 10, 13, 47&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;9, 10, 12, 16&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;On the broad questions, baseline grep+Read &lt;strong&gt;occasionally spiraled&lt;/strong&gt; — the agent without an index wandered to 47-52 tool calls chasing files. Across all 40 runs, baseline ranged from 2 to &lt;strong&gt;52&lt;/strong&gt; tool calls; CodeGraph ranged from 1 to &lt;strong&gt;16&lt;/strong&gt;. &lt;strong&gt;A large part of CodeGraph's value here isn't the mean — it's that it bounds the worst case.&lt;/strong&gt; When the structural answer is one graph query away, the agent can't spiral. That's a reliability property, not just an efficiency one, and it doesn't show up in a single average.&lt;/p&gt;

&lt;p&gt;A caveat on sample size: this is 4 repeats on one repo. Treat the &lt;strong&gt;magnitudes as indicative and the directions as robust&lt;/strong&gt;: CodeGraph used fewer tool calls on every question and in nearly every repeat, and the cost direction was consistent within cells (more expensive on Q1/Q2/Q4, cheaper only on Q3, with the Q5 control near parity). What I would not over-read is the exact percentages; with only n=4, a 47-vs-9 baseline spread on Q4 means the per-question means carry real uncertainty.&lt;/p&gt;
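
&lt;p&gt;The worst-case point is easy to check from the per-repeat counts in the variance table. The sketch below only summarizes those sixteen numbers; &lt;code&gt;summarize&lt;/code&gt; is a helper name chosen here.&lt;/p&gt;

```python
from statistics import mean

# Tool-call counts per repeat, copied from the variance table.
REPEATS = {
    ("Q3", "baseline"): [14, 23, 52, 52],
    ("Q3", "codegraph"): [5, 6, 8, 9],
    ("Q4", "baseline"): [9, 10, 13, 47],
    ("Q4", "codegraph"): [9, 10, 12, 16],
}

def summarize(counts):
    """Mean, worst case, and max-minus-min spread for one cell of the table."""
    return {"mean": mean(counts), "worst": max(counts), "spread": max(counts) - min(counts)}

for key, counts in REPEATS.items():
    print(key, summarize(counts))
```

&lt;p&gt;On Q4 the two arms are nearly identical on three of four repeats; the difference in the mean comes almost entirely from the single 47-call baseline run, which is why the worst case is the more informative number at this sample size.&lt;/p&gt;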

&lt;h3&gt;
  
  
  Per-question narrative
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Q1 (route resolution) — CodeGraph used, but more expensive.&lt;/strong&gt; Both arms traced &lt;code&gt;app.fetch&lt;/code&gt; → &lt;code&gt;#dispatch&lt;/code&gt; → &lt;code&gt;this.router.match()&lt;/code&gt; and read &lt;code&gt;SmartRouter&lt;/code&gt; → &lt;code&gt;RegExpRouter&lt;/code&gt;. CodeGraph used &lt;code&gt;codegraph_context&lt;/code&gt; + &lt;code&gt;codegraph_trace&lt;/code&gt; (2 calls/run) and cut tool calls 13% and tokens 40% — but cost rose 22.5% and latency was flat. The structural context it front-loaded was heavier than the 1-2 grep steps it saved. &lt;strong&gt;Hono's router is small enough (5-6 files for the full picture) that grep finds it directly.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q2 (middleware chain trace) — used, 43% more expensive.&lt;/strong&gt; CodeGraph landed the &lt;code&gt;app.use&lt;/code&gt; → middleware array → &lt;code&gt;compose()&lt;/code&gt; chain in 4 tool calls vs baseline's 5, but cost jumped 43%. Same mechanism as Q1, more pronounced: the call-chain context payload dominated a question baseline answered cheaply in 5 small steps. &lt;strong&gt;The clearest example of "fewer steps, more dollars."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q3 (multi-runtime adapter) — the unambiguous win.&lt;/strong&gt; Enumerating 6 adapter directories (Cloudflare Workers / Deno / Bun / Node / AWS Lambda / Vercel Edge) is exactly where one graph query beats many Glob+grep iterations. Baseline averaged &lt;strong&gt;35 tool calls and 124s&lt;/strong&gt; (and spiraled to 52 twice); CodeGraph: &lt;strong&gt;7 calls, 58s, −29% cost.&lt;/strong&gt; This is the question shape where structural retrieval pays back on every axis at once — and the only one where it saved money.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q4 (refactor impact: add &lt;code&gt;requestId&lt;/code&gt; to &lt;code&gt;Context&lt;/code&gt;): tool calls down 40%, cost up.&lt;/strong&gt; This was supposed to be CodeGraph's strongest case (&lt;code&gt;codegraph_impact&lt;/code&gt; is built for blast-radius). It did cut tool calls 40% (and tamed baseline's 47-call spiral), but cost rose 26.5%: the impact-graph walk pulled wide context the agent didn't fully need at Hono's size. Completeness was comparable across arms (both identified the &lt;code&gt;Context&lt;/code&gt; constructors in &lt;code&gt;src/hono-base.ts&lt;/code&gt;, the &lt;code&gt;Variables&lt;/code&gt; plumbing, and the per-method handler signatures). &lt;strong&gt;Fewer, more-bounded steps, but the propagation graph isn't wide enough here to pay back on dollars.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q5 (text search, control) — the agent declined the tool, and that's the point.&lt;/strong&gt; On a pure literal-&lt;code&gt;'Content-Type'&lt;/code&gt; search, the agent &lt;strong&gt;never invoked CodeGraph in any of the 4 repeats&lt;/strong&gt; — it reached straight for grep. Result: near-parity (−5% cost, −15% latency, both inside the noise). The old version of this post claimed an "FTS5 fallback" win here; that was an artifact of the broken first run. The truth is simpler and better: &lt;strong&gt;with CodeGraph connected and available, the agent correctly chose grep for a grep-shaped task.&lt;/strong&gt; No over-engineering. That's the table-stakes behavior you actually want from a retrieval tool, and it's worth more than a fabricated 1-step saving.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cross-validation with CodeGraph's published 7-repo benchmark
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Published (7 repos)&lt;/th&gt;
&lt;th&gt;This test (Hono, n=4)&lt;/th&gt;
&lt;th&gt;Reproduces?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tool calls&lt;/td&gt;
&lt;td&gt;−71%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−55%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✓ Yes — same ballpark&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency&lt;/td&gt;
&lt;td&gt;−46%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−20%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~ Directionally, ~half the magnitude&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tokens&lt;/td&gt;
&lt;td&gt;−57%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−23%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~ Directionally, smaller&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−35%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+6.8%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✗ &lt;strong&gt;No — opposite sign&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The tool-call reduction is the part that generalizes cleanly to a repo the team didn't pick. The cost reduction is the part that doesn't — and that's not an attack on CodeGraph, it's a statement about repo size. Their published suite includes far larger repos (VS Code is 30k+ files), and their own published table is &lt;strong&gt;non-monotonic in file count&lt;/strong&gt;: Gin (~110 files) shows a 21% cost win, OkHttp (~645 files) shows ~2%, and Tokio (~790 files) shows 82%. &lt;strong&gt;Repo size matters, but it isn't a clean threshold; question shape matters at least as much.&lt;/strong&gt; A single repo can't locate a universal crossover. What Hono shows is one clear data point: at ~280 files, the step-count win is already here, the dollar win isn't yet.&lt;/p&gt;
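
&lt;p&gt;A quick way to see the non-monotonicity is to sort the three published data points quoted above by file count and check whether the cost win rises in step. This is a minimal sketch: the figures are the approximate ones cited in this post, and the helper name &lt;code&gt;is_monotonic_increasing&lt;/code&gt; is mine, not part of CodeGraph.&lt;/p&gt;

```python
# Approximate (file count, cost win %) pairs as quoted in this post.
published = {
    "Gin": (110, 21),
    "OkHttp": (645, 2),
    "Tokio": (790, 82),
}


def is_monotonic_increasing(values):
    """Return True when each value is at least as large as the one before it."""
    return list(values) == sorted(values)


# Order repos by file count, then look at the cost win in that order.
by_size = sorted(published.items(), key=lambda item: item[1][0])
cost_wins = [win for _, (_, win) in by_size]

print([name for name, _ in by_size])
print(cost_wins)
print(is_monotonic_increasing(cost_wins))
```

&lt;p&gt;The check comes back false because the middle-sized repo has the smallest cost win, which is why file count alone can't serve as the threshold.&lt;/p&gt;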

&lt;h2&gt;
  
  
  Decision matrix — install CodeGraph when
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;Install CodeGraph&lt;/th&gt;
&lt;th&gt;Skip (baseline grep+Read is enough)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;You care about &lt;strong&gt;fewer agent steps / lower latency&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;✓ (−55% tool calls even on a small repo)&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;You're optimizing &lt;strong&gt;dollar cost&lt;/strong&gt; on a sub-~500-file repo&lt;/td&gt;
&lt;td&gt;(may cost slightly more)&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Repo is large (low thousands of files+)&lt;/td&gt;
&lt;td&gt;✓ (dollar win should appear too)&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Workload is &lt;strong&gt;broad multi-file navigation / architecture&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;✓ (this is where it wins on every axis)&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Workload is &lt;strong&gt;narrow single-symbol lookups&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;(fewer steps, but not cheaper)&lt;/td&gt;
&lt;td&gt;(grep is fine)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Static-typed (TS / Rust / Go / Java / Swift / C#)&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dynamic-typed (Python / Ruby / untyped JS)&lt;/td&gt;
&lt;td&gt;⚠️ partial (tree-sitter misses runtime semantics)&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Workflow is text-search dominant&lt;/td&gt;
&lt;td&gt;(no penalty — the agent declines the tool)&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent &lt;strong&gt;reliability&lt;/strong&gt; matters (bounding worst-case exploration)&lt;/td&gt;
&lt;td&gt;✓ (caps the 50-tool-call spirals)&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;You're not sure&lt;/td&gt;
&lt;td&gt;install it; ~10 min, &amp;lt;2s to index a Hono-sized repo, uninstall is one command&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key call from the data:&lt;/strong&gt; at Hono's scale the reason to install CodeGraph is &lt;strong&gt;fewer steps, lower latency, and bounded worst-case exploration&lt;/strong&gt; — not a lower bill. If your decision rule is purely dollars-per-query on a small repo, baseline grep+Read is still competitive. If it's agent speed, predictability, or you're working in a larger codebase, the index earns its place.&lt;/p&gt;
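
&lt;p&gt;For readers who prefer the matrix as a rule, here is one possible encoding. It is a sketch of my reading of the table, not a CodeGraph feature; the function name &lt;code&gt;recommend&lt;/code&gt; and the category labels are hypothetical, and the "large repo" branch rests on the expectation (not yet measured here) that the dollar win appears at that size.&lt;/p&gt;

```python
def recommend(priority, repo_size, workload):
    """Map the decision-matrix rows to "install" or "skip".

    priority:  "steps", "latency", "reliability", or "dollars"
    repo_size: "small" (under roughly 500 files) or "large" (low thousands and up)
    workload:  "broad-navigation", "single-symbol", or "text-search"
    These labels are illustrative, not CodeGraph settings.
    """
    if workload == "text-search":
        # The agent declined the tool on the control question, so installing
        # carries no penalty, but grep alone is enough.
        return "skip"
    if workload == "broad-navigation" or repo_size == "large":
        # Broad multi-file navigation won on every axis in this benchmark;
        # large repos are where the dollar win is expected to appear.
        return "install"
    if priority == "dollars":
        # Small repo, narrow workload, dollars only: baseline stays competitive.
        return "skip"
    # Fewer steps, lower latency, and a bounded worst case reproduced on a small repo.
    return "install"


print(recommend("dollars", "small", "single-symbol"))
print(recommend("steps", "small", "single-symbol"))
print(recommend("dollars", "small", "broad-navigation"))
```

&lt;p&gt;The branch order mirrors the data: question shape is checked before repo size, because in this benchmark the only cost win came from the broad-navigation question.&lt;/p&gt;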

&lt;h2&gt;
  
  
  What I'd want to test next
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Larger TS repo head-to-head&lt;/strong&gt; — same 5 questions on Prisma (~2,000 TS files) or TanStack Query (~600 files) to find where the &lt;em&gt;dollar&lt;/em&gt; crossover actually is. Hypothesis: cost flips negative somewhere in the high hundreds to low thousands of files.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic-typed repo&lt;/strong&gt; — same 5 questions on FastAPI or Django REST to see how much of the step-count win survives when tree-sitter can't resolve dynamic dispatch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-session compounding&lt;/strong&gt; — single-question runs miss the multi-turn agent context. Does the per-query step saving compound across a real session, or stay linear?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All of that is future work. None of it blocks the install-or-not decision, which the data above already answers.&lt;/p&gt;

&lt;h2&gt;
  
  
  One-line verdict
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;On TypeScript / Rust / Go projects, install CodeGraph if you want fewer agent steps, lower latency, and bounded worst-case exploration — those reproduce on an independent repo. Don't install it expecting a lower bill on a small codebase: at Hono's ~280-file scale it was ~7% &lt;em&gt;more&lt;/em&gt; expensive, and in this benchmark a cost win appeared only on broad multi-file navigation (Q3) — the published −35% likely needs much larger repos.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For the architectural deep-dive on &lt;em&gt;why&lt;/em&gt; this class of tool works and where the abstractions leak, see the companion Lab piece &lt;em&gt;Agent Retrieval Above the Crossover: A First-Principles Read of CodeGraph&lt;/em&gt; (publishing 2026-06-08).&lt;/p&gt;

&lt;p&gt;For the broader cost-curve framework this benchmark applies, see the prior Lab post: &lt;em&gt;&lt;a href="https://harrisonsec.com/blog/agent-retrieval-cost-curve-claude-code-grep-vs-rag/" rel="noopener noreferrer"&gt;Agent Retrieval Is a Cost Curve Problem&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Appendix: Benchmark Questions
&lt;/h2&gt;

&lt;p&gt;The 5 prompts used, verbatim. Each was sent to Claude Code in a fresh session (&lt;code&gt;claude --print --model opus&lt;/code&gt;), 4 times per arm. No follow-up prompts within a run.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q1 — Route resolution
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;When a request hits &lt;code&gt;GET /users/:id&lt;/code&gt; in a Hono app, walk me through how Hono's routing finds and invokes the right handler. Where in the source does the URL → handler matching happen, and what data structure stores the route table?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Q2 — Middleware chain trace
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Hono middleware is chained via &lt;code&gt;app.use(middleware)&lt;/code&gt;. When a request flows through several middlewares before hitting the handler, what's the actual call stack from the incoming request to the handler? Specifically — how does Hono ensure middleware runs in order, and how is &lt;code&gt;c&lt;/code&gt; (context) + &lt;code&gt;next&lt;/code&gt; passed between them?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Q3 — Cross-file abstraction navigation
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Hono supports multiple runtime adapters (Cloudflare Workers, Deno, Bun, Node, AWS Lambda, Vercel Edge). How is this multi-runtime abstraction implemented? What's the shared interface, and where do the per-runtime adapters live? Show me the architecture.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Q4 — Refactor impact
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Imagine I'm planning to add a mandatory new property &lt;code&gt;requestId: string&lt;/code&gt; to Hono's &lt;code&gt;Context&lt;/code&gt; class. What files and functions across the codebase would be affected? Give me the full blast radius — where Context is constructed, where it's typed in signatures, and where mandatory-property additions would break.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Q5 — Text search (control)
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Find every place in the Hono codebase where the literal string &lt;code&gt;'Content-Type'&lt;/code&gt; (the exact HTTP header name, case-sensitive) appears. Include source code, tests, comments, and documentation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Scoring &amp;amp; environment
&lt;/h3&gt;

&lt;p&gt;Each question was evaluated on cost (&lt;code&gt;total_cost_usd&lt;/code&gt;), tokens (input+output, cache tracked separately), wall latency, unique tool-call count, and a manual correctness/completeness check (both arms agreed on the same answer for every question). Every CodeGraph run was additionally checked for &lt;code&gt;connected&lt;/code&gt; MCP status and actual &lt;code&gt;codegraph_*&lt;/code&gt; tool invocation.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Environment:&lt;/strong&gt; Hono @ commit &lt;code&gt;2cbeadda&lt;/code&gt; (2026-05-28) · CodeGraph 0.9.7 · Claude Code 2.1.159 · model &lt;code&gt;claude-opus-4-8&lt;/code&gt; (Opus 4.8) · macOS. &lt;strong&gt;Raw per-run data&lt;/strong&gt; (cost, tokens, tool calls, latency, connection status) for all 40 runs: &lt;strong&gt;&lt;a href="https://harrisonsec.com/codegraph-hono-benchmark-data.csv" rel="noopener noreferrer"&gt;CSV&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
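
&lt;p&gt;The per-arm deltas reported throughout this post are signed percent changes of the CodeGraph mean against the baseline mean, computed only over runs where the MCP server was connected. A minimal sketch of that calculation follows. The column names and the four sample rows are invented for illustration; the published CSV may be laid out differently.&lt;/p&gt;

```python
import csv
import io
from statistics import mean

# Hypothetical rows; the real CSV's columns may differ and these numbers are made up.
SAMPLE = """question,arm,connected,tool_calls,cost_usd
Q3,baseline,na,30,0.50
Q3,baseline,na,40,0.70
Q3,codegraph,true,8,0.40
Q3,codegraph,false,33,0.60
"""


def verified(row):
    """Keep baseline runs, and CodeGraph runs where the MCP server was connected."""
    return row["arm"] == "baseline" or row["connected"] == "true"


def percent_change(baseline, treatment):
    """Signed percent change of treatment relative to baseline."""
    return 100.0 * (treatment - baseline) / baseline


rows = [r for r in csv.DictReader(io.StringIO(SAMPLE)) if verified(r)]


def arm_mean(arm, column):
    """Mean of one numeric column over the verified runs of one arm."""
    return mean(float(r[column]) for r in rows if r["arm"] == arm)


deltas = {
    column: percent_change(arm_mean("baseline", column), arm_mean("codegraph", column))
    for column in ("tool_calls", "cost_usd")
}
for column, delta in deltas.items():
    print(column, round(delta, 1))
```

&lt;p&gt;The disconnected CodeGraph row is dropped before averaging. Leaving it in would blend a silent grep fallback into the CodeGraph arm and understate the difference.&lt;/p&gt;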

&lt;h2&gt;
  
  
  Appendix: Why there's no third arm (knowing)
&lt;/h2&gt;

&lt;p&gt;I intended to include &lt;strong&gt;knowing&lt;/strong&gt; (Blackwell Systems) as a third arm. In headless batch mode its MCP server connected only intermittently — knowing advertises asynchronous &lt;code&gt;tools/listChanged&lt;/code&gt;, which races Claude Code's MCP startup window, so on most runs the agent never saw knowing's tools and silently fell back to grep+Read.&lt;/p&gt;

&lt;p&gt;Reporting task-cost numbers from runs where the tool wasn't actually in the loop is exactly the trap that invalidated my &lt;em&gt;first&lt;/em&gt; attempt at this benchmark (see the section &lt;em&gt;Verifying the tool actually ran&lt;/em&gt;), so I'm not publishing any figures for knowing. That's a limitation of my batch harness, &lt;strong&gt;not&lt;/strong&gt; a verdict on knowing: a persistent / pre-warmed MCP host or an interactive session would likely fix it. If I get a reliable knowing setup, I'll benchmark it on its own terms and publish separately.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Lab companion (first-principles architectural read of CodeGraph and the class of tools it represents): &lt;strong&gt;Agent Retrieval Above the Crossover&lt;/strong&gt;, publishing 2026-06-08.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Prior Lab post in the retrieval / memory mini-series: &lt;strong&gt;&lt;a href="https://harrisonsec.com/blog/agent-retrieval-cost-curve-claude-code-grep-vs-rag/" rel="noopener noreferrer"&gt;Agent Retrieval Is a Cost Curve Problem&lt;/a&gt;&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Companion Lab post on cross-session memory: &lt;strong&gt;&lt;a href="https://harrisonsec.com/blog/agent-memory-is-a-cache-coherence-problem/" rel="noopener noreferrer"&gt;Agent Memory Is a Cache Coherence Problem&lt;/a&gt;&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;CodeGraph repo: &lt;strong&gt;&lt;a href="https://github.com/colbymchenry/codegraph" rel="noopener noreferrer"&gt;https://github.com/colbymchenry/codegraph&lt;/a&gt;&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>benchmark</category>
      <category>devtools</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Agent Memory Is a Cache Coherence Problem</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Fri, 29 May 2026 15:17:06 +0000</pubDate>
      <link>https://dev.to/harrisonsec/agent-memory-is-a-cache-coherence-problem-4jmk</link>
      <guid>https://dev.to/harrisonsec/agent-memory-is-a-cache-coherence-problem-4jmk</guid>
      <description>&lt;p&gt;This post is one half of a pair. The other half — &lt;a href="https://harrisonsec.com/blog/agent-retrieval-cost-curve-claude-code-grep-vs-rag/" rel="noopener noreferrer"&gt;&lt;em&gt;Agent Retrieval Is a Cost Curve Problem&lt;/em&gt;&lt;/a&gt; — argues that Claude Code's within-session code retrieval avoids RAG because the cost curve says it should. This piece argues something parallel about &lt;em&gt;cross-session memory&lt;/em&gt;: the lossy auto-capture systems being marketed as "AI memory" are, in classical distributed-systems vocabulary, &lt;strong&gt;caches&lt;/strong&gt;. They inherit every problem caches have always had, and the hype around them is mostly arguing for one side of a write-back vs write-through trade as if the other side didn't exist.&lt;/p&gt;

&lt;p&gt;Sequel to &lt;em&gt;&lt;a href="https://harrisonsec.com/blog/consistency-scenarios-and-approaches-production/" rel="noopener noreferrer"&gt;Consistency in Distributed Systems: Scenarios, Trade-offs, and What Actually Works&lt;/a&gt;&lt;/em&gt;. If you remember that piece, you'll recognize the move: take a problem space the AI community is debating with fresh vocabulary, and notice that the database community already mapped the failure modes thirty years ago.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — Cross-session agent memory varies on two independent axes, not one: &lt;strong&gt;fidelity&lt;/strong&gt; (lossless vs lossy) and &lt;strong&gt;retrieval&lt;/strong&gt; (exact lookup vs approximate vector). Claude Code's built-in memory plus a hand-written &lt;code&gt;CLAUDE.md&lt;/code&gt; lives at &lt;em&gt;lossless + exact&lt;/em&gt;. The currently-trending &lt;code&gt;claude-mem&lt;/code&gt; (70k+ GitHub stars as of May 2026) lives at &lt;em&gt;lossy + approximate&lt;/em&gt; — auto-capture passed through a Haiku compression step and recalled via SQLite-FTS5 + Chroma vectors. The second is, structurally, a cache: a derived lossy view of the source of truth, retrieved approximately. It inherits every cache problem the distributed-systems literature already named — staleness, wrong-row retrieval, no coherence with the source. I ran claude-mem under controlled conditions and compared it against the deterministic CLAUDE.md baseline; the numbers (and the &lt;em&gt;kinds&lt;/em&gt; of failures) line up with the classical cache framing. The most interesting failure isn't tokens or latency. It's that compression flattens &lt;strong&gt;modality&lt;/strong&gt; — a hedged hypothetical becomes a flat fact, indistinguishable from a firm decision. An agent confidently acting on a maybe-it-said-yes is worse than an agent with no memory at all.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Two Axes (Don't Collapse Them Into One)
&lt;/h2&gt;

&lt;p&gt;Most takes on agent memory collapse the design space onto a single axis: "lossless and limited" vs "lossy and powerful." That framing hides the failure modes.&lt;/p&gt;

&lt;p&gt;The real space is two-dimensional:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fidelity&lt;/strong&gt; — &lt;em&gt;lossless&lt;/em&gt; (verbatim, what-you-wrote-is-what-was-stored) vs &lt;em&gt;lossy&lt;/em&gt; (LLM-compressed: a summary written by a smaller model over the raw events).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval&lt;/strong&gt; — &lt;em&gt;exact / curated&lt;/em&gt; (you wrote an index entry; the system reads it back) vs &lt;em&gt;approximate / semantic&lt;/em&gt; (vector embeddings; cosine similarity; top-K nearest neighbors).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBzdWJncmFwaCBheGVzWyJUd28gYXhlcywgZm91ciBxdWFkcmFudHMiXQogICAgICAgIGRpcmVjdGlvbiBMUgogICAgICAgIHN1YmdyYXBoIGNvbDFbIkV4YWN0IHJldHJpZXZhbCJdCiAgICAgICAgICAgIHExWyI8Yj5Mb3NzbGVzcyArIEV4YWN0PC9iPjxici8-Q0xBVURFLm1kIC8gaGFuZC1jdXJhdGVkPGJyLz5QcmVjaXNpb246IGhpZ2g8YnIvPkNvdmVyYWdlOiBsb3c8YnIvPlVwa2VlcDogbWFudWFsIl0KICAgICAgICAgICAgcTNbIjxiPkxvc3N5ICsgRXhhY3Q8L2I-PGJyLz5Db21wcmVzc2VkIGJ1dCBrZXl3b3JkLWluZGV4ZWQ8YnIvPih1bnVzdWFsIGluIHByYWN0aWNlKSJdCiAgICAgICAgZW5kCiAgICAgICAgc3ViZ3JhcGggY29sMlsiQXBwcm94aW1hdGUgcmV0cmlldmFsIl0KICAgICAgICAgICAgcTJbIjxiPkxvc3NsZXNzICsgQXBwcm94aW1hdGU8L2I-PGJyLz5SYXcgbG9ncyArIHZlY3RvciBzZWFyY2g8YnIvPihodWdlIHN0b3JhZ2U7IHdlYWsgc2lnbmFsKSJdCiAgICAgICAgICAgIHE0WyI8Yj5Mb3NzeSArIEFwcHJveGltYXRlPC9iPjxici8-PGI-Y2xhdWRlLW1lbSBhbmQgbW9zdCAnQUkgbWVtb3J5JyB0b29sczwvYj48YnIvPkF1dG8tY29tcHJlc3MgKyB2ZWN0b3IgcmVjYWxsPGJyLz5Ud28gbGF5ZXJzIG9mIGFwcHJveGltYXRpb24iXQogICAgICAgIGVuZAogICAgZW5kCgogICAgY2xhc3NEZWYgZ29vZCBmaWxsOiNmMGZmZjQsc3Ryb2tlOiMyZjg1NWEKICAgIGNsYXNzRGVmIHdhcm4gZmlsbDojZmVmNWU3LHN0cm9rZTojYjc3OTFmCiAgICBjbGFzc0RlZiBiYWQgZmlsbDojZmVkN2Q3LHN0cm9rZTojYzUzMDMwCiAgICBjbGFzcyBxMSBnb29kCiAgICBjbGFzcyBxMiB3YXJuCiAgICBjbGFzcyBxMyB3YXJuCiAgICBjbGFzcyBxNCBiYWQ%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBzdWJncmFwaCBheGVzWyJUd28gYXhlcywgZm91ciBxdWFkcmFudHMiXQogICAgICAgIGRpcmVjdGlvbiBMUgogICAgICAgIHN1YmdyYXBoIGNvbDFbIkV4YWN0IHJldHJpZXZhbCJdCiAgICAgICAgICAgIHExWyI8Yj5Mb3NzbGVzcyArIEV4YWN0PC9iPjxici8-Q0xBVURFLm1kIC8gaGFuZC1jdXJhdGVkPGJyLz5QcmVjaXNpb246IGhpZ2g8YnIvPkNvdmVyYWdlOiBsb3c8YnIvPlVwa2VlcDogbWFudWFsIl0KICAgICAgICAgICAgcTNbIjxiPkxvc3N5ICsgRXhhY3Q8L2I-PGJyLz5Db21wcmVzc2VkIGJ1dCBrZXl3b3JkLWluZGV4ZWQ8YnIvPih1bnVzdWFsIGluIHByYWN0aWNlKSJdCiAgICAgICAgZW5kCiAgICAgICAgc3ViZ3JhcGggY29sMlsiQXBwcm94aW1hdGUgcmV0cmlldmFsIl0KICAgICAgICAgICAgcTJbIjxiPkxvc3NsZXNzICsgQXBwcm94aW1hdGU8L2I-PGJyLz5SYXcgbG9ncyArIHZlY3RvciBzZWFyY2g8YnIvPihodWdlIHN0b3JhZ2U7IHdlYWsgc2lnbmFsKSJdCiAgICAgICAgICAgIHE0WyI8Yj5Mb3NzeSArIEFwcHJveGltYXRlPC9iPjxici8-PGI-Y2xhdWRlLW1lbSBhbmQgbW9zdCAnQUkgbWVtb3J5JyB0b29sczwvYj48YnIvPkF1dG8tY29tcHJlc3MgKyB2ZWN0b3IgcmVjYWxsPGJyLz5Ud28gbGF5ZXJzIG9mIGFwcHJveGltYXRpb24iXQogICAgICAgIGVuZAogICAgZW5kCgogICAgY2xhc3NEZWYgZ29vZCBmaWxsOiNmMGZmZjQsc3Ryb2tlOiMyZjg1NWEKICAgIGNsYXNzRGVmIHdhcm4gZmlsbDojZmVmNWU3LHN0cm9rZTojYjc3OTFmCiAgICBjbGFzc0RlZiBiYWQgZmlsbDojZmVkN2Q3LHN0cm9rZTojYzUzMDMwCiAgICBjbGFzcyBxMSBnb29kCiAgICBjbGFzcyBxMiB3YXJuCiAgICBjbGFzcyBxMyB3YXJuCiAgICBjbGFzcyBxNCBiYWQ%3D" alt="flowchart TD" width="730" height="660"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The interesting failures live in the bottom-right quadrant — &lt;em&gt;lossy + approximate&lt;/em&gt; — because the failures of one axis are &lt;em&gt;invisible to a user evaluating along the other&lt;/em&gt;. The system loses information at write time and approximates at read time, and the user sees a single "answer" that fused both losses. Debugging means asking: was that wrong because the original event was corrupted in compression, or because retrieval surfaced the wrong row? You usually can't tell.&lt;/p&gt;

&lt;p&gt;Most takes conflate "lossy = unreliable" with "vector = powerful." They're orthogonal. You can have lossless + vector (raw logs, vector-indexed — fine but storage-heavy and signal-weak). You can have lossy + exact (compressed summaries, FTS-indexed — works for some applications). Lossy + approximate is what's being marketed as "AI memory," and it's the quadrant most exposed to compounding failure modes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Baseline: Lossless + Exact (&lt;code&gt;CLAUDE.md&lt;/code&gt; + built-in memory)
&lt;/h2&gt;

&lt;p&gt;Claude Code's built-in memory system, paired with a hand-written &lt;code&gt;CLAUDE.md&lt;/code&gt;, sits firmly in the &lt;em&gt;lossless + exact&lt;/em&gt; quadrant. The design choices, made explicit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Verbatim storage.&lt;/strong&gt; What the author wrote is what gets stored. Markdown in, Markdown out. There's no compression step.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Always-loaded index + on-demand body.&lt;/strong&gt; &lt;code&gt;MEMORY.md&lt;/code&gt; (the index) gets injected into every session — capped, deliberately, around 200 lines to avoid context bloat. Individual memory files are read on demand, when the index entry suggests one is relevant.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Curated, not a firehose.&lt;/strong&gt; A human (or a structured prompt) decides what is worth storing. Not every tool call. Not every file read. Only the durable, surprising, cross-session-useful facts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exact recall.&lt;/strong&gt; The model reads a specific file. Either it's there and is read verbatim, or it isn't. No fuzzy near-matches; no confidence score.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mental model: a hand-written &lt;strong&gt;WAL&lt;/strong&gt; (write-ahead log) plus a &lt;strong&gt;curated index&lt;/strong&gt;. Both are close to a source of truth — the author's deliberate decision — and they recall exactly what was committed.&lt;/p&gt;

&lt;p&gt;Tradeoffs are visible from this framing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Precision&lt;/strong&gt;: 100%. What you stored is what you get back.&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Auditability&lt;/strong&gt;: you can read the file yourself. No black box.&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Token economics&lt;/strong&gt;: index sits in context; bodies fetched only when needed. Cheap.&lt;/li&gt;
&lt;li&gt;❌ &lt;strong&gt;Coverage&lt;/strong&gt;: limited to what the author bothered to write down.&lt;/li&gt;
&lt;li&gt;❌ &lt;strong&gt;Upkeep&lt;/strong&gt;: manual. Memories rot; updating them is a chore.&lt;/li&gt;
&lt;/ul&gt;
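
&lt;p&gt;The "always-loaded index + on-demand body" pattern described above can be sketched in a few lines. This is an illustration of the idea only, not Claude Code's implementation: the &lt;code&gt;topic: filename&lt;/code&gt; index format, the &lt;code&gt;build_context&lt;/code&gt; helper, and the explicit &lt;code&gt;wanted_topics&lt;/code&gt; argument (a stand-in for the model deciding an entry looks relevant) are all assumptions.&lt;/p&gt;

```python
import tempfile
from pathlib import Path


def build_context(memory_dir, wanted_topics, index_cap=200):
    """Always load the capped index; read a body file only when its topic is wanted."""
    index_lines = (memory_dir / "MEMORY.md").read_text().splitlines()[:index_cap]
    context = list(index_lines)
    for line in index_lines:
        # Hypothetical index format: "topic: filename"
        topic, _, filename = line.partition(": ")
        if topic in wanted_topics:
            # Bodies are returned verbatim: no summary, no similarity score.
            context.append((memory_dir / filename).read_text())
    return context


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "MEMORY.md").write_text("storage: storage.md\nttl: ttl.md\n")
    (root / "storage.md").write_text("Use Redis for URL storage.")
    (root / "ttl.md").write_text("Add a 30-day TTL.")
    loaded = build_context(root, {"ttl"})
    print(loaded)
```

&lt;p&gt;Only the index and the one requested body end up in context. The storage note stays on disk until something asks for it, which is where the cheap token economics come from.&lt;/p&gt;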

&lt;p&gt;The earlier post in this series, &lt;a href="https://harrisonsec.com/blog/claude-code-memory-first-principles-tradeoffs/" rel="noopener noreferrer"&gt;&lt;em&gt;Claude Code Deep Dive Part 4: Why It Uses Markdown Files Instead of Vector DBs&lt;/em&gt;&lt;/a&gt;, walks through the specific design choices in the publicly circulated build snapshot. Here I'll focus on what happens when you put the lossless+exact baseline next to the lossy+approximate contender.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Contender: Lossy + Approximate (&lt;code&gt;claude-mem&lt;/code&gt;)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/thedotmack/claude-mem" rel="noopener noreferrer"&gt;&lt;code&gt;claude-mem&lt;/code&gt;&lt;/a&gt; is among the highest-starred entries in the agent-memory category right now (70k+ GitHub stars as of May 2026). I tested v13.2.0. The architecture, summarized:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Auto-capture firehose.&lt;/strong&gt; Lifecycle hooks (&lt;code&gt;SessionStart&lt;/code&gt;, &lt;code&gt;UserPromptSubmit&lt;/code&gt;, &lt;code&gt;PostToolUse&lt;/code&gt;, &lt;code&gt;Stop&lt;/code&gt;, &lt;code&gt;SessionEnd&lt;/code&gt;) fire on essentially everything the model does. The hooks pipe events to a Bun worker on &lt;code&gt;localhost:37701&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compression to facts/narrative.&lt;/strong&gt; At session boundaries the worker invokes &lt;strong&gt;Haiku 4.5&lt;/strong&gt; to compress raw observations into structured &lt;em&gt;facts&lt;/em&gt; (a JSON array) and a &lt;em&gt;narrative&lt;/em&gt; (a paragraph). This compression runs &lt;strong&gt;on your own Claude subscription&lt;/strong&gt; — billed to your quota, ~5,150 compression tokens per session in my test.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid index.&lt;/strong&gt; Compressed observations are written to two indexes simultaneously: &lt;strong&gt;SQLite-FTS5&lt;/strong&gt; (full-text keyword) and &lt;strong&gt;Chroma&lt;/strong&gt; (vector embeddings). Recall is hybrid — keyword and ANN together.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disables built-in memory.&lt;/strong&gt; Installation sets &lt;code&gt;CLAUDE_CODE_DISABLE_AUTO_MEMORY=1&lt;/code&gt; in &lt;code&gt;~/.claude/settings.json&lt;/code&gt;. The built-in CLAUDE.md path is turned off; &lt;code&gt;claude-mem&lt;/code&gt; is meant to &lt;em&gt;replace&lt;/em&gt;, not augment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardcoded data dir.&lt;/strong&gt; Despite respecting &lt;code&gt;CLAUDE_CONFIG_DIR&lt;/code&gt; for plugin config, the data store path (&lt;code&gt;~/.claude-mem&lt;/code&gt;) is hardcoded. Sandboxing is partial.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mental model: a &lt;strong&gt;derived, lossy materialized view&lt;/strong&gt; of session events, plus a &lt;strong&gt;similarity cache&lt;/strong&gt; for retrieval. Two layers of approximation: a lossy &lt;em&gt;write&lt;/em&gt; transform (Haiku compression) and an approximate &lt;em&gt;read&lt;/em&gt; transform (ANN). Each compounds the other.&lt;/p&gt;

&lt;p&gt;This is exactly what the bottom-right quadrant looks like in deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Test (Real Numbers, 2026-05-20)
&lt;/h2&gt;

&lt;p&gt;To make the comparison concrete I built a small URL-shortener as the test bed: simple enough that the "right answer" was unambiguous, structured enough that real architectural decisions had to be recorded. The setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Test arm:&lt;/strong&gt; &lt;code&gt;claude-mem&lt;/code&gt; v13.2.0, sandboxed via &lt;code&gt;CLAUDE_CONFIG_DIR=/tmp/cmem-test/dot-claude&lt;/code&gt;. Built-in memory disabled (per install default). Session 1 established four decisions about the codebase; Session 2 asked for them back. This is the arm I measured.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Baseline:&lt;/strong&gt; the same four decisions written into a hand-curated &lt;code&gt;CLAUDE.md&lt;/code&gt; — 1,075 chars, ~269 tokens. Built-in memory intact. The baseline numbers in the table below are deterministic properties of how the built-in &lt;code&gt;CLAUDE.md&lt;/code&gt; path works (verbatim recall, no extra round-trip, no compression bill), not a separately measured session.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The decisions in Session 1 (so the comparison is fair):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use Redis for URL storage.&lt;/li&gt;
&lt;li&gt;Generate short codes with base62.&lt;/li&gt;
&lt;li&gt;Add a 30-day TTL.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"We could refresh the TTL on each access if we want sliding expiration."&lt;/em&gt; (Note the hedge — this is the modality test that matters.)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Numbers, side by side:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Measure&lt;/th&gt;
&lt;th&gt;claude-mem (v13.2.0)&lt;/th&gt;
&lt;th&gt;bare CLAUDE.md&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Per recall-cycle tokens&lt;/td&gt;
&lt;td&gt;~6,700&lt;/td&gt;
&lt;td&gt;~280&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;↳ Passive context injection on session start&lt;/td&gt;
&lt;td&gt;~1,050&lt;/td&gt;
&lt;td&gt;~269 (full file)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;↳ &lt;code&gt;mcp-search&lt;/code&gt; retrieval round-trip&lt;/td&gt;
&lt;td&gt;~502&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;↳ Haiku 4.5 compression cost (charged to your quota)&lt;/td&gt;
&lt;td&gt;~5,150 / session&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extra round-trip for details&lt;/td&gt;
&lt;td&gt;yes (~22s)&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fidelity of recall&lt;/td&gt;
&lt;td&gt;lossy (see below)&lt;/td&gt;
&lt;td&gt;100% verbatim&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Built-in memory state&lt;/td&gt;
&lt;td&gt;disabled (&lt;code&gt;CLAUDE_CODE_DISABLE_AUTO_MEMORY=1&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;intact&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compression cost&lt;/td&gt;
&lt;td&gt;on user's Claude subscription&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Headless capture (&lt;code&gt;claude -p&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;zero events&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Upkeep&lt;/td&gt;
&lt;td&gt;automatic&lt;/td&gt;
&lt;td&gt;manual edit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The token gap — 6,700 vs 280 — is meaningful but not the headline. The headline is the &lt;em&gt;fidelity&lt;/em&gt; row.&lt;/p&gt;
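
&lt;p&gt;For anyone checking the arithmetic behind that gap, the per-cycle total is just the sum of the three claude-mem rows in the table. The sketch below uses only the table's approximate figures; the 4-characters-per-token rule of thumb in the last step is my assumption, included to show where a figure like ~269 tokens for a 1,075-character file can come from.&lt;/p&gt;

```python
# Figures from the table above (all approximate).
passive_injection = 1050
search_round_trip = 502
haiku_compression = 5150
claude_md_recall = 280

claude_mem_total = passive_injection + search_round_trip + haiku_compression
ratio = claude_mem_total / claude_md_recall

# Rough size check for the baseline file, assuming about 4 characters per token.
estimated_file_tokens = round(1075 / 4)

print(claude_mem_total)
print(round(ratio, 1))
print(estimated_file_tokens)
```

&lt;p&gt;The compression step accounts for most of the total, so the gap is driven by the write path rather than by retrieval.&lt;/p&gt;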

&lt;h2&gt;
  
  
  The Sharpest Failure: Compression Flattens Modality
&lt;/h2&gt;

&lt;p&gt;The four decisions written in Session 1 included three firm choices and one hedge:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"We use Redis... base62 short codes... 30-day TTL... we &lt;em&gt;could&lt;/em&gt; refresh the TTL on each access &lt;strong&gt;if we want&lt;/strong&gt; sliding expiration."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When I read the raw observation row that &lt;code&gt;claude-mem&lt;/code&gt; wrote, the four items appeared as a flat JSON array — &lt;code&gt;facts: [...]&lt;/code&gt; — with &lt;strong&gt;no modal marker&lt;/strong&gt; distinguishing the hedge from the decisions. The hedge had been flattened into the same shape as the firm choices.&lt;/p&gt;

&lt;p&gt;Session 2 confirmed it. I asked the recalling agent to describe the TTL design. It cheerfully reported "we refresh the TTL on each access for sliding expiration" — as though that had been decided. When I challenged it directly, its own reply was that it could not distinguish firm decisions from options that had merely been considered. The compressed &lt;code&gt;facts[]&lt;/code&gt; row it was reading from preserved the content of each item but not its modal status — what I'll call its &lt;em&gt;epistemic status&lt;/em&gt; throughout the rest of this post.&lt;/p&gt;

&lt;p&gt;That's the failure. &lt;strong&gt;The lossy layer loses epistemic status, not just bytes.&lt;/strong&gt; A &lt;em&gt;maybe&lt;/em&gt; becomes a &lt;em&gt;decision&lt;/em&gt;. The recalling agent has no way to know it shouldn't trust the row.&lt;/p&gt;

&lt;p&gt;This is worse than no memory. An agent with no memory has to ask, or reread, or check the code. An agent with confident-wrong memory acts. The cost of acting on a fabricated decision compounds: now there's code (or a PR, or an architectural note) committed under the false premise, and &lt;em&gt;that&lt;/em&gt; will be the next round's input.&lt;/p&gt;

&lt;p&gt;The generalization: any LLM compression step that maps "speech-act varieties" (decisions, hypotheses, questions, jokes, hedges) onto a single typed structure — like a &lt;code&gt;facts[]&lt;/code&gt; array — loses the modal axis. To preserve it, you'd need to compress into a &lt;em&gt;richer&lt;/em&gt; schema (with &lt;code&gt;kind: 'decision' | 'option' | 'question'&lt;/code&gt; per item), and you'd need the compression model to reliably tag the modality. Haiku 4.5 didn't tag it. Whether a more careful prompt or schema would is an open question, but it's a &lt;em&gt;design&lt;/em&gt; question the current tool doesn't even pose.&lt;/p&gt;
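
&lt;p&gt;To make the "richer schema" concrete, here is a minimal sketch of what a modality-preserving record could look like, using the four Session 1 items. This is not claude-mem's schema; the &lt;code&gt;MemoryItem&lt;/code&gt; class and the two helper functions are hypothetical, and the sketch assumes the compression model tags each item correctly, which is the hard part.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Literal

Kind = Literal["decision", "option", "question"]


@dataclass(frozen=True)
class MemoryItem:
    text: str
    kind: Kind  # epistemic status travels with the content


items = [
    MemoryItem("Use Redis for URL storage.", "decision"),
    MemoryItem("Generate short codes with base62.", "decision"),
    MemoryItem("Add a 30-day TTL.", "decision"),
    MemoryItem("Refresh the TTL on each access for sliding expiration.", "option"),
]


def firm_decisions(memory):
    """Only items tagged as decisions may be reported as settled."""
    return [item.text for item in memory if item.kind == "decision"]


def open_options(memory):
    """Hedged items stay visible, but separate from what was decided."""
    return [item.text for item in memory if item.kind == "option"]


print(firm_decisions(items))
print(open_options(items))
```

&lt;p&gt;With a record like this, a recalling agent asked about the TTL design could report the 30-day TTL as decided and sliding expiration as an option still on the table. Note that &lt;code&gt;Literal&lt;/code&gt; is a type hint and is not enforced at runtime, so a real pipeline would also need to validate the tag.&lt;/p&gt;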

&lt;h2&gt;
  
  
  Six Measured Findings, Versioned to v13.2.0
&lt;/h2&gt;

&lt;p&gt;Here are the six things I measured, in one place. They are versioned because tool behavior changes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Replaces, not augments.&lt;/strong&gt; Install sets &lt;code&gt;CLAUDE_CODE_DISABLE_AUTO_MEMORY=1&lt;/code&gt;. Built-in memory is turned off. Default deployment is single-system, not hybrid.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partial sandbox.&lt;/strong&gt; &lt;code&gt;CLAUDE_CONFIG_DIR&lt;/code&gt; redirects plugin config, but the data store path &lt;code&gt;~/.claude-mem&lt;/code&gt; is hardcoded. Multi-tenant isolation is incomplete.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compression runs on your subscription.&lt;/strong&gt; Haiku 4.5 compresses observations to ~5,150 tokens per session, billed to &lt;em&gt;your&lt;/em&gt; Anthropic quota. Free tools that consume your paid quota deserve a footnote.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invisible to headless mode.&lt;/strong&gt; &lt;code&gt;claude -p&lt;/code&gt; runs (non-interactive) emit &lt;em&gt;zero&lt;/em&gt; capture events in my tests. The lifecycle hooks fire only in interactive sessions. CI users and automation pipelines get no memory at all.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compression flattens modality&lt;/strong&gt; (the sharpest finding, detailed above). A hedge becomes a flat fact, indistinguishable from a firm decision in the compressed &lt;code&gt;facts[]&lt;/code&gt; schema.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token economics lose on small projects.&lt;/strong&gt; ~6,700 tokens per recall cycle (passive inject + mcp-search round-trip + Haiku compression) versus ~280 deterministic, 100%-faithful tokens for the CLAUDE.md baseline, roughly a 24× gap. On a 1,000-line project, that token overhead outweighs any retrieval benefit the tool provides.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Tool-version drift is real — by the time you read this, some of these may have been fixed. The cache-coherence framing in the next section is version-independent and was the actual reason I wrote the post.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Is a Cache Problem, Precisely
&lt;/h2&gt;

&lt;p&gt;The distributed-systems vocabulary for this design is &lt;em&gt;materialized view of a source&lt;/em&gt;, &lt;em&gt;served from a similarity cache&lt;/em&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Write-time lossy transform&lt;/strong&gt; = a materialized view that can drift from the source of truth (the actual codebase, the actual decisions). The source is the user's intent and the live code; the view is the compressed facts/narrative. Each write step can lose information that the view will never recover.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read-time ANN&lt;/strong&gt; = approximate retrieval. Top-K nearest neighbors. False positives are structural, not a bug — a sufficiently-similar wrong row will be returned with confidence indistinguishable from the right one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No coherence with the source.&lt;/strong&gt; Classical caches have &lt;strong&gt;invalidation hooks&lt;/strong&gt;: write policies (write-through, write-back) that propagate updates, and coherence protocols (bus snooping, MESI states) that evict or refresh stale lines. Both tie cache lines back to the canonical source. &lt;code&gt;claude-mem&lt;/code&gt; has &lt;em&gt;no&lt;/em&gt; tie to the codebase or to user-issued corrections. If you reverse a decision in conversation, the memory still believes the original.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Staleness without expiry.&lt;/strong&gt; Even without explicit invalidation, classical caches use TTLs to bound staleness. &lt;code&gt;claude-mem&lt;/code&gt; has no TTL on facts. A fact written six months ago competes for retrieval with one written yesterday, and the older one might win the vector hop.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the cache-coherence frame is the right frame, the literature is rich and useful. Pat Helland's &lt;em&gt;Immutability Changes Everything&lt;/em&gt; (ACM Queue, 2015) and the broader databases-and-OS literature on cache-coherence protocols (MESI / MOESI), materialized-view invalidation, and write-through vs write-back are the right starting reading. The trades they describe — staleness vs cost, eventual vs strong coherence, when to flush, when to invalidate — are the &lt;em&gt;same trades&lt;/em&gt; the agent-memory community is rediscovering with fresh names.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBzcmNbIlNvdXJjZSBvZiB0cnV0aCJdCiAgICAgICAgczFbIlVzZXIgaW50ZW50PGJyLz5MaXZlIGNvZGU8YnIvPkNvbnZlcnNhdGlvbiJdCiAgICBlbmQKICAgIHN1YmdyYXBoIHdyaXRlWyJXcml0ZSBwYXRoIChsb3NzeSkiXQogICAgICAgIHcxWyJIYWlrdSA0LjU8YnIvPmNvbXByZXNzaW9uIl0KICAgICAgICB3MlsiZmFjdHNbXSDCtyBuYXJyYXRpdmU8YnIvPihubyBtb2RhbGl0eSwgbm8gcHJvdmVuYW5jZSkiXQogICAgZW5kCiAgICBzdWJncmFwaCBzdG9yZVsiTWF0ZXJpYWxpemVkIHZpZXciXQogICAgICAgIHN0MVsiU1FMaXRlIEZUUzUiXQogICAgICAgIHN0MlsiQ2hyb21hIHZlY3RvcnMiXQogICAgZW5kCiAgICBzdWJncmFwaCByZWFkWyJSZWFkIHBhdGggKGFwcHJveGltYXRlKSJdCiAgICAgICAgcjFbInF1ZXJ5IHZlY3Rvcjxici8-KyBrZXl3b3JkIl0KICAgICAgICByMlsidG9wLUsgaGl0cyJdCiAgICBlbmQKCiAgICBzMSAtLT58aG9vayBjYXB0dXJlc3wgdzEKICAgIHcxIC0tPiB3MgogICAgdzIgLS0-IHN0MQogICAgdzIgLS0-IHN0MgogICAgc3QxIC0tPiByMgogICAgc3QyIC0tPiByMgogICAgcjEgLS0-IHIyCiAgICByMiAtLT58ImluamVjdGVkIGFzIGZhY3QifCBzMQoKICAgIGNsYXNzRGVmIHNyYyBmaWxsOiNlNmZmZmEsc3Ryb2tlOiMzMTk3OTUKICAgIGNsYXNzRGVmIGxvc3N5IGZpbGw6I2ZlZDdkNyxzdHJva2U6I2M1MzAzMAogICAgY2xhc3NEZWYgc3RvcmUgZmlsbDojZmVmNWU3LHN0cm9rZTojYjc3OTFmCiAgICBjbGFzc0RlZiByZWFkIGZpbGw6I2ViZjRmZixzdHJva2U6IzVhNjdkOAogICAgY2xhc3MgczEgc3JjCiAgICBjbGFzcyB3MSx3MiBsb3NzeQogICAgY2xhc3Mgc3QxLHN0MiBzdG9yZQogICAgY2xhc3MgcjEscjIgcmVhZA%3D%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBzcmNbIlNvdXJjZSBvZiB0cnV0aCJdCiAgICAgICAgczFbIlVzZXIgaW50ZW50PGJyLz5MaXZlIGNvZGU8YnIvPkNvbnZlcnNhdGlvbiJdCiAgICBlbmQKICAgIHN1YmdyYXBoIHdyaXRlWyJXcml0ZSBwYXRoIChsb3NzeSkiXQogICAgICAgIHcxWyJIYWlrdSA0LjU8YnIvPmNvbXByZXNzaW9uIl0KICAgICAgICB3MlsiZmFjdHNbXSDCtyBuYXJyYXRpdmU8YnIvPihubyBtb2RhbGl0eSwgbm8gcHJvdmVuYW5jZSkiXQogICAgZW5kCiAgICBzdWJncmFwaCBzdG9yZVsiTWF0ZXJpYWxpemVkIHZpZXciXQogICAgICAgIHN0MVsiU1FMaXRlIEZUUzUiXQogICAgICAgIHN0MlsiQ2hyb21hIHZlY3RvcnMiXQogICAgZW5kCiAgICBzdWJncmFwaCByZWFkWyJSZWFkIHBhdGggKGFwcHJveGltYXRlKSJdCiAgICAgICAgcjFbInF1ZXJ5IHZlY3Rvcjxici8-KyBrZXl3b3JkIl0KICAgICAgICByMlsidG9wLUsgaGl0cyJdCiAgICBlbmQKCiAgICBzMSAtLT58aG9vayBjYXB0dXJlc3wgdzEKICAgIHcxIC0tPiB3MgogICAgdzIgLS0-IHN0MQogICAgdzIgLS0-IHN0MgogICAgc3QxIC0tPiByMgogICAgc3QyIC0tPiByMgogICAgcjEgLS0-IHIyCiAgICByMiAtLT58ImluamVjdGVkIGFzIGZhY3QifCBzMQoKICAgIGNsYXNzRGVmIHNyYyBmaWxsOiNlNmZmZmEsc3Ryb2tlOiMzMTk3OTUKICAgIGNsYXNzRGVmIGxvc3N5IGZpbGw6I2ZlZDdkNyxzdHJva2U6I2M1MzAzMAogICAgY2xhc3NEZWYgc3RvcmUgZmlsbDojZmVmNWU3LHN0cm9rZTojYjc3OTFmCiAgICBjbGFzc0RlZiByZWFkIGZpbGw6I2ViZjRmZixzdHJva2U6IzVhNjdkOAogICAgY2xhc3MgczEgc3JjCiAgICBjbGFzcyB3MSx3MiBsb3NzeQogICAgY2xhc3Mgc3QxLHN0MiBzdG9yZQogICAgY2xhc3MgcjEscjIgcmVhZA%3D%3D" alt="flowchart LR" width="1357" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What's crucially missing from this diagram is a back-edge from "source of truth" to the materialized view that fires when the source changes. That's the &lt;strong&gt;invalidation arrow&lt;/strong&gt;. Its absence is the structural reason &lt;code&gt;claude-mem&lt;/code&gt; gets wrong-row retrieval on decisions the user has reversed. Until something supplies that arrow, the system is best understood as a write-only cache.&lt;/p&gt;
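&lt;p&gt;To make the missing arrow concrete, here is a toy sketch of a view that does have one. Everything in it is an assumption for illustration: the &lt;code&gt;MemoryView&lt;/code&gt; class, the &lt;code&gt;source_path&lt;/code&gt; and &lt;code&gt;source_hash&lt;/code&gt; fields, and the use of a coarse content hash as a stand-in for real contradiction detection. It is not how &lt;code&gt;claude-mem&lt;/code&gt; works.&lt;/p&gt;

```python
import hashlib
from dataclasses import dataclass


def fingerprint(content: str) -> str:
    """Stable hash of a source file's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass
class StoredFact:
    text: str
    source_path: str   # hypothetical field: the file this fact was derived from
    source_hash: str   # fingerprint of that file when the fact was written
    needs_revalidation: bool = False


class MemoryView:
    """Toy materialized view with an invalidation edge from the source."""

    def __init__(self) -> None:
        self.facts: list[StoredFact] = []

    def write(self, text: str, source_path: str, source_content: str) -> None:
        self.facts.append(StoredFact(text, source_path, fingerprint(source_content)))

    def on_source_changed(self, source_path: str, new_content: str) -> int:
        """Flag facts whose source no longer matches; return how many were newly flagged."""
        new_hash = fingerprint(new_content)
        newly_flagged = 0
        for fact in self.facts:
            stale = fact.source_path == source_path and fact.source_hash != new_hash
            if stale and not fact.needs_revalidation:
                fact.needs_revalidation = True
                newly_flagged += 1
        return newly_flagged

    def recall(self) -> list[str]:
        """Serve only facts that have not been flagged for re-validation."""
        return [fact.text for fact in self.facts if not fact.needs_revalidation]
```

&lt;p&gt;The hash check over-invalidates, since any edit to the file flags every fact derived from it. That is the conservative side of the trade: a flagged fact costs a re-check, while an unflagged stale fact costs a confident-wrong recall.&lt;/p&gt;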

&lt;h2&gt;
  
  
  When Each Wins
&lt;/h2&gt;

&lt;p&gt;Cost-curve thinking (the same frame used in the &lt;a href="https://harrisonsec.com/blog/agent-retrieval-cost-curve-claude-code-grep-vs-rag/" rel="noopener noreferrer"&gt;companion piece&lt;/a&gt;) gives a clean answer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lossless + Exact wins when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Project size is small or scope is clear.&lt;/strong&gt; Curation is cheap; the manual upkeep budget is small.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fidelity matters.&lt;/strong&gt; You need to recall the &lt;em&gt;exact&lt;/em&gt; decision, not a vibe of it. Coding agents, design decisions, security policy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The author exists and is engaged.&lt;/strong&gt; Someone is willing to write three lines into &lt;code&gt;MEMORY.md&lt;/code&gt; when a decision is made.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Lossy + Approximate wins when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;History is too big to hand-curate.&lt;/strong&gt; A year of conversations across multiple contributors, none of whom can be expected to maintain a &lt;code&gt;MEMORY.md&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coverage matters more than precision.&lt;/strong&gt; You'd rather have a fuzzy memory that something was discussed than no memory at all. Customer-support agents over a year of tickets; team retrospectives over a quarter of standups.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The cost of acting on a fabricated fact is low.&lt;/strong&gt; When a support agent recalls something confidently wrong, the user corrects it and the agent replies "sorry, let me check." The same recall in a coding agent ships broken code to production. The blast radius of a false positive determines the budget for accepting one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The rule of thumb: &lt;strong&gt;fuzzy-but-present beats precise-but-absent&lt;/strong&gt;, but only when the false-positive cost is low enough to absorb. For coding work on a 5,000-LOC project, it isn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Decision Framework
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;Lossless + Exact&lt;/th&gt;
&lt;th&gt;Lossy + Approximate&lt;/th&gt;
&lt;th&gt;Hybrid&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single project, &amp;lt; 50k LOC&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-project / multi-year history&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decisions need exact recall&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vague-but-present recall is acceptable&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Author is engaged (willing to curate)&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No human curator available&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost of confident-wrong is high (production code, money)&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost of confident-wrong is low (suggestion, search)&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;You can pay the Haiku compression bill from your quota&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;You operate headlessly or via CI&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For most readers of this blog — engineers working on a single non-trivial codebase, where decisions matter and confident-wrong is expensive — the columns lean hard left.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a Coherent Agent Memory System Would Need
&lt;/h2&gt;

&lt;p&gt;The interesting question, once you accept the cache-coherence frame, is: &lt;em&gt;what would the lossy + approximate corner look like if it were built like a real cache instead of a write-only one?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A short list of capabilities the current generation of "AI memory" tools is missing, and which any serious system in this space will eventually have to ship:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Source pointer (provenance).&lt;/strong&gt; Every fact carries a back-pointer to the originating event: timestamp, session ID, the raw transcript turn or tool result it was derived from. Without this, you can't audit a wrong recall — you only see the fact, never its lineage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modality tagging.&lt;/strong&gt; Every fact tagged with epistemic status — &lt;code&gt;decision | option_considered | hypothesis | question | observation&lt;/code&gt; — at write time, by the compression model. Without this, the system loses what the failure section above showed: the difference between &lt;em&gt;we will&lt;/em&gt; and &lt;em&gt;we could&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supersedes / invalidation chain.&lt;/strong&gt; A later fact can declare an earlier fact superseded ("decision A was reversed on date T by B"). Recall surfaces the &lt;em&gt;latest applicable&lt;/em&gt; fact, not the most semantically similar one. This is the in-band invalidation classical caches use; agent memory currently has none.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expiry / TTL by class.&lt;/strong&gt; Decisions might be permanent; observations rot fast ("the build was passing this morning" should not influence behavior at 4 PM). Different fact classes get different TTLs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invalidation hook tied to the source of truth.&lt;/strong&gt; When the underlying codebase or document changes in a way that contradicts a stored fact, the fact gets flagged for re-validation. This is the &lt;em&gt;invalidation&lt;/em&gt; arrow that the cache diagram earlier is missing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confidence surfaced to the caller.&lt;/strong&gt; Instead of returning a flat string, return &lt;code&gt;{value, confidence, provenance}&lt;/code&gt;. The recalling agent then knows when to trust, when to double-check, when to ignore.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of these is novel. All of them are standard in production cache systems, query planners, and event-sourcing stores. They're hard to retrofit onto a system that wasn't designed with provenance and invalidation as first-class concerns. They're not hard to design in from the start — but doing so means giving up the "just drop a hook on everything, ship next week" simplicity that makes the current crop of tools accumulate stars.&lt;/p&gt;

&lt;p&gt;If you're building an agent that has to &lt;em&gt;act&lt;/em&gt; on its memory rather than just &lt;em&gt;display&lt;/em&gt; it, this list is the spec.&lt;/p&gt;
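&lt;p&gt;As a rough sketch of how several items on that list fit together, the snippet below combines provenance, modality, a supersedes chain, per-class TTLs, and a structured return value. The field names, the TTL values, and the example facts are all hypothetical, chosen only to illustrate the shape; the source-tied invalidation hook (item 5) is left out to keep it short.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical TTLs per fact class; None means the fact never expires.
TTL_BY_KIND = {
    "decision": None,
    "option_considered": timedelta(days=30),
    "hypothesis": timedelta(days=7),
    "question": timedelta(days=7),
    "observation": timedelta(hours=4),
}


@dataclass
class MemoryFact:
    fact_id: str
    text: str
    kind: str                         # modality tag, set at write time
    written_at: datetime              # provenance: when
    session_id: str                   # provenance: where it came from
    confidence: float
    supersedes: Optional[str] = None  # fact_id this fact replaces, if any


def is_live(fact: MemoryFact, now: datetime) -> bool:
    """A fact is live if its class has no TTL or the TTL has not yet elapsed."""
    ttl = TTL_BY_KIND[fact.kind]
    return ttl is None or fact.written_at + ttl > now


def recall(facts: list[MemoryFact], now: datetime) -> list[dict]:
    """Return live, non-superseded facts as {value, confidence, provenance} records."""
    superseded = {f.supersedes for f in facts if f.supersedes is not None}
    results = []
    for f in facts:
        if f.fact_id in superseded or not is_live(f, now):
            continue
        results.append({
            "value": f.text,
            "kind": f.kind,
            "confidence": f.confidence,
            "provenance": {"session_id": f.session_id, "written_at": f.written_at.isoformat()},
        })
    return results


# Illustrative data only: a reversed decision and a stale observation.
now = datetime(2026, 5, 20, 16, 0)
facts = [
    MemoryFact("f1", "30-day fixed TTL", "decision", datetime(2026, 5, 1, 9, 0), "s1", 0.9),
    MemoryFact("f2", "sliding expiration replaces the fixed TTL", "decision",
               datetime(2026, 5, 10, 9, 0), "s2", 0.9, supersedes="f1"),
    MemoryFact("f3", "the build was passing this morning", "observation",
               datetime(2026, 5, 20, 8, 0), "s3", 0.6),
]
```

&lt;p&gt;Calling &lt;code&gt;recall(facts, now)&lt;/code&gt; on this data surfaces only the superseding decision: the reversed one is filtered by the chain, and the morning's build observation has aged out by 4 PM. Filtering happens before any similarity ranking, which is the point of item 3.&lt;/p&gt;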

&lt;h2&gt;
  
  
  Closing — "Persistent Memory" Is a Trade, Not a Feature
&lt;/h2&gt;

&lt;p&gt;"Persistent AI memory" gets talked about like a feature you turn on. It isn't. It's a choice on the &lt;em&gt;two-axis&lt;/em&gt; design space above, and every position on that space has known failure modes. The lossless-and-exact corner has the upkeep cost and the coverage limit. The lossy-and-approximate corner has the staleness, the wrong-row retrieval, and — the finding I came away most surprised by — the loss of modality.&lt;/p&gt;

&lt;p&gt;Two takeaways worth carrying away:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;If someone is selling you "automatic AI memory," ask which quadrant.&lt;/strong&gt; If the answer is &lt;em&gt;lossy + approximate&lt;/em&gt;, ask the six questions from the section above: provenance, modality, supersedes, TTL, invalidation, confidence. If the answer to most of them is "the embeddings handle it," you're being sold a cache wearing the word &lt;em&gt;memory&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The classical literature is the right starting point.&lt;/strong&gt; Cache coherence, write-back vs write-through, eventual vs strong consistency, materialized-view invalidation — the database and OS communities have spent forty years working through these tradeoffs. Reading their writing is more useful than reading the latest agent-memory thread.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The companion to this post, &lt;a href="https://harrisonsec.com/blog/agent-retrieval-cost-curve-claude-code-grep-vs-rag/" rel="noopener noreferrer"&gt;&lt;em&gt;Agent Retrieval Is a Cost Curve Problem&lt;/em&gt;&lt;/a&gt;, makes a parallel argument about &lt;em&gt;within-session&lt;/em&gt; code retrieval: that Claude Code's "use grep, not RAG" choice isn't romance ("trust the model") but math (cost curves), and that the source code shows Anthropic A/B-testing alternative retrieval architectures (Explore vs Fork) in production. Read together, the two pieces add up to a coherent stance about Anthropic's bets across the &lt;em&gt;fidelity × retrieval&lt;/em&gt; design space: in both within-session code search and cross-session memory, the default is &lt;strong&gt;lossless + exact&lt;/strong&gt;, and the alternative branches are kept gated behind feature flags so the decisions can flip when the cost curves do. The memory side alone has at least four such gates visible in the snapshot I reviewed (&lt;code&gt;tengu_coral_fern&lt;/code&gt;, &lt;code&gt;tengu_herring_clock&lt;/code&gt;, &lt;code&gt;tengu_passport_quail&lt;/code&gt;, &lt;code&gt;tengu_slate_thimble&lt;/code&gt;), plus the build-time &lt;code&gt;KAIROS&lt;/code&gt;, &lt;code&gt;TEAMMEM&lt;/code&gt;, and &lt;code&gt;EXTRACT_MEMORIES&lt;/code&gt; gates in &lt;code&gt;src/memdir/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That stance is not "we trust the model." It's: read the cost curves, build for the curves' current shape, leave the toggles in for when they shift.&lt;/p&gt;

&lt;h2&gt;
  
  
  Appendix: How I Measured This
&lt;/h2&gt;

&lt;p&gt;For the reader who wants to reproduce — or, more usefully, who wants to know exactly what was and wasn't measured.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Versions and environment.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;claude-mem&lt;/code&gt; v13.2.0 (npm install via the project's standard install script)&lt;/li&gt;
&lt;li&gt;Claude Code: current public release at time of test (2026-05-20)&lt;/li&gt;
&lt;li&gt;macOS 25.4.0, zsh&lt;/li&gt;
&lt;li&gt;Sandbox: &lt;code&gt;CLAUDE_CONFIG_DIR=/tmp/cmem-test/dot-claude&lt;/code&gt; for the plugin config; data store at &lt;code&gt;~/.claude-mem&lt;/code&gt; (this directory location is hardcoded inside the tool — see Finding #2)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Test project.&lt;/strong&gt;&lt;br&gt;
A small URL-shortener spec written from scratch, with four decisions in Session 1: (1) Redis for URL storage, (2) base62 short-code generation, (3) 30-day TTL, (4) the hedged sliding-expiration option that became the modality test.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Commands and protocol.&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fresh &lt;code&gt;claude-mem&lt;/code&gt; install into the sandboxed &lt;code&gt;CLAUDE_CONFIG_DIR&lt;/code&gt;. Verified install behavior set &lt;code&gt;CLAUDE_CODE_DISABLE_AUTO_MEMORY=1&lt;/code&gt; in &lt;code&gt;settings.json&lt;/code&gt; (Finding #1).&lt;/li&gt;
&lt;li&gt;Session 1: interactive Claude Code session, walked through the four decisions with the tool actively capturing via lifecycle hooks. Watched the Bun worker on &lt;code&gt;localhost:37701&lt;/code&gt; accept events.&lt;/li&gt;
&lt;li&gt;Session ended; session-boundary compression fired; Haiku-compressed &lt;code&gt;facts[]&lt;/code&gt; and &lt;code&gt;narrative&lt;/code&gt; written to SQLite-FTS5 + Chroma. Token count for the compression call read from the API trace.&lt;/li&gt;
&lt;li&gt;Session 2: fresh interactive session; queried for each of the four decisions; observed the recall path (mcp-search round-trip; ~22s extra latency).&lt;/li&gt;
&lt;li&gt;Compared retrieved content against the original Session 1 transcript byte-by-byte to identify the modality flattening (Finding #5).&lt;/li&gt;
&lt;li&gt;Repeated the install + Session 1 pattern in headless &lt;code&gt;claude -p&lt;/code&gt; mode to confirm Finding #4 (no events captured).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;What I'm explicitly &lt;em&gt;not&lt;/em&gt; claiming.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This is a single test run on a deliberately small project. The N is 1.&lt;/li&gt;
&lt;li&gt;The CLAUDE.md baseline column in the test table is &lt;em&gt;not&lt;/em&gt; a separately measured comparison session. It reflects deterministic properties of the built-in &lt;code&gt;CLAUDE.md&lt;/code&gt; path (verbatim recall, no compression bill, no extra round-trip) that follow from the design — not a measured outcome.&lt;/li&gt;
&lt;li&gt;I didn't benchmark Chroma vector recall quality across many queries. The modality finding came from a single targeted probe (the TTL question); the cache-coherence framing predicts the same class of failure across many queries, but predicting and measuring are different.&lt;/li&gt;
&lt;li&gt;I tested v13.2.0. The tool is actively developed; specific findings may have been addressed in later releases by the time you read this. The cache-coherence framing is version-independent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Artifacts.&lt;/strong&gt; Session 1 / Session 2 transcripts, the raw &lt;code&gt;claude-mem&lt;/code&gt; SQLite + Chroma snapshots, and the token-cost API traces from the test run are kept locally and can be made available on reasonable request — get in touch.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Companion piece:&lt;/em&gt; &lt;strong&gt;&lt;a href="https://harrisonsec.com/blog/agent-retrieval-cost-curve-claude-code-grep-vs-rag/" rel="noopener noreferrer"&gt;Agent Retrieval Is a Cost Curve Problem: Why Claude Code Doesn't Use RAG&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Background:&lt;/em&gt; &lt;strong&gt;&lt;a href="https://harrisonsec.com/blog/consistency-scenarios-and-approaches-production/" rel="noopener noreferrer"&gt;Consistency in Distributed Systems: Scenarios, Trade-offs, and What Actually Works&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;For the design rationale behind Claude Code's built-in memory in particular:&lt;/em&gt; &lt;strong&gt;&lt;a href="https://harrisonsec.com/blog/claude-code-memory-first-principles-tradeoffs/" rel="noopener noreferrer"&gt;Claude Code Deep Dive Part 4: Why It Uses Markdown Files Instead of Vector DBs&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>memory</category>
      <category>claudecode</category>
      <category>programming</category>
    </item>
    <item>
      <title>Agent Retrieval Is a Cost Curve Problem: Why Claude Code Doesn't Use RAG</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Tue, 26 May 2026 03:12:20 +0000</pubDate>
      <link>https://dev.to/harrisonsec/agent-retrieval-is-a-cost-curve-problem-why-claude-code-doesnt-use-rag-5c6m</link>
      <guid>https://dev.to/harrisonsec/agent-retrieval-is-a-cost-curve-problem-why-claude-code-doesnt-use-rag-5c6m</guid>
      <description>&lt;p&gt;There's a popular interview question making the rounds: &lt;em&gt;"Why doesn't Claude Code use RAG to retrieve code? Why grep?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The popular answer goes: chunking breaks code structure, vectors approximate when code demands exact, indexes go stale, cold-start is slow, retrieval is a black box. All five are real. None of them are the reason.&lt;/p&gt;

&lt;p&gt;They're symptoms. The reason is older than RAG, older than LLMs, older than the term &lt;em&gt;retrieval&lt;/em&gt;. It's a &lt;strong&gt;cost curve&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt;: Index-based retrieval pays a high build cost plus a maintenance cost that grows at least linearly, and can grow faster, with churn × index complexity. LLM tool-loop retrieval pays nothing up front and a per-query cost that's roughly project-size-independent for queries an LLM actually issues. For most small-to-mid-size repos the crossover is never reached. The "Anthropic trusts the model" framing is romantic; the actual answer is colder: build cost is zero and the extra per-query cost stays below what index upkeep would cost, so the math says grep.&lt;/p&gt;

&lt;p&gt;There's also a precision axis, which most engineers care about more than cost. Vector RAG is approximate by design: a query for &lt;code&gt;getUserById&lt;/code&gt; also returns &lt;code&gt;getUserByEmail&lt;/code&gt; because the two are semantically adjacent. Code usually wants &lt;em&gt;exact&lt;/em&gt;, which grep gives you for free. &lt;em&gt;Symbol-graph&lt;/em&gt; indexes (Sourcegraph, Kythe, LSP) are precision-first but haven't become the LLM companion either (covered below).&lt;/p&gt;

&lt;p&gt;Audited against a &lt;a href="https://harrisonsec.com/blog/claude-code-source-leaked-hidden-features/" rel="noopener noreferrer"&gt;publicly circulated build snapshot&lt;/a&gt; of Claude Code with file:line citations. The kicker: the Explore subagent's "use this when you'll need more than 3 queries" rule is gated behind a feature flag (&lt;code&gt;tengu_amber_stoat&lt;/code&gt;) and being A/B-tested against a parallel architecture (Fork). The canonical answer is conditional. &lt;em&gt;That&lt;/em&gt; is the answer that gets you the offer.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Frame That Matters: Total Cost Over Time
&lt;/h2&gt;

&lt;p&gt;Pick any retrieval system and you pay for three things, on different schedules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Build cost&lt;/strong&gt; — one-time work to assemble whatever structure makes lookup fast. For an index, this is chunking + embedding + insert. For tool-loops, this is zero.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintain cost&lt;/strong&gt; — ongoing work to keep the structure honest as the underlying data changes. For an index, this is invalidation, reindex, drift reconciliation. For tool-loops, this is also zero — the "structure" is the live filesystem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-query cost&lt;/strong&gt; — work done when a question arrives. For an index, this is a vector search + a few reranks + an LLM call. For tool-loops, this is N LLM-tool round-trips, where N varies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The temptation is to compare per-query cost: "vector search is one round-trip, tool-loop is six." That's why RAG looks dominant on a whiteboard. But you ship a system, not a whiteboard. The bill is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;total_cost = build_cost + maintain_cost × time + per_query_cost × queries
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a project that changes daily and gets queried hourly, the term that actually grows is &lt;code&gt;maintain_cost × time&lt;/code&gt;. For index-based retrieval on a churning codebase, maintain cost grows at least linearly with churn — and &lt;em&gt;can&lt;/em&gt; grow faster than teams expect, because cross-chunk and cross-file references force cascading re-embeddings and symbol-graph consistency checks. A naive incremental indexer is linear; a correct one tracking cross-file refactors is often worse than linear in the worst case. For tool-loops, maintain cost is identically zero, because the loop has no persistent structure.&lt;/p&gt;

&lt;p&gt;The build/maintain term dominates anything you save on per-query cost, until your project is large enough that per-query cost itself becomes the bottleneck. For most small-to-mid-size repos, the crossover is never reached.&lt;/p&gt;
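&lt;p&gt;A small worked version of that formula, as a sketch. It assumes constant per-day maintenance and constant per-query costs, which is simpler than the churn-dependent growth described above, and every number in the usage note is a made-up unit cost rather than a measurement. The function &lt;code&gt;crossover_queries_per_day&lt;/code&gt; is a hypothetical helper, not something from the audited source.&lt;/p&gt;

```python
from typing import Optional


def total_cost(build_cost: float, maintain_cost: float, time: float,
               per_query_cost: float, queries: float) -> float:
    """total_cost = build_cost + maintain_cost × time + per_query_cost × queries"""
    return build_cost + maintain_cost * time + per_query_cost * queries


def crossover_queries_per_day(build_cost: float, maintain_cost_per_day: float, days: float,
                              index_query_cost: float, loop_query_cost: float) -> Optional[float]:
    """Daily query rate at which an index breaks even with a tool-loop over the horizon.

    Returns None when the index saves nothing per query, so no crossover exists.
    """
    saving_per_query = loop_query_cost - index_query_cost
    if not saving_per_query > 0:
        return None
    fixed_overhead = build_cost + maintain_cost_per_day * days
    return fixed_overhead / (saving_per_query * days)
```

&lt;p&gt;With illustrative inputs (build 1000, maintenance 50 per day, a 30-day horizon, 1 unit per index query, 6 units per tool-loop query), the break-even rate is 2500 / 150, about 16.7 queries per day. At 10 queries per day the tool-loop totals 1800 against the index's 2800; at 20 per day the index totals 3100 against the loop's 3600. Raising maintenance cost pushes the break-even rate up, which is the direction the argument above depends on.&lt;/p&gt;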

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBzbWFsbFsiTW9zdCBwcm9qZWN0cyDCtyBncmVwIHdpbnMiXQogICAgICAgIHMxWyJidWlsZDogMDxici8-bWFpbnRhaW46IDA8YnIvPnF1ZXJ5OiB-TiDDlyB0b29sIHJvdW5kLXRyaXAiXQogICAgZW5kCiAgICBzdWJncmFwaCBjcm9zc292ZXJbIkNyb3Nzb3ZlciDCtyB3b3JrbG9hZC1kZXBlbmRlbnQiXQogICAgICAgIGMxWyJpbmRleCBidWlsZCBjb3N0PGJyLz5hbW9ydGl6ZXMgd2hlbjxici8-cXVlcnkgcmF0ZSDDlyBwcmVjaXNpb24gc2F2aW5nczxici8-ZXhjZWVkcyBtYWludGFpbiBjb3N0Il0KICAgIGVuZAogICAgc3ViZ3JhcGggaHVnZVsiTWVnYS1tb25vcmVwbyDCtyBpbmRleCB3aW5zIl0KICAgICAgICBoMVsiYnVpbGQ6IGhpZ2g8YnIvPm1haW50YWluOiBoaWdoIGJ1dCBhbW9ydGl6ZWQ8YnIvPmFjcm9zcyBtaWxsaW9ucyBvZiBxdWVyaWVzPGJyLz5xdWVyeTogc2luZ2xlIHZlY3RvciBob3AiXQogICAgZW5kCgogICAgc21hbGwgLS0-IGNyb3Nzb3ZlciAtLT4gaHVnZQoKICAgIGNsYXNzRGVmIHNtYWxsIGZpbGw6I2YwZmZmNCxzdHJva2U6IzJmODU1YQogICAgY2xhc3NEZWYgY3Jvc3MgZmlsbDojZmVmNWU3LHN0cm9rZTojYjc3OTFmCiAgICBjbGFzc0RlZiBiaWcgZmlsbDojZmVkN2Q3LHN0cm9rZTojYzUzMDMwCiAgICBjbGFzcyBzbWFsbCBzbWFsbAogICAgY2xhc3MgY3Jvc3NvdmVyIGNyb3NzCiAgICBjbGFzcyBodWdlIGJpZw%3D%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBzbWFsbFsiTW9zdCBwcm9qZWN0cyDCtyBncmVwIHdpbnMiXQogICAgICAgIHMxWyJidWlsZDogMDxici8-bWFpbnRhaW46IDA8YnIvPnF1ZXJ5OiB-TiDDlyB0b29sIHJvdW5kLXRyaXAiXQogICAgZW5kCiAgICBzdWJncmFwaCBjcm9zc292ZXJbIkNyb3Nzb3ZlciDCtyB3b3JrbG9hZC1kZXBlbmRlbnQiXQogICAgICAgIGMxWyJpbmRleCBidWlsZCBjb3N0PGJyLz5hbW9ydGl6ZXMgd2hlbjxici8-cXVlcnkgcmF0ZSDDlyBwcmVjaXNpb24gc2F2aW5nczxici8-ZXhjZWVkcyBtYWludGFpbiBjb3N0Il0KICAgIGVuZAogICAgc3ViZ3JhcGggaHVnZVsiTWVnYS1tb25vcmVwbyDCtyBpbmRleCB3aW5zIl0KICAgICAgICBoMVsiYnVpbGQ6IGhpZ2g8YnIvPm1haW50YWluOiBoaWdoIGJ1dCBhbW9ydGl6ZWQ8YnIvPmFjcm9zcyBtaWxsaW9ucyBvZiBxdWVyaWVzPGJyLz5xdWVyeTogc2luZ2xlIHZlY3RvciBob3AiXQogICAgZW5kCgogICAgc21hbGwgLS0-IGNyb3Nzb3ZlciAtLT4gaHVnZQoKICAgIGNsYXNzRGVmIHNtYWxsIGZpbGw6I2YwZmZmNCxzdHJva2U6IzJmODU1YQogICAgY2xhc3NEZWYgY3Jvc3MgZmlsbDojZmVmNWU3LHN0cm9rZTojYjc3OTFmCiAgICBjbGFzc0RlZiBiaWcgZmlsbDojZmVkN2Q3LHN0cm9rZTojYzUzMDMwCiAgICBjbGFzcyBzbWFsbCBzbWFsbAogICAgY2xhc3MgY3Jvc3NvdmVyIGNyb3NzCiAgICBjbGFzcyBodWdlIGJpZw%3D%3D" alt="flowchart LR" width="1093" height="241"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's the whole argument. The rest of this post is evidence — what the cost-curve choice looks like in source code, and where Anthropic is hedging.&lt;/p&gt;

&lt;p&gt;A teaser for the punchline two sections down: the canonical "Claude Code spawns an Explore subagent for open-ended search" rule that most explainers quote is &lt;strong&gt;gated behind a feature flag&lt;/strong&gt; (&lt;code&gt;tengu_amber_stoat&lt;/code&gt;) and &lt;strong&gt;A/B-tested in production against a second architecture (Fork)&lt;/strong&gt; that takes the opposite trade. Anthropic is hedging on the retrieval design itself. We'll come back to this in &lt;em&gt;"The Subagent Twist Nobody Quotes Correctly."&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Popular Answer, Charitably
&lt;/h2&gt;

&lt;p&gt;Before deflating it: the popular answer isn't wrong, it's just downstream. Briefly, with the steel-manned version:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Chunking breaks structure.&lt;/strong&gt; A function split across chunks can leave the two halves of an &lt;code&gt;if/else&lt;/code&gt; in different chunks, and call-graph relationships fragment between chunks. AST-aware chunkers exist; they help, but the problem isn't solved.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vectors approximate.&lt;/strong&gt; A query for &lt;code&gt;getUserById&lt;/code&gt; also returns &lt;code&gt;getUserByEmail&lt;/code&gt; and &lt;code&gt;getUserByName&lt;/code&gt; because they're semantically adjacent. Exact-symbol search beats this trivially. &lt;em&gt;Important distinction:&lt;/em&gt; this is true of &lt;strong&gt;vector RAG&lt;/strong&gt; specifically. &lt;strong&gt;Symbol-graph indexes&lt;/strong&gt; (Sourcegraph, Kythe, Glean, LSP-backed code search) are a different category: they index by function/class/reference, not by chunked vector. They give exact answers and are what mega-monorepos at Google and Meta actually run for code search. When this post says "RAG," it means vector RAG. The cost-curve argument applies to vector RAG; symbol-graph has its own cost curve and lives higher up the scale spectrum. It also has its own reason for not becoming the LLM companion, covered in the next section.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Indexes go stale.&lt;/strong&gt; Every commit invalidates some subset of chunks. Incremental update has edge cases (renames, file moves, cross-file rename refactors). Full reindex is expensive enough to discourage frequent commits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cold start.&lt;/strong&gt; Minutes-to-first-query is a non-starter for "open the tool, start working" UX.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Black-box recall.&lt;/strong&gt; Top-K vector hits are not human-auditable. When the LLM returns a wrong answer, you can't tell whether retrieval failed or reasoning did.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All five are pains. Each has a counter — better chunkers, hybrid retrieval, incremental indexers, warm pools, attribution layers. The counters cost engineering. Some teams spend the engineering and ship working systems. Anthropic looked at the bill and decided not to.&lt;/p&gt;

&lt;p&gt;Why? Because the &lt;em&gt;baseline&lt;/em&gt; for code retrieval — grep over a clean filesystem with an LLM in the loop — already works well enough that adding an index is paying engineering cost to solve symptoms whose root cause is the index itself. Removing the index removes the pains. The remaining cost is per-query LLM round-trips, which the cost-curve frame says is acceptable below the crossover.&lt;/p&gt;

&lt;p&gt;That's why grep. Everything else is engineering details on top of that decision.&lt;/p&gt;
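
&lt;p&gt;The cost-curve argument can be made concrete with a toy model. This is a minimal sketch, assuming the cost equation has the shape "build + maintain per commit + per-query cost"; the exact form, the names &lt;code&gt;RetrievalCost&lt;/code&gt;, &lt;code&gt;totalCost&lt;/code&gt; and &lt;code&gt;crossoverQueries&lt;/code&gt;, and every number are hypothetical units chosen for illustration, not measurements. It also holds per-query cost fixed, whereas in practice that term rises with repo size, which is what moves the crossover.&lt;/p&gt;

```typescript
// Toy cost model. All figures are hypothetical units, not measurements.
interface RetrievalCost {
  build: number;    // one-time cost to create an index (0 for grep)
  maintain: number; // recurring cost per commit to keep the index fresh
  perQuery: number; // cost of answering one query
}

function totalCost(c: RetrievalCost, commits: number, queries: number): number {
  return c.build + c.maintain * commits + c.perQuery * queries;
}

// Smallest query count at which the indexed option is no more expensive
// than the index-free one, or null if the index never pays for itself.
function crossoverQueries(
  indexed: RetrievalCost,
  indexFree: RetrievalCost,
  commits: number,
): number | null {
  const fixedGap =
    indexed.build + indexed.maintain * commits -
    (indexFree.build + indexFree.maintain * commits);
  const perQuerySaving = indexFree.perQuery - indexed.perQuery;
  if (0 >= perQuerySaving) return null;
  return Math.max(0, Math.ceil(fixedGap / perQuerySaving));
}

// Hypothetical profiles: grep pays nothing up front, more per query.
const grepLoop: RetrievalCost = { build: 0, maintain: 0, perQuery: 5 };
const vectorIndex: RetrievalCost = { build: 600, maintain: 2, perQuery: 1 };

console.log(crossoverQueries(vectorIndex, grepLoop, 100)); // 200
```

&lt;p&gt;With these made-up numbers the index only breaks even after 200 queries across 100 commits. Below that point, the zero build and maintain terms are the whole story.&lt;/p&gt;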

&lt;h2&gt;
  
  
  So Why Hasn't Symbol-Graph Become the LLM-Companion Either?
&lt;/h2&gt;

&lt;p&gt;If symbol-graph indexes are precision-first, language-aware, and battle-tested at FAANG scale, the natural question is: why didn't &lt;em&gt;they&lt;/em&gt; become the default companion to LLM coding agents? Why grep, not LSP-over-MCP?&lt;/p&gt;

&lt;p&gt;The answer is the same shape as the vector-RAG answer — high friction in places that don't show up on a feature comparison — but the specific frictions are different:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Build cost is high in a different way.&lt;/strong&gt; Symbol-graph indexes need to compile (or semi-compile) the project to resolve symbols. For Rust, C++, large TypeScript or Java codebases this is minutes to tens of minutes per cold start. "Open Claude Code and start working" can't pay that toll.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Language-specific, not portable.&lt;/strong&gt; LSP is one server per language. Tree-sitter coverage helps but isn't uniform. A grep-backed agent works on any text in any language with zero setup; a symbol-graph-backed agent inherits the project's language-server matrix.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API/format mismatch with how LLMs reason.&lt;/strong&gt; LSP returns deeply nested JSON (locations, ranges, document hierarchies); grep returns &lt;code&gt;file:line: content&lt;/code&gt;. The second is almost literally an LLM's native dialect; the first needs adapting. The translation tax is real.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coverage is narrower than it looks.&lt;/strong&gt; Symbol-graph models code-as-structure. It misses config files, comments, strings, generated code, markdown, env files, shell scripts, the README — all of which are first-class context for a real coding session. Grep covers anything that's text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The win is for structure questions, not intent questions.&lt;/strong&gt; "Where is &lt;code&gt;getUserById&lt;/code&gt; defined?" — symbol-graph is exactly right. "How does the login flow work?" — back to grep + read. A real coding day has both kinds; building infrastructure that only solves one kind is paying a high fixed cost for half the answer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The constraint reversed under it.&lt;/strong&gt; Symbol-graph was designed for a world where the constraint was &lt;em&gt;human attention bandwidth&lt;/em&gt; — give a developer one precise answer they can read. LLMs don't have that constraint; they can read 30 grep hits cheaply and reason across them. The bottleneck moved from "precision of retrieval" to "fluency of the model reading the retrieval." Symbol-graph is optimizing the part that's no longer expensive.&lt;/li&gt;
&lt;/ol&gt;
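
&lt;p&gt;The format mismatch in point 3 is easy to see in code. The interfaces below follow the &lt;code&gt;Location&lt;/code&gt;, &lt;code&gt;Range&lt;/code&gt; and &lt;code&gt;Position&lt;/code&gt; shapes from the LSP specification (zero-based lines); &lt;code&gt;toGrepLine&lt;/code&gt; is a hypothetical adapter written for this illustration, not something any of the tools above is known to ship.&lt;/p&gt;

```typescript
// Shapes follow the LSP specification; line numbers there are zero-based.
interface Position { line: number; character: number; }
interface Range { start: Position; end: Position; }
interface Location { uri: string; range: Range; }

// Hypothetical adapter: flatten one Location plus the file's lines
// into a single `grep -n` style record.
function toGrepLine(loc: Location, fileLines: string[]): string {
  const path = loc.uri.replace(/^file:\/\//, "");
  const lineNo = loc.range.start.line + 1; // grep line numbers are one-based
  const text = fileLines[loc.range.start.line] || "";
  return `${path}:${lineNo}:${text}`;
}

const definition: Location = {
  uri: "file:///repo/src/user.ts",
  range: {
    start: { line: 1, character: 16 },
    end: { line: 1, character: 27 },
  },
};
const userTs = ["// user lookups", "export function getUserById(id: string) {"];

console.log(toGrepLine(definition, userTs));
// /repo/src/user.ts:2:export function getUserById(id: string) {
```

&lt;p&gt;The adapter is tiny, but it has to exist, and it needs the file contents that the LSP response does not carry. That extra hop is the translation tax.&lt;/p&gt;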

&lt;p&gt;One-line summary: &lt;strong&gt;symbol-graph is precision tooling built for human IDEs. LLMs are not human IDEs.&lt;/strong&gt; Their retrieval bottleneck is different — they prefer many cheap rounds over one expensive precise call. Installing a symbol-graph for an agent that can grep thirty times in a session is, roughly, hiring a second driver for someone who already drives.&lt;/p&gt;

&lt;p&gt;This is also why the few existing LLM ↔ symbol-graph integrations (Cursor's &lt;code&gt;@symbol&lt;/code&gt; references via LSP, Sourcegraph Cody, Codeium with LSP backend) are &lt;em&gt;additive niceties&lt;/em&gt; in those products, not the retrieval backbone. The backbone is still the same as Claude Code's — grep over text.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Primitives, Audited
&lt;/h2&gt;

&lt;p&gt;Source paths below are from a build snapshot of Claude Code that is not an official public release but has circulated publicly; I keep a copy on disk for analysis purposes. APIs and exact line numbers will drift; the design choices below have been stable in the snapshot I reviewed, and match observed runtime behavior on current public Claude Code releases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Grep — ripgrep, with structured output and a "don't shell out" enforcement clause
&lt;/h3&gt;

&lt;p&gt;The Grep tool description, from &lt;code&gt;src/tools/GrepTool/prompt.ts:7-16&lt;/code&gt;, is short and pointed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A powerful search tool built on ripgrep

  Usage:
  - ALWAYS use Grep for search tasks. NEVER invoke `grep` or `rg` as a Bash command. The Grep tool has been optimized for correct permissions and access.
  - Supports full regex syntax (e.g., "log.*Error", "function\s+\w+")
  - Filter files with glob parameter (e.g., "*.js", "**/*.tsx") or type parameter (e.g., "js", "py", "rust")
  - Output modes: "content" shows matching lines, "files_with_matches" shows only file paths (default), "count" shows match counts
  - Use Agent tool for open-ended searches requiring multiple rounds
  - Pattern syntax: Uses ripgrep (not grep) - literal braces need escaping (use `interface\{\}` to find `interface{}` in Go code)
  - Multiline matching: By default patterns match within single lines only. For cross-line patterns like `struct \{[\s\S]*?field`, use `multiline: true`
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things to read out of this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The &lt;code&gt;ALWAYS / NEVER&lt;/code&gt; is doing work.&lt;/strong&gt; The model has Bash. It could shell out to &lt;code&gt;rg&lt;/code&gt; or &lt;code&gt;grep&lt;/code&gt; directly. The prompt forbids it. Why? Three reasons, in order of importance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Permission surface.&lt;/strong&gt; Bash is a universal tool. Auditing what the model can do with Bash means auditing every shell command. Audited Grep means audited Grep, period.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output discipline.&lt;/strong&gt; A Bash &lt;code&gt;rg&lt;/code&gt; dumps raw text into the context window. Grep returns one of three structured modes — &lt;code&gt;content&lt;/code&gt;, &lt;code&gt;files_with_matches&lt;/code&gt;, &lt;code&gt;count&lt;/code&gt; — letting the model pick the cheapest mode that answers the question. This is a token-budget decision, not a feature decision.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend swap.&lt;/strong&gt; Today the implementation is ripgrep. Tomorrow it might be &lt;code&gt;bfs&lt;/code&gt;/&lt;code&gt;ugrep&lt;/code&gt; (the source already has a branch for embedded-search tools — see &lt;code&gt;hasEmbeddedSearchTools()&lt;/code&gt; in &lt;code&gt;src/utils/embeddedTools.ts&lt;/code&gt;). A tool boundary makes the swap invisible to the model.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The default output mode is &lt;code&gt;files_with_matches&lt;/code&gt;.&lt;/strong&gt; Not &lt;code&gt;content&lt;/code&gt;. The model has to opt in to seeing actual lines. This is a token-conservation default: most of the time, the model wants to know which files matched so it can narrow further; only when it's ready to read does it ask for the lines.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multiline is opt-in.&lt;/strong&gt; Default ripgrep is line-bounded — a deliberate restriction, because cross-line regex on a large tree is a perf cliff. The model can opt into multiline when it knows it needs to, paying the cost only then.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
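
&lt;p&gt;To make the output-mode point tangible, here is a minimal in-memory sketch of the three modes. It is not the Grep tool's implementation: &lt;code&gt;search&lt;/code&gt;, &lt;code&gt;SourceFile&lt;/code&gt; and the sample files are hypothetical, and a JavaScript &lt;code&gt;RegExp&lt;/code&gt; stands in for ripgrep. Only the mode names and the &lt;code&gt;files_with_matches&lt;/code&gt; default come from the prompt quoted above.&lt;/p&gt;

```typescript
type OutputMode = "content" | "files_with_matches" | "count";
interface SourceFile { path: string; text: string; }

// Line-bounded search over in-memory files. A JavaScript RegExp stands in
// for ripgrep here, so pattern syntax differs from the real tool.
function search(
  files: SourceFile[],
  pattern: string,
  mode: OutputMode = "files_with_matches",
): string[] {
  const re = new RegExp(pattern);
  const out: string[] = [];
  for (const file of files) {
    const hits: string[] = [];
    file.text.split("\n").forEach((line, i) => {
      if (re.test(line)) hits.push(`${file.path}:${i + 1}:${line}`);
    });
    if (hits.length === 0) continue;
    if (mode === "content") {
      for (const hit of hits) out.push(hit);        // every matching line
    } else if (mode === "count") {
      out.push(`${file.path}:${hits.length}`);      // one number per file
    } else {
      out.push(file.path);                          // cheapest: path only
    }
  }
  return out;
}

const repo: SourceFile[] = [
  { path: "a.ts", text: "import { login } from './auth';\nlogin(user);" },
  { path: "b.ts", text: "export function login() {}\n" },
  { path: "c.ts", text: "const x = 1;" },
];

console.log(search(repo, "login"));            // [ 'a.ts', 'b.ts' ]
console.log(search(repo, "login", "count"));   // [ 'a.ts:2', 'b.ts:1' ]
```

&lt;p&gt;The default answers "which files?" in two short strings; &lt;code&gt;content&lt;/code&gt; mode returns three full lines for the same query. On a real tree that gap is the token budget.&lt;/p&gt;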

&lt;p&gt;These are all small choices. Each one shaves a tail off the per-query cost. Cumulatively, they keep the per-query term in our cost equation low enough that the zero build/maintain term decides the comparison in grep's favor.&lt;/p&gt;

&lt;h3&gt;
  
  
  Glob — filename patterns with a recency heuristic and a hard cap
&lt;/h3&gt;

&lt;p&gt;Glob is the "find files by name" primitive. Two design tells in its construction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Results are sorted by mtime descending.&lt;/strong&gt; Most-recently-modified file first. The heuristic is that in any given session, the files you've touched recently are the files you're about to touch again. This is the same logic IDEs use for the "Recent Files" list, and it's empirically right far more than it's wrong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A hard cap of 100 results.&lt;/strong&gt; Past that, output truncates. The model can tighten the pattern and re-call. The cap exists because an LLM that consumes 800 file paths because it asked for &lt;code&gt;**/*&lt;/code&gt; is an LLM that's burned a quarter of its context on noise.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both are token-budget decisions disguised as ergonomic ones. The mtime sort means a small N typically covers the relevant set. The 100-file cap means a careless query degrades gracefully instead of catastrophically.&lt;/p&gt;
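
&lt;p&gt;A minimal sketch of that sort-then-cap behavior, assuming file metadata has already been collected. &lt;code&gt;rankAndCap&lt;/code&gt;, &lt;code&gt;FileEntry&lt;/code&gt; and &lt;code&gt;GlobResult&lt;/code&gt; are hypothetical names for this illustration; only the mtime ordering and the cap of 100 come from the description above.&lt;/p&gt;

```typescript
interface FileEntry { path: string; mtimeMs: number; }
interface GlobResult { paths: string[]; truncated: boolean; }

// Most-recently-modified first, then cut at the cap and report truncation
// so the caller knows to tighten the pattern and call again.
function rankAndCap(entries: FileEntry[], cap: number = 100): GlobResult {
  const sorted = entries.slice().sort((a, b) => b.mtimeMs - a.mtimeMs);
  return {
    paths: sorted.slice(0, cap).map((e) => e.path),
    truncated: sorted.length > cap,
  };
}

const matches: FileEntry[] = [
  { path: "src/legacy/login.js", mtimeMs: 1000 },
  { path: "src/auth/login.ts", mtimeMs: 3000 },
  { path: "src/ui/LoginForm.tsx", mtimeMs: 2000 },
];

console.log(rankAndCap(matches, 2));
// { paths: [ 'src/auth/login.ts', 'src/ui/LoginForm.tsx' ], truncated: true }
```

&lt;p&gt;Returning a &lt;code&gt;truncated&lt;/code&gt; flag is a design choice in this sketch: it tells the model the list is incomplete without spending tokens on the rest of it.&lt;/p&gt;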

&lt;h3&gt;
  
  
  Read — bounded, fresh, stat-checked
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;src/tools/FileReadTool/prompt.ts&lt;/code&gt; is the most interesting of the three because of what it constrains:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- By default, it reads up to 2000 lines starting from the beginning of the file (...)
- When you already know which part of the file you need, only read that part. This can be important for larger files.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(&lt;code&gt;MAX_LINES_TO_READ = 2000&lt;/code&gt; at line 10; the "only read that part" guidance is &lt;code&gt;OFFSET_INSTRUCTION_TARGETED&lt;/code&gt; at lines 20-21, swapped in dynamically based on context.)&lt;/p&gt;

&lt;p&gt;A 2000-line default cap. &lt;code&gt;offset&lt;/code&gt; and &lt;code&gt;limit&lt;/code&gt; parameters for targeted reads. And — critically — every read calls &lt;code&gt;stat&lt;/code&gt; on disk. No cache. No index. No staleness layer.&lt;/p&gt;

&lt;p&gt;The implication is that &lt;strong&gt;Read is always live&lt;/strong&gt;. The model that just edited a file and wants to see the result reads the file and gets the new bytes. The model that's iterating on a fix doesn't fight cache invalidation because there is no cache to invalidate. This is the "no maintain cost" term in the cost equation, made concrete: the live filesystem is the index, and the filesystem is always up to date with itself.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;OFFSET_INSTRUCTION_TARGETED&lt;/code&gt; template is worth noting separately. It's swapped in over the default when the prompting context suggests the model already knows what range it wants. It's a tiny prompt-engineering detail, but it's also a piece of evidence that the team thinks carefully about teaching the model to read selectively. The lesson the model is being taught, every prompt, is &lt;em&gt;don't be greedy&lt;/em&gt;. That's exactly the discipline that keeps per-query cost from blowing up.&lt;/p&gt;
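
&lt;p&gt;The bounded-read behavior can be sketched as a pure function over file text. This is an illustration under assumptions, not the Read tool's code: &lt;code&gt;readWindow&lt;/code&gt; and &lt;code&gt;ReadWindow&lt;/code&gt; are hypothetical, and treating &lt;code&gt;offset&lt;/code&gt; as a one-based line number is my assumption. Only the 2000-line default and the existence of &lt;code&gt;offset&lt;/code&gt; and &lt;code&gt;limit&lt;/code&gt; come from the source quoted above.&lt;/p&gt;

```typescript
const DEFAULT_LINE_LIMIT = 2000; // mirrors the 2000-line default quoted above

interface ReadWindow {
  startLine: number;  // one-based line number of the first returned line
  lines: string[];    // at most `limit` lines
  totalLines: number; // lets the caller see how much was left unread
}

// Assumption: `offset` is a one-based line number. A real tool would take
// `text` from a fresh filesystem read on every call; no caching is modeled.
function readWindow(
  text: string,
  offset: number = 1,
  limit: number = DEFAULT_LINE_LIMIT,
): ReadWindow {
  const all = text.split("\n");
  const start = Math.max(1, offset);
  return {
    startLine: start,
    lines: all.slice(start - 1, start - 1 + limit),
    totalLines: all.length,
  };
}

// Targeted read: three lines around a grep hit at line 3.
const sample = ["a", "b", "c", "d", "e"].join("\n");
console.log(readWindow(sample, 2, 3).lines); // [ 'b', 'c', 'd' ]
```

&lt;p&gt;A default call on a 5000-line file returns the first 2000 lines; a targeted call returns only the requested window. Either way the caller pays for what it asked for and nothing more.&lt;/p&gt;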

&lt;h3&gt;
  
  
  The composition
&lt;/h3&gt;

&lt;p&gt;Walk through a real query: &lt;em&gt;"Where's the login flow in this project?"&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Glob &lt;code&gt;**/*login*.{ts,tsx,js}&lt;/code&gt;&lt;/strong&gt; returns up to 100 files, most-recently-modified first. Usually under ten matches; usually the right one is in the first three.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grep &lt;code&gt;passport|auth|login&lt;/code&gt; --glob ''&lt;/strong&gt; — narrows to specific lines. Three output modes available; the model picks the cheapest one that disambiguates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read the file, with &lt;code&gt;offset&lt;/code&gt;/&lt;code&gt;limit&lt;/code&gt; targeted at the matched region&lt;/strong&gt; — reads only what's needed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Three primitives. One realistic query. Total cost: three round-trips and a few hundred tokens of output. No index, no embedding, no rebuild. Filesystem unchanged.&lt;/p&gt;

&lt;p&gt;Now add ten more iterations as the model investigates a bug. Each one is the same three primitives, in different combinations. The cost grows linearly with iterations; it doesn't grow super-linearly with the codebase. That's the curve.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Loop, Sketched Honestly
&lt;/h2&gt;

&lt;p&gt;You can find pseudo-code versions of Claude Code's main loop in essays and threads. They all look something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;callLLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolUses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;use&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolUses&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;use&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That sketch is conceptually right. It is also missing roughly 1,415 lines.&lt;/p&gt;

&lt;p&gt;The real loop is at &lt;code&gt;src/query.ts:307&lt;/code&gt; (&lt;code&gt;while (true) {&lt;/code&gt;) and closes at &lt;code&gt;:1728&lt;/code&gt; (&lt;code&gt;} // while (true)&lt;/code&gt;). The body is 1,421 lines. The full file is 1,729. If you want a guided tour, the &lt;a href="https://harrisonsec.com/blog/claude-code-deep-dive-query-loop/" rel="noopener noreferrer"&gt;Claude Code Deep Dive Part 2&lt;/a&gt; walks through it section by section. The summary for our purposes here is shorter:&lt;/p&gt;

&lt;p&gt;The Platonic loop — &lt;em&gt;call model, run tools, repeat&lt;/em&gt; — is six lines. The other 1,415 lines are doing one of four things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Streaming tool execution.&lt;/strong&gt; Tool calls don't wait for the full model response before they start running; they execute as the streaming &lt;code&gt;tool_use&lt;/code&gt; blocks arrive. This is non-trivial because the model can emit a partial &lt;code&gt;tool_use&lt;/code&gt;, retract it, or continue thinking before the rest of the call arrives.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache and compaction management.&lt;/strong&gt; Microcompact maintains tool-result cache coherence by &lt;code&gt;tool_use_id&lt;/code&gt; without inspecting content. The auto-compact path triggers when the message stream nears a context budget. Both interact with the prompt cache, which has its own coherence rules.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failure recovery.&lt;/strong&gt; A nested &lt;code&gt;while (attemptWithFallback)&lt;/code&gt; at line 654 handles model fallback when the primary returns a recoverable error. There's a separate path for &lt;code&gt;max_output_tokens&lt;/code&gt; recovery (when the model would emit a tool_use but truncate before the &lt;code&gt;tool_result&lt;/code&gt;). Orphan tool_results are aggressively pruned so a retry doesn't carry stale IDs forward.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thinking-block preservation.&lt;/strong&gt; Reasoning content has to span the assistant trajectory — same turn, plus any subsequent tool_use/tool_result chain — because the model expects to see its own prior reasoning to continue coherently. The loop preserves these blocks across iterations.&lt;/li&gt;
&lt;/ol&gt;
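
&lt;p&gt;The orphan-pruning step in point 3 is the easiest of the four to sketch. The block shapes below are a simplified subset of the Anthropic Messages API content blocks (&lt;code&gt;tool_use&lt;/code&gt; carries an &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;tool_result&lt;/code&gt; carries a &lt;code&gt;tool_use_id&lt;/code&gt;); &lt;code&gt;pruneOrphanToolResults&lt;/code&gt; is a hypothetical helper and should not be read as the logic in &lt;code&gt;src/query.ts&lt;/code&gt;.&lt;/p&gt;

```typescript
// Simplified subset of Messages API content blocks (real blocks carry more fields).
interface ToolUseBlock { type: "tool_use"; id: string; name: string; }
interface ToolResultBlock { type: "tool_result"; tool_use_id: string; content: string; }
interface TextBlock { type: "text"; text: string; }
type Block = ToolUseBlock | ToolResultBlock | TextBlock;
interface Message { role: "user" | "assistant"; content: Block[]; }

// Drop every tool_result whose tool_use_id has no matching tool_use in the
// transcript, and drop messages left empty by that removal.
function pruneOrphanToolResults(messages: Message[]): Message[] {
  const knownIds: string[] = [];
  for (const m of messages) {
    for (const b of m.content) {
      if (b.type === "tool_use") knownIds.push(b.id);
    }
  }
  const pruned: Message[] = [];
  for (const m of messages) {
    const content = m.content.filter(
      (b) => b.type !== "tool_result" || knownIds.indexOf(b.tool_use_id) !== -1,
    );
    if (content.length > 0) pruned.push({ role: m.role, content });
  }
  return pruned;
}
```

&lt;p&gt;Matching by ID alone, without looking at result content, keeps the step cheap. A retry then starts from a transcript in which every result still has its call.&lt;/p&gt;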

&lt;p&gt;The point is not "the loop is complicated." The point is: &lt;strong&gt;the cost of running an agent on tool-loops moves into the loop, where it's controllable&lt;/strong&gt;. There's no second system — no index pipeline, no vector store, no embedding service — with its own failure modes, latency budgets, and consistency guarantees. The loop is the engineering target.&lt;/p&gt;

&lt;p&gt;This is the same property that makes a single-process database easier to operate than a microservices fleet, for reasons that have nothing to do with the database. Concentrate complexity where you can attack it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Subagent Twist Nobody Quotes Correctly
&lt;/h2&gt;

&lt;p&gt;This is the section where most explanations of Claude Code's retrieval get it wrong, including the article that prompted this post. The popular telling goes: &lt;em&gt;"For open-ended exploration, Claude Code spawns an Explore subagent — Anthropic codifies the rule that if you need more than three queries, you should fork."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The rule exists. The popular telling has two things wrong about it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The rule is not in &lt;code&gt;AgentTool/prompt.ts&lt;/code&gt;.&lt;/strong&gt; It's in &lt;code&gt;src/constants/prompts.ts:378-379&lt;/code&gt;, injected into the system prompt by template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;For simple, directed codebase searches (e.g. for a specific file/class/function) use [searchTools] directly.

For broader codebase exploration and deep research, use the Agent tool with subagent_type=Explore. This is slower than using [searchTools] directly, so use this only when a simple, directed search proves to be insufficient or when your task will clearly require more than [EXPLORE_AGENT_MIN_QUERIES] queries.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;EXPLORE_AGENT_MIN_QUERIES&lt;/code&gt; is defined in &lt;code&gt;src/tools/AgentTool/built-in/exploreAgent.ts:59&lt;/code&gt; as the integer &lt;code&gt;3&lt;/code&gt;. So the "more than 3 queries" threshold is literal — but it's a single named constant, easy to change, and the guidance is interpolated dynamically per system-prompt build.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The rule is conditional.&lt;/strong&gt; The two lines above are &lt;em&gt;only&lt;/em&gt; emitted when all three conditions in this guard hold:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/constants/prompts.ts:374-381&lt;/span&gt;
&lt;span class="p"&gt;...(&lt;/span&gt;&lt;span class="nx"&gt;hasAgentTool&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;span class="nf"&gt;areExplorePlanAgentsEnabled&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;isForkSubagentEnabled&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="s2"&gt;`For simple, directed codebase searches ...`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`For broader codebase exploration ... subagent_type=Explore ...`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]),&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Besides &lt;code&gt;hasAgentTool&lt;/code&gt;, both &lt;code&gt;areExplorePlanAgentsEnabled()&lt;/code&gt; and &lt;code&gt;!isForkSubagentEnabled()&lt;/code&gt; have to be true. The first of those two is itself feature-gated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/tools/AgentTool/builtInAgents.ts:13-22&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;areExplorePlanAgentsEnabled&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;feature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;BUILTIN_EXPLORE_PLAN_AGENTS&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// 3P default: true — Bedrock/Vertex keep agents enabled (matches pre-experiment&lt;/span&gt;
    &lt;span class="c1"&gt;// external behavior). A/B test treatment sets false to measure impact of removal.&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;getFeatureValue_CACHED_MAY_BE_STALE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tengu_amber_stoat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;BUILTIN_EXPLORE_PLAN_AGENTS&lt;/code&gt; is a Bun build feature gate. &lt;code&gt;tengu_amber_stoat&lt;/code&gt; is a GrowthBook key. (Aside: &lt;code&gt;tengu_*&lt;/code&gt; is the Anthropic-internal naming prefix for Claude Code; any flag you see with that prefix in the source is a Claude Code feature toggle.) The comment in source is the giveaway: &lt;em&gt;"A/B test treatment sets false to measure impact of removal."&lt;/em&gt; Anthropic is &lt;strong&gt;actively testing what happens when they remove Explore and Plan agents entirely&lt;/strong&gt; for some fraction of users; the comment does not say which cohort receives the treatment.&lt;/p&gt;

&lt;p&gt;Read that again. The canonical "Anthropic uses Explore for exploration" claim is &lt;em&gt;true by default&lt;/em&gt; and &lt;em&gt;being measured against a no-Explore treatment arm&lt;/em&gt;. The interview answer that says "Anthropic always uses Explore" is wrong, or at least more confident than the source.&lt;/p&gt;

&lt;p&gt;This is the half-credit point. Knowing that the rule exists is the first half. Knowing the rule is feature-gated and under measurement is the second half.&lt;/p&gt;
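
&lt;p&gt;The gating described in this section reduces to a small pure function. The sketch below is a stand-in, not the code in &lt;code&gt;src/constants/prompts.ts&lt;/code&gt;: &lt;code&gt;PromptFlags&lt;/code&gt; and &lt;code&gt;explorationGuidance&lt;/code&gt; are hypothetical names, the flag values are passed in rather than read from a feature service, and the guidance strings are abbreviated paraphrases of the two lines quoted earlier.&lt;/p&gt;

```typescript
// Hypothetical stand-ins for the three conditions in the guard quoted above.
interface PromptFlags {
  hasAgentTool: boolean;
  explorePlanAgentsEnabled: boolean; // build gate plus the tengu_amber_stoat flag
  forkSubagentEnabled: boolean;
}

const EXPLORE_AGENT_MIN_QUERIES = 3; // value quoted above

// Returns the Explore guidance lines, or nothing if any condition fails.
// The strings are abbreviated paraphrases, not the exact prompt text.
function explorationGuidance(flags: PromptFlags, searchTools: string): string[] {
  const emit = [
    flags.hasAgentTool,
    flags.explorePlanAgentsEnabled,
    !flags.forkSubagentEnabled,
  ].every(Boolean);
  if (!emit) return [];
  return [
    `For simple, directed codebase searches use ${searchTools} directly.`,
    `For broader exploration, use the Agent tool with subagent_type=Explore ` +
      `when the task will clearly require more than ${EXPLORE_AGENT_MIN_QUERIES} queries.`,
  ];
}

const defaults: PromptFlags = {
  hasAgentTool: true,
  explorePlanAgentsEnabled: true,
  forkSubagentEnabled: false,
};

console.log(explorationGuidance(defaults, "Glob or Grep").length); // 2
console.log(
  explorationGuidance(
    { hasAgentTool: true, explorePlanAgentsEnabled: true, forkSubagentEnabled: true },
    "Glob or Grep",
  ).length,
); // 0
```

&lt;p&gt;Flip any one of the three inputs and the model never sees the "more than 3 queries" rule at all, which is why quoting the rule without its guard overstates it.&lt;/p&gt;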

&lt;h2&gt;
  
  
  The Economics of Explore, in Detail
&lt;/h2&gt;

&lt;p&gt;If you read just the &lt;em&gt;function&lt;/em&gt; of Explore, the design looks like a generic subagent: open-ended exploration, returns a summary, isolates from the main context. If you read its &lt;em&gt;parameters&lt;/em&gt;, the economics jump out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/tools/AgentTool/built-in/exploreAgent.ts:64-83&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;EXPLORE_AGENT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;BuiltInAgentDefinition&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;agentType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Explore&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;whenToUse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;EXPLORE_WHEN_TO_USE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;disallowedTools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nx"&gt;AGENT_TOOL_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;             &lt;span class="c1"&gt;// can't spawn nested subagents&lt;/span&gt;
    &lt;span class="nx"&gt;EXIT_PLAN_MODE_TOOL_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;FILE_EDIT_TOOL_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="c1"&gt;// read-only&lt;/span&gt;
    &lt;span class="nx"&gt;FILE_WRITE_TOOL_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// read-only&lt;/span&gt;
    &lt;span class="nx"&gt;NOTEBOOK_EDIT_TOOL_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;built-in&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;baseDir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;built-in&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;// Ants get inherit to use the main agent's model; external users get haiku for speed&lt;/span&gt;
  &lt;span class="c1"&gt;// Note: For ants, getAgentModel() checks tengu_explore_agent GrowthBook flag at runtime&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;USER_TYPE&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ant&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;inherit&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;haiku&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;// Explore is a fast read-only search agent — it doesn't need commit/PR/lint&lt;/span&gt;
  &lt;span class="c1"&gt;// rules from CLAUDE.md. The main agent has full context and interprets results.&lt;/span&gt;
  &lt;span class="na"&gt;omitClaudeMd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;getSystemPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;getExploreSystemPrompt&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three economic decisions encoded here:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Explore runs on Haiku for external users.&lt;/strong&gt; Not the main reasoning model. Exploration is a &lt;em&gt;cheap-tokens job&lt;/em&gt; — there's no creative reasoning happening, just iterate-and-filter — and Anthropic uses a fast, small, cheap model for it. The main agent gets the expensive model when it gets the summary back. This is the &lt;em&gt;staffing&lt;/em&gt; analogue: junior associate does the deposition review, senior partner reads the brief.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explore omits CLAUDE.md.&lt;/strong&gt; The argument in the inline comment: &lt;em&gt;"The main agent has full context and interprets results."&lt;/em&gt; CLAUDE.md is for project rules — commit conventions, PR style, lint guidance. Explore isn't doing any of those things. Loading CLAUDE.md into its prompt would be paying tokens for guidance it can't act on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explore can't spawn subagents.&lt;/strong&gt; &lt;code&gt;AGENT_TOOL_NAME&lt;/code&gt; is in &lt;code&gt;disallowedTools&lt;/code&gt;. No recursion. This is a budget guarantee: an Explore call has bounded depth.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;And the system prompt makes the read-only constraint impossible to misread:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=== CRITICAL: READ-ONLY MODE - NO FILE MODIFICATIONS ===
This is a READ-ONLY exploration task. You are STRICTLY PROHIBITED from:
- Creating new files (no Write, touch, or file creation of any kind)
- Modifying existing files (no Edit operations)
- Deleting files (no rm or deletion)
- Moving or copying files (no mv or cp)
- Creating temporary files anywhere, including /tmp
- Using redirect operators (&amp;gt;, &amp;gt;&amp;gt;, |) or heredocs to write to files
- Running ANY commands that change system state
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(&lt;code&gt;exploreAgent.ts:26-34&lt;/code&gt;. The full prompt also requires Explore to spawn parallel tool calls aggressively — "you must try to spawn multiple parallel tool calls for grepping and reading files" — which is yet another budget squeeze: turn N round-trips into one wall-clock round, lower latency, same token bill.)&lt;/p&gt;

&lt;p&gt;Pull the threads together and Explore reads like a &lt;em&gt;budgeted exploration worker&lt;/em&gt;: small model, no CLAUDE.md noise, no nested recursion, read-only, parallel-first. It is the most economically tuned piece of the retrieval system, and it's there because once you've decided the model drives retrieval, the next question is &lt;em&gt;which&lt;/em&gt; model, and &lt;em&gt;what&lt;/em&gt; it gets paid to think about.&lt;/p&gt;

&lt;p&gt;The popular telling skips all of this. It says "Anthropic spawns Explore." The source says &lt;em&gt;Anthropic spawns the cheapest possible agent that can do the work&lt;/em&gt;, &lt;em&gt;strips its context to the minimum that lets it function&lt;/em&gt;, and &lt;em&gt;measures whether spawning it at all beats just letting the main agent loop directly&lt;/em&gt;. That last clause is the A/B in the previous section.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fork — The Architecture Under Test
&lt;/h2&gt;

&lt;p&gt;Here's the other half of the experiment.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;src/tools/AgentTool/forkSubagent.ts&lt;/code&gt; exports &lt;code&gt;isForkSubagentEnabled()&lt;/code&gt;. When that returns true, &lt;code&gt;AgentTool/prompt.ts&lt;/code&gt; rewrites the system prompt to introduce a new operation: &lt;strong&gt;fork&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The fork section, paraphrased from &lt;code&gt;prompt.ts:80-96&lt;/code&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Fork yourself (omit &lt;code&gt;subagent_type&lt;/code&gt;) when the intermediate tool output isn't worth keeping in your context. The criterion is qualitative — "will I need this output again" — not task size. Forks are cheap because they share your prompt cache. Don't set &lt;code&gt;model&lt;/code&gt; on a fork — a different model can't reuse the parent's cache.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't peek.&lt;/strong&gt; The tool result includes an &lt;code&gt;output_file&lt;/code&gt; path — do not Read or tail it unless the user explicitly asks for a progress check. You get a completion notification; trust it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't race.&lt;/strong&gt; After launching, you know nothing about what the fork found. Never fabricate or predict fork results. The notification arrives as a user-role message in a later turn; it is never something you write yourself.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Read this against Explore and the trade becomes visible:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Explore&lt;/th&gt;
&lt;th&gt;Fork&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Context isolation&lt;/td&gt;
&lt;td&gt;Fresh subagent, separate context&lt;/td&gt;
&lt;td&gt;Inherits parent context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model&lt;/td&gt;
&lt;td&gt;Haiku (cheap)&lt;/td&gt;
&lt;td&gt;Same as parent (no swap allowed)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt cache&lt;/td&gt;
&lt;td&gt;New cache, no parent reuse&lt;/td&gt;
&lt;td&gt;Reuses parent cache (huge savings on the system prompt and prior turns)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLAUDE.md&lt;/td&gt;
&lt;td&gt;Omitted&lt;/td&gt;
&lt;td&gt;Inherited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recursion&lt;/td&gt;
&lt;td&gt;Disallowed&lt;/td&gt;
&lt;td&gt;Allowed (forks can fork)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Failure model&lt;/td&gt;
&lt;td&gt;Subagent returns summary or error&lt;/td&gt;
&lt;td&gt;Notification arrives later as user message&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Discipline required&lt;/td&gt;
&lt;td&gt;Caller frames the query&lt;/td&gt;
&lt;td&gt;Caller writes a directive prompt; trusts the notification&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Neither dominates. &lt;strong&gt;Explore&lt;/strong&gt; wins when isolation matters more than cost: the exploration is noisy enough that you don't want any of the tool spew in your main context, and the work is mechanical enough that Haiku can handle it. &lt;strong&gt;Fork&lt;/strong&gt; wins when the work needs the main model's depth and you're willing to pay main-model rates to get it, with prompt-cache reuse softening the bill, accepting that the fork is just a future-you with a directive.&lt;/p&gt;

&lt;p&gt;Anthropic is shipping both, gated, and watching which one wins on internal metrics. The interview answer that says "Claude Code uses Explore for exploration" is &lt;strong&gt;a snapshot of one branch of the experiment&lt;/strong&gt;. The next release cycle may settle it differently.&lt;/p&gt;

&lt;p&gt;This is exactly the kind of fact you can't get from the popular tellings, because they're written from a single observation of behavior. Reading the source gives you the experimental design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version caveat.&lt;/strong&gt; Flag names (&lt;code&gt;BUILTIN_EXPLORE_PLAN_AGENTS&lt;/code&gt;, &lt;code&gt;tengu_amber_stoat&lt;/code&gt;, &lt;code&gt;isForkSubagentEnabled&lt;/code&gt;) and exact gate semantics reflect the snapshot I reviewed and will drift — Anthropic ships fast. The two specific architectures (Explore, Fork) may be renamed, consolidated, or replaced by the time you read this. What's stable is the &lt;em&gt;pattern&lt;/em&gt;: Anthropic running multiple retrieval architectures in parallel, gated, measuring against each other. That pattern outlasts any specific flag name and is what the argument here actually rests on. Verify against the current release before depending on any specific flag.&lt;/p&gt;

&lt;h2&gt;
  
  
  When RAG Still Wins
&lt;/h2&gt;

&lt;p&gt;The cost-curve thesis predicts when RAG crosses over and dominates. Three regimes:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Mega-monorepos where any kind of index is amortized across millions of queries
&lt;/h3&gt;

&lt;p&gt;At Google or Meta or a top-tier hedge fund, where codebases are large enough that every per-query latency point costs real money across the org, there &lt;em&gt;is&lt;/em&gt; an index. But, importantly, the index they actually run is almost always &lt;strong&gt;symbol-graph&lt;/strong&gt; (Kythe, Glean, Sourcegraph) rather than vector RAG, for the precision reasons in the earlier section. The index is built once, maintained by a dedicated team, and amortized across the engineering population's queries forever. Per-query latency drops to single-digit milliseconds, and index drift is the owning team's job rather than each engineer's problem.&lt;/p&gt;

&lt;p&gt;In &lt;em&gt;this&lt;/em&gt; regime, tool-loops are paying the same per-query cost over and over across N engineers, and the math flips toward indexing. Below that scale, the staffing cost of "a dedicated team that owns the code index" is itself a hidden cost that wipes the savings — and the index of choice at smaller scales (per-project Sourcegraph, ctags) is still not vector RAG. Vector RAG specifically tends to win in the next two regimes, not this one.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Pure semantic queries where there's no symbol to grep for
&lt;/h3&gt;

&lt;p&gt;"Find the code that handles user-deactivation edge cases when the account is also a billing admin" — there's no specific symbol. There's a conceptual region of the code that you're looking for. A vector search over function-doc embeddings might point you to the right cluster of functions faster than the model would grep.&lt;/p&gt;

&lt;p&gt;Claude Code's answer to this is: spawn Explore, let it iterate. That works, but it isn't free. If your workload is dominated by semantic-cluster queries (auditing, security review, refactor planning), RAG starts to pencil out.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. When LLM context is the scarce resource — cheap, short-context model regime
&lt;/h3&gt;

&lt;p&gt;This framing rarely comes up. Cost curves cut both ways. If your model has a 32k context window and costs $0.50/M tokens, six tool round-trips for retrieval means six rounds of tool output competing for that window. A one-shot RAG hit lets you spend the context budget on reasoning instead. RAG dominates when &lt;strong&gt;per-query token cost dominates per-query latency cost&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Anthropic optimized Claude Code for the regime where context is cheap and abundant (Opus 4.7 with a 1M-token window, in some configurations) and per-query latency is what users feel. In a different regime — say, on-device coding agents over a small open-source model — the cost-curve flips and RAG is the right tool. Same first principles, different numerical answer.&lt;/p&gt;
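&lt;p&gt;A minimal sketch of that context arithmetic in Go. Only the 32k window and the six round-trips come from the example above; the 4,000 tokens per tool round and the 6,000-token RAG hit are illustrative assumptions, not measurements, and &lt;code&gt;RetrievalPlan&lt;/code&gt; and &lt;code&gt;ReasoningBudget&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```go
package main

import "fmt"

// RetrievalPlan describes how a strategy spends context before reasoning.
// Every number fed into it below is an assumption for illustration.
type RetrievalPlan struct {
	Name           string
	Rounds         int // retrieval round-trips before reasoning starts
	TokensPerRound int // tokens each round leaves in the window (assumed)
}

// ContextUsed returns the tokens the plan consumes on retrieval alone.
func (p RetrievalPlan) ContextUsed() int {
	return p.Rounds * p.TokensPerRound
}

// ReasoningBudget returns what is left of the window for reasoning,
// floored at zero when retrieval alone overflows the window.
func ReasoningBudget(window int, p RetrievalPlan) int {
	if left := window - p.ContextUsed(); left > 0 {
		return left
	}
	return 0
}

func main() {
	const window = 32_000 // the 32k window from the text

	plans := []RetrievalPlan{
		{Name: "tool-loop", Rounds: 6, TokensPerRound: 4_000},
		{Name: "one-shot RAG", Rounds: 1, TokensPerRound: 6_000},
	}
	for _, p := range plans {
		fmt.Printf("%-13s retrieval=%6d tokens, left for reasoning=%6d\n",
			p.Name, p.ContextUsed(), ReasoningBudget(window, p))
	}
}
```

&lt;p&gt;Under these assumed sizes the tool-loop leaves 8,000 tokens for reasoning and the one-shot hit leaves 26,000. Change the per-round figures to match your own traces and the gap moves accordingly.&lt;/p&gt;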

&lt;h2&gt;
  
  
  A Decision Framework
&lt;/h2&gt;

&lt;p&gt;Find the rows that describe your project. The column holding most of your checks is what the cost curve recommends.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;Use grep + LLM tool-loop&lt;/th&gt;
&lt;th&gt;Use RAG&lt;/th&gt;
&lt;th&gt;Either / hybrid&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Project size: under ~1M lines&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Project size: 1M–100M lines&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Project size: 100M+ lines (mega-monorepo)&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codebase changes daily&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codebase is mostly static (knowledge base, archived)&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queries are exact-symbol ("find getUserById")&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queries are conceptual ("how does auth work")&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model context is cheap and large&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model context is scarce/expensive&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;You don't have a team to own a code index&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;You have an index team and tooling already (Sourcegraph, Glean, ctags)&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The crossover doesn't happen at any single threshold; it takes several signals moving in the same direction to tip the cost curve. For most non-FAANG projects, the signals sit on the grep side.&lt;/p&gt;
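&lt;p&gt;One way to read the table is as a tally of signals. The sketch below is a simplification under stated assumptions: &lt;code&gt;Signals&lt;/code&gt; and &lt;code&gt;Tally&lt;/code&gt; are hypothetical names, every row counts as one equal vote, the "either / hybrid" column is ignored, and mid-size repos (1M to 100M lines) add no vote at all.&lt;/p&gt;

```go
package main

import "fmt"

// Signals captures the table's rows for one project. Field names are
// hypothetical; the size cut-offs are the table's rough thresholds.
type Signals struct {
	Lines         int  // approximate lines of code
	ChangesDaily  bool // false means the codebase is mostly static
	ExactSymbol   bool // false means queries are mostly conceptual
	ContextScarce bool // small or expensive context window
	HasIndexTeam  bool // a team already owns a code index
}

// Tally counts how many signals point toward each approach.
func Tally(s Signals) (grep, rag int) {
	vote := func(forRAG bool) {
		if forRAG {
			rag++
		} else {
			grep++
		}
	}
	if s.Lines >= 100_000_000 {
		vote(true) // mega-monorepo
	} else if 1_000_000 > s.Lines {
		vote(false) // small repo
	}
	vote(!s.ChangesDaily)
	vote(!s.ExactSymbol)
	vote(s.ContextScarce)
	vote(s.HasIndexTeam)
	return grep, rag
}

func main() {
	// A hypothetical 200k-line product repo with daily commits.
	startup := Signals{Lines: 200_000, ChangesDaily: true, ExactSymbol: true}
	grep, rag := Tally(startup)
	fmt.Printf("grep signals: %d, RAG signals: %d\n", grep, rag)
}
```

&lt;p&gt;Equal weights are a placeholder. In practice one signal, such as a scarce context window, can outweigh the rest, so treat the tally as a prompt for the conversation rather than a verdict.&lt;/p&gt;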

&lt;h2&gt;
  
  
  The Companion Question
&lt;/h2&gt;

&lt;p&gt;This post is one half of a two-part argument about Claude Code's retrieval and memory choices. The companion — &lt;strong&gt;"Agent Memory Is a Cache Coherence Problem"&lt;/strong&gt; &lt;em&gt;(publishing 2026-05-28)&lt;/em&gt; — makes the same kind of argument about &lt;em&gt;cross-session&lt;/em&gt; memory: why Claude Code's built-in memory is hand-written Markdown instead of vector-recalled embeddings, even with the world hyping &lt;code&gt;claude-mem&lt;/code&gt; (70k+ GitHub stars as of May 2026) as a drop-in upgrade.&lt;/p&gt;

&lt;p&gt;Read together, the two pieces add up to a coherent design stance. Anthropic's bet, across both axes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fidelity over fuzz.&lt;/strong&gt; Both within-session retrieval (grep) and cross-session memory (CLAUDE.md) are lossless and exact. Both refuse vector approximation as the default.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost curves over romance.&lt;/strong&gt; Neither choice is justified by "we trust the model." Both are justified by the math: zero build cost plus zero maintenance cost beats nonlinear maintenance cost for the workloads they target.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experimentation in production.&lt;/strong&gt; Both architectures have alternative branches under active flag gating. On the retrieval side: &lt;code&gt;tengu_amber_stoat&lt;/code&gt; for Explore-vs-no-Explore, with Fork as a parallel architecture. On the memory side: &lt;code&gt;tengu_coral_fern&lt;/code&gt;, &lt;code&gt;tengu_herring_clock&lt;/code&gt;, &lt;code&gt;tengu_passport_quail&lt;/code&gt;, &lt;code&gt;tengu_slate_thimble&lt;/code&gt;, plus the build-time gates &lt;code&gt;KAIROS&lt;/code&gt;, &lt;code&gt;TEAMMEM&lt;/code&gt;, and &lt;code&gt;EXTRACT_MEMORIES&lt;/code&gt; — all visible in &lt;code&gt;src/memdir/&lt;/code&gt;. The pattern is the same: ship a default, leave the toggles in, keep measuring.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The shape of the design is a careful refusal to lock in. The cost curves favor the current choices today. They might not in a year. The system is built to flip.&lt;/p&gt;

&lt;p&gt;That, finally, is the answer the interview question is fishing for. &lt;em&gt;Why not RAG?&lt;/em&gt; Because the cost curves don't justify it for this workload, and Anthropic has the engineering culture to refuse the cargo-cult. &lt;em&gt;Will it always be that way?&lt;/em&gt; No — and the feature flags in the source say so out loud.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Companion piece: &lt;strong&gt;Agent Memory Is a Cache Coherence Problem&lt;/strong&gt; (publishing 2026-05-28)&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Background: &lt;a href="https://harrisonsec.com/blog/consistency-scenarios-and-approaches-production/" rel="noopener noreferrer"&gt;Consistency in Distributed Systems: Scenarios, Trade-offs, and What Actually Works&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;For a focused walk through the loop file: &lt;a href="https://harrisonsec.com/blog/claude-code-deep-dive-query-loop/" rel="noopener noreferrer"&gt;Claude Code Deep Dive Part 2: The 1,421-Line While Loop&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>claudecode</category>
      <category>programming</category>
    </item>
    <item>
      <title>Channels Aren't Message Passing — How Parked Goroutines OOM-Killed a Pod</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Thu, 14 May 2026 05:26:27 +0000</pubDate>
      <link>https://dev.to/harrisonsec/channels-arent-message-passing-how-parked-goroutines-oom-killed-a-pod-4ijf</link>
      <guid>https://dev.to/harrisonsec/channels-arent-message-passing-how-parked-goroutines-oom-killed-a-pod-4ijf</guid>
      <description>&lt;p&gt;It's 3am. The Kafka consumer pod that's been running cleanly for six weeks gets OOM-killed. Kubernetes restarts it. Five minutes later: OOM-killed again. Restart. OOM-killed a third time. By the fourth restart I've shelved the dashboard and started reading &lt;code&gt;runtime/chan.go&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The code that died fit on one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I want to tell you that line is the bug. It isn't. An unbuffered channel will happily backpressure a &lt;em&gt;single&lt;/em&gt; producer: every send has to rendezvous with a receiver, so the producer cannot run ahead. The channel did exactly what it was designed to do.&lt;/p&gt;

&lt;p&gt;What I had built &lt;em&gt;around&lt;/em&gt; it didn't. The Kafka consumer loop wrapped &lt;code&gt;events &amp;lt;- parseEvent(msg)&lt;/code&gt; inside a &lt;code&gt;go func(msg) { ... }(msg)&lt;/code&gt;, spawning a fresh goroutine per inbound message. Every one of those goroutines blocked on send, parked on the channel's &lt;code&gt;sendq&lt;/code&gt; list, and kept its stack and the parsed event alive in memory. The channel was the gravestone. The unbounded &lt;code&gt;go func&lt;/code&gt; fan-out was what filled it.&lt;/p&gt;

&lt;p&gt;This is the story of what a Go channel actually is at the runtime level, why "channels are message passing" is one of the most expensive lies in the Go ecosystem, and why the most common channel bug isn't &lt;em&gt;in&lt;/em&gt; the channel — it's in the code that calls into it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — A Go channel is not a queue and not a message bus. It's a heap-allocated &lt;code&gt;hchan&lt;/code&gt; struct containing a mutex, a ring buffer, and two parked-goroutine lists. The send operation is a &lt;code&gt;memcpy&lt;/code&gt; under a lock, not a transmission. &lt;strong&gt;Channels only deliver backpressure if the producer side is bounded.&lt;/strong&gt; The OOM that started this story came not from &lt;code&gt;make(chan Event)&lt;/code&gt; — that was working as designed — but from an unbounded &lt;code&gt;go func(msg)&lt;/code&gt; fan-out parking thousands of goroutines on &lt;code&gt;sendq&lt;/code&gt;, each retaining a 10KB payload. The fix isn't a buffer size. It's making backpressure part of the producer contract: a single long-lived producer with &lt;code&gt;select&lt;/code&gt;-based backoff, plus a bounded queue as a safety net. The same architectural mistake shows up at every layer where engineers reach for an "in-process queue" — including the inbound queue of your AI agent.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Mental Model That Killed The Pod
&lt;/h2&gt;

&lt;p&gt;Here is what I thought a channel did, and I suspect most Go engineers carry some version of this picture:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A channel is like a Kafka topic in-process. Producers push messages onto it. Consumers pull messages off it. The runtime handles ordering and delivery. It's CSP — Communicating Sequential Processes — Hoare's thing, basically a typed pipe."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every word of that sentence is wrong in a way that matters. There is no topic. Nothing is being pushed anywhere. The runtime is not a broker. The word &lt;em&gt;passing&lt;/em&gt; — borrowed from message-passing concurrency, where independent processes communicate across an isolation boundary — is the most misleading part. In a Go channel, there is no isolation boundary. There is one struct on the heap, and both goroutines reach in and mutate it.&lt;/p&gt;

&lt;p&gt;I held the message-passing model long enough that when the Kafka consumer started ingesting a 12-hour upstream replay at full throttle, I had no instinct to ask whether &lt;em&gt;the messages were going somewhere bounded&lt;/em&gt;. They weren't. They were piling up in parked goroutines behind a channel that had no buffer at all.&lt;/p&gt;




&lt;h2&gt;
  
  
  What A Channel Actually Is
&lt;/h2&gt;

&lt;p&gt;Crack open &lt;a href="https://github.com/golang/go/blob/master/src/runtime/chan.go" rel="noopener noreferrer"&gt;runtime/chan.go&lt;/a&gt; in the Go source tree and you'll find this (core fields stable since Go 1.7 and checked against Go 1.21–1.25; newer releases add a few bookkeeping fields, such as a timer pointer, that are omitted here):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;hchan&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;qcount&lt;/span&gt;   &lt;span class="kt"&gt;uint&lt;/span&gt;           &lt;span class="c"&gt;// total data in the queue&lt;/span&gt;
    &lt;span class="n"&gt;dataqsiz&lt;/span&gt; &lt;span class="kt"&gt;uint&lt;/span&gt;           &lt;span class="c"&gt;// size of the circular queue&lt;/span&gt;
    &lt;span class="n"&gt;buf&lt;/span&gt;      &lt;span class="n"&gt;unsafe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Pointer&lt;/span&gt; &lt;span class="c"&gt;// points to an array of dataqsiz elements&lt;/span&gt;
    &lt;span class="n"&gt;elemsize&lt;/span&gt; &lt;span class="kt"&gt;uint16&lt;/span&gt;
    &lt;span class="n"&gt;closed&lt;/span&gt;   &lt;span class="kt"&gt;uint32&lt;/span&gt;
    &lt;span class="n"&gt;elemtype&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;_type&lt;/span&gt;
    &lt;span class="n"&gt;sendx&lt;/span&gt;    &lt;span class="kt"&gt;uint&lt;/span&gt;           &lt;span class="c"&gt;// send index&lt;/span&gt;
    &lt;span class="n"&gt;recvx&lt;/span&gt;    &lt;span class="kt"&gt;uint&lt;/span&gt;           &lt;span class="c"&gt;// receive index&lt;/span&gt;
    &lt;span class="n"&gt;recvq&lt;/span&gt;    &lt;span class="n"&gt;waitq&lt;/span&gt;          &lt;span class="c"&gt;// list of recv waiters&lt;/span&gt;
    &lt;span class="n"&gt;sendq&lt;/span&gt;    &lt;span class="n"&gt;waitq&lt;/span&gt;          &lt;span class="c"&gt;// list of send waiters&lt;/span&gt;
    &lt;span class="n"&gt;lock&lt;/span&gt;     &lt;span class="n"&gt;mutex&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. That's the channel. A struct with a mutex, a pointer to a circular byte array, two indices to track read/write positions in the ring, and two intrusive linked lists holding parked goroutines that are waiting to send or receive.&lt;/p&gt;

&lt;p&gt;When you write &lt;code&gt;ch &amp;lt;- value&lt;/code&gt;, the runtime calls &lt;a href="https://github.com/golang/go/blob/master/src/runtime/chan.go#L160" rel="noopener noreferrer"&gt;&lt;code&gt;chansend&lt;/code&gt;&lt;/a&gt;, which does roughly this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Take the lock&lt;/strong&gt; (&lt;code&gt;lock(&amp;amp;c.lock)&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check &lt;code&gt;recvq&lt;/code&gt;&lt;/strong&gt; — is there a goroutine already parked waiting to receive? If yes, copy &lt;code&gt;value&lt;/code&gt; &lt;em&gt;directly&lt;/em&gt; from the sender's stack into the receiver's stack via &lt;code&gt;sendDirect&lt;/code&gt;, mark the receiver runnable with &lt;code&gt;goready&lt;/code&gt;, release the lock, return. No buffer involved — when a receiver is already waiting, send can hand off directly without ever touching the ring buffer. (In normal operation a buffered channel can't simultaneously have queued data AND parked receivers; if &lt;code&gt;recvq&lt;/code&gt; has a waiter, the buffer is empty.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Otherwise, check buffer space&lt;/strong&gt; — if &lt;code&gt;qcount &amp;lt; dataqsiz&lt;/code&gt;, copy &lt;code&gt;value&lt;/code&gt; into &lt;code&gt;buf[sendx]&lt;/code&gt;, advance &lt;code&gt;sendx&lt;/code&gt;, increment &lt;code&gt;qcount&lt;/code&gt;, release the lock, return.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Otherwise, park the sender&lt;/strong&gt;: enqueue the sender's goroutine on &lt;code&gt;sendq&lt;/code&gt; and call &lt;code&gt;gopark&lt;/code&gt;, which releases the channel lock as it suspends the goroutine until a receiver wakes it up.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Receive is the mirror image, calling &lt;code&gt;chanrecv&lt;/code&gt; with &lt;code&gt;sendq&lt;/code&gt; and &lt;code&gt;recvq&lt;/code&gt; swapped.&lt;/p&gt;

&lt;p&gt;Here is the shape of it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggVEQKICAgIHN1YmdyYXBoIFNlbmRlciBbU2VuZGVyIGdvcm91dGluZV0KICAgICAgICBTMVsiY2ggJmx0Oy0gdmFsdWUiXQogICAgZW5kCgogICAgc3ViZ3JhcGggQ2hhbm5lbCBbaGNoYW4gc3RydWN0IG9uIGhlYXBdCiAgICAgICAgTFttdXRleCBsb2NrXQogICAgICAgIEJbInJpbmcgYnVmZmVyPGJyLz5kYXRhcXNpeiBzbG90cyJdCiAgICAgICAgUlFbcmVjdnE6IHBhcmtlZCByZWNlaXZlcnNdCiAgICAgICAgU1Fbc2VuZHE6IHBhcmtlZCBzZW5kZXJzXQogICAgZW5kCgogICAgc3ViZ3JhcGggUmVjZWl2ZXIgW1JlY2VpdmVyIGdvcm91dGluZV0KICAgICAgICBSMVsidiA6PSAmbHQ7LWNoIl0KICAgIGVuZAoKICAgIFMxIC0tPnwiMS4gYWNxdWlyZSBsb2NrInwgTAogICAgTCAtLT58IjIuIHJlY3ZxIGVtcHR5PyJ8IEIKICAgIEwgLS0-fCIyLiByZWN2cSBoYXMgd2FpdGVyInwgUlEKICAgIFJRIC0tPnwiZGlyZWN0IGNvcHksIG5vIGJ1ZmZlciJ8IFIxCiAgICBCIC0tPnwiY29weSB0byBidWYgaWYgc3BhY2UifCBSMQogICAgTCAtLT58ImJ1ZmZlciBmdWxsLCBwYXJrIHNlbmRlciJ8IFNR" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggVEQKICAgIHN1YmdyYXBoIFNlbmRlciBbU2VuZGVyIGdvcm91dGluZV0KICAgICAgICBTMVsiY2ggJmx0Oy0gdmFsdWUiXQogICAgZW5kCgogICAgc3ViZ3JhcGggQ2hhbm5lbCBbaGNoYW4gc3RydWN0IG9uIGhlYXBdCiAgICAgICAgTFttdXRleCBsb2NrXQogICAgICAgIEJbInJpbmcgYnVmZmVyPGJyLz5kYXRhcXNpeiBzbG90cyJdCiAgICAgICAgUlFbcmVjdnE6IHBhcmtlZCByZWNlaXZlcnNdCiAgICAgICAgU1Fbc2VuZHE6IHBhcmtlZCBzZW5kZXJzXQogICAgZW5kCgogICAgc3ViZ3JhcGggUmVjZWl2ZXIgW1JlY2VpdmVyIGdvcm91dGluZV0KICAgICAgICBSMVsidiA6PSAmbHQ7LWNoIl0KICAgIGVuZAoKICAgIFMxIC0tPnwiMS4gYWNxdWlyZSBsb2NrInwgTAogICAgTCAtLT58IjIuIHJlY3ZxIGVtcHR5PyJ8IEIKICAgIEwgLS0-fCIyLiByZWN2cSBoYXMgd2FpdGVyInwgUlEKICAgIFJRIC0tPnwiZGlyZWN0IGNvcHksIG5vIGJ1ZmZlciJ8IFIxCiAgICBCIC0tPnwiY29weSB0byBidWYgaWYgc3BhY2UifCBSMQogICAgTCAtLT58ImJ1ZmZlciBmdWxsLCBwYXJrIHNlbmRlciJ8IFNR" alt="graph TD" width="792" height="628"&gt;&lt;/a&gt;&lt;/p&gt;
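&lt;p&gt;The order of checks in the send path can also be traced with a toy model. This is a teaching sketch, not the runtime's code: &lt;code&gt;toyChan&lt;/code&gt; is a hypothetical type, it is single-threaded so the lock is omitted, it counts waiting receivers instead of parking real goroutines, and it stores plain ints.&lt;/p&gt;

```go
package main

import "fmt"

// toyChan is a simplified stand-in for hchan (an assumption for
// illustration, not the runtime's struct).
type toyChan struct {
	buf    []int // ring buffer; len(buf) plays the role of dataqsiz
	qcount int   // elements currently buffered
	sendx  int   // next write index into buf
	recvq  int   // count of receivers parked and waiting
	sendq  []int // values held by parked senders; nothing bounds this slice
}

// send follows the same order of checks as the numbered steps and
// reports which path the value took.
func (c *toyChan) send(v int) string {
	// Step 1, taking the lock, is skipped in this single-threaded toy.
	if c.recvq > 0 {
		c.recvq-- // step 2: hand the value straight to a waiting receiver
		return "direct handoff"
	}
	if len(c.buf) > c.qcount {
		c.buf[c.sendx] = v // step 3: a free slot exists in the ring
		c.sendx = (c.sendx + 1) % len(c.buf)
		c.qcount++
		return "buffered"
	}
	c.sendq = append(c.sendq, v) // step 4: park, keeping the value alive
	return "parked"
}

func main() {
	// Two buffer slots and one receiver already waiting.
	c := toyChan{buf: make([]int, 2), recvq: 1}
	for _, v := range []int{1, 2, 3, 4, 5} {
		fmt.Printf("send(%d): %s\n", v, c.send(v))
	}
	fmt.Println("parked senders:", len(c.sendq))
}
```

&lt;p&gt;With one waiting receiver and two slots, the first send is a direct handoff, the next two are buffered, and every send after that lands in &lt;code&gt;sendq&lt;/code&gt;, which keeps growing for as long as senders keep arriving.&lt;/p&gt;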

&lt;p&gt;Three things are worth burning into memory:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One — there is no transport.&lt;/strong&gt; The "message" never leaves the heap. Sender writes bytes; receiver reads bytes; the lock arbitrates. This is shared-memory synchronization with the &lt;em&gt;appearance&lt;/em&gt; of message passing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two — the buffer is just a ring of typed slots.&lt;/strong&gt; &lt;code&gt;dataqsiz&lt;/code&gt; is set exactly once, at &lt;code&gt;make(chan T, N)&lt;/code&gt; time. If you write &lt;code&gt;make(chan T)&lt;/code&gt;, &lt;code&gt;dataqsiz&lt;/code&gt; is zero and there is no buffer at all — every send must rendezvous with a receiver or park.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three — &lt;code&gt;sendq&lt;/code&gt; is unbounded.&lt;/strong&gt; This is the part nobody talks about. The ring buffer has a fixed size. The list of &lt;em&gt;parked senders waiting to write into the ring buffer&lt;/em&gt; does not. If a thousand goroutines all hit a full channel, the runtime parks all thousand of them on &lt;code&gt;sendq&lt;/code&gt; and each one keeps its stack and any data it was about to send alive in memory.&lt;/p&gt;

&lt;p&gt;That third point is why the OOM I hit had a different shape from the one you might expect: the memory was never in the buffer.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Incident, Mechanism By Mechanism
&lt;/h2&gt;

&lt;p&gt;The pod that died had a goroutine topology that looked like this — and the bug is &lt;em&gt;not&lt;/em&gt; the &lt;code&gt;make(chan Event)&lt;/code&gt; line. Watch the outer loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// Consumer — slow.&lt;/span&gt;
&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ev&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ev&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// ~3ms per event&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}()&lt;/span&gt;

&lt;span class="c"&gt;// THE ACTUAL BUG: outer loop spawns a fresh goroutine per inbound message.&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;kafkaConsumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Messages&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="n"&gt;kafka&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;parseEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// every blocked send parks on sendq&lt;/span&gt;
    &lt;span class="p"&gt;}(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you replace the inner &lt;code&gt;go func(msg) { ... }(msg)&lt;/code&gt; with a direct &lt;code&gt;events &amp;lt;- parseEvent(msg)&lt;/code&gt;, the outer loop &lt;em&gt;itself&lt;/em&gt; becomes the producer, and the unbuffered channel correctly backpressures it — the loop simply doesn't advance until the consumer is ready. No OOM.&lt;/p&gt;

&lt;p&gt;But because each message is dispatched to a fresh helper goroutine, the outer loop never blocks. It keeps spawning. Each helper goroutine reaches the send, finds no waiting receiver, and parks on &lt;code&gt;sendq&lt;/code&gt;. Now &lt;code&gt;sendq&lt;/code&gt; is the unbounded thing. Here is what actually happened, in order:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Sustained baseline: rendezvous works
&lt;/h3&gt;

&lt;p&gt;At 1K msg/sec inbound and ~3ms per &lt;code&gt;process&lt;/code&gt; call (~333/sec consumer throughput), the consumer is already behind by 3x at steady state. For weeks this didn't OOM because the Kafka client's own internal buffer absorbed the gap, and lag built up on the broker side — visible in Grafana, ignored by me.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Replay: the producer detaches from the consumer's pace
&lt;/h3&gt;

&lt;p&gt;When upstream re-emitted 12 hours of events, the Kafka client's internal pre-fetch buffer filled to capacity (default &lt;code&gt;fetch.message.max.bytes&lt;/code&gt; × partition count = several hundred MB) and started backing up &lt;em&gt;Kafka-side&lt;/em&gt; without applying backpressure to the consumer goroutine, because the client library was configured with a large internal queue.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The actual heap growth: parked sender goroutines
&lt;/h3&gt;

&lt;p&gt;Each call to &lt;code&gt;events &amp;lt;- parseEvent(msg)&lt;/code&gt; on the unbuffered channel would either rendezvous (rare during replay) or park. When it parked, the sender goroutine held:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Its own stack (~8KB here; goroutine stacks start at about 2KB and grow on demand)&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Event&lt;/code&gt; value it was about to send (~10KB per event with strings, headers, payload)&lt;/li&gt;
&lt;li&gt;A reference into the Kafka message it was parsing (another ~10KB)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multiply by the number of in-flight parsing goroutines — which kept being spawned by an outer loop that didn't apply backpressure to itself — and you arrive at the 12GB heap. The channel's &lt;code&gt;sendq&lt;/code&gt; was the proximate memory sink, not the buffer (which was zero-sized).&lt;/p&gt;
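&lt;p&gt;As a back-of-envelope check on that multiplication, the sketch below adds up the three per-goroutine figures from the list and divides a heap size by the total. The sizes are the approximate ones quoted above, KB and GB are read as binary units, and the function names are hypothetical.&lt;/p&gt;

```go
package main

import "fmt"

const (
	kib int64 = 1024
	gib int64 = 1024 * 1024 * 1024
)

// retainedPerGoroutine sums the approximate figures from the list:
// stack, Event value, and the referenced Kafka message.
func retainedPerGoroutine() int64 {
	const stack, event, kafkaMsg = 8 * kib, 10 * kib, 10 * kib
	return stack + event + kafkaMsg
}

// parkedFor returns how many parked goroutines it takes to retain
// heapBytes, ignoring every other allocation in the process.
func parkedFor(heapBytes int64) int64 {
	return heapBytes / retainedPerGoroutine()
}

func main() {
	fmt.Printf("retained per parked goroutine: %d KiB\n", retainedPerGoroutine()/kib)
	fmt.Printf("parked goroutines behind 12 GiB: %d\n", parkedFor(12*gib))
}
```

&lt;p&gt;At roughly 28 KiB apiece, a 12 GiB heap corresponds to about 450,000 parked senders. It is an estimate only, since stacks and payloads vary, but it shows the order of magnitude the fan-out has to reach.&lt;/p&gt;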

&lt;p&gt;The goroutine lifecycle for each parsing goroutine looked like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fc3RhdGVEaWFncmFtLXYyCiAgICBbKl0gLS0-IFJ1bm5pbmc6IGdvIGZ1bmMoKQogICAgUnVubmluZyAtLT4gUGFya2VkX29uX3NlbmRxOiBjaCA8LSB2YWx1ZSAobm8gcmVjZWl2ZXIpCiAgICBQYXJrZWRfb25fc2VuZHEgLS0-IFJ1bm5hYmxlOiByZWNlaXZlciB3YWtlcyBtZQogICAgUnVubmFibGUgLS0-IFJ1bm5pbmc6IHNjaGVkdWxlciBwaWNrcyBtZQogICAgUnVubmluZyAtLT4gWypdOiBmdW5jdGlvbiByZXR1cm5zCgogICAgbm90ZSByaWdodCBvZiBQYXJrZWRfb25fc2VuZHEKICAgICAgICBTdGFjayByZXRhaW5lZC4KICAgICAgICBFdmVudCBwYXlsb2FkIHJldGFpbmVkLgogICAgICAgIEthZmthIG1zZyByZWZlcmVuY2UgcmV0YWluZWQuCiAgICAgICAgc2VuZHEgaGFzIE5PIHNpemUgYm91bmQuCiAgICBlbmQgbm90ZQ%3D%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fc3RhdGVEaWFncmFtLXYyCiAgICBbKl0gLS0-IFJ1bm5pbmc6IGdvIGZ1bmMoKQogICAgUnVubmluZyAtLT4gUGFya2VkX29uX3NlbmRxOiBjaCA8LSB2YWx1ZSAobm8gcmVjZWl2ZXIpCiAgICBQYXJrZWRfb25fc2VuZHEgLS0-IFJ1bm5hYmxlOiByZWNlaXZlciB3YWtlcyBtZQogICAgUnVubmFibGUgLS0-IFJ1bm5pbmc6IHNjaGVkdWxlciBwaWNrcyBtZQogICAgUnVubmluZyAtLT4gWypdOiBmdW5jdGlvbiByZXR1cm5zCgogICAgbm90ZSByaWdodCBvZiBQYXJrZWRfb25fc2VuZHEKICAgICAgICBTdGFjayByZXRhaW5lZC4KICAgICAgICBFdmVudCBwYXlsb2FkIHJldGFpbmVkLgogICAgICAgIEthZmthIG1zZyByZWZlcmVuY2UgcmV0YWluZWQuCiAgICAgICAgc2VuZHEgaGFzIE5PIHNpemUgYm91bmQuCiAgICBlbmQgbm90ZQ%3D%3D" alt="stateDiagram-v2" width="612" height="532"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every goroutine sitting in &lt;code&gt;Parked_on_sendq&lt;/code&gt; is reachable (it's on the runtime's wait queue, which is rooted in the &lt;code&gt;hchan&lt;/code&gt; struct, which is rooted by both the producer and consumer goroutines). Reachable means non-collectible. The longer the consumer falls behind, the longer the queue grows.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. GC can't help
&lt;/h3&gt;

&lt;p&gt;Go's GC can only reclaim unreachable memory. Every parked goroutine on &lt;code&gt;sendq&lt;/code&gt; is reachable (it's on the channel's wait queue, which hangs off the live &lt;code&gt;hchan&lt;/code&gt;). Every &lt;code&gt;Event&lt;/code&gt; it's holding is reachable. The GC ran, found nothing to free, and the heap continued growing until the kernel OOM-killer fired.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. The cgroup hammer drops
&lt;/h3&gt;

&lt;p&gt;cgroup memory limit was 4GB. Heap crossed 4GB. OOM kill. Kubernetes restarted the pod. The replay was still in progress on the broker side, so the same sequence ran again. And again.&lt;/p&gt;

&lt;h3&gt;
  
  
  What this looks like in pprof
&lt;/h3&gt;

&lt;p&gt;You don't have to take my word for the mechanism — it reproduces in under a minute. I built a minimal demo at &lt;a href="https://github.com/harrison001/channels-oom-demo" rel="noopener noreferrer"&gt;&lt;code&gt;harrison001/channels-oom-demo&lt;/code&gt;&lt;/a&gt; (&lt;a href="https://github.com/harrison001/channels-oom-demo/blob/main/cmd/bug/main.go" rel="noopener noreferrer"&gt;&lt;code&gt;cmd/bug&lt;/code&gt;&lt;/a&gt;) that runs the same workload shape on a laptop. The output of the bug version over 22 seconds, captured with &lt;code&gt;runtime.NumGoroutine()&lt;/code&gt; and &lt;code&gt;runtime.MemStats.HeapAlloc&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;t=   1s  goroutines=   497  heap_alloc=     5 MB
t=   5s  goroutines=  2462  heap_alloc=    28 MB
t=  10s  goroutines=  4915  heap_alloc=    61 MB
t=  15s  goroutines=  7369  heap_alloc=    89 MB
t=  20s  goroutines=  9828  heap_alloc=   109 MB
t=  22s  goroutines= 10813  heap_alloc=   125 MB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Goroutine count grows at a steady ~490 per second, roughly one every 2 ms, tracking the demo's spawn rate. Heap grows at ~5MB/sec, dominated by the 10KB Event payload each parked goroutine is holding. Extrapolate to a 12-hour replay at production volume and you arrive at the original 12GB OOM.&lt;/p&gt;
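&lt;p&gt;As a rough cross-check, the growth rates can be recomputed from the first and last rows of the output above. This is only a sketch: the constants are copied from that table, the ~10KB payload is the figure quoted in this post, and the function names are made up for illustration.&lt;/p&gt;

```go
package main

import "fmt"

// Figures copied from the demo output above. payloadKB is the ~10KB Event
// size quoted in the post; everything else is plain arithmetic.
const (
	goroutinesAt1s  = 497
	goroutinesAt22s = 10813
	elapsedSec      = 21.0 // from t=1s to t=22s
	payloadKB       = 10
)

// spawnRatePerSec is the average number of goroutines added per second.
func spawnRatePerSec() float64 {
	return float64(goroutinesAt22s-goroutinesAt1s) / elapsedSec
}

// payloadGrowthMBPerSec estimates heap growth from retained payloads alone
// (decimal megabytes, ignoring goroutine stacks and other overhead).
func payloadGrowthMBPerSec() float64 {
	return spawnRatePerSec() * payloadKB / 1000.0
}

func main() {
	fmt.Printf("spawn rate: %.0f goroutines/sec\n", spawnRatePerSec())
	fmt.Printf("payload growth: %.1f MB/sec\n", payloadGrowthMBPerSec())
}
```

&lt;p&gt;The result lands at about 491 goroutines per second and about 4.9 MB/sec of retained payload, which is in line with the observed heap curve once stacks and allocator overhead are added on top.&lt;/p&gt;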

&lt;p&gt;For comparison, the fix version (&lt;a href="https://github.com/harrison001/channels-oom-demo/blob/main/cmd/fix/main.go" rel="noopener noreferrer"&gt;&lt;code&gt;cmd/fix&lt;/code&gt;&lt;/a&gt;) on the same workload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;t=   1s  goroutines=     3  heap_alloc=     3 MB  chan_len= 256
t=  10s  goroutines=     3  heap_alloc=     4 MB  chan_len= 256
t=  20s  goroutines=     3  heap_alloc=     5 MB  chan_len= 256
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three goroutines (producer, consumer, pprof listener). Heap flat at 3-5 MB. Channel pinned at its 256-slot bound, meaning the producer is constantly blocked on send and applying backpressure upstream, which is exactly what we want.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Fix, And Why It Works
&lt;/h2&gt;

&lt;p&gt;The visible code change was one parameter. The real fix was making backpressure part of the producer contract — two changes, working together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;256&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// (1) bounded queue as safety net&lt;/span&gt;

&lt;span class="c"&gt;// (2) single long-lived producer goroutine with select-based backoff —&lt;/span&gt;
&lt;span class="c"&gt;// NO outer loop spawning fresh goroutines per message.&lt;/span&gt;
&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;kafkaConsumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Messages&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;parseEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
            &lt;span class="c"&gt;// sent — loop continues at consumer speed when channel fills&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key word in change (2) is &lt;strong&gt;single&lt;/strong&gt;. There is exactly one goroutine reading from Kafka and writing to the channel. When the channel fills, that goroutine blocks on send; the &lt;code&gt;for msg := range&lt;/code&gt; loop stops receiving from &lt;code&gt;Messages()&lt;/code&gt;; the Kafka client's internal pre-fetch queue stops draining; consumer lag accumulates broker-side; the broker simply retains messages until we come back. No &lt;code&gt;go func(msg)&lt;/code&gt; helpers. Nothing piling up on &lt;code&gt;sendq&lt;/code&gt;. Memory stays bounded because the &lt;em&gt;producer&lt;/em&gt; is bounded; the buffer is only a safety net to absorb short bursts.&lt;/p&gt;

&lt;p&gt;What this changes, mechanically:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Before (unbounded &lt;code&gt;go func&lt;/code&gt; fan-out + &lt;code&gt;make(chan Event)&lt;/code&gt;)&lt;/th&gt;
&lt;th&gt;After (single producer + &lt;code&gt;make(chan Event, 256)&lt;/code&gt;)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;One goroutine per inbound message&lt;/td&gt;
&lt;td&gt;One long-lived producer goroutine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;sendq&lt;/code&gt; grows unboundedly with parked helpers&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;sendq&lt;/code&gt; holds at most one waiter: the sole producer, parked while the buffer is full&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No signal to upstream — outer loop never blocks&lt;/td&gt;
&lt;td&gt;Producer blocks on send; outer loop runs at consumer speed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kafka client keeps pre-fetching, lag invisible&lt;/td&gt;
&lt;td&gt;Kafka client's internal queue fills, consumer stops polling, broker-side lag accumulates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OOM&lt;/td&gt;
&lt;td&gt;Bounded heap, bounded latency, Kafka rebalances cleanly when behind&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A bounded channel buffer alone does not prevent this OOM. If you applied change (1) without change (2), you'd merely delay the OOM by a few milliseconds: the outer &lt;code&gt;go func(msg)&lt;/code&gt; fan-out would keep spawning, the buffer would fill in milliseconds, and helpers would pile up on &lt;code&gt;sendq&lt;/code&gt; exactly as before. Backpressure is not a property of any one component. It is a property of the entire chain having no unbounded buffer (and no unbounded fan-out) anywhere in it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggTFIKICAgIEFbS2Fma2EgYnJva2VyXSAtLT58ZmV0Y2h8IEJbS2Fma2EgY2xpZW50PGJyLz5wcmUtZmV0Y2ggYnVmZmVyXQogICAgQiAtLT4gQ1tQcm9kdWNlcjxici8-Z29yb3V0aW5lXQogICAgQyAtLT58c2VsZWN0fCBEWyJldmVudHM8YnIvPmNoYW4gVCwgMjU2Il0KICAgIEQgLS0-IEVbQ29uc3VtZXI8YnIvPmdvcm91dGluZV0KICAgIEUgLS0-IEZbKERhdGFiYXNlKV0KCiAgICBGIC0uIHNsb3cgLi0-IEUKICAgIEUgLS4gc2xvdyBkcmFpbiAuLT4gRAogICAgRCAtLiBmdWxsIC4tPiBDCiAgICBDIC0uIGJsb2NrcyBvbiBzZW5kIC4tPiBCCiAgICBCIC0uIHF1ZXVlIGZpbGxzLCBmZXRjaCBzbG93cyAuLT4gQQogICAgQSAtLiBicm9rZXIgcmV0YWlucyBtc2dzPGJyLz5jb25zdW1lciBsYWcgZ3Jvd3MgLi0-IEEKCiAgICBjbGFzc0RlZiBib3VuZCBmaWxsOiNjZmUsc3Ryb2tlOiMwODAKICAgIGNsYXNzIEQgYm91bmQ%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggTFIKICAgIEFbS2Fma2EgYnJva2VyXSAtLT58ZmV0Y2h8IEJbS2Fma2EgY2xpZW50PGJyLz5wcmUtZmV0Y2ggYnVmZmVyXQogICAgQiAtLT4gQ1tQcm9kdWNlcjxici8-Z29yb3V0aW5lXQogICAgQyAtLT58c2VsZWN0fCBEWyJldmVudHM8YnIvPmNoYW4gVCwgMjU2Il0KICAgIEQgLS0-IEVbQ29uc3VtZXI8YnIvPmdvcm91dGluZV0KICAgIEUgLS0-IEZbKERhdGFiYXNlKV0KCiAgICBGIC0uIHNsb3cgLi0-IEUKICAgIEUgLS4gc2xvdyBkcmFpbiAuLT4gRAogICAgRCAtLiBmdWxsIC4tPiBDCiAgICBDIC0uIGJsb2NrcyBvbiBzZW5kIC4tPiBCCiAgICBCIC0uIHF1ZXVlIGZpbGxzLCBmZXRjaCBzbG93cyAuLT4gQQogICAgQSAtLiBicm9rZXIgcmV0YWlucyBtc2dzPGJyLz5jb25zdW1lciBsYWcgZ3Jvd3MgLi0-IEEKCiAgICBjbGFzc0RlZiBib3VuZCBmaWxsOiNjZmUsc3Ryb2tlOiMwODAKICAgIGNsYXNzIEQgYm91bmQ%3D" alt="graph LR" width="1521" height="188"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every link in this chain is bounded — the database has connection pool limits, the consumer is rate-limited by &lt;code&gt;process()&lt;/code&gt; latency, the channel buffer is 256, the Kafka client's internal queue has a configured max, and the broker simply retains messages on disk when its consumer falls behind. When ANY downstream link slows, the pressure propagates back up by the consumer ceasing to pull; the broker doesn't need to be told anything. The whole system runs at the rate of its slowest component.&lt;/p&gt;

&lt;p&gt;If any link in that chain has an unbounded buffer, the chain has no backpressure. That link will absorb the load until it OOMs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bounded Buffers Are Not About Channels
&lt;/h2&gt;

&lt;p&gt;The lesson is not "use buffered channels." The lesson is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Any in-process queue without a capacity bound is a latent OOM.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This applies identically across runtimes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Runtime&lt;/th&gt;
&lt;th&gt;The footgun&lt;/th&gt;
&lt;th&gt;The fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;Unbounded goroutine fan-out parked on sends (&lt;code&gt;go func(msg) { ch &amp;lt;- ... }(msg)&lt;/code&gt;); oversized buffered channels&lt;/td&gt;
&lt;td&gt;Single long-lived producer + &lt;code&gt;select&lt;/code&gt; + bounded buffer as safety net&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rust (Tokio)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mpsc::unbounded_channel()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mpsc::channel(N)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python (asyncio)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;asyncio.Queue()&lt;/code&gt; with no &lt;code&gt;maxsize&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;code&gt;asyncio.Queue(maxsize=N)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Node.js&lt;/td&gt;
&lt;td&gt;Unbounded array of in-flight Promises&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;p-limit&lt;/code&gt;, &lt;code&gt;Sema&lt;/code&gt;, or explicit pool&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Erlang/Elixir&lt;/td&gt;
&lt;td&gt;Process mailbox grows unboundedly when selective receive can't keep up&lt;/td&gt;
&lt;td&gt;Demand-driven flow control: &lt;a href="https://hexdocs.pm/gen_stage/GenStage.html" rel="noopener noreferrer"&gt;&lt;code&gt;GenStage&lt;/code&gt;&lt;/a&gt; / &lt;code&gt;Flow&lt;/code&gt; for pipelines, or explicit ack-based protocols in &lt;code&gt;gen_statem&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every one of these reaches for the same shape — an in-process queue — and every one of them OOMs the same way when the shape is unbounded.&lt;/p&gt;




&lt;h2&gt;
  
  
  When Channels Are The Right Tool
&lt;/h2&gt;

&lt;p&gt;I want to be careful not to overcorrect. Channels are not a mistake. They are an excellent primitive used incorrectly. Cases where reaching for a channel is the right call:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cancellation signaling&lt;/strong&gt; — &lt;code&gt;context.Done()&lt;/code&gt; is a &lt;code&gt;&amp;lt;-chan struct{}&lt;/code&gt;. This is canonical.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fan-out work distribution with a worker pool&lt;/strong&gt; — a bounded channel feeding N worker goroutines is a clean semaphore. Buffer size = pool size or small multiple of it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Producer-consumer with a known throughput ratio&lt;/strong&gt; — yes, with a bounded buffer sized to the latency budget.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error aggregation from concurrent goroutines&lt;/strong&gt; — small buffered channel, drain on goroutine completion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handoff between pipeline stages&lt;/strong&gt; — bounded, with explicit close semantics on the upstream stage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cases where reaching for a channel is the wrong call:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cross-process messaging&lt;/strong&gt; — use a real broker (NATS, Kafka, Redis Streams). Channels do not survive a pod restart.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistence&lt;/strong&gt; — channels are process-local: they live in the memory of one running process. If your pod dies, the in-flight data is gone. If you need "at least once" across restarts, you need a real queue.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bursty load with unknown shape&lt;/strong&gt; — if you cannot put a meaningful upper bound on the buffer, you have not understood the load. Adding a channel does not give you understanding; it postpones the OOM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anything that wants to be a message bus&lt;/strong&gt; — that's not a channel. That's a message bus. They are different categories of system.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Same Bug, Different Layer: AI Agent Inbound Queues
&lt;/h2&gt;

&lt;p&gt;The reason this post lives in the SecurityLab track and not just "Go tips" is that the exact same mistake is now happening, at scale, in LLM agent infrastructure. I've seen the pattern repeatedly in recent AI backends — same architectural shape, different runtime.&lt;/p&gt;

&lt;p&gt;The pattern: an agent backend exposes an HTTP endpoint. Each inbound request is dispatched to a worker pool via an in-process queue.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The bug, in a different language
&lt;/span&gt;&lt;span class="n"&gt;request_queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# unbounded
&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;http_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;request_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# never blocks
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;queued&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;request_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 8 seconds, sometimes 30
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Steady state looks fine: requests arrive slightly faster than they're processed, the queue grows slowly, latency creeps up, and nobody notices because the HTTP layer keeps returning 200.&lt;/p&gt;

&lt;p&gt;Then a launch happens. Or a viral tweet. Or a marketing email goes out. Inbound rate spikes 50x for 20 minutes. The queue accepts everything (it's unbounded). The worker pool can't keep up — LLM calls are inelastic, you can't parallelize past your token-per-minute quota. The queue grows to 200K items. Each item holds a request payload (~50KB with conversation history) and a future. 10GB of heap. OOM. Pod restart. All 200K requests lost. Users see 500s instead of the explicit "rate-limited, try again in 30s" they would have seen with proper backpressure.&lt;/p&gt;

&lt;p&gt;The fix has the same shape as the Go fix. Bound the queue, and tell the caller when it is full:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;request_queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maxsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;http_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;request_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_nowait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QueueFull&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Retry-After&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;queued&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;503 is a feature. It is the system telling the client &lt;em&gt;we're at capacity, retry in 30 seconds&lt;/em&gt;. It is honest. It is bounded. It is the difference between a system that degrades gracefully and one that dies silently.&lt;/p&gt;
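&lt;p&gt;To make the choice of bound less arbitrary, it can help to work out what a given queue size costs in waiting time and memory. The sketch below is a back-of-the-envelope calculation, not a sizing rule. The 100-slot bound, 8-second call and ~50KB payload come from this post; the worker count of 10 and both function names are assumptions made up for the example.&lt;/p&gt;

```go
package main

import "fmt"

// worstCaseWaitSeconds estimates how long the last request in a full queue
// waits before a worker picks it up, assuming every call takes the same time.
func worstCaseWaitSeconds(queueSize, workers int, secondsPerCall float64) float64 {
	return float64(queueSize) * secondsPerCall / float64(workers)
}

// queuedHeapMB estimates memory pinned by queued requests (decimal MB).
func queuedHeapMB(queueSize, kbPerItem int) float64 {
	return float64(queueSize) * float64(kbPerItem) / 1000.0
}

func main() {
	const workers = 10 // assumption: not stated in the post

	// Bounded queue from the fix: maxsize=100, 8s per LLM call, ~50KB per request.
	fmt.Printf("bounded wait: %.0f s\n", worstCaseWaitSeconds(100, workers, 8))
	fmt.Printf("bounded heap: %.0f MB\n", queuedHeapMB(100, 50))

	// Unbounded queue at the 200K-item peak described above.
	fmt.Printf("unbounded heap: %.0f MB\n", queuedHeapMB(200000, 50))
}
```

&lt;p&gt;Under these assumptions the bounded queue pins about 5 MB and the last caller waits about 80 seconds, while the unbounded queue pins about 10,000 MB. If 80 seconds is longer than a client will tolerate, that is an argument for a smaller bound, not a larger one.&lt;/p&gt;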




&lt;h2&gt;
  
  
  Reproducing This Yourself
&lt;/h2&gt;

&lt;p&gt;The numbers in this post come from a minimal Go program that fits in under 100 lines per command. The repo lives at:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/harrison001/channels-oom-demo" rel="noopener noreferrer"&gt;github.com/harrison001/channels-oom-demo&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/harrison001/channels-oom-demo.git
&lt;span class="nb"&gt;cd &lt;/span&gt;channels-oom-demo

&lt;span class="c"&gt;# Watch goroutine count + heap climb every second&lt;/span&gt;
go run ./cmd/bug

&lt;span class="c"&gt;# Switch to the fix — flat at 3 goroutines, 5 MB heap&lt;/span&gt;
go run ./cmd/fix
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each program exposes pprof on &lt;code&gt;localhost:6060&lt;/code&gt;. While the bug version is running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Confirm 10K+ goroutines parked in runtime.gopark via runtime.chansend / runtime.chansend1&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s1"&gt;'http://localhost:6060/debug/pprof/goroutine?debug=1'&lt;/span&gt; | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-20&lt;/span&gt;

&lt;span class="c"&gt;# Confirm the heap is dominated by Event payloads, not the channel itself&lt;/span&gt;
go tool pprof &lt;span class="nt"&gt;-text&lt;/span&gt; http://localhost:6060/debug/pprof/heap
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The bug demo has a hard cap at 20,000 goroutines so it won't actually OOM your laptop. Remove the cap if you want to see the kernel finish the job.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Wish I'd Known
&lt;/h2&gt;

&lt;p&gt;If I could send one note back to myself eighteen months before the OOM:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When you reach for an in-process queue, you are choosing a backpressure boundary. The buffer size is not a performance tuning knob. It is a contract: &lt;em&gt;under sustained load greater than my consumer's throughput, this is how much memory I am willing to lose before I tell the producer to stop.&lt;/em&gt; If you don't pick a number, the runtime picks one for you, and the number is &lt;em&gt;whatever fits in RAM right before the kernel kills the process.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Channels in Go look like message-passing because the syntax was deliberately borrowed from CSP, a model where independent processes communicate by passing values across an isolation boundary. In Go there is no isolation boundary. The channel is a struct in shared memory, the goroutines are coroutines on the same scheduler, and the entire setup is synchronization plumbing in CSP clothing.&lt;/p&gt;

&lt;p&gt;Once you see the &lt;code&gt;hchan&lt;/code&gt; struct, you can't un-see it. Every channel decision after that is a synchronization decision, not a transport decision. And synchronization decisions always have a capacity bound — you just have to choose whether to pick it explicitly or have the OOM-killer pick it for you.&lt;/p&gt;




&lt;h3&gt;
  
  
  Keep going
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Code&lt;/strong&gt;: &lt;a href="https://github.com/harrison001/channels-oom-demo" rel="noopener noreferrer"&gt;&lt;code&gt;harrison001/channels-oom-demo&lt;/code&gt;&lt;/a&gt; — reproduce both versions, capture your own pprof&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Next piece&lt;/strong&gt;: &lt;em&gt;Goroutines Are Cheap — Until Backpressure Is Missing&lt;/em&gt; — coming next. The producer side of the same mistake: why "just spawn a goroutine" is the second half of the bug.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subscribe&lt;/strong&gt;: I write one of these monthly on runtime mechanics, distributed systems postmortems, and the security implications of getting them wrong. &lt;a href="https://buttondown.com/harrisonsec" rel="noopener noreferrer"&gt;Newsletter&lt;/a&gt; · &lt;a href="https://harrisonsec.com/track/securitylab/" rel="noopener noreferrer"&gt;SecurityLab track&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;If you've hit this bug — or its cousin in a different runtime — I'd genuinely like to hear about it. The Erlang and Node.js shapes especially: I have hunches but not enough scars. Reply to the newsletter or open an issue on the demo repo.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>go</category>
      <category>concurrency</category>
      <category>performance</category>
      <category>backend</category>
    </item>
    <item>
      <title>How I Improved an AI Agent from 40% to 60% — With A/B Test Data</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Tue, 12 May 2026 15:49:19 +0000</pubDate>
      <link>https://dev.to/harrisonsec/how-i-improved-an-ai-agent-from-40-to-60-with-ab-test-data-4f2i</link>
      <guid>https://dev.to/harrisonsec/how-i-improved-an-ai-agent-from-40-to-60-with-ab-test-data-4f2i</guid>
      <description>&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I was optimizing an AI agent for a production system — a creator agent that handles user requests like "make this character fiercer" or "rename this entity." The agent runs a 5-layer pipeline: Perceive → Cognate → Decide → Act → Express, with real LLM calls at each step.&lt;/p&gt;

&lt;p&gt;Quality was bad. Not "it doesn't work" bad — "it works 40% of the time" bad. The remaining 60% were wrong entity targeting, infinite reasoning loops, and silent failures.&lt;/p&gt;

&lt;p&gt;I ran 5 standardized test cases, each repeated 5 times (LLMs are non-deterministic), measuring pass rate:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;th&gt;Baseline&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;QL-001&lt;/td&gt;
&lt;td&gt;Create 4 entities + 1 relationship in one message&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QL-002&lt;/td&gt;
&lt;td&gt;Classify user intent correctly&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QL-003&lt;/td&gt;
&lt;td&gt;Update the right entity in a world with 6 characters + 4 locations&lt;/td&gt;
&lt;td&gt;40%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QL-004&lt;/td&gt;
&lt;td&gt;Maintain context across long conversation&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QL-005&lt;/td&gt;
&lt;td&gt;Simple rename ("Ember" → "Infernia")&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Overall: 40% pass rate.&lt;/strong&gt; The model (equivalent to GPT-4 class) was plenty capable. Something else was wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Diagnosis: Context Was the Problem
&lt;/h2&gt;

&lt;h3&gt;
  
  
  QL-003: Why the Agent Confused Entities (40% → 80%)
&lt;/h3&gt;

&lt;p&gt;The user says: "Make Ember more fierce and give her fire breath."&lt;/p&gt;

&lt;p&gt;The world has 10 entities: 6 characters (Ember, Luna, Grak, Roland, Mira, Pip) and 4 locations. The agent's &lt;code&gt;BuildChatCompletionMessages&lt;/code&gt; function dumped ALL entity data into the prompt — every character's backstory, every location's description.&lt;/p&gt;

&lt;p&gt;The LLM had to find Ember in a wall of irrelevant text. Sometimes it picked Luna. Sometimes it referenced the wrong character's traits. Not because the model was stupid — because the context was noisy.&lt;/p&gt;

&lt;h3&gt;
  
  
  QL-005: Why Simple Rename Failed (20% → 80%)
&lt;/h3&gt;

&lt;p&gt;"Rename Ember to Infernia." One entity, one operation. Should be trivial.&lt;/p&gt;

&lt;p&gt;Two problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;No round limit — the agent sometimes looped 15+ times on a rename, reasoning tools firing endlessly&lt;/li&gt;
&lt;li&gt;When a tool failed, the LLM got: &lt;code&gt;{"error": true, "message": "This tool is temporarily unavailable."}&lt;/code&gt; — no context on what to do next&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The model gave up or produced responses that didn't contain "Infernia."&lt;/p&gt;

&lt;h3&gt;
  
  
  QL-001: Why Multi-Step Creation Was Impossible (0% → 0%)
&lt;/h3&gt;

&lt;p&gt;"Create a dragon named Ember who lives in Crystal Caves. Ember has a rivalry with Sir Roland who guards the village gate."&lt;/p&gt;

&lt;p&gt;This requires creating 4 entities + 1 relationship. The 5-layer pipeline processes entities sequentially, each in isolation. The relationship creation doesn't know the knight was just created — there's no shared state between action steps.&lt;/p&gt;

&lt;p&gt;Both baseline and improved scored 0%. This is an architectural problem, not a context problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fixes: 8 Changes, 7 Pure Code
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Fix 1: PlanExecution (the only LLM call)
&lt;/h3&gt;

&lt;p&gt;One API call before the main loop. The LLM generates a plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Goal: Update Ember's properties
Steps: 1. Identify Ember entity  2. Apply personality changes
Tools needed: updateCharacter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This plan gets injected into the cognition layer's context. The intent classifier now sees a roadmap, not just raw entity data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; ~$0.003 per request, 3-5s latency. The only fix that uses an LLM call.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix 2: PrioritizeContext (pure code)
&lt;/h3&gt;

&lt;p&gt;Sort context items by salience score. Higher-relevance items go first. Low-relevance items dropped when the token budget is exceeded.&lt;/p&gt;

&lt;p&gt;When the user says "Make Ember fiercer," Ember's data gets priority. Luna's backstory gets dropped. The LLM sees signal, not noise.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Salience&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Salience&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tokenBudget&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Zero. Pure sort + filter.&lt;/p&gt;
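&lt;p&gt;A slightly fuller sketch of the same idea, assuming each item carries a precomputed token count. The &lt;code&gt;ContextItem&lt;/code&gt; type, its &lt;code&gt;Tokens&lt;/code&gt; field and the &lt;code&gt;prioritize&lt;/code&gt; helper are illustrative names, not the original implementation:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
)

// ContextItem is a hypothetical shape for one piece of context.
type ContextItem struct {
	Name     string
	Salience float64 // higher means more relevant to the current request
	Tokens   int     // assumed to be precomputed by a tokenizer
}

// prioritize sorts by salience (highest first) and keeps items until
// adding the next one would exceed tokenBudget.
func prioritize(items []ContextItem, tokenBudget int) []ContextItem {
	sorted := append([]ContextItem(nil), items...) // leave the caller's slice untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Salience > sorted[j].Salience
	})
	kept := make([]ContextItem, 0, len(sorted))
	used := 0
	for _, it := range sorted {
		if used+it.Tokens > tokenBudget {
			break // everything after this point has lower salience; drop it
		}
		used += it.Tokens
		kept = append(kept, it)
	}
	return kept
}

func main() {
	items := []ContextItem{
		{Name: "luna_backstory", Salience: 0.2, Tokens: 900},
		{Name: "ember_profile", Salience: 0.9, Tokens: 400},
		{Name: "ember_relationships", Salience: 0.7, Tokens: 300},
	}
	// With a budget of 800 tokens, both Ember items fit and Luna's backstory is dropped.
	for _, it := range prioritize(items, 800) {
		fmt.Println(it.Name)
	}
}
```

&lt;p&gt;Stopping at the first item that does not fit keeps the kept set strictly ordered by salience. Skipping it and trying smaller items instead would pack the budget tighter, at the cost of admitting lower-relevance context.&lt;/p&gt;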

&lt;h3&gt;
  
  
  Fix 3: CompressContext (pure code)
&lt;/h3&gt;

&lt;p&gt;Old conversation rounds get summarized extractively — find tool names, find CONCLUSION markers, truncate the rest. No LLM needed for this level of compression.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Zero. String operations.&lt;/p&gt;
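&lt;p&gt;One way this could look, as a minimal sketch. The &lt;code&gt;compressRound&lt;/code&gt; name, the list of known tools, the exact &lt;code&gt;CONCLUSION:&lt;/code&gt; prefix and the fallback length are all assumptions made for illustration:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// compressRound builds an extractive summary of one old conversation round.
// knownTools, the "CONCLUSION:" marker and maxChars are illustrative assumptions.
func compressRound(text string, knownTools []string, maxChars int) string {
	var parts []string

	// 1. Record which tools the round mentioned.
	var used []string
	for _, tool := range knownTools {
		if strings.Contains(text, tool) {
			used = append(used, tool)
		}
	}
	if len(used) > 0 {
		parts = append(parts, "tools: "+strings.Join(used, ", "))
	}

	// 2. Keep every line that carries a conclusion marker.
	for _, line := range strings.Split(text, "\n") {
		trimmed := strings.TrimSpace(line)
		if strings.HasPrefix(trimmed, "CONCLUSION:") {
			parts = append(parts, trimmed)
		}
	}

	// 3. If nothing was extracted, fall back to a truncated prefix.
	if len(parts) == 0 {
		runes := []rune(text)
		if len(runes) > maxChars {
			runes = runes[:maxChars]
		}
		return string(runes)
	}
	return strings.Join(parts, "\n")
}

func main() {
	round := "Calling updateCharacter for Ember.\nLots of intermediate reasoning...\nCONCLUSION: rename Ember to Infernia"
	fmt.Println(compressRound(round, []string{"updateCharacter", "createCharacter"}, 200))
}
```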

&lt;h3&gt;
  
  
  Fix 4: Preserve Conclusions (pure code)
&lt;/h3&gt;

&lt;p&gt;When reasoning text is truncated at 4,000 characters, the truncation used to cut wherever it landed. If the LLM decided "I need to rename Ember to Infernia" in round 1 but that conclusion was at character 4,100, round 2 forgot the decision.&lt;/p&gt;

&lt;p&gt;Fix: &lt;code&gt;truncateReasoningPreservingConclusions()&lt;/code&gt; finds CONCLUSION/DECISION markers and keeps them even when truncating.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Zero. String search.&lt;/p&gt;
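&lt;p&gt;A guess at how such a function might work, not the original code. This sketch works line by line, treats any line starting with &lt;code&gt;CONCLUSION&lt;/code&gt; or &lt;code&gt;DECISION&lt;/code&gt; as a marker, and lets marked lines push the result slightly past the limit:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// truncatePreservingConclusions keeps whole lines until the character limit
// is reached, then drops ordinary lines but still keeps marked ones.
// The result can exceed limit by the length of the preserved marker lines.
func truncatePreservingConclusions(text string, limit int) string {
	var kept []string
	used := 0
	for _, line := range strings.Split(text, "\n") {
		cost := len([]rune(line)) + 1 // +1 for the newline
		if limit >= used+cost {
			kept = append(kept, line)
			used += cost
			continue
		}
		used = limit // after the first overflow, admit no more ordinary lines
		trimmed := strings.TrimSpace(line)
		if strings.HasPrefix(trimmed, "CONCLUSION") || strings.HasPrefix(trimmed, "DECISION") {
			kept = append(kept, line)
		}
	}
	return strings.Join(kept, "\n")
}

func main() {
	reasoning := "Looking at Ember's current traits.\n" +
		strings.Repeat("filler reasoning ", 10) + "\n" +
		"DECISION: rename Ember to Infernia"
	// The long filler line is dropped; the decision survives the cut.
	fmt.Println(truncatePreservingConclusions(reasoning, 50))
}
```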

&lt;h3&gt;
  
  
  Fix 5: Max Rounds Cap (pure code)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;DefaultMaxRounds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;roundCount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;DefaultMaxRounds&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Previously unlimited. The agent sometimes looped 15+ rounds on a trivial task. Now it stops at 10 and produces its best result.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Zero. One if-statement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix 6: Structured Tool Errors (pure code)
&lt;/h3&gt;

&lt;p&gt;Before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"tool_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"updateCharacter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"This tool is temporarily unavailable."&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"tool_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"updateCharacter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"This tool is temporarily unavailable."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"error_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"timeout"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"retryable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;code&gt;retryable: true&lt;/code&gt;, the LLM knows to try again instead of giving up. With &lt;code&gt;error_type: "timeout"&lt;/code&gt;, it knows the issue is transient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Zero. String classification.&lt;/p&gt;
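&lt;p&gt;The classification step might look something like this. The struct mirrors the JSON above; the substrings being matched and every category other than &lt;code&gt;timeout&lt;/code&gt; are assumptions for the sake of the example:&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ToolError mirrors the JSON shape shown above.
type ToolError struct {
	Error     bool   `json:"error"`
	ToolName  string `json:"tool_name"`
	Message   string `json:"message"`
	ErrorType string `json:"error_type"`
	Retryable bool   `json:"retryable"`
}

// classify maps a raw error string to an error type and a retry hint.
// The substrings and categories are illustrative, not an exhaustive taxonomy.
func classify(raw string) (errorType string, retryable bool) {
	lower := strings.ToLower(raw)
	switch {
	case strings.Contains(lower, "timeout"), strings.Contains(lower, "deadline exceeded"):
		return "timeout", true
	case strings.Contains(lower, "rate limit"):
		return "rate_limited", true
	case strings.Contains(lower, "not found"):
		return "not_found", false
	case strings.Contains(lower, "invalid"):
		return "validation", false
	default:
		return "unknown", false
	}
}

// newToolError wraps a raw failure in the structured shape the LLM sees.
func newToolError(tool, raw string) ToolError {
	kind, retry := classify(raw)
	return ToolError{
		Error:     true,
		ToolName:  tool,
		Message:   "This tool is temporarily unavailable.",
		ErrorType: kind,
		Retryable: retry,
	}
}

func main() {
	out, err := json.Marshal(newToolError("updateCharacter", "context deadline exceeded"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```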

&lt;h3&gt;
  
  
  Fix 7: Circuit Breaker (pure code)
&lt;/h3&gt;

&lt;p&gt;Count failures per LLM provider. After 3 consecutive failures, skip that provider and try the fallback. Prevents the agent from burning through 120 seconds of timeout on a dead provider.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Zero. Counter + threshold.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix 8: HTTP Client Reuse (pure code)
&lt;/h3&gt;

&lt;p&gt;Store &lt;code&gt;*http.Client&lt;/code&gt; on the provider struct, reuse across calls. Previously each call created a new client, a new TCP connection, a new TLS handshake.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Zero. Struct field.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;Baseline&lt;/th&gt;
&lt;th&gt;After Fix&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;th&gt;What Fixed It&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;QL-001&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;=&lt;/td&gt;
&lt;td&gt;Needs pipeline architecture change&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QL-002&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;td&gt;=&lt;/td&gt;
&lt;td&gt;Already working&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QL-003&lt;/td&gt;
&lt;td&gt;40%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;80%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+40%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PrioritizeContext + PlanExecution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QL-004&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;=&lt;/td&gt;
&lt;td&gt;Already working&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QL-005&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;80%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+60%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Max rounds + structured errors + conclusion preservation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Overall: 40% → 60%. Same model. Better input.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Latency went from 26s to 43s, which is in line with adding one PlanExecution LLM call (~3-5s) to each of the five tests. The HTTP reuse and circuit breaker savings show up under concurrent load, not in a 5-test sequential run.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Didn't Improve — And Why
&lt;/h2&gt;

&lt;p&gt;QL-001 (multi-step creation) stayed at 0%. This isn't a context problem — it's a pipeline architecture problem. Each entity is created in isolation, and the IDs returned by each step are discarded before the next step runs:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBVWyJVc2VyOiBDcmVhdGUgRW1iZXIgKyBSb2xhbmQ8YnIvPisgcml2YWxyeSBiZXR3ZWVuIHRoZW0iXQogICAgVSAtLT4gUzFbIlN0ZXAgMTxici8-Y3JlYXRlQ2hhcmFjdGVyKEVtYmVyKTxici8-4oaSIHJldHVybnMgZHJhZ29uX2lkXzQyIl0KICAgIFMxIC0uIHN0YXRlIGRpc2NhcmRlZCAuLT4gUzJbIlN0ZXAgMjxici8-Y3JlYXRlQ2hhcmFjdGVyKFJvbGFuZCk8YnIvPuKGkiByZXR1cm5zIGtuaWdodF9pZF83NyJdCiAgICBTMiAtLiBzdGF0ZSBkaXNjYXJkZWQgLi0-IFMzWyJTdGVwIDM8YnIvPmNyZWF0ZVJlbGF0aW9uc2hpcCg_LCA_KTxici8-bm8gSURzIGF2YWlsYWJsZSJdCiAgICBTMyAtLXggRlsiUmVsYXRpb25zaGlwIGZhaWxzPGJyLz5RTC0wMDE6IDAlIHBhc3MgcmF0ZSJd" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBVWyJVc2VyOiBDcmVhdGUgRW1iZXIgKyBSb2xhbmQ8YnIvPisgcml2YWxyeSBiZXR3ZWVuIHRoZW0iXQogICAgVSAtLT4gUzFbIlN0ZXAgMTxici8-Y3JlYXRlQ2hhcmFjdGVyKEVtYmVyKTxici8-4oaSIHJldHVybnMgZHJhZ29uX2lkXzQyIl0KICAgIFMxIC0uIHN0YXRlIGRpc2NhcmRlZCAuLT4gUzJbIlN0ZXAgMjxici8-Y3JlYXRlQ2hhcmFjdGVyKFJvbGFuZCk8YnIvPuKGkiByZXR1cm5zIGtuaWdodF9pZF83NyJdCiAgICBTMiAtLiBzdGF0ZSBkaXNjYXJkZWQgLi0-IFMzWyJTdGVwIDM8YnIvPmNyZWF0ZVJlbGF0aW9uc2hpcCg_LCA_KTxici8-bm8gSURzIGF2YWlsYWJsZSJdCiAgICBTMyAtLXggRlsiUmVsYXRpb25zaGlwIGZhaWxzPGJyLz5RTC0wMDE6IDAlIHBhc3MgcmF0ZSJd" alt="mermaid diagram" width="1607" height="118"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fixing this requires collapsing the 5-layer pipeline into a unified agent with cross-step state — a larger architectural change, not a context fix.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The lesson:&lt;/strong&gt; Context optimization has a ceiling. Past that ceiling, you need architecture changes. But the ceiling is higher than most people think: we still had 20 percentage points of improvement available before hitting it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Still Missing
&lt;/h2&gt;

&lt;p&gt;Three pieces of infrastructure were built but not wired:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;Gap&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;VerifyOutput&lt;/td&gt;
&lt;td&gt;Logs quality issues&lt;/td&gt;
&lt;td&gt;Doesn't retry on failure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ScoreMemoryUsage&lt;/td&gt;
&lt;td&gt;Computes relevance scores&lt;/td&gt;
&lt;td&gt;Scores never applied to future retrieval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PlanExecution&lt;/td&gt;
&lt;td&gt;Generates plan before loop&lt;/td&gt;
&lt;td&gt;Plan not tracked during execution&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All three are &lt;strong&gt;open loops&lt;/strong&gt;. The infrastructure detects problems but doesn't act on them. Closing these loops is the next 20 points: getting from 60% to 80%+.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Better input → better output. The LLM is the same.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your agent is underperforming, check the context before blaming the model. In our case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;7 out of 8 fixes were pure code&lt;/li&gt;
&lt;li&gt;One added LLM call (planning, ~$0.003 per request); the other seven fixes add no LLM cost&lt;/li&gt;
&lt;li&gt;20-point quality improvement (40% → 60%) without changing the model&lt;/li&gt;
&lt;li&gt;The model was always capable — the context was holding it back&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The highest-ROI investment in any agent system is context management. It's not glamorous. It's sort, filter, compress, truncate, prioritize. But it's the difference between 40% and 60% — and the foundation for everything else.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of the AI Agent Architecture series. See also: &lt;a href="https://harrisonsec.com/blog/ai-agent-90-percent-problem/" rel="noopener noreferrer"&gt;The 90% Problem&lt;/a&gt; for the broader framework, and &lt;a href="https://harrisonsec.com/blog/claude-code-context-engineering-compression-pipeline/" rel="noopener noreferrer"&gt;Claude Code Deep Dive Part 3&lt;/a&gt; for how Anthropic solves context at scale.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>contextmanagement</category>
      <category>engineering</category>
      <category>testing</category>
    </item>
    <item>
      <title>Consistency in Distributed Systems: Scenarios, Trade-offs, and What Actually Works</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Wed, 06 May 2026 15:50:16 +0000</pubDate>
      <link>https://dev.to/harrisonsec/consistency-in-distributed-systems-scenarios-trade-offs-and-what-actually-works-42fd</link>
      <guid>https://dev.to/harrisonsec/consistency-in-distributed-systems-scenarios-trade-offs-and-what-actually-works-42fd</guid>
      <description>&lt;p&gt;There's an impulse, when someone first learns about consistency models in distributed systems, to want to classify the taxonomy into neat drawers. Strong here. Eventual there. Linearizable above it. Read-your-writes below. Study the diagram, pass the interview.&lt;/p&gt;

&lt;p&gt;That taxonomy is real, but it's not useful the way people think. Production systems don't pick a consistency model and run with it. They pick a different model per feature, often per &lt;em&gt;type of operation&lt;/em&gt; within a feature, and spend most of their engineering effort on the gaps between what the model provides and what users actually expect. The taxonomy is the menu. The interesting question is which dish each scenario needs.&lt;/p&gt;

&lt;p&gt;This is a working engineer's walk through ten real consistency scenarios — from the obvious ones (money transfers need strong) to the less obvious (collaborative editing, notification feeds, analytic dashboards) — with the specific engineering that makes each one work.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — Consistency is not a global system property; it's a per-operation property. A well-designed distributed system picks different consistency levels for different operations based on what users actually notice, what the business actually requires, and what latency budget each operation has. The CAP-theorem framing ("pick 2 of 3") is a caricature; real systems use PACELC (which adds the latency trade-off during normal operation) and pick per-feature.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Frames That Matter
&lt;/h2&gt;

&lt;p&gt;Before scenarios, three frames you actually use in practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CAP&lt;/strong&gt; (Consistency, Availability, Partition tolerance, pick 2). Useful as a first-week mental model. Misleading if taken literally, because (a) you can't give up partition tolerance in a real network, and (b) the choice isn't binary — you can tune per operation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PACELC&lt;/strong&gt;: if there's a Partition, pick A (availability) or C (consistency). Else, pick L (latency) or C (consistency). Adds the latency trade-off you pay during normal operation, which is where most design decisions actually live. A system that's "consistent when no partition" but pays 50ms of cross-region round-trip for every write has made a latency-vs-consistency call, not a CAP call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Consistency models, from strongest to weakest&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Linearizable&lt;/strong&gt;: operations appear to happen instantaneously, in a total order consistent with real time. The strongest practical model. Expensive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sequential&lt;/strong&gt;: operations appear in a total order, but not necessarily aligned with real time. Slightly weaker, slightly cheaper.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Causal&lt;/strong&gt;: if event A causally precedes event B, every observer sees A before B. Preserves the "this reply should appear after the comment it replied to" property.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read-your-writes&lt;/strong&gt;: you see the effects of your own operations, even if other users don't yet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monotonic read&lt;/strong&gt;: once you see a value, you won't see an older value later.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eventual&lt;/strong&gt;: if writes stop, replicas eventually converge. No ordering guarantees during the transient.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don't need to memorize these. You need to recognize which one each feature actually needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBTdHJvbmdbIlN0cm9uZ2VyIMK3IG1vcmUgZXhwZW5zaXZlIl0KICAgICAgICBMWyJMaW5lYXJpemFibGU8YnIvPk1vbmV5IHRyYW5zZmVyIMK3IGRpc3RyaWJ1dGVkIGxvY2tzIl0KICAgICAgICBTUVsiU2VxdWVudGlhbDxici8-TXVsdGktbGVhZGVyIHdpdGggY2xvY2sgc3luYyJdCiAgICBlbmQKCiAgICBzdWJncmFwaCBNaWRbIk1pZGRsZSBncm91bmQgwrcgdXN1YWxseSB0aGUgcmlnaHQgYW5zd2VyIl0KICAgICAgICBDQVsiQ2F1c2FsPGJyLz5Tb2NpYWwgZmVlZCDCtyByZXBsaWVzIGFmdGVyIGNvbW1lbnRzIl0KICAgICAgICBSWVsiUmVhZC15b3VyLXdyaXRlczxici8-VXNlciBwcm9maWxlIMK3IHNldHRpbmdzIl0KICAgICAgICBNUlsiTW9ub3RvbmljIHJlYWQ8YnIvPlBhZ2luYXRpb24gwrcgZGFzaGJvYXJkcyJdCiAgICBlbmQKCiAgICBzdWJncmFwaCBXZWFrWyJXZWFrZXIgwrcgY2hlYXBlciBhbmQgZmFzdGVyIl0KICAgICAgICBFQ1siRXZlbnR1YWw8YnIvPkNvdW50ZXJzIMK3IGFuYWx5dGljcyDCtyBDRE4iXQogICAgZW5kCgogICAgU3Ryb25nIC0tPiBNaWQgLS0-IFdlYWsKCiAgICBjbGFzc0RlZiBzdHJvbmcgZmlsbDojZmVkN2Q3LHN0cm9rZTojYzUzMDMwCiAgICBjbGFzc0RlZiBtaWQgZmlsbDojZmVmNWU3LHN0cm9rZTojYjc3OTFmCiAgICBjbGFzc0RlZiB3ZWFrIGZpbGw6I2YwZmZmNCxzdHJva2U6IzJmODU1YQogICAgY2xhc3MgU3Ryb25nIHN0cm9uZwogICAgY2xhc3MgTWlkIG1pZAogICAgY2xhc3MgV2VhayB3ZWFr" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBTdHJvbmdbIlN0cm9uZ2VyIMK3IG1vcmUgZXhwZW5zaXZlIl0KICAgICAgICBMWyJMaW5lYXJpemFibGU8YnIvPk1vbmV5IHRyYW5zZmVyIMK3IGRpc3RyaWJ1dGVkIGxvY2tzIl0KICAgICAgICBTUVsiU2VxdWVudGlhbDxici8-TXVsdGktbGVhZGVyIHdpdGggY2xvY2sgc3luYyJdCiAgICBlbmQKCiAgICBzdWJncmFwaCBNaWRbIk1pZGRsZSBncm91bmQgwrcgdXN1YWxseSB0aGUgcmlnaHQgYW5zd2VyIl0KICAgICAgICBDQVsiQ2F1c2FsPGJyLz5Tb2NpYWwgZmVlZCDCtyByZXBsaWVzIGFmdGVyIGNvbW1lbnRzIl0KICAgICAgICBSWVsiUmVhZC15b3VyLXdyaXRlczxici8-VXNlciBwcm9maWxlIMK3IHNldHRpbmdzIl0KICAgICAgICBNUlsiTW9ub3RvbmljIHJlYWQ8YnIvPlBhZ2luYXRpb24gwrcgZGFzaGJvYXJkcyJdCiAgICBlbmQKCiAgICBzdWJncmFwaCBXZWFrWyJXZWFrZXIgwrcgY2hlYXBlciBhbmQgZmFzdGVyIl0KICAgICAgICBFQ1siRXZlbnR1YWw8YnIvPkNvdW50ZXJzIMK3IGFuYWx5dGljcyDCtyBDRE4iXQogICAgZW5kCgogICAgU3Ryb25nIC0tPiBNaWQgLS0-IFdlYWsKCiAgICBjbGFzc0RlZiBzdHJvbmcgZmlsbDojZmVkN2Q3LHN0cm9rZTojYzUzMDMwCiAgICBjbGFzc0RlZiBtaWQgZmlsbDojZmVmNWU3LHN0cm9rZTojYjc3OTFmCiAgICBjbGFzc0RlZiB3ZWFrIGZpbGw6I2YwZmZmNCxzdHJva2U6IzJmODU1YQogICAgY2xhc3MgU3Ryb25nIHN0cm9uZwogICAgY2xhc3MgTWlkIG1pZAogICAgY2xhc3MgV2VhayB3ZWFr" alt="flowchart LR" width="1904" height="189"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Moving left to right: cheaper, faster, less coordinated — and more work you do in application code to close the gap between what the model gives you and what users expect.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ten Scenarios
&lt;/h2&gt;

&lt;p&gt;Quick reference — each row is expanded into its own section below.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Consistency Model&lt;/th&gt;
&lt;th&gt;Key Technique&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Money transfer between accounts&lt;/td&gt;
&lt;td&gt;Linearizable&lt;/td&gt;
&lt;td&gt;Synchronous quorum + idempotency keys&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Inventory decrement, hot key&lt;/td&gt;
&lt;td&gt;Strong w/ sharding&lt;/td&gt;
&lt;td&gt;Reserved-inventory buckets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;User profile update&lt;/td&gt;
&lt;td&gt;Read-your-writes&lt;/td&gt;
&lt;td&gt;Session timestamp + sticky read&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Social media feed&lt;/td&gt;
&lt;td&gt;Causal&lt;/td&gt;
&lt;td&gt;Version vectors / Lamport timestamps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Collaborative document editing&lt;/td&gt;
&lt;td&gt;Eventual + CRDT/OT&lt;/td&gt;
&lt;td&gt;Conflict-free operations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Ad click counter&lt;/td&gt;
&lt;td&gt;Eventual&lt;/td&gt;
&lt;td&gt;Local shard + async aggregation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Multi-region primary/secondary&lt;/td&gt;
&lt;td&gt;Eventual + RYW on demand&lt;/td&gt;
&lt;td&gt;Primary routing per write&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Distributed lock / leader election&lt;/td&gt;
&lt;td&gt;Linearizable&lt;/td&gt;
&lt;td&gt;Raft/Paxos consensus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Analytics dashboard&lt;/td&gt;
&lt;td&gt;Append-only / none&lt;/td&gt;
&lt;td&gt;Stream → warehouse ETL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Cross-service orchestration&lt;/td&gt;
&lt;td&gt;Saga&lt;/td&gt;
&lt;td&gt;Local txns + compensations&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  1. Money Transfer Between Accounts
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: strict linearizability. No double-spend. No lost updates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: transactional database with serializable isolation, or a strongly-consistent coordination layer (Paxos/Raft quorum). Typical implementation: single-region primary Postgres with synchronous replication, or a distributed SQL database (Spanner, CockroachDB, YugabyteDB) with linearizable reads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you give up&lt;/strong&gt;: latency (especially cross-region), availability during partitions. This is the right trade — a bank doesn't tolerate double-spend to save 30ms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key engineering&lt;/strong&gt;: idempotency keys on every request, deduplication at the persistence layer, well-audited transaction boundaries. Strong consistency at the DB isn't enough if your retry logic double-writes.&lt;/p&gt;
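&lt;p&gt;To make the idempotency-key point concrete, here is a minimal in-memory sketch. The &lt;code&gt;Ledger&lt;/code&gt; type is a stand-in for a transactional store; in a real system the dedup record and the balance change would commit in the same database transaction:&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var ErrInsufficientFunds = errors.New("insufficient funds")

// Ledger is an in-memory stand-in for a transactional store.
type Ledger struct {
	mu       sync.Mutex
	balances map[string]int64 // account name to balance in cents
	applied  map[string]error // idempotency key to outcome of the first attempt
}

func NewLedger(balances map[string]int64) *Ledger {
	l := new(Ledger)
	l.balances = balances
	l.applied = make(map[string]error)
	return l
}

// Transfer applies the transfer at most once per idempotency key.
// A retry with the same key returns the original outcome without moving money again.
func (l *Ledger) Transfer(key, from, to string, cents int64) error {
	if 0 >= cents {
		return errors.New("amount must be positive")
	}
	l.mu.Lock()
	defer l.mu.Unlock()

	if outcome, seen := l.applied[key]; seen {
		return outcome
	}
	var err error
	if cents > l.balances[from] {
		err = ErrInsufficientFunds
	} else {
		l.balances[from] -= cents
		l.balances[to] += cents
	}
	l.applied[key] = err
	return err
}

func main() {
	l := NewLedger(map[string]int64{"alice": 10000, "bob": 0})
	_ = l.Transfer("req-123", "alice", "bob", 2500)
	_ = l.Transfer("req-123", "alice", "bob", 2500) // client retry after a timeout
	fmt.Println(l.balances["alice"], l.balances["bob"]) // 7500 2500
}
```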

&lt;h3&gt;
  
  
  2. Inventory Decrement with High Contention
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: "no overselling" without blocking every request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: the classic "hot key" problem. Options in ascending sophistication:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pessimistic locking&lt;/strong&gt; — &lt;code&gt;SELECT ... FOR UPDATE&lt;/code&gt; on the inventory row. Works; serializes hot items. Under peak traffic on Black Friday, this queues up and tail latencies explode.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimistic concurrency&lt;/strong&gt; — read version, decrement, compare-and-swap. Retries on conflict. Better tail latency at moderate contention, worse at very high contention (retry storms).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reserved-inventory buckets&lt;/strong&gt;: split the available inventory into N "shards" and route each request to a random shard, so contention is spread across N rows instead of one. This never oversells, but it accepts a small risk of false "out of stock" responses (if shard A has 5 left but shard B has 0, a user routed to B is told "out of stock" while 5 remain in total) in exchange for a large throughput win.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best-effort with async reconciliation&lt;/strong&gt; — accept orders optimistically, reconcile at a background worker, cancel overbooks with apology emails. Used by event-ticketing sites for popular drops.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The right choice depends on business rules. If no overselling is acceptable, use pessimistic locking or sharded buckets. If a small overselling rate (on the order of 0.1%) is tolerable and user experience matters more, accept orders optimistically and reconcile asynchronously.&lt;/p&gt;
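&lt;p&gt;The bucket option can be sketched in a few lines. This is an in-process illustration with hypothetical names; in a database each bucket would be its own row, and a production version would usually try another bucket before reporting "out of stock":&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// bucket holds one slice of an item's stock behind its own lock.
type bucket struct {
	mu   sync.Mutex
	left int
}

// ShardedInventory splits one item's stock across independently locked buckets.
type ShardedInventory struct {
	buckets []bucket
}

func NewShardedInventory(total, shards int) *ShardedInventory {
	inv := new(ShardedInventory)
	inv.buckets = make([]bucket, shards)
	for i := range inv.buckets {
		inv.buckets[i].left = total / shards
	}
	inv.buckets[0].left += total % shards // remainder goes to the first bucket
	return inv
}

// reserveFrom takes one unit from a specific bucket. It never oversells,
// but it can report "out of stock" while other buckets still have units.
func (inv *ShardedInventory) reserveFrom(shard int) bool {
	inv.buckets[shard].mu.Lock()
	defer inv.buckets[shard].mu.Unlock()
	if inv.buckets[shard].left == 0 {
		return false
	}
	inv.buckets[shard].left--
	return true
}

// Reserve routes the request to a random bucket.
func (inv *ShardedInventory) Reserve() bool {
	return inv.reserveFrom(rand.Intn(len(inv.buckets)))
}

func main() {
	inv := NewShardedInventory(10, 4) // buckets start with 4, 2, 2, 2
	sold := 0
	for i := 0; i != 100; i++ {
		if inv.Reserve() {
			sold++
		}
	}
	fmt.Println("sold:", sold) // never more than 10
}
```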

&lt;h3&gt;
  
  
  3. User Profile Update
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: read-your-writes. After I save my display name, I see it on next page load.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: sticky reads. Either the session pins to the primary for a short window after a write, or the application tracks a "last write timestamp" per user and refuses to serve reads from a replica that hasn't caught up.&lt;/p&gt;

&lt;p&gt;The naive alternative — "eventually consistent, just retry" — breaks user expectations immediately. "I updated my name and it didn't save" is one of the most expensive support tickets on a per-incident basis, because the user has no way to distinguish "didn't save" from "saved but replication is lagging."&lt;/p&gt;

&lt;p&gt;The engineering is not glamorous. A session cookie that carries &lt;code&gt;last_write_ts&lt;/code&gt;, a read path that asserts &lt;code&gt;replica.latest_ts &amp;gt;= last_write_ts&lt;/code&gt;, and a fallback to the primary if the assertion fails. Most frameworks don't give you this for free; you build it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fc2VxdWVuY2VEaWFncmFtCiAgICBhdXRvbnVtYmVyCiAgICBwYXJ0aWNpcGFudCBVIGFzIFVzZXIKICAgIHBhcnRpY2lwYW50IEFwcCBhcyBBcHAgU2VydmVyCiAgICBwYXJ0aWNpcGFudCBQIGFzIFByaW1hcnkKICAgIHBhcnRpY2lwYW50IFIgYXMgUmVwbGljYQoKICAgIFUtPj5BcHA6IFVwZGF0ZSBwcm9maWxlCiAgICBBcHAtPj5QOiBXcml0ZSAocmV0dXJucyBjb21taXRfdHMgVDEpCiAgICBQLS0-PkFwcDogb2sKICAgIEFwcC0tPj5VOiBSZXNwb25zZSArIGNvb2tpZSB7bGFzdF93cml0ZV90czogVDF9CgogICAgTm90ZSBvdmVyIFUsQXBwOiBOZXh0IHBhZ2UgbG9hZCDigJQgdXNlciBoYXMgY29va2llIFQxCiAgICBVLT4-QXBwOiBSZWFkIHByb2ZpbGUgKGNvb2tpZSBUMSkKICAgIEFwcC0-PlI6IHJlcGxpY2FfdHMgPj0gVDEgPwoKICAgIGFsdCBSZXBsaWNhIGNhdWdodCB1cAogICAgICAgIFItLT4-QXBwOiB5ZXMKICAgICAgICBBcHAtLT4-VTogUHJvZmlsZSAoZnJvbSByZXBsaWNhIMK3IGZhc3QpCiAgICBlbHNlIFJlcGxpY2EgbGFnZ2luZwogICAgICAgIFItLT4-QXBwOiBubywgcmVwbGljYV90cyA8IFQxCiAgICAgICAgQXBwLT4-UDogUmVhZCBmcm9tIHByaW1hcnkKICAgICAgICBQLS0-PkFwcDogUHJvZmlsZSBkYXRhCiAgICAgICAgQXBwLS0-PlU6IFByb2ZpbGUgKGZyb20gcHJpbWFyeSDCtyBjb3JyZWN0KQogICAgZW5k" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fc2VxdWVuY2VEaWFncmFtCiAgICBhdXRvbnVtYmVyCiAgICBwYXJ0aWNpcGFudCBVIGFzIFVzZXIKICAgIHBhcnRpY2lwYW50IEFwcCBhcyBBcHAgU2VydmVyCiAgICBwYXJ0aWNpcGFudCBQIGFzIFByaW1hcnkKICAgIHBhcnRpY2lwYW50IFIgYXMgUmVwbGljYQoKICAgIFUtPj5BcHA6IFVwZGF0ZSBwcm9maWxlCiAgICBBcHAtPj5QOiBXcml0ZSAocmV0dXJucyBjb21taXRfdHMgVDEpCiAgICBQLS0-PkFwcDogb2sKICAgIEFwcC0tPj5VOiBSZXNwb25zZSArIGNvb2tpZSB7bGFzdF93cml0ZV90czogVDF9CgogICAgTm90ZSBvdmVyIFUsQXBwOiBOZXh0IHBhZ2UgbG9hZCDigJQgdXNlciBoYXMgY29va2llIFQxCiAgICBVLT4-QXBwOiBSZWFkIHByb2ZpbGUgKGNvb2tpZSBUMSkKICAgIEFwcC0-PlI6IHJlcGxpY2FfdHMgPj0gVDEgPwoKICAgIGFsdCBSZXBsaWNhIGNhdWdodCB1cAogICAgICAgIFItLT4-QXBwOiB5ZXMKICAgICAgICBBcHAtLT4-VTogUHJvZmlsZSAoZnJvbSByZXBsaWNhIMK3IGZhc3QpCiAgICBlbHNlIFJlcGxpY2EgbGFnZ2luZwogICAgICAgIFItLT4-QXBwOiBubywgcmVwbGljYV90cyA8IFQxCiAgICAgICAgQXBwLT4-UDogUmVhZCBmcm9tIHByaW1hcnkKICAgICAgICBQLS0-PkFwcDogUHJvZmlsZSBkYXRhCiAgICAgICAgQXBwLS0-PlU6IFByb2ZpbGUgKGZyb20gcHJpbWFyeSDCtyBjb3JyZWN0KQogICAgZW5k" alt="sequenceDiagram" width="1026" height="866"&gt;&lt;/a&gt;&lt;/p&gt;
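&lt;p&gt;The read path in the diagram comes down to one comparison. A minimal sketch, where the &lt;code&gt;Store&lt;/code&gt; type and its &lt;code&gt;appliedTS&lt;/code&gt; field are hypothetical stand-ins for whatever your database exposes as a replication position:&lt;/p&gt;

```go
package main

import "fmt"

// Store is a hypothetical view of one database node.
type Store struct {
	name      string
	appliedTS int64             // latest commit timestamp this node has applied
	profiles  map[string]string // user ID to display name
}

// readProfile serves from the replica only if it has caught up to the
// session's last write; otherwise it falls back to the primary.
func readProfile(primary, replica Store, userID string, lastWriteTS int64) (value, servedBy string) {
	if replica.appliedTS >= lastWriteTS {
		return replica.profiles[userID], replica.name
	}
	return primary.profiles[userID], primary.name
}

func main() {
	primary := Store{name: "primary", appliedTS: 105, profiles: map[string]string{"u1": "New Name"}}
	replica := Store{name: "replica", appliedTS: 100, profiles: map[string]string{"u1": "Old Name"}}

	// The session cookie says the user's last write committed at T=105.
	v, from := readProfile(primary, replica, "u1", 105)
	fmt.Println(v, "from", from) // New Name from primary

	// Once replication catches up, the same read is served by the replica.
	replica.appliedTS = 105
	replica.profiles["u1"] = "New Name"
	v, from = readProfile(primary, replica, "u1", 105)
	fmt.Println(v, "from", from) // New Name from replica
}
```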

&lt;h3&gt;
  
  
  4. Social Media Feed
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: causal consistency for comments and replies. Eventual consistency everywhere else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: two-tier. Posts and likes are written to a local region with async replication. Replies are linked to their parent post with an explicit cause-precedes relationship: a region's store won't surface a reply until its parent has propagated there.&lt;/p&gt;

&lt;p&gt;The CRDT-adjacent pattern (version vectors, Lamport timestamps) sits underneath, but you don't usually expose it to the application. What you expose is "here's the list of replies, in causally-consistent order." What the user sees: "I replied to a comment, and my reply appears under it" — which is exactly the mental model they expect.&lt;/p&gt;

&lt;p&gt;What you save by not using strong consistency everywhere: low write latency (local region only), high availability during partitions, and the ability to handle massive fan-out (a celebrity's post propagating to 40M followers doesn't need to wait on a single coordinator).&lt;/p&gt;
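&lt;p&gt;The "hold the reply until its parent arrives" rule can be sketched with explicit parent links. This is a simplification that tracks one dependency per reply rather than full version vectors, and all names are illustrative:&lt;/p&gt;

```go
package main

import "fmt"

// Reply points at the post or reply it answers.
type Reply struct {
	ID       string
	ParentID string
}

// RegionStore buffers replies whose parent has not yet replicated to this region.
type RegionStore struct {
	visible map[string]bool    // IDs already surfaced to readers
	pending map[string][]Reply // replies waiting for a parent, keyed by parent ID
}

func NewRegionStore() *RegionStore {
	s := new(RegionStore)
	s.visible = make(map[string]bool)
	s.pending = make(map[string][]Reply)
	return s
}

// surface marks id visible and releases anything that was waiting on it.
func (s *RegionStore) surface(id string) []string {
	s.visible[id] = true
	out := []string{id}
	waiting := s.pending[id]
	delete(s.pending, id)
	for _, r := range waiting {
		out = append(out, s.surface(r.ID)...)
	}
	return out
}

// DeliverPost handles a replicated top-level post and returns the IDs that became visible.
func (s *RegionStore) DeliverPost(id string) []string {
	return s.surface(id)
}

// DeliverReply surfaces the reply only if its parent is already visible.
func (s *RegionStore) DeliverReply(r Reply) []string {
	if !s.visible[r.ParentID] {
		s.pending[r.ParentID] = append(s.pending[r.ParentID], r)
		return nil
	}
	return s.surface(r.ID)
}

func main() {
	s := NewRegionStore()
	// Replication delivers the reply before the post it answers.
	fmt.Println(s.DeliverReply(Reply{ID: "reply-1", ParentID: "post-1"})) // []
	fmt.Println(s.DeliverPost("post-1"))                                  // [post-1 reply-1]
}
```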

&lt;h3&gt;
  
  
  5. Collaborative Document Editing
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: offline-first, multi-user concurrent edits, always-eventually-converge, no lost updates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: CRDT (conflict-free replicated data type) or OT (operational transformation). This is one of the few spots where CRDTs genuinely shine. The underlying math guarantees that any two replicas will converge to the same state, regardless of the order operations arrive in, as long as all operations eventually reach all replicas.&lt;/p&gt;

&lt;p&gt;Google Docs uses a version of OT. Figma has described a CRDT-inspired design based on last-writer-wins registers per object property, ordered by a central server. Notion uses a mix. The common property: any user can edit while offline, sync when reconnected, and the final document reflects all edits.&lt;/p&gt;

&lt;p&gt;What you give up: simplicity. CRDT implementations are subtle, and naive "last-write-wins" semantics are almost never what the user wants (their previous sentence vanished, not merged).&lt;/p&gt;

&lt;p&gt;What you gain: offline support without an ugly "you've been offline, your changes may conflict" modal.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Ad Click Counter
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: eventual consistency, very high write throughput, lossy-okay for a tiny fraction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: local counter per shard, periodic aggregation to central store. Writes are fire-and-forget to a stream (Kafka, Kinesis). Reads come from a precomputed aggregate that's a few seconds stale.&lt;/p&gt;

&lt;p&gt;Why this works: no advertiser is going to detect the difference between "47,312 clicks" and "47,318 clicks" in their dashboard. Counting with precision across a global distributed system is far harder than counting approximately. Do the latter.&lt;/p&gt;

&lt;p&gt;What's non-obvious: the system should be &lt;em&gt;designed&lt;/em&gt; for approximate counts, with explicit tolerance in the SLA ("counts are accurate to within 0.01% and updated every 30 seconds"). If you don't say that upfront, someone will eventually ask "why don't our counts match the backend logs exactly" and you'll be in a two-week project to eliminate errors that never mattered.&lt;/p&gt;
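&lt;p&gt;An in-process sketch of "local counter per shard, periodic aggregation". In production the shards would be separate nodes or stream partitions; here they are just atomic counters, and &lt;code&gt;ShardedCounter&lt;/code&gt; is a hypothetical name:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// ShardedCounter spreads increments over several counters so that
// concurrent writers rarely contend on the same one.
type ShardedCounter struct {
	shards []atomic.Int64
}

func NewShardedCounter(n int) *ShardedCounter {
	c := new(ShardedCounter)
	c.shards = make([]atomic.Int64, n)
	return c
}

// Click records one click on the shard picked by the caller,
// for example from a hash of the request ID.
func (c *ShardedCounter) Click(shard int) {
	c.shards[shard%len(c.shards)].Add(1)
}

// Aggregate sums all shards. Run on a timer, its result is the
// slightly stale number the dashboard reads.
func (c *ShardedCounter) Aggregate() int64 {
	var total int64
	for i := range c.shards {
		total += c.shards[i].Load()
	}
	return total
}

func main() {
	c := NewShardedCounter(8)
	var wg sync.WaitGroup
	for w := 0; w != 4; w++ {
		wg.Add(1)
		go func(worker int) {
			defer wg.Done()
			for i := 0; i != 1000; i++ {
				c.Click(worker + i)
			}
		}(w)
	}
	wg.Wait()
	fmt.Println(c.Aggregate()) // 4000
}
```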

&lt;h3&gt;
  
  
  7. Multi-Region Primary / Secondary
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: fast reads in every region, writes can live in one region.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: primary-in-region-A, async replication to regions B/C/D. Reads in B/C/D may lag the primary by milliseconds to seconds. Writes always route to A.&lt;/p&gt;

&lt;p&gt;Consistency model you're serving: &lt;strong&gt;eventual, with read-your-writes available on demand&lt;/strong&gt; (see scenario 3). Reads from the primary are strongly consistent; reads from secondaries are lagged but fast.&lt;/p&gt;

&lt;p&gt;Key engineering: the client SDK should know which operations need primary reads (after a recent write, for "show me the thing I just wrote" operations) and which can hit secondaries (dashboards, history views, anything time-insensitive).&lt;/p&gt;

&lt;p&gt;This is where most backend systems actually live. The bulk of reads go to secondaries — cheap, fast. A small percentage route to primary for freshness. Latency and availability both win.&lt;/p&gt;
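&lt;p&gt;The routing rule an SDK might apply can be sketched as follows. This is an illustration under assumptions, not a real client library: &lt;code&gt;ReadRouter&lt;/code&gt;, &lt;code&gt;note_write&lt;/code&gt;, and &lt;code&gt;pick_target&lt;/code&gt; are hypothetical names, and the freshness window is assumed to be longer than worst-case replication lag.&lt;/p&gt;

```python
import time


class ReadRouter:
    """Chooses primary or secondary for a read, per session."""

    def __init__(self, freshness_window_s: float = 5.0, clock=time.monotonic) -> None:
        self._window = freshness_window_s
        self._clock = clock
        self._last_write_at = {}  # session id -> time of most recent write

    def note_write(self, session_id: str) -> None:
        # Writes always go to the primary; remember when this session wrote.
        self._last_write_at[session_id] = self._clock()

    def pick_target(self, session_id: str, needs_fresh: bool = False) -> str:
        last = self._last_write_at.get(session_id)
        wrote_recently = last is not None and self._window >= self._clock() - last
        if needs_fresh or wrote_recently:
            return "primary"
        return "secondary"


# Demo with a fake clock so the example does not need to sleep.
fake_now = [100.0]
router = ReadRouter(freshness_window_s=5.0, clock=lambda: fake_now[0])

before_write = router.pick_target("session-1")  # no write recorded yet
router.note_write("session-1")
right_after = router.pick_target("session-1")   # still inside the window
fake_now[0] += 10.0
later = router.pick_target("session-1")         # window has passed
```

&lt;p&gt;A time window is the crudest version of this rule. It is easy to ship, but it is only correct while replication lag stays below the window.&lt;/p&gt;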

&lt;h3&gt;
  
  
  8. Distributed Lock / Leader Election
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: exactly one leader, no split-brain, sometimes-unavailable-is-okay.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: a consensus system (ZooKeeper, etcd, Consul). ZooKeeper runs ZAB, a Paxos-like protocol; etcd and Consul run Raft. Acquire the lock or lease, renew it, do the work, release. If you end up on the wrong side of a network partition, you can't renew, the lease expires, and the other side knows it is safe to take over.&lt;/p&gt;

&lt;p&gt;The classic failure: leader election on top of Redis. Redis is not a consensus system. Redlock has well-documented failure modes and is not safe for correctness-critical locking. Use etcd. Use ZooKeeper. Use a real consensus system. The tempting shortcut will, eventually, bite.&lt;/p&gt;

&lt;p&gt;What consensus buys you: guaranteed linearizability for operations on the lock/lease. What it costs: every operation is a quorum round-trip. That's fine for leader election (infrequent). It's not fine for a hot write path (use a different mechanism).&lt;/p&gt;
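&lt;p&gt;To make the acquire, renew, release cycle concrete, here is a single-process model of lease semantics. It is deliberately not a consensus implementation and not the API of etcd or ZooKeeper: &lt;code&gt;LeaseTable&lt;/code&gt; is a hypothetical stand-in for the state a real consensus system would replicate through a quorum.&lt;/p&gt;

```python
import time


class LeaseTable:
    """Models lease expiry only; a real system replicates this state via consensus."""

    def __init__(self, ttl_s: float, clock=time.monotonic) -> None:
        self._ttl = ttl_s
        self._clock = clock
        self._holder = None
        self._expires_at = 0.0

    def acquire(self, node: str) -> bool:
        now = self._clock()
        if self._holder is not None and self._expires_at > now:
            return False  # someone holds an unexpired lease
        self._holder = node
        self._expires_at = now + self._ttl
        return True

    def renew(self, node: str) -> bool:
        now = self._clock()
        if self._holder != node or now >= self._expires_at:
            return False  # lease lost: expired or taken over
        self._expires_at = now + self._ttl
        return True

    def release(self, node: str) -> None:
        if self._holder == node:
            self._holder = None


# Demo with a fake clock: node "a" is partitioned away and cannot renew.
fake_now = [0.0]
table = LeaseTable(ttl_s=10.0, clock=lambda: fake_now[0])

a_leads = table.acquire("a")          # "a" becomes leader
b_blocked = table.acquire("b")        # "b" is refused while the lease is live
fake_now[0] = 5.0
a_renewed = table.renew("a")          # lease now runs until t=15
fake_now[0] = 16.0                    # "a" misses its renewal deadline
b_takes_over = table.acquire("b")     # expiry lets "b" take over
a_stale_renew = table.renew("a")      # "a" learns it is no longer leader
```

&lt;p&gt;The point of the model is the last line: the old leader finds out it lost the lease only when it next talks to the table, so it has to stop acting as leader once its own deadline passes.&lt;/p&gt;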

&lt;h3&gt;
  
  
  9. Analytics Dashboard
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: all the data, eventually, in a queryable form. No urgency on freshness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: stream writes to a durable log (Kafka), have an ETL job populate a columnar warehouse (BigQuery, ClickHouse, Snowflake) on a schedule. Dashboards query the warehouse. Data is minutes to hours stale.&lt;/p&gt;

&lt;p&gt;Consistency model: none, in the traditional sense. You have an append-only log and a materialized view. The view is eventually consistent with the log, and that's the whole contract.&lt;/p&gt;

&lt;p&gt;This is simple but worth calling out because people sometimes try to do analytics against the operational database directly ("we'll run these queries on the primary, it'll be fine"). It will not be fine. Analytic queries have a different workload shape: they want columnar storage, aggressive parallelism, and no transactional overhead. Put them in a warehouse.&lt;/p&gt;

&lt;h3&gt;
  
  
  10. Cross-Service Orchestration (Saga)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Needs&lt;/strong&gt;: multi-step business flow across services — create order, reserve inventory, charge payment, schedule shipment. Each step might fail. The system should end up in a consistent state either way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: Saga. Each step is a local transaction in its own service. For each step, you also define a &lt;em&gt;compensating&lt;/em&gt; step that undoes it. If a step fails partway through, you run compensations for the earlier steps in reverse:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fc2VxdWVuY2VEaWFncmFtCiAgICBhdXRvbnVtYmVyCiAgICBwYXJ0aWNpcGFudCBPIGFzIE9yY2hlc3RyYXRvcgogICAgcGFydGljaXBhbnQgT3JkZXIgYXMgT3JkZXIgU2VydmljZQogICAgcGFydGljaXBhbnQgSW52IGFzIEludmVudG9yeQogICAgcGFydGljaXBhbnQgUGF5IGFzIFBheW1lbnQKCiAgICBPLT4-T3JkZXI6IENyZWF0ZSBvcmRlcgogICAgT3JkZXItLT4-Tzogb2sKICAgIE8tPj5JbnY6IFJlc2VydmUgaW52ZW50b3J5CiAgICBJbnYtLT4-Tzogb2sKICAgIE8tPj5QYXk6IENoYXJnZSBwYXltZW50CiAgICBQYXktLT4-TzogZmFpbGVkCgogICAgTm90ZSBvdmVyIE8sUGF5OiBSdW4gY29tcGVuc2F0aW9ucyBpbiByZXZlcnNlCiAgICBPLT4-SW52OiBSZWxlYXNlIGludmVudG9yeQogICAgSW52LS0-Pk86IG9rCiAgICBPLT4-T3JkZXI6IENhbmNlbCBvcmRlcgogICAgT3JkZXItLT4-Tzogb2s%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fc2VxdWVuY2VEaWFncmFtCiAgICBhdXRvbnVtYmVyCiAgICBwYXJ0aWNpcGFudCBPIGFzIE9yY2hlc3RyYXRvcgogICAgcGFydGljaXBhbnQgT3JkZXIgYXMgT3JkZXIgU2VydmljZQogICAgcGFydGljaXBhbnQgSW52IGFzIEludmVudG9yeQogICAgcGFydGljaXBhbnQgUGF5IGFzIFBheW1lbnQKCiAgICBPLT4-T3JkZXI6IENyZWF0ZSBvcmRlcgogICAgT3JkZXItLT4-Tzogb2sKICAgIE8tPj5JbnY6IFJlc2VydmUgaW52ZW50b3J5CiAgICBJbnYtLT4-Tzogb2sKICAgIE8tPj5QYXk6IENoYXJnZSBwYXltZW50CiAgICBQYXktLT4-TzogZmFpbGVkCgogICAgTm90ZSBvdmVyIE8sUGF5OiBSdW4gY29tcGVuc2F0aW9ucyBpbiByZXZlcnNlCiAgICBPLT4-SW52OiBSZWxlYXNlIGludmVudG9yeQogICAgSW52LS0-Pk86IG9rCiAgICBPLT4-T3JkZXI6IENhbmNlbCBvcmRlcgogICAgT3JkZXItLT4-Tzogb2s%3D" alt="sequenceDiagram" width="850" height="664"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Not all compensations are symmetric — you can't un-send an email, you can't un-refund a payment. But for most business flows, you can design compensations that leave the system in a consistent-enough state.&lt;/p&gt;

&lt;p&gt;The alternative — 2PC (two-phase commit) across all services — is real but rarely used. 2PC requires every participant to support the protocol, holds locks while waiting, and blocks the whole transaction if any participant is slow or down. For services owned by different teams on different storage engines, 2PC doesn't scale.&lt;/p&gt;

&lt;p&gt;The main saga engineering decision is orchestration versus choreography. With orchestration, a coordinator service runs the state machine. With choreography, each service emits events that trigger the next step. Orchestrators are simpler to reason about. Choreography scales further but can produce spaghetti.&lt;/p&gt;
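&lt;p&gt;The orchestrated flow in the diagram can be sketched in a few lines. This is a minimal in-memory illustration: &lt;code&gt;run_saga&lt;/code&gt; and &lt;code&gt;SagaFailed&lt;/code&gt; are hypothetical names, the steps are plain functions rather than service calls, and it assumes compensations themselves succeed. A production orchestrator would also persist its progress and retry idempotent compensations.&lt;/p&gt;

```python
class SagaFailed(Exception):
    """Raised after a failed step has been compensated."""


def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    If an action raises, run the compensations of the steps that already
    succeeded, newest first, then raise SagaFailed.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            for _, undo in reversed(completed):
                undo()
            raise SagaFailed(f"step {name!r} failed") from exc
        completed.append((name, compensate))


log = []


def charge_payment():
    raise RuntimeError("card declined")


steps = [
    ("create_order",
     lambda: log.append("create order"),
     lambda: log.append("cancel order")),
    ("reserve_inventory",
     lambda: log.append("reserve inventory"),
     lambda: log.append("release inventory")),
    ("charge_payment",
     charge_payment,
     lambda: log.append("refund payment")),
]

try:
    run_saga(steps)
    outcome = "committed"
except SagaFailed:
    outcome = "compensated"
```

&lt;p&gt;Note that the failed step's own compensation never runs: the charge did not happen, so there is nothing to refund. Only the steps that completed are undone, in reverse order.&lt;/p&gt;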

&lt;h2&gt;
  
  
  The Meta-Rule
&lt;/h2&gt;

&lt;p&gt;Walking through those ten: the choice isn't really "which consistency model is best for my system." It's "which consistency model does this specific operation need, given what users expect to see."&lt;/p&gt;

&lt;p&gt;Most production systems use all of the following, in different places:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strong/linearizable consistency for anything money-related.&lt;/li&gt;
&lt;li&gt;Read-your-writes for user-visible writes that users need to see immediately.&lt;/li&gt;
&lt;li&gt;Causal consistency for feed-like data where ordering matters.&lt;/li&gt;
&lt;li&gt;Eventual consistency for counters, analytics, and anything where approximate-and-fast beats exact-and-slow.&lt;/li&gt;
&lt;li&gt;CRDTs (narrowly) for collaborative editing and specific offline-first features.&lt;/li&gt;
&lt;li&gt;Saga for cross-service business flows.&lt;/li&gt;
&lt;li&gt;Consensus (ZooKeeper/etcd) for the very few things that actually need leader election or distributed locks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The engineering decision is not "pick a consistency level for the whole system." It's "for this specific feature, what consistency level does the user need to experience, what trade-offs does the stronger version cost, and can we engineer the weaker version to feel as good?"&lt;/p&gt;

&lt;p&gt;That last clause matters. A read-your-writes layer on top of eventual consistency often &lt;em&gt;feels&lt;/em&gt; strongly consistent to users while actually being cheap to operate. Users don't experience consistency models; they experience whether their updates show up, whether their comments appear in order, whether their refund matches what they expected. Engineering consistency is about closing the gap between the model you can afford and the experience the user requires.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Anti-Patterns
&lt;/h2&gt;

&lt;p&gt;A few shapes that show up repeatedly in code reviews:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Retry until consistent."&lt;/strong&gt; Seen in code that does a write, then reads from a secondary and loops until it sees the write. It works on the happy path, spins forever during a partition, and creates unbounded retry storms under load. Use read-your-writes through a session token instead.&lt;/p&gt;
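&lt;p&gt;The session-token alternative can be sketched like this. It is an assumption-laden toy: &lt;code&gt;Store&lt;/code&gt;, &lt;code&gt;write&lt;/code&gt;, &lt;code&gt;replicate&lt;/code&gt;, and &lt;code&gt;read&lt;/code&gt; are hypothetical names, and a single integer version stands in for whatever position marker the real storage layer exposes. The key property is that a read makes one decision and never loops.&lt;/p&gt;

```python
class Store:
    """A replica: a key-value map plus the version it has applied up to."""

    def __init__(self) -> None:
        self.version = 0
        self.data = {}


def write(primary: Store, key: str, value: str) -> int:
    """Apply a write on the primary and return its version as the session token."""
    primary.version += 1
    primary.data[key] = value
    return primary.version


def replicate(primary: Store, secondary: Store) -> None:
    """Stand-in for asynchronous replication catching up."""
    secondary.data = dict(primary.data)
    secondary.version = primary.version


def read(primary: Store, secondary: Store, key: str, session_token: int = 0):
    """Single attempt: use the secondary only if it has reached the token."""
    if secondary.version >= session_token:
        return secondary.data.get(key), "secondary"
    return primary.data.get(key), "primary"


primary, secondary = Store(), Store()
token = write(primary, "name", "Ada")

lagging = read(primary, secondary, "name", token)    # secondary is behind the token
replicate(primary, secondary)
caught_up = read(primary, secondary, "name", token)  # secondary has the write now
```

&lt;p&gt;If the primary is unreachable during a partition, this version fails fast with an error the caller can handle, instead of retrying against a secondary that cannot catch up.&lt;/p&gt;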

&lt;p&gt;&lt;strong&gt;"We'll use eventual consistency for speed."&lt;/strong&gt; Used as a justification for skipping engineering. Yes, eventual is faster. The engineering to make it &lt;em&gt;feel&lt;/em&gt; correct (causal ordering, conflict resolution, read-your-writes fallback) is what you're skipping — and users will notice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Just use Redis for leader election."&lt;/strong&gt; Already mentioned. Redlock is not safe. If you're doing anything correctness-critical with leader election, use a real consensus system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Saga with no compensations."&lt;/strong&gt; "What happens if step 3 fails?" "Oh, we'll fix it manually." That's a saga you haven't designed. It's a half-finished state machine waiting to corrupt data. Design the compensations before you ship.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Strong consistency everywhere, for safety."&lt;/strong&gt; Default-safe sounds responsible. It also means every read pays for a primary or quorum round-trip (often tens of milliseconds across regions), you can't serve a region during a partition, and the cost per query is high. Users rarely need strong consistency everywhere. They need it in a few specific places.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Senior Move
&lt;/h2&gt;

&lt;p&gt;Consistency is a user-experience feature, not a system property. The right question at design time isn't "what consistency model does our database provide" — it's "what does the user need to see, in what order, with what freshness, with what tolerance for partial failure."&lt;/p&gt;

&lt;p&gt;Most of the work in a well-designed distributed system is engineering &lt;em&gt;around&lt;/em&gt; the consistency model the storage layer provides: sticky reads, session tokens, version vectors, compensating actions, explicit ordering, user-visible "your change is saved" confirmations. The model is the floor; the engineering lifts the experience to what users actually expect.&lt;/p&gt;

&lt;p&gt;The difference between senior and junior distributed-systems work often shows up here. Junior picks a model and fights everything else to conform. Senior picks the model per-feature, builds the engineering scaffolding that closes the gap, and ships something that feels right to users — even though underneath, ten different operations run on five different consistency levels.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/fail-fast-bounded-resilience-distributed-systems/" rel="noopener noreferrer"&gt;Why Your "Fail-Fast" Strategy is Killing Your Distributed System&lt;/a&gt; — another "the system property is not the user experience" essay.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/rpc-vs-nats-who-owns-completion/" rel="noopener noreferrer"&gt;RPC vs NATS: It's Not About Sync vs Async — It's About Who Owns Completion&lt;/a&gt; — who owns completion is one of the things consistency models don't address.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/four-pillars-modern-concurrency-locks-to-actors/" rel="noopener noreferrer"&gt;From Locks to Actors: The Four Pillars of Modern Concurrency&lt;/a&gt; — the concurrency side of the same general question.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/nats-kafka-mqtt-same-category-different-jobs/" rel="noopener noreferrer"&gt;NATS vs Kafka vs MQTT: Same Category, Very Different Jobs&lt;/a&gt; — the durability choice that enables some of the consistency patterns here.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>distributedsystems</category>
      <category>consistency</category>
      <category>saga</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Don't Pick One AI. Run Three Against Each Other.</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Sun, 03 May 2026 19:18:57 +0000</pubDate>
      <link>https://dev.to/harrisonsec/dont-pick-one-ai-run-three-against-each-other-3d27</link>
      <guid>https://dev.to/harrisonsec/dont-pick-one-ai-run-three-against-each-other-3d27</guid>
      <description>&lt;h2&gt;
  
  
  The Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;AI can write code, generate content, analyze data, design systems, and manage projects. It's getting better every month. The natural question: what's left for humans?&lt;/p&gt;

&lt;p&gt;The wrong answer: "AI will replace us."&lt;br&gt;
The other wrong answer: "AI is just a tool, nothing changes."&lt;/p&gt;

&lt;p&gt;The right answer is uncomfortable: stop picking the best AI. Run multiple AIs in competition, and become the judge.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Tournament Model
&lt;/h2&gt;

&lt;p&gt;Three rules, learned the hard way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multiple advisors, competing opinions.&lt;/strong&gt; Don't bind to one AI — its bias becomes yours. Three models running the same task surface blind spots no single model catches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You decide.&lt;/strong&gt; After the AIs argue, you make the call. Not the smartest model — you. The one with context they don't have.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Results judge everyone.&lt;/strong&gt; Did the call work? Keep it. Did it fail? Learn and move on. Never blame the AI — you chose to follow that advice.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the operating system for the AI age.&lt;/p&gt;
&lt;h2&gt;
  
  
  In Practice: Three AIs in One Window
&lt;/h2&gt;

&lt;p&gt;Theory is cheap. The reason most people don't run multiple AIs is friction — opening three terminals, signing in to three CLIs, and pasting the same prompt three times kills the loop after one day.&lt;/p&gt;

&lt;p&gt;So I wrote two small scripts and one tmux config to remove the friction. They live in this repo: &lt;strong&gt;&lt;a href="https://github.com/harrison001/ai-tournament" rel="noopener noreferrer"&gt;harrison001/ai-tournament&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;script&lt;/th&gt;
&lt;th&gt;what it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;prj&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;one command opens a tmux window with &lt;strong&gt;codex&lt;/strong&gt;, &lt;strong&gt;claude&lt;/strong&gt;, and &lt;strong&gt;gemini&lt;/strong&gt; running side-by-side, plus a shell pane&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tmx&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;fzf picker to switch between tournament sessions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tmux.conf&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;binds &lt;code&gt;&amp;lt;prefix&amp;gt; b&lt;/code&gt; to broadcast — type once, all three AIs receive it&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ prj ai-tournament       # opens tmux: codex | claude | gemini | shell
$ &amp;lt;prefix&amp;gt; b              # broadcast ON — status bar turns red
$ &amp;lt;type the question&amp;gt;     # all three receive the same prompt
$ &amp;lt;prefix&amp;gt; b              # broadcast OFF, read each answer, pick the winner
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  A real run
&lt;/h3&gt;

&lt;p&gt;To test the loop I asked all three the same dumb question I had asked myself five minutes earlier:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I want to wash my car. The car wash is 30 meters from my house. Should I drive or walk?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here is what came back, unedited:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fddkoqng29q5ybem8rcx6.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fddkoqng29q5ybem8rcx6.webp" alt="Three AIs answering the same question in tiled tmux panes: codex and claude both say walk, gemini says drive" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;codex&lt;/strong&gt;: &lt;em&gt;"Walk. For 30 meters, driving adds unnecessary startup, parking, and maneuvering, and it is barely faster unless you need to move supplies or the car itself into a wash bay."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;claude&lt;/strong&gt;: &lt;em&gt;"Walk. At 30 meters, driving takes longer once you factor in starting the car, and you'd just be sitting in a freshly washed car driving it home through whatever's on the road. Walk over, drive back clean."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gemini&lt;/strong&gt;: &lt;em&gt;"You should drive. Although 30 meters is a very short distance to walk, your car needs to be physically present at the facility to be cleaned."&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two out of three confidently gave a wrong answer. They optimized the surface question — &lt;em&gt;short distance, walking is fine&lt;/em&gt; — and missed the function of a car wash. If I had asked only the popular one, I would have walked over to wait in line for a service that requires a car.&lt;/p&gt;

&lt;p&gt;Only &lt;strong&gt;gemini&lt;/strong&gt; caught the obvious thing: the car has to be there.&lt;/p&gt;

&lt;p&gt;This is what the tournament model is for. It is not "three AIs are smarter than one." Two of them were less smart than one. The point is &lt;strong&gt;the divergence becomes visible&lt;/strong&gt;, and the human is the one who picks. With a single AI, you never see the disagreement — you just inherit whichever bias that model happened to have.&lt;/p&gt;

&lt;p&gt;The car wash is a toy example. Replace it with &lt;em&gt;"should we go gRPC, NATS, or HTTP for service-to-service?"&lt;/em&gt; and the same pattern holds — except the cost of picking the confident-but-wrong answer is no longer a wasted afternoon.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Five Principles
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Use Multiple AIs — Don't Bind to One
&lt;/h3&gt;

&lt;p&gt;Claude, Gemini, GPT, Codex — they're all advisors. Each has strengths. Each has blind spots. Using only one AI is like having only one advisor: you inherit all their biases.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;One AI:     The model's bias becomes your bias
Three AIs:  Biases become visible, blind spots get cross-checked
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I write content using three AI models simultaneously. Same task, three outputs. I don't ask them to divide the work — I ask them to &lt;strong&gt;compete&lt;/strong&gt;. The best output wins. The others get discarded.&lt;/p&gt;

&lt;p&gt;This is not "AI-assisted writing." This is a tournament where AI models compete and the human judges.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Compete, Don't Divide
&lt;/h3&gt;

&lt;p&gt;Most people who use multiple AIs assign each one a role: "Claude for writing, GPT for coding, Gemini for research." That's division of labor. It's a planned economy.&lt;/p&gt;

&lt;p&gt;The tournament model is a &lt;strong&gt;market economy&lt;/strong&gt;: same task to all, let results determine who's best.&lt;/p&gt;

&lt;p&gt;Why competition beats division:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Division relies on your judgment of which AI is better at what, and that judgment goes stale every time the models update&lt;/li&gt;
&lt;li&gt;Competition is self-correcting — if GPT suddenly gets better at writing, it starts winning writing tasks. No reconfiguration needed&lt;/li&gt;
&lt;li&gt;You don't need to solve the impossible problem of "which AI is best" — let them prove it through results&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. The Human Decides — Judgment Is Not Outsourceable
&lt;/h3&gt;

&lt;p&gt;AI can analyze. AI can generate options. AI can evaluate tradeoffs. What AI cannot do: &lt;strong&gt;decide which tradeoff matters in this specific context for this specific person with these specific constraints.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three capabilities make human judgment irreplaceable:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Insight&lt;/strong&gt; — Knowing what question to ask. AI can answer any question, but it can't know which question matters right now. Insight comes from understanding the problem deeply enough to ask the question that unlocks everything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Critical Thinking&lt;/strong&gt; — Knowing when AI is wrong. AI gives confident, articulate answers regardless of accuracy. The human must evaluate: does this make sense? Is this consistent with what I know? Is there a blind spot?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result Evaluation&lt;/strong&gt; — Knowing if the outcome is good enough. AI can generate a technically correct solution that's wrong for your context. Only the human who understands the full picture — users, business constraints, team dynamics, market timing — can judge whether the output actually serves the goal.&lt;/p&gt;

&lt;p&gt;These three form a loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Insight → Ask the right question
  ↓
AI gives analysis
  ↓
Critical Thinking → Is this analysis trustworthy?
  ↓
Choose and execute
  ↓
Result Evaluation → Did it work?
  ↓
Insight → Why did it work / not work? → Better questions next time
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. No Blind Faith, No Emotions — Results Are the Only Standard
&lt;/h3&gt;

&lt;p&gt;Two temptations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI agrees with you → "See, I was right." (Confirmation bias)&lt;/li&gt;
&lt;li&gt;AI disagrees with you → "AI doesn't understand my situation." (Emotional rejection)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tournament model rejects both:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI agrees with me    → Good, but does the result confirm it?
AI disagrees with me → Interesting. Let me verify before judging.
Made a choice        → Own the outcome. Right? Improve. Wrong? Learn. Never blame the AI.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Practice is the sole test of truth. Not who said it. Not how confident it sounded. Did it work?&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Human Drives AI, Not the Other Way Around
&lt;/h3&gt;

&lt;p&gt;AI is an amplifier. The question is: amplifying what?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;No insight + good AI tools = efficiently producing mediocrity
Good insight + no AI tools = good ideas, slow execution
Good insight + tournament model = insight amplified 10x
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The human provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Direction&lt;/strong&gt; — what to work on (insight)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality standard&lt;/strong&gt; — what "good" looks like (evaluation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context&lt;/strong&gt; — the constraints AI doesn't see (judgment)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accountability&lt;/strong&gt; — willingness to own the outcome (leadership)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Speed&lt;/strong&gt; — generate options fast&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Breadth&lt;/strong&gt; — consider more possibilities than a human can&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistency&lt;/strong&gt; — apply the same standard across large volumes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge&lt;/strong&gt; — access more information than any person can hold&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The human's role isn't to do AI's job slowly. It's to do the job AI can't do at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  Applied to Real Work
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Content Creation
&lt;/h3&gt;

&lt;p&gt;The temptation: let AI generate content and publish automatically. Maximum output, minimum effort.&lt;/p&gt;

&lt;p&gt;The result: a flood of mediocre, AI-flavored content. No differentiation. No personal perspective. Platforms and audiences both learn to ignore it.&lt;/p&gt;

&lt;p&gt;The tournament approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Three AI models generate competing drafts on the same topic&lt;/li&gt;
&lt;li&gt;The human evaluates: which captured the insight? Which missed the point?&lt;/li&gt;
&lt;li&gt;The winning draft gets refined — the human adds what AI can't: personal experience, controversial opinion, industry context&lt;/li&gt;
&lt;li&gt;Publication decision: is this good enough to attach my name to?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The output isn't "AI content." It's human content, produced at AI speed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Technical Decisions
&lt;/h3&gt;

&lt;p&gt;The temptation: ask one AI "should I use vector databases for agent memory?" and follow its recommendation.&lt;/p&gt;

&lt;p&gt;The result: you inherit that model's training bias. Claude might favor simplicity (it was trained by Anthropic, who chose Markdown files). GPT might favor complexity (it's aligned with enterprise patterns).&lt;/p&gt;

&lt;p&gt;The tournament approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ask all three: "What are the tradeoffs between Markdown files, SQLite + vectors, and self-evolving skills for agent memory?"&lt;/li&gt;
&lt;li&gt;Each gives a different analysis weighted by its own biases&lt;/li&gt;
&lt;li&gt;The human evaluates against the &lt;strong&gt;actual constraints&lt;/strong&gt;: deployment model, team size, user count, latency requirements&lt;/li&gt;
&lt;li&gt;The decision accounts for context that no AI has — your specific situation&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Career Strategy
&lt;/h3&gt;

&lt;p&gt;The temptation: "AI will replace developers, I need to switch careers."&lt;/p&gt;

&lt;p&gt;The reality: AI replaces tasks, not roles. The question is which tasks become your competitive advantage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;For employees:  Agent engineering skills (the 90% problem) — because companies 
                have data and scenarios, but need people who can build reliable agents

For founders:   Data + scenario moats — because agent engineering can be hired,
                but proprietary data and deep domain knowledge can't
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In both cases, the competitive advantage is &lt;strong&gt;insight&lt;/strong&gt; — understanding what matters in your specific domain well enough to direct AI effectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Anti-Patterns
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Anti-Pattern&lt;/th&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Tournament Alternative&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Only use one AI&lt;/td&gt;
&lt;td&gt;Single advisor's bias = your bias&lt;/td&gt;
&lt;td&gt;Multiple AIs competing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Follow AI blindly&lt;/td&gt;
&lt;td&gt;Lose judgment over time&lt;/td&gt;
&lt;td&gt;AI advises, human decides&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reject AI when it disagrees&lt;/td&gt;
&lt;td&gt;Miss good ideas out of ego&lt;/td&gt;
&lt;td&gt;No emotions, evaluate by results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automate everything&lt;/td&gt;
&lt;td&gt;No quality control, garbage output&lt;/td&gt;
&lt;td&gt;Human at quality gates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Treat AI as just a tool&lt;/td&gt;
&lt;td&gt;Waste AI's analytical capability&lt;/td&gt;
&lt;td&gt;Treat AIs as competing advisors&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The Test
&lt;/h2&gt;

&lt;p&gt;Here's how to know if you're using AI well:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bad sign:&lt;/strong&gt; You can't explain why you chose AI's suggestion over the alternatives.&lt;br&gt;
&lt;strong&gt;Good sign:&lt;/strong&gt; You can articulate the tradeoff — what you gained and what you gave up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bad sign:&lt;/strong&gt; You use the same AI for everything.&lt;br&gt;
&lt;strong&gt;Good sign:&lt;/strong&gt; You use different AIs for the same task and pick the best output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bad sign:&lt;/strong&gt; You haven't disagreed with AI in the past week.&lt;br&gt;
&lt;strong&gt;Good sign:&lt;/strong&gt; You regularly override AI when your insight says otherwise — and you're right more than you're wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bad sign:&lt;/strong&gt; You can't tell the difference between AI output and human output.&lt;br&gt;
&lt;strong&gt;Good sign:&lt;/strong&gt; You use AI for speed and breadth, then add what only you can: context, judgment, and accountability.&lt;/p&gt;

&lt;h2&gt;
  
  
  One Sentence
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;In the AI age, run AIs like a tournament: many compete, you decide, results judge everyone. Your insight is the one thing that scales with AI instead of being replaced by it.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of the AI Agent Architecture series. For the technical deep dive behind these ideas: &lt;a href="https://harrisonsec.com/blog/ai-agent-90-percent-problem/" rel="noopener noreferrer"&gt;The 90% Problem&lt;/a&gt; and &lt;a href="https://harrisonsec.com/blog/claude-code-context-engineering-compression-pipeline/" rel="noopener noreferrer"&gt;Claude Code Deep Dive&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aiagents</category>
      <category>productivity</category>
      <category>career</category>
    </item>
    <item>
      <title>Node Turns Waiting Into Events. Go Moves Context Switching Into User Space.</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Tue, 28 Apr 2026 18:01:12 +0000</pubDate>
      <link>https://dev.to/harrisonsec/node-turns-waiting-into-events-go-moves-context-switching-into-user-space-58ik</link>
      <guid>https://dev.to/harrisonsec/node-turns-waiting-into-events-go-moves-context-switching-into-user-space-58ik</guid>
      <description>&lt;p&gt;Most discussions of TypeScript/Node vs Go concurrency stop at the surface: &lt;em&gt;Node is async, Go is threaded.&lt;/em&gt; That framing isn't wrong — it just isn't deep enough to be useful when you're picking a runtime, debugging a tail-latency problem, or explaining to your team why one of the services keeps falling over under CPU load.&lt;/p&gt;

&lt;p&gt;The real difference is not async vs threaded. It's a question about where, in the system, suspended work lives — and what shape it takes when it's resumed.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — Both Node and Go refuse to let the CPU sit idle while a request waits on I/O. They disagree on the unit of scheduling. Node's unit is the &lt;em&gt;continuation&lt;/em&gt; — the tail of an async function captured as a heap closure. Go's unit is the &lt;em&gt;goroutine&lt;/em&gt; — a full call stack the runtime can suspend and resume in user space. That single decision cascades into every other property of each runtime.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Wrong Question
&lt;/h2&gt;

&lt;p&gt;"Async vs threaded" is the wrong frame because it makes you think the choice is between paradigms. It isn't. Both runtimes have already made the &lt;em&gt;same&lt;/em&gt; fundamental decision: do not block an OS thread waiting for slow external work. The interesting choice is &lt;em&gt;how&lt;/em&gt; they implement that.&lt;/p&gt;

&lt;p&gt;The actually useful question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When a request is waiting for I/O — for a database, an HTTP call, a Redis round-trip, a file read — &lt;strong&gt;what does the CPU do, and where does the suspended state of that request live?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once you frame it that way, Node and Go aren't opposites. They're two answers to the same question — and each answer cascades into a different language shape, a different library style, and a different failure mode under load.&lt;/p&gt;

&lt;p&gt;The naive blocking model answers the question with "an OS thread waits for the syscall to return." That model collapses around a few thousand concurrent connections — memory per thread, scheduler overhead, kernel context-switch cost. By 40,000 connections you're out of RAM, not CPU. Node and Go both refuse to do this. They diverge on &lt;em&gt;which resource gets freed up&lt;/em&gt; and &lt;em&gt;how the suspended work is captured for later resumption.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Node's Answer: Turn Waiting Into an Event
&lt;/h2&gt;

&lt;p&gt;Node's model can be summarized in one line: &lt;strong&gt;the JS main thread only executes code that's already ready to run.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Look at this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It reads as if the function is paused, blocking on the database. It isn't. Here's what V8 actually does at the bytecode level when it compiles an &lt;code&gt;async&lt;/code&gt; function: it rewrites the body into a state machine, with each &lt;code&gt;await&lt;/code&gt; becoming a state transition.&lt;/p&gt;

&lt;p&gt;The function above gets transformed into something equivalent to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;asyncFn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;promise&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;closure&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;                  &lt;span class="c1"&gt;// heap object holding locals&lt;/span&gt;

    &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
          &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
          &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;step&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;     &lt;span class="c1"&gt;// await → register continuation&lt;/span&gt;
          &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                         &lt;span class="c1"&gt;// ← function POPS here&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
          &lt;span class="nx"&gt;closure&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;           &lt;span class="c1"&gt;// resume: locals live in closure&lt;/span&gt;
          &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;closure&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;promise&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things to notice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;await&lt;/code&gt; is not a pause.&lt;/strong&gt; It's the point at which V8 returns from the function and pops the JS stack frame. The "rest of the function" is captured as a continuation registered on the awaited Promise via &lt;code&gt;.then&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local variables move to the heap.&lt;/strong&gt; Because the stack frame is gone, locals (&lt;code&gt;user&lt;/code&gt; here) live in a heap closure, accessible only when the state machine resumes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Each &lt;code&gt;await&lt;/code&gt; slices the function into another state.&lt;/strong&gt; A function with two &lt;code&gt;await&lt;/code&gt;s runs as three separate segments (three event-loop turns when each &lt;code&gt;await&lt;/code&gt; waits on real I/O), each in its own freshly pushed JS frame, with all live state stored in heap closures between them.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That third point is the most non-obvious. A single &lt;code&gt;async&lt;/code&gt; function is &lt;strong&gt;not&lt;/strong&gt; one unit of execution — it's a sequence of fresh frames separated by event-loop turns:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fc2VxdWVuY2VEaWFncmFtCiAgICBhdXRvbnVtYmVyCiAgICBwYXJ0aWNpcGFudCBFTCBhcyBFdmVudCBMb29wIChsaWJ1dikKICAgIHBhcnRpY2lwYW50IEpTIGFzIEpTIE1haW4gVGhyZWFkIChWOCkKICAgIHBhcnRpY2lwYW50IEggYXMgSGVhcCAoY2xvc3VyZXMpCiAgICBwYXJ0aWNpcGFudCBLIGFzIEtlcm5lbCAvIEkvTwoKICAgIHJlY3QgcmdiKDI1NCwgMjQzLCAxOTkpCiAgICBOb3RlIG92ZXIgRUwsSzogVHVybiAxCiAgICBFTC0-PkpTOiBkaXNwYXRjaCBoYW5kbGVyKCkKICAgIGFjdGl2YXRlIEpTCiAgICBOb3RlIG92ZXIgSlM6IGNvbnN0IGEgPSAxCiAgICBKUy0-PkpTOiBjYWxsIGNvbXB1dGUxKCkg4oaSIHJldHVybnMgUHJvbWlzZQogICAgSlMtPj5IOiBWOCBzdG9yZXMgY2xvc3VyZSB7c3RhdGU6MSwgYX0KICAgIEpTLT4-SDogcmVnaXN0ZXIgc3RlcCBhcyAudGhlbiBoYW5kbGVyCiAgICBKUy0tPj5FTDogaGFuZGxlciBmcmFtZSBQT1BQRUQsIHJldHVybnMgUHJvbWlzZQogICAgZGVhY3RpdmF0ZSBKUwogICAgZW5kCgogICAgRUwtPj5LOiBlcG9sbF93YWl0IChubyBtaWNyb3Rhc2tzKQogICAgTm90ZSBvdmVyIEVMLEs6IC4uLiB0aW1lIHBhc3NlcywgT1MgdGhyZWFkIHBhcmtlZCAuLi4KICAgIEstLT4-RUw6IEkvTyByZWFkeSAoY29tcHV0ZTEgcmVzb2x2ZWQpCiAgICBFTC0-PkVMOiBlbnF1ZXVlIHN0ZXAgaW4gVjggbWljcm90YXNrIHF1ZXVlCgogICAgcmVjdCByZ2IoMjE5LCAyMzQsIDI1NCkKICAgIE5vdGUgb3ZlciBFTCxLOiBUdXJuIDIKICAgIEVMLT4-SlM6IGludm9rZSBzdGVwKHZhbHVlKSDigJQgTkVXIGZyYW1lCiAgICBhY3RpdmF0ZSBKUwogICAgSlMtPj5IOiBsb2FkIGNsb3N1cmUge3N0YXRlOjEsIGF9CiAgICBOb3RlIG92ZXIgSlM6IHggPSB2YWx1ZSwgc3RhdGUg4oaSIDIKICAgIEpTLT4-SlM6IGNhbGwgY29tcHV0ZTIoKSDihpIgcmV0dXJucyBQcm9taXNlCiAgICBKUy0-Pkg6IHJlZ2lzdGVyIHN0ZXAgKG5leHQgc3RhdGUpCiAgICBKUy0tPj5FTDogZnJhbWUgUE9QUEVEIGFnYWluCiAgICBkZWFjdGl2YXRlIEpTCiAgICBlbmQKCiAgICBLLS0-PkVMOiBjb21wdXRlMiByZXNvbHZlZAogICAgRUwtPj5FTDogZW5xdWV1ZSBzdGVwCgogICAgcmVjdCByZ2IoMjIwLCAyNTIsIDIzMSkKICAgIE5vdGUgb3ZlciBFTCxLOiBUdXJuIDMKICAgIEVMLT4-SlM6IGludm9rZSBzdGVwKHZhbHVlKSDigJQgeWV0IGFub3RoZXIgbmV3IGZyYW1lCiAgICBhY3RpdmF0ZSBKUwogICAgSlMtPj5IOiBsb2FkIGNsb3N1cmUge3N0YXRlOjIsIGEsIHh9CiAgICBOb3RlIG92ZXIgSlM6IHkgPSB2YWx1ZSwgc3RhdGUg4oaSIGRvbmUKICAgIEpTLT4-SlM6IHJlcy5qc29uKGEgKyB4ICsgeSkKICAgIEpTLS0-PkV
MOiBoYW5kbGVyJ3MgUHJvbWlzZSByZXNvbHZlZAogICAgZGVhY3RpdmF0ZSBKUwogICAgZW5k" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fc2VxdWVuY2VEaWFncmFtCiAgICBhdXRvbnVtYmVyCiAgICBwYXJ0aWNpcGFudCBFTCBhcyBFdmVudCBMb29wIChsaWJ1dikKICAgIHBhcnRpY2lwYW50IEpTIGFzIEpTIE1haW4gVGhyZWFkIChWOCkKICAgIHBhcnRpY2lwYW50IEggYXMgSGVhcCAoY2xvc3VyZXMpCiAgICBwYXJ0aWNpcGFudCBLIGFzIEtlcm5lbCAvIEkvTwoKICAgIHJlY3QgcmdiKDI1NCwgMjQzLCAxOTkpCiAgICBOb3RlIG92ZXIgRUwsSzogVHVybiAxCiAgICBFTC0-PkpTOiBkaXNwYXRjaCBoYW5kbGVyKCkKICAgIGFjdGl2YXRlIEpTCiAgICBOb3RlIG92ZXIgSlM6IGNvbnN0IGEgPSAxCiAgICBKUy0-PkpTOiBjYWxsIGNvbXB1dGUxKCkg4oaSIHJldHVybnMgUHJvbWlzZQogICAgSlMtPj5IOiBWOCBzdG9yZXMgY2xvc3VyZSB7c3RhdGU6MSwgYX0KICAgIEpTLT4-SDogcmVnaXN0ZXIgc3RlcCBhcyAudGhlbiBoYW5kbGVyCiAgICBKUy0tPj5FTDogaGFuZGxlciBmcmFtZSBQT1BQRUQsIHJldHVybnMgUHJvbWlzZQogICAgZGVhY3RpdmF0ZSBKUwogICAgZW5kCgogICAgRUwtPj5LOiBlcG9sbF93YWl0IChubyBtaWNyb3Rhc2tzKQogICAgTm90ZSBvdmVyIEVMLEs6IC4uLiB0aW1lIHBhc3NlcywgT1MgdGhyZWFkIHBhcmtlZCAuLi4KICAgIEstLT4-RUw6IEkvTyByZWFkeSAoY29tcHV0ZTEgcmVzb2x2ZWQpCiAgICBFTC0-PkVMOiBlbnF1ZXVlIHN0ZXAgaW4gVjggbWljcm90YXNrIHF1ZXVlCgogICAgcmVjdCByZ2IoMjE5LCAyMzQsIDI1NCkKICAgIE5vdGUgb3ZlciBFTCxLOiBUdXJuIDIKICAgIEVMLT4-SlM6IGludm9rZSBzdGVwKHZhbHVlKSDigJQgTkVXIGZyYW1lCiAgICBhY3RpdmF0ZSBKUwogICAgSlMtPj5IOiBsb2FkIGNsb3N1cmUge3N0YXRlOjEsIGF9CiAgICBOb3RlIG92ZXIgSlM6IHggPSB2YWx1ZSwgc3RhdGUg4oaSIDIKICAgIEpTLT4-SlM6IGNhbGwgY29tcHV0ZTIoKSDihpIgcmV0dXJucyBQcm9taXNlCiAgICBKUy0-Pkg6IHJlZ2lzdGVyIHN0ZXAgKG5leHQgc3RhdGUpCiAgICBKUy0tPj5FTDogZnJhbWUgUE9QUEVEIGFnYWluCiAgICBkZWFjdGl2YXRlIEpTCiAgICBlbmQKCiAgICBLLS0-PkVMOiBjb21wdXRlMiByZXNvbHZlZAogICAgRUwtPj5FTDogZW5xdWV1ZSBzdGVwCgogICAgcmVjdCByZ2IoMjIwLCAyNTIsIDIzMSkKICAgIE5vdGUgb3ZlciBFTCxLOiBUdXJuIDMKICAgIEVMLT4-SlM6IGludm9rZSBzdGVwKHZhbHVlKSDigJQgeWV0IGFub3RoZXIgbmV3IGZyYW1lCiAgICBhY3RpdmF0ZSBKUwogICAgSlMtPj5IOiBsb2FkIGNsb3N1cmUge3N0YXRlOjIsIGEsIHh9CiAgICBOb3RlIG
92ZXIgSlM6IHkgPSB2YWx1ZSwgc3RhdGUg4oaSIGRvbmUKICAgIEpTLT4-SlM6IHJlcy5qc29uKGEgKyB4ICsgeSkKICAgIEpTLS0-PkVMOiBoYW5kbGVyJ3MgUHJvbWlzZSByZXNvbHZlZAogICAgZGVhY3RpdmF0ZSBKUwogICAgZW5k" alt="sequenceDiagram" width="1103" height="1604"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There is no "paused" function. There are only &lt;em&gt;captured continuations&lt;/em&gt; and &lt;em&gt;fresh frames that resume them&lt;/em&gt;. The event loop is the dispatcher: it watches for I/O readiness via libuv, for resolved Promises (via V8's microtask queue), for timers — and pulls the corresponding continuation onto the JS thread when it's ready to run. One thread can manage tens of thousands of concurrent connections, because at any moment only a handful of them have work that's actually ready.&lt;/p&gt;

&lt;p&gt;This is event-driven concurrency in its precise sense — the runtime turns "waiting" into a registered event, and only resumes the captured continuation when the event fires.&lt;/p&gt;
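&lt;p&gt;The slicing is easy to observe directly. The sketch below is a minimal illustration, not anything from V8 itself: &lt;code&gt;handler&lt;/code&gt; and &lt;code&gt;order&lt;/code&gt; are made-up names, and &lt;code&gt;await null&lt;/code&gt; stands in for real I/O, so the continuations resume from the microtask queue rather than after a kernel event. The point it shows is that the caller keeps running as soon as the first &lt;code&gt;await&lt;/code&gt; is reached.&lt;/p&gt;

```typescript
// Records the order in which the pieces of handler actually run.
const order: string[] = [];

async function handler() {
  order.push("segment 1");   // runs synchronously, inside the caller's turn
  await null;                // frame pops here; the rest becomes a continuation
  order.push("segment 2");   // fresh frame, resumed from the microtask queue
  await null;
  order.push("segment 3");   // yet another fresh frame
}

const finished = handler();          // returns a pending Promise immediately
order.push("caller keeps running");  // executes before segment 2

finished.then(() => console.log(order.join(" | ")));
```

&lt;p&gt;Run as written, this prints &lt;code&gt;segment 1 | caller keeps running | segment 2 | segment 3&lt;/code&gt;: one function body, three separately dispatched pieces.&lt;/p&gt;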

&lt;h3&gt;
  
  
  The Visible Side Effect: Function Color
&lt;/h3&gt;

&lt;p&gt;Because the suspension point has to be marked at compile time, async-ness becomes part of the function's &lt;em&gt;type&lt;/em&gt;. A function that does I/O returns &lt;code&gt;Promise&amp;lt;T&amp;gt;&lt;/code&gt;. Its callers must &lt;code&gt;await&lt;/code&gt; it. Once they &lt;code&gt;await&lt;/code&gt;, they themselves return &lt;code&gt;Promise&amp;lt;T&amp;gt;&lt;/code&gt;. The "color" propagates up the call stack until you hit an async-aware entry point — typically the top of an HTTP handler or the event loop itself.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/" rel="noopener noreferrer"&gt;Bob Nystrom named this the function color problem&lt;/a&gt; in 2015. It's not a notation choice — it's a &lt;strong&gt;logical consequence of the stackless coroutine model&lt;/strong&gt;. V8 cannot save and restore arbitrary JS call stacks. The only way to express suspension is "return a Promise and be marked &lt;code&gt;async&lt;/code&gt;," and once one function does that, every function on the way up has to do the same.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBOb2RlWyI8Yj5Ob2RlIOKAlCBDb2xvciBDYXNjYWRlcyBVcCB0aGUgQ2FsbCBTdGFjazwvYj4iXQogICAgICAgIGRpcmVjdGlvbiBUQgogICAgICAgIG4xWyI8Yj5yZWFkRnJvbURCKCk8L2I-IPCfn6U8YnIvPuKGkiBQcm9taXNlJmx0O0RhdGEmZ3Q7PGJyLz48Yj5kb2VzIEkvTzwvYj4iXQogICAgICAgIG4yWyI8Yj5mZXRjaFVzZXIoKTwvYj4g8J-fpTxici8-4oaSIFByb21pc2UmbHQ7VXNlciZndDs8YnIvPjxiPm11c3QgYXdhaXQgcmVhZEZyb21EQjwvYj4iXQogICAgICAgIG4zWyI8Yj5oYW5kbGVSZXF1ZXN0KCk8L2I-IPCfn6U8YnIvPuKGkiBQcm9taXNlJmx0O1Jlc3BvbnNlJmd0Ozxici8-PGI-bXVzdCBhd2FpdCBmZXRjaFVzZXI8L2I-Il0KICAgICAgICBuNFsiPGI-cm91dGUoJy91c2VyJywgaGFuZGxlcik8L2I-IPCfn6U8YnIvPjxiPm11c3QgYWNjZXB0IFByb21pc2UgcmV0dXJuPC9iPiJdCiAgICAgICAgbjVbIjxiPm1haW4oKTwvYj4g8J-fpTxici8-4oaSIFByb21pc2UmbHQ7dm9pZCZndDs8YnIvPjxiPnRvcC1sZXZlbCBuZWVkcyBhd2FpdDwvYj4iXQogICAgICAgIG4xIC0uY29sb3IgaW5mZWN0cy4tPiBuMgogICAgICAgIG4yIC0uY29sb3IgaW5mZWN0cy4tPiBuMwogICAgICAgIG4zIC0uY29sb3IgaW5mZWN0cy4tPiBuNAogICAgICAgIG40IC0uY29sb3IgaW5mZWN0cy4tPiBuNQogICAgZW5kCgogICAgc3ViZ3JhcGggR29bIjxiPkdvIOKAlCBObyBDb2xvciwgTm8gQ2FzY2FkZTwvYj4iXQogICAgICAgIGRpcmVjdGlvbiBUQgogICAgICAgIGcxWyI8Yj5yZWFkRnJvbURCKCk8L2I-IOKsnDxici8-4oaSIERhdGE8YnIvPjxiPmJsb2NrcyBvbiBJL08gaW50ZXJuYWxseTwvYj4iXQogICAgICAgIGcyWyI8Yj5mZXRjaFVzZXIoKTwvYj4g4qycPGJyLz7ihpIgVXNlcjxici8-PGI-cGxhaW4gY2FsbDwvYj4iXQogICAgICAgIGczWyI8Yj5oYW5kbGVSZXF1ZXN0KCk8L2I-IOKsnDxici8-4oaSIFJlc3BvbnNlPGJyLz48Yj5wbGFpbiBjYWxsPC9iPiJdCiAgICAgICAgZzRbIjxiPnJvdXRlKCcvdXNlcicsIGhhbmRsZXIpPC9iPiDirJw8YnIvPjxiPmhhbmRsZXIgaXMgYSBwbGFpbiBmdW5jPC9iPiJdCiAgICAgICAgZzVbIjxiPm1haW4oKTwvYj4g4qycPGJyLz48Yj5wbGFpbiBmdW5jPC9iPiJdCiAgICAgICAgZzEgLS0-IGcyCiAgICAgICAgZzIgLS0-IGczCiAgICAgICAgZzMgLS0-IGc0CiAgICAgICAgZzQgLS0-IGc1CiAgICBlbmQKCiAgICBOb2RlIH5-fiBHbwoKICAgIGNsYXNzRGVmIHJlZENsYXNzIGZpbGw6I2ZlZTJlMixzdHJva2U6I2RjMjYyNixzdHJva2Utd2lkdGg6MnB4LGNvbG9yOiM3ZjFkMWQKICAgIGNsYXNzRGVmIHBsYWluQ2xhc3MgZmlsbDojZjN
mNGY2LHN0cm9rZTojNmI3MjgwLHN0cm9rZS13aWR0aDoycHgsY29sb3I6IzExMTgyNwoKICAgIGNsYXNzIG4xLG4yLG4zLG40LG41IHJlZENsYXNzCiAgICBjbGFzcyBnMSxnMixnMyxnNCxnNSBwbGFpbkNsYXNz" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBOb2RlWyI8Yj5Ob2RlIOKAlCBDb2xvciBDYXNjYWRlcyBVcCB0aGUgQ2FsbCBTdGFjazwvYj4iXQogICAgICAgIGRpcmVjdGlvbiBUQgogICAgICAgIG4xWyI8Yj5yZWFkRnJvbURCKCk8L2I-IPCfn6U8YnIvPuKGkiBQcm9taXNlJmx0O0RhdGEmZ3Q7PGJyLz48Yj5kb2VzIEkvTzwvYj4iXQogICAgICAgIG4yWyI8Yj5mZXRjaFVzZXIoKTwvYj4g8J-fpTxici8-4oaSIFByb21pc2UmbHQ7VXNlciZndDs8YnIvPjxiPm11c3QgYXdhaXQgcmVhZEZyb21EQjwvYj4iXQogICAgICAgIG4zWyI8Yj5oYW5kbGVSZXF1ZXN0KCk8L2I-IPCfn6U8YnIvPuKGkiBQcm9taXNlJmx0O1Jlc3BvbnNlJmd0Ozxici8-PGI-bXVzdCBhd2FpdCBmZXRjaFVzZXI8L2I-Il0KICAgICAgICBuNFsiPGI-cm91dGUoJy91c2VyJywgaGFuZGxlcik8L2I-IPCfn6U8YnIvPjxiPm11c3QgYWNjZXB0IFByb21pc2UgcmV0dXJuPC9iPiJdCiAgICAgICAgbjVbIjxiPm1haW4oKTwvYj4g8J-fpTxici8-4oaSIFByb21pc2UmbHQ7dm9pZCZndDs8YnIvPjxiPnRvcC1sZXZlbCBuZWVkcyBhd2FpdDwvYj4iXQogICAgICAgIG4xIC0uY29sb3IgaW5mZWN0cy4tPiBuMgogICAgICAgIG4yIC0uY29sb3IgaW5mZWN0cy4tPiBuMwogICAgICAgIG4zIC0uY29sb3IgaW5mZWN0cy4tPiBuNAogICAgICAgIG40IC0uY29sb3IgaW5mZWN0cy4tPiBuNQogICAgZW5kCgogICAgc3ViZ3JhcGggR29bIjxiPkdvIOKAlCBObyBDb2xvciwgTm8gQ2FzY2FkZTwvYj4iXQogICAgICAgIGRpcmVjdGlvbiBUQgogICAgICAgIGcxWyI8Yj5yZWFkRnJvbURCKCk8L2I-IOKsnDxici8-4oaSIERhdGE8YnIvPjxiPmJsb2NrcyBvbiBJL08gaW50ZXJuYWxseTwvYj4iXQogICAgICAgIGcyWyI8Yj5mZXRjaFVzZXIoKTwvYj4g4qycPGJyLz7ihpIgVXNlcjxici8-PGI-cGxhaW4gY2FsbDwvYj4iXQogICAgICAgIGczWyI8Yj5oYW5kbGVSZXF1ZXN0KCk8L2I-IOKsnDxici8-4oaSIFJlc3BvbnNlPGJyLz48Yj5wbGFpbiBjYWxsPC9iPiJdCiAgICAgICAgZzRbIjxiPnJvdXRlKCcvdXNlcicsIGhhbmRsZXIpPC9iPiDirJw8YnIvPjxiPmhhbmRsZXIgaXMgYSBwbGFpbiBmdW5jPC9iPiJdCiAgICAgICAgZzVbIjxiPm1haW4oKTwvYj4g4qycPGJyLz48Yj5wbGFpbiBmdW5jPC9iPiJdCiAgICAgICAgZzEgLS0-IGcyCiAgICAgICAgZzIgLS0-IGczCiAgICAgICAgZzMgLS0-IGc0CiAgICAgICAgZzQgLS
0-IGc1CiAgICBlbmQKCiAgICBOb2RlIH5-fiBHbwoKICAgIGNsYXNzRGVmIHJlZENsYXNzIGZpbGw6I2ZlZTJlMixzdHJva2U6I2RjMjYyNixzdHJva2Utd2lkdGg6MnB4LGNvbG9yOiM3ZjFkMWQKICAgIGNsYXNzRGVmIHBsYWluQ2xhc3MgZmlsbDojZjNmNGY2LHN0cm9rZTojNmI3MjgwLHN0cm9rZS13aWR0aDoycHgsY29sb3I6IzExMTgyNwoKICAgIGNsYXNzIG4xLG4yLG4zLG40LG41IHJlZENsYXNzCiAgICBjbGFzcyBnMSxnMixnMyxnNCxnNSBwbGFpbkNsYXNz" alt="flowchart LR" width="714" height="997"&gt;&lt;/a&gt;&lt;/p&gt;
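&lt;p&gt;The cascade in the diagram can be sketched in a few lines. The function names mirror the diagram; the bodies are hypothetical stand-ins (a 1 ms timer plays the role of the database), so treat this as an illustration of the typing consequence rather than a real data layer.&lt;/p&gt;

```typescript
// Hypothetical helper: resolves after roughly ms milliseconds.
function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Does the (simulated) I/O, so it has to be async.
async function readFromDB(id: number) {
  await sleep(1);
  return { id, name: "user-" + id };
}

// Only transforms data, but must await readFromDB, so it is async too.
async function fetchUser(id: number) {
  const row = await readFromDB(id);
  return row.name.toUpperCase();
}

// One more level up, same story.
async function handleRequest(id: number) {
  const userName = await fetchUser(id);
  return "hello " + userName;
}

// A caller that does not await gets a Promise, not a string.
const pending = handleRequest(7);
const returnsPromise = pending instanceof Promise;

pending.then((text) => console.log(returnsPromise, text));
```

&lt;p&gt;Only &lt;code&gt;readFromDB&lt;/code&gt; actually waits on anything, yet every function above it had to change its return type to get at the value.&lt;/p&gt;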

&lt;h3&gt;
  
  
  The Hard Limit
&lt;/h3&gt;

&lt;p&gt;The model fails the moment your code stops waiting. A single CPU-bound operation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* heavy work */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…holds the JS main thread, and &lt;em&gt;every other request on this process is dead&lt;/em&gt; until it returns. The event loop has nowhere else to go. Worker threads, child processes, or moving CPU work into a separate service are real fixes, but they're escape hatches: they exist because the core model has exactly one thread executing JS.&lt;/p&gt;
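&lt;p&gt;A bounded version of that loop makes the stall measurable. This is a sketch under assumptions: the 10 ms and 100 ms figures are arbitrary, and &lt;code&gt;startedAt&lt;/code&gt;, &lt;code&gt;busyUntil&lt;/code&gt;, and &lt;code&gt;timerFiredAfterMs&lt;/code&gt; are illustrative names. A timer asks to run after 10 ms, but the main thread is busy, so its callback cannot be dispatched until the loop ends.&lt;/p&gt;

```typescript
const startedAt = Date.now();
let timerFiredAfterMs = -1;

// Ask the event loop for a callback in 10 ms.
setTimeout(() => {
  timerFiredAfterMs = Date.now() - startedAt;
  console.log("10 ms timer fired after " + timerFiredAfterMs + " ms");
}, 10);

// Simulated CPU-bound work: spin on the main thread for about 100 ms.
const busyUntil = startedAt + 100;
while (busyUntil > Date.now()) {
  // heavy work would go here
}
```

&lt;p&gt;The timer reports a delay of at least 100 ms, not 10. Swap the timer for an incoming HTTP request and that gap is the tail latency every other client sees.&lt;/p&gt;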




&lt;h2&gt;
  
  
  Go's Answer: Move Context Switching Into User Space
&lt;/h2&gt;

&lt;p&gt;Go code reads as plain synchronous code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sendResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is no &lt;code&gt;await&lt;/code&gt;. There is no callback. The function looks like it blocks on the database. And yet the program scales to hundreds of thousands of concurrent operations on modest hardware.&lt;/p&gt;

&lt;p&gt;The trick is that the &lt;em&gt;scheduling boundary has been moved.&lt;/em&gt; Where Node has the programmer mark the suspension point with &lt;code&gt;await&lt;/code&gt; and the runtime captures a continuation, Go lets the programmer write straight-line code and has the &lt;em&gt;runtime&lt;/em&gt; suspend the entire goroutine when it hits a blocking I/O call.&lt;/p&gt;

&lt;p&gt;This is the central insight, and the cleanest one-line statement of Go's concurrency model:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Go's essence is the user-space-ification of context switching.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A goroutine isn't an OS thread. It's a small (initially 2 KB) growable stack and a register snapshot, managed by the Go runtime. The runtime maps a large number of goroutines (G) onto a small number of OS threads (M) using scheduling contexts (P). This is the GMP model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
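&lt;strong&gt;G&lt;/strong&gt;: a goroutine. The unit of scheduling. Cheap to create, cheap to suspend.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;M&lt;/strong&gt;: an OS thread. At most &lt;code&gt;GOMAXPROCS&lt;/code&gt; of them execute Go code at any moment; extra Ms can exist while others are blocked in syscalls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P&lt;/strong&gt;: a scheduling context. There are &lt;code&gt;GOMAXPROCS&lt;/code&gt; of them; each holds a local run queue of Gs, and an M must hold a P to execute Go code.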
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;many G  →  Go scheduler  →  few M  →  CPU cores
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a goroutine waits on a channel, a mutex, a timer, or network I/O (which the runtime issues as non-blocking calls and tracks through the netpoller), the Go runtime parks the goroutine: it saves the goroutine's registers and stack pointer, detaches it from the current M, and schedules another runnable goroutine onto that M. When the original goroutine's wait completes, it's marked runnable again, and some M eventually picks it up and resumes execution from the suspension point. &lt;strong&gt;The switch itself never enters the kernel.&lt;/strong&gt; No &lt;code&gt;clone(2)&lt;/code&gt;, no kernel-mediated thread switch, no kernel scheduler queue. The bookkeeping is all in user space. (A genuinely blocking syscall is the exception; the next section covers what it costs.)&lt;/p&gt;

&lt;p&gt;That's the user-space-ification. The CPU still has to switch contexts when work shifts between goroutines, but the cost is roughly a function call plus a stack swap — not a kernel-mediated thread switch.&lt;/p&gt;
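&lt;p&gt;To make "a run queue in user space" concrete, here is a toy model, written in TypeScript to stay in one language and emphatically not how the Go runtime is implemented. Generators stand in for goroutines and each &lt;code&gt;yield&lt;/code&gt; stands in for a park point; &lt;code&gt;worker&lt;/code&gt;, &lt;code&gt;runQueue&lt;/code&gt;, and &lt;code&gt;trace&lt;/code&gt; are invented names. JS generators are stackless, so the sketch models only the scheduling loop, not the per-goroutine stacks.&lt;/p&gt;

```typescript
const trace: string[] = [];

// A toy "goroutine": every yield hands the thread back to the scheduler.
function* worker(label: string, steps: number) {
  for (let i = 0; i !== steps; i++) {
    trace.push(label + ":" + i);
    yield;
  }
}

// Runnable tasks, all multiplexed onto this single thread.
const runQueue = [worker("G1", 2), worker("G2", 2)];

// Scheduler loop: take the next runnable task and run it to its next yield.
let current = runQueue.shift();
while (current !== undefined) {
  const result = current.next();  // the "context switch" is an ordinary call
  if (!result.done) {
    runQueue.push(current);       // still has work: back of the queue
  }
  current = runQueue.shift();
}

console.log(trace.join(" "));
```

&lt;p&gt;This prints &lt;code&gt;G1:0 G2:0 G1:1 G2:1&lt;/code&gt;. Two tasks interleave on one thread, and the only cost of switching between them is a queue operation and a call, with no kernel involved.&lt;/p&gt;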

&lt;p&gt;The key contrast with Node's model is in &lt;em&gt;where the suspended state lives:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBOb2RlWyI8Yj5Ob2RlIOKAlCBTdGFja2xlc3MgQ29yb3V0aW5lPC9iPiJdCiAgICAgICAgZGlyZWN0aW9uIFRCCiAgICAgICAgblN0YWNrWyI8Yj5KUyBDYWxsIFN0YWNrPC9iPjxici8-KG9uZSBmcmFtZSBhdCBhIHRpbWUpPGJyLz7ilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIE8YnIvPuKaoCA8Yj5jdXJyZW50bHkgZW1wdHk8L2I-PGJyLz4oYWxsIGFzeW5jIGZucyBwb3BwZWQsPGJyLz53YWl0aW5nIGluIGV2ZW50IGxvb3ApIl0KICAgICAgICBuSGVhcFsiPGI-SGVhcDwvYj4iXQogICAgICAgIG5DMVsiPGI-Y29udGludWF0aW9uICMxPC9iPjxici8-eyBzdGF0ZTogMSw8YnIvPiZuYnNwOyZuYnNwO2xvY2Fsczoge3JlcSwgcmVzLCBhfSw8YnIvPiZuYnNwOyZuYnNwO3N0ZXA6IGZuIHB0ciB9Il0KICAgICAgICBuQzJbIjxiPmNvbnRpbnVhdGlvbiAjMjwvYj48YnIvPnsgc3RhdGU6IDAsIC4uLiB9Il0KICAgICAgICBuQzNbIjxiPmNvbnRpbnVhdGlvbiAjMzwvYj48YnIvPnsgc3RhdGU6IDIsIC4uLiB9Il0KICAgICAgICBuSGVhcCAtLT4gbkMxCiAgICAgICAgbkhlYXAgLS0-IG5DMgogICAgICAgIG5IZWFwIC0tPiBuQzMKICAgICAgICBuTm90ZVsiPGI-RWFjaCA8Y29kZT5hd2FpdDwvY29kZT4gcG9wcyB0aGUgZnJhbWUuPC9iPjxici8-U3RhdGUgbGl2ZXMgb25seSBpbiBoZWFwIGNsb3N1cmVzLjxici8-U3RhY2sgaXMgcmV1c2VkIGFjcm9zcyBhbGwgdHVybnMuIl0KICAgIGVuZAoKICAgIHN1YmdyYXBoIEdvWyI8Yj5HbyDigJQgU3RhY2tmdWwgQ29yb3V0aW5lPC9iPiJdCiAgICAgICAgZGlyZWN0aW9uIFRCCiAgICAgICAgZ01bIjxiPk9TIFRocmVhZCAoTSk8L2I-PGJyLz5jdXJyZW50bHkgcnVubmluZyBHMyDilrYiXQogICAgICAgIGdIZWFwWyI8Yj5IZWFwPC9iPiJdCiAgICAgICAgZ0cxWyI8Yj5nb3JvdXRpbmUgRzE8L2I-ICgyIEtCIHN0YWNrKTxici8-4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSBPGJyLz5wcm9jZXNzKCk8YnIvPiZuYnNwOyZuYnNwO-KGsyBzbG93RG91YmxlKCk8YnIvPiZuYnNwOyZuYnNwOyZuYnNwOyZuYnNwO-KGsyB0aW1lLlNsZWVwKCkg4piFcGFya2VkIl0KICAgICAgICBnRzJbIjxiPmdvcm91dGluZSBHMjwvYj4gKDIgS0Igc3RhY2spPGJyLz7ilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIE8YnIvPmhhbmRsZXIoKTxici8-Jm5ic3A7Jm5ic3A74oazIGRiLlF1ZXJ5KCkg4piFcGFya2VkIl0KICAgICAgICBnRzNbIjxiPmdvcm91dGluZSBHMzwvYj4gKDIgS0Igc3RhY2spPGJyLz7ilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIE8YnIvPmN1cnJlbnRseSBvbiBNIOKWtiJ
dCiAgICAgICAgZ0hlYXAgLS0-IGdHMQogICAgICAgIGdIZWFwIC0tPiBnRzIKICAgICAgICBnSGVhcCAtLT4gZ0czCiAgICAgICAgZ05vdGVbIjxiPkVhY2ggZ29yb3V0aW5lIG93bnMgaXRzIGZ1bGwgc3RhY2suPC9iPjxici8-UnVudGltZSBzYXZlcy9yZXN0b3JlcyBlbnRpcmUgc3RhY2s8YnIvPm9uIHN1c3BlbmQuIE5vIGZyYW1lIHBvcCBuZWVkZWQuIl0KICAgIGVuZAoKICAgIE5vZGUgfn5-IEdvCgogICAgY2xhc3NEZWYgbm9kZUFsZXJ0IGZpbGw6I2ZlZTJlMixzdHJva2U6I2RjMjYyNixzdHJva2Utd2lkdGg6M3B4LGNvbG9yOiM3ZjFkMWQKICAgIGNsYXNzRGVmIG5vZGVDbGFzcyBmaWxsOiNmZWYzYzcsc3Ryb2tlOiNkOTc3MDYsY29sb3I6IzExMTgyNwogICAgY2xhc3NEZWYgZ29DbGFzcyBmaWxsOiNkYmVhZmUsc3Ryb2tlOiMyNTYzZWIsY29sb3I6IzExMTgyNwogICAgY2xhc3NEZWYgbm90ZUNsYXNzIGZpbGw6I2ZmZmZmZixzdHJva2U6IzM3NDE1MSxzdHJva2Utd2lkdGg6MS41cHgsY29sb3I6IzExMTgyNwoKICAgIGNsYXNzIG5TdGFjayBub2RlQWxlcnQKICAgIGNsYXNzIG5IZWFwLG5DMSxuQzIsbkMzIG5vZGVDbGFzcwogICAgY2xhc3MgZ00sZ0hlYXAsZ0cxLGdHMixnRzMgZ29DbGFzcwogICAgY2xhc3Mgbk5vdGUsZ05vdGUgbm90ZUNsYXNz" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBOb2RlWyI8Yj5Ob2RlIOKAlCBTdGFja2xlc3MgQ29yb3V0aW5lPC9iPiJdCiAgICAgICAgZGlyZWN0aW9uIFRCCiAgICAgICAgblN0YWNrWyI8Yj5KUyBDYWxsIFN0YWNrPC9iPjxici8-KG9uZSBmcmFtZSBhdCBhIHRpbWUpPGJyLz7ilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIE8YnIvPuKaoCA8Yj5jdXJyZW50bHkgZW1wdHk8L2I-PGJyLz4oYWxsIGFzeW5jIGZucyBwb3BwZWQsPGJyLz53YWl0aW5nIGluIGV2ZW50IGxvb3ApIl0KICAgICAgICBuSGVhcFsiPGI-SGVhcDwvYj4iXQogICAgICAgIG5DMVsiPGI-Y29udGludWF0aW9uICMxPC9iPjxici8-eyBzdGF0ZTogMSw8YnIvPiZuYnNwOyZuYnNwO2xvY2Fsczoge3JlcSwgcmVzLCBhfSw8YnIvPiZuYnNwOyZuYnNwO3N0ZXA6IGZuIHB0ciB9Il0KICAgICAgICBuQzJbIjxiPmNvbnRpbnVhdGlvbiAjMjwvYj48YnIvPnsgc3RhdGU6IDAsIC4uLiB9Il0KICAgICAgICBuQzNbIjxiPmNvbnRpbnVhdGlvbiAjMzwvYj48YnIvPnsgc3RhdGU6IDIsIC4uLiB9Il0KICAgICAgICBuSGVhcCAtLT4gbkMxCiAgICAgICAgbkhlYXAgLS0-IG5DMgogICAgICAgIG5IZWFwIC0tPiBuQzMKICAgICAgICBuTm90ZVsiPGI-RWFjaCA8Y29kZT5hd2FpdDwvY29kZT4gcG9wcyB0aGUgZnJhbWUuPC9iPjxici8-U3RhdGUgbGl2ZX
Mgb25seSBpbiBoZWFwIGNsb3N1cmVzLjxici8-U3RhY2sgaXMgcmV1c2VkIGFjcm9zcyBhbGwgdHVybnMuIl0KICAgIGVuZAoKICAgIHN1YmdyYXBoIEdvWyI8Yj5HbyDigJQgU3RhY2tmdWwgQ29yb3V0aW5lPC9iPiJdCiAgICAgICAgZGlyZWN0aW9uIFRCCiAgICAgICAgZ01bIjxiPk9TIFRocmVhZCAoTSk8L2I-PGJyLz5jdXJyZW50bHkgcnVubmluZyBHMyDilrYiXQogICAgICAgIGdIZWFwWyI8Yj5IZWFwPC9iPiJdCiAgICAgICAgZ0cxWyI8Yj5nb3JvdXRpbmUgRzE8L2I-ICgyIEtCIHN0YWNrKTxici8-4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSB4pSBPGJyLz5wcm9jZXNzKCk8YnIvPiZuYnNwOyZuYnNwO-KGsyBzbG93RG91YmxlKCk8YnIvPiZuYnNwOyZuYnNwOyZuYnNwOyZuYnNwO-KGsyB0aW1lLlNsZWVwKCkg4piFcGFya2VkIl0KICAgICAgICBnRzJbIjxiPmdvcm91dGluZSBHMjwvYj4gKDIgS0Igc3RhY2spPGJyLz7ilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIE8YnIvPmhhbmRsZXIoKTxici8-Jm5ic3A7Jm5ic3A74oazIGRiLlF1ZXJ5KCkg4piFcGFya2VkIl0KICAgICAgICBnRzNbIjxiPmdvcm91dGluZSBHMzwvYj4gKDIgS0Igc3RhY2spPGJyLz7ilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIHilIE8YnIvPmN1cnJlbnRseSBvbiBNIOKWtiJdCiAgICAgICAgZ0hlYXAgLS0-IGdHMQogICAgICAgIGdIZWFwIC0tPiBnRzIKICAgICAgICBnSGVhcCAtLT4gZ0czCiAgICAgICAgZ05vdGVbIjxiPkVhY2ggZ29yb3V0aW5lIG93bnMgaXRzIGZ1bGwgc3RhY2suPC9iPjxici8-UnVudGltZSBzYXZlcy9yZXN0b3JlcyBlbnRpcmUgc3RhY2s8YnIvPm9uIHN1c3BlbmQuIE5vIGZyYW1lIHBvcCBuZWVkZWQuIl0KICAgIGVuZAoKICAgIE5vZGUgfn5-IEdvCgogICAgY2xhc3NEZWYgbm9kZUFsZXJ0IGZpbGw6I2ZlZTJlMixzdHJva2U6I2RjMjYyNixzdHJva2Utd2lkdGg6M3B4LGNvbG9yOiM3ZjFkMWQKICAgIGNsYXNzRGVmIG5vZGVDbGFzcyBmaWxsOiNmZWYzYzcsc3Ryb2tlOiNkOTc3MDYsY29sb3I6IzExMTgyNwogICAgY2xhc3NEZWYgZ29DbGFzcyBmaWxsOiNkYmVhZmUsc3Ryb2tlOiMyNTYzZWIsY29sb3I6IzExMTgyNwogICAgY2xhc3NEZWYgbm90ZUNsYXNzIGZpbGw6I2ZmZmZmZixzdHJva2U6IzM3NDE1MSxzdHJva2Utd2lkdGg6MS41cHgsY29sb3I6IzExMTgyNwoKICAgIGNsYXNzIG5TdGFjayBub2RlQWxlcnQKICAgIGNsYXNzIG5IZWFwLG5DMSxuQzIsbkMzIG5vZGVDbGFzcwogICAgY2xhc3MgZ00sZ0hlYXAsZ0cxLGdHMixnRzMgZ29DbGFzcwogICAgY2xhc3Mgbk5vdGUsZ05vdGUgbm90ZUNsYXNz" alt="flowchart LR" width="1904" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In Node, the JS call stack is shared and almost always near-empty — every async function in flight has already popped, with its state sitting in a heap closure. In Go, every goroutine owns its full call chain on its own heap-allocated stack; suspended goroutines look like frozen frames waiting for the runtime to resume them on some OS thread.&lt;/p&gt;

&lt;p&gt;This is also why neither language can simply borrow the other's model. &lt;strong&gt;Node runs on V8&lt;/strong&gt;, which was designed in 2008 for browser JS — single call stack, synchronous semantics, no concept of saving stacks across yields. Adding stackful coroutines would mean rewriting the engine, which is roughly what Java's Project Loom did to the JVM at huge cost. &lt;strong&gt;Go was designed from scratch&lt;/strong&gt; with a runtime that owns stacks, can grow them, and can save them. The choice is locked in by runtime architecture, not language taste.&lt;/p&gt;




&lt;h2&gt;
  
  
  What "User-Space" Actually Buys You
&lt;/h2&gt;

&lt;p&gt;The slogan only matters if user-space context switching is meaningfully cheaper than the kernel-mediated kind. It is — by more than an order of magnitude.&lt;/p&gt;

&lt;p&gt;The benchmark has two halves. On the Go side: two goroutines pinned to one OS thread (&lt;code&gt;GOMAXPROCS=1&lt;/code&gt;), ping-ponging first via &lt;code&gt;runtime.Gosched()&lt;/code&gt; and then via an unbuffered channel. On the kernel side: two pthreads pinned to one core (&lt;code&gt;taskset -c 0&lt;/code&gt;), ping-ponging via &lt;code&gt;pthread_mutex&lt;/code&gt; + &lt;code&gt;pthread_cond&lt;/code&gt;. (Reproduction code at the end of the post.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Measured on Intel N100, Ubuntu 24.04 (kernel 6.8.0), Go 1.23.4, gcc 13.3:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;ns / switch&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Goroutine yield (&lt;code&gt;runtime.Gosched&lt;/code&gt;, GOMAXPROCS=1)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~102 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Goroutine round-trip via unbuffered channel&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~436 ns&lt;/strong&gt; (≈218 ns per G-switch + channel coordination)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pthread switch (mutex+cond ping-pong, single core)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~2,900 ns&lt;/strong&gt; (range 2,818–3,611 across 5 runs of 2M iterations)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Ratio: roughly &lt;strong&gt;28× cheaper&lt;/strong&gt; for the bare scheduler yield, &lt;strong&gt;~13× cheaper&lt;/strong&gt; for the apples-to-apples synchronized round-trip.&lt;/p&gt;
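&lt;p&gt;For anyone checking the arithmetic, the two ratios fall straight out of the table. The snippet below only re-derives them from the rounded figures above; the one assumption it encodes is that a channel round trip counts as two goroutine switches, which is how the 218 ns figure in the table is obtained.&lt;/p&gt;

```typescript
// Rounded figures copied from the table above, in nanoseconds.
const goschedYieldNs = 102;
const channelRoundTripNs = 436;
const pthreadSwitchNs = 2900;

// One channel round trip is counted as two goroutine switches.
const channelPerSwitchNs = channelRoundTripNs / 2;

const yieldRatio = pthreadSwitchNs / goschedYieldNs;
const channelRatio = pthreadSwitchNs / channelPerSwitchNs;

console.log(yieldRatio.toFixed(1), channelRatio.toFixed(1));
```

&lt;p&gt;That gives 28.4 and 13.3, which round to the 28× and 13× quoted above.&lt;/p&gt;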

&lt;p&gt;Where the gap comes from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mode switch.&lt;/strong&gt; The user → kernel → user round-trip alone is ~100 ns of entry/exit and ABI-mandated register save/restore. A goroutine switch never crosses that line.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduler work in kernel space.&lt;/strong&gt; Linux's fair scheduler (CFS, replaced by EEVDF in kernel 6.6, so EEVDF on the 6.8 kernel measured here) keeps runnable threads in a red-black tree on per-CPU runqueues protected by locks. The Go scheduler does the same job in user space with per-P local runqueues and lock-free fast paths, and skips the kernel locks entirely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache and TLB effects.&lt;/strong&gt; A kernel scheduler may migrate a thread to a different core, costing you cold L1/L2 and an instruction-cache reload. (The pinned benchmark above rules migration out, so this cost shows up in production rather than in these numbers.) Goroutines normally stay on the same M, so the cache stays warm.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What the model does &lt;em&gt;not&lt;/em&gt; buy you: a goroutine that makes a real blocking syscall still pays for a real OS thread switch — the runtime detaches the G from its M and may spin up another M so the rest of the goroutines keep running. Async preemption (Go 1.14+, signal-based) is the runtime's answer to tight loops that never yield, and it has its own cost. Once you saturate &lt;code&gt;GOMAXPROCS&lt;/code&gt;, the user-space runqueue itself starts to show up in profiles.&lt;/p&gt;

&lt;p&gt;The "user-space-ification" buys you &lt;strong&gt;cheap G-to-G switching on a hot M.&lt;/strong&gt; That's where the order-of-magnitude lives. The syscalls, the M-to-M handoffs, the actual kernel work — those are still as expensive as they always were. The model wins by making the &lt;em&gt;common case&lt;/em&gt; — many concurrent goroutines, mostly waiting, occasionally running — almost free.&lt;/p&gt;

&lt;p&gt;(The N100 is a low-power Alder Lake-N part built entirely from E-cores; absolute numbers will be smaller on a server-class Xeon or EPYC, but the ratio is expected to hold.)&lt;/p&gt;




&lt;h2&gt;
  
  
  The Unit of Scheduling
&lt;/h2&gt;

&lt;p&gt;The cleanest comparison is to ask what each runtime actually schedules:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Node / TypeScript&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Go&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Unit of scheduling&lt;/td&gt;
&lt;td&gt;callback / Promise continuation&lt;/td&gt;
&lt;td&gt;goroutine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;What's captured at suspension&lt;/td&gt;
&lt;td&gt;tail of an async function as a heap closure&lt;/td&gt;
&lt;td&gt;full call stack + registers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;How code looks&lt;/td&gt;
&lt;td&gt;explicit &lt;code&gt;async&lt;/code&gt;/&lt;code&gt;await&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;straight-line synchronous&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Suspension marked by&lt;/td&gt;
&lt;td&gt;the programmer (&lt;code&gt;await&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;the runtime (any blocking op)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Suspended state lives in&lt;/td&gt;
&lt;td&gt;V8 microtask queue + heap closure&lt;/td&gt;
&lt;td&gt;goroutine stack on the user-space heap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kernel involvement&lt;/td&gt;
&lt;td&gt;epoll/kqueue/IOCP via libuv&lt;/td&gt;
&lt;td&gt;epoll/kqueue/IOCP via netpoller&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU parallelism&lt;/td&gt;
&lt;td&gt;one main JS thread; needs workers/cluster for cores&lt;/td&gt;
&lt;td&gt;M:N scheduler runs goroutines across cores natively&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Function color&lt;/td&gt;
&lt;td&gt;yes (&lt;code&gt;async&lt;/code&gt; propagates to every caller)&lt;/td&gt;
&lt;td&gt;no (any function may block)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;What breaks under CPU load&lt;/td&gt;
&lt;td&gt;the entire event loop&lt;/td&gt;
&lt;td&gt;little while cores are free: the scheduler runs other Gs on other Ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The two columns describe deeply different mental models, but they belong to the same family. They are both &lt;em&gt;user-space concurrency runtimes that avoid kernel thread-per-request.&lt;/em&gt; They differ in where the suspension is captured (the language vs. the call stack) and how broad the scheduler's mandate is.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where the Boundaries Diverge: CPU-Bound Work
&lt;/h2&gt;

&lt;p&gt;Node and Go look interchangeable on I/O-bound workloads. They diverge sharply the moment CPU work enters the picture.&lt;/p&gt;

&lt;p&gt;Node's event loop has one job: dispatch ready callbacks onto a single JS thread. If a callback runs for 200 ms doing JSON parsing or hashing, the loop is &lt;em&gt;frozen&lt;/em&gt; for those 200 ms. Every other suspended continuation has to wait. Throughput collapses.&lt;/p&gt;

&lt;p&gt;Go's runtime has a different mandate. It doesn't only manage waiting — it also manages execution. If you spawn:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;task1&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;task2&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;task3&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…the scheduler is happy to put each goroutine on a different M, run them on different cores in true parallel, and preempt long-running goroutines so they don't starve the rest of the runtime. CPU-bound goroutines aren't a special case to work around. They're just goroutines.&lt;/p&gt;
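&lt;p&gt;A minimal sketch of that point, not taken from the benchmark: &lt;code&gt;spin&lt;/code&gt; and &lt;code&gt;runAll&lt;/code&gt; are hypothetical names, and &lt;code&gt;spin&lt;/code&gt; is only a stand-in for real CPU work such as hashing or parsing. Each task gets its own goroutine, and the runtime is free to spread them over as many cores as &lt;code&gt;GOMAXPROCS&lt;/code&gt; allows.&lt;/p&gt;

```go
package main

import (
    "fmt"
    "runtime"
    "sync"
)

// spin is a stand-in for CPU-bound work (hashing, parsing, encoding).
// It assumes n is non-negative.
func spin(n int) int {
    sum := 0
    for i := 0; i != n; i++ {
        sum += i % 7
    }
    return sum
}

// runAll starts one goroutine per task and waits for all of them.
// Each goroutine writes only its own slot, so no lock is needed.
func runAll(sizes []int) []int {
    results := make([]int, len(sizes))
    var wg sync.WaitGroup
    for i, n := range sizes {
        wg.Add(1)
        go func(i, n int) {
            defer wg.Done()
            results[i] = spin(n)
        }(i, n)
    }
    wg.Wait()
    return results
}

func main() {
    // GOMAXPROCS(0) reports the current setting without changing it.
    fmt.Println("Ps available:", runtime.GOMAXPROCS(0))
    fmt.Println(runAll([]int{1000000, 2000000, 3000000}))
}
```

&lt;p&gt;Nothing in &lt;code&gt;runAll&lt;/code&gt; says whether a task is CPU-bound or I/O-bound. The same shape would work if &lt;code&gt;spin&lt;/code&gt; made a network call instead.&lt;/p&gt;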

&lt;p&gt;That's why Go's concurrency model covers more ground:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Node's model mainly solves non-CPU-bound concurrency — network I/O, database waits, downstream API calls. Go's model solves I/O waiting &lt;em&gt;and&lt;/em&gt; CPU parallelism with the same primitive.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This isn't a knock on Node. The event loop is brilliant at what it's designed for: lots of slow waits, light per-request CPU. It's the natural shape of API gateways, BFFs, websocket hubs, real-time aggregation, and most of the JSON-shuffling that makes up modern web backends. But sustained CPU work, mixed CPU + I/O pipelines, long-lived infrastructure services — those are workloads where Go's scheduler-driven model has more headroom built in.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Answers to the Same Question
&lt;/h2&gt;

&lt;p&gt;Strip away the implementation details and the two runtimes are answering the same question with different abstractions:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Concurrency at scale is the problem of what to do with the CPU while a request waits on I/O.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Node's answer: turn the wait into an event, capture the rest of the function as a continuation, resume the continuation when the event fires. One thread cycling through ready continuations.&lt;/p&gt;

&lt;p&gt;Go's answer: run the request on a goroutine, suspend the goroutine in user space when it blocks, schedule another runnable goroutine onto the OS thread, resume the original when its wait completes.&lt;/p&gt;

&lt;p&gt;Two ways of solving the same waste. One state-machines it. The other lowers the cost of context switching far enough that you can afford to keep one execution flow per request.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Two answers to one question: one is events, implemented as a state machine. The other is low-cost user-space context switching.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But there's a deeper layer worth surfacing. The two answers also disagree about &lt;em&gt;whether suspension should be visible in the type system.&lt;/em&gt; Node says yes — &lt;code&gt;Promise&amp;lt;T&amp;gt;&lt;/code&gt; is part of the signature, &lt;code&gt;async&lt;/code&gt; is part of the contract, function color propagates. Go says no — any function may block, and the type doesn't carry that information.&lt;/p&gt;

&lt;p&gt;This visibility-vs-uniformity trade-off shows up far beyond Node and Go. It's the same shape as Haskell's monadic IO vs the implicit side effects of most other languages, checked vs unchecked exceptions in Java, capability-based security vs ambient authority. Each pair makes the same trade: composable static reasoning vs ergonomic uniform code. Node and Go are picking sides of a much bigger question.&lt;/p&gt;

&lt;p&gt;You see the consequence in the libraries. Node libraries publish &lt;code&gt;fs.readFile&lt;/code&gt; &lt;em&gt;and&lt;/em&gt; &lt;code&gt;fs.readFileSync&lt;/code&gt;, two retry helpers (one for sync ops, one for async), &lt;code&gt;p-limit&lt;/code&gt;-style bounded-concurrency wrappers around &lt;code&gt;Promise.all&lt;/code&gt;. Go libraries publish &lt;code&gt;os.ReadFile&lt;/code&gt; (one function), one &lt;code&gt;Retry(op func() error, n int) error&lt;/code&gt;, twenty lines of &lt;code&gt;chan&lt;/code&gt; + &lt;code&gt;WaitGroup&lt;/code&gt; for bounded concurrency. The Go versions aren't simpler because Go developers are smarter — they're simpler because the runtime hides the same complexity that Node's type system insists on exposing.&lt;/p&gt;
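&lt;p&gt;To make the single-helper claim concrete, here is one possible body for the &lt;code&gt;Retry&lt;/code&gt; signature quoted above. The signature comes from the text; the body, the at-least-one-attempt behavior, and the &lt;code&gt;errTransient&lt;/code&gt; value are assumptions made for illustration, and there is no backoff.&lt;/p&gt;

```go
package main

import (
    "errors"
    "fmt"
)

// errTransient is a hypothetical error used only by this demo.
var errTransient = errors.New("transient failure")

// Retry calls op until it succeeds or n attempts have been made.
// op always runs at least once, even if n is zero or negative.
// It returns nil on success, otherwise the last error from op.
func Retry(op func() error, n int) error {
    for attempt := 1; ; attempt++ {
        err := op()
        if err == nil || attempt >= n {
            return err
        }
    }
}

func main() {
    calls := 0
    // flaky fails on its first two calls and succeeds on the third.
    flaky := func() error {
        calls++
        if calls != 3 {
            return errTransient
        }
        return nil
    }
    err := Retry(flaky, 5)
    fmt.Println("calls:", calls, "err:", err)
}
```

&lt;p&gt;Because &lt;code&gt;op&lt;/code&gt; is a plain &lt;code&gt;func() error&lt;/code&gt;, the same helper wraps a file read, an HTTP call, or pure computation. Whether &lt;code&gt;op&lt;/code&gt; blocks is not part of its type.&lt;/p&gt;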




&lt;h2&gt;
  
  
  The Closing Line
&lt;/h2&gt;

&lt;p&gt;If you remember one thing from this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Node turns waiting into events. Go turns execution flows into schedulable units. Both refuse to let the CPU sit idle while I/O blocks — they just disagree on what the unit of scheduling should be.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Or, if you want the deeper layer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Node makes "this function might suspend" visible at the type level. Go makes it invisible.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's the whole story. Everything else — &lt;code&gt;await&lt;/code&gt; vs &lt;code&gt;go&lt;/code&gt;, libuv vs the netpoller, V8's microtask queue vs GMP, single-thread bottleneck vs CPU-bound resilience, libraries that look complicated vs libraries that look simple — falls out of that one disagreement.&lt;/p&gt;




&lt;h2&gt;
  
  
  Appendix: Reproduce the Benchmark
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;goroutine_switch_test.go&lt;/code&gt;&lt;/strong&gt; — &lt;code&gt;GOMAXPROCS=1 go test -bench=. -benchtime=5s -count=5&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;bench&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"runtime"&lt;/span&gt;
    &lt;span class="s"&gt;"sync"&lt;/span&gt;
    &lt;span class="s"&gt;"testing"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// Channel ping-pong: each iter is a full round-trip = 2 G-switches.&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;BenchmarkGoroutineSwitchChannel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="n"&gt;done&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;done&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ch&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{}{}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResetTimer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{}{}&lt;/span&gt;
        &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ch&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StopTimer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nb"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;done&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Bare scheduler yield. Each iter ≈ 1 G-switch.&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;BenchmarkGoroutineSwitchGosched&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WaitGroup&lt;/span&gt;
    &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;half&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;half&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Gosched&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResetTimer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;half&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Gosched&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Wait&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;pthread_switch.c&lt;/code&gt;&lt;/strong&gt; — &lt;code&gt;gcc -O2 -o pthread_switch pthread_switch.c -lpthread &amp;amp;&amp;amp; taskset -c 0 ./pthread_switch 2000000&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#define _GNU_SOURCE
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;pthread.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdlib.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;time.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdint.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;pthread_mutex_t&lt;/span&gt; &lt;span class="n"&gt;mu&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PTHREAD_MUTEX_INITIALIZER&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;pthread_cond_t&lt;/span&gt;  &lt;span class="n"&gt;cv&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PTHREAD_COND_INITIALIZER&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;volatile&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;    &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt;            &lt;span class="n"&gt;iters&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nf"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;my_turn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="kt"&gt;intptr_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;pthread_mutex_lock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;iters&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;my_turn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;pthread_cond_wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;cv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;my_turn&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;pthread_cond_broadcast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;cv&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;pthread_mutex_unlock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="nf"&gt;now_ns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;timespec&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;clock_gettime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CLOCK_MONOTONIC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;double&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tv_sec&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;1e9&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;double&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tv_nsec&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;argc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;iters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argc&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="n"&gt;atol&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000000L&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;pthread_t&lt;/span&gt; &lt;span class="n"&gt;t0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;now_ns&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;pthread_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;t0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="kt"&gt;intptr_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;pthread_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="kt"&gt;intptr_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;pthread_join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;pthread_join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;now_ns&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ns / switch: %.1f&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;iters&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;GOMAXPROCS=1&lt;/code&gt; gives the runtime a single P, so only one goroutine executes at a time and we measure pure G-to-G switching, not cross-core migration. &lt;code&gt;taskset -c 0&lt;/code&gt; pins both pthreads to one CPU so they actually have to context-switch (otherwise they run in parallel on two cores and there is nothing to measure). Both benches do the simplest possible hand-off, with no I/O and no real work, so what is left is the cost of the switch itself.&lt;/p&gt;

</description>
      <category>go</category>
      <category>node</category>
      <category>concurrency</category>
      <category>javascript</category>
    </item>
    <item>
      <title>gRPC Interceptors in Production: Design Patterns That Survive Real Load</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Mon, 20 Apr 2026 17:02:20 +0000</pubDate>
      <link>https://dev.to/harrisonsec/grpc-interceptors-in-production-design-patterns-that-survive-real-load-372h</link>
      <guid>https://dev.to/harrisonsec/grpc-interceptors-in-production-design-patterns-that-survive-real-load-372h</guid>
      <description>&lt;p&gt;gRPC interceptors are the middleware pattern, specialized for gRPC. If you've written HTTP middleware before, the shape is familiar — a function that wraps a call, can observe or modify the request, pass to the next handler, then observe or modify the response. The difference: gRPC's type system makes the flavors (unary, server-stream, client-stream, bidi) explicit, and chain ordering matters more than most people realize.&lt;/p&gt;

&lt;p&gt;Most online examples show a single toy interceptor. Production systems stack five to ten of them per service. Getting the composition right — ordering, concern separation, testability — is half of running a gRPC-based microservice well.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — gRPC interceptors are middleware with more explicit types. Chain them outside-in: observability wraps everything, then throttling, then auth, then retry, then the actual service. Keep each interceptor focused on one concern; the moment an interceptor does two things you're writing coupled middleware. Stream interceptors are trickier than unary — don't copy-paste unary logic into stream without thinking. Test the chain composition with bufconn, not just each interceptor in isolation.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Four Interceptor Types
&lt;/h2&gt;

&lt;p&gt;gRPC has four interceptor signatures, two for client, two for server:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unary server interceptor&lt;/strong&gt;: wraps a single request → single response call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stream server interceptor&lt;/strong&gt;: wraps streaming RPCs (server-stream, client-stream, bidi).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unary client interceptor&lt;/strong&gt;: wraps the client side of a unary call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stream client interceptor&lt;/strong&gt;: wraps the client side of a streaming call.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unary interceptors are easy. Stream interceptors are harder because you're wrapping a bidirectional wire, not a single call.&lt;/p&gt;

&lt;p&gt;Example unary server interceptor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;loggingInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;
    &lt;span class="n"&gt;info&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryServerInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryHandler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"method=%s duration=%s err=%v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FullMethod&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Since&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loggingInterceptor&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Straightforward. Now stack five of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Chaining and Order
&lt;/h2&gt;

&lt;p&gt;Real services need multiple interceptors. grpc-go provides &lt;code&gt;grpc.ChainUnaryInterceptor(...)&lt;/code&gt; for this (added in v1.28; older code uses &lt;code&gt;ChainUnaryServer&lt;/code&gt; from v1 of &lt;code&gt;github.com/grpc-ecosystem/go-grpc-middleware&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ChainUnaryInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;observabilityInterceptor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c"&gt;// outermost&lt;/span&gt;
        &lt;span class="n"&gt;rateLimitInterceptor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;authInterceptor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;validationInterceptor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;businessLogicContextInterceptor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c"&gt;// innermost&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Chain order matters &lt;em&gt;enormously&lt;/em&gt;. Interceptors execute outside-in on the way to the handler, inside-out on the way back. Put the wrong interceptor outside the wrong one and you get bugs that are hard to debug.&lt;/p&gt;
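&lt;p&gt;To make the ordering concrete, here is a minimal stdlib-only sketch. &lt;code&gt;Handler&lt;/code&gt;, &lt;code&gt;Interceptor&lt;/code&gt;, &lt;code&gt;Chain&lt;/code&gt;, &lt;code&gt;Tracer&lt;/code&gt;, and &lt;code&gt;RunOrderDemo&lt;/code&gt; are hypothetical stand-ins for illustration, not grpc-go types; the wrapping loop is an assumption about how a chain helper can be built, not grpc-go's actual source.&lt;/p&gt;

```go
package main

// Handler and Interceptor are simplified stand-ins for grpc.UnaryHandler and
// grpc.UnaryServerInterceptor (hypothetical, stdlib-only model).
type Handler func(req string) (string, error)

type Interceptor func(req string, next Handler) (string, error)

// Chain wraps h so that the first interceptor listed is the outermost one.
func Chain(h Handler, interceptors ...Interceptor) Handler {
	// Wrap from the last interceptor to the first.
	for i := len(interceptors) - 1; i >= 0; i-- {
		ic, next := interceptors[i], h
		h = func(req string) (string, error) { return ic(req, next) }
	}
	return h
}

// Tracer returns an interceptor that records when it is entered and exited.
func Tracer(name string, events *[]string) Interceptor {
	return func(req string, next Handler) (string, error) {
		*events = append(*events, name+" in")
		resp, err := next(req)
		*events = append(*events, name+" out")
		return resp, err
	}
}

// RunOrderDemo sends one request through three tracers and returns the
// recorded event order.
func RunOrderDemo() []string {
	events := new([]string)
	handler := func(req string) (string, error) {
		*events = append(*events, "handler")
		return "ok", nil
	}
	h := Chain(handler,
		Tracer("observability", events),
		Tracer("rateLimit", events),
		Tracer("auth", events),
	)
	_, _ = h("ping")
	return *events
}
```

&lt;p&gt;&lt;code&gt;RunOrderDemo()&lt;/code&gt; returns the events in this order: observability in, rateLimit in, auth in, handler, auth out, rateLimit out, observability out. The first interceptor listed is the first to see the request and the last to see the response.&lt;/p&gt;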

&lt;p&gt;Canonical order I use:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBDbGllbnQoW2dSUEMgY2xpZW50XSkgLS0-IEkxCiAgICBJMVsiT2JzZXJ2YWJpbGl0eTxici8-dHJhY2luZyDCtyBtZXRyaWNzIMK3IGxvZ2dpbmciXSAtLT4gSTIKICAgIEkyWyJSYXRlIGxpbWl0aW5nIC8gcXVvdGEiXSAtLT4gSTMKICAgIEkzWyJBdXRoPGJyLz5hdXRobiDCtyBhdXRoeiJdIC0tPiBJNAogICAgSTRbIlZhbGlkYXRpb24iXSAtLT4gSTUKICAgIEk1WyJSZXRyeSAvIGlkZW1wb3RlbmN5Il0gLS0-IEk2CiAgICBJNlsiQ29udGV4dCBlbnJpY2htZW50Il0gLS0-IEhhbmRsZXJ7eyJCdXNpbmVzcyBoYW5kbGVyIn19CgogICAgY2xhc3NEZWYgb3V0ZXIgZmlsbDojZmVmNWU3LHN0cm9rZTojYjc3OTFmCiAgICBjbGFzc0RlZiBtaWQgZmlsbDojZThmNGY4LHN0cm9rZTojMmM1MjgyCiAgICBjbGFzc0RlZiBpbm5lciBmaWxsOiNmMGZmZjQsc3Ryb2tlOiMyZjg1NWEKICAgIGNsYXNzIEkxIG91dGVyCiAgICBjbGFzcyBJMixJMyxJNCBtaWQKICAgIGNsYXNzIEk1LEk2IGlubmVy" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBDbGllbnQoW2dSUEMgY2xpZW50XSkgLS0-IEkxCiAgICBJMVsiT2JzZXJ2YWJpbGl0eTxici8-dHJhY2luZyDCtyBtZXRyaWNzIMK3IGxvZ2dpbmciXSAtLT4gSTIKICAgIEkyWyJSYXRlIGxpbWl0aW5nIC8gcXVvdGEiXSAtLT4gSTMKICAgIEkzWyJBdXRoPGJyLz5hdXRobiDCtyBhdXRoeiJdIC0tPiBJNAogICAgSTRbIlZhbGlkYXRpb24iXSAtLT4gSTUKICAgIEk1WyJSZXRyeSAvIGlkZW1wb3RlbmN5Il0gLS0-IEk2CiAgICBJNlsiQ29udGV4dCBlbnJpY2htZW50Il0gLS0-IEhhbmRsZXJ7eyJCdXNpbmVzcyBoYW5kbGVyIn19CgogICAgY2xhc3NEZWYgb3V0ZXIgZmlsbDojZmVmNWU3LHN0cm9rZTojYjc3OTFmCiAgICBjbGFzc0RlZiBtaWQgZmlsbDojZThmNGY4LHN0cm9rZTojMmM1MjgyCiAgICBjbGFzc0RlZiBpbm5lciBmaWxsOiNmMGZmZjQsc3Ryb2tlOiMyZjg1NWEKICAgIGNsYXNzIEkxIG91dGVyCiAgICBjbGFzcyBJMixJMyxJNCBtaWQKICAgIGNsYXNzIEk1LEk2IGlubmVy" alt="Client([gRPC client]) --&gt; I1" width="1784" height="94"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Observability must wrap everything so that it sees every rejection, every rate-limit hit, and every failed auth; otherwise you have operational blind spots. Details:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Observability (tracing + metrics + logging)&lt;/strong&gt; — outermost. You want to see every request, including the ones that get rejected by later interceptors. If observability is inside auth, unauth'd attempts are invisible — a security-relevant blind spot.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rate limiting / quota&lt;/strong&gt; — before auth. Why? Because auth involves token verification (DB lookup, JWT parsing, external identity service), and you don't want unauthenticated requests to cost you CPU. Rate-limit first, authenticate second.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Auth (authentication + authorization)&lt;/strong&gt; — before business logic. Reject unauthenticated/unauthorized requests early.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Validation (request shape, basic sanity)&lt;/strong&gt; — before business logic. Catches malformed requests before they hit service code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Retry / idempotency handling&lt;/strong&gt; — closer to business. Only retry what actually made it through auth.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Request context enrichment (trace IDs, user metadata)&lt;/strong&gt; — innermost. Populate context with validated data for the service to use.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Inverted order produces real bugs. I've seen auth outside observability (auth failures weren't logged). Retry outside rate limiter (a retry storm blew through the rate limit). Validation outside observability (validation failures invisible in metrics). Each one a real incident.&lt;/p&gt;
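&lt;p&gt;The auth-outside-observability blind spot is easy to reproduce in a stdlib-only model. Everything here (&lt;code&gt;Chain&lt;/code&gt;, &lt;code&gt;Auth&lt;/code&gt;, &lt;code&gt;Observe&lt;/code&gt;, &lt;code&gt;Stats&lt;/code&gt;, &lt;code&gt;SeenWhenRejected&lt;/code&gt;) is a hypothetical simplification, not the code from those incidents: one unauthenticated request goes through the chain, and we check what the observability interceptor recorded.&lt;/p&gt;

```go
package main

import "errors"

// Simplified stand-ins for gRPC handler and interceptor types (hypothetical).
type Handler func(token string) error

type Interceptor func(token string, next Handler) error

// Chain makes the first interceptor listed the outermost one.
func Chain(h Handler, interceptors ...Interceptor) Handler {
	for i := len(interceptors) - 1; i >= 0; i-- {
		ic, next := interceptors[i], h
		h = func(token string) error { return ic(token, next) }
	}
	return h
}

var ErrUnauthenticated = errors.New("unauthenticated")

// Auth rejects empty tokens without ever calling next.
func Auth(token string, next Handler) error {
	if token == "" {
		return ErrUnauthenticated
	}
	return next(token)
}

// Stats is what the observability layer gets to see.
type Stats struct{ Seen, Failed int }

// Observe counts every request that reaches it and every failure it sees.
func Observe(s *Stats) Interceptor {
	return func(token string, next Handler) error {
		s.Seen++
		err := next(token)
		if err != nil {
			s.Failed++
		}
		return err
	}
}

// SeenWhenRejected sends one unauthenticated request and reports what
// observability recorded, with observability outside or inside auth.
func SeenWhenRejected(observabilityOutside bool) Stats {
	s := new(Stats)
	ok := func(string) error { return nil }
	var h Handler
	if observabilityOutside {
		h = Chain(ok, Observe(s), Auth)
	} else {
		h = Chain(ok, Auth, Observe(s))
	}
	_ = h("") // empty token: auth rejects it
	return *s
}
```

&lt;p&gt;With observability outside, the rejected request is counted as seen and failed. With observability inside auth, both counters stay at zero: the request never reaches the interceptor that would have recorded it.&lt;/p&gt;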

&lt;h2&gt;
  
  
  Keeping Interceptors Focused
&lt;/h2&gt;

&lt;p&gt;The rule: &lt;strong&gt;one concern per interceptor&lt;/strong&gt;. The moment you have an "auth-and-logging" interceptor, you're coupling concerns that should evolve separately.&lt;/p&gt;

&lt;p&gt;Concretely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don't: single "observability" interceptor that does tracing, metrics, and logging in one function.&lt;/li&gt;
&lt;li&gt;Do: three interceptors (&lt;code&gt;tracingInterceptor&lt;/code&gt;, &lt;code&gt;metricsInterceptor&lt;/code&gt;, &lt;code&gt;loggingInterceptor&lt;/code&gt;), chained.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cost: three function-call overheads instead of one. Marginal.&lt;/p&gt;

&lt;p&gt;Benefit: you can swap tracing backends without touching logging. You can disable metrics in tests without disabling tracing. Each interceptor is testable in isolation.&lt;/p&gt;

&lt;p&gt;This is the same argument as the one for Unix pipes over monolithic commands: composition beats monoliths.&lt;/p&gt;
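&lt;p&gt;As a rough sketch of what that composition buys you, the chain can be assembled from independent pieces and a small options struct. &lt;code&gt;Options&lt;/code&gt;, &lt;code&gt;Stamp&lt;/code&gt;, and &lt;code&gt;Build&lt;/code&gt; are hypothetical names, and &lt;code&gt;Stamp&lt;/code&gt; only marks the request so the effect is visible; real tracing, metrics, and logging interceptors would take its place.&lt;/p&gt;

```go
package main

// Simplified stand-ins for gRPC handler and interceptor types (hypothetical).
type Handler func(req string) string

type Interceptor func(req string, next Handler) string

// Chain makes the first interceptor listed the outermost one.
func Chain(h Handler, interceptors ...Interceptor) Handler {
	for i := len(interceptors) - 1; i >= 0; i-- {
		ic, next := interceptors[i], h
		h = func(req string) string { return ic(req, next) }
	}
	return h
}

// Stamp returns a single-concern interceptor that appends its name to the
// request, so the final response shows which interceptors ran and in what order.
func Stamp(name string) Interceptor {
	return func(req string, next Handler) string { return next(req + "/" + name) }
}

// Options toggles each concern independently.
type Options struct{ Tracing, Metrics, Logging bool }

// Build assembles only the enabled interceptors around an echo handler.
func Build(o Options) Handler {
	var chain []Interceptor
	if o.Tracing {
		chain = append(chain, Stamp("tracing"))
	}
	if o.Metrics {
		chain = append(chain, Stamp("metrics"))
	}
	if o.Logging {
		chain = append(chain, Stamp("logging"))
	}
	echo := func(req string) string { return req }
	return Chain(echo, chain...)
}
```

&lt;p&gt;A test setup can then call &lt;code&gt;Build(Options{Tracing: true, Logging: true})&lt;/code&gt; and get tracing and logging without metrics, with no change to any interceptor's code.&lt;/p&gt;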

&lt;h2&gt;
  
  
  Common Interceptor Recipes
&lt;/h2&gt;

&lt;p&gt;Real interceptors I've written variants of many times:&lt;/p&gt;

&lt;h3&gt;
  
  
  Tracing (OpenTelemetry)
&lt;/h3&gt;

&lt;p&gt;Use the &lt;code&gt;otelgrpc&lt;/code&gt; integration from &lt;code&gt;go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc&lt;/code&gt;. Don't write your own — the ecosystem is mature. Current idiomatic setup uses a &lt;code&gt;StatsHandler&lt;/code&gt;, which hooks deeper than the interceptor chain and captures stream events correctly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"&lt;/span&gt;

&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatsHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;otelgrpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewServerHandler&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
    &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ChainUnaryInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="c"&gt;/* your app interceptors */&lt;/span&gt; &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Older codebases still use &lt;code&gt;otelgrpc.UnaryServerInterceptor()&lt;/code&gt; and &lt;code&gt;otelgrpc.StreamServerInterceptor()&lt;/code&gt;. Both are deprecated in favor of the stats handler, and deprecated APIs do eventually get removed, so check the otelgrpc changelog before bumping versions. Migrate when convenient; don't rewrite in a panic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Metrics
&lt;/h3&gt;

&lt;p&gt;Prometheus histogram of request duration per method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;reqDuration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;promauto&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewHistogramVec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;prometheus&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HistogramOpts&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
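            &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"grpc_server_request_duration_seconds"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Help&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Duration of unary gRPC requests in seconds, by method and status code."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;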
            &lt;span class="n"&gt;Buckets&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prometheus&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DefBuckets&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;metricsInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;
    &lt;span class="n"&gt;info&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryServerInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryHandler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;reqDuration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithLabelValues&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FullMethod&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Observe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Since&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Seconds&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: cardinality of &lt;code&gt;method&lt;/code&gt; is bounded (you know your service's methods). Cardinality of &lt;code&gt;code&lt;/code&gt; is bounded (gRPC codes are a fixed enum). Don't add user-id or request-id as labels — that's cardinality-explosion territory.&lt;/p&gt;
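&lt;p&gt;The cardinality arithmetic is worth doing once. A Prometheus histogram exposes one series per bucket plus the implicit &lt;code&gt;+Inf&lt;/code&gt; bucket, plus &lt;code&gt;_sum&lt;/code&gt; and &lt;code&gt;_count&lt;/code&gt;, for every label combination that gets observed. The helper below is a hypothetical back-of-the-envelope calculator, not part of any Prometheus library, and the 20-method service is an assumed example.&lt;/p&gt;

```go
package main

// SeriesUpperBound estimates the maximum number of time series a single
// histogram can produce: (buckets + the +Inf bucket + _sum + _count) for
// every combination of label values. Real usage is usually lower, because
// only combinations that are actually observed get materialized.
func SeriesUpperBound(buckets int, labelCardinalities ...int) int {
	perCombination := buckets + 1 + 2
	combinations := 1
	for _, c := range labelCardinalities {
		combinations *= c
	}
	return perCombination * combinations
}
```

&lt;p&gt;With the 11 buckets in &lt;code&gt;prometheus.DefBuckets&lt;/code&gt;, an assumed 20 methods, and the 17 gRPC status codes, &lt;code&gt;SeriesUpperBound(11, 20, 17)&lt;/code&gt; is 4,760. Add a user-id label with 100,000 distinct values and the bound becomes 476,000,000.&lt;/p&gt;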

&lt;h3&gt;
  
  
  Auth
&lt;/h3&gt;

&lt;p&gt;Extract bearer token from metadata, verify, inject user context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;authInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;
    &lt;span class="n"&gt;info&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryServerInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryHandler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// Public methods skip auth, so check before requiring a token&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;isPublic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FullMethod&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FromIncomingContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;codes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unauthenticated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"no metadata"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"authorization"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;codes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unauthenticated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"no auth token"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;claims&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;verifyToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;codes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unauthenticated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"invalid token"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userCtxKey&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;claims&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key detail: add the user context here, near the boundary. Service code reads it from context. You don't pass claims as an argument through every service method.&lt;/p&gt;
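&lt;p&gt;On the service side, a typed accessor keeps the context key private and the type assertion in one place. This is a minimal sketch: the &lt;code&gt;Claims&lt;/code&gt; fields, &lt;code&gt;WithClaims&lt;/code&gt;, &lt;code&gt;ClaimsFrom&lt;/code&gt;, and the two demo functions are hypothetical; only the &lt;code&gt;userCtxKey{}&lt;/code&gt; key mirrors the interceptor above.&lt;/p&gt;

```go
package main

import "context"

// Claims is a hypothetical shape for verified token claims.
type Claims struct {
	Subject string
	Roles   []string
}

// userCtxKey is an unexported type, so no other package can read or
// overwrite the stored value by accident.
type userCtxKey struct{}

// WithClaims is what the auth interceptor would call after verifying the token.
func WithClaims(ctx context.Context, c Claims) context.Context {
	return context.WithValue(ctx, userCtxKey{}, c)
}

// ClaimsFrom is what service code calls. ok is false when no claims were
// stored, for example on a public method that skipped auth.
func ClaimsFrom(ctx context.Context) (Claims, bool) {
	c, ok := ctx.Value(userCtxKey{}).(Claims)
	return c, ok
}

// DemoRoundTrip stores claims the way the interceptor would and reads them back.
func DemoRoundTrip() (string, bool) {
	ctx := WithClaims(context.Background(), Claims{Subject: "user-42"})
	c, ok := ClaimsFrom(ctx)
	return c.Subject, ok
}

// DemoMissing reads claims from a context that never passed through auth.
func DemoMissing() bool {
	_, ok := ClaimsFrom(context.Background())
	return ok
}
```

&lt;p&gt;Returning the &lt;code&gt;ok&lt;/code&gt; flag forces handlers to decide what a missing user means instead of dereferencing a zero value.&lt;/p&gt;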

&lt;h3&gt;
  
  
  Rate limiting
&lt;/h3&gt;

&lt;p&gt;Token bucket per caller or per method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;rateLimitInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limiter&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Limiter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryServerInterceptor&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;
        &lt;span class="n"&gt;info&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryServerInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryHandler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;limiter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Allow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;codes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResourceExhausted&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"rate limited"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Production rate limiting is fancier — per-tenant, distributed state in Redis, burst capacity — but the shape is the same. Reject with &lt;code&gt;ResourceExhausted&lt;/code&gt; before doing work.&lt;/p&gt;
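&lt;p&gt;For the per-tenant case, the core idea is one bucket per key. The sketch below is a stdlib-only token bucket with hypothetical names (&lt;code&gt;TenantLimiter&lt;/code&gt;, &lt;code&gt;Allow&lt;/code&gt;, &lt;code&gt;DemoBurst&lt;/code&gt;) and an injected clock so it is easy to test; in a real service you would more likely keep one &lt;code&gt;*rate.Limiter&lt;/code&gt; per key, and distributed state is out of scope here.&lt;/p&gt;

```go
package main

import (
	"sync"
	"time"
)

// bucket is one caller's token bucket.
type bucket struct {
	tokens float64
	last   time.Time
}

// TenantLimiter keeps an independent token bucket per key (tenant ID,
// API key, peer address, ...). Hypothetical type for illustration.
type TenantLimiter struct {
	mu      sync.Mutex
	rate    float64 // tokens added per second
	burst   float64 // bucket capacity
	buckets map[string]*bucket
}

func NewTenantLimiter(rate, burst float64) *TenantLimiter {
	l := new(TenantLimiter)
	l.rate, l.burst = rate, burst
	l.buckets = make(map[string]*bucket)
	return l
}

// Allow reports whether the caller identified by key may proceed at time now.
func (l *TenantLimiter) Allow(key string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	b, ok := l.buckets[key]
	if !ok {
		// First request from this key: start with a full bucket.
		b = new(bucket)
		b.tokens, b.last = l.burst, now
		l.buckets[key] = b
	}
	// Refill for the elapsed time, capped at the burst size.
	b.tokens += now.Sub(b.last).Seconds() * l.rate
	if b.tokens > l.burst {
		b.tokens = l.burst
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

// DemoBurst fires n back-to-back requests for one key at a fixed instant
// and counts how many are allowed.
func DemoBurst(l *TenantLimiter, key string, n int) int {
	now := time.Unix(0, 0)
	allowed := 0
	for i := n; i > 0; i-- {
		if l.Allow(key, now) {
			allowed++
		}
	}
	return allowed
}
```

&lt;p&gt;With &lt;code&gt;NewTenantLimiter(1, 2)&lt;/code&gt;, five instantaneous requests from tenant &lt;code&gt;a&lt;/code&gt; let two through, and tenant &lt;code&gt;b&lt;/code&gt; still gets its own two. Note that the map grows with the number of keys, so a production version needs eviction of idle buckets.&lt;/p&gt;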

&lt;h3&gt;
  
  
  Retry (client-side)
&lt;/h3&gt;

&lt;p&gt;Client interceptor that retries on transient errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;retryClientInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attempts&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryClientInterceptor&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reply&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;
        &lt;span class="n"&gt;cc&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ClientConn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;invoker&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryInvoker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opts&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CallOption&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;attempts&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;invoker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;isRetryable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
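            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;attempts&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt; &lt;span class="c"&gt;// last attempt failed: return err without another sleep&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="n"&gt;backoff&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Millisecond&lt;/span&gt;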
            &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;After&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;backoff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Retry is one of the most dangerous interceptors. Get it wrong (no idempotency keys, retries of non-idempotent operations, a retry storm during an outage) and it causes more production incidents than it prevents. Where you can, use the &lt;a href="https://github.com/grpc-ecosystem/go-grpc-middleware" rel="noopener noreferrer"&gt;&lt;code&gt;retry&lt;/code&gt; interceptor from &lt;code&gt;go-grpc-middleware&lt;/code&gt;&lt;/a&gt; instead of a hand-rolled loop; it's battle-tested.&lt;/p&gt;
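&lt;p&gt;One inexpensive guard against retry storms is to cap the backoff and add jitter, so clients that failed together do not retry together. The sketch below is an assumption-laden illustration, not what any particular library does: &lt;code&gt;Backoff&lt;/code&gt; and &lt;code&gt;DemoSchedule&lt;/code&gt; are hypothetical names, and the jitter factor is passed in by the caller (for example from &lt;code&gt;rand.Float64()&lt;/code&gt;) to keep the function deterministic.&lt;/p&gt;

```go
package main

import "time"

// Backoff returns the wait before retry number attempt (0-based): base
// doubled once per attempt, capped at limit, then scaled by jitter.
// A jitter drawn uniformly from [0, 1) spreads retries over the whole
// window; jitter = 1 yields the upper bound.
func Backoff(attempt int, base, limit time.Duration, jitter float64) time.Duration {
	d := base
	for i := attempt; i > 0; i-- {
		d *= 2
		if d >= limit {
			break // cap reached: stop doubling, which also avoids overflow
		}
	}
	if d > limit {
		d = limit
	}
	return time.Duration(float64(d) * jitter)
}

// DemoSchedule returns the upper-bound waits in milliseconds for the first
// six attempts, with a 100ms base and a 2s cap.
func DemoSchedule() []int64 {
	out := []int64{}
	for attempt := 0; attempt != 6; attempt++ {
		d := Backoff(attempt, 100*time.Millisecond, 2*time.Second, 1.0)
		out = append(out, d.Milliseconds())
	}
	return out
}
```

&lt;p&gt;&lt;code&gt;DemoSchedule()&lt;/code&gt; returns 100, 200, 400, 800, 1600, 2000: exponential growth until the cap takes over. Jitter and caps reduce synchronized retries, but they do not make a non-idempotent call safe to retry.&lt;/p&gt;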

&lt;h2&gt;
  
  
  The Stream Interceptor Trap
&lt;/h2&gt;

&lt;p&gt;Stream interceptors are harder. The interceptor signature gives you a &lt;code&gt;grpc.ServerStream&lt;/code&gt;, which is a bidirectional channel. Logging becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;loggingStreamInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;srv&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;ss&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServerStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;info&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StreamServerInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StreamHandler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;srv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ss&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"stream=%s duration=%s err=%v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FullMethod&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Since&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This only logs at stream-end, not per message. If you want per-message observability, you need to wrap the &lt;code&gt;ServerStream&lt;/code&gt; itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;observedStream&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServerStream&lt;/span&gt;
    &lt;span class="n"&gt;sent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recv&lt;/span&gt; &lt;span class="kt"&gt;int64&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;observedStream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;SendMsg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;atomic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddInt64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServerStream&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SendMsg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;observedStream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;RecvMsg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServerStream&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RecvMsg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;atomic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddInt64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;recv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then pass the wrapper to the handler. This is the pattern for any stream interceptor that needs per-message visibility.&lt;/p&gt;

&lt;p&gt;Common mistakes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Forgetting to propagate context to the wrapper.&lt;/strong&gt; The embedded stream's &lt;code&gt;Context()&lt;/code&gt; still returns the original context, so the wrapper must override &lt;code&gt;Context()&lt;/code&gt; to return the enriched one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-message overhead blows up long streams.&lt;/strong&gt; A message-level log line is fine at 100 msgs/sec. At 100K msgs/sec, it's your dominant cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wrapper state that isn't thread-safe.&lt;/strong&gt; One goroutine can call &lt;code&gt;SendMsg&lt;/code&gt; while another calls &lt;code&gt;RecvMsg&lt;/code&gt; on the same stream, so protect counters and any other shared state.&lt;/li&gt;
&lt;/ul&gt;
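
&lt;p&gt;The wrapper pattern, together with the context and thread-safety pitfalls above, can be sketched without the grpc module. Everything named here is an assumption for illustration: &lt;code&gt;messageStream&lt;/code&gt; is a hypothetical stand-in for the &lt;code&gt;grpc.ServerStream&lt;/code&gt; methods the wrapper touches, and &lt;code&gt;fakeStream&lt;/code&gt;, &lt;code&gt;wrap&lt;/code&gt;, and &lt;code&gt;requestIDKey&lt;/code&gt; are made-up names. It uses &lt;code&gt;atomic.Int64&lt;/code&gt; (Go 1.19+) in place of &lt;code&gt;atomic.AddInt64&lt;/code&gt;.&lt;/p&gt;

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

// messageStream is a hypothetical stand-in for the grpc.ServerStream methods
// this sketch touches, so the example runs without the grpc module.
type messageStream interface {
	Context() context.Context
	SendMsg(m any) error
	RecvMsg(m any) error
}

// countingStream wraps a stream, counts messages with atomic counters,
// and serves an enriched context instead of the embedded stream's one.
type countingStream struct {
	messageStream
	ctx        context.Context
	sent, recv atomic.Int64
}

// Context overrides the embedded method so handlers see the enriched context.
func (s *countingStream) Context() context.Context { return s.ctx }

func (s *countingStream) SendMsg(m any) error {
	s.sent.Add(1)
	return s.messageStream.SendMsg(m)
}

func (s *countingStream) RecvMsg(m any) error {
	err := s.messageStream.RecvMsg(m)
	if err == nil {
		s.recv.Add(1)
	}
	return err
}

// fakeStream is a no-op stream used only to exercise the wrapper.
type fakeStream struct{}

func (fakeStream) Context() context.Context { return context.Background() }
func (fakeStream) SendMsg(any) error        { return nil }
func (fakeStream) RecvMsg(any) error        { return nil }

// requestIDKey is a hypothetical context key for this sketch.
type requestIDKey struct{}

// wrap builds the value a stream interceptor would pass on as handler(srv, wrapped).
func wrap(inner messageStream, requestID string) *countingStream {
	w := new(countingStream)
	w.messageStream = inner
	w.ctx = context.WithValue(inner.Context(), requestIDKey{}, requestID)
	return w
}

func main() {
	w := wrap(fakeStream{}, "req-1")
	_ = w.SendMsg("a")
	_ = w.SendMsg("b")
	_ = w.RecvMsg(nil)
	// prints: 2 1 req-1
	fmt.Println(w.sent.Load(), w.recv.Load(), w.Context().Value(requestIDKey{}))
}
```

&lt;p&gt;In a real interceptor the embedded field would be &lt;code&gt;grpc.ServerStream&lt;/code&gt; and the counters would be logged after the handler returns, which keeps per-message cost to one atomic add.&lt;/p&gt;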

&lt;h2&gt;
  
  
  Testing Interceptor Chains
&lt;/h2&gt;

&lt;p&gt;Unit test each interceptor in isolation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestAuthInterceptor_NoToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c"&gt;// no metadata&lt;/span&gt;
    &lt;span class="n"&gt;info&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryServerInfo&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;FullMethod&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"/my.Service/Method"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"handler should not be called"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;authInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;require&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;codes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unauthenticated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Integration-test the chain end-to-end using &lt;code&gt;bufconn&lt;/code&gt; (in-memory connection):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestChain_Ordering&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;lis&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;bufconn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="m"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;lis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ChainUnaryInterceptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;observability&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;business&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RegisterMyServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;realImpl&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lis&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Stop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

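    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bufnet"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithContextDialer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;lis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DialContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}),&lt;/span&gt;
        &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithTransportCredentials&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;insecure&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewCredentials&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;require&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NoError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewMyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c"&gt;// attach auth metadata here if your chain requires it&lt;/span&gt;
    &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="c"&gt;// placeholder for your generated request type&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;require&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NoError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;require&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NotNil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// then assert on behavior end-to-end&lt;/span&gt;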
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Integration tests catch bugs that unit tests don't: metadata propagation, interceptor ordering, context enrichment visible to the handler. Don't skip them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Patterns That Save Time
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
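&lt;strong&gt;Use &lt;code&gt;grpc-middleware/v2&lt;/code&gt;&lt;/strong&gt; (&lt;code&gt;github.com/grpc-ecosystem/go-grpc-middleware/v2&lt;/code&gt;) for recovery, retry, logging, and other batteries-included interceptors. Chaining itself is built into grpc-go (&lt;code&gt;grpc.ChainUnaryInterceptor&lt;/code&gt;, &lt;code&gt;grpc.ChainStreamInterceptor&lt;/code&gt;), so v2 no longer ships its own chain helpers. Don't reinvent every wheel.&lt;/li&gt;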
&lt;li&gt;
&lt;strong&gt;Keep error semantics consistent&lt;/strong&gt;. Every interceptor should return &lt;code&gt;status.Error(code, msg)&lt;/code&gt; for failures. Don't return raw Go errors: gRPC generally reports them to the client as &lt;code&gt;codes.Unknown&lt;/code&gt;, so callers can't tell an auth failure from a validation error.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skip-list for public methods.&lt;/strong&gt; Auth and rate limiting often need to skip health-check and reflection endpoints. Keep the skip list in one place.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-service vs global interceptors&lt;/strong&gt;. Most interceptors are global (tracing, metrics, auth). A few might be per-service (e.g., a bespoke rate limiter for a specific hot endpoint). Compose accordingly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Panic recovery at the outermost layer&lt;/strong&gt;. A panic in a handler shouldn't kill the server. Use the &lt;code&gt;recovery&lt;/code&gt; middleware from &lt;code&gt;grpc-middleware&lt;/code&gt; or write your own, and put it first in the chain.&lt;/li&gt;
&lt;/ul&gt;
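
&lt;p&gt;Two of the patterns above, the shared skip list and panic recovery, can be sketched with the standard library alone. The names &lt;code&gt;publicMethods&lt;/code&gt;, &lt;code&gt;isPublic&lt;/code&gt;, and &lt;code&gt;recoverToError&lt;/code&gt; are assumptions for illustration, and the listed methods are examples of endpoints you might exempt, not a recommended set.&lt;/p&gt;

```go
package main

import "fmt"

// publicMethods is the single skip list shared by auth and rate limiting.
// Keys are full gRPC method names; the entries are illustrative.
var publicMethods = map[string]struct{}{
	"/grpc.health.v1.Health/Check":                              {},
	"/grpc.health.v1.Health/Watch":                              {},
	"/grpc.reflection.v1.ServerReflection/ServerReflectionInfo": {},
}

// isPublic reports whether interceptors should skip the given method.
// An interceptor would call it with info.FullMethod.
func isPublic(fullMethod string) bool {
	_, ok := publicMethods[fullMethod]
	return ok
}

// recoverToError runs fn and turns a panic into an error. A real recovery
// interceptor would wrap the handler call this way and return a status
// error with codes.Internal instead of a plain error.
func recoverToError(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic recovered: %v", r)
		}
	}()
	return fn()
}

func main() {
	fmt.Println(isPublic("/grpc.health.v1.Health/Check")) // true
	fmt.Println(isPublic("/my.Service/Method"))           // false

	err := recoverToError(func() error { panic("boom") })
	fmt.Println(err) // panic recovered: boom
}
```

&lt;p&gt;Keeping the lookup in one function means auth and rate limiting cannot drift apart on which methods are exempt.&lt;/p&gt;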

&lt;h2&gt;
  
  
  The Discipline That Makes This Work
&lt;/h2&gt;

&lt;p&gt;Interceptors are the right tool for cross-cutting concerns — the things every RPC needs but the service code shouldn't have to think about. The discipline is: one concern per interceptor, careful ordering, consistent error semantics, tested end-to-end.&lt;/p&gt;

&lt;p&gt;The services I've seen do this well have clean business logic (because the cross-cutting stuff is outside it) and reliable operational behavior (because the interceptor chain is tested as a unit, not just piece-by-piece). The services that do it poorly have auth logic sprinkled through their handlers, tracing that randomly misses requests, and rate limiters that certain code paths bypass.&lt;/p&gt;

&lt;p&gt;Interceptor order is one of those details that looks tactical and turns out to be architectural. Get it right once; the service's behavior improves every release.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-context-distributed-systems-production/" rel="noopener noreferrer"&gt;Go Context in Distributed Systems: What Actually Works in Production&lt;/a&gt; — the context that flows through every interceptor.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/rpc-vs-nats-who-owns-completion/" rel="noopener noreferrer"&gt;RPC vs NATS: It's Not About Sync vs Async — It's About Who Owns Completion&lt;/a&gt; — the shape of gRPC calls as one side of the bigger messaging picture.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/observability-cost-attribution-dual-path-architecture/" rel="noopener noreferrer"&gt;Observability and Cost Attribution: Why One Pipeline Isn't Enough&lt;/a&gt; — why tracing interceptors alone aren't enough for business attribution.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>grpc</category>
      <category>interceptors</category>
    </item>
    <item>
      <title>Go Generics, One Year In: Which Promises Held, Which Didn't</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Mon, 20 Apr 2026 16:30:05 +0000</pubDate>
      <link>https://dev.to/harrisonsec/go-generics-one-year-in-which-promises-held-which-didnt-44m7</link>
      <guid>https://dev.to/harrisonsec/go-generics-one-year-in-which-promises-held-which-didnt-44m7</guid>
      <description>&lt;p&gt;Go 1.18 shipped generics in March 2022. The two years before that were dominated by hopeful blog posts ("finally, a real type system!") and the two years after by the predictable backlash ("why did we even bother, Go was simpler"). I've written production Go before and after. The honest answer is somewhere in the middle and closer to "useful for a narrower set of problems than we expected."&lt;/p&gt;

&lt;p&gt;This is a look back from someone who has shipped generic code in anger and reviewed a lot more of it. What held up. What didn't. What habits to adopt and which to avoid.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr:&lt;/strong&gt; Go generics are genuinely valuable for &lt;strong&gt;parametric operations on container-shaped types&lt;/strong&gt;: slices, maps, channels, any-key lookup tables, min/max/sum utilities. They are less valuable for "clever abstractions" that dress up control flow as type magic. The clearest gains are in the standard library itself (&lt;code&gt;slices&lt;/code&gt;, &lt;code&gt;maps&lt;/code&gt;) and in domain-specific utility packages. Most application code didn't need generics before and doesn't need them after. Using generics isn't the mistake; using them for things interfaces already handled fine is.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Generics Actually Are
&lt;/h2&gt;

&lt;p&gt;Go generics are &lt;strong&gt;type parameters on functions and types&lt;/strong&gt;. A function like &lt;code&gt;slices.Contains&lt;/code&gt; can be written once, work for any slice element type, and still be type-checked at compile time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;Contains&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;S&lt;/span&gt; &lt;span class="err"&gt;~&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;E&lt;/span&gt; &lt;span class="n"&gt;comparable&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three features you should know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Type parameters&lt;/strong&gt;: the &lt;code&gt;[E any]&lt;/code&gt; or &lt;code&gt;[E comparable]&lt;/code&gt; in brackets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Constraints&lt;/strong&gt;: tell the compiler what operations the type parameter supports. &lt;code&gt;any&lt;/code&gt;, &lt;code&gt;comparable&lt;/code&gt;, or interface-based constraints like &lt;code&gt;cmp.Ordered&lt;/code&gt; (&lt;code&gt;constraints.Ordered&lt;/code&gt; from &lt;code&gt;golang.org/x/exp&lt;/code&gt; before Go 1.21).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approximate constraints&lt;/strong&gt;: &lt;code&gt;~[]E&lt;/code&gt; means "any type whose underlying type is &lt;code&gt;[]E&lt;/code&gt;" — lets you be flexible about named slice types.&lt;/li&gt;
&lt;/ul&gt;
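
&lt;p&gt;A small sketch of why the approximate constraint matters. The names &lt;code&gt;IDs&lt;/code&gt;, &lt;code&gt;Total&lt;/code&gt;, and &lt;code&gt;Reverse&lt;/code&gt; are made up for illustration. Because &lt;code&gt;Reverse&lt;/code&gt; is declared over &lt;code&gt;S ~[]E&lt;/code&gt; and returns &lt;code&gt;S&lt;/code&gt;, a named slice type comes back as itself and keeps its methods.&lt;/p&gt;

```go
package main

import "fmt"

// IDs is a named slice type whose underlying type is []int.
type IDs []int

// Total is a method that only exists on IDs, not on plain []int.
func (ids IDs) Total() int {
	sum := 0
	for _, id := range ids {
		sum += id
	}
	return sum
}

// Reverse accepts any type whose underlying type is []E and returns the
// same named type, so callers do not lose their methods.
func Reverse[S ~[]E, E any](s S) S {
	out := make(S, len(s))
	for i, v := range s {
		out[len(s)-1-i] = v
	}
	return out
}

func main() {
	rev := Reverse(IDs{1, 2, 3}) // rev has type IDs, not []int
	fmt.Println(rev, rev.Total()) // [3 2 1] 6
}
```

&lt;p&gt;Had the signature been &lt;code&gt;func Reverse[E any](s []E) []E&lt;/code&gt;, the result would be a plain &lt;code&gt;[]int&lt;/code&gt; and the call to &lt;code&gt;Total&lt;/code&gt; would not compile.&lt;/p&gt;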

&lt;p&gt;What they aren't: Java-style wildcards, C++ SFINAE, or anything that mimics variance. The design is deliberately narrower than most prior languages. It's more like Rust's generics, minus the trait system's complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Generics Clearly Win
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Standard-library style container and utility functions
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;slices&lt;/code&gt; and &lt;code&gt;maps&lt;/code&gt; packages in the standard library are the canonical example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;slices&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"alice"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;slices&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;numbers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;maps&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;maps&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before generics, these were either hand-written per-type (tedious, error-prone), done via &lt;code&gt;interface{}&lt;/code&gt; (type-unsafe, slow), or done via &lt;code&gt;reflect&lt;/code&gt; (slow and error-prone). Generics are strictly better for these.&lt;/p&gt;
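
&lt;p&gt;One detail worth knowing when using these packages: in the standard library (Go 1.23 and later), &lt;code&gt;maps.Keys&lt;/code&gt; and &lt;code&gt;maps.Values&lt;/code&gt; return iterators, not slices. The sketch below assumes Go 1.23+; &lt;code&gt;sortedKeys&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```go
package main

import (
	"cmp"
	"fmt"
	"maps"
	"slices"
)

// sortedKeys collects the key iterator from maps.Keys into a sorted slice.
// Sorting also makes the result deterministic, since map order is not.
func sortedKeys[K cmp.Ordered, V any](m map[K]V) []K {
	return slices.Sorted(maps.Keys(m))
}

func main() {
	users := []string{"carol", "alice", "bob"}
	fmt.Println(slices.Contains(users, "alice")) // true

	slices.Sort(users)
	fmt.Println(users) // [alice bob carol]

	config := map[string]int{"timeout": 30, "retries": 3}
	fmt.Println(sortedKeys(config)) // [retries timeout]
}
```

&lt;p&gt;If the order does not matter, &lt;code&gt;slices.Collect(maps.Keys(m))&lt;/code&gt; gives a plain slice without the sort.&lt;/p&gt;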

&lt;p&gt;The same pattern shows up in third-party libraries: &lt;code&gt;samber/lo&lt;/code&gt; (lodash-style utilities built on generics) and many domain-specific ones. Compare the older &lt;code&gt;thoas/go-funk&lt;/code&gt;, which offers similar functional helpers through reflection and gives up compile-time type safety to do it. If you reach for lodash-style helpers in JavaScript, you'll want similar in Go, and generics made that workable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Concurrency helpers
&lt;/h3&gt;

&lt;p&gt;Generic worker pools, futures, result types — these all benefit from generics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Future&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;done&lt;/span&gt; &lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;val&lt;/span&gt;  &lt;span class="n"&gt;T&lt;/span&gt;
    &lt;span class="n"&gt;err&lt;/span&gt;  &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Future&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;done&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before generics, you'd have had an &lt;code&gt;interface{}&lt;/code&gt; return and a type assertion at the call site. Now you can express "this future produces a T" in the type. Cleaner at the boundary, safer at the call site.&lt;/p&gt;

&lt;h3&gt;
  
  
  Typed collections
&lt;/h3&gt;

&lt;p&gt;If your system has a genuinely typed container use case — say, an ordered map keyed by a domain ID — generics let you write it once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;OrderedMap&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt; &lt;span class="n"&gt;comparable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt;  &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="n"&gt;V&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a rare case where "custom generic container" is the right tool. The majority of code doesn't need this. But when you do need it, the generics version is much better than the &lt;code&gt;interface{}&lt;/code&gt; alternative.&lt;/p&gt;
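
&lt;p&gt;To make the container concrete, here is one way the methods might look. This is a sketch under assumptions: "ordered" is taken to mean insertion order, and &lt;code&gt;NewOrderedMap&lt;/code&gt;, &lt;code&gt;Set&lt;/code&gt;, &lt;code&gt;Get&lt;/code&gt;, and &lt;code&gt;Keys&lt;/code&gt; are illustrative names, not part of the original type.&lt;/p&gt;

```go
package main

import "fmt"

// OrderedMap remembers the order in which keys were first inserted.
type OrderedMap[K comparable, V any] struct {
	order []K
	data  map[K]V
}

// NewOrderedMap returns an empty map ready for use.
func NewOrderedMap[K comparable, V any]() *OrderedMap[K, V] {
	m := new(OrderedMap[K, V])
	m.data = make(map[K]V)
	return m
}

// Set inserts or updates a value. An existing key keeps its original position.
func (m *OrderedMap[K, V]) Set(k K, v V) {
	if _, exists := m.data[k]; !exists {
		m.order = append(m.order, k)
	}
	m.data[k] = v
}

// Get returns the value for k and whether it was present.
func (m *OrderedMap[K, V]) Get(k K) (V, bool) {
	v, ok := m.data[k]
	return v, ok
}

// Keys returns a copy of the keys in insertion order.
func (m *OrderedMap[K, V]) Keys() []K {
	return append([]K(nil), m.order...)
}

func main() {
	m := NewOrderedMap[string, int]()
	m.Set("b", 2)
	m.Set("a", 1)
	m.Set("b", 20) // update: position of "b" is unchanged

	fmt.Println(m.Keys()) // [b a]
	v, ok := m.Get("b")
	fmt.Println(v, ok) // 20 true
}
```

&lt;p&gt;The call sites need no type assertions, which is the whole gain over an &lt;code&gt;interface{}&lt;/code&gt;-based version.&lt;/p&gt;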

&lt;h3&gt;
  
  
  Numerical / algorithmic code
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;constraints.Ordered&lt;/code&gt; (or, since Go 1.21, its standard-library replacement &lt;code&gt;cmp.Ordered&lt;/code&gt;) is the key constraint for "works for any ordered type", meaning integers, floats, and strings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;Max&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="n"&gt;cmp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Ordered&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Math helpers, min/max, sum, average — all cleanly generic. Readable, type-safe, performant.&lt;/p&gt;
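
&lt;p&gt;Sum and average follow the same shape. The &lt;code&gt;Number&lt;/code&gt; constraint below is a hypothetical one written for this sketch; &lt;code&gt;cmp.Ordered&lt;/code&gt; would not fit here because it also admits strings, where an average makes no sense.&lt;/p&gt;

```go
package main

import "fmt"

// Number is a hypothetical constraint covering a few numeric kinds,
// including named types built on them.
type Number interface {
	~int | ~int64 | ~float32 | ~float64
}

// Sum adds up the elements; the zero value of T is the result for an empty slice.
func Sum[T Number](xs []T) T {
	var total T
	for _, x := range xs {
		total += x
	}
	return total
}

// Average returns the arithmetic mean as float64, or 0 for an empty slice.
func Average[T Number](xs []T) float64 {
	if len(xs) == 0 {
		return 0
	}
	return float64(Sum(xs)) / float64(len(xs))
}

func main() {
	fmt.Println(Sum([]int{1, 2, 3}))         // 6
	fmt.Println(Average([]int{1, 2, 3}))     // 2
	fmt.Println(Sum([]float64{1.5, 2.5}))    // 4
}
```

&lt;p&gt;For plain min and max, Go 1.21 and later also provide the built-in &lt;code&gt;min&lt;/code&gt; and &lt;code&gt;max&lt;/code&gt; functions, so a hand-written helper is only needed on older toolchains.&lt;/p&gt;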

&lt;h2&gt;
  
  
  Where Generics Don't Help, Or Hurt
&lt;/h2&gt;

&lt;h3&gt;
  
  
  "Generic services" and similar framework-y code
&lt;/h3&gt;

&lt;p&gt;I've seen codebases where someone wrote a generic "repository" type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Repository&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Repository&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="n"&gt;FindByID&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Repository&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="n"&gt;Save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The instinct — "all repositories do the same thing" — is mostly wrong. Real repositories differ in query shape, error cases, caching rules, transaction boundaries. Forcing them behind a generic interface either (a) produces a lowest-common-denominator API that doesn't fit any actual use, or (b) gets so many type parameters that readability collapses.&lt;/p&gt;

&lt;p&gt;The Go idiom is usually better: one non-generic &lt;code&gt;UserRepository&lt;/code&gt;, one &lt;code&gt;OrderRepository&lt;/code&gt;, etc. Each concrete, each tuned to its domain.&lt;/p&gt;
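&lt;p&gt;As a rough illustration of that idiom, here is a minimal in-memory sketch. The &lt;code&gt;User&lt;/code&gt; and &lt;code&gt;Order&lt;/code&gt; types, the method names, and &lt;code&gt;ErrUserNotFound&lt;/code&gt; are all hypothetical, chosen only to show how the two repositories end up with different query shapes and error cases.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
)

// User and Order are hypothetical domain types for this sketch.
type User struct {
	ID    string
	Email string
}

type Order struct {
	ID     string
	UserID string
	Cents  int
}

// ErrUserNotFound is specific to user lookups. Orders have no equivalent,
// because an empty order list is not an error.
var ErrUserNotFound = errors.New("user not found")

// UserRepository exposes the one query users need here: lookup by email.
type UserRepository struct {
	byEmail map[string]User
}

func NewUserRepository() *UserRepository {
	r := new(UserRepository)
	r.byEmail = make(map[string]User)
	return r
}

func (r *UserRepository) Save(u User) { r.byEmail[u.Email] = u }

func (r *UserRepository) FindByEmail(email string) (User, error) {
	u, ok := r.byEmail[email]
	if !ok {
		return User{}, ErrUserNotFound
	}
	return u, nil
}

// OrderRepository has a different query shape: list by owner.
type OrderRepository struct {
	byUser map[string][]Order
}

func NewOrderRepository() *OrderRepository {
	r := new(OrderRepository)
	r.byUser = make(map[string][]Order)
	return r
}

func (r *OrderRepository) Save(o Order) {
	r.byUser[o.UserID] = append(r.byUser[o.UserID], o)
}

func (r *OrderRepository) ListByUser(userID string) []Order {
	return r.byUser[userID]
}

func main() {
	users := NewUserRepository()
	users.Save(User{ID: "u1", Email: "a@example.com"})
	orders := NewOrderRepository()
	orders.Save(Order{ID: "o1", UserID: "u1", Cents: 1250})

	u, err := users.FindByEmail("a@example.com")
	fmt.Println(u.ID, err, len(orders.ListByUser(u.ID)))
}
```

&lt;p&gt;Neither type would gain much from a shared &lt;code&gt;Repository[T]&lt;/code&gt;: the only method they have in common is &lt;code&gt;Save&lt;/code&gt;, and even that indexes differently.&lt;/p&gt;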

&lt;h3&gt;
  
  
  Over-constrained helpers
&lt;/h3&gt;

&lt;p&gt;If your "generic" function has five type parameters with custom constraints each, readability dies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;Complicated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="n"&gt;comparable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;K&lt;/span&gt; &lt;span class="n"&gt;Hashable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;V&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;F&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;M&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt; &lt;span class="n"&gt;M&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



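&lt;p&gt;This is technically legal, provided &lt;code&gt;Hashable&lt;/code&gt; is a custom constraint that implies &lt;code&gt;comparable&lt;/code&gt; (otherwise &lt;code&gt;map[K]V&lt;/code&gt; won't compile). Read it closely and you realize it's a glorified map-with-cache-and-error. Interfaces or function types would have been clearer. Generics don't make complex APIs simple; they just let you make them complex in a type-checked way.&lt;/p&gt;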
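&lt;p&gt;For comparison, here is one way the same idea might look with two type parameters and ordinary function and map types. This is a sketch under assumptions: &lt;code&gt;FillCache&lt;/code&gt; is a made-up name, and it assumes the helper's job is "load a value for every key that isn't cached yet", which the original signature only hints at.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strconv"
)

// FillCache loads a value for every key not already in cache.
// The loader and the cache are spelled with plain function and map types,
// so only the key and value types need to be parameters.
func FillCache[K comparable, V any](keys []K, load func(K) (V, error), cache map[K]V) error {
	for _, k := range keys {
		if _, ok := cache[k]; ok {
			continue // already cached
		}
		v, err := load(k)
		if err != nil {
			return fmt.Errorf("load %v: %w", k, err)
		}
		cache[k] = v
	}
	return nil
}

func main() {
	cache := map[string]int{"1": 1}
	// strconv.Atoi already has the shape func(string) (int, error).
	err := FillCache([]string{"1", "2", "3"}, strconv.Atoi, cache)
	fmt.Println(cache, err)
}
```

&lt;p&gt;The call site needs no explicit type arguments, and the signature reads left to right without a legend.&lt;/p&gt;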

&lt;h3&gt;
  
  
  Behavioral polymorphism
&lt;/h3&gt;

&lt;p&gt;Interfaces are still the right tool when different types have &lt;strong&gt;different behavior&lt;/strong&gt;. A generic &lt;code&gt;Process[T any](x T) error&lt;/code&gt; doesn't help if you actually want different logic per type. You want an interface with a method.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Good use of interface&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Processor&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Bad use of generics&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;ProcessGeneric&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// can't differentiate behavior per type without a type switch on any(x)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The separation: &lt;strong&gt;generics for parametric operations (same logic, any type), interfaces for polymorphic behavior (different logic per type).&lt;/strong&gt;&lt;/p&gt;
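&lt;p&gt;The two tools also compose. A small sketch, with hypothetical &lt;code&gt;Notifier&lt;/code&gt;, &lt;code&gt;Email&lt;/code&gt;, and &lt;code&gt;Pager&lt;/code&gt; types: the generic &lt;code&gt;Map&lt;/code&gt; owns the loop, which is identical for every type, while the interface owns the behavior that differs.&lt;/p&gt;

```go
package main

import "fmt"

// Map is parametric: the same loop works for any In and Out.
func Map[In, Out any](items []In, f func(In) Out) []Out {
	out := make([]Out, 0, len(items))
	for _, it := range items {
		out = append(out, f(it))
	}
	return out
}

// Notifier is polymorphic: each implementation has its own logic.
type Notifier interface {
	Notify(msg string) string
}

type Email struct{ Addr string }
type Pager struct{ Number string }

func (e Email) Notify(msg string) string { return "mail to " + e.Addr + ": " + msg }
func (p Pager) Notify(msg string) string { return "page " + p.Number + ": " + msg }

func main() {
	targets := []Notifier{Email{Addr: "ops@example.com"}, Pager{Number: "555-0100"}}
	// Generic helper for the iteration, interface method for the per-type behavior.
	lines := Map(targets, func(n Notifier) string { return n.Notify("disk full") })
	for _, l := range lines {
		fmt.Println(l)
	}
}
```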

&lt;h2&gt;
  
  
  Performance: Usually a Wash
&lt;/h2&gt;

&lt;p&gt;The performance story is more nuanced than either "generics are slow" or "generics are free."&lt;/p&gt;

&lt;p&gt;Go's current generic implementation uses &lt;strong&gt;GCShape stenciling&lt;/strong&gt;: one compiled version per "GC shape" (roughly, per memory layout as the garbage collector sees it), with a hidden dictionary argument carrying the type-specific details. This sits between full monomorphization (one version per type, like Rust) and type erasure (one version total, like Java).&lt;/p&gt;

&lt;p&gt;Practical implications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
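&lt;strong&gt;Non-pointer types (int, int64, a given struct)&lt;/strong&gt; get an instantiation per distinct underlying type. Competitive with hand-written.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pointer types (such as &lt;code&gt;*User&lt;/code&gt; and &lt;code&gt;*Order&lt;/code&gt;)&lt;/strong&gt; all share one instantiation. Slightly slower than hand-written but usually faster than interface-based dispatch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calling a generic function costs about the same as an ordinary function call.&lt;/strong&gt; The exception is a method call on a type parameter inside shared-shape code: it goes through the dictionary and generally can't be inlined, so measure if that sits in a hot loop.&lt;/li&gt;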
&lt;li&gt;
&lt;strong&gt;Compile times increase&lt;/strong&gt;, especially for libraries with many instantiations. This is the real cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Benchmarks I've seen: generic versions are within 5% of hand-written equivalents, and consistently faster than &lt;code&gt;interface{}&lt;/code&gt;-based alternatives. Performance is almost never the deciding factor — readability and design fit matter more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Idioms That Emerged
&lt;/h2&gt;

&lt;p&gt;Over the years since 1.18, a few conventions have stuck:&lt;/p&gt;

&lt;h3&gt;
  
  
  Prefer &lt;code&gt;any&lt;/code&gt; to &lt;code&gt;interface{}&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;any&lt;/code&gt; is a type alias for &lt;code&gt;interface{}&lt;/code&gt; added in 1.18. Shorter, clearer. Use it everywhere.&lt;/p&gt;

&lt;h3&gt;
  
  
  Single-letter type parameters for simple cases, descriptive for complex
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;T&lt;/code&gt;, &lt;code&gt;K&lt;/code&gt;, &lt;code&gt;V&lt;/code&gt; for the obvious cases. More descriptive when the role is specific:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;Reduce&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;In&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Out&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;In&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;In&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;initial&lt;/span&gt; &lt;span class="n"&gt;Out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Out&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
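&lt;p&gt;The signature above has no body; a straightforward implementation, offered as a sketch rather than a canonical one, might look like this. The word-length example in &lt;code&gt;main&lt;/code&gt; is made up for illustration.&lt;/p&gt;

```go
package main

import "fmt"

// Reduce folds items into a single Out, starting from initial and
// applying f to the running result and each item in order.
func Reduce[In any, Out any](items []In, f func(Out, In) Out, initial Out) Out {
	acc := initial
	for _, it := range items {
		acc = f(acc, it)
	}
	return acc
}

func main() {
	words := []string{"go", "generics", "work"}
	// In is inferred as string, Out as int.
	total := Reduce(words, func(n int, w string) int { return n + len(w) }, 0)
	fmt.Println(total) // 14
}
```

&lt;p&gt;With &lt;code&gt;In&lt;/code&gt; and &lt;code&gt;Out&lt;/code&gt; the callback's argument order is readable at a glance, which is harder to say of &lt;code&gt;func(U, T) U&lt;/code&gt;.&lt;/p&gt;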



&lt;h3&gt;
  
  
  Put constraints in a dedicated package
&lt;/h3&gt;

&lt;p&gt;If you have several custom constraints, group them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;constraints&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Ordered&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="err"&gt;~&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="err"&gt;~&lt;/span&gt;&lt;span class="kt"&gt;int64&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="err"&gt;~&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="err"&gt;~&lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Numeric&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="err"&gt;~&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="err"&gt;~&lt;/span&gt;&lt;span class="kt"&gt;int64&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="err"&gt;~&lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The experimental &lt;code&gt;golang.org/x/exp/constraints&lt;/code&gt; package (and later &lt;code&gt;cmp.Ordered&lt;/code&gt; in the standard library, as of 1.21) set the pattern.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use &lt;code&gt;~T&lt;/code&gt; approximations for flexibility
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;~[]E&lt;/code&gt; includes named slice types. &lt;code&gt;~int&lt;/code&gt; includes &lt;code&gt;type MyInt int&lt;/code&gt;. Almost always the right choice for generic parametric code: without the tilde, a constraint matches only the exact type and rejects callers' named types.&lt;/p&gt;
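&lt;p&gt;A short sketch of the difference the tilde makes. &lt;code&gt;Celsius&lt;/code&gt;, &lt;code&gt;IDs&lt;/code&gt;, &lt;code&gt;Number&lt;/code&gt;, &lt;code&gt;Exact&lt;/code&gt;, and the helper functions are hypothetical names for illustration only.&lt;/p&gt;

```go
package main

import "fmt"

// Celsius and IDs are named types with float64 and []int underneath.
type Celsius float64
type IDs []int

// Number admits any type whose underlying type is int or float64.
type Number interface{ ~int | ~float64 }

// Exact omits the tilde, so only int and float64 themselves qualify.
type Exact interface{ int | float64 }

func Sum[T Number](xs []T) T {
	var total T
	for _, x := range xs {
		total += x
	}
	return total
}

func SumExact[T Exact](xs []T) T {
	var total T
	for _, x := range xs {
		total += x
	}
	return total
}

// Head returns the first n elements and keeps the caller's named slice
// type, because S is constrained by ~[]E rather than fixed to []E.
func Head[S ~[]E, E any](s S, n int) S {
	if n > len(s) {
		n = len(s)
	}
	return s[:n]
}

func main() {
	temps := []Celsius{20.5, 21.5}
	fmt.Println(Sum(temps)) // 42
	// SumExact(temps) would not compile: Celsius does not satisfy Exact.
	fmt.Println(SumExact([]float64{20.5, 21.5}))

	var firstTwo IDs = Head(IDs{7, 8, 9}, 2) // result is IDs, not []int
	fmt.Println(firstTwo)
}
```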

&lt;h3&gt;
  
  
  Never overload generic helpers to do too much
&lt;/h3&gt;

&lt;p&gt;Each generic function should do one parametric thing. Generic helpers that try to be many things at once collapse under type-parameter weight.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Standard Library Won
&lt;/h2&gt;

&lt;p&gt;The clearest vindication of Go generics is what happened to the standard library. &lt;code&gt;slices&lt;/code&gt;, &lt;code&gt;maps&lt;/code&gt;, &lt;code&gt;cmp.Ordered&lt;/code&gt; — these additions are uncontroversially better than the pre-1.18 alternatives. A lot of code that used to be hand-rolled or based on &lt;code&gt;sort.Interface&lt;/code&gt; has cleaner replacements.&lt;/p&gt;

&lt;p&gt;The user-land picture is more mixed. Libraries that benefit from generics genuinely use them well (&lt;code&gt;samber/lo&lt;/code&gt;, &lt;code&gt;kelindar/column&lt;/code&gt;, many others). Libraries that don't need them mostly haven't been retrofitted with them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Do Now
&lt;/h2&gt;

&lt;p&gt;A few simple rules I apply:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Prefer standard library generic helpers over hand-rolled.&lt;/strong&gt; &lt;code&gt;slices.Contains&lt;/code&gt;, &lt;code&gt;slices.Sort&lt;/code&gt;, &lt;code&gt;maps.Keys&lt;/code&gt; — use them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write a generic helper only when I have at least two concrete use cases for it.&lt;/strong&gt; One use case is a pattern waiting to be born, not necessarily a generic.&lt;/li&gt;
&lt;li&gt;
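&lt;strong&gt;Prefer functions to methods on generic types&lt;/strong&gt; when possible. Methods have more friction: through at least Go 1.25 a method can't declare type parameters of its own, so an operation that introduces a new type (say, mapping a container of &lt;code&gt;T&lt;/code&gt; to a container of &lt;code&gt;U&lt;/code&gt;) has to be a top-level function anyway.&lt;/li&gt;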
&lt;li&gt;
&lt;strong&gt;Keep constraints simple.&lt;/strong&gt; &lt;code&gt;any&lt;/code&gt;, &lt;code&gt;comparable&lt;/code&gt;, &lt;code&gt;cmp.Ordered&lt;/code&gt;, and domain-specific single-type-union constraints cover 95% of cases. More complex constraints usually mean the abstraction is wrong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never turn interfaces into generics just because you can.&lt;/strong&gt; If the types have genuinely different behavior, an interface is right.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Where Generics Actually Sit Now
&lt;/h2&gt;

&lt;p&gt;Generics were oversold before they landed ("Go finally becomes a real language!") and overapplied in the aftermath ("generics everywhere!"). The truth is narrower and more boring: they're a useful addition for a specific class of problems, mostly centered on parametric operations over containers and numerics. They improved the standard library. They haven't changed the shape of most Go code.&lt;/p&gt;

&lt;p&gt;If you've been writing Go and wondering whether you're missing out by not using generics, the answer is almost certainly no. Code without them is still idiomatic. Code with them, when the use case fits, is cleaner. Neither is dominant. Both are fine.&lt;/p&gt;

&lt;p&gt;The one concrete thing I'd say: &lt;strong&gt;learn the generic parts of the standard library&lt;/strong&gt;. &lt;code&gt;slices&lt;/code&gt;, &lt;code&gt;maps&lt;/code&gt;, &lt;code&gt;cmp.Ordered&lt;/code&gt;. Use them reflexively. Stop hand-rolling &lt;code&gt;indexOf&lt;/code&gt; and &lt;code&gt;contains&lt;/code&gt;. Everything else can wait until you have a real problem that generics solve.&lt;/p&gt;
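&lt;p&gt;A quick tour of what that looks like in practice, assuming Go 1.21 or later, where &lt;code&gt;slices&lt;/code&gt; and &lt;code&gt;cmp&lt;/code&gt; are in the standard library. The &lt;code&gt;user&lt;/code&gt; type and &lt;code&gt;byAge&lt;/code&gt; helper are hypothetical.&lt;/p&gt;

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// user is a hypothetical record type for this sketch.
type user struct {
	Name string
	Age  int
}

// byAge returns a copy of users sorted youngest first, leaving the
// input slice untouched.
func byAge(users []user) []user {
	out := slices.Clone(users)
	slices.SortFunc(out, func(a, b user) int { return cmp.Compare(a.Age, b.Age) })
	return out
}

func main() {
	ids := []int{42, 7, 19}
	fmt.Println(slices.Contains(ids, 7)) // true
	fmt.Println(slices.Index(ids, 19))   // 2
	slices.Sort(ids)
	fmt.Println(ids)             // [7 19 42]
	fmt.Println(slices.Max(ids)) // 42

	fmt.Println(byAge([]user{{"bo", 41}, {"al", 29}})[0].Name) // al
}
```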




&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-profiling-pprof-escape-analysis-inlining/" rel="noopener noreferrer"&gt;Go Profiling in Anger: pprof, Escape Analysis, and Inlining Without Magic&lt;/a&gt; — the performance toolchain that tells you whether your generic code actually matches the hand-written version.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-sync-pool-buffer-reuse-when-it-helps/" rel="noopener noreferrer"&gt;sync.Pool in Go: When It Actually Helps, and When It Quietly Hurts&lt;/a&gt; — another feature most commonly misapplied.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/scale-up-scale-out-every-language-wins-somewhere/" rel="noopener noreferrer"&gt;Scale-Up vs Scale-Out: Why Every Language Wins Somewhere&lt;/a&gt; — the meta-question behind every language-feature debate.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>generics</category>
      <category>typeparameters</category>
    </item>
    <item>
      <title>Go Profiling in Anger: pprof, Escape Analysis, and Inlining Without Magic</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Mon, 20 Apr 2026 16:29:23 +0000</pubDate>
      <link>https://dev.to/harrisonsec/go-profiling-in-anger-pprof-escape-analysis-and-inlining-without-magic-3ij</link>
      <guid>https://dev.to/harrisonsec/go-profiling-in-anger-pprof-escape-analysis-and-inlining-without-magic-3ij</guid>
      <description>&lt;p&gt;Go's performance culture has a ritual quality. "Use sync.Pool." "Avoid interface boxing." "Preallocate slices." Copy-pasted from blog posts and applied without measurement. Sometimes helpful. Often hollow.&lt;/p&gt;

&lt;p&gt;The honest answer is that Go performance work is mostly &lt;strong&gt;just profiling&lt;/strong&gt;. Good profiling tells you what's actually slow. Bad profiling — or no profiling — leaves you guessing. The toolchain that Go ships with is genuinely excellent; more engineers should use it, and fewer should follow checklist optimizations they haven't measured.&lt;/p&gt;

&lt;p&gt;This is a practical, end-to-end guide to pprof, escape analysis, and inlining — the three Go-specific tools that answer most performance questions.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — Start every Go perf investigation with a CPU pprof of the hot path under realistic load. 80% of issues are obvious in the flame graph. For the remaining 20%, add a heap profile and look for allocation pressure driving GC. Only after you've localized the problem with real data should you reach for micro-optimizations: escape analysis via &lt;code&gt;-gcflags='-m'&lt;/code&gt;, inlining hints, and targeted benchmark-driven rewrites. Skip the profile step, and you are optimizing the wrong thing.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Investigation Flow
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBTdGFydChbUGVyZm9ybWFuY2UgY29uY2Vybl0pIC0tPiBDUFVbVGFrZSBDUFUgcHJvZmlsZTxici8-LWh0dHAgcHByb2YgwrcgMzBzIHVuZGVyIGxvYWRdCiAgICBDUFUgLS0-IEhvdHtIb3QgY29kZTxici8-b2J2aW91cz99CiAgICBIb3QgLS0-fFllc3wgRml4MVtGaXggdGhlIGhvdCBwYXRoIMK3IHJlLW1lYXN1cmVdCiAgICBIb3QgLS0-fE5vIMK3IEdDIGhpZ2h8IEhlYXBbVGFrZSBoZWFwIC8gYWxsb2MgcHJvZmlsZV0KICAgIEhlYXAgLS0-IEFsbG9jU2l0ZXtTcGVjaWZpYzxici8-YWxsb2Mgc2l0ZT99CiAgICBBbGxvY1NpdGUgLS0-fFllc3wgRXNjYXBlW0NoZWNrIC1nY2ZsYWdzPSctbSc8YnIvPmZvciB0aGF0IGZ1bmN0aW9uXQogICAgQWxsb2NTaXRlIC0tPnxOb3wgQmVuY2hNaWNyb1tJc29sYXRlIGluIGJlbmNobWFyazxici8-LWJlbmNobWVtIMK3IC1jb3VudD01XQogICAgRXNjYXBlIC0tPiBGaXgyW0ZpeCBhbGxvYyDCtyByZS1tZWFzdXJlXQogICAgQmVuY2hNaWNybyAtLT4gRml4M1tPcHRpbWl6ZSBvciBhY2NlcHRdCiAgICBGaXgxIC0tPiBWZXJpZnlbUHJvZmlsZSBhZ2FpbiDCtyBjb25maXJtXQogICAgRml4MiAtLT4gVmVyaWZ5CiAgICBGaXgzIC0tPiBWZXJpZnkKCiAgICBjbGFzc0RlZiBzdGFydCBmaWxsOiNlOGY0Zjgsc3Ryb2tlOiMyYzUyODIKICAgIGNsYXNzRGVmIGFjdGlvbiBmaWxsOiNmMGZmZjQsc3Ryb2tlOiMyZjg1NWEKICAgIGNsYXNzRGVmIHZlcmlmeSBmaWxsOiNmZWY1ZTcsc3Ryb2tlOiNiNzc5MWYKICAgIGNsYXNzIFN0YXJ0IHN0YXJ0CiAgICBjbGFzcyBGaXgxLEZpeDIsRml4MyBhY3Rpb24KICAgIGNsYXNzIFZlcmlmeSB2ZXJpZnk%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBTdGFydChbUGVyZm9ybWFuY2UgY29uY2Vybl0pIC0tPiBDUFVbVGFrZSBDUFUgcHJvZmlsZTxici8-LWh0dHAgcHByb2YgwrcgMzBzIHVuZGVyIGxvYWRdCiAgICBDUFUgLS0-IEhvdHtIb3QgY29kZTxici8-b2J2aW91cz99CiAgICBIb3QgLS0-fFllc3wgRml4MVtGaXggdGhlIGhvdCBwYXRoIMK3IHJlLW1lYXN1cmVdCiAgICBIb3QgLS0-fE5vIMK3IEdDIGhpZ2h8IEhlYXBbVGFrZSBoZWFwIC8gYWxsb2MgcHJvZmlsZV0KICAgIEhlYXAgLS0-IEFsbG9jU2l0ZXtTcGVjaWZpYzxici8-YWxsb2Mgc2l0ZT99CiAgICBBbGxvY1NpdGUgLS0-fFllc3wgRXNjYXBlW0NoZWNrIC1nY2ZsYWdzPSctbSc8YnIvPmZvciB0aGF0IGZ1bmN0aW9uXQogICAgQWxsb2NTaXRlIC0tPnxOb3wgQmVuY2hNaWNyb1tJc29sYXRlIGluIGJlbmNobWFyazxici8-LWJlbmNobWVtIMK3IC1jb3VudD01XQogICAgRXNjYXBlIC0tPiBGaXgyW0ZpeCBhbGxvYyDCtyByZS1tZWFzdXJlXQogICAgQmVuY2hNaWNybyAtLT4gRml4M1tPcHRpbWl6ZSBvciBhY2NlcHRdCiAgICBGaXgxIC0tPiBWZXJpZnlbUHJvZmlsZSBhZ2FpbiDCtyBjb25maXJtXQogICAgRml4MiAtLT4gVmVyaWZ5CiAgICBGaXgzIC0tPiBWZXJpZnkKCiAgICBjbGFzc0RlZiBzdGFydCBmaWxsOiNlOGY0Zjgsc3Ryb2tlOiMyYzUyODIKICAgIGNsYXNzRGVmIGFjdGlvbiBmaWxsOiNmMGZmZjQsc3Ryb2tlOiMyZjg1NWEKICAgIGNsYXNzRGVmIHZlcmlmeSBmaWxsOiNmZWY1ZTcsc3Ryb2tlOiNiNzc5MWYKICAgIGNsYXNzIFN0YXJ0IHN0YXJ0CiAgICBjbGFzcyBGaXgxLEZpeDIsRml4MyBhY3Rpb24KICAgIGNsYXNzIFZlcmlmeSB2ZXJpZnk%3D" alt="Start([Performance concern]) --&gt; CPU[Take CPU profile&lt;br/&gt;-http pprof · 30s under load]" width="803" height="1086"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  CPU Profiling: The First Thing, Always
&lt;/h2&gt;

&lt;p&gt;Every Go binary can expose a pprof HTTP endpoint with one import and one call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"net/http"&lt;/span&gt;
    &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="s"&gt;"net/http/pprof"&lt;/span&gt; &lt;span class="c"&gt;// registers /debug/pprof/* on http.DefaultServeMux&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c"&gt;// later, e.g. at the top of main&lt;/span&gt;
&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ListenAndServe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"localhost:6060"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under load, grab a CPU profile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;go tool pprof &lt;span class="nt"&gt;-http&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;:9999 &lt;span class="s1"&gt;'http://localhost:6060/debug/pprof/profile?seconds=30'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This opens a flame graph in your browser. The wide blocks are where CPU time is spent. Usually the answer is immediate — "oh, JSON encoding is 40% of my CPU; let me switch to a faster encoder." Or "regex compilation is in the hot path because someone forgot to pre-compile."&lt;/p&gt;

&lt;p&gt;A few things that look surprising on first profile but shouldn't:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;runtime.mallocgc&lt;/code&gt; taking 10%+&lt;/strong&gt; is GC pressure. You're allocating a lot. Look at heap profile next.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;runtime.schedule&lt;/code&gt; or &lt;code&gt;runtime.findrunnable&lt;/code&gt; taking 5%+&lt;/strong&gt; means you have too many goroutines churning. Check if you're spawning per-request.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;syscall.Syscall&lt;/code&gt; high&lt;/strong&gt; means you're system-call-heavy, usually from I/O. Buffer or batch first; if it's truly in your hot path, consider driving epoll directly instead of going through the runtime's netpoller.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;sync.(*Mutex).Lock&lt;/code&gt; visible&lt;/strong&gt; means contention. Either shrink the lock hold time or shard the lock.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Don't guess your way through these. Click into each, read the stack, find the user code that caused it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Heap Profiling: When CPU Points to GC
&lt;/h2&gt;

&lt;p&gt;If &lt;code&gt;runtime.mallocgc&lt;/code&gt; shows up in your CPU profile as a non-trivial chunk, heap profile tells you why:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;go tool pprof &lt;span class="nt"&gt;-http&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;:9999 http://localhost:6060/debug/pprof/heap
&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;go tool pprof &lt;span class="nt"&gt;-http&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;:9999 http://localhost:6060/debug/pprof/allocs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;heap&lt;/code&gt; shows current memory usage. &lt;code&gt;allocs&lt;/code&gt; shows cumulative allocations since program start — this is usually what you want to optimize.&lt;/p&gt;

&lt;p&gt;In the flame graph, look for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Specific allocation sites taking disproportionate share.&lt;/strong&gt; A single line of code creating 50% of allocations is an obvious target.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calls to &lt;code&gt;makeslice&lt;/code&gt;, &lt;code&gt;makemap&lt;/code&gt;, &lt;code&gt;newobject&lt;/code&gt;&lt;/strong&gt; with known-size inputs. If you know the size, preallocate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interface boxing in hot paths.&lt;/strong&gt; Every time you pass a concrete type through an &lt;code&gt;interface{}&lt;/code&gt; argument in a tight loop, the runtime may heap-allocate the boxed value.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;String concatenation with &lt;code&gt;+&lt;/code&gt; in a loop.&lt;/strong&gt; This is the textbook preventable allocation: use &lt;code&gt;strings.Builder&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal isn't "zero allocations" — that's usually not practical. The goal is "allocations per operation in a tight, repeated path are bounded and understood."&lt;/p&gt;
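&lt;p&gt;To make "bounded and understood" concrete, here is a small sketch that counts allocations for the string-concatenation case with &lt;code&gt;testing.AllocsPerRun&lt;/code&gt;. The function names and the input are made up, and the exact counts depend on input size and Go version, so the program prints them instead of promising numbers.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps results alive so the measured calls are not optimized away.
var sink string

// concatPlus builds the result with +=, which allocates a fresh string
// on most iterations.
func concatPlus(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// concatBuilder computes the final size, grows a strings.Builder once,
// and appends into that single buffer.
func concatBuilder(parts []string) string {
	n := 0
	for _, p := range parts {
		n += len(p)
	}
	var b strings.Builder
	b.Grow(n)
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

// allocsPerCall reports the average number of heap allocations per call of f.
func allocsPerCall(f func()) float64 {
	return testing.AllocsPerRun(100, f)
}

func main() {
	parts := strings.Split(strings.Repeat("segment,", 64), ",")
	fmt.Println("+= allocs/op:     ", allocsPerCall(func() { sink = concatPlus(parts) }))
	fmt.Println("Builder allocs/op:", allocsPerCall(func() { sink = concatBuilder(parts) }))
}
```

&lt;p&gt;The same pattern (known size, allocate once) applies to &lt;code&gt;make([]T, 0, n)&lt;/code&gt; and &lt;code&gt;make(map[K]V, n)&lt;/code&gt;.&lt;/p&gt;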

&lt;h2&gt;
  
  
  Escape Analysis: The Compiler's Story
&lt;/h2&gt;

&lt;p&gt;Go's compiler decides at compile time whether a variable lives on the stack (effectively free, reclaimed when the function returns) or the heap (allocated, GC-tracked). This is called escape analysis.&lt;/p&gt;

&lt;p&gt;To see the analysis for your code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;go build &lt;span class="nt"&gt;-gcflags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'-m'&lt;/span&gt; ./...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./foo.go:12:6: can inline hotFunction
./foo.go:15:10: &amp;amp;Thing{} escapes to heap
./foo.go:18:14: make([]int, 100) does not escape
./foo.go:22:6: leaking param: x
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key things to read for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;escapes to heap&lt;/code&gt;&lt;/strong&gt;: this value is heap-allocated. If it's in a hot path, investigate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;does not escape&lt;/code&gt;&lt;/strong&gt;: stack-allocated, free. You want most short-lived locals to do this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;leaking param&lt;/code&gt;&lt;/strong&gt;: the function lets this parameter's referent outlive the call (by storing or returning it), so whatever the caller passes in may have to live on the heap. Often fixable by taking a copy or not storing a reference.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most common surprise: &lt;strong&gt;passing a value to a function that eventually hands it to &lt;code&gt;interface{}&lt;/code&gt; causes the value to escape&lt;/strong&gt;. A pattern like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;handleRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"got request"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// req.ID boxes to interface{} and may escape&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;req.ID&lt;/code&gt; escapes because of the &lt;code&gt;...interface{}&lt;/code&gt; argument. In a tight path, this is measurable. Fix: use a typed logger that takes concrete types, or accept the cost, because a log call is rarely what dominates a hot path.&lt;/p&gt;

&lt;p&gt;Escape analysis is one of those things where reading the output a few times is worth it. You start seeing your code differently.&lt;/p&gt;
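&lt;p&gt;A minimal pair to practice on, with hypothetical names. Building it with &lt;code&gt;-gcflags='-m'&lt;/code&gt; should report that the &lt;code&gt;new(point)&lt;/code&gt; in &lt;code&gt;newPoint&lt;/code&gt; escapes to the heap, while the local in &lt;code&gt;sumPoint&lt;/code&gt; does not. The &lt;code&gt;//go:noinline&lt;/code&gt; directives are there only so inlining doesn't blur the picture, and &lt;code&gt;testing.AllocsPerRun&lt;/code&gt; confirms the result at run time.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y int }

// sink keeps the returned pointer alive between calls.
var sink *point

// newPoint returns a pointer to the point it creates, so the value must
// outlive the call and the compiler places it on the heap.
//
//go:noinline
func newPoint(x, y int) *point {
	p := new(point)
	p.x, p.y = x, y
	return p
}

// sumPoint only uses its point locally, so the value can stay on the stack.
//
//go:noinline
func sumPoint(x, y int) int {
	p := point{x: x, y: y}
	return p.x + p.y
}

// allocsPerCall reports the average number of heap allocations per call of f.
func allocsPerCall(f func()) float64 {
	return testing.AllocsPerRun(1000, f)
}

func main() {
	fmt.Println("newPoint allocs/op:", allocsPerCall(func() { sink = newPoint(1, 2) }))
	fmt.Println("sumPoint allocs/op:", allocsPerCall(func() { _ = sumPoint(1, 2) }))
}
```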

&lt;h2&gt;
  
  
  Inlining: When the Compiler Eliminates the Call
&lt;/h2&gt;

&lt;p&gt;Go's compiler inlines small functions to avoid call overhead. To see what got inlined:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;go build &lt;span class="nt"&gt;-gcflags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'-m'&lt;/span&gt; ./... 2&amp;gt;&amp;amp;1 | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'can inline|cannot inline'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./foo.go:12:6: can inline hotFunction
./foo.go:18:6: cannot inline bigFunction: function too complex: cost 117 exceeds budget 80
./foo.go:22:6: cannot inline interfacingFunction: call to unknown method
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The default budget is a cost of 80, roughly one unit per AST node. Hard blockers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Calls through interfaces.&lt;/strong&gt; The compiler doesn't know what concrete method gets called. No inlining.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Functions containing &lt;code&gt;select&lt;/code&gt;, &lt;code&gt;go&lt;/code&gt;, or &lt;code&gt;recover&lt;/code&gt;.&lt;/strong&gt; Loops used to be on this list too; since Go 1.16 the inliner can handle functions with unlabeled &lt;code&gt;for&lt;/code&gt; loops.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recursive functions.&lt;/strong&gt; Obvious.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Functions over the budget.&lt;/strong&gt; Refactor smaller if the call is hot.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When to care:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Never in normal code. Go inlines what it can; your code runs.&lt;/li&gt;
&lt;li&gt;Sometimes in tight hot loops where the call overhead is 10%+ of the total work. Benchmark shows it.&lt;/li&gt;
&lt;li&gt;Occasionally when you control an interface boundary and can replace it with a concrete type on a hot path.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Don't structure your code around inlining. Code readability beats hypothetical call-overhead wins in nearly every case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmarks: The Ground Truth
&lt;/h2&gt;

&lt;p&gt;Every perf claim should be backed by a benchmark. &lt;code&gt;testing.B&lt;/code&gt; is the tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;BenchmarkEncodeResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;newResponse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReportAllocs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResetTimer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-bench&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;BenchmarkEncode &lt;span class="nt"&gt;-benchmem&lt;/span&gt; &lt;span class="nt"&gt;-count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;-count=5&lt;/code&gt; runs each benchmark five times, so you can see the run-to-run variance. Don't trust a single run. Hardware, OS scheduling, and thermals all add noise.&lt;/p&gt;

&lt;p&gt;For comparing two implementations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-bench&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;BenchmarkEncodeResponse &lt;span class="nt"&gt;-benchmem&lt;/span&gt; &lt;span class="nt"&gt;-count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10 ./... &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; old.txt
&lt;span class="gp"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;change code&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-bench&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;BenchmarkEncodeResponse &lt;span class="nt"&gt;-benchmem&lt;/span&gt; &lt;span class="nt"&gt;-count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10 ./... &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; new.txt
&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;benchstat old.txt new.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;benchstat&lt;/code&gt; (&lt;code&gt;golang.org/x/perf/cmd/benchstat&lt;/code&gt;) summarizes each set of runs and tells you whether the difference between them is statistically significant. If it isn't, you didn't actually improve anything. You just rolled the dice differently.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 80/20 of Go Performance
&lt;/h2&gt;

&lt;p&gt;After enough of this work, a few patterns dominate the real wins:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Query shape, not language.&lt;/strong&gt; A slow endpoint is usually doing 10 DB queries when it could do 1. Go is almost never the bottleneck; the data layer is.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network hop count.&lt;/strong&gt; Every inter-service call adds latency. Merging two small services or co-locating tight integrations beats any language-level optimization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caching at the right layer.&lt;/strong&gt; A well-placed LRU cache saves more than micro-optimizing the uncached path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preallocating known-size slices/maps.&lt;/strong&gt; When you know n, &lt;code&gt;make([]int, 0, n)&lt;/code&gt; is almost free. A slice created with &lt;code&gt;make([]int, 0)&lt;/code&gt; has to reallocate and copy repeatedly as you append.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoiding interface boxing in loops.&lt;/strong&gt; This is the one micro-optimization that regularly shows up in real profiles.&lt;/li&gt;
&lt;/ol&gt;
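
&lt;p&gt;To make item 4 concrete, here is a minimal sketch that counts heap allocations for both forms. The helper names (&lt;code&gt;fillGrow&lt;/code&gt;, &lt;code&gt;fillPrealloc&lt;/code&gt;, &lt;code&gt;allocsPer&lt;/code&gt;) are made up for illustration; &lt;code&gt;testing.AllocsPerRun&lt;/code&gt; from the standard library does the counting.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"testing"
)

// fillGrow appends n ints to a slice that starts with zero capacity,
// so the runtime has to grow the backing array as the slice fills up.
func fillGrow(n int) []int {
	s := make([]int, 0)
	for i := 0; i != n; i++ {
		s = append(s, i)
	}
	return s
}

// fillPrealloc appends n ints to a slice whose capacity is reserved up front.
func fillPrealloc(n int) []int {
	s := make([]int, 0, n)
	for i := 0; i != n; i++ {
		s = append(s, i)
	}
	return s
}

// sink keeps the results reachable so the slices are heap-allocated
// rather than optimized away.
var sink []int

// allocsPer reports the average number of heap allocations per call to fill(n).
func allocsPer(fill func(int) []int, n int) float64 {
	return testing.AllocsPerRun(100, func() { sink = fill(n) })
}

func main() {
	const n = 1000
	fmt.Printf("grow:     %.0f allocs/op\n", allocsPer(fillGrow, n))
	fmt.Printf("prealloc: %.0f allocs/op\n", allocsPer(fillPrealloc, n))
}
```

&lt;p&gt;The preallocated version should settle at a single allocation per call, while the growing version pays one allocation for every capacity bump. The exact count for the growing version depends on the growth policy of the Go release you run it on.&lt;/p&gt;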

&lt;p&gt;Everything else — &lt;code&gt;sync.Pool&lt;/code&gt;, escape analysis hand-tuning, loop unrolling — is a long-tail optimization. Worth it when profiling tells you it is. Premature otherwise.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Habit I Recommend
&lt;/h2&gt;

&lt;p&gt;Before adding any optimization, do exactly three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Take a profile with the optimization off. Save it.&lt;/li&gt;
&lt;li&gt;Apply the optimization.&lt;/li&gt;
&lt;li&gt;Take a profile with the optimization on. Compare.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the comparison doesn't show clear improvement on the metric you cared about, revert. Do not add complexity without evidence.&lt;/p&gt;

&lt;p&gt;This sounds obvious. Almost nobody does it. Most Go codebases accumulate dead optimizations that add nothing or actively hurt, and nobody knows which, because nobody measured.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Habit That Compounds
&lt;/h2&gt;

&lt;p&gt;Go's performance tooling is better than Go's performance culture gives it credit for. pprof, escape analysis, inlining diagnostics, and benchmarks are built in. They're precise. They tell you the truth.&lt;/p&gt;

&lt;p&gt;The reason most Go code isn't as fast as it could be isn't that Go is slow (it isn't). It's that engineers copy-paste optimizations they haven't measured, call the work done, and move on. The few engineers who profile first and optimize second write code that's actually fast — and usually simpler than the ritual-heavy version.&lt;/p&gt;

&lt;p&gt;Profile first. Everything else follows.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-sync-pool-buffer-reuse-when-it-helps/" rel="noopener noreferrer"&gt;sync.Pool in Go: When It Actually Helps, and When It Quietly Hurts&lt;/a&gt; — the one Go optimization most likely to be misapplied.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-millions-connections-user-space-context-switching/" rel="noopener noreferrer"&gt;Why Go Handles Millions of Connections: User-Space Context Switching, Explained&lt;/a&gt; — understanding the runtime is the prerequisite to understanding profiles.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/testing-real-world-go-backends/" rel="noopener noreferrer"&gt;Testing Real-World Go Backends Isn't What Many People Think&lt;/a&gt; — benchmarking is the last mile of testing.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>performance</category>
      <category>pprof</category>
    </item>
  </channel>
</rss>
