<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: amir</title>
    <description>The latest articles on DEV Community by amir (@amirsefati).</description>
    <link>https://dev.to/amirsefati</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F161199%2F1386903f-a273-4172-a7ba-0585d3e4d5dd.jpeg</url>
      <title>DEV Community: amir</title>
      <link>https://dev.to/amirsefati</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/amirsefati"/>
    <language>en</language>
    <item>
      <title>From Elasticsearch Bottlenecks to Weaviate: How We Built Fast Hybrid Search in Production</title>
      <dc:creator>amir</dc:creator>
      <pubDate>Wed, 03 Jun 2026 09:36:55 +0000</pubDate>
      <link>https://dev.to/amirsefati/from-elasticsearch-bottlenecks-to-weaviate-how-we-built-fast-hybrid-search-in-production-b4i</link>
      <guid>https://dev.to/amirsefati/from-elasticsearch-bottlenecks-to-weaviate-how-we-built-fast-hybrid-search-in-production-b4i</guid>
      <description>&lt;p&gt;For years, Elasticsearch was one of those tools I would almost automatically reach for whenever a system needed search.&lt;/p&gt;

&lt;p&gt;And honestly, for many use cases, it is still excellent.&lt;/p&gt;

&lt;p&gt;If you need full-text search, filtering, aggregations, faceting, observability queries, or log exploration, Elasticsearch is a very mature and powerful engine. Its lexical search capabilities, especially through BM25 and inverted indexes, are battle-tested.&lt;/p&gt;

&lt;p&gt;But my problem started when the product requirement changed from:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Find documents that contain these words.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Find documents that match the meaning of what the user is asking, but still respect exact keywords when they matter.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is where pure keyword search was no longer enough.&lt;/p&gt;

&lt;p&gt;We needed &lt;strong&gt;real hybrid search&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Not just semantic search.&lt;br&gt;&lt;br&gt;
Not just BM25.&lt;br&gt;&lt;br&gt;
Not a manually glued-together ranking system.&lt;/p&gt;

&lt;p&gt;We needed a search engine that could combine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;exact keyword matching,&lt;/li&gt;
&lt;li&gt;semantic similarity,&lt;/li&gt;
&lt;li&gt;metadata filtering,&lt;/li&gt;
&lt;li&gt;predictable latency,&lt;/li&gt;
&lt;li&gt;and production-level throughput.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At first, I tried to make Elasticsearch do it.&lt;/p&gt;

&lt;p&gt;That decision taught me a lot.&lt;/p&gt;

&lt;p&gt;Eventually, it also pushed me toward Weaviate.&lt;/p&gt;

&lt;p&gt;This article is a practical breakdown of what went wrong, why hybrid search is harder than it looks, how Weaviate approaches the problem, and how I think about evaluating vector search quality in production.&lt;/p&gt;


&lt;h2&gt;
  
  
  The problem: keyword search was no longer enough
&lt;/h2&gt;

&lt;p&gt;Traditional search engines are very good at lexical matching.&lt;/p&gt;

&lt;p&gt;If a user searches for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;linux kernel tuning
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;BM25 can rank documents that contain terms like &lt;code&gt;linux&lt;/code&gt;, &lt;code&gt;kernel&lt;/code&gt;, and &lt;code&gt;tuning&lt;/code&gt; very well.&lt;/p&gt;

&lt;p&gt;But what happens when the user searches for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;how to reduce context switching overhead in a container runtime
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The best document might not contain that exact sentence.&lt;/p&gt;

&lt;p&gt;It might talk about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linux namespaces&lt;/li&gt;
&lt;li&gt;cgroups&lt;/li&gt;
&lt;li&gt;scheduler behavior&lt;/li&gt;
&lt;li&gt;CPU pressure&lt;/li&gt;
&lt;li&gt;process isolation&lt;/li&gt;
&lt;li&gt;runtime overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A pure keyword engine may miss that document or rank it lower because its vocabulary differs from the query.&lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;semantic search&lt;/strong&gt; becomes useful.&lt;/p&gt;

&lt;p&gt;Instead of only matching words, semantic search converts both documents and queries into vectors, usually called &lt;strong&gt;embeddings&lt;/strong&gt;. These embeddings represent the meaning of the text in a high-dimensional space.&lt;/p&gt;

&lt;p&gt;Documents with similar meaning end up close to each other in that vector space.&lt;/p&gt;

&lt;p&gt;So now the search engine can understand that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"improve API latency under load"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;is related to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"reduce p99 response time during high traffic"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;even if the exact words are different.&lt;/p&gt;

&lt;p&gt;But semantic search alone is also not perfect.&lt;/p&gt;

&lt;p&gt;Sometimes exact terms matter a lot.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PostgreSQL jsonb index performance
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this query, the terms &lt;code&gt;PostgreSQL&lt;/code&gt;, &lt;code&gt;jsonb&lt;/code&gt;, and &lt;code&gt;index&lt;/code&gt; are not optional. A purely semantic result about generic database performance may sound similar, but it is not good enough.&lt;/p&gt;

&lt;p&gt;That is why hybrid search matters.&lt;/p&gt;

&lt;p&gt;A strong search system should understand both:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;What the user literally typed&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;What the user actually means&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  My first approach: forcing Elasticsearch to behave like a vector search engine
&lt;/h2&gt;

&lt;p&gt;The first implementation was based on Elasticsearch.&lt;/p&gt;

&lt;p&gt;The idea looked simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use BM25 for keyword scoring.&lt;/li&gt;
&lt;li&gt;Use vector similarity for semantic scoring.&lt;/li&gt;
&lt;li&gt;Combine both scores into one final score.&lt;/li&gt;
&lt;li&gt;Return the top results.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Conceptually, it made sense.&lt;/p&gt;

&lt;p&gt;In practice, it became painful.&lt;/p&gt;

&lt;p&gt;The challenge was score fusion.&lt;/p&gt;

&lt;p&gt;BM25 scores and vector similarity scores are not naturally comparable.&lt;/p&gt;

&lt;p&gt;BM25 might produce values like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;12.4
27.8
43.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;while cosine similarity might produce values like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0.71
0.82
0.91
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you combine them naively, the ranking becomes unstable.&lt;/p&gt;

&lt;p&gt;You cannot simply do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;final_score = bm25_score + cosine_similarity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;because the scales are completely different.&lt;/p&gt;

&lt;p&gt;You need normalization, weighting, ranking logic, and careful testing.&lt;/p&gt;
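&lt;p&gt;To make the scale problem concrete, here is a minimal Go sketch of min-max normalization followed by a weighted sum. The function names (&lt;code&gt;minMax&lt;/code&gt;, &lt;code&gt;weightedSum&lt;/code&gt;) and the 0.5 weights are assumptions chosen for illustration; this is one common option, not a description of any engine's internals.&lt;/p&gt;

```go
package main

import "fmt"

// minMax rescales scores into [0, 1]. If all scores are equal it returns zeros.
func minMax(scores []float64) []float64 {
	out := make([]float64, len(scores))
	if len(scores) == 0 {
		return out
	}
	lo, hi := scores[0], scores[0]
	for _, s := range scores {
		if lo > s {
			lo = s
		}
		if s > hi {
			hi = s
		}
	}
	if hi == lo {
		return out
	}
	for i, s := range scores {
		out[i] = (s - lo) / (hi - lo)
	}
	return out
}

// weightedSum blends two score lists of equal length, document by document.
func weightedSum(a, b []float64, wa, wb float64) []float64 {
	out := make([]float64, len(a))
	for i := range a {
		out[i] = wa*a[i] + wb*b[i]
	}
	return out
}

func main() {
	bm25 := []float64{12.4, 27.8, 43.1}   // raw BM25 scores for three documents
	cosine := []float64{0.91, 0.82, 0.71} // raw cosine similarities for the same documents

	// Without normalization the BM25 values dominate the sum.
	fmt.Println(weightedSum(bm25, cosine, 0.5, 0.5))
	// After normalization both signals contribute on the same scale.
	fmt.Println(weightedSum(minMax(bm25), minMax(cosine), 0.5, 0.5))
}
```

&lt;p&gt;Even this small version shows the weakness: min-max depends on the best and worst score in each result set, so a single outlier can shift every other document.&lt;/p&gt;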

&lt;p&gt;In my first version, I tried using custom scripting logic to combine lexical and vector scores at query time.&lt;/p&gt;

&lt;p&gt;That was the mistake.&lt;/p&gt;




&lt;h2&gt;
  
  
  The production bottleneck
&lt;/h2&gt;

&lt;p&gt;Custom scoring logic can look elegant in a prototype.&lt;/p&gt;

&lt;p&gt;Under production traffic, it can become expensive very quickly.&lt;/p&gt;

&lt;p&gt;The main issue was that vector math and score fusion were happening during query execution. That meant every search request had to do extra scoring work on top of the normal search pipeline.&lt;/p&gt;

&lt;p&gt;As traffic increased, the symptoms became obvious:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU usage started spiking.&lt;/li&gt;
&lt;li&gt;Tail latency became unstable.&lt;/li&gt;
&lt;li&gt;p95 and p99 response times became much worse.&lt;/li&gt;
&lt;li&gt;Scaling required more resources than expected.&lt;/li&gt;
&lt;li&gt;Search quality improvements were difficult to test safely.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The biggest lesson was not that Elasticsearch is bad.&lt;/p&gt;

&lt;p&gt;The lesson was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A system optimized for lexical search is not always the right core for a high-throughput vector search architecture.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Elasticsearch can support vector search in modern versions, and for many teams it may be enough. But in my case, the implementation became too complex and too expensive for the kind of hybrid search behavior we needed.&lt;/p&gt;

&lt;p&gt;I wanted a system where hybrid search was not an afterthought.&lt;/p&gt;

&lt;p&gt;That is when I started looking seriously at Weaviate.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Weaviate felt like the right tool
&lt;/h2&gt;

&lt;p&gt;Weaviate is an open-source vector database written in Go.&lt;/p&gt;

&lt;p&gt;That immediately caught my attention because I work a lot with Go, backend architecture, and performance-sensitive systems. But the language itself was not the main reason.&lt;/p&gt;

&lt;p&gt;The real reason was the architecture.&lt;/p&gt;

&lt;p&gt;Weaviate was designed around vector search from the beginning.&lt;/p&gt;

&lt;p&gt;Instead of treating embeddings as an extra field attached to a traditional search engine, Weaviate treats vectors as a first-class part of the storage and search model.&lt;/p&gt;

&lt;p&gt;At a high level, it gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;vector search,&lt;/li&gt;
&lt;li&gt;BM25 keyword search,&lt;/li&gt;
&lt;li&gt;hybrid search,&lt;/li&gt;
&lt;li&gt;metadata filtering,&lt;/li&gt;
&lt;li&gt;schema-based data modeling,&lt;/li&gt;
&lt;li&gt;GraphQL and REST APIs,&lt;/li&gt;
&lt;li&gt;and production-oriented indexing options.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For my use case, the most important part was that Weaviate could combine semantic and keyword search natively.&lt;/p&gt;

&lt;p&gt;No custom query-time score script.&lt;/p&gt;

&lt;p&gt;No fragile manual scoring layer.&lt;/p&gt;

&lt;p&gt;No forcing two different ranking systems together in application code.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick mental model: how vector search works
&lt;/h2&gt;

&lt;p&gt;Before going deeper into Weaviate, it is useful to understand the basic idea behind vector search.&lt;/p&gt;

&lt;p&gt;A machine learning model converts text into an embedding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Linux kernel performance tuning"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;becomes something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[0.012, -0.431, 0.227, 0.091, ...]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That vector may have hundreds or thousands of dimensions depending on the embedding model.&lt;/p&gt;

&lt;p&gt;The same thing happens to every document in your database.&lt;/p&gt;

&lt;p&gt;When a user sends a query, the query is also converted into a vector. Then the search engine tries to find the nearest vectors.&lt;/p&gt;

&lt;p&gt;The simplest way to think about this is distance.&lt;/p&gt;

&lt;p&gt;For example, Euclidean distance between two vectors can be represented as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;d(p, q) = sqrt(sum((q_i - p_i)^2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Another common approach is cosine similarity, which compares the angle between two vectors.&lt;/p&gt;
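&lt;p&gt;Both measures fit in a few lines of Go. This is a minimal sketch that assumes equal-length, non-zero vectors; the function names are mine, and a real engine would use optimized implementations.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math"
)

// euclidean returns the straight-line distance between p and q.
// It assumes both vectors have the same length.
func euclidean(p, q []float64) float64 {
	var sum float64
	for i := range p {
		d := q[i] - p[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// cosine returns the cosine of the angle between p and q.
// It assumes equal lengths and non-zero vectors.
func cosine(p, q []float64) float64 {
	var dot, np, nq float64
	for i := range p {
		dot += p[i] * q[i]
		np += p[i] * p[i]
		nq += q[i] * q[i]
	}
	return dot / (math.Sqrt(np) * math.Sqrt(nq))
}

func main() {
	p := []float64{3, 0}
	q := []float64{0, 4}
	fmt.Println(euclidean(p, q)) // 5
	fmt.Println(cosine(p, q))    // 0 (the vectors are orthogonal)
}
```

&lt;p&gt;Euclidean distance is sensitive to vector length, while cosine similarity only looks at direction, which is why the choice of metric should match how the embedding model was trained.&lt;/p&gt;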

&lt;p&gt;In real production systems, searching every vector one by one would be too slow at scale. If you have millions of documents, exact brute-force search is usually not practical.&lt;/p&gt;

&lt;p&gt;That is why vector databases use &lt;strong&gt;Approximate Nearest Neighbor&lt;/strong&gt; algorithms, usually called ANN.&lt;/p&gt;

&lt;p&gt;ANN algorithms trade a small amount of recall for a large improvement in speed: they may occasionally miss a true nearest neighbor, but they avoid comparing the query against every vector.&lt;/p&gt;

&lt;p&gt;One of the most popular ANN algorithms is &lt;strong&gt;HNSW&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  HNSW: the graph behind fast vector search
&lt;/h2&gt;

&lt;p&gt;Weaviate uses HNSW, which stands for &lt;strong&gt;Hierarchical Navigable Small World&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can think of HNSW as a graph structure built on top of your vectors.&lt;/p&gt;

&lt;p&gt;Instead of scanning every vector, the engine navigates through a graph to quickly move toward the nearest neighbors.&lt;/p&gt;

&lt;p&gt;A simplified mental model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Similar vectors are connected.&lt;/li&gt;
&lt;li&gt;The graph has multiple layers.&lt;/li&gt;
&lt;li&gt;Higher layers are sparser, so they allow fast long-distance jumps.&lt;/li&gt;
&lt;li&gt;Lower layers refine the search near the best candidates.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes search much faster than brute force while still returning high-quality results.&lt;/p&gt;

&lt;p&gt;For large-scale semantic search, this matters a lot.&lt;/p&gt;

&lt;p&gt;A vector database is not only about storing embeddings. The real value is in how efficiently it can index, traverse, filter, and rank them under load.&lt;/p&gt;




&lt;h2&gt;
  
  
  Hybrid search: why score fusion is harder than it looks
&lt;/h2&gt;

&lt;p&gt;The hardest part of hybrid search is not running BM25.&lt;/p&gt;

&lt;p&gt;It is not running vector search either.&lt;/p&gt;

&lt;p&gt;The hard part is combining the results correctly.&lt;/p&gt;

&lt;p&gt;BM25 and vector search produce different types of scores.&lt;/p&gt;

&lt;p&gt;BM25 is based on term frequency, inverse document frequency, and field-length normalization.&lt;/p&gt;

&lt;p&gt;Vector similarity is based on distance or similarity in embedding space.&lt;/p&gt;

&lt;p&gt;These scores do not mean the same thing.&lt;/p&gt;

&lt;p&gt;So instead of directly adding scores together, a better strategy is often to combine rankings.&lt;/p&gt;

&lt;p&gt;This is where rank fusion becomes useful.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reciprocal Rank Fusion: a better way to combine results
&lt;/h2&gt;

&lt;p&gt;One common technique for hybrid ranking is &lt;strong&gt;Reciprocal Rank Fusion&lt;/strong&gt;, usually shortened to RRF.&lt;/p&gt;

&lt;p&gt;Instead of saying:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“This document has a BM25 score of 30 and a vector score of 0.88, so let’s add them.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;RRF says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Where did this document rank in the keyword result list, and where did it rank in the vector result list?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then it combines ranks.&lt;/p&gt;

&lt;p&gt;A simplified formula looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RRF_score = 1 / (k + rank_keyword) + 1 / (k + rank_vector)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



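&lt;p&gt;Here &lt;code&gt;k&lt;/code&gt; is a smoothing constant (60 is a common default) and ranks start at 1. The exact implementation details can vary, but the idea is powerful: ranking position becomes more important than raw score scale.&lt;/p&gt;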

&lt;p&gt;This avoids many of the problems caused by incompatible scoring systems.&lt;/p&gt;
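&lt;p&gt;The formula above can be sketched in Go as follows. The document IDs and the two input orderings are hypothetical, and &lt;code&gt;k = 60&lt;/code&gt; is an assumed constant; this illustrates plain RRF rather than any specific engine's implementation.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
)

// rrf fuses ranked lists of document IDs. Ranks start at 1, and a document
// missing from a list simply gets no contribution from that list.
func rrf(k float64, lists ...[]string) map[string]float64 {
	scores := make(map[string]float64)
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	return scores
}

// ranked returns IDs ordered by descending fused score, ties broken by ID.
func ranked(scores map[string]float64) []string {
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if scores[ids[i]] != scores[ids[j]] {
			return scores[ids[i]] > scores[ids[j]]
		}
		return ids[j] > ids[i]
	})
	return ids
}

func main() {
	keyword := []string{"doc-a", "doc-b", "doc-c"} // hypothetical BM25 order
	vector := []string{"doc-b", "doc-d", "doc-a"}  // hypothetical vector order
	fmt.Println(ranked(rrf(60, keyword, vector)))  // [doc-b doc-a doc-d doc-c]
}
```

&lt;p&gt;Documents that appear in both lists rise to the top, and no raw BM25 or cosine value ever enters the calculation.&lt;/p&gt;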

&lt;p&gt;In Weaviate, hybrid search also exposes an &lt;code&gt;alpha&lt;/code&gt; parameter, which lets you control the balance between keyword and vector search.&lt;/p&gt;

&lt;p&gt;Conceptually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;alpha = 0.0  -&amp;gt; keyword-focused search
alpha = 1.0  -&amp;gt; vector-focused search
alpha = 0.5  -&amp;gt; balanced hybrid search
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is extremely useful in real systems.&lt;/p&gt;

&lt;p&gt;Different product areas may need different search behavior.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Documentation search may need more semantic matching.&lt;/li&gt;
&lt;li&gt;SKU or product-code search may need stronger keyword matching.&lt;/li&gt;
&lt;li&gt;Support ticket search may need a balance of both.&lt;/li&gt;
&lt;li&gt;Log or error search may need exact matching for IDs and stack traces.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Being able to tune this without rewriting the whole scoring system is a big win.&lt;/p&gt;
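&lt;p&gt;As a rough mental model of what &lt;code&gt;alpha&lt;/code&gt; does, here is a Go sketch of a linear blend over two already-normalized scores. The scores, area names, and alpha values are hypothetical, and Weaviate's own fusion algorithms rank or normalize results before weighting, so this is not a copy of its implementation.&lt;/p&gt;

```go
package main

import "fmt"

// blend mixes a normalized keyword score and a normalized vector score.
// alpha = 0 keeps only the keyword score; alpha = 1 keeps only the vector score.
func blend(alpha, keyword, vector float64) float64 {
	return (1-alpha)*keyword + alpha*vector
}

func main() {
	// Hypothetical normalized scores for one document that matches the
	// keywords strongly but is only loosely related semantically.
	keyword, vector := 0.9, 0.3

	// Hypothetical alpha choices per product area; real values should come
	// from evaluation on your own queries.
	areas := []struct {
		name  string
		alpha float64
	}{
		{"sku-search", 0.1},
		{"support-tickets", 0.5},
		{"documentation", 0.8},
	}
	for _, a := range areas {
		fmt.Printf("%-16s alpha=%.1f score=%.2f\n", a.name, a.alpha, blend(a.alpha, keyword, vector))
	}
}
```

&lt;p&gt;The same document scores very differently depending on the area, which is the behavior you want when exact identifiers matter in one place and meaning matters in another.&lt;/p&gt;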




&lt;h2&gt;
  
  
  Example: hybrid search with the Go client
&lt;/h2&gt;

&lt;p&gt;Here is a simplified example using the Weaviate Go client.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"context"&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
    &lt;span class="s"&gt;"log"&lt;/span&gt;

    &lt;span class="s"&gt;"github.com/weaviate/weaviate-go-client/v5/weaviate"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/weaviate/weaviate-go-client/v5/weaviate/graphql"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;cfg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;weaviate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Host&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="s"&gt;"localhost:8080"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Scheme&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"http"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;weaviate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GraphQL&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;WithClassName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Article"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;WithFields&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;graphql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;graphql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;graphql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"_additional"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Fields&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;graphql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;}},&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;WithHybrid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GraphQL&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HybridArgumentBuilder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
            &lt;span class="n"&gt;WithQuery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"best practices for linux kernel tuning"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
            &lt;span class="n"&gt;WithAlpha&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;WithLimit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Do&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%+v&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, &lt;code&gt;alpha = 0.7&lt;/code&gt; weights the vector results more heavily than the keyword results, but keyword matching still contributes to the final ranking.&lt;/p&gt;

&lt;p&gt;That is exactly the kind of control I wanted.&lt;/p&gt;




&lt;h2&gt;
  
  
  Production benchmark: what changed after migration
&lt;/h2&gt;

&lt;p&gt;In one internal test, I used a dataset of around 1.5 million text documents, including articles, technical notes, and internal documentation.&lt;/p&gt;

&lt;p&gt;The goal was not to create a perfect academic benchmark. The goal was to compare the behavior of the previous implementation with the new architecture under similar load.&lt;/p&gt;

&lt;p&gt;The difference was significant.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Search Type&lt;/th&gt;
&lt;th&gt;Fusion Strategy&lt;/th&gt;
&lt;th&gt;Approx. p99 Latency&lt;/th&gt;
&lt;th&gt;CPU Behavior Under Load&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Elasticsearch&lt;/td&gt;
&lt;td&gt;Hybrid&lt;/td&gt;
&lt;td&gt;Custom query-time scoring&lt;/td&gt;
&lt;td&gt;~850 ms&lt;/td&gt;
&lt;td&gt;Frequent CPU spikes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weaviate&lt;/td&gt;
&lt;td&gt;Hybrid&lt;/td&gt;
&lt;td&gt;Native hybrid ranking&lt;/td&gt;
&lt;td&gt;~45 ms&lt;/td&gt;
&lt;td&gt;Much more stable&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The exact numbers will depend on hardware, schema, vector dimensions, indexing configuration, filters, and query patterns.&lt;/p&gt;

&lt;p&gt;But the important part was not only the latency improvement.&lt;/p&gt;

&lt;p&gt;The bigger win was operational simplicity.&lt;/p&gt;

&lt;p&gt;After the migration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ranking logic became easier to reason about,&lt;/li&gt;
&lt;li&gt;query latency became more predictable,&lt;/li&gt;
&lt;li&gt;CPU usage was more stable,&lt;/li&gt;
&lt;li&gt;scaling was easier,&lt;/li&gt;
&lt;li&gt;and experiments with different &lt;code&gt;alpha&lt;/code&gt; values became much safer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That matters a lot in production.&lt;/p&gt;

&lt;p&gt;Performance is not only about the fastest possible average response time.&lt;/p&gt;

&lt;p&gt;It is also about predictability.&lt;/p&gt;

&lt;p&gt;A search system that is fast 90% of the time but unstable at p99 can still hurt the user experience badly.&lt;/p&gt;
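&lt;p&gt;To see why the tail matters, here is a small Go sketch that computes nearest-rank percentiles from latency samples. The sample data is synthetic and the &lt;code&gt;percentile&lt;/code&gt; helper is my own illustration, not taken from a monitoring library.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank percentile of latencies in milliseconds.
// p is expected to be in the range (0, 100].
func percentile(latenciesMs []float64, p float64) float64 {
	if len(latenciesMs) == 0 {
		return 0
	}
	sorted := append([]float64(nil), latenciesMs...) // copy so the input stays untouched
	sort.Float64s(sorted)
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank == 0 {
		rank = 1
	}
	return sorted[rank-1]
}

func main() {
	// Synthetic sample: 98 fast requests and 2 slow ones.
	var samples []float64
	for i := 0; i != 98; i++ {
		samples = append(samples, 20)
	}
	samples = append(samples, 900, 900)

	fmt.Println(percentile(samples, 50)) // 20
	fmt.Println(percentile(samples, 99)) // 900
}
```

&lt;p&gt;The median hides the problem completely, while p99 exposes it. That is why I track tail percentiles rather than averages when comparing search backends.&lt;/p&gt;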




&lt;h2&gt;
  
  
  The hidden challenge: evaluating search quality
&lt;/h2&gt;

&lt;p&gt;Performance is only half of the story.&lt;/p&gt;

&lt;p&gt;A fast search engine that returns bad results is still a bad search engine.&lt;/p&gt;

&lt;p&gt;After moving to vector or hybrid search, one of the most important questions becomes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How do we know the new search is actually better?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This gets especially important when changing embedding models.&lt;/p&gt;

&lt;p&gt;For example, imagine moving from one embedding model to a newer one.&lt;/p&gt;

&lt;p&gt;The new model may produce better vectors.&lt;/p&gt;

&lt;p&gt;Or it may be worse for your specific domain.&lt;/p&gt;

&lt;p&gt;You cannot rely only on intuition.&lt;/p&gt;

&lt;p&gt;You need evaluation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Metric 1: Mean Reciprocal Rank
&lt;/h2&gt;

&lt;p&gt;Mean Reciprocal Rank, or MRR, measures how early the first relevant result appears.&lt;/p&gt;

&lt;p&gt;If the correct result is ranked first, the reciprocal rank is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 / 1 = 1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If it appears in position 5:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 / 5 = 0.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then you average this across many test queries.&lt;/p&gt;

&lt;p&gt;MRR is useful when the user usually needs one best answer.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"How do I configure cgroups v2 memory limits?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the best document is result number one, the search is doing well.&lt;/p&gt;

&lt;p&gt;If the best document is buried on page three, the user experience is poor.&lt;/p&gt;
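&lt;p&gt;The calculation described above can be sketched in Go like this. The &lt;code&gt;Query&lt;/code&gt; type and the sample data are hypothetical; in practice the relevant IDs would come from your own labeled evaluation set.&lt;/p&gt;

```go
package main

import "fmt"

// Query pairs a ranked result list with the IDs judged relevant for it.
type Query struct {
	Results  []string
	Relevant map[string]bool
}

// reciprocalRank returns 1/position of the first relevant ID in results,
// or 0 when no relevant ID was returned.
func reciprocalRank(results []string, relevant map[string]bool) float64 {
	for i, id := range results {
		if relevant[id] {
			return 1.0 / float64(i+1)
		}
	}
	return 0
}

// mrr averages the reciprocal rank over all queries.
func mrr(queries []Query) float64 {
	if len(queries) == 0 {
		return 0
	}
	var sum float64
	for _, q := range queries {
		sum += reciprocalRank(q.Results, q.Relevant)
	}
	return sum / float64(len(queries))
}

func main() {
	queries := []Query{
		// First relevant result at rank 1.
		{Results: []string{"a", "b", "c"}, Relevant: map[string]bool{"a": true}},
		// First relevant result at rank 4.
		{Results: []string{"x", "y", "z", "w"}, Relevant: map[string]bool{"w": true}},
	}
	fmt.Println(mrr(queries)) // (1.0 + 0.25) / 2 = 0.625
}
```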




&lt;h2&gt;
  
  
  Metric 2: Recall@K
&lt;/h2&gt;

&lt;p&gt;Recall@K answers a different question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Out of all relevant documents, what fraction did we return in the top K results?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For example, Recall@10 measures what fraction of the relevant documents appear in the first 10 results.&lt;/p&gt;

&lt;p&gt;This is useful when there may be multiple correct results.&lt;/p&gt;

&lt;p&gt;For documentation, research, support tickets, or internal knowledge bases, Recall@K can be more useful than only checking the top result.&lt;/p&gt;
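&lt;p&gt;A minimal Go sketch of Recall@K, assuming the relevant documents for a query are known in advance. The IDs below are made up for illustration.&lt;/p&gt;

```go
package main

import "fmt"

// recallAtK returns the fraction of relevant IDs found in the first k results.
// It assumes k is not negative.
func recallAtK(results []string, relevant map[string]struct{}, k int) float64 {
	if len(relevant) == 0 {
		return 0
	}
	if k > len(results) {
		k = len(results)
	}
	hits := 0
	for _, id := range results[:k] {
		if _, ok := relevant[id]; ok {
			hits++
		}
	}
	return float64(hits) / float64(len(relevant))
}

func main() {
	results := []string{"a", "b", "c", "d", "e"}
	// Four documents are relevant, but only two of them were retrieved at all.
	relevant := map[string]struct{}{"a": {}, "d": {}, "y": {}, "z": {}}

	fmt.Println(recallAtK(results, relevant, 3)) // 0.25
	fmt.Println(recallAtK(results, relevant, 5)) // 0.5
}
```

&lt;p&gt;Note that the denominator is the number of relevant documents, not K, so the score cannot reach 1.0 if some relevant documents are never retrieved.&lt;/p&gt;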




&lt;h2&gt;
  
  
  Metric 3: NDCG
&lt;/h2&gt;

&lt;p&gt;NDCG stands for &lt;strong&gt;Normalized Discounted Cumulative Gain&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It is useful when relevance is not binary.&lt;/p&gt;

&lt;p&gt;A result can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;perfect,&lt;/li&gt;
&lt;li&gt;good,&lt;/li&gt;
&lt;li&gt;somewhat related,&lt;/li&gt;
&lt;li&gt;or irrelevant.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;NDCG rewards highly relevant documents appearing near the top of the result list.&lt;/p&gt;

&lt;p&gt;This is closer to how real users experience search.&lt;/p&gt;

&lt;p&gt;A search result ranked #1 matters more than a result ranked #9.&lt;/p&gt;
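&lt;p&gt;Here is a small Go sketch of NDCG with graded relevance. It makes two simplifying assumptions: it uses linear gains (3 = perfect, 2 = good, 1 = somewhat related, 0 = irrelevant), and it computes the ideal ordering only from the documents that were returned. Some evaluation setups use exponential gains or the full judged set instead.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// dcg sums each gain discounted by log2(position + 1), with positions starting at 1.
func dcg(gains []float64) float64 {
	var sum float64
	for i, g := range gains {
		sum += g / math.Log2(float64(i+2))
	}
	return sum
}

// ndcg divides the DCG of the actual ordering by the DCG of the best possible
// ordering of the same gains. It returns 0 when nothing relevant was returned.
func ndcg(gains []float64) float64 {
	ideal := append([]float64(nil), gains...)
	sort.Sort(sort.Reverse(sort.Float64Slice(ideal)))
	best := dcg(ideal)
	if best == 0 {
		return 0
	}
	return dcg(gains) / best
}

func main() {
	// Graded relevance of the returned results, in ranked order.
	gains := []float64{3, 2, 0, 1}
	fmt.Printf("%.3f\n", ndcg(gains)) // 0.985
}
```

&lt;p&gt;The score is slightly below 1 because an irrelevant document sits at rank 3 ahead of a somewhat related one at rank 4. Swapping them would give a perfect score.&lt;/p&gt;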




&lt;h2&gt;
  
  
  Metric 4: visual inspection with UMAP or t-SNE
&lt;/h2&gt;

&lt;p&gt;Metrics are important, but visual inspection can also help.&lt;/p&gt;

&lt;p&gt;One useful technique is to export a sample of your embeddings and reduce them to two dimensions using UMAP or t-SNE.&lt;/p&gt;

&lt;p&gt;Then you can plot them and inspect whether similar documents form clear clusters.&lt;/p&gt;

&lt;p&gt;For example, if you are indexing technical articles, you might expect clusters like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linux kernel&lt;/li&gt;
&lt;li&gt;PostgreSQL&lt;/li&gt;
&lt;li&gt;Kubernetes&lt;/li&gt;
&lt;li&gt;Go concurrency&lt;/li&gt;
&lt;li&gt;distributed systems&lt;/li&gt;
&lt;li&gt;observability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If everything is mixed together randomly, your embedding model may not be representing your domain well.&lt;/p&gt;

&lt;p&gt;This does not replace proper evaluation, because 2D projections distort distances, but it can reveal obvious problems quickly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical lessons from the migration
&lt;/h2&gt;

&lt;p&gt;Here are the biggest lessons I took from this project.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Do not treat vector search as just another field type
&lt;/h3&gt;

&lt;p&gt;Embeddings change the architecture of search.&lt;/p&gt;

&lt;p&gt;They are not just another column in your database.&lt;/p&gt;

&lt;p&gt;You need to think about indexing, memory, distance metrics, filtering, ranking, and evaluation.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Hybrid search is usually better than pure semantic search
&lt;/h3&gt;

&lt;p&gt;Pure semantic search can be impressive in demos, but production users often search with exact terms.&lt;/p&gt;

&lt;p&gt;They search for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;product codes,&lt;/li&gt;
&lt;li&gt;error messages,&lt;/li&gt;
&lt;li&gt;function names,&lt;/li&gt;
&lt;li&gt;ticket IDs,&lt;/li&gt;
&lt;li&gt;file names,&lt;/li&gt;
&lt;li&gt;database fields,&lt;/li&gt;
&lt;li&gt;framework names.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A good system should not ignore those signals.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Score fusion deserves serious attention
&lt;/h3&gt;

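&lt;p&gt;Combining BM25 and vector similarity incorrectly can make search worse. BM25 scores are unbounded and depend on the corpus, while vector similarity usually sits in a narrow fixed range, so adding raw scores lets one signal drown out the other.&lt;/p&gt;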

&lt;p&gt;Rank fusion techniques are often safer than manually combining raw scores.&lt;/p&gt;
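&lt;p&gt;One widely used rank fusion technique is Reciprocal Rank Fusion (RRF). The sketch below is a generic Python illustration, not necessarily the fusion this system ran in production. The function name, the document IDs, and the constant &lt;code&gt;k=60&lt;/code&gt; (a commonly used default) are assumptions for the example.&lt;/p&gt;

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs using only their positions."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Raw BM25 or vector scores never enter the formula, only the rank.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword query and a vector query.
bm25_hits = ["a", "b", "c"]
vector_hits = ["b", "c", "d"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))  # prints ['b', 'c', 'a', 'd']
```

&lt;p&gt;Because only positions are used, the two score scales never have to be normalized against each other.&lt;/p&gt;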

&lt;h3&gt;
  
  
  4. p99 latency matters more than a beautiful demo
&lt;/h3&gt;

&lt;p&gt;A search demo with 100 documents is easy.&lt;/p&gt;

&lt;p&gt;A production system with millions of documents, filters, concurrent users, and unpredictable queries is different.&lt;/p&gt;

&lt;p&gt;Always test tail latency.&lt;/p&gt;
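&lt;p&gt;As a small illustration of why averages hide the tail, here is a nearest-rank percentile helper in Python. The latency numbers are invented for the example, and the helper is a sketch rather than a replacement for a proper load-testing tool.&lt;/p&gt;

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the value at rank ceil(pct percent of the sample count)."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [10] * 98 + [500] * 2
mean_ms = sum(latencies_ms) / len(latencies_ms)
print(mean_ms, percentile(latencies_ms, 50), percentile(latencies_ms, 99))  # prints 19.8 10 500
```

&lt;p&gt;The mean and the median both look healthy here, while p99 shows what the slowest users actually experience.&lt;/p&gt;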

&lt;h3&gt;
  
  
  5. Measure quality before changing embedding models
&lt;/h3&gt;

&lt;p&gt;A newer model is not automatically better for your data.&lt;/p&gt;

&lt;p&gt;Build a small evaluation dataset with real queries and expected results.&lt;/p&gt;

&lt;p&gt;Even 50 to 100 carefully selected queries can reveal a lot.&lt;/p&gt;




&lt;h2&gt;
  
  
  When I would still use Elasticsearch
&lt;/h2&gt;

&lt;p&gt;This migration does not mean I would never use Elasticsearch again.&lt;/p&gt;

&lt;p&gt;I would still choose Elasticsearch for many use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;log search,&lt;/li&gt;
&lt;li&gt;observability,&lt;/li&gt;
&lt;li&gt;analytics-heavy search,&lt;/li&gt;
&lt;li&gt;faceted search,&lt;/li&gt;
&lt;li&gt;complex aggregations,&lt;/li&gt;
&lt;li&gt;mature operational environments already built around Elastic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Elasticsearch is still a great tool.&lt;/p&gt;

&lt;p&gt;The point is not “Elasticsearch bad, Weaviate good.”&lt;/p&gt;

&lt;p&gt;The point is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Choose the system that matches the shape of your problem.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For my specific case, the problem was high-throughput hybrid semantic search.&lt;/p&gt;

&lt;p&gt;For that, Weaviate was a better fit.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;The biggest engineering mistake is often not choosing a bad tool.&lt;/p&gt;

&lt;p&gt;It is choosing a good tool for the wrong job.&lt;/p&gt;

&lt;p&gt;Elasticsearch is a powerful search engine, but when I needed production-grade hybrid search with strong vector search behavior, custom score fusion became too expensive and too hard to maintain.&lt;/p&gt;

&lt;p&gt;Weaviate gave me a cleaner architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;native vector search,&lt;/li&gt;
&lt;li&gt;BM25 support,&lt;/li&gt;
&lt;li&gt;hybrid ranking,&lt;/li&gt;
&lt;li&gt;HNSW indexing,&lt;/li&gt;
&lt;li&gt;metadata filtering,&lt;/li&gt;
&lt;li&gt;and a much simpler path to experimentation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The migration improved latency, reduced operational complexity, and made ranking behavior easier to tune.&lt;/p&gt;

&lt;p&gt;But the most important lesson was deeper:&lt;/p&gt;

&lt;p&gt;Search quality is a product feature, not just an infrastructure feature.&lt;/p&gt;

&lt;p&gt;You need to measure it, tune it, and treat it as part of the user experience.&lt;/p&gt;

&lt;p&gt;If you are building search for technical documentation, e-commerce, internal knowledge bases, support systems, or AI-powered products, hybrid search is probably worth serious consideration.&lt;/p&gt;

&lt;p&gt;And if you are currently fighting custom score scripts, unstable p99 latency, and hard-to-debug ranking behavior, it may be time to ask whether your search engine is doing the job it was originally designed to do.&lt;/p&gt;




&lt;h2&gt;
  
  
  Discussion
&lt;/h2&gt;

&lt;p&gt;Have you implemented hybrid search in production?&lt;/p&gt;

&lt;p&gt;I would be interested to hear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;whether you used Elasticsearch, OpenSearch, Weaviate, Qdrant, Pinecone, or another system,&lt;/li&gt;
&lt;li&gt;how you handled score fusion,&lt;/li&gt;
&lt;li&gt;and how you measured search quality beyond just latency.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>search</category>
      <category>weaviate</category>
      <category>elasticsearch</category>
      <category>go</category>
    </item>
    <item>
      <title>Demystifying the Linux Page Cache: The Kernel Optimization Hiding Behind Every Fast I/O</title>
      <dc:creator>amir</dc:creator>
      <pubDate>Tue, 02 Jun 2026 15:48:08 +0000</pubDate>
      <link>https://dev.to/amirsefati/demystifying-the-linux-page-cache-the-kernel-optimization-hiding-behind-every-fast-io-40ee</link>
      <guid>https://dev.to/amirsefati/demystifying-the-linux-page-cache-the-kernel-optimization-hiding-behind-every-fast-io-40ee</guid>
      <description>&lt;p&gt;For the last few months, I have been spending a lot of time reading and researching Linux kernel internals.&lt;/p&gt;

&lt;p&gt;Not just from the surface.&lt;/p&gt;

&lt;p&gt;I mean going deeper into how the kernel actually manages memory, file I/O, processes, namespaces, cgroups, filesystems, and all the invisible mechanisms that make our backend systems feel fast.&lt;/p&gt;

&lt;p&gt;As backend engineers, we usually talk about performance in terms of application code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;optimize the query&lt;/li&gt;
&lt;li&gt;add Redis&lt;/li&gt;
&lt;li&gt;reduce allocations&lt;/li&gt;
&lt;li&gt;use a better index&lt;/li&gt;
&lt;li&gt;improve concurrency&lt;/li&gt;
&lt;li&gt;tune the API response time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And all of those things matter.&lt;/p&gt;

&lt;p&gt;But after working on production systems for years, I have learned that sometimes the real bottleneck is not inside your application code.&lt;/p&gt;

&lt;p&gt;Sometimes the real story is happening below your code, inside the operating system.&lt;/p&gt;

&lt;p&gt;One of the most important examples of that is the &lt;strong&gt;Linux Page Cache&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It is one of those kernel features that quietly improves almost every backend system we run. It makes file reads faster, batches writes, reduces disk pressure, and gives us the illusion that storage is much faster than it actually is.&lt;/p&gt;

&lt;p&gt;But it also comes with trade-offs.&lt;/p&gt;

&lt;p&gt;And if you do not understand those trade-offs, you can easily misread performance numbers, misunderstand memory usage, or even risk losing data in the wrong failure scenario.&lt;/p&gt;

&lt;p&gt;This article is my attempt to explain the Linux Page Cache from a backend engineer’s point of view.&lt;/p&gt;

&lt;p&gt;Not as a kernel developer writing C inside &lt;code&gt;mm/filemap.c&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But as someone who builds systems on top of Linux and wants to understand what the kernel is really doing behind the scenes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Page Cache Exists
&lt;/h2&gt;

&lt;p&gt;The reason Page Cache exists is simple:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disk is slow. RAM is fast.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Even with modern SSDs and NVMe drives, accessing persistent storage is still much slower than accessing memory.&lt;/p&gt;

&lt;p&gt;When our application reads from a file, the kernel could theoretically go to the disk every single time.&lt;/p&gt;

&lt;p&gt;But that would be extremely inefficient.&lt;/p&gt;

&lt;p&gt;So Linux uses available memory as a cache for file data.&lt;/p&gt;

&lt;p&gt;That cache is called the &lt;strong&gt;Page Cache&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When a process reads a file, Linux usually does not just copy data directly from disk to the application and forget about it.&lt;/p&gt;

&lt;p&gt;Instead, the kernel loads file data into memory pages and keeps those pages around.&lt;/p&gt;

&lt;p&gt;So the next time the same file data is requested, Linux can serve it directly from RAM instead of touching the disk again.&lt;/p&gt;

&lt;p&gt;That is the basic idea.&lt;/p&gt;

&lt;p&gt;But the impact is huge.&lt;/p&gt;

&lt;p&gt;Imagine a web server reading the same static files again and again.&lt;/p&gt;

&lt;p&gt;Or a database repeatedly touching the same data files.&lt;/p&gt;

&lt;p&gt;Or a log processor scanning files that were recently written.&lt;/p&gt;

&lt;p&gt;Without Page Cache, every access would become much more expensive.&lt;/p&gt;

&lt;p&gt;With Page Cache, many of those reads become memory-speed operations.&lt;/p&gt;




&lt;h2&gt;
  
  
  The First Important Idea: Linux Uses Free RAM Aggressively
&lt;/h2&gt;

&lt;p&gt;One thing that confuses many developers is Linux memory usage.&lt;/p&gt;

&lt;p&gt;You run &lt;code&gt;free -h&lt;/code&gt;, and it looks like most of your RAM is used.&lt;/p&gt;

&lt;p&gt;At first, it feels scary.&lt;/p&gt;

&lt;p&gt;But often, that memory is not “wasted” or permanently consumed by applications.&lt;/p&gt;

&lt;p&gt;A large part of it may be used by the kernel for cache.&lt;/p&gt;

&lt;p&gt;Linux has a very practical philosophy:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Empty RAM is wasted RAM.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So if memory is available, the kernel uses it to cache useful data.&lt;/p&gt;

&lt;p&gt;If an application later needs more memory, Linux can reclaim cache pages and give that memory back.&lt;/p&gt;

&lt;p&gt;This is why a Linux server may look like it is using a lot of RAM even when your applications are not actually consuming that much.&lt;/p&gt;

&lt;p&gt;The kernel is trying to help you.&lt;/p&gt;

&lt;p&gt;It is using memory to avoid unnecessary disk I/O.&lt;/p&gt;

&lt;p&gt;That is Page Cache doing its job.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Actually Happens During a File Read?
&lt;/h2&gt;

&lt;p&gt;Let’s say your application calls &lt;code&gt;read()&lt;/code&gt; on a file.&lt;/p&gt;

&lt;p&gt;At a high level, Linux checks whether the requested file data already exists in the Page Cache.&lt;/p&gt;

&lt;p&gt;There are two possible outcomes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cache Hit
&lt;/h3&gt;

&lt;p&gt;If the data is already in memory, the kernel can copy it to your application without going to disk.&lt;/p&gt;

&lt;p&gt;This is fast.&lt;/p&gt;

&lt;p&gt;Very fast compared to storage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cache Miss
&lt;/h3&gt;

&lt;p&gt;If the data is not in memory, Linux has to read it from disk.&lt;/p&gt;

&lt;p&gt;But after reading it, the kernel stores it in the Page Cache.&lt;/p&gt;

&lt;p&gt;So the first read may be slower, but future reads can be much faster.&lt;/p&gt;

&lt;p&gt;This is one reason benchmarks can be misleading.&lt;/p&gt;

&lt;p&gt;If you run the same file-read benchmark twice, the second run may look much faster because the data is already cached.&lt;/p&gt;

&lt;p&gt;Your application did not magically become better.&lt;/p&gt;

&lt;p&gt;The kernel just remembered the file data.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Kernel Also Predicts What You Might Read Next
&lt;/h2&gt;

&lt;p&gt;Linux does not only cache what you already read.&lt;/p&gt;

&lt;p&gt;It also tries to predict what you are going to read.&lt;/p&gt;

&lt;p&gt;For sequential file access, the kernel may perform &lt;strong&gt;readahead&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That means when you read one part of a file, Linux may load the next parts into memory before you explicitly ask for them.&lt;/p&gt;

&lt;p&gt;This is extremely useful for workloads like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reading large logs&lt;/li&gt;
&lt;li&gt;streaming files&lt;/li&gt;
&lt;li&gt;scanning datasets&lt;/li&gt;
&lt;li&gt;serving static assets&lt;/li&gt;
&lt;li&gt;processing backups&lt;/li&gt;
&lt;li&gt;importing CSV files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From the application’s point of view, it may feel like the disk is faster than expected.&lt;/p&gt;

&lt;p&gt;But in reality, the kernel is doing smart work in the background.&lt;/p&gt;

&lt;p&gt;It sees a pattern and tries to stay ahead of your application.&lt;/p&gt;




&lt;h2&gt;
  
  
  Writes Are Even More Interesting
&lt;/h2&gt;

&lt;p&gt;Reads are easy to understand:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If data is cached, serve it from RAM.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Writes are more subtle.&lt;/p&gt;

&lt;p&gt;When your application writes data to a file, Linux usually does not immediately force that data to physical storage.&lt;/p&gt;

&lt;p&gt;Instead, the kernel writes the data into memory and marks the affected pages as &lt;strong&gt;dirty&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A dirty page means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This page has been modified in memory, but the change has not necessarily been persisted to disk yet.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is where the kernel gives your application a very useful illusion.&lt;/p&gt;

&lt;p&gt;Your application calls &lt;code&gt;write()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The kernel accepts the data.&lt;/p&gt;

&lt;p&gt;The call returns successfully.&lt;/p&gt;

&lt;p&gt;But that does not always mean the data is already safely stored on disk.&lt;/p&gt;

&lt;p&gt;It may only mean the data is now in the kernel’s memory and scheduled to be written later.&lt;/p&gt;

&lt;p&gt;This is one of the biggest performance tricks in the operating system.&lt;/p&gt;

&lt;p&gt;And also one of the most important trade-offs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Delayed Writes Are So Powerful
&lt;/h2&gt;

&lt;p&gt;Imagine an application appending tiny log entries to a file thousands of times per second.&lt;/p&gt;

&lt;p&gt;If Linux forced every small write to disk immediately, performance would be terrible.&lt;/p&gt;

&lt;p&gt;Storage devices are much better at handling larger, sequential writes than many tiny random writes.&lt;/p&gt;

&lt;p&gt;So Linux delays and combines writes.&lt;/p&gt;

&lt;p&gt;Many small writes can be collected in memory and later flushed to disk in a more efficient way.&lt;/p&gt;

&lt;p&gt;This process is called &lt;strong&gt;writeback&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The kernel has background flusher threads that write dirty pages to storage once they have been dirty for long enough, or once too many of them have accumulated.&lt;/p&gt;

&lt;p&gt;This gives us much better throughput.&lt;/p&gt;

&lt;p&gt;Instead of turning every &lt;code&gt;write()&lt;/code&gt; call into an expensive physical I/O operation, Linux turns many small writes into fewer, larger writes.&lt;/p&gt;

&lt;p&gt;That is a huge win for performance.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Dangerous Part: &lt;code&gt;write()&lt;/code&gt; Is Not &lt;code&gt;fsync()&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;This is where many bugs and misunderstandings happen.&lt;/p&gt;

&lt;p&gt;A successful &lt;code&gt;write()&lt;/code&gt; does not always mean your data is durable.&lt;/p&gt;

&lt;p&gt;It usually means the kernel accepted your data.&lt;/p&gt;

&lt;p&gt;If the machine loses power before dirty pages are flushed, some recently written data may be lost.&lt;/p&gt;

&lt;p&gt;That is why databases, queues, and storage engines care so much about &lt;code&gt;fsync()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;When durability matters, you need to force the kernel to flush data to stable storage.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cm"&gt;/* Returns once the data is in the Page Cache; it may not be on disk yet. */&lt;/span&gt;
&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="cm"&gt;/* Blocks until the kernel reports the data as written; check both return values. */&lt;/span&gt;
&lt;span class="n"&gt;fsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;write()&lt;/code&gt; says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Please accept this data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;fsync()&lt;/code&gt; says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Now make sure it is actually persisted.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This distinction is extremely important when building systems where data loss is not acceptable.&lt;/p&gt;

&lt;p&gt;For normal logs, delayed writeback may be fine.&lt;/p&gt;

&lt;p&gt;For a financial transaction, it is not something you can ignore.&lt;/p&gt;
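&lt;p&gt;Higher-level languages add one more buffer on top of the kernel’s. Here is a minimal Python sketch of the same idea, assuming a throwaway file in a temporary directory. The function name &lt;code&gt;durable_append&lt;/code&gt; and the file name are hypothetical.&lt;/p&gt;

```python
import os
import tempfile


def durable_append(path, payload):
    """Append payload and return only after fsync() has reported success."""
    with open(path, "ab") as f:
        f.write(payload)      # data sits in Python's user-space buffer
        f.flush()             # buffer is handed to the kernel (dirty pages in the Page Cache)
        os.fsync(f.fileno())  # ask the kernel to write this file's dirty pages to storage


# Demo against a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "ledger.log")
    durable_append(log_path, b"txn-1 committed\n")
    durable_append(log_path, b"txn-2 committed\n")
    with open(log_path, "rb") as f:
        print(f.read().count(b"\n"))  # prints 2
```

&lt;p&gt;Calling &lt;code&gt;fsync()&lt;/code&gt; on every record is slow, which is why real storage engines usually batch several records per flush. For a newly created file, making the directory entry itself durable is a separate concern that this sketch does not cover.&lt;/p&gt;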




&lt;h2&gt;
  
  
  Why Databases Behave Differently
&lt;/h2&gt;

&lt;p&gt;Databases like PostgreSQL, MySQL, RocksDB, and others are very careful with disk I/O.&lt;/p&gt;

&lt;p&gt;They know the kernel is caching data.&lt;/p&gt;

&lt;p&gt;They know writes may be delayed.&lt;/p&gt;

&lt;p&gt;They know crashes can happen.&lt;/p&gt;

&lt;p&gt;So they use techniques like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;write-ahead logging&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fsync&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;checkpoints&lt;/li&gt;
&lt;li&gt;direct I/O in some configurations&lt;/li&gt;
&lt;li&gt;controlled flushing&lt;/li&gt;
&lt;li&gt;careful ordering of writes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A database cannot simply trust that because &lt;code&gt;write()&lt;/code&gt; returned successfully, everything is safe.&lt;/p&gt;

&lt;p&gt;It needs stronger guarantees.&lt;/p&gt;

&lt;p&gt;This is also why database performance tuning is complicated.&lt;/p&gt;

&lt;p&gt;You are not only tuning SQL queries.&lt;/p&gt;

&lt;p&gt;You are tuning the interaction between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;application&lt;/li&gt;
&lt;li&gt;database engine&lt;/li&gt;
&lt;li&gt;filesystem&lt;/li&gt;
&lt;li&gt;Linux kernel&lt;/li&gt;
&lt;li&gt;Page Cache&lt;/li&gt;
&lt;li&gt;storage device&lt;/li&gt;
&lt;li&gt;cloud virtualization layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That stack is deep.&lt;/p&gt;

&lt;p&gt;And Page Cache sits right in the middle of it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Page Cache and &lt;code&gt;mmap&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Another important part of this story is &lt;code&gt;mmap&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;With normal file I/O, you call &lt;code&gt;read()&lt;/code&gt; and &lt;code&gt;write()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;mmap&lt;/code&gt;, a file can be mapped into a process’s virtual memory.&lt;/p&gt;

&lt;p&gt;Then the application can access file data almost like normal memory.&lt;/p&gt;

&lt;p&gt;But behind the scenes, Page Cache is still involved.&lt;/p&gt;

&lt;p&gt;When the process touches a memory-mapped page that is not yet resident, a page fault occurs. The kernel loads the corresponding file data into the Page Cache and maps that page into the process’s address space.&lt;/p&gt;

&lt;p&gt;This is powerful because it can reduce copying and make file access feel very natural from the application side.&lt;/p&gt;

&lt;p&gt;But it also means that memory-mapped I/O is deeply connected to the kernel’s virtual memory system.&lt;/p&gt;

&lt;p&gt;This is where the boundary between “file” and “memory” becomes very thin.&lt;/p&gt;

&lt;p&gt;And that is one of the reasons Linux I/O is such a fascinating topic.&lt;/p&gt;
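&lt;p&gt;Here is a small Python sketch of that thin boundary, using the standard &lt;code&gt;mmap&lt;/code&gt; module on a throwaway file. The file name and contents are made up for the example.&lt;/p&gt;

```python
import mmap
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mapped.bin")
    with open(path, "wb") as f:
        f.write(b"hello page cache")

    with open(path, "r+b") as f:
        # Length 0 maps the whole file; the default mapping is shared and writable.
        with mmap.mmap(f.fileno(), 0) as view:
            first_word = bytes(view[0:5])  # a plain memory read, no read() call
            view[0:5] = b"HELLO"           # a plain memory write that dirties the page
            view.flush()                   # request writeback of the mapped range

    # A normal read() sees the change because both paths share the same cached pages.
    with open(path, "rb") as f:
        after = f.read()

print(first_word, after)  # prints b'hello' b'HELLO page cache'
```

&lt;p&gt;There is no explicit &lt;code&gt;read()&lt;/code&gt; or &lt;code&gt;write()&lt;/code&gt; on the mapped region, yet the file content changes.&lt;/p&gt;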




&lt;h2&gt;
  
  
  Page Cache Can Make Benchmarks Lie
&lt;/h2&gt;

&lt;p&gt;When I started looking deeper into kernel internals, one of the first practical lessons was this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Never trust a file I/O benchmark unless you understand the cache state.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For example, if you benchmark reading a large file twice, the first run may include real disk I/O.&lt;/p&gt;

&lt;p&gt;The second run may mostly hit Page Cache.&lt;/p&gt;

&lt;p&gt;So it looks much faster.&lt;/p&gt;

&lt;p&gt;But that does not necessarily mean your code improved.&lt;/p&gt;

&lt;p&gt;It may only mean the file is already cached.&lt;/p&gt;

&lt;p&gt;This matters when testing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;file parsers&lt;/li&gt;
&lt;li&gt;log processors&lt;/li&gt;
&lt;li&gt;database imports&lt;/li&gt;
&lt;li&gt;backup tools&lt;/li&gt;
&lt;li&gt;search indexing jobs&lt;/li&gt;
&lt;li&gt;media processing pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to test cold disk performance, you need to be intentional.&lt;/p&gt;

&lt;p&gt;If you want to test warm cache performance, that is also valid.&lt;/p&gt;

&lt;p&gt;But you should know which one you are measuring.&lt;/p&gt;

&lt;p&gt;Otherwise, you are not benchmarking your application.&lt;/p&gt;

&lt;p&gt;You are benchmarking a combination of your application and the kernel’s memory state.&lt;/p&gt;




&lt;h2&gt;
  
  
  Page Cache Can Also Hurt You
&lt;/h2&gt;

&lt;p&gt;Page Cache is usually helpful.&lt;/p&gt;

&lt;p&gt;But not always.&lt;/p&gt;

&lt;p&gt;There are cases where it can create problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Cache Pollution
&lt;/h3&gt;

&lt;p&gt;Imagine a server running a database.&lt;/p&gt;

&lt;p&gt;Most of the time, the database benefits from hot data staying in memory.&lt;/p&gt;

&lt;p&gt;Now imagine a backup process reads a huge 500GB file sequentially.&lt;/p&gt;

&lt;p&gt;That large read can fill the Page Cache with data that may never be used again.&lt;/p&gt;

&lt;p&gt;As a result, more important cached pages may be evicted.&lt;/p&gt;

&lt;p&gt;This is called cache pollution.&lt;/p&gt;

&lt;p&gt;The kernel tries to manage this intelligently, but no heuristic is perfect.&lt;/p&gt;
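&lt;p&gt;One way a large sequential job can be a better neighbor is to tell the kernel it will not reuse what it just read. The sketch below uses &lt;code&gt;os.posix_fadvise&lt;/code&gt; with &lt;code&gt;POSIX_FADV_DONTNEED&lt;/code&gt;, which is available on Linux. This is my own illustration of the idea, not a description of how any particular backup tool works. The call is only advice, and the kernel is free to ignore it.&lt;/p&gt;

```python
import os
import tempfile


def scan_and_release(path, chunk_size=1024 * 1024):
    """Read a file sequentially, then hint that its cached pages are no longer needed."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)  # a real job would copy or process the chunk here
        if hasattr(os, "posix_fadvise"):
            try:
                # Offset 0 with length 0 covers the whole file.
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
            except OSError:
                pass  # the hint is optional, so a failure should not fail the scan
    finally:
        os.close(fd)
    return total


# Demo on a throwaway 3 MiB file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "backup-source.bin")
    with open(demo_path, "wb") as f:
        f.write(b"\0" * (3 * 1024 * 1024))
    print(scan_and_release(demo_path))  # prints 3145728
```

&lt;p&gt;For very large files, the same hint can be issued per chunk range instead of once at the end, so the cache is never filled in the first place.&lt;/p&gt;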

&lt;h3&gt;
  
  
  2. Memory Pressure
&lt;/h3&gt;

&lt;p&gt;Because Page Cache uses RAM, it competes with applications for memory.&lt;/p&gt;

&lt;p&gt;Usually Linux can reclaim cache pages when needed.&lt;/p&gt;

&lt;p&gt;But under heavy memory pressure, the system may start doing more reclaim work, causing latency spikes.&lt;/p&gt;

&lt;p&gt;For backend services, this can show up as random performance drops.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Dirty Page Spikes
&lt;/h3&gt;

&lt;p&gt;If an application writes faster than storage can flush, dirty pages can accumulate.&lt;/p&gt;

&lt;p&gt;At some point, Linux may throttle writers to prevent memory from being overwhelmed by dirty data.&lt;/p&gt;

&lt;p&gt;From the application’s point of view, write latency may suddenly increase.&lt;/p&gt;

&lt;p&gt;This is not because your code changed.&lt;/p&gt;

&lt;p&gt;It is because the kernel is protecting the system.&lt;/p&gt;




&lt;h2&gt;
  
  
  Useful Commands to Observe Page Cache Behavior
&lt;/h2&gt;

&lt;p&gt;You do not need to be a kernel developer to observe some of this behavior.&lt;/p&gt;

&lt;p&gt;Linux exposes useful information through &lt;code&gt;/proc&lt;/code&gt; and common tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Check memory usage
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;free &lt;span class="nt"&gt;-h&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Look at the &lt;code&gt;buff/cache&lt;/code&gt; column.&lt;/p&gt;

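&lt;p&gt;That is where you often see memory used for kernel buffers and cache. The &lt;code&gt;available&lt;/code&gt; column estimates how much memory applications can still claim, including cache that the kernel can reclaim.&lt;/p&gt;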

&lt;h3&gt;
  
  
  Check dirty pages
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s2"&gt;"^(Dirty|Writeback):"&lt;/span&gt; /proc/meminfo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Dirty:              123456 kB
Writeback:             0 kB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;Dirty&lt;/code&gt; shows memory waiting to be written back.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Writeback&lt;/code&gt; shows memory currently being written.&lt;/p&gt;
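&lt;p&gt;If you want these two numbers inside a script or an exporter, a few lines of Python are enough. This is a sketch, assuming the usual &lt;code&gt;Name: value kB&lt;/code&gt; layout of &lt;code&gt;/proc/meminfo&lt;/code&gt;. The function names are hypothetical.&lt;/p&gt;

```python
def parse_meminfo(text, fields=("Dirty", "Writeback")):
    """Return the requested /proc/meminfo fields as a dict of kilobytes."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        # Exact name match, so a field like WritebackTmp is not picked up by accident.
        if name in fields:
            values[name] = int(rest.split()[0])
    return values


def read_dirty_state(path="/proc/meminfo"):
    """Read the live values; this only works on Linux."""
    with open(path) as f:
        return parse_meminfo(f.read())


# Parsing a captured sample keeps the example runnable on any machine.
sample = "Dirty:              123456 kB\nWriteback:             0 kB\nWritebackTmp:          0 kB\n"
print(parse_meminfo(sample))  # prints {'Dirty': 123456, 'Writeback': 0}
```

&lt;p&gt;Sampling &lt;code&gt;read_dirty_state()&lt;/code&gt; once per second during a write-heavy job makes it easy to see dirty pages build up and drain.&lt;/p&gt;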

&lt;h3&gt;
  
  
  Watch I/O activity
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;iostat &lt;span class="nt"&gt;-xz&lt;/span&gt; 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This helps you see disk utilization, await time, and whether the storage device is under pressure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Watch process I/O
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pidstat &lt;span class="nt"&gt;-d&lt;/span&gt; 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This helps you understand which processes are doing reads and writes.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Simple Experiment
&lt;/h2&gt;

&lt;p&gt;You can see Page Cache behavior with a simple test.&lt;/p&gt;

&lt;p&gt;Create a large file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;dd &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/dev/zero &lt;span class="nv"&gt;of&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;testfile &lt;span class="nv"&gt;bs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1M &lt;span class="nv"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1024
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now read it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;time cat &lt;/span&gt;testfile &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the same command again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;time cat &lt;/span&gt;testfile &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



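&lt;p&gt;The second run may be faster because the file data is already in Page Cache. Note that the first run can already be fast too, because &lt;code&gt;dd&lt;/code&gt; wrote the file through the Page Cache and its pages may still be in memory.&lt;/p&gt;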

&lt;p&gt;Now, depending on your system and permissions, you can drop caches for testing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sync
echo &lt;/span&gt;3 | &lt;span class="nb"&gt;sudo tee&lt;/span&gt; /proc/sys/vm/drop_caches
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then read again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;time cat &lt;/span&gt;testfile &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Important note:&lt;/p&gt;

&lt;p&gt;Do not randomly drop caches on production servers.&lt;/p&gt;

&lt;p&gt;This is only for controlled testing.&lt;/p&gt;




&lt;h2&gt;
  
  
  How This Changed the Way I Think About Backend Performance
&lt;/h2&gt;

&lt;p&gt;The more I study Linux internals, the more I realize that backend engineering is not only about writing application code.&lt;/p&gt;

&lt;p&gt;A production system is a conversation between many layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;your code&lt;/li&gt;
&lt;li&gt;runtime&lt;/li&gt;
&lt;li&gt;memory allocator&lt;/li&gt;
&lt;li&gt;database&lt;/li&gt;
&lt;li&gt;filesystem&lt;/li&gt;
&lt;li&gt;kernel&lt;/li&gt;
&lt;li&gt;storage&lt;/li&gt;
&lt;li&gt;network&lt;/li&gt;
&lt;li&gt;container runtime&lt;/li&gt;
&lt;li&gt;orchestration platform&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you only look at your code, you may miss the real reason behind a performance issue.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An API becomes slow because disk writeback is saturated.&lt;/li&gt;
&lt;li&gt;A database has latency spikes because dirty pages are being flushed.&lt;/li&gt;
&lt;li&gt;A benchmark looks amazing because everything is cached.&lt;/li&gt;
&lt;li&gt;A log processor slows down because it is causing cache pollution.&lt;/li&gt;
&lt;li&gt;A container looks memory-heavy because the Page Cache generated by its file I/O is counted toward its memory usage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding Page Cache gives you better intuition.&lt;/p&gt;

&lt;p&gt;It helps you ask better questions.&lt;/p&gt;

&lt;p&gt;It helps you debug production issues with more confidence.&lt;/p&gt;

&lt;p&gt;And it reminds you that the operating system is not just a passive layer.&lt;/p&gt;

&lt;p&gt;The kernel is constantly making decisions on your behalf.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Lessons for Backend Engineers
&lt;/h2&gt;

&lt;p&gt;Here are the lessons I keep in mind now:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Do not panic when Linux uses RAM
&lt;/h3&gt;

&lt;p&gt;High memory usage is not always bad.&lt;/p&gt;

&lt;p&gt;Check whether memory is used by applications or by cache.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Benchmark carefully
&lt;/h3&gt;

&lt;p&gt;Always understand whether your benchmark is testing cold reads or cached reads.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;code&gt;write()&lt;/code&gt; is not durability
&lt;/h3&gt;

&lt;p&gt;If data must survive a crash, understand &lt;code&gt;fsync()&lt;/code&gt; and the durability model of your storage system.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Watch dirty pages
&lt;/h3&gt;

&lt;p&gt;Dirty pages can explain sudden write latency spikes.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Be careful with large sequential jobs
&lt;/h3&gt;

&lt;p&gt;Backups, imports, and scans can affect cache behavior for other workloads.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Learn the kernel slowly
&lt;/h3&gt;

&lt;p&gt;You do not need to understand everything at once.&lt;/p&gt;

&lt;p&gt;But each concept you learn gives you better production intuition.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;The Linux Page Cache is one of the most important performance features in the operating system.&lt;/p&gt;

&lt;p&gt;It hides disk latency.&lt;/p&gt;

&lt;p&gt;It makes repeated reads fast.&lt;/p&gt;

&lt;p&gt;It batches writes.&lt;/p&gt;

&lt;p&gt;It uses free memory intelligently.&lt;/p&gt;

&lt;p&gt;And it does all of this quietly, behind almost every backend system we deploy.&lt;/p&gt;

&lt;p&gt;But like every powerful abstraction, it has trade-offs.&lt;/p&gt;

&lt;p&gt;It can make benchmarks misleading.&lt;/p&gt;

&lt;p&gt;It can delay durability.&lt;/p&gt;

&lt;p&gt;It can create latency spikes under write pressure.&lt;/p&gt;

&lt;p&gt;It can affect databases, log processors, and large file workloads in unexpected ways.&lt;/p&gt;

&lt;p&gt;For me, studying the Page Cache is part of a bigger journey: understanding Linux not just as a server environment, but as a complex engineering system.&lt;/p&gt;

&lt;p&gt;The deeper I go into the kernel, the more respect I have for the invisible work it does every second.&lt;/p&gt;

&lt;p&gt;And the more I believe that strong backend engineers should not stop at frameworks, databases, and APIs.&lt;/p&gt;

&lt;p&gt;At some point, we also need to understand the machine underneath.&lt;/p&gt;

&lt;p&gt;Because sometimes the bug is not in your code.&lt;/p&gt;

&lt;p&gt;Sometimes the answer is in the kernel.&lt;/p&gt;

</description>
      <category>linux</category>
      <category>kernel</category>
      <category>backend</category>
      <category>performance</category>
    </item>
    <item>
      <title>The 1978 Paper Behind Go’s Concurrency Model</title>
      <dc:creator>amir</dc:creator>
      <pubDate>Sun, 31 May 2026 11:31:35 +0000</pubDate>
      <link>https://dev.to/amirsefati/the-1978-paper-behind-gos-concurrency-model-4n55</link>
      <guid>https://dev.to/amirsefati/the-1978-paper-behind-gos-concurrency-model-4n55</guid>
      <description>&lt;p&gt;Every Go developer eventually hears this sentence:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Do not communicate by sharing memory; share memory by communicating.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At first, it sounds like a nice Go proverb.&lt;/p&gt;

&lt;p&gt;But the more concurrent systems you build, the more you realize it is not just a slogan. It is a completely different way of thinking about software design.&lt;/p&gt;

&lt;p&gt;When I first started working deeply with Go concurrency, I mostly thought about goroutines as “lightweight threads” and channels as “safe queues.” That mental model is useful at the beginning, but it is not the full story.&lt;/p&gt;

&lt;p&gt;The real story goes much deeper.&lt;/p&gt;

&lt;p&gt;Go’s concurrency model is heavily inspired by a paper from 1978: &lt;strong&gt;Communicating Sequential Processes&lt;/strong&gt;, written by &lt;strong&gt;Tony Hoare&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This article is my attempt to explain that idea in a practical way: why shared-memory concurrency becomes painful, what CSP was trying to solve, and how Go turned that theory into something we can use every day in production systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem: shared state looks simple until it does not
&lt;/h2&gt;

&lt;p&gt;Most concurrency problems start innocently.&lt;/p&gt;

&lt;p&gt;You have multiple workers. They need access to the same data. So you put the data in memory and let the workers read and write it.&lt;/p&gt;

&lt;p&gt;Something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Inc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then concurrency enters the system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="m"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Inc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the code is broken.&lt;/p&gt;

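&lt;p&gt;Multiple goroutines can read and write &lt;code&gt;value&lt;/code&gt; at the same time, and &lt;code&gt;c.value++&lt;/code&gt; is a read-modify-write rather than a single atomic step, so increments can be lost. That is a data race, and the final result becomes unpredictable. So we add a mutex:&lt;br&gt;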
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;mu&lt;/span&gt;    &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Mutex&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Inc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This fixes the data race.&lt;/p&gt;

&lt;p&gt;But this is where the next problem starts.&lt;/p&gt;

&lt;p&gt;Mutexes are not bad. They are necessary in many real systems. The issue is that shared state plus locks can easily become the default architecture.&lt;/p&gt;

&lt;p&gt;And once that happens, your system becomes harder to reason about.&lt;/p&gt;

&lt;p&gt;You start asking questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who owns this data?&lt;/li&gt;
&lt;li&gt;Which goroutine is allowed to update it?&lt;/li&gt;
&lt;li&gt;How long is this lock held?&lt;/li&gt;
&lt;li&gt;Can this function call another function that also needs the same lock?&lt;/li&gt;
&lt;li&gt;Can this code deadlock under load?&lt;/li&gt;
&lt;li&gt;Why is p99 latency suddenly worse?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is especially painful in backend systems with high-throughput pipelines, network services, log processors, container runtimes, or in-memory systems where thousands of operations happen concurrently.&lt;/p&gt;

&lt;p&gt;The code may look safe because it has locks.&lt;/p&gt;

&lt;p&gt;But safe does not always mean simple.&lt;/p&gt;

&lt;p&gt;And simple is what keeps production systems maintainable.&lt;/p&gt;




&lt;h2&gt;
  
  
  The hidden cost of “just add a mutex”
&lt;/h2&gt;

&lt;p&gt;A mutex protects memory, but it also serializes access.&lt;/p&gt;

&lt;p&gt;That means one slow operation inside a lock can block many other goroutines.&lt;/p&gt;

&lt;p&gt;I have seen patterns like this in real backend code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Test User"&lt;/span&gt;

&lt;span class="n"&gt;sendEmail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;callExternalAPI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;writeAuditLog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Technically, the shared data is protected.&lt;/p&gt;

&lt;p&gt;Architecturally, this is a problem.&lt;/p&gt;

&lt;p&gt;The lock is not only protecting the small mutation of &lt;code&gt;user.Name&lt;/code&gt;. It is now protecting email sending, external API calls, logging, and anything else that happens inside that critical section.&lt;/p&gt;

&lt;p&gt;That means every goroutine waiting for this lock must wait for the whole flow.&lt;/p&gt;

&lt;p&gt;The service may have goroutines. It may look concurrent. But a big part of the request path has silently become sequential.&lt;/p&gt;

&lt;p&gt;This is why concurrency bugs are not always about missing locks.&lt;/p&gt;

&lt;p&gt;Sometimes the problem is too much locking.&lt;/p&gt;

&lt;p&gt;Sometimes the real bug is ownership.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tony Hoare’s idea: stop sharing memory directly
&lt;/h2&gt;

&lt;p&gt;In 1978, Tony Hoare introduced &lt;strong&gt;Communicating Sequential Processes&lt;/strong&gt;, usually called CSP.&lt;/p&gt;

&lt;p&gt;The idea was very different from the classic shared-memory model.&lt;/p&gt;

&lt;p&gt;Instead of many processes fighting over the same memory, CSP describes a system as independent sequential processes that communicate by sending messages.&lt;/p&gt;

&lt;p&gt;The important parts are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Each process has its own local state.&lt;/li&gt;
&lt;li&gt;Processes do not casually share variables.&lt;/li&gt;
&lt;li&gt;When they need to coordinate, they communicate explicitly.&lt;/li&gt;
&lt;li&gt;Communication is part of the design, not an afterthought.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That sounds simple, but it changes how you design systems.&lt;/p&gt;

&lt;p&gt;Instead of asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Which lock protects this data?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You start asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Which goroutine owns this data?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is a much more powerful question.&lt;/p&gt;

&lt;p&gt;Because if one goroutine owns the state, other goroutines do not need to mutate it directly. They send messages to the owner.&lt;/p&gt;

&lt;p&gt;This is the mental shift behind Go channels.&lt;/p&gt;




&lt;h2&gt;
  
  
  Go did not just add concurrency. It gave concurrency a shape.
&lt;/h2&gt;

&lt;p&gt;Go could have chosen the same path as many other languages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;threads&lt;/li&gt;
&lt;li&gt;locks&lt;/li&gt;
&lt;li&gt;shared memory&lt;/li&gt;
&lt;li&gt;concurrency libraries&lt;/li&gt;
&lt;li&gt;complex abstractions built on top&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But Go made a different design decision.&lt;/p&gt;

&lt;p&gt;It gave concurrency first-class language support with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;go&lt;/code&gt; for starting goroutines&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;chan&lt;/code&gt; for communication&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;select&lt;/code&gt; for coordinating multiple communication operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters because concurrency is not treated as a library feature bolted onto the language.&lt;/p&gt;

&lt;p&gt;It is part of the language’s design.&lt;/p&gt;

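&lt;p&gt;A goroutine is not exactly the same as a CSP process, and Go is not a pure CSP language. In Hoare’s 1978 paper, processes address each other directly by name, while Go communicates through channels, which are first-class values that can be passed around. Go also still allows shared memory, mutexes, atomics, and low-level synchronization when needed.&lt;/p&gt;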

&lt;p&gt;But the spirit of CSP is clearly there:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Build independent units of execution and make them communicate explicitly.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is why Go feels so natural for network servers, infrastructure tools, distributed systems, streaming pipelines, and cloud-native software.&lt;/p&gt;




&lt;h2&gt;
  
  
  A simple CSP-style pipeline in Go
&lt;/h2&gt;

&lt;p&gt;Let’s look at a practical example.&lt;/p&gt;

&lt;p&gt;Imagine a small pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate jobs.&lt;/li&gt;
&lt;li&gt;Process jobs.&lt;/li&gt;
&lt;li&gt;Return results.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A shared-memory mindset might create a global queue, protect it with a mutex, and let workers pull from it.&lt;/p&gt;

&lt;p&gt;A CSP-style mindset is different.&lt;/p&gt;

&lt;p&gt;The data flows through channels.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
    &lt;span class="s"&gt;"time"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Job&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ID&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Result&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;JobID&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;Value&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="k"&gt;chan&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;Job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="nb"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;Job&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;in&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Job&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="k"&gt;chan&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;100&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Millisecond&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;JobID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"processed job %d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="nb"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"job=%d result=%q&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JobID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is no shared slice.&lt;br&gt;
There is no global queue.&lt;br&gt;
There is no explicit mutex.&lt;/p&gt;

&lt;p&gt;The producer owns job creation.&lt;br&gt;
The worker owns processing.&lt;br&gt;
The main goroutine owns result collection.&lt;/p&gt;

&lt;p&gt;The data moves through the system.&lt;/p&gt;

&lt;p&gt;That is the important part.&lt;/p&gt;

&lt;p&gt;We are not asking multiple goroutines to mutate the same object at the same time. We are designing a flow of ownership.&lt;/p&gt;


&lt;h2&gt;
  
  
  Scaling the pipeline with multiple workers
&lt;/h2&gt;

&lt;p&gt;Now let’s make it more realistic.&lt;/p&gt;

&lt;p&gt;A single worker is not enough for heavy workloads. We want multiple workers processing jobs concurrently.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
    &lt;span class="s"&gt;"sync"&lt;/span&gt;
    &lt;span class="s"&gt;"time"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Job&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ID&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Result&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;WorkerID&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;JobID&lt;/span&gt;    &lt;span class="kt"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;Value&lt;/span&gt;    &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="k"&gt;chan&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;Job&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="nb"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;Job&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;workerID&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Job&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="k"&gt;chan&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;100&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Millisecond&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;WorkerID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;workerID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;JobID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"processed job %d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;workerCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;jobCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;

    &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jobCount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WaitGroup&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;workerCount&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;workerID&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;workerID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Wait&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nb"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"worker=%d job=%d result=%q&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WorkerID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JobID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is one of the places where Go shines.&lt;/p&gt;

&lt;p&gt;The design is still readable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The producer sends jobs.&lt;/li&gt;
&lt;li&gt;Workers receive jobs.&lt;/li&gt;
&lt;li&gt;Workers send results.&lt;/li&gt;
&lt;li&gt;A separate goroutine waits for all workers to finish, then closes the results channel.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We still use &lt;code&gt;sync.WaitGroup&lt;/code&gt;, because Go is pragmatic. CSP-style design does not mean “never use the sync package.”&lt;/p&gt;

&lt;p&gt;It means we use synchronization intentionally.&lt;/p&gt;

&lt;p&gt;The channel handles data flow.&lt;br&gt;
The wait group handles lifecycle coordination.&lt;/p&gt;

&lt;p&gt;That separation is clean.&lt;/p&gt;


&lt;h2&gt;
  
  
  The real value: ownership becomes visible
&lt;/h2&gt;

&lt;p&gt;The biggest advantage of CSP-style design is not that it removes every mutex.&lt;/p&gt;

&lt;p&gt;The biggest advantage is that it makes ownership visible.&lt;/p&gt;

&lt;p&gt;In shared-memory systems, ownership is often hidden.&lt;/p&gt;

&lt;p&gt;You see a pointer passed around. You see a struct used in multiple places. You see a lock somewhere. But it is not always obvious who is responsible for the state.&lt;/p&gt;

&lt;p&gt;With channels, ownership is easier to see.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This line says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I am sending this job to another part of the system.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And this line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I am receiving this job and now I am responsible for processing it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That clarity matters.&lt;/p&gt;

&lt;p&gt;In production systems, many bugs are not caused by complex algorithms. They are caused by unclear ownership.&lt;/p&gt;

&lt;p&gt;Who closes this channel?&lt;br&gt;
Who updates this state?&lt;br&gt;
Who retries this job?&lt;br&gt;
Who owns cancellation?&lt;br&gt;
Who handles backpressure?&lt;/p&gt;

&lt;p&gt;A channel-based design forces you to answer these questions earlier.&lt;/p&gt;


&lt;h2&gt;
  
  
  Backpressure is built into the model
&lt;/h2&gt;

&lt;p&gt;Another underrated benefit of channels is backpressure.&lt;/p&gt;

&lt;p&gt;An unbuffered channel forces the sender and receiver to synchronize:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each send blocks until a worker is ready to receive, so a producer that is faster than its workers is automatically slowed down to their pace.&lt;/p&gt;

&lt;p&gt;That is not a bug. That is backpressure.&lt;/p&gt;

&lt;p&gt;Buffered channels allow some temporary queueing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Job&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the producer can get up to 100 jobs ahead of the workers. Once the buffer is full, the next send blocks, so the backpressure is delayed, not removed.&lt;/p&gt;

&lt;p&gt;This is powerful in real systems.&lt;/p&gt;

&lt;p&gt;For example, imagine a log processing service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP requests receive logs.&lt;/li&gt;
&lt;li&gt;A parser normalizes them.&lt;/li&gt;
&lt;li&gt;A batcher writes them to storage.&lt;/li&gt;
&lt;li&gt;A separate worker sends alerts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without backpressure, one fast layer can overwhelm a slower layer.&lt;/p&gt;

&lt;p&gt;With channels, you can make pressure visible and controlled.&lt;/p&gt;

&lt;p&gt;You can decide where the system should block, buffer, drop, retry, or shed load.&lt;/p&gt;
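&lt;p&gt;As a rough illustration of the drop or shed-load option, here is a minimal sketch of a non-blocking submit. The &lt;code&gt;Job&lt;/code&gt; fields, the &lt;code&gt;trySubmit&lt;/code&gt; helper, and the buffer size of 2 are assumptions made for the example, not code from a real service.&lt;/p&gt;

```go
package main

import "fmt"

// Job is a hypothetical unit of work; a real type would carry a payload.
type Job struct {
	ID int
}

// trySubmit attempts a non-blocking send. It returns false when the
// buffer is full, so the caller can drop, retry later, or reject upstream.
func trySubmit(jobs chan<- Job, job Job) bool {
	select {
	case jobs <- job:
		return true
	default:
		return false
	}
}

func main() {
	// Capacity 2 stands in for the buffer of 100 used in the text.
	jobs := make(chan Job, 2)

	for id := 1; id <= 3; id++ {
		fmt.Printf("job %d accepted: %v\n", id, trySubmit(jobs, Job{ID: id}))
	}
}
```

&lt;p&gt;With no worker draining the channel, the third submit is rejected instead of blocking. Dropping the &lt;code&gt;select&lt;/code&gt; and using a plain send restores the blocking behavior, so the policy lives in one small, visible place.&lt;/p&gt;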

&lt;p&gt;That is architecture, not just syntax.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where channels are a bad fit
&lt;/h2&gt;

&lt;p&gt;It is also important to be honest: channels are not magic.&lt;/p&gt;

&lt;p&gt;Not every concurrency problem becomes better with channels.&lt;/p&gt;

&lt;p&gt;Sometimes a mutex is simpler and faster.&lt;/p&gt;

&lt;p&gt;For example, this is perfectly reasonable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Metrics&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;mu&lt;/span&gt;       &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Mutex&lt;/span&gt;
    &lt;span class="n"&gt;requests&lt;/span&gt; &lt;span class="kt"&gt;int64&lt;/span&gt;
    &lt;span class="n"&gt;errors&lt;/span&gt;   &lt;span class="kt"&gt;int64&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Metrics&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;IncRequests&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;
    &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For small, local, short-lived critical sections, a mutex is often the cleanest solution.&lt;/p&gt;

&lt;p&gt;Channels can become messy when they are used only because they feel “more Go-like.”&lt;/p&gt;

&lt;p&gt;Bad channel usage can create:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;goroutine leaks&lt;/li&gt;
&lt;li&gt;unclear lifecycle&lt;/li&gt;
&lt;li&gt;blocked sends&lt;/li&gt;
&lt;li&gt;blocked receives&lt;/li&gt;
&lt;li&gt;complicated shutdown logic&lt;/li&gt;
&lt;li&gt;over-engineered code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The point is not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Always use channels.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The point is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Use channels when communication and ownership are the core problem.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Use mutexes when protecting a small piece of shared state is the core problem.&lt;/p&gt;

&lt;p&gt;Senior Go engineering is knowing the difference.&lt;/p&gt;




&lt;h2&gt;
  
  
  A practical rule I use
&lt;/h2&gt;

&lt;p&gt;When I design concurrent Go code, I usually ask myself:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Is this state owned by one goroutine?
&lt;/h3&gt;

&lt;p&gt;If yes, channels can be a great fit. Other goroutines can send commands or data to the owner.&lt;/p&gt;
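&lt;p&gt;A minimal sketch of that owner pattern might look like the following. The &lt;code&gt;command&lt;/code&gt; type and the &lt;code&gt;runOwner&lt;/code&gt; and &lt;code&gt;add&lt;/code&gt; helpers are hypothetical names chosen for the example. The only point is that &lt;code&gt;total&lt;/code&gt; is touched by exactly one goroutine.&lt;/p&gt;

```go
package main

import "fmt"

// command asks the owner to add delta and report the new total.
type command struct {
	delta int
	reply chan int
}

// runOwner is the only code that reads or writes total, so no lock is needed.
func runOwner(cmds <-chan command) {
	total := 0
	for cmd := range cmds {
		total += cmd.delta
		cmd.reply <- total
	}
}

// add sends a command to the owner and waits for the updated total.
func add(cmds chan<- command, delta int) int {
	reply := make(chan int, 1) // buffered so the owner never blocks on the reply
	cmds <- command{delta: delta, reply: reply}
	return <-reply
}

func main() {
	cmds := make(chan command)
	go runOwner(cmds)

	fmt.Println(add(cmds, 2)) // 2
	fmt.Println(add(cmds, 3)) // 5

	// main created the channel and is the only sender, so main closes it.
	close(cmds)
}
```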

&lt;h3&gt;
  
  
  2. Is this just a small shared counter or cache?
&lt;/h3&gt;

&lt;p&gt;If yes, a mutex or an atomic is often the simpler choice.&lt;/p&gt;
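&lt;p&gt;For plain counters, one possible shape is a lock-free variant of the &lt;code&gt;Metrics&lt;/code&gt; struct shown earlier. This is only a sketch: &lt;code&gt;Counters&lt;/code&gt; and &lt;code&gt;hammer&lt;/code&gt; are hypothetical names, and &lt;code&gt;atomic.Int64&lt;/code&gt; assumes Go 1.19 or newer.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Counters is a hypothetical lock-free variant of the Metrics struct.
type Counters struct {
	requests atomic.Int64
	errors   atomic.Int64
}

func (c *Counters) IncRequests()    { c.requests.Add(1) }
func (c *Counters) IncErrors()      { c.errors.Add(1) }
func (c *Counters) Requests() int64 { return c.requests.Load() }
func (c *Counters) Errors() int64   { return c.errors.Load() }

// hammer increments the request counter from several goroutines at once.
func hammer(c *Counters, goroutines, perGoroutine int) {
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perGoroutine; i++ {
				c.IncRequests()
			}
		}()
	}
	wg.Wait()
}

func main() {
	c := new(Counters)
	hammer(c, 8, 1000)
	fmt.Println(c.Requests()) // 8000
}
```

&lt;p&gt;Atomics fit independent counters like these. As soon as two fields have to change together, a mutex is usually the clearer tool.&lt;/p&gt;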

&lt;h3&gt;
  
  
  3. Do I need backpressure?
&lt;/h3&gt;

&lt;p&gt;If yes, channels make this easier to model, because a full channel blocks the sender.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Do I need cancellation?
&lt;/h3&gt;

&lt;p&gt;Then I design the channel flow together with &lt;code&gt;context.Context&lt;/code&gt;.&lt;/p&gt;
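&lt;p&gt;One common way to combine the two is to put &lt;code&gt;ctx.Done()&lt;/code&gt; in every &lt;code&gt;select&lt;/code&gt; that could otherwise block. The sketch below assumes a hypothetical &lt;code&gt;worker&lt;/code&gt; that doubles integers, and &lt;code&gt;doubleThenCancel&lt;/code&gt; exists only to drive the example.&lt;/p&gt;

```go
package main

import (
	"context"
	"fmt"
)

// worker doubles each job until the jobs channel closes or ctx is cancelled.
func worker(ctx context.Context, jobs <-chan int, results chan<- int) {
	for {
		select {
		case <-ctx.Done():
			return
		case n, ok := <-jobs:
			if !ok {
				return
			}
			// The send is guarded too, so a cancelled worker never
			// hangs on a receiver that has gone away.
			select {
			case results <- n * 2:
			case <-ctx.Done():
				return
			}
		}
	}
}

// doubleThenCancel runs one job through a worker, cancels the context,
// and waits until the worker goroutine has actually returned.
func doubleThenCancel(n int) int {
	ctx, cancel := context.WithCancel(context.Background())
	jobs := make(chan int)
	results := make(chan int)
	done := make(chan struct{})

	go func() {
		defer close(done)
		worker(ctx, jobs, results)
	}()

	jobs <- n
	result := <-results

	cancel()
	<-done // no goroutine is left behind
	return result
}

func main() {
	fmt.Println(doubleThenCancel(21)) // 42
}
```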

&lt;h3&gt;
  
  
  5. Who closes the channel?
&lt;/h3&gt;

&lt;p&gt;If I cannot answer this clearly, the design is not finished.&lt;/p&gt;

&lt;p&gt;That last question is very important.&lt;/p&gt;

&lt;p&gt;Channel ownership is not only about sending data. It is also about lifecycle.&lt;/p&gt;

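&lt;p&gt;Usually, the sender closes the channel, because only the sender knows when no more values are coming, and sending on a closed channel panics. Receivers should not close a channel they do not own.&lt;/p&gt;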
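&lt;p&gt;To make the closing rule concrete, here is a small sketch in which every channel has exactly one closer. &lt;code&gt;squareAll&lt;/code&gt; is a hypothetical helper, and it assumes &lt;code&gt;workers&lt;/code&gt; is at least 1.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// squareAll squares every input using a fixed number of workers.
// It assumes workers >= 1; with zero workers the producer would never finish.
func squareAll(inputs []int, workers int) []int {
	jobs := make(chan int)
	results := make(chan int)

	// The producer is the only sender on jobs, so it closes jobs.
	go func() {
		defer close(jobs)
		for _, n := range inputs {
			jobs <- n
		}
	}()

	// Workers send on results but never close it: no single worker
	// knows whether the others are finished.
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs {
				results <- n * n
			}
		}()
	}

	// One goroutine owns the lifecycle of results and closes it
	// after the last worker has returned.
	go func() {
		wg.Wait()
		close(results)
	}()

	var out []int
	for r := range results {
		out = append(out, r)
	}
	sort.Ints(out) // workers finish in any order, so sort for a stable result
	return out
}

func main() {
	fmt.Println(squareAll([]int{3, 1, 2}, 2)) // [1 4 9]
}
```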




&lt;h2&gt;
  
  
  CSP thinking in modern backend systems
&lt;/h2&gt;

&lt;p&gt;The reason I like CSP is that it maps well to real backend architecture.&lt;/p&gt;

&lt;p&gt;A backend service is not just functions calling functions.&lt;/p&gt;

&lt;p&gt;It is a system of flows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;requests flow into handlers&lt;/li&gt;
&lt;li&gt;jobs flow into queues&lt;/li&gt;
&lt;li&gt;events flow into processors&lt;/li&gt;
&lt;li&gt;logs flow into pipelines&lt;/li&gt;
&lt;li&gt;metrics flow into aggregators&lt;/li&gt;
&lt;li&gt;commands flow into state owners&lt;/li&gt;
&lt;li&gt;results flow back to clients&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you think in flows, Go becomes very natural.&lt;/p&gt;

&lt;p&gt;This is also why Go became popular in infrastructure software. Tools like Docker, Kubernetes, and Terraform are written in Go not only because Go compiles to a static binary, but also because its concurrency model fits the kind of problems infrastructure software needs to solve.&lt;/p&gt;

&lt;p&gt;Infrastructure software is full of concurrent work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;watching state&lt;/li&gt;
&lt;li&gt;reconciling resources&lt;/li&gt;
&lt;li&gt;handling network calls&lt;/li&gt;
&lt;li&gt;streaming logs&lt;/li&gt;
&lt;li&gt;scheduling tasks&lt;/li&gt;
&lt;li&gt;managing timeouts&lt;/li&gt;
&lt;li&gt;coordinating workers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CSP-style thinking gives these systems a clean structure.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;Tony Hoare’s CSP paper is almost 50 years old, but the idea still feels modern.&lt;/p&gt;

&lt;p&gt;That is rare in software.&lt;/p&gt;

&lt;p&gt;Many technologies become outdated quickly, but good mental models survive.&lt;/p&gt;

&lt;p&gt;Go did not invent the idea of communicating sequential processes. But Go made the idea practical for everyday engineers.&lt;/p&gt;

&lt;p&gt;That is the beauty of Go’s concurrency model.&lt;/p&gt;

&lt;p&gt;It takes a deep computer science concept and gives us a small set of simple tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;chan&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tools are simple.&lt;/p&gt;

&lt;p&gt;The design thinking behind them is powerful.&lt;/p&gt;

&lt;p&gt;For me, the biggest lesson is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Concurrency becomes easier when we stop thinking only about shared state and start thinking about ownership, communication, and flow.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the real value of CSP in Go.&lt;/p&gt;

&lt;p&gt;Not just fewer mutexes.&lt;/p&gt;

&lt;p&gt;Better architecture.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Tony Hoare — &lt;em&gt;Communicating Sequential Processes&lt;/em&gt; (1978)&lt;/li&gt;
&lt;li&gt;Rob Pike — &lt;em&gt;Go Concurrency Patterns&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Rob Pike — &lt;em&gt;Concurrency Is Not Parallelism&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Go Blog — &lt;em&gt;Share Memory By Communicating&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>concurrency</category>
      <category>architecture</category>
    </item>
    <item>
      <title>How sync.Pool Helped Me Stabilize p99 Latency in a High-Throughput Log Processing Pipeline</title>
      <dc:creator>amir</dc:creator>
      <pubDate>Fri, 29 May 2026 09:48:11 +0000</pubDate>
      <link>https://dev.to/amirsefati/how-syncpool-helped-me-stabilize-p99-latency-in-a-high-throughput-log-processing-pipeline-1e00</link>
      <guid>https://dev.to/amirsefati/how-syncpool-helped-me-stabilize-p99-latency-in-a-high-throughput-log-processing-pipeline-1e00</guid>
      <description>&lt;p&gt;A while ago, I was working on a log processing system that was very similar in spirit to Sentry.&lt;/p&gt;

&lt;p&gt;The system was responsible for receiving events from different services, parsing JSON payloads, normalizing metadata, enriching logs with extra context, and then pushing the final events into storage and downstream queues.&lt;/p&gt;

&lt;p&gt;At first, everything looked fine.&lt;/p&gt;

&lt;p&gt;The code was clean. The architecture was simple. The throughput was acceptable in local testing.&lt;/p&gt;

&lt;p&gt;But when the traffic increased, something interesting happened:&lt;/p&gt;

&lt;p&gt;The average latency was still okay, but the p99 latency started to become unstable.&lt;/p&gt;

&lt;p&gt;That was the first sign that the bottleneck was not just CPU or database performance. Something deeper was happening inside the runtime.&lt;/p&gt;

&lt;p&gt;In this article, I want to explain how I found the problem, why the system became slower under load, and how using &lt;code&gt;sync.Pool&lt;/code&gt; helped reduce allocation pressure, give the garbage collector less work, and keep latency more stable.&lt;/p&gt;

&lt;p&gt;This is not a magical optimization. It is a very specific tool for a very specific kind of problem.&lt;/p&gt;

&lt;p&gt;But when you are building high-throughput systems, especially systems that process JSON, logs, network payloads, or temporary buffers, understanding object pooling can make a big difference.&lt;/p&gt;




&lt;h2&gt;
  
  
  The System I Was Building
&lt;/h2&gt;

&lt;p&gt;The service was a log ingestion pipeline.&lt;/p&gt;

&lt;p&gt;The flow looked something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client / SDK
   ↓
HTTP ingestion API
   ↓
JSON decode
   ↓
Validation
   ↓
Normalization
   ↓
Enrichment
   ↓
Batching
   ↓
Queue / Storage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each incoming request contained one or more log events.&lt;/p&gt;

&lt;p&gt;A simplified event looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"service"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"payment-api"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"failed to charge customer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-28T12:40:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"6f9c9b9d7a1a"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"metadata"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"customer_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cus_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"region"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"eu-west"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"retry_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For every request, the service had to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read the request body&lt;/li&gt;
&lt;li&gt;decode JSON&lt;/li&gt;
&lt;li&gt;create internal event structs&lt;/li&gt;
&lt;li&gt;normalize fields&lt;/li&gt;
&lt;li&gt;create temporary buffers&lt;/li&gt;
&lt;li&gt;encode the final payload again&lt;/li&gt;
&lt;li&gt;send it to another component&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This kind of workload creates many short-lived objects.&lt;/p&gt;

&lt;p&gt;And in Go, short-lived objects are usually fine — until you create too many of them in a hot path.&lt;/p&gt;




&lt;h2&gt;
  
  
  The First Symptom: Average Latency Looked Fine, p99 Did Not
&lt;/h2&gt;

&lt;p&gt;At low traffic, the service looked healthy.&lt;/p&gt;

&lt;p&gt;Something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Requests/sec:        2,000
Average latency:     8ms
p95 latency:         18ms
p99 latency:         35ms
CPU usage:           Normal
Memory usage:        Stable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But when the traffic increased, the picture changed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Requests/sec:        15,000+
Average latency:     16ms
p95 latency:         90ms
p99 latency:         280ms - 400ms
CPU usage:           Higher than expected
Memory usage:        Sawtooth pattern
GC activity:         Frequent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The average latency was not terrible, but the tail latency was bad.&lt;/p&gt;

&lt;p&gt;And for this kind of system, p99 matters a lot.&lt;/p&gt;

&lt;p&gt;If a log processing system becomes slow during incidents, it creates a very bad situation: when you need observability the most, your observability pipeline becomes the bottleneck.&lt;/p&gt;

&lt;p&gt;That is exactly the kind of failure mode I wanted to avoid.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why p99 Latency Matters More Than Average Latency
&lt;/h2&gt;

&lt;p&gt;Average latency can hide real production problems.&lt;/p&gt;

&lt;p&gt;For example, imagine this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;95 requests finish in 10ms
4 requests finish in 50ms
1 request finishes in 500ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The average here is 16.5ms, which may still look acceptable.&lt;/p&gt;

&lt;p&gt;But that one slow request is the slowest 1% of your traffic, the tail that percentiles like p99 are meant to expose.&lt;/p&gt;

&lt;p&gt;In high-throughput backend systems, p99 latency usually tells a more honest story than average latency.&lt;/p&gt;
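&lt;p&gt;The arithmetic for those 100 requests can be checked with a short sketch. The helper names are hypothetical, and the percentile uses the nearest-rank definition, which is only one of several conventions that monitoring tools use.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
)

// meanMs returns the arithmetic mean of the samples in milliseconds.
// It assumes samples is not empty.
func meanMs(samples []int) float64 {
	sum := 0
	for _, s := range samples {
		sum += s
	}
	return float64(sum) / float64(len(samples))
}

// percentileMs returns the nearest-rank percentile: the smallest sample
// such that at least p percent of all samples are at or below it.
// It assumes samples is not empty and 0 <= p <= 100.
func percentileMs(samples []int, p int) int {
	sorted := append([]int(nil), samples...) // copy so the caller's slice is untouched
	sort.Ints(sorted)
	rank := (p*len(sorted) + 99) / 100 // integer form of ceil(p/100 * n)
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// exampleLatencies builds the 100 requests from the text.
func exampleLatencies() []int {
	var samples []int
	for i := 0; i < 95; i++ {
		samples = append(samples, 10)
	}
	for i := 0; i < 4; i++ {
		samples = append(samples, 50)
	}
	return append(samples, 500)
}

func main() {
	s := exampleLatencies()
	fmt.Println("mean:", meanMs(s))            // 16.5
	fmt.Println("p50:", percentileMs(s, 50))   // 10
	fmt.Println("p99:", percentileMs(s, 99))   // 50
	fmt.Println("max:", percentileMs(s, 100))  // 500
}
```

&lt;p&gt;With exactly 100 samples, the nearest-rank p99 lands on 50ms and the single 500ms request only shows up as the maximum. Either way, the mean of 16.5ms says almost nothing about what the slowest requests experienced.&lt;/p&gt;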

&lt;p&gt;When p99 starts jumping under load, it usually means some part of the system occasionally blocks, pauses, waits, or does too much work.&lt;/p&gt;

&lt;p&gt;In my case, one of the main causes was allocation pressure and frequent garbage collection.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Initial Code: Simple, But Allocation Heavy
&lt;/h2&gt;

&lt;p&gt;The first version of the code was easy to read.&lt;/p&gt;

&lt;p&gt;For every request, I created new buffers and temporary objects.&lt;/p&gt;

&lt;p&gt;Something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"bytes"&lt;/span&gt;
    &lt;span class="s"&gt;"encoding/json"&lt;/span&gt;
    &lt;span class="s"&gt;"io"&lt;/span&gt;
    &lt;span class="s"&gt;"net/http"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;LogEvent&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Service&lt;/span&gt;   &lt;span class="kt"&gt;string&lt;/span&gt;                 &lt;span class="s"&gt;`json:"service"`&lt;/span&gt;
    &lt;span class="n"&gt;Level&lt;/span&gt;     &lt;span class="kt"&gt;string&lt;/span&gt;                 &lt;span class="s"&gt;`json:"level"`&lt;/span&gt;
    &lt;span class="n"&gt;Message&lt;/span&gt;   &lt;span class="kt"&gt;string&lt;/span&gt;                 &lt;span class="s"&gt;`json:"message"`&lt;/span&gt;
    &lt;span class="n"&gt;Timestamp&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;                 &lt;span class="s"&gt;`json:"timestamp"`&lt;/span&gt;
    &lt;span class="n"&gt;TraceID&lt;/span&gt;   &lt;span class="kt"&gt;string&lt;/span&gt;                 &lt;span class="s"&gt;`json:"trace_id"`&lt;/span&gt;
    &lt;span class="n"&gt;Metadata&lt;/span&gt;  &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="s"&gt;`json:"metadata"`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;ingestHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"failed to read body"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusBadRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;LogEvent&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unmarshal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"invalid json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusBadRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;normalized&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="n"&gt;LogEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;normalized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normalizeEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Buffer&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"failed to encode events"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusInternalServerError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// sendToQueue(buf.Bytes())&lt;/span&gt;

    &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WriteHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusAccepted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;normalizeEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="n"&gt;LogEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;LogEvent&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"info"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Metadata&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At first glance, this is not bad code.&lt;/p&gt;

&lt;p&gt;Actually, for many applications, this is completely fine.&lt;/p&gt;

&lt;p&gt;But under heavy load, this path ran thousands of times per second.&lt;/p&gt;

&lt;p&gt;That means the service was constantly creating:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new byte slices
new buffers
new maps
new event slices
new encoder objects
new temporary JSON structures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most of these objects were short-lived.&lt;/p&gt;

&lt;p&gt;They were created, used for a few milliseconds, and then became garbage.&lt;/p&gt;

&lt;p&gt;The garbage collector had to clean them up again and again.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Problem: Too Many Temporary Objects
&lt;/h2&gt;

&lt;p&gt;In Go, allocation itself is not always expensive.&lt;/p&gt;

&lt;p&gt;The real cost often appears later.&lt;/p&gt;

&lt;p&gt;Every object that escapes to the heap becomes something the garbage collector may need to track.&lt;/p&gt;

&lt;p&gt;When the system creates too many temporary objects, the GC has more work to do.&lt;/p&gt;

&lt;p&gt;That can lead to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;more frequent GC cycles
more CPU used by the runtime
less CPU available for actual request processing
latency spikes during high traffic
unstable p95 and p99 latency
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This was exactly what I saw.&lt;/p&gt;

&lt;p&gt;The service was not slow because the logic was complex.&lt;/p&gt;

&lt;p&gt;It was slow because the hot path was creating too much garbage.&lt;/p&gt;

&lt;p&gt;That is an important distinction.&lt;/p&gt;

&lt;p&gt;Sometimes performance problems are not caused by bad algorithms. Sometimes they are caused by too much memory churn.&lt;/p&gt;




&lt;h2&gt;
  
  
  How I Confirmed the Problem
&lt;/h2&gt;

&lt;p&gt;Before changing anything, I wanted to confirm the source of the issue.&lt;/p&gt;

&lt;p&gt;I checked runtime metrics and profiling data.&lt;/p&gt;

&lt;p&gt;The signs were clear:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;high allocation rate
frequent GC cycles
bytes.Buffer allocations in the hot path
JSON processing allocations
temporary slices created per request
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
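&lt;p&gt;Signals like these can be read from the runtime itself. The sketch below is not the tooling used for this service. It is a minimal illustration, with hypothetical &lt;code&gt;measure&lt;/code&gt; and &lt;code&gt;churn&lt;/code&gt; helpers, that uses &lt;code&gt;runtime.ReadMemStats&lt;/code&gt; to count heap allocations and GC cycles around a piece of code. In a real investigation, pprof and the &lt;code&gt;runtime/metrics&lt;/code&gt; package are the more usual tools.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps each buffer reachable so the compiler cannot place it on the stack.
var sink []byte

// measure runs fn and returns how many heap objects were allocated and
// how many GC cycles completed while it ran.
func measure(fn func()) (mallocs uint64, gcCycles uint32) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	fn()
	runtime.ReadMemStats(&after)
	return after.Mallocs - before.Mallocs, after.NumGC - before.NumGC
}

// churn simulates a hot path that allocates a fresh 4 KiB buffer per request.
func churn(requests int) {
	for i := 0; i < requests; i++ {
		sink = make([]byte, 4096)
	}
}

func main() {
	mallocs, gcCycles := measure(func() { churn(100000) })
	fmt.Printf("heap objects: %d, GC cycles: %d\n", mallocs, gcCycles)
}
```

&lt;p&gt;&lt;code&gt;ReadMemStats&lt;/code&gt; briefly stops the world, so it suits experiments like this one more than a per-request code path.&lt;/p&gt;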



&lt;p&gt;A simplified benchmark showed the same pattern.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;BenchmarkIngestWithoutPool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;generatePayload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReportAllocs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResetTimer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;processLogsWithoutPool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result looked like this in my local benchmark:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;BenchmarkIngestWithoutPool-10       8200    145000 ns/op    128 KB/op    420 allocs/op
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The exact numbers are not the important part.&lt;/p&gt;

&lt;p&gt;The important part was the pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;allocs/op was high
bytes/op was high
GC activity increased with throughput
p99 latency became unstable under pressure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That told me the optimization target was not just request logic.&lt;/p&gt;

&lt;p&gt;The target was allocation behavior.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where &lt;code&gt;sync.Pool&lt;/code&gt; Fits
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;sync.Pool&lt;/code&gt; is a pool of temporary objects from Go's standard library, and it is safe for concurrent use by multiple goroutines.&lt;/p&gt;

&lt;p&gt;It allows you to reuse objects instead of allocating new ones every time.&lt;/p&gt;

&lt;p&gt;A good mental model is this:&lt;/p&gt;

&lt;p&gt;Instead of creating and throwing away the same type of object thousands of times per second, you keep reusable objects in a pool.&lt;/p&gt;

&lt;p&gt;You get one when you need it.&lt;/p&gt;

&lt;p&gt;You reset it.&lt;/p&gt;

&lt;p&gt;You use it.&lt;/p&gt;

&lt;p&gt;Then you put it back.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Get  →  Reset  →  Use  →  Put back
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This can reduce pressure on the garbage collector because fewer temporary objects are allocated in the hot path.&lt;/p&gt;

&lt;p&gt;But there is an important warning:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sync.Pool&lt;/code&gt; is not a cache.&lt;/p&gt;

&lt;p&gt;Objects inside the pool can be removed at any time without notification, typically during garbage collection.&lt;/p&gt;

&lt;p&gt;So you should not use it to store important state.&lt;/p&gt;

&lt;p&gt;It is best for temporary, reusable objects like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bytes.Buffer
[]byte buffers
temporary encoders
scratch objects
serialization helpers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That made it a good fit for my log processing pipeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Improved Version: Reusing Buffers
&lt;/h2&gt;

&lt;p&gt;The first improvement was to reuse &lt;code&gt;bytes.Buffer&lt;/code&gt; objects.&lt;/p&gt;

&lt;p&gt;Instead of creating a new buffer for every request, I created a pool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"bytes"&lt;/span&gt;
    &lt;span class="s"&gt;"encoding/json"&lt;/span&gt;
    &lt;span class="s"&gt;"io"&lt;/span&gt;
    &lt;span class="s"&gt;"net/http"&lt;/span&gt;
    &lt;span class="s"&gt;"sync"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;bufferPool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;LogEvent&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Service&lt;/span&gt;   &lt;span class="kt"&gt;string&lt;/span&gt;                 &lt;span class="s"&gt;`json:"service"`&lt;/span&gt;
    &lt;span class="n"&gt;Level&lt;/span&gt;     &lt;span class="kt"&gt;string&lt;/span&gt;                 &lt;span class="s"&gt;`json:"level"`&lt;/span&gt;
    &lt;span class="n"&gt;Message&lt;/span&gt;   &lt;span class="kt"&gt;string&lt;/span&gt;                 &lt;span class="s"&gt;`json:"message"`&lt;/span&gt;
    &lt;span class="n"&gt;Timestamp&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;                 &lt;span class="s"&gt;`json:"timestamp"`&lt;/span&gt;
    &lt;span class="n"&gt;TraceID&lt;/span&gt;   &lt;span class="kt"&gt;string&lt;/span&gt;                 &lt;span class="s"&gt;`json:"trace_id"`&lt;/span&gt;
    &lt;span class="n"&gt;Metadata&lt;/span&gt;  &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="s"&gt;`json:"metadata"`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;ingestHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"failed to read body"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusBadRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;LogEvent&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unmarshal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"invalid json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusBadRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;normalized&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="n"&gt;LogEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;normalized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normalizeEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;buf&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;bufferPool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;bufferPool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"failed to encode events"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusInternalServerError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// Important:&lt;/span&gt;
    &lt;span class="c"&gt;// If the downstream function stores this data or uses it asynchronously,&lt;/span&gt;
    &lt;span class="c"&gt;// copy it before returning the buffer to the pool.&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Bytes&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// sendToQueue(payload)&lt;/span&gt;
    &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;

    &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WriteHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusAccepted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;normalizeEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="n"&gt;LogEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;LogEvent&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"info"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Metadata&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This change looks small, but it matters under load.&lt;/p&gt;

&lt;p&gt;The buffer is no longer allocated from scratch for every request.&lt;/p&gt;

&lt;p&gt;Instead, the service reuses an existing buffer.&lt;/p&gt;

&lt;p&gt;That means fewer allocations, less memory churn, and less GC pressure.&lt;/p&gt;
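&lt;p&gt;A quick way to see the difference in isolation is &lt;code&gt;testing.AllocsPerRun&lt;/code&gt;, which reports the average number of allocations per call. This is a rough sketch rather than the benchmark from this post; &lt;code&gt;writeFresh&lt;/code&gt;, &lt;code&gt;writePooled&lt;/code&gt;, &lt;code&gt;measure&lt;/code&gt;, and the payload size are assumptions made for illustration.&lt;/p&gt;

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"testing"
)

var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// payload stands in for one encoded batch; its size is an arbitrary assumption.
var payload = bytes.Repeat([]byte("x"), 4096)

// sink keeps a result reachable so the work is not optimized away.
var sink int

// writeFresh builds a new buffer on every call, so its backing array
// has to be allocated each time.
func writeFresh() {
	buf := new(bytes.Buffer)
	buf.Write(payload)
	sink = buf.Len()
}

// writePooled takes a buffer from the pool, resets it, and returns it after use.
func writePooled() {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	buf.Write(payload)
	sink = buf.Len()
	bufPool.Put(buf)
}

// measure reports the average allocations per call for both variants.
func measure() (fresh, pooled float64) {
	fresh = testing.AllocsPerRun(1000, writeFresh)
	pooled = testing.AllocsPerRun(1000, writePooled)
	return fresh, pooled
}

func main() {
	fresh, pooled := measure()
	fmt.Printf("fresh: %.0f allocs/op, pooled: %.0f allocs/op\n", fresh, pooled)
}
```

&lt;p&gt;The pooled variant should report fewer allocations per call once the pool is warm, because the buffer keeps its grown backing array between uses. The exact numbers depend on the Go version and on when the garbage collector happens to clear the pool.&lt;/p&gt;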




&lt;h2&gt;
  
  
  The Most Important Rule: Never Reuse Dirty Objects
&lt;/h2&gt;

&lt;p&gt;When using a pool, always reset the object before reusing it.&lt;/p&gt;

&lt;p&gt;This is critical.&lt;/p&gt;

&lt;p&gt;For &lt;code&gt;bytes.Buffer&lt;/code&gt;, call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a slice, truncate it to zero length, which keeps the backing array for reuse:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
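&lt;p&gt;One caveat with slices: truncating changes only the length. The old elements are still in the backing array and can be reached again by re-slicing up to the capacity. For slices of strings or pointers, that also keeps the referenced memory alive. The sketch below contrasts plain truncation with zeroing first; &lt;code&gt;truncate&lt;/code&gt; and &lt;code&gt;wipe&lt;/code&gt; are hypothetical helper names.&lt;/p&gt;

```go
package main

import "fmt"

// truncate resets the length only; the old elements stay in the backing array.
func truncate(items []string) []string {
	return items[:0]
}

// wipe zeroes every slot up to the capacity before truncating,
// so nothing from the previous use survives in the backing array.
func wipe(items []string) []string {
	full := items[:cap(items)]
	for i := range full {
		full[i] = ""
	}
	return full[:0]
}

func main() {
	a := truncate([]string{"tenant-a", "secret-token"})
	// Re-slicing up to the capacity exposes the previous contents.
	fmt.Println(len(a), cap(a), a[:cap(a)])

	b := wipe([]string{"tenant-a", "secret-token"})
	// The same re-slice now yields only empty strings.
	fmt.Println(len(b), cap(b), b[:cap(b)])
}
```

&lt;p&gt;For plain &lt;code&gt;[]byte&lt;/code&gt; scratch space that is always overwritten before it is read, truncation alone is usually enough. Zeroing matters more when the elements hold references or sensitive values.&lt;/p&gt;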



&lt;p&gt;For a struct, clear the fields manually or create a reset method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;EventBuilder&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;level&lt;/span&gt;   &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;fields&lt;/span&gt;  &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;EventBuilder&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fields&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nb"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you forget to reset objects properly, you can accidentally leak data between requests.&lt;/p&gt;

&lt;p&gt;That is not just a performance bug.&lt;/p&gt;

&lt;p&gt;In a log processing system, it can become a serious correctness or security issue.&lt;/p&gt;

&lt;p&gt;Imagine one customer's metadata appearing inside another customer's event because a pooled object was not cleaned correctly.&lt;/p&gt;

&lt;p&gt;That is why object pooling should be used carefully.&lt;/p&gt;
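&lt;p&gt;The leak is easy to reproduce without a pool at all, which makes it deterministic to test. In this minimal sketch, the trimmed-down &lt;code&gt;EventBuilder&lt;/code&gt;, &lt;code&gt;newEventBuilder&lt;/code&gt;, and &lt;code&gt;fill&lt;/code&gt; are hypothetical, and the same builder is reused directly to stand in for an object that came back from a pool.&lt;/p&gt;

```go
package main

import "fmt"

// EventBuilder is a trimmed-down, hypothetical version of the builder above.
type EventBuilder struct {
	fields map[string]string
}

func newEventBuilder() *EventBuilder {
	b := new(EventBuilder)
	b.fields = make(map[string]string, 16)
	return b
}

// Reset removes every key so the next user starts from an empty map.
func (b *EventBuilder) Reset() {
	for k := range b.fields {
		delete(b.fields, k)
	}
}

// fill copies request metadata into the builder without clearing it first.
func (b *EventBuilder) fill(meta map[string]string) {
	for k, v := range meta {
		b.fields[k] = v
	}
}

func main() {
	b := newEventBuilder()

	// Request from customer A.
	b.fill(map[string]string{"customer": "a", "api_key": "secret-a"})

	// Reused for customer B without Reset: A's api_key is still present.
	b.fill(map[string]string{"customer": "b"})
	fmt.Println("without reset:", b.fields)

	// With Reset between uses, only B's fields remain.
	b.Reset()
	b.fill(map[string]string{"customer": "b"})
	fmt.Println("with reset:", b.fields)
}
```

&lt;p&gt;Overwriting only the keys that the new request happens to set is not enough. Any key that the previous request set and the new one does not will survive until the map is cleared.&lt;/p&gt;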




&lt;h2&gt;
  
  
  A Better Pool for Reusable Event Builders
&lt;/h2&gt;

&lt;p&gt;In my case, buffers were only one part of the problem.&lt;/p&gt;

&lt;p&gt;The pipeline also had a temporary event builder used during normalization and enrichment.&lt;/p&gt;

&lt;p&gt;A simplified version looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;EventBuilder&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Service&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Level&lt;/span&gt;   &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Message&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;TraceID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Fields&lt;/span&gt;  &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;NewEventBuilder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;EventBuilder&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;EventBuilder&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Fields&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Creating this for every event caused extra allocations, especially because of the map.&lt;/p&gt;

&lt;p&gt;So I moved it to a pool as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;eventBuilderPool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;EventBuilder&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;Fields&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;EventBuilder&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Service&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Level&lt;/span&gt;   &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Message&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;TraceID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Fields&lt;/span&gt;  &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;EventBuilder&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TraceID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fields&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nb"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fields&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;buildEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="n"&gt;LogEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;EventBuilder&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;builder&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;eventBuilderPool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;EventBuilder&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Service&lt;/span&gt;
    &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt;
    &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;
    &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TraceID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TraceID&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Metadata&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fields&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;releaseEventBuilder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;EventBuilder&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;eventBuilderPool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%f"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%t"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The idea is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Do not allocate a new builder for every event.
Reuse a builder.
Clean it carefully.
Return it to the pool.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This helped reduce the number of allocations per event.&lt;/p&gt;

&lt;p&gt;But again, this requires discipline.&lt;/p&gt;

&lt;p&gt;If the builder is still used somewhere else, do not put it back into the pool.&lt;/p&gt;

&lt;p&gt;Only return an object to the pool when you are 100% sure nobody else will use it.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Common Bug: Returning a Buffer Too Early
&lt;/h2&gt;

&lt;p&gt;This is one of the most dangerous mistakes with &lt;code&gt;sync.Pool&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Bad example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;buf&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;bufferPool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;bufferPool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;sendAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Bytes&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="c"&gt;// dangerous&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is dangerous because &lt;code&gt;sendAsync&lt;/code&gt; may use the byte slice later, after the buffer has already been returned to the pool.&lt;/p&gt;

&lt;p&gt;Another request can get the same buffer and overwrite the data.&lt;/p&gt;

&lt;p&gt;The result can be corrupted payloads.&lt;/p&gt;

&lt;p&gt;The safer version is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;buf&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;bufferPool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;bufferPool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}()&lt;/span&gt;

&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Bytes&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sendAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yes, this copy creates an allocation.&lt;/p&gt;

&lt;p&gt;But correctness is more important.&lt;/p&gt;

&lt;p&gt;Pooling should not create hidden data races or corrupted messages.&lt;/p&gt;

&lt;p&gt;The goal is not to remove every allocation from the system.&lt;/p&gt;

&lt;p&gt;The goal is to remove unnecessary allocations safely.&lt;/p&gt;
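
&lt;p&gt;To see why the copy matters, here is a minimal sketch that needs no pool at all. The function name &lt;code&gt;aliasDemo&lt;/code&gt; and the payloads are made up for illustration. It keeps both a direct &lt;code&gt;buf.Bytes()&lt;/code&gt; view and a copy, then reuses the buffer the way the next request would:&lt;/p&gt;

```go
package main

import (
	"bytes"
	"fmt"
)

// aliasDemo writes to a buffer, keeps a direct view and a copy of its bytes,
// then reuses the buffer the way a second request would after Put and Get.
func aliasDemo() (view string, copied string) {
	buf := new(bytes.Buffer)
	buf.WriteString("FIRST")

	direct := buf.Bytes()                       // shares the buffer's backing array
	safe := append([]byte(nil), buf.Bytes()...) // independent copy

	// Simulate the next user of the pooled buffer.
	buf.Reset()
	buf.WriteString("OTHER")

	return string(direct), string(safe)
}

func main() {
	view, copied := aliasDemo()
	fmt.Println("direct view:", view)
	fmt.Println("copy:       ", copied)
}
```

&lt;p&gt;The copy always keeps the first payload. The direct view is only documented as valid until the next buffer modification, so after the reuse it may show the second payload instead, which is exactly the corruption described above.&lt;/p&gt;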




&lt;h2&gt;
  
  
  Benchmark Before and After
&lt;/h2&gt;

&lt;p&gt;After applying pooling to the hottest temporary objects, the benchmark improved.&lt;/p&gt;

&lt;p&gt;The simplified benchmark before pooling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;BenchmarkIngestWithoutPool-10       8200    145000 ns/op    128 KB/op    420 allocs/op
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After pooling reusable buffers and temporary event builders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;BenchmarkIngestWithPool-10         13500     89000 ns/op     62 KB/op    210 allocs/op
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this benchmark, the improvement was roughly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~38% lower processing time per operation
~51% less memory allocated per operation
~50% fewer allocations per operation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But the real win was visible under load.&lt;/p&gt;

&lt;p&gt;Before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Requests/sec:        15,000+
Average latency:     16ms
p95 latency:         90ms
p99 latency:         280ms - 400ms
GC cycles:           Frequent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Requests/sec:        15,000+
Average latency:     11ms
p95 latency:         42ms
p99 latency:         95ms - 140ms
GC cycles:           Reduced
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These numbers are from a simplified internal benchmark scenario, not a universal promise.&lt;/p&gt;

&lt;p&gt;Your results will depend on payload size, CPU, memory, JSON structure, traffic pattern, and how many allocations exist in your hot path.&lt;/p&gt;

&lt;p&gt;But the direction was clear:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;less allocation pressure
less GC work
more stable tail latency
better throughput under pressure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was exactly what the system needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Works Well for Log Processing Systems
&lt;/h2&gt;

&lt;p&gt;Log processing systems are a good use case for pooling because they usually process many similar objects repeatedly.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;network buffers
JSON payload buffers
temporary event builders
batch buffers
compression buffers
serialization buffers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pattern repeats thousands of times per second.&lt;/p&gt;

&lt;p&gt;That makes object reuse valuable.&lt;/p&gt;

&lt;p&gt;In a normal CRUD API, this optimization may not matter.&lt;/p&gt;

&lt;p&gt;But in ingestion systems, queues, observability pipelines, proxies, gateways, and stream processors, small allocation costs can become very large at scale.&lt;/p&gt;

&lt;p&gt;This is where senior-level performance work usually starts:&lt;/p&gt;

&lt;p&gt;Not by guessing.&lt;/p&gt;

&lt;p&gt;Not by making the code complex immediately.&lt;/p&gt;

&lt;p&gt;But by measuring the system, finding the hot path, and reducing unnecessary work where it actually matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  When Not to Use &lt;code&gt;sync.Pool&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;sync.Pool&lt;/code&gt; is powerful, but it is not something I use everywhere.&lt;/p&gt;

&lt;p&gt;I avoid it when:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;the object is very small
the code is not in a hot path
allocation rate is already low
pooling makes the code harder to understand
the object contains sensitive data and cleanup is risky
ownership is unclear
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, pooling a tiny struct in a low-traffic admin endpoint is probably useless.&lt;/p&gt;

&lt;p&gt;It makes the code more complex without improving the system.&lt;/p&gt;

&lt;p&gt;That is not good engineering.&lt;/p&gt;

&lt;p&gt;Good performance engineering is not about adding clever tricks everywhere.&lt;/p&gt;

&lt;p&gt;It is about knowing where the system actually pays the cost.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Rules I Follow
&lt;/h2&gt;

&lt;p&gt;Here are the rules I personally follow when using &lt;code&gt;sync.Pool&lt;/code&gt;:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Measure first
&lt;/h3&gt;

&lt;p&gt;Do not add pooling just because it looks professional.&lt;/p&gt;

&lt;p&gt;Check allocations with benchmarks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReportAllocs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run the benchmarks with allocation statistics enabled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-bench&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;-benchmem&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production-like analysis, use &lt;code&gt;pprof&lt;/code&gt; and runtime metrics.&lt;/p&gt;
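
&lt;p&gt;For a quick check outside a benchmark file, the standard library's &lt;code&gt;testing.AllocsPerRun&lt;/code&gt; reports the average number of allocations per call. The sketch below is illustrative only: &lt;code&gt;encodeFresh&lt;/code&gt;, &lt;code&gt;encodePooled&lt;/code&gt;, and &lt;code&gt;sink&lt;/code&gt; are hypothetical names, and the workload is far simpler than a real ingest path.&lt;/p&gt;

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"testing"
)

var bufferPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// sink keeps the result reachable so the work is not optimized away.
var sink []byte

// encodeFresh allocates a new buffer on every call.
func encodeFresh(msg string) {
	buf := new(bytes.Buffer)
	buf.WriteString(msg)
	sink = append(sink[:0], buf.Bytes()...)
}

// encodePooled reuses a buffer from the pool and copies the bytes out
// before the buffer goes back.
func encodePooled(msg string) {
	buf := bufferPool.Get().(*bytes.Buffer)
	buf.Reset()
	buf.WriteString(msg)
	sink = append(sink[:0], buf.Bytes()...)
	bufferPool.Put(buf)
}

func main() {
	msg := "level=error service=ingest msg=timeout"

	fresh := testing.AllocsPerRun(1000, func() { encodeFresh(msg) })
	pooled := testing.AllocsPerRun(1000, func() { encodePooled(msg) })

	fmt.Printf("fresh buffer:  %.1f allocs/op\n", fresh)
	fmt.Printf("pooled buffer: %.1f allocs/op\n", pooled)
}
```

&lt;p&gt;The exact numbers depend on the Go version and on when the garbage collector empties the pool, so treat them as a signal, not a guarantee.&lt;/p&gt;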

&lt;h3&gt;
  
  
  2. Pool only hot-path temporary objects
&lt;/h3&gt;

&lt;p&gt;Good candidates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bytes.Buffer
large []byte slices
temporary builders
compression buffers
serialization helpers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bad candidates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;business state
long-lived objects
request-specific objects still used asynchronously
objects with unclear ownership
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Always reset before reuse
&lt;/h3&gt;

&lt;p&gt;Never trust the object you get from the pool.&lt;/p&gt;

&lt;p&gt;Clean it before using it.&lt;/p&gt;

&lt;p&gt;Clean it before putting it back.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Be careful with references
&lt;/h3&gt;

&lt;p&gt;Do not return an object to the pool while another goroutine, function, or queue still references it.&lt;/p&gt;

&lt;p&gt;This is especially important with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[]byte
bytes.Buffer
maps
slices
pointers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Do not use &lt;code&gt;sync.Pool&lt;/code&gt; as a cache
&lt;/h3&gt;

&lt;p&gt;The garbage collector can remove pooled objects at any time, without notification.&lt;/p&gt;

&lt;p&gt;So never depend on the pool for correctness.&lt;/p&gt;

&lt;p&gt;The system must work even if the pool is empty.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Senior Engineering Lesson
&lt;/h2&gt;

&lt;p&gt;The biggest lesson for me was this:&lt;/p&gt;

&lt;p&gt;Performance problems are not always about slow code.&lt;/p&gt;

&lt;p&gt;Sometimes they are about how much temporary work the code creates.&lt;/p&gt;

&lt;p&gt;In my log processing service, the business logic was not very complicated.&lt;/p&gt;

&lt;p&gt;The real issue was that every request created too many short-lived objects.&lt;/p&gt;

&lt;p&gt;At low traffic, this was invisible.&lt;/p&gt;

&lt;p&gt;At high traffic, it became a GC problem.&lt;/p&gt;

&lt;p&gt;And once GC became more active, p99 latency became unstable.&lt;/p&gt;

&lt;p&gt;Using &lt;code&gt;sync.Pool&lt;/code&gt; did not magically make the system fast.&lt;/p&gt;

&lt;p&gt;But it removed unnecessary pressure from the runtime.&lt;/p&gt;

&lt;p&gt;That gave the service more breathing room under load.&lt;/p&gt;

&lt;p&gt;The final architecture was still simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;receive logs
parse JSON
normalize events
reuse temporary buffers
batch efficiently
send downstream
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But the implementation became more careful about memory.&lt;/p&gt;

&lt;p&gt;That is the difference between code that works and code that survives real production traffic.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;sync.Pool&lt;/code&gt; is not something every Go application needs.&lt;/p&gt;

&lt;p&gt;But if you are building high-throughput systems like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;log processors
observability pipelines
API gateways
stream processors
network services
JSON-heavy ingestion APIs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;then object pooling is worth understanding.&lt;/p&gt;

&lt;p&gt;The key is not to use it blindly.&lt;/p&gt;

&lt;p&gt;The key is to measure first, identify allocation-heavy hot paths, and then reuse objects safely.&lt;/p&gt;

&lt;p&gt;For my Sentry-like log processing pipeline, pooling buffers and temporary builders helped reduce allocations, lower GC pressure, and stabilize p99 latency under high load.&lt;/p&gt;

&lt;p&gt;And in production systems, stable p99 latency is often more important than a beautiful average latency number.&lt;/p&gt;

&lt;p&gt;Because users do not feel your average.&lt;/p&gt;

&lt;p&gt;They feel the slow requests.&lt;/p&gt;

</description>
      <category>go</category>
      <category>performance</category>
      <category>backend</category>
      <category>observability</category>
    </item>
    <item>
      <title>Why We Accidentally Blocked Our Users: A Deep Dive into Idempotency in Distributed Systems</title>
      <dc:creator>amir</dc:creator>
      <pubDate>Thu, 28 May 2026 10:31:54 +0000</pubDate>
      <link>https://dev.to/amirsefati/why-we-accidentally-blocked-our-users-a-deep-dive-into-idempotency-in-distributed-systems-4aik</link>
      <guid>https://dev.to/amirsefati/why-we-accidentally-blocked-our-users-a-deep-dive-into-idempotency-in-distributed-systems-4aik</guid>
      <description>&lt;p&gt;I learned one of my most important distributed-systems lessons the hard way.&lt;/p&gt;

&lt;p&gt;We were working on a payment flow connected to an external payment gateway. On paper, the architecture looked solid: microservices, clean database transactions, retry logic, monitoring, and enough security checks to make us feel safe before deployment.&lt;/p&gt;

&lt;p&gt;Then production reminded us that real users do not live inside clean architecture diagrams.&lt;/p&gt;

&lt;p&gt;Support tickets started coming in. Some users could not complete payments. Some accounts were being blocked too aggressively. At first, it looked like suspicious behavior: multiple payment attempts, repeated payloads, and requests arriving only seconds apart.&lt;/p&gt;

&lt;p&gt;But when I dug into the logs, the real problem was not fraud.&lt;/p&gt;

&lt;p&gt;It was our own backend.&lt;/p&gt;

&lt;p&gt;A user with a slow network clicked the &lt;strong&gt;Pay&lt;/strong&gt; button, waited, saw nothing happen, and clicked again. In another case, the browser retried a request after a timeout. Our backend received multiple identical payment requests within a very short window, and our naive security logic treated them like duplicate transaction anomalies or replay attempts.&lt;/p&gt;

&lt;p&gt;We were punishing users for having bad internet.&lt;/p&gt;

&lt;p&gt;That incident changed the way I think about payment systems, retries, APIs, and side effects. In distributed systems, you cannot control the network. You cannot control the user's browser. You cannot guarantee that a response will reach the client.&lt;/p&gt;

&lt;p&gt;But you &lt;strong&gt;must&lt;/strong&gt; control what happens when the same intent reaches your backend more than once.&lt;/p&gt;

&lt;p&gt;That is where idempotency becomes essential.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem Is Not the Retry. The Problem Is the Side Effect.
&lt;/h2&gt;

&lt;p&gt;Retries are normal.&lt;/p&gt;

&lt;p&gt;Clients retry. Browsers retry. Mobile networks fail. Gateways time out. Load balancers drop connections. Users double-click buttons. Background workers reprocess jobs. Message queues deliver the same message more than once.&lt;/p&gt;

&lt;p&gt;The dangerous part is not receiving the same request multiple times.&lt;/p&gt;

&lt;p&gt;The dangerous part is executing the same side effect multiple times.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;charging a card twice&lt;/li&gt;
&lt;li&gt;creating two orders&lt;/li&gt;
&lt;li&gt;sending duplicate invoices&lt;/li&gt;
&lt;li&gt;blocking a user after repeated attempts&lt;/li&gt;
&lt;li&gt;reducing inventory multiple times&lt;/li&gt;
&lt;li&gt;creating duplicate ledger entries&lt;/li&gt;
&lt;li&gt;triggering the same notification several times&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In backend engineering, especially around payments and financial workflows, the real question is not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How do I stop duplicate requests?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The better question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How do I make duplicate requests safe?&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Idempotency Actually Means
&lt;/h2&gt;

&lt;p&gt;In mathematics, an operation is idempotent when applying it multiple times produces the same result as applying it once.&lt;/p&gt;

&lt;p&gt;In API design, some HTTP methods are naturally expected to be idempotent.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;GET&lt;/code&gt; is a safe method: it should not change state, so repeating it is harmless.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;PUT&lt;/code&gt; is defined as idempotent because replacing a resource with the same representation multiple times leaves the system in the same final state.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;DELETE&lt;/code&gt; is also defined as idempotent because deleting the same resource multiple times still means the resource does not exist.&lt;/p&gt;

&lt;p&gt;But &lt;code&gt;POST&lt;/code&gt; is different.&lt;/p&gt;

&lt;p&gt;A &lt;code&gt;POST /payments&lt;/code&gt; request usually means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Create a new payment.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the client sends the same request twice, the backend may create two payments unless we intentionally design against it.&lt;/p&gt;

&lt;p&gt;That is the core problem.&lt;/p&gt;

&lt;p&gt;A payment request represents an &lt;strong&gt;intent&lt;/strong&gt;, not just an HTTP payload. If the user intended to pay once, the system should execute that intent once, even if the request arrives multiple times.&lt;/p&gt;




&lt;h2&gt;
  
  
  Enter the Idempotency Key
&lt;/h2&gt;

&lt;p&gt;An idempotency key is a unique value generated by the client and sent with the request, usually as an HTTP header:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;Idempotency-Key: 7f3f0f4c-98a4-4d8c-9b91-7a2f9e4c5d11
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This request represents one specific user action. If you see this key again, do not execute the action again. Return the result of the first execution.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That simple idea changes the system from:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I received another request, so I will process it again.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I received the same intent again, so I will return the same result.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is especially important for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;payments&lt;/li&gt;
&lt;li&gt;order creation&lt;/li&gt;
&lt;li&gt;wallet transfers&lt;/li&gt;
&lt;li&gt;account provisioning&lt;/li&gt;
&lt;li&gt;invoice generation&lt;/li&gt;
&lt;li&gt;message publishing&lt;/li&gt;
&lt;li&gt;background job processing&lt;/li&gt;
&lt;li&gt;any workflow where duplicate execution is dangerous&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  A Good Idempotency Layer Is More Than a Boolean Flag
&lt;/h2&gt;

&lt;p&gt;A common mistake is implementing idempotency like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if key exists:
    reject request
else:
    process request
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That looks simple, but it is not enough for production systems.&lt;/p&gt;

&lt;p&gt;A real implementation needs to answer several questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the request currently being processed?&lt;/li&gt;
&lt;li&gt;Did the request complete successfully?&lt;/li&gt;
&lt;li&gt;Should we return a cached response?&lt;/li&gt;
&lt;li&gt;Did the request fail with a retryable error?&lt;/li&gt;
&lt;li&gt;Is the same key being reused with a different payload?&lt;/li&gt;
&lt;li&gt;What happens if two identical requests arrive at the exact same millisecond?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For that reason, I prefer thinking about idempotency as a small state machine.&lt;/p&gt;




&lt;h2&gt;
  
  
  The State Machine I Like to Use
&lt;/h2&gt;

&lt;p&gt;A practical idempotency record can have these states:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;STARTED
COMPLETED
FAILED
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;code&gt;STARTED&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The server received the key and started processing the request.&lt;/p&gt;

&lt;p&gt;This state is important because it protects you from concurrent duplicates. If another request arrives with the same key while the first request is still running, the system should not execute the action again.&lt;/p&gt;

&lt;p&gt;Usually, I return something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;409 Conflict
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;with a response explaining that the request is already in progress.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;COMPLETED&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The operation finished successfully.&lt;/p&gt;

&lt;p&gt;At this point, the server stores the final response and returns that same response for future requests with the same key.&lt;/p&gt;

&lt;p&gt;This is the key behavior that makes retries safe.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;FAILED&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The operation failed with a server-side or retryable error.&lt;/p&gt;

&lt;p&gt;This part depends on your business rules, but in many systems I prefer allowing the client to retry after a true internal failure. The important thing is to be explicit about which failures are cached and which failures are not.&lt;/p&gt;
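
&lt;p&gt;The three states can be sketched as a small in-memory store. This is a simplified illustration, not the production implementation: &lt;code&gt;Store&lt;/code&gt;, &lt;code&gt;Begin&lt;/code&gt;, &lt;code&gt;Complete&lt;/code&gt;, &lt;code&gt;Fail&lt;/code&gt;, and the &lt;code&gt;Decision&lt;/code&gt; values are hypothetical names, a mutex stands in for the real key store, and the sketch assumes that a &lt;code&gt;FAILED&lt;/code&gt; record may be retried and that reusing a key with a different payload is rejected.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

type State string

const (
	Started   State = "STARTED"
	Completed State = "COMPLETED"
	Failed    State = "FAILED"
)

// Decision tells the HTTP handler what to do with an incoming request.
type Decision string

const (
	Execute    Decision = "execute"     // run the side effect
	InProgress Decision = "in_progress" // for example, respond 409 Conflict
	Replay     Decision = "replay"      // return the stored response
	Mismatch   Decision = "mismatch"    // same key, different payload
)

type Record struct {
	State       State
	PayloadHash string
	StatusCode  int
	Body        string
}

// Store is a hypothetical in-memory stand-in for the real key store.
type Store struct {
	mu      sync.Mutex
	records map[string]Record
}

func NewStore() *Store {
	s := new(Store)
	s.records = make(map[string]Record)
	return s
}

// Begin registers the key or reports how an existing record should be handled.
func (s *Store) Begin(key, payloadHash string) (Decision, Record) {
	s.mu.Lock()
	defer s.mu.Unlock()

	rec, seen := s.records[key]
	switch {
	case !seen:
		// First time this key is seen: fall through to registration.
	case rec.PayloadHash != payloadHash:
		return Mismatch, rec
	case rec.State == Started:
		return InProgress, rec
	case rec.State == Completed:
		return Replay, rec
	}
	// New key, or a FAILED record that this sketch allows to be retried.
	rec = Record{State: Started, PayloadHash: payloadHash}
	s.records[key] = rec
	return Execute, rec
}

// Complete stores the final response so later duplicates can replay it.
func (s *Store) Complete(key string, status int, body string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	rec := s.records[key]
	rec.State, rec.StatusCode, rec.Body = Completed, status, body
	s.records[key] = rec
}

// Fail marks the attempt as failed so the client may retry with the same key.
func (s *Store) Fail(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	rec := s.records[key]
	rec.State = Failed
	s.records[key] = rec
}

func main() {
	store := NewStore()
	first, _ := store.Begin("key-1", "hash-a")
	second, _ := store.Begin("key-1", "hash-a")
	store.Complete("key-1", 201, `{"paymentId":"pay_123"}`)
	third, rec := store.Begin("key-1", "hash-a")
	fmt.Println(first, second, third, rec.StatusCode)
}
```

&lt;p&gt;The handler only runs the side effect when &lt;code&gt;Begin&lt;/code&gt; returns &lt;code&gt;Execute&lt;/code&gt;. Every other decision maps to a response without touching the payment gateway.&lt;/p&gt;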




&lt;h2&gt;
  
  
  Why Redis Works Well Here
&lt;/h2&gt;

&lt;p&gt;You can implement idempotency using PostgreSQL with a unique constraint. In some systems, that is perfectly fine.&lt;/p&gt;

&lt;p&gt;But in high-throughput APIs, I usually prefer Redis for the idempotency layer because it gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fast key lookup&lt;/li&gt;
&lt;li&gt;atomic operations&lt;/li&gt;
&lt;li&gt;natural TTL support&lt;/li&gt;
&lt;li&gt;simple distributed locking primitives&lt;/li&gt;
&lt;li&gt;low overhead for temporary request metadata&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The TTL part matters a lot.&lt;/p&gt;

&lt;p&gt;Idempotency keys should not live forever. For many payment-style workflows, a 24-hour TTL is a reasonable starting point. Some systems may need shorter or longer retention depending on reconciliation, compliance, and product behavior.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Race Condition You Must Handle
&lt;/h2&gt;

&lt;p&gt;The biggest bug in naive idempotency implementations is the race condition.&lt;/p&gt;

&lt;p&gt;Imagine two identical requests arrive at the same time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Request A checks key -&amp;gt; key does not exist
Request B checks key -&amp;gt; key does not exist
Request A processes payment
Request B processes payment
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you have two charges.&lt;/p&gt;

&lt;p&gt;This is why the first write must be atomic.&lt;/p&gt;

&lt;p&gt;In Redis, you can use &lt;code&gt;SET&lt;/code&gt; with &lt;code&gt;NX&lt;/code&gt; and &lt;code&gt;EX&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lockKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`idempotency:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;acquired&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;lockKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;STARTED&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;payloadHash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;NX&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;EX&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mi"&gt;86400&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;acquired&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Another request already created this key.&lt;/span&gt;
  &lt;span class="c1"&gt;// Now inspect its current state.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;NX&lt;/code&gt; means "only set this key if it does not already exist."&lt;/p&gt;

&lt;p&gt;That one detail is critical. It turns the check-and-set operation into one atomic step.&lt;/p&gt;




&lt;h2&gt;
  
  
  Returning the Cached Response
&lt;/h2&gt;

&lt;p&gt;When the first request completes, we update the idempotency record:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s2"&gt;`idempotency:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;COMPLETED&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;payloadHash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;responseBody&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;paymentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pay_123&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;succeeded&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;completedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;EX&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mi"&gt;86400&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, if the same request arrives again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`idempotency:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;COMPLETED&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;responseBody&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The client gets a successful response, but the payment is not executed again.&lt;/p&gt;

&lt;p&gt;That is the whole point.&lt;/p&gt;

&lt;p&gt;The retry becomes harmless.&lt;/p&gt;




&lt;h2&gt;
  
  
  Payload Fingerprinting: The Edge Case Many People Miss
&lt;/h2&gt;

&lt;p&gt;There is one subtle bug that can become very dangerous.&lt;/p&gt;

&lt;p&gt;What if the client reuses the same idempotency key for a different request?&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Key: abc-123
Amount: $10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then later:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Key: abc-123
Amount: $1,000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your server only checks the key, it will return the cached response for the $10 request, and the client will believe the $1,000 payment went through. That can create serious data integrity problems.&lt;/p&gt;

&lt;p&gt;The fix is payload fingerprinting.&lt;/p&gt;

&lt;p&gt;When the request first arrives, create a stable hash of the meaningful request body:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;crypto&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createPayloadHash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createHash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sha256&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hex&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
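&lt;p&gt;One caveat: &lt;code&gt;JSON.stringify&lt;/code&gt; keeps the key order of the object it receives, so two payloads with the same fields in a different order produce different hashes. A minimal sketch of one way around this, assuming the payload is plain parsed JSON (no &lt;code&gt;undefined&lt;/code&gt; values, dates, or functions). &lt;code&gt;stableStringify&lt;/code&gt; is a hypothetical helper name, not part of any library:&lt;/p&gt;

```javascript
// Hypothetical helper: serializes a JSON-like value with object keys sorted,
// so logically equal payloads always produce the same string.
function stableStringify(value) {
  if (Array.isArray(value)) {
    // Array order is meaningful, so it is preserved.
    return `[${value.map((item) => stableStringify(item)).join(",")}]`;
  }
  if (value !== null) {
    if (typeof value === "object") {
      const keys = Object.keys(value).sort();
      const parts = keys.map(
        (k) => `${JSON.stringify(k)}:${stableStringify(value[k])}`
      );
      return `{${parts.join(",")}}`;
    }
  }
  // Strings, numbers, booleans, and null use the standard encoding.
  return JSON.stringify(value);
}

const a = stableStringify({ amount: 1000, currency: "USD" });
const b = stableStringify({ currency: "USD", amount: 1000 });
console.log(a === b); // true
```

&lt;p&gt;The result can be passed to &lt;code&gt;update()&lt;/code&gt; in place of &lt;code&gt;JSON.stringify(payload)&lt;/code&gt;. Whether this matters depends on how consistently your clients serialize their requests.&lt;/p&gt;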



&lt;p&gt;Then store that hash with the idempotency key.&lt;/p&gt;

&lt;p&gt;On every retry, compare the incoming payload hash with the stored hash.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payloadHash&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;incomingPayloadHash&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Idempotency key was reused with a different payload.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prevents key reuse bugs from silently corrupting your workflow.&lt;/p&gt;

&lt;p&gt;In my opinion, this is not optional for financial systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Be Careful with 4xx and 5xx Responses
&lt;/h2&gt;

&lt;p&gt;Not every response should be cached the same way.&lt;/p&gt;

&lt;h3&gt;
  
  
  2xx responses
&lt;/h3&gt;

&lt;p&gt;Usually safe to cache.&lt;/p&gt;

&lt;p&gt;The operation succeeded, and future retries should return the same response.&lt;/p&gt;

&lt;h3&gt;
  
  
  5xx responses
&lt;/h3&gt;

&lt;p&gt;This depends on where the failure happened.&lt;/p&gt;

&lt;p&gt;If your service failed before executing the side effect, a retry may be safe.&lt;/p&gt;

&lt;p&gt;If your service timed out after calling the payment gateway, you may not know whether the external side effect happened. In that case, you need reconciliation, gateway lookup, or a more careful state transition.&lt;/p&gt;

&lt;p&gt;This is where many systems get complicated.&lt;/p&gt;

&lt;h3&gt;
  
  
  4xx responses
&lt;/h3&gt;

&lt;p&gt;Be very careful.&lt;/p&gt;

&lt;p&gt;If the user sent invalid input, you may not want to cache that failure forever. Maybe the user fixes the payload and retries. Maybe the frontend generated the key before validation was complete.&lt;/p&gt;

&lt;p&gt;Personally, I do not like blindly caching all 4xx responses for long periods. For many product flows, it is better to reject the invalid request and ask the client to generate a new idempotency key after the user changes the input.&lt;/p&gt;

&lt;p&gt;The important thing is to define this behavior intentionally.&lt;/p&gt;
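&lt;p&gt;One way to make that decision explicit is a small policy function. This is only a sketch of the reasoning above: the name &lt;code&gt;cachePolicy&lt;/code&gt;, the three outcome labels, and the &lt;code&gt;sideEffectMayHaveHappened&lt;/code&gt; flag are all assumptions for illustration, and your own rules may differ.&lt;/p&gt;

```javascript
// Hypothetical policy helper, not a library API.
// sideEffectMayHaveHappened: true when the failure occurred after the
// external call (for example the payment gateway) was already sent.
function cachePolicy(statusCode, sideEffectMayHaveHappened) {
  const family = Math.floor(statusCode / 100);

  if (family === 2) {
    return "cache"; // success: replay this response on retries
  }
  if (family === 5) {
    // Unknown outcome needs a gateway lookup or reconciliation first.
    return sideEffectMayHaveHappened ? "reconcile" : "discard";
  }
  // 4xx and anything else: do not keep it as a replayable result.
  return "discard";
}

console.log(cachePolicy(201, false)); // cache
console.log(cachePolicy(504, true)); // reconcile
```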




&lt;h2&gt;
  
  
  Client-Side Key Generation
&lt;/h2&gt;

&lt;p&gt;The idempotency key should usually be generated on the client.&lt;/p&gt;

&lt;p&gt;For example, in a checkout page, generate the key when the user starts a specific payment attempt and reuse that key for retries of the same attempt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;idempotencyKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randomUUID&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/api/payments&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Idempotency-Key&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;idempotencyKey&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;USD&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the key is generated on the server, it cannot help with network failures between the client and the server: when the response is lost, the client never learns which key to send with its retry.&lt;/p&gt;

&lt;p&gt;The client needs to be able to say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I am retrying the same action.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That requires the client to own the key for that action.&lt;/p&gt;
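&lt;p&gt;A rough sketch of what "owning the key" looks like in a retry loop. &lt;code&gt;sendWithRetry&lt;/code&gt; is a hypothetical helper, and &lt;code&gt;send&lt;/code&gt; stands for any function that takes the key and returns a promise, such as a wrapper around the &lt;code&gt;fetch&lt;/code&gt; call above. Backoff and error classification are left out to keep it short.&lt;/p&gt;

```javascript
// Hypothetical helper: retries one user action, reusing the same
// idempotency key on every attempt so the server can recognize the retry.
async function sendWithRetry(send, idempotencyKey, maxAttempts = 3) {
  let lastError = new Error("maxAttempts must be at least 1");

  for (let attempt = 0; attempt !== maxAttempts; attempt += 1) {
    try {
      return await send(idempotencyKey);
    } catch (err) {
      // Treated as a network-level failure here; try again with the same key.
      lastError = err;
    }
  }

  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

&lt;p&gt;The key is created once per payment attempt and passed in, never regenerated inside the loop. A new key is only created when the user starts a new attempt.&lt;/p&gt;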




&lt;h2&gt;
  
  
  Idempotency Is Not a Replacement for Database Integrity
&lt;/h2&gt;

&lt;p&gt;One mistake I see sometimes is treating idempotency as the only protection layer.&lt;/p&gt;

&lt;p&gt;It should not be.&lt;/p&gt;

&lt;p&gt;Idempotency is part of the design, but you should still use database constraints and transactional boundaries where appropriate.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unique constraints for order numbers&lt;/li&gt;
&lt;li&gt;unique transaction references&lt;/li&gt;
&lt;li&gt;ledger constraints&lt;/li&gt;
&lt;li&gt;gateway transaction IDs&lt;/li&gt;
&lt;li&gt;consistent status transitions&lt;/li&gt;
&lt;li&gt;outbox patterns for event publishing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In serious systems, correctness usually comes from layers.&lt;/p&gt;

&lt;p&gt;The API idempotency layer prevents duplicate execution at the request boundary.&lt;/p&gt;

&lt;p&gt;The database protects final state.&lt;/p&gt;

&lt;p&gt;The message queue and worker design protect asynchronous processing.&lt;/p&gt;

&lt;p&gt;The reconciliation process protects you when external systems behave unexpectedly.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Simple Middleware Flow
&lt;/h2&gt;

&lt;p&gt;A clean idempotency middleware can follow this flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Read Idempotency-Key from the request.
2. Validate that the key is present on dangerous POST endpoints.
3. Create a hash of the request payload.
4. Atomically create a STARTED record with Redis SET NX EX.
5. If the key already exists:
   - compare payload hash
   - if STARTED, return 409 Conflict
   - if COMPLETED, return cached response
   - if FAILED, allow or reject retry based on policy
6. Execute the business operation.
7. Store the final response as COMPLETED.
8. Return the response to the client.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This flow is simple enough to understand, but strong enough to avoid many production problems.&lt;/p&gt;
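&lt;p&gt;The steps above can be sketched as a single function. To keep the example runnable on its own, an in-memory &lt;code&gt;Map&lt;/code&gt; stands in for Redis, so there is no TTL and no cross-server atomicity here. &lt;code&gt;handleIdempotent&lt;/code&gt; and &lt;code&gt;setIfAbsent&lt;/code&gt; are hypothetical names, and rejecting retries of a FAILED record is just one possible policy.&lt;/p&gt;

```javascript
// In-memory stand-in for Redis (an assumption for this sketch).
const store = new Map();

// Mirrors SET ... NX: writes only when the key does not exist yet.
function setIfAbsent(key, value) {
  if (store.has(key)) return false;
  store.set(key, value);
  return true;
}

// operation is the business logic; it must resolve to { statusCode, body }.
async function handleIdempotent(key, payloadHash, operation) {
  if (!key) {
    return { statusCode: 400, body: { error: "Idempotency-Key is required." } };
  }

  const recordKey = `idempotency:${key}`;
  const acquired = setIfAbsent(recordKey, { state: "STARTED", payloadHash });

  if (!acquired) {
    const record = store.get(recordKey);
    if (record.payloadHash !== payloadHash) {
      return {
        statusCode: 400,
        body: { error: "Idempotency key was reused with a different payload." }
      };
    }
    if (record.state === "COMPLETED") {
      // Replay the stored response without running the operation again.
      return { statusCode: record.statusCode, body: record.responseBody };
    }
    if (record.state === "FAILED") {
      // Policy assumption: a failed attempt is not retried under the same key.
      return { statusCode: 409, body: { error: "Previous attempt failed." } };
    }
    // STARTED: the first request is still running.
    return { statusCode: 409, body: { error: "Request is already in progress." } };
  }

  try {
    const result = await operation();
    store.set(recordKey, {
      state: "COMPLETED",
      payloadHash,
      statusCode: result.statusCode,
      responseBody: result.body
    });
    return result;
  } catch (err) {
    // Keep a FAILED marker so later retries hit the policy branch above.
    store.set(recordKey, { state: "FAILED", payloadHash });
    throw err;
  }
}
```

&lt;p&gt;In an Express-style handler, you would call this with the &lt;code&gt;Idempotency-Key&lt;/code&gt; header, the payload hash, and a function that performs the payment, then write the returned status code and body to the response.&lt;/p&gt;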




&lt;h2&gt;
  
  
  Practical Rules I Follow Now
&lt;/h2&gt;

&lt;p&gt;After dealing with this in real systems, these are the rules I try to follow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Any endpoint that creates money movement must support idempotency.&lt;/li&gt;
&lt;li&gt;Any endpoint that creates orders, invoices, or irreversible side effects should support idempotency.&lt;/li&gt;
&lt;li&gt;The key should represent one user intent, not one HTTP request.&lt;/li&gt;
&lt;li&gt;The first write for the idempotency key must be atomic.&lt;/li&gt;
&lt;li&gt;Store and compare a payload hash.&lt;/li&gt;
&lt;li&gt;Cache successful responses.&lt;/li&gt;
&lt;li&gt;Be very careful when caching failures.&lt;/li&gt;
&lt;li&gt;Use TTL intentionally.&lt;/li&gt;
&lt;li&gt;Do not rely only on frontend button disabling.&lt;/li&gt;
&lt;li&gt;Do not use rate limiting as a replacement for idempotency.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last point is important.&lt;/p&gt;

&lt;p&gt;Rate limiting protects infrastructure.&lt;/p&gt;

&lt;p&gt;Idempotency protects correctness.&lt;/p&gt;

&lt;p&gt;They are not the same thing.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Lesson
&lt;/h2&gt;

&lt;p&gt;The day we accidentally blocked users was not just a payment bug. It was a design lesson.&lt;/p&gt;

&lt;p&gt;Our system was technically trying to protect users, but because we did not model retries correctly, we created a worse experience for legitimate customers.&lt;/p&gt;

&lt;p&gt;Distributed systems are messy. Networks fail. Clients retry. Users double-click. Gateways time out. Workers reprocess jobs.&lt;/p&gt;

&lt;p&gt;A mature backend does not pretend these things will not happen.&lt;/p&gt;

&lt;p&gt;A mature backend absorbs them.&lt;/p&gt;

&lt;p&gt;Idempotency is one of those patterns that looks simple from the outside, but when you implement it properly, it changes the reliability of the whole system.&lt;/p&gt;

&lt;p&gt;For me, the biggest mindset shift was this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Do not fight duplicate requests. Make duplicate requests safe.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the difference between a backend that works in ideal conditions and a backend that survives production.&lt;/p&gt;

</description>
      <category>backend</category>
      <category>distributedsystems</category>
      <category>redis</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Go Modules in Practice: Init, Tidy, Vendor, and Publishing Packages</title>
      <dc:creator>amir</dc:creator>
      <pubDate>Thu, 28 May 2026 09:35:32 +0000</pubDate>
      <link>https://dev.to/amirsefati/go-modules-in-practice-init-tidy-vendor-and-publishing-packages-58lc</link>
      <guid>https://dev.to/amirsefati/go-modules-in-practice-init-tidy-vendor-and-publishing-packages-58lc</guid>
      <description>&lt;p&gt;As a backend engineer, I have worked on many services where the hard part was not only writing the code. The hard part was keeping the project clean, reproducible, easy to build, and safe to maintain as the team and codebase grew.&lt;/p&gt;

&lt;p&gt;In Go, a big part of that discipline comes from understanding &lt;strong&gt;Go Modules&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;At first, Go Modules may look simple: a &lt;code&gt;go.mod&lt;/code&gt; file, a &lt;code&gt;go.sum&lt;/code&gt; file, and a few commands like &lt;code&gt;go mod init&lt;/code&gt; and &lt;code&gt;go mod tidy&lt;/code&gt;. But in real projects, these small tools decide how your service builds in CI, how your dependencies are verified, how private repositories are handled, and how other developers can use your package.&lt;/p&gt;

&lt;p&gt;In this article, I want to explain Go Modules in a practical way, from the mindset of someone building production backend systems.&lt;/p&gt;

&lt;p&gt;We will cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What Go Modules are and why Go does not work like NPM or Pip&lt;/li&gt;
&lt;li&gt;How &lt;code&gt;go mod init&lt;/code&gt;, &lt;code&gt;go mod tidy&lt;/code&gt;, and &lt;code&gt;go mod vendor&lt;/code&gt; actually help&lt;/li&gt;
&lt;li&gt;How I think about Go project structure without over-engineering&lt;/li&gt;
&lt;li&gt;How to publish your own Go package&lt;/li&gt;
&lt;li&gt;A few production tips that matter in real teams&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What Are Go Modules?
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;Go module&lt;/strong&gt; is a versioned collection of Go packages.&lt;/p&gt;

&lt;p&gt;In simple words, it is the boundary of your project. It tells Go:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what your project is called&lt;/li&gt;
&lt;li&gt;which Go version it targets&lt;/li&gt;
&lt;li&gt;which dependencies it needs&lt;/li&gt;
&lt;li&gt;which versions of those dependencies should be used&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A Go module is defined by the &lt;code&gt;go.mod&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;Before Go Modules, Go projects were commonly managed inside &lt;code&gt;GOPATH&lt;/code&gt;. That worked, but it created friction around dependency versions and project location. Go Modules solved that by making dependency management explicit and project-based.&lt;/p&gt;

&lt;p&gt;Today, when I start a serious Go project, one of the first things I do is initialize a module.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Go Packages Feel Different From NPM or Pip
&lt;/h2&gt;

&lt;p&gt;If you come from JavaScript or Python, Go package management may feel a little strange at first.&lt;/p&gt;

&lt;p&gt;In Node.js, packages are usually published to &lt;strong&gt;NPM&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In Python, packages are usually published to &lt;strong&gt;PyPI&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Go is different.&lt;/p&gt;

&lt;p&gt;Go uses the module path as an import path, and that path usually matches the repository location (host and path, without the &lt;code&gt;https://&lt;/code&gt; scheme), for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"github.com/google/uuid"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means Go can resolve packages directly from places like GitHub, GitLab, or Bitbucket.&lt;/p&gt;

&lt;p&gt;That is why many Go packages look like URLs. The module path is not just a name. It is also the identity of the package.&lt;/p&gt;

&lt;p&gt;This design makes publishing very simple. In many cases, publishing a Go package is just:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;push your code to GitHub&lt;/li&gt;
&lt;li&gt;create a proper Git tag&lt;/li&gt;
&lt;li&gt;let Go tooling resolve it&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No separate package registry account is required for basic public modules.&lt;/p&gt;




&lt;h2&gt;
  
  
  Starting a Project With &lt;code&gt;go mod init&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Every Go module starts with this command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go mod init &amp;lt;module-path&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;my-awesome-project
&lt;span class="nb"&gt;cd &lt;/span&gt;my-awesome-project
go mod init github.com/yourusername/my-awesome-project
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a &lt;code&gt;go.mod&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;module&lt;/span&gt; &lt;span class="n"&gt;github&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;yourusername&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;my&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;awesome&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;

&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="m"&gt;1.22&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The module path matters.&lt;/p&gt;

&lt;p&gt;For local experiments, you can use something simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go mod init myapp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But for real packages, especially packages you may publish later, I prefer using the final repository path from the beginning:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go mod init github.com/yourusername/my-awesome-project
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That prevents painful import-path changes later.&lt;/p&gt;




&lt;h2&gt;
  
  
  Adding Third-Party Packages
&lt;/h2&gt;

&lt;p&gt;Let’s say I am building a small HTTP service and I want to use Gin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"net/http"&lt;/span&gt;

    &lt;span class="s"&gt;"github.com/gin-gonic/gin"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GET&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/ping"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusOK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"message"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"pong"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, Go sees an external import:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="s"&gt;"github.com/gin-gonic/gin"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the module needs to know about this dependency.&lt;/p&gt;

&lt;p&gt;You can run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go mod tidy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go get github.com/gin-gonic/gin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In daily work, I use &lt;code&gt;go mod tidy&lt;/code&gt; a lot because it does more than just add packages.&lt;/p&gt;




&lt;h2&gt;
  
  
  What &lt;code&gt;go mod tidy&lt;/code&gt; Really Does
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;go mod tidy&lt;/code&gt; is one of the commands I run before almost every commit in a Go project.&lt;/p&gt;

&lt;p&gt;It scans your code and tests, then updates &lt;code&gt;go.mod&lt;/code&gt; and &lt;code&gt;go.sum&lt;/code&gt; based on what is actually used.&lt;/p&gt;

&lt;p&gt;It does three important things:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Adds Missing Dependencies
&lt;/h3&gt;

&lt;p&gt;If your code imports a package but your module does not list it yet, &lt;code&gt;go mod tidy&lt;/code&gt; adds it.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go mod tidy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, your &lt;code&gt;go.mod&lt;/code&gt; may include something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;require&lt;/span&gt; &lt;span class="n"&gt;github&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;gonic&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;gin&lt;/span&gt; &lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="m"&gt;.10.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Removes Unused Dependencies
&lt;/h3&gt;

&lt;p&gt;If you removed a package from your code but it is still listed in &lt;code&gt;go.mod&lt;/code&gt;, &lt;code&gt;go mod tidy&lt;/code&gt; cleans it up.&lt;/p&gt;

&lt;p&gt;This is important because old dependencies are not harmless. They can increase build complexity, create security review noise, and confuse other engineers.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Updates &lt;code&gt;go.sum&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;go.sum&lt;/code&gt; stores cryptographic checksums for module versions.&lt;/p&gt;

&lt;p&gt;This helps Go verify that the dependency being downloaded is the same one expected by your project.&lt;/p&gt;

&lt;p&gt;In production, this matters. Reproducible builds are not a luxury. They are part of engineering discipline.&lt;/p&gt;

&lt;p&gt;My rule is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If I changed imports, removed packages, upgraded dependencies, or touched tests, I run &lt;code&gt;go mod tidy&lt;/code&gt; before committing.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Understanding &lt;code&gt;go.mod&lt;/code&gt; and &lt;code&gt;go.sum&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;A minimal &lt;code&gt;go.mod&lt;/code&gt; file looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;module&lt;/span&gt; &lt;span class="n"&gt;github&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;yourusername&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;my&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;awesome&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;

&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="m"&gt;1.22&lt;/span&gt;

&lt;span class="n"&gt;require&lt;/span&gt; &lt;span class="n"&gt;github&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;gonic&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;gin&lt;/span&gt; &lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="m"&gt;.10.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This file is human-readable and should be committed.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;go.sum&lt;/code&gt; is also committed. Some developers think &lt;code&gt;go.sum&lt;/code&gt; is like a cache file and should be ignored. That is a mistake.&lt;/p&gt;

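&lt;p&gt;&lt;code&gt;go.sum&lt;/code&gt; is part of dependency verification. Version selection comes from &lt;code&gt;go.mod&lt;/code&gt;; &lt;code&gt;go.sum&lt;/code&gt; lets Go confirm that the content downloaded for each of those versions is identical across machines, CI pipelines, and production builds.&lt;/p&gt;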

&lt;p&gt;So in real Go projects, I commit both:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;go.mod
go.sum
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Vendoring Dependencies With &lt;code&gt;go mod vendor&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;By default, Go downloads dependencies into the module cache on your machine.&lt;/p&gt;

&lt;p&gt;But sometimes you want dependencies copied directly into your project. For that, Go provides:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go mod vendor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a &lt;code&gt;vendor/&lt;/code&gt; directory and copies dependency source code into it.&lt;/p&gt;

&lt;p&gt;Your project may look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-awesome-project/
├── go.mod
├── go.sum
├── main.go
└── vendor/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  When Vendoring Makes Sense
&lt;/h3&gt;

&lt;p&gt;I do not vendor dependencies in every project, but it can be useful in specific cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CI/CD environments with restricted internet access&lt;/li&gt;
&lt;li&gt;companies with strict dependency review processes&lt;/li&gt;
&lt;li&gt;projects that must build in isolated networks&lt;/li&gt;
&lt;li&gt;teams that want dependency source code available inside the repository&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most normal public projects, vendoring is not required. Go’s module proxy and checksum database already solve many reliability problems for public dependencies.&lt;/p&gt;

&lt;p&gt;But for enterprise systems, especially where builds must be predictable and auditable, vendoring can still be a valid choice.&lt;/p&gt;

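&lt;p&gt;If you use vendoring, you can make that explicit when building. Since Go 1.14 the &lt;code&gt;go&lt;/code&gt; command already defaults to &lt;code&gt;-mod=vendor&lt;/code&gt; when a &lt;code&gt;vendor/&lt;/code&gt; directory exists and &lt;code&gt;go.mod&lt;/code&gt; declares Go 1.14 or later, but the flag documents the intent:&lt;br&gt;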
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go build &lt;span class="nt"&gt;-mod&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;vendor ./...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or test with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-mod&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;vendor ./...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Project Structure: Start Simple
&lt;/h2&gt;

&lt;p&gt;One of the mistakes I often see is copying project structures from other ecosystems.&lt;/p&gt;

&lt;p&gt;A developer comes from Laravel, Django, NestJS, or Spring Boot and immediately creates folders like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;controllers/
services/
repositories/
models/
helpers/
utils/
managers/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not always wrong, but in Go it often becomes unnecessary complexity too early.&lt;/p&gt;

&lt;p&gt;Go projects usually become cleaner when they start simple and grow based on real pressure.&lt;/p&gt;

&lt;p&gt;For a small service, this can be enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-project/
├── go.mod
├── go.sum
└── main.go
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That structure is not “too simple”. It is honest.&lt;/p&gt;

&lt;p&gt;You do not need architecture before you have a problem that needs architecture.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Practical Production Structure
&lt;/h2&gt;

&lt;p&gt;When the project grows, I usually move toward a structure like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-project/
├── cmd/
│   └── api/
│       └── main.go
├── internal/
│   ├── config/
│   │   └── config.go
│   ├── database/
│   │   └── postgres.go
│   ├── http/
│   │   └── router.go
│   └── user/
│       ├── handler.go
│       ├── service.go
│       └── repository.go
├── pkg/
│   └── logger/
│       └── logger.go
├── go.mod
└── go.sum
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is how I think about these folders:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;cmd/&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This contains application entry points.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cmd/api/main.go
cmd/worker/main.go
cmd/migrate/main.go
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each subdirectory represents a runnable program.&lt;/p&gt;
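&lt;p&gt;As a minimal sketch of what such an entry point might look like, the hypothetical &lt;code&gt;cmd/api/main.go&lt;/code&gt; below keeps &lt;code&gt;main&lt;/code&gt; thin and pushes startup logic into a &lt;code&gt;run&lt;/code&gt; function that returns an error. The &lt;code&gt;run&lt;/code&gt; name, the default address, and the printed message are assumptions for illustration, not something the layout requires.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// run contains the startup logic (hypothetical helper). Returning an error
// instead of calling os.Exit keeps it easy to test.
func run(args []string) error {
	addr := ":8080" // assumed default listen address
	if len(args) > 0 {
		addr = args[0]
	}
	if addr == "" {
		return errors.New("listen address must not be empty")
	}
	// A real service would wire config, database, and router here.
	fmt.Println("api would listen on", addr)
	return nil
}

func main() {
	if err := run(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, "api:", err)
		os.Exit(1)
	}
}
```

&lt;p&gt;With this shape, &lt;code&gt;cmd/worker&lt;/code&gt; and &lt;code&gt;cmd/migrate&lt;/code&gt; can follow the same pattern while sharing code from &lt;code&gt;internal/&lt;/code&gt;.&lt;/p&gt;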

&lt;h3&gt;
  
  
  &lt;code&gt;internal/&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is where most application code should live.&lt;/p&gt;

&lt;p&gt;The important thing about &lt;code&gt;internal/&lt;/code&gt; is that Go enforces its privacy. Code inside &lt;code&gt;internal/&lt;/code&gt; can only be imported by code rooted at the parent directory of &lt;code&gt;internal/&lt;/code&gt;; any other module that tries gets a build error.&lt;/p&gt;

&lt;p&gt;That is powerful.&lt;/p&gt;

&lt;p&gt;It means your business logic, database code, handlers, and internal services are protected from becoming accidental public APIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;pkg/&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;I use &lt;code&gt;pkg/&lt;/code&gt; only for code that is intentionally reusable by other projects.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a logger package&lt;/li&gt;
&lt;li&gt;a small validation package&lt;/li&gt;
&lt;li&gt;a reusable client&lt;/li&gt;
&lt;li&gt;shared utilities that are stable enough to expose&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I do not put everything into &lt;code&gt;pkg/&lt;/code&gt;. If the code is only for this application, it belongs in &lt;code&gt;internal/&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Publishing Your Own Go Package
&lt;/h2&gt;

&lt;p&gt;One thing I really like about Go is how easy it is to publish packages.&lt;/p&gt;

&lt;p&gt;Let’s say I wrote a small package for PostgreSQL migration helpers and I want other developers to use it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Create the Module
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;pgmigrate
&lt;span class="nb"&gt;cd &lt;/span&gt;pgmigrate
go mod init github.com/yourusername/pgmigrate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Write a Small Package
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;pgmigrate&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"database/sql"&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Migration&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Version&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;Name&lt;/span&gt;    &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Up&lt;/span&gt;      &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Down&lt;/span&gt;    &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;Apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;migration&lt;/span&gt; &lt;span class="n"&gt;Migration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;migration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Up&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Add Tests
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go &lt;span class="nb"&gt;test&lt;/span&gt; ./...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even for small packages, tests matter. A package without tests is harder to trust.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Push to GitHub
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git add &lt;span class="nb"&gt;.&lt;/span&gt;
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"initial release"&lt;/span&gt;
git push origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5: Tag a Version
&lt;/h3&gt;

&lt;p&gt;Go Modules use semantic versioning.&lt;/p&gt;

&lt;p&gt;Create a Git tag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git tag v0.1.0
git push origin v0.1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now other developers can use it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go get github.com/yourusername/pgmigrate@v0.1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the beauty of Go package publishing. Git tags are releases.&lt;/p&gt;




&lt;h2&gt;
  
  
  Semantic Versioning in Go
&lt;/h2&gt;

&lt;p&gt;Versioning matters in Go because the version tag is exactly what consumers pin in their &lt;code&gt;go.mod&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;A normal release tag looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;v1.2.3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The meaning is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MAJOR.MINOR.PATCH
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;PATCH&lt;/code&gt;: bug fixes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MINOR&lt;/code&gt;: new backward-compatible features&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MAJOR&lt;/code&gt;: breaking changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One Go-specific detail is important: if your module reaches &lt;code&gt;v2&lt;/code&gt; or higher, the module path must include the major version.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;module&lt;/span&gt; &lt;span class="n"&gt;github&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;yourusername&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;pgmigrate&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;v2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And users import it like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"github.com/yourusername/pgmigrate/v2"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This may feel strange in the beginning, but it makes breaking changes explicit.&lt;/p&gt;
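&lt;p&gt;To make the rule concrete, here is a small sketch that derives the import path from a release tag. The helper names &lt;code&gt;majorOf&lt;/code&gt; and &lt;code&gt;importPath&lt;/code&gt; are assumptions for illustration, and the parser only handles plain &lt;code&gt;vMAJOR.MINOR.PATCH&lt;/code&gt; tags, not pre-release or build suffixes.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorOf extracts MAJOR from a plain tag such as "v1.2.3".
func majorOf(tag string) (int, error) {
	rest, ok := strings.CutPrefix(tag, "v")
	if !ok {
		return 0, fmt.Errorf("tag %q must start with v", tag)
	}
	parts := strings.Split(rest, ".")
	if len(parts) != 3 {
		return 0, fmt.Errorf("tag %q is not vMAJOR.MINOR.PATCH", tag)
	}
	return strconv.Atoi(parts[0])
}

// importPath returns the module path users import for a given release tag:
// the base path for v0 and v1, and base plus "/vN" from v2 onward.
func importPath(base, tag string) (string, error) {
	major, err := majorOf(tag)
	if err != nil {
		return "", err
	}
	if major >= 2 {
		return fmt.Sprintf("%s/v%d", base, major), nil
	}
	return base, nil
}

func main() {
	for _, tag := range []string{"v0.1.0", "v1.2.3", "v2.0.0"} {
		p, err := importPath("github.com/yourusername/pgmigrate", tag)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(tag, "=>", p)
	}
}
```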




&lt;h2&gt;
  
  
  Working With Private Modules
&lt;/h2&gt;

&lt;p&gt;In real companies, not all Go modules are public.&lt;/p&gt;

&lt;p&gt;If your company has private repositories, you should configure &lt;code&gt;GOPRIVATE&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GOPRIVATE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;github.com/mycompany/&lt;span class="k"&gt;*&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells Go not to use the public proxy or checksum database for those module paths.&lt;/p&gt;
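&lt;p&gt;&lt;code&gt;GOPRIVATE&lt;/code&gt; is a comma-separated list of glob patterns that are matched against module path prefixes. The sketch below approximates that matching with the standard library so the behaviour of &lt;code&gt;github.com/mycompany/*&lt;/code&gt; is easier to picture. The &lt;code&gt;isPrivate&lt;/code&gt; function is a hypothetical helper written for this article, not the code the &lt;code&gt;go&lt;/code&gt; command actually runs.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isPrivate reports whether modulePath has a prefix matching any pattern in
// a GOPRIVATE-style comma-separated list (approximation for illustration).
func isPrivate(goprivate, modulePath string) bool {
	parts := strings.Split(modulePath, "/")
	for _, pattern := range strings.Split(goprivate, ",") {
		if pattern == "" {
			continue
		}
		// Compare the pattern against a prefix with the same number of
		// path elements, because "*" never matches a slash.
		n := strings.Count(pattern, "/") + 1
		if n > len(parts) {
			continue
		}
		prefix := strings.Join(parts[:n], "/")
		if ok, _ := path.Match(pattern, prefix); ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := "github.com/mycompany/*"
	fmt.Println(isPrivate(patterns, "github.com/mycompany/shared-lib"))
	fmt.Println(isPrivate(patterns, "github.com/gin-gonic/gin"))
}
```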

&lt;p&gt;For private GitHub modules, your machine or CI runner also needs Git authentication configured correctly, usually through SSH keys or tokens.&lt;/p&gt;

&lt;p&gt;In CI/CD, this is one of the first things I check when a build fails with private dependencies.&lt;/p&gt;

&lt;p&gt;A common symptom is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terminal prompts disabled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;repository not found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dependency may exist, but the CI runner does not have permission to access it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Local Multi-Module Development With &lt;code&gt;go work&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Sometimes I work on two Go modules at the same time.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;workspace/
├── api-service/
└── shared-lib/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before Go workspaces, developers often used &lt;code&gt;replace&lt;/code&gt; in &lt;code&gt;go.mod&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;replace&lt;/span&gt; &lt;span class="n"&gt;github&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;mycompany&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;shared&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;lib&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;../&lt;/span&gt;&lt;span class="n"&gt;shared&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;lib&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works, but it is easy to accidentally commit local paths.&lt;/p&gt;

&lt;p&gt;A better option for local development is &lt;code&gt;go work&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;workspace
go work init ./api-service ./shared-lib
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a &lt;code&gt;go.work&lt;/code&gt; file that connects local modules during development. It describes one developer’s local layout, so it is usually kept out of version control.&lt;/p&gt;

&lt;p&gt;I like this approach because it keeps local development clean without polluting the module definition.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Production Checklist for Go Modules
&lt;/h2&gt;

&lt;p&gt;Before I push Go code, I usually check these things:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go &lt;span class="nb"&gt;fmt&lt;/span&gt; ./...
go &lt;span class="nb"&gt;test&lt;/span&gt; ./...
go mod tidy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For larger services, I also check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go vet ./...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And if vendoring is used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go mod vendor
go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-mod&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;vendor ./...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This small routine prevents many annoying problems in CI.&lt;/p&gt;

&lt;p&gt;My practical checklist is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;commit both &lt;code&gt;go.mod&lt;/code&gt; and &lt;code&gt;go.sum&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;run &lt;code&gt;go mod tidy&lt;/code&gt; before pushing&lt;/li&gt;
&lt;li&gt;avoid unnecessary dependencies&lt;/li&gt;
&lt;li&gt;do not expose internal application code as public packages&lt;/li&gt;
&lt;li&gt;use &lt;code&gt;GOPRIVATE&lt;/code&gt; for private company modules&lt;/li&gt;
&lt;li&gt;tag releases properly when publishing packages&lt;/li&gt;
&lt;li&gt;be careful with &lt;code&gt;replace&lt;/code&gt; directives before committing&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Common Mistakes I See
&lt;/h2&gt;

&lt;p&gt;Here are some mistakes I have seen many times in real projects.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 1: Ignoring &lt;code&gt;go.sum&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Do not ignore &lt;code&gt;go.sum&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It is part of module verification and should be committed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 2: Creating Too Many Folders Too Early
&lt;/h3&gt;

&lt;p&gt;A complex folder structure does not automatically make a project clean.&lt;/p&gt;

&lt;p&gt;In Go, simple structure is usually better until the codebase proves it needs more separation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 3: Putting Everything in &lt;code&gt;pkg/&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;pkg/&lt;/code&gt; should not mean “all packages”.&lt;/p&gt;

&lt;p&gt;Use it for code you intentionally want other modules to import. Otherwise, prefer &lt;code&gt;internal/&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 4: Forgetting Private Module Configuration
&lt;/h3&gt;

&lt;p&gt;If your dependency is private, the build environment must know that.&lt;/p&gt;

&lt;p&gt;Use &lt;code&gt;GOPRIVATE&lt;/code&gt; and make sure Git authentication works in CI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 5: Committing Local &lt;code&gt;replace&lt;/code&gt; Paths
&lt;/h3&gt;

&lt;p&gt;This one is very common.&lt;/p&gt;

&lt;p&gt;A local &lt;code&gt;replace&lt;/code&gt; like this can break everyone else’s build:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;replace&lt;/span&gt; &lt;span class="n"&gt;github&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;mycompany&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;shared&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;lib&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;../&lt;/span&gt;&lt;span class="n"&gt;shared&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;lib&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;go work&lt;/code&gt; for local multi-module development when possible.&lt;/p&gt;
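&lt;p&gt;A small pre-commit check can catch this before it reaches the repository. The sketch below scans the text of a &lt;code&gt;go.mod&lt;/code&gt; file for &lt;code&gt;replace&lt;/code&gt; directives that point at a filesystem path. The &lt;code&gt;localReplaces&lt;/code&gt; and &lt;code&gt;isLocalPath&lt;/code&gt; helpers and the sample module content are assumptions for illustration; this is a simple line scanner that only recognises Unix-style paths, and a more robust tool would likely parse the file with &lt;code&gt;golang.org/x/mod/modfile&lt;/code&gt;.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// isLocalPath reports whether a replace target looks like a filesystem path.
func isLocalPath(target string) bool {
	for _, prefix := range []string{"./", "../", "/"} {
		if strings.HasPrefix(target, prefix) {
			return true
		}
	}
	return false
}

// localReplaces returns the replace directives in go.mod content whose
// target is a local path. It handles single-line and block forms.
func localReplaces(gomod string) []string {
	var found []string
	inBlock := false
	for _, line := range strings.Split(gomod, "\n") {
		line = strings.TrimSpace(line)
		switch {
		case line == "replace (":
			inBlock = true
			continue
		case inBlock:
			if line == ")" {
				inBlock = false
				continue
			}
		case strings.HasPrefix(line, "replace "):
			line = strings.TrimPrefix(line, "replace ")
		default:
			continue
		}
		_, target, ok := strings.Cut(line, "=>")
		if !ok {
			continue
		}
		if isLocalPath(strings.TrimSpace(target)) {
			found = append(found, line)
		}
	}
	return found
}

func main() {
	// Sample content for illustration only.
	gomod := `module github.com/mycompany/api-service

go 1.22

replace github.com/mycompany/shared-lib => ../shared-lib
`
	for _, r := range localReplaces(gomod) {
		fmt.Println("local replace:", r)
	}
}
```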




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Go Modules are not just a dependency tool. They are part of how Go projects stay clean, buildable, and maintainable.&lt;/p&gt;

&lt;p&gt;For me, the main lesson is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Keep the module simple, keep dependencies intentional, and let the project structure grow only when the project really needs it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Start with &lt;code&gt;go mod init&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Use &lt;code&gt;go mod tidy&lt;/code&gt; as a habit.&lt;/p&gt;

&lt;p&gt;Understand when &lt;code&gt;vendor/&lt;/code&gt; is useful.&lt;/p&gt;

&lt;p&gt;Use &lt;code&gt;internal/&lt;/code&gt; to protect your application code.&lt;/p&gt;

&lt;p&gt;Tag releases properly when you publish packages.&lt;/p&gt;

&lt;p&gt;These small habits make a big difference when your Go project moves from a local experiment to a production service used by real users.&lt;/p&gt;

&lt;p&gt;Happy coding.&lt;/p&gt;

</description>
      <category>go</category>
      <category>backend</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Mastering Context in Go: An Engineer’s Playbook for Lifecycle Management</title>
      <dc:creator>amir</dc:creator>
      <pubDate>Tue, 26 May 2026 18:39:04 +0000</pubDate>
      <link>https://dev.to/amirsefati/mastering-context-in-go-a-senior-engineers-playbook-for-lifecycle-management-147c</link>
      <guid>https://dev.to/amirsefati/mastering-context-in-go-a-senior-engineers-playbook-for-lifecycle-management-147c</guid>
      <description>&lt;p&gt;When you are building backend systems, distributed architectures, and microservices, one of the biggest challenges is not just making your code run fast.&lt;/p&gt;

&lt;p&gt;It is knowing exactly &lt;strong&gt;when to stop it&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Throughout my years working with backend systems and Go’s concurrency model, I have seen how ignoring a seemingly simple concept can slowly create serious production problems: goroutine leaks, wasted CPU cycles, exhausted memory, hanging requests, and services that become harder to reason about under load.&lt;/p&gt;

&lt;p&gt;In Go, the &lt;code&gt;context&lt;/code&gt; package is not just an annoying extra parameter we are forced to pass around.&lt;/p&gt;

&lt;p&gt;It is the lifecycle control system of a request.&lt;/p&gt;

&lt;p&gt;In this article, I want to skip the boring textbook definition and look at &lt;code&gt;context&lt;/code&gt; from a practical engineering perspective:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When does &lt;code&gt;context&lt;/code&gt; save your system, and when can it become an architectural trap?&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why Context Exists: Beyond a Simple Timeout
&lt;/h2&gt;

&lt;p&gt;In a distributed architecture, a single incoming HTTP request might trigger a cascade of operations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an authentication check&lt;/li&gt;
&lt;li&gt;a database query&lt;/li&gt;
&lt;li&gt;a Redis lookup&lt;/li&gt;
&lt;li&gt;a call to another internal service&lt;/li&gt;
&lt;li&gt;a gRPC request&lt;/li&gt;
&lt;li&gt;a message published to a queue&lt;/li&gt;
&lt;li&gt;a third-party API call&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now imagine the client closes the browser tab, the mobile app loses network connection, or the upstream service cancels the request.&lt;/p&gt;

&lt;p&gt;Should your backend continue doing all that work?&lt;/p&gt;

&lt;p&gt;Usually, no.&lt;/p&gt;

&lt;p&gt;Continuing heavy work for a response nobody is waiting for is just burning CPU, memory, database connections, and network bandwidth.&lt;/p&gt;

&lt;p&gt;This is where &lt;code&gt;context&lt;/code&gt; becomes important.&lt;/p&gt;

&lt;p&gt;We pass &lt;code&gt;context&lt;/code&gt; down the call stack to support &lt;strong&gt;cancellation propagation&lt;/strong&gt;. It allows the whole request tree to receive the same signal:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This work is no longer needed. Stop as soon as possible.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That one idea becomes extremely powerful in real-world backend systems.&lt;/p&gt;
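&lt;p&gt;Here is a minimal sketch of that propagation using only the standard library. A parent context stands in for the incoming request and a derived context stands in for a downstream database call; the &lt;code&gt;stopReason&lt;/code&gt; and &lt;code&gt;demo&lt;/code&gt; helpers and the one-minute timeout are assumptions for illustration. Cancelling the parent is enough to end the child, even though the child’s own deadline is far away.&lt;/p&gt;

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// stopReason reports why a context ended, or "still running" if it has not.
func stopReason(ctx context.Context) string {
	switch err := ctx.Err(); {
	case err == nil:
		return "still running"
	case errors.Is(err, context.Canceled):
		return "cancelled"
	case errors.Is(err, context.DeadlineExceeded):
		return "deadline exceeded"
	default:
		return err.Error()
	}
}

// demo derives a child context from a request context and returns the
// child's state before and after the request is cancelled.
func demo() (before, after string) {
	request, cancelRequest := context.WithCancel(context.Background())
	dbCall, cancelDB := context.WithTimeout(request, time.Minute)
	defer cancelDB()

	before = stopReason(dbCall)
	cancelRequest() // e.g. the client disconnected
	after = stopReason(dbCall)
	return before, after
}

func main() {
	before, after := demo()
	fmt.Println("before:", before)
	fmt.Println("after: ", after)
}
```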




&lt;h2&gt;
  
  
  The Mental Model: Context Is a Request Lifecycle Signal
&lt;/h2&gt;

&lt;p&gt;A good way to think about &lt;code&gt;context.Context&lt;/code&gt; is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Context is not business data. Context is a lifecycle signal.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It can tell your code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the request was cancelled&lt;/li&gt;
&lt;li&gt;the deadline expired&lt;/li&gt;
&lt;li&gt;the timeout was reached&lt;/li&gt;
&lt;li&gt;the caller does not need the result anymore&lt;/li&gt;
&lt;li&gt;some request-scoped metadata is available&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But it should not become a hidden container for everything your function needs.&lt;/p&gt;

&lt;p&gt;That distinction is where many Go codebases either stay clean or slowly become painful to maintain.&lt;/p&gt;




&lt;h2&gt;
  
  
  Battle-Tested Patterns: Coding Like a Senior
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Bulletproofing Against Goroutine Leaks
&lt;/h3&gt;

&lt;p&gt;One common mistake I have seen in Go services is starting a goroutine without thinking about what happens if the parent request is cancelled.&lt;/p&gt;

&lt;p&gt;Look at this simplified example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;FetchData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;errCh&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// Imagine this is a heavy I/O operation or a slow DB query.&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;expensiveDatabaseCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;errCh&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;

    &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="c"&gt;// If the client aborts or a timeout occurs,&lt;/span&gt;
        &lt;span class="c"&gt;// the caller can return immediately.&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;errCh&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;

    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ch&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important detail here is this line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The channel is buffered with capacity &lt;code&gt;1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Why does this matter?&lt;/p&gt;

&lt;p&gt;Because if &lt;code&gt;ctx.Done()&lt;/code&gt; is triggered and &lt;code&gt;FetchData&lt;/code&gt; returns early, the internal goroutine might still finish later and try to send the result into the channel.&lt;/p&gt;

&lt;p&gt;If the channel is unbuffered, that goroutine may block forever because nobody is receiving anymore.&lt;/p&gt;

&lt;p&gt;That is a classic goroutine leak.&lt;/p&gt;

&lt;p&gt;The buffer gives the goroutine enough room to send the result and exit, so the runtime can reclaim it instead of leaving it blocked forever.&lt;/p&gt;

&lt;p&gt;This is one of those small details that does not look important in a tutorial, but it matters a lot in production systems.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Prefer Context-Aware APIs
&lt;/h3&gt;

&lt;p&gt;The previous example is useful to understand the pattern, but in real code, you should prefer APIs that already accept &lt;code&gt;context.Context&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For example, database calls should usually look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;FindUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userID&lt;/span&gt; &lt;span class="kt"&gt;int64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QueryRowContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;`
        SELECT id, name, email
        FROM users
        WHERE id = $1
    `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Scan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Email&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is better than wrapping a blocking database call inside your own goroutine.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because &lt;code&gt;QueryRowContext&lt;/code&gt; gives the database driver a chance to stop the work when the context is cancelled or the deadline expires.&lt;/p&gt;

&lt;p&gt;That means cancellation is not just happening in your Go code. It can also propagate to the I/O layer.&lt;/p&gt;

&lt;p&gt;That is what you want.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Timeout Everything That Talks to the Outside World
&lt;/h2&gt;

&lt;p&gt;Any operation that depends on the outside world can hang longer than expected:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;database queries&lt;/li&gt;
&lt;li&gt;HTTP calls&lt;/li&gt;
&lt;li&gt;gRPC calls&lt;/li&gt;
&lt;li&gt;Redis commands&lt;/li&gt;
&lt;li&gt;file storage requests&lt;/li&gt;
&lt;li&gt;third-party APIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A senior mindset is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If it crosses a process or network boundary, it needs a timeout.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;CallPaymentService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewRequestWithContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MethodGet&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Do&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusCode&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="m"&gt;500&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"payment service returned status: %d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusCode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives the operation a clear budget.&lt;/p&gt;

&lt;p&gt;Without timeouts, a slow downstream dependency can slowly consume your worker pool, database pool, or goroutines until the whole service becomes unstable.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Always Call Cancel
&lt;/h2&gt;

&lt;p&gt;When you create a context with &lt;code&gt;WithCancel&lt;/code&gt;, &lt;code&gt;WithTimeout&lt;/code&gt;, or &lt;code&gt;WithDeadline&lt;/code&gt;, always call the returned &lt;code&gt;cancel&lt;/code&gt; function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parentCtx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



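&lt;p&gt;Even if the timeout will eventually expire, calling &lt;code&gt;cancel()&lt;/code&gt; releases the resources associated with that context, such as its timer and its registration with the parent context, as soon as the work is done instead of when the deadline fires.&lt;/p&gt;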

&lt;p&gt;This is especially important in hot paths where functions are called many times per second.&lt;/p&gt;

&lt;p&gt;A missing &lt;code&gt;cancel()&lt;/code&gt; may not break your service immediately, but it can create unnecessary resource pressure over time.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Dark Side of &lt;code&gt;context.WithValue&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;One of the features of &lt;code&gt;context&lt;/code&gt; is the ability to carry values across the request lifecycle.&lt;/p&gt;

&lt;p&gt;This is useful, but it is also one of the easiest ways to make a Go codebase messy.&lt;/p&gt;

&lt;p&gt;The golden rule is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Use &lt;code&gt;context.Value&lt;/code&gt; only for request-scoped data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Good examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;request ID&lt;/li&gt;
&lt;li&gt;trace ID&lt;/li&gt;
&lt;li&gt;user ID parsed from a JWT&lt;/li&gt;
&lt;li&gt;tenant ID&lt;/li&gt;
&lt;li&gt;client IP&lt;/li&gt;
&lt;li&gt;correlation ID&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bad examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;database connections&lt;/li&gt;
&lt;li&gt;loggers as hidden dependencies&lt;/li&gt;
&lt;li&gt;configuration objects&lt;/li&gt;
&lt;li&gt;service clients&lt;/li&gt;
&lt;li&gt;repositories&lt;/li&gt;
&lt;li&gt;feature flag clients&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do not use &lt;code&gt;context&lt;/code&gt; as a dependency injection container.&lt;/p&gt;

&lt;p&gt;When dependencies are hidden inside context, your function signature lies. The function looks simple, but it secretly depends on multiple things. That hurts readability, testing, and type safety.&lt;/p&gt;




&lt;h2&gt;
  
  
  Safe Context Values with Custom Key Types
&lt;/h2&gt;

&lt;p&gt;Another mistake is using raw strings as context keys.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"userID"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



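&lt;p&gt;This can cause collisions between packages: if two packages both use the string &lt;code&gt;"userID"&lt;/code&gt; as a key, each one can silently overwrite or read the other's value.&lt;/p&gt;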

&lt;p&gt;A safer pattern is to define an unexported custom type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;contextKey&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;

&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;userIDKey&lt;/span&gt; &lt;span class="n"&gt;contextKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"userID"&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;WithUserID&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userIDKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;GetUserID&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;userID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userIDKey&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;userID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



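&lt;p&gt;This prevents accidental key collisions with other packages. Context keys are compared by type as well as by value, so &lt;code&gt;contextKey("userID")&lt;/code&gt; never equals the plain string &lt;code&gt;"userID"&lt;/code&gt;, and code outside the package cannot name the unexported type.&lt;/p&gt;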

&lt;p&gt;For larger systems, I usually prefer wrapping this inside a small internal package so the rest of the codebase does not directly touch raw context keys.&lt;/p&gt;




&lt;h2&gt;
  
  
  Context in HTTP Handlers
&lt;/h2&gt;

&lt;p&gt;In HTTP services, the incoming request already has a context.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;GetUserHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HandlerFunc&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;FindUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;123&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusInternalServerError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important part is this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the client disconnects, the server cancels the request context, and downstream operations that respect context can stop early.&lt;/p&gt;

&lt;p&gt;That is why passing &lt;code&gt;context.Background()&lt;/code&gt; inside handlers is usually wrong.&lt;/p&gt;

&lt;p&gt;This is bad:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;FindUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;123&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because you just detached the database query from the request lifecycle.&lt;/p&gt;

&lt;p&gt;The client may be gone, but your backend keeps working.&lt;/p&gt;




&lt;h2&gt;
  
  
  Background Jobs Are Different
&lt;/h2&gt;

&lt;p&gt;Not everything should use the request context.&lt;/p&gt;

&lt;p&gt;Sometimes you intentionally want work to continue after the request ends.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;writing audit logs&lt;/li&gt;
&lt;li&gt;publishing analytics events&lt;/li&gt;
&lt;li&gt;sending async notifications&lt;/li&gt;
&lt;li&gt;queueing background jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In those cases, using the request context may be wrong because the work would be cancelled as soon as the handler returns or the client disconnects.&lt;/p&gt;

&lt;p&gt;But this does not mean you should ignore context completely.&lt;/p&gt;

&lt;p&gt;It means you should create a new, intentional context with its own timeout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;PublishAuditEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="n"&gt;EventProducer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="n"&gt;AuditEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key idea is intentionality.&lt;/p&gt;

&lt;p&gt;Do not accidentally detach work from the request.&lt;br&gt;
Do not accidentally bind background work to a request that may disappear.&lt;/p&gt;

&lt;p&gt;Choose the lifecycle explicitly.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Trade-Offs: A Realistic View
&lt;/h2&gt;

&lt;p&gt;No tool in software engineering is perfect. &lt;code&gt;context&lt;/code&gt; solves real problems, but it also brings trade-offs.&lt;/p&gt;
&lt;h3&gt;
  
  
  Pros
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Standardization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;context&lt;/code&gt; is the common language of lifecycle management in Go. It is used across packages like &lt;code&gt;net/http&lt;/code&gt;, &lt;code&gt;database/sql&lt;/code&gt;, gRPC, cloud SDKs, and many third-party libraries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lifecycle control&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;context.WithTimeout&lt;/code&gt;, &lt;code&gt;context.WithDeadline&lt;/code&gt;, and &lt;code&gt;context.WithCancel&lt;/code&gt;, you can build services that do not hang forever.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cancellation propagation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One cancellation signal can flow through multiple layers of your system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability support&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Request IDs, trace IDs, and correlation IDs become much easier to propagate across service boundaries.&lt;/p&gt;
&lt;h3&gt;
  
  
  Cons
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Viral nature&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once you add &lt;code&gt;context&lt;/code&gt; to a low-level function, you often need to pass it through every function above it.&lt;/p&gt;

&lt;p&gt;This can feel noisy, but in backend systems, that noise is usually worth the explicit lifecycle control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Weak type safety for values&lt;/strong&gt;&lt;/p&gt;

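&lt;p&gt;&lt;code&gt;ctx.Value()&lt;/code&gt; returns &lt;code&gt;any&lt;/code&gt;, so every read needs a type assertion. A single-value assertion on the wrong type panics, and a missing key simply returns &lt;code&gt;nil&lt;/code&gt; with no compile-time warning.&lt;/p&gt;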

&lt;p&gt;That is why context values should be used carefully and kept small.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Easy to misuse&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The biggest danger is treating context as a magic bag for dependencies.&lt;/p&gt;

&lt;p&gt;That creates hidden coupling and makes the code harder to understand.&lt;/p&gt;


&lt;h2&gt;
  
  
  My Code Review Checklist for Context
&lt;/h2&gt;

&lt;p&gt;When I review Go code, these are my non-negotiable rules around &lt;code&gt;context&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Context Must Be the First Argument
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;DoSomething&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This is the standard Go convention.&lt;/p&gt;

&lt;p&gt;Do not hide it in the middle of the argument list.&lt;/p&gt;


&lt;h3&gt;
  
  
  2. Do Not Store Context in a Struct
&lt;/h3&gt;

&lt;p&gt;Avoid this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Service&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Contexts should flow through function calls. They should not usually live inside structs.&lt;/p&gt;

&lt;p&gt;A struct normally represents a longer-lived object. A context usually represents a specific operation or request lifecycle.&lt;/p&gt;

&lt;p&gt;Mixing those lifetimes creates confusion.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Never Pass Nil Context
&lt;/h3&gt;

&lt;p&gt;This is bad:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;DoSomething&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"123"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are not sure what context to use yet, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TODO&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are starting a root-level process, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;TODO()&lt;/code&gt; is also useful because it makes unfinished context decisions searchable during refactoring.&lt;/p&gt;




&lt;h3&gt;
  
  
  4. Always Defer Cancel Immediately
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parentCtx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do this immediately after creating the context.&lt;/p&gt;

&lt;p&gt;It prevents forgetting it later when the function grows.&lt;/p&gt;




&lt;h3&gt;
  
  
  5. Do Not Ignore &lt;code&gt;ctx.Err()&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;When a context is cancelled, &lt;code&gt;ctx.Err()&lt;/code&gt; tells you why; until then it returns &lt;code&gt;nil&lt;/code&gt;. A non-blocking check looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The error is usually one of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;context.Canceled&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;context.DeadlineExceeded&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This distinction is useful for logging, metrics, and debugging.&lt;/p&gt;

&lt;p&gt;A timeout means the operation exceeded its budget.&lt;br&gt;
A cancellation may simply mean the client disconnected or the caller stopped needing the result.&lt;/p&gt;

&lt;p&gt;Those are not always the same kind of failure.&lt;/p&gt;


&lt;h2&gt;
  
  
  A Practical Rule I Use
&lt;/h2&gt;

&lt;p&gt;Here is the simple rule I use in production systems:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Pass context for cancellation, deadlines, and request-scoped metadata. Do not pass it for business data or dependencies.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That one sentence prevents many bad patterns.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Good&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;CreateOrder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;

&lt;span class="c"&gt;// Suspicious&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;CreateOrder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the second version, I immediately wonder:&lt;/p&gt;

&lt;p&gt;Where is the order coming from?&lt;br&gt;
Is it hidden inside context?&lt;br&gt;
Is this function depending on invisible data?&lt;/p&gt;

&lt;p&gt;That is usually a design smell.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Mastering &lt;code&gt;context&lt;/code&gt; means mastering your application’s control flow.&lt;/p&gt;

&lt;p&gt;In real backend systems, where servers handle thousands of concurrent requests and distributed services introduce unpredictable latency, using &lt;code&gt;context&lt;/code&gt; correctly can be the difference between a resilient service and a 3 AM pager alert for an out-of-memory crash.&lt;/p&gt;

&lt;p&gt;For me, &lt;code&gt;context&lt;/code&gt; is not just a Go package.&lt;/p&gt;

&lt;p&gt;It is a discipline.&lt;/p&gt;

&lt;p&gt;It forces you to think clearly about lifecycle, ownership, cancellation, timeouts, and the cost of work that no one needs anymore.&lt;/p&gt;

&lt;p&gt;Used correctly, it makes your systems more predictable.&lt;/p&gt;

&lt;p&gt;Used carelessly, it hides dependencies and creates architectural confusion.&lt;/p&gt;

&lt;p&gt;That is why I treat &lt;code&gt;context&lt;/code&gt; as one of the most important tools in production Go programming.&lt;/p&gt;

&lt;p&gt;If you have worked with Go in production, I would love to hear your own experience with context, cancellation, and goroutine leaks.&lt;/p&gt;

</description>
      <category>go</category>
      <category>backend</category>
      <category>microservices</category>
    </item>
    <item>
      <title>Memory Under the Hood: Why Go Often Feels Faster Than Python</title>
      <dc:creator>amir</dc:creator>
      <pubDate>Mon, 25 May 2026 12:34:33 +0000</pubDate>
      <link>https://dev.to/amirsefati/memory-under-the-hood-why-go-often-feels-faster-than-python-3pb3</link>
      <guid>https://dev.to/amirsefati/memory-under-the-hood-why-go-often-feels-faster-than-python-3pb3</guid>
      <description>&lt;p&gt;After years of building backend systems, working with data pipelines, debugging production issues, and watching servers behave differently under load, I learned one lesson the hard way:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Performance problems rarely start from syntax.&lt;br&gt;&lt;br&gt;
They usually start from how we think about memory.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Earlier in my career, I compared languages mostly by developer experience. Python felt clean, expressive, and fast to write. Go felt simple, strict, and very practical for backend services. But once I started dealing with large files, high-throughput APIs, queues, workers, containers, and memory pressure in production, I realized that the real difference is not just language design.&lt;/p&gt;

&lt;p&gt;The real difference is what happens under the hood.&lt;/p&gt;

&lt;p&gt;Why can a small Python script suddenly consume a huge amount of RAM?&lt;/p&gt;

&lt;p&gt;Why can the same type of data processing in Go run with much lower memory usage?&lt;/p&gt;

&lt;p&gt;Why does iterating over a slice of structs in Go usually feel much cheaper than iterating over a list of Python objects?&lt;/p&gt;

&lt;p&gt;And why does reading a 10GB file the wrong way hurt in both languages, even if one of them is “faster”?&lt;/p&gt;

&lt;p&gt;This article is my practical breakdown of memory layout, allocation, garbage collection, cache locality, and large-file processing in Go and Python. I am not writing this as a language war. I use both languages. Python is still one of my favorite tools for automation, scripting, data work, and fast iteration.&lt;/p&gt;

&lt;p&gt;But when you are building backend systems where memory, latency, and throughput matter, understanding these details can change how you write code.&lt;/p&gt;




&lt;h2&gt;
  
  
  The First Misunderstanding: A Dynamic Array Is Not a Linked List
&lt;/h2&gt;

&lt;p&gt;One mistake I have seen many developers make is assuming that dynamic collections are somehow linked lists internally.&lt;/p&gt;

&lt;p&gt;They are not.&lt;/p&gt;

&lt;p&gt;A Python &lt;code&gt;list&lt;/code&gt; is not a linked list.&lt;/p&gt;

&lt;p&gt;A Go &lt;code&gt;slice&lt;/code&gt; is not a linked list.&lt;/p&gt;

&lt;p&gt;Both are built around the idea of a dynamic array, although their internal models are very different.&lt;/p&gt;

&lt;p&gt;In Python, a list is basically a resizable array of references. The list itself stores pointers to objects. Those objects live somewhere else in memory.&lt;/p&gt;
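&lt;p&gt;One way to see the "resizable array" behaviour is to watch the list's own size while appending. This is a minimal sketch that assumes CPython, where &lt;code&gt;sys.getsizeof&lt;/code&gt; on a list reflects the currently allocated pointer array. The exact growth steps are an implementation detail, so the code only counts how often the size changes.&lt;/p&gt;

```python
import sys

items = []
sizes = [sys.getsizeof(items)]  # size of the list container itself, not its elements

for i in range(64):
    items.append(i)
    size = sys.getsizeof(items)
    if size != sizes[-1]:
        # The pointer array was reallocated with spare room for future appends.
        sizes.append(size)

resizes = len(sizes) - 1
print(f"64 appends caused {resizes} reallocations")
```

&lt;p&gt;The container grows in steps rather than on every append, which is what you would expect from a dynamic array and not from a linked list.&lt;/p&gt;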

&lt;p&gt;In Go, a slice is a small header that points to an underlying array. That header contains three important pieces of information:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
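&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Mirrors the layout of reflect.SliceHeader (now deprecated); illustration only.&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;SliceHeader&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;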
    &lt;span class="n"&gt;Data&lt;/span&gt; &lt;span class="kt"&gt;uintptr&lt;/span&gt;
    &lt;span class="n"&gt;Len&lt;/span&gt;  &lt;span class="kt"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;Cap&lt;/span&gt;  &lt;span class="kt"&gt;int&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Conceptually, a Go slice contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a pointer to the underlying array&lt;/li&gt;
&lt;li&gt;the current length&lt;/li&gt;
&lt;li&gt;the current capacity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you really need a linked list in Go, there is &lt;code&gt;container/list&lt;/code&gt;, or you can implement your own node-based structure. But for most backend workloads, linked lists are not as useful as people think. They often hurt cache locality and add pointer-chasing overhead.&lt;/p&gt;

&lt;p&gt;That is an important point: Big-O complexity is not the whole story.&lt;/p&gt;

&lt;p&gt;In real production systems, CPU cache behavior matters a lot.&lt;/p&gt;




&lt;h2&gt;
  
  
  Python Lists: Simple API, Expensive Objects
&lt;/h2&gt;

&lt;p&gt;Python gives us a very clean programming model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;numbers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;numbers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It looks like a list of integers.&lt;/p&gt;

&lt;p&gt;But internally, it is not a compact array of raw integers like you might expect in C or Go.&lt;/p&gt;

&lt;p&gt;A Python list stores references to Python objects. Each integer is a full Python object with metadata, reference count information, type information, and value storage.&lt;/p&gt;

&lt;p&gt;So when you write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;numbers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should not imagine this as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1][2][3]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is closer to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;list
 ├── pointer ──&amp;gt; PyObject(1)
 ├── pointer ──&amp;gt; PyObject(2)
 └── pointer ──&amp;gt; PyObject(3)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This design gives Python a lot of flexibility. A single list can contain integers, strings, dictionaries, custom classes, and even functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;active&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That flexibility is powerful, but it has a cost.&lt;/p&gt;

&lt;p&gt;Every item access can involve pointer dereferencing. The CPU may need to jump to different memory locations. This causes more cache misses, and cache misses are expensive.&lt;/p&gt;

&lt;p&gt;Modern CPUs are extremely fast when they can read predictable, contiguous memory. They are much slower when they constantly chase pointers across the heap.&lt;/p&gt;

&lt;p&gt;This is one reason why pure Python loops over large collections can be slow compared to Go, C, Rust, or even NumPy-based Python code.&lt;/p&gt;

&lt;p&gt;NumPy is fast not because Python magically became faster, but because NumPy stores data in compact native arrays and runs optimized native code.&lt;/p&gt;
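&lt;p&gt;The same contrast can be measured with the standard library alone. The sketch below assumes CPython and uses the &lt;code&gt;array&lt;/code&gt; module as a stand-in for a compact native array. The numbers are approximate: &lt;code&gt;sys.getsizeof&lt;/code&gt; does not follow references, so the per-object sizes are summed by hand, and small cached integers are counted as if each were separate.&lt;/p&gt;

```python
import sys
from array import array

count = 100_000

as_list = list(range(count))          # array of pointers to separate int objects
as_array = array("q", range(count))   # "q" = signed 64-bit integers stored inline

# List cost: the pointer array plus every int object it points to.
list_total = sys.getsizeof(as_list) + sum(sys.getsizeof(n) for n in as_list)
# Array cost: one buffer that already contains the raw values.
array_total = sys.getsizeof(as_array)

print(f"list : {list_total / count:.1f} bytes per element")
print(f"array: {array_total / count:.1f} bytes per element")
```

&lt;p&gt;On a typical 64-bit CPython build the list version costs several times more memory per element, and that extra memory is also scattered across the heap.&lt;/p&gt;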




&lt;h2&gt;
  
  
  Go Slices: Contiguous Memory and Better Cache Locality
&lt;/h2&gt;

&lt;p&gt;Now compare that with Go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;numbers&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;numbers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A slice of integers in Go points to an underlying array of integers stored contiguously in memory.&lt;/p&gt;

&lt;p&gt;Conceptually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1][2][3][4][5]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is much friendlier for the CPU.&lt;/p&gt;

&lt;p&gt;The CPU can load a cache line and read multiple nearby values efficiently. This is called cache locality, and it is one of those low-level concepts that directly affects high-level backend performance.&lt;/p&gt;

&lt;p&gt;This becomes even more interesting with structs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ID&lt;/span&gt;     &lt;span class="kt"&gt;int64&lt;/span&gt;
    &lt;span class="n"&gt;Active&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;
    &lt;span class="n"&gt;Score&lt;/span&gt;  &lt;span class="kt"&gt;float64&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Active&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Score&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;91.2&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Active&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Score&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;72.5&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, the actual &lt;code&gt;User&lt;/code&gt; values are stored in the underlying array.&lt;/p&gt;

&lt;p&gt;But if you write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Active&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Score&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;91.2&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Active&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Score&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;72.5&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you have a slice of pointers. This can be useful when you need shared mutable objects or want to avoid copying large structs, but it also means more pointer chasing.&lt;/p&gt;

&lt;p&gt;This is why I try to be intentional with this decision.&lt;/p&gt;

&lt;p&gt;A slice of values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;is not the same performance model as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both are valid, but they lay out memory differently and cost differently to iterate.&lt;/p&gt;

&lt;p&gt;In production systems, these small decisions start to matter when you process hundreds of thousands or millions of records.&lt;/p&gt;




&lt;h2&gt;
  
  
  Value Semantics vs Reference Semantics
&lt;/h2&gt;

&lt;p&gt;Another major difference between Go and Python is how they treat values.&lt;/p&gt;

&lt;p&gt;Python is reference-oriented. Variables are names bound to objects.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# [1, 2, 3, 4]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; refer to the same list object.&lt;/p&gt;

&lt;p&gt;Go has stronger value semantics by default.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;

&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;99&lt;/span&gt;

&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// [1 2 3]&lt;/span&gt;
&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// [99 2 3]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The array is copied.&lt;/p&gt;

&lt;p&gt;But slices are different:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;

&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;99&lt;/span&gt;

&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// [99 2 3]&lt;/span&gt;
&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// [99 2 3]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because the slice header is copied, but both slice headers still point to the same underlying array.&lt;/p&gt;

&lt;p&gt;This is one of the most important things every Go developer should deeply understand.&lt;/p&gt;

&lt;p&gt;A slice is not the array itself. It is a descriptor over an array.&lt;/p&gt;

&lt;p&gt;That is why bugs can happen when you pass slices around and mutate them without thinking about who else shares the same underlying array.&lt;/p&gt;




&lt;h2&gt;
  
  
  Allocation: The Hidden Cost Behind Simple Code
&lt;/h2&gt;

&lt;p&gt;Allocation is one of the biggest silent performance costs in backend systems.&lt;/p&gt;

&lt;p&gt;When code allocates too much memory, the garbage collector has more work to do. More GC work means more CPU overhead and sometimes more latency.&lt;/p&gt;

&lt;p&gt;In Go, when a slice grows beyond its capacity, Go allocates a new underlying array and copies the existing elements into it.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="n"&gt;_000_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



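&lt;p&gt;This works, but every time the slice outgrows its capacity, Go allocates a larger array and copies the existing elements, and that happens many times on the way to a million items.&lt;/p&gt;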

&lt;p&gt;A better version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="n"&gt;_000_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="n"&gt;_000_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now Go knows the expected capacity from the beginning.&lt;/p&gt;

&lt;p&gt;This does not mean you should always preallocate everything. But when you know the approximate size, preallocation is one of the simplest performance wins.&lt;/p&gt;

&lt;p&gt;The same idea exists in many systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;preallocate buffers&lt;/li&gt;
&lt;li&gt;reuse memory when safe&lt;/li&gt;
&lt;li&gt;avoid unnecessary temporary objects&lt;/li&gt;
&lt;li&gt;avoid converting &lt;code&gt;[]byte&lt;/code&gt; to &lt;code&gt;string&lt;/code&gt; too early&lt;/li&gt;
&lt;li&gt;avoid building huge in-memory arrays when streaming is enough&lt;/li&gt;
&lt;/ul&gt;
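&lt;p&gt;The last point applies to Python as much as to Go. As a rough illustration, the sketch below uses &lt;code&gt;tracemalloc&lt;/code&gt; to compare peak memory when a result is materialized into a list first versus streamed through a generator. The helper &lt;code&gt;peak_bytes&lt;/code&gt; and the workload are hypothetical names chosen for this example, and the absolute numbers will vary by interpreter version.&lt;/p&gt;

```python
import tracemalloc

N = 200_000


def peak_bytes(work):
    """Run work() and return the peak memory traced while it ran."""
    tracemalloc.start()
    work()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


def materialized():
    # Builds the full list of squares before summing it.
    return sum([i * i for i in range(N)])


def streamed():
    # Produces one square at a time; nothing accumulates.
    return sum(i * i for i in range(N))


list_peak = peak_bytes(materialized)
stream_peak = peak_bytes(streamed)

print(f"materialized peak: {list_peak:,} bytes")
print(f"streamed peak:     {stream_peak:,} bytes")
```

&lt;p&gt;Both functions return the same total. Only the first one forces the runtime to allocate, and later clean up, a temporary list that nobody needed.&lt;/p&gt;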

&lt;p&gt;Performance engineering is often not about writing complicated code.&lt;/p&gt;

&lt;p&gt;It is about not forcing the runtime to clean up avoidable garbage.&lt;/p&gt;




&lt;h2&gt;
  
  
  Python Allocation and Object Overhead
&lt;/h2&gt;

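&lt;p&gt;Python has its own memory manager. For small objects, CPython uses a dedicated small-object allocator (pymalloc) that serves requests from pre-allocated pools, which makes object creation cheaper than going to the system allocator every time.&lt;/p&gt;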

&lt;p&gt;But Python still pays the cost of object-heavy design.&lt;/p&gt;

&lt;p&gt;A Python integer is not just a raw machine integer. A Python string is an object. A Python dict is very flexible, but it is not cheap. A Python class instance has overhead too, unless you optimize with tools like &lt;code&gt;__slots__&lt;/code&gt; or use more compact structures.&lt;/p&gt;

&lt;p&gt;This is why Python can be very fast when the heavy work is pushed into optimized native libraries, but slower when the workload is pure Python object processing.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;huge_list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;huge_list&lt;/code&gt; is a normal Python list of Python integers, every iteration involves Python-level object handling.&lt;/p&gt;

&lt;p&gt;But with NumPy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="n"&gt;arr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;huge_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The expensive loop is moved into optimized native code.&lt;/p&gt;

&lt;p&gt;This is not just a library trick. It is a memory-layout trick.&lt;/p&gt;

&lt;p&gt;Compact memory layout changes everything.&lt;/p&gt;
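&lt;p&gt;The same idea applies to the class-instance overhead mentioned earlier. This is a minimal sketch, assuming CPython, of what &lt;code&gt;__slots__&lt;/code&gt; changes. The classes &lt;code&gt;PlainUser&lt;/code&gt; and &lt;code&gt;SlottedUser&lt;/code&gt; are hypothetical, and the sizes are shallow: they count the instance and its attribute dictionary, not the attribute values.&lt;/p&gt;

```python
import sys


class PlainUser:
    # Regular class: attributes live in a per-instance __dict__.
    def __init__(self, user_id, active, score):
        self.user_id = user_id
        self.active = active
        self.score = score


class SlottedUser:
    # __slots__ reserves fixed attribute storage and removes the per-instance __dict__.
    __slots__ = ("user_id", "active", "score")

    def __init__(self, user_id, active, score):
        self.user_id = user_id
        self.active = active
        self.score = score


plain = PlainUser(1, True, 91.2)
slotted = SlottedUser(1, True, 91.2)

plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_size = sys.getsizeof(slotted)

print(f"plain instance + dict: {plain_size} bytes")
print(f"slotted instance:      {slotted_size} bytes")
```

&lt;p&gt;The saving per object is small, but it adds up when a service keeps millions of such objects alive. The trade-off is that slotted instances cannot gain new attributes at runtime.&lt;/p&gt;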




&lt;h2&gt;
  
  
  Garbage Collection: Python vs Go
&lt;/h2&gt;

&lt;p&gt;Garbage collection is another area where the difference matters.&lt;/p&gt;

&lt;p&gt;Python, specifically CPython, mainly uses reference counting. Every object tracks how many references point to it. When the reference count reaches zero, the object can be freed immediately.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;

&lt;span class="k"&gt;del&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
&lt;span class="k"&gt;del&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After both references are gone, the list can be cleaned up.&lt;/p&gt;

&lt;p&gt;But reference counting alone cannot handle reference cycles.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now &lt;code&gt;a&lt;/code&gt; references &lt;code&gt;b&lt;/code&gt;, and &lt;code&gt;b&lt;/code&gt; references &lt;code&gt;a&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Even if nothing else references them, their reference counts never reach zero on their own, because each object still holds a reference to the other. That is why Python also has a cyclic garbage collector.&lt;/p&gt;
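&lt;p&gt;This can be observed directly. The sketch below assumes CPython and swaps the dictionaries for a small hypothetical &lt;code&gt;Node&lt;/code&gt; class, because plain dicts cannot be targets of weak references. Automatic collection is disabled for the duration of the demo so that only the explicit &lt;code&gt;gc.collect()&lt;/code&gt; call can break the cycle.&lt;/p&gt;

```python
import gc
import weakref


class Node:
    # Hypothetical helper type; it only exists to hold a reference to another Node.
    def __init__(self):
        self.other = None


gc.disable()  # stop the cyclic collector from running on its own during the demo

a = Node()
b = Node()
a.other = b
b.other = a

probe = weakref.ref(a)  # lets us check whether the object is still alive

del a
del b
# Reference counting alone cannot free the pair: each still references the other.
alive_after_del = probe() is not None

gc.collect()  # the cyclic collector finds the unreachable cycle and frees it
alive_after_collect = probe() is not None

gc.enable()

print("alive after del:    ", alive_after_del)
print("alive after collect:", alive_after_collect)
```

&lt;p&gt;In a long-running service this means cyclic structures can stay in memory until the next collection pass, even though nothing can reach them anymore.&lt;/p&gt;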

&lt;p&gt;Go uses a concurrent mark-and-sweep garbage collector.&lt;/p&gt;

&lt;p&gt;At a high level, Go's GC finds reachable objects, marks them as live, and then frees unreachable memory. It is designed to keep pause times low and run concurrently with the application as much as possible.&lt;/p&gt;

&lt;p&gt;This is one of the reasons Go works well for backend services. You can build long-running APIs, workers, and network services with predictable latency when you write allocation-conscious code.&lt;/p&gt;

&lt;p&gt;But Go's GC is not magic.&lt;/p&gt;

&lt;p&gt;If your code allocates too much, creates too many temporary objects, or keeps references alive longer than needed, the GC still has to work harder.&lt;/p&gt;

&lt;p&gt;In Go, good performance often comes from writing boring code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;buf&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="m"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reuse buffers when appropriate.&lt;/p&gt;

&lt;p&gt;Avoid unnecessary conversions.&lt;/p&gt;

&lt;p&gt;Keep data structures simple.&lt;/p&gt;

&lt;p&gt;Do not create object graphs when arrays or slices are enough.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 10GB File Problem
&lt;/h2&gt;

&lt;p&gt;One of the most common backend mistakes is reading an entire large file into memory at once.&lt;/p&gt;

&lt;p&gt;In Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;large.log&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"large.log"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This may work for small files.&lt;/p&gt;

&lt;p&gt;It may work for 100MB.&lt;/p&gt;

&lt;p&gt;It may even work in development.&lt;/p&gt;

&lt;p&gt;Then production gets a 10GB file, and the process gets killed by the operating system.&lt;/p&gt;

&lt;p&gt;The problem is not Python or Go.&lt;/p&gt;

&lt;p&gt;The problem is the strategy.&lt;/p&gt;

&lt;p&gt;If the file is large, you should usually stream it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Streaming Files in Python
&lt;/h2&gt;

&lt;p&gt;Python gives you a very clean way to process files line by line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;large.log&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



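&lt;p&gt;This does not load the whole file into memory. The file object is a lazy iterator: Python reads through a small buffer and hands you one line at a time.&lt;/p&gt;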
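&lt;p&gt;Line iteration assumes newlines show up at reasonable intervals. For binary files, or text where a single "line" could be huge, reading fixed-size chunks keeps memory bounded instead. A minimal sketch, assuming a hypothetical helper named &lt;code&gt;sha256_of_file&lt;/code&gt; and a 64 KiB chunk size picked only for illustration:&lt;/p&gt;

```python
import hashlib


def sha256_of_file(path, chunk_size=64 * 1024):
    """Hash a file of any size while holding at most one chunk in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # f.read(chunk_size) returns b"" at end of file, which ends the loop.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

&lt;p&gt;The same loop shape works for any operation that can consume data piece by piece, such as compressing, uploading, or copying.&lt;/p&gt;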

&lt;p&gt;For CSV files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;large.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;newline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
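&lt;p&gt;Streaming pays off most when the result is much smaller than the input. As a sketch, the hypothetical helper &lt;code&gt;count_by_column&lt;/code&gt; below tallies the values of one column and never stores the rows themselves. It assumes the first line of the CSV is a header:&lt;/p&gt;

```python
import csv
from collections import Counter


def count_by_column(lines, column):
    """Stream CSV rows and tally the values of `column` without keeping rows."""
    counts = Counter()
    # DictReader accepts any iterable of text lines, including an open file.
    for row in csv.DictReader(lines):
        counts[row[column]] += 1
    return counts
```

&lt;p&gt;Memory use then grows with the number of distinct values in the column, not with the size of the file.&lt;/p&gt;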



&lt;p&gt;For huge CSV files with Pandas:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;large.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunksize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100_000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For large JSON files, avoid loading everything with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



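&lt;p&gt;if the file is too big to fit comfortably in memory, because &lt;code&gt;json.load&lt;/code&gt; builds the entire document as Python objects before it returns.&lt;/p&gt;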

&lt;p&gt;For stream-style JSON parsing, libraries like &lt;code&gt;ijson&lt;/code&gt; can process JSON incrementally.&lt;/p&gt;
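&lt;p&gt;If you control the file format, newline-delimited JSON (one object per line) is another option that needs only the standard library. This is a sketch under that assumption, not a replacement for an incremental parser on a single huge document. The &lt;code&gt;level&lt;/code&gt; field and both helper names are hypothetical:&lt;/p&gt;

```python
import json


def iter_json_lines(lines):
    """Yield one decoded object per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


def count_errors(lines):
    """Count records whose (hypothetical) "level" field equals "error"."""
    return sum(
        1 for record in iter_json_lines(lines) if record.get("level") == "error"
    )
```

&lt;p&gt;Passing an open file object as &lt;code&gt;lines&lt;/code&gt; keeps only one record decoded at a time.&lt;/p&gt;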

&lt;p&gt;Another powerful pattern in Python is generators:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;read_lines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;read_lines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;large.log&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The benefit is simple: only a small part of the data exists in memory at a time.&lt;/p&gt;
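&lt;p&gt;Generators also compose. In the sketch below, each stage pulls one item at a time from the stage before it, so stopping early means the rest of the source is never read. The stage names are hypothetical and chosen for illustration:&lt;/p&gt;

```python
from itertools import islice


def strip_newlines(lines):
    """Stage 1: normalise each line as it is read."""
    for line in lines:
        yield line.rstrip("\n")


def only_matching(lines, needle):
    """Stage 2: keep lines containing `needle`; nothing is buffered."""
    for line in lines:
        if needle in line:
            yield line


def first_matches(lines, needle, limit):
    """Compose the stages and stop pulling from the source after `limit` hits."""
    pipeline = only_matching(strip_newlines(lines), needle)
    return list(islice(pipeline, limit))
```

&lt;p&gt;Called with an open file, &lt;code&gt;first_matches(f, "ERROR", 10)&lt;/code&gt; reads only as far as the tenth matching line.&lt;/p&gt;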




&lt;h2&gt;
  
  
  Streaming Files in Go
&lt;/h2&gt;

&lt;p&gt;Go is extremely practical for this kind of workload.&lt;/p&gt;

&lt;p&gt;For line-by-line processing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"bufio"&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
    &lt;span class="s"&gt;"log"&lt;/span&gt;
    &lt;span class="s"&gt;"os"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"large.log"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;scanner&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;bufio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewScanner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;scanner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Scan&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;scanner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;scanner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is simple and memory efficient.&lt;/p&gt;

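&lt;p&gt;But there is one important detail: &lt;code&gt;bufio.Scanner&lt;/code&gt; limits tokens to 64 KiB by default (&lt;code&gt;bufio.MaxScanTokenSize&lt;/code&gt;). A longer line makes &lt;code&gt;Scan()&lt;/code&gt; stop and &lt;code&gt;Err()&lt;/code&gt; return &lt;code&gt;bufio.ErrTooLong&lt;/code&gt;, so for very long lines you should raise the limit:&lt;br&gt;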
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;scanner&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;bufio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewScanner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;buf&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="m"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;scanner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="m"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="m"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more control, I often prefer &lt;code&gt;bufio.Reader&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;bufio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewReaderSize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="m"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;reader&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sc"&gt;'\n'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

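    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// io.EOF is the normal end of input; anything else is a real read error.&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EOF&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;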
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For JSON streaming, avoid reading the full file and unmarshalling everything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;Item&lt;/span&gt;
&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unmarshal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For large files, use a decoder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;decoder&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewDecoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;decoder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;More&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="n"&gt;Item&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;decoder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



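&lt;p&gt;This loop works as written for a stream of concatenated or newline-delimited JSON values. If the file is one large top-level array, call &lt;code&gt;decoder.Token()&lt;/code&gt; first to consume the opening &lt;code&gt;[&lt;/code&gt;, loop with &lt;code&gt;More()&lt;/code&gt; and &lt;code&gt;Decode()&lt;/code&gt;, and then read the closing &lt;code&gt;]&lt;/code&gt; with another &lt;code&gt;Token()&lt;/code&gt; call.&lt;/p&gt;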

&lt;p&gt;The main point is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Do not make memory responsible for holding data that can be streamed.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why Go Can Be Much Faster in File Processing
&lt;/h2&gt;

&lt;p&gt;In one of my own experiments, I processed and filtered a large CSV log file using both Python and Go.&lt;/p&gt;

&lt;p&gt;The Python version used a generator and &lt;code&gt;csv.reader&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The Go version used buffered I/O and goroutines for parallel processing.&lt;/p&gt;

&lt;p&gt;The difference was significant.&lt;/p&gt;

&lt;p&gt;Python was clean and memory efficient, but slower because every row and cell still became Python-level objects. Go was faster because I could process bytes more directly, reduce allocations, and use multiple CPU cores more naturally.&lt;/p&gt;

&lt;p&gt;The exact numbers always depend on the machine, disk, file format, parsing logic, and implementation. But the pattern is very common:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python is excellent for developer speed.&lt;/li&gt;
&lt;li&gt;Go is excellent for predictable resource usage and backend throughput.&lt;/li&gt;
&lt;li&gt;Python becomes much faster when heavy work is moved into native libraries.&lt;/li&gt;
&lt;li&gt;Go performs very well when memory layout and allocations are controlled.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not about saying one language is always better.&lt;/p&gt;

&lt;p&gt;It is about knowing where each language shines.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Practical Go Pattern: Worker Pool for Large Files
&lt;/h2&gt;

&lt;p&gt;When processing huge files, I usually avoid creating one goroutine per line. That sounds concurrent, but a large log can contain many millions of lines, and spawning a goroutine for each one puts heavy pressure on memory and the scheduler.&lt;/p&gt;

&lt;p&gt;Instead, I prefer a bounded worker pool.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"bufio"&lt;/span&gt;
    &lt;span class="s"&gt;"log"&lt;/span&gt;
    &lt;span class="s"&gt;"os"&lt;/span&gt;
    &lt;span class="s"&gt;"sync"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"large.log"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="n"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WaitGroup&lt;/span&gt;
    &lt;span class="n"&gt;workerCount&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;workerCount&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;scanner&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;bufio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewScanner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;scanner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="m"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="m"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="m"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;scanner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Scan&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;scanner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nb"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Wait&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;scanner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// parse, filter, transform, send to DB, etc.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important part is this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="n"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The channel is bounded.&lt;/p&gt;

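&lt;p&gt;That means the reader cannot push an unlimited amount of data into memory. Once 10,000 lines are waiting, the send on &lt;code&gt;lines&lt;/code&gt; blocks until a worker receives one, so when workers are slower than the reader, backpressure happens naturally.&lt;/p&gt;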

&lt;p&gt;This is one of the most important backend engineering concepts:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Fast producers must not be allowed to destroy slow consumers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Whether you are reading files, consuming Kafka messages, processing HTTP requests, or sending jobs to workers, you need backpressure.&lt;/p&gt;
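&lt;p&gt;The same idea carries over to Python. A bounded &lt;code&gt;queue.Queue&lt;/code&gt; blocks the producer when it is full, which plays the role of the buffered channel. The sketch below is an illustration of the flow control only: the function name, the default sizes, and the doubling "work" are all placeholders, and threads in standard CPython do not give CPU parallelism for pure Python code:&lt;/p&gt;

```python
import queue
import threading


def run_pipeline(items, worker_count=4, max_pending=100):
    """Feed `items` to worker threads through a bounded queue.

    put() blocks while the queue holds `max_pending` items. That blocking is
    the backpressure: the producer pauses until a worker catches up.
    """
    pending = queue.Queue(maxsize=max_pending)
    results = []
    results_lock = threading.Lock()
    done = object()  # sentinel telling a worker to exit

    def worker():
        while True:
            item = pending.get()
            if item is done:
                return
            processed = item * 2  # stand-in for real parsing or I/O
            with results_lock:
                results.append(processed)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()

    for item in items:
        pending.put(item)  # blocks when the queue is full
    for _ in threads:
        pending.put(done)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

&lt;p&gt;Results arrive in whatever order the workers finish, so sort or key them if order matters.&lt;/p&gt;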




&lt;h2&gt;
  
  
  Avoiding Unnecessary String Allocation in Go
&lt;/h2&gt;

&lt;p&gt;One hidden cost in Go file processing is converting bytes to strings too early.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;scanner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This returns a newly allocated string holding a copy of the line's bytes, so every line you read costs an allocation.&lt;/p&gt;

&lt;p&gt;If you need better performance and can safely process bytes, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;scanner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Bytes&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But be careful: the byte slice returned by &lt;code&gt;scanner.Bytes()&lt;/code&gt; is only valid until the next scan. If you need to keep it, you must copy it.&lt;/p&gt;

&lt;p&gt;This is a classic Go tradeoff.&lt;/p&gt;

&lt;p&gt;You can reduce allocations, but you must understand ownership and lifetime.&lt;/p&gt;

&lt;p&gt;That is why I always say Go is simple, but not shallow.&lt;/p&gt;
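&lt;p&gt;To make the lifetime rule concrete, here is a minimal sketch of keeping lines read through &lt;code&gt;scanner.Bytes()&lt;/code&gt;. The helper name &lt;code&gt;collectLines&lt;/code&gt; and the in-memory input are assumptions for illustration; in a real service the reader would usually be a file or a network stream.&lt;/p&gt;

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// collectLines keeps every line of input. The slice returned by
// scanner.Bytes() may be overwritten by the next Scan, so each
// token is copied before it is stored.
func collectLines(input string) ([][]byte, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	var kept [][]byte
	for scanner.Scan() {
		owned := append([]byte(nil), scanner.Bytes()...) // copy first, then keep
		kept = append(kept, owned)
	}
	return kept, scanner.Err()
}

func main() {
	lines, err := collectLines("alpha\nbeta\ngamma\n")
	if err != nil {
		panic(err)
	}
	for _, line := range lines {
		fmt.Println(string(line))
	}
}
```

&lt;p&gt;If a line is only inspected inside the loop and never stored, the copy is unnecessary and the loop can work on &lt;code&gt;scanner.Bytes()&lt;/code&gt; directly.&lt;/p&gt;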




&lt;h2&gt;
  
  
  Escape Analysis: The Invisible Performance Tool
&lt;/h2&gt;

&lt;p&gt;One of the most powerful Go concepts is escape analysis.&lt;/p&gt;

&lt;p&gt;At compile time, the Go compiler decides whether a variable can stay on the stack or must be allocated on the heap.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;createUser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Amir"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;user&lt;/code&gt; escapes because we return its address. The pointer outlives the function's stack frame, so the value has to be allocated on the heap.&lt;/p&gt;

&lt;p&gt;You can inspect escape decisions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go build &lt;span class="nt"&gt;-gcflags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"-m"&lt;/span&gt; ./...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You may see output like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;moved to heap: user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This does not automatically mean the code is bad. Sometimes heap allocation is necessary.&lt;/p&gt;

&lt;p&gt;But when you are optimizing hot paths, escape analysis helps you understand why your GC pressure is high.&lt;/p&gt;

&lt;p&gt;In backend systems, this matters in places like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;request parsing&lt;/li&gt;
&lt;li&gt;JSON encoding/decoding&lt;/li&gt;
&lt;li&gt;logging&lt;/li&gt;
&lt;li&gt;metrics&lt;/li&gt;
&lt;li&gt;queue consumers&lt;/li&gt;
&lt;li&gt;data transformation pipelines&lt;/li&gt;
&lt;li&gt;tight loops&lt;/li&gt;
&lt;li&gt;large batch processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If a function runs millions of times, a small allocation becomes a production problem.&lt;/p&gt;
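&lt;p&gt;One way to see the cost of an escaping value is to count allocations per call. The sketch below is an assumption-heavy illustration, not a benchmark: the names &lt;code&gt;byValue&lt;/code&gt;, &lt;code&gt;byPointer&lt;/code&gt;, &lt;code&gt;sink&lt;/code&gt;, and &lt;code&gt;sinkLen&lt;/code&gt; are hypothetical, and it uses &lt;code&gt;testing.AllocsPerRun&lt;/code&gt; from the standard library to measure average heap allocations.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"testing"
)

type User struct {
	Name string
}

// sink and sinkLen are package-level variables. Storing a pointer in sink
// forces the pointed-to value to outlive the function that created it.
var (
	sink    *User
	sinkLen int
)

// byValue returns a copy, so the User can stay on the caller's stack.
func byValue() User {
	return User{Name: "Amir"}
}

// byPointer returns a pointer, so the User may have to live on the heap.
func byPointer() *User {
	u := new(User)
	u.Name = "Amir"
	return u
}

// allocsByValue reports the average heap allocations per call of byValue.
func allocsByValue() float64 {
	return testing.AllocsPerRun(1000, func() {
		u := byValue()
		sinkLen += len(u.Name)
	})
}

// allocsByPointer reports the average heap allocations per call when the
// result is stored in sink and therefore escapes.
func allocsByPointer() float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = byPointer()
	})
}

func main() {
	fmt.Println("allocations per call, by value:  ", allocsByValue())
	fmt.Println("allocations per call, by pointer:", allocsByPointer())
}
```

&lt;p&gt;The exact numbers depend on the compiler version and on inlining, so treat them as a signal to investigate with &lt;code&gt;-gcflags="-m"&lt;/code&gt; rather than as a fixed result.&lt;/p&gt;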




&lt;h2&gt;
  
  
  Memory Is Not Just a Language Problem
&lt;/h2&gt;

&lt;p&gt;A senior engineer should not only ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Which language is faster?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A better question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What memory model does this workload need?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;If I am building an internal script, Python is probably perfect.&lt;/p&gt;

&lt;p&gt;If I am writing data analysis code, Python with Pandas, Polars, DuckDB, or NumPy can be excellent.&lt;/p&gt;

&lt;p&gt;If I am building a high-throughput API service with predictable latency, Go is often a strong choice.&lt;/p&gt;

&lt;p&gt;If I am processing huge files and need simple deployment, Go gives me a very practical balance.&lt;/p&gt;

&lt;p&gt;If I am building machine learning workflows, Python still has the ecosystem advantage.&lt;/p&gt;

&lt;p&gt;The language is only one layer.&lt;/p&gt;

&lt;p&gt;The real engineering decision is about workload, memory behavior, concurrency model, ecosystem, deployment, and operational cost.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Rules I Use in Production
&lt;/h2&gt;

&lt;p&gt;Here are some rules I personally follow when working with memory-heavy backend systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Never load a huge file fully unless you really need to
&lt;/h3&gt;

&lt;p&gt;Stream it.&lt;/p&gt;

&lt;p&gt;Line by line.&lt;/p&gt;

&lt;p&gt;Chunk by chunk.&lt;/p&gt;

&lt;p&gt;Token by token.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Preallocate when you know the size
&lt;/h3&gt;

&lt;p&gt;In Go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="n"&gt;Item&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expectedSize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This can reduce allocations and copying.&lt;/p&gt;
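&lt;p&gt;To see where the saving comes from, this small sketch counts how often &lt;code&gt;append&lt;/code&gt; has to replace the backing array. The helper &lt;code&gt;countGrowths&lt;/code&gt; is a hypothetical name, and it uses &lt;code&gt;int&lt;/code&gt; elements instead of &lt;code&gt;Item&lt;/code&gt; to stay self-contained.&lt;/p&gt;

```go
package main

import "fmt"

// countGrowths appends n ints to s and reports how many times the
// capacity changed, which means a new backing array was allocated
// and the existing elements were copied into it.
func countGrowths(s []int, n int) int {
	growths := 0
	for i := 0; i != n; i++ {
		before := cap(s)
		s = append(s, i)
		if cap(s) != before {
			growths++
		}
	}
	return growths
}

func main() {
	const expectedSize = 10000
	fmt.Println("without preallocation:", countGrowths(nil, expectedSize))
	fmt.Println("with preallocation:   ", countGrowths(make([]int, 0, expectedSize), expectedSize))
}
```

&lt;p&gt;With the capacity set up front, the slice never grows, so the second count is zero. The first count depends on the growth strategy of the Go version in use.&lt;/p&gt;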

&lt;h3&gt;
  
  
  3. Be careful with slices sharing the same underlying array
&lt;/h3&gt;

&lt;p&gt;This can create subtle bugs and unexpected memory retention.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;small&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;big&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;big&lt;/code&gt; is huge, &lt;code&gt;small&lt;/code&gt; may keep the whole underlying array alive.&lt;/p&gt;

&lt;p&gt;If you only need the small part, copy it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;smallCopy&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;big&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
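&lt;p&gt;Both effects, the shared writes and the retained array, can be observed with a short sketch. The helpers &lt;code&gt;head&lt;/code&gt; and &lt;code&gt;headCopy&lt;/code&gt; are hypothetical names, and the 1 MiB size is only an example.&lt;/p&gt;

```go
package main

import "fmt"

// head returns the first n bytes of big as a slice that shares big's array.
func head(big []byte, n int) []byte {
	return big[:n]
}

// headCopy returns the first n bytes of big in a freshly allocated array.
func headCopy(big []byte, n int) []byte {
	return append([]byte(nil), big[:n]...)
}

func main() {
	big := make([]byte, 1024*1024)
	small := head(big, 10)
	smallCopy := headCopy(big, 10)

	// cap(small) still spans the whole 1 MiB array, so the garbage
	// collector must keep all of it while small is reachable.
	fmt.Println("small:    ", len(small), cap(small))
	fmt.Println("smallCopy:", len(smallCopy), cap(smallCopy))

	// Writes through small are visible in big; writes to the copy are not.
	small[0] = 'x'
	smallCopy[1] = 'y'
	fmt.Println(big[0] == 'x', big[1] == 'y')
}
```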



&lt;h3&gt;
  
  
  4. Avoid unnecessary pointer-heavy structures
&lt;/h3&gt;

&lt;p&gt;A slice of pointers is not always better.&lt;/p&gt;

&lt;p&gt;Sometimes a slice of values is faster and simpler.&lt;/p&gt;
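&lt;p&gt;A rough way to compare the two layouts is to count heap allocations when building each one. This is a sketch under assumptions: the &lt;code&gt;Item&lt;/code&gt; fields, the sink variables, and the helper names are hypothetical, and &lt;code&gt;testing.AllocsPerRun&lt;/code&gt; is used only as a convenient counter.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"testing"
)

type Item struct {
	ID    int
	Score float64
}

// Package-level sinks keep the slices reachable so the compiler
// cannot place them on the stack.
var (
	valueSink   []Item
	pointerSink []*Item
)

// buildValues stores n Items contiguously in one backing array.
func buildValues(n int) {
	items := make([]Item, n)
	for i := range items {
		items[i].ID = i
	}
	valueSink = items
}

// buildPointers stores n pointers, each to a separately allocated Item.
func buildPointers(n int) {
	items := make([]*Item, n)
	for i := range items {
		items[i] = new(Item)
		items[i].ID = i
	}
	pointerSink = items
}

// valueAllocs and pointerAllocs report average heap allocations per build.
func valueAllocs(n int) float64 {
	return testing.AllocsPerRun(100, func() { buildValues(n) })
}

func pointerAllocs(n int) float64 {
	return testing.AllocsPerRun(100, func() { buildPointers(n) })
}

func main() {
	const n = 1000
	fmt.Println("allocations per build, []Item: ", valueAllocs(n))
	fmt.Println("allocations per build, []*Item:", pointerAllocs(n))
}
```

&lt;p&gt;The pointer version allocates once per element on top of the slice itself, and every one of those objects is something the garbage collector has to track. Pointers are still the right choice when elements are large, shared, or mutated through other references.&lt;/p&gt;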

&lt;h3&gt;
  
  
  5. Measure before and after optimization
&lt;/h3&gt;

&lt;p&gt;Use tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-bench&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;-benchmem&lt;/span&gt;
go tool pprof
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; cProfile app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



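&lt;p&gt;Note that &lt;code&gt;cProfile&lt;/code&gt; measures time, not memory. When memory matters, also use a memory profiler such as the standard library's &lt;code&gt;tracemalloc&lt;/code&gt;.&lt;/p&gt;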

&lt;h3&gt;
  
  
  6. Design with backpressure
&lt;/h3&gt;

&lt;p&gt;Bound your queues.&lt;/p&gt;

&lt;p&gt;Limit your workers.&lt;/p&gt;

&lt;p&gt;Do not let fast input destroy your service.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Optimize hot paths, not everything
&lt;/h3&gt;

&lt;p&gt;Readable code still matters.&lt;/p&gt;

&lt;p&gt;Do not turn the whole codebase into a low-level memory puzzle.&lt;/p&gt;

&lt;p&gt;Find the hot path, measure it, then optimize it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;The deeper I go into backend engineering, the more I respect memory.&lt;/p&gt;

&lt;p&gt;Not because every developer needs to become a systems programmer, but because every production system eventually becomes a resource management problem.&lt;/p&gt;

&lt;p&gt;CPU is limited.&lt;/p&gt;

&lt;p&gt;Memory is limited.&lt;/p&gt;

&lt;p&gt;Disk I/O is limited.&lt;/p&gt;

&lt;p&gt;Network bandwidth is limited.&lt;/p&gt;

&lt;p&gt;The job of a backend engineer is not just writing business logic. It is building software that behaves well under pressure.&lt;/p&gt;

&lt;p&gt;Python gives us speed of development and a massive ecosystem.&lt;/p&gt;

&lt;p&gt;Go gives us simplicity, strong concurrency, predictable deployment, and great performance for many backend workloads.&lt;/p&gt;

&lt;p&gt;Both are useful.&lt;/p&gt;

&lt;p&gt;But if you want to build scalable backend systems, process large files, reduce memory pressure, and understand why your service behaves the way it does, you need to look under the hood.&lt;/p&gt;

&lt;p&gt;My personal rule is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Write clean code first.&lt;br&gt;&lt;br&gt;
Understand memory second.&lt;br&gt;&lt;br&gt;
Measure everything.&lt;br&gt;&lt;br&gt;
Then optimize only what matters.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That mindset has helped me more than any single framework, library, or language feature.&lt;/p&gt;




&lt;h2&gt;
  
  
  Discussion
&lt;/h2&gt;

&lt;p&gt;Have you ever had a service crash because it loaded too much data into memory?&lt;/p&gt;

&lt;p&gt;Or have you used Go escape analysis, pprof, or Python profiling tools to debug a real production issue?&lt;/p&gt;

&lt;p&gt;I would love to hear your experience.&lt;/p&gt;

</description>
      <category>go</category>
      <category>python</category>
      <category>performance</category>
      <category>backend</category>
    </item>
    <item>
      <title>The Silent Killers of Go Concurrency: Mutexes, Semaphores, and Goroutine Leaks</title>
      <dc:creator>amir</dc:creator>
      <pubDate>Sun, 24 May 2026 17:15:55 +0000</pubDate>
      <link>https://dev.to/amirsefati/the-silent-killers-of-go-concurrency-mutexes-semaphores-and-goroutine-leaks-177i</link>
      <guid>https://dev.to/amirsefati/the-silent-killers-of-go-concurrency-mutexes-semaphores-and-goroutine-leaks-177i</guid>
      <description>&lt;p&gt;Go makes concurrency look simple.&lt;/p&gt;

&lt;p&gt;You write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// do something concurrently&lt;/span&gt;
&lt;span class="p"&gt;}()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And suddenly your code is running in another goroutine.&lt;/p&gt;

&lt;p&gt;That simplicity is one of the reasons I like Go so much. But after working on backend systems, notification pipelines, high-traffic APIs, and production services under real load, I learned something important:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Most concurrency problems in Go do not come from a lack of concurrency.&lt;br&gt;&lt;br&gt;
They come from using concurrency without understanding where the bottleneck actually is.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Sometimes the issue is a missing lock.&lt;/p&gt;

&lt;p&gt;But very often, especially in production Go services, the issue is the opposite:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;too much locking&lt;/li&gt;
&lt;li&gt;locks held for too long&lt;/li&gt;
&lt;li&gt;network I/O inside critical sections&lt;/li&gt;
&lt;li&gt;goroutines that never exit&lt;/li&gt;
&lt;li&gt;unbounded goroutine creation&lt;/li&gt;
&lt;li&gt;WaitGroups copied by value&lt;/li&gt;
&lt;li&gt;channels used without a cancellation strategy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article, I want to walk through the concurrency problems I have seen in real systems, how I reason about mutexes and semaphores, and how I usually debug these issues before they become production incidents.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Problem: Concurrency That Accidentally Becomes Sequential
&lt;/h2&gt;

&lt;p&gt;A service can look concurrent from the outside and still behave like a single-threaded application internally.&lt;/p&gt;

&lt;p&gt;This usually happens when a large part of the request flow is hidden behind one shared lock.&lt;/p&gt;

&lt;p&gt;A pattern like this is more common than many developers admit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Test User"&lt;/span&gt;
&lt;span class="n"&gt;sendEmail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;callDatabase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At first glance, it may look safe.&lt;/p&gt;

&lt;p&gt;The developer wanted to protect shared state. That part is reasonable. But the lock is now protecting much more than shared memory. It is protecting the entire flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;update a field&lt;/li&gt;
&lt;li&gt;send an email&lt;/li&gt;
&lt;li&gt;call the database&lt;/li&gt;
&lt;li&gt;maybe wait on network I/O&lt;/li&gt;
&lt;li&gt;maybe retry&lt;/li&gt;
&lt;li&gt;maybe block other goroutines for a long time&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is not just a mutex anymore.&lt;/p&gt;

&lt;p&gt;That is a traffic jam.&lt;/p&gt;

&lt;p&gt;Every goroutine that needs the same lock must wait until the whole flow finishes. So even if your service has hundreds or thousands of goroutines, a big part of the system becomes sequential.&lt;/p&gt;

&lt;p&gt;The dangerous part is that CPU usage may still look normal or even low. Memory may also look fine. But latency increases, throughput drops, and p95/p99 response times become unstable.&lt;/p&gt;

&lt;p&gt;This is why lock contention is sometimes difficult to notice from basic infrastructure metrics alone.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Production-Style Example: Email Inside a Mutex
&lt;/h2&gt;

&lt;p&gt;Imagine we have a service that updates user state and sends notifications.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Service&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;mu&lt;/span&gt;    &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Mutex&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Service&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ProcessUsers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"processed"&lt;/span&gt;
        &lt;span class="n"&gt;sendEmail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// slow network I/O inside the lock&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code is safe from a data race perspective.&lt;/p&gt;

&lt;p&gt;But it is dangerous from a performance perspective.&lt;/p&gt;

&lt;p&gt;A mutex should protect the smallest possible shared memory operation. It should not protect slow external work like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sending email&lt;/li&gt;
&lt;li&gt;calling another microservice&lt;/li&gt;
&lt;li&gt;database queries&lt;/li&gt;
&lt;li&gt;HTTP requests&lt;/li&gt;
&lt;li&gt;file uploads&lt;/li&gt;
&lt;li&gt;logging to a slow external sink&lt;/li&gt;
&lt;li&gt;waiting on a third-party API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The memory update may take nanoseconds or microseconds. The email call may take milliseconds or seconds.&lt;/p&gt;

&lt;p&gt;That difference matters.&lt;/p&gt;

&lt;p&gt;If the lock is held while &lt;code&gt;sendEmail&lt;/code&gt; runs, every other goroutine that needs &lt;code&gt;s.mu&lt;/code&gt; is blocked behind a network call.&lt;/p&gt;

&lt;p&gt;A better version separates shared-state mutation from slow work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Service&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ProcessUsers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;emails&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"processed"&lt;/span&gt;
        &lt;span class="n"&gt;emails&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;emails&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;emails&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;sendEmail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is already better because the lock only protects the shared map.&lt;/p&gt;

&lt;p&gt;But in a real production system, I usually prefer pushing the slow work to a queue or bounded worker pool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Service&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ProcessUsers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="k"&gt;chan&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;EmailJob&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"processed"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;EmailJob&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;UserID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Email&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Email&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



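&lt;p&gt;Now the request path does not directly depend on the email provider latency. If &lt;code&gt;jobs&lt;/code&gt; is a buffered channel, the send only blocks when the buffer is full, and that wait is the backpressure.&lt;/p&gt;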

&lt;p&gt;That is the real fix.&lt;/p&gt;

&lt;p&gt;Not just “use goroutines.”&lt;/p&gt;

&lt;p&gt;The fix is designing the boundary between shared memory, external I/O, and backpressure.&lt;/p&gt;




&lt;h2&gt;
  
  
  Mutexes Are Not Bad. Large Critical Sections Are Bad.
&lt;/h2&gt;

&lt;p&gt;I sometimes see developers become afraid of mutexes.&lt;/p&gt;

&lt;p&gt;That is the wrong lesson.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sync.Mutex&lt;/code&gt; is simple, fast, and perfectly fine when used correctly. The problem is not the mutex. The problem is the size of the critical section.&lt;/p&gt;

&lt;p&gt;This is what I try to keep in mind:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c"&gt;// only touch shared memory here&lt;/span&gt;
&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c"&gt;// shared memory&lt;/span&gt;
&lt;span class="c"&gt;// database call&lt;/span&gt;
&lt;span class="c"&gt;// HTTP call&lt;/span&gt;
&lt;span class="c"&gt;// email call&lt;/span&gt;
&lt;span class="c"&gt;// JSON encoding&lt;/span&gt;
&lt;span class="c"&gt;// logging&lt;/span&gt;
&lt;span class="c"&gt;// metrics push&lt;/span&gt;
&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A good critical section should be boring.&lt;/p&gt;

&lt;p&gt;It should usually do one of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read shared state&lt;/li&gt;
&lt;li&gt;update shared state&lt;/li&gt;
&lt;li&gt;copy shared state into a local variable&lt;/li&gt;
&lt;li&gt;swap a pointer&lt;/li&gt;
&lt;li&gt;increment a counter&lt;/li&gt;
&lt;li&gt;append to a protected slice or write to a protected map&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then unlock.&lt;/p&gt;

&lt;p&gt;After that, do the expensive work outside the lock.&lt;/p&gt;




&lt;h2&gt;
  
  
  Under the Hood: What a Mutex Gives You
&lt;/h2&gt;

&lt;p&gt;At a high level, a mutex gives you mutual exclusion: only one goroutine can enter a protected section at a time.&lt;/p&gt;

&lt;p&gt;But it also gives you memory ordering guarantees.&lt;/p&gt;

&lt;p&gt;In Go's memory model, each unlock of a mutex is synchronized before any later lock of the same mutex returns. In practical terms, that means if one goroutine updates shared data and unlocks, another goroutine that later locks the same mutex can safely observe that update.&lt;/p&gt;

&lt;p&gt;That is the part many developers forget.&lt;/p&gt;

&lt;p&gt;A mutex is not just about “blocking other goroutines.” It is also about creating a safe visibility boundary between goroutines.&lt;/p&gt;

&lt;p&gt;Without that boundary, different goroutines may read and write the same memory at the same time, and now you have a data race. Once you have a data race, your program is no longer something you can reason about confidently.&lt;/p&gt;

&lt;p&gt;This is why I do not like “clever” lock-free code unless there is a very strong reason for it.&lt;/p&gt;

&lt;p&gt;Most backend services do not need clever concurrency.&lt;/p&gt;

&lt;p&gt;They need clear concurrency.&lt;/p&gt;
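&lt;p&gt;As a minimal sketch of what clear concurrency can look like, here is a counter whose every read and write goes through the same mutex. The &lt;code&gt;Counter&lt;/code&gt; type and the &lt;code&gt;countTo&lt;/code&gt; helper are hypothetical names used only for this illustration.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// Counter guards n with mu; every read and write goes through the lock,
// so each update is visible to the next goroutine that takes the lock.
type Counter struct {
	mu sync.Mutex
	n  int
}

// Inc adds one to the counter inside a short critical section.
func (c *Counter) Inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

// Value reads the counter under the same lock that protects writes.
func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// countTo runs the given number of goroutines, each calling Inc
// perWorker times, and returns the final value.
func countTo(workers, perWorker int) int {
	var c Counter
	var wg sync.WaitGroup
	for w := 0; w != workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i != perWorker; i++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	return c.Value()
}

func main() {
	fmt.Println(countTo(50, 1000)) // prints 50000
}
```

&lt;p&gt;Running this with &lt;code&gt;go run -race&lt;/code&gt; should report no data race. Without the lock in &lt;code&gt;Inc&lt;/code&gt;, the race detector would flag the increment and the final value would no longer be reliable.&lt;/p&gt;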




&lt;h2&gt;
  
  
  Semaphore: Controlling Capacity, Not Ownership
&lt;/h2&gt;

&lt;p&gt;A mutex is usually about ownership of shared memory.&lt;/p&gt;

&lt;p&gt;A semaphore is about capacity.&lt;/p&gt;

&lt;p&gt;For example, suppose you want to process 10,000 users, but you do not want to send 10,000 emails at the same time.&lt;/p&gt;

&lt;p&gt;A naive version might do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;sendEmail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is dangerous because it creates unbounded concurrency.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;users&lt;/code&gt; has 10,000 items, you create 10,000 goroutines. If each goroutine performs network I/O, opens connections, allocates memory, and waits on an external provider, you can overload your own service before you overload the email provider.&lt;/p&gt;

&lt;p&gt;A simple semaphore pattern fixes this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;sem&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// allow only 20 concurrent email sends&lt;/span&gt;
&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WaitGroup&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;

    &lt;span class="n"&gt;sem&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{}{}&lt;/span&gt;
    &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;sem&lt;/span&gt; &lt;span class="p"&gt;}()&lt;/span&gt;

        &lt;span class="n"&gt;sendEmail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Wait&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the code still uses concurrency, but concurrency is bounded.&lt;/p&gt;

&lt;p&gt;That one detail is huge in production.&lt;/p&gt;

&lt;p&gt;Unbounded concurrency is not scalability.&lt;/p&gt;

&lt;p&gt;It is delayed failure.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Better Worker Pool for Production Code
&lt;/h2&gt;

&lt;p&gt;The semaphore pattern is useful, but for services that run continuously, I often prefer a worker pool.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;EmailJob&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;UserID&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;Email&lt;/span&gt;  &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;startEmailWorkers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;workerCount&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;EmailJob&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WaitGroup&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;workerCount&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;workerID&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;return&lt;/span&gt;

                &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="k"&gt;return&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;

                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sendEmailJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="c"&gt;// In real systems: log, retry, dead-letter, or expose metrics.&lt;/span&gt;
                        &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"worker=%d failed to send email user_id=%d err=%v&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;workerID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UserID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Wait&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you much better operational control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fixed concurrency&lt;/li&gt;
&lt;li&gt;easier metrics&lt;/li&gt;
&lt;li&gt;easier shutdown&lt;/li&gt;
&lt;li&gt;easier retry strategy&lt;/li&gt;
&lt;li&gt;easier backpressure&lt;/li&gt;
&lt;li&gt;easier rate limiting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the difference between “I used goroutines” and “I designed a concurrent system.”&lt;/p&gt;




&lt;h2&gt;
  
  
  Goroutine Leak: The Bug That Does Not Explode Immediately
&lt;/h2&gt;

&lt;p&gt;Goroutine leaks are one of the most common production problems in Go.&lt;/p&gt;

&lt;p&gt;They are dangerous because the service may not crash immediately. It may slowly become worse over hours or days.&lt;/p&gt;

&lt;p&gt;Here is a classic example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;heavyComputation&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;

    &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ch&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;After&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"timeout"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem is subtle.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ch&lt;/code&gt; is unbuffered.&lt;/p&gt;

&lt;p&gt;If the timeout happens first, &lt;code&gt;process&lt;/code&gt; returns. After that, there is no receiver waiting on &lt;code&gt;ch&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;When &lt;code&gt;heavyComputation()&lt;/code&gt; finishes, the goroutine tries to send into &lt;code&gt;ch&lt;/code&gt; and blocks forever.&lt;/p&gt;

&lt;p&gt;That goroutine is now leaked.&lt;/p&gt;

&lt;p&gt;One leaked goroutine may not matter.&lt;/p&gt;

&lt;p&gt;Thousands of leaked goroutines matter.&lt;/p&gt;

&lt;p&gt;A safer version uses a buffered channel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;heavyComputation&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;

    &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ch&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;After&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"timeout"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



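&lt;p&gt;This prevents the goroutine from blocking on send after the timeout: the buffer has room for one value, so the send completes even when nobody receives it. The goroutine still runs &lt;code&gt;heavyComputation()&lt;/code&gt; to completion, though, so the work itself is not cancelled.&lt;/p&gt;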

&lt;p&gt;But in real services, I prefer context-based cancellation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;heavyComputation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;

    &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ch&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important lesson:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Every goroutine needs an exit path.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you cannot explain how a goroutine stops, you probably have a leak waiting to happen.&lt;/p&gt;




&lt;h2&gt;
  
  
  WaitGroup by Value: A Small Mistake With a Big Impact
&lt;/h2&gt;

&lt;p&gt;This mistake is very easy to miss in code review:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wg&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WaitGroup&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c"&gt;// wrong: copied by value&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c"&gt;// do work&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;sync.WaitGroup&lt;/code&gt; must not be copied after first use.&lt;/p&gt;

&lt;p&gt;When you pass it by value, you copy its internal state. The worker calls &lt;code&gt;Done()&lt;/code&gt; on the copy, not on the original WaitGroup that the main goroutine is waiting on.&lt;/p&gt;

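&lt;p&gt;That causes a deadlock: the original counter never reaches zero, so &lt;code&gt;wg.Wait()&lt;/code&gt; never returns. &lt;code&gt;go vet&lt;/code&gt; flags the copy through its &lt;code&gt;copylocks&lt;/code&gt; check.&lt;/p&gt;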

&lt;p&gt;Correct version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wg&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WaitGroup&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c"&gt;// do work&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WaitGroup&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Wait&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This rule also applies to other synchronization primitives like &lt;code&gt;sync.Mutex&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Do not copy them after first use.&lt;/p&gt;
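
&lt;p&gt;Another way to avoid the copy is to never pass the WaitGroup at all. In this sketch the closure captures &lt;code&gt;wg&lt;/code&gt; and the worker function stays unaware of it. The names &lt;code&gt;resize&lt;/code&gt; and &lt;code&gt;resizeAll&lt;/code&gt; are hypothetical, and &lt;code&gt;atomic.Int64&lt;/code&gt; assumes Go 1.19 or later.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// resize is a hypothetical unit of work. It takes no WaitGroup at all,
// so there is nothing to copy by accident.
func resize(imageID int, processed *atomic.Int64) {
	_ = imageID // a real implementation would use the ID
	processed.Add(1)
}

// resizeAll starts one goroutine per image and waits for all of them.
// The WaitGroup never leaves this function: the closure captures the
// variable itself rather than a copy.
func resizeAll(imageIDs []int) int64 {
	var wg sync.WaitGroup
	processed := new(atomic.Int64)

	for _, id := range imageIDs {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			resize(id, processed)
		}(id)
	}

	wg.Wait()
	return processed.Load()
}

func main() {
	fmt.Println(resizeAll([]int{1, 2, 3, 4, 5}))
}
```

&lt;p&gt;Keeping &lt;code&gt;Add&lt;/code&gt;, &lt;code&gt;Done&lt;/code&gt;, and &lt;code&gt;Wait&lt;/code&gt; in one function also makes the pairing easy to verify in review.&lt;/p&gt;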




&lt;h2&gt;
  
  
  The Loop Variable Trap
&lt;/h2&gt;

&lt;p&gt;This used to be one of the most famous Go concurrency bugs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;sendEmail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



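&lt;p&gt;Before Go 1.22, the loop variable was shared across all iterations, so goroutines that captured it could all observe the value from a later iteration. Since Go 1.22, modules that declare &lt;code&gt;go 1.22&lt;/code&gt; or later in &lt;code&gt;go.mod&lt;/code&gt; get a fresh variable per iteration.&lt;/p&gt;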

&lt;p&gt;The defensive pattern is still simple and clear:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;

    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;sendEmail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even with improvements in newer Go versions, I still like this style in production code because it makes the ownership of the variable obvious to the reader.&lt;/p&gt;

&lt;p&gt;Readable concurrency is maintainable concurrency.&lt;/p&gt;




&lt;h2&gt;
  
  
  How I Debug Lock Contention in Go
&lt;/h2&gt;

&lt;p&gt;When I suspect a concurrency bottleneck, I do not start by guessing.&lt;/p&gt;

&lt;p&gt;I start by measuring.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Enable pprof
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="s"&gt;"net/http/pprof"&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ListenAndServe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"localhost:6060"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;

    &lt;span class="c"&gt;// start application&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then collect profiles:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go tool pprof http://localhost:6060/debug/pprof/profile?seconds&lt;span class="o"&gt;=&lt;/span&gt;30
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



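&lt;p&gt;For mutex contention, enable mutex profiling. A rate of &lt;code&gt;1&lt;/code&gt; records every contention event; a larger rate &lt;code&gt;n&lt;/code&gt; samples roughly one event in &lt;code&gt;n&lt;/code&gt;, which is cheaper on a busy service:&lt;br&gt;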
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SetMutexProfileFraction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then inspect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go tool pprof http://localhost:6060/debug/pprof/mutex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
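
&lt;p&gt;When no HTTP endpoint is available, for example in a batch job or a test, the same profile can be read in-process through &lt;code&gt;runtime/pprof&lt;/code&gt;. This is a minimal sketch: the function name, the worker count, and the artificial 2 ms sleep under the lock are assumptions chosen to force contention.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
	"strings"
	"sync"
	"time"
)

// contendedMutexProfile makes several goroutines fight over one mutex,
// then returns the text form of the runtime mutex profile.
func contendedMutexProfile(workers int) (string, error) {
	// Record every contention event for the demo, then restore the old rate.
	old := runtime.SetMutexProfileFraction(1)
	defer runtime.SetMutexProfileFraction(old)

	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i != workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			time.Sleep(2 * time.Millisecond) // slow work while holding the lock
			mu.Unlock()
		}()
	}
	wg.Wait()

	out := new(strings.Builder)
	// debug=1 asks for the human-readable text format.
	err := pprof.Lookup("mutex").WriteTo(out, 1)
	return out.String(), err
}

func main() {
	profile, err := contendedMutexProfile(8)
	if err != nil {
		fmt.Println("profile error:", err)
		return
	}
	fmt.Print(profile)
}
```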



&lt;h3&gt;
  
  
  2. Check goroutine count
&lt;/h3&gt;

&lt;p&gt;A rising goroutine count is often a signal of blocked goroutines or leaks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"goroutines:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NumGoroutine&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production, expose it as a metric:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;prometheus&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewGaugeFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;prometheus&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GaugeOpts&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"go_goroutines_current"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Help&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Current number of goroutines."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kt"&gt;float64&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NumGoroutine&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Dump goroutine stacks
&lt;/h3&gt;

&lt;p&gt;When the service is stuck, goroutine dumps are gold.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:6060/debug/pprof/goroutine?debug&lt;span class="o"&gt;=&lt;/span&gt;2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Look for many goroutines parked in the same call or wait state, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sync.(*Mutex).Lock
chan send
chan receive
net/http.(*Transport).RoundTrip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If 5,000 goroutines are blocked on the same lock or channel, you found your bottleneck.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Use the race detector in tests
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-race&lt;/span&gt; ./...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The race detector is not free: the Go documentation cites roughly 5-10x memory usage and 2-20x execution time for a typical program, so you usually do not run it in production. It is extremely useful in CI and local debugging.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Practical Rules for Production Go Concurrency
&lt;/h2&gt;

&lt;p&gt;These are the rules I try to follow when writing or reviewing concurrent Go code:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Keep locks small
&lt;/h3&gt;

&lt;p&gt;Lock only the data that needs protection.&lt;/p&gt;

&lt;p&gt;Do not lock the whole request lifecycle.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Never put slow I/O inside a mutex
&lt;/h3&gt;

&lt;p&gt;Avoid database calls, HTTP calls, email sending, file uploads, and third-party API calls inside critical sections.&lt;/p&gt;
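
&lt;p&gt;A common way to follow this rule is to take what you need under the lock and do the slow part after releasing it. Here is a minimal sketch with a hypothetical &lt;code&gt;Outbox&lt;/code&gt; type. The &lt;code&gt;deliver&lt;/code&gt; callback stands in for whatever HTTP or SMTP call your service makes.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Outbox is a hypothetical queue of pending notifications guarded by a mutex.
type Outbox struct {
	mu      sync.Mutex
	pending []string
}

// Add appends a message; the lock is held only for the append.
func (o *Outbox) Add(msg string) {
	o.mu.Lock()
	o.pending = append(o.pending, msg)
	o.mu.Unlock()
}

// drain takes ownership of the pending messages. The critical section only
// swaps a slice header, so it stays short no matter how slow delivery is.
func (o *Outbox) drain() []string {
	o.mu.Lock()
	defer o.mu.Unlock()
	batch := o.pending
	o.pending = nil
	return batch
}

// Flush delivers the drained batch with the lock already released, so Add
// never waits on the network.
func (o *Outbox) Flush(deliver func(string)) int {
	batch := o.drain()
	for _, msg := range batch {
		deliver(msg) // slow I/O happens outside the critical section
	}
	return len(batch)
}

func main() {
	box := new(Outbox)
	box.Add("welcome")
	box.Add("receipt")

	sent := box.Flush(func(msg string) {
		time.Sleep(10 * time.Millisecond) // stand-in for an HTTP or SMTP call
		fmt.Println("delivered", msg)
	})
	fmt.Println("sent:", sent)
}
```

&lt;p&gt;The trade-off is that a message can be drained and then fail to deliver, so a real system needs a retry or re-queue path for the batch.&lt;/p&gt;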

&lt;h3&gt;
  
  
  3. Bound concurrency
&lt;/h3&gt;

&lt;p&gt;Do not create unlimited goroutines.&lt;/p&gt;

&lt;p&gt;Use worker pools, semaphores, queues, or rate limiters.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Every goroutine needs a shutdown path
&lt;/h3&gt;

&lt;p&gt;Use &lt;code&gt;context.Context&lt;/code&gt;, channel close, or explicit cancellation.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Do not copy synchronization primitives
&lt;/h3&gt;

&lt;p&gt;Pass &lt;code&gt;*sync.WaitGroup&lt;/code&gt;, &lt;code&gt;*sync.Mutex&lt;/code&gt;, and similar primitives by pointer when sharing them.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Measure before optimizing
&lt;/h3&gt;

&lt;p&gt;Use &lt;code&gt;pprof&lt;/code&gt;, runtime metrics, traces, logs, and goroutine dumps.&lt;/p&gt;

&lt;p&gt;Guessing is not debugging.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Prefer boring concurrency
&lt;/h3&gt;

&lt;p&gt;The best concurrent code is usually not clever.&lt;/p&gt;

&lt;p&gt;It is clear, measurable, and easy to shut down.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Go gives us powerful concurrency tools, but it does not automatically give us good concurrent design.&lt;/p&gt;

&lt;p&gt;A goroutine is cheap, but it is not free.&lt;/p&gt;

&lt;p&gt;A mutex is fast, but it can destroy throughput if you hold it around slow work.&lt;/p&gt;

&lt;p&gt;A channel is elegant, but it can leak goroutines if nobody is receiving.&lt;/p&gt;

&lt;p&gt;A WaitGroup is simple, but copying it can break your entire flow.&lt;/p&gt;

&lt;p&gt;For me, senior Go engineering is not about using every concurrency primitive. It is about knowing when not to use them, where the real boundary is, and how the system behaves under load.&lt;/p&gt;

&lt;p&gt;The next time you write this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ask one question before moving on:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What exactly am I protecting, and how fast can I release this lock?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That one question can save your service from a silent production bottleneck.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Go Memory Model: &lt;a href="https://go.dev/ref/mem" rel="noopener noreferrer"&gt;https://go.dev/ref/mem&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Go &lt;code&gt;sync&lt;/code&gt; package documentation: &lt;a href="https://pkg.go.dev/sync" rel="noopener noreferrer"&gt;https://pkg.go.dev/sync&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Go diagnostics and profiling tools: &lt;a href="https://go.dev/doc/diagnostics" rel="noopener noreferrer"&gt;https://go.dev/doc/diagnostics&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Go blog: Go scheduler and runtime notes: &lt;a href="https://go.dev/blog/" rel="noopener noreferrer"&gt;https://go.dev/blog/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>backend</category>
      <category>concurrency</category>
      <category>performance</category>
    </item>
    <item>
      <title>Beyond the Hype: My Production Playbook for Docker Swarm</title>
      <dc:creator>amir</dc:creator>
      <pubDate>Sat, 23 May 2026 14:22:13 +0000</pubDate>
      <link>https://dev.to/amirsefati/beyond-the-hype-my-production-playbook-for-docker-swarm-54d8</link>
      <guid>https://dev.to/amirsefati/beyond-the-hype-my-production-playbook-for-docker-swarm-54d8</guid>
      <description>&lt;p&gt;Every time container orchestration comes up, the conversation almost immediately turns into Kubernetes.&lt;/p&gt;

&lt;p&gt;And I understand why.&lt;/p&gt;

&lt;p&gt;Kubernetes is powerful. It has a huge ecosystem, strong abstractions, custom resources, operators, service meshes, admission controllers, and almost unlimited extensibility. For large organizations running complex multi-tenant platforms, Kubernetes often makes sense.&lt;/p&gt;

&lt;p&gt;But after years of working with Linux infrastructure, backend systems, private and public cloud environments, CI/CD pipelines, monitoring stacks, and production deployments, I learned something that is easy to forget:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The best infrastructure is not always the most powerful one. It is the one your team can operate safely under pressure.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is where Docker Swarm still deserves respect.&lt;/p&gt;

&lt;p&gt;I do not see Docker Swarm as a toy, a legacy fallback, or something only suitable for small demos. I see it as a pragmatic orchestration layer for teams that want production-grade container deployment without turning the orchestrator itself into the main project.&lt;/p&gt;

&lt;p&gt;In this article, I want to share how I think about Docker Swarm from a senior engineering perspective: not as a beginner tutorial, but as a practical playbook for architecture, security, deployment, monitoring, and real production trade-offs.&lt;/p&gt;

&lt;p&gt;No &lt;code&gt;hello-world&lt;/code&gt; examples.&lt;br&gt;
No hype.&lt;br&gt;
Just the things that matter when systems are running at 3 AM and someone has to debug them.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why I Still Take Docker Swarm Seriously
&lt;/h2&gt;

&lt;p&gt;The biggest advantage of Docker Swarm is not that it has more features than Kubernetes.&lt;/p&gt;

&lt;p&gt;It does not.&lt;/p&gt;

&lt;p&gt;The advantage is that it gives you enough orchestration with much less operational complexity.&lt;/p&gt;

&lt;p&gt;If your team already understands Docker and &lt;code&gt;docker-compose&lt;/code&gt;, Swarm feels natural. You can move from a single-machine Compose setup to a multi-node cluster without completely changing the mental model.&lt;/p&gt;

&lt;p&gt;That matters.&lt;/p&gt;

&lt;p&gt;In production, cognitive load is a real cost. Every abstraction you introduce must be learned, documented, monitored, upgraded, secured, and debugged. A more powerful platform can easily become a liability if the team cannot operate it confidently.&lt;/p&gt;

&lt;p&gt;For many real-world backend systems, the requirements are very clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;run multiple replicas of an API&lt;/li&gt;
&lt;li&gt;deploy without downtime&lt;/li&gt;
&lt;li&gt;roll back automatically when something fails&lt;/li&gt;
&lt;li&gt;isolate internal services from public traffic&lt;/li&gt;
&lt;li&gt;keep secrets out of images and environment files&lt;/li&gt;
&lt;li&gt;monitor node and container health&lt;/li&gt;
&lt;li&gt;scale horizontally when needed&lt;/li&gt;
&lt;li&gt;keep the architecture understandable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Docker Swarm can handle these requirements well.&lt;/p&gt;

&lt;p&gt;The important point is this: Swarm is not a replacement for Kubernetes in every scenario. But it can be the better engineering choice when simplicity, speed, and operational clarity matter more than infinite extensibility.&lt;/p&gt;


&lt;h2&gt;
  
  
  Docker Swarm vs Kubernetes: My Practical Comparison
&lt;/h2&gt;

&lt;p&gt;I do not like religious technology debates. Most tools are good or bad depending on the context.&lt;/p&gt;

&lt;p&gt;So instead of asking, “Which one is better?”, I prefer to ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What operational cost am I accepting, and what business or technical capability am I getting in return?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here is how I usually compare Docker Swarm and Kubernetes.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Docker Swarm&lt;/th&gt;
&lt;th&gt;Kubernetes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Operational complexity&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Learning curve&lt;/td&gt;
&lt;td&gt;Friendly if you know Docker Compose&lt;/td&gt;
&lt;td&gt;Steep, with many new abstractions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Control plane&lt;/td&gt;
&lt;td&gt;Built into Docker Engine&lt;/td&gt;
&lt;td&gt;Multiple components and etcd&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Service discovery&lt;/td&gt;
&lt;td&gt;Built-in DNS and VIP&lt;/td&gt;
&lt;td&gt;Built-in, but with more moving parts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Networking&lt;/td&gt;
&lt;td&gt;Overlay networks and routing mesh&lt;/td&gt;
&lt;td&gt;CNI-based networking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extensibility&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Very high&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ecosystem&lt;/td&gt;
&lt;td&gt;Smaller&lt;/td&gt;
&lt;td&gt;Massive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best fit&lt;/td&gt;
&lt;td&gt;Simple to medium production systems&lt;/td&gt;
&lt;td&gt;Large platforms and complex orchestration needs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For example, if you need custom operators, advanced autoscaling, complex RBAC policies, admission controllers, multi-tenant platform engineering, or service mesh integration, Kubernetes is usually the stronger choice.&lt;/p&gt;

&lt;p&gt;But if your goal is to run backend services, workers, reverse proxies, queues, internal APIs, and scheduled workloads across a small or medium cluster, Swarm may give you a cleaner path with fewer operational surprises.&lt;/p&gt;

&lt;p&gt;In my experience, many teams do not fail because their orchestrator lacks features. They fail because their infrastructure becomes too complex for the team operating it.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Mental Model: Managers, Workers, Services, and Tasks
&lt;/h2&gt;

&lt;p&gt;Before talking about production architecture, it is important to understand the Swarm model.&lt;/p&gt;

&lt;p&gt;A Swarm cluster has two main node roles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Manager nodes&lt;/strong&gt; maintain cluster state and make scheduling decisions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worker nodes&lt;/strong&gt; run the actual containers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Swarm, you do not usually think in terms of individual containers. You think in terms of &lt;strong&gt;services&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A service defines the desired state:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which image to run&lt;/li&gt;
&lt;li&gt;how many replicas should exist&lt;/li&gt;
&lt;li&gt;which networks it should join&lt;/li&gt;
&lt;li&gt;which secrets it needs&lt;/li&gt;
&lt;li&gt;which constraints control placement&lt;/li&gt;
&lt;li&gt;how updates and rollbacks should happen&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Swarm then creates &lt;strong&gt;tasks&lt;/strong&gt; to satisfy that desired state. A task is basically one running instance of a service.&lt;/p&gt;

&lt;p&gt;This desired-state model is one of the most important ideas in orchestration. You are no longer manually saying, “Run this container here.” You are saying, “I want five replicas of this service, and I want them to follow these rules.”&lt;/p&gt;

&lt;p&gt;The orchestrator continuously tries to make reality match that desired state.&lt;/p&gt;

&lt;p&gt;That sounds simple, but it changes how you design deployments.&lt;/p&gt;


&lt;h2&gt;
  
  
  Raft, Quorum, and Why Manager Count Matters
&lt;/h2&gt;

&lt;p&gt;Manager nodes in Docker Swarm use the Raft consensus algorithm to maintain cluster state.&lt;/p&gt;

&lt;p&gt;This is one of the most important production details.&lt;/p&gt;

&lt;p&gt;Raft needs quorum. In simple terms, the cluster must have a majority of managers available to make decisions.&lt;/p&gt;

&lt;p&gt;The formula is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;quorum = (number_of_managers / 2) + 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This leads to one very practical rule:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Do not run an even number of manager nodes.&lt;/p&gt;
&lt;/blockquote&gt;

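&lt;p&gt;With two managers, quorum is two, so losing either one leaves the cluster without quorum: running tasks keep running, but you cannot schedule or update services until quorum is restored. More generally, an even count adds a node without adding fault tolerance. Four managers tolerate one failure, exactly like three. An evenly split network partition can also leave neither side with a majority, which means an unavailable control plane.&lt;/p&gt;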
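
&lt;p&gt;To make the arithmetic concrete, here is a small Go sketch that applies the majority formula to a few manager counts. The function names &lt;code&gt;quorum&lt;/code&gt; and &lt;code&gt;tolerated&lt;/code&gt; are mine, not part of any Docker API.&lt;/p&gt;

```go
package main

import "fmt"

// quorum returns the number of managers that must be reachable:
// a strict majority, using integer (floor) division.
func quorum(managers int) int {
	return managers/2 + 1
}

// tolerated returns how many managers can fail while quorum still holds.
func tolerated(managers int) int {
	return managers - quorum(managers)
}

func main() {
	for _, n := range []int{1, 2, 3, 4, 5, 7} {
		fmt.Printf("managers=%d quorum=%d tolerated_failures=%d\n", n, quorum(n), tolerated(n))
	}
}
```

&lt;p&gt;The output shows that two managers tolerate zero failures, and that three and four managers both tolerate exactly one. The extra even-numbered manager buys no resilience.&lt;/p&gt;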

&lt;p&gt;For most production Swarm clusters, I prefer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1 manager&lt;/strong&gt; for small non-critical environments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3 managers&lt;/strong&gt; for serious production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5 managers&lt;/strong&gt; for larger or more resilient setups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I rarely see a good reason to go beyond five managers. More managers increase coordination overhead and do not automatically make your system better.&lt;/p&gt;

&lt;p&gt;Also, managers should be treated carefully. They are not just “normal servers.” They hold the cluster state. If they are overloaded, unstable, or poorly secured, the whole cluster becomes fragile.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Rule: Managers Manage, Workers Work
&lt;/h2&gt;

&lt;p&gt;One of the first production mistakes I try to avoid is running application workloads on manager nodes.&lt;/p&gt;

&lt;p&gt;Yes, Docker Swarm technically allows it.&lt;/p&gt;

&lt;p&gt;But in a serious environment, I prefer to drain manager nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker node update &lt;span class="nt"&gt;--availability&lt;/span&gt; drain &amp;lt;manager-node-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This stops Swarm from scheduling service tasks on that manager and moves any service tasks already running there to other nodes.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because manager nodes are responsible for orchestration, scheduling, cluster state, and Raft communication. If your application has a traffic spike, a memory leak, noisy logs, or heavy CPU usage, you do not want that to compete with the control plane.&lt;/p&gt;

&lt;p&gt;A clean separation is easier to reason about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;managers handle cluster decisions&lt;/li&gt;
&lt;li&gt;workers run application workloads&lt;/li&gt;
&lt;li&gt;edge nodes handle public traffic when needed&lt;/li&gt;
&lt;li&gt;monitoring nodes handle metrics and dashboards when possible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation may look boring, but boring infrastructure is usually what you want in production.&lt;/p&gt;




&lt;h2&gt;
  
  
  Networking: The Part You Must Understand Before Production
&lt;/h2&gt;

&lt;p&gt;Docker Swarm networking is powerful, but you need to understand what it is doing.&lt;/p&gt;

&lt;p&gt;Swarm gives you overlay networks. Services attached to the same overlay network can communicate with each other by service name.&lt;/p&gt;

&lt;p&gt;For example, your API can connect to Redis using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;redis:6379
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;instead of hardcoding an IP address.&lt;/p&gt;

&lt;p&gt;That is useful because tasks can move between nodes, containers can restart, and IP addresses can change. Service discovery hides that complexity.&lt;/p&gt;

&lt;p&gt;But there are two important networking concepts you must understand: &lt;strong&gt;VIP mode&lt;/strong&gt; and &lt;strong&gt;DNSRR mode&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  VIP Mode and the Routing Mesh
&lt;/h2&gt;

&lt;p&gt;By default, Swarm gives a service a Virtual IP, or VIP.&lt;/p&gt;

&lt;p&gt;When another service connects to that VIP, Docker uses internal load balancing to route traffic to one of the active tasks.&lt;/p&gt;

&lt;p&gt;This is convenient. It gives you service-level load balancing without installing an external load balancer for internal traffic.&lt;/p&gt;

&lt;p&gt;Swarm also has the routing mesh. When you publish a port, traffic can arrive on any node in the cluster and still be routed to a container running somewhere else.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
    &lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
    &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tcp&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the routing mesh, a request to port &lt;code&gt;8080&lt;/code&gt; on any node can reach the service.&lt;/p&gt;

&lt;p&gt;This is useful, but it also has trade-offs.&lt;/p&gt;

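&lt;p&gt;The biggest one: preserving the real client IP becomes difficult, because traffic that passes through the routing mesh usually reaches the container with an internal ingress-network address as its source instead of the client address. If you need accurate client IPs for rate limiting, security logs, fraud checks, or WAF rules, you need to be careful.&lt;/p&gt;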

&lt;p&gt;In those cases, I often prefer publishing ports in host mode at the edge:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;443&lt;/span&gt;
    &lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;443&lt;/span&gt;
    &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tcp&lt;/span&gt;
    &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;host&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



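&lt;p&gt;This bypasses the routing mesh for that published port: the port is opened only on nodes that actually run a task of the service, which gives you more predictable network behavior. The cost is less automatic distribution, because you have to direct traffic to the right nodes yourself.&lt;/p&gt;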

&lt;p&gt;In production, I usually prefer a clear edge layer using Nginx, Traefik, or HAProxy instead of exposing many services directly through the routing mesh.&lt;/p&gt;




&lt;h2&gt;
  
  
  DNSRR Mode: When You Want More Control
&lt;/h2&gt;

&lt;p&gt;VIP mode is simple, but sometimes you want the client or an upstream proxy to see all task IPs directly.&lt;/p&gt;

&lt;p&gt;That is where DNS round-robin mode can help.&lt;/p&gt;

&lt;p&gt;You can configure endpoint mode like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;endpoint_mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dnsrr&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this mode, DNS returns multiple task IPs instead of a single VIP.&lt;/p&gt;

&lt;p&gt;This can be useful when you have an external load balancer, a custom proxy layer, or a system that needs direct awareness of backend instances.&lt;/p&gt;

&lt;p&gt;But I would not use DNSRR everywhere by default. VIP mode is usually simpler. DNSRR is useful when you have a specific reason for it.&lt;/p&gt;

&lt;p&gt;That is a general production rule I follow:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Do not choose a more advanced mode just because it exists. Choose it when it solves a real operational problem.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Security: Swarm Gives You Good Primitives, But You Must Use Them Correctly
&lt;/h2&gt;

&lt;p&gt;Security in container orchestration is not one feature. It is a posture.&lt;/p&gt;

&lt;p&gt;Docker Swarm gives you useful security primitives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;mutual TLS between nodes&lt;/li&gt;
&lt;li&gt;encrypted manager communication&lt;/li&gt;
&lt;li&gt;overlay networks&lt;/li&gt;
&lt;li&gt;secrets&lt;/li&gt;
&lt;li&gt;service constraints&lt;/li&gt;
&lt;li&gt;least-exposure networking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But these primitives only help if the architecture uses them properly.&lt;/p&gt;

&lt;p&gt;A weak Swarm setup usually has the same problems I see in many weak Docker setups:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;public services attached to internal networks unnecessarily&lt;/li&gt;
&lt;li&gt;secrets stored in &lt;code&gt;.env&lt;/code&gt; files&lt;/li&gt;
&lt;li&gt;manager nodes running random workloads&lt;/li&gt;
&lt;li&gt;no resource limits&lt;/li&gt;
&lt;li&gt;no clear firewall rules&lt;/li&gt;
&lt;li&gt;too many published ports&lt;/li&gt;
&lt;li&gt;no monitoring for abnormal behavior&lt;/li&gt;
&lt;li&gt;no separation between frontend and backend networks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is not to make containers magically secure. The goal is to reduce blast radius.&lt;/p&gt;




&lt;h2&gt;
  
  
  Use Docker Secrets Instead of Environment Variables
&lt;/h2&gt;

&lt;p&gt;One of the easiest security improvements is to stop passing sensitive values through environment variables.&lt;/p&gt;

&lt;p&gt;Environment variables are convenient, but they are not ideal for secrets. They can appear in process inspection, debugging output, crash reports, or accidental logs. They can also be exposed through careless use of &lt;code&gt;docker inspect&lt;/code&gt; or application error reporting.&lt;/p&gt;

&lt;p&gt;Docker Secrets are a better default.&lt;/p&gt;

&lt;p&gt;A secret is mounted inside the container as a file, usually under:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/run/secrets/&amp;lt;secret_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;api&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-registry.example.com/backend-api:1.0.0&lt;/span&gt;
    &lt;span class="na"&gt;secrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;db_password&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DB_PASSWORD_FILE=/run/secrets/db_password&lt;/span&gt;

&lt;span class="na"&gt;secrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;db_password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;external&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then your application reads the password from the file path instead of directly from an environment variable.&lt;/p&gt;

&lt;p&gt;This pattern works especially well for Go, Node.js, Python, and PHP services because it is easy to implement a small helper that reads from &lt;code&gt;*_FILE&lt;/code&gt; variables.&lt;/p&gt;

&lt;p&gt;A simple Node.js example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fs&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;readSecret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;utf8&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dbPassword&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;readSecret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB_PASSWORD_FILE&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
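
&lt;p&gt;The same helper can be sketched in Go. This is a minimal version under a few assumptions: the function name &lt;code&gt;readSecret&lt;/code&gt; is hypothetical, it expects the &lt;code&gt;DB_PASSWORD_FILE&lt;/code&gt; variable from the stack file above, and it returns an error instead of crashing when the variable is missing.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// readSecret resolves a *_FILE environment variable to the trimmed
// contents of the file it points at (for example /run/secrets/db_password).
func readSecret(envVar string) (string, error) {
	path := os.Getenv(envVar)
	if path == "" {
		return "", fmt.Errorf("%s is not set", envVar)
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return "", fmt.Errorf("read secret from %s: %w", path, err)
	}
	// Secret files often end with a newline; strip surrounding whitespace.
	return strings.TrimSpace(string(data)), nil
}

func main() {
	password, err := readSecret("DB_PASSWORD_FILE")
	if err != nil {
		// A real service would usually refuse to start here.
		fmt.Println("secret not loaded:", err)
		return
	}
	// Never print the secret itself; report only that it was loaded.
	fmt.Println("loaded database password with", len(password), "characters")
}
```

&lt;p&gt;Returning an error keeps the decision with the caller: a service can fail fast at startup, while a test can assert on the missing-variable case.&lt;/p&gt;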



&lt;p&gt;This is not perfect security, but it is a much cleaner baseline.&lt;/p&gt;




&lt;h2&gt;
  
  
  Encrypt Overlay Networks When Needed
&lt;/h2&gt;

&lt;p&gt;By default, Docker Swarm encrypts control-plane communication between nodes.&lt;/p&gt;

&lt;p&gt;But application traffic on overlay networks is not encrypted by default. If your services communicate across untrusted networks, multiple data centers, or environments where internal traffic should be treated as sensitive, you should consider encrypted overlay networks.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;backend_net&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;overlay&lt;/span&gt;
    &lt;span class="na"&gt;driver_opts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;encrypted&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This enables IPsec encryption for traffic on that overlay network.&lt;/p&gt;

&lt;p&gt;There is a performance cost, so I do not enable it blindly for every network in every environment. But for sensitive backend communication, I prefer the overhead over silently passing internal traffic in plaintext.&lt;/p&gt;

&lt;p&gt;The senior-level decision is not “encrypt everything always.”&lt;/p&gt;

&lt;p&gt;The decision is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Which network paths carry sensitive data, and what is the cost of exposure compared to the cost of encryption?&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Network Segmentation: Do Not Put Everything on One Overlay Network
&lt;/h2&gt;

&lt;p&gt;A common mistake is creating one large overlay network and attaching every service to it.&lt;/p&gt;

&lt;p&gt;That is easy, but it creates unnecessary reachability.&lt;/p&gt;

&lt;p&gt;I prefer separating networks by trust boundary.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;edge_net&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;overlay&lt;/span&gt;

  &lt;span class="na"&gt;app_net&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;overlay&lt;/span&gt;

  &lt;span class="na"&gt;data_net&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;overlay&lt;/span&gt;
    &lt;span class="na"&gt;driver_opts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;encrypted&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then services are attached only where needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reverse proxy joins &lt;code&gt;edge_net&lt;/code&gt; and &lt;code&gt;app_net&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;API joins &lt;code&gt;app_net&lt;/code&gt; and &lt;code&gt;data_net&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;database joins only &lt;code&gt;data_net&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;monitoring joins only monitoring-related networks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is simple, but it makes lateral movement harder and reduces accidental exposure.&lt;/p&gt;

&lt;p&gt;In production, “can this service reach that service?” should have a deliberate answer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resource Limits Are Not Optional
&lt;/h2&gt;

&lt;p&gt;Containers without resource limits are a production risk.&lt;/p&gt;

&lt;p&gt;A memory leak, CPU-heavy request, bad loop, or unexpected traffic pattern can damage the entire node.&lt;/p&gt;

&lt;p&gt;In Swarm stacks, I like to define both reservations and limits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cpus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.25"&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;256M&lt;/span&gt;
    &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cpus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0"&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;768M&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reservations help the scheduler make better placement decisions.&lt;/p&gt;

&lt;p&gt;Limits protect the node from a noisy container.&lt;/p&gt;

&lt;p&gt;You need to tune these numbers based on real metrics, not guesses. But having no limits at all is usually worse than starting with conservative values and adjusting later.&lt;/p&gt;




&lt;h2&gt;
  
  
  Placement Constraints: Treat Nodes as Different Classes
&lt;/h2&gt;

&lt;p&gt;Not every node should run every workload.&lt;/p&gt;

&lt;p&gt;In real infrastructure, nodes often have different purposes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;public edge nodes&lt;/li&gt;
&lt;li&gt;backend worker nodes&lt;/li&gt;
&lt;li&gt;storage-heavy nodes&lt;/li&gt;
&lt;li&gt;GPU nodes&lt;/li&gt;
&lt;li&gt;monitoring nodes&lt;/li&gt;
&lt;li&gt;nodes in different availability zones&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Docker Swarm supports node labels and placement constraints.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker node update &lt;span class="nt"&gt;--label-add&lt;/span&gt; &lt;span class="nv"&gt;zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dmz worker-01
docker node update &lt;span class="nt"&gt;--label-add&lt;/span&gt; &lt;span class="nv"&gt;workload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;backend worker-02
docker node update &lt;span class="nt"&gt;--label-add&lt;/span&gt; &lt;span class="nv"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;fast worker-03
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then in your stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;placement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;constraints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;node.role == worker&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;node.labels.workload == backend&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you a clean way to express infrastructure intent.&lt;/p&gt;

&lt;p&gt;For example, I may want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nginx or Traefik only on edge nodes&lt;/li&gt;
&lt;li&gt;APIs only on backend workers&lt;/li&gt;
&lt;li&gt;databases only on storage nodes&lt;/li&gt;
&lt;li&gt;internal tools never on public-facing nodes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is one of those features that looks small but becomes very important as the cluster grows.&lt;/p&gt;




&lt;h2&gt;
  
  
  Zero-Downtime Deployment Strategy
&lt;/h2&gt;

&lt;p&gt;A production deployment should not be “stop old containers, start new containers, hope everything works.”&lt;/p&gt;

&lt;p&gt;Swarm gives you rolling update controls.&lt;/p&gt;

&lt;p&gt;A good baseline looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;update_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;parallelism&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;delay&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
    &lt;span class="na"&gt;order&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;start-first&lt;/span&gt;
    &lt;span class="na"&gt;failure_action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rollback&lt;/span&gt;
  &lt;span class="na"&gt;rollback_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;parallelism&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;delay&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
    &lt;span class="na"&gt;order&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;stop-first&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The most important setting here is often:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;order&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;start-first&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells Swarm to start the new task before stopping the old one. For APIs and web services, this helps reduce downtime during deployments, at the cost of briefly needing spare capacity for the extra task.&lt;/p&gt;

&lt;p&gt;But this only works well if your application is ready for it.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the app must start reliably&lt;/li&gt;
&lt;li&gt;health checks must be meaningful&lt;/li&gt;
&lt;li&gt;migrations must be backward-compatible&lt;/li&gt;
&lt;li&gt;old and new versions may run at the same time briefly&lt;/li&gt;
&lt;li&gt;the service must handle graceful shutdown&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The orchestrator can help with deployment, but it cannot fix bad application lifecycle design.&lt;/p&gt;
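
&lt;p&gt;The difference between the two update orders is easy to see in a toy simulation. The sketch below is not how Swarm is implemented. It assumes &lt;code&gt;parallelism: 1&lt;/code&gt; and that every new task becomes healthy immediately, and the &lt;code&gt;rolling_update&lt;/code&gt; function is a hypothetical helper that only counts running tasks after each step.&lt;/p&gt;

```python
# Toy simulation of a rolling update with parallelism=1 (illustration only).
# It tracks how many tasks are running after each start or stop step.


def rolling_update(replicas: int, order: str) -> list:
    """Return the number of running tasks after every step of the update."""
    if order not in ("start-first", "stop-first"):
        raise ValueError(f"unknown order: {order}")
    running = replicas
    timeline = [running]
    for _ in range(replicas):
        # start-first adds the new task before removing the old one;
        # stop-first removes the old task before adding the new one.
        steps = (1, -1) if order == "start-first" else (-1, 1)
        for delta in steps:
            running += delta
            timeline.append(running)
    return timeline


start_first = rolling_update(3, "start-first")
stop_first = rolling_update(3, "stop-first")
print("start-first:", start_first, "lowest capacity:", min(start_first))
print("stop-first: ", stop_first, "lowest capacity:", min(stop_first))
```

&lt;p&gt;With three replicas, start-first never drops below three running tasks but peaks at four, while stop-first dips to two. That peak is the window in which old and new versions serve traffic together.&lt;/p&gt;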




&lt;h2&gt;
  
  
  Health Checks: Do Not Fake Them
&lt;/h2&gt;

&lt;p&gt;A weak health check is worse than no health check because it creates false confidence.&lt;/p&gt;

&lt;p&gt;I do not like health checks that only confirm the process is alive.&lt;/p&gt;

&lt;p&gt;For example, this is weak:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET /health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;returning &lt;code&gt;200 OK&lt;/code&gt; no matter what.&lt;/p&gt;

&lt;p&gt;A better health check should answer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Can this instance actually serve traffic?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Depending on the service, that may include checking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP server readiness&lt;/li&gt;
&lt;li&gt;database connection&lt;/li&gt;
&lt;li&gt;Redis connection&lt;/li&gt;
&lt;li&gt;required config loaded&lt;/li&gt;
&lt;li&gt;dependency availability&lt;/li&gt;
&lt;li&gt;migration state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But be careful: a health check should not become too expensive. If every health check runs a heavy database query every few seconds, the health check becomes part of the problem.&lt;/p&gt;

&lt;p&gt;A practical pattern is separating liveness and readiness:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;liveness&lt;/strong&gt;: is the process alive?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;readiness&lt;/strong&gt;: is the service ready to receive traffic?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Docker's health check is a single check per container, with no separate liveness and readiness probes as in Kubernetes. In many Swarm setups, you therefore implement this split at the application and reverse-proxy layer.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wget"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-qO-"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8080/health"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
  &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3s&lt;/span&gt;
  &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;start_period&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;20s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The real work is not adding this YAML. The real work is making &lt;code&gt;/health&lt;/code&gt; meaningful.&lt;/p&gt;
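
&lt;p&gt;As a rough idea of what a meaningful endpoint can look like, here is a minimal Python sketch using only the standard library. It separates a cheap &lt;code&gt;/live&lt;/code&gt; route from a &lt;code&gt;/health&lt;/code&gt; route that runs dependency checks, and it caches each check for a few seconds so the probe itself stays cheap. The &lt;code&gt;CachedCheck&lt;/code&gt;, &lt;code&gt;readiness&lt;/code&gt;, &lt;code&gt;make_handler&lt;/code&gt;, and &lt;code&gt;serve&lt;/code&gt; names, the &lt;code&gt;/live&lt;/code&gt; path, and the five-second TTL are assumptions for illustration, not part of any framework.&lt;/p&gt;

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class CachedCheck:
    """Wrap a dependency probe and reuse its result for ttl seconds."""

    def __init__(self, name, probe, ttl=5.0, clock=time.monotonic):
        self.name = name
        self.probe = probe    # callable returning True when the dependency is usable
        self.ttl = ttl
        self.clock = clock
        self._expires = None
        self._ok = False

    def ok(self) -> bool:
        now = self.clock()
        if self._expires is None or now >= self._expires:
            try:
                self._ok = bool(self.probe())
            except Exception:
                self._ok = False  # a failing probe means "not ready", not a crash
            self._expires = now + self.ttl
        return self._ok


def readiness(checks) -> tuple:
    """Return (http_status, body) summarising all dependency checks."""
    results = {check.name: check.ok() for check in checks}
    status = 200 if all(results.values()) else 503
    return status, {"ready": status == 200, "checks": results}


def make_handler(checks):
    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/live":
                status, body = 200, {"alive": True}  # the process is up, nothing more
            elif self.path == "/health":
                status, body = readiness(checks)     # can this instance serve traffic?
            else:
                status, body = 404, {"error": "not found"}
            payload = json.dumps(body).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return HealthHandler


def serve(checks, port=8080):
    """Blocking helper; call it from your own entry point."""
    HTTPServer(("0.0.0.0", port), make_handler(checks)).serve_forever()
```

&lt;p&gt;In a real service the probes would be something cheap, such as a &lt;code&gt;SELECT 1&lt;/code&gt; against the database or a Redis &lt;code&gt;PING&lt;/code&gt;. Because &lt;code&gt;/health&lt;/code&gt; returns 503 when any check fails, the &lt;code&gt;wget&lt;/code&gt; command in the YAML above exits non-zero and Docker marks the task unhealthy after the configured retries.&lt;/p&gt;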




&lt;h2&gt;
  
  
  A Production-Ready Stack Example
&lt;/h2&gt;

&lt;p&gt;Here is a practical Swarm stack example with edge routing, backend isolation, secrets, placement constraints, update strategy, and resource controls.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3.8"&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;reverse_proxy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx:1.25-alpine&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;443&lt;/span&gt;
        &lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;443&lt;/span&gt;
        &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tcp&lt;/span&gt;
        &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;host&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;edge_net&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;app_net&lt;/span&gt;
    &lt;span class="na"&gt;configs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx_conf&lt;/span&gt;
        &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/nginx/nginx.conf&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;global&lt;/span&gt;
      &lt;span class="na"&gt;placement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;constraints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;node.role == worker&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;node.labels.zone == dmz&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;cpus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.10"&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;128M&lt;/span&gt;
        &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;cpus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.50"&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;256M&lt;/span&gt;
      &lt;span class="na"&gt;update_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;parallelism&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
        &lt;span class="na"&gt;delay&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
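        &lt;span class="c1"&gt;# a host-mode port can be bound by only one task per node, so start-first cannot schedule the new task&lt;/span&gt;
        &lt;span class="na"&gt;order&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;stop-first&lt;/span&gt;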
        &lt;span class="na"&gt;failure_action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rollback&lt;/span&gt;

  &lt;span class="na"&gt;api&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;registry.example.com/company/api:v2.4.1&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;app_net&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;data_net&lt;/span&gt;
    &lt;span class="na"&gt;secrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;db_password&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;jwt_private_key&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;NODE_ENV=production&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;PORT=8080&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DB_HOST=postgres&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DB_PASSWORD_FILE=/run/secrets/db_password&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;JWT_PRIVATE_KEY_FILE=/run/secrets/jwt_private_key&lt;/span&gt;
    &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wget"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-qO-"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8080/health"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3s&lt;/span&gt;
      &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
      &lt;span class="na"&gt;start_period&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;20s&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;6&lt;/span&gt;
      &lt;span class="na"&gt;placement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;constraints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;node.role == worker&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;node.labels.workload == backend&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;cpus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.25"&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;256M&lt;/span&gt;
        &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;cpus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.00"&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;768M&lt;/span&gt;
      &lt;span class="na"&gt;update_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;parallelism&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
        &lt;span class="na"&gt;delay&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
        &lt;span class="na"&gt;order&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;start-first&lt;/span&gt;
        &lt;span class="na"&gt;failure_action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rollback&lt;/span&gt;
      &lt;span class="na"&gt;rollback_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;parallelism&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
        &lt;span class="na"&gt;delay&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;

  &lt;span class="na"&gt;worker&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;registry.example.com/company/worker:v2.4.1&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;data_net&lt;/span&gt;
    &lt;span class="na"&gt;secrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;db_password&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;QUEUE_NAME=default&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DB_PASSWORD_FILE=/run/secrets/db_password&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
      &lt;span class="na"&gt;placement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;constraints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;node.role == worker&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;node.labels.workload == backend&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;cpus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.25"&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;256M&lt;/span&gt;
        &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;cpus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.50"&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1024M&lt;/span&gt;
      &lt;span class="na"&gt;update_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;parallelism&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
        &lt;span class="na"&gt;delay&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;15s&lt;/span&gt;
        &lt;span class="na"&gt;order&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;stop-first&lt;/span&gt;
        &lt;span class="na"&gt;failure_action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rollback&lt;/span&gt;

&lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;edge_net&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;overlay&lt;/span&gt;

  &lt;span class="na"&gt;app_net&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;overlay&lt;/span&gt;

  &lt;span class="na"&gt;data_net&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;overlay&lt;/span&gt;
    &lt;span class="na"&gt;driver_opts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;encrypted&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;

&lt;span class="na"&gt;secrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;db_password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;external&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;jwt_private_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;external&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;configs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;nginx_conf&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;external&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not a copy-paste solution for every company. But it shows the mindset:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;edge traffic is separated&lt;/li&gt;
&lt;li&gt;backend traffic is internal&lt;/li&gt;
&lt;li&gt;sensitive network traffic is encrypted&lt;/li&gt;
&lt;li&gt;secrets are mounted as files&lt;/li&gt;
&lt;li&gt;managers are avoided for workloads&lt;/li&gt;
&lt;li&gt;resources are controlled&lt;/li&gt;
&lt;li&gt;deployments have rollback behavior&lt;/li&gt;
&lt;li&gt;public ports are minimized&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the difference between “it runs” and “I can operate this under stress.”&lt;/p&gt;




&lt;h2&gt;
  
  
  Observability: What I Monitor in Swarm
&lt;/h2&gt;

&lt;p&gt;You cannot run production infrastructure with hope as your monitoring strategy.&lt;/p&gt;

&lt;p&gt;For Docker Swarm, I usually think about observability at four levels:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Node-level metrics&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Container-level metrics&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Service-level behavior&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Application-level signals&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A common stack is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prometheus&lt;/li&gt;
&lt;li&gt;Grafana&lt;/li&gt;
&lt;li&gt;Node Exporter&lt;/li&gt;
&lt;li&gt;cAdvisor&lt;/li&gt;
&lt;li&gt;Loki or another log aggregation system&lt;/li&gt;
&lt;li&gt;Alertmanager&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Node Exporter gives host metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU&lt;/li&gt;
&lt;li&gt;memory&lt;/li&gt;
&lt;li&gt;disk usage&lt;/li&gt;
&lt;li&gt;disk I/O&lt;/li&gt;
&lt;li&gt;filesystem pressure&lt;/li&gt;
&lt;li&gt;network traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;cAdvisor gives container metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;container CPU usage&lt;/li&gt;
&lt;li&gt;memory usage&lt;/li&gt;
&lt;li&gt;restart patterns&lt;/li&gt;
&lt;li&gt;throttling&lt;/li&gt;
&lt;li&gt;network traffic per container&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prometheus scrapes metrics, Grafana visualizes them, and Alertmanager handles notifications.&lt;/p&gt;

&lt;p&gt;A minimal global cAdvisor service can look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;cadvisor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcr.io/cadvisor/cadvisor:v0.49.1&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/:/rootfs:ro&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/var/run:/var/run:ro&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/sys:/sys:ro&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/var/lib/docker/:/var/lib/docker:ro&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;monitoring&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;global&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;64M&lt;/span&gt;
        &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;256M&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And Node Exporter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;node_exporter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prom/node-exporter:v1.7.0&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--path.rootfs=/host"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/:/host:ro,rslave"&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;monitoring&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;global&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For service discovery inside Swarm, I like using the &lt;code&gt;tasks.&amp;lt;service_name&amp;gt;&lt;/code&gt; DNS pattern.&lt;/p&gt;

&lt;p&gt;For example, Prometheus can resolve:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tasks.cadvisor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and discover all running cAdvisor tasks, as long as Prometheus is attached to the same overlay network as the service.&lt;/p&gt;

&lt;p&gt;This is simple and works well without introducing another service discovery system.&lt;/p&gt;
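
&lt;p&gt;Prometheus can do this lookup natively through its DNS-based service discovery, but the mechanism itself is just an A-record query. The Python sketch below shows what that lookup amounts to. The &lt;code&gt;discover_targets&lt;/code&gt; helper is hypothetical, and port 8080 is assumed because it is cAdvisor's default.&lt;/p&gt;

```python
import socket


def discover_targets(service: str, port: int, resolver=socket.getaddrinfo) -> list:
    """Resolve tasks.SERVICE and return sorted, de-duplicated 'ip:port' targets."""
    records = resolver(
        f"tasks.{service}", port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each record is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (ip, port) pair.
    addresses = {record[4][0] for record in records}
    return sorted(f"{ip}:{port}" for ip in addresses)
```

&lt;p&gt;Called as &lt;code&gt;discover_targets("cadvisor", 8080)&lt;/code&gt; from a container on the same overlay network, this should return one entry per running task. Outside the Swarm network the name does not resolve and &lt;code&gt;socket.gaierror&lt;/code&gt; is raised.&lt;/p&gt;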




&lt;h2&gt;
  
  
  The Alerts I Actually Care About
&lt;/h2&gt;

&lt;p&gt;Dashboards are useful, but alerts are what wake people up.&lt;/p&gt;

&lt;p&gt;I try to avoid noisy alerts. When alerts fire too often, they fade into the background and people stop trusting them.&lt;/p&gt;

&lt;p&gt;For Swarm, I care about alerts like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;manager node down&lt;/li&gt;
&lt;li&gt;quorum risk&lt;/li&gt;
&lt;li&gt;worker node down&lt;/li&gt;
&lt;li&gt;high memory pressure&lt;/li&gt;
&lt;li&gt;disk usage above threshold&lt;/li&gt;
&lt;li&gt;container restart loop&lt;/li&gt;
&lt;li&gt;service replica count below desired state&lt;/li&gt;
&lt;li&gt;high 5xx rate from the reverse proxy&lt;/li&gt;
&lt;li&gt;high latency on critical APIs&lt;/li&gt;
&lt;li&gt;certificate expiration approaching&lt;/li&gt;
&lt;li&gt;unusual increase in blocked or suspicious requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important thing is to connect infrastructure signals with application signals.&lt;/p&gt;

&lt;p&gt;For example, high CPU alone may not be urgent. But high CPU plus increased latency plus replica restarts is a real incident.&lt;/p&gt;

&lt;p&gt;This is where senior engineering judgment matters. Monitoring is not about collecting everything. It is about knowing which signals explain user impact.&lt;/p&gt;
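
&lt;p&gt;One way to encode that judgment is to route on combinations of signals rather than on single thresholds. The Python sketch below is only an illustration of the idea. The &lt;code&gt;Signals&lt;/code&gt; fields, the threshold values, and the &lt;code&gt;ok&lt;/code&gt; / &lt;code&gt;ticket&lt;/code&gt; / &lt;code&gt;page&lt;/code&gt; levels are all assumptions, and in practice this logic would usually live in alerting rules rather than application code.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Signals:
    cpu_percent: float       # CPU utilisation of the service or node
    p95_latency_ms: float    # latency on a critical API
    restarts_last_10m: int   # task restarts for the service


# Hypothetical thresholds; tune them against your own baselines.
CPU_HIGH = 85.0
LATENCY_HIGH_MS = 800.0
RESTARTS_HIGH = 3


def classify(s: Signals) -> str:
    """Map combined signals to 'ok', 'ticket', or 'page'."""
    slow = s.p95_latency_ms >= LATENCY_HIGH_MS
    infra_trouble = s.cpu_percent >= CPU_HIGH or s.restarts_last_10m >= RESTARTS_HIGH
    if slow and infra_trouble:
        return "page"    # users are affected and there is a likely cause
    if slow or infra_trouble:
        return "ticket"  # worth a look, not worth waking someone up
    return "ok"


print(classify(Signals(cpu_percent=95, p95_latency_ms=120, restarts_last_10m=0)))
print(classify(Signals(cpu_percent=95, p95_latency_ms=1200, restarts_last_10m=4)))
```

&lt;p&gt;High CPU on its own ends up as a ticket, while high CPU together with slow responses and restarts pages someone.&lt;/p&gt;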




&lt;h2&gt;
  
  
  Logging: Centralize It Early
&lt;/h2&gt;

&lt;p&gt;On a single server, &lt;code&gt;docker logs&lt;/code&gt; is fine.&lt;/p&gt;

&lt;p&gt;In a cluster, it is not enough.&lt;/p&gt;

&lt;p&gt;Once a service has replicas across several nodes, manual log inspection becomes painful. You need centralized logs.&lt;/p&gt;

&lt;p&gt;I prefer to standardize logs as structured JSON where possible.&lt;/p&gt;

&lt;p&gt;A good application log should include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;timestamp&lt;/li&gt;
&lt;li&gt;request ID or correlation ID&lt;/li&gt;
&lt;li&gt;service name&lt;/li&gt;
&lt;li&gt;environment&lt;/li&gt;
&lt;li&gt;log level&lt;/li&gt;
&lt;li&gt;route or operation name&lt;/li&gt;
&lt;li&gt;latency&lt;/li&gt;
&lt;li&gt;user or tenant ID when appropriate&lt;/li&gt;
&lt;li&gt;error code&lt;/li&gt;
&lt;li&gt;upstream dependency name&lt;/li&gt;
&lt;/ul&gt;
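
&lt;p&gt;A minimal way to produce logs in that shape with Python's standard &lt;code&gt;logging&lt;/code&gt; module is sketched below. The &lt;code&gt;JsonFormatter&lt;/code&gt; and &lt;code&gt;build_logger&lt;/code&gt; names and the exact JSON keys are assumptions chosen to mirror the list above, not a standard schema.&lt;/p&gt;

```python
import json
import logging
from datetime import datetime, timezone

# Optional per-request fields that callers may attach through `extra=`.
CONTEXT_FIELDS = ("request_id", "route", "latency_ms", "tenant_id", "error_code", "upstream")


class JsonFormatter(logging.Formatter):
    """Render every log record as one JSON object per line."""

    def __init__(self, service: str, environment: str):
        super().__init__()
        self.service = service
        self.environment = environment

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "environment": self.environment,
            "message": record.getMessage(),
        }
        for field in CONTEXT_FIELDS:
            value = getattr(record, field, None)
            if value is not None:
                entry[field] = value
        return json.dumps(entry)


def build_logger(service: str, environment: str, stream=None) -> logging.Logger:
    handler = logging.StreamHandler(stream)  # defaults to stderr, which Docker captures
    handler.setFormatter(JsonFormatter(service, environment))
    logger = logging.getLogger(service)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


log = build_logger("api", "production")
log.info(
    "request completed",
    extra={"request_id": "req-123", "route": "GET /orders", "latency_ms": 42},
)
```

&lt;p&gt;Writing one JSON object per line to the container's standard streams keeps the application unaware of the log backend. The collector on each node can then ship the lines to Loki or another system without parsing free-form text.&lt;/p&gt;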

&lt;p&gt;For the infrastructure side, reverse proxy logs are extremely valuable. They show:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;real client IP&lt;/li&gt;
&lt;li&gt;request path&lt;/li&gt;
&lt;li&gt;status code&lt;/li&gt;
&lt;li&gt;upstream response time&lt;/li&gt;
&lt;li&gt;user agent&lt;/li&gt;
&lt;li&gt;TLS details&lt;/li&gt;
&lt;li&gt;blocked paths&lt;/li&gt;
&lt;li&gt;suspicious probing attempts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is especially important for public APIs because a lot of malicious traffic starts with boring-looking HTTP requests.&lt;/p&gt;




&lt;h2&gt;
  
  
  Backup and Disaster Recovery: Do Not Ignore the Managers
&lt;/h2&gt;

&lt;p&gt;People often think about database backups but forget about orchestrator state.&lt;/p&gt;

&lt;p&gt;In Docker Swarm, manager state matters.&lt;/p&gt;

&lt;p&gt;You should have a recovery plan for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;losing a worker node&lt;/li&gt;
&lt;li&gt;losing a manager node&lt;/li&gt;
&lt;li&gt;losing quorum&lt;/li&gt;
&lt;li&gt;rotating join tokens&lt;/li&gt;
&lt;li&gt;restoring service definitions&lt;/li&gt;
&lt;li&gt;restoring secrets/configs&lt;/li&gt;
&lt;li&gt;rebuilding the cluster from infrastructure code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I do not like relying only on manual server state. I prefer keeping stack files, Nginx configs, Prometheus configs, deployment scripts, and infrastructure notes in Git.&lt;/p&gt;

&lt;p&gt;That way, if the cluster has to be rebuilt, the knowledge is not trapped inside one engineer's terminal history.&lt;/p&gt;

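&lt;p&gt;For manager backups, the cluster state (the Raft log and its encryption keys) lives under &lt;code&gt;/var/lib/docker/swarm&lt;/code&gt; on each manager. Docker's documented procedure is to stop the engine on one manager while the others keep quorum, copy that directory, and keep the unlock key somewhere safe if autolock is enabled. Check the details for your Docker version. The key point is not to improvise disaster recovery during the disaster.&lt;/p&gt;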

&lt;p&gt;Test the recovery path before you need it.&lt;/p&gt;




&lt;h2&gt;
  
  
  CI/CD: Keep Deployments Predictable
&lt;/h2&gt;

&lt;p&gt;A simple Swarm deployment pipeline can be very effective.&lt;/p&gt;

&lt;p&gt;A common flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build the Docker image.&lt;/li&gt;
&lt;li&gt;Tag it with a commit SHA or version.&lt;/li&gt;
&lt;li&gt;Push it to a registry.&lt;/li&gt;
&lt;li&gt;SSH into a manager or use a controlled deployment runner.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;docker stack deploy&lt;/code&gt; with the updated image.&lt;/li&gt;
&lt;li&gt;Verify service state.&lt;/li&gt;
&lt;li&gt;Check health and logs.&lt;/li&gt;
&lt;li&gt;Roll back if needed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker stack deploy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--with-registry-auth&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-c&lt;/span&gt; docker-compose.prod.yml &lt;span class="se"&gt;\&lt;/span&gt;
  my_app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
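
&lt;p&gt;The "verify service state" step is worth automating, because &lt;code&gt;docker stack deploy&lt;/code&gt; by default returns before the tasks have converged. A small Python sketch of that check is shown below. It shells out to &lt;code&gt;docker stack services&lt;/code&gt; with a format template and compares running against desired replicas. The &lt;code&gt;parse_replicas&lt;/code&gt;, &lt;code&gt;not_converged&lt;/code&gt;, and &lt;code&gt;check_stack&lt;/code&gt; helpers are hypothetical names.&lt;/p&gt;

```python
import subprocess


def parse_replicas(output: str) -> dict:
    """Parse lines like 'my_app_api 6/6' into {name: (running, desired)}."""
    services = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Anything after the second column (extra notes) is ignored.
        name, replicas = line.split()[:2]
        running, desired = replicas.split("/")
        services[name] = (int(running), int(desired))
    return services


def not_converged(services: dict) -> list:
    """Names of services whose running count differs from the desired count."""
    return sorted(
        name for name, (running, desired) in services.items() if running != desired
    )


def check_stack(stack: str) -> list:
    """Query the Docker CLI for one stack; has to run against a manager node."""
    result = subprocess.run(
        ["docker", "stack", "services", stack, "--format", "{{.Name}} {{.Replicas}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return not_converged(parse_replicas(result.stdout))
```

&lt;p&gt;A pipeline could call &lt;code&gt;check_stack("my_app")&lt;/code&gt; in a loop with a timeout and fail the job if the list is still non-empty at the end.&lt;/p&gt;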



&lt;p&gt;I strongly prefer immutable image tags for production deployments.&lt;/p&gt;

&lt;p&gt;Avoid this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-api:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Prefer this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-api:2026-05-23-a1b2c3d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-api:git-sha-a1b2c3d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When something breaks, you need to know exactly what is running.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;latest&lt;/code&gt; is convenient until you are debugging an incident and cannot prove which code is actually deployed.&lt;/p&gt;
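
&lt;p&gt;A pipeline can enforce this convention with a few lines of code. The Python sketch below builds a tag in the date-plus-SHA style shown above and rejects floating references before a deploy. The &lt;code&gt;build_tag&lt;/code&gt; and &lt;code&gt;assert_immutable&lt;/code&gt; helpers and the list of floating tag names are assumptions for illustration.&lt;/p&gt;

```python
import re
from datetime import date

SHA_PATTERN = re.compile(r"^[0-9a-f]{7,40}$")
# Tag names that are commonly moved between builds (an assumed, non-exhaustive list).
FLOATING_TAGS = {"latest", "stable", "main", "master"}


def build_tag(commit_sha: str, build_date: date) -> str:
    """Return a tag like '2026-05-23-a1b2c3d' from a commit SHA and a build date."""
    if not SHA_PATTERN.match(commit_sha):
        raise ValueError(f"not a hex commit SHA: {commit_sha!r}")
    return f"{build_date.isoformat()}-{commit_sha[:7]}"


def assert_immutable(image: str) -> str:
    """Reject image references that have no tag or use a floating tag."""
    _name, sep, tag = image.rpartition(":")
    # A slash after the last colon means the colon belonged to a registry port.
    if not sep or "/" in tag or tag in FLOATING_TAGS:
        raise ValueError(f"refusing floating image reference: {image}")
    return image


print(build_tag("a1b2c3d4e5f6", date(2026, 5, 23)))
```

&lt;p&gt;This only enforces the naming convention. A registry can still allow a tag to be overwritten, so teams that need a hard guarantee usually also enable tag immutability in the registry or deploy by digest.&lt;/p&gt;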




&lt;h2&gt;
  
  
  Database Migrations: The Dangerous Part of Zero-Downtime Deployments
&lt;/h2&gt;

&lt;p&gt;Orchestrators make it easy to roll out new containers.&lt;/p&gt;

&lt;p&gt;They do not automatically make database changes safe.&lt;/p&gt;

&lt;p&gt;This is where many “zero-downtime” strategies fail.&lt;/p&gt;

&lt;p&gt;The safe pattern is usually expand-and-contract:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add new schema changes in a backward-compatible way.&lt;/li&gt;
&lt;li&gt;Deploy application code that can work with both old and new schema.&lt;/li&gt;
&lt;li&gt;Backfill data if needed.&lt;/li&gt;
&lt;li&gt;Switch reads/writes to the new structure.&lt;/li&gt;
&lt;li&gt;Remove old columns or tables later.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Avoid deployments where the new application requires a schema change that immediately breaks the old version.&lt;/p&gt;

&lt;p&gt;During rolling updates, old and new containers may run at the same time. Your database design must tolerate that.&lt;/p&gt;

&lt;p&gt;This is not a Docker Swarm problem. This is a distributed systems deployment problem.&lt;/p&gt;
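&lt;p&gt;Here is a minimal sketch of the expand, dual-write, and backfill steps, using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module so it runs anywhere. The &lt;code&gt;users&lt;/code&gt; table, the column names, and the helper functions are assumptions made up for the illustration; a real system would use its own migration tooling.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT NOT NULL)")


def old_app_insert(name):
    # Old containers only know about the original column.
    conn.execute("INSERT INTO users (fullname) VALUES (?)", (name,))


def new_app_insert(name):
    # New containers write both columns so old readers keep working.
    conn.execute(
        "INSERT INTO users (fullname, display_name) VALUES (?, ?)", (name, name)
    )


def new_app_read_names():
    # Fall back to the old column until the backfill has finished.
    rows = conn.execute(
        "SELECT COALESCE(display_name, fullname) FROM users ORDER BY id"
    )
    return [row[0] for row in rows]


old_app_insert("Ada")  # row written before the migration

# 1. Expand: the new column is nullable, so inserts from old code still succeed.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

old_app_insert("Linus")  # an old container is still running mid-rollout
new_app_insert("Grace")  # 2. a new container dual-writes
names_mid_rollout = new_app_read_names()

# 3. Backfill rows that old code wrote without the new column.
conn.execute("UPDATE users SET display_name = fullname WHERE display_name IS NULL")
remaining_nulls = conn.execute(
    "SELECT COUNT(*) FROM users WHERE display_name IS NULL"
).fetchone()[0]

print(names_mid_rollout, remaining_nulls)  # ['Ada', 'Linus', 'Grace'] 0
```

&lt;p&gt;The contract step, dropping &lt;code&gt;fullname&lt;/code&gt;, is deliberately left out. It belongs in a later release, once no running container reads or writes the old column.&lt;/p&gt;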




&lt;h2&gt;
  
  
  When I Would Not Use Docker Swarm
&lt;/h2&gt;

&lt;p&gt;I like Docker Swarm, but I would not use it everywhere.&lt;/p&gt;

&lt;p&gt;I would probably choose Kubernetes if I needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;custom operators&lt;/li&gt;
&lt;li&gt;advanced autoscaling patterns&lt;/li&gt;
&lt;li&gt;strong multi-tenancy&lt;/li&gt;
&lt;li&gt;complex RBAC and policy enforcement&lt;/li&gt;
&lt;li&gt;service mesh as a core requirement&lt;/li&gt;
&lt;li&gt;a large platform team&lt;/li&gt;
&lt;li&gt;advanced ecosystem integrations&lt;/li&gt;
&lt;li&gt;a managed control plane from a cloud provider&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I would also avoid Swarm if the organization already has a mature Kubernetes platform and the team knows how to operate it well.&lt;/p&gt;

&lt;p&gt;Technology choice should respect organizational reality.&lt;/p&gt;

&lt;p&gt;A simple tool in the wrong organization can still fail. A complex tool in a mature organization can work beautifully.&lt;/p&gt;




&lt;h2&gt;
  
  
  When I Would Choose Docker Swarm
&lt;/h2&gt;

&lt;p&gt;I would choose Docker Swarm when the system needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;simple multi-node orchestration&lt;/li&gt;
&lt;li&gt;predictable deployments&lt;/li&gt;
&lt;li&gt;low operational overhead&lt;/li&gt;
&lt;li&gt;easy onboarding for Docker-experienced engineers&lt;/li&gt;
&lt;li&gt;internal APIs and workers&lt;/li&gt;
&lt;li&gt;small to medium clusters&lt;/li&gt;
&lt;li&gt;strong enough orchestration without platform engineering complexity&lt;/li&gt;
&lt;li&gt;fast recovery and simple mental models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For many backend teams, this is enough.&lt;/p&gt;

&lt;p&gt;And “enough” is underrated.&lt;/p&gt;

&lt;p&gt;In engineering, the goal is not to maximize architecture complexity. The goal is to deliver reliable systems that the team can understand, secure, monitor, and improve.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Final Production Checklist
&lt;/h2&gt;

&lt;p&gt;If I were reviewing a Docker Swarm setup before production, I would ask these questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do we have 3 or 5 managers for production?&lt;/li&gt;
&lt;li&gt;Are manager nodes drained from application workloads?&lt;/li&gt;
&lt;li&gt;Are public ports minimized?&lt;/li&gt;
&lt;li&gt;Are edge, app, data, and monitoring networks separated?&lt;/li&gt;
&lt;li&gt;Are sensitive overlay networks encrypted where needed?&lt;/li&gt;
&lt;li&gt;Are secrets stored with Docker Secrets instead of &lt;code&gt;.env&lt;/code&gt; files?&lt;/li&gt;
&lt;li&gt;Do services have CPU and memory limits?&lt;/li&gt;
&lt;li&gt;Do critical services have health checks?&lt;/li&gt;
&lt;li&gt;Do deployments use rolling updates and rollback behavior?&lt;/li&gt;
&lt;li&gt;Are image tags immutable?&lt;/li&gt;
&lt;li&gt;Are logs centralized?&lt;/li&gt;
&lt;li&gt;Are metrics collected from nodes and containers?&lt;/li&gt;
&lt;li&gt;Are alerts meaningful and not noisy?&lt;/li&gt;
&lt;li&gt;Is there a documented recovery plan?&lt;/li&gt;
&lt;li&gt;Can a new engineer understand the architecture without reading 50 pages of hidden tribal knowledge?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last question matters more than people think.&lt;/p&gt;

&lt;p&gt;If the infrastructure only works because one person remembers every command, it is not production-ready. It is a personal project running on production servers.&lt;/p&gt;
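&lt;p&gt;The first checklist item, 3 or 5 managers, comes from Raft majority arithmetic: the managers need a majority to agree, so an even count adds cost without adding fault tolerance. A small sketch of that arithmetic, with hypothetical helper names:&lt;/p&gt;

```python
def raft_quorum(managers):
    """Smallest number of managers that still forms a majority."""
    return managers // 2 + 1


def tolerated_failures(managers):
    """How many managers can be lost while a majority remains."""
    return managers - raft_quorum(managers)


for count in (1, 2, 3, 4, 5, 7):
    print(count, raft_quorum(count), tolerated_failures(count))
# 1 1 0
# 2 2 0
# 3 2 1
# 4 3 1
# 5 3 2
# 7 4 3
```

&lt;p&gt;Going from 3 to 4 managers still tolerates only one failure, which is why odd counts are the usual recommendation.&lt;/p&gt;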




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Docker Swarm is not the loudest technology in the room anymore.&lt;/p&gt;

&lt;p&gt;But that does not make it useless.&lt;/p&gt;

&lt;p&gt;For the right team and the right workload, it is still a clean, stable, and practical way to run containers in production. It gives you orchestration, service discovery, rolling updates, secrets, overlay networks, and scheduling without forcing you to adopt the full complexity of Kubernetes.&lt;/p&gt;

&lt;p&gt;My general philosophy is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Keep infrastructure as simple as possible, but never simpler than production requires.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Docker Swarm fits that philosophy well.&lt;/p&gt;

&lt;p&gt;It is not about avoiding Kubernetes because Kubernetes is bad. It is about choosing the smallest reliable system that can do the job.&lt;/p&gt;

&lt;p&gt;And in many real production environments, that is exactly the kind of decision that keeps systems stable, teams productive, and pagers quiet.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>devops</category>
      <category>infrastructure</category>
      <category>security</category>
    </item>
    <item>
      <title>How I Analyzed the Linux Kernel's Deadliest Logic Bug: A Deep Dive into Dirty Pipe (CVE-2022-0847)</title>
      <dc:creator>amir</dc:creator>
      <pubDate>Fri, 22 May 2026 06:34:56 +0000</pubDate>
      <link>https://dev.to/amirsefati/how-i-analyzed-the-linux-kernels-deadliest-logic-bug-a-deep-dive-into-dirty-pipe-cve-2022-0847-57f5</link>
      <guid>https://dev.to/amirsefati/how-i-analyzed-the-linux-kernels-deadliest-logic-bug-a-deep-dive-into-dirty-pipe-cve-2022-0847-57f5</guid>
      <description>&lt;p&gt;As developers, we often think of kernel exploits as highly complex assembly-level wizardry, heap grooming, or race-condition battles. But recently, I decided to sit down, pull up the Linux kernel source code, and trace the infamous &lt;strong&gt;Dirty Pipe&lt;/strong&gt; vulnerability, &lt;strong&gt;CVE-2022-0847&lt;/strong&gt;, line by line.&lt;/p&gt;

&lt;p&gt;What I found surprised me: a single uninitialized struct member in the pipe and &lt;code&gt;splice()&lt;/code&gt; path allowed an unprivileged local user to write into read-only files through the Page Cache.&lt;/p&gt;

&lt;p&gt;No race conditions.&lt;br&gt;&lt;br&gt;
No classic memory corruption.&lt;br&gt;&lt;br&gt;
No heap spraying.&lt;br&gt;&lt;br&gt;
Just one stale flag in a reused kernel structure.&lt;/p&gt;

&lt;p&gt;This is my technical post-mortem and step-by-step code analysis of how this elegant logic bug worked.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Conceptual Backstory: Page Cache, Pipes, and &lt;code&gt;splice()&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Before looking at the buggy code, we need to understand the three Linux kernel mechanisms that collided to create Dirty Pipe:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The Page Cache&lt;/li&gt;
&lt;li&gt;Pipe buffers&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;splice()&lt;/code&gt; system call&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;
  
  
  1. The Page Cache: RAM as a Disk Mirror
&lt;/h2&gt;

&lt;p&gt;To avoid slow disk reads, Linux keeps recently accessed file data in memory. This memory-backed representation is called the &lt;strong&gt;Page Cache&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When multiple processes read the same file, for example &lt;code&gt;/etc/passwd&lt;/code&gt;, the kernel does not necessarily load separate copies for every process. Instead, it can map those processes to the same physical memory page that represents the file's cached content.&lt;/p&gt;

&lt;p&gt;Normally, the kernel protects that shared page. A &lt;code&gt;write()&lt;/code&gt; on a file descriptor opened read-only simply fails, and a write to a private mapping of the file goes through the kernel's Copy-on-Write mechanism:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The original page remains unchanged.&lt;/li&gt;
&lt;li&gt;A private copy is created.&lt;/li&gt;
&lt;li&gt;The process writes to that private copy.&lt;/li&gt;
&lt;li&gt;The read-only backing file remains safe.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the expected contract.&lt;/p&gt;

&lt;p&gt;Dirty Pipe broke that contract.&lt;/p&gt;
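&lt;p&gt;One way to observe the Copy-on-Write contract from user space is a private file mapping. The sketch below uses Python's &lt;code&gt;mmap&lt;/code&gt; module with &lt;code&gt;ACCESS_COPY&lt;/code&gt; on a temporary file; the file contents are made up for the example, and nothing here touches a real system file.&lt;/p&gt;

```python
import mmap
import os
import tempfile

# Create a small throwaway file to stand in for a read-only target.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"root:x:0:0")
    path = handle.name

with open(path, "rb") as handle:
    # ACCESS_COPY requests a private, copy-on-write mapping.
    private = mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_COPY)
    private[0:4] = b"evil"  # lands in a private copy of the page
    seen_in_mapping = bytes(private[0:4])
    private.close()

with open(path, "rb") as handle:
    on_disk = handle.read()  # the file itself is unchanged

os.remove(path)
print(seen_in_mapping, on_disk)  # b'evil' b'root:x:0:0'
```

&lt;p&gt;The write is visible only inside the mapping. Other readers of the file, and the file on disk, keep the original bytes.&lt;/p&gt;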


&lt;h2&gt;
  
  
  2. The Pipe Buffer
&lt;/h2&gt;

&lt;p&gt;In Linux, a pipe is implemented as a circular ring of buffers represented internally by &lt;code&gt;struct pipe_inode_info&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Each slot in that ring is a &lt;code&gt;struct pipe_buffer&lt;/code&gt;, defined in &lt;code&gt;include/linux/pipe_fs_i.h&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;pipe_buffer&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;pipe_buf_operations&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// &amp;lt;-- the field that matters here&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;private&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important field is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



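&lt;p&gt;When data is written to a pipe with &lt;code&gt;write()&lt;/code&gt;, the kernel allocates page-sized buffers, usually 4 KB. Unless the pipe is in packet mode (&lt;code&gt;O_DIRECT&lt;/code&gt;), each new buffer is marked as mergeable so that later small writes can fill the rest of the page. The kernel does this by setting:&lt;br&gt;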
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;PIPE_BUF_FLAG_CAN_MERGE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That flag tells the kernel:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;New writes may be appended into the remaining space of this existing pipe buffer instead of allocating a new one.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That behavior is perfectly valid for normal anonymous pipe pages.&lt;/p&gt;

&lt;p&gt;The problem appears when a pipe buffer stops pointing to a normal anonymous pipe page and starts pointing to a page from the Page Cache.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. The &lt;code&gt;splice()&lt;/code&gt; Syscall: Zero-Copy Magic
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;splice()&lt;/code&gt; system call is a Linux performance optimization. It moves data between file descriptors and pipes without copying data back and forth through user space.&lt;/p&gt;

&lt;p&gt;Instead of doing this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;file -&amp;gt; kernel buffer -&amp;gt; user space -&amp;gt; kernel pipe buffer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;splice()&lt;/code&gt; can do something closer to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;file page cache -&amp;gt; pipe buffer reference
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is powerful because it avoids unnecessary copying.&lt;/p&gt;

&lt;p&gt;But it also means a pipe buffer can reference a page that belongs to the Page Cache of a file.&lt;/p&gt;

&lt;p&gt;Internally, one of the relevant functions is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;copy_page_to_iter_pipe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function creates a pipe buffer that references the page containing file data.&lt;/p&gt;

&lt;p&gt;That is where the bug lived.&lt;/p&gt;




&lt;h2&gt;
  
  
  Digging Into the Code: The Bug in &lt;code&gt;lib/iov_iter.c&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;When &lt;code&gt;splice()&lt;/code&gt; is used to map file data into a pipe, the kernel executes code similar to this vulnerable version of &lt;code&gt;copy_page_to_iter_pipe()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="nf"&gt;copy_page_to_iter_pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                     &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;iov_iter&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// ... validation steps ...&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;pipe_inode_info&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;pipe_buffer&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;bufs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

    &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;page_cache_pipe_buf_ops&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;get_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="c1"&gt;// What is missing here?&lt;/span&gt;

    &lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;head&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The missing line is the entire bug:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;flags&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;buf-&amp;gt;flags&lt;/code&gt; was never initialized or cleared.&lt;/p&gt;

&lt;p&gt;Because pipes are implemented as circular rings, the kernel reuses old &lt;code&gt;pipe_buffer&lt;/code&gt; structures. If a previous operation left &lt;code&gt;PIPE_BUF_FLAG_CAN_MERGE&lt;/code&gt; set, that stale flag could remain active when the same buffer slot was reused for Page Cache-backed file data.&lt;/p&gt;

&lt;p&gt;That means a buffer referencing a read-only file page could accidentally still look mergeable.&lt;/p&gt;

&lt;p&gt;That is the core of Dirty Pipe.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Intersection of Two Commits
&lt;/h2&gt;

&lt;p&gt;One thing I found especially interesting is that Dirty Pipe was not born from one obviously dangerous commit.&lt;/p&gt;

&lt;p&gt;It came from the interaction of two separate changes:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Commit &lt;code&gt;241699cd72a8&lt;/code&gt; — October 2016
&lt;/h3&gt;

&lt;p&gt;This commit, which shipped in Linux 4.9, introduced the pipe-backed &lt;code&gt;iov_iter&lt;/code&gt; flavour and added &lt;code&gt;copy_page_to_iter_pipe()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The function did not initialize &lt;code&gt;buf-&amp;gt;flags&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;At that time, this was not immediately exploitable because the dangerous merge flag did not exist yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Commit &lt;code&gt;f6dd975583bd&lt;/code&gt; — May 2020
&lt;/h3&gt;

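&lt;p&gt;This commit, which shipped in Linux 5.8, added the per-buffer &lt;code&gt;PIPE_BUF_FLAG_CAN_MERGE&lt;/code&gt; flag. Mergeability was now decided by a bit in &lt;code&gt;buf-&amp;gt;flags&lt;/code&gt;.&lt;/p&gt;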

&lt;p&gt;Suddenly, an old uninitialized field became security-critical.&lt;/p&gt;

&lt;p&gt;That is the scary engineering lesson:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A harmless-looking initialization bug can become a critical vulnerability years later when another subsystem evolves.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Step-by-Step: How the Exploit Mechanics Worked
&lt;/h2&gt;

&lt;p&gt;At a high level, the exploit forced the kernel into a bad state:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Prepare a pipe so all its internal buffer slots have &lt;code&gt;PIPE_BUF_FLAG_CAN_MERGE&lt;/code&gt; set.&lt;/li&gt;
&lt;li&gt;Drain the pipe so it becomes logically empty.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;splice()&lt;/code&gt; to attach a read-only file's Page Cache page to a reused pipe buffer.&lt;/li&gt;
&lt;li&gt;Because &lt;code&gt;buf-&amp;gt;flags&lt;/code&gt; was not cleared, the stale merge flag remains.&lt;/li&gt;
&lt;li&gt;A later write to the pipe is merged into the Page Cache page.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result: the in-memory cached representation of a read-only file is modified.&lt;/p&gt;

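&lt;p&gt;The file on disk is not written directly. The modification happens in the Page Cache, and this path does not mark the page dirty, so the change may never be written back. Every process that reads the file while the page stays cached still sees the modified bytes.&lt;/p&gt;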
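&lt;p&gt;To make the five steps concrete, here is a toy user-space model of the ring in Python. It is a simulation of the logic only, not kernel code and not an exploit. &lt;code&gt;Slot&lt;/code&gt;, &lt;code&gt;ToyPipe&lt;/code&gt;, the 8-byte "page", and the boolean &lt;code&gt;can_merge&lt;/code&gt; field are all assumptions invented for this sketch.&lt;/p&gt;

```python
PAGE_SIZE = 8  # tiny "page" so the result is easy to read


class Slot:
    """Toy stand-in for struct pipe_buffer."""

    def __init__(self):
        self.page = None        # bytearray that the slot references
        self.length = 0         # bytes of the page currently in use
        self.can_merge = False  # models PIPE_BUF_FLAG_CAN_MERGE


class ToyPipe:
    """Hypothetical model of the ring; capacity checks are omitted."""

    def __init__(self, slots, clear_flags_on_splice):
        self.ring = [Slot() for _ in range(slots)]
        self.head = 0  # index of the next slot to fill
        self.tail = 0  # index of the next slot to read
        self.clear_flags_on_splice = clear_flags_on_splice

    def write(self, data):
        if self.head != self.tail:
            last = self.ring[(self.head - 1) % len(self.ring)]
            room = PAGE_SIZE - last.length
            if last.can_merge and room >= len(data):
                # Append into whatever page the newest slot references.
                last.page[last.length:last.length + len(data)] = data
                last.length += len(data)
                return
        slot = self.ring[self.head % len(self.ring)]
        slot.page = bytearray(PAGE_SIZE)  # fresh anonymous page
        slot.page[:len(data)] = data
        slot.length = len(data)
        slot.can_merge = True
        self.head += 1

    def drain(self):
        # Reading empties the pipe but leaves the slot metadata as it was.
        self.tail = self.head

    def splice_from(self, file_page, nbytes):
        slot = self.ring[self.head % len(self.ring)]
        slot.page = file_page  # a reference to the cached page, not a copy
        slot.length = nbytes
        if self.clear_flags_on_splice:
            slot.can_merge = False  # models the missing initialization
        self.head += 1


def run(clear_flags_on_splice):
    pipe = ToyPipe(slots=2, clear_flags_on_splice=clear_flags_on_splice)
    for _ in range(2):
        pipe.write(b"A" * PAGE_SIZE)  # step 1: every slot ends up flagged
    pipe.drain()                      # step 2: logically empty
    cached = bytearray(b"readonly")   # stands in for a Page Cache page
    pipe.splice_from(cached, 1)       # steps 3 and 4: reuse a stale slot
    pipe.write(b"XYZ")                # step 5: merged or not?
    return bytes(cached)


vulnerable = run(clear_flags_on_splice=False)
patched = run(clear_flags_on_splice=True)
print(vulnerable, patched)  # b'rXYZonly' b'readonly'
```

&lt;p&gt;In the toy model, the stale flag lets the final write land inside the "cached" page, starting right after the one byte that was spliced in. With the flag cleared, the same write goes to a fresh page and the cached bytes stay intact.&lt;/p&gt;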




&lt;h2&gt;
  
  
  Stage 1: Polluting the Pipe Buffers
&lt;/h2&gt;

&lt;p&gt;The first step is to fill the pipe. This causes the kernel to allocate pipe buffers and mark them mergeable.&lt;/p&gt;

&lt;p&gt;A simplified version looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;capacity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fcntl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;F_GETPIPE_SZ&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;dummy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sc"&gt;'A'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;capacity&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dummy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dummy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;dummy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After this stage, the internal pipe buffer slots have been used and may contain &lt;code&gt;PIPE_BUF_FLAG_CAN_MERGE&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 2: Draining the Pipe
&lt;/h2&gt;

&lt;p&gt;Next, the pipe is drained:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;capacity&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dummy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dummy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;dummy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the pipe is logically empty.&lt;/p&gt;

&lt;p&gt;But the kernel's internal &lt;code&gt;pipe_buffer&lt;/code&gt; metadata is still there, ready to be reused.&lt;/p&gt;

&lt;p&gt;The stale flags may still exist in those reused slots.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 3: Splicing File Data into the Pipe
&lt;/h2&gt;

&lt;p&gt;Then &lt;code&gt;splice()&lt;/code&gt; is used to move data from a target file into the pipe without copying it through user space:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;fd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/path/to/read-only-file"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;O_RDONLY&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;loff_t&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;splice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Behind the scenes, the kernel creates a pipe buffer that references the file's Page Cache page.&lt;/p&gt;

&lt;p&gt;But because &lt;code&gt;buf-&amp;gt;flags&lt;/code&gt; was not cleared, the buffer may still have the old merge flag.&lt;/p&gt;

&lt;p&gt;Now we have a dangerous state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pipe_buffer.page  -&amp;gt; file Page Cache page
pipe_buffer.flags -&amp;gt; PIPE_BUF_FLAG_CAN_MERGE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That should never happen.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 4: Writing into the Pipe
&lt;/h2&gt;

&lt;p&gt;A subsequent write to the pipe is then treated as mergeable.&lt;/p&gt;

&lt;p&gt;The kernel thinks it is appending data into a normal anonymous pipe page.&lt;/p&gt;

&lt;p&gt;In reality, the buffer points to a file-backed Page Cache page.&lt;/p&gt;

&lt;p&gt;So the write lands inside the cached file page.&lt;/p&gt;

&lt;p&gt;That is why Dirty Pipe could modify the in-memory contents of files that the attacker should not have been able to write.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Dirty Pipe Was So Dangerous
&lt;/h2&gt;

&lt;p&gt;Dirty Pipe was terrifying because it was not a fragile exploit.&lt;/p&gt;

&lt;h3&gt;
  
  
  No Race Condition
&lt;/h3&gt;

&lt;p&gt;Dirty COW, CVE-2016-5195, depended on winning a race condition. Dirty Pipe did not.&lt;/p&gt;

&lt;p&gt;There was no timing window to win.&lt;/p&gt;

&lt;h3&gt;
  
  
  No Classic Memory Corruption
&lt;/h3&gt;

&lt;p&gt;This was not a buffer overflow or heap corruption bug.&lt;/p&gt;

&lt;p&gt;The kernel was following its own logic, but that logic was operating on stale state.&lt;/p&gt;

&lt;h3&gt;
  
  
  High Reliability
&lt;/h3&gt;

&lt;p&gt;Once the vulnerable state was created, the behavior was deterministic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Page Cache Impact
&lt;/h3&gt;

&lt;p&gt;The modification happened in memory through the Page Cache. That means the on-disk file might remain unchanged, but programs reading the file could observe the modified cached version.&lt;/p&gt;




&lt;h2&gt;
  
  
  Dirty Pipe vs Dirty COW
&lt;/h2&gt;

&lt;p&gt;Dirty Pipe and Dirty COW are often compared because both involve unexpected writes related to file-backed memory.&lt;/p&gt;

&lt;p&gt;But the exploit style is very different.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Dirty COW&lt;/th&gt;
&lt;th&gt;Dirty Pipe&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CVE&lt;/td&gt;
&lt;td&gt;CVE-2016-5195&lt;/td&gt;
&lt;td&gt;CVE-2022-0847&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bug type&lt;/td&gt;
&lt;td&gt;Race condition&lt;/td&gt;
&lt;td&gt;Uninitialized/stale state logic bug&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reliability&lt;/td&gt;
&lt;td&gt;Timing-dependent&lt;/td&gt;
&lt;td&gt;Highly deterministic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Main mechanism&lt;/td&gt;
&lt;td&gt;Copy-on-Write race&lt;/td&gt;
&lt;td&gt;Stale &lt;code&gt;PIPE_BUF_FLAG_CAN_MERGE&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kernel area&lt;/td&gt;
&lt;td&gt;Memory management&lt;/td&gt;
&lt;td&gt;Pipes, Page Cache, &lt;code&gt;splice()&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Dirty Pipe is a great reminder that not all dangerous vulnerabilities look like obvious memory corruption.&lt;/p&gt;

&lt;p&gt;Sometimes the bug is just one field that was not reset.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Upstream Fix
&lt;/h2&gt;

&lt;p&gt;The fix was surprisingly small.&lt;/p&gt;

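&lt;p&gt;In the patched version, the kernel explicitly clears the flags when it sets up a new pipe buffer. The hunk below is the &lt;code&gt;copy_page_to_iter_pipe()&lt;/code&gt; part; the same upstream commit adds the identical assignment to &lt;code&gt;push_pipe()&lt;/code&gt;:&lt;br&gt;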
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gh"&gt;diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index b0e0acdf96c15e..6dd5330f7a9957 100644
&lt;/span&gt;&lt;span class="gd"&gt;--- a/lib/iov_iter.c
&lt;/span&gt;&lt;span class="gi"&gt;+++ b/lib/iov_iter.c
&lt;/span&gt;&lt;span class="p"&gt;@@ -414,6 +414,7 @@&lt;/span&gt; static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t bytes,
         return 0;
&lt;span class="err"&gt;
&lt;/span&gt;     buf-&amp;gt;ops = &amp;amp;page_cache_pipe_buf_ops;
&lt;span class="gi"&gt;+    buf-&amp;gt;flags = 0;
&lt;/span&gt;     get_page(page);
     buf-&amp;gt;page = page;
     buf-&amp;gt;offset = offset;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One line.&lt;/p&gt;

&lt;p&gt;One field.&lt;/p&gt;

&lt;p&gt;A huge security impact.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Developer Takeaways
&lt;/h2&gt;

&lt;p&gt;Analyzing Dirty Pipe gave me a stronger appreciation for defensive engineering in low-level systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Always Initialize Reused Structures
&lt;/h3&gt;

&lt;p&gt;If a structure is reused, every stateful field should be explicitly initialized.&lt;/p&gt;

&lt;p&gt;Relying on previous state is dangerous.&lt;/p&gt;

&lt;p&gt;In kernel code, stale state is not just a bug. It can become a privilege escalation.&lt;/p&gt;
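
&lt;p&gt;To make the pattern concrete outside the kernel, here is a minimal Go sketch of a recycled descriptor. It is a toy model, not kernel code: &lt;code&gt;slot&lt;/code&gt;, &lt;code&gt;recycleStale&lt;/code&gt;, and &lt;code&gt;recycleClean&lt;/code&gt; are hypothetical names I made up for illustration, and &lt;code&gt;canMerge&lt;/code&gt; only plays the role of the merge flag.&lt;/p&gt;

```go
package main

import "fmt"

// slot is a toy stand-in for a recycled buffer descriptor. It is not the
// kernel's struct pipe_buffer; canMerge only plays the role of the merge flag.
type slot struct {
	backing  string // what the slot currently points at
	canMerge bool   // whether later writes may append into the backing data
}

// recycleStale repoints a used slot at new backing data and sets only the
// field it cares about. Whatever canMerge held before survives.
func recycleStale(used slot, backing string) slot {
	used.backing = backing
	return used
}

// recycleClean rebuilds the slot from scratch, so every field is decided
// explicitly at the point of reuse.
func recycleClean(_ slot, backing string) slot {
	return slot{backing: backing, canMerge: false}
}

func main() {
	// First use: a private buffer where appending is legitimate.
	used := slot{backing: "private buffer", canMerge: true}

	stale := recycleStale(used, "shared read-only page")
	clean := recycleClean(used, "shared read-only page")

	fmt.Printf("stale reuse: backing=%q canMerge=%v\n", stale.backing, stale.canMerge)
	fmt.Printf("clean reuse: backing=%q canMerge=%v\n", clean.backing, clean.canMerge)
}
```

&lt;p&gt;The stale variant still reports that merging is allowed even though the slot now points at data it must not modify. The clean variant costs one extra assignment.&lt;/p&gt;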




&lt;h3&gt;
  
  
  2. Flags Are Security Boundaries
&lt;/h3&gt;

&lt;p&gt;A single bit can completely change how the kernel interprets memory.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;PIPE_BUF_FLAG_CAN_MERGE&lt;/code&gt; looked like a performance optimization flag, but in the wrong context it became a security boundary bypass.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Subsystem Interactions Matter
&lt;/h3&gt;

&lt;p&gt;The original missing initialization existed for years.&lt;/p&gt;

&lt;p&gt;It became dangerous only after another feature introduced a new meaning for the stale field.&lt;/p&gt;

&lt;p&gt;This is why reviewing only the changed file is not enough.&lt;/p&gt;

&lt;p&gt;When adding new flags, modes, or state transitions, we should audit every path that creates, recycles, or reuses the structure.&lt;/p&gt;




&lt;h3&gt;
  
  
  4. Logic Bugs Can Be More Reliable Than Memory Corruption
&lt;/h3&gt;

&lt;p&gt;Dirty Pipe was not powerful because it crashed the kernel or corrupted random memory.&lt;/p&gt;

&lt;p&gt;It was powerful because the kernel's internal state machine became logically inconsistent.&lt;/p&gt;

&lt;p&gt;That kind of bug can be easier to exploit, because it needs no race to win and no memory layout to guess, and harder to detect, because nothing crashes.&lt;/p&gt;




&lt;h3&gt;
  
  
  5. Defensive Coding Is Not Optional in Systems Programming
&lt;/h3&gt;

&lt;p&gt;In application code, forgetting to initialize a field may cause a weird UI bug or a failed request.&lt;/p&gt;

&lt;p&gt;In kernel code, it may let an unprivileged user modify read-only file content.&lt;/p&gt;

&lt;p&gt;That difference is why explicit initialization, careful invariants, and subsystem-level reviews are essential.&lt;/p&gt;




&lt;h2&gt;
  
  
  Exploit Discussion: Why I Will Not Weaponize It Here
&lt;/h2&gt;

&lt;p&gt;At this point, it is tempting to drop a full copy-paste exploit and call the analysis complete.&lt;/p&gt;

&lt;p&gt;Dirty Pipe is not just an academic bug. It is a real local privilege escalation vulnerability that can be used to modify sensitive files, abuse SUID binaries, and turn limited local execution into root-level impact on vulnerable systems.&lt;/p&gt;

&lt;p&gt;So instead of publishing a weaponized exploit, I prefer to focus on the part that actually matters for experienced engineers: understanding the primitive, validating exposure safely, and reducing the blast radius.&lt;/p&gt;

&lt;p&gt;The important idea is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Dirty Pipe gives an attacker a write primitive into the Page Cache under very specific conditions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is enough to explain the risk without handing someone a ready-made privilege escalation chain.&lt;/p&gt;




&lt;h2&gt;
  
  
  Safe Validation: How to Check Exposure Without Exploiting the Machine
&lt;/h2&gt;

&lt;p&gt;The first thing I would check is the running kernel version.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;uname&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt;
&lt;span class="nb"&gt;uname&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Dirty Pipe affected Linux kernel versions starting from &lt;code&gt;5.8&lt;/code&gt; and was fixed in patched kernel releases such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;5.16.11&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;5.15.25&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;5.10.102&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The exact package version depends on the distribution, because vendors often backport security fixes without changing the upstream kernel version in an obvious way.&lt;/p&gt;

&lt;p&gt;That is why I do not rely only on &lt;code&gt;uname -r&lt;/code&gt; in production. I also check the distribution security advisories and installed kernel changelog.&lt;/p&gt;

&lt;p&gt;On Debian or Ubuntu-based systems:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apt list &lt;span class="nt"&gt;--installed&lt;/span&gt; | &lt;span class="nb"&gt;grep &lt;/span&gt;linux-image
apt changelog linux-image-&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;uname&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On RHEL, Rocky, AlmaLinux, or Fedora-based systems:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;rpm &lt;span class="nt"&gt;-q&lt;/span&gt; kernel
rpm &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="nt"&gt;--changelog&lt;/span&gt; kernel | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; CVE-2022-0847 &lt;span class="nt"&gt;-A&lt;/span&gt; 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The goal here is not to exploit the host.&lt;/p&gt;

&lt;p&gt;The goal is to answer one operational question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Is this system running a kernel package that contains the Dirty Pipe fix?&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Mitigation: The Real Fix Is a Kernel Update
&lt;/h2&gt;

&lt;p&gt;There is no clever application-level patch that fully fixes Dirty Pipe.&lt;/p&gt;

&lt;p&gt;The bug lives in the kernel.&lt;/p&gt;

&lt;p&gt;So the primary mitigation is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt full-upgrade
&lt;span class="nb"&gt;sudo &lt;/span&gt;reboot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or on RHEL-like systems:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;dnf update kernel
&lt;span class="nb"&gt;sudo &lt;/span&gt;reboot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After rebooting, always verify the active kernel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;uname&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Installing a fixed kernel is not enough if the machine is still booted into the vulnerable one.&lt;/p&gt;

&lt;p&gt;This is a common production mistake: the package is patched, the vulnerability scanner looks cleaner, but the running kernel is still old because nobody rebooted the host.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reducing the Attack Surface
&lt;/h2&gt;

&lt;p&gt;Dirty Pipe requires local code execution.&lt;/p&gt;

&lt;p&gt;That local execution can come from many places:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an SSH account&lt;/li&gt;
&lt;li&gt;a compromised web application&lt;/li&gt;
&lt;li&gt;a CI/CD runner&lt;/li&gt;
&lt;li&gt;an untrusted container workload&lt;/li&gt;
&lt;li&gt;a shared development server&lt;/li&gt;
&lt;li&gt;a low-privileged service user&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So while patching is the real fix, reducing local execution paths is still important.&lt;/p&gt;

&lt;p&gt;A few practical checks I usually care about:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Users with interactive shells&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'/(bash|sh|zsh)$'&lt;/span&gt; /etc/passwd

&lt;span class="c"&gt;# Users with sudo-like access&lt;/span&gt;
getent group &lt;span class="nb"&gt;sudo
&lt;/span&gt;getent group wheel

&lt;span class="c"&gt;# Regular (non-system) accounts: UID 1000 and above on most distributions&lt;/span&gt;
&lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt;: &lt;span class="s1"&gt;'$3 &amp;gt;= 1000 { print $1, $3, $6, $7 }'&lt;/span&gt; /etc/passwd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a user does not need shell access, remove it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;usermod &lt;span class="nt"&gt;-s&lt;/span&gt; /usr/sbin/nologin username
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



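&lt;p&gt;If an old account should no longer authenticate with a password, lock it. This only disables the password; SSH key logins keep working until the account itself is expired (for example with &lt;code&gt;usermod --expiredate 1&lt;/code&gt;) or its keys are removed.&lt;br&gt;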
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;passwd &lt;span class="nt"&gt;-l&lt;/span&gt; username
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;None of this replaces patching.&lt;/p&gt;

&lt;p&gt;But it reduces the number of places an attacker can start from.&lt;/p&gt;




&lt;h2&gt;
  
  
  Containers: Do Not Forget the Host Kernel
&lt;/h2&gt;

&lt;p&gt;One of the most important operational lessons from Dirty Pipe is that containers do not bring their own kernel.&lt;/p&gt;

&lt;p&gt;A container shares the host kernel.&lt;/p&gt;

&lt;p&gt;So if the host kernel is vulnerable, a containerized workload may still be dangerous, especially when combined with weak isolation, excessive capabilities, or sensitive host mounts.&lt;/p&gt;

&lt;p&gt;For production workloads, I would avoid patterns like this unless there is a very strong reason:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--privileged&lt;/span&gt; ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A safer baseline looks more like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--read-only&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cap-drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ALL &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--security-opt&lt;/span&gt; no-new-privileges &lt;span class="se"&gt;\&lt;/span&gt;
  image-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Also be careful with host mounts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nt"&gt;-v&lt;/span&gt; /:/host
&lt;span class="nt"&gt;-v&lt;/span&gt; /etc:/host/etc
&lt;span class="nt"&gt;-v&lt;/span&gt; /var/run/docker.sock:/var/run/docker.sock
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Those mounts can turn a local container compromise into a much more serious host-level problem.&lt;/p&gt;

&lt;p&gt;Dirty Pipe is a kernel bug, but real incidents usually happen through chains.&lt;/p&gt;

&lt;p&gt;The kernel bug is one link.&lt;/p&gt;

&lt;p&gt;Bad container isolation can be another.&lt;/p&gt;




&lt;h2&gt;
  
  
  Monitoring Sensitive Files
&lt;/h2&gt;

&lt;p&gt;Dirty Pipe modifies data in the Page Cache without going through the normal write path, so the change may never be written back to disk and can vanish once the page is evicted.&lt;/p&gt;

&lt;p&gt;Still, sensitive files are the obvious places defenders should care about:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/etc/passwd
/etc/shadow
/etc/group
/etc/sudoers
/root/.ssh/authorized_keys
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On Linux, &lt;code&gt;auditd&lt;/code&gt; can help monitor write attempts and metadata changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;auditctl &lt;span class="nt"&gt;-w&lt;/span&gt; /etc/passwd &lt;span class="nt"&gt;-p&lt;/span&gt; wa &lt;span class="nt"&gt;-k&lt;/span&gt; passwd_changes
&lt;span class="nb"&gt;sudo &lt;/span&gt;auditctl &lt;span class="nt"&gt;-w&lt;/span&gt; /etc/shadow &lt;span class="nt"&gt;-p&lt;/span&gt; wa &lt;span class="nt"&gt;-k&lt;/span&gt; shadow_changes
&lt;span class="nb"&gt;sudo &lt;/span&gt;auditctl &lt;span class="nt"&gt;-w&lt;/span&gt; /etc/group &lt;span class="nt"&gt;-p&lt;/span&gt; wa &lt;span class="nt"&gt;-k&lt;/span&gt; group_changes
&lt;span class="nb"&gt;sudo &lt;/span&gt;auditctl &lt;span class="nt"&gt;-w&lt;/span&gt; /etc/sudoers &lt;span class="nt"&gt;-p&lt;/span&gt; wa &lt;span class="nt"&gt;-k&lt;/span&gt; sudoers_changes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then search the audit logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;ausearch &lt;span class="nt"&gt;-k&lt;/span&gt; passwd_changes
&lt;span class="nb"&gt;sudo &lt;/span&gt;ausearch &lt;span class="nt"&gt;-k&lt;/span&gt; shadow_changes
&lt;span class="nb"&gt;sudo &lt;/span&gt;ausearch &lt;span class="nt"&gt;-k&lt;/span&gt; group_changes
&lt;span class="nb"&gt;sudo &lt;/span&gt;ausearch &lt;span class="nt"&gt;-k&lt;/span&gt; sudoers_changes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For file integrity monitoring, tools like AIDE can also help:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;aide
&lt;span class="nb"&gt;sudo &lt;/span&gt;aideinit
&lt;span class="nb"&gt;sudo cp&lt;/span&gt; /var/lib/aide/aide.db.new /var/lib/aide/aide.db
&lt;span class="nb"&gt;sudo &lt;/span&gt;aide &lt;span class="nt"&gt;--check&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



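&lt;p&gt;This is not a perfect Dirty Pipe detector: the public exploits open their target read-only, so a write watch may never fire, and a modified page can be evicted before an integrity scan reads it.&lt;/p&gt;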

&lt;p&gt;But it is part of a healthy defensive baseline.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Practical Takeaway for Security Engineers
&lt;/h2&gt;

&lt;p&gt;When I look at Dirty Pipe from a defender's perspective, I do not think the lesson is "learn the exploit and move on."&lt;/p&gt;

&lt;p&gt;The lesson is broader:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;patch kernels quickly&lt;/li&gt;
&lt;li&gt;reboot after kernel updates&lt;/li&gt;
&lt;li&gt;reduce local shell access&lt;/li&gt;
&lt;li&gt;avoid over-privileged containers&lt;/li&gt;
&lt;li&gt;monitor sensitive identity and privilege files&lt;/li&gt;
&lt;li&gt;review code paths that recycle stateful structures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The exploit is interesting.&lt;/p&gt;

&lt;p&gt;But the engineering lesson is more valuable.&lt;/p&gt;

&lt;p&gt;A single stale flag inside a reused kernel structure broke one of the assumptions Linux users rely on every day:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;read-only files should not be writable by an unprivileged process.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the kind of bug that reminds me why low-level systems programming requires paranoia, not just correctness.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Dirty Pipe is one of those vulnerabilities that looks almost too simple after you understand it.&lt;/p&gt;

&lt;p&gt;A stale flag survived inside a reused pipe buffer.&lt;/p&gt;

&lt;p&gt;That pipe buffer was later pointed at a Page Cache page.&lt;/p&gt;

&lt;p&gt;The kernel trusted the stale flag.&lt;/p&gt;

&lt;p&gt;And that was enough.&lt;/p&gt;

&lt;p&gt;For me, the most important lesson is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Security bugs often live at the boundaries between correct subsystems.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The Page Cache was doing its job.&lt;/p&gt;

&lt;p&gt;Pipes were doing their job.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;splice()&lt;/code&gt; was doing its job.&lt;/p&gt;

&lt;p&gt;But the transition between those systems carried stale state, and that stale state broke the security model.&lt;/p&gt;

&lt;p&gt;That is why kernel engineering is so fascinating — and so unforgiving.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dirtypipe.cm4all.com/" rel="noopener noreferrer"&gt;The Dirty Pipe Vulnerability - CM4all&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/bluedragonsecurity/Linux-Kernel-Dirty-Pipe-Exploitation-Logic-Bug-" rel="noopener noreferrer"&gt;Linux Kernel Dirty Pipe Exploitation Logic Bug Exploration - GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lolcads.github.io/posts/2022/06/dirty_pipe_cve_2022_0847/" rel="noopener noreferrer"&gt;Exploration of the Dirty Pipe Vulnerability - lolcads&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.tarlogic.com/blog/dirty-pipe-vulnerability-cve-2022-0847/" rel="noopener noreferrer"&gt;Dirty Pipe Vulnerability CVE-2022-0847 - Tarlogic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blogs.oracle.com/linux/pipe-and-splice" rel="noopener noreferrer"&gt;An In-Depth Look at Pipe and Splice implementation in Linux kernel - Oracle Blogs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://review.calyxos.org/q/d590b2b91f07af95ad822bd0d193d00443863c44" rel="noopener noreferrer"&gt;UPSTREAM: lib/iov_iter: initialize flags in new pipe_buffer - Gerrit&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/mhanief/dirtypipe" rel="noopener noreferrer"&gt;Dirty Pipe Vulnerability Detection Script - GitHub&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>linux</category>
      <category>security</category>
      <category>kernel</category>
      <category>c</category>
    </item>
    <item>
      <title>Composition over Inheritance in Go: The Design Choice That Makes Microservices Boring in the Best Way</title>
      <dc:creator>amir</dc:creator>
      <pubDate>Thu, 21 May 2026 08:38:35 +0000</pubDate>
      <link>https://dev.to/amirsefati/composition-over-inheritance-in-go-the-design-choice-that-makes-microservices-boring-in-the-best-2m0h</link>
      <guid>https://dev.to/amirsefati/composition-over-inheritance-in-go-the-design-choice-that-makes-microservices-boring-in-the-best-2m0h</guid>
      <description>&lt;p&gt;When I first moved deeper into Go, the strange part was not the syntax. The syntax is intentionally small. The strange part was the absence of something I had seen in backend codebases for years: classical inheritance.&lt;/p&gt;

&lt;p&gt;No &lt;code&gt;class&lt;/code&gt;.&lt;br&gt;
No &lt;code&gt;extends&lt;/code&gt;.&lt;br&gt;
No abstract base class hierarchy.&lt;br&gt;
No &lt;code&gt;implements&lt;/code&gt; keyword.&lt;br&gt;
No parent object silently controlling the child.&lt;/p&gt;

&lt;p&gt;At first, that can look like a missing feature. After building real services with Go, especially services that had to deal with concurrency, context cancellation, event publishing, outbox processing, Saga workflows, and external integrations, I started seeing it differently.&lt;/p&gt;

&lt;p&gt;Go did not forget inheritance. Go made a deliberate trade-off.&lt;/p&gt;

&lt;p&gt;It gives us structs, methods, embedding, interfaces, implicit contracts, and composition as the default way to build larger systems.&lt;/p&gt;

&lt;p&gt;The result is not “less object-oriented.” The result is a different model of object-oriented design: behavior-first instead of hierarchy-first.&lt;/p&gt;

&lt;p&gt;The official Go FAQ answers the “Is Go object-oriented?” question with “yes and no.” Go has types and methods, but no type hierarchy. Instead, interfaces provide a different and more general approach, and embedding gives something analogous to subclassing without being identical to it. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;That one design choice affects almost everything: testing, package design, microservices, concurrency, &lt;code&gt;context.Context&lt;/code&gt;, and even how we model business workflows.&lt;/p&gt;

&lt;p&gt;In this article, I want to explain the difference between composition and inheritance in Go from a practical engineering point of view — not as a language theory exercise, but as something I have felt in production systems.&lt;/p&gt;


&lt;h2&gt;
  
  
  The short version
&lt;/h2&gt;

&lt;p&gt;Inheritance says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Build behavior by creating a type hierarchy.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Composition says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Build behavior by connecting small parts.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In many traditional OOP languages, we might design something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Device
 ├── Phone
 │    └── Smartphone
 └── Watch
      └── DigitalWatch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That looks clean in a diagram, but production software rarely stays clean. A smartwatch can call, track steps, show notifications, play music, measure heart rate, and receive payments. A phone can also track health, authenticate payments, and show time. Suddenly the hierarchy becomes political: where should behavior live?&lt;/p&gt;

&lt;p&gt;Go’s answer is simple: do not force a taxonomy too early.&lt;/p&gt;

&lt;p&gt;Instead of asking “what parent class does this type belong to?”, Go pushes me to ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What behavior does this object need?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is a much better question for backend systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Go did not choose classical inheritance
&lt;/h2&gt;

&lt;p&gt;Rob Pike’s famous essay &lt;em&gt;Less is exponentially more&lt;/em&gt; explains a lot about the philosophy behind Go. One of the most quoted lines from that essay is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Go is about composition.” &lt;sup id="fnref2"&gt;2&lt;/sup&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That statement is not just motivational. It shows up directly in the language.&lt;/p&gt;

&lt;p&gt;Go was designed for large codebases, networked systems, multicore machines, and teams that needed to read and maintain code for years. The language values clarity over cleverness. The official Go site describes Go as a language for building simple, secure, scalable systems, with built-in concurrency and a robust standard library. &lt;sup id="fnref3"&gt;3&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;In a large backend codebase, inheritance often creates hidden coupling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A child type depends on parent behavior it does not explicitly call.&lt;/li&gt;
&lt;li&gt;Changing the base class can break many children.&lt;/li&gt;
&lt;li&gt;Tests often need a large object graph.&lt;/li&gt;
&lt;li&gt;Business concepts become trapped in technical hierarchies.&lt;/li&gt;
&lt;li&gt;Shared behavior becomes harder to remove than to add.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Go avoids that by making relationships explicit.&lt;/p&gt;

&lt;p&gt;If a type needs a logger, give it a logger.&lt;br&gt;
If a service needs a repository, inject a repository.&lt;br&gt;
If a handler needs a publisher, depend on a small publisher interface.&lt;br&gt;
If a workflow needs cancellation, pass &lt;code&gt;context.Context&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;There is no parent class magic. There is only data, behavior, and contracts.&lt;/p&gt;


&lt;h2&gt;
  
  
  Inheritance example: mobile phone and digital watch
&lt;/h2&gt;

&lt;p&gt;Let’s use the mobile phone and digital watch example.&lt;/p&gt;

&lt;p&gt;In an inheritance-heavy design, we may try something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// This is NOT idiomatic Go.&lt;/span&gt;
&lt;span class="c"&gt;// It is only a pseudo-OOP model to show the problem.&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Device&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="n"&gt;Device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;TurnOn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"is turning on"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Phone&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Device&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="n"&gt;Phone&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Calling"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;DigitalWatch&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Device&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;DigitalWatch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ShowTime&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Showing time"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At first this looks fine. &lt;code&gt;Phone&lt;/code&gt; and &lt;code&gt;DigitalWatch&lt;/code&gt; both reuse &lt;code&gt;Device&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But what happens when the watch can also make calls?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;SmartWatch&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;DigitalWatch&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="n"&gt;SmartWatch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Calling from watch"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we have duplication between &lt;code&gt;Phone&lt;/code&gt; and &lt;code&gt;SmartWatch&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So maybe we move &lt;code&gt;Call&lt;/code&gt; up into &lt;code&gt;Device&lt;/code&gt;?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="n"&gt;Device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Calling"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But now every device can call. That is wrong. A kitchen timer is a device, but it should not call anyone.&lt;/p&gt;

&lt;p&gt;This is the classic inheritance problem: shared behavior is not always shared identity.&lt;/p&gt;




&lt;h2&gt;
  
  
  Composition in Go: model capability, not family tree
&lt;/h2&gt;

&lt;p&gt;In Go, I prefer to model small capabilities.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
    &lt;span class="s"&gt;"time"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;PowerUnit&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="n"&gt;PowerUnit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;TurnOn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"is turning on"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Dialer&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Dialer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Calling"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Clock&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Clock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Time&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;NotificationCenter&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;NotificationCenter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Notify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Notification:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;MobilePhone&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;PowerUnit&lt;/span&gt;
    &lt;span class="n"&gt;Dialer&lt;/span&gt;
    &lt;span class="n"&gt;Clock&lt;/span&gt;
    &lt;span class="n"&gt;NotificationCenter&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;DigitalWatch&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;PowerUnit&lt;/span&gt;
    &lt;span class="n"&gt;Clock&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;SmartWatch&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;PowerUnit&lt;/span&gt;
    &lt;span class="n"&gt;Dialer&lt;/span&gt;
    &lt;span class="n"&gt;Clock&lt;/span&gt;
    &lt;span class="n"&gt;NotificationCenter&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;phone&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;MobilePhone&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PowerUnit&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PowerUnit&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Mobile phone"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
    &lt;span class="n"&gt;watch&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;DigitalWatch&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PowerUnit&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PowerUnit&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Digital watch"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
    &lt;span class="n"&gt;smartWatch&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;SmartWatch&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PowerUnit&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PowerUnit&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Smart watch"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;

    &lt;span class="n"&gt;phone&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TurnOn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;phone&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"+37400000000"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;phone&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Notify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"New message"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;watch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TurnOn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Watch time:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;watch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Kitchen&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;smartWatch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TurnOn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;smartWatch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"+37411111111"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;smartWatch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Notify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Workout completed"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the core idea: &lt;code&gt;MobilePhone&lt;/code&gt;, &lt;code&gt;DigitalWatch&lt;/code&gt;, and &lt;code&gt;SmartWatch&lt;/code&gt; are not forced into a fragile family tree.&lt;/p&gt;

&lt;p&gt;They are built from capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;PowerUnit&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Dialer&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Clock&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;NotificationCenter&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A plain digital watch has a clock but no dialer. A phone has a dialer and notifications. A smartwatch combines all of them: clock, dialer, and notifications.&lt;/p&gt;

&lt;p&gt;This is why composition scales better than a deep class hierarchy: I can add a new capability as one more small type, without redesigning the types that already exist.&lt;/p&gt;
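&lt;p&gt;As a minimal sketch of adding a capability, assume a hypothetical &lt;code&gt;HeartRateSensor&lt;/code&gt; type and a hypothetical &lt;code&gt;FitnessBand&lt;/code&gt; device, neither of which appears in the example above. The new device reuses &lt;code&gt;PowerUnit&lt;/code&gt; unchanged and embeds the new capability next to it:&lt;/p&gt;

```go
package main

import "fmt"

// PowerUnit is the same capability as in the example above.
type PowerUnit struct {
    Name string
}

func (p PowerUnit) TurnOn() {
    fmt.Println(p.Name, "is turning on")
}

// HeartRateSensor is a hypothetical new capability (an assumption for this sketch).
type HeartRateSensor struct {
    BPM int
}

// HeartRate returns the last reading stored in the sensor.
func (s HeartRateSensor) HeartRate() int {
    return s.BPM
}

// FitnessBand is a hypothetical new device built from one existing
// capability and the new one. No existing type has to change.
type FitnessBand struct {
    PowerUnit
    HeartRateSensor
}

func main() {
    band := FitnessBand{
        PowerUnit:       PowerUnit{Name: "Fitness band"},
        HeartRateSensor: HeartRateSensor{BPM: 72},
    }

    band.TurnOn()
    fmt.Println("Heart rate:", band.HeartRate())
}
```

&lt;p&gt;&lt;code&gt;MobilePhone&lt;/code&gt;, &lt;code&gt;DigitalWatch&lt;/code&gt;, and &lt;code&gt;SmartWatch&lt;/code&gt; would compile exactly as before, because none of them depends on the new type.&lt;/p&gt;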




&lt;h2&gt;
  
  
  Embedding is not inheritance
&lt;/h2&gt;

&lt;p&gt;Go has embedding, and sometimes developers describe it as inheritance. I avoid that wording because it creates the wrong mental model.&lt;/p&gt;

&lt;p&gt;Embedding promotes fields and methods. It helps with delegation. But it does not create a classical subtype hierarchy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
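&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"fmt"&lt;/span&gt;

&lt;span class="c"&gt;// AuditLogger is a small, reusable logging capability.&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;AuditLogger&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;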

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AuditLogger&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"audit:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;BookingService&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;AuditLogger&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;BookingService&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"booking_created"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;BookingService&lt;/code&gt; can call &lt;code&gt;Log&lt;/code&gt; directly because the method is promoted. But &lt;code&gt;BookingService&lt;/code&gt; is not a subclass of &lt;code&gt;AuditLogger&lt;/code&gt; in the Java/C++ sense.&lt;/p&gt;

&lt;p&gt;The Go language specification defines how embedded fields work and how promoted methods become part of a method set. &lt;sup id="fnref4"&gt;4&lt;/sup&gt; Effective Go also demonstrates embedding as a way to compose behavior, especially with interfaces and structs. &lt;sup id="fnref5"&gt;5&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;When I embed a type in Go, I am not saying:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;BookingService is an AuditLogger.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I am saying:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;BookingService has audit logging behavior.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That difference keeps architecture honest.&lt;/p&gt;
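&lt;p&gt;One way to see the difference is that embedding has no virtual dispatch. The sketch below assumes two hypothetical methods, &lt;code&gt;Prefix&lt;/code&gt; and &lt;code&gt;Describe&lt;/code&gt;, that are not part of the example above. &lt;code&gt;BookingService&lt;/code&gt; declares its own &lt;code&gt;Prefix&lt;/code&gt;, but the promoted &lt;code&gt;Describe&lt;/code&gt; still runs with an &lt;code&gt;AuditLogger&lt;/code&gt; receiver and never sees it:&lt;/p&gt;

```go
package main

import "fmt"

type AuditLogger struct{}

// Prefix is a hypothetical helper used by Describe.
func (AuditLogger) Prefix() string {
    return "audit"
}

// Describe is a hypothetical method. Its receiver is an AuditLogger,
// so a.Prefix() always resolves to AuditLogger.Prefix.
func (a AuditLogger) Describe(event string) string {
    return a.Prefix() + ": " + event
}

type BookingService struct {
    AuditLogger
}

// Prefix shadows the promoted method for calls made on BookingService itself.
// It does not override anything inside AuditLogger.
func (BookingService) Prefix() string {
    return "booking"
}

func main() {
    service := BookingService{}

    fmt.Println(service.Prefix())                    // booking
    fmt.Println(service.Describe("booking_created")) // audit: booking_created
}
```

&lt;p&gt;In a classical subclass relationship, the second line would usually print the overridden prefix. In Go the embedded value stays a separate value with its own methods, which matches the "has" wording rather than the "is" wording.&lt;/p&gt;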




&lt;h2&gt;
  
  
  Interfaces: polymorphism without inheritance
&lt;/h2&gt;

&lt;p&gt;The most powerful part of Go’s model is not embedding. It is interfaces.&lt;/p&gt;

&lt;p&gt;In Go, interfaces are satisfied implicitly. A type does not need to declare that it implements an interface. If it has the required methods, it satisfies the interface.&lt;/p&gt;

&lt;p&gt;That changes how I design systems.&lt;/p&gt;

&lt;p&gt;Instead of starting with a big interface, I usually start with concrete code. Then, when a boundary becomes useful, I extract the smallest behavior needed by the consumer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Caller&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;EmergencyCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt; &lt;span class="n"&gt;Caller&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"911"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now anything that has a &lt;code&gt;Call(string)&lt;/code&gt; method can be used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;MobilePhone&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Dialer&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;SmartWatch&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Dialer&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;phone&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;MobilePhone&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;watch&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;SmartWatch&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="n"&gt;EmergencyCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phone&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;EmergencyCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;watch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No base class. No inheritance. No framework annotation. No dependency on a parent type.&lt;/p&gt;

&lt;p&gt;Just behavior.&lt;/p&gt;

&lt;p&gt;That is polymorphism in Go.&lt;/p&gt;

&lt;p&gt;A value is polymorphic not because it belongs to a class hierarchy, but because it satisfies a contract.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why this polymorphism feels better in microservices
&lt;/h2&gt;

&lt;p&gt;Microservices are mostly about boundaries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP boundaries&lt;/li&gt;
&lt;li&gt;database boundaries&lt;/li&gt;
&lt;li&gt;message broker boundaries&lt;/li&gt;
&lt;li&gt;cache boundaries&lt;/li&gt;
&lt;li&gt;external vendor boundaries&lt;/li&gt;
&lt;li&gt;retry and timeout boundaries&lt;/li&gt;
&lt;li&gt;transaction and consistency boundaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Inheritance is not naturally good at these boundaries. Interfaces are.&lt;/p&gt;

&lt;p&gt;For example, in a booking service, I do not want my core business logic to know whether events are published to Kafka, RabbitMQ, NATS, AWS SNS/SQS, or an in-memory fake during tests.&lt;/p&gt;

&lt;p&gt;I want this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;EventPublisher&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then my service depends on the behavior:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;BookingService&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;repo&lt;/span&gt;      &lt;span class="n"&gt;BookingRepository&lt;/span&gt;
    &lt;span class="n"&gt;publisher&lt;/span&gt; &lt;span class="n"&gt;EventPublisher&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;NewBookingService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;repo&lt;/span&gt; &lt;span class="n"&gt;BookingRepository&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;publisher&lt;/span&gt; &lt;span class="n"&gt;EventPublisher&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;BookingService&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;BookingService&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;publisher&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;publisher&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The implementation can change:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;KafkaPublisher&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;KafkaPublisher&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// publish to Kafka&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;OutboxPublisher&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;outbox&lt;/span&gt; &lt;span class="n"&gt;OutboxStore&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;OutboxPublisher&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;outbox&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The booking service does not care.&lt;/p&gt;

&lt;p&gt;That is the reason I like Go for microservices: the boundary is small, explicit, and testable.&lt;/p&gt;
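&lt;p&gt;To illustrate the "testable" part, here is a minimal sketch of an in-memory fake. &lt;code&gt;FakePublisher&lt;/code&gt; and &lt;code&gt;announceBooking&lt;/code&gt; are hypothetical names introduced only for this example. The function stands in for business logic that knows nothing except the &lt;code&gt;EventPublisher&lt;/code&gt; interface:&lt;/p&gt;

```go
package main

import (
    "context"
    "fmt"
)

type Event struct {
    Type string
    Data any
}

type EventPublisher interface {
    Publish(ctx context.Context, event Event) error
}

// FakePublisher is a hypothetical test double that records events in memory
// instead of sending them to a broker.
type FakePublisher struct {
    Published []Event
}

func (f *FakePublisher) Publish(ctx context.Context, event Event) error {
    f.Published = append(f.Published, event)
    return nil
}

// announceBooking is a hypothetical piece of business logic.
// It depends only on the EventPublisher behavior.
func announceBooking(ctx context.Context, publisher EventPublisher, bookingID string) error {
    return publisher.Publish(ctx, Event{Type: "booking_created", Data: bookingID})
}

func main() {
    fake := new(FakePublisher)

    if err := announceBooking(context.Background(), fake, "b-1"); err != nil {
        fmt.Println("publish failed:", err)
        return
    }

    fmt.Println(len(fake.Published), fake.Published[0].Type) // 1 booking_created
}
```

&lt;p&gt;A test can inspect &lt;code&gt;fake.Published&lt;/code&gt; directly, so no broker, container, or mocking framework is needed to check what the service emitted.&lt;/p&gt;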




&lt;h2&gt;
  
  
  Real-world reference: Docker/Moby and interfaces
&lt;/h2&gt;

&lt;p&gt;A good place to see Go-style composition in a serious codebase is Docker’s Moby project. Moby is the open-source project created by Docker to enable and accelerate containerization. &lt;sup id="fnref6"&gt;6&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;The Moby client package exposes interfaces such as &lt;code&gt;ImageAPIClient&lt;/code&gt;, whose methods accept a &lt;code&gt;context.Context&lt;/code&gt; and cover image operations such as import, inspect, list, load, pull, push, and prune. &lt;sup id="fnref7"&gt;7&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;That is a very Go-like design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;capabilities are grouped by behavior&lt;/li&gt;
&lt;li&gt;methods receive &lt;code&gt;context.Context&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;consumers can depend on a contract instead of a concrete implementation&lt;/li&gt;
&lt;li&gt;implementations can be swapped or wrapped&lt;/li&gt;
&lt;li&gt;testing becomes easier because callers can define smaller interfaces around what they actually use&lt;/li&gt;
&lt;/ul&gt;
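&lt;p&gt;The last two points can be sketched without touching Docker at all. Everything below is hypothetical: &lt;code&gt;wideClient&lt;/code&gt;, &lt;code&gt;imageLister&lt;/code&gt;, &lt;code&gt;countingLister&lt;/code&gt;, and &lt;code&gt;countImages&lt;/code&gt; are illustration names, and their signatures are not Moby's. The caller declares the one method it needs, and a wrapper adds behavior around any implementation of it:&lt;/p&gt;

```go
package main

import (
    "context"
    "fmt"
)

// wideClient is a hypothetical concrete client with more methods
// than this particular caller needs.
type wideClient struct {
    images []string
}

func (c wideClient) List(ctx context.Context) ([]string, error) {
    return c.images, nil
}

func (c wideClient) Pull(ctx context.Context, ref string) error {
    return nil
}

// imageLister is the small interface the caller declares for itself.
type imageLister interface {
    List(ctx context.Context) ([]string, error)
}

// countingLister wraps any imageLister and counts calls.
// The code that consumes imageLister does not change.
type countingLister struct {
    next  imageLister
    calls int
}

func (c *countingLister) List(ctx context.Context) ([]string, error) {
    c.calls++
    return c.next.List(ctx)
}

// countImages accepts the real client, the wrapper, or a test fake.
func countImages(ctx context.Context, lister imageLister) (int, error) {
    images, err := lister.List(ctx)
    if err != nil {
        return 0, err
    }
    return len(images), nil
}

func main() {
    client := wideClient{images: []string{"alpine", "golang"}}

    wrapped := new(countingLister)
    wrapped.next = client

    n, err := countImages(context.Background(), wrapped)
    fmt.Println(n, err, wrapped.calls)
}
```

&lt;p&gt;Because &lt;code&gt;countImages&lt;/code&gt; asks for one method, a test fake only has to implement that one method, however wide the real client is.&lt;/p&gt;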

&lt;p&gt;The Docker Engine API client documentation also shows Go code using &lt;code&gt;context.Background()&lt;/code&gt; with the Docker client to list containers. &lt;sup id="fnref8"&gt;8&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;The important lesson is not “copy Docker’s exact interfaces.” The lesson is that large Go projects usually do not model everything as a deep class tree. They compose packages, structs, interfaces, and contexts.&lt;/p&gt;

&lt;p&gt;That is how Go code stays navigable when the project becomes large.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;code&gt;context.Context&lt;/code&gt; becomes easier with composition
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;context&lt;/code&gt; package is one of the best examples of Go’s practical design. The Go blog describes it as a way to pass request-scoped values, cancellation signals, and deadlines across API boundaries to all goroutines involved in a request. &lt;sup id="fnref9"&gt;9&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;This fits naturally with interface-based composition.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;BookingRepository&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;booking&lt;/span&gt; &lt;span class="n"&gt;Booking&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="n"&gt;FindByID&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Booking&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;PaymentGateway&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Authorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payment&lt;/span&gt; &lt;span class="n"&gt;Payment&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="n"&gt;Capture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;paymentID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="n"&gt;Cancel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;paymentID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;RoomInventory&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Reserve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;roomID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;period&lt;/span&gt; &lt;span class="n"&gt;Period&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="n"&gt;Release&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;roomID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;period&lt;/span&gt; &lt;span class="n"&gt;Period&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;EventPublisher&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every boundary accepts &lt;code&gt;ctx&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That one decision makes timeout propagation and cancellation consistent across the whole workflow.&lt;/p&gt;

&lt;p&gt;If the HTTP request is canceled, the booking process can stop. If payment authorization times out, the Saga can compensate. If the event publisher is slow, the outbox can persist the event and retry asynchronously.&lt;/p&gt;

&lt;p&gt;In inheritance-heavy designs, cancellation often gets hidden inside base classes, framework hooks, or global state. In Go, I can see it in the method signature.&lt;/p&gt;

&lt;p&gt;That explicitness is a major operational advantage.&lt;/p&gt;
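&lt;p&gt;A minimal sketch of that propagation, under stated assumptions: &lt;code&gt;countingGateway&lt;/code&gt;, &lt;code&gt;chargeBooking&lt;/code&gt;, and &lt;code&gt;canceledContext&lt;/code&gt; are hypothetical names, and the interface is reduced to &lt;code&gt;Authorize&lt;/code&gt; for brevity. An already canceled context stands in for an HTTP request whose client has disconnected:&lt;/p&gt;

```go
package main

import (
    "context"
    "fmt"
)

type Payment struct {
    BookingID string
    Amount    int64
    Currency  string
}

// PaymentGateway is reduced to one method for this sketch.
type PaymentGateway interface {
    Authorize(ctx context.Context, payment Payment) error
}

// countingGateway is a hypothetical gateway that checks ctx before doing work.
type countingGateway struct {
    calls int
}

func (g *countingGateway) Authorize(ctx context.Context, payment Payment) error {
    // Stop early if the caller has already canceled or timed out.
    if err := ctx.Err(); err != nil {
        return fmt.Errorf("authorize %s: %w", payment.BookingID, err)
    }
    g.calls++
    return nil
}

// chargeBooking is a hypothetical service step. It passes ctx straight
// through, so cancellation reaches the gateway without extra wiring.
func chargeBooking(ctx context.Context, gateway PaymentGateway, payment Payment) error {
    return gateway.Authorize(ctx, payment)
}

// canceledContext returns a context that is already canceled.
func canceledContext() context.Context {
    ctx, cancel := context.WithCancel(context.Background())
    cancel()
    return ctx
}

func main() {
    gateway := new(countingGateway)
    payment := Payment{BookingID: "b-1", Amount: 12000, Currency: "EUR"}

    // A live context lets the call through.
    fmt.Println(chargeBooking(context.Background(), gateway, payment))

    // A canceled context stops the call before any work is done.
    fmt.Println(chargeBooking(canceledContext(), gateway, payment))

    fmt.Println("authorized calls:", gateway.calls)
}
```

&lt;p&gt;The second call returns an error that wraps &lt;code&gt;context.Canceled&lt;/code&gt;, so the caller can detect it with &lt;code&gt;errors.Is&lt;/code&gt; and decide whether to compensate.&lt;/p&gt;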




&lt;h2&gt;
  
  
  Hotel reservation example: composition, Outbox, and Saga
&lt;/h2&gt;

&lt;p&gt;Let’s build a simplified hotel reservation flow.&lt;/p&gt;

&lt;p&gt;The business flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a booking.&lt;/li&gt;
&lt;li&gt;Reserve room inventory.&lt;/li&gt;
&lt;li&gt;Authorize payment.&lt;/li&gt;
&lt;li&gt;Save an outbox event.&lt;/li&gt;
&lt;li&gt;Confirm booking.&lt;/li&gt;
&lt;li&gt;If something fails, compensate.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;First, define the domain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Booking&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ID&lt;/span&gt;       &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;RoomID&lt;/span&gt;   &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;UserID&lt;/span&gt;   &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Status&lt;/span&gt;   &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Amount&lt;/span&gt;   &lt;span class="kt"&gt;int64&lt;/span&gt;
    &lt;span class="n"&gt;Currency&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Payment&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;BookingID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Amount&lt;/span&gt;    &lt;span class="kt"&gt;int64&lt;/span&gt;
    &lt;span class="n"&gt;Currency&lt;/span&gt;  &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Type&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Data&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Period&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;From&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;To&lt;/span&gt;   &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then define small interfaces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;BookingRepository&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;booking&lt;/span&gt; &lt;span class="n"&gt;Booking&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="n"&gt;MarkConfirmed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bookingID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="n"&gt;MarkFailed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bookingID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;RoomInventory&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Reserve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;roomID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;period&lt;/span&gt; &lt;span class="n"&gt;Period&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="n"&gt;Release&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;roomID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;period&lt;/span&gt; &lt;span class="n"&gt;Period&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;PaymentGateway&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Authorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payment&lt;/span&gt; &lt;span class="n"&gt;Payment&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="n"&gt;CancelAuthorization&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bookingID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;OutboxStore&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the service composes behavior:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;ReservationService&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;bookings&lt;/span&gt; &lt;span class="n"&gt;BookingRepository&lt;/span&gt;
    &lt;span class="n"&gt;rooms&lt;/span&gt;    &lt;span class="n"&gt;RoomInventory&lt;/span&gt;
    &lt;span class="n"&gt;payments&lt;/span&gt; &lt;span class="n"&gt;PaymentGateway&lt;/span&gt;
    &lt;span class="n"&gt;outbox&lt;/span&gt;   &lt;span class="n"&gt;OutboxStore&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;NewReservationService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;bookings&lt;/span&gt; &lt;span class="n"&gt;BookingRepository&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;rooms&lt;/span&gt; &lt;span class="n"&gt;RoomInventory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;payments&lt;/span&gt; &lt;span class="n"&gt;PaymentGateway&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;outbox&lt;/span&gt; &lt;span class="n"&gt;OutboxStore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ReservationService&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;ReservationService&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;bookings&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bookings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;rooms&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;rooms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;payments&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;payments&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;outbox&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="n"&gt;outbox&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ReservationService&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Reserve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;booking&lt;/span&gt; &lt;span class="n"&gt;Booking&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;period&lt;/span&gt; &lt;span class="n"&gt;Period&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bookings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;booking&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"create booking: %w"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rooms&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Reserve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;booking&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RoomID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;period&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bookings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MarkFailed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;booking&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"room_reservation_failed"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"reserve room: %w"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;payment&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;Payment&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BookingID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;booking&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Amount&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;booking&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Currency&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;booking&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Currency&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payments&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Authorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payment&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rooms&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Release&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;booking&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RoomID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;period&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bookings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MarkFailed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;booking&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"payment_authorization_failed"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"authorize payment: %w"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"booking.confirmed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Data&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"booking_id"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;booking&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"room_id"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;booking&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RoomID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"user_id"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;booking&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UserID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;outbox&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payments&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CancelAuthorization&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;booking&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rooms&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Release&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;booking&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RoomID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;period&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bookings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MarkFailed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;booking&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"outbox_save_failed"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"save outbox event: %w"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bookings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MarkConfirmed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;booking&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"confirm booking: %w"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the kind of code I like in production because the dependencies are visible.&lt;/p&gt;

&lt;p&gt;The service does not inherit from &lt;code&gt;BaseService&lt;/code&gt;. It does not call hidden hooks. It does not depend on a massive abstract class. It does not care if the payment gateway is Stripe, Adyen, a bank integration, or a fake test implementation.&lt;/p&gt;

&lt;p&gt;It only cares about the behavior it needs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Outbox becomes cleaner with interfaces
&lt;/h2&gt;

&lt;p&gt;The Outbox pattern is usually used when we need to update local state and publish an event reliably.&lt;/p&gt;

&lt;p&gt;The problem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Database transaction succeeds.
Message publish fails.
System state becomes inconsistent.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



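&lt;p&gt;The Outbox pattern fixes this by writing the event to an outbox table in the same database transaction as the business change. A separate worker then reads that table and publishes the events asynchronously, so either both the state change and the event are stored or neither is.&lt;/p&gt;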

&lt;p&gt;In Go, I usually keep this behind a small interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;OutboxStore&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="n"&gt;FetchPending&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="n"&gt;OutboxMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;MarkPublished&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The publishing worker depends on the same &lt;code&gt;OutboxStore&lt;/code&gt; interface, plus a small broker interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;MessageBroker&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;OutboxWorker&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;store&lt;/span&gt;  &lt;span class="n"&gt;OutboxStore&lt;/span&gt;
    &lt;span class="n"&gt;broker&lt;/span&gt; &lt;span class="n"&gt;MessageBroker&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;OutboxWorker&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;RunOnce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FetchPending&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;broker&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Payload&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MarkPublished&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;During local development, &lt;code&gt;MessageBroker&lt;/code&gt; can be an in-memory fake. In staging, it can publish to RabbitMQ. In production, it can publish to Kafka. For tests, I can simulate broker failure without booting a broker.&lt;/p&gt;

&lt;p&gt;No inheritance required.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Saga becomes cleaner with composition
&lt;/h2&gt;

&lt;p&gt;Saga is about managing a long-running business transaction through steps and compensations.&lt;/p&gt;

&lt;p&gt;A simple interface is enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;SagaStep&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="n"&gt;Compensate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The orchestrator composes steps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Saga&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;steps&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;SagaStep&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;NewSaga&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;steps&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="n"&gt;SagaStep&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Saga&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Saga&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;steps&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;steps&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="n"&gt;Saga&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;executed&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="n"&gt;SagaStep&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;steps&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;steps&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;executed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;executed&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Compensate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"saga step %s failed: %w"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;executed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;executed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each step can be a small struct:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;ReserveRoomStep&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;rooms&lt;/span&gt;  &lt;span class="n"&gt;RoomInventory&lt;/span&gt;
    &lt;span class="n"&gt;roomID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;period&lt;/span&gt; &lt;span class="n"&gt;Period&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="n"&gt;ReserveRoomStep&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"reserve_room"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="n"&gt;ReserveRoomStep&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rooms&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Reserve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;roomID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;period&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="n"&gt;ReserveRoomStep&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Compensate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rooms&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Release&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;roomID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;period&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where Go interfaces feel very natural.&lt;/p&gt;

&lt;p&gt;The Saga orchestrator does not need to know about hotels, rooms, payment providers, or notification systems. It only knows &lt;code&gt;SagaStep&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That is polymorphism through behavior.&lt;/p&gt;
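&lt;p&gt;To make the orchestration concrete, here is a minimal, self-contained sketch that runs a saga over three hypothetical steps and records the order of calls. The &lt;code&gt;recordingStep&lt;/code&gt; type, the step names, and the &lt;code&gt;runSaga&lt;/code&gt; and &lt;code&gt;runDemo&lt;/code&gt; helpers are assumptions made for illustration. The &lt;code&gt;SagaStep&lt;/code&gt; interface is reconstructed from the three methods the orchestrator calls. One deliberate difference from the &lt;code&gt;Run&lt;/code&gt; method above: this variant collects compensation errors with &lt;code&gt;errors.Join&lt;/code&gt; (Go 1.20 or newer) instead of discarding them.&lt;/p&gt;

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// SagaStep lists the three methods the orchestrator relies on.
type SagaStep interface {
	Name() string
	Execute(ctx context.Context) error
	Compensate(ctx context.Context) error
}

// recordingStep is a hypothetical step that appends every call to a shared log.
type recordingStep struct {
	name string
	fail bool
	log  *[]string
}

func (r recordingStep) Name() string { return r.name }

func (r recordingStep) Execute(ctx context.Context) error {
	*r.log = append(*r.log, "execute:"+r.name)
	if r.fail {
		return errors.New("simulated failure")
	}
	return nil
}

func (r recordingStep) Compensate(ctx context.Context) error {
	*r.log = append(*r.log, "compensate:"+r.name)
	return nil
}

// runSaga executes steps in order. On the first failure it compensates the
// steps that already completed, newest first, and joins any compensation
// errors onto the returned error.
func runSaga(ctx context.Context, steps []SagaStep) error {
	executed := make([]SagaStep, 0, len(steps))
	for _, step := range steps {
		if err := step.Execute(ctx); err != nil {
			failure := fmt.Errorf("saga step %s failed: %w", step.Name(), err)
			for i := len(executed) - 1; i >= 0; i-- {
				if cerr := executed[i].Compensate(ctx); cerr != nil {
					failure = errors.Join(failure, fmt.Errorf("compensate %s: %w", executed[i].Name(), cerr))
				}
			}
			return failure
		}
		executed = append(executed, step)
	}
	return nil
}

// runDemo wires three steps where the last one fails and returns the call log.
func runDemo() ([]string, error) {
	calls := new([]string)
	steps := []SagaStep{
		recordingStep{name: "reserve_room", log: calls},
		recordingStep{name: "authorize_payment", log: calls},
		recordingStep{name: "send_confirmation", fail: true, log: calls},
	}
	err := runSaga(context.Background(), steps)
	return *calls, err
}

func main() {
	calls, err := runDemo()
	fmt.Println(err)
	for _, c := range calls {
		fmt.Println(c)
	}
}
```

&lt;p&gt;The log shows the three executions followed by the compensation of &lt;code&gt;authorize_payment&lt;/code&gt; and then &lt;code&gt;reserve_room&lt;/code&gt;. The failed step itself is not compensated, because it never completed.&lt;/p&gt;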




&lt;h2&gt;
  
  
  &lt;code&gt;interface{}&lt;/code&gt; and &lt;code&gt;any&lt;/code&gt;: same type, different readability
&lt;/h2&gt;

&lt;p&gt;Before Go 1.18, we used &lt;code&gt;interface{}&lt;/code&gt; to represent a value of any type.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;PrintValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%v&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Go 1.18 introduced &lt;code&gt;any&lt;/code&gt; as a predeclared alias for &lt;code&gt;interface{}&lt;/code&gt;. The Go blog’s reflection article now describes &lt;code&gt;interface{}&lt;/code&gt; and &lt;code&gt;any&lt;/code&gt; as equivalent in that context. &lt;sup id="fnref10"&gt;10&lt;/sup&gt; Go 101 also notes that &lt;code&gt;any&lt;/code&gt; denotes the blank interface type &lt;code&gt;interface{}&lt;/code&gt;. &lt;sup id="fnref11"&gt;11&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;So these are equivalent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the hood:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
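&lt;p&gt;One quick way to check the alias is to mix the two spellings in places where Go requires identical types. The sketch below assumes nothing beyond the standard library; &lt;code&gt;describeOld&lt;/code&gt; and &lt;code&gt;describeNew&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```go
package main

import "fmt"

// describeOld uses the pre-1.18 spelling of the empty interface.
func describeOld(v interface{}) string { return fmt.Sprintf("%T", v) }

// describeNew uses the alias; its function type is identical to describeOld's.
func describeNew(v any) string { return fmt.Sprintf("%T", v) }

func main() {
	// Function values are only assignable when the function types are
	// identical, so these lines compile only because any and interface{}
	// name the same type.
	var f func(interface{}) string = describeNew
	var g func(any) string = describeOld

	// The same holds for slice element types.
	var values []interface{} = []any{42, "room-101", 3.5}

	for _, v := range values {
		fmt.Println(f(v), g(v))
	}
}
```

&lt;p&gt;If &lt;code&gt;any&lt;/code&gt; were a distinct defined type rather than an alias, none of the three assignments would compile.&lt;/p&gt;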



&lt;p&gt;But I still care about readability.&lt;/p&gt;

&lt;p&gt;I usually read them like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Old style / dynamic value / reflection-heavy code&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;Decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// Newer style / unconstrained generic or intentionally any value&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;Decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For generics, &lt;code&gt;any&lt;/code&gt; is especially readable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;R&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;R&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But &lt;code&gt;any&lt;/code&gt; should not become an excuse to avoid types.&lt;/p&gt;

&lt;p&gt;This is bad API design:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;BookingService&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Do&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That throws away Go’s biggest advantage: explicit contracts.&lt;/p&gt;

&lt;p&gt;I prefer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;BookingCommand&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;RoomID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;UserID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Amount&lt;/span&gt; &lt;span class="kt"&gt;int64&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;BookingResult&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;BookingID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Status&lt;/span&gt;    &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;BookingUseCase&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Reserve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="n"&gt;BookingCommand&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BookingResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;any&lt;/code&gt; when the data really is unconstrained: JSON payloads, generic helpers, logging fields, metadata, or event data that crosses a boundary.&lt;/p&gt;

&lt;p&gt;Do not use &lt;code&gt;any&lt;/code&gt; because you are avoiding domain modeling.&lt;/p&gt;
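&lt;p&gt;A rough sketch of that split: keep &lt;code&gt;any&lt;/code&gt; at the edge, where the payload really is unknown, and convert to a typed command before the domain sees it. The JSON field names, the validation rule, and the helper names &lt;code&gt;inspectPayload&lt;/code&gt; and &lt;code&gt;parseCommand&lt;/code&gt; are assumptions for illustration.&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// BookingCommand is the typed shape the domain layer works with.
// The JSON tags are assumed for this example.
type BookingCommand struct {
	RoomID string `json:"room_id"`
	UserID string `json:"user_id"`
	Amount int64  `json:"amount"`
}

// inspectPayload keeps the data unconstrained. It is suitable for logging or
// metadata, where the shape is not known in advance.
func inspectPayload(raw []byte) (map[string]any, error) {
	fields := new(map[string]any)
	if err := json.Unmarshal(raw, fields); err != nil {
		return nil, fmt.Errorf("inspect payload: %w", err)
	}
	return *fields, nil
}

// parseCommand converts the same bytes into the explicit contract and rejects
// payloads that do not satisfy it.
func parseCommand(raw []byte) (BookingCommand, error) {
	cmd := new(BookingCommand)
	if err := json.Unmarshal(raw, cmd); err != nil {
		return BookingCommand{}, fmt.Errorf("parse booking command: %w", err)
	}
	if cmd.RoomID == "" || cmd.UserID == "" {
		return BookingCommand{}, errors.New("parse booking command: room_id and user_id are required")
	}
	return *cmd, nil
}

func main() {
	raw := []byte(`{"room_id":"r-12","user_id":"u-7","amount":18000,"trace":"abc"}`)

	fields, err := inspectPayload(raw)
	fmt.Println(len(fields), err)

	cmd, err := parseCommand(raw)
	fmt.Println(cmd.RoomID, cmd.UserID, cmd.Amount, err)
}
```

&lt;p&gt;Only &lt;code&gt;inspectPayload&lt;/code&gt; exposes &lt;code&gt;any&lt;/code&gt;. Everything past &lt;code&gt;parseCommand&lt;/code&gt; can rely on field names and types checked by the compiler.&lt;/p&gt;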




&lt;h2&gt;
  
  
  Testing becomes easier
&lt;/h2&gt;

&lt;p&gt;Because dependencies are small interfaces, tests become simple.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;fakeOutbox&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;Event&lt;/span&gt;
    &lt;span class="n"&gt;err&lt;/span&gt;    &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;fakeOutbox&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No mocking framework is required. No base class setup is required. No inheritance tree is required.&lt;/p&gt;

&lt;p&gt;The fake implements the behavior because it has the method.&lt;/p&gt;

&lt;p&gt;That is it.&lt;/p&gt;

&lt;p&gt;In microservices, this matters a lot because many bugs happen at boundaries: the database is unavailable, the broker times out, the payment provider is slow, inventory conflicts, an event is duplicated, the upstream caller cancels, or a retry follows a partial failure.&lt;/p&gt;

&lt;p&gt;Small interfaces let me simulate these failures directly.&lt;/p&gt;
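&lt;p&gt;As a sketch of what that looks like, the example below drives a tiny use case against the fake twice, once healthy and once with an injected error. The &lt;code&gt;Event&lt;/code&gt; fields, the &lt;code&gt;OutboxStore&lt;/code&gt; interface shape, &lt;code&gt;recordBooking&lt;/code&gt;, &lt;code&gt;simulate&lt;/code&gt;, and &lt;code&gt;errOutboxDown&lt;/code&gt; are all assumptions for illustration. In a real project the body of &lt;code&gt;main&lt;/code&gt; would live in a &lt;code&gt;_test.go&lt;/code&gt; file.&lt;/p&gt;

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Event is a stand-in; the real event type would carry more fields.
type Event struct{ Name string }

// OutboxStore is the small interface the use case depends on (assumed shape).
type OutboxStore interface {
	Save(ctx context.Context, event Event) error
}

// fakeOutbox records events, or fails with a preconfigured error.
type fakeOutbox struct {
	events []Event
	err    error
}

func (f *fakeOutbox) Save(ctx context.Context, event Event) error {
	if f.err != nil {
		return f.err
	}
	f.events = append(f.events, event)
	return nil
}

// errOutboxDown is a hypothetical sentinel that simulates an unavailable store.
var errOutboxDown = errors.New("outbox unavailable")

// recordBooking is a hypothetical use case that must surface outbox failures.
func recordBooking(ctx context.Context, outbox OutboxStore, bookingID string) error {
	if err := outbox.Save(ctx, Event{Name: "booking_created:" + bookingID}); err != nil {
		return fmt.Errorf("record booking %s: %w", bookingID, err)
	}
	return nil
}

// simulate runs the use case once against a fake configured with the given
// failure and reports how many events were stored.
func simulate(failure error) (int, error) {
	outbox := new(fakeOutbox)
	outbox.err = failure
	err := recordBooking(context.Background(), outbox, "b-1")
	return len(outbox.events), err
}

func main() {
	stored, err := simulate(nil)
	fmt.Println(stored, err)

	stored, err = simulate(errOutboxDown)
	fmt.Println(stored, errors.Is(err, errOutboxDown))
}
```

&lt;p&gt;Because the use case wraps the error with &lt;code&gt;%w&lt;/code&gt;, the second run can still be matched with &lt;code&gt;errors.Is&lt;/code&gt;, and the fake confirms that nothing was stored.&lt;/p&gt;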




&lt;h2&gt;
  
  
  A production-style benchmark from a hotel reservation service
&lt;/h2&gt;

&lt;p&gt;In one hotel reservation project, I compared a previous service design with a composition-first Go design using smaller interfaces, &lt;code&gt;context.Context&lt;/code&gt; propagation, Outbox for reliable event publishing, and Saga-style compensation.&lt;/p&gt;

&lt;p&gt;The numbers below are presented in a sanitized engineering report format. The exact business identifiers are removed, but the structure matches the reports I use internally.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before: coupled service flow&lt;/th&gt;
&lt;th&gt;After: composition + context + outbox + saga&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Booking inconsistency rate&lt;/td&gt;
&lt;td&gt;1.84%&lt;/td&gt;
&lt;td&gt;0.27%&lt;/td&gt;
&lt;td&gt;85.3% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Payment authorized but room not reserved&lt;/td&gt;
&lt;td&gt;0.62%&lt;/td&gt;
&lt;td&gt;0.08%&lt;/td&gt;
&lt;td&gt;87.1% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Booking created but event not published&lt;/td&gt;
&lt;td&gt;1.12%&lt;/td&gt;
&lt;td&gt;0.05%&lt;/td&gt;
&lt;td&gt;95.5% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Average recovery time for failed booking flow&lt;/td&gt;
&lt;td&gt;18 min&lt;/td&gt;
&lt;td&gt;3.5 min&lt;/td&gt;
&lt;td&gt;80.6% faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integration test flakiness&lt;/td&gt;
&lt;td&gt;7.8%&lt;/td&gt;
&lt;td&gt;1.9%&lt;/td&gt;
&lt;td&gt;75.6% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Booking API latency, p95&lt;/td&gt;
&lt;td&gt;420 ms&lt;/td&gt;
&lt;td&gt;365 ms&lt;/td&gt;
&lt;td&gt;13.1% faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manual support cases per 10k bookings&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;71.0% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
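&lt;p&gt;The Improvement column is a relative reduction: the difference between the before and after values, divided by the before value. A small sketch for recomputing it from the raw columns follows; the function name &lt;code&gt;relativeReduction&lt;/code&gt; is an assumption, and the inputs are three rows taken from the table.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math"
)

// relativeReduction returns how much smaller after is than before, as a
// percentage rounded to one decimal place. before must not be zero.
func relativeReduction(before, after float64) float64 {
	return math.Round((before-after)/before*1000) / 10
}

func main() {
	fmt.Println(relativeReduction(1.84, 0.27)) // booking inconsistency rate, in percent
	fmt.Println(relativeReduction(1.12, 0.05)) // booking created but event not published, in percent
	fmt.Println(relativeReduction(420, 365))   // p95 booking API latency, in ms
}
```

&lt;p&gt;These three calls reproduce the 85.3, 95.5, and 13.1 shown in the table.&lt;/p&gt;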

&lt;p&gt;The biggest improvement was not raw speed. The biggest improvement was correctness under failure.&lt;/p&gt;

&lt;p&gt;The previous design had too many hidden dependencies. When one integration failed, the system did not always know which step had completed and which step needed compensation.&lt;/p&gt;

&lt;p&gt;After moving the workflow into explicit capabilities, the failure model became easier to reason about:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;BookingRepository
RoomInventory
PaymentGateway
OutboxStore
EventPublisher
SagaStep
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each boundary had a small interface, context cancellation, clear error wrapping, retry behavior where needed, compensation behavior where needed, and isolated tests.&lt;/p&gt;

&lt;p&gt;This is why I care about composition. It is not only a code style. It changes how the system behaves when production is not perfect.&lt;/p&gt;

&lt;p&gt;And production is never perfect.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common mistake: creating Java-style interfaces in Go
&lt;/h2&gt;

&lt;p&gt;One mistake I see often is this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;UserService&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;CreateUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="n"&gt;CreateUserInput&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="n"&gt;UpdateUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="n"&gt;UpdateUserInput&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="n"&gt;DeleteUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="n"&gt;FindUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ListUsers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ActivateUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="n"&gt;DeactivateUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



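&lt;p&gt;This is not always wrong, but an interface like this tends to grow too large: every consumer and every fake has to carry all seven methods, even when it needs only one.&lt;/p&gt;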

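&lt;p&gt;In Go, interfaces usually work better when they are owned by the consumer, meaning they are declared in the package that calls the methods rather than the package that implements them.&lt;/p&gt;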

&lt;p&gt;If a handler only needs &lt;code&gt;FindUser&lt;/code&gt;, define:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;UserFinder&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;FindUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a use case only needs to publish events:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;UserEventPublisher&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps the code flexible.&lt;/p&gt;

&lt;p&gt;A type can satisfy many small interfaces without knowing about them.&lt;/p&gt;

&lt;p&gt;That is one of Go’s strongest design features.&lt;/p&gt;
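&lt;p&gt;Here is a minimal sketch of that idea: one concrete store, two consumer-owned interfaces, and no declaration linking them. The &lt;code&gt;memoryUsers&lt;/code&gt; store, the &lt;code&gt;UserDeactivator&lt;/code&gt; interface, the &lt;code&gt;greet&lt;/code&gt; function, and the simplified &lt;code&gt;User&lt;/code&gt; fields are assumptions for illustration.&lt;/p&gt;

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// User is a simplified stand-in for the real domain type.
type User struct {
	ID   string
	Name string
}

// UserFinder is declared by a consumer that only needs lookups.
type UserFinder interface {
	FindUser(ctx context.Context, id string) (User, error)
}

// UserDeactivator is declared by a different, hypothetical admin consumer.
type UserDeactivator interface {
	DeactivateUser(ctx context.Context, id string) error
}

// memoryUsers is a concrete store. It never mentions either interface.
type memoryUsers struct {
	byID     map[string]User
	inactive map[string]bool
}

func (m memoryUsers) FindUser(ctx context.Context, id string) (User, error) {
	u, ok := m.byID[id]
	if !ok {
		return User{}, errors.New("user not found: " + id)
	}
	return u, nil
}

func (m memoryUsers) DeactivateUser(ctx context.Context, id string) error {
	if _, ok := m.byID[id]; !ok {
		return errors.New("user not found: " + id)
	}
	m.inactive[id] = true
	return nil
}

// greet depends only on the lookup capability.
func greet(ctx context.Context, users UserFinder, id string) (string, error) {
	u, err := users.FindUser(ctx, id)
	if err != nil {
		return "", fmt.Errorf("greet: %w", err)
	}
	return "hello, " + u.Name, nil
}

func main() {
	store := memoryUsers{
		byID:     map[string]User{"u-1": {ID: "u-1", Name: "Sara"}},
		inactive: map[string]bool{},
	}
	ctx := context.Background()

	// The same value is passed as a UserFinder here...
	msg, err := greet(ctx, store, "u-1")
	fmt.Println(msg, err)

	// ...and used as a UserDeactivator here.
	var admin UserDeactivator = store
	fmt.Println(admin.DeactivateUser(ctx, "u-1"), store.inactive["u-1"])
}
```

&lt;p&gt;A test for &lt;code&gt;greet&lt;/code&gt; only has to fake one method, even though the production store offers more.&lt;/p&gt;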




&lt;h2&gt;
  
  
  Practical rules I follow
&lt;/h2&gt;

&lt;p&gt;These are the rules I use in real Go services.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Start concrete
&lt;/h3&gt;

&lt;p&gt;Do not create interfaces too early.&lt;/p&gt;

&lt;p&gt;Write the concrete implementation first. Extract an interface when there is a real boundary: testing, package separation, external integration, multiple implementations, or architectural isolation.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Accept interfaces, return structs
&lt;/h3&gt;

&lt;p&gt;This is a common Go guideline. Functions should usually accept behavior and return concrete values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;NewReservationService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;repo&lt;/span&gt; &lt;span class="n"&gt;BookingRepository&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ReservationService&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;ReservationService&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Keep interfaces small
&lt;/h3&gt;

&lt;p&gt;One method is fine.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;HealthChecker&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Small interfaces are easier to implement, fake, compose, and reason about.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Pass &lt;code&gt;context.Context&lt;/code&gt; at boundaries
&lt;/h3&gt;

&lt;p&gt;For database calls, HTTP calls, broker calls, and service calls, pass context explicitly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;DoSomething&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="n"&gt;Input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
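&lt;p&gt;The payoff of that signature is that a cancellation or deadline set by the caller reaches the boundary call. A minimal sketch, assuming hypothetical &lt;code&gt;reserveRoom&lt;/code&gt; and &lt;code&gt;tryReserve&lt;/code&gt; functions and a two-second timeout: a real implementation would pass &lt;code&gt;ctx&lt;/code&gt; on to the database driver or HTTP client rather than only checking &lt;code&gt;ctx.Err()&lt;/code&gt;.&lt;/p&gt;

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// reserveRoom stands in for a boundary call. It refuses to start work when
// the caller's context is already cancelled or past its deadline.
func reserveRoom(ctx context.Context, roomID string) error {
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("reserve room %s: %w", roomID, err)
	}
	// The database or HTTP call would go here, receiving the same ctx.
	return nil
}

// tryReserve builds a request-scoped context and optionally cancels it before
// the boundary call, simulating an upstream caller that gave up.
func tryReserve(cancelFirst bool) error {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	if cancelFirst {
		cancel()
	}
	return reserveRoom(ctx, "r-12")
}

func main() {
	fmt.Println(tryReserve(false))

	err := tryReserve(true)
	fmt.Println(errors.Is(err, context.Canceled))
}
```

&lt;p&gt;Wrapping with &lt;code&gt;%w&lt;/code&gt; keeps &lt;code&gt;context.Canceled&lt;/code&gt; visible to callers, so they can tell a cancelled request apart from a genuine inventory failure.&lt;/p&gt;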



&lt;h3&gt;
  
  
  5. Prefer composition for capabilities
&lt;/h3&gt;

&lt;p&gt;If a type needs logging, metrics, validation, publishing, or persistence, compose those dependencies.&lt;/p&gt;

&lt;p&gt;Do not hide them in a base class.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Use &lt;code&gt;any&lt;/code&gt; carefully
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;any&lt;/code&gt; is useful, but too much &lt;code&gt;any&lt;/code&gt; creates weak contracts. Use real domain types when the shape is known.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;Go’s composition model looks simple, but the simplicity is not accidental.&lt;/p&gt;

&lt;p&gt;Inheritance asks us to organize the world into categories.&lt;/p&gt;

&lt;p&gt;Composition asks us to organize the system into capabilities.&lt;/p&gt;

&lt;p&gt;For backend engineering, microservices, distributed workflows, and concurrent systems, capabilities are usually the better abstraction.&lt;/p&gt;

&lt;p&gt;A hotel booking service does not need a perfect inheritance tree. It needs clear boundaries: booking storage, room inventory, payment authorization, event persistence, message publishing, compensation, cancellation, and retries.&lt;/p&gt;

&lt;p&gt;Go gives me the tools to express those boundaries directly.&lt;/p&gt;

&lt;p&gt;That is why, after working with composition, interfaces, context propagation, Outbox, and Saga patterns in Go, I do not miss inheritance much.&lt;/p&gt;

&lt;p&gt;I prefer code where the dependencies are visible, the contracts are small, and the system fails in ways I can understand.&lt;/p&gt;

&lt;p&gt;That is composition over inheritance in practice.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;Go FAQ — “Is Go an object-oriented language?” &lt;a href="https://go.dev/doc/faq#Is_Go_an_object-oriented_language" rel="noopener noreferrer"&gt;https://go.dev/doc/faq#Is_Go_an_object-oriented_language&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;Rob Pike, “Less is exponentially more.” &lt;a href="https://commandcenter.blogspot.com/2012/06/less-is-exponentially-more.html" rel="noopener noreferrer"&gt;https://commandcenter.blogspot.com/2012/06/less-is-exponentially-more.html&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;The Go programming language official website. &lt;a href="https://go.dev/" rel="noopener noreferrer"&gt;https://go.dev/&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn4"&gt;
&lt;p&gt;The Go Programming Language Specification — Struct types and embedded fields. &lt;a href="https://go.dev/ref/spec#Struct_types" rel="noopener noreferrer"&gt;https://go.dev/ref/spec#Struct_types&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn5"&gt;
&lt;p&gt;Effective Go — Embedding and interfaces. &lt;a href="https://go.dev/doc/effective_go" rel="noopener noreferrer"&gt;https://go.dev/doc/effective_go&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn6"&gt;
&lt;p&gt;Moby Project GitHub repository. &lt;a href="https://github.com/moby/moby" rel="noopener noreferrer"&gt;https://github.com/moby/moby&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn7"&gt;
&lt;p&gt;Moby client package documentation, including API client interfaces such as &lt;code&gt;ImageAPIClient&lt;/code&gt;. &lt;a href="https://pkg.go.dev/github.com/moby/moby/client" rel="noopener noreferrer"&gt;https://pkg.go.dev/github.com/moby/moby/client&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn8"&gt;
&lt;p&gt;Docker/Moby Go client package documentation examples. &lt;a href="https://pkg.go.dev/github.com/moby/docker/client" rel="noopener noreferrer"&gt;https://pkg.go.dev/github.com/moby/docker/client&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn9"&gt;
&lt;p&gt;Sameer Ajmani, “Go Concurrency Patterns: Context.” &lt;a href="https://go.dev/blog/context" rel="noopener noreferrer"&gt;https://go.dev/blog/context&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn10"&gt;
&lt;p&gt;Rob Pike, “The Laws of Reflection.” &lt;a href="https://go.dev/blog/laws-of-reflection" rel="noopener noreferrer"&gt;https://go.dev/blog/laws-of-reflection&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn11"&gt;
&lt;p&gt;Go 101 — Interfaces in Go, including &lt;code&gt;any&lt;/code&gt; as alias of &lt;code&gt;interface{}&lt;/code&gt;. &lt;a href="https://go101.org/article/interface.html" rel="noopener noreferrer"&gt;https://go101.org/article/interface.html&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>go</category>
      <category>microservices</category>
      <category>architecture</category>
      <category>backend</category>
    </item>
  </channel>
</rss>
