<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Benard Otieno</title>
    <description>The latest articles on DEV Community by Benard Otieno (@benard_otieno_cdb9e6d4907).</description>
    <link>https://dev.to/benard_otieno_cdb9e6d4907</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3906480%2Ff72511ca-f111-4857-a72a-9939bb70a34b.png</url>
      <title>DEV Community: Benard Otieno</title>
      <link>https://dev.to/benard_otieno_cdb9e6d4907</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/benard_otieno_cdb9e6d4907"/>
    <language>en</language>
    <item>
      <title>Event-Driven Architecture: An Honest Assessment</title>
      <dc:creator>Benard Otieno</dc:creator>
      <pubDate>Sun, 24 May 2026 08:15:29 +0000</pubDate>
      <link>https://dev.to/benard_otieno_cdb9e6d4907/event-driven-architecture-an-honest-assessment-3jf6</link>
      <guid>https://dev.to/benard_otieno_cdb9e6d4907/event-driven-architecture-an-honest-assessment-3jf6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Event-driven systems are elegant in talks and brutal in production. After building and operating them across multiple companies, here is what nobody tells you before you commit to the pattern.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every few years the industry rediscovers event-driven architecture and&lt;br&gt;
decides it is the answer. The talks are compelling. Services decoupled&lt;br&gt;
from each other. No direct dependencies. Producers that emit events&lt;br&gt;
and never think about who consumes them. Consumers that react to what&lt;br&gt;
happened and never worry about who caused it. The system as a whole&lt;br&gt;
becomes a collection of independent actors responding to a shared stream&lt;br&gt;
of facts about the world.&lt;/p&gt;

&lt;p&gt;In the talk, this is clean. In production, it is one of the most&lt;br&gt;
operationally demanding patterns in software engineering, and the&lt;br&gt;
gap between how it is pitched and what it costs to run well is&lt;br&gt;
wider than for almost any other architectural pattern I can think of.&lt;/p&gt;

&lt;p&gt;I have built event-driven systems that worked well and event-driven&lt;br&gt;
systems that were disasters. The difference was not the technology&lt;br&gt;
and it was not the team's capability. It was whether the team went&lt;br&gt;
in with an accurate picture of what they were buying. Most teams&lt;br&gt;
do not get that picture before they commit. This article is an&lt;br&gt;
attempt to provide it.&lt;/p&gt;
&lt;h2&gt;What you actually get&lt;/h2&gt;

&lt;p&gt;Start with what is genuinely good, because there is genuine good here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decoupling that is real.&lt;/strong&gt; When the order service publishes an&lt;br&gt;
OrderPlaced event and knows nothing about who consumes it, and when&lt;br&gt;
the inventory service consumes OrderPlaced and knows nothing about&lt;br&gt;
who published it, you have achieved something meaningful. Either&lt;br&gt;
service can be redeployed without the other. Either can evolve&lt;br&gt;
its internal implementation without negotiating with the other.&lt;br&gt;
A new service can start consuming OrderPlaced tomorrow without&lt;br&gt;
touching the order service at all.&lt;/p&gt;
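
&lt;p&gt;A minimal in-process sketch of that property. The EventBus class and the handler names here are hypothetical stand-ins for illustration, not a real broker client. The point is that a second consumer subscribes without any change to the publishing code.&lt;/p&gt;

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Hypothetical in-memory stand-in for a real broker."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # The publisher only names the event; it never inspects who is listening.
        for handler in self._handlers[event_type]:
            handler(payload)


bus = EventBus()
reserved: list[str] = []
emailed: list[str] = []

# The inventory team's consumer.
bus.subscribe("OrderPlaced", lambda p: reserved.append(p["order_id"]))


def place_order(order_id: str) -> None:
    """Order service code: unchanged no matter how many consumers exist."""
    bus.publish("OrderPlaced", {"order_id": order_id})


place_order("order-1")

# A new team starts consuming later; place_order is not touched.
bus.subscribe("OrderPlaced", lambda p: emailed.append(p["order_id"]))
place_order("order-2")
```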

&lt;p&gt;This decoupling is the thing that makes large organisations with&lt;br&gt;
many teams possible. The team that owns the order service does not&lt;br&gt;
need to be in a meeting with every team that cares about orders.&lt;br&gt;
They publish the event. Every consumer team builds and maintains&lt;br&gt;
their own reaction to it. The coordination that would have been&lt;br&gt;
synchronous and blocking becomes asynchronous and independent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audit trails that emerge naturally.&lt;/strong&gt; If your events are your&lt;br&gt;
source of truth, you have a complete record of everything that&lt;br&gt;
happened in your system in the order it happened. Not just the&lt;br&gt;
current state, but the history of how you got there. This is&lt;br&gt;
genuinely useful for debugging, for compliance, and for the class&lt;br&gt;
of bugs that are almost impossible to diagnose without knowing&lt;br&gt;
what sequence of events preceded them.&lt;/p&gt;
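
&lt;p&gt;As a rough illustration, assuming a made-up list of stock events (the event type names are hypothetical), current state can be derived by folding over the log. The same fold stopped early answers what the state was at an earlier point.&lt;/p&gt;

```python
from typing import Iterable

# Hypothetical event log for one SKU, oldest first.
EVENTS = [
    {"type": "StockReceived", "sku": "widget", "qty": 10},
    {"type": "StockReserved", "sku": "widget", "qty": 3},
    {"type": "StockReserved", "sku": "widget", "qty": 4},
    {"type": "ReservationReleased", "sku": "widget", "qty": 3},
]


def available_stock(events: Iterable[dict]) -> int:
    """Fold the events into the available quantity."""
    available = 0
    for event in events:
        if event["type"] in ("StockReceived", "ReservationReleased"):
            available += event["qty"]
        elif event["type"] == "StockReserved":
            available -= event["qty"]
    return available


current = available_stock(EVENTS)        # state now
after_two = available_stock(EVENTS[:2])  # state after the first two events
```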

&lt;p&gt;&lt;strong&gt;Load handling that is structural rather than bolted on.&lt;/strong&gt; A&lt;br&gt;
consumer that reads from a queue processes work at the rate it&lt;br&gt;
can handle, regardless of the rate at which work arrives. The&lt;br&gt;
queue absorbs the spike. The consumer processes the backlog when&lt;br&gt;
capacity is available. This is structurally different from a&lt;br&gt;
synchronous system where a traffic spike hits the service directly&lt;br&gt;
and the service either handles it or falls over.&lt;/p&gt;
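
&lt;p&gt;A small sketch of the same idea using asyncio.Queue from the standard library as a stand-in for a broker. The burst size and the simulated work time are arbitrary assumptions, and a real queue has retention and size limits that this unbounded one does not.&lt;/p&gt;

```python
import asyncio


async def main() -> tuple[int, list[int]]:
    queue: asyncio.Queue = asyncio.Queue()
    processed: list[int] = []

    async def consumer() -> None:
        while True:
            item = await queue.get()
            await asyncio.sleep(0.001)  # simulated per-message work
            processed.append(item)
            queue.task_done()

    worker = asyncio.create_task(consumer())

    # A burst of 50 messages arrives at once; nothing is rejected.
    for i in range(50):
        queue.put_nowait(i)
    backlog_after_spike = queue.qsize()

    await queue.join()  # wait until the consumer has drained the backlog
    worker.cancel()
    return backlog_after_spike, processed


backlog, processed = asyncio.run(main())
```

&lt;p&gt;The producer finishes immediately and the consumer works through the backlog at its own pace, which is the behaviour described above.&lt;/p&gt;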

&lt;p&gt;These are real benefits. They are worth having. They are also&lt;br&gt;
not free.&lt;/p&gt;
&lt;h2&gt;The cost nobody quotes you upfront&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Eventual consistency is not a configuration option, it is a&lt;br&gt;
commitment.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you move from synchronous calls to events, you give up the&lt;br&gt;
ability to know, at any given moment, that all parts of your system&lt;br&gt;
agree on the current state. The order was placed. The OrderPlaced&lt;br&gt;
event was published. The inventory service will consume it and&lt;br&gt;
reserve the stock. When? Soon. How soon? Depends on the consumer's&lt;br&gt;
lag. What if the user queries their order status right now, before&lt;br&gt;
the inventory service has processed the event? The order exists&lt;br&gt;
in the order service's view. The inventory has not yet been&lt;br&gt;
reserved. Each service is consistent with its own data, but the&lt;br&gt;
system as a whole is not yet consistent.&lt;/p&gt;

&lt;p&gt;For many use cases this is acceptable. For some it is not. The&lt;br&gt;
teams that adopt event-driven architecture without thinking carefully&lt;br&gt;
about which of their use cases fall into which category discover&lt;br&gt;
the hard way that "eventually" can mean milliseconds, seconds,&lt;br&gt;
or minutes depending on what else is happening, and that users&lt;br&gt;
do not have the same patience for eventual consistency that&lt;br&gt;
architects do.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The problem that bites teams who haven't thought this through:
&lt;/span&gt;
&lt;span class="c1"&gt;# User places an order. Order service publishes event.
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;place_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;event_bus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OrderPlaced&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;

&lt;span class="c1"&gt;# User immediately queries order status.
# Order service returns order with status "pending".
# Inventory service has not yet processed the event.
# Frontend shows "pending" with a spinner.
# User refreshes. Still pending. Refreshes again.
# Inventory event processes. Status updates to "confirmed".
# User has refreshed four times and is on the phone with support.
&lt;/span&gt;
&lt;span class="c1"&gt;# The system was correct the entire time.
# The user experience was broken the entire time.
# These are not contradictory.
&lt;/span&gt;
&lt;span class="c1"&gt;# What teams often do to address this:
# Read-your-own-writes consistency for the immediate response,
# combined with a clear UI state that communicates processing is happening.
&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;place_order_with_ux_in_mind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;event_bus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OrderPlaced&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your order is being confirmed. This usually takes a few seconds.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;poll_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/orders/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="c1"&gt;# Give the client a signal about what to do next,
&lt;/span&gt;        &lt;span class="c1"&gt;# rather than leaving them in ambiguous pending state
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
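
&lt;p&gt;One way the status handler behind that poll URL might look, sketched with in-memory stores. The names ORDERS, INVENTORY_CONFIRMED and retry_after_seconds are assumptions for illustration. The handler reads the order service's own data, so an order the user has just placed is always found, even when no downstream consumer has run yet.&lt;/p&gt;

```python
# Hypothetical in-memory stores standing in for the order service's database
# and for the confirmations it has received from the inventory service.
ORDERS: dict[str, dict] = {}
INVENTORY_CONFIRMED: set[str] = set()


def save_order(order_id: str, items: list[str]) -> None:
    ORDERS[order_id] = {"order_id": order_id, "items": items}


def on_inventory_reserved(order_id: str) -> None:
    """Called when the InventoryReserved event reaches the order service."""
    INVENTORY_CONFIRMED.add(order_id)


def order_status(order_id: str) -> dict:
    """Status handler that reads only the order service's own store."""
    if order_id not in ORDERS:
        return {"order_id": order_id, "status": "not_found"}
    if order_id in INVENTORY_CONFIRMED:
        return {"order_id": order_id, "status": "confirmed"}
    # Known order, downstream work still pending: say so explicitly.
    return {"order_id": order_id, "status": "processing", "retry_after_seconds": 2}


save_order("order-1", ["widget"])
before = order_status("order-1")
on_inventory_reserved("order-1")
after = order_status("order-1")
```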



&lt;p&gt;&lt;strong&gt;Debugging across services is a different discipline entirely.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In a synchronous system, a bug has a call stack. You look at the&lt;br&gt;
stack trace and you see exactly what called what in what order.&lt;br&gt;
The sequence is right there.&lt;/p&gt;

&lt;p&gt;In an event-driven system, the equivalent of a call stack is a&lt;br&gt;
trace across multiple services, potentially across multiple events,&lt;br&gt;
potentially hours or days apart. OrderPlaced fires. InventoryReserved&lt;br&gt;
fires. PaymentProcessed fires. FulfillmentCreated fires. ShipmentCreated&lt;br&gt;
fires. The user reports that their order is stuck. You need to find&lt;br&gt;
where in this sequence something went wrong, knowing that each step&lt;br&gt;
is in a different service with different logs, possibly with events&lt;br&gt;
that were consumed out of order, possibly with a consumer that failed&lt;br&gt;
silently and moved on.&lt;/p&gt;

&lt;p&gt;Without distributed tracing that propagates correlation IDs across&lt;br&gt;
every event, this debugging is archaeology. You are sifting through&lt;br&gt;
log files from multiple services trying to reconstruct what happened&lt;br&gt;
to a specific order that a specific user placed at a specific time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Event envelope that makes debugging survivable.
# Every event carries a correlation ID from the originating request.
# Every subsequent event in the chain inherits it.
# Every service logs it with every operation related to the event.
# When debugging, filter all service logs by correlation ID
# and you reconstruct the full sequence.
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;
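&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;structlog&lt;/span&gt;  &lt;span class="c1"&gt;# assumed: the key-value log calls below match structlog&lt;/span&gt;

&lt;span class="n"&gt;log&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;structlog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_logger&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;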


&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
    &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;correlation_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
    &lt;span class="n"&gt;causation_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;    &lt;span class="c1"&gt;# ID of the event that caused this one
&lt;/span&gt;    &lt;span class="n"&gt;occurred_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;schema_version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;caused_by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parent_event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;correlation_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parent_event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;correlation_id&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;causation_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parent_event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;


&lt;span class="c1"&gt;# When the inventory service handles OrderPlaced and emits InventoryReserved:
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_order_placed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inventory.reservation.started&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;correlation_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;correlation_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;reserve_inventory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;reserved_event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;InventoryReserved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;caused_by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# Inherits correlation_id, sets causation_id
&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;event_bus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reserved_event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inventory.reservation.completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;correlation_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;correlation_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;caused_event_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;reserved_event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this envelope, every event in a chain shares a correlation ID.&lt;br&gt;
Every log line from every service that handled any event in the chain&lt;br&gt;
includes that correlation ID. Debugging a stuck order is a single&lt;br&gt;
log query: show me every log line with this correlation ID, across&lt;br&gt;
all services, sorted by time.&lt;/p&gt;

&lt;p&gt;Without this, you do not have an event-driven system you can operate.&lt;br&gt;
You have a system that works until it breaks and then you cannot&lt;br&gt;
find out why.&lt;/p&gt;
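
&lt;p&gt;In practice that query runs in whatever log platform you use. As a plain-Python sketch of what it does, assuming JSON log lines with hypothetical ts, service, event and correlation_id fields:&lt;/p&gt;

```python
import json
from datetime import datetime

# Hypothetical structured log lines gathered from several services, unsorted.
RAW_LOGS = [
    '{"ts": "2026-05-24T08:00:02+00:00", "service": "inventory", '
    '"event": "inventory.reservation.completed", "correlation_id": "abc"}',
    '{"ts": "2026-05-24T08:00:00+00:00", "service": "order", '
    '"event": "order.created", "correlation_id": "abc"}',
    '{"ts": "2026-05-24T08:00:01+00:00", "service": "payment", '
    '"event": "payment.started", "correlation_id": "xyz"}',
    '{"ts": "2026-05-24T08:00:01+00:00", "service": "inventory", '
    '"event": "inventory.reservation.started", "correlation_id": "abc"}',
]


def trace(raw_lines: list[str], correlation_id: str) -> list[dict]:
    """Return every record for one correlation ID, oldest first."""
    records = [json.loads(line) for line in raw_lines]
    matching = [r for r in records if r.get("correlation_id") == correlation_id]
    # Parse the timestamps rather than comparing strings, so mixed
    # precision or offsets still sort correctly.
    return sorted(matching, key=lambda r: datetime.fromisoformat(r["ts"]))


timeline = [(r["service"], r["event"]) for r in trace(RAW_LOGS, "abc")]
```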

&lt;p&gt;&lt;strong&gt;Consumer failures are invisible by default.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In a synchronous system, a failure is loud. The call throws an&lt;br&gt;
exception. The caller gets an error. Someone notices.&lt;/p&gt;

&lt;p&gt;In an event-driven system, a consumer failure can be completely&lt;br&gt;
silent. The consumer reads the event, fails to process it, and&lt;br&gt;
depending on how it is configured, either requeues the event, moves&lt;br&gt;
it to a dead letter queue, or discards it and moves on. The producer&lt;br&gt;
never knows. The other consumers never know. The user whose order&lt;br&gt;
triggered the event never knows until they notice that something&lt;br&gt;
downstream has not happened.&lt;/p&gt;
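
&lt;p&gt;A sketch of the least-bad of those three configurations: retry a bounded number of times, then park the event with its error instead of discarding it. The retry budget, the consume function and both handlers are hypothetical, and a real broker usually provides the redelivery and dead-lettering itself.&lt;/p&gt;

```python
from typing import Callable

MAX_ATTEMPTS = 3  # assumed retry budget


def consume(
    event: dict,
    handler: Callable[[dict], None],
    dead_letters: list[dict],
) -> bool:
    """Try the handler up to MAX_ATTEMPTS times, then park the event.

    Returns True if the event was processed, False if it was dead-lettered.
    The event is never silently dropped.
    """
    last_error = ""
    for _ in range(MAX_ATTEMPTS):
        try:
            handler(event)
            return True
        except Exception as exc:  # broad on purpose: any failure counts
            last_error = f"{type(exc).__name__}: {exc}"
    dead_letters.append(
        {"event": event, "attempts": MAX_ATTEMPTS, "last_error": last_error}
    )
    return False


dlq: list[dict] = []
calls = {"count": 0}


def flaky_handler(event: dict) -> None:
    """Fails on the first call, succeeds afterwards."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise TimeoutError("inventory database timed out")


def broken_handler(event: dict) -> None:
    """Fails every time, as a handler with a bug would."""
    raise KeyError("sku")


ok = consume({"order_id": "order-1"}, flaky_handler, dlq)
parked = consume({"order_id": "order-2"}, broken_handler, dlq)
```

&lt;p&gt;The transient failure recovers on retry. The persistent one ends up in the dead letter list with enough context to investigate, which only helps if someone looks at it.&lt;/p&gt;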

&lt;p&gt;Dead letter queues are the standard answer to this and they are&lt;br&gt;
the right answer, but they are only useful if someone is watching&lt;br&gt;
them. A dead letter queue that nobody monitors is not a safety net.&lt;br&gt;
It is a place where failed events go to be forgotten.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Dead letter queue monitoring that actually alerts
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;prometheus_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Gauge&lt;/span&gt;

&lt;span class="n"&gt;sqs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;DLQ_DEPTH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Gauge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqs_dlq_message_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Number of messages in dead letter queue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;queue_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;


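&lt;span class="c1"&gt;# boto3 calls block, so this is a plain function. Run it on a schedule
# or in a worker thread rather than awaiting it on the event loop.
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_dlq_depths&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;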
    &lt;span class="n"&gt;queues&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order-processing-dlq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://sqs.region.amazonaws.com/account/order-processing-dlq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inventory-dlq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://sqs.region.amazonaws.com/account/inventory-dlq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payment-dlq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://sqs.region.amazonaws.com/account/payment-dlq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fulfillment-dlq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://sqs.region.amazonaws.com/account/fulfillment-dlq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;queue_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;queue_url&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;queues&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_queue_attributes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;QueueUrl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;queue_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;AttributeNames&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ApproximateNumberOfMessages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Attributes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ApproximateNumberOfMessages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;DLQ_DEPTH&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;queue_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;queue_name&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# Alert rule: any DLQ with more than 0 messages is an incident.
# Not a warning. An incident.
# A message in the DLQ means an event failed to process.
# That means something the user expected to happen did not happen.
# That is always worth investigating immediately.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Any message in a dead letter queue is a symptom of a real problem.&lt;br&gt;
Not a might-be-a-problem. A real problem. Treating DLQ depth as&lt;br&gt;
a metric that alerts on anything above zero sets the expectation that&lt;br&gt;
failures are real and visible, rather than background noise to be&lt;br&gt;
managed.&lt;/p&gt;
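&lt;p&gt;The alert rule can be sketched as a plain function, assuming the&lt;br&gt;
queue depths have already been collected into a dictionary. The names&lt;br&gt;
&lt;code&gt;DlqIncident&lt;/code&gt; and &lt;code&gt;find_dlq_incidents&lt;/code&gt; are hypothetical, as is&lt;br&gt;
the &lt;code&gt;inventory-dlq&lt;/code&gt; queue in the usage note. In practice this rule&lt;br&gt;
would more likely live in the alerting system than in application code.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DlqIncident:
    """One dead letter queue that currently holds messages (hypothetical type)."""
    queue_name: str
    depth: int


def find_dlq_incidents(depths: dict[str, int]) -> list[DlqIncident]:
    """Return one incident per queue whose depth is above zero.

    There is deliberately no warning tier: a single message is enough.
    """
    return [
        DlqIncident(queue_name=name, depth=depth)
        for name, depth in sorted(depths.items())
        if depth > 0
    ]
```

&lt;p&gt;Calling it with &lt;code&gt;{"fulfillment-dlq": 0, "inventory-dlq": 2}&lt;/code&gt; yields a&lt;br&gt;
single incident for the queue holding two messages.&lt;/p&gt;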
&lt;h2&gt;
  
  
  The schema problem that grows until it bites you
&lt;/h2&gt;

&lt;p&gt;Events are a public interface. Once a consumer is reading your events,&lt;br&gt;
the schema of those events is a contract. Changing the schema breaks&lt;br&gt;
the consumer.&lt;/p&gt;

&lt;p&gt;In a synchronous API, schema evolution is managed through versioning.&lt;br&gt;
The producer runs V1 and V2 of the endpoint simultaneously. Consumers&lt;br&gt;
migrate at their own pace. When all consumers have migrated, V1 is&lt;br&gt;
retired.&lt;/p&gt;

&lt;p&gt;In an event-driven system, the equivalent is possible but operationally&lt;br&gt;
harder. If you change the OrderPlaced event schema, you need every&lt;br&gt;
consumer of OrderPlaced to be updated before you change the schema, or&lt;br&gt;
the consumer needs to handle both old and new schemas simultaneously,&lt;br&gt;
or you need to maintain two event types in parallel during migration.&lt;br&gt;
None of these options is cheap, but all of them are cheaper if you&lt;br&gt;
planned for them than if you did not.&lt;/p&gt;

&lt;p&gt;The teams that handle this well establish schema governance before&lt;br&gt;
they have a schema problem. Not after.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Schema versioning that makes evolution manageable.
# Every event schema has an explicit version.
# Consumers declare which versions they can handle.
# The event bus routes accordingly.
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;


&lt;span class="c1"&gt;# Version 1 of OrderPlaced
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OrderPlacedV1&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;schema_version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;occurred_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;


&lt;span class="c1"&gt;# Version 2 adds line items and changes total to be in cents
# to avoid floating point issues that V1 had
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OrderPlacedV2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;schema_version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;total_cents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;          &lt;span class="c1"&gt;# Changed: was "total: float"
&lt;/span&gt;    &lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;             &lt;span class="c1"&gt;# Added
&lt;/span&gt;    &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;         &lt;span class="c1"&gt;# Added
&lt;/span&gt;    &lt;span class="n"&gt;occurred_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;


&lt;span class="c1"&gt;# Consumer that handles both versions during migration period
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InventoryConsumer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw_event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw_event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schema_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OrderPlacedV1&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;raw_event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;order_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;
            &lt;span class="c1"&gt;# V1 doesn't have items, so we have to fetch them
&lt;/span&gt;            &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_items&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OrderPlacedV2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;raw_event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;order_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;
            &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inventory.consumer.unknown_schema_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;raw_event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;UnknownSchemaVersionError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reserve_inventory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Schema versioning adds boilerplate. The alternative is finding out&lt;br&gt;
during an incident that a schema change broke a consumer that nobody&lt;br&gt;
knew was depending on the old format.&lt;/p&gt;
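&lt;p&gt;One way to keep the version branching out of the handler body is to&lt;br&gt;
upcast old payloads to the newest shape at the edge of the consumer.&lt;br&gt;
The sketch below is an assumption rather than part of the design above:&lt;br&gt;
&lt;code&gt;upcast_v1_to_v2&lt;/code&gt; is a hypothetical helper, and because a 1.0 event&lt;br&gt;
carries neither line items nor a currency, the caller has to supply both.&lt;/p&gt;

```python
from decimal import Decimal, ROUND_HALF_UP


def upcast_v1_to_v2(raw_v1: dict, items: list[dict], currency: str) -> dict:
    """Return a 2.0-shaped copy of a 1.0 OrderPlaced payload.

    `items` and `currency` are not present in 1.0 events, so they are
    passed in explicitly instead of being guessed here.
    """
    upgraded = {key: value for key, value in raw_v1.items() if key != "total"}

    # Convert via str() so 19.99 becomes Decimal("19.99") rather than the
    # long binary expansion of the float, then round to whole cents.
    cents = (Decimal(str(raw_v1["total"])) * 100).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP
    )

    upgraded.update(
        schema_version="2.0",
        total_cents=int(cents),
        currency=currency,
        items=list(items),
    )
    return upgraded
```

&lt;p&gt;With this in place the handler parses only &lt;code&gt;OrderPlacedV2&lt;/code&gt;, and the&lt;br&gt;
upcaster can be deleted once no 1.0 events remain in flight.&lt;/p&gt;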
&lt;h2&gt;
  
  
  When event-driven architecture is the wrong answer
&lt;/h2&gt;

&lt;p&gt;The pattern is not universally appropriate. Teams adopt it when&lt;br&gt;
they should not, attracted by the elegance and the conference talks,&lt;br&gt;
and then spend years paying costs that were not necessary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When you have one team and one service.&lt;/strong&gt; Events are an&lt;br&gt;
organisational boundary mechanism. If there is no organisational&lt;br&gt;
boundary, you are paying the operational cost of distributed&lt;br&gt;
messaging for no architectural benefit. A function call is faster,&lt;br&gt;
simpler, and easier to debug. A modular monolith with internal&lt;br&gt;
domain events gives you the architectural thinking without the&lt;br&gt;
operational overhead.&lt;/p&gt;
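&lt;p&gt;A minimal sketch of what internal domain events can look like inside&lt;br&gt;
a single process. &lt;code&gt;InProcessEventBus&lt;/code&gt; and this trimmed-down&lt;br&gt;
&lt;code&gt;OrderPlaced&lt;/code&gt; are hypothetical names for illustration. There is no&lt;br&gt;
broker, no serialisation, and no retry: publishing is a loop of&lt;br&gt;
function calls.&lt;/p&gt;

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    user_id: str


class InProcessEventBus:
    """Dispatch domain events to handlers with plain function calls."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: object) -> None:
        # Handlers run synchronously, in registration order, on the
        # caller's stack, so a failure shows up in an ordinary traceback.
        for handler in self._handlers[type(event)]:
            handler(event)
```

&lt;p&gt;The modules stay decoupled through the event type, and if a real&lt;br&gt;
organisational boundary appears later, the bus is the one place that&lt;br&gt;
has to change.&lt;/p&gt;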

&lt;p&gt;&lt;strong&gt;When your operations require immediate consistency.&lt;/strong&gt; Financial&lt;br&gt;
transactions. Inventory deduction that must be accurate at the&lt;br&gt;
moment of purchase. Medical record updates. Any situation where&lt;br&gt;
the user or the business cannot tolerate the state being temporarily&lt;br&gt;
inconsistent. In these cases eventual consistency is not a technical&lt;br&gt;
property to be engineered around. It makes the pattern fundamentally&lt;br&gt;
unsuitable.&lt;/p&gt;
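&lt;p&gt;For contrast, this is roughly what immediate consistency looks like&lt;br&gt;
when the stock and the order live in one database. It is a sketch using&lt;br&gt;
SQLite from the standard library. The &lt;code&gt;inventory&lt;/code&gt; and &lt;code&gt;orders&lt;/code&gt;&lt;br&gt;
tables and the &lt;code&gt;place_order&lt;/code&gt; function are assumptions made for the&lt;br&gt;
example.&lt;/p&gt;

```python
import sqlite3


def place_order(conn: sqlite3.Connection, order_id: str, sku: str, qty: int) -> bool:
    """Deduct stock and record the order in one transaction.

    Returns False, changing nothing, when there is not enough stock.
    Assumes tables inventory(sku, on_hand) and orders(order_id, sku, qty).
    """
    with conn:  # commits on normal exit, rolls back if an exception escapes
        cur = conn.execute(
            "UPDATE inventory SET on_hand = on_hand - ? "
            "WHERE sku = ? AND on_hand >= ?",
            (qty, sku, qty),
        )
        if cur.rowcount != 1:
            return False  # the guarded UPDATE matched no row: insufficient stock
        conn.execute(
            "INSERT INTO orders (order_id, sku, qty) VALUES (?, ?, ?)",
            (order_id, sku, qty),
        )
    return True
```

&lt;p&gt;No reader ever sees an order without its deduction. An event between&lt;br&gt;
those two writes would reintroduce exactly the window this code closes.&lt;/p&gt;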

&lt;p&gt;&lt;strong&gt;When your team does not have the operational maturity for it.&lt;/strong&gt;&lt;br&gt;
Event-driven systems require distributed tracing. They require DLQ&lt;br&gt;
monitoring. They require schema governance. They require expertise&lt;br&gt;
in at least one message broker technology. They require runbooks&lt;br&gt;
for consumer failure scenarios. Teams that are still establishing&lt;br&gt;
basic engineering practices should not add this operational surface&lt;br&gt;
area. Stabilise first. Adopt the pattern when you have the capacity&lt;br&gt;
to operate it correctly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When the communication pattern is inherently synchronous.&lt;/strong&gt;&lt;br&gt;
A user submits a form and expects a result. An API client makes&lt;br&gt;
a request and needs a response before it can proceed. A batch job&lt;br&gt;
reads data and produces a report. These patterns do not become&lt;br&gt;
better by adding events between the steps. They become more complex&lt;br&gt;
with no benefit. Forcing an asynchronous pattern onto an inherently&lt;br&gt;
synchronous workflow is an architecture astronaut move, not an&lt;br&gt;
engineering decision.&lt;/p&gt;
&lt;h2&gt;
  
  
  The systems that work
&lt;/h2&gt;

&lt;p&gt;The event-driven systems that work well in production share a set&lt;br&gt;
of properties that are not negotiable.&lt;/p&gt;

&lt;p&gt;Every event carries enough context to be processed without fetching&lt;br&gt;
additional data. A consumer that needs to make a synchronous call&lt;br&gt;
to process an event has a dependency on the producer that the event&lt;br&gt;
pattern was supposed to eliminate.&lt;/p&gt;

&lt;p&gt;Every consumer is idempotent. Most brokers default to at-least-once&lt;br&gt;
delivery, so the same event can arrive more than once. A consumer&lt;br&gt;
that is not idempotent will produce duplicate&lt;br&gt;
effects when this happens. Designing for idempotency upfront is&lt;br&gt;
straightforward. Retrofitting it after duplicate processing has&lt;br&gt;
caused data integrity issues is expensive.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Idempotent consumer using a processed events log
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_order_placed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Check if we have already processed this event
&lt;/span&gt;    &lt;span class="n"&gt;already_processed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;processed_events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;already_processed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inventory.consumer.duplicate_event_skipped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;correlation_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;correlation_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;

    &lt;span class="c1"&gt;# Process the event
&lt;/span&gt;    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;reserve_inventory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Record that we have processed it
&lt;/span&gt;    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;processed_events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;processed_at&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inventory-service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
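&lt;p&gt;A check followed by a separate record leaves a gap: two deliveries&lt;br&gt;
of the same event arriving close together can both pass the check, and&lt;br&gt;
a crash between the side effect and the record causes a reprocess. One&lt;br&gt;
way to close the gap, sketched here under the assumption that the&lt;br&gt;
processed-events log and the side effect share a database, is to claim&lt;br&gt;
the event with a unique key inside the same transaction. The table&lt;br&gt;
names and &lt;code&gt;reserve_once&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import sqlite3


def reserve_once(conn: sqlite3.Connection, event_id: str, order_id: str) -> bool:
    """Apply the reservation at most once per event_id.

    Assumes processed_events.event_id is a PRIMARY KEY, so a second
    insert of the same id is ignored instead of creating a row.
    """
    with conn:  # the claim and the side effect commit or roll back together
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event_id,),
        )
        if cur.rowcount == 0:
            return False  # duplicate delivery: already claimed
        conn.execute(
            "INSERT INTO reservations (order_id) VALUES (?)",
            (order_id,),
        )
    return True
```

&lt;p&gt;When the side effect lives in a different system from the log, this&lt;br&gt;
trick does not apply directly and the effect itself has to be made&lt;br&gt;
idempotent.&lt;/p&gt;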



&lt;p&gt;Every schema change goes through a review process before it is&lt;br&gt;
deployed. The review asks: which consumers will be affected? Have&lt;br&gt;
they been updated? Can the change be made backwards-compatible?&lt;br&gt;
If not, what is the migration plan?&lt;/p&gt;
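&lt;p&gt;Part of that review can be automated. The sketch below compares two&lt;br&gt;
field maps and lists the changes that would break a consumer still&lt;br&gt;
reading the old shape. &lt;code&gt;breaking_changes&lt;/code&gt; is a hypothetical helper,&lt;br&gt;
and it assumes consumers ignore fields they do not recognise, so added&lt;br&gt;
fields are not reported.&lt;/p&gt;

```python
def breaking_changes(
    old_fields: dict[str, type], new_fields: dict[str, type]
) -> list[str]:
    """List changes that break a consumer written against old_fields.

    Removed fields and changed types are breaking. Added fields are
    treated as compatible, on the assumption that consumers ignore
    unknown fields.
    """
    problems = []
    for name, old_type in sorted(old_fields.items()):
        if name not in new_fields:
            problems.append(f"removed field: {name}")
        elif new_fields[name] is not old_type:
            new_type = new_fields[name]
            problems.append(
                f"changed type: {name} ({old_type.__name__} to {new_type.__name__})"
            )
    return problems
```

&lt;p&gt;Run against the two OrderPlaced versions shown earlier, it flags the&lt;br&gt;
removal of &lt;code&gt;total&lt;/code&gt;, which is the change that needed a migration&lt;br&gt;
plan.&lt;/p&gt;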

&lt;p&gt;Every dead letter queue has an alert and an owner. DLQ messages&lt;br&gt;
are investigated the same day they appear. Not triaged. Not&lt;br&gt;
backlogged. Investigated.&lt;/p&gt;

&lt;p&gt;The teams that run event-driven systems well have built this&lt;br&gt;
infrastructure. It is not glamorous work. It does not appear&lt;br&gt;
in the conference talk about the elegant decoupling. It is the&lt;br&gt;
thing that makes the decoupling survivable in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest version of the pitch
&lt;/h2&gt;

&lt;p&gt;Event-driven architecture is worth adopting when you have multiple&lt;br&gt;
teams that need genuine autonomy, when your use cases can tolerate&lt;br&gt;
eventual consistency, when you have the operational maturity to run&lt;br&gt;
distributed messaging in production, and when the benefits of&lt;br&gt;
decoupling outweigh the costs of asynchrony and distribution.&lt;/p&gt;

&lt;p&gt;In those circumstances it is genuinely powerful. The teams that&lt;br&gt;
use it well will tell you they cannot imagine going back. The&lt;br&gt;
operational complexity feels like a fair trade for the organisational&lt;br&gt;
flexibility.&lt;/p&gt;

&lt;p&gt;In other circumstances it is an expensive mismatch between pattern&lt;br&gt;
and problem. The complexity of the pattern does not disappear because&lt;br&gt;
you chose it for the wrong reasons. It stays. You pay it.&lt;/p&gt;

&lt;p&gt;The evaluation should be honest about both sides. Not "events are&lt;br&gt;
the modern way to build systems" which is a fashion statement.&lt;br&gt;
Not "events are always too complex" which ignores where they&lt;br&gt;
genuinely excel. The honest version: this pattern solves specific&lt;br&gt;
organisational and scalability problems at a specific operational&lt;br&gt;
cost. Do the problems apply to us? Can we afford the cost? Those&lt;br&gt;
are the questions worth asking before you commit.&lt;/p&gt;

&lt;p&gt;The answer is sometimes yes. It is not always yes.&lt;br&gt;
And pretending otherwise is how teams end up with event-driven&lt;br&gt;
systems they cannot operate and cannot easily escape.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>tooling</category>
      <category>architecture</category>
      <category>eventdriven</category>
    </item>
    <item>
      <title>The Senior Engineer Who Stopped Coding</title>
      <dc:creator>Benard Otieno</dc:creator>
      <pubDate>Sun, 24 May 2026 07:56:23 +0000</pubDate>
      <link>https://dev.to/benard_otieno_cdb9e6d4907/the-senior-engineer-who-stopped-coding-4dh2</link>
      <guid>https://dev.to/benard_otieno_cdb9e6d4907/the-senior-engineer-who-stopped-coding-4dh2</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;At some point, many senior engineers quietly transition from building things to managing the building of things. This transition is often presented as growth. Sometimes it is. Often it is the beginning of a slow professional collapse.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There is a career transition that happens quietly, usually between&lt;br&gt;
year five and year ten of an engineering career, that nobody talks&lt;br&gt;
about honestly.&lt;/p&gt;

&lt;p&gt;The engineer starts getting pulled into more meetings. Their opinions&lt;br&gt;
are sought on architecture decisions. They review more code than they&lt;br&gt;
write. They spend increasing amounts of time on documents, on planning,&lt;br&gt;
on the work of coordination rather than the work of building. Their&lt;br&gt;
title changes. Their calendar fills up. Their pull request count drops&lt;br&gt;
toward zero.&lt;/p&gt;

&lt;p&gt;Everyone calls this growth. The engineer accepts that framing because&lt;br&gt;
it comes with a raise and an increased sense of importance. The team&lt;br&gt;
benefits because the engineer's experience is now multiplicative rather&lt;br&gt;
than additive. The organisation is satisfied because the senior&lt;br&gt;
headcount is being leveraged correctly.&lt;/p&gt;

&lt;p&gt;And then, two or three years later, the engineer sits down to build&lt;br&gt;
something and discovers that they cannot. The tools have moved on.&lt;br&gt;
The frameworks they knew have been superseded. The muscle memory&lt;br&gt;
of debugging and building and shipping is gone. They know what good&lt;br&gt;
looks like but they have lost the ability to produce it directly.&lt;/p&gt;

&lt;p&gt;They have become, without intending to, a person who talks about&lt;br&gt;
engineering rather than one who does it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why it happens
&lt;/h2&gt;

&lt;p&gt;The pull away from coding is not malicious. It is structural, and&lt;br&gt;
it happens because the incentives all point in one direction.&lt;/p&gt;

&lt;p&gt;Senior engineers are expensive. Organisations naturally want to&lt;br&gt;
extract maximum value from expensive headcount. The most visible&lt;br&gt;
form of leverage is influence: one senior engineer who reviews&lt;br&gt;
code, answers questions, and makes architectural decisions affects&lt;br&gt;
the output of five junior engineers. One senior engineer who&lt;br&gt;
writes code produces the output of one engineer, perhaps with&lt;br&gt;
higher quality. The leverage calculation is obvious and it is wrong.&lt;/p&gt;

&lt;p&gt;It is wrong because it treats engineering skill as static. It&lt;br&gt;
assumes the senior engineer who stops building remains as capable&lt;br&gt;
as the one who keeps building. This is not how skills work. Engineering&lt;br&gt;
is a practice. It requires continuous exercise. A surgeon who stops&lt;br&gt;
performing operations and moves into medical administration is&lt;br&gt;
not available to the hospital as a surgeon five years later. They&lt;br&gt;
have different skills now. Valuable skills. Not surgical ones.&lt;/p&gt;

&lt;p&gt;The same is true for engineers. An engineer who has not written&lt;br&gt;
production code in three years, who has not debugged a live incident&lt;br&gt;
at the code level, who has not sat with a new framework and built&lt;br&gt;
something real with it, has a different skill set than they did&lt;br&gt;
three years ago. They have institutional knowledge. They have judgment.&lt;br&gt;
They can read a system design and identify its weaknesses. They cannot&lt;br&gt;
build the system.&lt;/p&gt;

&lt;p&gt;The organisation pays for what it believes is a force multiplier and&lt;br&gt;
gets something valuable but different from what it expected. The&lt;br&gt;
engineer pays with capability they did not intend to give up.&lt;/p&gt;
&lt;h2&gt;
  
  
  The judgment without craft problem
&lt;/h2&gt;

&lt;p&gt;This transition produces a specific failure mode that is hard&lt;br&gt;
to see from the inside.&lt;/p&gt;

&lt;p&gt;An engineer who has stopped building retains their judgment about&lt;br&gt;
what good looks like, formed from their experience of building.&lt;br&gt;
This judgment is real and useful. The problem is that it begins to&lt;br&gt;
degrade in specific ways that are not immediately obvious.&lt;/p&gt;

&lt;p&gt;Good engineering judgment is not abstract. It is grounded in the&lt;br&gt;
current reality of what is possible, what is fast, what is painful,&lt;br&gt;
and what breaks. This reality changes constantly. The abstractions&lt;br&gt;
shift. The tooling improves. The performance characteristics of&lt;br&gt;
systems change as the underlying platforms evolve. The things that&lt;br&gt;
were hard five years ago are sometimes easy now. The things that&lt;br&gt;
seem easy from a distance are sometimes newly hard in ways that&lt;br&gt;
only become apparent when you try to build them.&lt;/p&gt;

&lt;p&gt;An engineer who is actively building updates their judgment&lt;br&gt;
continuously through contact with this reality. They try something,&lt;br&gt;
it does not work the way they expected, they update their model.&lt;br&gt;
They read the error message. They hit the limitation. They find&lt;br&gt;
the workaround. This continuous updating is not dramatic. It is&lt;br&gt;
the normal texture of building software. But it keeps the judgment&lt;br&gt;
calibrated.&lt;/p&gt;

&lt;p&gt;An engineer who is not actively building is running on cached&lt;br&gt;
judgment. Their model of what is hard and what is easy, what is&lt;br&gt;
fast and what is slow, is a snapshot from whenever they last&lt;br&gt;
built things seriously. This snapshot becomes less accurate over&lt;br&gt;
time, but the degradation is invisible to them because they have&lt;br&gt;
no direct contact with the current reality that would reveal the&lt;br&gt;
gap.&lt;/p&gt;

&lt;p&gt;The result is opinions that are confidently wrong in ways that&lt;br&gt;
only become apparent when someone tries to implement them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The kind of thing a cached-judgment engineer might propose:
# "Just add a caching layer in front of the database,
# it's straightforward, shouldn't take more than a day."
&lt;/span&gt;
&lt;span class="c1"&gt;# What the engineer building it discovers:
# Cache invalidation for this data model requires
# tracking relationships across four entity types.
# The cache stampede problem on cold starts needs handling.
# The serialisation format needs versioning.
# The TTL strategy needs to account for consistency requirements.
# Testing the cached paths needs a different test infrastructure.
&lt;/span&gt;
&lt;span class="c1"&gt;# What was "a day" is three weeks of careful engineering.
# The estimate came from judgment that had not been updated
# by the experience of actually doing it recently.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gap between proposed and actual is one of the most common&lt;br&gt;
sources of friction between senior engineers who have stepped back&lt;br&gt;
from coding and the engineers who implement their proposals. The&lt;br&gt;
implementers know the gap exists. The senior engineer often does not.&lt;/p&gt;
&lt;h2&gt;
  
  
  What gets lost that nobody mentions
&lt;/h2&gt;

&lt;p&gt;The obvious loss when an engineer stops coding is technical currency.&lt;br&gt;
The tools move. The frameworks change. This is real but it is&lt;br&gt;
also recoverable with deliberate effort.&lt;/p&gt;

&lt;p&gt;The less obvious losses are harder to recover.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The ability to estimate honestly.&lt;/strong&gt; Estimation requires recent&lt;br&gt;
experience of how long things actually take. An engineer who has&lt;br&gt;
not built anything in two years cannot accurately estimate how long&lt;br&gt;
it will take to build something now. They can produce a number.&lt;br&gt;
The number will be wrong in ways that are hard to detect until the&lt;br&gt;
project is underway. The engineer will believe the number because&lt;br&gt;
it matches their cached sense of how long things take.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The ability to detect complexity from code.&lt;/strong&gt; A senior engineer&lt;br&gt;
reading code knows when something is more complex than it should&lt;br&gt;
be. This knowledge comes from writing code and feeling the&lt;br&gt;
difference between something that reads cleanly because it is&lt;br&gt;
clean and something that reads cleanly because the complexity&lt;br&gt;
is hidden. After years without writing code, this sense dulls.&lt;br&gt;
The code review becomes shallower. The questions become more&lt;br&gt;
high-level. The subtle architectural problems that would have&lt;br&gt;
jumped off the page become invisible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The credibility to push back effectively.&lt;/strong&gt; An engineer who&lt;br&gt;
is actively building earns credibility with their team through&lt;br&gt;
demonstrated competence. When they say a design is wrong, the&lt;br&gt;
team believes them because they have seen the engineer be right&lt;br&gt;
before in ways that were verifiable. An engineer who has not&lt;br&gt;
built anything in years makes the same claim but with a different&lt;br&gt;
credibility structure. They may be right. The team has fewer&lt;br&gt;
ways to verify it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The feel for what is genuinely hard.&lt;/strong&gt; Some things are hard&lt;br&gt;
in ways that are not obvious from a design document. Race&lt;br&gt;
conditions that only appear under specific timing conditions.&lt;br&gt;
Data corruption that emerges from an edge case in a serialisation&lt;br&gt;
format. Performance problems that only manifest at production&lt;br&gt;
data volumes. An engineer who is building regularly encounters&lt;br&gt;
these and updates their internal map of where the hidden complexity&lt;br&gt;
lives. An engineer who is not building loses this map gradually,&lt;br&gt;
replaced by a more abstract and less accurate version.&lt;/p&gt;
&lt;h2&gt;
  
  
  The engineers who avoid this
&lt;/h2&gt;

&lt;p&gt;The senior engineers who do not fall into this pattern share&lt;br&gt;
something in common: they have made an explicit commitment to&lt;br&gt;
staying in contact with building, and they have made it in a&lt;br&gt;
way that is visible enough to be protected from schedule pressure.&lt;/p&gt;

&lt;p&gt;This takes different forms for different people.&lt;/p&gt;

&lt;p&gt;Some reserve a portion of every week for writing production code.&lt;br&gt;
Not prototypes. Not experiments. Production code that ships and&lt;br&gt;
runs and breaks and gets debugged. The amount varies but the&lt;br&gt;
commitment is treated as non-negotiable in the same way that&lt;br&gt;
a weekly architecture review is non-negotiable. It goes on the&lt;br&gt;
calendar. It survives meeting requests.&lt;/p&gt;

&lt;p&gt;Some take regular on-call shifts. Being on call forces&lt;br&gt;
contact with the operational reality of the system. You are&lt;br&gt;
not reading about incidents. You are debugging them. The debugger&lt;br&gt;
does not care about your title. It shows you the stack trace.&lt;br&gt;
You have to read it. This contact with reality is valuable&lt;br&gt;
precisely because it bypasses the abstraction layer that&lt;br&gt;
seniority tends to insert between an engineer and the system.&lt;/p&gt;

&lt;p&gt;Some maintain personal projects or contribute to open source&lt;br&gt;
with genuine commitment, not for portfolio reasons, but as&lt;br&gt;
a way of staying in contact with the experience of building&lt;br&gt;
without the organisational pressures that cause the drift in&lt;br&gt;
the first place.&lt;/p&gt;

&lt;p&gt;The common thread is deliberate, protected time for building&lt;br&gt;
that is treated as a first-class responsibility rather than&lt;br&gt;
something that happens when there is space in the schedule.&lt;br&gt;
There is never space in the schedule. Space has to be created&lt;br&gt;
and defended.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# An approach some senior engineers use to stay current:
# Build the tooling you wish existed for your team.
# Not a proof of concept. Not a demo.
# A tool that your team uses in production.
&lt;/span&gt;
&lt;span class="c1"&gt;# This has several properties that make it effective:
# 1. It forces contact with the current state of the tools
# 2. The output has real users who will tell you when it breaks
# 3. The scope is usually small enough to complete
# 4. The domain is something you understand deeply
# 5. It produces something valuable rather than just exercises
&lt;/span&gt;
&lt;span class="c1"&gt;# Examples that fit this pattern:
# - Internal CLI tools for deployment workflows
# - Monitoring dashboards the team actually uses
# - Code generation tools for boilerplate reduction
# - Test helpers that reduce the friction of writing tests
# - Performance profiling scripts for the specific system
&lt;/span&gt;
&lt;span class="c1"&gt;# The anti-pattern is the proof of concept that lives in a branch
# and is shown at a demo and never runs in production.
# That touches code without the accountability that makes it useful.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The honest conversation about titles
&lt;/h2&gt;

&lt;p&gt;There is a version of the senior engineer transition that is&lt;br&gt;
genuinely right. Some engineers find that the leverage work,&lt;br&gt;
the architecture work, the mentoring and the writing and the&lt;br&gt;
planning, is where they create the most value and where they find&lt;br&gt;
the most satisfaction. They grow into staff or principal roles&lt;br&gt;
and they are genuinely excellent at them. They maintain enough&lt;br&gt;
technical currency to keep their judgment calibrated, even if&lt;br&gt;
they are not writing production code every week.&lt;/p&gt;

&lt;p&gt;There is another version, one that happens to engineers who&lt;br&gt;
would rather be building but are being pulled away from it by&lt;br&gt;
a combination of organisational incentives and social expectations&lt;br&gt;
about what seniority looks like. These engineers are losing&lt;br&gt;
something they valued, substituting something they value less,&lt;br&gt;
and calling it career growth because that is the available&lt;br&gt;
vocabulary.&lt;/p&gt;

&lt;p&gt;The engineering industry does not have good vocabulary for&lt;br&gt;
the engineer who wants to be deeply senior and deeply technical&lt;br&gt;
simultaneously. Individual contributor tracks exist at many&lt;br&gt;
companies but they are often treated as consolation prizes&lt;br&gt;
for engineers who were not good enough for management, rather&lt;br&gt;
than as the legitimate and valuable career paths they should be.&lt;/p&gt;

&lt;p&gt;The engineers who want to stay close to building need to name&lt;br&gt;
this explicitly, to themselves and to the organisations they&lt;br&gt;
work for. Not "I am not interested in management", which sounds&lt;br&gt;
like a limitation. But "I am most valuable and most effective&lt;br&gt;
when I am close to the code and I am going to protect that."&lt;br&gt;
This is a statement of professional judgment, not of career&lt;br&gt;
limitation.&lt;/p&gt;

&lt;p&gt;Organisations that understand this will make space for it.&lt;br&gt;
Organisations that do not will slowly pull the engineer away&lt;br&gt;
from the thing that made them valuable in the first place,&lt;br&gt;
and be confused about why the force multiplication stopped&lt;br&gt;
working.&lt;/p&gt;

&lt;h2&gt;
  
  
  The recovery
&lt;/h2&gt;

&lt;p&gt;For engineers who have drifted and want to return, the recovery&lt;br&gt;
is not complicated. It is uncomfortable, and it takes time, and&lt;br&gt;
that is all.&lt;/p&gt;

&lt;p&gt;The first step is accepting that capability has decayed. This&lt;br&gt;
is the hardest part. An engineer who has been treated as an&lt;br&gt;
authority for several years has to sit down with a codebase&lt;br&gt;
or a tool they do not fully understand and feel incompetent&lt;br&gt;
for a while. The internal experience of this is uncomfortable&lt;br&gt;
regardless of how much intellectual acceptance there is.&lt;/p&gt;

&lt;p&gt;The second step is building something small and finishing it.&lt;br&gt;
Not learning a new framework theoretically. Building something&lt;br&gt;
that ships and runs. Small enough that the finish line is&lt;br&gt;
visible. Real enough that it has users and breaks and requires&lt;br&gt;
debugging.&lt;/p&gt;

&lt;p&gt;The third step is doing this consistently enough that the&lt;br&gt;
muscle memory returns. This takes months, not weeks. The&lt;br&gt;
calibration of judgment to current reality takes longer than&lt;br&gt;
the recapture of technical mechanics, because it requires&lt;br&gt;
enough accumulated experience with the current state of things&lt;br&gt;
to have a reliable sense of where the hard parts are.&lt;/p&gt;

&lt;p&gt;None of this is dramatic. Engineering is a practice and like&lt;br&gt;
all practices it responds to consistent effort over time.&lt;br&gt;
The engineer who drifted and then returned, who has both the&lt;br&gt;
accumulated judgment of years of experience and the current&lt;br&gt;
calibration of active practice, is genuinely rare and genuinely&lt;br&gt;
valuable.&lt;/p&gt;

&lt;p&gt;Most do not make the return. The drift feels like growth.&lt;br&gt;
The calendar stays full. The meetings require the same preparation&lt;br&gt;
as building used to require. The work feels substantial because&lt;br&gt;
it is substantial. The specific thing that was lost does not&lt;br&gt;
announce its absence loudly.&lt;/p&gt;

&lt;p&gt;It is just gone, quietly, while nobody was looking.&lt;/p&gt;

&lt;p&gt;The best time to notice this is before it is very far along.&lt;br&gt;
The second best time is now.&lt;/p&gt;

</description>
      <category>career</category>
      <category>architecture</category>
      <category>devops</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Microservices Were Never About Technology</title>
      <dc:creator>Benard Otieno</dc:creator>
      <pubDate>Thu, 21 May 2026 14:20:37 +0000</pubDate>
      <link>https://dev.to/benard_otieno_cdb9e6d4907/microservices-were-never-about-technology-2ej7</link>
      <guid>https://dev.to/benard_otieno_cdb9e6d4907/microservices-were-never-about-technology-2ej7</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Every failed microservices adoption I have seen made the same mistake: treating microservices as an infrastructure pattern instead of an organisational one. The technology is the easy part. The hard part is everything else.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The microservices conversation in most engineering teams goes something&lt;br&gt;
like this. The monolith is getting unwieldy. Deployments are slow.&lt;br&gt;
The codebase is hard to navigate. A senior engineer proposes breaking&lt;br&gt;
things apart into services. The team agrees. They spend six months&lt;br&gt;
doing it. Things get worse.&lt;/p&gt;

&lt;p&gt;The services are too small or too large. Nobody agrees on where the&lt;br&gt;
boundaries should be. Simple features now require coordinating changes&lt;br&gt;
across three repositories. A bug that used to take twenty minutes to&lt;br&gt;
debug now takes two hours because it crosses service boundaries.&lt;br&gt;
Deployments are more frequent but individually more fragile. The team&lt;br&gt;
is working harder than before and moving slower than before.&lt;/p&gt;

&lt;p&gt;They blame the technology. The technology is not the problem.&lt;/p&gt;

&lt;p&gt;Microservices failed for them for the same reason they fail for most&lt;br&gt;
teams that adopt them: the team treated decomposition as a technical&lt;br&gt;
decision and ignored the organisational problem that microservices&lt;br&gt;
are actually designed to solve. You cannot separate the architecture&lt;br&gt;
from the team structure. Conway's Law is not optional.&lt;/p&gt;
&lt;h2&gt;
  
  
  What microservices were actually invented for
&lt;/h2&gt;

&lt;p&gt;Amazon is the origin story most people know. The mandate from Jeff&lt;br&gt;
Bezos in the early 2000s: all teams will expose their data and&lt;br&gt;
functionality through service interfaces. All teams will communicate&lt;br&gt;
through these interfaces. No other form of interprocess communication&lt;br&gt;
is allowed. Anyone who doesn't do this will be fired. He was serious.&lt;/p&gt;

&lt;p&gt;The problem Bezos was solving was not technical. Amazon's engineering&lt;br&gt;
organisation had grown to a size where teams were deeply entangled with&lt;br&gt;
each other. Team A could not deploy without coordinating with Team B,&lt;br&gt;
which needed sign-off from Team C, which had a dependency on Team D.&lt;br&gt;
Every change required a synchronisation meeting. Every release was a&lt;br&gt;
negotiation. The coupling between teams was strangling the organisation's&lt;br&gt;
ability to move.&lt;/p&gt;

&lt;p&gt;Services were the solution because they forced a contract between teams.&lt;br&gt;
If Team A owns Service A, and Team B owns Service B, and they communicate&lt;br&gt;
only through a defined API, then Team A can change anything inside&lt;br&gt;
Service A without asking Team B's permission. Team B can deploy Service B&lt;br&gt;
on its own schedule. The organisational autonomy is enforced by the&lt;br&gt;
technical boundary.&lt;/p&gt;

&lt;p&gt;This is the thing that most microservices adoptions miss entirely.&lt;br&gt;
Services are not primarily a way to scale technology. They are a way&lt;br&gt;
to scale teams. The technical properties of services (independent&lt;br&gt;
deployment, technology flexibility, fault isolation) are valuable&lt;br&gt;
side effects of the organisational property (team autonomy).&lt;/p&gt;

&lt;p&gt;If you adopt microservices without the organisational changes that&lt;br&gt;
make them valuable, you get all the costs of distributed systems&lt;br&gt;
and none of the benefits.&lt;/p&gt;
&lt;h2&gt;
  
  
  The cost of distribution
&lt;/h2&gt;

&lt;p&gt;A monolith, for all its problems, has properties that distributed&lt;br&gt;
systems do not have and cannot have.&lt;/p&gt;

&lt;p&gt;A function call inside a monolith is reliable. Either it works or&lt;br&gt;
it throws an exception. It is fast. It completes in microseconds.&lt;br&gt;
It participates in the same database transaction as the code that&lt;br&gt;
called it. If the whole operation needs to be rolled back, it is.&lt;/p&gt;

&lt;p&gt;A network call between services is unreliable. It might succeed.&lt;br&gt;
It might fail. It might succeed on the server side and fail on the&lt;br&gt;
network before the response reaches the caller. It might time out,&lt;br&gt;
leaving you with no information about whether the remote operation&lt;br&gt;
completed. It is slow relative to a function call. It crosses&lt;br&gt;
a transaction boundary, which means if something fails after the&lt;br&gt;
call succeeded, you have a consistency problem that cannot be&lt;br&gt;
resolved by a rollback.&lt;/p&gt;
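
&lt;p&gt;A minimal sketch of the ambiguous outcome described above. Everything here is hypothetical: &lt;code&gt;PaymentServer&lt;/code&gt; is an in-process stand-in for a remote service, and &lt;code&gt;charge&lt;/code&gt; and &lt;code&gt;ResponseLost&lt;/code&gt; are illustrative names. The caller cannot tell a lost request from a lost response, so a retry is only safe if the server can deduplicate it. An idempotency key is one common way to do that, not the only one.&lt;/p&gt;

```python
import uuid


class ResponseLost(Exception):
    """The request may or may not have been applied; the caller cannot tell."""


class PaymentServer:
    """Hypothetical stand-in for a remote service (no real network involved)."""

    def __init__(self, drop_first_response: bool = False) -> None:
        self.charges = {}  # idempotency key -> amount already applied
        self._drop_next = drop_first_response

    def charge(self, key: str, amount: int) -> str:
        # Apply the charge only if this key has not been seen before.
        if key not in self.charges:
            self.charges[key] = amount
        if self._drop_next:
            # The work succeeded server-side, but the response never arrives.
            self._drop_next = False
            raise ResponseLost("timed out waiting for response")
        return "charged"


def charge_with_retry(server: PaymentServer, amount: int, attempts: int = 3) -> str:
    # One key for the whole logical operation, reused on every retry.
    key = str(uuid.uuid4())
    for _ in range(attempts):
        try:
            return server.charge(key, amount)
        except ResponseLost:
            continue  # outcome unknown: retrying is safe only because of the key
    raise ResponseLost("gave up after retries; outcome still unknown")


server = PaymentServer(drop_first_response=True)
result = charge_with_retry(server, amount=500)
total_charged = sum(server.charges.values())
```

&lt;p&gt;The first attempt is applied on the server and then reported as a failure. Without the key, the retry would charge the customer twice. Inside a monolith, the same operation would have been one function call in one transaction.&lt;/p&gt;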

&lt;p&gt;This is not an implementation detail to be engineered around. It is&lt;br&gt;
a fundamental property of distributed systems, described precisely&lt;br&gt;
in the fallacies of distributed computing, a list that L. Peter Deutsch&lt;br&gt;
and colleagues at Sun Microsystems compiled in the 1990s and that the&lt;br&gt;
industry has been rediscovering ever since.&lt;/p&gt;

&lt;p&gt;The network is not reliable. Latency is not zero. Bandwidth is not&lt;br&gt;
infinite. The network is not secure. Topology changes. There is not&lt;br&gt;
one administrator. Transport cost is not zero. The network is not&lt;br&gt;
homogeneous.&lt;/p&gt;

&lt;p&gt;Every service boundary you add to your system is a place where these&lt;br&gt;
fallacies apply. Every service call is an opportunity for latency,&lt;br&gt;
failure, and consistency problems that simply do not exist inside&lt;br&gt;
a monolith. The question is whether the organisational benefits of&lt;br&gt;
the boundary justify the distributed systems cost of maintaining it.&lt;/p&gt;

&lt;p&gt;For a team of eight people working on one product, they almost never do.&lt;/p&gt;
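
&lt;p&gt;A rough way to put numbers on that cost, as a sketch under simplifying assumptions: calls are sequential, failures are independent, and every service has the same availability. Real systems violate all three, so treat the output as an order-of-magnitude illustration. The function names and the 99.9% figure are illustrative, not measurements.&lt;/p&gt;

```python
def chain_availability(per_service: float, hops: int) -> float:
    """Availability of a request that needs `hops` sequential calls to all succeed."""
    return per_service ** hops


def chain_latency_ms(per_hop_ms: float, hops: int) -> float:
    """Added latency when the calls are made one after another."""
    return per_hop_ms * hops


one_hop = chain_availability(0.999, 1)
ten_hops = chain_availability(0.999, 10)
added_latency = chain_latency_ms(5, 10)
print(f"1 hop: {one_hop:.4f}, 10 hops: {ten_hops:.4f}, added latency: {added_latency} ms")
```

&lt;p&gt;Under these assumptions, a request that crosses ten boundaries fails roughly ten times as often as one that crosses a single boundary, before any consistency problems are counted.&lt;/p&gt;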
&lt;h2&gt;
  
  
  Where the boundaries actually belong
&lt;/h2&gt;

&lt;p&gt;The most common failure mode in microservices adoption is drawing&lt;br&gt;
service boundaries around technical concerns rather than business&lt;br&gt;
ones. Teams create an "auth service," a "notification service,"&lt;br&gt;
a "user service," a "payment service." These feel like natural&lt;br&gt;
decompositions because they map to recognisable technical concepts.&lt;/p&gt;

&lt;p&gt;They are terrible service boundaries.&lt;/p&gt;

&lt;p&gt;An auth service that every other service must call to validate a&lt;br&gt;
token is not a service. It is a shared library that has been deployed&lt;br&gt;
as infrastructure, adding network latency and a failure mode to every&lt;br&gt;
authenticated request in the system. If the auth service is slow,&lt;br&gt;
everything is slow. If the auth service is down, everything is down.&lt;br&gt;
You have taken a piece of logic that could live as a function call&lt;br&gt;
and made it a distributed systems problem.&lt;/p&gt;
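
&lt;p&gt;For illustration, here is what token validation as a function call can look like. This is a sketch, not a recommendation for a specific scheme: it assumes a hypothetical HMAC-signed token format and a secret shared through configuration, and it ignores revocation entirely. Production systems more often use asymmetric signatures so that only the issuer can mint tokens. The names &lt;code&gt;sign_token&lt;/code&gt;, &lt;code&gt;validate_token&lt;/code&gt; and &lt;code&gt;SECRET&lt;/code&gt; are made up for this example.&lt;/p&gt;

```python
import base64
import hashlib
import hmac
import json
import time

# Assumption: the secret reaches each service through configuration.
SECRET = b"demo-secret-shared-via-config"


def sign_token(claims: dict, secret: bytes = SECRET) -> str:
    """Issue a token: base64 JSON claims, a dot, then an HMAC-SHA256 signature."""
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig


def validate_token(token: str, secret: bytes = SECRET, now=None) -> dict:
    """Verify signature and expiry in-process; raises ValueError if invalid."""
    body, _, sig = token.partition(".")
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        raise ValueError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(body.encode()))
    current = time.time() if now is None else now
    if current >= claims["exp"]:
        raise ValueError("token expired")
    return claims


token = sign_token({"sub": "customer-42", "exp": time.time() + 60})
claims = validate_token(token)  # no network call, no extra failure mode
```

&lt;p&gt;The check runs in microseconds inside the calling process. Issuing tokens can still live in one place. Validating them does not need a network hop on every request.&lt;/p&gt;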

&lt;p&gt;A notification service is not a service. It is a collection of side&lt;br&gt;
effects that have been externalised, creating a situation where the&lt;br&gt;
service that wants to send an email must make a network call, handle&lt;br&gt;
the failure case, and figure out what to do if the notification&lt;br&gt;
service is unavailable at the moment the email needs to be sent.&lt;/p&gt;

&lt;p&gt;The boundaries that work are the ones that map to bounded contexts&lt;br&gt;
in the business domain. Not "the thing that handles auth" but "the&lt;br&gt;
thing that owns everything about how customers interact with our&lt;br&gt;
platform." Not "the thing that sends notifications" but "the thing&lt;br&gt;
that owns the customer communication history and all the rules about&lt;br&gt;
when and how to communicate."&lt;/p&gt;

&lt;p&gt;These boundaries are harder to identify. They require understanding&lt;br&gt;
the business deeply enough to know where the real seams are. They&lt;br&gt;
require conversations with product managers and domain experts, not&lt;br&gt;
just with engineers. They change as the business evolves. But they&lt;br&gt;
are the boundaries that, when you respect them, produce services&lt;br&gt;
that teams can own autonomously and evolve independently.&lt;/p&gt;

&lt;p&gt;Domain-Driven Design's concept of bounded contexts is the clearest&lt;br&gt;
framework for finding these boundaries. The bounded context defines&lt;br&gt;
the scope within which a particular domain model applies. At the&lt;br&gt;
edge of the bounded context, the model changes. That is where the&lt;br&gt;
service boundary belongs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# A service boundary drawn around a technical concern.
# Every other service calls this. Auth is now a distributed dependency.
#
# Bad:
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AuthService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;revoke_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;


&lt;span class="c1"&gt;# A service boundary drawn around a business capability.
# This service owns everything about an order, including its auth context.
# Other services don't call into it for auth. They communicate
# through events when they need to know something happened.
#
# Better:
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OrderService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;place_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Auth context is resolved here, not farmed out to a network call
&lt;/span&gt;        &lt;span class="n"&gt;customer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;can_place_orders&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;InsufficientPermissionsError&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cancel_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;requesting_customer_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;requesting_customer_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;InsufficientPermissionsError&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conway's Law is a constraint, not a suggestion
&lt;/h2&gt;

&lt;p&gt;Mel Conway observed in 1967 that organisations produce systems that&lt;br&gt;
mirror their communication structures. An organisation with three groups will&lt;br&gt;
produce a system with three components. This is not because they&lt;br&gt;
planned to. It is because the system reflects who talks to whom.&lt;/p&gt;

&lt;p&gt;The implication that most teams don't fully absorb: if you want a&lt;br&gt;
particular system architecture, you need the corresponding&lt;br&gt;
organisational structure. You cannot have a microservices architecture&lt;br&gt;
with a team structure designed for a monolith. The organisation will&lt;br&gt;
fight the architecture until one of them wins, and the organisation&lt;br&gt;
usually wins because it existed first.&lt;/p&gt;

&lt;p&gt;This is why Amazon's microservices worked. The service boundaries&lt;br&gt;
and the team boundaries were the same boundaries. Team A owns Service A.&lt;br&gt;
Not "Team A and Team B both contribute to Service A." Not "Service A&lt;br&gt;
is maintained by whoever has time." One team, one service, full ownership.&lt;br&gt;
The organisational autonomy and the technical autonomy were the same thing.&lt;/p&gt;

&lt;p&gt;Most microservices adoptions separate these. The same team that used&lt;br&gt;
to work on the monolith now works on six services. They have all the&lt;br&gt;
coordination overhead of distributed systems and none of the team&lt;br&gt;
autonomy that makes it worth it. They still talk to each other constantly&lt;br&gt;
because they're the same people. The service boundaries don't reflect&lt;br&gt;
team boundaries because there are no team boundaries. There is one team&lt;br&gt;
doing distributed systems for no organisational reason.&lt;/p&gt;

&lt;p&gt;The inverse Conway maneuver, a term coined by Thoughtworks, is the&lt;br&gt;
deliberate version: you design the team structure you want, then&lt;br&gt;
let the architecture follow from it. If you want a payments service&lt;br&gt;
that can be developed and deployed independently, you need a payments&lt;br&gt;
team that can make decisions and ship code independently. If you do&lt;br&gt;
not have or cannot create that team, you do not have the prerequisite&lt;br&gt;
for the payments service.&lt;/p&gt;

&lt;p&gt;The prerequisite check before splitting a service:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Who will own this service? "The backend team" is not an answer. A named, stable, small team is an answer.&lt;/li&gt;
&lt;li&gt;Can that team deploy the service without coordinating with other teams? If not, the boundary is wrong or the ownership is wrong.&lt;/li&gt;
&lt;li&gt;Can that team change the service's internal implementation without changing any other service? If not, the boundary is wrong.&lt;/li&gt;
&lt;li&gt;Is there a defined contract (API, event schema) between this service and its consumers? If not, you don't have a service. You have a distributed module.&lt;/li&gt;
&lt;li&gt;Does the team have enough context about the business domain this service represents to make good decisions autonomously? If not, the team needs to exist and stabilise before the service should be extracted.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If any of these answers is no, the split is premature.&lt;/p&gt;
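
&lt;p&gt;The check above is a conversation, not a script, but writing it down as data can keep a design review honest. A minimal sketch, with a hypothetical &lt;code&gt;SplitProposal&lt;/code&gt; record and field names invented for this example:&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class SplitProposal:
    """Answers to the prerequisite questions for one candidate service."""
    owning_team: str                 # a named team, or "" if nobody owns it yet
    team_is_stable_and_small: bool
    deploys_without_coordination: bool
    internals_change_in_isolation: bool
    has_defined_contract: bool       # API or event schema
    team_has_domain_context: bool


def unmet_prerequisites(p: SplitProposal) -> list[str]:
    """Return the prerequisites this proposal fails; an empty list means none failed."""
    checks = {
        "named, stable, small owning team": bool(p.owning_team) and p.team_is_stable_and_small,
        "independent deployment": p.deploys_without_coordination,
        "internal changes stay internal": p.internals_change_in_isolation,
        "defined contract with consumers": p.has_defined_contract,
        "domain context for autonomous decisions": p.team_has_domain_context,
    }
    return [name for name, ok in checks.items() if not ok]


proposal = SplitProposal(
    owning_team="payments",
    team_is_stable_and_small=True,
    deploys_without_coordination=False,
    internals_change_in_isolation=True,
    has_defined_contract=False,
    team_has_domain_context=True,
)
blockers = unmet_prerequisites(proposal)
print(blockers)
```

&lt;p&gt;Any non-empty result means the split is premature. The value is less in the code than in forcing each answer to be written down by someone who will be held to it.&lt;/p&gt;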
&lt;h2&gt;
  
  
  The operational surface nobody accounts for
&lt;/h2&gt;

&lt;p&gt;When a team decides to split their monolith into ten services, they&lt;br&gt;
usually have a plan for the technical decomposition. They rarely have&lt;br&gt;
a plan for what they are about to own operationally.&lt;/p&gt;

&lt;p&gt;A monolith has one deployment pipeline. One set of infrastructure&lt;br&gt;
to configure. One place to look at logs. One set of metrics. One&lt;br&gt;
runbook for when things go wrong. The operational complexity is low.&lt;/p&gt;

&lt;p&gt;Ten services have ten deployment pipelines. Ten infrastructure&lt;br&gt;
configurations. Log aggregation that spans services. Distributed&lt;br&gt;
tracing to follow a request through multiple services. Ten runbooks,&lt;br&gt;
except the incidents that matter will involve multiple services and&lt;br&gt;
none of the runbooks will cover that. Service discovery. Health&lt;br&gt;
checking at the inter-service level. Circuit breakers for when&lt;br&gt;
downstream services are degraded.&lt;/p&gt;
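
&lt;p&gt;To give a sense of what one item on that list involves, here is a minimal circuit breaker sketch. It is a simplified illustration with hypothetical names and thresholds. A production breaker would also need thread safety, per-endpoint state, and metrics, and most teams would reach for an existing library or a service mesh rather than write their own.&lt;/p&gt;

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is refused because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock=time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        # While open and still cooling down, fail fast without touching downstream.
        if self._opened_at is not None and self.reset_after_s > self._clock() - self._opened_at:
            raise CircuitOpenError("downstream marked unhealthy; failing fast")
        try:
            # Closed, or cooldown elapsed: let the call (or one trial call) through.
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def flaky():
    raise ConnectionError("downstream unavailable")


fake_now = [0.0]  # injected clock so the example does not need to sleep
breaker = CircuitBreaker(failure_threshold=2, reset_after_s=10.0, clock=lambda: fake_now[0])

outcomes = []
for _ in range(3):
    try:
        breaker.call(flaky)
    except CircuitOpenError:
        outcomes.append("rejected")
    except ConnectionError:
        outcomes.append("failed")

fake_now[0] = 11.0  # cooldown elapsed: one trial call is allowed through
recovered = breaker.call(lambda: "ok")
```

&lt;p&gt;Two failures open the breaker, the third call is rejected without touching the downstream service, and a successful trial call after the cooldown closes it again. Every caller of every service needs some version of this, which is the point: none of it existed in the monolith.&lt;/p&gt;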

&lt;p&gt;None of this complexity is impossible to manage. It is all solvable.&lt;br&gt;
But it requires a team that has the capacity to manage it, tools&lt;br&gt;
that have been set up before the split happens, and expertise that&lt;br&gt;
takes time to develop.&lt;/p&gt;

&lt;p&gt;Most teams split their services and then build the operational&lt;br&gt;
infrastructure retroactively, while also trying to deliver product&lt;br&gt;
work, while also debugging the new distributed systems problems they&lt;br&gt;
did not have before. This is where the eighteen months of slowdown&lt;br&gt;
comes from.&lt;/p&gt;

&lt;p&gt;The teams that do this well build the operational infrastructure&lt;br&gt;
first. They get distributed tracing working in the monolith before&lt;br&gt;
they split it. They standardise their deployment pipeline before&lt;br&gt;
they have ten of them. They establish logging conventions before&lt;br&gt;
they have ten services emitting logs in subtly different formats.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The operational baseline that must exist before splitting services.&lt;/span&gt;
&lt;span class="c1"&gt;# This is not optional infrastructure to add later.&lt;/span&gt;

&lt;span class="c1"&gt;# Centralised structured logging&lt;/span&gt;
&lt;span class="na"&gt;logging&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;json&lt;/span&gt;
  &lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${SERVICE_NAME}&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${SERVICE_VERSION}&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${ENVIRONMENT}&lt;/span&gt;
    &lt;span class="na"&gt;trace_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${TRACE_ID}&lt;/span&gt;    &lt;span class="c1"&gt;# Must be propagated across service calls&lt;/span&gt;
    &lt;span class="na"&gt;span_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${SPAN_ID}&lt;/span&gt;

&lt;span class="c1"&gt;# Every service exposes these endpoints. No exceptions.&lt;/span&gt;
&lt;span class="na"&gt;health_endpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;liveness&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/healthz&lt;/span&gt;      &lt;span class="c1"&gt;# Is the process running?&lt;/span&gt;
  &lt;span class="na"&gt;readiness&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/ready&lt;/span&gt;       &lt;span class="c1"&gt;# Is it ready to serve traffic?&lt;/span&gt;
  &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/metrics&lt;/span&gt;       &lt;span class="c1"&gt;# Prometheus metrics&lt;/span&gt;

&lt;span class="c1"&gt;# Every inter-service call propagates these headers&lt;/span&gt;
&lt;span class="na"&gt;trace_propagation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;traceparent&lt;/span&gt;           &lt;span class="c1"&gt;# W3C Trace Context&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;tracestate&lt;/span&gt;

&lt;span class="c1"&gt;# Every service has these alerts configured before it handles traffic&lt;/span&gt;
&lt;span class="na"&gt;minimum_alerts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;error_rate_above_1_percent&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;p99_latency_above_1_second&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;service_unavailable&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
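&lt;p&gt;As a rough illustration of the propagation rule above, the sketch below builds and forwards a W3C &lt;code&gt;traceparent&lt;/code&gt; header using only the standard library. The helper names (&lt;code&gt;new_traceparent&lt;/code&gt;, &lt;code&gt;child_traceparent&lt;/code&gt;, &lt;code&gt;outgoing_headers&lt;/code&gt;) are hypothetical, and a real service would normally rely on its tracing library rather than hand-rolled code like this.&lt;/p&gt;

```python
import re
import secrets

# W3C Trace Context layout: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent() -> str:
    """Start a new trace: random 16-byte trace id, 8-byte span id, sampled flag."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(incoming: str) -> str:
    """Keep the caller's trace id and mint a new span id for the outgoing call."""
    match = TRACEPARENT_RE.match(incoming)
    if match is None:
        # Malformed or missing header: start a fresh trace instead of failing.
        return new_traceparent()
    trace_id, _parent_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def outgoing_headers(incoming_headers: dict) -> dict:
    """Headers to attach to a downstream call (hypothetical helper)."""
    incoming = incoming_headers.get("traceparent", "")
    headers = {"traceparent": child_traceparent(incoming)}
    # tracestate is only forwarded alongside a valid incoming traceparent.
    if TRACEPARENT_RE.match(incoming) and "tracestate" in incoming_headers:
        headers["tracestate"] = incoming_headers["tracestate"]
    return headers
```

&lt;p&gt;The point of the sketch is the invariant: the trace id survives every hop, while each hop contributes its own span id.&lt;/p&gt;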



&lt;h2&gt;
  
  
  The monolith that should stay a monolith
&lt;/h2&gt;

&lt;p&gt;Not every system should be microservices. This is easy to say and&lt;br&gt;
hard to accept in an industry where microservices became the mark&lt;br&gt;
of a serious engineering organisation.&lt;/p&gt;

&lt;p&gt;The monolith that should stay a monolith is the one where:&lt;/p&gt;

&lt;p&gt;The team is small enough that coordination overhead is low. Five&lt;br&gt;
to eight engineers can coordinate in a daily standup without the&lt;br&gt;
synchronisation cost becoming significant. For a team this size,&lt;br&gt;
the organisational problem that microservices solve does not exist.&lt;/p&gt;

&lt;p&gt;The domain is not yet well understood. Early-stage products have&lt;br&gt;
unstable domain models. The concepts that seem fundamental change&lt;br&gt;
as you learn what you're actually building. Service boundaries drawn&lt;br&gt;
around an unstable domain model have to be redrawn as the domain&lt;br&gt;
stabilises, which is expensive and demoralising. The monolith lets&lt;br&gt;
the domain model evolve cheaply. Split when the domain is understood.&lt;/p&gt;

&lt;p&gt;The operational team does not exist. If nobody owns the infrastructure&lt;br&gt;
that a distributed system requires, the system will be operated badly.&lt;br&gt;
A well-operated monolith beats a poorly-operated distributed system&lt;br&gt;
every time.&lt;/p&gt;

&lt;p&gt;The internal structure can be improved without splitting. A modular&lt;br&gt;
monolith with clear internal boundaries, enforced through package&lt;br&gt;
structure and dependency rules, provides most of the cognitive benefits&lt;br&gt;
of microservices (clear ownership, bounded contexts, interface discipline)&lt;br&gt;
without the distributed systems cost. It is not a compromise. For the&lt;br&gt;
right team and domain, it is the correct architecture.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# A modular monolith with enforced boundaries.
# orders/ cannot import directly from payments/.
# They communicate through defined interfaces.
# This is achievable without distributed systems.
&lt;/span&gt;
&lt;span class="c1"&gt;# src/orders/service.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;orders.repository&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OrderRepository&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;orders.events&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OrderPlaced&lt;/span&gt;  &lt;span class="c1"&gt;# Orders emits events
# from payments.service import PaymentService  # This import is forbidden
&lt;/span&gt;                                               &lt;span class="c1"&gt;# enforced by linting rules
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OrderService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OrderRepository&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;event_bus&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;EventBus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;payment_gateway&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PaymentGateway&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Interface, not concrete payments module
&lt;/span&gt;    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;repository&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_bus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event_bus&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payment_gateway&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payment_gateway&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;place_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_bus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OrderPlaced&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;

&lt;span class="c1"&gt;# The payments module listens for OrderPlaced events.
# It never gets called directly by orders.
# The boundary is real. It is enforced by design, not by a network.
&lt;/span&gt;
&lt;span class="c1"&gt;# src/payments/handlers.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;orders.events&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OrderPlaced&lt;/span&gt;  &lt;span class="c1"&gt;# Reading event schema is allowed
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PaymentEventHandler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payment_service&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payment_service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payment_service&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_order_placed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OrderPlaced&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payment_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;initiate_payment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
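&lt;p&gt;The "enforced by linting rules" part can be sketched in a few lines. The check below is a minimal illustration built on the standard &lt;code&gt;ast&lt;/code&gt; module. The &lt;code&gt;FORBIDDEN&lt;/code&gt; rule table and the &lt;code&gt;forbidden_imports&lt;/code&gt; function are assumptions made for this example and ignore relative imports; a maintained tool such as import-linter is likely a better fit for a real codebase.&lt;/p&gt;

```python
import ast

# Hypothetical rule table: top-level package -> packages it must not import.
FORBIDDEN = {"orders": ("payments",)}


def forbidden_imports(source: str, module: str) -> list:
    """Return the imports in `source` that cross a forbidden boundary."""
    owner = module.split(".")[0]
    banned = FORBIDDEN.get(owner, ())
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            # Compare only the top-level package of each imported module.
            if name.split(".")[0] in banned:
                violations.append(name)
    return violations
```

&lt;p&gt;Run over every file in CI, a check like this fails the build when &lt;code&gt;orders&lt;/code&gt; imports from &lt;code&gt;payments&lt;/code&gt;, while still allowing &lt;code&gt;payments&lt;/code&gt; to read the event schema in &lt;code&gt;orders.events&lt;/code&gt;.&lt;/p&gt;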



&lt;p&gt;This is a real architecture that scales further than most teams&lt;br&gt;
think before the overhead of splitting services becomes worth paying.&lt;br&gt;
Shopify ran a version of this for years. Stack Overflow still does.&lt;br&gt;
They are not unsophisticated organisations.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the good teams understand
&lt;/h2&gt;

&lt;p&gt;The teams that have figured out distributed systems share a&lt;br&gt;
perspective that took most of them several years and at least one&lt;br&gt;
failed microservices adoption to arrive at.&lt;/p&gt;

&lt;p&gt;Services are not about code organisation. They are about team&lt;br&gt;
organisation. A service boundary that does not correspond to a team&lt;br&gt;
boundary is overhead without benefit.&lt;/p&gt;

&lt;p&gt;The overhead of distributed systems is real, permanent, and&lt;br&gt;
compounding. You pay it forever. It needs to buy something worth&lt;br&gt;
having. For a team that is too large to coordinate, team autonomy&lt;br&gt;
is worth having. For a team that is not yet at that size, it is&lt;br&gt;
not.&lt;/p&gt;

&lt;p&gt;The correct direction of reasoning is: we have an organisational&lt;br&gt;
problem, what architecture solves it? Not: we have an architecture&lt;br&gt;
trend, what organisation do we need to adopt it?&lt;/p&gt;

&lt;p&gt;Microservices adopted as a technical decision produce the costs&lt;br&gt;
of distribution and the politics of boundary negotiation without&lt;br&gt;
the autonomy that makes them valuable. Microservices adopted as&lt;br&gt;
an organisational decision, by teams that have done the work of&lt;br&gt;
defining ownership and building operational foundations, produce&lt;br&gt;
systems that actually deliver what the pattern promises.&lt;/p&gt;

&lt;p&gt;The technology has never been the hard part.&lt;br&gt;
The hard part is everything the technology forces you to sort out first.&lt;br&gt;
Most teams skip that part and wonder why the technology failed them.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>devops</category>
      <category>ai</category>
      <category>career</category>
    </item>
    <item>
      <title>The GPU Is the New Database</title>
      <dc:creator>Benard Otieno</dc:creator>
      <pubDate>Wed, 20 May 2026 14:22:32 +0000</pubDate>
      <link>https://dev.to/benard_otieno_cdb9e6d4907/the-gpu-is-the-new-database-3b4i</link>
      <guid>https://dev.to/benard_otieno_cdb9e6d4907/the-gpu-is-the-new-database-3b4i</guid>
      <description>

&lt;blockquote&gt;
&lt;p&gt;Twenty years ago, teams had no idea how to run databases at scale. They made every mistake possible before the patterns solidified. We are now in the same position with GPU infrastructure, making the same mistakes, faster.&lt;br&gt;
Find more articles at &lt;a href="//blog.bennerdo.org"&gt;this site&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In 2004, if you were running a web application at any meaningful scale,&lt;br&gt;
your biggest infrastructure problem was the database. Not the application&lt;br&gt;
servers: those were stateless, and you could add more. The database was the&lt;br&gt;
single stateful thing everything depended on. It didn't scale horizontally,&lt;br&gt;
it was expensive to run, and almost nobody knew how to operate it well.&lt;/p&gt;

&lt;p&gt;Teams made every mistake. They put too much logic in the application and&lt;br&gt;
not enough in the database. They put too much in the database and not&lt;br&gt;
enough in the application. They didn't index correctly. They didn't cache&lt;br&gt;
correctly. They scaled vertically until they couldn't, then scrambled to&lt;br&gt;
shard. They had no idea what their query plans looked like. They treated&lt;br&gt;
the database as a black box until it stopped working, then learned the&lt;br&gt;
hard way that it wasn't.&lt;/p&gt;

&lt;p&gt;Over the following decade, the patterns solidified. Connection pooling.&lt;br&gt;
Read replicas. Query analysis. Proper indexing strategy. Cache layers.&lt;br&gt;
The knowledge became common. The tools improved. Managed database services&lt;br&gt;
abstracted most of the complexity. Today a competent team can run a&lt;br&gt;
database at significant scale without extraordinary expertise.&lt;/p&gt;

&lt;p&gt;We are now, in 2026, in the same position with GPU infrastructure. The&lt;br&gt;
GPU is the new database: the expensive, stateful, poorly-understood&lt;br&gt;
bottleneck that everything AI depends on, that doesn't scale the way&lt;br&gt;
people expect, that is being operated badly by the majority of teams&lt;br&gt;
running it, and for which the patterns have not yet solidified.&lt;/p&gt;

&lt;p&gt;The teams that figure this out first will have an infrastructure advantage&lt;br&gt;
that is very difficult to close. The teams that don't will spend the&lt;br&gt;
next five years making the same mistakes everyone made with databases&lt;br&gt;
in 2004, just faster and more expensively.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why the GPU is not just a fast CPU
&lt;/h2&gt;

&lt;p&gt;The first mistake most teams make with GPU infrastructure is treating&lt;br&gt;
GPUs as very fast CPUs. They're not. They're a fundamentally different&lt;br&gt;
computational model, and the mismatch between that model and how most&lt;br&gt;
people use them is where most of the waste comes from.&lt;/p&gt;

&lt;p&gt;A CPU is optimised for latency: completing a single complex task as&lt;br&gt;
quickly as possible. It has a small number of powerful cores, large&lt;br&gt;
caches, sophisticated branch prediction, and out-of-order execution.&lt;br&gt;
It's good at sequential logic, conditional branching, and tasks where&lt;br&gt;
each step depends on the result of the previous one.&lt;/p&gt;

&lt;p&gt;A GPU is optimised for throughput: completing an enormous number of&lt;br&gt;
simple tasks simultaneously. It has thousands of smaller, simpler cores.&lt;br&gt;
It's good at the same operation applied in parallel to a large amount&lt;br&gt;
of data. It's bad at anything sequential, anything with complex&lt;br&gt;
branching, and anything where you need to move data back to the CPU&lt;br&gt;
in the middle of computation.&lt;/p&gt;

&lt;p&gt;The practical consequence: a GPU that is not batching work is a GPU&lt;br&gt;
that is mostly idle. The most common pattern for teams deploying AI&lt;br&gt;
inference in production (one request comes in, run the model, return&lt;br&gt;
the result, wait for the next request) uses a small fraction of the&lt;br&gt;
GPU's actual capacity. The GPU's utilisation number looks reasonable.&lt;br&gt;
The GPU's actual computational throughput is terrible.&lt;/p&gt;

&lt;p&gt;This is the equivalent of a database that opens a new connection for&lt;br&gt;
every query, executes it, and closes the connection. Technically&lt;br&gt;
functional. Completely missing how the system should be used.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;

&lt;span class="c1"&gt;# What most teams do: one request, one inference
# GPU utilisation looks like 20-40%, but throughput is poor
&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_inference_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# GPU mostly idle while waiting
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;


&lt;span class="c1"&gt;# What should be happening: dynamic batching
# Multiple requests grouped and processed together
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InferenceBatcher&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_batch_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_wait_ms&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_batch_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_batch_size&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_wait_ms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_wait_ms&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;infer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;future&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Future&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_batch_worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Start once as a background task, e.g. asyncio.create_task(batcher._batch_worker())
&lt;/span&gt;        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
            &lt;span class="n"&gt;deadline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_event_loop&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_wait_ms&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Collect requests until batch is full or deadline passes
&lt;/span&gt;            &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_batch_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;timeout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;deadline&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_event_loop&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;break&lt;/span&gt;
                &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;TimeoutError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;break&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;

            &lt;span class="n"&gt;prompts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;futures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

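            &lt;span class="c1"&gt;# Single GPU call processes all requests together. Run the blocking
            # call in a worker thread so the event loop keeps accepting requests.
&lt;/span&gt;            &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_thread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate_batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;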

            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Dynamic batching is the connection pooling of GPU inference. It is&lt;br&gt;
not optional if you care about cost or throughput. It is also not&lt;br&gt;
implemented by default in most hand-rolled inference deployments,&lt;br&gt;
for the same reason that early web applications didn't implement&lt;br&gt;
connection pooling: teams didn't know they needed it until they&lt;br&gt;
hit the wall.&lt;/p&gt;
&lt;h2&gt;
  
  
  The memory hierarchy nobody teaches you
&lt;/h2&gt;

&lt;p&gt;GPU memory is not like CPU memory. Understanding the difference is&lt;br&gt;
the difference between a system that works and one that doesn't, and&lt;br&gt;
between inference costs that are manageable and ones that are not.&lt;/p&gt;

&lt;p&gt;A GPU has its own on-device memory: VRAM. VRAM is fast, finite,&lt;br&gt;
and expensive. A GPU with 80GB of VRAM is a very expensive GPU.&lt;br&gt;
The model you're running must fit in VRAM. If it doesn't fit, you&lt;br&gt;
can use techniques like quantization to make it smaller, or you can&lt;br&gt;
distribute it across multiple GPUs, but you cannot simply overflow&lt;br&gt;
to system RAM without taking a catastrophic performance hit. The&lt;br&gt;
bandwidth between CPU RAM and GPU VRAM is one to two orders of magnitude&lt;br&gt;
lower than VRAM bandwidth. When you hear about models being "quantized&lt;br&gt;
to 4-bit," this is why: 4-bit quantization cuts the weight footprint to&lt;br&gt;
roughly a quarter of 16-bit (half of 8-bit), which can be the difference&lt;br&gt;
between fitting on one GPU and not fitting on one GPU.&lt;/p&gt;
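&lt;p&gt;The arithmetic behind that is simple enough to sketch. The snippet below estimates the memory needed for weights alone at different precisions. The 8-billion parameter count is an assumed example, and the estimate ignores the KV cache, activations, and runtime overhead, so treat it as a lower bound.&lt;/p&gt;

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


params = 8e9  # assumption: an 8-billion-parameter model
footprints = {bits: weight_memory_gb(params, bits) for bits in (16, 8, 4)}
for bits, gb in footprints.items():
    print(f"{bits:>2}-bit weights: {gb:.0f} GB")
```

&lt;p&gt;For this assumed model the weights need about 16 GB at 16-bit, 8 GB at 8-bit, and 4 GB at 4-bit, before any memory is set aside for serving requests.&lt;/p&gt;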

&lt;p&gt;Within the GPU itself, there is a memory hierarchy that determines&lt;br&gt;
how fast computation runs. The KV cache (the cached attention keys and&lt;br&gt;
values for the tokens already processed in a conversation)&lt;br&gt;
lives in VRAM and grows with sequence length. Managing KV cache&lt;br&gt;
is one of the most consequential performance decisions in LLM serving,&lt;br&gt;
and most teams don't think about it at all until they start hitting&lt;br&gt;
out-of-memory errors on long contexts.&lt;br&gt;
&lt;/p&gt;
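&lt;p&gt;To make "grows with sequence length" concrete, here is a back-of-envelope estimate of KV cache size. The formula stores one key and one value vector per layer, per KV head, per token. The model shape used (32 layers, 8 KV heads, head dimension 128, 16-bit cache) is an assumed Llama-3-8B-style configuration, so check your own model's config before relying on the numbers.&lt;/p&gt;

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1, bytes_per_value: int = 2) -> int:
    """Bytes of KV cache: one key and one value vector per layer, KV head and token."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * seq_len * batch_size


# Assumed model shape: 32 layers, 8 KV heads, head dimension 128, fp16 cache.
per_token = kv_cache_bytes(32, 8, 128, seq_len=1)
per_sequence = kv_cache_bytes(32, 8, 128, seq_len=8192)
print(per_token // 1024, "KiB per token")
print(per_sequence / 2**30, "GiB for one 8192-token sequence")
```

&lt;p&gt;Under these assumptions every token costs 128 KiB and a single full-length 8192-token sequence costs 1 GiB, which is why a handful of long concurrent conversations can exhaust whatever VRAM is left after the weights.&lt;/p&gt;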

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# KV cache management: what happens without it
# Each new token regenerates attention for the entire context
# Cost is O(n²) in sequence length
&lt;/span&gt;
&lt;span class="c1"&gt;# What vLLM and similar systems do differently:
# PagedAttention manages KV cache in fixed-size blocks
# like virtual memory paging in an OS
&lt;/span&gt;
&lt;span class="c1"&gt;# This allows:
# 1. Sharing KV cache between requests with the same prefix
# 2. Better memory utilisation (almost no fragmentation)
# 3. Handling variable-length sequences without pre-allocating
#    worst-case memory
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;vllm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SamplingParams&lt;/span&gt;

&lt;span class="c1"&gt;# vLLM handles KV cache management automatically
# This is not a minor optimisation — it's 2-4x throughput improvement
# on typical workloads versus naive implementations
&lt;/span&gt;
&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Meta-Llama-3-8B-Instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;gpu_memory_utilization&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.90&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# Leave 10% headroom
&lt;/span&gt;    &lt;span class="n"&gt;max_model_len&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;             &lt;span class="c1"&gt;# Maximum sequence length
&lt;/span&gt;    &lt;span class="n"&gt;enable_prefix_caching&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;# Cache common prefixes (system prompts)
&lt;/span&gt;    &lt;span class="n"&gt;tensor_parallel_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="c1"&gt;# Number of GPUs for this model
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;sampling_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SamplingParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Prefix caching means your system prompt is computed once
# and cached for all subsequent requests — significant for
# long system prompts used with every inference
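&lt;/span&gt;&lt;span class="n"&gt;prompts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarise dynamic batching in one sentence.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Placeholder input
&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sampling_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;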
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Many teams serving LLMs in production are still not using PagedAttention.&lt;br&gt;
They're using naive inference implementations that can waste fifty to&lt;br&gt;
seventy percent of their KV cache memory to fragmentation and over-reservation,&lt;br&gt;
and that recompute shared prefixes on every request. The cost difference is not marginal.&lt;/p&gt;
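&lt;p&gt;To make the fragmentation claim concrete, here is a minimal sketch that compares two ways of reserving KV cache slots. It is an illustration, not vLLM's allocator: the function names, the sample sequence lengths, and the block size of 16 tokens are assumptions chosen for the example. The 8192 matches the max_model_len used above.&lt;/p&gt;

```python
import math

def contiguous_slots(seq_lens, max_model_len):
    # Naive serving: reserve the full context window for every request.
    return len(seq_lens) * max_model_len

def paged_slots(seq_lens, block_size=16):
    # Paged allocation: reserve whole blocks, only as many as each request needs.
    return sum(math.ceil(n / block_size) * block_size for n in seq_lens)

def wasted_fraction(reserved, used):
    # Share of reserved token slots that never hold a real token.
    return 1 - used / reserved

seq_lens = [1200, 2500, 3600, 5400]   # hypothetical actual lengths, in tokens
used = sum(seq_lens)

naive = wasted_fraction(contiguous_slots(seq_lens, max_model_len=8192), used)
paged = wasted_fraction(paged_slots(seq_lens), used)
print(f"contiguous: {naive:.0%} wasted, paged: {paged:.1%} wasted")
```

&lt;p&gt;With these made-up lengths the contiguous strategy leaves well over half of its reserved slots empty, while the paged strategy only loses the unused tail of each request's last block.&lt;/p&gt;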
&lt;h2&gt;
  
  
  The scaling question everyone asks wrong
&lt;/h2&gt;

&lt;p&gt;When a team's AI infrastructure starts struggling under load, the&lt;br&gt;
first question is almost always: "should we add more GPUs?"&lt;/p&gt;

&lt;p&gt;This is the wrong question asked at the wrong time, for the same&lt;br&gt;
reason that "should we add more database servers" was the wrong&lt;br&gt;
first question when a database was struggling in 2008. The right&lt;br&gt;
question is: "why are we using our current GPUs so inefficiently?"&lt;/p&gt;

&lt;p&gt;GPU utilisation that is below sixty percent is almost always a&lt;br&gt;
batching problem. Requests are not being grouped efficiently before&lt;br&gt;
hitting the GPU. You can add more GPUs and halve your utilisation&lt;br&gt;
number, which means you now have twice the infrastructure running at&lt;br&gt;
thirty percent capacity instead of one set running at sixty. You've&lt;br&gt;
doubled your cost and solved nothing.&lt;/p&gt;

&lt;p&gt;High GPU utilisation with latency that is still bad is almost&lt;br&gt;
always a model sizing problem. The model is too large for the request&lt;br&gt;
volume being served. A smaller quantized model, or a different&lt;br&gt;
architecture, may meet your latency requirements at a&lt;br&gt;
fraction of the compute cost.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Measuring what actually matters before deciding to scale
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;prometheus_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Gauge&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;

&lt;span class="c1"&gt;# These metrics tell you where the problem actually is
&lt;/span&gt;
&lt;span class="n"&gt;GPU_UTILISATION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Gauge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gpu_utilisation_percent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GPU compute utilisation&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;device_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;GPU_MEMORY_USED&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Gauge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gpu_memory_used_bytes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GPU VRAM in use&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;device_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;BATCH_SIZE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;inference_batch_size&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Number of requests processed per batch&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;buckets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;TOKENS_PER_SECOND&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;inference_tokens_per_second&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Throughput of inference in tokens per second&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;buckets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;TIME_TO_FIRST_TOKEN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;inference_ttft_seconds&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Time from request to first token generated&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;buckets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[.&lt;/span&gt;&lt;span class="mi"&gt;05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;REQUEST_QUEUE_DEPTH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Gauge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;inference_queue_depth&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Number of requests waiting for GPU&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;


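&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InstrumentedInferenceServer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Assumes self.queue (for example an asyncio.Queue) and&lt;/span&gt;
    &lt;span class="c1"&gt;# self._run_inference are set up elsewhere; only the&lt;/span&gt;
    &lt;span class="c1"&gt;# instrumentation is shown here.&lt;/span&gt;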
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;infer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;BATCH_SIZE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;observe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompts&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;REQUEST_QUEUE_DEPTH&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;qsize&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

        &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_run_inference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;

        &lt;span class="c1"&gt;# A whitespace-split word count is only a rough proxy for tokens;&lt;/span&gt;
        &lt;span class="c1"&gt;# prefer the tokenizer's own count if your server exposes it.&lt;/span&gt;
        &lt;span class="n"&gt;total_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;TOKENS_PER_SECOND&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;observe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_tokens&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you can see batch sizes, queue depth, tokens per second, and&lt;br&gt;
time-to-first-token alongside GPU utilisation and VRAM usage, the&lt;br&gt;
question of "do we need more GPUs" almost answers itself. Usually&lt;br&gt;
the answer is "no, we need to batch better" or "no, we need to use&lt;br&gt;
a smaller model" and scaling turns out to be unnecessary.&lt;/p&gt;
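&lt;p&gt;The triage described above can be sketched as a small function. The sixty percent utilisation threshold comes from the text; the latency budget, the queue depth limit, and every name here are assumptions for illustration, so treat this as a starting point rather than a rule.&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    gpu_utilisation: float    # fraction between 0.0 and 1.0
    p95_ttft_seconds: float   # 95th percentile time-to-first-token
    queue_depth: int          # requests waiting for a GPU

def diagnose(s, ttft_budget_seconds=1.0, max_queue_depth=5):
    utilisation_is_high = s.gpu_utilisation >= 0.60
    if not utilisation_is_high:
        # The GPUs are starved: fix batching before buying anything.
        return "batch better"
    if s.p95_ttft_seconds > ttft_budget_seconds:
        # Busy GPUs and slow responses: the model is too big for the load.
        return "try a smaller model"
    if s.queue_depth > max_queue_depth:
        # Batching and model size look fine, yet work keeps piling up.
        return "add GPUs"
    return "leave it alone"

print(diagnose(Snapshot(gpu_utilisation=0.35, p95_ttft_seconds=0.4, queue_depth=2)))
print(diagnose(Snapshot(gpu_utilisation=0.92, p95_ttft_seconds=3.1, queue_depth=9)))
```

&lt;p&gt;The order of the checks is the point: adding GPUs is only considered after batching and model size have been ruled out.&lt;/p&gt;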
&lt;h2&gt;
  
  
  The cold start problem nobody planned for
&lt;/h2&gt;

&lt;p&gt;Databases take seconds to start. GPU inference servers take minutes.&lt;/p&gt;

&lt;p&gt;A database that restarts unexpectedly is back within thirty seconds&lt;br&gt;
in most cases. An LLM inference server that restarts needs to load&lt;br&gt;
model weights from storage into VRAM before it can serve any requests.&lt;br&gt;
A 70B parameter model stored in 4-bit quantization is roughly 35GB.&lt;br&gt;
Loading 35GB from network storage into VRAM, at typical cloud storage&lt;br&gt;
bandwidth, takes several minutes under good conditions.&lt;/p&gt;
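&lt;p&gt;The arithmetic behind those numbers is short enough to write down. This is a rough sketch: the bandwidth figures are hypothetical sustained rates, not measurements, and the estimate ignores deserialisation and GPU initialisation, which add time on top.&lt;/p&gt;

```python
def weights_size_gb(params_billions, bits_per_weight):
    # billions of parameters * bits per weight / 8 bits per byte = gigabytes
    return params_billions * bits_per_weight / 8

def load_seconds(size_gb, bandwidth_gb_per_s):
    # Pure transfer time; real loads are slower than this lower bound.
    return size_gb / bandwidth_gb_per_s

size = weights_size_gb(70, 4)          # the 70B, 4-bit model from the text
for bandwidth in (0.1, 0.2, 0.5):      # hypothetical sustained GB/s
    minutes = load_seconds(size, bandwidth) / 60
    print(f"{size:.0f} GB at {bandwidth:.1f} GB/s: {minutes:.1f} min")
```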

&lt;p&gt;This changes incident dynamics entirely. A database blip is a brief&lt;br&gt;
interruption. A GPU server blip is a several-minute outage for every&lt;br&gt;
affected instance. Autoscaling, which works well for stateless&lt;br&gt;
application servers and adequately for databases, works badly for&lt;br&gt;
GPU inference because new instances take so long to become ready.&lt;/p&gt;

&lt;p&gt;The teams that have worked this out run warm pools: GPU instances&lt;br&gt;
with models already loaded, sitting idle, waiting for traffic that&lt;br&gt;
hasn't arrived yet. This feels wasteful. It's the only way to handle&lt;br&gt;
traffic spikes without minutes-long latency blowouts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Kubernetes deployment with warm pool strategy&lt;/span&gt;
&lt;span class="c1"&gt;# Minimum replicas keep instances warm even at low traffic&lt;/span&gt;
&lt;span class="c1"&gt;# This costs money. The alternative is cold start latency.&lt;/span&gt;

&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm-inference&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;  &lt;span class="c1"&gt;# Never scale below this. These are your warm pool.&lt;/span&gt;
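  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# Required by apps/v1; must match the pod labels below&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm-inference&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm-inference&lt;/span&gt;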
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;inference-server&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myteam/inference:latest&lt;/span&gt;
        &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
          &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
        &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/health&lt;/span&gt;
            &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
          &lt;span class="c1"&gt;# Model loading takes time. This probe must not pass&lt;/span&gt;
          &lt;span class="c1"&gt;# until the model is fully loaded in VRAM.&lt;/span&gt;
          &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;180&lt;/span&gt;   &lt;span class="c1"&gt;# 3 minutes minimum&lt;/span&gt;
          &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
          &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;       &lt;span class="c1"&gt;# 5 more minutes of retries&lt;/span&gt;
        &lt;span class="na"&gt;lifecycle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;preStop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;exec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="c1"&gt;# Drain in-flight requests before shutdown&lt;/span&gt;
              &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/bin/sh"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-c"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sleep&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;30"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;autoscaling/v2&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HorizontalPodAutoscaler&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm-inference-hpa&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;scaleTargetRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm-inference&lt;/span&gt;
  &lt;span class="na"&gt;minReplicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;    &lt;span class="c1"&gt;# Warm pool floor&lt;/span&gt;
  &lt;span class="na"&gt;maxReplicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
  &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;External&lt;/span&gt;
    &lt;span class="na"&gt;external&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;metric&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;inference_queue_depth&lt;/span&gt;
      &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AverageValue&lt;/span&gt;
        &lt;span class="na"&gt;averageValue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5"&lt;/span&gt;  &lt;span class="c1"&gt;# Scale when queue exceeds 5 requests per replica&lt;/span&gt;
  &lt;span class="na"&gt;behavior&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;scaleUp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;stabilizationWindowSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
      &lt;span class="na"&gt;policies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pods&lt;/span&gt;
        &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
        &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;120&lt;/span&gt;   &lt;span class="c1"&gt;# Add 2 pods max every 2 minutes&lt;/span&gt;
                             &lt;span class="c1"&gt;# Fast enough to respond, slow enough&lt;/span&gt;
                             &lt;span class="c1"&gt;# to not over-provision during spikes&lt;/span&gt;
    &lt;span class="na"&gt;scaleDown&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;stabilizationWindowSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;600&lt;/span&gt;  &lt;span class="c1"&gt;# Wait 10 minutes before scaling down&lt;/span&gt;
                                       &lt;span class="c1"&gt;# Cold start cost makes yo-yo scaling&lt;/span&gt;
                                       &lt;span class="c1"&gt;# extremely expensive&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The scaleDown stabilization window is deliberately long. The cold start&lt;br&gt;
cost is so high that scaling down and back up in response to a brief&lt;br&gt;
traffic dip is more expensive than just keeping the instances running.&lt;br&gt;
This is counterintuitive if you're coming from stateless web services.&lt;br&gt;
It's the operational reality of GPU infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cost model is upside down
&lt;/h2&gt;

&lt;p&gt;Database costs scale with data volume and query complexity. You pay&lt;br&gt;
more as your data grows and your queries get more complex.&lt;/p&gt;

&lt;p&gt;GPU costs scale with time. You pay for every second a GPU exists,&lt;br&gt;
whether it's serving requests or not. An idle GPU costs the same as&lt;br&gt;
a busy GPU.&lt;/p&gt;

&lt;p&gt;This inverts the normal infrastructure economics. With stateless&lt;br&gt;
application servers, idle capacity is cheap: you can scale to zero&lt;br&gt;
when traffic drops and pay nothing. With GPU inference, scaling to&lt;br&gt;
zero means cold starts when traffic returns. The minimum viable&lt;br&gt;
capacity for a production inference service is not zero. It's&lt;br&gt;
whatever your warm pool needs to be, which is determined by your&lt;br&gt;
acceptable cold start latency and your traffic spike patterns.&lt;/p&gt;

&lt;p&gt;The teams that have made peace with this have stopped thinking about&lt;br&gt;
GPU cost as a variable cost that tracks usage and started thinking&lt;br&gt;
about it as a fixed cost that buys capacity. The question is not&lt;br&gt;
"how do we pay less for GPU when traffic is low?" The question is&lt;br&gt;
"what is the right amount of always-on capacity, and how do we make&lt;br&gt;
sure we use it efficiently?"&lt;/p&gt;

&lt;p&gt;Efficient use means high batch fill rates, high token throughput per&lt;br&gt;
GPU-hour, and low idle time. The metrics above are the inputs to this&lt;br&gt;
calculation. Without them you're guessing about whether your&lt;br&gt;
infrastructure is sized correctly.&lt;/p&gt;
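&lt;p&gt;One way to turn those metrics into a sizing number is cost per million tokens. The sketch below assumes a flat hourly GPU price; the price, throughput, and busy-fraction values are hypothetical and only there to show how far apart two setups on the same hardware can land.&lt;/p&gt;

```python
def cost_per_million_tokens(gpu_hourly_usd, tokens_per_second, busy_fraction=1.0):
    # The GPU is billed for the whole hour, but only produces tokens
    # during the fraction of the hour it is actually serving requests.
    tokens_per_hour = tokens_per_second * 3600 * busy_fraction
    return gpu_hourly_usd / tokens_per_hour * 1_000_000

# Same hypothetical GPU price, two very different ways of using it.
well_batched = cost_per_million_tokens(2.50, tokens_per_second=2000, busy_fraction=0.8)
poorly_batched = cost_per_million_tokens(2.50, tokens_per_second=200, busy_fraction=0.3)

print(f"well batched:   ${well_batched:.2f} per million tokens")
print(f"poorly batched: ${poorly_batched:.2f} per million tokens")
```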

&lt;h2&gt;
  
  
  The pattern that's emerging
&lt;/h2&gt;

&lt;p&gt;The teams operating GPU infrastructure well in 2026 look, in their&lt;br&gt;
operational discipline, a lot like the teams that operated databases&lt;br&gt;
well in 2012, after enough people had been burned that the patterns&lt;br&gt;
were starting to solidify.&lt;/p&gt;

&lt;p&gt;They treat GPU utilisation as a lagging indicator and token throughput&lt;br&gt;
as the leading one. They instrument everything: batch sizes, queue&lt;br&gt;
depth, time-to-first-token, VRAM usage, KV cache hit rates. They&lt;br&gt;
size their warm pools based on measured traffic patterns rather&lt;br&gt;
than intuition. They run the smallest model that meets their quality&lt;br&gt;
bar, not the largest model they can afford, because smaller models&lt;br&gt;
batched efficiently outperform larger models batched poorly on&lt;br&gt;
almost every practical metric.&lt;/p&gt;

&lt;p&gt;They've also accepted something that takes a while to accept: that&lt;br&gt;
the right abstraction for GPU infrastructure is not "fast compute"&lt;br&gt;
but "throughput capacity." The question is not "how fast can this&lt;br&gt;
machine process one request?" GPUs are fast at that regardless.&lt;br&gt;
The question is "how many requests per dollar can this infrastructure&lt;br&gt;
handle at acceptable latency?" That question requires different&lt;br&gt;
metrics, different architecture, and a different mental model than&lt;br&gt;
the one most teams bring from their experience with CPU infrastructure.&lt;/p&gt;

&lt;p&gt;The database analogy runs deeper than it looks. In 2004, the teams&lt;br&gt;
that treated the database as a black box (put data in, get data&lt;br&gt;
out, add more RAM when it's slow) eventually hit walls that their&lt;br&gt;
architecture couldn't get past. The teams that understood what was&lt;br&gt;
happening inside the database (query plans, index usage, lock&lt;br&gt;
contention, buffer pool behaviour) built things that scaled.&lt;/p&gt;

&lt;p&gt;The GPU is not a black box. It has a memory hierarchy, a batching&lt;br&gt;
model, a cost structure, and performance characteristics that reward&lt;br&gt;
understanding and punish ignorance in the same way the database did.&lt;/p&gt;

&lt;p&gt;The patterns are forming. The teams learning them now will have the&lt;br&gt;
same advantage in five years that database-literate engineers had in 2015.&lt;/p&gt;

&lt;p&gt;The mistakes are happening right now, at scale, expensively.&lt;br&gt;
Most of them are the same mistakes. Most of them are avoidable.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>infrastructure</category>
      <category>devops</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Unit Tests Are Overrated and You Know It</title>
      <dc:creator>Benard Otieno</dc:creator>
      <pubDate>Tue, 19 May 2026 10:55:03 +0000</pubDate>
      <link>https://dev.to/benard_otieno_cdb9e6d4907/unit-tests-are-overrated-and-you-know-it-479b</link>
      <guid>https://dev.to/benard_otieno_cdb9e6d4907/unit-tests-are-overrated-and-you-know-it-479b</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;We test the wrong things obsessively and the right things barely at all. The unit test orthodoxy has produced codebases with 90% coverage that break constantly in production. It's time to say this out loud.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I'm going to say something that will make some people close this tab&lt;br&gt;
immediately: most unit tests are not worth the time it takes to write&lt;br&gt;
and maintain them, and the culture around unit testing has caused more&lt;br&gt;
harm to software quality than it has prevented.&lt;/p&gt;

&lt;p&gt;Not all unit tests. Not testing in general. Specifically the orthodoxy&lt;br&gt;
that says you should test every function, mock every dependency, aim for&lt;br&gt;
maximum coverage, and measure quality by how many green checkmarks your&lt;br&gt;
test runner produces.&lt;/p&gt;

&lt;p&gt;That orthodoxy is producing codebases that are simultaneously over-tested&lt;br&gt;
and under-validated. Teams that spend enormous engineering hours maintaining&lt;br&gt;
test suites that don't catch the bugs that actually affect users. Developers&lt;br&gt;
who spend more time making tests pass than making software work. Coverage&lt;br&gt;
reports that read ninety percent and services that break every other&lt;br&gt;
deployment.&lt;/p&gt;

&lt;p&gt;If this makes you uncomfortable, good. Stay with the discomfort for a&lt;br&gt;
minute, because the alternative is continuing to do something that doesn't&lt;br&gt;
work while calling it best practice.&lt;/p&gt;
&lt;h2&gt;
  
  
  What unit tests actually test
&lt;/h2&gt;

&lt;p&gt;A unit test tests a unit of code in isolation. The unit is typically a&lt;br&gt;
function or a class. The dependencies of that unit — other functions,&lt;br&gt;
databases, external services — are replaced with mocks or fakes that&lt;br&gt;
return controlled responses.&lt;/p&gt;

&lt;p&gt;This is valuable for exactly one category of problem: logic that lives&lt;br&gt;
in pure functions, isolated from external state, where the relationship&lt;br&gt;
between input and output is the entire thing being tested.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
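&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;decimal&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Decimal&lt;/span&gt;

&lt;span class="c1"&gt;# This is worth unit testing. The logic is the point.
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_discount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;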
    &lt;span class="n"&gt;base_price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;customer_tier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;order_quantity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;customer_tier&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enterprise&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tier_discount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.20&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;customer_tier&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pro&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tier_discount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.10&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tier_discount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;quantity_discount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.05&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;order_quantity&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nc"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;total_discount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tier_discount&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;quantity_discount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.25&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;base_price&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;total_discount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A unit test for this function is testing the right thing. The function&lt;br&gt;
is pure. Its behavior is entirely determined by its inputs. There are&lt;br&gt;
no external dependencies to mock. The test directly validates the&lt;br&gt;
business logic.&lt;/p&gt;
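
&lt;p&gt;As a minimal sketch of what such a test can look like, the block&lt;br&gt;
below restates the pricing rule so it runs on its own. The function&lt;br&gt;
name &lt;code&gt;calculate_price&lt;/code&gt;, its argument order, and the&lt;br&gt;
&lt;code&gt;"enterprise"&lt;/code&gt; tier name are assumptions made for illustration;&lt;br&gt;
the discount values and the 25% cap are the ones shown above.&lt;/p&gt;

```python
from decimal import Decimal


def calculate_price(base_price: Decimal, customer_tier: str, order_quantity: int) -> Decimal:
    """Self-contained restatement of the pricing rule (name and signature assumed)."""
    if customer_tier == "enterprise":  # tier name assumed for illustration
        tier_discount = Decimal("0.20")
    elif customer_tier == "pro":
        tier_discount = Decimal("0.10")
    else:
        tier_discount = Decimal("0.00")

    quantity_discount = Decimal("0.05") if order_quantity >= 100 else Decimal("0.00")
    total_discount = min(tier_discount + quantity_discount, Decimal("0.25"))
    return base_price * (1 - total_discount)


# Each case is (tier, quantity, expected price for a base price of 100).
CASES = [
    ("free", 1, Decimal("100")),
    ("free", 100, Decimal("95")),
    ("pro", 99, Decimal("90")),
    ("pro", 100, Decimal("85")),
    ("enterprise", 100, Decimal("75")),
]


def check_calculate_price() -> None:
    # No mocks, no fixtures: inputs in, expected output out.
    for tier, quantity, expected in CASES:
        assert calculate_price(Decimal("100"), tier, quantity) == expected


check_calculate_price()
```

&lt;p&gt;Every case sits on one side of a branch or on the quantity threshold,&lt;br&gt;
so a failure points directly at the rule that changed.&lt;/p&gt;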

&lt;p&gt;Now look at what most unit tests actually test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This is what unit tests look like in most codebases.
&lt;/span&gt;&lt;span class="nd"&gt;@patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;app.services.payment.stripe_client&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;app.services.payment.db&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;app.services.payment.email_service&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;app.services.payment.inventory_service&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_process_payment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;mock_inventory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mock_email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mock_db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mock_stripe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;mock_stripe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_payment_intent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;return_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Mock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pi_123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;succeeded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;mock_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;return_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Mock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;49.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;mock_inventory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reserve&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;return_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="n"&gt;mock_email&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;send&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;return_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;process_payment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;mock_stripe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_payment_intent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assert_called_once&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;mock_inventory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reserve&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assert_called_once&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;mock_email&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;send&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assert_called_once&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What is this test actually testing? It is testing that when everything&lt;br&gt;
works exactly as mocked, the function calls each mocked dependency once&lt;br&gt;
and returns the expected result.&lt;/p&gt;

&lt;p&gt;It is not testing what happens when Stripe returns an error. It is not&lt;br&gt;
testing what happens when the database is unavailable. It is not testing&lt;br&gt;
what happens when inventory reservation fails after payment succeeds,&lt;br&gt;
leaving a paid order in a broken state. It is not testing the actual&lt;br&gt;
integration between these components.&lt;/p&gt;

&lt;p&gt;It is testing that the code is wired together the way it was wired&lt;br&gt;
together when the test was written. It is a snapshot of the implementation&lt;br&gt;
masquerading as a validation of the behavior.&lt;/p&gt;

&lt;p&gt;And it will pass green on every run until the day something real breaks&lt;br&gt;
in production, at which point it will still pass green because the mocks&lt;br&gt;
are still returning what you told them to return.&lt;/p&gt;
&lt;h2&gt;
  
  
  The mock problem
&lt;/h2&gt;

&lt;p&gt;Mocks are the original sin of unit testing culture. They were created&lt;br&gt;
to solve a real problem — tests that depend on external services are&lt;br&gt;
slow, unreliable, and hard to set up — and they solved that problem&lt;br&gt;
by replacing the external service with a fake version that does whatever&lt;br&gt;
the test needs it to do.&lt;/p&gt;

&lt;p&gt;The consequence is that your test suite no longer tests your software.&lt;br&gt;
It tests your software's interaction with your software's assumptions&lt;br&gt;
about how its dependencies behave. When those assumptions are wrong —&lt;br&gt;
when the real Stripe API returns a response shape that's slightly&lt;br&gt;
different from what you mocked, when the real database has a different&lt;br&gt;
transaction isolation level than your mock assumes, when the real email&lt;br&gt;
service deduplicates in a way your mock doesn't — your tests pass and&lt;br&gt;
your production breaks.&lt;/p&gt;
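
&lt;p&gt;A small sketch of that gap, using only &lt;code&gt;unittest.mock&lt;/code&gt;&lt;br&gt;
from the standard library. &lt;code&gt;RealPaymentClient&lt;/code&gt;,&lt;br&gt;
&lt;code&gt;charge&lt;/code&gt;, and &lt;code&gt;outcome&lt;/code&gt; are hypothetical names,&lt;br&gt;
and the changed response shape is invented for the example; this is not&lt;br&gt;
any real provider's SDK.&lt;/p&gt;

```python
from unittest.mock import Mock


class RealPaymentClient:
    """Hypothetical stand-in for a provider SDK after an upgrade:
    it now returns a dict where the calling code expects an object."""

    def create_payment_intent(self, amount_cents):
        return {"id": "pi_123", "status": "succeeded"}


def charge(client, amount_cents):
    # Written against the old contract: attribute access on the response.
    intent = client.create_payment_intent(amount_cents)
    return intent.status == "succeeded"


def outcome(client):
    """Run charge() and report what happened instead of raising."""
    try:
        return "ok" if charge(client, 4999) else "declined"
    except AttributeError as exc:
        return f"broken: {exc}"


# The mock encodes the old assumption, so the mocked check stays green.
mocked = Mock()
mocked.create_payment_intent.return_value = Mock(id="pi_123", status="succeeded")

print("with mock:", outcome(mocked))
print("with real client:", outcome(RealPaymentClient()))
```

&lt;p&gt;The mocked run reports success while the same code fails against the&lt;br&gt;
stand-in for the real client. Nothing in the mock can notice that the&lt;br&gt;
contract moved.&lt;/p&gt;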

&lt;p&gt;I have debugged more production incidents that were caused by the gap&lt;br&gt;
between mocked behavior and real behavior than I can count. The test&lt;br&gt;
said it worked. The mock said the API returned this. The real API&lt;br&gt;
does not return this. The test was wrong about the contract, and because&lt;br&gt;
the test was wrong, the code was deployed with a broken assumption that&lt;br&gt;
nobody caught.&lt;/p&gt;

&lt;p&gt;The more you mock, the less your tests tell you about whether the&lt;br&gt;
software works. This is not a design smell to be managed — it's a&lt;br&gt;
fundamental property of mocking. Every mock is a place where reality&lt;br&gt;
has been replaced with assumption.&lt;/p&gt;
&lt;h2&gt;
  
  
  The coverage lie
&lt;/h2&gt;

&lt;p&gt;Coverage is the most destructive metric in software engineering.&lt;/p&gt;

&lt;p&gt;Not because high coverage is bad. Because coverage as a target produces&lt;br&gt;
the wrong behavior. When coverage is a goal, developers write tests to&lt;br&gt;
cover code rather than to validate behavior. These are different&lt;br&gt;
activities that produce very different tests.&lt;/p&gt;

&lt;p&gt;A test written to cover code asks: how do I execute this line?&lt;br&gt;
A test written to validate behavior asks: what should this system do,&lt;br&gt;
and how do I know it's doing it?&lt;/p&gt;

&lt;p&gt;Tests written to cover code tend to be thin — they call the function&lt;br&gt;
with happy-path inputs and assert that it doesn't throw. They increase&lt;br&gt;
coverage. They do not increase confidence.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Written to cover code. Gets you to 100% on this function.
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_create_user&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test@example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="c1"&gt;# Written to validate behavior. Tests what actually matters.
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_create_user_hashes_password&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test@example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;password_hash&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;verify_password&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;password_hash&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_create_user_rejects_duplicate_email&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;create_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test@example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raises&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DuplicateEmailError&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;create_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test@example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;different&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_create_user_sends_verification_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fake_email_sender&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;create_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test@example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test@example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verify&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fake_email_sender&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sent&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_create_user_with_invalid_email&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raises&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ValidationError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;invalid email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;create_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;not-an-email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second set has the same line coverage as the first if the function&lt;br&gt;
is simple. It tests fundamentally different things. A system with the&lt;br&gt;
first kind of tests has coverage. A system with the second kind has&lt;br&gt;
confidence.&lt;/p&gt;

&lt;p&gt;Coverage rewards quantity. Confidence comes from quality. These are&lt;br&gt;
not correlated, and treating them as if they are has produced an&lt;br&gt;
industry-wide habit of writing many low-value tests instead of fewer&lt;br&gt;
high-value ones.&lt;/p&gt;
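
&lt;p&gt;To make the difference concrete, here is a sketch with a deliberately&lt;br&gt;
buggy function. &lt;code&gt;is_strong_password&lt;/code&gt; and its eight-character&lt;br&gt;
rule are hypothetical; the point is only that both checks execute every&lt;br&gt;
line of the function, and only one of them notices the bug.&lt;/p&gt;

```python
def is_strong_password(password: str) -> bool:
    """Hypothetical rule: at least 8 characters. The comparison is off by one."""
    return len(password) > 8  # bug: should be >= 8


def check(test_fn) -> str:
    """Run a check and report pass/fail instead of raising."""
    try:
        test_fn()
        return "pass"
    except AssertionError:
        return "fail"


def coverage_style_check():
    # Executes every line of the function, asserts almost nothing.
    assert is_strong_password("correct horse battery") is not None


def behavior_style_check():
    # States the rule at its boundary.
    assert is_strong_password("12345678")     # exactly 8 characters: allowed
    assert not is_strong_password("1234567")  # 7 characters: rejected


print("coverage-style:", check(coverage_style_check))  # pass
print("behavior-style:", check(behavior_style_check))  # fail
```

&lt;p&gt;Both checks give the function 100% line coverage. Only the second&lt;br&gt;
one says anything about whether the rule is implemented correctly.&lt;/p&gt;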
&lt;h2&gt;
  
  
  What actually breaks in production
&lt;/h2&gt;

&lt;p&gt;Here is a list of things that unit tests, as typically practiced,&lt;br&gt;
will never catch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The query that works correctly against your test database with
twenty rows and times out against production with two million rows&lt;/li&gt;
&lt;li&gt;The race condition that only manifests when two requests hit the
same endpoint within fifty milliseconds of each other&lt;/li&gt;
&lt;li&gt;The API response from your payment provider that changed shape
slightly in a minor version update&lt;/li&gt;
&lt;li&gt;The session expiry behavior that's different in the production
Redis configuration than in the in-memory fake you test against&lt;/li&gt;
&lt;li&gt;The cascade delete behavior that your ORM handles differently than
the raw SQL you use in the migration script&lt;/li&gt;
&lt;li&gt;The encoding issue that only appears when a user's name contains
a character outside the ASCII range&lt;/li&gt;
&lt;li&gt;The timeout that is set correctly in the service but not propagated
to the client that calls it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every item on this list is a production incident I have personally&lt;br&gt;
been part of. None of them was caught by unit tests. Most of them&lt;br&gt;
would have been caught by integration tests that weren't written&lt;br&gt;
because the team was busy maintaining the unit test suite.&lt;/p&gt;
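
&lt;p&gt;One item from that list, reduced to a few lines. This is a sketch,&lt;br&gt;
not a reconstruction of a real incident: &lt;code&gt;build_export_line&lt;/code&gt;&lt;br&gt;
and &lt;code&gt;try_export&lt;/code&gt; are hypothetical helpers, and the hidden&lt;br&gt;
ASCII assumption is planted on purpose.&lt;/p&gt;

```python
def build_export_line(user_id: int, name: str) -> bytes:
    """Hypothetical CSV export helper with a hidden ASCII assumption."""
    return f"{user_id},{name}\n".encode("ascii")


def try_export(name: str) -> str:
    """Report the outcome of exporting one name instead of raising."""
    try:
        build_export_line(1, name)
        return "ok"
    except UnicodeEncodeError:
        return "UnicodeEncodeError"


# The fixture data everyone tests with.
print(try_export("Alice"))      # ok

# The data production actually contains ("Zo\u00eb" is "Zoe" with a diaeresis).
print(try_export("Zo\u00eb"))   # UnicodeEncodeError
```

&lt;p&gt;A test suite fed only by tidy fixture data never reaches the second&lt;br&gt;
call. Real data does, on the first day.&lt;/p&gt;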

&lt;p&gt;This is the trade you make when you prioritize unit testing: you get&lt;br&gt;
fast, reliable tests that validate your assumptions, and you skip the&lt;br&gt;
slower, harder tests that would challenge them.&lt;/p&gt;
&lt;h2&gt;
  
  
  What to do instead
&lt;/h2&gt;

&lt;p&gt;I am not arguing for no tests. I'm arguing for tests calibrated to&lt;br&gt;
where the real risk is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integration tests over unit tests for anything with dependencies.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If a function touches a database, a cache, a message queue, or an&lt;br&gt;
external service — test it against the real thing, or as close to&lt;br&gt;
the real thing as you can get. Not a mock. Not an in-memory fake&lt;br&gt;
that you wrote. A real database with a real schema and real data&lt;br&gt;
volumes. A real Redis instance. A real queue.&lt;/p&gt;

&lt;p&gt;Yes, these tests are slower. Run them in CI, not on every save.&lt;br&gt;
They are dramatically more valuable than unit tests that mock the&lt;br&gt;
same dependencies because they test what the code actually does,&lt;br&gt;
not what you assumed the code would do.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This is worth the setup cost. It catches real problems.
&lt;/span&gt;&lt;span class="nd"&gt;@pytest.mark.integration&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_process_payment_handles_stripe_card_declined&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;test_db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;# Real PostgreSQL, real schema
&lt;/span&gt;    &lt;span class="n"&gt;stripe_mock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# Stripe's own test environment, not our mock
&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;create_test_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;49.99&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Stripe's test mode has real card numbers that trigger specific behaviors
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;process_payment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;card_token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tok_chargeDeclined&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Stripe test token for declines
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;failure_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;card_declined&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Verify the order status was updated correctly in the real database
&lt;/span&gt;    &lt;span class="n"&gt;updated_order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;test_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT status FROM orders WHERE id = $1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;updated_order&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payment_failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Verify no inventory was reserved for a failed payment
&lt;/span&gt;    &lt;span class="n"&gt;reservation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;test_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT id FROM inventory_reservations WHERE order_id = $1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;reservation&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This test uses a real database and Stripe's test environment. It is&lt;br&gt;
slower than a mocked unit test. It tests whether the actual system&lt;br&gt;
behaves correctly when a real dependency reports a failure.&lt;br&gt;
It is the test you actually need.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test behavior at the system boundary, not implementation in the middle.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The most valuable tests are the ones that call your API, your&lt;br&gt;
message handler, your batch job — the public interface of your&lt;br&gt;
system — and assert on the observable output. Not which functions&lt;br&gt;
were called, not which mocks were invoked. What came out.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@pytest.mark.integration&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_order_api_returns_correct_status_after_payment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;test_db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Create an order through the API
&lt;/span&gt;    &lt;span class="n"&gt;create_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prod_1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quantity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;create_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;201&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;create_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Process payment through the API
&lt;/span&gt;    &lt;span class="n"&gt;payment_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/orders/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/pay&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;card_token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tok_visa&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;payment_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;

    &lt;span class="c1"&gt;# Verify the order status reflects the payment
&lt;/span&gt;    &lt;span class="n"&gt;order_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/orders/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;order_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confirmed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;order_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;succeeded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This test goes through the API, through the service layer, through&lt;br&gt;
the database, and back. It validates the entire vertical slice. It&lt;br&gt;
would catch a bug in the API handler, a bug in the service logic,&lt;br&gt;
a bug in the database query, or a bug in the response serializer.&lt;br&gt;
A unit test that mocked all the layers would catch none of these&lt;br&gt;
except the one in the specific layer being tested.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reserve unit tests for pure logic.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Unit tests are excellent for exactly what they're suited for:&lt;br&gt;
pure functions with complex branching logic where the relationship&lt;br&gt;
between input and output is the whole point. Discount calculations.&lt;br&gt;
Validation rules. Data transformations. Parsing logic. Algorithms.&lt;/p&gt;

&lt;p&gt;These are worth unit testing because the test is actually testing&lt;br&gt;
the logic. There's nothing to mock. The test runs in microseconds.&lt;br&gt;
Failures tell you exactly what's wrong.&lt;/p&gt;
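&lt;p&gt;A minimal sketch of what that looks like in practice. The discount rules here are hypothetical and exist only for illustration (two points per loyalty year up to ten, plus any coupon, capped at thirty percent overall). What matters is the shape: plain inputs, plain outputs, nothing to mock.&lt;/p&gt;

```python
def discount_percent(loyalty_years, coupon_percent=0):
    """Combine loyalty and coupon discounts, capped at 30 percent.

    Hypothetical rules, for illustration only: 2 points per loyalty
    year up to 10, plus any coupon, with the total capped at 30.
    """
    loyalty = min(loyalty_years * 2, 10)
    return min(loyalty + coupon_percent, 30)


def discounted_total_cents(subtotal_cents, loyalty_years, coupon_percent=0):
    """Apply the combined discount to a subtotal given in integer cents."""
    percent = discount_percent(loyalty_years, coupon_percent)
    return subtotal_cents * (100 - percent) // 100


def test_discount_rules():
    # No mocks: each case is just an input and its expected output.
    assert discount_percent(0) == 0
    assert discount_percent(3) == 6
    assert discount_percent(20) == 10        # loyalty cap
    assert discount_percent(20, 25) == 30    # overall cap
    assert discounted_total_cents(10_000, 3, 10) == 8_400


test_discount_rules()
```

&lt;p&gt;Each assertion names one branch of the rule, so a failure points at the exact case that broke.&lt;/p&gt;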

&lt;p&gt;For everything else — anything that touches infrastructure, anything&lt;br&gt;
that coordinates between components, anything that talks to external&lt;br&gt;
systems — integration tests are not just better, they're the only&lt;br&gt;
tests that tell you anything true.&lt;/p&gt;

&lt;h2&gt;
  
  
  The heresy in full
&lt;/h2&gt;

&lt;p&gt;Here is the position I'm staking out, clearly, so it can be clearly&lt;br&gt;
disagreed with:&lt;/p&gt;

&lt;p&gt;A codebase with forty percent coverage from integration tests that&lt;br&gt;
test real behavior against real dependencies is more reliable than&lt;br&gt;
a codebase with ninety percent coverage from unit tests that mock&lt;br&gt;
every external interaction.&lt;/p&gt;

&lt;p&gt;Coverage is not quality. Mocks are not validation. A green test suite&lt;br&gt;
is not a guarantee that the software works — it's a guarantee that&lt;br&gt;
the software works according to the assumptions baked into the tests,&lt;br&gt;
which may or may not match reality.&lt;/p&gt;

&lt;p&gt;The software quality crisis is not a testing crisis. We test more&lt;br&gt;
than we ever have. The crisis is a misalignment between what we test&lt;br&gt;
and what breaks. We test pure logic obsessively and integration&lt;br&gt;
boundaries barely. The bugs live at the integration boundaries. They&lt;br&gt;
always have.&lt;/p&gt;

&lt;p&gt;The counterargument I hear most often: integration tests are slow.&lt;br&gt;
Yes. They are. They are slow because they do real things. Real things&lt;br&gt;
take time. The alternative is fast tests that don't do real things&lt;br&gt;
and therefore don't tell you whether the real things work.&lt;/p&gt;

&lt;p&gt;Speed is not a virtue in a test suite. Accuracy is.&lt;/p&gt;




&lt;p&gt;I expect this to generate disagreement. That's fine. The developers&lt;br&gt;
most likely to disagree are the ones who have invested the most in&lt;br&gt;
unit testing culture, which makes their disagreement somewhat&lt;br&gt;
self-referential. The developers most likely to agree quietly are the&lt;br&gt;
ones who have been paged at 3am because a perfectly unit-tested&lt;br&gt;
function didn't work the way its mocks said it would.&lt;/p&gt;

&lt;p&gt;Those developers know. They've always known. This is just someone&lt;br&gt;
finally saying it.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>architecture</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>You Are Building for the Wrong User</title>
      <dc:creator>Benard Otieno</dc:creator>
      <pubDate>Sun, 17 May 2026 10:58:59 +0000</pubDate>
      <link>https://dev.to/benard_otieno_cdb9e6d4907/you-are-building-for-the-wrong-user-1e17</link>
      <guid>https://dev.to/benard_otieno_cdb9e6d4907/you-are-building-for-the-wrong-user-1e17</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;The user in your head when you make product decisions is not your actual user. The gap between those two people is where most product failures live.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every product decision gets made with a user in mind. When you design&lt;br&gt;
an API, you're imagining how it will be called. When you write error&lt;br&gt;
messages, you're imagining who will read them. When you decide how a&lt;br&gt;
feature should work, you're imagining someone using it. When you choose&lt;br&gt;
what to build next, you're imagining who will need it.&lt;/p&gt;

&lt;p&gt;The problem is that this user — the one in your head — is almost always&lt;br&gt;
wrong. Not slightly wrong. Systematically, structurally wrong in ways that&lt;br&gt;
compound with every decision.&lt;/p&gt;

&lt;p&gt;And the engineers and teams who understand this build qualitatively&lt;br&gt;
different things from the ones who don't.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who the imagined user actually is
&lt;/h2&gt;

&lt;p&gt;The user most engineers imagine when building is a technically sophisticated,&lt;br&gt;
highly motivated person who wants to use the product correctly. They read&lt;br&gt;
the documentation before starting. They understand the conceptual model&lt;br&gt;
the product is built around. They know what they want to achieve and they're&lt;br&gt;
trying to figure out how the product helps them achieve it.&lt;/p&gt;

&lt;p&gt;This person does not exist in your user base in large numbers. They exist&lt;br&gt;
among early adopters, among the colleagues who gave you feedback when you&lt;br&gt;
were building, among the developers on your own team who use the product&lt;br&gt;
to test it. They are massively overrepresented in the feedback you receive&lt;br&gt;
because they're the ones who care enough to give feedback. They are&lt;br&gt;
massively underrepresented in your actual user base.&lt;/p&gt;

&lt;p&gt;Your actual users are not unsophisticated. They're busy. They have&lt;br&gt;
twenty-three other things demanding their attention. They encounter your&lt;br&gt;
product in the middle of trying to accomplish something else. They have not&lt;br&gt;
read the documentation. They will not read the documentation. They are&lt;br&gt;
trying to figure out if your product can solve their problem in the next&lt;br&gt;
ninety seconds, and if it's not obvious that it can, they will stop trying.&lt;/p&gt;

&lt;p&gt;These are the same people. The difference is not intelligence or capability.&lt;br&gt;
It's context. The imagined user has context — they know what the product&lt;br&gt;
does, they're focused on it, they're motivated to learn it. The actual user&lt;br&gt;
has none of that. They showed up to solve a problem. Whether they stay&lt;br&gt;
depends entirely on how quickly they can see that the product helps.&lt;/p&gt;

&lt;p&gt;Every design decision made for the imagined user makes the product slightly&lt;br&gt;
worse for the actual user. A feature that's powerful but requires&lt;br&gt;
configuration: the imagined user configures it. The actual user sees a blank&lt;br&gt;
state and leaves. An error message that's technically accurate: the imagined&lt;br&gt;
user understands it. The actual user doesn't know what to do next.&lt;br&gt;
Documentation that's comprehensive and well-organized: the imagined user&lt;br&gt;
reads it. The actual user never opens it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The specific ways this goes wrong
&lt;/h2&gt;

&lt;p&gt;I want to be concrete, because "build for actual users" is advice that&lt;br&gt;
sounds obvious and is almost universally ignored, and the reason it's&lt;br&gt;
ignored is that the failure modes are invisible until you're looking for them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Onboarding built for people who already understand the product.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The most expensive minute in any product's relationship with a user is&lt;br&gt;
the first one. Not expensive in compute cost — expensive in the sense&lt;br&gt;
that the user is forming the impression that determines whether they&lt;br&gt;
ever come back. The product gets roughly sixty seconds to communicate:&lt;br&gt;
what this does, whether it's for you, and what to do first.&lt;/p&gt;

&lt;p&gt;Most onboarding fails this because it's designed by people who deeply&lt;br&gt;
understand the product, which makes it impossible for them to accurately&lt;br&gt;
simulate not understanding it. The team knows what the product does,&lt;br&gt;
so the product's value feels obvious. The team knows the mental model&lt;br&gt;
the product is built around, so the conceptual framework feels natural.&lt;br&gt;
The actual new user has none of this scaffolding, and the onboarding&lt;br&gt;
that feels clear to the team is opaque to them.&lt;/p&gt;

&lt;p&gt;The fix is not better copy. It's building onboarding by watching people&lt;br&gt;
who have never seen the product try to use it. Not asking them what&lt;br&gt;
they think. Watching what they do. Where do they click first? Where do&lt;br&gt;
they stop? Where do they look confused? Where do they give up?&lt;/p&gt;

&lt;p&gt;Five sessions of this will surface more real problems than a month of&lt;br&gt;
internal review, and most teams have never done it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Error messages written for developers.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Error messages are the product's voice in the moment the user is most&lt;br&gt;
frustrated. They are almost universally written by the developer who&lt;br&gt;
implemented the feature, for an audience of developers who understand&lt;br&gt;
the system.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Error: Invalid parameter 'start_date'. Expected ISO 8601 format.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This message is technically accurate. The developer reading it knows&lt;br&gt;
immediately what to fix. The non-developer user — or even the developer&lt;br&gt;
who is not familiar with ISO 8601 — reads this and has several questions:&lt;br&gt;
what's a parameter? What's ISO 8601? What did I type that was wrong?&lt;br&gt;
What should I type instead?&lt;/p&gt;

&lt;p&gt;The message answered none of these questions. It described the problem&lt;br&gt;
in terms that require prior knowledge to decode. It provided no path&lt;br&gt;
forward.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;The start date you entered isn't in the right format.&lt;/code&gt;&lt;br&gt;
&lt;code&gt;Try: 2026-05-16  (year-month-day)&lt;/code&gt;&lt;br&gt;
&lt;code&gt;You entered: 05/16/2026&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Same information. Different audience. The second version costs nothing&lt;br&gt;
extra to implement and turns a moment of frustration into a moment of&lt;br&gt;
clarity. Most error messages in most products are written like the first.&lt;/p&gt;
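&lt;p&gt;As a rough sketch of how little code the friendlier message needs, assuming a hypothetical &lt;code&gt;parse_start_date&lt;/code&gt; helper built on Python's standard &lt;code&gt;date.fromisoformat&lt;/code&gt;: the validation is the same, and only the text handed back to the user changes.&lt;/p&gt;

```python
from datetime import date


def parse_start_date(raw):
    """Parse a year-month-day date, raising a user-facing message on failure.

    Hypothetical helper for illustration; the wording mirrors the
    rewritten error message discussed in the text.
    """
    try:
        return date.fromisoformat(raw.strip())
    except ValueError:
        example = date(2026, 5, 16).isoformat()
        # Say what went wrong, show a valid example, and echo the input back.
        raise ValueError(
            "The start date you entered isn't in the right format.\n"
            f"Try: {example}  (year-month-day)\n"
            f"You entered: {raw}"
        ) from None


try:
    parse_start_date("05/16/2026")
except ValueError as error:
    print(error)
```

&lt;p&gt;Echoing the user's input back is the cheap part that most messages skip, and it is what lets the user see the difference between what they typed and what was expected.&lt;/p&gt;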

&lt;p&gt;The reason is that the developer writes the error message while&lt;br&gt;
implementing the validation logic, in the mental context of the&lt;br&gt;
implementation, for an imagined user who shares that context. The actual&lt;br&gt;
user is never imagined at all.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Features complete enough to ship but not complete enough to use.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There is a version of shipping fast that is genuinely good: getting a&lt;br&gt;
working feature in front of users quickly, learning from real usage,&lt;br&gt;
iterating. There is a version that is genuinely bad: shipping something&lt;br&gt;
that is technically functional but missing the parts that make it&lt;br&gt;
actually usable, because those parts didn't make it into the sprint.&lt;/p&gt;

&lt;p&gt;The difference between these two versions is whether the shipped thing&lt;br&gt;
works for actual users or only for imagined ones.&lt;/p&gt;

&lt;p&gt;An API endpoint that returns data but has no pagination is complete&lt;br&gt;
enough for the imagined user, who is building a demo with twenty records.&lt;br&gt;
It is not complete enough for the actual user, who is trying to process&lt;br&gt;
a real dataset. A form that collects information but has no confirmation&lt;br&gt;
state works for the imagined user, who is testing the happy path. It&lt;br&gt;
doesn't work for the actual user, who isn't sure if their submission went&lt;br&gt;
through and submits again, creating duplicates.&lt;/p&gt;
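&lt;p&gt;To make the pagination case concrete, here is a small sketch over an in-memory list. The names &lt;code&gt;fetch_page&lt;/code&gt; and &lt;code&gt;iter_all&lt;/code&gt; and the response shape are assumptions for illustration, not any particular API. A real endpoint would read from a database, but the contract is the same: return a bounded page plus a cursor that says where to continue.&lt;/p&gt;

```python
def fetch_page(records, cursor=0, page_size=100):
    """Return one bounded page and the cursor for the next one (or None)."""
    page = records[cursor:cursor + page_size]
    next_cursor = cursor + len(page)
    has_more = len(records) > next_cursor
    return {"items": page, "next_cursor": next_cursor if has_more else None}


def iter_all(records, page_size=100):
    """Walk every page, the way a client with a real dataset would."""
    cursor = 0
    while cursor is not None:
        page = fetch_page(records, cursor, page_size)
        yield from page["items"]
        cursor = page["next_cursor"]


# Hypothetical dataset: 250 records, fetched 100 at a time.
dataset = list(range(250))
print(fetch_page(dataset, 200, 100)["next_cursor"])
print(sum(1 for _ in iter_all(dataset)))
```

&lt;p&gt;With twenty demo records the cursor never matters. With a real dataset it is the difference between a response that returns and one that times out.&lt;/p&gt;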

&lt;p&gt;The imagined user will find a way to make incomplete features work.&lt;br&gt;
The actual user will encounter the gap between the feature and their&lt;br&gt;
reality and leave. The sprint velocity that comes from shipping incomplete&lt;br&gt;
features is borrowed against the retention you lose when actual users&lt;br&gt;
encounter them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Documentation written for the moment of maximum understanding.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Documentation is written by someone who understands the product deeply,&lt;br&gt;
at the moment when they understand it most deeply, and then it is&lt;br&gt;
assumed to be complete.&lt;/p&gt;

&lt;p&gt;The person reading the documentation is someone who understands the&lt;br&gt;
product least, at the moment when they need help most. These are&lt;br&gt;
maximally mismatched participants. The writer has internalized everything&lt;br&gt;
that the reader doesn't yet know. What feels like a clear explanation&lt;br&gt;
to the writer is often a chain of assumptions that the reader can't&lt;br&gt;
follow.&lt;/p&gt;

&lt;p&gt;The specific failure: concepts used before they're explained. A getting&lt;br&gt;
started guide that says "first, configure your workspace" where "workspace"&lt;br&gt;
is a domain concept that the reader doesn't yet understand. A reference&lt;br&gt;
document that uses the product's internal terminology throughout, assuming&lt;br&gt;
the reader has already acquired that vocabulary. A tutorial that assumes&lt;br&gt;
the reader knows why they'd want to do what they're being shown, rather&lt;br&gt;
than establishing the motivation first.&lt;/p&gt;

&lt;p&gt;The person who wrote this documentation was not explaining from first&lt;br&gt;
principles. They were documenting from expertise. Those are different&lt;br&gt;
cognitive activities and they produce different artifacts.&lt;/p&gt;

&lt;h2&gt;
  
  
  The data you're probably not looking at
&lt;/h2&gt;

&lt;p&gt;Most teams have more data about their actual users than they use. The&lt;br&gt;
data is uncomfortable, so it gets looked at less than it should.&lt;/p&gt;

&lt;p&gt;Session recordings of real users encountering real problems. The vast&lt;br&gt;
majority of teams with access to session recording tools use them&lt;br&gt;
reactively — to investigate a specific reported problem — rather than&lt;br&gt;
proactively to understand where users generally struggle. A few hours&lt;br&gt;
of watching session recordings from new users will show you more about&lt;br&gt;
where your product fails actual users than any amount of internal review.&lt;/p&gt;

&lt;p&gt;Activation funnel drop-off. Where in the onboarding flow do users stop?&lt;br&gt;
Most teams know this number but don't sit with what it implies. A 60%&lt;br&gt;
drop-off at step three of onboarding means six in ten users who reached&lt;br&gt;
step three never got to step four. What is step three? What does&lt;br&gt;
it ask the user to do? Is it actually necessary at that point, or is it&lt;br&gt;
there because it was the logical next step from an implementation&lt;br&gt;
perspective?&lt;/p&gt;
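&lt;p&gt;A drop-off figure is just a ratio between consecutive steps, which makes it easy to compute for every step rather than only the one on the dashboard. A minimal sketch, with made-up counts and a hypothetical &lt;code&gt;step_drop_off&lt;/code&gt; helper:&lt;/p&gt;

```python
def step_drop_off(reached_by_step):
    """Return the fraction of users lost at each step.

    reached_by_step lists how many users reached step 1, step 2, and so on.
    The last step has no following step, so it gets no entry. Assumes every
    count except possibly the last is non-zero.
    """
    pairs = zip(reached_by_step, reached_by_step[1:])
    return [1 - following / current for current, following in pairs]


# Hypothetical counts: 1000 users start, 500 reach step three, 200 reach step four.
reached = [1000, 800, 500, 200]
drop_off = step_drop_off(reached)
for step, lost in enumerate(drop_off, start=1):
    print(f"step {step}: {lost:.0%} of users who got here went no further")
```

&lt;p&gt;In this made-up funnel the third entry is the 60% case: 500 users reached step three and only 200 continued.&lt;/p&gt;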

&lt;p&gt;Support tickets, literally read. Not summarised. Not categorised. Read,&lt;br&gt;
one by one, by the people who made the design decisions that generated&lt;br&gt;
them. The support ticket is the user telling you, in their own words,&lt;br&gt;
what your product did that didn't match their expectation. It is&lt;br&gt;
unmediated feedback from actual users about actual failure modes. Most&lt;br&gt;
teams process support tickets through a support function and those&lt;br&gt;
learnings never reach the people making the product decisions.&lt;/p&gt;

&lt;p&gt;Search queries within the product, if you have a search function or a&lt;br&gt;
help center with search. What are users typing? The search query is&lt;br&gt;
the user telling you what they're looking for that they couldn't find&lt;br&gt;
on their own. A user who searches "how do I delete my account" is a&lt;br&gt;
user who couldn't find the account deletion flow. A user who searches&lt;br&gt;
"why is my data wrong" is a user encountering a data integrity problem&lt;br&gt;
they don't understand. The aggregate of these queries is a map of where&lt;br&gt;
your product is failing actual users.&lt;/p&gt;
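&lt;p&gt;Building that map does not need special tooling. A sketch using only Python's standard library, assuming you can export raw queries as a list of strings (the sample log below is invented):&lt;/p&gt;

```python
from collections import Counter


def top_queries(queries, limit=10):
    """Count normalized search queries and return the most common ones."""
    # Lowercase and collapse whitespace so trivial variants count together.
    normalized = (" ".join(q.lower().split()) for q in queries)
    counts = Counter(q for q in normalized if q)
    return counts.most_common(limit)


# Hypothetical log of raw help-center searches.
log = [
    "how do I delete my account",
    "How do I delete my account ",
    "why is my data wrong",
    "export csv",
    "how do i delete my account",
]
for query, count in top_queries(log, limit=3):
    print(count, query)
```

&lt;p&gt;The counts are only a starting point. The value comes from reading the top entries and asking which screen should have answered each one.&lt;/p&gt;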

&lt;h2&gt;
  
  
  The proximity problem
&lt;/h2&gt;

&lt;p&gt;The reason teams build for the imagined user rather than the actual user&lt;br&gt;
is structural, not intentional. It's a proximity problem.&lt;/p&gt;

&lt;p&gt;The people making product decisions are close to the product and far&lt;br&gt;
from the users. They understand how the product works, why it works that&lt;br&gt;
way, what the tradeoffs were. They use the product themselves, but they&lt;br&gt;
use it with expert knowledge that insulates them from the actual experience&lt;br&gt;
of a new user. When they imagine a user, they imagine someone like&lt;br&gt;
themselves with the same context they have.&lt;/p&gt;

&lt;p&gt;The actual users are far from the people making decisions and leave&lt;br&gt;
signals that are filtered, delayed, and translated before they reach&lt;br&gt;
anyone who can act on them. A user who struggles and leaves doesn't file&lt;br&gt;
a bug report. They just don't come back. A user who figures something&lt;br&gt;
out eventually doesn't report that it was hard. They just move on. The&lt;br&gt;
signals that make it back are the ones from the vocal minority who cared&lt;br&gt;
enough to write something down, which is not a representative sample.&lt;/p&gt;

&lt;p&gt;Closing this gap requires deliberate effort because it doesn't close&lt;br&gt;
on its own. The product gets more complex and the team's expertise&lt;br&gt;
increases over time, which means the gap between their mental model&lt;br&gt;
and the new user's experience widens if nothing is done to counter it.&lt;/p&gt;

&lt;p&gt;The specific practices that work:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Regular sessions watching new users.&lt;/strong&gt; Not asking for opinions.&lt;br&gt;
Watching where they click, where they pause, where they read, where&lt;br&gt;
they give up. Monthly, with the whole team watching, not just the&lt;br&gt;
designer or the PM. Watching is a different cognitive activity from&lt;br&gt;
asking. Watching gives you behavior. Asking gives you rationalizations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Someone on the team responsible for carrying the user's perspective.&lt;/strong&gt;&lt;br&gt;
Not a UX researcher who files reports that get read and filed. Someone&lt;br&gt;
with standing in product discussions who can say "a user who doesn't&lt;br&gt;
know what a workspace is would not understand this" and have that&lt;br&gt;
land as a real input to the decision. The imagined user has many&lt;br&gt;
advocates on the team — everyone building the product is effectively&lt;br&gt;
advocating for the imagined user's needs. The actual user needs&lt;br&gt;
an explicit advocate because their perspective is not naturally&lt;br&gt;
represented in the room.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Requiring an unguided first-use test.&lt;/strong&gt; Before any feature ships, someone&lt;br&gt;
who didn't build it has to be able to use it with no guidance. Not&lt;br&gt;
as a QA pass. As a design gate. If the person who didn't build it&lt;br&gt;
needs an explanation to use the feature, the feature is not ready for&lt;br&gt;
actual users, who won't receive one either.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reading your own error messages as a user.&lt;/strong&gt; Take the last five error&lt;br&gt;
messages that appeared in your logs or support tickets. Read them as&lt;br&gt;
someone who doesn't know your system. What do they tell you to do?&lt;br&gt;
If the answer is "nothing concrete," the error messages are for your&lt;br&gt;
debugger, not for your user.&lt;/p&gt;

&lt;h2&gt;
  
  
  The version of this that compounds
&lt;/h2&gt;

&lt;p&gt;The teams that build for actual users from the beginning develop a&lt;br&gt;
capability that's hard to acquire later: accurate intuition about where&lt;br&gt;
their product fails people who are not already experts in it.&lt;/p&gt;

&lt;p&gt;This intuition is worth more than it looks like on paper. It's the&lt;br&gt;
thing that means you don't have to watch session recordings before every&lt;br&gt;
release, because the person who would have struggled with this is already&lt;br&gt;
present in the designer's mind during design. It's the thing that means&lt;br&gt;
your error messages are clear because clarity for actual users is already&lt;br&gt;
the default, not an afterthought. It's the thing that means your&lt;br&gt;
onboarding works for the user who's distracted and skeptical, not just&lt;br&gt;
for the one who's engaged and motivated.&lt;/p&gt;

&lt;p&gt;Building this intuition requires sustained exposure to actual users&lt;br&gt;
struggling with the actual product. There's no shortcut. The teams that&lt;br&gt;
have it got it by regularly watching what actual users experience when&lt;br&gt;
they encounter the product, without the filter of what the product was&lt;br&gt;
supposed to do.&lt;/p&gt;

&lt;p&gt;The imagined user is comfortable. They use the product well, they&lt;br&gt;
appreciate the features, they understand the mental model. Building for&lt;br&gt;
them is building for the team's own reflection.&lt;/p&gt;

&lt;p&gt;The actual user is uncomfortable to watch. They do unexpected things.&lt;br&gt;
They miss obvious affordances. They read things wrong. They give up&lt;br&gt;
at moments that feel, to the team, like they should be easy. Watching&lt;br&gt;
this is genuinely difficult when the product is something you made.&lt;/p&gt;

&lt;p&gt;It's also the only accurate feedback you have on whether what you made&lt;br&gt;
works.&lt;/p&gt;

&lt;p&gt;Build for the person in the session recording, not the person in your&lt;br&gt;
head. They are not the same person. One of them is your actual user.&lt;/p&gt;

</description>
      <category>career</category>
      <category>architecture</category>
      <category>dev</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Pick Boring Technology. Yes, Especially for AI.</title>
      <dc:creator>Benard Otieno</dc:creator>
      <pubDate>Fri, 15 May 2026 11:50:52 +0000</pubDate>
      <link>https://dev.to/benard_otieno_cdb9e6d4907/pick-boring-technology-yes-especially-for-ai-2021</link>
      <guid>https://dev.to/benard_otieno_cdb9e6d4907/pick-boring-technology-yes-especially-for-ai-2021</guid>
      <description>&lt;h2&gt;
  
  
  What "Boring" Actually Means
&lt;/h2&gt;

&lt;p&gt;Boring technology does not mean old technology. It does not mean slow, limited, or low-quality. It means technology that has been in production long enough that its failure modes are documented, its operational characteristics are well understood, and the person debugging it at 2am has a reasonable chance of finding an established Stack Overflow answer rather than a single beta forum post from 2024.&lt;/p&gt;

&lt;p&gt;Postgres is boring. Redis is boring. S3 is boring. A plain HTTP API with JSON is boring. SQLite, for things that fit in SQLite, is boring. None of these things are slow, limited, or embarrassing to use. They are boring because they have been deployed by enough people, at enough scale, for long enough that the surprises have mostly been found. The surface area of "things that can go wrong that nobody has written about" is small.&lt;/p&gt;

&lt;p&gt;When Dan McKinley wrote the Choose Boring Technology essay in 2015, he framed it as a budget: you get a limited number of new technologies per project, and you should spend that budget intentionally. That framing is still correct. What's changed is that AI products have a non-negotiable budget item now: the model and the scaffolding around it. That item is expensive. It is genuinely new. It has failure modes that nobody fully understands yet. That is the place you are choosing to spend your novelty budget. Everywhere else, the argument for boring is stronger than it has ever been.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Vector Database Problem
&lt;/h2&gt;

&lt;p&gt;The most common place I see this play out is in the retrieval layer. A team is building RAG — retrieval-augmented generation, some form of semantic search over a corpus. They need to store embeddings and query them by similarity. There are purpose-built vector databases for this: Pinecone, Weaviate, Qdrant, Chroma. They have impressive benchmarks, polished SDKs, and marketing copy that makes Postgres look like a horse and buggy.&lt;/p&gt;

&lt;p&gt;So teams reach for them. Then six months later they are managing two separate databases — Postgres for everything else, Pinecone for vectors — running two separate migration workflows, debugging sync issues between them, and paying for an additional managed service. The team that wanted to move fast has added an operational surface area that requires dedicated attention.&lt;/p&gt;

&lt;p&gt;pgvector exists. It is a Postgres extension. It is boring. It stores vectors in Postgres, queries them in Postgres, and updates them in the same Postgres transactions as the rest of your data. You run one database. You use the migration tooling you already have. You query it with SQL you already know. The performance ceiling is lower than a dedicated system optimised for nothing but ANN search, but the teams I've talked to who hit that ceiling with pgvector are building at a scale where infrastructure complexity is genuinely their problem to manage. Most teams are not those teams.&lt;/p&gt;

&lt;p&gt;The right question is not "what is the best vector database." It is "what is the simplest thing that handles my actual query volume, that I can operate with my existing knowledge, that does not require me to manage data consistency across two systems." The answer to that question, for most products, is Postgres.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- pgvector: you already know how to do this&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt;         &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt;    &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt;  &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;metadata&lt;/span&gt;   &lt;span class="n"&gt;jsonb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;timestamptz&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;ivfflat&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_cosine_ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Retrieval: pure SQL, same connection pool as everything else&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the retrieval layer for a production RAG system. It is a Postgres query. You already know how to read it, index it, back it up, and monitor it.&lt;/p&gt;
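&lt;p&gt;If the &lt;code&gt;&lt;=&gt;&lt;/code&gt; operator looks opaque: in pgvector it is cosine distance, so &lt;code&gt;1 - distance&lt;/code&gt; is cosine similarity. The sketch below computes the same ranking by brute force in plain Python, with tiny invented embeddings and hypothetical helper names. It is exact rather than approximate, unlike an &lt;code&gt;ivfflat&lt;/code&gt; index, and is meant only to show what the query is asking for.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=10):
    """Rank (doc_id, embedding) pairs by similarity to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, emb)) for doc_id, emb in documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Tiny hypothetical 3-dimensional embeddings; the table above uses 1536.
docs = [
    ("a", [1.0, 0.0, 0.0]),
    ("b", [0.7, 0.7, 0.0]),
    ("c", [0.0, 0.0, 1.0]),
]
for doc_id, similarity in top_k([1.0, 0.1, 0.0], docs, k=2):
    print(doc_id, round(similarity, 3))
```

&lt;p&gt;Brute force like this scans every row, which is what the index exists to avoid. The ordering it produces is the one the SQL query approximates.&lt;/p&gt;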




&lt;h2&gt;
  
  
  Agent Frameworks Are the Same Problem, Bigger
&lt;/h2&gt;

&lt;p&gt;The vector database situation is a contained example. Agent frameworks are the same problem, scaled up.&lt;/p&gt;

&lt;p&gt;There are now a meaningful number of agent frameworks in active development: LangChain, LangGraph, AutoGen, CrewAI, Pydantic AI, and several more depending on when you are reading this. They differ in their abstractions for tool calling, memory management, multi-agent coordination, and state persistence. Some of them are good. Some of them are in the process of becoming good. All of them are new enough that you are, to some degree, a beta tester.&lt;/p&gt;

&lt;p&gt;The alternative is to not use a framework for the parts that don't require one. The model's tool-calling API is not complicated. You define tools as JSON schemas. The model returns a function call. You route it and return the result. That is the loop. You can implement the core of it in a hundred lines of Python that &lt;em&gt;you wrote&lt;/em&gt;, that you understand completely, that has no transitive dependencies you didn't choose.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_handlers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-20250514&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# No tool use → we're done
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stop_reason&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_turn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Process tool calls
&lt;/span&gt;        &lt;span class="n"&gt;tool_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
            &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_handlers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No handler for tool: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;tool_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# Extend conversation with model turn and tool results
&lt;/span&gt;        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_results&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is a complete agentic loop. No framework. No magic. Every line of it is readable by someone who has never seen it before. When it breaks, you know where to look. When you need to add a checkpoint before a destructive operation, you know exactly where to put it. When a framework update ships a breaking change to how tool results are structured, you are unaffected because you wrote the tool result handling yourself.&lt;/p&gt;
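
&lt;p&gt;The checkpoint mentioned above might look something like the following sketch. It is not part of any SDK: &lt;code&gt;dispatch_tool&lt;/code&gt;, &lt;code&gt;DESTRUCTIVE_TOOLS&lt;/code&gt;, and the &lt;code&gt;approve&lt;/code&gt; callback are hypothetical names, and the function is assumed to stand in for the direct handler call inside the loop. It also returns failures as data, so the model can see them, rather than raising.&lt;/p&gt;

```python
from typing import Any, Callable

# Hypothetical set of tool names that should never run without approval.
DESTRUCTIVE_TOOLS = {"delete_records"}


def dispatch_tool(
    name: str,
    tool_input: dict,
    tool_handlers: dict[str, Callable[..., Any]],
    approve: Callable[[str, dict], bool],
) -> dict:
    """Run one tool call, pausing for approval before destructive tools."""
    handler = tool_handlers.get(name)
    if handler is None:
        # Report the problem to the model instead of crashing the loop.
        return {"ok": False, "error": f"unknown tool: {name}"}
    if name in DESTRUCTIVE_TOOLS and not approve(name, tool_input):
        return {"ok": False, "error": "rejected at checkpoint"}
    try:
        return {"ok": True, "result": handler(**tool_input)}
    except Exception as exc:  # surface handler failures as data
        return {"ok": False, "error": str(exc)}


# Usage with stub handlers and an approval callback that always refuses.
handlers = {
    "lookup": lambda key: {"key": key, "value": 42},
    "delete_records": lambda table: f"dropped {table}",
}
deny_all = lambda name, tool_input: False

print(dispatch_tool("lookup", {"key": "a"}, handlers, deny_all))
print(dispatch_tool("delete_records", {"table": "users"}, handlers, deny_all))
```

&lt;p&gt;In a real system the &lt;code&gt;approve&lt;/code&gt; callback would probably block on a human or a policy check. The point is only that the gate lives in one function you own.&lt;/p&gt;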

&lt;p&gt;Frameworks earn their keep when they solve problems you genuinely have: complex multi-agent coordination, built-in state persistence, graph-based execution flows where you need cycle detection and conditional edges. If you have those problems, use a framework. But reach for your own loop first, and upgrade to a framework when you have a reason, not because the README has a compelling architecture diagram.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Counterargument Is Real
&lt;/h2&gt;

&lt;p&gt;I want to be honest about the case against this position, because it is not trivial.&lt;/p&gt;

&lt;p&gt;Boring technology is not always available in the form you need. pgvector has a performance ceiling. If you are running similarity search across a hundred million vectors with sub-10ms latency requirements, you likely need a dedicated ANN engine, and the purpose-built databases are probably worth their operational cost. If your agent coordination is genuinely complex (multiple agents with heterogeneous capabilities, conditional routing based on intermediate state, nested tool calls), a framework that has already solved those problems is better than reinventing the solutions yourself.&lt;/p&gt;

&lt;p&gt;The real trap is not "using new technology." It is using new technology as the &lt;em&gt;default&lt;/em&gt; rather than as the &lt;em&gt;exception&lt;/em&gt;. When you reach for Pinecone before asking whether pgvector handles your actual query volume, you have made a choice you probably did not mean to make. The question is whether you made it consciously.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Changes When AI Is Involved
&lt;/h2&gt;

&lt;p&gt;The argument for boring technology is not new. What AI changes is the &lt;em&gt;urgency&lt;/em&gt; of it, for a specific reason: the model is already the source of novel, hard-to-predict behavior in your system. The model hallucinates. The model handles edge cases in ways you did not anticipate. The model's output quality varies with context length, with phrasing, with temperature settings you forgot you changed. The model is a continuous source of surprises, and managing those surprises is the actual engineering work.&lt;/p&gt;

&lt;p&gt;When the model is already the unpredictable component, adding unpredictable infrastructure around it compounds the risk. A flaky external API call in your tool chain, plus a model that sometimes decides to call that tool three times in a row, plus a vector database that occasionally returns inconsistent results under concurrent load, is not three small problems. It is three small problems that interact in ways you cannot enumerate in advance.&lt;/p&gt;

&lt;p&gt;Boring infrastructure shrinks the problem space. When the retrieval layer is Postgres and the queue is Redis and the API is plain HTTP, the list of things that can behave unexpectedly in hard-to-reproduce ways is shorter. You are not eliminating surprises — the model will still surprise you — but you are constraining where they can come from.&lt;/p&gt;

&lt;p&gt;The system that is easiest to debug is not the one with the fewest components. It is the one where the largest number of components have predictable, documented behavior. Build toward that, and let the model be the interesting part.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Heuristic I Actually Use
&lt;/h2&gt;

&lt;p&gt;When evaluating a new technology for an AI product, I ask three questions before I let it into the stack:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. What happens when this fails?&lt;/strong&gt; Not "can it fail" — everything can fail. What does failure look like? Is it a clean error or silent corruption? Is it recoverable without data loss? Is there a runbook for it, or will I be writing one?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Can I replace it in a weekend?&lt;/strong&gt; This is not about whether I will replace it. It is about whether the abstraction is thin enough that swapping the implementation does not require a rewrite. If replacing the vector store requires touching thirty files, the abstraction is wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Does a boring alternative exist, and have I ruled it out?&lt;/strong&gt; Postgres, Redis, S3, plain HTTP. If one of these handles the problem, I need a specific reason not to use it, not just a feeling that the new thing is more purpose-built.&lt;/p&gt;

&lt;p&gt;If a technology passes all three, it can earn its place in the stack. If it fails any of them, the burden of proof rises, and if it fails all three, that burden is very high.&lt;/p&gt;
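
&lt;p&gt;To make the second question concrete, here is a minimal sketch of what a thin abstraction could look like. The &lt;code&gt;VectorStore&lt;/code&gt; protocol, &lt;code&gt;InMemoryStore&lt;/code&gt;, and &lt;code&gt;retrieve&lt;/code&gt; are illustrative names, and the brute-force cosine search is only a stand-in for pgvector or a hosted store, not how either one works internally.&lt;/p&gt;

```python
import math
from typing import Protocol


class VectorStore(Protocol):
    """The only surface the application is allowed to depend on."""

    def upsert(self, doc_id: str, embedding: list[float]) -> None: ...

    def search(self, query: list[float], k: int) -> list[str]: ...


class InMemoryStore:
    """Brute-force cosine search; a stand-in for pgvector or a hosted store."""

    def __init__(self) -> None:
        self._rows: dict[str, list[float]] = {}

    def upsert(self, doc_id: str, embedding: list[float]) -> None:
        self._rows[doc_id] = embedding

    def search(self, query: list[float], k: int) -> list[str]:
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.hypot(*a) * math.hypot(*b)
            return dot / norm if norm else 0.0

        ranked = sorted(
            self._rows,
            key=lambda doc_id: cosine(query, self._rows[doc_id]),
            reverse=True,
        )
        return ranked[:k]


def retrieve(store: VectorStore, query: list[float], k: int = 3) -> list[str]:
    # Application code sees only the protocol, never a vendor client.
    return store.search(query, k)


store = InMemoryStore()
store.upsert("a", [1.0, 0.0])
store.upsert("b", [0.0, 1.0])
store.upsert("c", [0.9, 0.1])
print(retrieve(store, [1.0, 0.0], k=2))  # ['a', 'c']
```

&lt;p&gt;If every caller goes through something like &lt;code&gt;retrieve&lt;/code&gt;, swapping the backing store means writing one new class with two methods, which is roughly the weekend-sized change the question asks about.&lt;/p&gt;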




&lt;p&gt;The teams that ship boring AI products are not the teams that lack ambition. They are the teams that understand where the ambition should go. The model is where the novel bets live. The model is where you spend the engineering attention on failure modes you have never seen before, on evaluation strategies that do not exist in textbooks yet, on product decisions that require genuine taste about AI behavior. That is the hard, interesting work.&lt;/p&gt;

&lt;p&gt;Letting the infrastructure be interesting too is not ambitious. It is just expensive.&lt;/p&gt;

&lt;p&gt;Make the retrieval layer boring. Make the queue boring. Make the API boring. Let Postgres handle the things Postgres is good at, which turn out to be most things. And spend the attention you just freed up on the part of the system that actually requires it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>tooling</category>
      <category>devops</category>
    </item>
    <item>
      <title>Your Observability Is Looking at the Wrong Things</title>
      <dc:creator>Benard Otieno</dc:creator>
      <pubDate>Thu, 14 May 2026 16:01:35 +0000</pubDate>
      <link>https://dev.to/benard_otieno_cdb9e6d4907/your-observability-is-looking-at-the-wrong-things-4klo</link>
      <guid>https://dev.to/benard_otieno_cdb9e6d4907/your-observability-is-looking-at-the-wrong-things-4klo</guid>
      <description>&lt;p&gt;I've been in incident calls where every dashboard was green. Latency nominal. Error rate under 0.1%. CPU humming along at a comfortable 40%. And somewhere downstream, a critical workflow had been silently producing wrong results for six hours.&lt;/p&gt;

&lt;p&gt;Nobody had an alert for "the thing is doing something, just not the right thing."&lt;/p&gt;

&lt;p&gt;This is the gap most observability setups never close: they're watching the infrastructure, not the behavior. They'll tell you the system is alive. They won't tell you it's lying.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Dials Everyone Watches
&lt;/h2&gt;

&lt;p&gt;The default observability stack for most teams converges on the same three signals: uptime, latency, and error rate. These show up in every runbook, every SLA, every on-call rotation. They're not useless — a spike in error rate is real signal, a latency cliff is real signal — but they share a critical property: they're all lagging indicators of failure that's already happened.&lt;/p&gt;

&lt;p&gt;More importantly, they only fire when the system is explicitly misbehaving. They say nothing about a system that's doing exactly what you told it to do, but where what you told it to do was wrong.&lt;/p&gt;

&lt;p&gt;I had a recommendation service that returned results within 50ms, with a 0.02% error rate, and near-perfect uptime. It was also returning the same stale set of recommendations to every user because a cache invalidation job had silently stopped running four days earlier. The system was technically flawless. It had completely stopped serving its purpose.&lt;/p&gt;

&lt;p&gt;The dashboard gave it a clean bill of health.&lt;/p&gt;
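
&lt;p&gt;A behavioral check for that failure does not need to be elaborate. The sketch below assumes two things the story does not spell out: that the invalidation job records a last-success timestamp somewhere, and that you can sample recent results per user. The function names and thresholds are hypothetical.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone


def check_freshness(last_success: datetime, max_age: timedelta, now: datetime) -> bool:
    """True when the job has succeeded recently enough."""
    return max_age >= now - last_success


def check_diversity(results_by_user: dict[str, list[str]], min_distinct_ratio: float) -> bool:
    """True when users are not all receiving the identical result list."""
    if not results_by_user:
        return False  # no sampled traffic is itself worth a look
    distinct = {tuple(items) for items in results_by_user.values()}
    return len(distinct) / len(results_by_user) >= min_distinct_ratio


# The scenario from the story: job dead for four days, everyone sees the same list.
now = datetime(2026, 5, 14, 12, 0, tzinfo=timezone.utc)
last_run = now - timedelta(days=4)
sample = {"u1": ["x", "y"], "u2": ["x", "y"], "u3": ["x", "y"], "u4": ["x", "y"]}

print(check_freshness(last_run, timedelta(hours=1), now))  # False
print(check_diversity(sample, min_distinct_ratio=0.5))     # False
```

&lt;p&gt;Neither check looks at latency or error rate. Both would have failed while every infrastructure dial stayed green.&lt;/p&gt;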

&lt;h2&gt;
  
  
  Logs Are Not a Narrative
&lt;/h2&gt;

&lt;p&gt;The second failure mode is subtler. Most teams log well, in the sense that they log a lot. Request in. Response out. Exceptions caught and written somewhere. Database queries above a threshold. Auth events.&lt;/p&gt;

&lt;p&gt;What they don't have is a narrative — a way to reconstruct what actually happened during a user's session, a job's execution, a transaction's lifecycle. Individual log lines are breadcrumbs. What you need is the trail.&lt;/p&gt;

&lt;p&gt;The difference shows up immediately when something goes wrong. With breadcrumbs, you spend the first hour of an incident correlating timestamps across three different log streams, mentally assembling a sequence of events that should have been assembled for you. With a trail — structured traces with a shared correlation ID flowing through every service that touched a request — you open one query and see the story.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;contextvars&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ContextVar&lt;/span&gt;

&lt;span class="n"&gt;correlation_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ContextVar&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ContextVar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;correlation_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;traced&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nd"&gt;@functools.wraps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;cid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;correlation_id&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;extra&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;correlation_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cid&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;extra&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;correlation_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;extra&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;correlation_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;

&lt;span class="c1"&gt;# At the edge — set once, propagate everywhere
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;correlation_id&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Correlation-ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not complicated. It's not expensive. The reason most teams don't have it is that they added logging incrementally — one print statement at a time — and never stepped back to ask whether the sum of those statements could tell a story.&lt;/p&gt;
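
&lt;p&gt;One practical caveat: fields passed through &lt;code&gt;extra=&lt;/code&gt; are attached to the log record, but the standard library's default formatter does not print them. A small formatter along these lines is one way to get them into the output. Treat it as a sketch; &lt;code&gt;JsonFormatter&lt;/code&gt;, the &lt;code&gt;traced_demo&lt;/code&gt; logger name, and the &lt;code&gt;charge_card&lt;/code&gt; value are made up for illustration.&lt;/p&gt;

```python
import io
import json
import logging

# Attribute names present on every LogRecord; anything else came from extra=.
_STANDARD = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per record, including fields passed via extra=."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "event": record.getMessage()}
        payload.update(
            {k: v for k, v in vars(record).items() if k not in _STANDARD}
        )
        return json.dumps(payload, default=str)


# Write to an in-memory stream so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("traced_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

logger.info("enter", extra={"fn": "charge_card", "correlation_id": "abc-123"})
print(stream.getvalue().strip())
```

&lt;p&gt;With one JSON object per line, filtering on &lt;code&gt;correlation_id&lt;/code&gt; in whatever log store you already run is enough to pull a single request's trail.&lt;/p&gt;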

&lt;h2&gt;
  
  
  Metrics Without a Baseline Are Just Numbers
&lt;/h2&gt;

&lt;p&gt;Here's a metric: your API is returning responses in 340ms.&lt;/p&gt;

&lt;p&gt;Is that good? Bad? Degraded from yesterday? Normal for this time of week? You cannot answer without a baseline, and most teams don't have one that's precise enough to be useful.&lt;/p&gt;

&lt;p&gt;What typically exists is a static threshold: alert if latency exceeds 500ms. That threshold was set during initial deployment, when load was a tenth of what it is now, and hasn't been revisited since. It's not a baseline — it's a guess that calcified into a rule.&lt;/p&gt;

&lt;p&gt;A real baseline is dynamic. It accounts for time of day, day of week, and recent trend. It flags when you're 30% above your own normal, not when you cross an arbitrary line someone set two years ago.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;deque&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;statistics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stdev&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AdaptiveBaseline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1440&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;  &lt;span class="c1"&gt;# 24h of per-minute samples
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;samples&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;deque&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maxlen&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;window_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;samples&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_anomalous&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threshold_stdev&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;2.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;samples&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;  &lt;span class="c1"&gt;# not enough data to have an opinion
&lt;/span&gt;        &lt;span class="n"&gt;recent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;samples&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;stdev&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold_stdev&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
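        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;samples&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# stdev() needs at least two samples&lt;/span&gt;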
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;samples&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mean&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stdev&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;stdev&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Static thresholds are a lazy stand-in for understanding what normal looks like for your system. They exist because setting them takes five minutes, and building real baselines takes an afternoon. That tradeoff looks different at 2am when an alert fires on a load pattern that has been normal for three weeks.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Belongs in Your Dashboards
&lt;/h2&gt;

&lt;p&gt;The signals that matter fall into a different category than infrastructure health. They're about whether the system is doing its job, measured in terms the business cares about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Throughput on the critical path.&lt;/strong&gt; Not "requests per second" in aggregate — the specific count of the transactions that matter. Orders placed. Reports generated. Messages delivered. If that number is lower than expected, something is wrong, even if all your infra metrics are green.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Queue depth and processing age.&lt;/strong&gt; If you have async workers, the age of the oldest unprocessed item is a more honest health signal than worker CPU. A queue that's growing is a system falling behind, regardless of what the workers themselves are reporting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Business-level error rates, not HTTP error rates.&lt;/strong&gt; A 200 response that returns an empty result set when results were expected is not a success. A job that completes without an exception but produces zero output has failed. You need to define success in terms of what the system was supposed to produce, then measure whether it produced it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Derived metrics.&lt;/strong&gt; If your checkout conversion rate drops from 68% to 51%, that's a signal, even if no individual service is throwing errors. Tracking rates and ratios, not just raw counts, catches the class of failures where something is working but working worse.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Prometheus recording rules — compute these, don't query them live&lt;/span&gt;
&lt;span class="na"&gt;groups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;business_health&lt;/span&gt;
    &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;60s&lt;/span&gt;
    &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;record&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;job:orders_per_minute:rate&lt;/span&gt;
        &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sum(rate(orders_completed_total[5m])) * &lt;/span&gt;&lt;span class="m"&gt;60&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;record&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;job:checkout_conversion:ratio&lt;/span&gt;
        &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;sum(rate(checkouts_completed_total[10m]))&lt;/span&gt;
          &lt;span class="s"&gt;/ sum(rate(checkout_initiated_total[10m]))&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;record&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;job:queue_age_seconds:max&lt;/span&gt;
        &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;time() - min(job_enqueued_timestamp_seconds)&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;alerts&lt;/span&gt;
    &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConversionRateDrop&lt;/span&gt;
        &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;job:checkout_conversion:ratio &amp;lt; &lt;/span&gt;&lt;span class="m"&gt;0.55&lt;/span&gt;
        &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;
        &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical&lt;/span&gt;
        &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Checkout&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;conversion&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;below&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;55%&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;5+&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;minutes"&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;QueueProcessingStalled&lt;/span&gt;
        &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;job:queue_age_seconds:max &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;300&lt;/span&gt;
        &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2m&lt;/span&gt;
        &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
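
&lt;p&gt;The business-level error rate described above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the &lt;code&gt;JobResult&lt;/code&gt; shape, its field names, and the "minimum expected rows" rule are all assumptions made for the example. The point is only that success is defined by output, not by the absence of an exception.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class JobResult:
    """Outcome of one unit of work (hypothetical shape, assumed for this sketch)."""
    raised: bool            # did the job throw an exception?
    rows_written: int       # how much output it actually produced
    rows_expected_min: int  # smallest output that still counts as doing its job


def is_business_success(result: JobResult) -> bool:
    """A job succeeds only if it ran cleanly AND produced the expected output."""
    if result.raised:
        return False
    return result.rows_written >= result.rows_expected_min


def business_error_rate(results: list[JobResult]) -> float:
    """Fraction of jobs that failed by the business definition (0.0 if no jobs)."""
    if not results:
        return 0.0
    failures = sum(1 for r in results if not is_business_success(r))
    return failures / len(results)


results = [
    JobResult(raised=False, rows_written=120, rows_expected_min=1),  # real success
    JobResult(raised=False, rows_written=0, rows_expected_min=1),    # "green" but empty
    JobResult(raised=True, rows_written=0, rows_expected_min=1),     # ordinary failure
    JobResult(raised=False, rows_written=45, rows_expected_min=1),   # real success
]

print(business_error_rate(results))  # 0.5
```

&lt;p&gt;An exception-only error rate would report 25% for the same four jobs. The empty-but-successful run is exactly the failure that infrastructure metrics miss.&lt;/p&gt;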



&lt;h2&gt;
  
  
  Alerts Should Be Harder to Silence Than to Fix
&lt;/h2&gt;

&lt;p&gt;The last thing most teams get wrong is the incentive structure around noise. When alerts fire too often on non-issues, engineers start ignoring them or, worse, routing around them. The usual response is to raise thresholds and add retry logic so the alert doesn't fire. This is treating the symptom. The alert was lying because the metric was wrong, and the right fix is to measure something that's actually meaningful.&lt;/p&gt;

&lt;p&gt;There's a useful rule here: if an alert fires and the on-call engineer's first instinct is to check whether it's a false positive, the alert is already broken. A good alert should produce a specific, directed response, not a "let me see if this is real" investigation. If you find yourself constantly confirming that real alerts are real, your signal-to-noise ratio is too low to trust.&lt;/p&gt;

&lt;p&gt;Flaky alerts are the observability equivalent of flaky tests. You know you have them. You've learned to distrust them. And every week they stay in the rotation makes you slightly less responsive to the ones that actually matter.&lt;/p&gt;

&lt;p&gt;Track your alert false-positive rate like you track your error rate. Alert on your alerts. Set a rule that any alert firing more than twice without a corresponding incident review gets flagged for audit. This sounds bureaucratic until the first time you catch that a critical alert has been misfiring for three weeks and nobody noticed because the team had learned to dismiss it.&lt;/p&gt;
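
&lt;p&gt;As a rough sketch of that audit rule, assume each alert firing is exported as a record that may or may not be linked to an incident review. The &lt;code&gt;AlertFiring&lt;/code&gt; shape, the field names, and the incident ID format are hypothetical, and treating "no linked review" as a false positive is a proxy rather than a precise definition. The rule is read here as "more than two firings with no review attached."&lt;/p&gt;

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertFiring:
    """One firing of an alert (hypothetical record shape, assumed for this sketch)."""
    alert_name: str
    incident_id: Optional[str]  # None means no incident review was linked


def false_positive_rate(firings: list[AlertFiring]) -> float:
    """Share of firings with no linked incident review (0.0 if there were none)."""
    if not firings:
        return 0.0
    unreviewed = sum(1 for f in firings if f.incident_id is None)
    return unreviewed / len(firings)


def alerts_needing_audit(firings: list[AlertFiring], max_unreviewed: int = 2) -> list[str]:
    """Names of alerts that fired more than max_unreviewed times without a review."""
    unreviewed = Counter(f.alert_name for f in firings if f.incident_id is None)
    return sorted(name for name, count in unreviewed.items() if count > max_unreviewed)


firings = [
    AlertFiring("HighCPU", None),
    AlertFiring("HighCPU", None),
    AlertFiring("HighCPU", None),
    AlertFiring("ConversionRateDrop", "INC-1"),
    AlertFiring("QueueProcessingStalled", None),
]

print(alerts_needing_audit(firings))  # ['HighCPU']
print(false_positive_rate(firings))   # 0.8
```

&lt;p&gt;Running something like this weekly against your alert history turns "we all know that one is noisy" into a list someone has to act on.&lt;/p&gt;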

&lt;h2&gt;
  
  
  What You're Actually Missing
&lt;/h2&gt;

&lt;p&gt;Most observability stacks are built to answer one question: is the system up? That's a fine question. It's just not the most important one.&lt;/p&gt;

&lt;p&gt;The more useful questions are: is the system doing what users need? Is it doing it as well as it was yesterday? Is anything changing that I should know about before it becomes a problem?&lt;/p&gt;

&lt;p&gt;Those questions require measuring at the level of behavior and outcome, not infrastructure and response codes. They require traces that tell a story instead of logs that record events. They require baselines instead of thresholds, and business metrics instead of system metrics.&lt;/p&gt;

&lt;p&gt;None of this is exotic. The tooling exists — OpenTelemetry, Prometheus recording rules, structured logging with correlation IDs. The gap isn't tooling. It's the habit of reaching for the infrastructure dashboard first and calling it observability.&lt;/p&gt;

&lt;p&gt;Start with one question: if your system silently started doing the wrong thing at 3am, how long would it take you to find out? If the answer is "until a user complained," your dashboards are watching the machine, not the work.&lt;/p&gt;

&lt;p&gt;That's the thing worth fixing.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>cloud</category>
      <category>kubernetes</category>
      <category>tooling</category>
    </item>
    <item>
      <title>blog.bennerdo.org</title>
      <dc:creator>Benard Otieno</dc:creator>
      <pubDate>Fri, 08 May 2026 09:09:21 +0000</pubDate>
      <link>https://dev.to/benard_otieno_cdb9e6d4907/blogbennerdoorg-126b</link>
      <guid>https://dev.to/benard_otieno_cdb9e6d4907/blogbennerdoorg-126b</guid>
      <description></description>
    </item>
  </channel>
</rss>
