<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ovaise Qayoom</title>
    <description>The latest articles on DEV Community by Ovaise Qayoom (@ovaiseq).</description>
    <link>https://dev.to/ovaiseq</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3847171%2Ff72d805e-a2d8-4bc0-8fc4-736dbfb330ed.png</url>
      <title>DEV Community: Ovaise Qayoom</title>
      <link>https://dev.to/ovaiseq</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ovaiseq"/>
    <language>en</language>
    <item>
      <title>From 0 to 10M Requests/Day, Architecting a Boring but Bulletproof Backend</title>
      <dc:creator>Ovaise Qayoom</dc:creator>
      <pubDate>Thu, 02 Apr 2026 16:01:35 +0000</pubDate>
      <link>https://dev.to/ovaiseq/from-0-to-10m-requestsday-architecting-a-boring-but-bulletproof-backend-poh</link>
      <guid>https://dev.to/ovaiseq/from-0-to-10m-requestsday-architecting-a-boring-but-bulletproof-backend-poh</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — You do not need Kafka, Kubernetes, or 17 microservices to handle serious traffic. A well-structured monolith, PostgreSQL, Redis, PgBouncer, a CDN, and horizontal application scaling can take you to 10 million requests per day with less risk, less cost, and far less operational pain.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Most backend scaling guides start at the wrong point.&lt;br&gt;
They show you the architecture &lt;em&gt;after&lt;/em&gt; a company has 200 engineers, multiple platform teams, and years of accumulated infrastructure. Then they present that architecture as if it were the natural starting point.&lt;/p&gt;

&lt;p&gt;It was not.&lt;/p&gt;

&lt;p&gt;Most high-traffic systems begin the same way: one application, one database, one deployment path, one team trying to ship product fast enough to matter. The systems that survive are not the ones that adopt the most technology. They are the ones that adopt the right technology at the right time.&lt;/p&gt;

&lt;p&gt;This is a guide to that path.&lt;br&gt;
Not the hype version.&lt;br&gt;
Not the conference-talk version.&lt;br&gt;
The production version.&lt;/p&gt;


&lt;h2&gt;
  
  
  What This Guide Covers
&lt;/h2&gt;

&lt;p&gt;This post is an opinionated case study plus guide for building a &lt;strong&gt;high-traffic backend&lt;/strong&gt; using a &lt;strong&gt;scalable monolith&lt;/strong&gt; and a small number of carefully chosen supporting components.&lt;/p&gt;

&lt;p&gt;It is optimized for teams that care about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shipping fast without building infrastructure theater&lt;/li&gt;
&lt;li&gt;Scaling predictably under real production traffic&lt;/li&gt;
&lt;li&gt;Keeping debugging and deployment simple&lt;/li&gt;
&lt;li&gt;Avoiding premature microservices&lt;/li&gt;
&lt;li&gt;Reaching 10M requests/day without rewriting the whole backend&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The recurring themes: a &lt;strong&gt;scalable monolith&lt;/strong&gt;, pragmatic architecture, predictable production scaling, and deliberately &lt;strong&gt;boring technology&lt;/strong&gt;: Node.js, PostgreSQL, Redis, and PgBouncer.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Boring Technology Wins
&lt;/h2&gt;

&lt;p&gt;Boring technology is not boring because it lacks power.&lt;/p&gt;

&lt;p&gt;It is boring because it is &lt;strong&gt;understood&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That matters more than people admit.&lt;/p&gt;

&lt;p&gt;A system that uses PostgreSQL, Redis, Nginx, and a monolith is easier to reason about, easier to observe, easier to debug at 2am, and easier to hire for than a system built from six trendy abstractions nobody fully understands.&lt;/p&gt;

&lt;p&gt;The highest-leverage backend architecture principle is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Prefer the simplest system that survives your current load with margin.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That single rule eliminates most bad architectural decisions.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Traffic Stages
&lt;/h2&gt;

&lt;p&gt;Every backend goes through roughly four traffic stages. The architecture that is reasonable at one stage is often unnecessary or actively harmful at another.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmq32t4xgdgd63nsli98l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmq32t4xgdgd63nsli98l.png" alt="Diagram 1" width="800" height="185"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The mistake most teams make is solving Stage 4 problems while still in Stage 1.&lt;/p&gt;

&lt;p&gt;That is how you end up with architecture that looks impressive and ships slowly.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Boring Stack
&lt;/h2&gt;

&lt;p&gt;Here is the baseline stack for a &lt;strong&gt;pragmatic high-traffic backend&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Choice&lt;/th&gt;
&lt;th&gt;Why it stays&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Runtime&lt;/td&gt;
&lt;td&gt;Node.js + Fastify&lt;/td&gt;
&lt;td&gt;Fast enough, mature, simple developer experience&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Primary database&lt;/td&gt;
&lt;td&gt;PostgreSQL&lt;/td&gt;
&lt;td&gt;Transactional, reliable, great indexing, JSON support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache&lt;/td&gt;
&lt;td&gt;Redis&lt;/td&gt;
&lt;td&gt;Perfect for read-heavy data, sessions, rate limits, queues&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queue&lt;/td&gt;
&lt;td&gt;BullMQ&lt;/td&gt;
&lt;td&gt;Uses Redis, avoids introducing another broker too early&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Connection pooler&lt;/td&gt;
&lt;td&gt;PgBouncer&lt;/td&gt;
&lt;td&gt;Protects Postgres from connection explosion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reverse proxy&lt;/td&gt;
&lt;td&gt;Nginx&lt;/td&gt;
&lt;td&gt;Stable, battle-tested, high-performance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CDN&lt;/td&gt;
&lt;td&gt;Cloudflare&lt;/td&gt;
&lt;td&gt;Offloads traffic before it reaches origin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monitoring&lt;/td&gt;
&lt;td&gt;Prometheus + Grafana&lt;/td&gt;
&lt;td&gt;Standard, simple, effective&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logs&lt;/td&gt;
&lt;td&gt;Structured JSON logs&lt;/td&gt;
&lt;td&gt;Searchable, aggregatable, production-friendly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;What is intentionally &lt;em&gt;not&lt;/em&gt; here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kafka by default&lt;/li&gt;
&lt;li&gt;Microservices by default&lt;/li&gt;
&lt;li&gt;Kubernetes by default&lt;/li&gt;
&lt;li&gt;Event-driven everything&lt;/li&gt;
&lt;li&gt;Service mesh&lt;/li&gt;
&lt;li&gt;Distributed transactions&lt;/li&gt;
&lt;li&gt;Premature CQRS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of those are inherently bad. They are just not your starting point.&lt;/p&gt;


&lt;h2&gt;
  
  
  Stage 1: Build a Monolith That Is Easy to Scale
&lt;/h2&gt;

&lt;p&gt;At low traffic, the correct architecture is usually one codebase, one app process group, one database, one deployment pipeline.&lt;/p&gt;

&lt;p&gt;That is not a temporary embarrassment. That is good architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0pcs7jxi0tmkdebnfyq3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0pcs7jxi0tmkdebnfyq3.png" alt="Diagram" width="800" height="108"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Why a Monolith Is the Right Default
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;scalable monolith&lt;/strong&gt; gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One deployable unit&lt;/li&gt;
&lt;li&gt;One place to debug business logic&lt;/li&gt;
&lt;li&gt;One database transaction boundary&lt;/li&gt;
&lt;li&gt;No network hops between internal features&lt;/li&gt;
&lt;li&gt;No distributed systems complexity&lt;/li&gt;
&lt;li&gt;No cross-service schema drift&lt;/li&gt;
&lt;li&gt;No internal API versioning burden&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this stage, your job is not to create elegant infrastructure. Your job is to create a stable product with clean boundaries inside a single codebase.&lt;/p&gt;

&lt;p&gt;That means modularizing &lt;em&gt;inside&lt;/em&gt; the monolith.&lt;/p&gt;
&lt;h3&gt;
  
  
  Monolith Structure That Ages Well
&lt;/h3&gt;

&lt;p&gt;A good monolith is not a folder full of chaos. It has domain boundaries even though it deploys as one service.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;src/
  modules/
    auth/
      auth.routes.js
      auth.service.js
      auth.repo.js
    billing/
      billing.routes.js
      billing.service.js
      billing.repo.js
    users/
      users.routes.js
      users.service.js
      users.repo.js
  lib/
    db.js
    cache.js
    queue.js
    logger.js
  app.js
  server.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you the operational simplicity of a monolith and the code organization of a more mature system.&lt;/p&gt;
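<p>To make that structure concrete, here is a minimal, self-contained sketch of the wiring (file names and helper names are illustrative, and an in-memory <code>Map</code> stands in for the real database):</p>

```javascript
// users/users.repo.js — data access only
const makeUsersRepo = (db) => ({
  findById: async (id) => db.get(id) ?? null,
});

// users/users.service.js — business logic only; knows nothing about storage
const makeUsersService = (repo) => ({
  async profile(id) {
    const user = await repo.findById(id);
    if (!user) return null;
    const { passwordHash, ...publicFields } = user; // never leak internals
    return publicFields;
  },
});

// app.js — the composition root wires modules together in one place
const db = new Map([['u1', { id: 'u1', email: 'a@b.co', passwordHash: 'x' }]]);
const users = makeUsersService(makeUsersRepo(db));
```

<p>Because composition happens in one place, extracting a module into its own service later is mechanical rather than painful.</p>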

&lt;h3&gt;
  
  
  Start with Pooling Immediately
&lt;/h3&gt;

&lt;p&gt;Even at low traffic, do not connect to PostgreSQL casually from every request path. Use pooling from the start.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;import Fastify from 'fastify';
import postgres from '@fastify/postgres';

const app = Fastify({ logger: true });

// @fastify/postgres forwards these options straight to pg's Pool;
// they are top-level options, not nested under a `pool` key.
app.register(postgres, {
  connectionString: process.env.DATABASE_URL,
  max: 10,                       // cap physical connections per process
  idleTimeoutMillis: 30000,      // recycle idle connections
  connectionTimeoutMillis: 2000, // fail fast if the pool is exhausted
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The point is not that you need massive pooling on day one.&lt;/p&gt;

&lt;p&gt;The point is that you build the habit before traffic arrives.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 1 Database Design: Decisions That Matter Later
&lt;/h2&gt;

&lt;p&gt;Most scaling pain is not caused by traffic alone. It is caused by data model decisions that looked harmless when traffic was small.&lt;/p&gt;

&lt;h3&gt;
  
  
  Index Foreign Keys Immediately
&lt;/h3&gt;

&lt;p&gt;This is one of the most common missing pieces in production systems.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;CASCADE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;slug&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'draft'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_posts_user_id&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_posts_created_at&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_posts_published&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'published'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That partial index on &lt;code&gt;status = 'published'&lt;/code&gt; matters because it reflects the access pattern you are likely to have in production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Design for Query Shapes, Not Just Entities
&lt;/h3&gt;

&lt;p&gt;A schema is not just a representation of objects. It is a representation of future queries.&lt;/p&gt;

&lt;p&gt;Ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What will be filtered often?&lt;/li&gt;
&lt;li&gt;What will be sorted often?&lt;/li&gt;
&lt;li&gt;What joins will appear on critical paths?&lt;/li&gt;
&lt;li&gt;What should be cached?&lt;/li&gt;
&lt;li&gt;What can remain eventually consistent?&lt;/li&gt;
&lt;li&gt;What must stay transactional?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That level of thinking matters more than fashionable architecture diagrams.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 2: Remove the Easy Bottlenecks
&lt;/h2&gt;

&lt;p&gt;Once traffic starts becoming meaningful, most problems are still boring.&lt;/p&gt;

&lt;p&gt;They usually fall into one of these buckets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repeated identical reads hitting the database&lt;/li&gt;
&lt;li&gt;Too many application connections reaching Postgres&lt;/li&gt;
&lt;li&gt;Missing indexes&lt;/li&gt;
&lt;li&gt;Static assets reaching the app server unnecessarily&lt;/li&gt;
&lt;li&gt;Expensive synchronous operations done inside requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where good &lt;strong&gt;production scaling&lt;/strong&gt; begins.&lt;/p&gt;
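<p>The last bucket, expensive synchronous work inside requests, has a simple cure: enqueue and respond. Here is a runnable sketch of the shape; the queue below is an in-memory stand-in with a BullMQ-like <code>add()</code> signature (names are illustrative), where production would enqueue to BullMQ backed by Redis and process jobs in a separate worker.</p>

```javascript
// In-memory queue stand-in exposing a BullMQ-like add(name, data) shape.
const makeQueue = () => {
  const jobs = [];
  return {
    async add(name, data) { jobs.push({ name, data }); return jobs.length; },
    drain: () => jobs.splice(0), // a worker would consume these
  };
};

const emailQueue = makeQueue();

// Request handler: enqueue the slow work and respond immediately,
// instead of blocking the request on an SMTP round trip.
async function signupHandler(body) {
  // ...create the user row transactionally...
  await emailQueue.add('welcome-email', { to: body.email });
  return { ok: true }; // the request never waits on email delivery
}
```

<p>The request path now does only the transactional minimum; everything retryable moves behind the queue.</p>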




&lt;h2&gt;
  
  
  Add Redis the Right Way
&lt;/h2&gt;

&lt;p&gt;Redis should enter the architecture as a targeted optimization, not as a random dependency.&lt;/p&gt;

&lt;p&gt;Three strong uses at this stage:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Caching&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rate limiting&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Session storage / temporary state&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
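<p>Rate limiting is the easiest of the three to show end to end. This is a fixed-window limiter in the shape you would build on Redis <code>INCR</code> plus <code>EXPIRE</code>; the store below is an in-memory stand-in implementing just those two commands so the sketch runs anywhere (with ioredis you would pass the client instead, and the helper names are illustrative).</p>

```javascript
// Fixed-window rate limiter: one counter key per client per time window.
const makeLimiter = (store, { limit = 100, windowSec = 60 } = {}) => ({
  async allow(clientId) {
    const window = Math.floor(Date.now() / 1000 / windowSec);
    const key = `rl:${clientId}:${window}`;
    const count = await store.incr(key);            // first INCR creates the key
    if (count === 1) await store.expire(key, windowSec); // TTL bounds memory
    return count <= limit;
  },
});

// In-memory stand-in for the two Redis commands the limiter needs.
const memoryStore = () => {
  const m = new Map();
  return {
    async incr(k) { const v = (m.get(k) ?? 0) + 1; m.set(k, v); return v; },
    async expire() {}, // no-op here; Redis would evict the key at the TTL
  };
};
```

<p>A fixed window allows brief bursts at window boundaries; that is usually an acceptable trade for two commands and zero moving parts.</p>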

&lt;h3&gt;
  
  
  Cache-Aside Pattern
&lt;/h3&gt;

&lt;p&gt;The most reliable caching pattern for application data is still cache-aside.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Cache&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ttl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;redis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ttl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ttl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;del&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;del&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;getOrFetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fetchFn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ttl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fresh&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetchFn&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fresh&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;fresh&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fresh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;fresh&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/posts/:slug&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`post:slug:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;post&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOrFetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SELECT * FROM posts WHERE slug = $1 AND status = $2&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;published&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="mi"&gt;600&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;post&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;404&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Not found&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;post&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cache What Is Expensive and Stable
&lt;/h3&gt;

&lt;p&gt;Good cache candidates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product details&lt;/li&gt;
&lt;li&gt;Blog posts&lt;/li&gt;
&lt;li&gt;User profile summaries&lt;/li&gt;
&lt;li&gt;Public pricing data&lt;/li&gt;
&lt;li&gt;Aggregated dashboard numbers&lt;/li&gt;
&lt;li&gt;Settings that rarely change&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bad cache candidates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Highly volatile counters without clear invalidation&lt;/li&gt;
&lt;li&gt;Permission-sensitive data unless keyed safely&lt;/li&gt;
&lt;li&gt;Write-heavy rows with constant mutation&lt;/li&gt;
&lt;li&gt;Anything you cannot invalidate confidently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Caching is not magic. It is a trade: memory and invalidation complexity in exchange for lower latency and lower database load.&lt;/p&gt;
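<p>The invalidation side of that trade deserves its own few lines. A runnable sketch of the write path under cache-aside, with an in-memory cache stand-in and illustrative names:</p>

```javascript
// In-memory cache stand-in with the get/set/del shape of the Cache class.
const makeCache = () => {
  const m = new Map();
  return {
    async get(k) { return m.get(k) ?? null; },
    async set(k, v) { m.set(k, v); },
    async del(k) { m.delete(k); },
  };
};

const cache = makeCache();

async function updatePost(db, post) {
  await db.save(post);                       // 1. write to the source of truth
  await cache.del(`post:slug:${post.slug}`); // 2. then drop the stale copy
  // Deliberately do NOT write the new value into the cache here; letting the
  // next read repopulate it avoids racing with concurrent writers.
}
```

<p>Delete-on-write keeps the invalidation rule trivial to reason about: the cache only ever holds what a read put there.</p>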




&lt;h2&gt;
  
  
  Protect PostgreSQL with PgBouncer
&lt;/h2&gt;

&lt;p&gt;At moderate traffic, PostgreSQL often stops being limited by raw query power and starts being limited by connection handling.&lt;/p&gt;

&lt;p&gt;That is what PgBouncer solves.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flxh3fw64mvwh87o54fu3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flxh3fw64mvwh87o54fu3.png" alt="Diagram-" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why PgBouncer Matters
&lt;/h3&gt;

&lt;p&gt;Your app may create many logical connections.&lt;/p&gt;

&lt;p&gt;Postgres should not have to maintain that many physical ones.&lt;/p&gt;

&lt;p&gt;PgBouncer lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Smooth connection spikes&lt;/li&gt;
&lt;li&gt;Keep Postgres stable&lt;/li&gt;
&lt;li&gt;Scale app instances without linearly scaling DB connections&lt;/li&gt;
&lt;li&gt;Reduce memory pressure at the database layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Critical config principle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;pool_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;transaction&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That one setting is often the difference between a useful PgBouncer deployment and a misleading one.&lt;/p&gt;
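
&lt;p&gt;For context, a minimal &lt;code&gt;pgbouncer.ini&lt;/code&gt; built around that principle might look like this (the values are illustrative starting points, not tuned recommendations):&lt;/p&gt;

```ini
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction   ; the setting that matters most
default_pool_size = 20    ; physical Postgres connections per database/user pair
max_client_conn = 1000    ; logical connections PgBouncer will accept
```

&lt;p&gt;One caveat: transaction pooling is incompatible with session-level state, such as session-scoped prepared statements or SET values expected to persist across transactions.&lt;/p&gt;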




&lt;h2&gt;
  
  
  Stage 3: Horizontal Scale Without Losing Simplicity
&lt;/h2&gt;

&lt;p&gt;Now the backend starts looking more serious.&lt;/p&gt;

&lt;p&gt;Not because you adopted microservices.&lt;br&gt;
Because you removed single points of failure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3irwv3tnh5l74msvsa3w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3irwv3tnh5l74msvsa3w.png" alt="Diagram 2" width="800" height="623"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is still a boring architecture.&lt;/p&gt;

&lt;p&gt;It is also enough for a &lt;strong&gt;high-traffic backend&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  Add Read Replicas When Read Load Dominates
&lt;/h2&gt;

&lt;p&gt;The right time to add Postgres read replicas is when your write volume is manageable but reads are crowding the primary.&lt;/p&gt;

&lt;p&gt;That is a common pattern for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SaaS dashboards&lt;/li&gt;
&lt;li&gt;CMS-backed sites&lt;/li&gt;
&lt;li&gt;APIs with heavy lookup traffic&lt;/li&gt;
&lt;li&gt;B2B platforms with read-heavy admin views&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Split Reads and Writes Explicitly
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;pg&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pg&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;writePool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;pg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;connectionString&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DATABASE_PRIMARY_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;max&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;readPool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;pg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;connectionString&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DATABASE_REPLICA_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;max&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;writePool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;

  &lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;readPool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This is deliberately explicit.&lt;/p&gt;

&lt;p&gt;You want engineers to &lt;em&gt;know&lt;/em&gt; when they are reading from a replica versus writing to a primary.&lt;/p&gt;
&lt;h3&gt;
  
  
  Understand Replica Lag
&lt;/h3&gt;

&lt;p&gt;Read replicas are not free throughput.&lt;/p&gt;

&lt;p&gt;They introduce a real tradeoff: &lt;strong&gt;replica lag&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That means a request can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write to primary&lt;/li&gt;
&lt;li&gt;Immediately read from replica&lt;/li&gt;
&lt;li&gt;Not see its own write yet&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So do not route consistency-sensitive reads blindly to replicas.&lt;/p&gt;

&lt;p&gt;Examples of reads that should still hit primary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Immediately after creating a resource&lt;/li&gt;
&lt;li&gt;Checkout confirmation flows&lt;/li&gt;
&lt;li&gt;Billing updates&lt;/li&gt;
&lt;li&gt;Permission changes&lt;/li&gt;
&lt;li&gt;Authentication-adjacent state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where many scaling guides get too hand-wavy. Replica lag is not theoretical. You must design for it.&lt;/p&gt;
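
&lt;p&gt;One common way to design for it is read-your-writes pinning: after a user writes, route that user's reads to the primary for a short window assumed to exceed typical replication lag. A minimal in-memory sketch (the helper names and the window length are illustrative):&lt;/p&gt;

```javascript
// Sketch: read-your-writes routing under replica lag. After a user
// writes, pin that user's reads to the primary for a short window.
const PIN_WINDOW_MS = 5000;          // assumed to exceed typical lag
const lastWriteAt = new Map();       // userId -> timestamp of last write

function recordWrite(userId, now = Date.now()) {
  lastWriteAt.set(userId, now);
}

// Returns 'primary' while the user is inside the pin window,
// 'replica' otherwise.
function chooseTarget(userId, now = Date.now()) {
  const t = lastWriteAt.get(userId);
  return t !== undefined && now - t < PIN_WINDOW_MS ? 'primary' : 'replica';
}
```

&lt;p&gt;In production the pin usually travels in a cookie or session token rather than process memory, so it survives load balancing across app instances.&lt;/p&gt;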


&lt;h2&gt;
  
  
  Add a CDN Earlier Than Most Teams Do
&lt;/h2&gt;

&lt;p&gt;A CDN is not just for images.&lt;/p&gt;

&lt;p&gt;It is a traffic absorber.&lt;/p&gt;

&lt;p&gt;A CDN should sit in front of your system as early as possible because it gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lower latency globally&lt;/li&gt;
&lt;li&gt;Reduced origin load&lt;/li&gt;
&lt;li&gt;Edge caching for static assets&lt;/li&gt;
&lt;li&gt;Basic DDoS mitigation&lt;/li&gt;
&lt;li&gt;TLS termination&lt;/li&gt;
&lt;li&gt;Better burst handling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cloudflare alone can eliminate a surprising amount of backend work before the request ever reaches your application.&lt;/p&gt;
&lt;h3&gt;
  
  
  What Should Be Cached at the Edge
&lt;/h3&gt;

&lt;p&gt;Good CDN candidates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JS, CSS, fonts, images&lt;/li&gt;
&lt;li&gt;Static marketing pages&lt;/li&gt;
&lt;li&gt;Public docs pages&lt;/li&gt;
&lt;li&gt;Public blog content with short revalidation windows&lt;/li&gt;
&lt;li&gt;Some anonymous API responses if safe&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bad CDN candidates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Authenticated user dashboards&lt;/li&gt;
&lt;li&gt;Personalized responses&lt;/li&gt;
&lt;li&gt;Permission-sensitive resources&lt;/li&gt;
&lt;li&gt;Anything with ambiguous cache headers&lt;/li&gt;
&lt;/ul&gt;
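
&lt;p&gt;The cleanest way to enforce that split is to make the edge policy explicit per response class instead of relying on defaults. A sketch (the header values are illustrative, not recommendations):&lt;/p&gt;

```javascript
// Sketch: explicit Cache-Control policy per response class, so the
// CDN never has to guess. Values are illustrative starting points.
const CACHE_POLICY = {
  static: 'public, max-age=31536000, immutable',               // hashed JS/CSS/fonts
  publicPage: 'public, max-age=60, stale-while-revalidate=300', // blog/docs pages
  private: 'private, no-store',                                 // dashboards, personalized
};

function cacheControlFor(kind) {
  return CACHE_POLICY[kind] ?? CACHE_POLICY.private; // fail closed
}
```

&lt;p&gt;Unknown response classes fall through to no-store, which fails closed: a response is never edge-cached by accident.&lt;/p&gt;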


&lt;h2&gt;
  
  
  Add Rate Limiting Before You Think You Need It
&lt;/h2&gt;

&lt;p&gt;A production system gets stressed not only by success but by abuse, bugs, retries, crawlers, scripts, and bursty clients.&lt;/p&gt;

&lt;p&gt;Rate limiting is not just security. It is stability.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;RateLimiterRedis&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;rate-limiter-flexible&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;limiter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RateLimiterRedis&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;storeClient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;keyPrefix&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;rate_limit&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;points&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;blockDuration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;rateLimit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;limiter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;consume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Too many requests&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The crucial point here is that the limiter state lives in Redis, not memory.&lt;/p&gt;

&lt;p&gt;If the limiter is in memory per instance, it becomes inconsistent behind a load balancer.&lt;/p&gt;
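
&lt;p&gt;A tiny illustration of that failure mode, with two independent in-memory limiters standing in for two app instances behind a round-robin load balancer:&lt;/p&gt;

```javascript
// Sketch: why per-instance limiter state is inconsistent behind a
// load balancer. Each in-memory counter allows `points` requests,
// so a client spread across both instances gets double the budget.
function makeLocalLimiter(points) {
  const used = new Map();
  return (key) => {
    const n = (used.get(key) ?? 0) + 1;
    used.set(key, n);
    return n <= points; // true = request allowed
  };
}

const instanceA = makeLocalLimiter(100);
const instanceB = makeLocalLimiter(100);

let allowed = 0;
for (let i = 0; i < 200; i++) {
  // round-robin between the two app instances
  const limiter = i % 2 === 0 ? instanceA : instanceB;
  if (limiter('client-1')) allowed++;
}
// allowed is 200: twice the intended 100-request budget
```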




&lt;h2&gt;
  
  
  Stage 4: Make the System Degrade Gracefully
&lt;/h2&gt;

&lt;p&gt;At 1M to 10M requests/day, you stop thinking only about scale and start thinking about &lt;strong&gt;failure shape&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The question changes from:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Can the system handle this?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How does the system behave when it cannot?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is a more mature question.&lt;/p&gt;




&lt;h2&gt;
  
  
  Move Non-Critical Work Out of the Request Path
&lt;/h2&gt;

&lt;p&gt;Anything not required to complete the user-visible response should leave the request path.&lt;/p&gt;

&lt;p&gt;That includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Emails&lt;/li&gt;
&lt;li&gt;Webhook delivery&lt;/li&gt;
&lt;li&gt;Image processing&lt;/li&gt;
&lt;li&gt;Video transcoding&lt;/li&gt;
&lt;li&gt;Search indexing&lt;/li&gt;
&lt;li&gt;Report generation&lt;/li&gt;
&lt;li&gt;Analytics fanout&lt;/li&gt;
&lt;li&gt;Notification dispatch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where a Redis-backed queue like BullMQ is exactly right.&lt;/p&gt;

&lt;h3&gt;
  
  
  Background Jobs with BullMQ
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Worker&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bullmq&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;connection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;REDIS_HOST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;6379&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;emailQueue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;email&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;defaultJobOptions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;attempts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;backoff&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;exponential&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;removeOnComplete&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;removeOnFail&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;emailWorker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;email&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;template&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sendEmail&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;template&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/users/:id/welcome-email&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SELECT id, email, name FROM users WHERE id = $1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;404&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;User not found&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;emailQueue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;welcome&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Welcome aboard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;welcome&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;202&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Email queued&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a major step in &lt;strong&gt;backend architecture&lt;/strong&gt; maturity.&lt;/p&gt;

&lt;p&gt;Not because queues are fashionable. Because they protect request latency and isolate failure.&lt;/p&gt;




&lt;h2&gt;
  
  
  Build a Caching Hierarchy
&lt;/h2&gt;

&lt;p&gt;At 10M requests/day, caching must exist at more than one layer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwhumecymgttrd6wgd2vj.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwhumecymgttrd6wgd2vj.webp" alt="Diagram 3" width="800" height="35"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each layer exists for a different reason:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CDN&lt;/strong&gt; handles global traffic and static/public content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nginx&lt;/strong&gt; can absorb repeated identical upstream requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redis&lt;/strong&gt; handles application-level object caching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read replicas&lt;/strong&gt; absorb remaining read pressure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Primary&lt;/strong&gt; is reserved for writes and consistency-sensitive reads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If everything reaches the primary database, the rest of your scaling story is mostly fiction.&lt;/p&gt;
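
&lt;p&gt;The Nginx layer is the one teams most often skip. A sketch of micro-caching for repeated identical requests (the paths, zone sizes, and TTLs are illustrative):&lt;/p&gt;

```nginx
# Sketch: Nginx micro-caching for identical upstream requests.
# Paths, zone sizes, and TTLs are illustrative starting points.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=api_cache:10m max_size=1g inactive=10m;

server {
  location /api/public/ {
    proxy_cache api_cache;
    proxy_cache_valid 200 10s;           # short TTL: absorbs bursts, stays fresh
    proxy_cache_use_stale error timeout; # serve stale if upstream is struggling
    proxy_cache_lock on;                 # collapse concurrent misses into one upstream hit
    proxy_pass http://app_upstream;
  }
}
```

&lt;p&gt;&lt;code&gt;proxy_cache_lock&lt;/code&gt; is the quiet hero here: concurrent misses for the same URL collapse into a single upstream request instead of a stampede.&lt;/p&gt;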




&lt;h2&gt;
  
  
  Observability: The Part Everyone Delays Too Long
&lt;/h2&gt;

&lt;p&gt;A backend is not scalable because it can survive load once in a benchmark.&lt;/p&gt;

&lt;p&gt;It is scalable when you can understand what is happening under production load quickly enough to act.&lt;/p&gt;

&lt;p&gt;That means metrics, logs, health checks, and dashboards.&lt;/p&gt;

&lt;h3&gt;
  
  
  Minimum Metrics You Need
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Why it matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Request rate&lt;/td&gt;
&lt;td&gt;Traffic shape and load changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p95 latency&lt;/td&gt;
&lt;td&gt;Real user experience under load&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Error rate&lt;/td&gt;
&lt;td&gt;Detects systemic failure quickly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DB query duration&lt;/td&gt;
&lt;td&gt;Tells you when DB is the bottleneck&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache hit rate&lt;/td&gt;
&lt;td&gt;Shows whether Redis is doing useful work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queue depth&lt;/td&gt;
&lt;td&gt;Reveals background work backlog&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU / memory&lt;/td&gt;
&lt;td&gt;Capacity planning and saturation signals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replica lag&lt;/td&gt;
&lt;td&gt;Prevents stale-read surprises&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Example Prometheus Metrics
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;prom-client&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collectDefaultMetrics&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;httpDuration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http_request_duration_seconds&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;help&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;HTTP request duration in seconds&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;labelNames&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;method&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;route&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;status_code&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;buckets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.005&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.025&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.5&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dbQueryDuration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;db_query_duration_seconds&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;help&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Database query duration in seconds&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;labelNames&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;operation&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;table&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;buckets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.005&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Practical Thresholds
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;Warning&lt;/th&gt;
&lt;th&gt;Critical&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;p95 request latency&lt;/td&gt;
&lt;td&gt;&amp;gt; 200ms&lt;/td&gt;
&lt;td&gt;&amp;gt; 500ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p95 DB query latency&lt;/td&gt;
&lt;td&gt;&amp;gt; 50ms&lt;/td&gt;
&lt;td&gt;&amp;gt; 200ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache hit rate&lt;/td&gt;
&lt;td&gt;&amp;lt; 85%&lt;/td&gt;
&lt;td&gt;&amp;lt; 70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5xx error rate&lt;/td&gt;
&lt;td&gt;&amp;gt; 0.5%&lt;/td&gt;
&lt;td&gt;&amp;gt; 2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queue backlog growth&lt;/td&gt;
&lt;td&gt;Sustained 5 min&lt;/td&gt;
&lt;td&gt;Sustained 15 min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you do not know these numbers, you do not yet know whether your backend is healthy.&lt;/p&gt;
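One way to keep these thresholds from living only in a wiki is to encode them directly in alerting logic. A minimal sketch in Go (the numbers are the table's; the function and its signature are illustrative, not part of any particular alerting framework):

```go
package main

import "fmt"

// severity classifies a p95 request-latency reading (in ms) against
// the warning/critical thresholds from the table above.
func severity(p95LatencyMs float64) string {
	switch {
	case p95LatencyMs > 500:
		return "critical"
	case p95LatencyMs > 200:
		return "warning"
	default:
		return "ok"
	}
}

func main() {
	for _, ms := range []float64{120, 250, 700} {
		fmt.Printf("p95=%vms -> %s\n", ms, severity(ms)) // ok, warning, critical
	}
}
```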




&lt;h2&gt;
  
  
  When Not to Use Microservices
&lt;/h2&gt;

&lt;p&gt;This section matters because too many teams still treat microservices as a rite of passage.&lt;/p&gt;

&lt;p&gt;They are not.&lt;/p&gt;

&lt;p&gt;Microservices are a tax. Sometimes a necessary one, often an unnecessary one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do Not Break the Monolith If:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;One team can still understand the codebase&lt;/li&gt;
&lt;li&gt;Deployments are still manageable&lt;/li&gt;
&lt;li&gt;The database is not your deployment bottleneck&lt;/li&gt;
&lt;li&gt;Most modules do not need independent scaling&lt;/li&gt;
&lt;li&gt;Internal network calls would replace in-process calls for no clear gain&lt;/li&gt;
&lt;li&gt;Your operational maturity is still limited&lt;/li&gt;
&lt;li&gt;Your debugging pipeline is still weak&lt;/li&gt;
&lt;li&gt;You do not have platform engineers&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Consider Extracting a Service Only If:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;One subsystem has drastically different scaling needs&lt;/li&gt;
&lt;li&gt;One subsystem needs a different runtime or storage model&lt;/li&gt;
&lt;li&gt;Deployment coupling is causing continuous friction&lt;/li&gt;
&lt;li&gt;Team boundaries are stable and long-lived&lt;/li&gt;
&lt;li&gt;You already have strong observability and operational discipline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffa7c3dy5l0gs7lm6qamf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffa7c3dy5l0gs7lm6qamf.png" alt="Diagram 4" width="800" height="966"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The goal is not ideological purity.&lt;/p&gt;

&lt;p&gt;The goal is operational sanity.&lt;/p&gt;




&lt;h2&gt;
  
  
  What 10M Requests/Day Actually Means
&lt;/h2&gt;

&lt;p&gt;This number sounds dramatic, but it helps to break it down.&lt;/p&gt;

&lt;p&gt;10M requests/day is about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;115 requests per second on average&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Often &lt;strong&gt;300–600 req/s at peak&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Higher during bursts, launches, crawls, or regional concentration&lt;/li&gt;
&lt;/ul&gt;
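The division behind those numbers is worth writing out once (the 3–5× peak multiplier is an assumption about traffic shape, not a law):

```go
package main

import "fmt"

// avgRPS converts a daily request count into an average per-second rate.
func avgRPS(perDay float64) float64 {
	return perDay / 86_400 // seconds in a day
}

func main() {
	avg := avgRPS(10_000_000)
	fmt.Printf("average: %.1f req/s\n", avg)                 // average: 115.7 req/s
	fmt.Printf("3-5x peak: %.0f-%.0f req/s\n", avg*3, avg*5) // 3-5x peak: 347-579 req/s
}
```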

&lt;p&gt;That is large enough to be real.&lt;/p&gt;

&lt;p&gt;It is also absolutely within reach of a well-built monolith plus:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;horizontal app scaling&lt;/li&gt;
&lt;li&gt;Redis caching&lt;/li&gt;
&lt;li&gt;PgBouncer&lt;/li&gt;
&lt;li&gt;read replicas&lt;/li&gt;
&lt;li&gt;CDN offload&lt;/li&gt;
&lt;li&gt;async job processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The scary part is not the request count.&lt;/p&gt;

&lt;p&gt;The scary part is waste.&lt;/p&gt;

&lt;p&gt;A bad query, a missing index, an unbounded N+1 pattern, or a synchronous email send can collapse a backend long before raw traffic becomes the real problem.&lt;/p&gt;
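To make the N+1 point concrete, here is a minimal sketch in Go, used here only for illustration (the `store` type is an in-memory stand-in for a real database; its `queries` counter tallies round trips):

```go
package main

import "fmt"

// store stands in for a real database and counts round trips.
type store struct{ queries int }

func (s *store) userIDs() []int {
	s.queries++
	return []int{1, 2, 3, 4, 5}
}

// emailFor fetches one user's email; called in a loop, this is the N+1 pattern.
func (s *store) emailFor(id int) string {
	s.queries++
	return fmt.Sprintf("u%d@example.com", id)
}

// emailsFor fetches all emails in one batched query: the fix.
func (s *store) emailsFor(ids []int) []string {
	s.queries++
	out := make([]string, 0, len(ids))
	for _, id := range ids {
		out = append(out, fmt.Sprintf("u%d@example.com", id))
	}
	return out
}

func main() {
	a := &store{}
	for _, id := range a.userIDs() {
		a.emailFor(id) // one query per user
	}
	fmt.Println("N+1 round trips:", a.queries) // N+1 round trips: 6

	b := &store{}
	b.emailsFor(b.userIDs()) // one query for all users
	fmt.Println("batched round trips:", b.queries) // batched round trips: 2
}
```

The N+1 version scales its round trips with the result set; the batched version stays constant no matter how many users there are.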




&lt;h2&gt;
  
  
  Production Checklist
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Database
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] PostgreSQL primary configured correctly&lt;/li&gt;
&lt;li&gt;[ ] Read replicas added when read traffic justifies them&lt;/li&gt;
&lt;li&gt;[ ] PgBouncer deployed in transaction mode&lt;/li&gt;
&lt;li&gt;[ ] All foreign keys indexed&lt;/li&gt;
&lt;li&gt;[ ] Slow query logging enabled&lt;/li&gt;
&lt;li&gt;[ ] Backup and restore tested, not just configured&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;pg_stat_statements&lt;/code&gt; enabled&lt;/li&gt;
&lt;/ul&gt;
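With `pg_stat_statements` enabled, finding the worst offenders is one query away. A sketch (the column names are from the standard extension as of PostgreSQL 13+; in a real service the string would be passed to `database/sql` against the primary):

```go
package main

import "fmt"

// topSlowQueriesSQL ranks statements by total execution time using
// the pg_stat_statements extension (PostgreSQL 13+ column names).
const topSlowQueriesSQL = `
SELECT query,
       calls,
       total_exec_time,
       mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;`

func main() {
	fmt.Println(topSlowQueriesSQL)
}
```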

&lt;h3&gt;
  
  
  Caching
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Redis in place for cache, sessions, or rate limiting&lt;/li&gt;
&lt;li&gt;[ ] Cache key naming convention documented&lt;/li&gt;
&lt;li&gt;[ ] Invalidation exists on all write paths&lt;/li&gt;
&lt;li&gt;[ ] Cache hit ratio monitored&lt;/li&gt;
&lt;li&gt;[ ] No sensitive cross-user cache leakage&lt;/li&gt;
&lt;/ul&gt;
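A minimal cache-aside sketch with write-path invalidation, in Go for illustration (a map stands in for Redis; the `entity:id` key convention is an example, not a rule):

```go
package main

import "fmt"

// userCache does cache-aside reads and invalidates on writes.
type userCache struct {
	cache map[string]string // Redis stand-in
	db    map[int]string    // source of truth
}

// key follows a documented naming convention: "user:<id>".
func key(id int) string { return fmt.Sprintf("user:%d", id) }

// Get reads through the cache, falling back to the database on a miss.
func (c *userCache) Get(id int) string {
	if v, ok := c.cache[key(id)]; ok {
		return v // cache hit
	}
	v := c.db[id]
	c.cache[key(id)] = v
	return v
}

// Update writes to the database and invalidates (never updates) the cache.
func (c *userCache) Update(id int, name string) {
	c.db[id] = name
	delete(c.cache, key(id)) // invalidation on the write path
}

func main() {
	c := &userCache{cache: map[string]string{}, db: map[int]string{1: "Ada"}}
	fmt.Println(c.Get(1)) // miss, then cached: Ada
	c.Update(1, "Grace")  // write invalidates user:1
	fmt.Println(c.Get(1)) // fresh read: Grace
}
```

Invalidate-on-write is chosen over update-on-write here because it cannot leave a stale value behind if the write path and cache path race.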

&lt;h3&gt;
  
  
  App Layer
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Health checks at &lt;code&gt;/health&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;[ ] Graceful shutdown implemented&lt;/li&gt;
&lt;li&gt;[ ] Request timeouts enforced&lt;/li&gt;
&lt;li&gt;[ ] Rate limiting in Redis, not memory&lt;/li&gt;
&lt;li&gt;[ ] Background tasks moved off request path&lt;/li&gt;
&lt;li&gt;[ ] Structured logs everywhere&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Infrastructure
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Nginx or equivalent reverse proxy configured&lt;/li&gt;
&lt;li&gt;[ ] CDN in front of public traffic&lt;/li&gt;
&lt;li&gt;[ ] HTTPS enforced&lt;/li&gt;
&lt;li&gt;[ ] Compression enabled&lt;/li&gt;
&lt;li&gt;[ ] Static assets cached aggressively&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Observability
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Prometheus metrics exposed&lt;/li&gt;
&lt;li&gt;[ ] Grafana dashboards for latency, errors, cache hit rate, queue depth&lt;/li&gt;
&lt;li&gt;[ ] Alerts configured before incidents happen&lt;/li&gt;
&lt;li&gt;[ ] Logs searchable centrally&lt;/li&gt;
&lt;li&gt;[ ] Replica lag visible if using replicas&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Real Scaling Mindset
&lt;/h2&gt;

&lt;p&gt;The most useful scaling principle is not “build for hyperscale.”&lt;/p&gt;

&lt;p&gt;It is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Remove the current bottleneck without introducing unnecessary permanent complexity.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is how serious systems are actually built.&lt;/p&gt;

&lt;p&gt;A good backend architecture is not the one with the most boxes in the diagram.&lt;/p&gt;

&lt;p&gt;It is the one that stays understandable while traffic grows.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;pragmatic architecture&lt;/strong&gt; wins because it keeps your team fast, your failure modes legible, and your operating costs reasonable. A &lt;strong&gt;scalable monolith&lt;/strong&gt; wins because most systems need better boundaries, better caching, and better query discipline long before they need service decomposition.&lt;/p&gt;

&lt;p&gt;If you want to reach 10M requests/day, do not start by asking whether you need Kafka.&lt;/p&gt;

&lt;p&gt;Start by asking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are my queries indexed?&lt;/li&gt;
&lt;li&gt;Is my cache hit rate healthy?&lt;/li&gt;
&lt;li&gt;Am I blocking requests on background work?&lt;/li&gt;
&lt;li&gt;Can Postgres survive connection spikes?&lt;/li&gt;
&lt;li&gt;Can my system degrade gracefully under load?&lt;/li&gt;
&lt;li&gt;Can I explain every major component in one sentence?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the answer to those questions is yes, you are much closer than you think.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Real scale rarely demands flashy architecture first. It demands disciplined engineering first.&lt;/em&gt;&lt;/p&gt;







&lt;p&gt;If this post gave you even one useful insight, that’s a win.&lt;/p&gt;

&lt;p&gt;I focus on writing practical, no-BS content that actually helps you build better — not just consume more.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://www.kallis.in/blog" rel="noopener noreferrer"&gt;Read more blogs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to reach out or discuss anything:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://www.kallis.in/#contact" rel="noopener noreferrer"&gt;Contact me&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're stuck in any project or need help, feel free to connect — I actually respond.&lt;/p&gt;




&lt;p&gt;Follow for more real-world dev + design content 🚀&lt;/p&gt;

</description>
      <category>backend</category>
      <category>architecture</category>
      <category>node</category>
      <category>ai</category>
    </item>
    <item>
      <title>Why One Extra Network Hop Silently Breaks Your Latency Budget in Production</title>
      <dc:creator>Ovaise Qayoom</dc:creator>
      <pubDate>Mon, 30 Mar 2026 21:45:55 +0000</pubDate>
      <link>https://dev.to/ovaiseq/why-one-extra-network-hop-silently-breaks-your-latency-budget-in-production-19ck</link>
      <guid>https://dev.to/ovaiseq/why-one-extra-network-hop-silently-breaks-your-latency-budget-in-production-19ck</guid>
      <description>

&lt;h2&gt;
  
  
  Your Latency Budget Is Lying: The Real Cost of a Single Extra Network Hop
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;That one "harmless" extra service call is quietly burning your p99. Here's the math, the failure modes, and how to fix it.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;You shipped a feature. Everything looked fine in staging. The integration tests passed. The average response time in production is &lt;strong&gt;120ms&lt;/strong&gt; — well within the &lt;strong&gt;200ms&lt;/strong&gt; target your team agreed on six months ago.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Then someone checks the &lt;strong&gt;p99&lt;/strong&gt;.&lt;br&gt;
It's &lt;strong&gt;780ms&lt;/strong&gt;.&lt;br&gt;
The dashboards look fine at a glance, users aren't screaming yet, but something is clearly wrong. You start digging. You find that three weeks ago, someone added a call to a new internal service — a feature flag resolver, a permission check, a logging sidecar flush — and nobody thought much of it. "It only adds about 5ms," they said.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And they were right, at the median. But at the tail? It quietly murdered your latency budget.&lt;/p&gt;

&lt;p&gt;This is the story of how that happens, why it's almost always invisible until it isn't, and what you can actually do about it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;A single network hop looks trivial in isolation. In a distributed system, it's never just one hop.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;First, What Even Is a Latency Budget?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A latency budget is a constraint. It's the total time you have available to fulfill a request end-to-end — from the client sending the first byte to the client receiving the last byte — before the experience degrades.&lt;/p&gt;

&lt;p&gt;Your product team says "the page must load in under 200ms." That 200ms is the budget. Now you have to allocate it across every layer of your stack.&lt;/p&gt;

&lt;p&gt;A typical allocation for a server-rendered web request might look like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Allocated Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DNS resolution (cached)&lt;/td&gt;
&lt;td&gt;~1ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TCP + TLS handshake (cached)&lt;/td&gt;
&lt;td&gt;~5ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network transit (round trip)&lt;/td&gt;
&lt;td&gt;~20ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Load balancer + reverse proxy&lt;/td&gt;
&lt;td&gt;~3ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Application logic&lt;/td&gt;
&lt;td&gt;~80ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database query&lt;/td&gt;
&lt;td&gt;~40ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Response serialization&lt;/td&gt;
&lt;td&gt;~10ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network return&lt;/td&gt;
&lt;td&gt;~20ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~179ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That gives you roughly 21ms of buffer. Sounds reasonable. But notice that this model assumes &lt;strong&gt;one&lt;/strong&gt; path through your system. In reality, modern distributed systems don't have one path. They have a graph of paths, and each path has its own tail behavior.&lt;/p&gt;

&lt;p&gt;The moment you add one more synchronous network hop — another service call, a proxy that wasn't there before, a new sidecar — you don't just add the median latency of that hop. You add its entire latency distribution. Including its p99. Including its occasional 2-second timeout spike. And those distributions don't add linearly.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;The Math They Don't Put in Your Architecture Diagram&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Let's be precise about this, because it's the core of everything.&lt;/p&gt;

&lt;p&gt;If you make a single call to a service with the following latency distribution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;p50: 5ms&lt;/li&gt;
&lt;li&gt;p95: 20ms&lt;/li&gt;
&lt;li&gt;p99: 80ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...then at the &lt;strong&gt;50th percentile&lt;/strong&gt;, your caller sees 5ms. Fine.&lt;/p&gt;

&lt;p&gt;But now suppose you're calling &lt;strong&gt;five&lt;/strong&gt; services in series. Even if every one of them has the same "5ms median" profile:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The compound tail problem:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If each service independently has a 1% chance of hitting 80ms, then the probability that &lt;em&gt;at least one of them&lt;/em&gt; hits 80ms in a single request is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;P(at least one slow) = 1 - P(all fast)
                     = 1 - (0.99)^5
                     = 1 - 0.951
                     = 4.9%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So your compound p95 is now being shaped by the &lt;em&gt;slowest&lt;/em&gt; of five services, not the average. What was a 1-in-100 event for each service individually becomes a nearly 1-in-20 event for the composite request.&lt;/p&gt;

&lt;p&gt;Add ten services and the math gets grimmer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;P(at least one slow) = 1 - (0.99)^10 = 9.6%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your p99 just became your p90. In production, at scale, that's thousands of requests per minute hitting the tail.&lt;/p&gt;
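Both calculations generalize to a one-liner, sketched here in Go:

```go
package main

import (
	"fmt"
	"math"
)

// compoundSlowProb is the probability that at least one of n
// independent hops hits its slow path, given per-hop probability p.
func compoundSlowProb(n int, p float64) float64 {
	return 1 - math.Pow(1-p, float64(n))
}

func main() {
	for _, n := range []int{1, 5, 10} {
		fmt.Printf("%2d hops: %.1f%% of requests hit at least one slow hop\n",
			n, 100*compoundSlowProb(n, 0.01))
	}
	// 1 hops: 1.0%, 5 hops: 4.9%, 10 hops: 9.6%
}
```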

&lt;p&gt;This is the phenomenon described in the classic Google paper "The Tail at Scale" — and it's been reproduced in real systems countless times since.  &lt;a href="https://research.google/pubs/the-tail-at-scale/" rel="noopener noreferrer"&gt;research&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;What Actually Happens Inside a Single Extra Hop&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When you add a synchronous call to another service, here's what actually happens on the wire — most of which is invisible in your flame graphs if you're not looking:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. TCP Connection Overhead&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If the connection isn't kept alive (common in naive HTTP/1.1 setups or misconfigured HTTP/2), every call involves a TCP handshake: ~1 RTT. At a typical inter-datacenter latency of 1–5ms, that's 1–5ms before you've sent a single byte of your request.&lt;/p&gt;

&lt;p&gt;Connection pooling eliminates most of this, but only if you've set it up correctly and your pool isn't exhausted under load.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. TLS Negotiation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If the service-to-service call is over HTTPS (as it should be in a zero-trust setup), TLS adds latency. A TLS 1.3 handshake with session resumption costs roughly 0.5–2ms; without resumption, a full handshake costs 1–2 RTTs.&lt;/p&gt;

&lt;p&gt;In a service mesh like Istio with mutual TLS (mTLS), every single pod-to-pod call goes through TLS — it's automatic and transparent, which is great for security and brutal for people who thought "service mesh is free."  &lt;a href="https://foci.uw.edu/papers/socc23-meshinsight.pdf" rel="noopener noreferrer"&gt;foci.uw&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Benchmarks of Istio with Envoy sidecars have shown consistent per-hop overhead of &lt;strong&gt;1–5ms added latency&lt;/strong&gt; at the median, with p99 overheads stretching into tens of milliseconds under load, depending on payload size and connection concurrency.  &lt;a href="https://oneuptime.com/blog/post/2026-01-24-service-mesh-overhead/view" rel="noopener noreferrer"&gt;oneuptime&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Serialization and Deserialization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Your service sends a request body. JSON, Protobuf, MessagePack — doesn't matter, it costs something. JSON serialization of a medium-complexity object (10–20 fields, some nested) in Node.js or Go costs roughly 0.05–0.5ms. Across many hops at high concurrency, this adds up. More importantly, large payloads increase memory allocation, which can trigger GC pauses — and GC pauses are essentially uncapped.&lt;/p&gt;
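Measuring this for your own payloads is cheap. A rough Go sketch (the `order` shape is a made-up medium-complexity payload; absolute timings depend on your machine, so no number is promised):

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// order is a medium-complexity payload: nested, a dozen-plus fields.
type order struct {
	ID       int               `json:"id"`
	UserID   int               `json:"user_id"`
	Items    []item            `json:"items"`
	Metadata map[string]string `json:"metadata"`
	Total    float64           `json:"total"`
	Currency string            `json:"currency"`
}

type item struct {
	SKU   string  `json:"sku"`
	Qty   int     `json:"qty"`
	Price float64 `json:"price"`
}

// marshalCostPerOp serializes o n times and returns the average duration.
func marshalCostPerOp(o order, n int) (time.Duration, error) {
	start := time.Now()
	for i := 0; i < n; i++ {
		if _, err := json.Marshal(o); err != nil {
			return 0, err
		}
	}
	return time.Since(start) / time.Duration(n), nil
}

func main() {
	o := order{
		ID: 1, UserID: 42, Total: 99.5, Currency: "USD",
		Items:    []item{{"A-1", 2, 19.9}, {"B-2", 1, 59.7}},
		Metadata: map[string]string{"source": "web", "region": "eu"},
	}
	perOp, _ := marshalCostPerOp(o, 10_000)
	fmt.Println("avg marshal cost:", perOp)
}
```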

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Queueing at the Receiving End&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Even if the downstream service is fast on average, under real traffic it's doing other things. Goroutines are scheduled. Thread pools have limits. Connection queues fill up. The incoming request waits.&lt;/p&gt;

&lt;p&gt;This is the &lt;strong&gt;queueing&lt;/strong&gt; component of latency — often the largest and most volatile contributor to tail latency — and it's completely invisible to the caller. Your request could sit in a queue for 0ms at 10 RPS and 200ms at 1000 RPS, and your p50 will look fine the whole time while your p99 is on fire.&lt;/p&gt;
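A crude way to build intuition here is the M/M/1 queueing model. It is an idealization (real services are neither Poisson nor single-server), but it shows why waits explode near saturation. Assuming a service that can sustain about 1,005 req/s:

```go
package main

import "fmt"

// meanQueueWaitMs returns the M/M/1 mean time spent waiting in queue,
// in milliseconds, for arrival rate lambda and service rate mu (req/s):
// Wq = rho / (mu * (1 - rho)), where rho = lambda / mu.
func meanQueueWaitMs(lambda, mu float64) float64 {
	rho := lambda / mu // utilization
	return rho / (mu * (1 - rho)) * 1000
}

func main() {
	const mu = 1005.0 // assumed capacity: ~1005 req/s
	for _, lambda := range []float64{10, 500, 900, 1000} {
		fmt.Printf("%4.0f req/s -> mean queue wait %.2fms\n",
			lambda, meanQueueWaitMs(lambda, mu))
	}
	// at 10 req/s the wait is ~0.01ms; at 1000 req/s it is ~199ms
}
```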

&lt;h3&gt;
  
  
  &lt;strong&gt;5. The Return Trip&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;All of the above applies symmetrically on the way back: serialization of the response, TCP acknowledgment, return network latency. A "fast" synchronous RPC call to an internal service that "only" takes 3ms median has already consumed 3ms of your budget before your code has done anything with the result.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Visualizing the Compounding Effect&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Let's walk through a concrete example.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The scenario: an e-commerce checkout endpoint&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Your &lt;code&gt;/checkout&lt;/code&gt; endpoint has a 200ms latency budget. Here's the architecture three months ago vs. today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkm1g5k9dbv196al0ra9q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkm1g5k9dbv196al0ra9q.png" alt="Diagram 1 — Before" width="800" height="65"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Measured latency breakdown:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Network + gateway: 5ms&lt;/li&gt;
&lt;li&gt;Checkout service logic: 30ms&lt;/li&gt;
&lt;li&gt;DB query (indexed): 25ms&lt;/li&gt;
&lt;li&gt;Response serialization + return: 10ms&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total p50: ~70ms. p99: ~130ms. Budget remaining: ~70ms.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After (four new hops added over three months):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faks1segrd3ad8cjztd8s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faks1segrd3ad8cjztd8s.png" alt="Diagram 2 — After " width="800" height="157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now let's reconstruct the budget:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Hop&lt;/th&gt;
&lt;th&gt;p50&lt;/th&gt;
&lt;th&gt;p99&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Network + gateway&lt;/td&gt;
&lt;td&gt;5ms&lt;/td&gt;
&lt;td&gt;10ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth service call&lt;/td&gt;
&lt;td&gt;8ms&lt;/td&gt;
&lt;td&gt;60ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Feature flag service&lt;/td&gt;
&lt;td&gt;4ms&lt;/td&gt;
&lt;td&gt;40ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Checkout logic&lt;/td&gt;
&lt;td&gt;30ms&lt;/td&gt;
&lt;td&gt;55ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DB query&lt;/td&gt;
&lt;td&gt;25ms&lt;/td&gt;
&lt;td&gt;70ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inventory service call&lt;/td&gt;
&lt;td&gt;10ms&lt;/td&gt;
&lt;td&gt;90ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pricing service call&lt;/td&gt;
&lt;td&gt;12ms&lt;/td&gt;
&lt;td&gt;85ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Return + serialization&lt;/td&gt;
&lt;td&gt;10ms&lt;/td&gt;
&lt;td&gt;20ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~104ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~430ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The p50 looks fine. Still well under 200ms. But the p99 has blown past the budget more than twice over — and the team didn't notice because their alerting was on average response time.&lt;/p&gt;

&lt;p&gt;This is an extremely common pattern. It's how systems that "feel fast" break under scrutiny.  &lt;a href="https://www.systemoverflow.com/learn/design-fundamentals/communication-patterns/latency-budgets-and-tail-amplification-in-multi-hop-synchronous-chains" rel="noopener noreferrer"&gt;systemoverflow&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Every hop through your data center carries overhead that compounds across the request chain.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Tail Latency: The Number That Actually Matters for Users&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Most teams instrument p50. Some instrument p95. Very few actually act on p99. This is a mistake.&lt;/p&gt;

&lt;p&gt;The p99 is the latency that 1 in 100 requests experiences. At 100 requests per second, that's one degraded request every second. At 10,000 requests per second, it's 100 every second.&lt;/p&gt;

&lt;p&gt;More critically: the p99 of your &lt;em&gt;composite&lt;/em&gt; service is almost always &lt;strong&gt;dominated by the worst single component&lt;/strong&gt; in your call chain. If you have ten services and one of them has an occasionally misbehaving garbage collector, that GC pause becomes &lt;em&gt;your&lt;/em&gt; p99 — even if the other nine services are perfectly tuned.&lt;/p&gt;

&lt;p&gt;Here's a simulation in Go that demonstrates the compound distribution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
    &lt;span class="s"&gt;"math/rand"&lt;/span&gt;
    &lt;span class="s"&gt;"sort"&lt;/span&gt;
    &lt;span class="s"&gt;"time"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// simulateHopLatency returns a latency in ms for a single service hop.&lt;/span&gt;
&lt;span class="c"&gt;// Models a bimodal distribution: usually fast, occasionally slow.&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;simulateHopLatency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rng&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;rand&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Rand&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;float64&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Float64&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="m"&gt;0.99&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// Fast path: normally distributed around 5ms&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="m"&gt;5.0&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NormFloat64&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="m"&gt;1.5&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c"&gt;// Slow path: GC pause, queue buildup, etc.&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="m"&gt;5.0&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="m"&gt;60.0&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NormFloat64&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="m"&gt;10.0&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;percentile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sorted&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;float64&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="m"&gt;100.0&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;rng&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;rand&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rand&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnixNano&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
    &lt;span class="n"&gt;samples&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;&lt;span class="n"&gt;_000&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;numHops&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;numHops&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;numHops&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;samples&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;samples&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0.0&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;numHops&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;simulateHopLatency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Float64s&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Hops: %d | p50: %.1fms | p95: %.1fms | p99: %.1fms&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;numHops&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;percentile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;percentile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;95&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;percentile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;99&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running this produces roughly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hops: 1 | p50: 5.0ms  | p95: 8.1ms  | p99: 64.3ms
Hops: 2 | p50: 10.0ms | p95: 69.2ms | p99: 124.8ms
Hops: 3 | p50: 15.0ms | p95: 128.4ms| p99: 185.0ms
Hops: 4 | p50: 20.0ms | p95: 134.1ms| p99: 246.1ms
Hops: 5 | p50: 25.1ms | p95: 193.8ms| p99: 317.2ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice what happened: from 1 hop to 2 hops, the p95 jumped from 8ms to 69ms. Not because the services got slower, but because the &lt;em&gt;probability of hitting at least one slow response&lt;/em&gt; roughly doubled: if each hop has a 5% chance of landing in its tail, a two-hop chain hits at least one slow hop with probability 1 − 0.95² ≈ 9.75%, which drags the slow case inside the 95th percentile. This is tail amplification, and it's the reason p50 monitoring is effectively useless for latency budget tracking.  &lt;a href="https://aerospike.com/blog/what-is-p99-latency/" rel="noopener noreferrer"&gt;aerospike&lt;/a&gt;&lt;/p&gt;
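&lt;p&gt;The same point falls out of basic probability, without any simulation. A minimal sketch (the helper name &lt;code&gt;pAnySlow&lt;/code&gt; is introduced here for illustration):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math"
)

// pAnySlow returns the probability that at least one of n independent
// hops lands in its per-hop tail, where p is the per-hop tail probability.
func pAnySlow(p float64, n int) float64 {
	return 1 - math.Pow(1-p, float64(n))
}

func main() {
	// The region beyond each hop's p95 is a 5% tail per hop.
	for n := 1; n <= 5; n++ {
		fmt.Printf("hops=%d  P(at least one slow hop)=%.1f%%\n", n, 100*pAnySlow(0.05, n))
	}
}
```

&lt;p&gt;At two hops the chance of at least one slow response is already about 9.8%, which is exactly why the slow case shows up inside the p95.&lt;/p&gt;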




&lt;h2&gt;
  
  
  &lt;strong&gt;The Invisible Hops You Forget to Count&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Here's the thing: engineers are usually aware of the obvious hops — the service calls they wrote. What they miss are the silent ones:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Service mesh sidecars.&lt;/strong&gt; In Istio or Linkerd, every outbound and inbound request passes through an Envoy/Linkerd proxy. That's two extra network hops per RPC call. The proxy has its own CPU overhead, memory allocation, and queue. At high RPS, this isn't free. Benchmarks show Istio adding 1–5ms to median latency, with meaningfully worse tail behavior under load.  &lt;a href="https://foci.uw.edu/papers/socc23-meshinsight.pdf" rel="noopener noreferrer"&gt;foci.uw&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Feature flag SDKs calling home.&lt;/strong&gt; Some feature flag systems are backed by an SDK that does a remote HTTP call to resolve flags per request. If your flag SDK is calling out to a remote service on every checkout request, that's a hop you probably forgot to count. It's especially painful because flag evaluation &lt;em&gt;feels&lt;/em&gt; like it should be pure local logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auth middleware calling an external service.&lt;/strong&gt; JWT validation is local and fast. But if your auth middleware is calling a user service or an OAuth introspection endpoint to validate tokens &lt;em&gt;per request&lt;/em&gt;, you've added a hop that's invisible in your app code but very visible in your latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Centralized rate limiters.&lt;/strong&gt; Redis-backed rate limiters are common and reasonable. But a call to Redis over the network on every request adds 0.5–3ms depending on co-location, even when it's just an &lt;code&gt;INCR&lt;/code&gt;. At high traffic, Redis also becomes a hot node, and its tail latency degrades.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Distributed tracing agents.&lt;/strong&gt; Most tracing SDKs are async and non-blocking. Some aren't, or have internal queues that fill up under load and start blocking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Load balancers in front of load balancers.&lt;/strong&gt; Cloud-managed load balancers in front of ingress controllers in front of service mesh proxies in front of your app. That's three layers before your code runs.&lt;/p&gt;

&lt;p&gt;None of these hops appear in your architecture diagram. All of them show up in your flame graphs.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Queueing Theory, Very Briefly&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;You don't need a PhD in queueing theory to understand why adding hops is dangerous. You just need one intuition from &lt;strong&gt;Little's Law&lt;/strong&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;L = λW&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;L&lt;/em&gt; = average number of requests in the system&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;λ&lt;/em&gt; = arrival rate&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;W&lt;/em&gt; = average time a request spends in the system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As &lt;em&gt;W&lt;/em&gt; (the latency per request) increases due to extra hops, &lt;em&gt;L&lt;/em&gt; (the number of requests in flight) grows proportionally. Little's Law alone doesn't cause trouble; finite capacity does. Servers have a fixed number of workers, connections, and cores, so a larger &lt;em&gt;L&lt;/em&gt; means more contention, more contention means queueing delay, and queueing delay increases &lt;em&gt;W&lt;/em&gt;, which increases &lt;em&gt;L&lt;/em&gt; again. This feedback loop is what turns a "5ms extra hop" into "500ms occasional spikes": the system tips past its natural equilibrium.&lt;/p&gt;

&lt;p&gt;The practical implication: &lt;strong&gt;every hop you add reduces your headroom before the system becomes queue-bound under load.&lt;/strong&gt;&lt;/p&gt;
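&lt;p&gt;To make that concrete with assumed numbers (1,000 requests per second, and a new 30ms synchronous hop), the law is one multiplication:&lt;/p&gt;

```go
package main

import "fmt"

// inFlight applies Little's Law, L = λW: average concurrent requests
// equals arrival rate (req/sec) times time spent in the system (seconds).
func inFlight(lambda, w float64) float64 {
	return lambda * w
}

func main() {
	lambda := 1000.0 // requests per second
	fmt.Printf("W=50ms: L=%.0f requests in flight\n", inFlight(lambda, 0.050))
	// The same traffic after adding one extra 30ms synchronous hop:
	fmt.Printf("W=80ms: L=%.0f requests in flight\n", inFlight(lambda, 0.080))
}
```

&lt;p&gt;At 1,000 req/s, that one extra hop means 30 more requests in flight at every moment. If your worker pool or connection pool was sized near the old steady state, those 30 requests are the headroom the hop just consumed.&lt;/p&gt;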




&lt;h2&gt;
  
  
  &lt;strong&gt;How to Actually Measure Your Latency Budget&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Knowing the theory is one thing. Measuring it in production is where most teams fail. Here's how to do it properly.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Trace every request end-to-end with OpenTelemetry&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Distributed tracing is the single most important tool for latency budget tracking. If you're not already using OpenTelemetry, this is the baseline.&lt;/p&gt;

&lt;p&gt;A basic setup in Node.js:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// tracing.js — initialize before anything else&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;NodeSDK&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@opentelemetry/sdk-node&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;OTLPTraceExporter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@opentelemetry/exporter-trace-otlp-http&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;HttpInstrumentation&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@opentelemetry/instrumentation-http&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ExpressInstrumentation&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@opentelemetry/instrumentation-express&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sdk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;NodeSDK&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;traceExporter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OTLPTraceExporter&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OTEL_EXPORTER_ENDPOINT&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://localhost:4318/v1/traces&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;instrumentations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HttpInstrumentation&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ExpressInstrumentation&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;sdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you have traces flowing into Jaeger, Tempo, or Honeycomb, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;See the &lt;strong&gt;waterfall diagram&lt;/strong&gt; for every request&lt;/li&gt;
&lt;li&gt;Identify which span is consuming the most time&lt;/li&gt;
&lt;li&gt;Filter by p99 requests specifically (filter by &lt;code&gt;duration &amp;gt; 400ms&lt;/code&gt;) and see what's different about them&lt;/li&gt;
&lt;li&gt;Compare span durations across percentiles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key metric to extract from your traces: &lt;strong&gt;span duration by percentile, per service.&lt;/strong&gt; Not aggregate. Per service. That's how you find the outlier.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Calculate your budget utilization per span&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Most teams look at total response time. What you want is a &lt;strong&gt;budget utilization view&lt;/strong&gt; — a percentage of the budget consumed at each hop.&lt;/p&gt;

&lt;p&gt;This is trivially expressible as a Prometheus query if you're using span metrics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Fraction of total budget consumed by each service
histogram_quantile(0.99,
  sum(rate(http_server_duration_bucket{service_name="inventory-service"}[5m])) by (le)
)
/ 0.200  # divided by your 200ms budget
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If this query returns 0.45 for inventory-service, that single service is consuming 45% of your budget at p99. You now have a number to act on.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Measure, don't estimate, the overhead of infrastructure layers&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before profiling your application code, measure the bare overhead of your infrastructure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add a &lt;code&gt;/health&lt;/code&gt; endpoint to your service that does nothing except return 200&lt;/li&gt;
&lt;li&gt;Measure its latency from another pod in the same cluster&lt;/li&gt;
&lt;li&gt;That number is your infrastructure floor: it includes DNS, proxy overhead, TLS, and serialization&lt;/li&gt;
&lt;li&gt;That floor is the fixed cost every request pays before your code runs; any latency above it comes from your application and needs a reason&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a well-tuned Kubernetes cluster without a service mesh, this baseline is typically 0.5–2ms. With Istio mTLS, it's typically 2–8ms, sometimes higher.  &lt;a href="https://oneuptime.com/blog/post/2026-01-24-service-mesh-overhead/view" rel="noopener noreferrer"&gt;oneuptime&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Mistakes Teams Make (That Kill Their Latency Budget)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;These are the patterns I see repeatedly in real systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Alerting on p50 instead of p99&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Average and median latency look good right up until your on-call engineer gets paged by an angry stakeholder. Alert on p95 &lt;em&gt;and&lt;/em&gt; p99. The p50 is almost useless for user-facing latency SLOs.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Adding hops without counting them&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Every architectural decision that adds a synchronous network call should be an explicit tradeoff discussion: "This adds approximately Xms to our median latency and introduces Y% tail risk." That conversation almost never happens because teams think about correctness, not latency topology.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Treating timeouts as a safety net, not a budget item&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A timeout of 500ms on a downstream call is not "safe." If that downstream service is called on every request and occasionally hits its 500ms timeout, your &lt;em&gt;caller&lt;/em&gt; will block for 500ms before getting an error and returning a degraded response. Timeouts are not a performance feature. They're a correctness feature. Tune them aggressively.&lt;/p&gt;

&lt;p&gt;The right mental model: &lt;strong&gt;your timeout is the maximum you're willing to spend on that hop.&lt;/strong&gt; It should be a fraction of your total budget, not a failsafe.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Ignoring retry amplification&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Retries with no budget awareness are latency multipliers. If service A times out calling service B and retries twice, a single user request has now made three calls to service B. Under load, this turns transient slowness into a cascading failure. Always budget for retries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;effective_timeout = (retry_count + 1) * per_attempt_timeout + (retry_count * retry_delay)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you have 3 retries, a 100ms per-attempt timeout, and a 50ms retry delay, a single user request can block for up to (3 + 1) × 100ms + 3 × 50ms = 550ms on that one hop. That's more than your entire budget, gone, on error handling.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Not accounting for fan-out in parallel calls&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Parallel service calls look free on a timeline diagram. They're not. The total latency of N parallel calls is &lt;code&gt;max(L1, L2, ..., LN)&lt;/code&gt;: the slowest one. And as N grows, the probability that at least one of them hits its p99 grows fast. For 8 independent services it's 1 − 0.99⁸ ≈ 7.7% per request, which means the fan-out's p92 is already as bad as a single service's p99.  &lt;a href="https://www.systemoverflow.com/learn/design-fundamentals/communication-patterns/latency-budgets-and-tail-amplification-in-multi-hop-synchronous-chains" rel="noopener noreferrer"&gt;systemoverflow&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Trusting that the service mesh is zero-cost&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Istio and Linkerd are excellent tools. They are not zero-cost. Benchmark them. Measure the overhead in your specific workload. The overhead depends heavily on payload size, connection concurrency, and CPU availability on the sidecar. At high RPS with large payloads, the overhead is significant.  &lt;a href="https://foci.uw.edu/papers/socc23-meshinsight.pdf" rel="noopener noreferrer"&gt;foci.uw&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Latency compounds invisibly across layers. Observability is the only way to see the full picture.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;How to Reduce Latency and Reclaim Your Budget&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Once you've measured the problem, here's how to actually fix it.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Eliminate unnecessary synchronous hops entirely&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This is the most impactful change and the hardest to get approved. Ask for every synchronous service call in your hot path: "Does this &lt;em&gt;need&lt;/em&gt; to happen before I return a response?"&lt;/p&gt;

&lt;p&gt;Feature flag resolution: cache flags locally and refresh asynchronously. Don't call a remote service on every request.&lt;/p&gt;

&lt;p&gt;Auth token validation: validate JWTs locally with a public key. Don't introspect them via HTTP.&lt;/p&gt;

&lt;p&gt;Audit logging: write to a local queue and flush asynchronously. The audit log doesn't need to be consistent before the user gets their response.&lt;/p&gt;

&lt;p&gt;Each hop you remove doesn't just save its own latency. It removes its entire tail distribution from your compound calculation.&lt;/p&gt;
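&lt;p&gt;The audit-logging pattern above can be sketched with a bounded in-memory queue (the names are illustrative, and the flush target here is just stdout standing in for a real audit store):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// AuditLogger buffers events in memory and flushes them off the request path.
type AuditLogger struct {
	ch chan string
	wg sync.WaitGroup
}

func NewAuditLogger(buf int) *AuditLogger {
	l := &AuditLogger{ch: make(chan string, buf)}
	l.wg.Add(1)
	go func() {
		defer l.wg.Done()
		for ev := range l.ch {
			// In production: batch these and write to your audit store.
			fmt.Println("audit:", ev)
		}
	}()
	return l
}

// Log never blocks the request path: if the buffer is full, the event is
// dropped (or you could fall back to a synchronous write, depending on
// your durability requirements).
func (l *AuditLogger) Log(event string) bool {
	select {
	case l.ch <- event:
		return true
	default:
		return false
	}
}

// Close drains the buffer and stops the flusher.
func (l *AuditLogger) Close() {
	close(l.ch)
	l.wg.Wait()
}

func main() {
	audit := NewAuditLogger(1024)
	audit.Log("user 42 viewed order 7") // returns immediately; no network hop
	audit.Close()
}
```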

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Move from HTTP to faster transports where it matters&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;HTTP/1.1 → HTTP/2 for multiplexing. HTTP/2 → gRPC with connection reuse for internal service calls. gRPC with Protobuf serialization typically cuts serialization overhead by 3–10x compared to JSON, and connection multiplexing eliminates most connection-establishment overhead. This won't save you from architectural problems, but in a path where every millisecond counts, it's worth it.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Parallelize what can be parallelized, but with a real fan-out budget&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If your request genuinely needs data from multiple services, call them in parallel. But bound the fan-out. If you're calling 8 services in parallel, set a hedged timeout — not "wait for all of them," but "wait until 95% respond and use degraded data for the rest." This is called &lt;strong&gt;partial responses&lt;/strong&gt; or &lt;strong&gt;timeout hedging&lt;/strong&gt;, and it's a powerful pattern for high-availability systems.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Example: parallel fetch with timeout and partial result tolerance&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;fetchWithTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;services&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;ServiceClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;budgetMs&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;budgetMs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Millisecond&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;services&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WaitGroup&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;svc&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;services&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="n"&gt;ServiceClient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Degraded&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="c"&gt;// Use fallback&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;
        &lt;span class="p"&gt;}(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;svc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Wait&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;4. Cache aggressively and correctly&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Not "add Redis in front of everything," but cache at the right layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;In-process cache&lt;/strong&gt; for data that rarely changes: feature flags, configuration, rate limit thresholds. This eliminates the hop entirely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distributed cache&lt;/strong&gt; (Redis, Memcached) for data that changes moderately and is expensive to recompute. But remember: a Redis call is still a network hop. Measure it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CDN or edge caching&lt;/strong&gt; for responses that are fully cacheable. The fastest hop is the one that never reaches your origin.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5. Tune your connection pools aggressively&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Connection pool exhaustion is one of the most common causes of sudden latency spikes in production. When a pool is exhausted, new requests queue waiting for a connection — and that queueing can spike your p99 into seconds even when the underlying service is healthy.&lt;/p&gt;

&lt;p&gt;For every downstream HTTP client in your system, explicitly configure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maximum connections&lt;/li&gt;
&lt;li&gt;Connection timeout (how long to wait for a connection from the pool)&lt;/li&gt;
&lt;li&gt;Request timeout (how long to wait for a response)&lt;/li&gt;
&lt;li&gt;Idle timeout (how long to keep an unused connection alive)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most HTTP client libraries default to conservative settings that are badly mismatched for high-throughput internal service calls.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;6. Profile your serialization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Particularly in JVM-based and Node.js services, JSON serialization of large objects is surprisingly expensive. If you're serializing the same data structure on every request, consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-computing and caching the serialized form&lt;/li&gt;
&lt;li&gt;Switching to Protobuf or MessagePack for internal APIs&lt;/li&gt;
&lt;li&gt;Trimming your response payloads — only send what the caller actually uses&lt;/li&gt;
&lt;/ul&gt;
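&lt;p&gt;The first option, pre-computing the serialized form, can be as simple as caching the marshaled bytes (a sketch; &lt;code&gt;serializedConfig&lt;/code&gt; is a name introduced here):&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// serializedConfig caches the JSON form of a rarely-changing payload, so
// per-request handlers write pre-marshaled bytes instead of re-encoding.
type serializedConfig struct {
	mu  sync.RWMutex
	raw []byte
}

// Update re-marshals once, on change, off the hot path.
func (s *serializedConfig) Update(v any) error {
	b, err := json.Marshal(v)
	if err != nil {
		return err
	}
	s.mu.Lock()
	s.raw = b
	s.mu.Unlock()
	return nil
}

// Bytes returns the cached serialized form for handlers to write directly.
func (s *serializedConfig) Bytes() []byte {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.raw
}

func main() {
	var cfg serializedConfig
	if err := cfg.Update(map[string]any{"region": "us-east-1", "maxItems": 50}); err != nil {
		panic(err)
	}
	fmt.Println(string(cfg.Bytes())) // handlers just copy these bytes
}
```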




&lt;h2&gt;
  
  
  &lt;strong&gt;The Architectural Checklist&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before you ship any change that adds a new service call to a latency-sensitive path, run this checklist:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;strong&gt;Measured the baseline latency&lt;/strong&gt; of the new dependency (p50, p95, p99) in production or under realistic load&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Calculated the new compound p99&lt;/strong&gt; for the full request chain after adding this hop&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Verified the new p99 is within the latency budget&lt;/strong&gt; with margin for growth&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Considered async alternatives&lt;/strong&gt;: can this happen outside the request path?&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Set an explicit timeout&lt;/strong&gt; on the call — not a default, a deliberate number based on the budget allocation for this hop&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Defined a fallback&lt;/strong&gt; for when this call fails or times out — degraded response, cached result, default value&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Added tracing instrumentation&lt;/strong&gt; so this hop appears in distributed traces&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Added latency alerting&lt;/strong&gt; on this specific service-to-service call at p99&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Reviewed retry policy&lt;/strong&gt; — retries are multiplied against the timeout; have you budgeted for them?&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Checked connection pool settings&lt;/strong&gt; — are they tuned for the expected concurrency of this call?&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Reviewed if TLS/mTLS overhead&lt;/strong&gt; has been measured and accounted for in the budget&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If any of these items can't be answered confidently, the PR should not merge into a latency-sensitive path without an explicit team discussion.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;A Real Latency Budget Calculation&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Let's close the loop with a worked example you can adapt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;System:&lt;/strong&gt; A mobile app backend. The product requirement is 150ms end-to-end response at p95 for the home feed endpoint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Budget allocation:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Budget (p95)&lt;/th&gt;
&lt;th&gt;Owner&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DNS + TCP + TLS (mobile to CDN edge)&lt;/td&gt;
&lt;td&gt;10ms&lt;/td&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CDN to origin gateway&lt;/td&gt;
&lt;td&gt;5ms&lt;/td&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gateway + auth (JWT local validation)&lt;/td&gt;
&lt;td&gt;5ms&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Feature flag resolution (local cache)&lt;/td&gt;
&lt;td&gt;1ms&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Feed service business logic&lt;/td&gt;
&lt;td&gt;30ms&lt;/td&gt;
&lt;td&gt;App team&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Primary DB query (indexed read)&lt;/td&gt;
&lt;td&gt;25ms&lt;/td&gt;
&lt;td&gt;App team&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recommendations service call&lt;/td&gt;
&lt;td&gt;35ms&lt;/td&gt;
&lt;td&gt;ML team&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Response serialization + compression&lt;/td&gt;
&lt;td&gt;8ms&lt;/td&gt;
&lt;td&gt;App team&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Return path network&lt;/td&gt;
&lt;td&gt;10ms&lt;/td&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total allocated&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;129ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Remaining headroom&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;21ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This leaves 21ms of headroom before hitting the 150ms SLO. Now someone proposes adding a "personalization boost" service call. Its measured p95 is 18ms.&lt;/p&gt;

&lt;p&gt;If you add it synchronously to the hot path, your headroom drops to 3ms. Any slight increase in traffic, any GC event in any service, any network hiccup — and you're over budget. The right conversation is: "Can this call happen asynchronously? Can we pre-compute and cache the result? Does it need to be in the hot path?" Often the answer is no, it doesn't.&lt;/p&gt;
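&lt;p&gt;The arithmetic behind that conversation is simple enough to keep in a script (a sketch; the ledger mirrors the table above):&lt;/p&gt;

```go
package main

import "fmt"

// headroom sums a budget ledger (ms per component) and reports
// what remains of the end-to-end SLO.
func headroom(sloMs float64, allocations map[string]float64) float64 {
	total := 0.0
	for _, ms := range allocations {
		total += ms
	}
	return sloMs - total
}

func main() {
	ledger := map[string]float64{
		"dns+tcp+tls": 10, "cdn-to-origin": 5, "gateway+auth": 5,
		"feature flags": 1, "feed logic": 30, "db query": 25,
		"recommendations": 35, "serialization": 8, "return path": 10,
	}
	left := headroom(150, ledger)
	fmt.Printf("headroom: %.0fms\n", left)                       // prints 21ms
	fmt.Printf("after adding the 18ms boost call: %.0fms\n", left-18) // prints 3ms
}
```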

&lt;p&gt;This is how you defend your latency budget: with numbers, not intuition.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;More Engineering Reads&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If this kind of production-depth engineering writing is useful to you, more of it lives at &lt;strong&gt;&lt;a href="https://www.kallis.in" rel="noopener noreferrer"&gt;kallis.in&lt;/a&gt;&lt;/strong&gt; — a growing collection of engineering content covering system design, architecture, observability, and real-world development patterns.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;The Takeaway&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The problem with latency budgets isn't that engineers don't care about them. It's that the damage is cumulative, invisible at the median, and always attributed to "the system getting more complex" rather than the specific architectural decisions that caused it.&lt;/p&gt;

&lt;p&gt;One extra hop is never just 5ms. It's 5ms at p50, and it's the entire tail distribution of that service — including its worst-day behavior — injected into every request that goes through it. Multiply that across five services added over six months, and you've turned a snappy product into something that users feel is "kinda slow sometimes."&lt;/p&gt;

&lt;p&gt;The tools to fight this aren't exotic. Distributed tracing, explicit budget allocation, p99 alerting, aggressive timeout tuning, and a cultural habit of treating every new synchronous hop as a cost that needs justification. That's it.&lt;/p&gt;

&lt;p&gt;Your architecture diagram shows boxes and arrows. Your users experience latency distributions. Make sure someone on your team is closing the gap between those two views — before your p99 starts closing it for you.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;#performance #distributedsystems #systemdesign #backend #architecture #microservices&lt;/em&gt;&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>backend</category>
      <category>distributedsystems</category>
      <category>performance</category>
    </item>
    <item>
<title>Building Reliable Agents with the Transactional Outbox Pattern and Redis Streams</title>
      <dc:creator>Ovaise Qayoom</dc:creator>
      <pubDate>Sun, 29 Mar 2026 21:46:23 +0000</pubDate>
      <link>https://dev.to/ovaiseq/building-reliable-agents-with-the-transactional-outbox-pattern-and-redis-streams-5c80</link>
      <guid>https://dev.to/ovaiseq/building-reliable-agents-with-the-transactional-outbox-pattern-and-redis-streams-5c80</guid>
      <description>&lt;h2&gt;
  
  
  When Your AI Agent Makes the Right Call But Your System Doesn't Follow Through
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;You built an agent that works. The model is sharp, the decisions are correct — and yet, customers are still getting burned. Here's why, and how to fix it for good.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;There's a moment every developer building AI agents eventually hits. The demo goes well. The model makes the right call. Everyone's impressed. Then you ship it, and three days later, a customer is furious because the refund the agent approved never actually happened.&lt;/p&gt;

&lt;p&gt;The agent didn't fail. The &lt;em&gt;system&lt;/em&gt; did.&lt;/p&gt;

&lt;p&gt;This is the problem nobody talks about when they're showing off agentic workflows: the gap between a decision being made and that decision being &lt;em&gt;trusted&lt;/em&gt; by the rest of your platform. That gap — the handoff — is where reliability goes to die.&lt;/p&gt;

&lt;p&gt;In this post, I want to talk about a pattern that solves this elegantly: the &lt;strong&gt;Transactional Outbox Pattern&lt;/strong&gt;, paired with &lt;strong&gt;Redis Streams&lt;/strong&gt;. It's not new. It's been a quiet workhorse in microservices architecture for years. But it's exactly the kind of infrastructure thinking that agentic systems desperately need right now.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I also write about patterns like this on my engineering blog at &lt;a href="https://www.kallis.in" rel="noopener noreferrer"&gt;kallis.in&lt;/a&gt; — feel free to check it out if this kind of systems design content interests you.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Real Problem: The Handoff
&lt;/h2&gt;

&lt;p&gt;Picture this: your customer support agent reads a conversation, applies the refund policy, and returns &lt;code&gt;"approve the refund"&lt;/code&gt;. Your service then does two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Updates the support case record to &lt;code&gt;refund_approved&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Publishes an event to billing so the money actually moves&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Seems simple. Until your process crashes between step 1 and step 2.&lt;/p&gt;

&lt;p&gt;Now your case record says "refund approved." Billing never got the event. The customer waits, gets nothing, calls support, and that conversation becomes a manual investigation. The worst part? &lt;strong&gt;If you check the database, everything looks fine.&lt;/strong&gt; The bug is invisible until someone notices the money never moved.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Agent Decision] → [Update Case ✅] → 💥 crash → [Publish Event ❌]

![Diagram 1](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/i5r1dh918ol15s5t41aq.webp)
Result: Case says "approved". Customer gets nothing.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't an AI problem. It's a &lt;strong&gt;handoff problem&lt;/strong&gt;. And it was around long before LLMs were in the picture.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why "Just Retry" Doesn't Save You
&lt;/h2&gt;

&lt;p&gt;The instinctive fix most developers reach for is retries. "If the publish fails, retry it." Reasonable — until you think about &lt;em&gt;where&lt;/em&gt; the failure happens.&lt;/p&gt;

&lt;p&gt;Retries only help if your application &lt;strong&gt;still knows it has something to retry&lt;/strong&gt;. If the process crashes after the state update but before the event is written anywhere, there's nothing left to retry. The knowledge of "I need to publish this event" died with the process.&lt;/p&gt;

&lt;p&gt;This is the core distinction:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;What it solves&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Retries&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Delivering an event that &lt;em&gt;already exists&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Transactional Outbox&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ensuring the event &lt;em&gt;exists in the first place&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Once you see it that way, the Outbox pattern stops feeling like ceremony and starts feeling like basic correctness.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Transactional Outbox Pattern Actually Does
&lt;/h2&gt;

&lt;p&gt;The idea is straightforward: &lt;strong&gt;when your business state changes, write an event record in the same atomic operation.&lt;/strong&gt; Don't publish the event directly to a message broker. Write it to an outbox table (or stream) that lives alongside your data, in the same commit.&lt;/p&gt;

&lt;p&gt;A dedicated relay process then reads from the outbox and delivers events to downstream systems. If delivery fails, the event is still in the outbox. Nothing is lost.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before (fragile):
  1. UPDATE case SET status = 'refund_approved'
  2. PUBLISH RefundApproved to billing  ← can fail silently

After (outbox pattern):
  1. ATOMICALLY:
       UPDATE case SET status = 'refund_approved'
       INSERT INTO outbox (event_type, payload) VALUES ('RefundApproved', {...})
  2. Relay picks up outbox row → delivers to billing (retryable, durable)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The request path now only has &lt;strong&gt;one job&lt;/strong&gt;: commit the decision and the event together. Everything after that is recoverable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Redis Streams Fit This Beautifully
&lt;/h2&gt;

&lt;p&gt;The Transactional Outbox pattern is most commonly associated with Kafka + Debezium. That stack is powerful, but it's also heavy. You're managing Kafka brokers, ZooKeeper (or KRaft), Debezium connectors, schema registries — and you haven't even started on your actual application yet.&lt;/p&gt;

&lt;p&gt;Redis Streams offer a much lighter path that preserves the semantics you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Append-only log&lt;/strong&gt; — events are written in order and stay there&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consumer groups&lt;/strong&gt; — billing, notifications, and CRM sync can each consume independently, at their own pace&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pending entry tracking&lt;/strong&gt; — Redis knows which messages have been delivered but not yet acknowledged&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in recovery&lt;/strong&gt; — unacknowledged messages stay in the pending list and can be reclaimed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the biggest win isn't any of those features individually. It's this: &lt;strong&gt;if your application state also lives in Redis, the case update and the outbox append can share a single &lt;code&gt;MULTI/EXEC&lt;/code&gt; transaction.&lt;/strong&gt; One atomic commit. No dual-write problem.&lt;/p&gt;

&lt;p&gt;With Kafka, you're coordinating two separate distributed systems. With Redis Streams, it's one.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;Here's how the pieces fit together for a customer support agent scenario:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;State lives in a Redis Hash:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;support:&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;tenant-acme&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;:case:case&lt;/span&gt;&lt;span class="mi"&gt;-123&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;status:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"refund_approved"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Outbox lives in a Redis Stream:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;support:&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;tenant-acme&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;:outbox&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;RefundApproved&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The hash tag &lt;code&gt;{tenant-acme}&lt;/code&gt; is doing important work here. In a clustered Redis setup, keys with the same hash tag are guaranteed to land in the same slot — which is what makes them eligible for the same transaction. Miss this and your &lt;code&gt;MULTI/EXEC&lt;/code&gt; will fail in production in ways that are maddening to debug.&lt;/p&gt;

&lt;p&gt;From there, downstream consumer groups each process the stream independently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                          ┌─► billing-cg      → issues refund
outbox stream ────────────┼─► notifications-cg → emails customer  
                          └─► crm-sync-cg     → updates CRM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each group moves at its own pace. If billing is slow, notifications aren't blocked. If CRM sync has a bug, billing keeps working.&lt;/p&gt;
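&lt;p&gt;Setting this up is one command per group. A sketch with redis-cli, using the stream and group names from the diagram (&lt;code&gt;MKSTREAM&lt;/code&gt; creates the stream if it doesn't exist yet, and the &lt;code&gt;0&lt;/code&gt; start ID gives each group the full history rather than only new entries):&lt;/p&gt;

```
XGROUP CREATE support:{tenant-acme}:outbox billing-cg       0 MKSTREAM
XGROUP CREATE support:{tenant-acme}:outbox notifications-cg 0 MKSTREAM
XGROUP CREATE support:{tenant-acme}:outbox crm-sync-cg      0 MKSTREAM
```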

&lt;p&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuoijw6imx7x139yu2l3o.jpg" alt="Diagram 2" width="800" height="450"&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade-offs Worth Thinking Through
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Source of truth colocation
&lt;/h3&gt;

&lt;p&gt;The pattern is strongest when your business state and outbox live in the same datastore. If your case data is in Postgres and your outbox is in Redis, you're back in dual-write territory with extra steps. Colocate them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Per-tenant streams over global streams
&lt;/h3&gt;

&lt;p&gt;A single global outbox stream sounds tempting but becomes a pain in clustered Redis. Per-tenant streams keep related events together, enable better ordering guarantees, and make incident investigation dramatically easier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Idempotency is non-negotiable
&lt;/h3&gt;

&lt;p&gt;The outbox makes the handoff durable, but it doesn't make effects exactly-once. If a worker crashes after processing but before acknowledging, another worker will retry the same event. Your downstream handlers &lt;em&gt;must&lt;/em&gt; be safe to run more than once. Treat stream entries as immutable facts, not mutable instructions.&lt;/p&gt;
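&lt;p&gt;One concrete way to get there is to key every side effect on the event's &lt;code&gt;event_id&lt;/code&gt; and record the IDs you've already acted on. A minimal sketch, using an in-memory set as a stand-in for something durable (in production this check would live in Redis via &lt;code&gt;SET ... NX&lt;/code&gt; or in a unique constraint in the billing database); &lt;code&gt;issueRefund&lt;/code&gt; here is a placeholder, not the real gateway call:&lt;/p&gt;

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Idempotent handler sketch: the same stream entry can be delivered
// twice (crash after processing, before XACK), but the refund is
// issued at most once per event_id.
public class IdempotentRefundHandler {
    // Stand-in for a durable dedup store. In-memory only works for a
    // single process; real systems need Redis SET ... NX or a DB
    // unique constraint so the check survives restarts.
    private final Set<String> processedEventIds = new HashSet<>();
    private int refundsIssued = 0;

    public void handle(Map<String, String> event) {
        String eventId = event.get("event_id");
        if (!processedEventIds.add(eventId)) {
            return; // duplicate delivery: already handled, safe to ack again
        }
        issueRefund(event.get("refund_id"));
    }

    private void issueRefund(String refundId) {
        refundsIssued++; // placeholder for the real billing call
    }

    public int refundsIssued() {
        return refundsIssued;
    }

    public static void main(String[] args) {
        IdempotentRefundHandler handler = new IdempotentRefundHandler();
        Map<String, String> event = Map.of("event_id", "evt-1", "refund_id", "r-42");
        handler.handle(event);
        handler.handle(event); // redelivery of the same entry
        System.out.println("refunds issued: " + handler.refundsIssued()); // 1
    }
}
```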

&lt;h3&gt;
  
  
  Retention needs a policy before go-live
&lt;/h3&gt;

&lt;p&gt;An outbox is a log, and logs grow. Trim too aggressively and you lose your replay window. Never trim and you have a slow-growing operational problem. Set a &lt;code&gt;MAXLEN&lt;/code&gt; policy before you ship and revisit it regularly.&lt;/p&gt;
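&lt;p&gt;Capping the stream is a one-line change at write time, plus an optional periodic trim. With redis-cli (the &lt;code&gt;~&lt;/code&gt; makes trimming approximate, which is far cheaper; the one-million-entry cap and the field values are illustrative):&lt;/p&gt;

```
XADD support:{tenant-acme}:outbox MAXLEN ~ 1000000 * event_type RefundApproved event_id evt-123
XTRIM support:{tenant-acme}:outbox MAXLEN ~ 1000000
```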

&lt;h3&gt;
  
  
  Redis is now part of your correctness model
&lt;/h3&gt;

&lt;p&gt;If the outbox carries refunds and escalations, Redis isn't a cache anymore. It's part of your durability story. That means thinking seriously about replication, AOF persistence, failover, and what happens during a Redis primary failure.&lt;/p&gt;
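&lt;p&gt;Concretely, that means the relevant &lt;code&gt;redis.conf&lt;/code&gt; settings stop being optional. A starting point, not a tuned recommendation:&lt;/p&gt;

```
# Persist the outbox: append-only file, fsynced every second
# (bounds data loss to roughly one second of events on a crash)
appendonly yes
appendfsync everysec

# Refuse writes unless at least one replica is connected
# and lagging by no more than 10 seconds
min-replicas-to-write 1
min-replicas-max-lag 10
```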




&lt;h2&gt;
  
  
  Let's Look at the Code
&lt;/h2&gt;

&lt;p&gt;Here's how this looks in practice using Java and Jedis. The same concepts translate cleanly to any Redis client.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key structure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;SupportKeys&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;caseKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;SupportKeys&lt;/span&gt; &lt;span class="nf"&gt;forCase&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;tenantId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;caseId&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;hashTag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"{"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tenantId&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"}"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;SupportKeys&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"support:"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;hashTag&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;":case:"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;caseId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"support:"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;hashTag&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;":outbox"&lt;/span&gt;
        &lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The hash tag ensures both keys land in the same Redis cluster slot — making them transactable together.&lt;/p&gt;

&lt;h3&gt;
  
  
  The atomic write — the heart of the pattern
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;RefundCommitted&lt;/span&gt; &lt;span class="nf"&gt;approveRefund&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RefundDecision&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;SupportKeys&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SupportKeys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;forCase&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tenantId&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;caseId&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

    &lt;span class="c1"&gt;// Case state update&lt;/span&gt;
    &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;caseFields&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LinkedHashMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;();&lt;/span&gt;
    &lt;span class="n"&gt;caseFields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"refund_approved"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;caseFields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"updated_at"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;decidedAt&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;toString&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="c1"&gt;// ... other fields&lt;/span&gt;

    &lt;span class="c1"&gt;// Outbox event&lt;/span&gt;
    &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;outboxFields&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LinkedHashMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;();&lt;/span&gt;
    &lt;span class="n"&gt;outboxFields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"event_type"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"RefundApproved"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;outboxFields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"event_id"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;eventId&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="n"&gt;outboxFields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"refund_id"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;refundId&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="c1"&gt;// ... other fields&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;AbstractTransaction&lt;/span&gt; &lt;span class="n"&gt;redisTx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jedis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;multi&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;redisTx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;hset&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;caseKey&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;caseFields&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;           &lt;span class="c1"&gt;// update state&lt;/span&gt;
        &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;StreamEntryID&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;streamId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
            &lt;span class="n"&gt;redisTx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;xadd&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;                  &lt;span class="c1"&gt;// write outbox event&lt;/span&gt;
                         &lt;span class="nc"&gt;StreamEntryID&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;NEW_ENTRY&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outboxFields&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

        &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redisTx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;exec&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;IllegalStateException&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Transaction aborted"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;RefundCommitted&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;caseId&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;eventId&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
                                   &lt;span class="n"&gt;streamId&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;toString&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the entire correctness guarantee in one block. Either both writes happen or neither does. The downstream world never sees a case that changed without a corresponding event.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgxi0rh3oigwn498eel5s.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgxi0rh3oigwn498eel5s.jpg" alt="Diagram 3" width="800" height="308"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The consumer — processing with recovery
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;tenantId&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;InterruptedException&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;outboxKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SupportKeys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;forCase&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenantId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"unused"&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;createConsumerGroup&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;currentThread&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;isInterrupted&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// First: drain anything pending from a previous crashed worker&lt;/span&gt;
        &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;StreamMessage&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;readGroup&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;PENDING_ID&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;pending&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isEmpty&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;processEntries&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pending&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;// Then: pick up new entries&lt;/span&gt;
        &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;StreamMessage&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;fresh&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;readGroup&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;NEW_ENTRY_ID&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;fresh&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isEmpty&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;processEntries&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fresh&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sleep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200L&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key detail here is the two-pass approach: &lt;strong&gt;always drain pending entries first&lt;/strong&gt;. If a worker crashed mid-processing, those entries are still sitting in the pending list with the previous consumer's name attached. This loop ensures they get picked up and retried — which is exactly the recovery behavior the pattern is designed to provide.&lt;/p&gt;

&lt;p&gt;Processing looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;processEntries&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;StreamMessage&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;StreamMessage&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;eventType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"event_type"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"RefundApproved"&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;equals&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;eventType&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;billingGateway&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;issueRefund&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"refund_id"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
                    &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"customer_id"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt;
            &lt;span class="c1"&gt;// Acknowledge only after successful processing&lt;/span&gt;
            &lt;span class="n"&gt;jedis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;xack&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;BILLING_GROUP_NAME&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Log and move on — message stays pending for retry&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Failed to process {}: {}"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getMessage&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Acknowledge only after you've successfully processed. Never before. That's what keeps the pending list accurate and recovery reliable.&lt;/p&gt;
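&lt;p&gt;One gap worth noting: with the catch block above, a message that fails on every attempt stays in the pending list forever. A common guard (my addition, not something this article prescribes) is to read the per-message delivery counter that &lt;code&gt;XPENDING&lt;/code&gt; reports and dead-letter the entry after a bounded number of attempts. A minimal pure-Java sketch of that decision, with &lt;code&gt;MAX_DELIVERIES&lt;/code&gt; as an illustrative threshold:&lt;/p&gt;

```java
// Sketch of a poison-message guard. MAX_DELIVERIES and the string return
// values are illustrative assumptions, not part of the article's code.
public class RetryPolicy {
    static final long MAX_DELIVERIES = 5;

    // deliveryCount would come from XPENDING's per-message delivery counter.
    public static String decide(long deliveryCount) {
        if (deliveryCount > MAX_DELIVERIES) {
            // Park the entry (e.g. XADD it to a dead-letter stream), then
            // XACK the original so it stops blocking the pending list.
            return "DEAD_LETTER";
        }
        return "RETRY";
    }

    public static void main(String[] args) {
        System.out.println(decide(1));  // RETRY
        System.out.println(decide(10)); // DEAD_LETTER
    }
}
```

&lt;p&gt;Dead-lettered entries can then be inspected and replayed manually instead of being retried forever.&lt;/p&gt;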




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;If you're building AI agents that trigger real-world actions, here's what to walk away with:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The agent isn't the reliability problem.&lt;/strong&gt; The handoff between the agent's decision and the downstream work is where things break.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Save state, then publish" is fragile by design.&lt;/strong&gt; Any crash in the gap creates invisible inconsistency that only surfaces later, during customer complaints or manual audits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Transactional Outbox pattern removes the worst failure mode&lt;/strong&gt; by making the decision and the event a single atomic commit. If the commit succeeds, the event exists and delivery becomes a recoverable problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Redis Streams are a lightweight fit&lt;/strong&gt;, especially when your application state already lives in Redis. The hash tag design for cluster slot colocation is the detail that makes the pattern work in production.&lt;/p&gt;
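&lt;p&gt;As a quick illustration of why hash tags work (the key names below are my own, not the article's): Redis Cluster hashes only the substring between the first &lt;code&gt;{&lt;/code&gt; and the next &lt;code&gt;}&lt;/code&gt;, so any keys sharing that tag map to the same slot and can participate in a single &lt;code&gt;MULTI&lt;/code&gt;/&lt;code&gt;EXEC&lt;/code&gt;:&lt;/p&gt;

```java
// Demonstrates the Redis Cluster hash-tag rule: if a key contains a
// non-empty {...} section, only that section is hashed for slot routing.
public class HashTagDemo {

    // Mirrors the cluster rule: hash the first non-empty {...} if present,
    // otherwise hash the whole key.
    public static String hashTag(String key) {
        int open = key.indexOf('{');
        if (open == -1) return key;
        int close = key.indexOf('}', open + 1);
        if (close == -1 || close == open + 1) return key;
        return key.substring(open + 1, close);
    }

    public static void main(String[] args) {
        String stateKey = "agent:{cust42}:state";
        String outboxKey = "agent:{cust42}:outbox";
        // Same tag, same slot: a transaction touching both keys is
        // legal in cluster mode.
        System.out.println(hashTag(stateKey).equals(hashTag(outboxKey))); // true
    }
}
```

&lt;p&gt;Without the shared tag, the state key and the outbox stream could land on different cluster nodes and the atomic commit would be impossible.&lt;/p&gt;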

&lt;p&gt;&lt;strong&gt;Idempotency and retention aren't optional.&lt;/strong&gt; Design for them before you ship, not after you hit the problem.&lt;/p&gt;
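&lt;p&gt;For idempotency, the consumer needs to recognize redeliveries of an already-processed event. A self-contained sketch of that guard; in production the &lt;code&gt;HashSet&lt;/code&gt; would be a Redis key written with &lt;code&gt;SET NX&lt;/code&gt; and a TTL, and retention would be handled by capping the stream with &lt;code&gt;XTRIM&lt;/code&gt;:&lt;/p&gt;

```java
import java.util.HashSet;

// Sketch of consumer-side idempotency: remember each processed event id
// and skip duplicates. The in-memory HashSet is a stand-in so the example
// is self-contained; a real deployment would use Redis SET NX with a TTL.
public class IdempotencyGuard {
    private final HashSet seen = new HashSet();

    // Returns true only the first time an event id is observed.
    public boolean firstDelivery(String eventId) {
        return seen.add(eventId);
    }

    public static void main(String[] args) {
        IdempotencyGuard guard = new IdempotencyGuard();
        System.out.println(guard.firstDelivery("evt-1")); // true: process it
        System.out.println(guard.firstDelivery("evt-1")); // false: redelivery, skip
    }
}
```

&lt;p&gt;The guard belongs before the side effect (the refund call), so a crash between processing and &lt;code&gt;XACK&lt;/code&gt; cannot cause a double charge on redelivery.&lt;/p&gt;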

&lt;p&gt;Agentic systems are gaining capability quickly. But capability without reliability is just a more impressive way to fail. Patterns like this one are what turn flashy demos into trustworthy production systems.&lt;/p&gt;




&lt;h1&gt;
  
  
  &lt;a href="https://www.kallis.in/blog/building-reliable-agents-with-the-transactional-outbox-pattern-and-redis-streams" rel="noopener noreferrer"&gt;Read More&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;If you found this useful, I write about systems design, distributed patterns, and engineering craft at &lt;a href="https://www.kallis.in" rel="noopener noreferrer"&gt;kallis.in&lt;/a&gt;. Come say hi.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;#ai&lt;/code&gt; &lt;code&gt;#redis&lt;/code&gt; &lt;code&gt;#distributedsystems&lt;/code&gt; &lt;code&gt;#architecture&lt;/code&gt; &lt;code&gt;#agents&lt;/code&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>redis</category>
      <category>ovaiseqayoom</category>
    </item>
    <item>
      <title>I Just Built a Production-Ready Authentication System with Supabase</title>
      <dc:creator>Ovaise Qayoom</dc:creator>
      <pubDate>Sat, 28 Mar 2026 07:15:08 +0000</pubDate>
      <link>https://dev.to/ovaiseq/i-just-built-a-production-ready-authentication-system-with-supabase-275</link>
      <guid>https://dev.to/ovaiseq/i-just-built-a-production-ready-authentication-system-with-supabase-275</guid>
      <description>&lt;h1&gt;
  
  
  Full Guide (Next.js App Router)
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Hey Dev Community!&lt;/strong&gt; 👋&lt;/p&gt;

&lt;p&gt;If you've ever struggled with &lt;strong&gt;real-world authentication&lt;/strong&gt; in a Next.js project — dealing with session expiry, middleware headaches, email verification, protected routes, or proper &lt;strong&gt;Row Level Security (RLS)&lt;/strong&gt; — this one's for you.&lt;/p&gt;

&lt;p&gt;I just published a &lt;strong&gt;complete, battle-tested guide&lt;/strong&gt; on my blog that walks you through building a &lt;strong&gt;full authentication system&lt;/strong&gt; using &lt;strong&gt;Supabase&lt;/strong&gt; and &lt;strong&gt;Next.js App Router&lt;/strong&gt;. No half-baked examples. No missing pieces. Everything you need to ship a secure, production-grade auth flow today.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Guide is Different
&lt;/h2&gt;

&lt;p&gt;Most Supabase tutorials stop at &lt;code&gt;signInWithPassword&lt;/code&gt; and call it a day.&lt;br&gt;&lt;br&gt;
This one goes all the way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Middleware + HTTP-only cookies&lt;/strong&gt; with &lt;code&gt;@supabase/ssr&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Email verification&lt;/strong&gt; + &lt;strong&gt;password reset&lt;/strong&gt; flows&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Protected routes&lt;/strong&gt; using &lt;code&gt;getUser()&lt;/code&gt; on the server&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Role-Based Access Control (RBAC)&lt;/strong&gt; with Postgres RLS + profiles table&lt;/li&gt;
&lt;li&gt;✅ Proper client vs server Supabase clients&lt;/li&gt;
&lt;li&gt;✅ Security best practices &amp;amp; common pitfalls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Read the full step-by-step tutorial here:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://www.kallis.in/blog/build-full-authentication-system-supabase" rel="noopener noreferrer"&gt;How to Build a Full Authentication System with Supabase (Real Project Setup)&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What You'll Learn
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Setting up &lt;strong&gt;Supabase Auth&lt;/strong&gt; correctly in Next.js 15/16&lt;/li&gt;
&lt;li&gt;Configuring &lt;strong&gt;middleware&lt;/strong&gt; for seamless session management&lt;/li&gt;
&lt;li&gt;Building a &lt;strong&gt;profiles table&lt;/strong&gt; with RLS policies&lt;/li&gt;
&lt;li&gt;Handling &lt;strong&gt;auth redirects&lt;/strong&gt; and edge cases&lt;/li&gt;
&lt;li&gt;The right mental model for scalable auth architecture&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The guide is written from a &lt;strong&gt;real project perspective&lt;/strong&gt; — exactly how I implement auth for client work and SaaS products.&lt;/p&gt;

&lt;h2&gt;
  
  
  More from Kallis Blog
&lt;/h2&gt;

&lt;p&gt;Love deep-dive technical content? Check out these recent posts too:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://www.kallis.in/blog/future-of-react-server-components" rel="noopener noreferrer"&gt;React Server Components in 2026: The Complete Guide to Faster, Smarter React Apps&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://www.kallis.in/blog/mastering-tailwind-css" rel="noopener noreferrer"&gt;Mastering Tailwind CSS in 2026: The Complete Guide to Scalable, Production-Grade UI Systems&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Browse the full blog archive:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
👉 &lt;strong&gt;&lt;a href="https://www.kallis.in/blog" rel="noopener noreferrer"&gt;All Articles on Kallis Blog&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Main website:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
👉 &lt;strong&gt;&lt;a href="https://www.kallis.in" rel="noopener noreferrer"&gt;www.kallis.in&lt;/a&gt;&lt;/strong&gt; – Web Development, SaaS, Backend APIs, Mobile Apps &amp;amp; SEO Services&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Would love your feedback!&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Drop a comment below if you’ve tried Supabase auth before. What was the biggest pain point for you?  &lt;/p&gt;

&lt;p&gt;If this guide helped (or if you want me to cover OAuth, MFA, or anything else next), give it a ❤️ and share it with your network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Follow me on X&lt;/strong&gt; for more Next.js + Supabase + modern web dev content: &lt;a href="https://twitter.com/kallisx1" rel="noopener noreferrer"&gt;@ovaiseqayoom&lt;/a&gt; (or just search Ovaise Qayoom).&lt;/p&gt;

&lt;p&gt;Happy coding! 🚀&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Published by Ovaise Qayoom | Kallis Blog&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>authjs</category>
      <category>ai</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
