<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Emmanuel Onuiteshi</title>
    <description>The latest articles on DEV Community by Emmanuel Onuiteshi (@immanuel_nonzo).</description>
    <link>https://dev.to/immanuel_nonzo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F826898%2F1c071f4f-24eb-4156-b6c9-10a9049c22c2.jpeg</url>
      <title>DEV Community: Emmanuel Onuiteshi</title>
      <link>https://dev.to/immanuel_nonzo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/immanuel_nonzo"/>
    <language>en</language>
    <item>
      <title>Scaling to 1 Million Users : Load Balancing &amp; Caching Strategies</title>
      <dc:creator>Emmanuel Onuiteshi</dc:creator>
      <pubDate>Mon, 25 May 2026 22:45:28 +0000</pubDate>
      <link>https://dev.to/immanuel_nonzo/scaling-to-1-million-users-load-balancing-caching-strategies-2g8l</link>
      <guid>https://dev.to/immanuel_nonzo/scaling-to-1-million-users-load-balancing-caching-strategies-2g8l</guid>
      <description>&lt;p&gt;&lt;em&gt;You build a URL shortener in a weekend. It works perfectly. Then it goes viral.&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;At first it’s just your friends. Pages load instantly, and they love what you’ve built. They share the link with others, and you watch the user count tick towards a hundred. That quiet excitement hits; you’ve made something real. Then more people start using it.&lt;/p&gt;

&lt;p&gt;You keep opening your dashboard. The numbers are climbing faster than you can refresh. You are half excited, half terrified. Hundreds are turning into thousands. Then the notifications start: the app is slow, links are not redirecting, users are complaining publicly. The same system that handled a hundred users without blinking is now falling apart under a thousand. You have not changed a single line of code. &lt;br&gt;
So what went wrong?&lt;/p&gt;

&lt;p&gt;Nothing went wrong. You just hit the wall that every growing system hits eventually. &lt;br&gt;
The question is:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What do you actually do?&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Scaling Roadmap
&lt;/h3&gt;

&lt;p&gt;Scaling isn’t a single decision; it’s a series of targeted upgrades, each one unlocking the next order of magnitude. Here’s the progression most high-traffic apps follow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flopqrpxoq9xkcyxdahk4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flopqrpxoq9xkcyxdahk4.png" alt=" " width="563" height="58"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;•Single Server - handles your first few thousand users&lt;br&gt;
•Load Balancer - distributes traffic across multiple servers&lt;br&gt;
•Caching Layer - serves popular data from memory instead of the database&lt;br&gt;
•Content Delivery Network (CDN) - pushes content closer to users globally&lt;br&gt;
•Distributed Cache - spreads cache across multiple machines for millions of users&lt;/p&gt;

&lt;p&gt;Single Server : Your first deployment runs everything on one machine: request handling, database queries, link generation, and page serving. This is perfectly fine up to a few thousand users; the pages load fast and the setup is simple. Don’t over-engineer this stage.&lt;/p&gt;

&lt;p&gt;Load Balancer : Once you’re in the tens of thousands, a single server starts to buckle. Requests queue up, response times climb, and occasional timeouts start appearing. A load balancer sits in front of your servers and distributes incoming traffic across a pool of app servers, ensuring no single machine becomes a bottleneck. Traffic spikes that would have crashed your app are now absorbed gracefully.&lt;/p&gt;

&lt;p&gt;Caching Layer : At hundreds of thousands of users, a pattern becomes obvious: the same short codes are being resolved over and over. Instead of hitting the database every time, a cache layer stores the most frequently accessed mappings in memory. A lookup that previously cost a 40ms database round-trip now completes in under 1ms. Database load drops dramatically, and your app can handle far more concurrent users on the same hardware.&lt;/p&gt;

&lt;p&gt;Content Delivery Network (CDN) : Once your users are spread across the globe, physical distance becomes a problem. A CDN places copies of your static assets and cacheable responses at edge locations around the world. A user in Lagos, Berlin, or Sydney gets their redirect served from a nearby edge node rather than your origin server in, say, Virginia. Latency can drop from hundreds of milliseconds to tens, or even single digits for users close to an edge node.&lt;/p&gt;

&lt;p&gt;Distributed Caching : At millions of users, even a single powerful cache server becomes a constraint. A distributed cache, such as Redis Cluster, spreads data across multiple nodes. The most popular short links are served instantly from memory, read throughput scales horizontally, and the system stays fast even under massive, sustained load.&lt;/p&gt;
&lt;h3&gt;
  
  
  Load Balancing: Distributing Traffic Across Servers
&lt;/h3&gt;

&lt;p&gt;Round-Robin : Round-robin is the simplest traffic distribution strategy: each incoming request is sent to the next server in rotation, cycling back to the start. It works well when servers are equally capable and traffic is fairly uniform. For a URL shortener handling stateless redirect requests, round-robin is a reasonable starting point at modest scale.&lt;/p&gt;

&lt;p&gt;But round-robin has a critical blind spot. It knows nothing about data locality. If one server has cached a hot short code in memory, round-robin may send the next request for that code to a different server entirely, causing a cache miss. At scale, this causes unnecessary database pressure and unpredictable latency. Adding or removing servers also reshuffles which server handles which requests, wiping out accumulated cache state.&lt;/p&gt;

&lt;p&gt;The Rehashing Problem : Imagine your URL shortener has four servers, each caching a quarter of your popular short codes. You add a fifth server to handle increased load. With naive modulo hashing &lt;code&gt;(hash(short_code) % number_of_servers)&lt;/code&gt;, roughly 80% of your cache keys now map to different servers. Users experience redirect failures and slowdowns while servers frantically rebuild their caches. &lt;br&gt;
It’s like rearranging a warehouse mid-shipment.&lt;/p&gt;
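
&lt;p&gt;The 80% figure can be checked with a few lines of Python. This is a minimal sketch, assuming hash values are spread uniformly; the integers 0 to 19,999 stand in for hashed short codes, and &lt;code&gt;moved_fraction&lt;/code&gt; is a name made up for this illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def moved_fraction(num_keys, old_servers, new_servers):
    """Fraction of keys whose server changes under modulo placement."""
    moved = sum(
        1
        for key_hash in range(num_keys)
        if key_hash % old_servers != key_hash % new_servers
    )
    return moved / num_keys


# Integers 0..19999 stand in for uniformly distributed hash values.
print(moved_fraction(20_000, 4, 5))  # prints 0.8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A key keeps its server only when its hash gives the same remainder for 4 and for 5, which happens for 4 out of every 20 consecutive values. The other 16 out of 20 move.&lt;/p&gt;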

&lt;p&gt;Consistent Hashing: The Production Solution&lt;br&gt;
Consistent hashing solves this cleanly. Picture a ring. Servers occupy fixed positions along the ring, and each short code is hashed to a point on the ring. Requests route clockwise to the nearest server. When you add a new server, only the keys in the arc immediately preceding its position need to migrate: roughly 1/N of total keys, where N is the number of servers after the addition. Virtual nodes (multiple positions per server) smooth out load distribution even further.&lt;/p&gt;

&lt;p&gt;For your URL shortener, consistent hashing on the short_code ensures that popular links reliably route to the server holding their cache, and that adding capacity during a traffic spike doesn’t cascade into a cache stampede.&lt;/p&gt;
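
&lt;p&gt;The ring described above can be sketched in Python. This is an illustrative toy, not a production router: the &lt;code&gt;HashRing&lt;/code&gt; class, the server names, the use of MD5 for ring positions, and the choice of 100 virtual nodes per server are all assumptions made for the example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import bisect
import hashlib


def point(label):
    """Map a string to a position on the ring (a large integer)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self.positions = []  # sorted ring positions
        self.owner = {}      # ring position to server name
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server appears at many positions (virtual nodes).
        for i in range(self.vnodes):
            pos = point(f"{server}#{i}")
            self.owner[pos] = server
            bisect.insort(self.positions, pos)

    def lookup(self, short_code):
        # First position clockwise from the key, wrapping past the end.
        idx = bisect.bisect_right(self.positions, point(short_code))
        return self.owner[self.positions[idx % len(self.positions)]]


codes = [f"code{i}" for i in range(10_000)]
ring = HashRing(["app1", "app2", "app3", "app4"])
before = {c: ring.lookup(c) for c in codes}
ring.add("app5")
moved = [c for c in codes if ring.lookup(c) != before[c]]
print(len(moved) / len(codes))  # roughly 0.2; the exact value depends on the hash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Every key that moves lands on the new server, and the other four servers keep the rest of their cached entries. Compare that with the modulo scheme, where most keys change owner.&lt;/p&gt;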

&lt;p&gt;Here’s how round-robin looks in an NGINX upstream configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;upstream app_servers {
    server app1.example.com;
    server app2.example.com;
    server app3.example.com;
}

server {
    location / {
        proxy_pass http://app_servers;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Algorithm comparison:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
    &lt;tr&gt;
        &lt;td&gt;Algorithm&lt;/td&gt;
        &lt;td&gt;Keys Moved on Change&lt;/td&gt;
        &lt;td&gt;Used By&lt;/td&gt;
        &lt;td&gt;Best For&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Round-Robin&lt;/td&gt;
        &lt;td&gt;N/A (no cache affinity)&lt;/td&gt;
        &lt;td&gt;NGINX default&lt;/td&gt;
        &lt;td&gt;Stateless, uniform requests&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Mod-N Hashing&lt;/td&gt;
        &lt;td&gt;~80% when N changes&lt;/td&gt;
        &lt;td&gt;Legacy systems&lt;/td&gt;
        &lt;td&gt;Static server pools only&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Consistent Hashing&lt;/td&gt;
        &lt;td&gt;~1/N (minimum possible)&lt;/td&gt;
        &lt;td&gt;DynamoDB, Cassandra, Akamai&lt;/td&gt;
        &lt;td&gt;Dynamic scaling, cache affinity&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Power of Two Choices&lt;/td&gt;
        &lt;td&gt;N/A (load-aware)&lt;/td&gt;
        &lt;td&gt;AWS Lambda, Envoy&lt;/td&gt;
        &lt;td&gt;Multi-LB environments, service mesh&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Real-world precedent : Netflix applies consistent hashing to route requests to the servers holding cached video segment data. Popular content is served without repeatedly querying origin storage, keeping playback smooth even under massive load. The same principle applies directly to your URL shortener.&lt;/p&gt;

&lt;h2&gt;
  
  
  HTTP Caching: Making the Web Faster
&lt;/h2&gt;

&lt;p&gt;HTTP caching is built into the web protocol. When configured correctly, browsers and CDN edge nodes store responses locally, eliminating redundant trips to your origin servers. The key headers are:&lt;/p&gt;

&lt;p&gt;•Cache-Control - defines how long content should be stored and by whom&lt;br&gt;
•ETag - a fingerprint that lets clients check whether cached content is still fresh&lt;br&gt;
•Vary - specifies which request headers affect the cached response&lt;/p&gt;

&lt;p&gt;Understanding Cache-Control : A common misconception: Cache-Control: no-cache does not mean “don’t cache.” It means “cache, but revalidate before serving.” The response can live in memory; it just can’t be served without checking freshness first. Understanding this distinction is essential to using caching effectively.&lt;/p&gt;

&lt;p&gt;A more powerful pattern is splitting browser and CDN TTLs:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Cache-Control: public, max-age=60, s-maxage=3600&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This tells browsers to cache for 60 seconds (so users get fast responses on repeated clicks) and CDNs to cache for an hour (so your origin servers rarely see requests for popular links). Browsers validate frequently; CDNs absorb the bulk of the load.&lt;/p&gt;

&lt;p&gt;ETags and Conditional Requests : On the first request, your server returns a response with an ETag header, a hash or version identifier. The browser stores it. On the next request, the browser sends the ETag back in an If-None-Match header. If the content hasn’t changed, the server responds with 304 Not Modified. No body is sent, bandwidth is saved, and the user experiences a near-instant load. For a URL shortener, this matters for any metadata pages where content changes infrequently.&lt;/p&gt;
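
&lt;p&gt;A rough sketch of that exchange in Python, framework-free. The &lt;code&gt;make_etag&lt;/code&gt; and &lt;code&gt;respond&lt;/code&gt; helpers are hypothetical names, the ETag is assumed to be a truncated SHA-256 of the body, and the comparison is simplified to a single exact match (a real If-None-Match header may carry several values or a wildcard).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib


def make_etag(body):
    """Derive a quoted ETag value from the response body bytes."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a GET with an optional If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "public, max-age=60"}
    if if_none_match == etag:
        return 304, headers, b""   # client copy is still valid; send no body
    return 200, headers, body


page = b"stats for short code abc123"
status, headers, _ = respond(page)
print(status)                                          # 200 on the first request
print(respond(page, headers["ETag"])[0])               # 304 when the ETag still matches
print(respond(b"updated stats", headers["ETag"])[0])   # 200 after the content changes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;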

&lt;p&gt;Stale-while-revalidate : &lt;code&gt;stale-while-revalidate&lt;/code&gt; allows serving an expired cache entry immediately while fetching a fresh copy in the background. Applied to your URL shortener, this means a redirect response can be served from cache even after its TTL expires, with the cache refreshed transparently. Users never see a delay during high-traffic bursts.&lt;/p&gt;

&lt;p&gt;The Vary Header Trap&lt;br&gt;
Vary: User-Agent forces caches to store a separate copy for every distinct browser and device type. This silently destroys cache efficiency: every variation gets its own slot, and cache hit rates collapse. Avoid broad Vary headers unless you’re genuinely serving different content per device.&lt;/p&gt;
&lt;h2&gt;
  
  
  CDN Architecture: Bringing Content Closer to Users
&lt;/h2&gt;

&lt;p&gt;At its core, a CDN is a distributed HTTP cache. Instead of routing every request back to your origin server, copies of your content live at dozens of edge locations worldwide. For a URL shortener, this means viral links, the small fraction that receive massive traffic, can be served entirely from the edge, with zero database involvement.&lt;/p&gt;
&lt;h2&gt;
  
  
  Pull CDN vs Push CDN
&lt;/h2&gt;

&lt;p&gt;Pull CDN : lazily fetches content from your origin only when a user first requests it. The cache fills naturally over time. Ideal for dynamic or unpredictable content, like short codes whose popularity you can’t know in advance.&lt;/p&gt;

&lt;p&gt;Push CDN : requires you to proactively upload content to edge nodes. Best for static resources or pre-generated redirect tables for your most popular links.&lt;/p&gt;

&lt;p&gt;Real-world precedent : Netflix Open Connect achieves a 98% CDN cache hit rate for video streams. Nearly every video chunk is served from the edge, not from Netflix’s origin data centers. The same model applies directly to a URL shortener: the top 0.1% of links can be handled entirely at the edge, leaving your database untouched.&lt;/p&gt;

&lt;p&gt;Cache invalidation strategies:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
    &lt;tr&gt;
        &lt;td&gt;Strategy&lt;/td&gt;
        &lt;td&gt;How It Works&lt;/td&gt;
        &lt;td&gt;Speed&lt;/td&gt;
        &lt;td&gt;Use Case&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;TTL Expiration&lt;/td&gt;
        &lt;td&gt;Content expires automatically after N seconds&lt;/td&gt;
        &lt;td&gt;Delayed (waits for TTL)&lt;/td&gt;
        &lt;td&gt;Slow-changing content: blog posts, product pages&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Purge API&lt;/td&gt;
        &lt;td&gt;Manual API call instantly removes cached content&lt;/td&gt;
        &lt;td&gt;Fastly: 150ms global&lt;/td&gt;
        &lt;td&gt;News, e-commerce inventory, breaking content&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Surrogate Keys&lt;/td&gt;
        &lt;td&gt;Tag responses; purge all tagged objects at once&lt;/td&gt;
        &lt;td&gt;Same as purge&lt;/td&gt;
        &lt;td&gt;Complex relationships: purge all product-123 pages&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Soft Purge&lt;/td&gt;
        &lt;td&gt;Mark stale, serve old while refreshing in background&lt;/td&gt;
        &lt;td&gt;Immediate serve&lt;/td&gt;
        &lt;td&gt;High-traffic pages where downtime is unacceptable&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;CDN provider comparison:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
    &lt;tr&gt;
        &lt;td&gt;Feature&lt;/td&gt;
        &lt;td&gt;Cloudflare&lt;/td&gt;
        &lt;td&gt;AWS CloudFront&lt;/td&gt;
        &lt;td&gt;Fastly&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;PoPs&lt;/td&gt;
        &lt;td&gt;330+ cities&lt;/td&gt;
        &lt;td&gt;750+ PoPs + 1,140 embedded&lt;/td&gt;
        &lt;td&gt;~200 strategic&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Routing&lt;/td&gt;
        &lt;td&gt;Anycast (single IP, BGP routing)&lt;/td&gt;
        &lt;td&gt;DNS-based (+ Anycast option)&lt;/td&gt;
        &lt;td&gt;Anycast&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Purge Speed&lt;/td&gt;
        &lt;td&gt;Sub-150ms global&lt;/td&gt;
        &lt;td&gt;Seconds to minutes&lt;/td&gt;
        &lt;td&gt;150ms global (since 2011)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Edge Compute&lt;/td&gt;
        &lt;td&gt;Workers: V8 Isolates, &amp;lt;1ms cold start&lt;/td&gt;
        &lt;td&gt;Lambda@Edge or CloudFront Functions&lt;/td&gt;
        &lt;td&gt;Compute@Edge: WebAssembly&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Cache Invalidation&lt;/td&gt;
        &lt;td&gt;Purge API + Cache Rules&lt;/td&gt;
        &lt;td&gt;API (slow) + versioned URLs&lt;/td&gt;
        &lt;td&gt;Surrogate keys: best in class&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Free Tier&lt;/td&gt;
        &lt;td&gt;Generous: unlimited bandwidth&lt;/td&gt;
        &lt;td&gt;Pay per GB from first byte&lt;/td&gt;
        &lt;td&gt;No free tier&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Redis: Application-Level Caching
&lt;/h2&gt;

&lt;p&gt;Beyond the HTTP layer, your application needs its own in-memory cache. Redis is the industry standard: it stores data in RAM rather than on disk, making look-ups orders of magnitude faster than a database query. For a URL shortener, Redis is the layer that makes redirect responses feel instantaneous.&lt;/p&gt;

&lt;p&gt;Cache-Aside: The Recommended Pattern&lt;br&gt;
When a user clicks a short link, your app checks Redis first. If the mapping is there, it’s returned immediately. If not, the app queries the database, returns the result, and stores it in Redis for future requests. Most subsequent clicks on that link never touch the database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
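&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# cache and db are placeholder clients; the TTL argument to cache.set is an assumed signature
def get_short_url(short_code, ttl_seconds=3600):
    url = cache.get(short_code)            # Step 1: Check cache
    if url is None:                        # Step 2: Cache miss
        url = db.query(short_code)         #   Query database
        if url is not None:                # Step 3: Populate cache, but only for codes that exist
            cache.set(short_code, url, ttl_seconds)
    return url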
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Write-Through vs Write-Behind&lt;/p&gt;

&lt;p&gt;Write-Through : writes to the cache and the database as part of the same operation. The cache never holds data the database lacks, but every write pays the latency of both stores. Use this when data correctness is non-negotiable.&lt;/p&gt;

&lt;p&gt;Write-Behind : writes to cache first and flushes to the database asynchronously. Faster writes, but risks data loss if the cache crashes before the flush completes. Use this for high-throughput analytics where some loss is acceptable.&lt;/p&gt;
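
&lt;p&gt;The difference is easier to see side by side. The sketch below uses plain dictionaries as stand-ins for Redis and the database; the class names and the explicit &lt;code&gt;flush()&lt;/code&gt; call are assumptions for illustration (a real write-behind setup would flush from a background worker).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import deque


class WriteThroughStore:
    """Every write goes to the database and the cache before returning."""

    def __init__(self):
        self.cache, self.db = {}, {}

    def put(self, key, value):
        self.db[key] = value      # durable write happens on the request path
        self.cache[key] = value


class WriteBehindStore:
    """Writes land in the cache at once and reach the database on flush()."""

    def __init__(self):
        self.cache, self.db = {}, {}
        self.pending = deque()

    def put(self, key, value):
        self.cache[key] = value
        self.pending.append((key, value))   # lost if the process dies before flush

    def flush(self):
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value


clicks = WriteBehindStore()
clicks.put("abc123", 41)
clicks.put("abc123", 42)
print(clicks.db)      # {} because nothing has been flushed yet
clicks.flush()
print(clicks.db)      # {'abc123': 42}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;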

&lt;p&gt;Cache Stampede: The Failure Mode You Must Plan For&lt;br&gt;
A cache stampede happens when a popular cache key expires and thousands of concurrent requests simultaneously find a miss. Each one fires a database query. The database buckles under the load. For a URL shortener, a single viral link expiring at the wrong moment can trigger exactly this scenario.&lt;/p&gt;

&lt;p&gt;Three defences:&lt;/p&gt;

&lt;p&gt;1. TTL jitter: randomize expiry times slightly so keys don’t expire simultaneously&lt;br&gt;
2. Distributed lock (Redis SET NX EX): only one request rebuilds the cache; others wait&lt;br&gt;
3. XFetch (probabilistic early expiration): refresh hot keys shortly before they expire, preventing the miss entirely&lt;/p&gt;
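
&lt;p&gt;The first two defences can be sketched without a running Redis. Everything here is illustrative: &lt;code&gt;FakeRedis&lt;/code&gt; is an in-memory stand-in that mimics only the set-if-absent behaviour of SET with NX, the 10% jitter spread is an arbitrary choice, and a real lock should also carry an expiry (the EX part) so a crashed worker cannot hold it forever.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import random


def ttl_with_jitter(base_seconds, spread=0.1):
    """Return base_seconds plus or minus up to `spread` (10% by default)."""
    offset = random.uniform(-spread, spread) * base_seconds
    return int(base_seconds + offset)


class FakeRedis:
    """In-memory stand-in for the one Redis behaviour used here: SET with NX."""

    def __init__(self):
        self.data = {}

    def set_nx(self, key, value):
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def delete(self, key):
        self.data.pop(key, None)


def rebuild_once(redis, short_code, load_from_db):
    """Only the caller that wins the lock queries the database."""
    lock_key = "lock:" + short_code
    if not redis.set_nx(lock_key, "1"):
        return None               # another request is rebuilding; retry the cache shortly
    try:
        return load_from_db(short_code)
    finally:
        redis.delete(lock_key)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With a base TTL of 60 seconds, &lt;code&gt;ttl_with_jitter(60)&lt;/code&gt; returns a value between 54 and 66, so keys written in the same burst expire at different moments.&lt;/p&gt;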

&lt;p&gt;Memory Optimization&lt;/p&gt;

&lt;p&gt;Real-world precedent: Instagram needed to keep about 300 million ID mappings (media ID to user ID) in Redis. Stored as plain keys, they estimated roughly 21 GB of memory. By bucketing the mappings into Redis hashes, which use the compact ziplist encoding for small structures, they brought that down to about 5 GB, a 76% reduction. For your URL shortener, similar techniques (efficient serialization, compact data structures) can dramatically cut infrastructure costs at scale.&lt;/p&gt;

&lt;p&gt;Eviction Policy&lt;br&gt;
&lt;code&gt;allkeys-lru&lt;/code&gt; (Least Recently Used) is a sensible default for general workloads. If your traffic follows an 80/20 pattern, with 20% of links generating 80% of clicks, then &lt;code&gt;allkeys-lfu&lt;/code&gt; (Least Frequently Used) keeps your hottest links in memory while evicting cold ones. Choosing the right policy ensures cache performance holds under sustained load.&lt;/p&gt;
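
&lt;p&gt;To see why the two policies can disagree, here is a toy comparison in Python. It models exact LRU and exact LFU over a short access log; Redis itself approximates both by sampling keys, so treat this as an illustration of the idea and not of Redis internals. The function names and the access log are made up for the example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter, OrderedDict


def lru_victim(accesses):
    """Key an exact LRU policy would evict: the least recently used."""
    order = OrderedDict()
    for key in accesses:
        order.pop(key, None)
        order[key] = True          # most recent access moves to the end
    return next(iter(order))


def lfu_victim(accesses):
    """Key an exact LFU policy would evict: the least frequently used."""
    counts = Counter(accesses)
    return min(counts, key=counts.get)


# "viral" is clicked constantly, then a burst of one-off links arrives.
accesses = ["viral"] * 50 + ["one_off_a", "one_off_b", "one_off_c"]
print(lru_victim(accesses))   # viral
print(lfu_victim(accesses))   # one_off_a
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;LRU only looks at recency, so a short burst of cold links can push out the link that carries most of the traffic. LFU keeps it because of its click count.&lt;/p&gt;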

&lt;h2&gt;
  
  
  Everything Applied: One Request, End to End
&lt;/h2&gt;

&lt;p&gt;So let’s say a user in Nigeria clicks a short link in a tweet. Here’s exactly what happens across the full stack:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxj1g96lej5l3e31gyqwz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxj1g96lej5l3e31gyqwz.png" alt=" " width="558" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: CDN Edge (~5ms)
&lt;/h2&gt;

&lt;p&gt;The request hits the nearest CDN edge location. For the top 0.1% of viral links, the ones cached at the edge via s-maxage, the redirect response is returned immediately. The request never reaches your servers. TTL jitter ensures popular links don’t expire in sync, preventing coordinated cache misses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Redis / Cache-Aside (~10ms)
&lt;/h2&gt;

&lt;p&gt;If the CDN doesn’t have the link, the request reaches your app servers. Cache-Aside checks Redis for the short_code. A hit returns the mapping in under 10ms. A miss triggers the database path. Distributed locks or XFetch prevent simultaneous misses from cascading into a stampede.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Database (~40ms)
&lt;/h2&gt;

&lt;p&gt;On a cache miss, the app queries the sharded database (consistent hashing routes the query to the correct shard), retrieves the mapping, writes it back to Redis, and responds. Ziplist encoding and appropriate eviction policies keep the cache lean and performant for the next request.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Redirect Response: 301 vs 302
&lt;/h2&gt;

&lt;p&gt;Why 302 and not 301? When click analytics are the core product, as they are for companies like Bitly, every click needs to reach the server. A 301 lets the browser cache the redirect, which can make repeat clicks invisible to tracking. A 302 is not cached by default, so every click is recorded. For your URL shortener, the answer depends on whether analytics matter more than marginal performance gains.&lt;/p&gt;
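
&lt;p&gt;In code, the choice comes down to one status code and its cache headers. The helper below is a hypothetical sketch; the function name, the &lt;code&gt;track_clicks&lt;/code&gt; flag, and the one-day max-age for the permanent case are assumptions, not a recommendation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def build_redirect(target_url, track_clicks):
    """Return (status, headers) for a short-link redirect."""
    if track_clicks:
        # 302: browsers do not cache it by default, so each click reaches the server.
        return 302, {"Location": target_url, "Cache-Control": "private, max-age=0"}
    # 301: cacheable by default, so repeat clicks may skip the server entirely.
    return 301, {"Location": target_url, "Cache-Control": "public, max-age=86400"}


print(build_redirect("https://example.com/article", track_clicks=True)[0])   # 302
print(build_redirect("https://example.com/article", track_clicks=False)[0])  # 301
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;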

&lt;p&gt;Performance at scale, back of the envelope:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
    &lt;tr&gt;
        &lt;td&gt;Metric&lt;/td&gt;
        &lt;td&gt;Calculation&lt;/td&gt;
        &lt;td&gt;Result&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;New URLs created&lt;/td&gt;
        &lt;td&gt;100M per month / 30 days / 86,400 sec&lt;/td&gt;
        &lt;td&gt;~40 writes/second&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;URL redirects (100:1 read ratio)&lt;/td&gt;
        &lt;td&gt;40 writes/sec × 100&lt;/td&gt;
        &lt;td&gt;4,000 reads/sec (40K at peak)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Short code space (7 chars, Base62)&lt;/td&gt;
        &lt;td&gt;62^7&lt;/td&gt;
        &lt;td&gt;3.52 trillion combinations (~2,900 years at 100M new URLs per month)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Storage (5 years)&lt;/td&gt;
        &lt;td&gt;100M × 12 months × 5 years × 500 bytes&lt;/td&gt;
        &lt;td&gt;~3 TB before replication&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Redis hot cache (top 1% = 90% of traffic)&lt;/td&gt;
        &lt;td&gt;1% of daily URLs × 500 bytes&lt;/td&gt;
        &lt;td&gt;~330 MB caches 90%+ of reads&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;
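
&lt;p&gt;The first four rows can be reproduced with a few lines of Python, using the same inputs as the table (100M new URLs per month, a 100:1 read ratio, 500 bytes per record). The variable names are just for this sketch.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;SECONDS_PER_DAY = 86_400
new_urls_per_month = 100_000_000
read_write_ratio = 100
bytes_per_record = 500

writes_per_sec = new_urls_per_month / 30 / SECONDS_PER_DAY
reads_per_sec = writes_per_sec * read_write_ratio
code_space = 62 ** 7
storage_bytes = new_urls_per_month * 12 * 5 * bytes_per_record

print(round(writes_per_sec, 1))   # 38.6, rounded up to ~40 in the table
print(round(reads_per_sec))       # 3858, or ~4,000
print(code_space)                 # 3521614606208, about 3.52 trillion
print(storage_bytes / 1e12)       # 3.0 (terabytes, decimal)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;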

&lt;p&gt;What breaks at each scale stage:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
    &lt;tr&gt;
        &lt;td&gt;Scale&lt;/td&gt;
        &lt;td&gt;What Breaks&lt;/td&gt;
        &lt;td&gt;The Fix&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;0-1K users&lt;/td&gt;
        &lt;td&gt;Nothing&lt;/td&gt;
        &lt;td&gt;Single server, SQLite or MySQL, no Redis needed&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;1K-10K users&lt;/td&gt;
        &lt;td&gt;Database read bottleneck&lt;/td&gt;
        &lt;td&gt;Add Redis cache-aside, add read replica&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;10K-100K users&lt;/td&gt;
        &lt;td&gt;App server CPU ceiling&lt;/td&gt;
        &lt;td&gt;Load balancer + 2-3 app servers (Round-Robin)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;100K-500K users&lt;/td&gt;
        &lt;td&gt;Cache miss spikes overwhelming DB&lt;/td&gt;
        &lt;td&gt;CDN for redirects, TTL jitter, Redis cluster&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;500K-1M users&lt;/td&gt;
        &lt;td&gt;Database write throughput ceiling&lt;/td&gt;
        &lt;td&gt;Sharding with consistent hashing, async analytics&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;1M+ users&lt;/td&gt;
        &lt;td&gt;Single-region latency for global users&lt;/td&gt;
        &lt;td&gt;Multi-region, GeoDNS routing, regional Redis clusters&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Key Trade-Offs
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
    &lt;tr&gt;
        &lt;td&gt;Decision&lt;/td&gt;
        &lt;td&gt;Option A&lt;/td&gt;
        &lt;td&gt;Option B&lt;/td&gt;
        &lt;td&gt;Choose Based On&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Redirect type&lt;/td&gt;
        &lt;td&gt;301 Permanent (browser caches; no return trips)&lt;/td&gt;
        &lt;td&gt;302 Temporary (every click reaches your servers)&lt;/td&gt;
        &lt;td&gt;Need analytics? Use 302. Click data is the product for companies like Bitly.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Short code generation&lt;/td&gt;
        &lt;td&gt;Auto-increment + Base62 encode (predictable, zero collisions)&lt;/td&gt;
        &lt;td&gt;Hash (MD5 truncated; collision risk)&lt;/td&gt;
        &lt;td&gt;At scale, auto-increment + XOR obfuscation beats hash complexity.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Database choice&lt;/td&gt;
        &lt;td&gt;NoSQL (DynamoDB/Cassandra): horizontal sharding native&lt;/td&gt;
        &lt;td&gt;SQL (MySQL): simpler, vertical scaling ceiling&lt;/td&gt;
        &lt;td&gt;At 4,000 reads/sec, NoSQL with consistent hashing wins.&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The Engineering Mindset
&lt;/h2&gt;

&lt;p&gt;The best engineers are not the ones who know every tool. They are the ones who understand trade-offs deeply enough to make the right call for their specific system, their specific constraints, and their specific users.&lt;/p&gt;

&lt;p&gt;For example, as a developer in Nigeria, your constraints are real: users are on expensive data plans, connectivity drops without warning, and servers are geographically far away. Every caching decision you make is an act of empathy for the person on a 3G connection in Kano trying to load your app. Build accordingly. The goal isn’t to over-engineer early; it’s to design systems that evolve as demand increases.&lt;/p&gt;

&lt;p&gt;Scaling to one million users is less about powerful hardware and more about smart architecture. Three principles drive it:&lt;/p&gt;

&lt;p&gt;•Distribute traffic using load balancing&lt;br&gt;
•Reduce repeated computation through caching&lt;br&gt;
•Move content closer to users using CDNs&lt;/p&gt;

&lt;p&gt;When combined, these strategies dramatically reduce database load, lower infrastructure costs, and keep your application fast, from your first hundred users to your first million.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;•Designing Data-Intensive Applications. Kleppmann (chapters 5-6)&lt;br&gt;
•RFC 9111. HTTP Caching (current standard)&lt;br&gt;
•ByteByteGo YouTube. Alex Xu; visual system design&lt;br&gt;
•Instagram Engineering Blog. Redis memory optimization&lt;br&gt;
•Scaling Memcache at Facebook. NSDI 2013 (Nishtala et al.)&lt;br&gt;
•Redis Official Docs. Caching patterns &amp;amp; eviction reference&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Next?
&lt;/h2&gt;

&lt;p&gt;Once caching and load balancing are in place and your system is serving one million users reliably from cache, the next frontier is real-time communication at scale. Technologies like WebSockets and Server-Sent Events introduce a fundamentally different set of constraints: persistent connections, event fan-out, and stateful session management.&lt;/p&gt;

&lt;p&gt;The next post will cover WebSockets, HTTP polling, and Server-Sent Events. Follow or subscribe to catch it.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>backend</category>
      <category>performance</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>From Browser to Server : The Journey of an HTTP Request (Demystifying the Web’s Infrastructure)</title>
      <dc:creator>Emmanuel Onuiteshi</dc:creator>
      <pubDate>Mon, 25 May 2026 14:30:25 +0000</pubDate>
      <link>https://dev.to/immanuel_nonzo/from-browser-to-server-the-journey-of-an-http-request-demystifying-the-webs-infrastructure-599i</link>
      <guid>https://dev.to/immanuel_nonzo/from-browser-to-server-the-journey-of-an-http-request-demystifying-the-webs-infrastructure-599i</guid>
      <description>&lt;h2&gt;
  
  
  What Actually Happens When You Press Enter?
&lt;/h2&gt;

&lt;p&gt;You type &lt;a href="http://www.google.com" rel="noopener noreferrer"&gt;www.google.com&lt;/a&gt; and press Enter. Half a second later, a fully rendered page appears. Nobody taught you to find that remarkable. But as a developer, that half second is your responsibility.&lt;/p&gt;

&lt;p&gt;It takes 0.5 seconds. But it touches 7 layers of infrastructure.&lt;br&gt;
Here is every layer, in order.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 1: DNS Lookup; The Internet’s Phonebook
&lt;/h2&gt;

&lt;p&gt;Humans remember names. Computers understand numbers. DNS translates google.com into 142.250.190.46. &lt;br&gt;
The lookup chain: your browser cache → OS cache → Recursive Resolver → Root Server → TLD Server → Authoritative Server. &lt;br&gt;
The whole chain completes in milliseconds.&lt;br&gt;
When DNS fails, no website loads at all. It is so foundational that its failure looks like the entire internet is broken.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 2: TCP Connection; The 3-Way Handshake
&lt;/h2&gt;

&lt;p&gt;Having the IP address is not enough. Your device and the server need to confirm they are both ready to communicate reliably. TCP handles this with three messages before a single byte of your request moves:&lt;br&gt;
•SYN: “can we talk?”&lt;br&gt;
•SYN-ACK: “yes, let’s talk”&lt;br&gt;
•ACK: “connection open”&lt;/p&gt;

&lt;p&gt;On a Lagos to Frankfurt connection, this round trip takes roughly 100 to 150ms. On a local server, it takes under 5ms. That gap is why CDN edge nodes matter: they bring the handshake closer to your users.&lt;/p&gt;
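
&lt;p&gt;You can measure the handshake yourself with the standard &lt;code&gt;socket&lt;/code&gt; module. The sketch below times a connection to a throwaway listener on the loopback interface so it runs without internet access; &lt;code&gt;time_tcp_connect&lt;/code&gt; is a helper name invented for this example. Pointing it at a remote host and port would show the round-trip cost discussed above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import socket
import time


def time_tcp_connect(host, port):
    """Return seconds spent establishing a TCP connection (the 3-way handshake)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=5):
        elapsed = time.perf_counter() - start
    return elapsed


# A throwaway listener on loopback so the sketch runs without internet access.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0 lets the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

elapsed_ms = time_tcp_connect("127.0.0.1", port) * 1000
print(f"loopback connect took {elapsed_ms:.3f} ms")
listener.close()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;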
&lt;h2&gt;
  
  
  Step 3: The HTTP Request
&lt;/h2&gt;

&lt;p&gt;With the connection open, your browser sends a structured request. Three parts:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Part&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Request Line&lt;/td&gt;
&lt;td&gt;The verb and path&lt;/td&gt;
&lt;td&gt;GET /index.html HTTP/1.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Headers&lt;/td&gt;
&lt;td&gt;Context about the request&lt;/td&gt;
&lt;td&gt;Host, User-Agent, Cookie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Body&lt;/td&gt;
&lt;td&gt;Payload (typically POST, PUT, or PATCH)&lt;/td&gt;
&lt;td&gt;JSON data, form fields&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;HTTP methods define intent: GET retrieves, POST creates, PUT/PATCH updates, DELETE removes. Using the wrong method breaks caching. GET responses are cacheable by default; POST responses generally are not. If you are using POST to fetch data, you are bypassing the entire caching layer unnecessarily.&lt;/p&gt;
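
&lt;p&gt;To make the three parts concrete, here is a small Python sketch that splits a raw HTTP/1.1 request into its request line, headers, and body. The &lt;code&gt;parse_request&lt;/code&gt; function and the sample request are illustrative only; real servers use hardened parsers that handle far more edge cases.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def parse_request(raw):
    """Split a raw HTTP/1.1 request into request line, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "path": path, "version": version,
            "headers": headers, "body": body}


raw = (
    "POST /api/links HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Type: application/json\r\n"
    "Content-Length: 30\r\n"
    "\r\n"
    '{"url": "https://example.com"}'
)
request = parse_request(raw)
print(request["method"], request["path"])   # POST /api/links
print(request["headers"]["host"])           # example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;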
&lt;h2&gt;
  
  
  Step 4: Client-Server Architecture
&lt;/h2&gt;

&lt;p&gt;Right, so the request has arrived somewhere. But where exactly? And who is allowed to touch what?&lt;/p&gt;

&lt;p&gt;Think of it like a restaurant. You are the customer. You can read the menu and place an order, but you do not walk into the kitchen yourself. The waiter (the network) carries your request. The kitchen (the server) does the actual work. And the pantry (the database) holds all the ingredients.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The rule: the client is never allowed inside the kitchen. They must ask the waiter, who asks the kitchen on their behalf.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Most production systems follow the 3-tier model:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
    &lt;tr&gt;
        &lt;th&gt;Layer&lt;/th&gt;
        &lt;th&gt;Role&lt;/th&gt;
        &lt;th&gt;Technologies&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Presentation&lt;/td&gt;
        &lt;td&gt;What the user sees&lt;/td&gt;
        &lt;td&gt;HTML, CSS, JavaScript, React&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Application&lt;/td&gt;
        &lt;td&gt;Business logic and rules&lt;/td&gt;
        &lt;td&gt;Node.js, Python, Go, Java&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Data&lt;/td&gt;
        &lt;td&gt;Persistent storage&lt;/td&gt;
        &lt;td&gt;PostgreSQL, MongoDB, Redis&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each layer talks only to the layer immediately next to it. The browser never touches the database directly. This boundary is a security constraint, not just a convention.&lt;/p&gt;
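&lt;p&gt;Here is one way to picture that boundary in code. It is a toy sketch, not a production layout: the function names and the sample record are invented, and a dictionary stands in for the database. The point is only that each function calls the layer directly below it and nothing further.&lt;/p&gt;

```python
# Data tier: a dict standing in for a real database (hypothetical record).
_DATABASE = {123: {"name": "Ada"}}


def fetch_user(user_id):
    """Data tier: the only code that touches storage."""
    return _DATABASE.get(user_id)


def get_user_profile(user_id):
    """Application tier: applies the rules, then asks the data tier."""
    user = fetch_user(user_id)
    if user is None:
        return {"error": "user not found"}
    return {"id": user_id, "display_name": user["name"].upper()}


def render_profile(user_id):
    """Presentation tier: talks to the application tier, never to storage."""
    profile = get_user_profile(user_id)
    if "error" in profile:
        return "Sorry: " + profile["error"]
    return "Profile of " + profile["display_name"]


print(render_profile(123))
print(render_profile(999))
```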
&lt;h2&gt;
  
  
  Step 5: Server Processing and REST
&lt;/h2&gt;

&lt;p&gt;So the request has made it past the front door. Now your application server actually does something with it. Here is the typical flow:&lt;br&gt;
 &lt;br&gt;
• The web server (Nginx, Apache) receives the raw request and routes it inward&lt;br&gt;
• Middleware runs: authentication checks, rate limiting, request logging&lt;br&gt;
• The router matches the URL and HTTP method to a specific handler function&lt;br&gt;
• The handler runs your business logic, queries the database if needed, and builds a response&lt;br&gt;
 &lt;br&gt;
This is where REST comes in. REST is the set of conventions that makes this process predictable and consistent. The four rules:&lt;br&gt;
 &lt;br&gt;
• URLs are nouns, not verbs. Use /users/123, not /getUser?id=123&lt;br&gt;
• Use HTTP methods correctly and consistently&lt;br&gt;
• Every request is stateless: it carries everything the server needs to process it&lt;br&gt;
• Structure is consistent: /users returns a list, /users/123 returns one record&lt;br&gt;
 &lt;br&gt;
&lt;em&gt;A well-designed REST API is one your teammates can read without a dictionary. A poorly designed one is a support ticket waiting to happen.&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;GET    /users          // List all users
GET    /users/123      // Get one user
POST   /users          // Create a user
PUT    /users/123      // Replace a user
PATCH  /users/123      // Update part of a user
DELETE /users/123      // Remove a user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
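&lt;p&gt;The routing step can be sketched in a few lines. This is a simplified stand-in for what a web framework does, assuming a hypothetical route decorator and dispatch function of my own naming; real routers add middleware, typed parameters, and proper error responses.&lt;/p&gt;

```python
import re

ROUTES = []  # each entry: (method, compiled pattern, handler)


def route(method, pattern):
    """Register a handler for an HTTP method and a URL pattern."""
    def decorator(handler):
        ROUTES.append((method, re.compile(pattern), handler))
        return handler
    return decorator


@route("GET", r"/users")
def list_users():
    return "list all users"


@route("GET", r"/users/(\d+)")
def get_user(user_id):
    return "get user " + user_id


@route("DELETE", r"/users/(\d+)")
def delete_user(user_id):
    return "remove user " + user_id


def dispatch(method, path):
    """Call the first handler whose method and pattern both match."""
    for route_method, pattern, handler in ROUTES:
        match = pattern.fullmatch(path)
        if route_method == method and match:
            return handler(*match.groups())
    return "no route"


print(dispatch("GET", "/users"))
print(dispatch("GET", "/users/123"))
print(dispatch("DELETE", "/users/123"))
```

&lt;p&gt;Notice that the same URL, /users/123, reaches different handlers depending on the method. That is the "nouns in the URL, verbs in the method" convention at work.&lt;/p&gt;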



&lt;h2&gt;
  
  
  Step 6: The HTTP Response
&lt;/h2&gt;

&lt;p&gt;The server has done its job. Now it sends back what it found or what went wrong. Every response has a status code, headers, and a body.&lt;br&gt;
 &lt;br&gt;
Status codes are the internet’s traffic lights. Every developer needs these internalized:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
    &lt;tr&gt;
        &lt;th&gt;Range&lt;/th&gt;
        &lt;th&gt;Meaning&lt;/th&gt;
        &lt;th&gt;Key Codes&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;2xx&lt;/td&gt;
        &lt;td&gt;Success&lt;/td&gt;
        &lt;td&gt;200 OK, 201 Created, 204 No Content&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;3xx&lt;/td&gt;
        &lt;td&gt;Redirect&lt;/td&gt;
        &lt;td&gt;301 Moved Permanently, 302 Found (temporary), 304 Not Modified&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;4xx&lt;/td&gt;
        &lt;td&gt;Client Error&lt;/td&gt;
        &lt;td&gt;400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;5xx&lt;/td&gt;
        &lt;td&gt;Server Error&lt;/td&gt;
        &lt;td&gt;500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;One mistake that drives everyone mad: returning 200 OK when an error has occurred. It is one of the most common API mistakes. It breaks clients, breaks monitoring, and makes debugging painful. Return the right code every time.&lt;/p&gt;
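&lt;p&gt;As a small illustration, here is a handler that lets the status code tell the truth. The get_user function and its in-memory data are hypothetical; the status constants come from Python's standard http module.&lt;/p&gt;

```python
from http import HTTPStatus

USERS = {123: "Ada"}  # hypothetical in-memory data


def get_user(user_id):
    """Return (status, body) with a status code that matches the outcome."""
    if not isinstance(user_id, int):
        return HTTPStatus.BAD_REQUEST, {"error": "user id must be an integer"}
    if user_id not in USERS:
        return HTTPStatus.NOT_FOUND, {"error": "user not found"}
    return HTTPStatus.OK, {"id": user_id, "name": USERS[user_id]}


for candidate in (123, 999, "abc"):
    status, body = get_user(candidate)
    print(status.value, status.phrase, body)
```

&lt;p&gt;A client or monitoring tool can now branch on the code alone, without parsing the body to discover that something failed.&lt;/p&gt;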

&lt;h2&gt;
  
  
  Step 7: Browser Rendering, Turning Code into Pixels
&lt;/h2&gt;

&lt;p&gt;The response is sitting in your browser as raw HTML, CSS, and JavaScript. None of it is visible yet. What happens next is one of the most impressive things your computer does, silently, many times a day.&lt;br&gt;
 &lt;br&gt;
The browser runs through the Critical Rendering Path:&lt;br&gt;
 &lt;br&gt;
• Parse HTML → build the DOM tree&lt;br&gt;
• Parse CSS → build the CSSOM&lt;br&gt;
• Combine into a Render Tree (visible elements only)&lt;br&gt;
• Layout: calculate exact positions and sizes for everything on the page&lt;br&gt;
• Paint and Composite: pixels hit the screen&lt;br&gt;
 &lt;br&gt;
JavaScript can interrupt this pipeline at any point: a script tag without async or defer pauses HTML parsing until the script has downloaded and run. A 200 KB render-blocking script sitting in the wrong place can be the difference between a 0.5 second load and a 3 second one. On a 3G connection in Kano or Benin City, that delay is not a minor inconvenience. It is the difference between a user who waits and one who closes the tab.&lt;br&gt;
 &lt;br&gt;
HTML is the blueprint. CSS is the paint bucket. JavaScript is the interior designer rearranging furniture after the house is built. On a well-optimized page, the browser can do all of it in under 200 milliseconds.&lt;/p&gt;
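&lt;p&gt;The "visible elements only" step is worth a quick sketch. This toy model is nowhere near what a real browser engine does: the DOM and CSSOM here are hand-written dictionaries, and the function names are my own. It only shows the idea that the render tree is the DOM with non-visual and hidden nodes filtered out.&lt;/p&gt;

```python
# A toy DOM: each node has a tag and a list of children (hypothetical page).
DOM = {"tag": "body", "children": [
    {"tag": "h1", "children": []},
    {"tag": "script", "children": []},
    {"tag": "div", "children": [{"tag": "p", "children": []}]},
    {"tag": "aside", "children": [{"tag": "p", "children": []}]},
]}

# A toy CSSOM: tag name mapped to its computed display value.
CSSOM = {"aside": "none"}

# Tags that never produce visible boxes.
NON_VISUAL = {"script", "head", "meta", "link", "style"}


def build_render_tree(node):
    """Return the visible subtree of node, or None if node is not rendered."""
    if node["tag"] in NON_VISUAL or CSSOM.get(node["tag"]) == "none":
        return None  # display: none drops the node and all its descendants
    children = [build_render_tree(child) for child in node["children"]]
    return {"tag": node["tag"], "children": [c for c in children if c is not None]}


def tags(node):
    """Flatten a tree into a list of tag names in document order."""
    result = [node["tag"]]
    for child in node["children"]:
        result.extend(tags(child))
    return result


render_tree = build_render_tree(DOM)
print(tags(render_tree))
```

&lt;p&gt;The script node and the hidden aside (with its paragraph) never reach layout or paint, so the browser spends no time positioning them.&lt;/p&gt;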

&lt;h2&gt;
  
  
  The Full Journey at a Glance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What Happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;DNS Lookup&lt;/td&gt;
&lt;td&gt;Domain name resolved to IP address&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;TCP Connection&lt;/td&gt;
&lt;td&gt;3-way handshake establishes reliable channel&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;HTTP Request&lt;/td&gt;
&lt;td&gt;Browser sends method, headers, and body&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Client-Server&lt;/td&gt;
&lt;td&gt;Request routed through 3-tier architecture&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Server Processing&lt;/td&gt;
&lt;td&gt;Business logic runs; database queried; response built&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;HTTP Response&lt;/td&gt;
&lt;td&gt;Status code, headers, and payload returned&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Browser Rendering&lt;/td&gt;
&lt;td&gt;HTML, CSS, JS converted to pixels&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;What’s Next?&lt;br&gt;
You now know what happens every time a user hits your app. DNS finds the address, TCP builds the connection, HTTP carries the message, your server does the work, and the browser makes it visible. Seven steps, roughly half a second.&lt;br&gt;
 &lt;br&gt;
The next question is: what happens when a million users do all of that simultaneously? Look out for my next post.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>http</category>
      <category>webarchitecture</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
