- Book: System Design Pocket Guide: Interviews
- My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
- Me: xgabriel.com | GitHub
The URL shortener is the warm-up question. Candidates still bomb it the same way every loop. They jump to "use Redis and base62" before doing the math, then the interviewer asks "how many shards?" and the whiteboard goes quiet.
The trap is that the design is easy and the numbers are not. If you cannot defend "200 writes per second, 20K reads per second, 12 TB over five years" with a calculator, your hash-collision discussion is wallpaper.
Pin the numbers before the boxes
You're designing a TinyURL clone. The interviewer waves a hand at "consumer scale." You commit to a number and check it.
The numbers below are illustrative textbook capacity assumptions, not measured production figures from a real shortener. Say 500M new short URLs per month. That's the spec the textbook problems use and the one most interviewers expect (DesignGurus walkthrough). From there, every other number falls out.
Write QPS. 500M / (30 days × 86,400 s) = ~193 writes/sec. Round to 200 writes/sec average, 1,000/sec peak (5x burst factor for daytime spikes).
Read QPS. Reads dominate writes (a roughly 100:1 ratio is the textbook rule of thumb for this product). 200 × 100 = 20K reads/sec average, 100K/sec peak. That ratio decides every cache and replica decision downstream.
Storage. Five-year retention. 500M/month × 60 months = 30B records. Each row holds short code (7 bytes), long URL (avg 100 bytes), user ID, timestamps, click counter (call it 200 bytes). 30B × 200 B = 6 TB of row data, roughly 12 TB with indexes and replication overhead (rule of thumb: index plus replica copy doubles raw row storage).
Bandwidth. Writes: 200/sec × 200 bytes = 40 KB/sec, trivial. Reads: 20K/sec × 500-byte redirect response = 10 MB/sec, 80 Mbit/sec outbound. Still fits on a single edge node, but the math shows where a CDN starts paying back.
The reason candidates skip this is that the calculations look like busywork. They are not. Every architectural decision after this is a function of the numbers above: sharding, cache size, replication strategy. If you don't have them, you're guessing.
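If you want the arithmetic on rails, it is a ten-line script. The constants below are the illustrative assumptions from above, not measured figures:

# Back-of-envelope capacity math; constants are the illustrative
# assumptions from the text, not measured production figures.
URLS_PER_MONTH = 500_000_000
READ_WRITE_RATIO = 100      # textbook rule of thumb for this product
ROW_BYTES = 200             # short code + long URL + metadata
RETENTION_MONTHS = 60
PEAK_FACTOR = 5             # daytime burst multiplier

write_qps = URLS_PER_MONTH / (30 * 86_400)   # ~193/s; round to 200/s
read_qps = write_qps * READ_WRITE_RATIO      # ~19.3K/s; round to 20K/s
rows = URLS_PER_MONTH * RETENTION_MONTHS     # 30B records over five years
raw_tb = rows * ROW_BYTES / 1e12             # 6 TB of raw row data
total_tb = raw_tb * 2                        # ~12 TB with indexes + replica

print(f"writes: {write_qps:,.0f}/s avg, {write_qps * PEAK_FACTOR:,.0f}/s peak")
print(f"reads:  {read_qps:,.0f}/s avg, {read_qps * PEAK_FACTOR:,.0f}/s peak")
print(f"rows:   {rows / 1e9:,.0f}B, storage ~{total_tb:.0f} TB")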
Pick an ID strategy that scales without coordinating
You have a long URL. You need a 7-character short code. There are three serious options.
Hash and truncate. MD5 the long URL, take the first 7 base62 characters. Deterministic (same URL always maps to the same code), but collisions force an "is this the same URL or a different one that hashed there?" lookup on every write. Custom aliases also become awkward.
Random base62. Generate 7 random characters from [A-Za-z0-9]. 62^7 = 3.5 trillion combinations. With 30B records, the birthday-style collision probability is low but nonzero, so you still need a uniqueness check on insert. Easy to scale, no coordination, but writes do an extra round trip on collision.
Counter + base62 encode. Maintain a global 64-bit counter, encode it base62 into 7 characters. Zero collisions by construction. The problem is the counter: a single Postgres BIGSERIAL becomes a write hotspot at 1,000 writes/sec.
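To make the trade-offs concrete, here are minimal sketches of the first two options. They are illustrative, not the recommended path:

import hashlib
import secrets
import string

ALPHABET = string.digits + string.ascii_letters  # base62

def hash_and_truncate(long_url: str) -> str:
    # Option 1: deterministic, but a collision forces a "same URL or
    # different?" lookup on every write.
    digest = int.from_bytes(hashlib.md5(long_url.encode()).digest(), "big")
    out = []
    while len(out) < 7:
        digest, rem = divmod(digest, 62)
        out.append(ALPHABET[rem])
    return "".join(out)

def random_code() -> str:
    # Option 2: no coordination, but every insert needs a uniqueness check.
    return "".join(secrets.choice(ALPHABET) for _ in range(7))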
The production answer is counter + base62 with batch allocation. Each app instance pulls 1,000 IDs at a time from a Redis INCRBY, then hands them out locally. Redis comfortably handles tens of thousands of INCR ops/sec on a single node (illustrative; tune to your benchmarks), and you only hit it once per 1,000 writes per app instance. If a process crashes mid-batch, the wasted IDs cost nothing. 3.5 trillion is a lot of headroom.
Here is the generator.
import string
import threading

import redis

ALPHABET = string.digits + string.ascii_letters  # 0-9a-zA-Z, base62
BATCH = 1000


class IDGenerator:
    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client
        self.lock = threading.Lock()
        # Start in the exhausted state (next_id > end_id) so the first
        # call pulls a fresh batch instead of every instance emitting ID 0.
        self.next_id = 1
        self.end_id = 0

    def _refill(self) -> None:
        # One Redis round trip reserves BATCH sequential IDs for this process.
        end = self.redis.incrby("urls:id_counter", BATCH)
        self.next_id = end - BATCH + 1
        self.end_id = end

    def _encode(self, n: int) -> str:
        if n == 0:
            return ALPHABET[0]
        out = []
        while n:
            n, rem = divmod(n, 62)
            out.append(ALPHABET[rem])
        # Fixed-width 7-character code; "0" is ALPHABET[0], so this is
        # zero-padding in base62.
        return "".join(reversed(out)).rjust(7, "0")

    def next_short_code(self) -> str:
        with self.lock:
            if self.next_id > self.end_id:
                self._refill()
            n = self.next_id
            self.next_id += 1
        return self._encode(n)
Each instance reserves 1,000 IDs in one Redis call, encodes locally, and stays out of the critical path. Run a hundred app instances at 1,000 writes/sec total — Redis sees one call per second on the counter key.
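Wiring it in is one shared generator per process. A minimal sketch; the hostname and port are illustrative:

# Illustrative wiring: a single process-wide generator shared by requests.
r = redis.Redis(host="localhost", port=6379)
gen = IDGenerator(r)

code = gen.next_short_code()  # first call on a fresh counter yields "0000001"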
Pick the database for the read pattern, not the write pattern
The trap here is shape, not size. 12 TB fits on Postgres. 12 TB also fits on DynamoDB or Cassandra. The question is what the read looks like.
Reads are point lookups by short code. Always. The redirect endpoint receives /abc1234, fetches the long URL, returns 301. There is no range scan, no join, no analytics on the hot path.
For that pattern, a key-value store wins on throughput per dollar. DynamoDB or Cassandra with short_code as the partition key handles 100K reads/sec without breaking a sweat, replicates across regions with little extra engineering, and scales horizontally as the table grows.
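The hot-path read is then a single partition-key get. A minimal sketch, assuming boto3 and a DynamoDB table named urls keyed on short_code (both names illustrative):

import boto3

# Assumes a table "urls" with partition key "short_code" (string)
# and an attribute "long_url".
table = boto3.resource("dynamodb").Table("urls")

def lookup(short_code: str) -> str | None:
    resp = table.get_item(Key={"short_code": short_code})
    item = resp.get("Item")
    return item["long_url"] if item else None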
If the interviewer pushes Postgres (many will, because "you said 12 TB and Postgres handles that"), defend it like this: a single primary with five read replicas serves 100K reads/sec on a hashed primary key index. The schema is two columns plus metadata. You pay for the relational machinery you don't use, but the operational story is simpler than running Cassandra. The honest answer at 30B rows is that either ships, and the right choice depends on the team's existing infra. Don't pick the technology before you know what's already on the org chart.
What you do not want is MongoDB. The B-tree index on a UUID-shaped string at 30B rows is painful, and the document model adds nothing — your row is two scalars.
Cache hot URLs or the database wins
A small fraction of short codes gets most of the traffic: marketing campaigns, viral tweets, the link a celebrity posted at 9pm. As a rule of thumb, expect a Pareto-style distribution (the 1%-serves-90% framing is illustrative, not measured).
You put Redis in front of the database with a simple lookup pattern: cache hit returns the long URL, cache miss reads from the database and populates the cache. TTL of 24 hours catches the day's hot links and lets cold ones fall out.
def get_long_url(redis_client, db, short_code: str) -> str | None:
    cached = redis_client.get(f"url:{short_code}")
    if cached:
        return cached.decode("utf-8")
    row = db.fetch_one(
        "SELECT long_url FROM urls WHERE short_code = %s",
        (short_code,),
    )
    if row is None:
        return None
    redis_client.setex(
        f"url:{short_code}", 86400, row["long_url"]
    )
    return row["long_url"]
20 lines. The interesting parts are what's missing. No cache invalidation on writes: short codes are immutable, so cache and source can never diverge. No cache stampede protection on the basic version, but you add SETNX with a short lock TTL if a single hot code is missing. Negative caching matters too: 404s should sit in Redis with a short TTL so a typo storm doesn't translate into a database storm.
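Here is a sketch of both refinements layered on the basic lookup, assuming the same redis-py client; the lock key, sentinel value, and TTLs are illustrative:

import time

SENTINEL = "__404__"   # illustrative marker for negative cache entries
NEG_TTL = 60           # seconds a known-missing code stays cached
LOCK_TTL = 5           # seconds one process may hold the rebuild lock

def get_long_url_guarded(redis_client, db, short_code: str) -> str | None:
    key = f"url:{short_code}"
    cached = redis_client.get(key)
    if cached is not None:
        value = cached.decode("utf-8")
        return None if value == SENTINEL else value
    # Stampede guard: SET NX means only one process rebuilds this entry.
    if redis_client.set(f"lock:{key}", "1", nx=True, ex=LOCK_TTL):
        row = db.fetch_one(
            "SELECT long_url FROM urls WHERE short_code = %s",
            (short_code,),
        )
        if row is None:
            # Negative caching: a typo storm hits Redis, not the database.
            redis_client.setex(key, NEG_TTL, SENTINEL)
            return None
        redis_client.setex(key, 86_400, row["long_url"])
        return row["long_url"]
    # Another process holds the lock; wait briefly, then serve whatever
    # it cached, or fall back to a direct read.
    time.sleep(0.05)
    cached = redis_client.get(key)
    if cached is not None:
        value = cached.decode("utf-8")
        return None if value == SENTINEL else value
    row = db.fetch_one(
        "SELECT long_url FROM urls WHERE short_code = %s", (short_code,)
    )
    return row["long_url"] if row is not None else None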
Under those Pareto assumptions, cache hit rate lands around 95% (illustrative, not a measured benchmark). That cuts database read load from 100K/sec to 5K/sec, which a single Postgres replica handles without breathing hard.
The three follow-ups interviewers always ask
You've covered the basics. The interviewer leans back. They have three questions queued. Knowing them ahead of time turns the second half of the interview into a victory lap.
1. Custom aliases. "What if the user wants /my-cool-link?" The naive answer is "check uniqueness, insert." The good answer separates the namespace: custom aliases live in the same table but have a flag; on write, you do a SELECT ... FOR UPDATE to avoid two users grabbing the same alias at the same time (a sketch, with one refinement, follows this list). The great answer notes that custom aliases break the counter strategy. They collide with the auto-generated namespace unless you reserve a prefix or use two separate keyspaces. Most candidates miss this.
2. Analytics on click-through. "How do you count clicks without slowing down redirects?" Don't write to the URL row on every redirect. Emit a click event to Kafka in fire-and-forget mode (also sketched after this list), batch-process into an analytics table (ClickHouse, BigQuery, or a Postgres table you partition by day). The redirect path stays at one cache lookup. The interviewer wants to hear "I separate the read path from the analytics path" without prompting.
3. Link expiration and abuse. "What about links that need to expire, or links that point to malware?" Expiration is an expires_at column and a periodic sweep job that nulls out the cache key. Abuse is harder: you build a disabled flag and a worker that scans new URLs against a phishing/malware feed (Google Safe Browsing, abuse-reporting webhooks). The redirect path checks the flag on cache populate. None of this is sexy, but every URL shortener that lasts longer than a year deals with it.
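Sketches for the first two follow-ups, assuming psycopg for Postgres and kafka-python for the event path; the table, column, and topic names are illustrative. One refinement on the alias answer: Postgres can't row-lock a row that doesn't exist yet, so the sketch leans on a unique index with INSERT ... ON CONFLICT DO NOTHING rather than a bare SELECT ... FOR UPDATE.

import json
import time

from kafka import KafkaProducer  # kafka-python; illustrative choice

# 1. Custom aliases: the unique index on short_code is the arbiter, so two
#    users racing for the same alias can't both win.
def claim_alias(conn, alias: str, long_url: str) -> bool:
    with conn, conn.cursor() as cur:  # psycopg-style connection
        cur.execute(
            "INSERT INTO urls (short_code, long_url, is_custom) "
            "VALUES (%s, %s, TRUE) ON CONFLICT (short_code) DO NOTHING",
            (alias, long_url),
        )
        return cur.rowcount == 1  # 0 rows inserted: someone else got it first

# 2. Click analytics: fire-and-forget, so the redirect path never blocks
#    on the analytics path.
producer = KafkaProducer(
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks=0,  # don't wait for broker acknowledgment
)

def emit_click(short_code: str) -> None:
    producer.send("clicks", {"code": short_code, "ts": time.time()})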
Most candidates lose this question at the first step: the capacity math. It's the foundation, and it takes 90 seconds of arithmetic. Do it and you're already past the bar.
If this was useful
The URL shortener is one of 15 designs walked end-to-end in System Design Pocket Guide: Interviews: capacity math, ID strategy, database choice, cache, and the follow-ups, in the same shape as this post. If you've got an interview loop in the next month and want a checklist that survives contact with a whiteboard, it's the format.
