Gabriel Anhaia

Posted on Jun 13

Design a URL Shortener: The Constraints Interviewers Actually Probe

#systemdesign #interview #scalability #backend

Book: System Design Pocket Guide: Interviews — 15 Real System Designs, Step by Step
Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

You drew two boxes and an arrow. Browser hits your API, API writes a row, API returns bit.ly/abc123. The interviewer nods and asks how you generate abc123. You say "hash the URL." They ask what happens when two different URLs hash to the same six characters. The whiteboard goes quiet.

A URL shortener looks like a starter question. It is the opposite. It is a starter question with five sharp edges, and the interviewer already knows which one they're going to push on. The boxes-and-arrows part takes thirty seconds. The rest of the hour is collisions, read amplification, custom aliases, and what the analytics path costs you.

What the system actually does

Strip it down. Two operations.

Write: take a long URL, return a short code. Read: take a short code, redirect to the long URL. That's it. Everything else (auth, analytics, expiry) hangs off those two.

The numbers that matter come from the read/write ratio. A shortener is read-heavy by a wide margin. People create a link once and it gets clicked thousands of times. A common framing is 100:1 reads to writes, sometimes far higher for a viral link. Say that ratio out loud early. It justifies every caching decision you make later, and the interviewer wants to hear you reason from it instead of bolting a cache on at the end.

Back-of-envelope, so you have a number to defend:

100M new URLs/day → ~1,160 writes/sec average.
At 100:1, ~116K reads/sec average, with peaks several times that.
Keep links 10 years → ~365B rows. At ~500 bytes/row, that's ~180TB. You are not fitting that on one box.

Those figures tell you two things before you draw anything: the read path needs a cache, and the store needs to shard.

Key generation: the part they actually probe

This is the heart of the question. Three approaches, and the interviewer wants to see you reject two of them for the right reasons.

Hash the URL (the answer they want you to outgrow)

Take MD5 or SHA-256 of the long URL, base62-encode it, keep the first 6-7 characters.

import hashlib

ALPHABET = (
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
)

def base62(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    out = []
    while n > 0:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def short_code(url: str, length: int = 7) -> str:
    digest = hashlib.sha256(url.encode()).digest()
    n = int.from_bytes(digest[:8], "big")
    return base62(n)[:length]

Two problems, and you raise both yourself. First, truncating a hash to 7 characters means collisions are not hypothetical. Birthday math says you'll hit your first 50% collision chance well before you exhaust the keyspace, because you threw away most of the hash. Second, hashing the same URL gives the same code, so two users shortening example.com get the same link and your click analytics merge into one bucket. Sometimes you want that. Usually you don't.

Counter plus base62 (the one that scales)

Keep a global monotonic counter. Each new URL gets the next integer. Base62-encode the integer. No hashing, no collisions, ever, because integers are unique by construction.

def encode_id(counter_value: int) -> str:
    return base62(counter_value)

# id 1            -> "1"
# id 1_000_000    -> "4c92"
# id 56_800_235   -> "djYUR"

62^7 is about 3.5 trillion codes, which covers a decade of growth. The catch is the counter itself becoming a single point of contention at 1,160 writes/sec. You don't want every write fighting over one row's AUTO_INCREMENT.

Fix it with ranges. A coordination service (ZooKeeper, etcd, or a single dedicated table) hands each application server a block of 1,000 IDs at a time. The server burns through its block locally, then asks for the next one. Contention drops by 1,000x. The cost is that crashed servers leak their unused range, so your codes have gaps. Gaps are fine. Nobody promised dense codes.

Pre-generated key pool

A separate service generates random unused 7-char keys offline, stores them in a "available keys" table, and the write path just pops one. Generation and assignment are decoupled, so the write path never computes anything. The tradeoff is an extra service to operate and a table to keep topped up.

For the interview, lead with counter-plus-base62 and ranges. It's the cleanest "no collisions by construction" story, and it shows you understand the difference between avoiding collisions and detecting them.

Collisions: how you handle them tells the level

If you went with hashing, you must handle collisions, and how you do it separates a mid-level answer from a senior one.

The naive version: generate a code, INSERT, catch the unique-constraint violation, regenerate, retry. That works but it does a read-then-write race unless your uniqueness lives in the database constraint. Put a UNIQUE index on the code column and let the database be the arbiter. Never check-then-insert in application code; two requests will both pass the check and one will overwrite the other.

CREATE TABLE urls (
    code        VARCHAR(7) PRIMARY KEY,
    long_url    TEXT NOT NULL,
    created_at  TIMESTAMP DEFAULT now()
);
-- code is the PK, so a dup insert fails atomically.
-- App retries with a new code on conflict.

The senior move is to say: "I'd rather not detect collisions at all. Counter-based IDs make them impossible, so I'd spend my reliability budget elsewhere." That reframes the question from "how do you recover" to "how do you avoid the failure mode entirely." Interviewers remember that.

The read path is the whole game

116K reads/sec, climbing on viral links. Every read is a code lookup followed by a 301/302 redirect. This is where the read-heavy ratio you stated earlier pays off.

Cache aggressively. The access pattern is power-law: a small set of codes gets the overwhelming majority of clicks. An LRU cache in Redis fronting the database absorbs almost all of that traffic.

def resolve(code: str) -> str | None:
    cached = redis.get(f"u:{code}")
    if cached is not None:
        return cached
    row = db.fetch_one(
        "SELECT long_url FROM urls WHERE code = %s",
        (code,),
    )
    if row is None:
        return None
    # cache for an hour; links rarely change
    redis.set(f"u:{code}", row.long_url, ex=3600)
    return row.long_url

Two interview follow-ups live here. First: 301 versus 302. A 301 (permanent) lets browsers and proxies cache the redirect, which slashes your read load, but you lose per-click analytics because repeat clicks never reach your server. A 302 (temporary) sends every click through you, so you keep analytics but eat the full read volume. The choice depends on whether analytics matter more than load. Name the tradeoff; don't just pick one.

Second: cache misses on a code that doesn't exist. If someone hammers random codes, every miss falls through to the database. That's a cache-penetration attack. Cache the negative result (a short TTL on "not found") or front it with a bloom filter of known codes so misses get rejected before they touch storage.

Custom aliases and the namespace collision

The interviewer will ask: "What if a user wants bit.ly/black-friday?" Now you have two code spaces fighting for the same table.

Custom aliases are user-chosen, so they can collide with each other and with generated codes. The clean answer is one namespace with a UNIQUE constraint and an is_custom flag. A custom-alias request tries to claim the code; if the insert fails on the unique index, you return "alias taken" instead of retrying. Generated codes never collide because the counter guarantees uniqueness, but a generated code could in theory land on a string a user wanted. Keep custom aliases in a length range generated codes won't reach (say, 8+ characters, or a reserved prefix) so the two spaces never overlap.

Reserve a blocklist too. You do not want a user claiming admin, api, login, or anything that shadows your own routes.

Analytics without slowing the redirect

The last edge: "How do you count clicks?" The wrong answer is incrementing a row on every redirect. That puts a write on your hottest read path and turns a 1ms redirect into a database round trip.

Decouple it. The redirect path fires an event and returns immediately. A separate pipeline consumes those events.

def redirect(code: str):
    long_url = resolve(code)
    if long_url is None:
        return Response(status=404)
    # fire-and-forget; do not block the redirect
    event_queue.publish({
        "code": code,
        "ts": time.time(),
        "ua": request.headers.get("user-agent"),
        "ref": request.headers.get("referer"),
    })
    return redirect_response(long_url, status=302)

The events land in Kafka (or Kinesis, or SQS), get aggregated downstream, and roll up into per-code counts on a schedule. The redirect never waits for the count. If exact real-time click counts matter, that's a different system and you'd say so, but for a shortener "approximately N clicks, updated every minute" is what everyone actually ships.

This is also where the 301 decision comes back to bite. If you serve permanent redirects, browsers cache them and you stop seeing repeat clicks entirely. You cannot have both maximum cacheability and complete analytics. Pick one per use case.

The 90-second answer (rehearse this one)

When they say "design a URL shortener," start talking before you draw:

"Two operations: shorten and redirect. It's read-heavy, roughly 100:1, so the read path drives the design. At 100M new URLs a day that's about 1,200 writes/sec and over 100K reads/sec at peak, and ten years of links is hundreds of terabytes, so the store shards and the read path caches.

For codes I'd use a counter plus base62, not hashing. Each app server leases a block of 1,000 IDs from a coordination service and encodes them, so there are no collisions by construction and no contention on a single counter. A unique constraint on the code column is the safety net.

Read path: Redis LRU in front of a sharded key-value store, keyed by code. Power-law access means the cache absorbs almost everything. I'd cache negative lookups too, to kill cache-penetration on random codes.

Redirect is a 302 if I need per-click analytics, 301 if I'd rather let browsers cache it and shed load. Analytics go through a fire-and-forget event to Kafka, aggregated offline, so counting never slows the redirect.

Custom aliases share one namespace with a unique index and a reserved length range so they can't collide with generated codes, plus a blocklist for reserved words.

Failure modes: counter ranges leak on crash, which just leaves gaps in codes; cache misses on bad codes need negative caching or a bloom filter; and 301 versus 302 is a real tradeoff between load and analytics, not a default."

That's the whole thing in 90 seconds. It hits key generation, collisions, caching, redirect semantics, custom aliases, and analytics, and it names the tradeoffs instead of hiding them. The interviewer can push on any one of those for the next forty minutes and you have a coherent position to defend.

The follow-ups that decide the loop

A few questions show up almost every time. Have an answer ready:

"Why not just use the database auto-increment as the ID?" You can, but a single auto-increment column is a write bottleneck and it ties your code space to one database. Range-leasing decouples ID generation from the row insert.
"How do you delete or expire links?" Add a TTL column, run a background sweeper, and evict the cache entry on delete. Don't reuse expired codes unless you've thought hard about someone bookmarking the old target.
"How do you prevent your shortener from redirecting to malware?" Check submitted URLs against a safe-browsing list at write time, and re-check periodically since a benign URL can turn hostile later. This is a real product concern, not a footnote.
"What's the cost?" Most of your spend is the read tier (cache nodes plus a sharded store), not the write path. Estimate the cache footprint from your hot-key set, not the full keyspace.

What's the worst shortener bug you've watched ship? The one where someone hashed the URL, truncated to four characters, and discovered collisions in the first week of production? Drop it in the comments. The folklore is half the reason these threads are worth reading.

If this was useful

The thing that wins this question isn't the diagram, it's the layered story: read/write ratio first, then key generation, then the cache, then the redirect semantics, then analytics, each with its tradeoff named out loud. The System Design Pocket Guide: Interviews walks through 15 of these designs end to end, including the key-generation patterns and the read-heavy caching tricks that turn "design X" from a scramble into a script you can recite under pressure.