shubham pandey (Connoisseur)
A System Design Deep Dive — Question by Question

Introduction

A URL shortener seems deceptively simple — take a long URL, return a short one. But at scale, it hides some of the most fascinating distributed systems challenges in software engineering. This post walks through the real complexity, challenge by challenge.


Challenge 1: Scaling Under Heavy Traffic

Interview Question: When millions of users are simultaneously shortening URLs and millions more are clicking short links — how do you ensure the system stays fast and doesn't become a bottleneck?

The naive approach is a single server handling everything. The moment traffic spikes, you hit a wall. The fix is horizontal scaling — a load balancer distributes incoming requests across multiple application servers. But this raises an immediate follow-up: what about the database?
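
The routing idea behind a load balancer can be sketched in a few lines. This is a toy round-robin dispatcher, not a real load balancer — the server names are illustrative, and production systems use nginx, HAProxy, or a cloud LB:

```python
from itertools import cycle

# Hypothetical pool of application servers behind the load balancer.
SERVERS = ["app-1:8080", "app-2:8080", "app-3:8080"]

class RoundRobinBalancer:
    """Distributes incoming requests evenly across app servers."""
    def __init__(self, servers):
        self._pool = cycle(servers)

    def route(self):
        # Each request goes to the next server in rotation.
        return next(self._pool)

lb = RoundRobinBalancer(SERVERS)
targets = [lb.route() for _ in range(6)]  # six requests, spread evenly
```

Adding capacity is then just adding entries to the pool — no single server is in the hot path for every request.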

Key Insight: Horizontal scaling solves app-layer pressure, but the database becomes the next bottleneck if left as a single instance.


Challenge 2: The Read/Write Imbalance

Interview Question: If all application servers point to one single database for both reads and writes — what happens under heavy read traffic? Redirects outnumber URL creation by roughly 100:1.

A URL shortener is an extremely read-heavy system. For every person shortening a URL, roughly 100 people are clicking it. A single database will buckle under that read pressure. The solution is to treat reads and writes differently. Most reads are for the same popular URLs repeatedly — which is exactly what caching is built for.

Key Insight: Reads and writes have fundamentally different patterns and must be architected independently. Caching is the most powerful lever for read-heavy systems.


Challenge 3: Cache Misses and the Cold Start Problem

Interview Question: Your Redis cache is cold. You have 500 million unique short URLs — you can't cache all of them. What stays in cache, and what happens when a miss falls through to the database?

Even with caching, misses happen. Every miss hits the database. The database needs to be horizontally scalable too — which is why NoSQL databases like Cassandra or DynamoDB are popular here. They are designed to scale out across many nodes, handling reads across distributed partitions.
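
The read path described above is the classic cache-aside pattern. Here is a minimal in-process sketch — the dicts stand in for Redis and for a distributed store like Cassandra or DynamoDB:

```python
# Toy cache-aside lookup: serve hot redirects from cache,
# fall through to the database only on a miss.
cache = {}                                                   # stands in for Redis
database = {"abc123": "https://example.com/very/long/url"}   # stands in for Cassandra/DynamoDB
db_reads = 0

def resolve(short_code):
    global db_reads
    if short_code in cache:          # cache hit: no database work at all
        return cache[short_code]
    db_reads += 1                    # cache miss: exactly one database read
    long_url = database.get(short_code)
    if long_url is not None:
        cache[short_code] = long_url  # populate cache for subsequent reads
    return long_url

first = resolve("abc123")    # miss -> hits the database
second = resolve("abc123")   # hit  -> served from cache
```

With a 100:1 read-to-write ratio, warming the cache on first read means the overwhelming majority of redirects never touch the database.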

Key Insight: NoSQL provides horizontal scalability at the storage layer, acting as the safety net for cache misses at any scale.


Challenge 4: Choosing the Right Cache Eviction Strategy

Interview Question: Your cache is full. A new URL needs space. Which entry do you evict — and does your algorithm reflect real-world URL access patterns?

Each strategy alone falls short:

FIFO — evicts the oldest entry, ignores popularity and recency entirely
LFU — a viral URL from 3 months ago that is now dead stays in cache forever
LRU — a URL accessed 1M times but not hit in 2 hours gets evicted over a rarely accessed recent one

The optimal strategy combines both frequency and recency — evict the entry that is infrequently accessed AND hasn't been accessed recently. This is the principle behind W-TinyLFU, the admission and eviction policy used by the Caffeine caching library; Redis ships a related but simpler approximated-LFU eviction mode.
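
The frequency-plus-recency idea can be illustrated with a toy cache. This is not W-TinyLFU itself (which uses probabilistic sketches and an admission window) — just a minimal demonstration of evicting the entry that is both infrequent and stale:

```python
class HybridCache:
    """Toy eviction combining frequency and recency (NOT real W-TinyLFU).
    On overflow, evict the entry with the lowest access count,
    breaking ties by least-recent access."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}    # key -> value
        self.freq = {}    # key -> access count
        self.last = {}    # key -> logical access time
        self.clock = 0

    def _touch(self, key):
        self.clock += 1
        self.freq[key] = self.freq.get(key, 0) + 1
        self.last[key] = self.clock

    def get(self, key):
        if key in self.data:
            self._touch(key)
            return self.data[key]
        return None

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # Victim = infrequently accessed AND not accessed recently.
            victim = min(self.data, key=lambda k: (self.freq[k], self.last[k]))
            for d in (self.data, self.freq, self.last):
                del d[victim]
        self.data[key] = value
        self._touch(key)

hybrid = HybridCache(capacity=2)
hybrid.put("a", "url-a"); hybrid.put("b", "url-b")
hybrid.get("a"); hybrid.get("a")   # "a" is now both frequent and recent
hybrid.put("c", "url-c")           # evicts "b": infrequent and least recent
```

Pure LRU would have kept "b" (more recent than nothing) only by accident of timing; pure LFU would never let a dead-but-once-viral entry go. Scoring on both dimensions avoids each failure mode.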

Key Insight: W-TinyLFU (hybrid LFU + LRU) is the gold standard for cache eviction, combining frequency and recency for smarter decisions.


Challenge 5: Unique ID Generation Across Distributed Nodes

Interview Question: Multiple application servers generate short codes simultaneously. How do you ensure no two servers generate the same short code for different URLs?

A central auto-increment counter seems obvious — but it becomes a single point of failure. Master-slave replication helps with availability, but async replication risks duplicate IDs being issued after a failover.

Follow-up: Can you design the system so each node generates IDs independently without coordinating on every request?

The elegant solution is Range-Based ID Allocation. The counter service hands each node a range (e.g., Node A gets 1–1000, Node B gets 1001–2000). Each node generates IDs independently from its range. When a node exhausts its range, it requests a new batch. Counter service is called infrequently — not in the hot path. If the counter service goes down briefly, nodes keep generating from their existing range. ID gaps don't matter — short codes are opaque to users anyway.
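
The scheme above can be sketched end-to-end: a counter service hands out disjoint ranges, each node consumes its range locally, and numeric IDs become opaque short codes via base62. Class and method names here are illustrative, not a real API:

```python
import string
import threading

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # base62

def encode62(n):
    """Turn a numeric ID into an opaque short code."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

class CounterService:
    """Hypothetical central service handing out disjoint ID ranges."""
    def __init__(self, batch=1000):
        self.next_start = 1
        self.batch = batch
        self.lock = threading.Lock()

    def allocate(self):
        with self.lock:  # the only coordinated step — and it is off the hot path
            start = self.next_start
            self.next_start += self.batch
            return start, start + self.batch - 1

class Node:
    """App server generating IDs locally from its current range."""
    def __init__(self, counter):
        self.counter = counter
        self.cur, self.end = counter.allocate()

    def next_id(self):
        if self.cur > self.end:                            # range exhausted:
            self.cur, self.end = self.counter.allocate()   # fetch a new batch
        n = self.cur
        self.cur += 1
        return n

svc = CounterService(batch=3)
node_a, node_b = Node(svc), Node(svc)
ids = [node_a.next_id() for _ in range(4)] + [node_b.next_id() for _ in range(2)]
# Node A consumes 1-3 then fetches a fresh batch; Node B works from 4-6.
# Ranges never overlap, so no two nodes can ever emit the same ID.
```

Note the gap in Node A's sequence when it jumps to a new batch — exactly the harmless gaps the text mentions, since `encode62(id)` is opaque to users anyway.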

Key Insight: Range-based ID allocation decentralizes generation while maintaining global uniqueness — a pattern used in production by many large-scale systems.


Challenge 6: 301 vs 302 Redirects — A Business Decision

Interview Question: 301 is a permanent redirect — browsers cache it, reducing server load. But what does 301 silently break for businesses using your service?

Once a browser caches a 301, it never contacts your servers again for that URL. Analytics die — you cannot track clicks, geography, device type, or referrer. URL updating breaks — if a business wants to change the destination mid-campaign, users with cached 301s will never see the update. 302 ensures every click hits your servers first. Yes, there is a small overhead — but for a service where analytics and flexibility are the core value proposition, 302 is the only sensible choice.
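
The redirect decision itself is small. Here is a framework-free sketch of the handler, with a hypothetical in-memory store and a list standing in for an analytics pipeline:

```python
# Minimal sketch of the 302 redirect path; URL_STORE and clicks are
# illustrative stand-ins for the real database and analytics pipeline.
URL_STORE = {"abc123": "https://example.com/campaign"}  # short code -> destination
clicks = []                                             # analytics sink

def handle_redirect(short_code, request_meta):
    long_url = URL_STORE.get(short_code)
    if long_url is None:
        return 404, {}
    # 302 (Found): the browser comes back on every click, so each one is
    # observable — geography, device, referrer can all be recorded here.
    clicks.append({"code": short_code, **request_meta})
    return 302, {"Location": long_url}

status, headers = handle_redirect("abc123", {"referrer": "twitter.com"})
```

With a 301 the browser would skip this handler on repeat visits, and both the analytics append and any later change to `URL_STORE["abc123"]` would be invisible to that user.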

Key Insight: 302 preserves analytics and URL mutability — essential for businesses running campaigns. The slight latency cost is worth the business value.


Challenge 7: Malicious URL Protection

Interview Question: A bad actor shortens a phishing URL. Millions of users click it. How do you protect users — and what about URLs that were clean when shortened but become malicious later?

A purely reactive approach leaves a dangerous time window. The right approach is layered defense:

Creation-time check — screen against a third-party malicious URL database such as the Google Safe Browsing API before accepting the URL
Periodic re-scanning — re-check existing URLs regularly, since clean URLs can turn malicious later
Reactive blocking — user reports plus team verification as the final safety net
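
The three layers compose naturally in code. This sketch uses a hypothetical blocklist check in place of a real external service like Safe Browsing; all names here are illustrative:

```python
# Layered defense sketch: the blocklist stands in for a third-party
# malicious-URL database; DISABLED holds short codes taken out of service.
BLOCKLIST = {"evil.example"}   # hypothetical known-bad hosts
DISABLED = set()

def looks_malicious(url):
    return any(host in url for host in BLOCKLIST)

def shorten(url, code):
    if looks_malicious(url):           # layer 1: reject at creation time
        raise ValueError("rejected: URL flagged as malicious")
    return code

def rescan(code, url):
    if looks_malicious(url):           # layer 2: periodic re-check catches
        DISABLED.add(code)             # URLs that turned malicious later

def report(code):
    DISABLED.add(code)                 # layer 3: reactive blocking after review
```

No layer is sufficient alone: layer 1 misses URLs that turn bad later, layer 2 runs on a schedule and can lag, and layer 3 only fires after users are already exposed. Together they shrink the window at each stage.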

Key Insight: Defense in depth shrinks the harmful time window dramatically. No single layer is enough on its own.


Challenge 8: The Thundering Herd / Cache Stampede

Interview Question: A celebrity tweets your short URL to 50 million followers simultaneously. The URL was just created — cache is cold. What happens to your database at that exact moment?

This is not a cold start problem — it is a Cache Stampede. Cold start means cache is empty and traffic arrives gradually so the database warms up slowly. Cache stampede means cache is empty AND millions hit simultaneously in one instant — the database gets obliterated in one shot.

Follow-up: How do you make only one request go to the database and make the rest wait for that result?

The solution is Cache Locking. The first request misses cache, sets an IN_PROGRESS flag in Redis, then goes to the database. All subsequent requests see IN_PROGRESS and wait. The first request returns, populates the cache, removes the flag, and all waiting requests are served from cache instantly. One database hit instead of one million.

Follow-up: What if the first request crashes after setting the flag but before populating the cache?

Give the IN_PROGRESS flag a TTL equal to the expected database response time plus a small buffer — for example 600ms if your DB responds in 500ms. If the request crashes, the flag expires automatically. No manual cleanup, no deadlocks.
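
The whole mechanism can be demonstrated in-process. This sketch replaces Redis's atomic flag (SET with NX and a TTL) with a mutex-guarded dict and a `threading.Event`, so the leader/follower behavior is visible without a Redis server — a simulation of the pattern, not production code:

```python
import threading
import time

cache = {}
db_calls = 0
flag_lock = threading.Lock()
in_progress = {}   # short_code -> Event; stands in for the Redis IN_PROGRESS flag

def slow_db_lookup(code):
    global db_calls
    db_calls += 1
    time.sleep(0.05)               # simulate database latency
    return "https://example.com/long"

def resolve(code):
    if code in cache:
        return cache[code]
    with flag_lock:
        if code in cache:          # re-check: leader may have just finished
            return cache[code]
        event = in_progress.get(code)
        if event is None:          # first request: set the flag, go to the DB
            event = threading.Event()
            in_progress[code] = event
            leader = True
        else:
            leader = False
    if leader:
        cache[code] = slow_db_lookup(code)
        with flag_lock:
            del in_progress[code]  # clear the flag (a TTL covers a crash here)
        event.set()
    else:
        event.wait(timeout=1.0)    # followers wait for the leader's result
    return cache[code]

threads = [threading.Thread(target=resolve, args=("abc",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# 20 simultaneous requests for a cold key -> a single database hit.
```

In the real Redis version the flag's TTL replaces the crash-cleanup comment above: if the leader dies mid-lookup, the flag expires on its own and the next request becomes the new leader.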

Key Insight: Cache locking with TTL-based expiry prevents thundering herd without any risk of deadlock — Facebook's memcache "leases" are a well-known production variant of this pattern.


Summary

App layer scaling — Horizontal scaling via load balancer
Database scaling — NoSQL with horizontal sharding
Cache eviction — W-TinyLFU hybrid combining frequency and recency
Unique ID generation — Range-based allocation across nodes
Counter availability — Infrequent calls plus node breathing room
Redirect strategy — 302 for analytics and URL flexibility
Malicious URLs — third-party scanning plus periodic recheck plus reactive blocking
Cache stampede — Cache locking with TTL-based expiry


Final Thoughts

A URL shortener is one of the most deceptively deep system design problems. On the surface it is a key-value store. Underneath it forces you to confront horizontal scaling, caching theory, distributed ID generation, HTTP semantics, security, and concurrency all at once. The most important skill is not knowing the answers — it is questioning your own assumptions. Every solution reveals a new edge case. That iterative thinking is what separates good system design from great system design.

Happy building.
