Designing a Content Delivery Network (CDN) is a very popular system design interview question. It tests your understanding of distributed systems, caching, networking, scalability, and trade-offs.
Let’s break it down slowly and clearly.
Why Do We Even Need a CDN?
Imagine this. Your backend server is deployed in one data center in the US.
Now your users are:
- In India
- In Europe
- In Australia
- In South America
Every time someone loads your website:
- The request travels thousands of kilometers.
- The server processes it.
- The response travels all the way back.
That distance causes:
- High latency
- Slow page loads
- Buffering videos
- Poor user experience
Now multiply that by millions of users.
Clearly, this won’t scale well.
So the idea is simple:
Instead of moving users closer to the server, move the content closer to the users.
That’s what a CDN does.
What Is a CDN?
A Content Delivery Network (CDN) is a distributed network of servers deployed across multiple geographic locations.
These servers:
- Cache content
- Serve users from nearby locations
- Reduce load on the main server (origin)
Think of it like:
Instead of one big supermarket in one city, we open smaller stores in every city. People buy from the nearest store.
Step 1: Clarify Requirements
Before drawing the architecture, always clarify the requirements.
Functional Requirements
- Serve static content (images, CSS, JS, videos).
- Reduce latency for global users.
- Cache content at multiple geographic locations.
- Fetch content from origin on cache miss.
- Support cache invalidation.
Optional:
- Video streaming support
- Analytics (hit/miss ratio)
- DDoS protection
- TLS termination
Non-Functional Requirements
- Very low latency (milliseconds)
- High availability
- Massive scalability (billions of requests)
- Fault tolerance
- Cost efficiency
Step 2: High-Level Architecture
Here’s the overall idea:
User → DNS → Nearest Edge Server
Edge checks: cache hit?
- Yes → Return content
- No → Fetch from Origin → Cache → Return
Now let’s understand each component properly.
Step 3: Core Components Explained
DNS Routing
When a user enters www.example.com, DNS does something smart.
Instead of returning the IP of the origin server, it returns the IP of the closest edge server.
How does it decide?
- Geo-based routing (based on user location)
- Latency-based routing (based on network speed)
- Anycast routing (same IP announced globally)
The goal:
Send the user to the nearest server
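Geo-based routing can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical table of PoP coordinates; real DNS resolvers use GeoIP databases and live latency measurements rather than raw distance.

```python
import math

# Hypothetical PoP locations (name -> (lat, lon)); purely illustrative.
POPS = {
    "us-east": (38.9, -77.0),
    "eu-west": (53.3, -6.3),
    "ap-south": (19.1, 72.9),
}

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(h))

def nearest_pop(user_location):
    """Return the PoP closest to the user's (lat, lon)."""
    return min(POPS, key=lambda name: haversine_km(user_location, POPS[name]))

# A user in Mumbai gets routed to the ap-south PoP.
print(nearest_pop((18.97, 72.83)))  # → ap-south
```

Latency-based routing replaces the distance function with measured round-trip times; anycast skips this logic entirely and lets BGP pick the nearest network path.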
Edge Servers (PoPs)
PoP = Point of Presence.
These are servers deployed in:
- Different cities
- Different countries
- Different continents
Each PoP:
- Stores cached content
- Serves users directly
- Reduces origin traffic
Now two cases happen.
Case 1: Cache Hit
The requested content is already stored in that edge server.
So:
Edge → Immediately returns content
This is fast.
Latency becomes extremely low.
Case 2: Cache Miss
The content is not available at the edge.
So:
- Edge requests content from origin.
- Origin sends content.
- Edge stores it locally.
- Edge returns it to the user.
The next user in that region gets it instantly.
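The hit/miss flow above can be sketched as a tiny edge server. This is a minimal sketch: `fetch_from_origin` stands in for a real HTTP call to the origin, and the cache is just a dict.

```python
class EdgeServer:
    """Sketch of an edge server's request path (hit vs. miss)."""
    def __init__(self, fetch_from_origin):
        self.cache = {}
        self.fetch_from_origin = fetch_from_origin

    def get(self, url):
        if url in self.cache:                  # Case 1: cache hit
            return self.cache[url], "HIT"
        content = self.fetch_from_origin(url)  # Case 2: cache miss
        self.cache[url] = content              # store locally for the next user
        return content, "MISS"

edge = EdgeServer(lambda url: f"<content of {url}>")
print(edge.get("/logo.png"))  # ('<content of /logo.png>', 'MISS')
print(edge.get("/logo.png"))  # ('<content of /logo.png>', 'HIT')
```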
Step 4: How Caching Works
Caching is the heart of a CDN. Without caching, a CDN has no meaning.
TTL (Time To Live)
Each cached object has a TTL.
Example:
- TTL = 1 hour
- For 1 hour, edge serves cached version.
- After expiry, edge fetches fresh version.
Trade-off:
- Long TTL → Better performance, but risk stale content
- Short TTL → Fresh content, but more load on origin
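TTL-based expiry boils down to one timestamp comparison. A minimal sketch, with an explicit `now` parameter so the behavior is easy to test; real edges read TTLs from `Cache-Control`/`Expires` headers rather than a fixed constructor argument.

```python
import time

class TTLCache:
    """Entries are valid for `ttl` seconds after being stored."""
    def __init__(self, ttl=3600):
        self.ttl = ttl
        self.store = {}  # key -> (value, stored_at)

    def put(self, key, value, now=None):
        self.store[key] = (value, time.time() if now is None else now)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self.store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if now - stored_at > self.ttl:  # expired: force a fresh fetch
            del self.store[key]
            return None
        return value

cache = TTLCache(ttl=3600)
cache.put("/style.css", "body{}", now=0)
print(cache.get("/style.css", now=1800))  # within TTL → body{}
print(cache.get("/style.css", now=7200))  # past TTL → None
```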
Cache Eviction Policies
Edge servers have limited storage.
When the cache fills up, old content must be evicted.
Common policies:
- LRU (Least Recently Used)
- LFU (Least Frequently Used)
Most systems use LRU because it’s simple and effective.
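LRU is simple enough to sketch with an ordered dict: every access moves an entry to the "recent" end, and eviction drops the other end. A minimal sketch, not a production cache (no locking, no size-in-bytes accounting).

```python
from collections import OrderedDict

class LRUCache:
    """LRU eviction: when full, drop the least recently used entry."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now most recently used
cache.put("c", 3)      # cache is full → evicts "b"
print(cache.get("b"))  # → None
```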
Step 5: Scaling the CDN
Let’s assume:
- 1 billion requests per day
- 95% cache hit ratio
That means:
Only 5% of traffic goes to the origin. That's a massive load reduction.
To scale:
- Add more PoPs globally
- Horizontal scaling inside PoPs
- Use load balancers within edge clusters
- Use consistent hashing to distribute traffic
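Consistent hashing can be sketched as a hash ring: each server is hashed to many points on a ring, and a request key maps to the next server clockwise. The server names and vnode count here are illustrative; production CDNs tune both.

```python
import hashlib
from bisect import bisect

class HashRing:
    """Consistent-hashing sketch: map each key to a server on a ring.
    Virtual nodes (vnodes) smooth out the distribution."""
    def __init__(self, servers, vnodes=100):
        self.ring = sorted(
            (self._hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, key):
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect(self.keys, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["edge-1", "edge-2", "edge-3"])
print(ring.server_for("/video/123.mp4"))  # same key always maps to the same server
```

The payoff: when a server is added or removed, only the keys adjacent to its ring positions move, instead of nearly everything (as with naive `hash(key) % n`).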
CDNs scale horizontally. Never vertically.
Step 6: Handling Failures
What if:
An edge server crashes? A whole region goes down?
Solutions:
- Health checks
- Automatic failover
- Traffic rerouting
- Multi-region redundancy
Users should not even notice failures.
High availability is critical.
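Failover logic can be as simple as walking a preference list of edges and skipping unhealthy ones. A minimal sketch: `is_healthy` stands in for a real health-check probe, and the edge names are hypothetical.

```python
def pick_edge(edges_by_preference, is_healthy):
    """Return the first healthy edge, falling back down the list."""
    for edge in edges_by_preference:
        if is_healthy(edge):
            return edge
    raise RuntimeError("no healthy edge available")

down = {"edge-mumbai"}  # suppose the nearest PoP just failed its health check
choice = pick_edge(
    ["edge-mumbai", "edge-singapore", "edge-frankfurt"],
    is_healthy=lambda e: e not in down,
)
print(choice)  # → edge-singapore (nearest healthy fallback)
```

In practice this decision is made by DNS or anycast routing, so the user never sees the failed PoP at all.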
Step 7: Cache Invalidation
If content changes at origin, how do we update it everywhere?
Two approaches:
- TTL-based expiration
- Active purge (invalidate via API)
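An active purge amounts to telling every edge to drop a path so the next request is a miss. A toy sketch, with edge caches modeled as dicts; a real purge API must reach hundreds of PoPs reliably and handle ones that are temporarily unreachable.

```python
def purge(edges, path):
    """Remove `path` from every edge cache; the next request becomes a miss."""
    for cache in edges.values():
        cache.pop(path, None)  # no-op if the edge never cached it

edges = {
    "us-east": {"/logo.png": "v1"},
    "eu-west": {"/logo.png": "v1"},
}
purge(edges, "/logo.png")
print(all("/logo.png" not in c for c in edges.values()))  # → True
```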
This is hard because:
- Data is distributed globally
- Consistency becomes challenging
- You must avoid serving stale content
There is no perfect solution. It’s always a trade-off.

