System design interviews are often the biggest differentiator between mid-level and senior engineering roles. They test whether you can reason about a system holistically: scalability, reliability, trade-offs, and real-world constraints.
The problem is that many engineers study by memorizing specific designs (URL shortener, chat app, and so on) without understanding the underlying patterns. When they get a question they haven't seen, they freeze.
This guide takes a different approach. It teaches you a repeatable framework and the core building blocks, so you can design any system on the spot.
The Framework: How to Structure Your Answer
Structure every system design answer this way. Internalize it.
Step 1: Clarify Requirements (3-5 minutes)
Before designing anything, ask questions. This shows maturity and prevents wasted effort.
Functional requirements:
- What are the core features?
- Who are the users?
- What are the inputs/outputs?
Non-functional requirements:
- What's the expected scale? (users, requests/sec, data volume)
- What are the latency requirements?
- Is availability or consistency more important?
- What's the read/write ratio?
Example for "Design Twitter":
Functional:
- Post tweets (text, 280 chars)
- Follow/unfollow users
- View home timeline (tweets from followed users)
- Search tweets
Non-functional:
- 500M users, 200M DAU
- ~600 tweets/sec writes, ~600K reads/sec
- Timeline latency < 200ms
- Availability > consistency (eventual consistency OK)
- Read-heavy: ~1000:1 read/write ratio
Step 2: High-Level Design (5-10 minutes)
Draw the major components and how data flows between them:
```
Client → Load Balancer → API Gateway → Services
                                          │
                  ┌───────────────────────┼───────────────────────┐
                  ▼                       ▼                       ▼
            Tweet Service         Timeline Service          User Service
                  │                       │                       │
                  ▼                       ▼                       ▼
              Tweet DB               Cache Layer               User DB
                  │               (Timeline Cache)
                  ▼
            Message Queue
                  │
                  ▼
           Fan-out Service
```
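The fan-out path in the diagram can be sketched as "fan-out on write": when a tweet is posted, a worker pushes its ID onto each follower's cached timeline, so reading a timeline is a single cache lookup. This is a minimal sketch under stated assumptions: in-memory dicts stand in for the Tweet DB, follower graph, and Timeline Cache, a `deque` stands in for the message queue, and the names `post_tweet`, `fan_out`, `read_timeline`, and `MAX_TIMELINE` are hypothetical.

```python
from collections import defaultdict, deque

MAX_TIMELINE = 800  # hypothetical cap on cached timeline length per user

tweets: dict[int, str] = {}                        # stand-in for Tweet DB
followers: dict[int, set[int]] = defaultdict(set)  # author_id -> follower ids
# stand-in for the Timeline Cache; a full deque drops its oldest entry
timeline_cache: dict[int, deque] = defaultdict(lambda: deque(maxlen=MAX_TIMELINE))
queue: deque = deque()                             # stand-in for the message queue


def post_tweet(tweet_id: int, author_id: int, text: str) -> None:
    """Tweet Service: persist the tweet, then enqueue a fan-out job."""
    tweets[tweet_id] = text
    queue.append((tweet_id, author_id))


def fan_out() -> None:
    """Fan-out Service: drain the queue, pushing tweet IDs onto follower timelines."""
    while queue:
        tweet_id, author_id = queue.popleft()
        for follower_id in followers[author_id]:
            timeline_cache[follower_id].appendleft(tweet_id)  # newest first


def read_timeline(user_id: int, limit: int = 20) -> list[str]:
    """Timeline Service: read precomputed IDs from cache, hydrate from the tweet store."""
    ids = list(timeline_cache[user_id])[:limit]
    return [tweets[i] for i in ids]
```

The trade-off: reads become cheap, but a user with millions of followers turns one write into millions of cache updates. A common refinement is to skip fan-out for very large accounts and merge their tweets in at read time.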
Step 3: Deep Dive (15-20 minutes)
Pick the most critical components and design them in detail. The interviewer will guide you, but be prepared to dive into:
- Database schema and choice
- API design
- Scaling strategy
- Caching approach
- Failure handling
Step 4: Trade-offs and Bottlenecks (5 minutes)
Discuss what could break, what you'd monitor, and alternative approaches.
Core Building Blocks You Must Know
These are the Lego pieces of system design. Learn these deeply, and you can assemble any system.
1. Load Balancing
Distributes traffic across servers. Know the algorithms:
| Algorithm | When to Use |
|---|---|
| Round Robin | Equal server capacity, stateless services |
| Weighted Round Robin | Mixed server capacities |
| Least Connections | Long-lived connections (WebSocket) |
| IP Hash | Session affinity needs |
| Consistent Hashing | Distributed caches, database sharding |
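Key point: L4 (TCP) vs. L7 (HTTP) load balancing. An L7 balancer parses each request, so it can route on content (URL path, headers, cookies) but adds latency. An L4 balancer routes on IP address and port only: faster, but it cannot inspect the request.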
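To make the first three algorithms in the table concrete, here is a minimal sketch of each as a backend picker. The server names and weights are hypothetical, and the weighted variant naively repeats each server by its weight; production balancers typically interleave weighted picks more smoothly.

```python
import itertools


class RoundRobin:
    """Cycle through servers in order; assumes equal capacity."""

    def __init__(self, servers: list[str]):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class WeightedRoundRobin(RoundRobin):
    """Naive weighting: a server with weight 3 appears 3 times per cycle."""

    def __init__(self, weights: dict[str, int]):
        expanded = [server for server, w in weights.items() for _ in range(w)]
        super().__init__(expanded)


class LeastConnections:
    """Send each new connection to the server with the fewest open ones."""

    def __init__(self, servers: list[str]):
        self.active = {server: 0 for server in servers}

    def pick(self) -> str:
        # min() returns the first server listed when counts are tied
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1  # call when a connection closes
```

Least connections needs the `release` callback, which is why it suits long-lived connections like WebSockets: it tracks actual load rather than assuming every request costs the same.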
2. Caching
Caching comes up in nearly every system design answer. Know the patterns:
Cache-Aside (Lazy Loading):
1. App checks cache
2. Cache miss → read from DB
3. Write result to cache
4. Return to client
Write-Through:
1. App writes to cache
2. Cache synchronously writes to DB
3. Return to client
Write-Behind (Write-Back):
1. App writes to cache
2. Cache async writes to DB (batched)
3. Return to client immediately
When to use what:
- Cache-aside: Default choice. Works for read-heavy workloads.
- Write-through: When you can't afford cache misses on recently written data.
- Write-behind: High write throughput, when you can tolerate losing writes that haven't been flushed to the DB yet.
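The three patterns differ mainly in who writes to the database and when. In this sketch, plain dicts stand in for the cache and the database and a list serves as the write-behind buffer; all names are hypothetical. The cache-aside write path (update the DB, then delete the cache entry) is a common convention added here for completeness.

```python
from typing import Optional


class Store:
    """Toy cache + database pair illustrating the three patterns."""

    def __init__(self):
        self.cache: dict[str, str] = {}
        self.db: dict[str, str] = {}
        self.pending: list[tuple[str, str]] = []  # write-behind buffer

    # Cache-aside: the application manages the cache itself.
    def read_cache_aside(self, key: str) -> Optional[str]:
        if key in self.cache:            # 1. check cache
            return self.cache[key]
        value = self.db.get(key)         # 2. miss -> read from DB
        if value is not None:
            self.cache[key] = value      # 3. populate cache
        return value                     # 4. return to client

    def write_cache_aside(self, key: str, value: str) -> None:
        self.db[key] = value
        self.cache.pop(key, None)        # invalidate; the next read repopulates

    # Write-through: cache and DB are both updated before returning.
    def write_through(self, key: str, value: str) -> None:
        self.cache[key] = value
        self.db[key] = value

    # Write-behind: return after the cache write; flush to the DB later in batches.
    def write_behind(self, key: str, value: str) -> None:
        self.cache[key] = value
        self.pending.append((key, value))  # lost if the process dies before flush()

    def flush(self) -> None:
        for key, value in self.pending:
            self.db[key] = value
        self.pending.clear()
```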
Cache invalidation strategies:
- TTL (Time-To-Live): Simple; gives eventual consistency. Set the TTL to the maximum staleness you can accept.
- Event-based: Invalidate on write. More complex but fresher data.
- Version tags: Include version in cache key. New version = automatic miss.
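TTL expiry and version-tag invalidation combine naturally in one wrapper. This is a sketch assuming an in-process dict and an injectable clock (the `now` parameter is a hypothetical addition so expiry can be tested); a real deployment would usually rely on the cache server's own expiry, such as Redis `EXPIRE`.

```python
import time
from typing import Any, Callable, Optional


class VersionedTTLCache:
    def __init__(self, ttl_seconds: float, now: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.now = now
        self._data: dict[str, tuple[Any, float]] = {}  # key -> (value, expires_at)
        self.versions: dict[str, int] = {}             # entity -> current version

    def _key(self, entity: str) -> str:
        # The version tag is part of the key, so bumping it makes old keys unreachable.
        return f"{entity}:v{self.versions.get(entity, 0)}"

    def get(self, entity: str) -> Optional[Any]:
        key = self._key(entity)
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.now() >= expires_at:  # TTL: older than the acceptable staleness
            del self._data[key]
            return None
        return value

    def put(self, entity: str, value: Any) -> None:
        self._data[self._key(entity)] = (value, self.now() + self.ttl)

    def invalidate(self, entity: str) -> None:
        # Event-based invalidation via a version bump. Orphaned old-version entries
        # are never read again; a real cache server would evict them via TTL or LRU.
        self.versions[entity] = self.versions.get(entity, 0) + 1
```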
3. Database Selection
| Requirement | Database Type | Examples |
|---|---|---|
| Structured data, ACID | Relational | PostgreSQL, MySQL |
| Flexible schema, high write | Document | MongoDB, DynamoDB |
| Social graphs, relationships | Graph | Neo4j, Amazon Neptune |
| Time-series metrics | Time-series | InfluxDB, TimescaleDB |
| Full-text search | Search engine | Elasticsearch, OpenSearch |
| Session data, leaderboards | Key-Value | Redis, Memcached |
| Wide-column, massive scale | Column-family | Cassandra, HBase |
4. Database Scaling Patterns
Vertical scaling — Bigger machine. Simple but has a ceiling.
Read replicas — Primary handles writes, replicas handle reads. Works for read-heavy workloads.
Sharding — Split data across multiple databases by a shard key.
Shard by user_id:
user_id % 4 = 0 → Shard A
user_id % 4 = 1 → Shard B
user_id % 4 = 2 → Shard C
user_id % 4 = 3 → Shard D
Problems with naive sharding:
- Hot shards (uneven distribution)
- Cross-shard queries are expensive
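- Rebalancing: adding a shard changes the modulus, so most keys move to a different shard
Better: Consistent hashing with virtual nodes. Adding a shard moves only about 1/N of the keys, and the virtual nodes spread load more evenly across shards.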
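The rebalancing advantage is easiest to see by counting how many keys change shards when 4 shards become 5. This sketch uses MD5 from `hashlib` purely as a stable hash (not for security), and 100 virtual nodes per shard is an arbitrary illustrative choice.

```python
import bisect
import hashlib
from typing import Callable


def stable_hash(s: str) -> int:
    # Python's built-in hash() is salted per process, so use a stable digest.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, shards: list[str], vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, shard) pairs
        for shard in shards:
            self.add(shard)

    def add(self, shard: str) -> None:
        # Each shard owns many points on the ring, which evens out its share.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{shard}#{i}"), shard))

    def lookup(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's position.
        idx = bisect.bisect_left(self._ring, (stable_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


def moved_fraction(before: Callable[[str], object],
                   after: Callable[[str], object], keys: list[str]) -> float:
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
modulo_moved = moved_fraction(lambda k: stable_hash(k) % 4,
                              lambda k: stable_hash(k) % 5, keys)
ring4 = ConsistentHashRing(["A", "B", "C", "D"])
ring5 = ConsistentHashRing(["A", "B", "C", "D", "E"])
ring_moved = moved_fraction(ring4.lookup, ring5.lookup, keys)
```

Expect roughly 80% of keys to move under modulo sharding (a key stays only when `h % 4 == h % 5`, which holds for 4 of every 20 hash values) versus roughly 20% under the ring, and every key that moves on the ring lands on the new shard E.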
5. Message Queues
Decouple producers from consumers. Essential for async processing.
Producer → Queue → Consumer
Use cases:
- Order processing (place order → queue → payment → queue → fulfillment)
- Notifications (event → queue → email/push/SMS services)
- Data pipelines (change event → queue → downstream processing)
Key concepts:
- At-least-once delivery (most common)
- Exactly-once semantics (harder; Kafka supports it through idempotent producers and transactions for read-process-write pipelines within Kafka)
- Dead letter queues (failed messages go here)
- Message ordering (per-partition in Kafka)
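At-least-once delivery means a consumer may see the same message twice, so consumers are usually made idempotent, and messages that keep failing are parked in a dead letter queue. This sketch uses an in-memory `deque` as the queue; `MAX_ATTEMPTS`, the message shape (a dict with an `"id"`), and the handler are hypothetical.

```python
from collections import deque
from typing import Callable

MAX_ATTEMPTS = 3  # hypothetical retry budget before dead-lettering


def consume(queue: deque, handler: Callable[[dict], None],
            processed_ids: set, dead_letters: list) -> None:
    """Drain the queue with at-least-once semantics and a dead letter queue."""
    while queue:
        msg = queue.popleft()
        if msg["id"] in processed_ids:
            continue                      # duplicate redelivery: skip (idempotency)
        try:
            handler(msg)
            processed_ids.add(msg["id"])  # "ack" only after success
        except Exception:
            msg["attempts"] = msg.get("attempts", 0) + 1
            if msg["attempts"] >= MAX_ATTEMPTS:
                dead_letters.append(msg)  # park it for investigation
            else:
                queue.append(msg)         # redeliver later
```

In a real system the dedup record and the handler's side effect would need to be committed together (for example, in one database transaction); otherwise a crash between them reintroduces duplicates.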
6. The CAP Theorem (Practical Version)
In a distributed system during a network partition, you must choose:
- CP (Consistency + Partition tolerance): Every read gets the most recent write, but some requests may fail. (Banking, inventory)
- AP (Availability + Partition tolerance): Every request gets a response, but it might be stale. (Social media feeds, DNS)
In practice, many systems pick AP for user-facing reads and CP for critical writes.
7. Rate Limiting
Protect services from abuse and cascading failures.
Algorithms:
1. Token Bucket — Allows bursts, smooth average rate
2. Sliding Window — Precise, more memory
3. Fixed Window — Simple, edge-case bursts at window boundaries
4. Leaky Bucket — Constant output rate, good for APIs
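Token bucket is a common default because it permits short bursts while enforcing an average rate. Below is a minimal single-process sketch, assuming an injectable clock for testability; a limiter shared across gateway instances would keep this state in a central store (for example, Redis) instead.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, rate: float, capacity: int,
                 now: Callable[[], float] = time.monotonic):
        self.rate = rate               # tokens added per second (average rate)
        self.capacity = capacity       # maximum burst size
        self.now = now
        self.tokens = float(capacity)  # start full
        self.last = now()

    def allow(self, cost: float = 1.0) -> bool:
        t = self.now()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `rate=1` and `capacity=3`, an idle client can fire 3 requests at once, then is held to 1 per second. Fixed and sliding windows instead count requests per interval.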
Where to implement:
- API Gateway (global rate limiting)
- Per-service (service-specific limits)
- Per-user/API-key (fairness)
Practice Problems with Solution Outlines
Problem 1: Design a URL Shortener
Requirements: 100M URLs/day, 1000:1 read/write, < 10ms redirect latency
Key decisions:
- ID generation: Base62 encoding of auto-increment or snowflake ID. 7 chars = 3.5 trillion URLs.
- Storage: Key-value store (Redis for hot URLs, DynamoDB for persistence)
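- Caching: Cache-aside with Redis. URL popularity is heavily skewed (roughly Zipfian), so a small fraction of hot URLs serves most of the traffic; the 80/20 rule is a common approximation.
- Read path: Cache → DB → 301/302 redirect. A 301 (permanent) is cached by browsers, which cuts load but hides repeat clicks from analytics; a 302 (temporary) sends every click through the service.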
- Analytics: Async via Kafka → Analytics service
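The ID-generation step can be sketched as Base62-encoding a numeric ID. The alphabet order below (digits, lowercase, uppercase) is an assumption; any fixed 62-character ordering works as long as encoding and decoding agree. The last line checks the capacity figure.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 chars
BASE = len(ALPHABET)


def encode_base62(n: int) -> str:
    """Convert a non-negative integer ID to a Base62 string."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode_base62(s: str) -> int:
    """Inverse of encode_base62."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(BASE ** 7)  # 3521614606208, i.e. ~3.5 trillion 7-character codes
```

At 100M new URLs per day (about 36.5B per year), 7 characters last roughly 96 years.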
Problem 2: Design a Notification System
Requirements: Multi-channel (push, email, SMS, in-app), 100M notifications/day, prioritization
Key decisions:
- Architecture: Event-driven with priority queues
- Queue design: Separate queues per channel, priority levels within each
- Rate limiting: Per-user per-channel to prevent spam
- Template engine: Pre-compiled templates with variable substitution
- Delivery tracking: State machine (created → queued → sent → delivered → read)
- Failure handling: Exponential backoff with max retries, DLQ for investigation
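Two of these decisions are easy to sketch: the delivery state machine as an explicit transition table, and retry delays as capped exponential backoff. The `failed` state, the base delay and cap values, and the choice of "full jitter" (a uniform random delay up to the exponential bound) are assumptions added for illustration.

```python
import random

# Allowed transitions for the delivery lifecycle; "failed" is an added assumption.
TRANSITIONS = {
    "created": {"queued"},
    "queued": {"sent", "failed"},
    "sent": {"delivered", "failed"},
    "delivered": {"read"},
    "read": set(),
    "failed": {"queued"},  # a retry re-enters the queue
}


def advance(state: str, new_state: str) -> str:
    """Move to new_state, rejecting transitions the table does not allow."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0) -> float:
    """Full jitter: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Jitter spreads retries out so that many notifications failing at once (say, during a provider outage) don't all retry at the same instant.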
Problem 3: Design a Distributed Cache
Requirements: Sub-millisecond latency, 1TB data, fault-tolerant
Key decisions:
- Partitioning: Consistent hashing with virtual nodes
- Replication: Each partition replicated to 3 nodes
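- Consistency: Tunable quorums over the N = 3 replicas. With W + R > N (e.g., W = 2, R = 2), every read quorum overlaps the latest write quorum; dropping to R = 1 gives faster but eventually consistent reads.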
- Eviction: LRU per node, with global TTL
- Hot key handling: Local caching on client, key splitting
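The replication and eviction choices can be checked with a small sketch: a helper for the quorum-overlap condition and a per-node LRU built on `collections.OrderedDict`. The capacity in the example is hypothetical, and TTL handling is left out to keep it short.

```python
from collections import OrderedDict
from typing import Any, Optional


def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """True if every read quorum intersects every write quorum (W + R > N)."""
    return w + r > n


class LRUNode:
    """One cache node's storage with least-recently-used eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)         # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used
```

For N = 3, `quorum_overlaps(3, 2, 2)` is true while `quorum_overlaps(3, 1, 1)` is false, which is the difference between quorum reads and fast, possibly stale reads.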
12-Week Study Plan
| Week | Focus Area | Practice Problem |
|---|---|---|
| 1-2 | Scaling fundamentals, load balancing, caching | URL Shortener |
| 3-4 | Database design, SQL vs NoSQL, sharding | Instagram/Twitter |
| 5-6 | Message queues, async processing | Notification System |
| 7-8 | Real-time systems, WebSockets, pub/sub | Chat Application |
| 9-10 | Search systems, indexing, ranking | Search Engine |
| 11-12 | Distributed systems, consensus, replication | Distributed Cache |
Daily Practice Routine
- Morning (30 min): Review one building block concept in depth
- Evening (60 min): Practice one design problem end-to-end
- Weekend (2 hours): Mock interview with a peer or recording
Mistakes That Sink Interviews
- Jumping to the solution — Always clarify requirements first. The interviewer is testing your process.
- Not doing back-of-envelope math — "How many servers do we need?" You should be able to estimate.
- Ignoring failure modes — "What happens when this component fails?" Always address this.
- Over-engineering — Start simple, then add complexity as needed. Don't design for Google scale if the requirements say 10K users.
- Not discussing trade-offs — There is no perfect design. Every choice has a cost. Articulate it.
Back-of-Envelope Calculations Cheat Sheet
Useful numbers:
- 1 day = ~100K seconds (86,400)
- 1 year = ~30M seconds
- QPS from daily users: DAU × avg_requests_per_user_per_day / 86,400
- Storage: items_per_day × bytes_per_item × retention_days
- Bandwidth: QPS × avg_response_size
Example: Twitter timeline reads
- 200M DAU, each refreshes 10x/day
- QPS = 200M × 10 / 86400 ≈ 23K QPS
- Peak ≈ 2-3× average ≈ 46K-70K QPS
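The same arithmetic as a tiny helper, handy for checking estimates while practicing. The 2-3× peak factor is the rule of thumb above; the storage inputs (600 tweets/sec from the Twitter example, about 300 bytes per tweet, 5-year retention) are illustrative assumptions, not measured figures.

```python
SECONDS_PER_DAY = 86_400


def avg_qps(dau: float, requests_per_user_per_day: float) -> float:
    return dau * requests_per_user_per_day / SECONDS_PER_DAY


def storage_bytes(items_per_day: float, bytes_per_item: float, retention_days: float) -> float:
    return items_per_day * bytes_per_item * retention_days


qps = avg_qps(200e6, 10)
print(round(qps))                      # 23148
print(round(2 * qps), round(3 * qps))  # 46296 69444

# Hypothetical: 600 tweets/sec, ~300 bytes each, kept for 5 years.
tweets_per_day = 600 * SECONDS_PER_DAY
print(round(storage_bytes(tweets_per_day, 300, 5 * 365) / 1e12, 1))  # 28.4 (TB)
```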
Summary
System design interviews test three things:
- Can you break down ambiguous problems? (Requirements gathering)
- Do you know the building blocks? (Technical knowledge)
- Can you make and defend trade-offs? (Engineering judgment)
Master the framework, deeply understand 6-8 building blocks, and practice 10-15 problems. That's the formula.
Accelerate Your Interview Prep
Studying system design from scattered blog posts is inefficient. The System Design Cheat Sheets from Interview Prep Pro give you 50+ architecture diagrams covering real-world systems, with the exact patterns interviewers look for.
The full Interview Prep Pro collection includes 11 products: system design guides, behavioral question banks, coding patterns, resume templates, salary negotiation playbooks, and a 90-day study tracker.
Use code LAUNCH40 for 40% off, or STUDENT for 50% off (student email required).