Most engineers who fail system design interviews don't fail on technical knowledge — they fail on structure. They know what a load balancer does, they understand caching, but the moment someone says "Design Twitter," they freeze and start drawing boxes at random.
This guide fixes that. Instead of memorizing 30 specific system designs, you'll learn a repeatable framework and the core building blocks so you can design any system on the spot.
The Framework: How to Structure Your Answer
Every system design interview should follow this four-step structure. Internalize it until it's muscle memory.
Step 1: Clarify Requirements (3–5 minutes)
Before designing anything, ask questions. This demonstrates engineering maturity and prevents wasted effort.
Functional requirements:
- What are the core features?
- Who are the users?
- What are the inputs and outputs?
Non-functional requirements:
- What's the expected scale? (users, requests/sec, data volume)
- What are the latency requirements?
- Is availability or consistency more important?
- What's the read/write ratio?
Example for "Design Twitter":
Functional:
- Post tweets (text, 280 chars)
- Follow/unfollow users
- View home timeline (tweets from followed users)
- Search tweets
Non-functional:
- 500M users, 200M DAU
- ~600 tweets/sec writes, ~600K reads/sec
- Timeline latency < 200ms
- Availability > consistency (eventual consistency OK)
- Read-heavy: ~1000:1 read/write ratio
Step 2: High-Level Design (5–10 minutes)
Draw the major components and how data flows between them:
Client → Load Balancer → API Gateway → Services
                             │
              ┌──────────────┼──────────────┐
              ▼              ▼              ▼
       Tweet Service  Timeline Service  User Service
              │              │              │
              ▼              ▼              ▼
          Tweet DB     Cache Layer       User DB
              │       (Timeline Cache)
              ▼
        Message Queue
              │
              ▼
       Fan-out Service
Step 3: Deep Dive (15–20 minutes)
Pick the most critical components and design them in detail. The interviewer will guide you, but be prepared to dive into:
- Database schema and choice
- API design
- Scaling strategy
- Caching approach
- Failure handling
Step 4: Trade-offs and Bottlenecks (5 minutes)
Discuss what could break, what you'd monitor, and alternative approaches you considered.
Core Building Blocks You Must Know
These are the Lego pieces of system design. Learn these deeply, and you can assemble any system.
1. Load Balancing
Distributes traffic across servers. Know the algorithms:
| Algorithm | When to Use |
|---|---|
| Round Robin | Equal server capacity, stateless services |
| Weighted Round Robin | Mixed server capacities |
| Least Connections | Long-lived connections (WebSocket) |
| IP Hash | Session affinity needs |
| Consistent Hashing | Distributed caches, database sharding |
Key point: L4 (TCP) vs L7 (HTTP) load balancing. L7 can route based on content (URL path, headers) but adds latency. L4 is faster but less flexible.
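To make two of these algorithms concrete, here is a minimal in-process sketch of round robin and least connections. Class and method names (`RoundRobinBalancer`, `pick`, `release`) are hypothetical, not from any real load balancer:

```python
import itertools

class RoundRobinBalancer:
    """Cycle through servers in order; assumes roughly equal capacity."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Route each request to the server with the fewest active connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when a connection closes so counts stay accurate."""
        self.active[server] -= 1
```

Round robin needs no per-connection bookkeeping, which is why it suits short, stateless requests; least connections only pays off when connection lifetimes vary widely (e.g. WebSockets).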
2. Caching
Caching appears in every system design answer. Know the patterns:
Cache-Aside (Lazy Loading):
1. App checks cache
2. Cache miss → read from DB
3. Write result to cache
4. Return to client
Write-Through:
1. App writes to cache
2. Cache writes to DB
3. Return to client
Write-Behind (Write-Back):
1. App writes to cache
2. Cache async writes to DB (batched)
3. Return to client immediately
When to use what:
- Cache-aside: Default choice. Works for read-heavy workloads.
- Write-through: When you can't afford cache misses on recently written data.
- Write-behind: High write throughput needed, acceptable risk of data loss.
Cache invalidation strategies:
- TTL (Time-To-Live): Simple, eventual consistency. Set TTL to match acceptable staleness.
- Event-based: Invalidate on write. More complex but data stays fresher.
- Version tags: Include version in cache key. New version = automatic miss.
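The cache-aside pattern plus two of the invalidation strategies (TTL and invalidate-on-write) fit in a few lines. This is a sketch with a dict standing in for both the cache and the DB; the `CacheAside` class and its API are illustrative, not a real library:

```python
import time

class CacheAside:
    """Cache-aside reads with TTL expiry; writes invalidate the cached entry."""
    def __init__(self, db, ttl_seconds=60):
        self.db = db                      # any dict-like backing store
        self.ttl = ttl_seconds
        self.cache = {}                   # key -> (value, expires_at)

    def get(self, key):
        entry = self.cache.get(key)
        if entry and entry[1] > time.time():
            return entry[0]               # cache hit, still fresh
        value = self.db[key]              # miss (or expired): read from DB
        self.cache[key] = (value, time.time() + self.ttl)
        return value

    def put(self, key, value):
        self.db[key] = value              # write to DB...
        self.cache.pop(key, None)         # ...and invalidate on write
```

Note the trade-off baked in: between `put` on one node and the next `get`, other nodes may still serve the stale TTL'd copy — that's the eventual consistency the TTL bullet above describes.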
3. Database Selection
| Requirement | Database Type | Examples |
|---|---|---|
| Structured data, ACID | Relational | PostgreSQL, MySQL |
| Flexible schema, high write | Document | MongoDB, DynamoDB |
| Social graphs, relationships | Graph | Neo4j, Amazon Neptune |
| Time-series metrics | Time-series | InfluxDB, TimescaleDB |
| Full-text search | Search engine | Elasticsearch, OpenSearch |
| Session data, leaderboards | Key-Value | Redis, Memcached |
| Wide-column, massive scale | Column-family | Cassandra, HBase |
4. Database Scaling Patterns
Vertical scaling — Bigger machine. Simple but has a ceiling.
Read replicas — Primary handles writes, replicas handle reads. Works for read-heavy workloads.
Sharding — Split data across multiple databases by a shard key.
Shard by user_id:
user_id % 4 = 0 → Shard A
user_id % 4 = 1 → Shard B
user_id % 4 = 2 → Shard C
user_id % 4 = 3 → Shard D
Problems with naive sharding:
- Hot shards (uneven distribution)
- Cross-shard queries are expensive
- Rebalancing when adding shards
Better: Consistent hashing with virtual nodes
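A minimal sketch of that idea: each physical node is hashed onto a ring many times (virtual nodes), and a key maps to the first node clockwise from its hash. Adding a node then moves only ~1/N of keys, instead of the ~75% that `user_id % 4` → `user_id % 5` would reshuffle. This is illustrative code, not a production implementation:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing with virtual nodes."""
    def __init__(self, nodes, vnodes=100):
        self.ring = []                     # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                h = self._hash(f"{node}#vn{i}")
                bisect.insort(self.ring, (h, node))

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key):
        """First virtual node clockwise from the key's hash owns the key."""
        h = self._hash(key)
        idx = bisect.bisect_right(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]
```

Virtual nodes are what smooths out hot shards: with only one point per physical node, a ring of 4 nodes can end up badly unbalanced by chance.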
5. Message Queues
Decouple producers from consumers. Essential for async processing.
Producer → Queue → Consumer
Use cases:
- Order processing (place order → queue → payment → queue → fulfillment)
- Notifications (event → queue → email/push/SMS services)
- Data pipelines (change event → queue → downstream processing)
Key concepts:
- At-least-once delivery (most common)
- Exactly-once semantics (harder; Kafka supports it via idempotent producers and transactions)
- Dead letter queues (failed messages go here)
- Message ordering (per-partition in Kafka)
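Two of these concepts — at-least-once processing with retries and a dead letter queue — can be sketched in-process with Python's `queue` module standing in for a real broker like Kafka or SQS. The function name and retry policy are illustrative:

```python
import queue

def consume(work_q, dead_letter_q, handler, max_retries=3):
    """Drain work_q; retry each message up to max_retries, then DLQ it."""
    while not work_q.empty():
        msg = work_q.get()
        for _attempt in range(max_retries):
            try:
                handler(msg)
                break                      # processed successfully
            except Exception:
                continue                   # at-least-once: retry
        else:
            dead_letter_q.put(msg)         # retries exhausted -> DLQ
```

Because the handler may run more than once for the same message, at-least-once delivery only works cleanly when handlers are idempotent — a point worth raising unprompted in an interview.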
6. The CAP Theorem (Practical Version)
In a distributed system during a network partition, you must choose:
- CP (Consistency + Partition tolerance): Every read gets the most recent write, but some requests may fail. Use for banking and inventory.
- AP (Availability + Partition tolerance): Every request gets a response, but it might be stale. Use for social media feeds and DNS.
In practice, most systems pick AP for user-facing reads and CP for critical writes.
7. Rate Limiting
Protect services from abuse and cascading failures.
Algorithms:
1. Token Bucket — Allows bursts, smooth average rate
2. Sliding Window — Precise, more memory
3. Fixed Window — Simple, edge-case bursts at window boundaries
4. Leaky Bucket — Constant output rate, good for APIs
Where to implement:
- API Gateway (global rate limiting)
- Per-service (service-specific limits)
- Per-user/API-key (fairness)
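The token bucket is the one algorithm worth being able to write from memory, since it shows up in practice (e.g. as the model behind many API gateway limiters). A minimal single-threaded sketch, with hypothetical class and method names:

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`; refill at `rate` tokens per second."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A production version would need per-key buckets (usually in Redis) and atomic check-and-decrement, but the refill math is the same.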
Practice Problems with Solution Outlines
Problem 1: Design a URL Shortener
Requirements: 100M URLs/day, 1000:1 read/write, < 10ms redirect latency
Key decisions:
- ID generation: Base62 encoding of auto-increment or snowflake ID. 7 chars = 3.5 trillion URLs.
- Storage: Key-value store (Redis for hot URLs, DynamoDB for persistence).
- Caching: Cache-aside with Redis. Most URLs follow Zipf distribution (top 20% get 80% of traffic).
- Read path: Cache → DB → 301/302 redirect.
- Analytics: Async via Kafka → Analytics service.
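The Base62 step is small enough to write out in the interview. A sketch of encoding a numeric ID (auto-increment or snowflake) into a short slug — the alphabet ordering here is one common convention, not a standard:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n):
    """Encode a non-negative integer ID as a Base62 slug."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))
```

Sanity check on capacity: 62^7 = 3,521,614,606,208, which is where the "7 chars = 3.5 trillion URLs" figure comes from — enough for ~96 years at 100M URLs/day.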
Problem 2: Design a Notification System
Requirements: Multi-channel (push, email, SMS, in-app), 100M notifications/day, prioritization
Key decisions:
- Architecture: Event-driven with priority queues.
- Queue design: Separate queues per channel, priority levels within each.
- Rate limiting: Per-user per-channel to prevent notification fatigue.
- Template engine: Pre-compiled templates with variable substitution.
- Delivery tracking: State machine (created → queued → sent → delivered → read).
- Failure handling: Exponential backoff with max retries, DLQ for investigation.
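That failure-handling bullet — exponential backoff with a retry cap — is a few lines in code. A sketch with hypothetical names; in the real system the final failure would route the message to the DLQ rather than raise:

```python
import time

def send_with_backoff(send, message, max_retries=5, base_delay=0.1):
    """Call `send(message)`, retrying with exponentially growing delays.
    Re-raises the last error once max_retries attempts are exhausted."""
    for attempt in range(max_retries):
        try:
            return send(message)
        except Exception:
            if attempt == max_retries - 1:
                raise                      # caller (or DLQ logic) takes over
            time.sleep(base_delay * (2 ** attempt))   # 0.1s, 0.2s, 0.4s, ...
```

Adding random jitter to each delay is a common refinement: it stops thousands of failed sends from retrying in lockstep and hammering a recovering downstream service.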
Problem 3: Design a Distributed Cache
Requirements: Sub-millisecond latency, 1TB data, fault-tolerant
Key decisions:
- Partitioning: Consistent hashing with virtual nodes.
- Replication: Each partition replicated to 3 nodes.
- Consistency: Eventually consistent reads, quorum writes (W + R > N).
- Eviction: LRU per node with global TTL.
- Hot key handling: Local caching on client, key splitting.
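The quorum condition W + R > N is worth being able to justify, not just state: it forces every read set of R replicas to intersect every write set of W replicas, so some replica in any read quorum holds the latest acknowledged write. A brute-force check of that claim over small N (illustrative, obviously not how a real store verifies it):

```python
from itertools import combinations

def quorums_overlap(n, w, r):
    """True iff every write quorum of size w intersects every read
    quorum of size r among n replicas -- equivalent to w + r > n."""
    replicas = range(n)
    return all(set(wq) & set(rq)
               for wq in combinations(replicas, w)
               for rq in combinations(replicas, r))
```

With N=3, the common choice W=2, R=2 satisfies the condition; W=1, R=2 does not, which is exactly the configuration that trades consistency for write latency.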
12-Week Study Plan
| Week | Focus Area | Practice Problem |
|---|---|---|
| 1–2 | Scaling fundamentals, load balancing, caching | URL Shortener |
| 3–4 | Database design, SQL vs NoSQL, sharding | Instagram/Twitter |
| 5–6 | Message queues, async processing | Notification System |
| 7–8 | Real-time systems, WebSockets, pub/sub | Chat Application |
| 9–10 | Search systems, indexing, ranking | Search Engine |
| 11–12 | Distributed systems, consensus, replication | Distributed Cache |
Daily Practice Routine
- Morning (30 min): Review one building block concept in depth.
- Evening (60 min): Practice one design problem end-to-end.
- Weekend (2 hours): Mock interview with a peer or self-record and review.
Mistakes That Sink Interviews
- Jumping to the solution — Always clarify requirements first. The interviewer is evaluating your process as much as your answer.
- Skipping back-of-envelope math — "How many servers do we need?" You should be able to estimate within an order of magnitude.
- Ignoring failure modes — "What happens when this component goes down?" Always address this proactively.
- Over-engineering — Start simple, then add complexity as requirements demand. Don't design for Google scale when the prompt says 10K users.
- Not discussing trade-offs — There is no perfect design. Every choice has a cost. The best candidates articulate these clearly.
Back-of-Envelope Calculations Cheat Sheet
Useful numbers:
- 1 day = ~100K seconds (86,400)
- 1 year = ~30M seconds
- QPS from daily users: DAU × avg_requests / 86,400
- Storage: items × size × retention_period
- Bandwidth: QPS × avg_response_size
Example: Twitter timeline reads
- 200M DAU, each refreshes 10x/day
- QPS = 200M × 10 / 86,400 ≈ 23K QPS
- Peak = 2–3× average ≈ 60K QPS
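The cheat-sheet formulas above reduce to one helper you can run in your head or on paper; the function name and 2.5× default peak factor are illustrative choices:

```python
def estimate_qps(dau, requests_per_user_per_day, peak_factor=2.5):
    """Average and peak QPS from daily active users and per-user request rate."""
    avg = dau * requests_per_user_per_day / 86_400   # seconds per day
    return avg, avg * peak_factor

# Twitter timeline example from above:
avg, peak = estimate_qps(200_000_000, 10)
# avg ≈ 23,148 QPS; peak ≈ 57,870 QPS — matching the ~23K / ~60K figures
```

In the interview you'd round aggressively (200M × 10 ≈ 2B requests/day; ÷ ~100K seconds ≈ 20K QPS) — order of magnitude is what matters.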
Wrapping Up
System design interviews test three things:
- Can you break down ambiguous problems? (Requirements gathering)
- Do you know the building blocks? (Technical knowledge)
- Can you make and defend trade-offs? (Engineering judgment)
Master the framework, deeply understand 6–8 building blocks, and practice 10–15 problems. That's the formula.
If you're looking for ready-made architecture diagrams, cheat sheets, and structured study materials for data engineering and system design, check out DataStack Pro.