Mamoor Ahmad

Posted on Apr 25 • Edited on May 9

The Ultimate System Design Interview Cheatsheet (Visual Guide)

#ai #webdev #productivity #beginners

System design interviews can feel overwhelming — there's a mountain of concepts, and you never know which ones will come up. I put together a visual cheatsheet that covers the most essential topics, organized so you can see the big picture at a glance. 👇

Here's a topic-by-topic breakdown of everything on it. 🚀

1️⃣ Non-Functional Characteristics

Before designing anything, clarify the -ilities: availability, scalability, reliability, maintainability, latency, throughput, and consistency. These drive every architectural decision you'll make. 🎯

💡 Interview tip: Always ask about expected scale (QPS, data size, latency SLAs) before diving into a design.
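To make that tip concrete, here is a rough back-of-envelope estimate. It is only a sketch: the traffic figures (10 million daily users, 20 requests each, 2 KB per write) and the peak factor of 2 are made-up assumptions for illustration, not numbers from any real system.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate_load(daily_users, requests_per_user, peak_factor=2.0):
    """Return (average QPS, peak QPS) for a daily traffic profile."""
    avg_qps = daily_users * requests_per_user / SECONDS_PER_DAY
    return avg_qps, avg_qps * peak_factor


def estimate_storage_gb(writes_per_day, bytes_per_write, days):
    """Raw storage in decimal GB, ignoring replication and index overhead."""
    return writes_per_day * bytes_per_write * days / 1e9


# Hypothetical workload: 10M daily users, 20 requests each.
avg_qps, peak_qps = estimate_load(10_000_000, 20)

# Hypothetical write volume: 5M writes per day at 2 KB each, kept for a year.
storage_gb = estimate_storage_gb(5_000_000, 2_000, 365)

print(f"average QPS ~ {avg_qps:,.0f}, peak QPS ~ {peak_qps:,.0f}")
print(f"one year of storage ~ {storage_gb:,.0f} GB")
```

Numbers like these tell you early whether one machine could plausibly cope or whether you need to plan for partitioning and replication.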

2️⃣ CAP Theorem

A distributed data store can guarantee at most two of these three properties at the same time:

🔄 Consistency — every read gets the latest write
✅ Availability — every request gets a response
🌐 Partition Tolerance — the system works despite network splits

In distributed systems, network partitions can't be ruled out, so P is non-negotiable. The real choice, made when a partition happens, is between CP (banking, inventory) and AP (social feeds, DNS).

3️⃣ Horizontal vs. Vertical Scaling ⚖️

	📈 Vertical	📊 Horizontal
How	Bigger machine	More machines
Limit	Hardware ceiling	Theoretically unlimited
Cost	Rises steeply at the high end	Roughly linear
Complexity	Low	High (needs load balancing, data partitioning)

Most large production systems scale horizontally, because a single machine eventually hits a hardware ceiling. Vertical scaling is still a reasonable first step while traffic is modest. 🏗️

4️⃣ DNS (Domain Name System) 🌍

DNS translates human-readable domains to IP addresses. Key concepts:

🔍 Recursive resolvers do the heavy lifting
⏱️ TTL controls caching duration
🗺️ Geographic DNS routes users to the nearest data center

For system design, think of DNS as your first layer of traffic routing. 🛣️

5️⃣ Load Balancing ⚖️

Distributes traffic across multiple servers. Common algorithms:

🔄 Round Robin — simple rotation
📉 Least Connections — route to the least busy server
🔗 IP Hash — sticky sessions by client IP
⚖️ Weighted — more traffic to beefier servers

Works at Layer 4 (TCP) or Layer 7 (HTTP). Use health checks to automatically remove dead backends. 🏥
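Two of the algorithms above can be sketched in a few lines. This is a minimal in-process illustration, not how a real load balancer is implemented; the backend names (`app-1`, `app-2`, `app-3`) and the class names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Cycles through backends in a fixed order."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Routes each request to the backend with the fewest open connections."""

    def __init__(self, backends):
        self.active = {backend: 0 for backend in backends}

    def pick(self):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call this when a request finishes.
        self.active[backend] -= 1


rr = RoundRobinBalancer(["app-1", "app-2", "app-3"])
rr_order = [rr.pick() for _ in range(4)]  # wraps back around to app-1

lc = LeastConnectionsBalancer(["app-1", "app-2"])
first = lc.pick()    # both idle, so the tie goes to the first backend
second = lc.pick()   # app-1 is busy, so app-2 is chosen
lc.release(first)    # app-1 finishes its request
third = lc.pick()    # app-1 has the fewest connections again
```

Round robin ignores how busy each server is, which is fine for uniform requests. Least connections adapts when some requests take much longer than others.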

6️⃣ API Gateway 🚪

A single entry point for all client requests. Handles:

🔐 Authentication & authorization
🚦 Rate limiting
🛤️ Request routing & transformation
🔒 SSL termination
📝 Logging & analytics

Think of it as the front door to your microservices architecture. 🏠

7️⃣ Content Delivery Network (CDN) 🌐

Caches static assets (images, CSS, JS, video) at edge locations close to users.

⬆️ Push CDN — you upload content proactively
⬇️ Pull CDN — fetches from origin on first request

Reduces latency dramatically. Pair with proper cache-control headers for best results. ⚡

8️⃣ Caching 💾

The fastest database query is the one you never make. 🎯

🌐 Browser cache → CDN cache → ⚡ Application cache → 💽 Database cache
🛠️ Tools: Redis, Memcached
📋 Strategies: Cache-aside, Write-through, Write-behind, Read-through

⚠️ Watch out for: cache invalidation (hard), thundering herd, and stale data.
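Cache-aside is the strategy you'll be asked about most often, so here is a small sketch of it. The `TTLCache` class stands in for Redis or Memcached and `FakeDatabase` is a hypothetical backing store that counts reads; neither mirrors a real client API.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so tests can fake time
        self._store = {}          # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired, so treat it as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def delete(self, key):
        self._store.pop(key, None)


class FakeDatabase:
    """Hypothetical backing store that counts how often it is read."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.rows.get(key)

    def write(self, key, value):
        self.rows[key] = value


def get_user(cache, db, key):
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    value = cache.get(key)
    if value is None:
        value = db.read(key)
        if value is not None:
            cache.set(key, value)
    return value


def update_user(cache, db, key, value):
    """Write to the database, then invalidate so the next read refetches."""
    db.write(key, value)
    cache.delete(key)


db = FakeDatabase({"user:1": "Ada"})
cache = TTLCache(ttl_seconds=60)
get_user(cache, db, "user:1")   # miss: reads the database and fills the cache
get_user(cache, db, "user:1")   # hit: served from the cache
reads_after_two_gets = db.reads
```

Deleting the key on write (rather than updating it) keeps the logic simple, and the TTL bounds how long stale data can survive if an invalidation is ever missed.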

9️⃣ Polling vs. WebSockets 📡

	🔄 Polling	🔌 WebSockets
Direction	Client → Server	Bidirectional
Latency	Depends on interval	Real-time
Overhead	New HTTP request on every poll	Single persistent connection
Use case	Email checks, dashboards	Chat, live feeds, gaming

Long polling is a middle ground — the server holds the connection open until data is available. 🔗

🔟 Forward & Reverse Proxy 🛡️

➡️ Forward proxy — sits in front of clients (VPN, ad blockers, corporate firewalls)
⬅️ Reverse proxy — sits in front of servers (load balancer, API gateway, Nginx)

A forward proxy hides the client from the server; a reverse proxy hides the backend servers from the client. Reverse proxies are a fundamental building block of scalable systems. 🧱

1️⃣1️⃣ Consistent Hashing 🔄

Solves the "what happens when we add/remove servers" problem.

🗺️ Maps both servers and keys to a hash ring
🔄 When a server is added/removed, only about K/N keys need to be remapped on average (K = total keys, N = number of servers), not all of them
🛠️ Used in distributed caches, database sharding, CDNs

Virtual nodes improve even distribution across the ring. 💫
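Here is a minimal sketch of a hash ring with virtual nodes. It assumes SHA-256 as the hash function and 100 virtual nodes per server; both are arbitrary choices for illustration, and the server names are hypothetical.

```python
import bisect
import hashlib


def ring_hash(value):
    """Map a string to a position on the ring (a large integer)."""
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> server name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each server is placed on the ring many times (virtual nodes).
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node):
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def lookup(self, key):
        """Walk clockwise from the key's position to the next virtual node."""
        idx = bisect.bisect(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"key-{i}" for i in range(10_000)]
ring = HashRing(["cache-a", "cache-b", "cache-c"])
before = {k: ring.lookup(k) for k in keys}

ring.add("cache-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")  # expect roughly a quarter
```

Every key that moves lands on the new server, and everything else stays put. With naive `hash(key) % N` placement, going from 3 to 4 servers would reshuffle most keys instead.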

1️⃣2️⃣ Database Types 🗄️

A quick taxonomy:

📊 Relational (SQL): MySQL, PostgreSQL — structured data, ACID transactions
📄 Document: MongoDB — flexible schemas, JSON-like storage
🔑 Key-Value: Redis, DynamoDB — blazing fast lookups
📈 Column-Family: Cassandra, HBase — wide-column, high write throughput
🔗 Graph: Neo4j — relationships are first-class citizens
⏱️ Time-Series: InfluxDB — metrics, IoT data

💡 Pick the right tool for the job. There's no "best" database.

1️⃣3️⃣ SQL vs. NoSQL ⚔️

	📊 SQL	🍃 NoSQL
Schema	Fixed	Flexible
Scaling	Vertical (mostly)	Horizontal
Transactions	Strong ACID	Eventual consistency (usually)
Joins	Native	Application-level
Best for	Complex queries, relationships	Scale, flexibility, speed

Modern apps often use both — SQL for transactional data, NoSQL for caching/analytics. 🤝

1️⃣4️⃣ Database Scaling 📈

Two main strategies:

📖 Read Replicas

📋 Copy data to multiple follower nodes
🔄 Reads spread across replicas
✍️ Writes go to the leader only

🔪 Sharding

✂️ Split data across multiple databases
📦 Each shard holds a subset of the data
🧩 Hard problems: cross-shard queries, rebalancing

1️⃣5️⃣ Indexes 📇

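An index is an auxiliary data structure that lets the database find rows without a full table scan: typically a B-tree (O(log n) lookups, supports range queries) or a hash index (O(1) on average, equality lookups only). ⚡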

📄 Single-column vs. 📑 composite indexes
🎯 Covering index — query answered entirely from the index
⚖️ Trade-off: faster reads, slower writes (index maintenance overhead)

💡 Rule of thumb: index columns used in WHERE, JOIN, and ORDER BY.
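You can watch an index change the query plan using SQLite from Python's standard library. This is a small sketch; the `users` table and `idx_users_email` index are made up for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

QUERY = "SELECT name FROM users WHERE email = ?"


def query_plan(sql, params):
    """Return SQLite's human-readable plan for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


plan_before = query_plan(QUERY, ("user42@example.com",))

# Index the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(QUERY, ("user42@example.com",))

print("before:", plan_before)  # a scan over the whole table
print("after: ", plan_after)   # a search that uses idx_users_email
```

Most relational databases have an equivalent (`EXPLAIN` in PostgreSQL and MySQL), and checking the plan is a better habit than guessing which index the optimizer picked.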

1️⃣6️⃣ Leader Election 👑

In distributed systems, you often need a single coordinator:

🚀 Raft — understandable consensus (etcd, Consul)
📚 Paxos — the classic (harder to implement)
🏗️ ZooKeeper — battle-tested coordination service

Used in database replication, distributed locks, and task schedulers. 🔐

1️⃣7️⃣ Message Queues 📬

Decouple producers from consumers:

🚀 Kafka — high throughput, durable, great for event streaming
🐰 RabbitMQ — traditional broker, flexible routing
☁️ SQS — managed, serverless-friendly

Benefits: buffering, async processing, retry logic, fan-out. 🎯
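The decoupling idea can be shown with Python's in-process `queue.Queue`. This is only an analogy for a real broker like Kafka or SQS: there is no durability or network here, and the "work" (doubling a number) is a placeholder.

```python
import queue
import threading

tasks = queue.Queue(maxsize=100)   # bounded buffer that absorbs bursts
results = []


def worker():
    """Consumer: pull items until the shutdown sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:           # sentinel value means "stop"
            tasks.task_done()
            break
        results.append(item * 2)   # placeholder for real processing
        tasks.task_done()          # acknowledge the item


consumer = threading.Thread(target=worker)
consumer.start()

# Producer: enqueue work without waiting for it to be processed.
for n in range(5):
    tasks.put(n)
tasks.put(None)

tasks.join()      # wait until every item has been acknowledged
consumer.join()
print(results)
```

The producer only needs to know about the queue, not about who consumes from it or how fast, which is the property that lets each side scale and fail independently.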

1️⃣8️⃣ Event-Driven Architecture ⚡

Systems communicate through events rather than direct calls:

📤 Event producer → 🚌 Event bus → 📥 Event consumer
🔗 Enables loose coupling and independent scaling
🧩 Patterns: Event sourcing, CQRS, Saga

Think: "When X happens, trigger Y" at scale. 💭
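A toy in-memory event bus shows the "when X happens, trigger Y" shape. This is a sketch under obvious simplifications: handlers run synchronously in one process, and the `order_placed` event and its handlers are hypothetical.

```python
class EventBus:
    """Routes published events to every handler subscribed to that event type."""

    def __init__(self):
        self._handlers = {}   # event type -> list of handler functions

    def subscribe(self, event_type, handler):
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        # The producer does not know who is listening, or whether anyone is.
        for handler in self._handlers.get(event_type, []):
            handler(payload)


bus = EventBus()
actions = []

# Two independent consumers react to the same event.
bus.subscribe("order_placed", lambda e: actions.append(f"email receipt for order {e['order_id']}"))
bus.subscribe("order_placed", lambda e: actions.append(f"reserve stock for order {e['order_id']}"))

bus.publish("order_placed", {"order_id": 42})
bus.publish("order_cancelled", {"order_id": 42})   # no subscribers, nothing happens
```

Adding a third reaction (say, analytics) means adding a subscriber; the code that publishes `order_placed` does not change.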

1️⃣9️⃣ Microservices 🧱

Break a monolith into small, independently deployable services:

📦 Each service owns its data and logic
📡 Communicate via APIs or message queues
⚖️ Trade simplicity for scalability and team autonomy

✅ When to use: large teams, independent scaling needs, polyglot tech stacks.
❌ When not to: small teams, early-stage products.

2️⃣0️⃣ Communication Patterns 📡

🔄 Synchronous: REST, gRPC, GraphQL — request/response
⚡ Asynchronous: Message queues, event streams — fire and forget
🚀 gRPC — binary, fast, great for inter-service communication
🎯 GraphQL — client specifies exactly what data it needs

2️⃣1️⃣ Rate Limiting 🚦

Protect your system from abuse and overload:

🪣 Token bucket — tokens refill at a fixed rate
📊 Sliding window — counts requests in a rolling time window
💧 Leaky bucket — processes at a constant rate

Typically implemented at the API gateway level. Return 429 Too Many Requests with a Retry-After header so clients know when to try again. 🛑
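Here is a minimal token bucket sketch, including the 429 response. The `handle_request` function and its `(status, headers)` return shape are assumptions made for illustration; a real gateway would keep one bucket per client and store it somewhere shared.

```python
import math
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock                # injectable so tests can fake time
        self.tokens = float(capacity)     # start with a full bucket
        self.last_refill = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def allow(self):
        """Spend one token if available; return whether the request may proceed."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def retry_after(self):
        """Whole seconds until at least one token is available."""
        self._refill()
        missing = max(0.0, 1 - self.tokens)
        return math.ceil(missing / self.refill_rate)


def handle_request(bucket):
    """Return (status_code, headers) the way a gateway might."""
    if bucket.allow():
        return 200, {}
    return 429, {"Retry-After": str(bucket.retry_after())}
```

For example, `TokenBucket(capacity=10, refill_rate=5)` lets a client burst 10 requests at once and then sustain about 5 per second.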

2️⃣2️⃣ Idempotency 🔁

An operation is idempotent if applying the same request multiple times has the same effect as applying it once.

Why it matters: network retries, message queue redelivery, double-clicks. 🖱️

How: use idempotency keys — client sends a unique key, server deduplicates. 🔑

💰 Critical for payment systems and any write operation that might be retried.
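The idempotency-key idea looks roughly like this. `PaymentService` is a hypothetical in-memory stand-in; a real service would persist the keys and make the check-and-store step atomic so that two concurrent retries can't both get through.

```python
import uuid


class PaymentService:
    def __init__(self):
        self._responses = {}   # idempotency key -> response already returned
        self.charges = []      # side effects that actually happened

    def charge(self, idempotency_key, account, amount_cents):
        # A repeated key means a retry: replay the stored response, charge nothing.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]

        self.charges.append((account, amount_cents))
        response = {"status": "charged", "charge_id": len(self.charges)}
        self._responses[idempotency_key] = response
        return response


service = PaymentService()

# The client generates one key per logical request and reuses it on retries.
key = str(uuid.uuid4())
first = service.charge(key, "acct-1", 500)
retry = service.charge(key, "acct-1", 500)   # e.g. the first response timed out

# A genuinely new request gets a new key and a new charge.
other = service.charge(str(uuid.uuid4()), "acct-1", 500)
```

The customer is charged twice here, not three times: once for each distinct key.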

2️⃣3️⃣ Bloom & Cuckoo Filters 🌸

Probabilistic data structures for "is this element in the set?" 🤔

🌸 Bloom filter: space-efficient, no false negatives, possible false positives, no deletion in the basic version
🐦 Cuckoo filter: supports deletion, and is often more space-efficient when you need a low false positive rate

Use cases: cache hit prediction, spam filtering, preventing duplicate writes. 🎯
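A Bloom filter is short enough to sketch in full. This version assumes SHA-256 with a seed prefix to derive the hash positions and uses one byte per bit for readability; the sizes are arbitrary, and a production filter would pack bits and tune them to the expected item count.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)   # one byte per bit, for readability

    def _positions(self, item):
        # Derive several positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        """False means definitely absent; True means probably present."""
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
for email in ["a@example.com", "b@example.com", "c@example.com"]:
    seen.add(email)

known = seen.might_contain("a@example.com")            # always True once added
empty_check = BloomFilter().might_contain("anything")  # an empty filter says no
```

A typical pattern is to check the filter first and only hit the database or disk when it answers "probably present".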

2️⃣4️⃣ Single Point of Failure (SPOF) 💀

Any component whose failure brings down the entire system.

Eliminate SPOFs with:

🔄 Redundancy (multiple instances)
🔀 Failover mechanisms
🏥 Health checks + automatic recovery
🌍 Geographic distribution

🗣️ Interview mantra: "What happens when this component dies?" ☠️

2️⃣5️⃣ Heartbeat 💓

Periodic "I'm alive" signals between components.

💓 Server sends heartbeat to a monitor at regular intervals
⏰ If heartbeat is missed → mark as unhealthy → trigger failover
🛠️ Used in: leader election, cluster management, load balancer health checks
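The missed-heartbeat check can be sketched like this. `HeartbeatMonitor` and the node names are hypothetical, and the clock is a plain list so the example can step through time by hand; real systems usually require several consecutive misses before triggering failover.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per node and flags nodes that have gone quiet."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock        # injectable so tests can fake time
        self.last_seen = {}       # node -> time of the last heartbeat

    def beat(self, node):
        self.last_seen[node] = self.clock()

    def unhealthy(self):
        """Return nodes whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


fake_now = [0.0]                  # hand-cranked clock for the demo
monitor = HeartbeatMonitor(timeout_seconds=3, clock=lambda: fake_now[0])

monitor.beat("node-a")
monitor.beat("node-b")

fake_now[0] = 2.0
monitor.beat("node-a")            # node-a keeps reporting, node-b goes silent

fake_now[0] = 4.0
down = monitor.unhealthy()        # node-b has been silent for 4s, past the 3s timeout
```

Choosing the timeout is the real trade-off: too short and a brief network blip causes a needless failover, too long and a dead node keeps receiving traffic.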

2️⃣6️⃣ Checksum ✅

Detects data corruption during transfer or storage.

🔓 MD5 — fast but not cryptographically secure
🔐 SHA-256 — secure, widely used
⚡ CRC32 — fast, good for error detection

Applied at: file transfers, network packets, distributed storage verification. 📁
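Python's standard library covers CRC32 and SHA-256, so a quick sketch of corruption detection is easy. The payload and the `verify` helper are made up for the example.

```python
import hashlib
import zlib

payload = b"hello, distributed world"

# Computed by the sender and shipped alongside the data.
crc = zlib.crc32(payload)
sha = hashlib.sha256(payload).hexdigest()


def verify(data, expected_sha256):
    """Recompute the digest on the receiving side and compare."""
    return hashlib.sha256(data).hexdigest() == expected_sha256


# Simulate a single corrupted byte in transit ("o" became "0").
corrupted = b"hello, distributed w0rld"

intact_ok = verify(payload, sha)
corrupted_ok = verify(corrupted, sha)
crc_changed = zlib.crc32(corrupted) != crc
```

CRC32 is enough to catch accidental corruption cheaply. Use SHA-256 when someone might tamper with the data on purpose, since a CRC is easy to forge.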

2️⃣7️⃣ Database Replication 🔁

Copy data across multiple nodes:

🔄 Synchronous — writes confirmed after all replicas update (strong consistency, higher latency)
⚡ Asynchronous — writes confirmed immediately, replicas catch up (eventual consistency, lower latency)

Leader-follower is the most common pattern. Multi-leader and leaderless replication are used for more advanced cases. 🏗️

2️⃣8️⃣ Database Sharding & Partitioning 🔪

🔪 Sharding — horizontal split across databases/servers
📊 Partitioning — split within a single database

Sharding strategies:

📏 Range-based — by date, ID range
🔢 Hash-based — hash the shard key
📖 Directory-based — lookup table

🧩 Hard parts: rebalancing, cross-shard joins, hotspot avoidance.
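Two of the strategies above fit in a few lines each. This is a sketch with invented values: four shards, SHA-256 for the hash, and range boundaries at every million user IDs.

```python
import bisect
import hashlib

NUM_SHARDS = 4


def hash_shard(key, num_shards=NUM_SHARDS):
    """Hash-based: spread keys evenly, at the cost of scattering range scans."""
    # A stable hash is used because Python's built-in hash() of a string
    # changes from one process to the next.
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical boundaries: shard 0 holds IDs below 1M, shard 1 below 2M, and so on.
RANGE_BOUNDARIES = [1_000_000, 2_000_000, 3_000_000]


def range_shard(user_id):
    """Range-based: neighbouring IDs share a shard, so hot ranges are a risk."""
    return bisect.bisect_right(RANGE_BOUNDARIES, user_id)


low = range_shard(999_999)        # shard 0
edge = range_shard(1_000_000)     # shard 1, boundaries are exclusive upper limits
high = range_shard(3_500_000)     # shard 3, the last shard takes everything above
```

Note that `hash_shard` uses plain modulo, so changing `NUM_SHARDS` moves most keys. That is the rebalancing problem that consistent hashing and directory-based lookup are meant to soften.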

🏁 Final Thoughts

This cheatsheet covers the 28 core concepts that come up again and again in system design interviews. You don't need to memorize everything — focus on understanding when and why to use each one. 🎯

The real skill in system design isn't knowing the tools. It's knowing which tools to reach for, and being able to explain your tradeoffs clearly. 💪

Good luck on your next interview. 🚀🔥

DEV Community