Mamoor Ahmad

🧠 The Ultimate System Design Interview Cheatsheet (Visual Guide)

System design interviews can feel overwhelming: there's a mountain of concepts, and you never know which ones will come up. I put together a visual cheatsheet that covers the most essential topics, organized so you can see the big picture at a glance. 👇

Here's a topic-by-topic breakdown of everything on it. 🚀


1️⃣ Non-Functional Characteristics

Before designing anything, clarify the -ilities: availability, scalability, reliability, maintainability, latency, throughput, and consistency. These drive every architectural decision you'll make. 🎯

💡 Interview tip: Always ask about expected scale (QPS, data size, latency SLAs) before diving into a design.
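
To see why those numbers matter, here's a rough back-of-envelope sketch. Every input (daily active users, requests per user, payload size, peak factor) is a made-up assumption for illustration, not a figure from a real system.

```python
SECONDS_PER_DAY = 86_400


def estimate_load(daily_active_users, requests_per_user_per_day,
                  avg_payload_bytes, peak_factor=2.0):
    """Return (average QPS, peak QPS, bytes written per day)."""
    requests_per_day = daily_active_users * requests_per_user_per_day
    avg_qps = requests_per_day / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_factor  # assumed peak-to-average ratio
    bytes_per_day = requests_per_day * avg_payload_bytes
    return avg_qps, peak_qps, bytes_per_day


# Hypothetical inputs: 10M DAU, 20 requests each, 1 KB stored per request.
avg_qps, peak_qps, bytes_per_day = estimate_load(10_000_000, 20, 1_000)
print(f"avg QPS ~ {avg_qps:,.0f}, peak QPS ~ {peak_qps:,.0f}")
print(f"storage per day ~ {bytes_per_day / 1e9:,.0f} GB")
```

Even a crude estimate like this tells you whether one database can plausibly cope or whether you should be talking about sharding from the start.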


2️⃣ CAP Theorem

You can only guarantee two of three:

  • 🔄 Consistency: every read gets the latest write
  • ✅ Availability: every request gets a response
  • 🌐 Partition Tolerance β€” the system works despite network splits

In distributed systems, network partitions will happen sooner or later, so P is non-negotiable. When one occurs, you're really choosing between CP (banking, inventory) and AP (social feeds, DNS).


3️⃣ Horizontal vs. Vertical Scaling ⚖️

| | 📈 Vertical | 📊 Horizontal |
| --- | --- | --- |
| How | Bigger machine | More machines |
| Limit | Hardware ceiling | Theoretically unlimited |
| Cost | Exponential | Linear-ish |
| Complexity | Low | High (needs load balancing, data partitioning) |

Most production systems use horizontal scaling: past a certain traffic level, no single machine can keep up, however big it is. 🏗️


4️⃣ DNS (Domain Name System) 🌍

DNS translates human-readable domains to IP addresses. Key concepts:

  • 🔍 Recursive resolvers do the heavy lifting
  • ⏱️ TTL controls caching duration
  • 🗺️ Geographic DNS routes users to the nearest data center

For system design, think about DNS as your first layer of traffic routing. 🛣️


5️⃣ Load Balancing ⚖️

Distributes traffic across multiple servers. Common algorithms:

  • 🔄 Round Robin: simple rotation
  • 📉 Least Connections: route to the least busy server
  • 🔗 IP Hash: sticky sessions by client IP
  • ⚖️ Weighted: more traffic to beefier servers

Works at Layer 4 (TCP) or Layer 7 (HTTP). Use health checks to automatically remove dead backends. 🏥
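
Two of the algorithms above can be sketched in a few lines. This is a toy in-memory model, not how any particular load balancer is implemented; the class names and the `app-1`-style backend names are invented for the example.

```python
import itertools


class RoundRobinBalancer:
    """Rotates through backends, skipping any marked unhealthy."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._cycle = itertools.cycle(self._backends)
        self._healthy = set(self._backends)

    def mark_down(self, backend):
        # A failed health check would call this.
        self._healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self._backends:
            self._healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call.
        for _ in range(len(self._backends)):
            backend = next(self._cycle)
            if backend in self._healthy:
                return backend
        raise RuntimeError("no healthy backends")


class LeastConnectionsBalancer:
    """Routes each request to the backend with the fewest open connections."""

    def __init__(self, backends):
        self._active = {backend: 0 for backend in backends}

    def acquire(self):
        backend = min(self._active, key=self._active.get)
        self._active[backend] += 1
        return backend

    def release(self, backend):
        self._active[backend] -= 1


rr = RoundRobinBalancer(["app-1", "app-2", "app-3"])
first_four = [rr.pick() for _ in range(4)]
rr.mark_down("app-2")  # simulate a failed health check
after_failure = [rr.pick() for _ in range(3)]
print(first_four, after_failure)
```

Once `app-2` is marked down, the rotation simply skips it, which is the "automatically remove dead backends" behavior in miniature.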


6️⃣ API Gateway 🚪

A single entry point for all client requests. Handles:

  • 🔐 Authentication & authorization
  • 🚦 Rate limiting
  • 🛤️ Request routing & transformation
  • 🔒 SSL termination
  • 📝 Logging & analytics

Think of it as the front door to your microservices architecture. 🏠


7️⃣ Content Delivery Network (CDN) 🌐

Caches static assets (images, CSS, JS, video) at edge locations close to users.

  • ⬆️ Push CDN: you upload content proactively
  • ⬇️ Pull CDN: fetches from origin on first request

Reduces latency dramatically. Pair with proper cache-control headers for best results. ⚡


8️⃣ Caching 💾

The fastest database query is the one you never make. 🎯

  • 🌐 Browser cache → CDN cache → ⚡ Application cache → 💽 Database cache
  • 🛠️ Tools: Redis, Memcached
  • 📋 Strategies: Cache-aside, Write-through, Write-behind, Read-through

⚠️ Watch out for: cache invalidation (hard), thundering herd, and stale data.
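
Cache-aside is the strategy you'll most often be asked to explain, so here's a minimal sketch of it. A plain dict stands in for Redis or Memcached, and `load_from_db` is a hypothetical callback for whatever your source of truth is.

```python
import time


class CacheAside:
    """Check the cache first; on a miss, load from the source and store it."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]  # cache hit
        value = self._load(key)  # cache miss: go to the source of truth
        self._store[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        # Call this after a write so readers don't keep seeing stale data.
        self._store.pop(key, None)


db = {"user:1": "Ada"}
db_calls = []


def load_user(key):
    db_calls.append(key)
    return db[key]


cache = CacheAside(load_user, ttl_seconds=30)
first = cache.get("user:1")   # miss, hits the "database"
second = cache.get("user:1")  # hit, no database call

db["user:1"] = "Grace"        # write goes to the database only
stale = cache.get("user:1")   # still the old value: stale data
cache.invalidate("user:1")
fresh = cache.get("user:1")   # reloaded after invalidation
print(first, second, stale, fresh, len(db_calls))
```

The `stale` read is the whole invalidation problem in one line: the cache has no idea the database changed until something tells it or the TTL expires.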


9️⃣ Polling vs. WebSockets 📡

| | 🔄 Polling | 🔌 WebSockets |
| --- | --- | --- |
| Direction | Client → Server | Bidirectional |
| Latency | Depends on interval | Real-time |
| Overhead | New HTTP request each time | Single persistent connection |
| Use case | Email checks, dashboards | Chat, live feeds, gaming |

Long polling is a middle ground: the server holds the connection open until data is available. 🔗


🔟 Forward & Reverse Proxy 🛡️

  • ➡️ Forward proxy: sits in front of clients (VPN, ad blockers, corporate firewalls)
  • ⬅️ Reverse proxy: sits in front of servers (load balancer, API gateway, Nginx)

Both hide the real origin. Reverse proxies are a fundamental building block of scalable systems. 🧱


1️⃣1️⃣ Consistent Hashing 🔄

Solves the "what happens when we add/remove servers" problem.

  • 🗺️ Maps both servers and keys to a hash ring
  • 🔄 When a server is added/removed, only about K/N keys need to be remapped (K = total keys, N = number of servers), not all of them
  • 🛠️ Used in distributed caches, database sharding, CDNs

Virtual nodes improve even distribution across the ring. 💫
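
Here's a small sketch of a hash ring with virtual nodes, under a few assumptions of my own: MD5 is used only as a convenient, stable hash (not for security), each server gets 100 virtual nodes, and the `cache-a`-style names are placeholders.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() varies between runs.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted positions on the ring
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node):
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def lookup(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"key-{i}" for i in range(1000)]
ring = HashRing(["cache-a", "cache-b", "cache-c"])
before = {k: ring.lookup(k) for k in keys}

ring.add("cache-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Every key that moves lands on the new server, and the rest stay where they were. With plain `hash(key) % N`, going from 3 to 4 servers would reshuffle most of the keys instead.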


1️⃣2️⃣ Database Types 🗄️

A quick taxonomy:

  • 📊 Relational (SQL): MySQL, PostgreSQL. Structured data, ACID transactions
  • 📄 Document: MongoDB. Flexible schemas, JSON-like storage
  • 🔑 Key-Value: Redis, DynamoDB. Blazing fast lookups
  • 📈 Column-Family: Cassandra, HBase. Wide-column, high write throughput
  • 🔗 Graph: Neo4j. Relationships are first-class citizens
  • ⏱️ Time-Series: InfluxDB. Metrics, IoT data

💡 Pick the right tool for the job. There's no "best" database.


1️⃣3️⃣ SQL vs. NoSQL ⚔️

| | 📊 SQL | 🍃 NoSQL |
| --- | --- | --- |
| Schema | Fixed | Flexible |
| Scaling | Vertical (mostly) | Horizontal |
| Transactions | Strong ACID | Eventual consistency (usually) |
| Joins | Native | Application-level |
| Best for | Complex queries, relationships | Scale, flexibility, speed |

Modern apps often use both: SQL for transactional data, NoSQL for caching/analytics. 🤝


1️⃣4️⃣ Database Scaling 📈

Two main strategies:

📖 Read Replicas

  • 📋 Copy data to multiple follower nodes
  • 🔄 Reads spread across replicas
  • ✍️ Writes go to the leader only

🔪 Sharding

  • ✂️ Split data across multiple databases
  • 📦 Each shard holds a subset of the data
  • 🧩 Hard problems: cross-shard queries, rebalancing

1️⃣5️⃣ Indexes 📇

An index is an extra data structure (usually a B-tree, sometimes a hash index) that turns a lookup into an O(log n) search instead of an O(n) full table scan. ⚡

  • 📄 Single-column vs. 📑 composite indexes
  • 🎯 Covering index: query answered entirely from the index
  • ⚖️ Trade-off: faster reads, slower writes (index maintenance overhead)

💡 Rule of thumb: index columns used in WHERE, JOIN, and ORDER BY.
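
You can watch an index change the query plan using Python's built-in `sqlite3` module. This is only a sketch: the `users` table and index names are invented, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def plan(sql):
    """Return SQLite's query plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


query = "SELECT name FROM users WHERE email = 'user42@example.com'"

no_index = plan(query)  # no index on email: full table scan

conn.execute("CREATE INDEX idx_users_email ON users (email)")
single_column = plan(query)  # index search, then a row lookup for name

conn.execute("DROP INDEX idx_users_email")
conn.execute("CREATE INDEX idx_users_email_name ON users (email, name)")
covering = plan(query)  # email and name both come from the index

print(no_index)
print(single_column)
print(covering)
```

The composite index on `(email, name)` covers this query because every column it touches is already in the index, so the table itself is never read.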


1️⃣6️⃣ Leader Election 👑

In distributed systems, you often need a single coordinator:

  • 🚀 Raft: understandable consensus (etcd, Consul)
  • 📚 Paxos: the classic (harder to implement)
  • 🏗️ ZooKeeper: battle-tested coordination service

Used in database replication, distributed locks, and task schedulers. 🔐


1️⃣7️⃣ Message Queues 📬

Decouple producers from consumers:

  • 🚀 Kafka: high throughput, durable, great for event streaming
  • 🐰 RabbitMQ: traditional broker, flexible routing
  • ☁️ SQS: managed, serverless-friendly

Benefits: buffering, async processing, retry logic, fan-out. 🎯


1️⃣8️⃣ Event-Driven Architecture ⚡

Systems communicate through events rather than direct calls:

  • 📤 Event producer → 🚌 Event bus → 📥 Event consumer
  • 🔗 Enables loose coupling and independent scaling
  • 🧩 Patterns: Event sourcing, CQRS, Saga

Think: "When X happens, trigger Y" at scale. 💭


1️⃣9️⃣ Microservices 🧱

Break a monolith into small, independently deployable services:

  • 📦 Each service owns its data and logic
  • 📡 Communicate via APIs or message queues
  • ⚖️ Trade simplicity for scalability and team autonomy

✅ When to use: large teams, independent scaling needs, polyglot tech stacks.
❌ When not to: small teams, early-stage products.


2️⃣0️⃣ Communication Patterns 📡

  • 🔄 Synchronous: REST, gRPC, GraphQL (request/response)
  • ⚡ Asynchronous: Message queues, event streams (fire and forget)
  • 🚀 gRPC: binary, fast, great for inter-service communication
  • 🎯 GraphQL: client specifies exactly what data it needs

2️⃣1️⃣ Rate Limiting 🚦

Protect your system from abuse and overload:

  • 🪣 Token bucket: tokens refill at a fixed rate
  • 📊 Sliding window: counts requests in a rolling time window
  • 💧 Leaky bucket: processes at a constant rate

Implement at the API gateway level. Return 429 Too Many Requests with a Retry-After header. 🛑
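
A token bucket fits in a couple of dozen lines. The sketch below is single-process and not thread-safe; the `rate` and `capacity` values are arbitrary, and the injectable `clock` parameter is my own addition so the behavior can be tested without sleeping.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full
        self._last = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # caller should respond with 429

    def retry_after(self, cost=1.0):
        """Seconds until `cost` tokens are available, as of the last allow()."""
        missing = max(0.0, cost - self._tokens)
        return missing / self.rate


fake_now = [0.0]  # controllable clock for the demo
bucket = TokenBucket(rate=1, capacity=2, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(3)]  # third request exceeds the burst
wait = bucket.retry_after()                 # value for the Retry-After header
fake_now[0] = 1.0                           # one second later: one token back
later = [bucket.allow() for _ in range(2)]
print(burst, wait, later)
```

In a real gateway the bucket state would typically live in a shared store keyed by client or API key, so that every gateway instance enforces the same limit.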


2️⃣2️⃣ Idempotency 🔁

The same request applied multiple times has the same effect as once.

Why it matters: network retries, message queue redelivery, double-clicks. 🖱️

How: use idempotency keys. The client sends a unique key, and the server deduplicates on it. 🔑

💰 Critical for payment systems and any write operation.
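
In code, the idea looks roughly like this. `PaymentService` and its fields are hypothetical, and an in-memory dict stands in for what would need to be a durable store with an atomic check-and-insert in production.

```python
import uuid


class PaymentService:
    def __init__(self):
        self._responses = {}  # idempotency key -> response already returned
        self.charges = []     # side effects actually performed

    def charge(self, idempotency_key, account, amount_cents):
        # Seen this key before? Replay the stored response, charge nothing.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]

        self.charges.append((account, amount_cents))
        response = {"status": "charged", "charge_id": len(self.charges)}
        self._responses[idempotency_key] = response
        return response


service = PaymentService()
key = str(uuid.uuid4())  # generated once by the client, reused on every retry

first = service.charge(key, "acct-42", 1999)
retry = service.charge(key, "acct-42", 1999)  # e.g. the client timed out and retried
print(first == retry, len(service.charges))
```

The retry gets the same response back and the customer is charged once, which is exactly the guarantee you want when a network timeout leaves the client unsure whether the first request went through.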


2️⃣3️⃣ Bloom & Cuckoo Filters 🌸

Probabilistic data structures for "is this element in the set?" 🤔

  • 🌸 Bloom filter: space-efficient, no false negatives, possible false positives
  • 🐦 Cuckoo filter: supports deletion, and often needs less space at low false positive rates

Use cases: cache hit prediction, spam filtering, preventing duplicate writes. 🎯
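
A Bloom filter is simple enough to sketch directly. The sizes here (1024 bits, 3 hash functions) are arbitrary, and deriving the bit positions from one SHA-256 digest via double hashing is just one common way to get several hash functions cheaply.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step avoids short cycles
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not present"; True means "probably present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter()
for i in range(100):
    seen.add(f"user-{i}")

no_false_negatives = all(seen.might_contain(f"user-{i}") for i in range(100))
false_positives = sum(seen.might_contain(f"stranger-{i}") for i in range(1000))
print(no_false_negatives, false_positives)
```

Anything you added always comes back as "maybe present", while a small fraction of strangers do too. That's why a Bloom filter works well as a cheap pre-check in front of an expensive lookup, but never as the final answer.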


2️⃣4️⃣ Single Point of Failure (SPOF) 💀

Any component whose failure brings down the entire system.

Eliminate SPOFs with:

  • 🔄 Redundancy (multiple instances)
  • 🔀 Failover mechanisms
  • 🏥 Health checks + automatic recovery
  • 🌍 Geographic distribution

🗣️ Interview mantra: "What happens when this component dies?" ☠️


2️⃣5️⃣ Heartbeat 💓

Periodic "I'm alive" signals between components.

  • 💓 Server sends heartbeat to a monitor at regular intervals
  • ⏰ If heartbeat is missed → mark as unhealthy → trigger failover
  • 🛠️ Used in: leader election, cluster management, load balancer health checks
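
The monitor side of that flow can be sketched as below. The 3-second timeout and node names are made up, and the failover step is left out: this only shows how a missed heartbeat turns into an "unhealthy" verdict.

```python
import time


class HeartbeatMonitor:
    def __init__(self, timeout_seconds=3.0, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock
        self._last_seen = {}  # node id -> time of last heartbeat

    def beat(self, node_id):
        # Called whenever an "I'm alive" signal arrives from a node.
        self._last_seen[node_id] = self._clock()

    def unhealthy_nodes(self):
        # Nodes whose last heartbeat is older than the timeout.
        now = self._clock()
        return sorted(
            node for node, seen in self._last_seen.items()
            if now - seen > self._timeout
        )


fake_now = [0.0]  # controllable clock for the demo
monitor = HeartbeatMonitor(timeout_seconds=3.0, clock=lambda: fake_now[0])

monitor.beat("node-a")
monitor.beat("node-b")

fake_now[0] = 2.0
monitor.beat("node-a")  # node-a keeps reporting in; node-b goes quiet

fake_now[0] = 4.0
down = monitor.unhealthy_nodes()  # node-b has been silent for 4s
print(down)
```

The timeout is the knob to talk about in an interview: set it too short and a slow network triggers false failovers, too long and a dead node keeps receiving traffic.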

2️⃣6️⃣ Checksum ✅

Detects data corruption during transfer or storage.

  • 🔓 MD5: fast but not cryptographically secure
  • 🔐 SHA-256: secure, widely used
  • ⚡ CRC32: fast, good for error detection

Applied at: file transfers, network packets, distributed storage verification.
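
All three are available in Python's standard library (`zlib.crc32` and `hashlib`). The sketch below computes them for a payload and shows a one-byte corruption being caught; the payload bytes are arbitrary.

```python
import hashlib
import zlib


def checksums(data: bytes) -> dict:
    """Compute the three checksums from the list above, as hex strings."""
    return {
        "crc32": format(zlib.crc32(data), "08x"),
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


original = b"system design cheatsheet"
sent = checksums(original)       # the sender publishes these with the data

received_ok = b"system design cheatsheet"
received_bad = b"system design cheatsheat"  # one byte flipped in transit

intact = checksums(received_ok) == sent
corrupted = checksums(received_bad)["sha256"] != sent["sha256"]
print(intact, corrupted)
```

CRC32 is plenty for catching accidental bit flips. Reach for SHA-256 when someone might tamper with the data on purpose, because an attacker can deliberately craft collisions for CRC32 and MD5.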


2️⃣7️⃣ Database Replication 🔁

Copy data across multiple nodes:

  • 🔄 Synchronous: writes confirmed after all replicas update (strong consistency, higher latency)
  • ⚡ Asynchronous: writes confirmed immediately, replicas catch up (eventual consistency, lower latency)

Leader-follower is the most common pattern. Multi-leader and leaderless setups are for more advanced use cases. 🏗️


2️⃣8️⃣ Database Sharding & Partitioning 🔪

  • 🔪 Sharding: horizontal split across databases/servers
  • 📊 Partitioning: split within a single database

Sharding strategies:

  • 📏 Range-based: by date, ID range
  • 🔢 Hash-based: hash the shard key
  • 📖 Directory-based: lookup table

🧩 Hard parts: rebalancing, cross-shard joins, hotspot avoidance.
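
The three strategies come down to three different routing functions. In this sketch the shard counts, ID boundaries, and tenant names are all invented for illustration.

```python
import bisect
import hashlib


def hash_shard(key: str, num_shards: int) -> int:
    """Hash-based: spread keys evenly using a stable hash of the shard key."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Range-based: shard i holds user IDs below RANGE_UPPER_BOUNDS[i]
# (and at or above the previous bound). Hypothetical boundaries.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]


def range_shard(user_id: int) -> int:
    idx = bisect.bisect_right(RANGE_UPPER_BOUNDS, user_id)
    if idx == len(RANGE_UPPER_BOUNDS):
        raise ValueError(f"no shard configured for id {user_id}")
    return idx


# Directory-based: an explicit lookup table, easy to change per entry.
DIRECTORY = {"tenant-a": 0, "tenant-b": 2}


def directory_shard(tenant: str) -> int:
    return DIRECTORY[tenant]


print(hash_shard("user:42", 4))   # some shard in 0..3, same on every run
print(range_shard(1_500_000))     # falls in the second range
print(directory_shard("tenant-b"))
```

Notice that `hash_shard` depends on `num_shards`: change the shard count and most keys land somewhere new. That's the rebalancing problem, and it's the reason consistent hashing (section 11) exists.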


🏁 Final Thoughts

This cheatsheet covers the 28 core concepts that come up again and again in system design interviews. You don't need to memorize everything. Focus on understanding when and why to use each one. 🎯

The real skill in system design isn't knowing the tools. It's knowing which tools to reach for, and being able to explain your tradeoffs clearly. 💪

Good luck on your next interview. 🚀🔥


💬 What system design topic do you find trickiest? Drop a comment below! 👇
