Ajit Singh

Posted on • Originally published at singhajit.com

System Design Cheat Sheet: Concepts Every Developer Should Know

I spent years reading system design books and articles, watching conference talks, and building actual systems. This cheat sheet is everything I wish I had when I started. It covers the concepts that actually matter when building systems that scale.

Use this as a reference when designing systems, preparing for interviews, or reviewing architecture decisions.


The System Design Process

Before diving into components, understand the process. Every good design starts with requirements.

Step 1: Clarify Requirements

Never start designing without understanding what you are building.

Functional Requirements (what the system does):

  • What features does the system need?
  • Who are the users?
  • What are the core use cases?

Non-Functional Requirements (how well it does it):

  • Scale: How many users? How much data?
  • Performance: What latency is acceptable?
  • Availability: How much downtime is tolerable?
  • Consistency: Is eventual consistency acceptable?

Step 2: Estimate Scale

Back-of-envelope calculations set the foundation. Get the order of magnitude right.

| Metric | Question to Ask |
| --- | --- |
| Users | Daily active users? Peak concurrent users? |
| Storage | How much data per user? How long do we keep it? |
| Bandwidth | Average request size? Uploads vs downloads? |
| Throughput | Requests per second? Read-heavy or write-heavy? |

Step 3: Define High-Level Design

Draw the main components and how data flows between them.

Step 4: Deep Dive into Components

Pick the most critical or complex components and design them in detail. This includes database schemas, API contracts, and algorithms.

Step 5: Address Bottlenecks and Trade-offs

Every design has trade-offs. Identify potential bottlenecks and explain how you would handle them.


Scalability Fundamentals

Scalability is the ability to handle increased load. There are two approaches.

Vertical Scaling (Scale Up)

Add more resources to existing machines.

| Pros | Cons |
| --- | --- |
| Simple to implement | Hardware limits (you cannot buy a bigger server indefinitely) |
| No code changes needed | Single point of failure |
| Easier to manage | Expensive at high end |
| No distributed complexity | Downtime during upgrades |

Horizontal Scaling (Scale Out)

Add more machines to distribute the load.

| Pros | Cons |
| --- | --- |
| Near unlimited scaling | More complex architecture |
| Better fault tolerance | Requires distributed systems knowledge |
| Cost effective (commodity hardware) | Data consistency challenges |
| No single point of failure | Network overhead |

Rule of thumb: Start simple with vertical scaling, move to horizontal when you hit limits or need fault tolerance.


Load Balancing

A load balancer distributes traffic across multiple servers.

Why Use Load Balancers?

  • Availability: If one server dies, traffic goes to healthy servers
  • Scalability: Add servers behind the load balancer as traffic grows
  • Performance: Prevent any single server from being overwhelmed

Load Balancing Algorithms

| Algorithm | How It Works | Best For |
| --- | --- | --- |
| Round Robin | Requests go to servers in rotation | Equal-capacity servers, stateless apps |
| Weighted Round Robin | Higher-weight servers get more traffic | Mixed-capacity server fleet |
| Least Connections | New requests go to the server with the fewest active connections | Long-running requests, varying request times |
| IP Hash | Client IP determines server (sticky sessions) | Stateful applications, session affinity |
| Least Response Time | Fastest responding server gets next request | Performance-critical applications |
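
To make two of these algorithms concrete, here is a minimal in-process sketch of Round Robin and Least Connections. The class names and the `app-N` server names are made up for illustration; a real load balancer also tracks health checks and timeouts, which this leaves out.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Picks the server with the fewest in-flight requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # min() returns the first server with the lowest count,
        # so ties resolve in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


rr = RoundRobinBalancer(["app-1", "app-2", "app-3"])
rr_picks = [rr.pick() for _ in range(4)]

lc = LeastConnectionsBalancer(["app-1", "app-2"])
first = lc.acquire()    # app-1 (tie, insertion order)
second = lc.acquire()   # app-2
lc.release(first)       # app-1 finishes its request
third = lc.acquire()    # app-1 again, since it is now the least loaded
```

Round Robin ignores how busy a server is, which is why it suits equal-capacity servers with uniform requests. Least Connections needs the `release` call to stay accurate, so it costs a little more bookkeeping.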

Layer 4 vs Layer 7 Load Balancing

Layer 4 (Transport Layer) routes based on IP address and port. Fast but cannot inspect content.

Layer 7 (Application Layer) routes based on URL, headers, cookies. Smarter but more processing overhead.

| Layer 4 | Layer 7 |
| --- | --- |
| Faster (less processing) | Smarter routing (URL, headers, cookies) |
| Cannot inspect content | Can cache, compress, and terminate SSL/TLS |
| Simple configuration | Content-based routing |
| TCP/UDP level | HTTP/HTTPS level |

If you want the client-side picture of how DNS, TCP, TLS, and HTTP requests fit together before traffic ever hits a load balancer, see what happens when you type a URL in the browser.

Popular Load Balancers

  • Nginx: Fast, widely used, great for HTTP
  • HAProxy: High performance, TCP and HTTP
  • AWS ELB/ALB: Managed, integrates with AWS
  • Cloudflare: Edge load balancing with CDN

Caching

Caching stores frequently accessed data in fast storage to reduce latency and database load.

Where to Cache

| Cache Layer | What It Caches | Tools |
| --- | --- | --- |
| Browser | Static assets, API responses | HTTP headers (Cache-Control) |
| CDN | Static files, media, edge content | Cloudflare, CloudFront, Fastly |
| Application | Computed values, session data | In-memory (Guava, Caffeine) |
| Distributed | Shared data across servers | Redis, Memcached |
| Database | Query results, frequently accessed rows | MySQL query cache (removed in MySQL 8.0), PostgreSQL shared buffers |

Caching Strategies

I covered these in depth in Caching Strategies Explained. Here is the summary:

| Strategy | How It Works | Best For |
| --- | --- | --- |
| Cache-Aside | App checks cache, fetches from DB on miss, populates cache | General purpose, most control |
| Read-Through | Cache fetches from DB automatically on miss | Read-heavy, simpler code |
| Write-Through | Writes go to cache and DB synchronously | Consistency critical |
| Write-Behind | Writes go to cache, async to DB later | High write throughput |
| Write-Around | Writes bypass cache, go to DB only | Write-once data |
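
Cache-Aside is the one you will write by hand most often, so here is a small sketch of it. The `CacheAside` class, the `load_user` loader, and the dict standing in for the database are all hypothetical; in production the cache would typically be Redis or Memcached rather than a local dict.

```python
import time


class CacheAside:
    """The application owns the cache: check it, load on miss, populate."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]                      # cache hit
        value = self._load(key)                  # miss or expired: go to the DB
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        # Call this after writing to the DB so the next read reloads.
        self._entries.pop(key, None)


db = {"user:1": "Ada"}      # stand-in for a real database
db_reads = []


def load_user(key):
    db_reads.append(key)
    return db[key]


users = CacheAside(load_user, ttl_seconds=60.0)
a = users.get("user:1")     # miss, reads the DB
b = users.get("user:1")     # hit, no DB read
db["user:1"] = "Grace"      # write goes to the DB...
users.invalidate("user:1")  # ...then the cached copy is dropped
c = users.get("user:1")     # miss again, sees the new value
```

The TTL bounds how long a stale value can survive if an invalidation is ever missed.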

Cache Eviction Policies

When the cache is full, what gets removed?

| Policy | Removes | Best For |
| --- | --- | --- |
| LRU | Least Recently Used items | General purpose, most common |
| LFU | Least Frequently Used items | Stable access patterns |
| FIFO | Oldest items | Simple use cases |
| TTL | Expired items | Time-sensitive data |
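
Since LRU is the most common policy, here is one way to sketch it with Python's `collections.OrderedDict`. The `LRUCache` class is illustrative only; real caches such as Redis use approximations of LRU rather than an exact ordering.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used key once capacity is exceeded."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest first, most recent last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)         # a read counts as a use
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used

    def keys(self):
        return list(self._items)


lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")       # "a" is now more recent than "b"
lru.put("c", 3)    # over capacity: "b" is evicted
remaining = lru.keys()
```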

Cache Invalidation

The hardest problem in caching. Options include:

  • TTL-based: Expire after fixed time (simple, allows staleness window)
  • Event-based: Invalidate on data change (immediate, complex to track)
  • Version-based: Include version in cache key (no stale data, more misses)

Databases

Database choice is one of the most important architectural decisions.

SQL vs NoSQL

| SQL (Relational) | NoSQL |
| --- | --- |
| Structured data, fixed schema | Flexible or schema-less |
| ACID transactions | Eventual consistency (often) |
| Complex queries and joins | Simple queries, denormalized data |
| Vertical scaling primarily | Horizontal scaling built-in |
| PostgreSQL, MySQL, Oracle | MongoDB, Cassandra, DynamoDB |

For a complete breakdown of when each engine wins on cost, scaling, and consistency, see PostgreSQL vs MongoDB vs DynamoDB. The PostgreSQL Cheat Sheet and MongoDB Cheat Sheet cover day-to-day commands.

NoSQL Types

| Type | Data Model | Examples | Use Case |
| --- | --- | --- | --- |
| Document | JSON documents | MongoDB, CouchDB | Content management, catalogs |
| Key-Value | Simple key to value | Redis, DynamoDB | Caching, sessions |
| Column-Family | Wide columns | Cassandra, HBase | Time series, analytics |
| Graph | Nodes and edges | Neo4j, Amazon Neptune | Social networks, recommendations |

Database Scaling Patterns

Replication

Copies of data across multiple servers.

  • Leader-Follower: One primary handles writes, replicas handle reads
  • Leader-Leader: Multiple primaries, complex conflict resolution
  • Benefit: Read scalability, fault tolerance

Once writes are distributed across replicas and shards, concurrency control becomes critical. See Database Locks Explained for shared, exclusive, and row-level locking patterns.

Related tool: SQL Formatter and Beautifier. Format slow SQL from EXPLAIN ANALYZE output before sharing it in design reviews.

Sharding (Partitioning)

Split data across multiple databases.

| Sharding Strategy | How It Works | Pros | Cons |
| --- | --- | --- | --- |
| Hash-based | Hash of key determines shard | Even distribution | Resharding is painful |
| Range-based | Key ranges determine shard | Range queries work | Hot spots possible |
| Geographic | Location determines shard | Low latency | Complex for global users |
| Directory-based | Lookup table maps keys to shards | Flexible | Lookup is a bottleneck |

Use consistent hashing for hash-based sharding to minimize data movement when servers are added or removed. See how Slack uses workspace-based sharding and Shopify shards by shop_id in their production systems.
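
To show why consistent hashing limits data movement, here is a rough sketch of a hash ring with virtual nodes. The `ConsistentHashRing` class, the `shard-*` names, and the choice of 100 virtual nodes per shard are assumptions for the example, not how Slack or Shopify do it.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Any stable, well-distributed hash works; SHA-256 is used for convenience.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (point_on_ring, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node owns many points so load spreads evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def lookup(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"user:{i}" for i in range(1000)]
ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
before = {k: ring.lookup(k) for k in keys}

ring.add("shard-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now lives on `shard-d`; no key moves between the three original shards. With plain `hash(key) % N`, changing N would remap most keys instead.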

The shard key itself often comes from a distributed unique-ID scheme like Snowflake IDs. Snowflake IDs encode a timestamp in the high bits, which makes them naturally time-sortable and friendly to range-based shards.
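
Here is a small sketch of how such an ID can be packed and unpacked. It assumes the layout Twitter published for Snowflake (41-bit millisecond timestamp, 10-bit machine ID, 12-bit sequence) and Twitter's epoch; other systems use different bit splits and epochs, so treat the constants as assumptions.

```python
MACHINE_BITS = 10
SEQUENCE_BITS = 12
EPOCH_MS = 1288834974657  # Twitter's custom epoch; other systems pick their own


def encode(timestamp_ms, machine_id, sequence):
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    # Timestamp goes in the high bits, so IDs sort by creation time.
    return (
        ((timestamp_ms - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
        | (machine_id << SEQUENCE_BITS)
        | sequence
    )


def decode(snowflake):
    sequence = snowflake & ((1 << SEQUENCE_BITS) - 1)
    machine_id = (snowflake >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    timestamp_ms = (snowflake >> (MACHINE_BITS + SEQUENCE_BITS)) + EPOCH_MS
    return timestamp_ms, machine_id, sequence


t = 1_700_000_000_000  # an arbitrary Unix timestamp in milliseconds
earlier = encode(t, machine_id=1023, sequence=4095)
later = encode(t + 1, machine_id=0, sequence=0)
```

`later > earlier` holds even though `earlier` has the largest possible machine and sequence values, because the timestamp occupies the most significant bits.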

Related tool: Snowflake ID Decoder. Paste a Discord, Twitter, or Instagram ID to see its timestamp, machine, and sequence bits.


Message Queues

Message queues decouple services and enable asynchronous processing.

Why Use Queues?

  • Decoupling: Services don't need to know about each other
  • Resilience: Failed consumers don't crash producers
  • Buffering: Absorb traffic spikes
  • Scalability: Add consumers as needed

For a deep dive, see Role of Queues in System Design.

Queue Patterns

| Pattern | Description | Use Case |
| --- | --- | --- |
| Point-to-Point | One producer, one consumer per message | Task distribution |
| Pub/Sub | One producer, many consumers get same message | Event notifications |
| Work Queue | Multiple consumers compete for messages | Parallel processing |
| Dead Letter Queue | Failed messages go here after retries | Error handling |
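
The retry-then-dead-letter flow is easier to see in code. This is an in-memory sketch using `collections.deque`; the `drain` function, `MAX_ATTEMPTS`, and the message names are invented for the example, and a real broker (SQS, RabbitMQ) handles redelivery and the dead letter queue for you.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget for this sketch


def drain(queue, dead_letters, handler):
    """Process every message, retrying failures up to MAX_ATTEMPTS."""
    processed = []
    while queue:
        message, attempts = queue.popleft()
        try:
            handler(message)
            processed.append(message)
        except Exception:
            attempts += 1
            if attempts >= MAX_ATTEMPTS:
                dead_letters.append(message)       # give up, park for inspection
            else:
                queue.append((message, attempts))  # requeue for another try
    return processed


def handler(message):
    if message == "bad":
        raise ValueError("cannot process")


queue = deque([("ok-1", 0), ("bad", 0), ("ok-2", 0)])
dead_letters = []
processed = drain(queue, dead_letters, handler)
```

The poison message is retried three times and then parked, so it cannot block `ok-2` forever.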

Popular Message Queues

| Tool | Best For | Throughput |
| --- | --- | --- |
| Kafka | Event streaming, log aggregation, replay | Millions/sec |
| RabbitMQ | Complex routing, traditional messaging | Thousands/sec |
| SQS | Simple AWS-native queuing | Thousands/sec |
| Redis Streams | Lightweight streaming | Hundreds of thousands/sec |

See How Kafka Works for a complete breakdown and Kafka vs RabbitMQ vs SQS for a head-to-head on throughput, ordering, and cost.

Background workers usually run on cron-style schedules. Use the Cron Expression Translator (or the Quartz Cron Generator for Java and Spring Boot) to validate schedules before they hit production.


API Design

APIs are contracts between services. Design them carefully.

REST vs GraphQL vs gRPC

| Aspect | REST | GraphQL | gRPC |
| --- | --- | --- | --- |
| Data Format | JSON | JSON | Protocol Buffers |
| Contract | Implicit (conventions) | Schema-defined | Protocol definition |
| Over-fetching | Common | Solved (request what you need) | N/A |
| Learning Curve | Low | Medium | Higher |
| Best For | Public APIs, CRUD | Flexible frontends | Internal microservices |

For a detailed comparison with architecture patterns, performance benchmarks, and a decision flowchart, see REST vs GraphQL vs gRPC: How to Pick the Right API Protocol.

REST Best Practices

# Good URL design
GET    /users              # List users
GET    /users/123          # Get user 123
POST   /users              # Create user
PUT    /users/123          # Update user 123
DELETE /users/123          # Delete user 123

# Bad URL design
GET    /getUsers
POST   /createUser
GET    /users/delete/123

Related tool: Curl Convertor. Paste any curl request from your terminal and get it as Python, Go, Axios, Java, or PHP code for your tests and SDKs.

Authentication

Most modern APIs are stateless. They authenticate each request with a token instead of a server-side session. For the full breakdown of why JWT replaced sessions, the three-part structure, and common pitfalls, read Why JWT Replaced Sessions: Building Auth That Scales and OAuth 2.0 explained.

Related tool: JWT Decoder and Inspector. Inspect a JWT's header, payload, and expiration without leaking the token to a third-party server.

API Versioning

| Strategy | Example | Pros | Cons |
| --- | --- | --- | --- |
| URL Path | /v1/users | Clear, easy caching | URL changes |
| Query Param | /users?version=1 | Single endpoint | Easy to miss |
| Header | Accept: application/vnd.api.v1+json | Clean URLs | Hidden versioning |

Rate Limiting

Protect your API from abuse and overload.

| Algorithm | How It Works | Best For |
| --- | --- | --- |
| Token Bucket | Tokens added at fixed rate, requests consume tokens | Burst-friendly, most common |
| Sliding Window | Count requests in rolling time window | Smooth rate limiting |
| Fixed Window | Count requests in fixed intervals | Simple to implement |
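
A token bucket fits in a few lines. The sketch below is single-process and not thread-safe; the `TokenBucket` and `FakeClock` names are made up, and the fake clock exists only so the example behaves deterministically.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate_per_second`."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full
        self._last = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        # Refill lazily based on elapsed time, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class FakeClock:
    """Manually advanced clock, used here instead of sleeping."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
bucket = TokenBucket(rate_per_second=1, capacity=3, clock=clock)
burst = [bucket.allow() for _ in range(4)]       # three pass, the fourth is rejected
clock.now += 2.0                                 # two seconds later: two new tokens
after_wait = [bucket.allow() for _ in range(3)]  # two pass, the third is rejected
```

In a multi-server deployment the token count has to live in shared storage such as Redis, otherwise each server enforces its own separate limit.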

For implementation details, see Dynamic Rate Limiter System Design.


Distributed Systems Concepts

When you scale beyond a single machine, you enter distributed systems territory.

CAP Theorem

In a distributed system, during a network partition, you must choose between:

  • CP System: Consistency + Partition Tolerance (blocks during partition)
    • Example: ZooKeeper, etcd, traditional banking
  • AP System: Availability + Partition Tolerance (may serve stale data)
    • Example: Cassandra, DynamoDB, DNS

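Reality: Network partitions are inevitable in any distributed system, so partition tolerance is not optional. The real choice is between consistency and availability while a partition lasts.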

Consistency Models

| Model | Guarantee | Example |
| --- | --- | --- |
| Strong | Read always returns latest write | Bank balance |
| Eventual | Reads will eventually see latest write | Social media likes |
| Causal | Related events appear in order | Chat messages |
| Read-your-writes | You see your own writes immediately | Shopping cart |

Consensus Algorithms

How do distributed nodes agree on a value?

| Algorithm | Used In | Complexity |
| --- | --- | --- |
| Paxos | Chubby, Spanner | Notoriously complex |
| Raft | etcd, Consul | Easier to understand |
| ZAB | ZooKeeper | Similar to Paxos |

See Paxos Distributed Consensus for details.

Distributed Transactions

When a transaction spans multiple services:

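| Pattern | How It Works | Consistency |
| --- | --- | --- |
| Two-Phase Commit | Coordinator asks all nodes to prepare, then commits only if every node votes yes | Strong (but slow) |
| Saga | Chain of local transactions with compensations | Eventual |
| Outbox Pattern | Write to DB and outbox table atomically; a separate relay publishes the outbox rows | Eventual |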
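
The Saga row is the least obvious of the three, so here is a minimal orchestrated saga. Everything in it (`run_saga`, the step names, the `log` list) is hypothetical; each step would really be a call to another service with its own local transaction.

```python
class SagaFailed(Exception):
    pass


def run_saga(steps):
    """Run (name, action, compensate) steps; undo completed ones on failure."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            # Compensate in reverse order, most recent step first.
            for _done_name, undo in reversed(completed):
                undo()
            raise SagaFailed(f"step {name!r} failed") from exc
        completed.append((name, compensate))


log = []


def charge_card():
    raise RuntimeError("card declined")


steps = [
    ("reserve_stock", lambda: log.append("stock reserved"), lambda: log.append("stock released")),
    ("create_order", lambda: log.append("order created"), lambda: log.append("order cancelled")),
    ("charge_card", charge_card, lambda: log.append("charge refunded")),
]

try:
    run_saga(steps)
    outcome = "committed"
except SagaFailed:
    outcome = "rolled back"
```

The failed step itself is not compensated, only the steps that completed before it. Between the failure and the last compensation other services can observe the intermediate state, which is why the table lists sagas as eventually consistent.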

See Two-Phase Commit, the Saga Pattern guide, and the Transactional Outbox Pattern for implementation details.


Common Architecture Patterns

Monolith vs Microservices

| Monolith | Microservices |
| --- | --- |
| Simple deployment | Independent deployments |
| Easy debugging | Distributed debugging |
| Shared database | Database per service |
| Tight coupling | Network overhead |
| Team coordination needed | Team autonomy |

Most teams should start with a monolith and extract services when needed. See Modular Monolith Architecture for a middle ground.

Event-Driven Architecture

Services communicate through events instead of direct calls.

Benefits:

  • Loose coupling between services
  • Services can be added/removed without affecting others
  • Natural audit log of events

Challenges:

  • Eventual consistency
  • Debugging across services
  • Event ordering

CQRS (Command Query Responsibility Segregation)

Separate read and write models.

Use when read and write patterns are very different. See CQRS Pattern Guide for details.


Monitoring and Observability

You cannot fix what you cannot see.

The Three Pillars

| Pillar | What It Shows | Tools |
| --- | --- | --- |
| Metrics | Numerical measurements over time | Prometheus, Datadog, CloudWatch |
| Logs | Discrete events with details | ELK Stack, Splunk, Loki |
| Traces | Request path across services | Jaeger, Grafana Tempo, Zipkin, X-Ray |

Use OpenTelemetry as the vendor-neutral instrumentation layer to collect all three signals and export to any backend.

Key Metrics to Monitor

| Metric | What It Measures | Alert When |
| --- | --- | --- |
| Request Rate | Requests per second | Sudden drop or spike |
| Error Rate | Percentage of errors | Above threshold (e.g., > 1%) |
| Latency (p50, p95, p99) | Response time distribution | p99 exceeds SLA |
| Saturation | Resource utilization | CPU, memory, or disk > 80% |
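
Percentiles matter because averages hide tail latency. The sketch below uses the simple nearest-rank method on made-up samples; monitoring systems usually estimate percentiles from histograms instead, so treat this as an illustration of the idea only.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the value at or below which p% of samples fall."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Nine fast requests and one very slow one (hypothetical data, in ms).
latencies_ms = [12, 15, 11, 14, 13, 18, 16, 17, 19, 950]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)
```

Here the median is 15 ms and the mean is 108.5 ms, a number no request actually experienced, while p99 surfaces the 950 ms outlier directly.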

When a single request is misbehaving, a HAR export from the browser is usually faster than diving into traces.

Related tool: HAR File Viewer. Open Chrome, Firefox, Safari, or Edge HAR files in your browser to see the timing waterfall, headers, and bodies for every request.

Related tool: User Agent Parser. Parse User-Agent strings from access logs to break errors down by browser, OS, device, and bot traffic.

SLI, SLO, SLA

| Term | Definition | Example |
| --- | --- | --- |
| SLI (Service Level Indicator) | Measurement of service | Request latency, error rate |
| SLO (Service Level Objective) | Target for the SLI | 99.9% of requests under 200 ms |
| SLA (Service Level Agreement) | Contract with customers | 99.5% uptime or refund |

For more details, see SLI, SLO, SLA Explained.

Related tool: SLA and Uptime Calculator. Turn SLA percentages like 99.9% and 99.99% into allowed downtime per day, week, and month.
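
The arithmetic behind an uptime percentage is short enough to keep in your head. The helper below is a hypothetical sketch that assumes a 30-day month by default.

```python
def allowed_downtime_minutes(availability_percent, period_days=30):
    """Minutes of downtime permitted per period at a given availability."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


budget = {sla: allowed_downtime_minutes(sla) for sla in (99.5, 99.9, 99.99)}
# 99.5%  -> 216 minutes (3.6 hours) per 30 days
# 99.9%  -> about 43.2 minutes per 30 days
# 99.99% -> about 4.32 minutes per 30 days
```

Each extra nine cuts the downtime budget by a factor of ten, which is why moving from 99.9% to 99.99% usually needs architectural changes rather than tuning.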


Capacity Estimation

Back-of-envelope calculations help validate designs.

Common Numbers to Know

| Resource | Approximate Latency |
| --- | --- |
| L1 cache reference | 0.5 ns |
| RAM reference | 100 ns |
| SSD read | 100 μs |
| Network round trip (same datacenter) | 500 μs |
| Disk seek | 10 ms |
| Network round trip (intercontinental) | 150 ms |

Storage Calculations

Users: 100 million
Data per user: 1 KB profile + 10 KB posts = 11 KB
Total: 100M × 11 KB = 1.1 TB

With 3x replication: 3.3 TB
Growth over 3 years (2x): 6.6 TB

Throughput Calculations

Daily active users: 10 million
Requests per user per day: 20
Daily requests: 200 million
Requests per second: 200M / 86,400 = ~2,300 RPS

Peak (3x average): ~7,000 RPS
Design for: 10,000 RPS (headroom)

Bandwidth Calculations

Requests per second: 10,000
Average response size: 10 KB
Bandwidth: 10,000 × 10 KB = 100 MB/s = 800 Mbps
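
If you would rather script these estimates than do them by hand, a few helper functions are enough. The function names are invented for this sketch, and it uses decimal units (1 KB = 1,000 bytes) to match the arithmetic above.

```python
SECONDS_PER_DAY = 86_400
KB = 1_000
TB = 1_000_000_000_000


def average_rps(daily_active_users, requests_per_user_per_day):
    return daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY


def storage_bytes(users, bytes_per_user, replication=1, growth=1.0):
    return users * bytes_per_user * replication * growth


def bandwidth_mbps(rps, response_bytes):
    return rps * response_bytes * 8 / 1_000_000  # bytes/s to megabits/s


avg = average_rps(10_000_000, 20)   # roughly 2,315 RPS
peak = 3 * avg                      # roughly 6,944 RPS

raw_tb = storage_bytes(100_000_000, 11 * KB) / TB                        # about 1.1
replicated_tb = storage_bytes(100_000_000, 11 * KB, replication=3) / TB  # about 3.3
final_tb = storage_bytes(100_000_000, 11 * KB, replication=3, growth=2.0) / TB  # about 6.6

mbps = bandwidth_mbps(10_000, 10 * KB)  # 800.0
```

The exact figures matter less than the order of magnitude; rounding 2,315 to "about 2,300" changes nothing about the design.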

Quick Reference Tables

Database Decision Matrix

| Need | Choose |
| --- | --- |
| ACID transactions | PostgreSQL, MySQL |
| Flexible schema | MongoDB, DynamoDB |
| Time series data | InfluxDB, TimescaleDB |
| Graph relationships | Neo4j |
| High write throughput | Cassandra |
| Caching/sessions | Redis |

Communication Protocol Decision

| Need | Choose |
| --- | --- |
| Public API, broad compatibility | REST |
| Flexible queries, multiple clients | GraphQL |
| Internal services, high performance | gRPC |
| Real-time bidirectional | WebSocket |
| One-way server push | Server-Sent Events |

For real-time options, see WebSockets Explained, Server-Sent Events, and Long Polling.

Scaling Decision Matrix

| Problem | Solution |
| --- | --- |
| Database reads too slow | Add read replicas, caching |
| Database writes too slow | Sharding, write-behind cache |
| Single server overloaded | Horizontal scaling with load balancer |
| Too much traffic for one region | CDN, geographic distribution |
| Service calls too slow | Message queues, async processing |
| Downstream service failures cascade | Circuit breaker pattern, bulkheads |
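
The circuit breaker in the last row is worth a quick sketch, since it is the least self-explanatory entry. This is a simplified, single-threaded version; the class name, thresholds, and injected clock are assumptions for the example, and libraries such as Resilience4j add sliding windows and richer half-open handling.

```python
import time


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        self._failures = 0
        self._opened_at = None  # success closes the circuit
        return result


now = [0.0]  # manually advanced clock for a deterministic demo
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])


def flaky():
    raise ConnectionError("downstream unavailable")


outcomes = []
for _ in range(3):
    try:
        breaker.call(flaky)
    except CircuitOpenError:
        outcomes.append("rejected")  # never reached the downstream service
    except ConnectionError:
        outcomes.append("failed")

now[0] = 11.0                        # past the reset timeout
outcomes.append(breaker.call(lambda: "ok"))
```

The third call is rejected without touching the downstream service, which is what stops a slow dependency from tying up every caller's threads.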

Common System Design Numbers

| System | Scale |
| --- | --- |
| Twitter (X) | 500 million tweets/day |
| Google | 8.5 billion searches/day |
| Netflix | 15% of global internet traffic |
| WhatsApp | 100 billion messages/day |
| Uber | 1 million matches/minute peak |

Putting It All Together

Here is a typical architecture for a scalable web application:

Components:

  1. CDN: Serves static assets close to users
  2. Load Balancer: Distributes traffic, provides failover
  3. API Servers: Handle business logic, stateless for easy scaling
  4. Cache: Reduce database load, improve latency
  5. Database: Primary for writes, replicas for reads
  6. Message Queue: Decouple services, handle async work
  7. Workers: Process background jobs
  8. Object Storage: Store files, media
  9. Search Index: Full-text search capabilities

Related tool: UUID Generator and Decoder. Generate UUID v4 and v7 for distributed services, databases, and API resources.

Related tool: ULID Generator and Decoder. Prefer ULIDs when you need time-sortable, lexicographically ordered IDs across services.

Related tool: Epoch Timestamp Converter. Convert Unix timestamps when you debug ordering, TTLs, and event timelines.

Related tool: Subnet Calculator. Plan VPCs, subnets, and CIDR ranges for multi-region deployments and Kubernetes pod networks.

