Robort Gabriel

Posted on Sep 21 • Edited on Nov 2

System Design Interview Playbook: Clear Steps to Shine

#programming #webdev #interview #devops

System design sits at the heart of the technical interview. It can look huge at first, but it doesn’t need to be. With a simple, repeatable game plan, you can show clear thinking, good engineering judgment, and solid communication. This guide turns a fuzzy prompt like “design Instagram” into a calm, structured conversation, without breaking a sweat or drawing a data center with 17 mystery boxes.

System design also matters for career growth. Senior roles expect you to reason about scale, reliability, and trade‑offs, not just write code. If you want to understand how system design fits into the broader set of in-demand abilities for senior engineers, see our guide to high-income tech skills.

You’re not building the entire company in one hour. Pick a slice, design it well, and explain the trade‑offs clearly.

What Interviewers Really Evaluate

Interviewers want more than a pretty diagram. They’re looking at how you:

Clarify scope and remove ambiguity
Separate functional and non-functional requirements
Choose sensible APIs and data models
Communicate trade‑offs (consistency vs availability, latency vs cost, build vs buy)
Scale the design step by step, not all at once
Collaborate: ask questions, validate assumptions, and listen to signals

Keep the conversation two-way. Ask what matters most and go deeper there.

A 5-Step Framework You Can Reuse

Use this simple approach in any system design interview.

1) Clarify the Scope

Start by shrinking the problem. Suppose you’re asked to design a social media app like Instagram. Ask:

Which features should we focus on: uploading content, viewing the feed, or messaging?
What’s the target region and expected user scale?
Are we optimizing for mobile, web, or both?
Is real-time interaction a must-have, or can updates be slightly delayed?

This avoids overbuilding and guides your choices later.

2) Gather Requirements

List requirements and label them clearly.

Functional requirements (what it should do):
- Users can upload images and short videos
- Users can view a home feed of posts
- Users can like and comment
- Users can follow other accounts

Non-functional requirements (how it should behave):
- High availability and low latency for reads
- Reasonable write performance under spikes (e.g., viral posts)
- Durability of media and metadata
- Horizontal scalability (many users, many posts)
- Security, privacy, and basic compliance

As you weigh trade-offs, remember the CAP theorem: when a network partition occurs, a distributed system has to give up either consistency or availability, because it can’t guarantee both while the partition lasts. In practice, you’ll often favor availability for user-facing reads and accept eventual consistency for non-critical counters and feeds.

3) Sketch APIs and a Data Model

Don’t aim for perfect. Keep it simple and show you can translate requirements into an interface.

Example APIs for an “upload + view feed” slice:

POST /posts (body includes media URL, caption, userId)
GET /feed?userId=&cursor=&limit=
POST /likes (postId, userId)
GET /posts/{id}

Rough data model:

User(id, handle, name, createdAt)
Post(id, userId, caption, mediaUrl, createdAt)
Follow(followerId, followeeId, createdAt)
Like(postId, userId, createdAt)

Note a few access patterns:

Feed reads dominate: GET /feed is a hot path
Post writes include metadata to a database and media to blob storage
Likes/comments are small writes but high volume on popular posts

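Call out indexes and partitions if needed: e.g., index Follow by followerId to look up who a user follows when assembling a feed on demand, index it by followeeId to find an author’s followers for fan-out, and partition Post by userId or time.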
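To make the interface concrete, here is a minimal sketch of the Post and Follow records and an on-demand GET /feed with cursor pagination. It runs entirely in memory; the snake_case field names, the (created_at, id) cursor shape, and the `get_feed` function are assumptions made for illustration, not a prescribed design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    id: int
    user_id: int
    caption: str
    media_url: str
    created_at: int  # Unix seconds (assumed unit)


@dataclass(frozen=True)
class Follow:
    follower_id: int
    followee_id: int


def get_feed(posts, follows, user_id, cursor=None, limit=2):
    """Return (page, next_cursor) for a user's home feed, newest first.

    The cursor is the (created_at, id) of the last post already returned,
    so pages stay stable even if new posts arrive between requests.
    """
    # Equivalent to a lookup on a Follow index keyed by follower_id.
    followees = {f.followee_id for f in follows if f.follower_id == user_id}
    candidates = sorted(
        (p for p in posts if p.user_id in followees),
        key=lambda p: (p.created_at, p.id),
        reverse=True,
    )
    if cursor is not None:
        candidates = [p for p in candidates if (p.created_at, p.id) < cursor]
    page = candidates[:limit]
    has_more = len(candidates) > limit
    next_cursor = (page[-1].created_at, page[-1].id) if has_more else None
    return page, next_cursor
```

A cursor tends to be a safer choice than an offset here because new posts landing at the top of the feed would otherwise shift every later page.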

4) Draw a Baseline Architecture

Describe a simple, clean baseline before you scale it up:

Client (mobile/web)
Load Balancer
App tier: separate paths for reads vs writes (reads typically outnumber writes)
Metadata database (start with a relational DB if data is highly relational)
Blob storage for images/videos
CDN in front of media for global performance

Text version of the flow:

Client → Load Balancer → App (Write) → DB (metadata) + Blob Storage (media)
Client → Load Balancer → App (Read) → DB + Cache + CDN (for media)

This baseline maps directly to functional requirements and gives you a platform to layer more capabilities.

5) Layer in Scale, Reliability, and Performance

Add components gradually and explain why:

Caching for hot reads (e.g., Redis) to reduce DB load
Background workers and queues for fan-out, thumbnail generation, and notifications
Search/indexing using a dedicated service (e.g., OpenSearch/Elasticsearch)
Rate limiting and auth for safety
Monitoring, logging, and tracing for visibility
Feature flags and gradual rollouts for safety during changes

Be explicit about each new piece: what problem it solves, how it fits in, and the trade-offs it brings.
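As one example of explaining a new piece, the caching layer is often described as a cache-aside read path: check the cache, fall back to the database on a miss, then store the result with a TTL. The sketch below uses an in-memory dictionary as a stand-in for a store like Redis; `TTLCache`, `read_through`, and the injected clock are hypothetical names chosen for illustration.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a cache with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


def read_through(cache, key, load_from_db):
    """Cache-aside read: serve from cache, else load and populate.

    Note: a stored value of None is indistinguishable from a miss in
    this sketch, so the loader is assumed to return non-None values.
    """
    value = cache.get(key)
    if value is not None:
        return value
    value = load_from_db(key)
    cache.set(key, value)
    return value
```

The trade-off to name out loud is staleness: a reader may see data up to one TTL old, which is usually acceptable for feeds and profiles but not for critical invariants.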

From Baseline to Production: What to Add and Why

Use this as a checklist during the interview to justify each layer you propose.

Layer	Component	Why it matters
Performance	CDN + Cache	Low-latency reads and offload from origin servers
Durability	Blob Storage	Durable, cheap storage for images and videos
Reliability	Multi-AZ/Region DB	High availability and disaster recovery
Scalability	Sharding/Partitioning	Handle growth in posts, users, and traffic
Asynchronicity	Queue + Workers	Smooth spikes, process jobs out-of-band
Observability	Metrics + Logs + Traces	Detect issues and improve performance
Safety	Rate Limits + WAF	Protect against abuse and overload

Keep the conversation practical. Tie each component back to a requirement.
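If the interviewer asks how the rate-limit layer might work, a token bucket is one common answer. The sketch below is a single-process illustration only; the `TokenBucket` class, its parameters, and the injected clock are assumptions, and a real deployment would typically keep this state in a shared store so every app server sees the same counts.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full so early bursts pass
        self._last = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that has passed, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # caller would typically respond with HTTP 429
```

The design choice worth mentioning is that capacity controls how bursty a client may be, while the refill rate controls its sustained throughput.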

Handling Data and Consistency Like a Pro (But Simply)

You don’t need to be academic, but you should show you understand consistency trade-offs.

Strong vs eventual consistency: a like count can be eventually consistent, while a payment ledger must be strongly consistent.
Read-heavy endpoints (feeds, profiles) can accept slightly stale data to keep latency low.
Write paths should be idempotent: retries shouldn’t duplicate a like or a post.
Use durable storage for media and consider pre-signed URLs for secure uploads.

A simple rule of thumb: default to strong consistency for critical invariants, and use eventual consistency for derived or cached views.
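The idempotent write path mentioned above can be sketched as follows. This is an in-memory illustration under stated assumptions: `LikeService`, the idempotency-key parameter, and the response shape are hypothetical, and the set of (post_id, user_id) pairs stands in for a unique constraint in the database.

```python
class LikeService:
    """Sketch of an idempotent POST /likes handler."""

    def __init__(self):
        self._likes = set()     # (post_id, user_id); mirrors a unique constraint
        self._responses = {}    # idempotency key -> response already returned

    def like(self, idempotency_key, post_id, user_id):
        # A retried request with the same key replays the stored response
        # instead of applying the write again.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        created = (post_id, user_id) not in self._likes
        self._likes.add((post_id, user_id))
        response = {"postId": post_id, "userId": user_id, "created": created}
        self._responses[idempotency_key] = response
        return response

    def like_count(self, post_id):
        return sum(1 for pid, _ in self._likes if pid == post_id)
```

There are two layers of protection in this sketch: the key handles client retries of the same request, and the uniqueness of the pair handles a second, distinct request for a like that already exists.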

Back-of-the-Envelope Numbers (Capacity Planning)

Showing quick math signals seniority. Keep it simple and round numbers.

Traffic: If you expect 10 million daily active users with 20 feed views/day, that’s ~200M feed reads/day ≈ 2,300 reads/second on average (peaks can be 10×).
Storage: If the average image is 500 KB and users create 5 million posts/day, that’s ~2.5 TB/day of new media. Plan for lifecycle rules (cold storage) to control cost.
Cache sizing: If 20% of posts drive 80% of reads, cache those aggressively to offload the DB.

Call out that you’d validate these assumptions with metrics and adjust capacity accordingly.
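The arithmetic above is simple enough to check in a few lines. The sketch below reuses the same example figures (10 million daily active users, 20 feed views each, 5 million posts of 500 KB) and assumes decimal units, where 1 TB is 10^9 KB; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(daily_active_users, actions_per_user_per_day):
    """Average requests per second, assuming traffic is spread evenly."""
    return daily_active_users * actions_per_user_per_day / SECONDS_PER_DAY


def daily_storage_tb(items_per_day, avg_item_kb):
    """New storage per day in TB, using decimal units (1 TB = 10**9 KB)."""
    return items_per_day * avg_item_kb / 10**9


feed_rps = average_rps(10_000_000, 20)               # about 2,315 reads/second
peak_rps = feed_rps * 10                             # 10x peak factor from the text
media_tb_per_day = daily_storage_tb(5_000_000, 500)  # 2.5 TB/day
```

In an interview you would round these in your head; the point is to show the chain from users to requests per second to storage, not the exact digits.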

Example Walkthrough: “Instagram‑Lite” (Upload + Feed)

Here’s how you could structure a 30 to 40 minute discussion.

1) Clarify scope

We’ll design uploading a photo and viewing a personalized feed. No messaging. Single region to start.

2) Requirements

Functional: upload post, view home feed, like post
Non-functional: low-latency feed (<200 ms p95), handle bursts, durable media

3) API + Data

APIs: POST /posts, GET /feed, POST /likes
Data: relational DB for Users, Posts, Follows; cache for hot feeds; blob storage for images; CDN for delivery

4) Baseline architecture

App servers behind a load balancer
Separate read and write pathways
DB for metadata; object storage for images; CDN in front of storage

5) Scale up

Precompute feeds via background workers using the Follow graph
Store precomputed feed pages per user in cache with TTL; fall back to on-demand fetch if missed
Use idempotency keys for POST /posts and POST /likes
Add metrics: feed latency, cache hit ratio, DB QPS, queue depth

Trade-offs to discuss

Precomputed feeds are fast but expensive for high-fanout users; on-demand feed assembly is cheaper but slower
Likes may be eventually consistent in the UI to keep writes simple and fast
Hot content mitigation: rate limit likes/comments on viral posts and partition workload
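One way to talk through the precompute-versus-on-demand trade-off is a hybrid: push new posts into followers’ precomputed feeds for ordinary accounts, and pull posts from high-fanout accounts at read time. The sketch below is a simplified in-memory illustration; `FeedStore`, `FANOUT_THRESHOLD`, and its value are hypothetical, and in a real system the push step would likely run in background workers fed by a queue.

```python
from collections import defaultdict

FANOUT_THRESHOLD = 3  # hypothetical cutoff; a real value would come from metrics


class FeedStore:
    def __init__(self, followers_of):
        self.followers_of = followers_of           # author_id -> set of follower ids
        self.precomputed = defaultdict(list)       # user_id -> [(created_at, post_id)]
        self.high_fanout_posts = defaultdict(list)  # author_id -> [(created_at, post_id)]

    def publish(self, author_id, post_id, created_at):
        followers = self.followers_of.get(author_id, set())
        if len(followers) > FANOUT_THRESHOLD:
            # Too many followers to write to: store once, merge at read time.
            self.high_fanout_posts[author_id].append((created_at, post_id))
            return "pull"
        for follower_id in followers:
            self.precomputed[follower_id].append((created_at, post_id))
        return "push"

    def read_feed(self, user_id, limit=10):
        entries = list(self.precomputed.get(user_id, []))
        for author_id, posts in self.high_fanout_posts.items():
            if user_id in self.followers_of.get(author_id, set()):
                entries.extend(posts)
        entries.sort(reverse=True)  # newest first by created_at
        return [post_id for _, post_id in entries[:limit]]
```

The cost you are naming is that reads for followers of high-fanout accounts do a little extra merge work, in exchange for avoiding a very large number of writes per post.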

Close with a brief summary and note what you’d do “with more time” (multi-region, stronger privacy, ranking models).

How to Communicate During the Interview

Start simple, then layer. Don’t jump to Kubernetes or global replication immediately.
State assumptions aloud and check if they’re okay.
Draw while you talk; name each component and its purpose.
Keep a running list of trade-offs you’ve discussed.
Time-box: spend ~5 minutes on scope, ~10 on APIs/data, ~10 on architecture, ~10 on scaling and trade-offs, and reserve a few minutes to recap.

Practice Plan: Build Your System Design “Muscle Memory”

Do mock interviews weekly. Record yourself to spot gaps.
Keep a design journal with recurring patterns (caching strategies, idempotency, queues).
Rehearse common scenarios: news feed, rate limiter, URL shortener, chat, notifications, search, analytics pipeline.
Turn designs into small artifacts you can reuse: API templates, capacity cheat sheets, and a quick checklist.

To put these steps into practice, see our guide to learn-coding platforms, which lists resources for mock interviews, hands-on exercises, and practicing system design problems.

Common Mistakes to Avoid

Designing the whole company: Focus on one slice that shows depth.
Ignoring non-functional requirements: Talk latency, availability, and durability.
Over-optimizing early: Start with a clean baseline, then scale.
Hand-waving data: Share basic back-of-the-envelope estimates.
Vague trade-offs: Name the cost of your choice and why it’s acceptable.
Skipping observability and safety: Always include metrics, logs, rate limits, and basic security.

Key Takeaways

System design is about clarity, trade-offs, and communication, not buzzwords.
Start with scope and requirements, then APIs/data, then a simple architecture you can scale.
Use caches, queues, and CDNs to meet latency and throughput goals.
Be explicit about consistency choices and where eventual consistency is acceptable.
Practice with a repeatable framework and keep improving your artifacts.

Conclusion

System design interviews reward simple, clear thinking. Focus on the user flow you’re building, define functional and non-functional requirements, sketch clean APIs and data models, present a baseline architecture, and then layer scale and reliability with clear trade-offs. With practice and a steady structure, you’ll walk into every interview ready to show sound engineering judgment, connecting coding skills to real-world systems that serve millions.

DEV Community