Why This Matters
In system design interviews, candidates often fail not because they don’t know tools, but because they don’t know how to ask the right questions.
Strong designers:
- Discover problems before proposing solutions
- Reason about failures without running systems
- Explain why an architecture breaks under load
This post teaches how to identify system design problems from first principles, using a repeatable questioning process.
First Principles: What Are We Actually Designing?
Before any architecture, clarify this:
A system is a machine that accepts requests and holds resources until it finishes work.
So every design must answer:
- What work must be done?
- How long does it take?
- What resources are held while it runs?
- What happens when things slow or fail?
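To make these questions concrete, here is a minimal Go sketch of that definition of a system. The /work endpoint and the 2-second sleep are placeholders, not from any real service; the point is what stays held while the handler runs.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// handle illustrates the lifecycle behind the four questions above: the server
// accepts a request, does some work, and only then responds. Until the work
// finishes, a goroutine, the client's HTTP connection, and the handler's
// memory are all held.
func handle(w http.ResponseWriter, r *http.Request) {
	started := time.Now()

	time.Sleep(2 * time.Second) // placeholder for the actual work

	fmt.Fprintf(w, "done after %v\n", time.Since(started))
}

func main() {
	http.HandleFunc("/work", handle)
	http.ListenAndServe(":8080", nil)
}
```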
The Root Question (Always Start Here)
What must the system finish before it can respond?
This is the first and most important question.
Why?
- It defines request boundaries
- It determines latency
- It determines failure coupling
- It determines scalability
If you don’t answer this explicitly, the system will answer it implicitly — usually incorrectly.
The Question Ladder (Mental Checklist)
Once the root question is answered, follow this exact sequence:
1️⃣ What defines request completion?
- When is the request “done”?
- What must succeed before responding?
2️⃣ Which step is the slowest?
- Database write? (milliseconds)
- Network call? (tens to hundreds of milliseconds)
- External service / ML model? (seconds)
The slowest step dominates the system.
3️⃣ What resources are held while waiting?
Ask concretely:
- Is a goroutine/thread blocked?
- Is an HTTP connection open?
- Is memory retained?
- Is a DB connection reserved?
If resources are held → risk exists.
4️⃣ What scales with traffic vs latency?
This is the diagnostic question that exposes blocking.
Does resource usage grow when traffic increases — or when latency increases?
This distinction is critical.
The Core Principle (Memorize This)
Healthy systems scale with traffic.
Broken systems scale with latency.
Latency is unpredictable and unbounded.
Traffic can be controlled.
If your system scales with latency, it will collapse under real-world conditions.
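A quick way to see the difference is Little's Law: in-flight requests ≈ arrival rate × time each request is held. In a blocking design that time is the downstream latency. The rates and latencies in this sketch are purely illustrative:

```go
package main

import "fmt"

// Back-of-the-envelope check: in a blocking design, the number of in-flight
// requests (and the goroutines, connections, and buffers holding them) is
// roughly arrival rate * latency. Traffic stays fixed here; only the
// downstream latency changes.
func main() {
	const rate = 10.0 // requests per second (illustrative)

	for _, latencySec := range []float64{0.05, 0.5, 5, 30} {
		inFlight := rate * latencySec
		fmt.Printf("latency %5.2fs -> ~%3.0f requests held open\n", latencySec, inFlight)
	}
}
```

The resource count is untouched by how much traffic you planned for; it is dictated by how slow the dependency happens to be.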
Case Study 1: Synchronous API with External Dependency
Scenario
Client → API → External Service → API → Client
The API waits for the external service before responding.
Apply the Question Ladder
1. What defines completion?
→ External service response
2. Slowest step?
→ External service (seconds)
3. Resources held?
→ Open HTTP request, goroutine, memory
4. What scales with latency?
Example:
- 10 requests/sec
- External latency = 5 sec
At steady state (after the first 5 seconds):
- ~50 concurrent requests in flight
- ~50 goroutines blocked
Latency increases → concurrency explodes.
Failure Identified
Concurrency scales with latency, not traffic
This is blocking — even if traffic is low.
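A minimal Go sketch of this shape (externalServiceURL and the 10-second timeout are illustrative, not from any real system):

```go
package main

import (
	"io"
	"net/http"
	"time"
)

// externalServiceURL is a stand-in for the slow dependency in the diagram.
const externalServiceURL = "http://external.example.com/slow"

var client = &http.Client{Timeout: 10 * time.Second}

// handler waits synchronously for the external service before responding.
// While it waits, a goroutine is blocked, the client's HTTP connection stays
// open, and the request's memory is retained: concurrency = traffic x latency.
func handler(w http.ResponseWriter, r *http.Request) {
	resp, err := client.Get(externalServiceURL) // blocks for up to 10 seconds
	if err != nil {
		http.Error(w, "upstream failed", http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()

	io.Copy(w, resp.Body) // respond only after the dependency finishes
}

func main() {
	http.HandleFunc("/api", handler)
	http.ListenAndServe(":8080", nil)
}
```

Nothing here is misconfigured; the failure is baked into the shape of the handler itself.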
Case Study 2: Database Transaction Around Long Work
Scenario
Begin Transaction
→ Write data
→ Call external service
→ Write result
→ Commit
Apply the Questions
- Completion defined by: the external service response
- Slowest step: the external service
- Resources held: DB transaction, locks, connection
- Scaling factor: external latency
Failure Identified
Slow work is holding scarce resources
This leads to:
- Lock contention
- Connection pool exhaustion
- Cascading failures
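Here is roughly what that looks like in Go with database/sql. The table, the queries, and the callExternalService helper are hypothetical; the point is that the transaction, its pooled connection, and any locks stay open for the full duration of the slow call.

```go
package orders

import (
	"context"
	"database/sql"
	"time"
)

// callExternalService is a hypothetical stand-in for the slow dependency.
func callExternalService(ctx context.Context, id string) (string, error) {
	time.Sleep(5 * time.Second) // simulate a slow remote call
	return "result", nil
}

// processOrder shows the anti-pattern: slow external work runs inside the
// transaction, so the row locks and the pooled DB connection are held for
// seconds while other requests queue up behind them.
func processOrder(ctx context.Context, db *sql.DB, id string) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op if Commit succeeds

	if _, err := tx.ExecContext(ctx, "UPDATE orders SET status = 'pending' WHERE id = $1", id); err != nil {
		return err
	}

	// Locks and the DB connection are now hostage to external latency.
	result, err := callExternalService(ctx, id)
	if err != nil {
		return err
	}

	if _, err := tx.ExecContext(ctx, "UPDATE orders SET result = $1 WHERE id = $2", result, id); err != nil {
		return err
	}

	return tx.Commit()
}
```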
Case Study 3: Real-Time Streaming System
Scenario
Client ⇄ WebSocket ⇄ Server ⇄ Generator
Tokens stream incrementally.
Apply the Questions
- Completion defined by: the final token
- Slowest step: generation duration
- Resources held: socket, memory buffers
- Latency impact: long streams = long-held connections
New Failure Mode
Intermediate states now exist
Questions arise:
- What if connection drops mid-stream?
- What defines “done”?
- Can streams overlap?
Streaming solves latency perception but introduces state complexity.
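For illustration, here is a stdlib-only Go sketch that streams over plain HTTP (the scenario above uses WebSockets, but the resource-holding shape is the same). generateTokens is a hypothetical stand-in for the generator.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// generateTokens emits tokens until generation finishes or the request's
// context is cancelled (the client dropped mid-stream).
func generateTokens(r *http.Request) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for i := 0; i < 100; i++ {
			select {
			case <-r.Context().Done(): // connection dropped mid-stream
				return
			case out <- fmt.Sprintf("token-%d", i):
				time.Sleep(100 * time.Millisecond) // simulate generation time
			}
		}
	}()
	return out
}

// stream holds the connection and its buffers for the whole generation.
// "Done" only exists when the channel closes; everything before that is an
// intermediate state the design has to account for.
func stream(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	for token := range generateTokens(r) {
		fmt.Fprintln(w, token)
		flusher.Flush() // push each token as it is produced
	}
}

func main() {
	http.HandleFunc("/stream", stream)
	http.ListenAndServe(":8080", nil)
}
```

Note that the code does nothing about resuming a partially delivered stream; that is exactly the state complexity the questions above surface.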
Why These Problems Are Found Before Architecture
Notice:
- No queues
- No databases
- No technologies mentioned
Problems emerged purely by reasoning about time, state, and resources.
This is why first-principles thinking works across:
- Chat systems
- Payments
- Notifications
- File uploads
- ML inference pipelines
The One-Page Interview Checklist
Use this on any system:
1. What must complete before responding?
2. What is the slowest step?
3. What resources are held while waiting?
4. What grows when latency increases?
5. Can failures occur independently?
6. What happens under partial completion?
If resource usage grows with latency in #4 → design flaw detected
Final Takeaway
Blocking is not a performance issue.
It is a semantic mistake about responsibility boundaries.
The system incorrectly believes:
“I must finish long work before I can reply.”
Great system designers question that belief first — before proposing solutions.
Use this post as a thinking reference, not a pattern catalog.
If you can explain these questions clearly in an interview, you are already in the top tier.