Why This Matters
In system design interviews, candidates often fail not because they don’t know tools, but because they don’t know how to ask the right questions.
Strong designers:
- Discover problems before proposing solutions
- Reason about failures without running systems
- Explain why an architecture breaks under load
This post teaches how to identify system design problems from first principles, using a repeatable questioning process.
First Principles: What Are We Actually Designing?
Before any architecture, clarify this:
A system is a machine that accepts requests and holds resources until it finishes work.
So every design must answer:
- What work must be done?
- How long does it take?
- What resources are held while it runs?
- What happens when things slow or fail?
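To make these questions concrete, here is a minimal Go sketch of that definition of a system. The /work endpoint and the 2-second sleep are placeholders, not from any real service; the point is what stays held while the handler runs.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// handle illustrates the lifecycle behind the four questions above: the server
// accepts a request, does some work, and only then responds. Until the work
// finishes, a goroutine, the client's HTTP connection, and the handler's
// memory are all held.
func handle(w http.ResponseWriter, r *http.Request) {
	started := time.Now()

	time.Sleep(2 * time.Second) // placeholder for the actual work

	fmt.Fprintf(w, "done after %v\n", time.Since(started))
}

func main() {
	http.HandleFunc("/work", handle)
	http.ListenAndServe(":8080", nil)
}
```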
The Root Question (Always Start Here)
What must the system finish before it can respond?
This is the first and most important question.
Why?
- It defines request boundaries
- It determines latency
- It determines failure coupling
- It determines scalability
If you don’t answer this explicitly, the system will answer it implicitly — usually incorrectly.
The Question Ladder (Mental Checklist)
Once the root question is answered, follow this exact sequence:
1️⃣ What defines request completion?
- When is the request “done”?
- What must succeed before responding?
2️⃣ Which step is the slowest?
- Database write? (milliseconds)
- Network call? (tens to hundreds of milliseconds)
- External service / ML model? (seconds)
The slowest step dominates the system.
3️⃣ What resources are held while waiting?
Ask concretely:
- Is a goroutine/thread blocked?
- Is an HTTP connection open?
- Is memory retained?
- Is a DB connection reserved?
If resources are held → risk exists.
4️⃣ What scales with traffic vs latency?
This is the diagnostic question that exposes blocking.
Does resource usage grow when traffic increases — or when latency increases?
This distinction is critical.
The Core Principle (Memorize This)
Healthy systems scale with traffic.
Broken systems scale with latency.
Latency is unpredictable and unbounded.
Traffic can be controlled.
If your system scales with latency, it will collapse under real-world conditions.
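A quick way to see the difference is Little's Law: in-flight requests ≈ arrival rate × time each request is held. In a blocking design that time is the downstream latency. The rates and latencies in this sketch are purely illustrative:

```go
package main

import "fmt"

// Back-of-the-envelope check: in a blocking design, the number of in-flight
// requests (and the goroutines, connections, and buffers holding them) is
// roughly arrival rate * latency. Traffic stays fixed here; only the
// downstream latency changes.
func main() {
	const rate = 10.0 // requests per second (illustrative)

	for _, latencySec := range []float64{0.05, 0.5, 5, 30} {
		inFlight := rate * latencySec
		fmt.Printf("latency %5.2fs -> ~%3.0f requests held open\n", latencySec, inFlight)
	}
}
```

The resource count is untouched by how much traffic you planned for; it is dictated by how slow the dependency happens to be.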
Case Study 1: Synchronous API with External Dependency
Scenario
Client → API → External Service → API → Client
The API waits for the external service before responding.
Apply the Question Ladder
1. What defines completion?
→ External service response
2. Slowest step?
→ External service (seconds)
3. Resources held?
→ Open HTTP request, goroutine, memory
4. What scales with latency?
Example:
- 10 requests/sec
- External latency = 5 sec
At steady state (after the first 5 seconds):
- ~50 concurrent requests in flight
- ~50 goroutines blocked
Latency increases → concurrency explodes.
Failure Identified
Concurrency scales with latency, not traffic
This is blocking — even if traffic is low.
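A minimal Go sketch of this shape (externalServiceURL and the 10-second timeout are illustrative, not from any real system):

```go
package main

import (
	"io"
	"net/http"
	"time"
)

// externalServiceURL is a stand-in for the slow dependency in the diagram.
const externalServiceURL = "http://external.example.com/slow"

var client = &http.Client{Timeout: 10 * time.Second}

// handler waits synchronously for the external service before responding.
// While it waits, a goroutine is blocked, the client's HTTP connection stays
// open, and the request's memory is retained: concurrency = traffic x latency.
func handler(w http.ResponseWriter, r *http.Request) {
	resp, err := client.Get(externalServiceURL) // blocks for up to 10 seconds
	if err != nil {
		http.Error(w, "upstream failed", http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()

	io.Copy(w, resp.Body) // respond only after the dependency finishes
}

func main() {
	http.HandleFunc("/api", handler)
	http.ListenAndServe(":8080", nil)
}
```

Nothing here is misconfigured; the failure is baked into the shape of the handler itself.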
Case Study 2: Database Transaction Around Long Work
Scenario
Begin Transaction
→ Write data
→ Call external service
→ Write result
→ Commit
Apply the Questions
- Completion defined by: the external service response
- Slowest step: the external service
- Resources held: DB transaction, locks, connection
- Scaling factor: external latency
Failure Identified
Slow work is holding scarce resources
This leads to:
- Lock contention
- Connection pool exhaustion
- Cascading failures
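Here is roughly what that looks like in Go with database/sql. The table, the queries, and the callExternalService helper are hypothetical; the point is that the transaction, its pooled connection, and any locks stay open for the full duration of the slow call.

```go
package orders

import (
	"context"
	"database/sql"
	"time"
)

// callExternalService is a hypothetical stand-in for the slow dependency.
func callExternalService(ctx context.Context, id string) (string, error) {
	time.Sleep(5 * time.Second) // simulate a slow remote call
	return "result", nil
}

// processOrder shows the anti-pattern: slow external work runs inside the
// transaction, so the row locks and the pooled DB connection are held for
// seconds while other requests queue up behind them.
func processOrder(ctx context.Context, db *sql.DB, id string) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op if Commit succeeds

	if _, err := tx.ExecContext(ctx, "UPDATE orders SET status = 'pending' WHERE id = $1", id); err != nil {
		return err
	}

	// Locks and the DB connection are now hostage to external latency.
	result, err := callExternalService(ctx, id)
	if err != nil {
		return err
	}

	if _, err := tx.ExecContext(ctx, "UPDATE orders SET result = $1 WHERE id = $2", result, id); err != nil {
		return err
	}

	return tx.Commit()
}
```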
Case Study 3: Real-Time Streaming System
Scenario
Client ⇄ WebSocket ⇄ Server ⇄ Generator
Tokens stream incrementally.
Apply the Questions
- Completion defined by: the final token
- Slowest step: generation duration
- Resources held: socket, memory buffers
- Latency impact: long streams = long-held connections
New Failure Mode
Intermediate states now exist
Questions arise:
- What if connection drops mid-stream?
- What defines “done”?
- Can streams overlap?
Streaming solves latency perception but introduces state complexity.
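For illustration, here is a stdlib-only Go sketch that streams over plain HTTP (the scenario above uses WebSockets, but the resource-holding shape is the same). generateTokens is a hypothetical stand-in for the generator.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// generateTokens emits tokens until generation finishes or the request's
// context is cancelled (the client dropped mid-stream).
func generateTokens(r *http.Request) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for i := 0; i < 100; i++ {
			select {
			case <-r.Context().Done(): // connection dropped mid-stream
				return
			case out <- fmt.Sprintf("token-%d", i):
				time.Sleep(100 * time.Millisecond) // simulate generation time
			}
		}
	}()
	return out
}

// stream holds the connection and its buffers for the whole generation.
// "Done" only exists when the channel closes; everything before that is an
// intermediate state the design has to account for.
func stream(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	for token := range generateTokens(r) {
		fmt.Fprintln(w, token)
		flusher.Flush() // push each token as it is produced
	}
}

func main() {
	http.HandleFunc("/stream", stream)
	http.ListenAndServe(":8080", nil)
}
```

Note that the code does nothing about resuming a partially delivered stream; that is exactly the state complexity the questions above surface.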
Why These Problems Are Found Before Architecture
Notice:
- No queues
- No databases
- No technologies mentioned
Problems emerged purely by reasoning about time, state, and resources.
This is why first-principles thinking works across:
- Chat systems
- Payments
- Notifications
- File uploads
- ML inference pipelines
The One-Page Interview Checklist
Use this on any system:
1. What must complete before responding?
2. What is the slowest step?
3. What resources are held while waiting?
4. What grows when latency increases?
5. Can failures occur independently?
6. What happens under partial completion?
If resource usage grows with latency in #4 → design flaw detected
Final Takeaway
Blocking is not a performance issue.
It is a semantic mistake about responsibility boundaries.
The system incorrectly believes:
“I must finish long work before I can reply.”
Great system designers question that belief first — before proposing solutions.
Use this post as a thinking reference, not a pattern catalog.
If you can explain these questions clearly in an interview, you are already in the top tier.