Mohammad-Idrees

How to Identify System Design Problems from First Principles

Why This Matters

In system design interviews, candidates often fail not because they don't know the tools, but because they don't ask the right questions.

Strong designers:

  • Discover problems before proposing solutions
  • Reason about failures without running systems
  • Explain why an architecture breaks under load

This post teaches how to identify system design problems from first principles, using a repeatable questioning process.


First Principles: What Are We Actually Designing?

Before any architecture, clarify this:

A system is a machine that accepts requests and holds resources until it finishes work.

So every design must answer:

  • What work must be done?
  • How long does it take?
  • What resources are held while it runs?
  • What happens when things slow or fail?

The Root Question (Always Start Here)

What must the system finish before it can respond?

This is the first and most important question.

Why?

  • It defines request boundaries
  • It determines latency
  • It determines failure coupling
  • It determines scalability

If you don’t answer this explicitly, the system will answer it implicitly — usually incorrectly.


The Question Ladder (Mental Checklist)

Once the root question is answered, follow this exact sequence:

1️⃣ What defines request completion?

  • When is the request “done”?
  • What must succeed before responding?
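
To make this concrete, here is a minimal Go sketch contrasting two completion boundaries for the same endpoint. The step names (saveOrder, notifyWarehouse) are hypothetical stand-ins:

```go
package main

import (
	"context"
	"net/http"
)

// Hypothetical steps standing in for real work.
func saveOrder(ctx context.Context) error       { return nil }
func notifyWarehouse(ctx context.Context) error { return nil }

// Boundary A: "done" means saved AND notified; both block the reply.
func handleStrict(w http.ResponseWriter, r *http.Request) {
	if err := saveOrder(r.Context()); err != nil {
		http.Error(w, "save failed", http.StatusInternalServerError)
		return
	}
	if err := notifyWarehouse(r.Context()); err != nil {
		http.Error(w, "notify failed", http.StatusBadGateway)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// Boundary B: "done" means the essential write is durable; the
// notification becomes deferred work with its own failure handling.
func handleNarrow(w http.ResponseWriter, r *http.Request) {
	if err := saveOrder(r.Context()); err != nil {
		http.Error(w, "save failed", http.StatusInternalServerError)
		return
	}
	go func() { _ = notifyWarehouse(context.Background()) }() // fire-and-forget, for illustration only
	w.WriteHeader(http.StatusAccepted)
}

func main() {
	http.HandleFunc("/strict", handleStrict)
	http.HandleFunc("/narrow", handleNarrow)
	_ = http.ListenAndServe(":8080", nil)
}
```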

2️⃣ Which step is the slowest?

  • Database write? (milliseconds)
  • Network call? (tens to hundreds of milliseconds)
  • External service / ML model? (seconds)

The slowest step dominates the system.
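
If it is not obvious which step dominates, measure it. A tiny Go sketch, with time.Sleep standing in for real work:

```go
package main

import (
	"fmt"
	"time"
)

// timed runs one step and reports its duration, a crude way to find
// the stage that dominates end-to-end latency.
func timed(name string, step func()) {
	start := time.Now()
	step()
	fmt.Printf("%-15s %v\n", name, time.Since(start))
}

func main() {
	timed("db write", func() { time.Sleep(5 * time.Millisecond) })
	timed("network call", func() { time.Sleep(120 * time.Millisecond) })
	timed("external model", func() { time.Sleep(2 * time.Second) })
	// Total ≈ 2.125 s, and the external model accounts for nearly all of it.
}
```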


3️⃣ What resources are held while waiting?

Ask concretely:

  • Is a goroutine/thread blocked?
  • Is an HTTP connection open?
  • Is memory retained?
  • Is a DB connection reserved?

If resources are held → risk exists.
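
Here is what "held" looks like in a minimal Go sketch, assuming a hypothetical slow upstream. A context timeout does not remove the blocking, but it bounds how long each resource stays held:

```go
package main

import (
	"context"
	"io"
	"net/http"
	"time"
)

func handler(w http.ResponseWriter, r *http.Request) {
	// While we wait on the upstream: one goroutine is blocked, the
	// client's HTTP connection stays open, and buffers stay in memory.
	ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
	defer cancel()

	req, _ := http.NewRequestWithContext(ctx, http.MethodGet,
		"http://upstream.internal/slow", nil) // hypothetical dependency
	resp, err := http.DefaultClient.Do(req)  // the goroutine blocks here
	if err != nil {
		http.Error(w, "upstream failed or timed out", http.StatusGatewayTimeout)
		return
	}
	defer resp.Body.Close()
	_, _ = io.Copy(w, resp.Body)
}

func main() {
	http.HandleFunc("/", handler)
	_ = http.ListenAndServe(":8080", nil)
}
```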


4️⃣ What scales with traffic vs latency?

This is the diagnostic question that exposes blocking.

Does resource usage grow when traffic increases — or when latency increases?

This distinction is critical.


The Core Principle (Memorize This)

Healthy systems scale with traffic.
Broken systems scale with latency.

Latency is unpredictable and unbounded.
Traffic can be controlled.

If your system scales with latency, it will collapse under real-world conditions.
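
This is Little's Law: in-flight requests L = λ × W, arrival rate times time in system. A quick sketch that holds traffic constant and grows only latency:

```go
package main

import "fmt"

func main() {
	const rate = 10.0 // requests per second, held constant
	for _, latency := range []float64{0.05, 0.5, 5, 50} { // seconds
		fmt.Printf("latency %5.2f s -> %4.0f concurrent requests\n",
			latency, rate*latency) // L = λ × W
	}
}
```

Traffic never changed, yet concurrency grew a thousandfold. That is the signature of a system that scales with latency.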


Case Study 1: Synchronous API with External Dependency

Scenario

```
Client → API → External Service → API → Client
```

The API waits for the external service before responding.


Apply the Question Ladder

1. What defines completion?
→ External service response

2. Slowest step?
→ External service (seconds)

3. Resources held?
→ Open HTTP request, goroutine, memory

4. What scales with latency?

Example:

  • 10 requests/sec
  • External latency = 5 sec

After 5 seconds:

  • 50 concurrent requests
  • 50 goroutines blocked

Latency increases → concurrency explodes.


Failure Identified

Concurrency scales with latency, not traffic

This is blocking — even if traffic is low.
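
While the design stays synchronous, you can at least make concurrency scale with an explicit budget instead of upstream latency. A sketch using a buffered channel as a semaphore; the limit of 64 is an arbitrary assumption:

```go
package main

import (
	"net/http"
	"time"
)

// At most cap(sem) requests may wait on the slow upstream at once;
// the rest are shed immediately instead of piling up.
var sem = make(chan struct{}, 64)

func handler(w http.ResponseWriter, r *http.Request) {
	select {
	case sem <- struct{}{}: // acquire a slot
		defer func() { <-sem }() // release it when done
	default:
		http.Error(w, "overloaded, retry later", http.StatusServiceUnavailable)
		return
	}
	time.Sleep(5 * time.Second) // stand-in for the slow external call
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/", handler)
	_ = http.ListenAndServe(":8080", nil)
}
```

This caps the damage; it does not fix the root cause, which is the completion boundary itself.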


Case Study 2: Database Transaction Around Long Work

Scenario

```
Begin Transaction
→ Write data
→ Call external service
→ Write result
→ Commit
```

Apply the Questions

Completion defined by: external service
Slowest step: external service
Resources held: DB transaction, locks, connection
Scaling factor: external latency


Failure Identified

Slow work is holding scarce resources

This leads to:

  • Lock contention
  • Connection pool exhaustion
  • Cascading failures
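
One common restructuring, sketched in Go with database/sql (the jobs table and Postgres-style placeholders are assumptions), is to split the flow into two short transactions so nothing scarce is held during the slow call:

```go
package jobs

import (
	"context"
	"database/sql"
)

// Hypothetical external call whose latency we do not control.
func callExternal(ctx context.Context) (string, error) { return "ok", nil }

func process(ctx context.Context, db *sql.DB, jobID int64) error {
	// Transaction 1: record intent, then release the connection and locks.
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx,
		`INSERT INTO jobs (id, status) VALUES ($1, 'pending')`, jobID); err != nil {
		_ = tx.Rollback()
		return err
	}
	if err := tx.Commit(); err != nil {
		return err
	}

	// Slow work happens with no database resources held.
	result, err := callExternal(ctx)
	if err != nil {
		return err // the 'pending' row records the partial state for retries
	}

	// Transaction 2: persist the result.
	_, err = db.ExecContext(ctx,
		`UPDATE jobs SET status = 'done', result = $1 WHERE id = $2`, result, jobID)
	return err
}
```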

Case Study 3: Real-Time Streaming System

Scenario

```
Client ⇄ WebSocket ⇄ Server ⇄ Generator
```

Tokens stream incrementally.


Apply the Questions

Completion defined by: final token
Slowest step: generation duration
Resources held: socket, memory buffers
Latency impact: long streams = long-held connections


New Failure Mode

Intermediate states now exist

Questions arise:

  • What if connection drops mid-stream?
  • What defines “done”?
  • Can streams overlap?

Streaming solves latency perception but introduces state complexity.
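
A rough sketch of the streaming shape, using standard-library HTTP flushing rather than a WebSocket stack (the token list stands in for a real generator). The disconnect branch is where the new state complexity lives:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func stream(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")

	tokens := []string{"hello", "from", "the", "generator"}
	for i, tok := range tokens {
		select {
		case <-r.Context().Done():
			// Dropped mid-stream: tokens[:i] were delivered, the rest were
			// not. This intermediate state is what the design must own.
			fmt.Printf("client gone after %d tokens\n", i)
			return
		default:
		}
		fmt.Fprintf(w, "data: %s\n\n", tok)
		flusher.Flush()                    // push the token out now
		time.Sleep(200 * time.Millisecond) // stand-in for generation time
	}
	// "done" here means the final token was flushed; the client needs
	// its own definition of done, e.g. an explicit end-of-stream event.
}

func main() {
	http.HandleFunc("/stream", stream)
	_ = http.ListenAndServe(":8080", nil)
}
```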


Why These Problems Are Found Before Architecture

Notice:

  • No queues
  • No databases
  • No technologies mentioned

Problems emerged purely by reasoning about time, state, and resources.

This is why first-principles thinking works across:

  • Chat systems
  • Payments
  • Notifications
  • File uploads
  • ML inference pipelines

The One-Page Interview Checklist

Use this on any system:

  1. What must complete before responding?
  2. What is the slowest step?
  3. What resources are held while waiting?
  4. What grows when latency increases?
  5. Can failures occur independently?
  6. What happens under partial completion?

If latency appears in #4 → design flaw detected


Final Takeaway

Blocking is not a performance issue.
It is a semantic mistake about responsibility boundaries.

The system incorrectly believes:

“I must finish long work before I can reply.”

Great system designers question that belief first — before proposing solutions.


Use this post as a thinking reference, not a pattern catalog.
If you can explain these questions clearly in an interview, you are already in the top tier.
