DEV Community

Mohammad-Idrees


How to Question Any System Design Problem (With Live Interview Walkthrough)

Thinking in First Principles:

Most system design interview failures are not caused by missing knowledge of tools.

They are caused by missing questions.

Strong candidates do not start by designing systems.
They start by interrogating the problem.

This post teaches you:

  • How to question a system from first principles
  • How to apply that questioning live in an interview
  • What mistakes candidates commonly make
  • A printable one-page checklist you can memorize and reuse

No prior system design experience required.


What “First Principles” Means in System Design

Reasoning from first principles means reducing a problem to fundamental truths that must always hold, regardless of:

  • Programming language
  • Framework
  • Infrastructure
  • Scale

Every system—chat apps, payment systems, video processing pipelines—must answer the same core questions about:

  • State
  • Time
  • Failure
  • Order
  • Scale

If a design cannot answer one of these, it is incomplete.


The 5-Step First-Principles Questioning Framework

You will apply these questions in order.

  1. State – Where does information live? When is it durable?
  2. Time – How long does each step take?
  3. Failure – What breaks independently?
  4. Order – What defines correct sequence?
  5. Scale – What grows fastest under load?

This is not a checklist you recite.
It is a thinking sequence.

Let’s walk through each one.


1. State — Where Does It Live? When Is It Durable?

The Question

Where does the system’s information exist, and when is it safe from loss?

This is always the first question because nothing else matters if data disappears.

What You’re Really Asking

  • Is data stored in memory or persisted?
  • What survives a crash or restart?
  • What is the source of truth?

Example Case

Imagine a system that accepts user requests and processes them later.

If the request only lives in memory:

  • A restart loses it
  • A crash loses it
  • Another instance can’t see it

You have discovered a correctness problem, not a performance one.
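Here is a minimal Python sketch of that difference. It assumes SQLite as the durable store, though any persisted store would make the same point. The class names `InMemoryInbox` and `DurableInbox` are hypothetical. The "restart" is simulated by discarding the original objects and opening a fresh connection to the same database file.

```python
import os
import sqlite3
import tempfile


class InMemoryInbox:
    """Hypothetical inbox that keeps requests only in process memory."""

    def __init__(self) -> None:
        self.requests: list = []

    def accept(self, payload: str) -> None:
        self.requests.append(payload)


class DurableInbox:
    """Hypothetical inbox that persists each request to SQLite before acknowledging."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS requests (id INTEGER PRIMARY KEY, payload TEXT)"
        )
        self.conn.commit()

    def accept(self, payload: str) -> None:
        self.conn.execute("INSERT INTO requests (payload) VALUES (?)", (payload,))
        self.conn.commit()  # only after the commit is it safe to acknowledge

    def pending(self) -> list:
        rows = self.conn.execute("SELECT payload FROM requests ORDER BY id")
        return [row[0] for row in rows]

    def close(self) -> None:
        self.conn.close()


def simulate_restart(db_path: str):
    """Accept one request in each inbox, 'restart', and report what survived."""
    mem = InMemoryInbox()
    mem.accept("resize photo 42")
    disk = DurableInbox(db_path)
    disk.accept("resize photo 42")
    disk.close()

    # Restart: the old objects are gone, so rebuild both from scratch.
    mem_after = InMemoryInbox()
    disk_after = DurableInbox(db_path)
    survivors = (mem_after.requests, disk_after.pending())
    disk_after.close()
    return survivors


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(simulate_restart(os.path.join(tmp, "inbox.db")))
        # prints ([], ['resize photo 42'])
```

Only the request that was committed to disk before the restart is still visible afterward.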

Key Insight

If state exists only in a running process, it does not exist.


2. Time — How Long Does Each Step Take?

Once state exists, time becomes unavoidable.

The Question

Which steps are fast, and which are slow?

You are comparing orders of magnitude, not exact numbers.

What You’re Really Asking

  • Is there long-running work?
  • Does the user wait for it?
  • Is fast work blocked by slow work?

Example Case

A system:

  • Accepts a request (milliseconds)
  • Performs heavy processing (seconds)

If the request waits for processing:

  • Latency is dominated by the slowest step
  • Throughput is capped, because every waiting request ties up a connection or thread for the full processing time
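The following Python sketch makes the contrast measurable. It assumes a single background worker thread and uses a 0.2-second sleep as a stand-in for "seconds" of processing. The function names are hypothetical. The in-process `queue.Queue` is not durable, so a real design would also need to answer the State question for it.

```python
import queue
import threading
import time

PROCESSING_SECONDS = 0.2  # stand-in for seconds of real work


def heavy_processing(payload: str) -> str:
    time.sleep(PROCESSING_SECONDS)
    return payload.upper()


def handle_sync(payload: str) -> str:
    # The caller waits for the slow step, so latency ~ processing time.
    return heavy_processing(payload)


work_queue: "queue.Queue[str]" = queue.Queue()
results: dict = {}


def worker() -> None:
    # Background worker: drains the queue independently of request handling.
    while True:
        payload = work_queue.get()
        results[payload] = heavy_processing(payload)
        work_queue.task_done()


def handle_async(payload: str) -> str:
    # The caller only waits for the fast step (enqueueing).
    work_queue.put(payload)
    return "accepted"


threading.Thread(target=worker, daemon=True).start()


def timed(handler, payload: str) -> float:
    start = time.perf_counter()
    handler(payload)
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sync : {timed(handle_sync, 'a'):.3f} s")
    print(f"async: {timed(handle_async, 'b'):.3f} s")
    work_queue.join()  # the work still happens, just not on the caller's clock
    print(results)
```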

Key Insight

The slowest step the user waits on defines the user experience.


3. Failure — What Breaks Independently?

Now assume something goes wrong. It always will.

The Question

Which parts of the system can fail without the others failing?

What You’re Really Asking

  • What if the system crashes mid-operation?
  • What if work is retried?
  • Can the same work run twice?

Example Case

If work can be retried:

  • It may run twice
  • Side effects may duplicate
  • State may become inconsistent
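The Python sketch below shows a duplicated side effect and one common fix, an idempotency key. The `Ledger` class and the key `"task-7"` are hypothetical. The double delivery imitates a worker that applies the effect but crashes before acknowledging it, so the job is delivered again.

```python
from typing import Callable, Set


class Ledger:
    """Hypothetical side-effecting resource: an account balance."""

    def __init__(self) -> None:
        self.balance = 0
        self.applied_keys: Set[str] = set()

    def credit(self, amount: int) -> None:
        # Naive: every call changes the balance, including retries.
        self.balance += amount

    def credit_once(self, key: str, amount: int) -> None:
        # Idempotent: a repeated key is a no-op. In a real store the check and
        # the update must happen in one transaction, or two concurrent retries
        # can both pass the check.
        if key in self.applied_keys:
            return
        self.balance += amount
        self.applied_keys.add(key)


def deliver_at_least_once(handler: Callable[[], None], attempts: int = 2) -> None:
    """Runs the handler `attempts` times, as a queue does when a worker
    applies the effect but crashes before acknowledging it."""
    for _ in range(attempts):
        handler()


naive = Ledger()
deliver_at_least_once(lambda: naive.credit(100))
safe = Ledger()
deliver_at_least_once(lambda: safe.credit_once("task-7", 100))
print(naive.balance, safe.balance)  # 200 100
```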

This is not a bug.
It is the default behavior of distributed systems.

Key Insight

Distributed systems fail partially, not cleanly.


4. Order — What Defines Correct Sequence?

Ordering issues appear only after state, time, and failure are considered.

The Question

Does correctness depend on the order of operations?

What You’re Really Asking

  • Does arrival order equal processing order?
  • Can later work finish earlier?
  • Does that matter?

Example Case

Two requests arrive:

  • A then B

If B completes before A:

  • Is the system still correct?

If the answer is “no,” order must be explicitly enforced.
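Sequence numbers are one way to enforce order explicitly. This Python sketch assumes the sender stamps each request with an increasing sequence number on arrival (A = 1, B = 2). The receiver then applies updates strictly in that order and buffers any that complete early. `InOrderApplier` is a hypothetical name, and this is only one ordering technique among several.

```python
from typing import Dict, List


class InOrderApplier:
    """Applies updates strictly by sequence number, buffering early arrivals."""

    def __init__(self) -> None:
        self.next_seq = 1
        self.buffer: Dict[int, str] = {}
        self.applied: List[str] = []

    def receive(self, seq: int, update: str) -> None:
        if seq < self.next_seq:
            return  # duplicate of an update that was already applied
        self.buffer[seq] = update
        # Drain every update that is now contiguous with what has been applied.
        while self.next_seq in self.buffer:
            self.applied.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1


applier = InOrderApplier()
applier.receive(2, "B")   # B finishes first, but must wait for A
print(applier.applied)    # []
applier.receive(1, "A")
print(applier.applied)    # ['A', 'B']
```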

Key Insight

If order matters, it must be designed—not assumed.


5. Scale — What Grows Fastest?

Only now do we talk about scale.

The Question

As usage increases, which dimension grows fastest?

What You’re Really Asking

  • Requests?
  • Stored data?
  • Concurrent operations?
  • Waiting work?

Example Case

If each request waits on slow work:

  • The number of requests waiting at once grows in proportion to latency
  • Threads, connections, and memory run out quickly
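Little's Law, a standard queueing result, puts a number on this: the average number of requests in the system equals the arrival rate multiplied by the average time each request spends there. The traffic figures below are illustrative, not measurements.

```python
def in_flight(arrivals_per_sec: int, latency_ms: int) -> float:
    """Little's Law: average requests in the system = arrival rate * time in system."""
    return arrivals_per_sec * latency_ms / 1000


rate = 200  # requests per second (illustrative)
print(in_flight(rate, 50))    # 10.0    -> request returns after a 50 ms fast path
print(in_flight(rate, 5000))  # 1000.0  -> request waits for 5 s of processing
```

The traffic is the same in both cases, but 100 times the latency means 100 times as many requests are waiting at once. If each waiting request holds a thread or connection, the slow design needs about 1,000 of them.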

Key Insight

Systems fail at the fastest-growing dimension.


Live Mock Interview Case Study (Detailed)

Interviewer

“Design a system where users submit tasks and receive results later.”


Candidate (Correct Approach)

Candidate:
Before designing, I’d like to understand what state the system must preserve.


Step 1: State

Candidate:
We must store:

  • The user’s request
  • The result
  • A way to associate them

This state must survive crashes, so it needs to be persisted.

Interviewer:
Good. Continue.
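The state the candidate lists can be sketched as a single table in which a task ID links the request to its eventual result. This Python sketch assumes SQLite, and the table and function names are hypothetical. `":memory:"` only keeps the demo self-contained; the durability requirement means a real deployment would use a file or a database server.

```python
import sqlite3
import uuid
from typing import Optional, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    # ":memory:" is for the demo only; persisted storage is what survives crashes.
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tasks (
               task_id TEXT PRIMARY KEY,      -- associates request with result
               request TEXT NOT NULL,
               status  TEXT NOT NULL DEFAULT 'pending',
               result  TEXT                   -- NULL until processing finishes
           )"""
    )
    return conn


def submit(conn: sqlite3.Connection, request: str) -> str:
    task_id = str(uuid.uuid4())
    with conn:  # commits the transaction if the block succeeds
        conn.execute(
            "INSERT INTO tasks (task_id, request) VALUES (?, ?)", (task_id, request)
        )
    return task_id


def complete(conn: sqlite3.Connection, task_id: str, result: str) -> None:
    with conn:
        conn.execute(
            "UPDATE tasks SET status = 'done', result = ? WHERE task_id = ?",
            (result, task_id),
        )


def lookup(conn: sqlite3.Connection, task_id: str) -> Optional[Tuple[str, Optional[str]]]:
    return conn.execute(
        "SELECT status, result FROM tasks WHERE task_id = ?", (task_id,)
    ).fetchone()


if __name__ == "__main__":
    conn = open_store()
    tid = submit(conn, "transcode video 9")
    print(lookup(conn, tid))  # ('pending', None)
    complete(conn, tid, "video-9.mp4")
    print(lookup(conn, tid))  # ('done', 'video-9.mp4')
```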


Step 2: Time

Candidate:
Submitting a request is likely fast.
Producing a result could be slow.

If we make users wait for result generation, latency will be high and throughput limited.

So the system likely separates request acceptance from processing.


Step 3: Failure

Candidate:
Now I’ll assume failures.

If processing crashes midway:

  • The request still exists
  • Processing may be retried

That means the same task could execute twice.

So we must consider whether duplicate execution is safe.
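One common way a task ends up running twice is a lease (also called a visibility timeout). A worker claims a task for a fixed period. If the worker does not acknowledge the task before the lease expires, another worker may claim it. The Python sketch below is an in-memory model with hypothetical names and a hypothetical 30-second lease. It passes the clock in explicitly so the run is deterministic. The same duplicate occurs if worker A was merely slow rather than crashed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

LEASE_SECONDS = 30  # hypothetical lease length


@dataclass
class Task:
    task_id: str
    leased_until: float = 0.0
    done: bool = False


@dataclass
class TaskQueue:
    tasks: Dict[str, Task] = field(default_factory=dict)
    attempts: List[str] = field(default_factory=list)  # log of who started what

    def claim(self, worker: str, now: float) -> Optional[Task]:
        """Hand out any unfinished task whose lease has expired."""
        for task in self.tasks.values():
            if not task.done and task.leased_until <= now:
                task.leased_until = now + LEASE_SECONDS
                self.attempts.append(f"{worker}:{task.task_id}")
                return task
        return None

    def ack(self, task: Task) -> None:
        task.done = True


q = TaskQueue()
q.tasks["t1"] = Task("t1")

q.claim("worker-A", now=0)            # A starts t1, then crashes before ack
print(q.claim("worker-B", now=10))    # None: A's lease is still valid
second = q.claim("worker-B", now=31)  # lease expired, so B runs t1 again
q.ack(second)
print(q.attempts)                     # ['worker-A:t1', 'worker-B:t1']
```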


Step 4: Order

Candidate:
If users submit multiple tasks:

  • Does order matter?

If yes:

  • Arrival order ≠ completion order
  • We need to explicitly preserve sequence

If no:

  • Tasks can be processed independently
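If order matters only within each user's tasks, one common approach is to route all of a user's tasks to the same partition and have a single worker process each partition in order. Different users can then still run in parallel. This Python sketch uses a hypothetical partition count of 4 and uses `zlib.crc32` for routing, because Python's built-in `str` hash is randomized per process.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_PARTITIONS = 4  # hypothetical; each partition is drained by one worker, in order


def partition_for(user_id: str) -> int:
    # crc32 gives the same answer in every process, unlike hash() on str.
    return zlib.crc32(user_id.encode("utf-8")) % NUM_PARTITIONS


def route(tasks: List[Tuple[str, str]]) -> Dict[int, List[Tuple[str, str]]]:
    """Append each (user_id, task) to its user's partition, preserving arrival order."""
    partitions: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for user_id, task in tasks:
        partitions[partition_for(user_id)].append((user_id, task))
    return partitions


arrivals = [("alice", "A1"), ("bob", "B1"), ("alice", "A2"), ("bob", "B2")]
for pid, items in sorted(route(arrivals).items()):
    print(pid, items)
```

This approach guarantees order per user, not globally. The trade-off is that one slow task delays later tasks from every user who shares its partition.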

Step 5: Scale

Candidate:
Under load, the fastest-growing dimension is:

  • Pending background work

If processing is slow, the backlog grows quickly.

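So the system must degrade gracefully under that pressure, for example by capping the backlog and rejecting or delaying new submissions rather than accepting work it cannot finish.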
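The backlog grows by the difference between the arrival rate and the processing rate. With illustrative numbers, 120 tasks per second arriving and 100 per second completing adds 20 per second, or 72,000 per hour. One way to degrade gracefully is to bound the queue and push back once it is full. This Python sketch uses the standard library's bounded `queue.Queue` with a hypothetical cap of 3. Nothing drains the queue in this sketch, so the cap is easy to see.

```python
import queue

MAX_PENDING = 3  # hypothetical cap; real systems size this from processing capacity

pending: "queue.Queue[str]" = queue.Queue(maxsize=MAX_PENDING)


def submit(task: str) -> str:
    """Accept if there is room; otherwise tell the caller to retry later."""
    try:
        pending.put_nowait(task)
        return "accepted"
    except queue.Full:
        return "rejected: retry later"  # e.g. HTTP 429 or 503 in a web API


print([submit(f"task-{i}") for i in range(5)])
# ['accepted', 'accepted', 'accepted', 'rejected: retry later', 'rejected: retry later']
```

Rejecting work early keeps latency bounded for the tasks that are accepted. An unbounded backlog would instead let every task wait longer and longer.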


Interviewer Assessment

The candidate:

  • Asked structured questions
  • Identified real failure modes
  • Avoided premature tools
  • Demonstrated systems thinking

No tools were required to pass this interview.


Common Mistakes Candidates Make

1. Jumping to Solutions

❌ “We’ll use Kafka”
✅ “What happens if work runs twice?”

2. Treating State as Implementation Detail

❌ “We’ll store it somewhere”
✅ “What must never be lost?”

3. Ignoring Failure

❌ “Retries should work”
✅ “What if retries duplicate effects?”

4. Assuming Order

❌ “Requests are processed in order”
✅ “What enforces that order?”

5. Talking About Scale Too Early

❌ “Millions of users”
✅ “Which dimension explodes first?”


Printable One-Page Interview Checklist

You can print or memorize this.


First-Principles System Design Checklist

Ask these in order:

  1. State
      • What information must exist?
      • Where does it live?
      • When is it durable?
  2. Time
      • Which steps are fast?
      • Which are slow?
      • Does slow work block fast work?
  3. Failure
      • What can fail independently?
      • Can work be retried?
      • What happens if it runs twice?
  4. Order
      • Does correctness depend on sequence?
      • Is arrival order preserved?
      • What enforces ordering?
  5. Scale
      • What grows fastest?
      • How does the system fail under load?

Final Mental Model

Great system design is not about building systems.
It is about exposing hidden assumptions.

This framework helps you do that—calmly, systematically, and convincingly.
