Thinking in First Principles
Most system design interview failures are not caused by missing knowledge of tools.
They are caused by missing questions.
Strong candidates do not start by designing systems.
They start by interrogating the problem.
This post teaches you:
- How to question a system from first principles
- How to apply that questioning live in an interview
- What mistakes candidates commonly make
- A printable one-page checklist you can memorize and reuse
No prior system design experience required.
What “First Principles” Means in System Design
First principles means reducing a problem to fundamental truths that must always hold, regardless of:
- Programming language
- Framework
- Infrastructure
- Scale
Every system—chat apps, payment systems, video processing pipelines—must answer the same core questions about:
- State
- Time
- Failure
- Order
- Scale
If a design cannot answer one of these, it is incomplete.
The 5-Step First-Principles Questioning Framework
You will apply these questions in order.
- State – Where does information live? When is it durable?
- Time – How long does each step take?
- Failure – What breaks independently?
- Order – What defines correct sequence?
- Scale – What grows fastest under load?
This is not a checklist you recite.
It is a thinking sequence.
Let’s walk through each one.
1. State — Where Does It Live? When Is It Durable?
The Question
Where does the system’s information exist, and when is it safe from loss?
This is always the first question because nothing else matters if data disappears.
What You’re Really Asking
- Is data stored in memory or persisted?
- What survives a crash or restart?
- What is the source of truth?
Example Case
Imagine a system that accepts user requests and processes them later.
If the request only lives in memory:
- A restart loses it
- A crash loses it
- Another instance can’t see it
You have discovered a correctness problem, not a performance one.
Key Insight
If state only exists in a running process, it does not exist.
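A minimal sketch of the difference, assuming a single machine and a local append-only file as the durable store. The names `accept_request`, `recover_requests`, and the log path are hypothetical and chosen only for illustration. The request is acknowledged only after it has been written and flushed to disk, so a restarted process can rebuild its pending work from the file.

```python
import json
import os
import tempfile


def accept_request(log_path: str, request: dict) -> None:
    """Append the request to a log file and force it to disk before returning."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(request) + "\n")
        log.flush()             # push Python's buffer to the operating system
        os.fsync(log.fileno())  # ask the OS to push its buffer to the disk


def recover_requests(log_path: str) -> list[dict]:
    """Rebuild the list of accepted requests after a restart by re-reading the log."""
    if not os.path.exists(log_path):
        return []
    with open(log_path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]


if __name__ == "__main__":
    # Hypothetical log location in a temporary directory.
    path = os.path.join(tempfile.mkdtemp(), "requests.log")
    accept_request(path, {"id": "r1", "payload": "resize image"})
    # Simulated restart: nothing held in memory is reused, only the file.
    print(recover_requests(path))
```

A local file still does not help if the whole machine is lost. Where the source of truth lives is exactly what the State question is meant to surface.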
2. Time — How Long Does Each Step Take?
Once state exists, time becomes unavoidable.
The Question
Which steps are fast, and which are slow?
You are comparing orders of magnitude, not exact numbers.
What You’re Really Asking
- Is there long-running work?
- Does the user wait for it?
- Is fast work blocked by slow work?
Example Case
A system:
- Accepts a request (milliseconds)
- Performs heavy processing (seconds)
If the request waits for processing:
- Latency is dominated by the slowest step
- Throughput collapses under load, because every in-flight request holds a worker or connection for the full processing time
Key Insight
The slowest step defines the user experience.
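One way to keep fast work from being blocked by slow work is to accept the request immediately and hand the heavy step to a background worker. The sketch below assumes a single process and uses an in-memory `queue.Queue` purely for brevity. By the State question above, a real design would need a durable queue. `accept`, `slow_processing`, and the 50 ms sleep are hypothetical stand-ins.

```python
import queue
import threading
import time

work_queue = queue.Queue()  # in-memory for brevity; not durable
results = {}                # task_id -> result, filled in by the worker


def slow_processing(task_id: str) -> str:
    time.sleep(0.05)  # stand-in for seconds of heavy work
    return f"done:{task_id}"


def worker() -> None:
    """Slow path: pull tasks off the queue and process them one at a time."""
    while True:
        task_id = work_queue.get()
        results[task_id] = slow_processing(task_id)
        work_queue.task_done()


def accept(task_id: str) -> str:
    """Fast path: enqueue the task and return without waiting for processing."""
    work_queue.put(task_id)
    return "accepted"


# Daemon thread so the worker does not keep the program alive at exit.
threading.Thread(target=worker, daemon=True).start()

if __name__ == "__main__":
    start = time.perf_counter()
    accept("t1")
    print(f"accept returned after {time.perf_counter() - start:.4f}s")
    work_queue.join()  # wait only so the demo can show the result
    print(results["t1"])
```

The caller's latency is now that of `accept`, not of `slow_processing`. The cost is that the result must be fetched later.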
3. Failure — What Breaks Independently?
Now assume something goes wrong. It always will.
The Question
Which parts of the system can fail without the others failing?
What You’re Really Asking
- What if the system crashes mid-operation?
- What if work is retried?
- Can the same work run twice?
Example Case
If work can be retried:
- It may run twice
- Side effects may duplicate
- State may become inconsistent
This is not a bug.
It is the default behavior of distributed systems.
Key Insight
Distributed systems fail partially, not cleanly.
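To make the "runs twice" problem concrete, here is a small sketch that contrasts a naive side effect with one guarded by an operation id, a common way to make retries safe. The `Account` class and the `op_id` values are hypothetical. In a real system the balance and the set of applied ids would have to be updated atomically in durable storage, which this in-memory version does not attempt.

```python
class Account:
    def __init__(self) -> None:
        self.balance = 0
        self.applied = set()  # ids of operations that have already taken effect

    def credit_naive(self, amount: int) -> None:
        """Every call changes the balance, so a retry duplicates the effect."""
        self.balance += amount

    def credit_idempotent(self, op_id: str, amount: int) -> None:
        """A repeated op_id is ignored, so a retry leaves the balance unchanged."""
        if op_id in self.applied:
            return
        self.balance += amount
        self.applied.add(op_id)


if __name__ == "__main__":
    naive = Account()
    naive.credit_naive(10)
    naive.credit_naive(10)  # retry after a timeout
    print(naive.balance)    # the credit was applied twice

    safe = Account()
    safe.credit_idempotent("op-1", 10)
    safe.credit_idempotent("op-1", 10)  # retry with the same operation id
    print(safe.balance)                 # the credit was applied once
```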
4. Order — What Defines Correct Sequence?
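Ordering questions build on the previous three: once state is stored, steps take different amounts of time, and work can be retried, operations can finish in a different order than they arrived.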
The Question
Does correctness depend on the order of operations?
What You’re Really Asking
- Does arrival order equal processing order?
- Can later work finish earlier?
- Does that matter?
Example Case
Two requests arrive: A, then B.
If B completes before A, is the system still correct?
If the answer is “no,” order must be explicitly enforced.
Key Insight
If order matters, it must be designed—not assumed.
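One possible way to enforce order explicitly is to assign each request a sequence number on arrival and apply results only when every earlier number has been applied. The sketch below assumes sequence numbers start at 0 with no gaps. `InOrderApplier` is a hypothetical name, not a standard component.

```python
class InOrderApplier:
    """Buffers out-of-order completions and applies them in sequence order."""

    def __init__(self) -> None:
        self.next_seq = 0  # the sequence number we are waiting for
        self.pending = {}  # seq -> item that completed early
        self.applied = []  # items in the order they were actually applied

    def on_complete(self, seq: int, item: str) -> None:
        self.pending[seq] = item
        # Drain every item that is now contiguous with what was already applied.
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


if __name__ == "__main__":
    applier = InOrderApplier()
    applier.on_complete(1, "B")  # B finishes first and is held back
    print(applier.applied)
    applier.on_complete(0, "A")  # A arrives, so both can now be applied
    print(applier.applied)
```

The trade-off is that a slow or lost request A now delays everything behind it, which is why the question "does order matter?" is worth asking before paying for it.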
5. Scale — What Grows Fastest?
Only now do we talk about scale.
The Question
As usage increases, which dimension grows fastest?
What You’re Really Asking
- Requests?
- Stored data?
- Concurrent operations?
- Waiting work?
Example Case
If each request waits on slow work:
- The number of requests waiting at once grows in proportion to latency (roughly arrival rate × latency)
- Threads, connections, and memory are exhausted quickly
Key Insight
Systems fail at the fastest-growing dimension.
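The relationship between latency and concurrent waiting can be estimated with Little's Law: the average number of requests in the system equals the arrival rate multiplied by the average time each request spends in it. The sketch below is a back-of-the-envelope calculation under a steady-state assumption. The rate and latencies are made-up numbers.

```python
def concurrent_waiting(arrival_rate_per_s: float, latency_s: float) -> float:
    """Little's Law: average number in the system = arrival rate x time in system."""
    return arrival_rate_per_s * latency_s


if __name__ == "__main__":
    rate = 200.0  # hypothetical steady arrival rate, requests per second
    for latency in (0.25, 4.0):  # fast path vs. waiting on slow work
        in_flight = concurrent_waiting(rate, latency)
        print(f"latency={latency}s -> about {in_flight:.0f} requests in flight")
```

At the same arrival rate, moving from 0.25 s to 4 s of latency raises the average number of in-flight requests from about 50 to about 800, which is the dimension that exhausts threads and connections first.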
Live Mock Interview Case Study (Detailed)
Interviewer
“Design a system where users submit tasks and receive results later.”
Candidate (Correct Approach)
Candidate:
Before designing, I’d like to understand what state the system must preserve.
Step 1: State
Candidate:
We must store:
- The user’s request
- The result
- A way to associate them
This state must survive crashes, so it needs to be persisted.
Interviewer:
Good. Continue.
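As an illustration of the three pieces of state the candidate lists, here is a sketch that uses Python's built-in `sqlite3` module with a file-backed database. The table layout, the status values, and the function names are assumptions made for this example. The interview answer itself names no storage technology.

```python
import os
import sqlite3
import tempfile
import uuid


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the task table; the task_id ties request and result together."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " task_id TEXT PRIMARY KEY,"
        " request TEXT NOT NULL,"
        " status TEXT NOT NULL,"
        " result TEXT)"
    )
    conn.commit()
    return conn


def submit(conn: sqlite3.Connection, request: str) -> str:
    """Persist the request before returning its id to the user."""
    task_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO tasks (task_id, request, status) VALUES (?, ?, 'pending')",
        (task_id, request),
    )
    conn.commit()
    return task_id


def complete(conn: sqlite3.Connection, task_id: str, result: str) -> None:
    """Record the result against the same task id."""
    conn.execute(
        "UPDATE tasks SET status = 'done', result = ? WHERE task_id = ?",
        (result, task_id),
    )
    conn.commit()


def lookup(conn: sqlite3.Connection, task_id: str):
    """Return (status, result) for a task, or None if the id is unknown."""
    return conn.execute(
        "SELECT status, result FROM tasks WHERE task_id = ?", (task_id,)
    ).fetchone()


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "tasks.db")  # hypothetical location
    conn = open_store(db_path)
    tid = submit(conn, "render report")
    conn.close()                 # simulate a restart
    conn = open_store(db_path)
    print(lookup(conn, tid))     # the pending task is still there
```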
Step 2: Time
Candidate:
Submitting a request is likely fast.
Producing a result could be slow.
If we make users wait for result generation, latency will be high and throughput limited.
So the system likely separates request acceptance from processing.
Step 3: Failure
Candidate:
Now I’ll assume failures.
If processing crashes mid-way:
- The request still exists
- Processing may retry
That means the same task could execute twice.
So we must consider whether duplicate execution is safe.
Step 4: Order
Candidate:
If users submit multiple tasks:
- Does order matter?
If yes:
- Arrival order ≠ completion order
- We need to explicitly preserve sequence
If no:
- Tasks can be processed independently
Step 5: Scale
Candidate:
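Under load, the fastest-growing dimension is pending background work.
If processing is slower than the rate at which tasks arrive, the backlog grows without bound.
So the system must degrade gracefully under that pressure, for example by capping the backlog and rejecting or delaying new submissions once it is full.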
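One simple form of graceful degradation is to bound the backlog and refuse new work once it is full, rather than letting the queue grow until memory runs out. This is a sketch of that single idea. `MAX_BACKLOG` and `try_submit` are hypothetical, and a real system might instead slow callers down or shed lower-priority work.

```python
import queue

MAX_BACKLOG = 3  # hypothetical cap, chosen small so the demo fills it quickly

backlog = queue.Queue(maxsize=MAX_BACKLOG)


def try_submit(task: str) -> bool:
    """Enqueue the task if there is room; otherwise reject it immediately."""
    try:
        backlog.put_nowait(task)
        return True
    except queue.Full:
        return False  # the caller can retry later or report "busy" to the user


if __name__ == "__main__":
    for i in range(5):
        accepted = try_submit(f"task-{i}")
        print(f"task-{i}: {'accepted' if accepted else 'rejected'}")
```

Rejecting early keeps the work that was accepted within a predictable waiting time, at the cost of pushing the overload back to the caller.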
Interviewer Assessment
The candidate:
- Asked structured questions
- Identified real failure modes
- Avoided naming tools prematurely
- Demonstrated systems thinking
No tools were required to pass this interview.
Common Mistakes Candidates Make
1. Jumping to Solutions
❌ “We’ll use Kafka”
✅ “What happens if work runs twice?”
2. Treating State as an Implementation Detail
❌ “We’ll store it somewhere”
✅ “What must never be lost?”
3. Ignoring Failure
❌ “Retries should work”
✅ “What if retries duplicate effects?”
4. Assuming Order
❌ “Requests are processed in order”
✅ “What enforces that order?”
5. Talking About Scale Too Early
❌ “Millions of users”
✅ “Which dimension explodes first?”
Printable One-Page Interview Checklist
You can print or memorize this.
First-Principles System Design Checklist
Ask these in order:
- State
  - What information must exist?
  - Where does it live?
  - When is it durable?
- Time
  - Which steps are fast?
  - Which are slow?
  - Does slow work block fast work?
- Failure
  - What can fail independently?
  - Can work be retried?
  - What happens if it runs twice?
- Order
  - Does correctness depend on sequence?
  - Is arrival order preserved?
  - What enforces ordering?
- Scale
  - What grows fastest?
  - How does the system fail under load?
Final Mental Model
Great system design is not about building systems.
It is about exposing hidden assumptions.
This framework helps you do that—calmly, systematically, and convincingly.