Daniel R. Foster for OptyxStack

Posted on Jan 19

Architecture Under Load #1 - Stop Collecting Architecture Patterns — Find the Constraint

#systemdesign #performanceengineering #architecture #scalability

If an architecture pattern “worked” once but failed later, the pattern wasn’t wrong.

The constraint moved, and the pattern kept targeting the old one.

Most systems don’t slow down simply because they grow.

They slow down because their operating conditions change as they grow.

This is what it usually looks like in practice:

Traffic grows steadily
Deploys keep succeeding
Error rates stay low
But P95 / P99 latency quietly gets worse
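
To see how the first three signals can look healthy while the fourth degrades, here is a minimal sketch using made-up latency samples. The numbers and the `percentile` and `summarize` helpers are illustrative assumptions, not measurements from a real system. It uses the nearest-rank method; monitoring tools may interpolate differently.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def summarize(samples_ms):
    """Return the summary numbers a latency dashboard would typically show."""
    return {
        "median": statistics.median(samples_ms),
        "mean": statistics.mean(samples_ms),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
    }


# Hypothetical latencies in milliseconds, 100 requests in each window.
before = [20] * 95 + [40] * 4 + [60] * 1
after = [20] * 94 + [40] * 4 + [250] * 2

if __name__ == "__main__":
    print("before:", summarize(before))
    print("after: ", summarize(after))
```

In this made-up data the median stays at 20 ms and the mean moves only from 21.2 ms to 25.4 ms, while P99 goes from 40 ms to 250 ms. That is the kind of shift an average-based dashboard tends to hide.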

So teams respond the only way they know how:

Add caching
Add queues
Split services
Scale horizontally

Sometimes it helps.

Sometimes nothing changes.

Sometimes things get worse.

That’s not because the patterns are bad.

It’s because patterns don’t scale systems — they relieve constraints.

The pattern-collecting trap

A very common belief in system design is:

“If we apply enough best practices, scalability will emerge.”

In reality:

Every system has one dominant constraint at any given time
Architecture patterns only help when they directly reduce pressure on that constraint
Applying the wrong pattern adds complexity without relief

Scalability is not about how many patterns you use.

It’s about whether work stops waiting.

A better question than “Which pattern should we use?”

Before choosing any pattern, ask this instead:

Where does work wait when the system is under load?

Waiting is where latency is created.

Common places where work waits:

CPU scheduling
Thread pools
Connection pools
Queues
Locks
Network calls
External dependencies

Wherever work waits, tail latency (P95 / P99) degrades before the average does.

Architecture patterns exist to:

Move that waiting
Shorten it
Or eliminate it entirely

Nothing more.
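
One way to find where work waits is to time the wait separately from the work. The sketch below is a hypothetical stand-in for a connection pool, built on a plain semaphore. `TimedPool`, `slot`, and `run_demo` are names invented for illustration and do not come from any real pool library.

```python
import threading
import time
from contextlib import contextmanager


class TimedPool:
    """Hypothetical pool that records how long callers wait for a slot vs. how long they hold it."""

    def __init__(self, size):
        self._slots = threading.Semaphore(size)
        self._lock = threading.Lock()
        self.waits = []  # seconds spent waiting to get a slot
        self.holds = []  # seconds spent doing work while holding a slot

    @contextmanager
    def slot(self):
        asked = time.perf_counter()
        self._slots.acquire()  # this is where work waits
        got = time.perf_counter()
        try:
            yield
        finally:
            done = time.perf_counter()
            self._slots.release()
            with self._lock:
                self.waits.append(got - asked)
                self.holds.append(done - got)


def run_demo(pool_size=1, callers=3, work_seconds=0.05):
    """Run several concurrent callers against a small pool and return the pool with its timings."""
    pool = TimedPool(pool_size)

    def call():
        with pool.slot():
            time.sleep(work_seconds)  # simulated work, e.g. a query

    threads = [threading.Thread(target=call) for _ in range(callers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool


if __name__ == "__main__":
    pool = run_demo()
    print("max wait:", round(max(pool.waits), 3), "s")
    print("max hold:", round(max(pool.holds), 3), "s")
```

With one slot and three callers, every caller does the same amount of work, but the last one spends roughly twice as long waiting as working. A single "request duration" metric would not show that split.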

Example: Stateless services (what they actually solve)

Stateless services are often presented as a universal scalability solution.

They are not.

What statelessness actually fixes

Compute contention
Slow recovery during spikes or failures
Inflexible scaling behavior

When it helps

CPU or request concurrency is the constraint
You need fast replacement and fast scale-up
Deploys or failures currently amplify latency

When it doesn’t

Data access is the bottleneck
Downstream pools are already saturated
State was simply moved elsewhere (for example, into a shared cache or database) with no control over how it is accessed

Statelessness helps only if compute was the constraint.

Example: Queues (what they really do)

Queues are another frequently misunderstood pattern.

Queues don’t make systems faster.

They make systems honest about backlog.

When queues help

Absorbing bursts
Decoupling asynchronous work
Protecting user-facing paths

When queues hurt

Added directly into synchronous request paths
Used to hide slow dependencies
Operated without visibility into depth or age

A queue always adds latency to the work that passes through it: at least one extra hop, plus however long each item waits.

The question is whether that latency is acceptable, bounded, and visible.
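
As a rough illustration of "bounded and visible", here is a minimal in-process sketch. `VisibleQueue` and its methods are hypothetical names, it is not thread-safe, and a real broker would expose depth and age through its own metrics. The clock is injectable only so the behavior can be checked deterministically.

```python
import time
from collections import deque


class VisibleQueue:
    """Hypothetical bounded FIFO that exposes its depth and the age of its oldest item."""

    def __init__(self, max_depth, clock=time.monotonic):
        self._items = deque()
        self._max_depth = max_depth
        self._clock = clock

    def put(self, item):
        """Enqueue the item, or return False when full instead of growing without bound."""
        if len(self._items) >= self._max_depth:
            return False
        self._items.append((self._clock(), item))
        return True

    def get(self):
        """Return (item, seconds_waited). Raises IndexError when the queue is empty."""
        enqueued_at, item = self._items.popleft()
        return item, self._clock() - enqueued_at

    def depth(self):
        """Number of items currently waiting."""
        return len(self._items)

    def oldest_age(self):
        """Seconds the oldest waiting item has been queued, or 0.0 when empty."""
        if not self._items:
            return 0.0
        return self._clock() - self._items[0][0]
```

Rejecting on `put` keeps the added latency bounded, and `oldest_age` is usually the more telling signal: depth says how much is waiting, age says how long it has been waiting.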

Why patterns “stop working” over time

Patterns don’t degrade.

Operating conditions do.

As systems grow, these change:

Traffic shape
Concurrency
Data distribution
Dependency behavior
Failure frequency

A pattern that relieved yesterday’s constraint may:

Become irrelevant
Shift the bottleneck elsewhere
Expose a more expensive constraint downstream

That’s why scalable systems aren’t designed once.

They are continuously re-aligned to their active constraint.
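
A simplified model shows why a change in operating conditions alone is enough. The sketch below uses the textbook M/M/1 result for mean time in system, 1 / (service rate - arrival rate). It assumes a single worker, random (Poisson) arrivals, and exponentially distributed service times, none of which come from the original post or match a real system exactly. The function name and the 100 requests-per-second capacity are illustrative.

```python
def mm1_time_in_system(arrival_rate, service_rate):
    """Mean seconds a request spends waiting plus being served, under the M/M/1 model."""
    if arrival_rate >= service_rate:
        raise ValueError("arrival rate must be below service rate for a stable queue")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical capacity: one worker that averages 10 ms per request.
SERVICE_RATE = 100.0

if __name__ == "__main__":
    for arrival_rate in (50.0, 80.0, 90.0, 95.0, 99.0):
        seconds = mm1_time_in_system(arrival_rate, SERVICE_RATE)
        utilization = arrival_rate / SERVICE_RATE
        print(f"{utilization:.0%} utilized -> {seconds * 1000:.0f} ms mean time in system")
```

Under these assumptions, going from 50% to 90% utilization takes the mean from 20 ms to 100 ms, and 99% takes it to a full second. Nothing in the code changed; only the load did.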

Architecture under load is about tradeoffs, not ideals

Every architectural decision trades one thing for another:

Latency vs throughput
Simplicity vs flexibility
Cost vs predictability
Isolation vs efficiency

Those tradeoffs only make sense if you separate:

Scalability (how systems behave as load grows)
Performance (how fast they respond)
Reliability (how they behave under failure)

They break in different ways — and confusing them leads to bad decisions.

👉 That distinction matters enough that it deserves a deeper dive:

Architecture Under Load #2: Scalability vs Performance vs Reliability (What Breaks First)

The real architecture skill

Not knowing more patterns.

Not picking trendier tools.

The real skill is:

Identifying the active constraint
Measuring where latency accumulates
Applying the minimum change that relieves it
Re-validating under real load
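
The first two steps can be sketched as a simple ranking over per-stage timings. The stage names and durations below are invented for illustration; in practice they would come from whatever tracing or timing data the system already emits.

```python
from collections import defaultdict


def rank_stages(spans):
    """Sum seconds per stage and return (stage, total_seconds, share) rows, largest first."""
    totals = defaultdict(float)
    for stage, seconds in spans:
        totals[stage] += seconds
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    rows = [(stage, total, total / grand_total) for stage, total in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical timings from two requests: (stage name, seconds).
spans = [
    ("db_pool_wait", 0.120),
    ("handler_cpu", 0.015),
    ("db_query", 0.040),
    ("db_pool_wait", 0.180),
    ("handler_cpu", 0.015),
]

if __name__ == "__main__":
    for stage, total, share in rank_stages(spans):
        print(f"{stage:14s} {total:.3f} s  {share:.0%}")
```

In this made-up data, waiting for a database connection accounts for about 81% of the time, so adding application instances or a cache in front of the handler would leave the active constraint untouched.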

Most production systems don’t fail suddenly.

They fail quietly, one constraint at a time.

Part of the Architecture Under Load series

#1 Stop Collecting Architecture Patterns — Find the Constraint
#2 Scalability vs Performance vs Reliability (What Breaks First)
