
Daniel R. Foster for OptyxStack


Architecture Under Load #1 - Stop Collecting Architecture Patterns — Find the Constraint


If an architecture pattern “worked” once but failed later,

the pattern wasn’t wrong.

The constraint it relieved is just no longer the one holding you back.


Most systems don’t slow down simply because they grow.

They slow down because their operating conditions change.

This is what it usually looks like in practice:

  • Traffic grows steadily
  • Deploys keep succeeding
  • Error rates stay low
  • But P95 / P99 latency quietly gets worse

So teams respond the only way they know how:

  • Add caching
  • Add queues
  • Split services
  • Scale horizontally

Sometimes it helps.

Sometimes nothing changes.

Sometimes things get worse.

That’s not because the patterns are bad.

It’s because patterns don’t scale systems — they relieve constraints.


The pattern-collecting trap

A very common belief in system design is:

“If we apply enough best practices, scalability will emerge.”

In reality:

  • Every system has one dominant constraint at any given time
  • Architecture patterns only help when they directly reduce pressure on that constraint
  • Applying the wrong pattern adds complexity without relief

Scalability is not about how many patterns you use.

It’s about whether work stops waiting.
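
To make “work stops waiting” concrete, here is a minimal sketch using the textbook M/M/1 queueing model (a single server with random arrivals). The model itself and the 100 requests/s service rate are assumptions chosen for illustration, not measurements from any real system.

```python
def mm1_mean_latency(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system (waiting + service) for an M/M/1 queue, in seconds."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical worker: completes 100 requests/s, i.e. 10 ms of real work each.
SERVICE_RATE = 100.0

for load in (50.0, 80.0, 90.0, 95.0, 99.0):
    latency_ms = mm1_mean_latency(load, SERVICE_RATE) * 1000
    print(f"{load:5.0f} req/s -> {latency_ms:7.1f} ms mean latency")
```

Under these assumptions, going from 50% to 99% utilization turns a 20 ms mean into a full second, even though the worker itself never got slower. All of the extra time is waiting.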


A better question than “Which pattern should we use?”

Before choosing any pattern, ask this instead:

Where does work wait when the system is under load?

Waiting is where latency is created.

Common places where work waits:

  • CPU scheduling
  • Thread pools
  • Connection pools
  • Queues
  • Locks
  • Network calls
  • External dependencies

Wherever work waits, tail latency grows first.
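
A small sketch of why the tail moves first. The latency samples below are made up for illustration, and `percentile` is a simple nearest-rank helper written for this example, not a function from any monitoring library.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in ms: 100 requests, each served in 10 ms.
baseline = [10] * 100
# Same traffic, but 3 requests waited 200 ms for a pooled connection first.
with_waiting = [10] * 97 + [210] * 3

for name, data in (("baseline", baseline), ("with waiting", with_waiting)):
    print(name, "p50:", percentile(data, 50),
          "p95:", percentile(data, 95),
          "p99:", percentile(data, 99))
```

In this example the median and P95 stay at 10 ms while P99 jumps to 210 ms. Three waiting requests out of a hundred are invisible in the middle of the distribution and obvious in the tail.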

Architecture patterns exist to:

  • Move that waiting
  • Shorten it
  • Or eliminate it entirely

Nothing more.


Example: Stateless services (what they actually solve)

Stateless services are often presented as a universal scalability solution.

They are not.

What statelessness actually fixes

  • Compute contention
  • Slow recovery during spikes or failures
  • Inflexible scaling behavior

When it helps

  • CPU or request concurrency is the constraint
  • You need fast replacement and fast scale-up
  • Deploys or failures currently amplify latency

When it doesn’t

  • Data access is the bottleneck
  • Downstream pools are already saturated
  • State has simply moved elsewhere, with nothing controlling access to it

Statelessness helps only if compute was the constraint.
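
As a rough sketch of that limit, the model below caps throughput at the smaller of compute capacity and what a shared connection pool can serve (pool size divided by how long each request holds a connection, per Little’s Law). Every number and the function name are hypothetical, and the single shared pool is a simplifying assumption.

```python
def max_throughput(instances, rps_per_instance, pool_size, hold_ms):
    """Sustainable requests/s: min of compute capacity and downstream pool capacity."""
    compute_limit = instances * rps_per_instance
    # Little's Law: a pool of N connections, each held for hold_ms,
    # can complete at most N * 1000 / hold_ms requests per second.
    pool_limit = pool_size * 1000 / hold_ms
    return min(compute_limit, pool_limit)


# Hypothetical: each stateless instance handles 200 req/s; the database allows
# 50 connections in total, and each request holds one for 50 ms.
for n in (2, 4, 8, 16):
    print(n, "instances ->", max_throughput(n, 200, 50, 50), "req/s")
```

With these numbers, throughput grows until compute capacity reaches 1,000 req/s (five instances) and then stays flat. Past that point, extra instances only add more callers waiting on the same pool.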


Example: Queues (what they really do)

Queues are another frequently misunderstood pattern.

Queues don’t make systems faster.

They make systems honest about backlog.

When queues help

  • Absorbing bursts
  • Decoupling asynchronous work
  • Protecting user-facing paths

When queues hurt

  • Added directly into synchronous request paths
  • Used to hide slow dependencies
  • Operated without visibility into depth or age

A queue always adds latency to the work that passes through it.

The question is whether that latency is acceptable, bounded, and visible.
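
One way to picture “bounded and visible” is the in-memory sketch below. `ObservableQueue` is a hypothetical class written for this post, not part of any queueing product, and it is not thread-safe. The injectable `clock` is only there to make the example deterministic.

```python
import time
from collections import deque


class ObservableQueue:
    """Bounded FIFO that reports its depth and the age of its oldest item."""

    def __init__(self, max_depth, clock=time.monotonic):
        self._items = deque()
        self._max_depth = max_depth
        self._clock = clock

    def put(self, item):
        """Enqueue, or return False when full so the caller can shed load."""
        if len(self._items) >= self._max_depth:
            return False
        self._items.append((self._clock(), item))
        return True

    def get(self):
        """Dequeue the oldest item and how long it waited, in seconds.

        Raises IndexError if the queue is empty.
        """
        enqueued_at, item = self._items.popleft()
        return item, self._clock() - enqueued_at

    def depth(self):
        return len(self._items)

    def oldest_age(self):
        """Seconds the head item has been waiting; 0.0 when empty."""
        if not self._items:
            return 0.0
        return self._clock() - self._items[0][0]


# Demo with a fake clock so the ages are predictable.
now = [0.0]
jobs = ObservableQueue(max_depth=2, clock=lambda: now[0])
jobs.put("a")
now[0] = 1.5
jobs.put("b")
print("accepted third item:", jobs.put("c"))
print("depth:", jobs.depth(), "oldest age (s):", jobs.oldest_age())
```

The bound turns overload into an explicit rejection instead of unbounded delay, and the oldest-item age tells you how stale the backlog is, which depth alone does not.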


Why patterns “stop working” over time

Patterns don’t degrade.

Operating conditions do.

As systems grow, these change:

  • Traffic shape
  • Concurrency
  • Data distribution
  • Dependency behavior
  • Failure frequency

A pattern that relieved yesterday’s constraint may:

  • Become irrelevant
  • Shift the bottleneck elsewhere
  • Expose a more expensive constraint downstream

That’s why scalable systems aren’t designed once.

They are continuously re-aligned to their active constraint.


Architecture under load is about tradeoffs, not ideals

Every architectural decision trades one thing for another:

  • Latency vs throughput
  • Simplicity vs flexibility
  • Cost vs predictability
  • Isolation vs efficiency

Those tradeoffs only make sense if you separate:

  • Scalability (how systems behave as load grows)
  • Performance (how fast they respond)
  • Reliability (how they behave under failure)

They break in different ways — and confusing them leads to bad decisions.

👉 That distinction matters enough that it deserves a deeper dive:

Architecture Under Load #2: Scalability vs Performance vs Reliability (What Breaks First)



The real architecture skill

Not knowing more patterns.

Not picking trendier tools.

The real skill is:

  • Identifying the active constraint
  • Measuring where latency accumulates
  • Applying the minimum change that relieves it
  • Re-validating under real load
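
The first two steps can be sketched as a per-stage breakdown of the slowest requests. The trace format, the stage names, and all timings below are hypothetical; in a real system they would come from whatever tracing or timing data you already collect.

```python
def latency_by_stage(traces, slowest_n):
    """Sum per-stage time (ms) across the slowest_n requests, largest stage first."""
    ranked = sorted(traces, key=lambda t: sum(t.values()), reverse=True)
    totals = {}
    for trace in ranked[:slowest_n]:
        for stage, ms in trace.items():
            totals[stage] = totals.get(stage, 0) + ms
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical per-request timings in ms: eight fast requests and two slow ones.
traces = (
    [{"pool_wait": 1, "db_query": 8, "render": 3}] * 8
    + [
        {"pool_wait": 180, "db_query": 9, "render": 3},
        {"pool_wait": 240, "db_query": 8, "render": 4},
    ]
)

for stage, total_ms in latency_by_stage(traces, slowest_n=2):
    print(f"{stage:10s} {total_ms:4d} ms")
```

In this made-up data the query and render times barely differ between fast and slow requests. Almost all of the tail is time spent waiting for a connection, so that is the constraint to relieve first.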

Most production systems don’t fail suddenly.

They fail quietly, one constraint at a time.


Part of the Architecture Under Load series

  • #1 Stop Collecting Architecture Patterns — Find the Constraint
  • #2 Scalability vs Performance vs Reliability (What Breaks First)
