Alyssa

Posted on Jan 14

I Stopped “Scaling Fast” and Started Designing Failure — Here’s What Changed

#discuss #architecture #design #api

Designer, developer, & entrepreneur. Founder of Screenity + other ventures.

For a long time, I thought my job as a developer was to make systems work.

Now I believe my real job is to make systems fail clearly.

This article is about a shift that quietly changed how I design architectures, APIs, and even teams — especially when building complex products that look fine… right until they don’t.

This is not a beginner post.
This is about decisions you only care about after you’ve shipped, broken things, fixed them at 3 a.m., and promised yourself “never again.”

The Problem: Complexity Hides Behind “Working Code”

Most systems don’t fail because of bad code.

They fail because:

assumptions are implicit
constraints are undocumented
failure modes are invisible
success paths are optimized, failure paths are ignored

In early versions of one of my products,

everything “worked”:

APIs responded
UI was smooth
metrics were green

Until a single edge case cascaded into:

partial data corruption
retries amplifying load
logs that told stories, not truth

The system didn’t fail loudly.
It failed politely.
That was the real problem.

The Shift: Designing for Failure First

Instead of asking:

“How do we make this scalable?”

I started asking:

“How does this break — and how do we know immediately?”

This led me to adopt a few non-negotiable design principles.

1. Constraints Are Features, Not Limitations

Every complex system has constraints.
The mistake is pretending they don’t exist.

Examples of explicit constraints I now write down before coding:

Maximum request size (hard fail, not best-effort)
Acceptable staleness of data
Timeout budgets per dependency
Retry limits (with exponential backoff or none at all)
Ownership boundaries (this service does not fix that service’s bugs)

If a constraint isn’t explicit, it becomes folklore.
Folklore doesn’t survive outages.

2. Failure Modes Must Be Named

If you can’t name how something fails, you can’t reason about it.

I now document failure modes like this:

Upstream unavailable → return cached degraded response
Partial write success → emit compensating event
Client misuse → reject loudly with actionable error
Unknown state → stop processing, alert humans

This isn’t pessimism.
This is engineering honesty.

3. Observability Is Not Logging

Logs are narratives.
Metrics are aggregates.
Traces are timelines.

None of them alone tell the truth.

For critical paths, I ask:

What signal tells me this is broken?
How long between breakage and detection?
Can I tell who is affected without guessing?

If the answer is “we’ll inspect logs,” the system is lying to me.

APIs Should Be Unforgiving (to Protect the System)

“Be liberal in what you accept” sounds nice — until it becomes technical debt with interest.

I now design APIs that:

validate aggressively
reject ambiguous input
return errors that explain what to fix, not just what failed

Kind APIs protect users.
Strict APIs protect systems.
Great APIs do both.

5. Teams Are Part of the Architecture

This one took me the longest to accept.

If:

ownership is fuzzy
responsibility is shared by everyone
failures are “someone else’s layer”

Then the system will reflect that ambiguity.

Clear ownership boundaries reduce:

silent failures
duplicated fixes
emotional load during incidents

Technical architecture and social architecture are inseparable.

What Changed for Me

After adopting this mindset:

incidents became rarer, but more importantly, shorter
debugging went from “what is happening?” to “this exact thing failed”
onboarding new developers became faster
my own cognitive load dropped significantly

The system didn’t become simpler.
It became more honest.

Conclusion

Complexity is unavoidable.
Confusion is optional.

Designing for failure doesn’t make you negative.
It makes you reliable.

If your system fails:

clearly
quickly
and in ways you already understand

You’re doing something right.

DEV Community