Saeed Habibi

Posted on Feb 3 • Originally published at Medium

Reversibility Decays: What Bezos’s Framework Misses About Technical Decisions

#programming #software #systemdesign #designpatterns

The Architecture of Decisions, Part 2

In Part 1, I argued that every technical decision is a bet on a future you cannot fully see. But not all bets are equal. Some bets you can walk back when new information arrives. Others lock you in the moment you commit.

The difference is reversibility. And it matters more than whether your decision is correct.

Jeff Bezos talks about one-way doors and two-way doors. A two-way door is a decision you can reverse. Walk through, look around, walk back if you don’t like what you see. A one-way door locks behind you. The cost of reversal is so high that you effectively cannot reverse.

The insight is simple. The application is not, because most decisions don’t announce which type of door they are. And the consequences of getting it wrong compound over the years.

The Spectrum of Reversibility

When engineers talk about reversible decisions, they usually mean “technically possible to undo.” And by that definition, almost everything is reversible. You can rewrite the service. You can migrate the database. You can deprecate the API. Given enough time and resources, you can reverse almost any technical decision.

But that framing misses the point entirely.

The question isn’t whether you can reverse a decision. It’s what reversal actually costs. Time. Money. Team morale. User trust. Opportunity cost of everything else you’re not doing while you’re unwinding a previous choice.

I think of reversibility as a spectrum with several dimensions:

Direct cost of reversal. How much engineering effort does unwinding require? A feature flag toggle is nearly free. A database migration across billions of rows is months of work.

Indirect cost of reversal. What else breaks when you reverse? Changing an internal implementation might have no effect. Changing a public API might break hundreds of integrations you don’t control.

Time sensitivity. Does the reversal cost stay constant or grow? Some decisions get harder to undo over time. Data accumulates. Dependencies form. Teams build on top of your choices.

Coordination cost. Can you reverse unilaterally, or do you need alignment across teams, organizations, or external parties? Solo decisions are cheap to reverse. Decisions that create contracts with others are expensive.

Most technical decisions sit somewhere on this spectrum. The skill is recognizing where and adjusting your decision-making process accordingly.

Why Engineers Misjudge Reversibility

There’s a pattern I’ve seen repeatedly. Engineers treat irreversible decisions as reversible, and reversible decisions as irreversible. Both mistakes are costly, but in different ways.

Treating irreversible decisions as reversible leads to moving too fast on choices that will haunt you. “We can always change it later” becomes the mantra that justifies insufficient analysis. Then later arrives, and changing it turns out to require a six-month migration that nobody wants to prioritize.

Treating reversible decisions as irreversible leads to analysis paralysis. Teams spend weeks debating choices that could be tested and adjusted in days. They build elaborate decision matrices for problems that would resolve themselves with a quick experiment.

I think the misjudgment happens because engineers focus on technical possibility rather than practical cost. Yes, you can change the database schema. But will you? When the migration requires coordinated downtime across three services and a month of testing, the answer is usually no.

The decisions that don’t get reversed aren’t the ones that can’t be reversed. They’re the ones where the cost of reversal exceeds the pain of living with the original choice.

The Four Questions

When I’m evaluating any significant technical decision, I ask four questions about reversibility:

What does unwinding this look like?

Not abstractly. Concretely. If we decide this is wrong in six months, what are the actual steps to reverse it? Who needs to be involved? What systems need to change? What data needs to be migrated?

If you can’t articulate a concrete reversal path, you’re probably underestimating the difficulty.

How does the reversal cost compare to the original decision?

If the original decision takes two weeks to implement, and reversal would take two days, that’s a genuine two-way door. If reversal would take six months, you’re looking at something closer to a one-way door, regardless of what you call it.

The ratio matters. A 10:1 reversal-to-implementation ratio should trigger much more careful analysis than a 1:1 ratio.
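
To make the ratio concrete, here is a minimal sketch in Python. The function names and the labeling thresholds are illustrative assumptions, not an established rule. The durations come from the examples above, counting a week as five working days and a month as roughly twenty.

```python
def reversal_ratio(implementation_days: float, reversal_days: float) -> float:
    """Return the reversal-to-implementation ratio (higher means harder to undo)."""
    if implementation_days <= 0:
        raise ValueError("implementation_days must be positive")
    return reversal_days / implementation_days


def door_type(ratio: float, one_way_threshold: float = 10.0) -> str:
    """Label a decision by its ratio. The thresholds are assumptions for illustration."""
    if ratio <= 1.0:
        return "two-way door"
    if ratio >= one_way_threshold:
        return "closer to a one-way door"
    return "somewhere in between"


# Two weeks to implement (10 working days), two days to reverse.
cheap = reversal_ratio(implementation_days=10, reversal_days=2)
# Two weeks to implement, six months to reverse (about 120 working days).
costly = reversal_ratio(implementation_days=10, reversal_days=120)

print(cheap, door_type(cheap))
print(costly, door_type(costly))
```

The estimates going in will be rough, and that is fine. The point of the exercise is to notice when the two numbers differ by an order of magnitude.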

Does the reversal cost increase over time?

This is the ratchet effect, and it’s the most commonly overlooked dimension. Some decisions have stable reversal costs. Changing an internal algorithm costs about the same whether you do it next month or next year.

But many decisions have increasing reversal costs. The database schema gets harder to change as data accumulates. The API contract gets harder to break as more consumers depend on it. The service boundary gets harder to move as more teams build around it.

For decisions with increasing reversal costs, you have a window. The longer you wait to reverse, the more expensive it becomes. Eventually, the cost exceeds any realistic budget, and the “reversible” decision becomes permanent by default.
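
One way to reason about that window is a toy model. The sketch below assumes reversal cost grows linearly with the number of consumers and that consumers arrive at a constant rate. Both are simplifications, and every number and name is hypothetical.

```python
from typing import Optional


def reversal_cost(month: int, base_cost: float, cost_per_consumer: float,
                  new_consumers_per_month: int) -> float:
    """Estimated cost (in engineer-days) of reversing at a given month.

    Assumes consumers arrive at a constant rate, which is a simplification.
    """
    consumers = month * new_consumers_per_month
    return base_cost + cost_per_consumer * consumers


def months_until_window_closes(budget: float, base_cost: float,
                               cost_per_consumer: float,
                               new_consumers_per_month: int,
                               horizon_months: int = 120) -> Optional[int]:
    """Return the first month where reversal cost exceeds the budget, or None."""
    for month in range(horizon_months + 1):
        cost = reversal_cost(month, base_cost, cost_per_consumer,
                             new_consumers_per_month)
        if cost > budget:
            return month
    return None


# Hypothetical numbers: 20 engineer-days of base work, 2 days of migration
# support per consumer, 5 new consumers a month, and a 100-day budget.
closes_at = months_until_window_closes(
    budget=100, base_cost=20, cost_per_consumer=2, new_consumers_per_month=5
)
print(closes_at)  # 9: the cost is 20 + 10 * month, which first exceeds 100 at month 9
```

A decision with a stable reversal cost corresponds to `cost_per_consumer=0`. In that case the function returns `None`, because the window never closes.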

What information would make you want to reverse?

This question forces you to think about learning. If you knew X, would you change this decision? If yes, how will you learn X? How long until you know?

Decisions with a short learning timeline are good candidates for fast action. Ship it, learn it, adjust it. Decisions where the learning timeline is long or where you might never know the outcome deserve more upfront analysis.

One-Way Doors in Technical Architecture

Some decisions are genuinely hard to reverse. Not impossible, but expensive enough that reversal is unlikely to happen even when it should.

Public API contracts are the classic example. Once external developers build against your API, changing it breaks their software. You can version it, but now you’ll have to maintain multiple versions indefinitely. You can deprecate it, but “deprecation” often means “running forever because someone important still uses it.”

The API surface you ship becomes a commitment. Not because you can’t change it technically, but because the coordination cost of changing it (communicating with consumers, managing migration timelines, handling the ones who never migrate) exceeds the cost of living with the design.

Database schemas for production data sit in a similar territory. The schema itself is easy to change. The data is the problem. Once you have millions of rows in a particular shape, reshaping them requires migration. Migrations require testing. Testing requires representative data. Large migrations require batching, monitoring, and rollback plans.

I’ve seen teams avoid schema changes for years because the migration effort seemed disproportionate to the benefit. The original schema decision, made when the table had zero rows, became effectively permanent once it had a billion.
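
Even a toy version of a batched migration shows where the effort goes. The sketch below uses Python’s built-in sqlite3 module against a hypothetical `users` table; the table, the column names, and the transformation are assumptions for illustration. A production migration would add monitoring, throttling, and a rollback plan on top of this.

```python
import sqlite3


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Copy a cleaned `full_name` into `display_name` in batches; return rows migrated.

    Each batch commits on its own, so a failure leaves earlier batches intact
    and the job can resume from the rows that are still NULL.
    """
    migrated = 0
    while True:
        rows = conn.execute(
            "SELECT id, full_name FROM users WHERE display_name IS NULL "
            "ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return migrated
        conn.executemany(
            "UPDATE users SET display_name = ? WHERE id = ?",
            [(full_name.strip(), row_id) for row_id, full_name in rows],
        )
        conn.commit()
        migrated += len(rows)


# Hypothetical data: 2,500 rows in the old shape, with stray whitespace.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, full_name TEXT NOT NULL, display_name TEXT)"
)
conn.executemany(
    "INSERT INTO users (full_name) VALUES (?)",
    [(f" user {i} ",) for i in range(2500)],
)
conn.commit()

total = backfill_in_batches(conn, batch_size=1000)
print(total)  # 2500, processed as three batches
```

Because the query selects only rows that are still unmigrated, rerunning the job after an interruption is safe. That property matters more as the table grows.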

Service boundaries can surprise you. They feel reversible because services are just software. You can merge, split, or reorganize them. But service boundaries create organizational boundaries. Teams form around services. Ownership models emerge. Other teams build integrations assuming your service exists.

Changing a service boundary isn’t just a technical change. It’s an organizational change. And organizations are much more complex to refactor than code.

Data deletion is the closest thing to a truly irreversible decision in computing. Almost everything else can theoretically be reconstructed given enough effort. Deleted data is gone. (Yes, backups exist. But if you deleted it intentionally, your backups will eventually cycle out too.)

Two-Way Doors Worth Recognizing

On the other end of the spectrum, many decisions are more reversible than teams treat them.

Internal implementation details behind stable interfaces can usually change freely. If your API contract stays constant, consumers don’t care whether you’re using PostgreSQL or MongoDB underneath. The database choice might feel momentous, but if it’s hidden behind a well-designed abstraction, it’s surprisingly reversible.

I’ve seen teams spend months evaluating database options for services that would take two weeks to rewrite entirely if the choice turned out wrong.

Feature flags are explicitly designed to make decisions reversible. Ship the feature to 1% of users. Learn. Adjust. Roll back if needed. The whole point is to reduce the cost of reversal to nearly zero.

Teams that don’t use feature flags effectively are giving up one of the most powerful tools for treating decisions as two-way doors.
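
As a rough sketch of how a percentage rollout can work, the code below hashes the user ID so that each user gets a stable answer. It is not the API of any particular feature flag product; the function names and the flag name are hypothetical.

```python
import hashlib


def bucket_for(user_id: str, flag_name: str) -> int:
    """Map a user to a stable bucket in [0, 10000) for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 10_000


def is_enabled(user_id: str, flag_name: str, rollout_percent: float) -> bool:
    """Return True if the flag is on for this user at the given rollout percentage."""
    # 10,000 buckets give a resolution of 0.01 percent.
    return bucket_for(user_id, flag_name) < rollout_percent * 100


users = [f"user-{i}" for i in range(10_000)]

# Ship to roughly 1% of users.
exposed = sum(is_enabled(u, "new-checkout", 1) for u in users)
print(f"{exposed} of {len(users)} users see the feature")

# Rolling back is a one-value change: set the percentage to 0.
rolled_back = sum(is_enabled(u, "new-checkout", 0) for u in users)
print(f"{rolled_back} users see the feature after rollback")
```

Hashing instead of picking users at random means a user who saw the feature at 1% still sees it at 10%. The rollback is the same operation as the rollout, which is what keeps the cost of reversal close to zero.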

Configuration and tuning decisions are usually cheap to change. Timeout values, retry counts, cache sizes, thread pool configurations: these are knobs, not architecture. Turn them, observe results, adjust.

Yet I’ve seen teams debate configuration decisions with the intensity reserved for fundamental architectural choices. It’s a misallocation of decision-making energy.

Library and framework choices, with good abstraction, can be more reversible than they appear. If your business logic is cleanly separated from your framework, swapping frameworks is localized work. If your business logic is entangled with framework specifics, you’ve accidentally converted a two-way door into a one-way door.

The reversibility of library choices depends on how you use them, not on the library itself.

The Hidden Cost of “Reversible”

Here’s something I’ve noticed that complicates the model: decisions that are technically reversible but practically permanent.

You can reverse them. The engineering cost is reasonable. The coordination is manageable. But you don’t. Because reversing a decision admits the original decision was wrong. Because reversing requires someone to champion the reversal. Because the pain of the current state is distributed across many people, while the effort to reverse it would be concentrated on a few.

These zombie decisions, reversible but never reversed, are everywhere. The “temporary” solution that has been running for three years. The migration that was planned but never prioritized. The deprecated system that still handles 30% of traffic.

Reversibility on paper means nothing if the organization lacks the will or the mechanisms to actually reverse decisions when they should be reversed.

I’ve started thinking about this in terms of effective reversibility versus theoretical reversibility. Theoretical reversibility asks, “Can we?” Effective reversibility asks, “Will we, given how our organization actually makes decisions?”

Designing for Reversibility

If reversibility is so valuable, how do you build systems that preserve it?

Put abstraction boundaries around decisions. The narrower the blast radius of a decision, the cheaper the reversal. If your database choice is hidden behind a repository interface, changing databases is localized. If SQL queries are scattered throughout your codebase, you’ve welded yourself to that database.

Every architectural boundary is a potential reversal point. Design boundaries with future reversibility in mind, not just current separation of concerns.
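
A minimal sketch of such a boundary, assuming a hypothetical `User` entity and `UserRepository` interface: the business logic in `register` depends only on the interface, so swapping the storage engine touches one class and nothing else.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class User:
    user_id: int
    email: str


class UserRepository(Protocol):
    """The boundary: callers see this interface, never the storage engine."""

    def add(self, user: User) -> None: ...
    def get(self, user_id: int) -> Optional[User]: ...


class InMemoryUserRepository:
    def __init__(self) -> None:
        self._users: dict[int, User] = {}

    def add(self, user: User) -> None:
        self._users[user.user_id] = user

    def get(self, user_id: int) -> Optional[User]:
        return self._users.get(user_id)


class SqliteUserRepository:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(user_id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
        )

    def add(self, user: User) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO users (user_id, email) VALUES (?, ?)",
            (user.user_id, user.email),
        )
        self._conn.commit()

    def get(self, user_id: int) -> Optional[User]:
        row = self._conn.execute(
            "SELECT user_id, email FROM users WHERE user_id = ?", (user_id,)
        ).fetchone()
        return User(*row) if row else None


def register(repo: UserRepository, user_id: int, email: str) -> User:
    """Business logic that knows nothing about how users are stored."""
    user = User(user_id, email)
    repo.add(user)
    return user


# The same business logic runs against either backend.
for repo in (InMemoryUserRepository(), SqliteUserRepository(sqlite3.connect(":memory:"))):
    register(repo, 1, "ada@example.com")
    print(type(repo).__name__, repo.get(1))
```

The interface is deliberately small. Every method added to it is something a replacement backend must also support, so a wide interface quietly makes the reversal more expensive.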

Delay irreversible decisions as long as responsibly possible. Not indefinitely. Not past the point where the decision is needed. But recognize that information arrives over time. A decision made with more information is usually a better decision.

This doesn’t mean avoiding decisions. It means distinguishing between “we need to decide this now” and “we’re deciding this now because deciding feels productive.”

Use the strangler fig pattern for migrations. When you need to reverse a decision that’s become entangled, don’t try to flip it all at once. Build the new approach alongside the old. Migrate traffic gradually. Let the old system shrink until it can be removed.

The strangler fig turns a one-way door into a series of small two-way doors. Each step is reversible, even if the overall direction is committed.
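
The routing piece of that pattern can be sketched in a few lines. The class and handler names below are hypothetical, and a real deployment would usually do this at a proxy or gateway rather than in application code.

```python
import hashlib
from typing import Callable

Handler = Callable[[str], str]


class StranglerRouter:
    """Send a configurable share of requests to the replacement system."""

    def __init__(self, legacy: Handler, replacement: Handler) -> None:
        self._legacy = legacy
        self._replacement = replacement
        self._migrated_percent = 0  # start with all traffic on the old system

    def set_migrated_percent(self, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._migrated_percent = percent

    def handle(self, request_key: str) -> str:
        # Hash the key so the same caller consistently hits the same system.
        digest = hashlib.sha256(request_key.encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % 100
        if bucket < self._migrated_percent:
            return self._replacement(request_key)
        return self._legacy(request_key)


def legacy_system(key: str) -> str:
    return f"legacy:{key}"


def new_system(key: str) -> str:
    return f"new:{key}"


router = StranglerRouter(legacy_system, new_system)
keys = [f"account-{i}" for i in range(1000)]

for percent in (0, 10, 50, 100):
    router.set_migrated_percent(percent)
    on_new = sum(router.handle(k).startswith("new:") for k in keys)
    print(f"{percent}% target: {on_new} of {len(keys)} requests on the new system")

# Any step can be undone by lowering the percentage again.
router.set_migrated_percent(10)
```

Each call to `set_migrated_percent` is one of the small two-way doors. The old system is removed only after the percentage has sat at 100 long enough to trust the replacement.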

Make reversibility explicit in decision records. When documenting an architectural decision, include the reversal path. What would trigger reconsideration? What would unwinding look like? What’s the expected cost?

Writing this down forces you to think about it. It also creates a record that future teams can reference when they’re considering whether to reverse.
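
What such a record might contain, sketched as a Python dataclass. The field names are assumptions rather than a standard decision record format, and the example values are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DecisionRecord:
    title: str
    decided_on: date
    decision: str
    # The reversal path: what unwinding would concretely involve.
    reversal_steps: list[str]
    # Rough estimate, in engineer-days, at the time of the decision.
    reversal_cost_days: float
    # Observations that should prompt reconsideration.
    reconsider_triggers: list[str] = field(default_factory=list)
    # Date to re-estimate the reversal cost, for decisions where it grows.
    review_by: Optional[date] = None

    def is_due_for_review(self, today: date) -> bool:
        """Return True once the review date has arrived."""
        return self.review_by is not None and today >= self.review_by


record = DecisionRecord(
    title="Expose orders through a public REST API",
    decided_on=date(2024, 1, 15),
    decision="Ship v1 with the current resource naming",
    reversal_steps=[
        "Publish v2 with revised naming",
        "Notify all registered consumers",
        "Run both versions through a migration period",
    ],
    reversal_cost_days=30,
    reconsider_triggers=["More than 50 external consumers", "Repeated naming confusion in support tickets"],
    review_by=date(2024, 4, 15),
)

print(record.is_due_for_review(date(2024, 3, 1)))   # False: before the review date
print(record.is_due_for_review(date(2024, 4, 15)))  # True: the review date has arrived
```

The `review_by` field exists for decisions whose reversal cost grows over time. It turns “watch the window” into a date someone will actually see.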

The API Contract We Couldn’t Change

I want to tell you about a decision I got wrong. Not wrong in the “made the wrong choice” sense. Wrong in the “misjudged the reversibility” sense.

We were building a public API for a platform. Moving fast, early stage, lots of uncertainty. The mantra was: iterate quickly, learn from users, adjust based on feedback.

We shipped the first version of the API with a response structure that made sense at the time. Resources had IDs, attributes, and relationships. Standard stuff. But we made some choices about naming and nesting that reflected our internal domain model, not the mental model of our API consumers.

Within a few weeks, we realized some of the naming was confusing. The nesting made certain common operations awkward. We had better ideas.

But by the time we had the better design ready, a few dozen developers had built against the original API. Not a huge number. Early adopters, mostly. But they’d written code. Their code worked. Changing our API would break their code.

We had a choice: break the early adopters and ship the better design, or preserve compatibility and live with the awkward design.

We chose compatibility. It felt like the responsible choice. We didn’t want to punish the people who’d trusted us early.

But here’s what I didn’t fully appreciate: that decision was itself an irreversible decision. Every day we kept the original API, more developers built against it. The cost of changing grew continuously. What was dozens of developers became hundreds. What was hundreds became thousands.

Three years later, we still had that original API structure. We’d added a “v2” for new resources, but the original endpoints stayed frozen. The awkward naming was documented in tutorials all over the internet. The confusing nesting was baked into SDKs that third parties maintained.

The “reversible” decision to ship quickly and iterate became a permanent decision by accumulation. Not because we couldn’t reverse it, but because the cost of reversal grew faster than our willingness to pay it.

What should we have done? I’m still not entirely sure. Probably, we should have broken the early adopters when we had dozens of them, not thousands. The cost of reversal was high even then, but it was much lower than it would ever be again.

Or maybe we should have treated the API design as a one-way door from the start. Spent more time upfront on the naming and structure. Consulted with potential consumers before shipping. Moved slower on the interface even while moving fast on the implementation.

What I learned is that the door type isn’t fixed at the moment of decision. Reversibility decays. Two-way doors can become one-way doors while you’re not paying attention. The window for reversal is often shorter than you think, and it closes quietly.

What Changes When You Think in Reversibility

Early in my career, I evaluated decisions primarily on whether they were “right.”

Good architecture meant making correct choices. Experience meant having sufficient pattern recognition to make the right decisions faster.

Now I think about decisions differently. Correctness matters, but correctness is always provisional. What you know today might be wrong tomorrow. What’s right for current requirements might be wrong for future requirements. The environment changes. Your understanding deepens.

Given that uncertainty, the most valuable property of any decision is often not whether it’s correct, but whether you can update it when you learn more.

This shifts how I approach architecture:

For genuine one-way doors, I slow down. I gather more information. I consult more broadly. I try to reduce uncertainty before committing.

For two-way doors, I speed up. I make a reasonable choice and move. I set up mechanisms to learn quickly and reverse cheaply if needed.

For decisions with decaying reversibility, I watch the window. I set explicit triggers for reconsideration. I try to reverse early if reversal seems likely, before the cost makes it impractical.

A decision that’s easy to reverse is a decision that’s safe to make quickly. A decision that’s hard to reverse is a decision worth taking slowly. A decision whose reversibility decays is a decision that demands attention before the window closes.

This is not about being cautious. It’s about matching your decision-making process to the nature of the decision. Some doors deserve careful study before you walk through. Some doors you should walk through immediately because you can always walk back. The skill is knowing which is which.

Reversibility doesn’t tell you what to decide. It tells you how to decide.