Saeed Habibi

Posted on • Originally published at saeedhbi.Medium

Why Your Best Technical Decisions Will Eventually Be Wrong

The Architecture of Decisions, Part 1

There is a mass grave of technical decisions buried in every codebase. Some of them were wrong from the start. Others were right once, then the world changed. A few were brilliant and remain brilliant. Most live somewhere in between, quietly accumulating interest on debt nobody remembers taking on.

The decision that felt obvious in 2019 becomes the migration nobody wants to own in 2024. The architecture that scaled beautifully for three years collapses under requirements that didn’t exist when you designed it. The framework everyone recommended becomes the framework everyone is migrating away from.

This is not a failure of engineering. This is the nature of technical decisions. Every choice you make is a bet on a future you cannot fully see.


Technical decisions are predictions. Not guesses, not preferences, not best practices applied blindly. Predictions. When you choose PostgreSQL over MongoDB, you are predicting that your data will remain relational, that your access patterns will favor complex queries over document lookups, and that the join performance will matter more than horizontal write scaling. When you define service boundaries around “users” and “orders,” you are predicting that these domains will evolve independently, that the team structure will mirror this separation, and that the contract between them will remain stable.

Most engineers don’t think about decisions this way. They think about decisions as choices between options, evaluated by criteria, selected by judgment. And that framing is not wrong. But it misses something fundamental.

The criteria you use to evaluate options are themselves predictions. “We need horizontal scalability” is a prediction about load that may never materialize. “We need strong consistency” is a prediction about what correctness means for users who haven’t complained yet. “We need to move fast” is a prediction about how long the current architecture will matter before the next rewrite.

This is why two equally skilled engineers can look at the same problem, apply sound reasoning, and reach opposite conclusions. One looks at your 10,000 daily active users and sees a system that needs to scale to millions. The other looks at the same data and sees a system that might stay this size forever. They are not disagreeing about the present. They disagree about the future. And the future has not yet voted.

The uncomfortable truth is that you cannot know if a technical decision is good until time has passed. Sometimes years. The decision to use Kubernetes might look brilliant at month six when you’re deploying twelve services, and catastrophic at month eighteen when you realize you have three services and a full-time infrastructure engineer managing cluster complexity. The decision to stay on Heroku might feel limiting in year one and wise in year three, when your competitor is still debugging their Kubernetes networking.

What you can do is understand the properties of your decisions. Not whether they are right, because that requires information you don’t have, but how they will behave as the future unfolds. How much room do they leave for course correction? How far does the damage spread if they turn out to be wrong? How much are you betting on things you cannot currently know?


Some decisions age well. They remain good choices even as requirements shift, teams change, and the technology landscape evolves. Other decisions decay. They were reasonable once, but the context that made them reasonable has disappeared, and now they are obstacles rather than foundations.

The difference is rarely about the decision itself. It is about the relationship between the decision and time.


This is what I want the Architecture of Decisions series to explore. Not a catalog of correct answers, because right answers depend on context that I cannot know. Instead, a framework for thinking about decisions. A way to evaluate not just what to choose, but how to choose. And more importantly, how to understand what you are actually betting on when you commit to a path.

The framework has three components. Each one addresses a different dimension of how decisions interact with uncertainty.

Reversibility asks: if this decision turns out to be wrong, how hard is it to change course? Some decisions are two-way doors. You can walk through, look around, and walk back if you don’t like what you see. Other decisions are one-way doors. Once you’re through, the door locks behind you. The cost of reversal becomes so high that you effectively cannot reverse.

Blast radius asks: if this decision fails, how much of the system fails with it? Some decisions are contained. They affect a single component, a single team, a single workflow. Other choices are foundational. They propagate through everything. When they go wrong, everything downstream goes wrong too.

Information asymmetry asks: how much do you NOT know, and does that matter? Every decision involves incomplete information. But for some decisions, the missing information is not critical. For others, the missing information is precisely what determines whether the decision will succeed or fail.

These three properties do not tell you what to decide. They tell you how to decide. They tell you how much caution a decision deserves, how much validation it needs, and how much reversibility you should preserve.


Reversibility is the most underrated property of good architecture. Jeff Bezos talks about one-way and two-way doors, and the distinction is helpful, but reversibility is more nuanced than a binary.

Some decisions look reversible but aren’t. You can technically migrate from PostgreSQL to MongoDB. But if you have three years of queries written against relational assumptions, stored procedures that encode business logic, and reporting tools that expect SQL, the reversal cost is so high that the decision is effectively permanent. The door looks like it swings both ways, but the hinges have rusted shut.

Other decisions look permanent but aren’t. Switching from Python to Go feels like a massive undertaking. But if your architecture is twelve microservices with clean API boundaries, you can rewrite them one at a time. Six months later, you’ve migrated without ever stopping the system. The door looked like a one-way gate, but there was a side entrance that nobody mentioned in the architecture review.

I’ve seen teams spend months debating database choices as if the decision were irreversible, then casually adopt a new frontend framework every six months. The actual reversibility was inverted from their perception. The database, abstracted behind a repository layer, could have been swapped with moderate effort. The frontend, with its tendrils reaching into every component, was the real lock-in.
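
To make that concrete, here is a minimal sketch of what “abstracted behind a repository layer” can look like. The interface, schema, and query helper are hypothetical, not taken from any project described here; the point is that callers depend on the interface, so swapping the storage engine means writing one new implementation rather than touching every caller.

```typescript
// Minimal sketch of a repository boundary. Names and schema are hypothetical.
interface User {
  id: string;
  email: string;
}

// Callers depend on this interface, never on PostgreSQL directly.
interface UserRepository {
  findById(id: string): Promise<User | null>;
  save(user: User): Promise<void>;
}

// One implementation. A MongoUserRepository (or an in-memory fake for tests)
// could satisfy the same interface without changing any calling code.
class PostgresUserRepository implements UserRepository {
  // `query` stands in for an injected database client function.
  constructor(
    private readonly query: (sql: string, params: unknown[]) => Promise<Array<Record<string, unknown>>>,
  ) {}

  async findById(id: string): Promise<User | null> {
    const rows = await this.query('SELECT id, email FROM users WHERE id = $1', [id]);
    return rows.length ? { id: String(rows[0].id), email: String(rows[0].email) } : null;
  }

  async save(user: User): Promise<void> {
    await this.query(
      'INSERT INTO users (id, email) VALUES ($1, $2) ON CONFLICT (id) DO UPDATE SET email = EXCLUDED.email',
      [user.id, user.email],
    );
  }
}
```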

What makes reversibility powerful is not just the ability to undo. It is the ability to learn. Reversible decisions let you gather information that was unavailable when you made the original choice. You ship, you observe, you adjust. The decision becomes a hypothesis you can test rather than a commitment you must defend.

Part 2 of this series will go deeper into reversibility: how to identify which type of door you are walking through, how to preserve optionality when you cannot avoid irreversible choices, and how to avoid mistaking cosmetic reversibility for actual reversibility.


Blast radius is about failure containment. Every decision will eventually interact with failure, either its own failure or the failure of something adjacent. The question is how far that failure propagates.

A contained failure is annoying. A propagating failure is existential.

Part 3 will explore blast radius in detail.

Information asymmetry is the most philosophical of the three properties, and in some ways the most important. It asks what you are betting on that you cannot currently verify.

Every decision involves unknowns. That is the nature of predicting the future. But the unknowns are not equally distributed. Some decisions depend heavily on information you don’t have. Others are robust to the unknowns because the unknowns don’t affect the core value proposition.

When you chose AngularJS in 2013, you were betting on Google’s commitment to the framework. You didn’t know that Angular 2 would be a complete rewrite incompatible with Angular 1. You didn’t know the community would fragment. You couldn’t know, because Google hadn’t decided yet. That’s information asymmetry: the critical information exists in someone else’s future decisions.

When you choose Redis for caching, the information asymmetry is lower. Redis has been stable for over a decade. The API is mature. The failure modes are well-documented. You’re still betting on the future, but you’re betting on a trajectory with a long, observable history.

The skill is not to eliminate information asymmetry, because you can’t. The skill is to recognize where it is highest and to calibrate your confidence accordingly. Decisions with high information asymmetry deserve more hedging, more reversibility, and more caution. That shiny new framework with eighteen months of history and one major corporate sponsor? High information asymmetry. PostgreSQL? Low information asymmetry. Your confidence in each decision should reflect that difference.

Part 4 will explore information asymmetry: how to identify what you don’t know, how to distinguish uncertainty that matters from uncertainty that doesn’t, and how to make good decisions when you cannot wait for the information you wish you had.


I should be honest about something. I have gotten this wrong more times than I have gotten it right.

In 2019, I was part of a team that decided to break a monolith into microservices. The decision made sense at the time. We had a Node.js application that had grown to 200,000 lines of code. Deployments took 20 minutes. Two teams were stepping on each other’s changes. We had read the articles and watched the conference talks. Microservices were the answer to our problems.

We identified what we thought were clean boundaries. Users. Orders. Notifications. Payments. Each would become its own service with its own repository, deployment pipeline, and database.

What we actually built was a distributed monolith. The “Users” service needed to validate orders, so it called the Orders service. The Orders service needed to check payment status, so it called Payments. Payments needed to send receipts, so it called Notifications. Notifications needed user preferences, so it called Users. We had drawn boxes on a whiteboard, but each box had arrows pointing to the others.

What used to be function calls became HTTP requests. A user sign-up that took 50 milliseconds now took 400 milliseconds because it crossed four network boundaries. Debugging became archaeology: which service logged the error? Which request ID do I trace? Why is this field null when the other service swears it sent a value?
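
To illustrate the arithmetic, here is roughly what that request path looks like once each step is a network hop. The service names, endpoints, and payloads are invented for this sketch (and it assumes a runtime with a global fetch, such as Node 18+); the point is that four sequential round trips at around 100 milliseconds each is how a 50-millisecond code path becomes a 400-millisecond request.

```typescript
// Illustrative sketch only: service names, endpoints, and payloads are invented.
// Assumes a runtime with a global fetch (Node 18+ or a browser).
async function postJson<T>(url: string, body: unknown): Promise<T> {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`${url} responded with ${res.status}`);
  return res.json() as Promise<T>;
}

// Each await below crosses a network boundary that used to be a function call,
// so the latencies add up instead of staying in-process.
async function signUp(email: string) {
  const user = await postJson<{ id: string }>('http://users-svc/users', { email });
  const order = await postJson<{ id: string }>('http://orders-svc/welcome-orders', { userId: user.id });
  await postJson('http://payments-svc/receipts', { orderId: order.id });
  await postJson('http://notifications-svc/welcome', { userId: user.id });
  return user;
}
```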

The decision felt reversible. We could always recombine services if needed. In theory, yes. In practice, once you have five teams building on five services with five deployment processes, the political and organizational costs of reversal exceed the technical costs. The door had closed behind us while we were busy decorating the new rooms.

The blast radius was larger than we anticipated. We thought we were isolating components. Instead, we created a runtime dependency graph in which every service required every other service to function. A deploy to Notifications that introduced a 500ms latency spike cascaded into timeouts across the entire system. Our “independent” services were independent only in their git repositories.
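
There are ways to shrink that blast radius after the fact, though none of them fix the underlying coupling. As one hedged illustration (the endpoint, timeout value, and fallback policy below are assumptions for the sketch, not what we actually shipped), a non-critical downstream call can carry its own timeout and a fallback, so a latency spike in that one service degrades a single feature instead of timing out the whole request.

```typescript
// Hedged sketch: bounding the damage from one slow dependency.
// Endpoint, timeout, and fallback behavior are illustrative assumptions.
async function sendWelcomeNotification(userId: string, timeoutMs = 250): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch('http://notifications-svc/welcome', {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ userId }),
      signal: controller.signal,
    });
    return res.ok;
  } catch {
    // The notification is not critical to sign-up: record the failure and move on,
    // rather than letting a latency spike in one service cascade into request timeouts.
    console.warn(`welcome notification skipped for user ${userId}`);
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```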

And the information asymmetry was enormous. We made the decision based on a predicted scale that never materialized. We designed for 10 million users and peaked at 300,000. We made it based on team structures that changed six months later when the company reorganized. We made it based on deployment independence, which we never actually used because features still required coordinated releases across three or four services.

Three years later, parts of that system are still running. We never fully unwound it. The cost of reversal kept growing, and eventually, we just stopped trying. We learned to live with the distributed monolith, adding workarounds and accepting the latency tax.

That experience changed how I think about technical decisions.

Not to be paralyzed by fear of getting it wrong, because you will get things wrong. But to understand what you are betting on. To know which doors are one-way. To design for the blast radius of your own potential mistakes.


Early in your career, technical decisions feel like puzzles with correct answers. You learn the patterns, apply the principles, and arrive at solutions. Experience teaches you that the answers depend on questions that haven’t been asked yet.

Intermediate engineers evaluate decisions based on technical merit. Is this the right tool? Is this the correct pattern? Is this the right architecture? These are important questions, but they are incomplete.

Senior engineers evaluate decisions based on properties. How reversible is this? What is the blast radius if we’re wrong? What are we betting on that we cannot currently verify? These questions acknowledge that correctness is not a property of decisions in isolation. It is a property of decisions in context, and context changes.

The shift is not about being smarter. It is about being honest. Honest about uncertainty. Honest about the limits of prediction. Honest about the fact that some decisions will be wrong regardless of how carefully you make them.

A system that handles requests is functional.
A system that survives wrong decisions is resilient.
A system where decision quality improves over time demonstrates architectural maturity.

That maturity comes not from making fewer mistakes, but from making survivable mistakes. From preserving the ability to learn. From understanding that every technical decision is a bet on a future that has not yet arrived.

The future will vote eventually. Your job is to stay in the game long enough to hear the results.

What do you think? What experiences have you had with decisions that aged well, or badly?
