Hugo Vantighem

Posted on Jun 1

When "it works in prod" becomes worse than "it works on my laptop"

#craftsmanship #softwareengineering

Every engineering team accumulates technical debt. That's normal.

The dangerous moment isn't when the debt appears. It's when everyone agrees it's there — and decides to keep building on top of it anyway.

I've seen that shift happen in almost every engineering team I've worked with. It doesn't announce itself. It arrives quietly, dressed as pragmatism. And by the time you notice it, it's already become the default way the team operates.

It starts with a gap in knowledge. And that's completely fine.

The first error is innocent

Making design or architecture mistakes isn't a problem. We all do the best we can with the knowledge we have at the time. That's not a failure — it's how engineers grow. You learn by building, and sometimes you build something wrong before you know what right looks like.

The first bad design decision is usually just that: a knowledge gap. Nobody in the room knew better. They shipped with what they had.

Then someone learns something

Time passes. The team gains experience. Someone reads the right book, works through a gnarly bug, or joins from a company that had solved this exact problem already. The gap closes.

And with it comes the moment that defines a team's culture.

That person — maybe you — points to the design flaw. Not to criticize, not to reopen old wounds, but because they can see it clearly now and want the team to move forward. They bring the argument. They show why it matters.

And here's where things split.

Two possible answers

The first answer is a technical discussion. Maybe the team agrees immediately. Maybe they push back, challenge the framing, propose a different fix. Maybe they decide the cost of changing is too high right now and park it for later. Any of that is fine. The conversation is alive. The team is reasoning together about its own code.

The second answer is: "Yeah but it already works like this in prod. Why change it?"

The conversation stops. The technical debate is over before it started. And something quietly shifts.

The real tipping point

When that second answer happens, the error hasn't disappeared. It's now fully visible — known, understood, and acknowledged by everyone in the room. But instead of becoming a problem to solve, it becomes a fact to live with.

Known.
Understood.
Accepted.
Normalized.

That's the tipping point. Not the original mistake — that was just a knowledge gap. The tipping point is the moment the team decides, consciously, that a known flaw is the way things are.

We've moved from "we didn't know better" to "we know, and we're choosing not to act." Those are fundamentally different things. The first is human. The second is a choice about what kind of team you want to be.

The shield protects everything — including what it shouldn't

The pattern is corrosive because "it works in prod" gets used across the entire spectrum — and the severity of what it's shielding rarely changes the answer.

Sometimes it's a structural decision that's now load-bearing: real cost, real risk, the resistance is at least understandable. The mistake is using the phrase to close the conversation rather than as a factor in a real discussion about when and how to address it.

But then there's the case that should never be shielded.

Design flaws with real business risk attached. A service that writes to two stores. Someone points out the write isn't atomic — a crash between the two leaves them permanently out of sync. The answer comes back: "We've been running it for two years." Not because the concern is wrong. Because it hasn't exploded yet.

A data model that creates silent desync under concurrent load. A sequence that can leave the database in a half-committed state on crash. An architecture that, under the right conditions — a spike, a timeout, a race — loses data.

It works in prod. For now. Because the conditions that trigger it haven't aligned yet — or because they have, quietly, and nobody noticed.

At this level, "it works in prod" isn't pragmatism. It's a decision to accept an unquantified risk indefinitely, on behalf of users who don't know they're carrying it.

The system isn't stable.

It's stable so far.

The team knows the difference.

How it compounds

What makes this dangerous isn't any single decision. It's what comes after.

Because the next feature gets built on top of the flaw. Then the next one. Each decision taken in isolation is defensible — you're adapting to what exists. But you're also adding weight to a foundation that everyone now knows is wrong.

And when the cost of fixing finally becomes impossible to ignore, the same logic applies again: "It would cost too much to redo now."

A new trade-off. Based on the previous one. Which was itself based on the one before.

This is how absurd architectures are born — not from one catastrophic decision, but from a temporary shortcut that got normalized, then depended upon, then treated as untouchable. Accidental complexity doesn't arrive all at once. It compounds, layer by layer, until it becomes the team's technical culture.

Why "it works in prod" is more dangerous than "it works on my laptop"

Every developer knows what "it works on my laptop" means: the work isn't finished. Nobody treats it as a conclusion — everyone understands it as the beginning of the real work.

"It works in production" carries the opposite social weight. It sounds like evidence. It carries the authority of real users, real load, real money. And it gets used for exactly the same purpose — to stop the conversation.

The difference is that one is obviously incomplete. The other gets mistaken for proof.

That's what makes it more dangerous. Something that breaks locally is obviously broken. Something that works in production has inertia on its side — real users depend on it, changing it carries risk, the system resists being touched. That inertia is real. But the moment it becomes a reason to close a technical conversation rather than inform one, the phrase has shifted from a statement about stability to a shield against growth.

What you lose — and who you lose

The real cost isn't code quality. That's the surface.

What you lose is the team's ability to reason honestly about its own system. The questions that are supposed to stay open get closed by precedent. And once the culture of closing them is established, it applies to the next problem too. And the one after.

When "it works in prod" becomes the team's default posture, you can map it exactly onto what "it works on my laptop" reveals about an individual developer: someone who has stopped asking whether it's right, and started asking only whether it's currently not broken. At the individual level, that's a conversation. At the team level, it's a culture.

The tell is simple: watch what happens when someone raises a known problem. Does the team ask "how do we address this?" or "why are we even talking about this?" The first compounds knowledge. The second compounds debt — and has decided that's fine.

The engineers who care most are also the ones most likely to leave when the answer is always a wall. Not because of the debt itself. Because of what the wall says about the team.

The only thing that's not negotiable

Every team carries debt they can't pay down immediately. Every team makes trade-offs under pressure. That's not the failure.

The failure is letting the conversation die.

The technique is learnable. You can hire your way out of a knowledge gap, train your way out of a skill gap, refactor your way out of a design gap. None of those require the team to be perfect — just willing.

The willingness to keep questioning, to hear "we built this wrong" as useful information rather than a threat — that's not a technical skill. It can't be reviewed, tested, or measured. But it's what determines whether a team compounds its knowledge or just compounds its debt.

Every team accumulates debt. The difference is whether it remains a conscious trade-off or becomes an unquestionable fact.

The day a team stops discussing known flaws isn't the day the architecture fails.

It's the day learning stops.

DEV Community