Kunal

Posted on • Originally published at kunalganglani.com

Software Rewrite from Scratch: Why It's Almost Always the Worst Engineering Decision

In 1997, Netscape looked at their browser codebase and decided to burn it down. The software rewrite from scratch took nearly three years. During that time, they shipped nothing. Microsoft's Internet Explorer ate the market alive. By the time Netscape 6.0 limped into public beta, the browser wars were already over.

Twenty-eight years later, engineering teams keep making the exact same mistake. I've watched it happen three times in my career. Each time, the pitch sounds reasonable. Each time, the outcome is the same.

Why Software Teams Keep Falling for the Rewrite Trap

The appeal of a rewrite is almost primal. You open a legacy codebase, see tangled abstractions, inconsistent naming, workarounds layered on workarounds, and your brain screams: burn it down and start fresh.

I get it. I've felt it. I spent three weeks once debugging a system where the original authors had long since left and the documentation was a mix of outdated wiki pages and hopeful comments. The idea of a clean slate was intoxicating.

But the impulse is wrong. Joel Spolsky, co-founder of Stack Overflow and Trello, nailed it in his famous 2000 essay: the code you're looking at isn't messy because the original developers were incompetent. It's messy because it encodes years of bug fixes, edge cases, and hard-won lessons about the real world. Every weird conditional, every confusing variable name, every seemingly pointless check probably exists because something broke in production and someone fixed it at 2 AM.

When you throw that code away, you're not discarding syntax. You're discarding institutional knowledge.

Stripe's Developer Coefficient report puts numbers on this: developers spend roughly 17 hours per week on maintenance tasks like debugging and refactoring, with about 13.5 of those hours going specifically toward technical debt. That pain is real. But the solution to pain isn't amputation when physical therapy will do.

The Second-System Effect: Why Rewrites Always Bloat

Even if you survive the knowledge-loss problem, there's a second trap waiting. Fred Brooks called it the Second-System Effect in The Mythical Man-Month: the tendency for a team building a replacement to massively over-engineer it.

I've seen this play out the same way every time. Your architects remember every feature they had to cut from v1. Every hack they weren't proud of. Every shortcut that haunted them. Now they have a blank canvas and a mandate to "do it right this time." So they build something more abstract, more configurable, more "future-proof" than anyone asked for.

The result is a project that's late, over-budget, and somehow harder to maintain than the thing it replaced. I watched a team spend eight months building an elaborate plugin architecture for a rewrite when the original system only ever needed two integrations. They were solving problems they imagined they'd have. Not problems they actually had.

The enemy of a working system isn't ugly code. It's a beautiful system that doesn't ship.

This connects to something I wrote about in how AI-generated code is creating new maintenance burdens. Whether code comes from a human or an LLM, working software that's ugly beats elegant software that doesn't exist.

The Opportunity Cost Nobody Calculates

Here's the math that rewrite advocates consistently ignore.

A software rewrite from scratch means your engineering team spends 12 to 24 months (and that's optimistic) rebuilding features that already exist. During that window, your current product gets zero new features. Your competitors keep shipping. Your customers keep asking for things you can't deliver because everyone is heads-down recreating what you already have.

Spolsky put it bluntly: a rewrite is "the single worst strategic mistake that any software company can make." Not because the new code won't eventually be better. It might be. But by the time it ships, the market has moved on.

I lived through a rewrite where the team estimated nine months and delivered in nineteen. During that time, two competitors launched features our customers had been requesting for years. We lost three enterprise accounts. The new codebase was cleaner, sure. It was also serving a smaller customer base.

The business value of a rewrite to your customers is effectively zero. They don't care if your backend is now in Rust instead of Java. They care about the feature they've been waiting for.

This mirrors what happens with tech debt in AI applications. The temptation to start over is always there, but the cost of stopping forward progress is almost always underestimated.

What Is the Strangler Fig Pattern in Software Engineering?

If rewriting from scratch is almost always wrong, what do you actually do? The best answer I've found comes from Martin Fowler, who proposed the Strangler Fig pattern after observing strangler fig vines in the rainforests of Queensland, Australia.

The strangler fig germinates in the nook of a host tree. It grows slowly, drawing nutrients from the host, until it reaches the ground to grow its own roots and the canopy to get its own sunlight. Eventually, the fig becomes self-sustaining. The host tree dies, leaving the fig standing as a hollow echo of the host's original shape.

The software version works the same way. Instead of replacing the old system in one big bang, you build new functionality around its edges. New features get built in the new architecture. Existing features get migrated one at a time, each migration proving itself in production before you move on. The new system gradually takes over until the old system can be safely retired.

Why does this work where rewrites fail?

  • No feature freeze. Your team keeps delivering value while modernizing.
  • No big-bang deployment. Each piece migrates independently. Failures are small and reversible.
  • No knowledge loss. You migrate behavior one piece at a time, validating that the new code matches the old code's actual behavior. Edge cases included.
  • No over-engineering. You're solving real problems as you hit them, not imagining future ones.
  • Continuous validation. Users test the new system in production at every step, not after two years of development in a vacuum.
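The mechanics are simpler than they sound: the heart of a Strangler Fig migration is usually just a routing facade that sends migrated paths to the new system and everything else to the old one. Here's a minimal sketch in Go; the route prefixes, service names, and backend URLs are hypothetical stand-ins, not a prescription.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// migrated lists route prefixes already moved to the new system.
// It grows one entry at a time as each migration proves itself in
// production.
var migrated = []string{"/billing/"}

// backendFor decides which system serves a given request path.
func backendFor(path string) string {
	for _, prefix := range migrated {
		if strings.HasPrefix(path, prefix) {
			return "new"
		}
	}
	return "legacy"
}

// serve wires the decision into an HTTP facade sitting in front of
// both systems. The backend URLs passed in are assumptions about
// your environment.
func serve(legacyURL, newURL string) error {
	proxies := map[string]*httputil.ReverseProxy{}
	for name, raw := range map[string]string{"legacy": legacyURL, "new": newURL} {
		u, err := url.Parse(raw)
		if err != nil {
			return err
		}
		proxies[name] = httputil.NewSingleHostReverseProxy(u)
	}
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		proxies[backendFor(r.URL.Path)].ServeHTTP(w, r)
	})
	return http.ListenAndServe(":8000", handler)
}

func main() {
	// Show the routing decision for a few sample paths.
	for _, p := range []string{"/billing/invoices", "/accounts/42"} {
		fmt.Printf("%s -> %s\n", p, backendFor(p))
	}
	// To actually run the facade (addresses are hypothetical):
	// log.Fatal(serve("http://legacy.internal:8080", "http://billing.internal:9090"))
}
```

Retiring the old system is then just a matter of watching `migrated` grow until the legacy backend stops receiving traffic.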

I've used this approach to replace a monolithic service with a set of smaller, focused services over about fourteen months. At no point did we stop shipping features. There was no terrifying "flip the switch" deployment day. The old system just slowly got quieter until we turned it off and nobody noticed.

Fowler's key insight is that "replacing a serious IT system takes a long time, and the users can't wait for new features." The Strangler Fig respects that reality. It's also why Microsoft's fourteen-year war to deprecate the Control Panel has followed a similar incremental strategy. You don't rip out something millions of people depend on overnight.

When Is a Software Rewrite From Scratch Actually Justified?

I'm not going to pretend it's never the right call. But the bar should be extraordinarily high. After shipping software for over 14 years, I think a rewrite is justified only when all three of these conditions are true at the same time:

  1. The technology platform is genuinely dead. Not "old." Not "unfashionable." Dead. You can't hire anyone to work on it, the vendor has stopped shipping security patches, the runtime is approaching end-of-life. A COBOL system on a mainframe going out of support might qualify. A Rails app that feels dated does not.

  2. The architecture fundamentally cannot support the business direction. Going from single-tenant to multi-tenant, or batch processing to real-time streaming, sometimes the original architecture is so deeply incompatible that incremental migration costs more than starting over. This is rare.

  3. The system is small enough to rewrite in one quarter. If a rewrite is going to take more than three months, use the Strangler Fig instead. Rewrite risk scales exponentially with duration.

If you can't check all three boxes, the answer is incremental modernization. Full stop.

A Framework for the Conversation

The next time someone on your team says "we should just rewrite this," don't dismiss them. The frustration behind that statement is valid. Legacy systems are genuinely painful to work in. But redirect the conversation:

  • What specific capability are we missing that the current architecture can't support?
  • Can we isolate that capability and build it as a new service alongside the old system?
  • What's the smallest piece we could migrate first to prove the approach?
  • How long would a full rewrite actually take, and what features won't ship during that time?

Honest answers to those questions almost always lead teams toward incremental modernization. Not because it's more exciting. Because it actually works.

The Boring Answer Is the Right One

This is one of those things where the boring answer is actually the right one. Gradual migration isn't sexy. Nobody writes a triumphant blog post about "we slowly replaced our authentication layer over four sprints and nobody noticed." But that's exactly what good engineering looks like.

The software rewrite from scratch is a siren song. It promises a clean start, a chance to fix everything, a world where your codebase is beautiful and your deploys are painless. What it delivers is paralysis, scope creep, and market share loss. Almost every time.

The engineers who build systems that last aren't the ones who tear everything down and start over. They're the ones with the discipline to improve what exists, one piece at a time, while never stopping delivery.

If you're staring at a legacy codebase right now and dreaming about a rewrite, close that blank main.go file. Open the existing code instead. Find the ugliest module. Write a test for it. Then make it better. That's how real systems evolve.
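Concretely, that first test should be a characterization test: pin down what the ugly code currently does, not what you think it should do, so you can refactor underneath it safely. A sketch in Go; `legacyDiscount` and its rules are hypothetical stand-ins for whatever module you found.

```go
package main

import "fmt"

// legacyDiscount stands in for the ugly module you found: branchy,
// undocumented pricing logic that nobody dares touch. The rules here
// are invented for illustration.
func legacyDiscount(total float64, loyaltyYears int) float64 {
	if total > 100 && loyaltyYears >= 2 {
		return total * 0.9
	}
	if loyaltyYears >= 5 {
		return total * 0.95
	}
	return total
}

// pin records the function's current behavior before any refactor.
// These assertions describe what the code DOES today, edge cases and
// all; they become the safety net for every cleanup that follows.
func pin() {
	cases := []struct {
		total float64
		years int
		want  float64
	}{
		{120, 3, 108},  // big order, loyal customer: 10% off
		{50, 6, 47.5},  // small order, very loyal: 5% off
		{120, 1, 120},  // big order, new customer: no discount
	}
	for _, c := range cases {
		if got := legacyDiscount(c.total, c.years); got != c.want {
			panic(fmt.Sprintf("legacyDiscount(%v, %d) = %v, want %v",
				c.total, c.years, got, c.want))
		}
	}
	fmt.Println("behavior pinned; safe to refactor")
}

func main() { pin() }
```

Once the pins are green, you can restructure the internals as aggressively as you like; any change in observable behavior fails loudly instead of silently shipping.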

