Iurii Rogulia

Posted on Jul 5 • Originally published at iurii.rogulia.fi

Technical Due Diligence Checklist Before a Software Rewrite

#rescueprojects #technicalduediligence #engineeringpractice #bestpractices

A founder reaches out. The message is some variation of: "Our system has a lot of problems, we think we need to rewrite it, can you give us an estimate?"

My answer is always the same: I'll give you a number after I've spent two hours in the codebase. Not before.

This is not a negotiating tactic. It's the only honest answer. A quote without an audit is a lottery — either I overprice to cover uncertainty I haven't measured, or I underestimate something serious and we both regret it four weeks in. Neither outcome is useful to you.

There's also a more fundamental issue: most rescue projects don't need a full rewrite. They need targeted stabilization of the two or three things that are actually broken, while everything the previous team built correctly — and there's usually something — stays in place. A stabilization is, in most cases I've seen, materially cheaper than a rewrite and delivers a working system faster. But I can only tell you which situation you're in after I've looked. Anyone who quotes a full rewrite before opening the repository is either guessing or has a business reason to prefer rewrites.

This is what those two hours look like.

One honest scope note before I describe the method. The two-hour audit is calibrated for the kind of systems I'm typically asked to look at: small-to-mid SaaS applications, backend services with a single team, web products under perhaps 100k lines of code. For larger and more complex systems — distributed architectures across many services, high-load infrastructure, enterprise monoliths with twenty years of history, event-driven platforms, compliance-sensitive domains, multi-team organizations — two hours is initial triage, not full assessment. A proper audit of those systems takes days, involves reading production telemetry rather than just code, and requires structured conversations with the people who operate the system day to day. The framework below still applies; the time budget doesn't. I'd be misleading you to imply that two hours gives the same depth on a 30-service event-driven platform as on a single Next.js app.

What I need before I start

The audit is only as good as the access I have. Before I open anything, I ask for the following:

Pre-audit access checklist
──────────────────────────
□ Repository access — read-only, SSH key or GitHub user
□ Production URL and staging URL (if staging exists)
□ Hosting and infra context — Vercel, Coolify, AWS, bare metal?
□ Database schema dump or read-only staging credentials
□ List of external integrations — Stripe, SendGrid, S3, etc.
□ Incident log for the last 3 months
□ Who built it — agency, in-house team, freelancer, AI-assisted?
  How many developers currently maintain it?
□ What "broken" means to you — which specific behaviours
  are wrong or unreliable right now?

Each item has a reason. The repository is obvious. The production URL tells me whether the thing is actually deployed and live, or exists only on someone's laptop. The infra context matters because what's broken in a Docker container on a VPS is different from what's broken on Vercel serverless. The schema dump is often more useful than running credentials — I can read a schema offline without worrying about accidentally touching a production database.

The incident log is the most underestimated item on this list. What broke, when, and how often tells me more about the real risk surface than any amount of static code reading. If the same user-facing error has appeared fourteen times in three months, that's a clue that everything else can wait. If there have been no incidents at all, that's also information — either the system is genuinely stable, or nobody is monitoring it.
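As a rough illustration of reading an incident log for recurrence, the sketch below groups entries by a normalized error label and ranks them by frequency. The `IncidentEntry` shape, the `signature` field, and the sample data are assumptions made for the example; real incident logs rarely arrive this tidy.

```typescript
interface IncidentEntry {
  date: string; // ISO date of the incident
  signature: string; // hypothetical normalized label for the user-facing error
}

// Count how often each error signature appears and sort the most frequent first.
function rankRecurringIncidents(entries: IncidentEntry[]): { signature: string; count: number }[] {
  const counts = new Map<string, number>();
  for (const entry of entries) {
    counts.set(entry.signature, (counts.get(entry.signature) ?? 0) + 1);
  }
  return Array.from(counts, ([signature, count]) => ({ signature, count })).sort(
    (a, b) => b.count - a.count,
  );
}

// Illustrative sample data, not taken from a real system.
const rankedIncidents = rankRecurringIncidents([
  { date: "2024-04-02", signature: "checkout-500" },
  { date: "2024-04-19", signature: "login-timeout" },
  { date: "2024-05-07", signature: "checkout-500" },
  { date: "2024-06-11", signature: "checkout-500" },
]);
// rankedIncidents: checkout-500 (3 occurrences), then login-timeout (1)
```

The top entry of a ranking like this is usually where the second hour of reading should start.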

The "who built it" question is not about blame. It tells me what category of problems to look for first, because the failure patterns differ by origin. A codebase built by an agency over two years drifts in one way; a rushed MVP written by the founder over weekends drifts in another; something post-acquisition, where two teams' code was merged, drifts in a third. A heavily AI-assisted project carries its own recurring patterns, which I documented in vibe-coded codebase patterns, and those patterns mean I read the vatnode.dev class of systems differently from a traditionally authored codebase. None of these is universal, and every codebase deserves to be read on its own terms, but the origin calibrates where I look first.

The "what broken means to you" question is important precisely because the answer is often wrong. Founders describe symptoms. The underlying cause is usually something different. But the symptoms tell me where the business pain is, which shapes how I prioritize what I find.

The first hour: landscape

The first sixty minutes are about understanding the shape of the thing. I'm not debugging yet. I'm not forming opinions about what should be rewritten. I'm building a map.

README and package.json first. What does the project claim to be? What runtime, what framework, what dependencies are declared? I read the README not to follow its setup instructions, but to understand what the team thought they were building. Then I check whether those claims match reality — whether the documented setup actually produces a running application, whether the dependency versions in the lockfile match what's declared, whether the scripts in package.json correspond to anything in the project structure.
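One part of this check, whether the scripts in package.json point at files that actually exist, can be sketched roughly as below. The function name, the file-token heuristic, and the sample data are assumptions for illustration; a real pass would read package.json and the file tree from disk.

```typescript
// Heuristic (an assumption): any token ending in a script-like extension is a file reference.
const FILE_TOKEN = /^[\w./-]+\.(js|mjs|cjs|ts|sh)$/;

// Return the names of scripts whose command references a file that is not in the repository.
function findDanglingScripts(scripts: Record<string, string>, existingFiles: string[]): string[] {
  const knownFiles = new Set(existingFiles.map((file) => file.replace(/^\.\//, "")));
  return Object.keys(scripts).filter((scriptName) => {
    const command = scripts[scriptName] ?? "";
    return command
      .split(/\s+/)
      .filter((token) => FILE_TOKEN.test(token))
      .some((token) => !knownFiles.has(token.replace(/^\.\//, "")));
  });
}

// Illustrative input: the "seed" script points at a file that no longer exists.
const danglingScripts = findDanglingScripts(
  { build: "next build", seed: "node scripts/seed.js", migrate: "node ./scripts/migrate.js" },
  ["scripts/migrate.js", "src/index.ts"],
);
// danglingScripts: ["seed"]
```

A dangling script is a small finding on its own, but it suggests the README and the package metadata stopped being maintained at some point.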

Folder structure. Is there structure at all, or is this flat chaos? Do I see folders named v2, old, _archive, new-approach? These are archaeological markers — remnants of previous attempts that were abandoned but never removed. The presence of multiple competing directories for the same concern tells me the codebase has accumulated history without ever being cleaned up.
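A quick pass for these markers could look something like the sketch below. The pattern list is built only from the folder names mentioned above, and `findArchaeologicalDirs` is a hypothetical helper; a directory such as `api/v2` may be legitimate versioning, so hits are prompts for a question rather than findings.

```typescript
// Hypothetical marker list based on the folder names mentioned in the text.
const ARCHAEOLOGY_MARKERS: RegExp[] = [/^v\d+$/i, /^old$/i, /^_?archive$/i, /^new-/i];

// Return every directory path that has at least one segment matching a marker.
function findArchaeologicalDirs(dirPaths: string[]): string[] {
  return dirPaths.filter((dirPath) =>
    dirPath
      .split("/")
      .some((segment) => ARCHAEOLOGY_MARKERS.some((marker) => marker.test(segment))),
  );
}

const suspiciousDirs = findArchaeologicalDirs([
  "src/api",
  "src/api/v2",
  "src/old",
  "src/_archive/checkout",
  "src/new-approach",
  "src/components",
]);
// suspiciousDirs: ["src/api/v2", "src/old", "src/_archive/checkout", "src/new-approach"]
```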

Tests. Do they exist? Do they pass? Most importantly: what do they test? A healthy test suite takes a few minutes to assess. An unhealthy one is faster: I look at ten tests at random, and if most of them assert return types rather than return values, or if the coverage number is high but the tests are trivially satisfied, I've learned something significant. Green CI on incorrect behaviour is one of the more reliable signals that the codebase has problems its authors couldn't see.
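The difference between asserting a return type and asserting a return value is easiest to see on a function with a deliberate bug. Everything in this sketch is hypothetical: `addVat` and both checks are stand-ins written without a test framework so the example stays self-contained.

```typescript
// Hypothetical function under test, with a deliberate bug:
// it adds the rate to the amount instead of applying it as a percentage.
function addVat(netCents: number, ratePercent: number): number {
  return netCents + ratePercent; // bug: should be netCents * (1 + ratePercent / 100)
}

// Type-only assertion: passes for any number, including a wrong one.
function typeOnlyTestPasses(): boolean {
  return typeof addVat(10000, 24) === "number";
}

// Value assertion: passes only when the arithmetic is correct.
function valueTestPasses(): boolean {
  return addVat(10000, 24) === 12400;
}

const typeOnlyResult = typeOnlyTestPasses(); // true: CI stays green on broken behaviour
const valueResult = valueTestPasses(); // false: the bug is caught
```

A suite made mostly of the first kind of test can report high coverage while verifying almost nothing about behaviour.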

CI/CD pipeline. Is there one? When did it last run green? A pipeline that hasn't passed in six weeks is a project that's been drifting for six weeks. No pipeline at all is a project being deployed by hand, which means every deployment is a manual operation that depends on whoever pressed the button remembering the steps correctly.

Git history. Who committed, how often, in what volumes. A first commit that adds 500 files is a signal worth investigating — it might be an AI generation, an import from a previous repository, a monorepo migration, a framework scaffold, or an internal code transfer. Each of these implies a different starting condition, and asking the team to explain it is more reliable than guessing. Commit messages dominated by "update" and "fix" can indicate weak engineering discipline, but they can also reflect a team that prioritized other communication channels (issue trackers, PR descriptions) — useful as a soft signal, not a verdict on its own. The patterns I find more reliable: cadence, who is committing where, whether changes are bundled or atomic, and whether the history shows the codebase being progressively cleaned or progressively accumulated.
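Some of these history signals can be pulled out mechanically before reading individual commits. The sketch below assumes commit records have already been parsed from the git log into a simple shape; the `CommitRecord` type, the 500-file threshold, and the message pattern are illustrative assumptions, and the output is a list of questions to ask, not a verdict.

```typescript
interface CommitRecord {
  author: string;
  message: string;
  filesChanged: number;
}

// Assumed thresholds for illustration only.
const BULK_COMMIT_FILES = 500;
const LOW_INFO_MESSAGE = /^(update|fix)\.?$/i;

function summarizeHistory(commits: CommitRecord[]) {
  const lowInfoCount = commits.filter((c) => LOW_INFO_MESSAGE.test(c.message.trim())).length;
  const bulkCommits = commits.filter((c) => c.filesChanged >= BULK_COMMIT_FILES);
  const commitsByAuthor = new Map<string, number>();
  for (const c of commits) {
    commitsByAuthor.set(c.author, (commitsByAuthor.get(c.author) ?? 0) + 1);
  }
  return {
    // Share of commits whose whole message is "update" or "fix".
    lowInfoShare: commits.length === 0 ? 0 : lowInfoCount / commits.length,
    // Commits large enough to be an import, a scaffold, or generated code.
    bulkCommits,
    commitsByAuthor,
  };
}

const historySummary = summarizeHistory([
  { author: "alice", message: "Initial import", filesChanged: 512 },
  { author: "alice", message: "fix", filesChanged: 1 },
  { author: "bob", message: "update", filesChanged: 3 },
  { author: "bob", message: "Add idempotency check to webhook handler", filesChanged: 2 },
]);
// historySummary.lowInfoShare === 0.5, one bulk commit, two commits per author
```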

Dependency audit. I run npm audit or the equivalent, note the count of high-severity vulnerabilities, and scan for duplicate coverage: two HTTP clients, three date libraries, multiple utilities that do the same thing. Dependency proliferation is a reliable indicator of how much architectural coordination happened during development.
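The duplicate-coverage scan can be approximated with a lookup of packages that serve the same concern. The grouping below is a small illustrative sample, not a complete catalogue, and `findDuplicateCoverage` is a hypothetical helper operating on the `dependencies` object from package.json.

```typescript
// Illustrative groups of npm packages that cover the same concern.
const OVERLAP_GROUPS = [
  { concern: "http-client", packages: ["axios", "got", "node-fetch", "superagent"] },
  { concern: "date-library", packages: ["moment", "dayjs", "date-fns", "luxon"] },
];

// Report every concern that is covered by more than one installed package.
function findDuplicateCoverage(dependencies: Record<string, string>) {
  const installed = Object.keys(dependencies);
  return OVERLAP_GROUPS
    .map((group) => ({
      concern: group.concern,
      present: group.packages.filter((pkg) => installed.includes(pkg)),
    }))
    .filter((group) => group.present.length > 1);
}

// Sample dependencies; the version ranges are placeholders.
const overlappingDeps = findDuplicateCoverage({
  axios: "^1.6.0",
  "node-fetch": "^3.3.0",
  moment: "^2.29.4",
  dayjs: "^1.11.0",
  "date-fns": "^3.0.0",
  react: "^18.2.0",
});
// overlappingDeps: http-client -> axios, node-fetch; date-library -> moment, dayjs, date-fns
```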

Search passes for known red flags. TODO, FIXME, and HACK comments in production paths. console.log statements that were left in. The any type scattered through TypeScript files. SQL strings assembled with template literals. None of these proves the system is broken, but they show how carefully it was built, and they cluster in the same files as the actual bugs reliably enough to be useful navigation.
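A search pass of this kind can be sketched as a handful of regular expressions run line by line. The patterns below are deliberately crude assumptions: they both over-match and under-match, so every hit still needs a human read.

```typescript
interface RedFlagHit {
  flag: string;
  line: number;
  text: string;
}

// Illustrative patterns only, one per red flag named in the text.
const RED_FLAG_PATTERNS: { flag: string; pattern: RegExp }[] = [
  { flag: "todo-marker", pattern: /\b(TODO|FIXME|HACK)\b/ },
  { flag: "console-log", pattern: /\bconsole\.log\(/ },
  { flag: "explicit-any", pattern: /:\s*any\b|\bas any\b/ },
  { flag: "sql-template", pattern: /`\s*(SELECT|INSERT|UPDATE|DELETE)\b[^`]*\$\{/i },
];

// Scan source text and record each line that matches a red-flag pattern.
function scanForRedFlags(source: string): RedFlagHit[] {
  const hits: RedFlagHit[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { flag, pattern } of RED_FLAG_PATTERNS) {
      if (pattern.test(text)) hits.push({ flag, line: index + 1, text: text.trim() });
    }
  });
  return hits;
}

// Sample source held as plain strings so the example stays self-contained.
const sampleSource = [
  "// TODO: validate input",
  "function loadUser(id: any) {",
  "  console.log(id);",
  "  return db.query(`SELECT * FROM users WHERE id = ${id}`);",
  "}",
].join("\n");

const redFlagHits = scanForRedFlags(sampleSource);
// redFlagHits: todo-marker (line 1), explicit-any (line 2), console-log (line 3), sql-template (line 4)
```

Counting hits per file, rather than in total, is what turns this into navigation: the files with the densest clusters are the ones to read first.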

At the end of the first hour, I have a landscape map. Not a verdict — a map. I know the scale of the problem, the zones of highest risk, and whether there are obvious immediate priorities. Now I go deeper.

The second hour: critical paths

The second hour is where I form the actual assessment. I pick five areas and read them carefully.


Authentication. Where does it live? Is it in one place or scattered? I trace a request from the browser through every layer that should be checking identity, and I verify those checks are actually present and consistent. Authentication is the highest-risk area in any application — not because it's the most likely thing to be completely broken, but because when it is broken, the consequences are irreversible. I'm looking for: tokens stored in the wrong place, session validation that can be bypassed, multiple partially-implemented auth approaches that might interact badly.

Database schema. Do the models match the migrations? Are foreign keys enforced at the database level, not just in the ORM? Are there indexes on the columns that actually appear in WHERE clauses, or were indexes added speculatively on columns that are never queried? Schema drift — where the migration history and the actual database schema have diverged — is one of the most expensive problems to discover late, because it means the database cannot be reliably reproduced from the repository alone.
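The index question reduces to comparing two sets: columns that carry an index and columns that appear in query filters. The sketch below assumes both lists have already been extracted, for example from the schema dump and from a read of the query code; the function name and the sample column names are hypothetical.

```typescript
interface IndexAuditResult {
  missingIndexes: string[]; // filtered on, but not indexed
  unusedIndexes: string[]; // indexed, but never filtered on
}

// Compare indexed columns against the columns that actually appear in WHERE clauses.
function compareIndexesToQueries(indexedColumns: string[], filteredColumns: string[]): IndexAuditResult {
  const indexed = new Set(indexedColumns);
  const filtered = new Set(filteredColumns);
  return {
    missingIndexes: Array.from(filtered).filter((column) => !indexed.has(column)),
    unusedIndexes: Array.from(indexed).filter((column) => !filtered.has(column)),
  };
}

const indexAudit = compareIndexesToQueries(
  ["users.email", "orders.created_at", "orders.legacy_ref"],
  ["users.email", "orders.user_id", "orders.created_at", "orders.user_id"],
);
// indexAudit.missingIndexes: ["orders.user_id"]
// indexAudit.unusedIndexes: ["orders.legacy_ref"]
```

An index that looks unused may still back a unique constraint or a foreign key, so the second list is a set of candidates to question, not a list of indexes to drop.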

Money paths. If there are payments, I read every line. Stripe webhook handling: is it idempotent? Can the same event be processed twice without creating a duplicate charge? (I've written about Stripe webhook idempotency in production if you want the implementation detail.) VAT logic: is it configured per jurisdiction or hardcoded for one country? The combination of payment bugs and incorrect tax handling is the category most likely to have legal and financial consequences beyond the technical problem, so I give it disproportionate attention relative to its share of the codebase.
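What "idempotent" means for a webhook handler can be shown in a few lines. This is a minimal sketch, not the implementation from the linked article: it assumes only that each event carries a unique `id` and a `type`, as Stripe events do, and it uses an in-memory set where production code would need a durable store with a unique constraint on the event id. The handler name and the counter are hypothetical.

```typescript
// Minimal assumed shape of an incoming webhook event.
interface WebhookEvent {
  id: string;
  type: string;
}

// In-memory stand-in for a durable store. A real system would persist processed
// event ids so the check survives restarts and concurrent deliveries.
const processedEventIds = new Set<string>();
let fulfilledOrders = 0;

function handleWebhookEvent(event: WebhookEvent): "processed" | "duplicate" {
  if (processedEventIds.has(event.id)) return "duplicate";
  processedEventIds.add(event.id);
  if (event.type === "checkout.session.completed") {
    fulfilledOrders += 1; // the side effect that must happen exactly once
  }
  return "processed";
}

// The same event delivered twice, as happens when a provider retries a delivery.
const firstDelivery = handleWebhookEvent({ id: "evt_123", type: "checkout.session.completed" });
const retryDelivery = handleWebhookEvent({ id: "evt_123", type: "checkout.session.completed" });
// firstDelivery is "processed", retryDelivery is "duplicate", fulfilledOrders is 1
```

When reading a real handler, the question is whether anything equivalent to the first two lines of `handleWebhookEvent` exists before the side effects run.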

The reported problem. Whatever the founder described as "broken" — I find the file that contains that functionality and read it. This is often the most revealing part of the audit, not because the bug is always immediately obvious, but because the state of the code around the reported problem is usually representative of the codebase at large. A well-maintained project has clean, readable code around its bugs. A project in serious trouble has code that's hard to even follow before you find the defect.

Operational failure handling. What happens when an external service goes down? What happens when the database is unreachable? What happens when a background job fails halfway through? Good operational handling is unglamorous to build and easy to skip — which means the quality of error handling, retry logic, and failure visibility is one of the most reliable proxies for how much production experience went into the codebase.
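For calibration, this is roughly the minimum that deliberate failure handling looks like: bounded retries, exponential backoff, and a final error that is surfaced rather than swallowed. The helper below is a generic sketch; `withRetry`, its defaults, and the injectable `sleep` parameter are assumptions for illustration, not a pattern taken from any particular codebase.

```typescript
// Retry an async operation with exponential backoff, then fail loudly.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Delay doubles on each attempt: baseDelayMs, then 2x, then 4x.
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  // Surface the failure so it reaches logs and alerts instead of disappearing.
  throw new Error(`Operation failed after ${maxAttempts} attempts: ${String(lastError)}`);
}

// Hypothetical flaky dependency: fails twice, then succeeds.
let flakyCallCount = 0;
const flakyUpstreamCall = async (): Promise<string> => {
  flakyCallCount += 1;
  if (flakyCallCount < 3) throw new Error("upstream unavailable");
  return "ok";
};

const retryOutcome = withRetry(flakyUpstreamCall, 3, 1);
// retryOutcome resolves to "ok" on the third attempt
```

The absence of anything like this around calls to external services, or the presence of empty catch blocks in its place, is the signal the audit is looking for.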

By the end of this pass, I can usually form an initial recommendation: what does this codebase look like it needs? The recommendation is provisional — production behaviour, real traffic, and the operational context I haven't yet seen can shift it. But it's grounded enough to start a serious conversation.

The decision framework I use:

If the business logic and data model are sound but the infrastructure and operational patterns are broken: stabilize. The foundation is there; it needs skilled work, not replacement.
If the business model has changed significantly since the codebase was built, and the code no longer reflects what the product actually does: partial or full rewrite may be justified. The code is not wrong so much as it answers the wrong question.
If two or three specific modules are broken and the rest is functional: targeted rescue. Fix the things that are broken, leave the rest alone.

Rewrite is also the honest answer in several other situations: when the original technology choice has become a permanent constraint on velocity and there is no migration path inside it; when the operational entropy is high enough that stabilization becomes an ongoing money pit rather than a finite engagement; when architectural decisions made early have become structurally irreversible (a single shared database under a monolith now serving multiple products, for example); or when the codebase carries enough accumulated risk that the cost of maintaining it exceeds the cost of replacing it. I'm not against rewrites. I'm against rewriting reflexively before anyone has measured whether stabilization is feasible.
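The three main branches of the framework can be written down as a small function, which makes the inputs explicit. This is a simplification under stated assumptions: the `AuditFindings` fields, the order in which the branches are checked, and the fallback value are choices made for this sketch, and the real call involves judgment that booleans do not capture.

```typescript
type Recommendation = "stabilize" | "targeted-rescue" | "rewrite-may-be-justified" | "needs-judgment";

// Hypothetical summary of what the audit found.
interface AuditFindings {
  businessLogicSound: boolean;
  dataModelSound: boolean;
  infraOrOperationsBroken: boolean;
  codeReflectsCurrentProduct: boolean;
  brokenModuleCount: number;
  restOfSystemFunctional: boolean;
}

function recommend(findings: AuditFindings): Recommendation {
  // The code answers the wrong question: the product has moved on.
  if (!findings.codeReflectsCurrentProduct) return "rewrite-may-be-justified";
  // Sound foundation, broken infrastructure or operations.
  if (findings.businessLogicSound && findings.dataModelSound && findings.infraOrOperationsBroken) {
    return "stabilize";
  }
  // A few broken modules in an otherwise working system.
  if (findings.brokenModuleCount >= 1 && findings.brokenModuleCount <= 3 && findings.restOfSystemFunctional) {
    return "targeted-rescue";
  }
  // Anything else does not fit the simple branches.
  return "needs-judgment";
}

const stabilizeCase = recommend({
  businessLogicSound: true,
  dataModelSound: true,
  infraOrOperationsBroken: true,
  codeReflectsCurrentProduct: true,
  brokenModuleCount: 0,
  restOfSystemFunctional: true,
});
// stabilizeCase is "stabilize"
```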

The part the code doesn't show

Reading code carefully will tell me a great deal. It won't tell me everything that matters, and pretending otherwise would be unprofessional.

A meaningful proportion of failing projects are not primarily failing because of the code. They're failing because of ownership ambiguity, missing operational knowledge, undocumented deployment rituals, unclear product direction, or accumulated tribal understanding that left the company when a key engineer did. The code in those projects reflects the dysfunction; it isn't the source. Rewriting that code without changing the conditions around it produces a new codebase that drifts toward the same problems.

I look for traces of this in the audit — incident retrospectives that point at the same coordination failure repeatedly, deployment instructions that exist only in someone's head, a critical service nobody on the current team can fully explain. When I see these signals strongly, I say so in the report. They don't always change the recommendation, but they change how the engagement should be structured. A rescue project that ignores the organizational substrate is a rescue project that will need to be done again.

For these reasons, anything I deliver after a two-hour audit is a code-and-history view. The full picture includes the operational realities — observability, dependency graphs, ownership boundaries, on-call structure, SLO posture — that need a separate conversation, usually with whoever runs the system day to day.

What I don't do in the first two hours

This is worth stating explicitly, because it shapes expectations.

I don't write code. I don't open pull requests, propose patches, or suggest fixes during the audit. The audit is diagnostic work, not remediation work — mixing the two means I'd start solving problems before I understand what all the problems are.

I don't profile performance. Understanding why something is slow requires instrumentation, production traffic patterns, and often data that's only visible under load. That's a separate engagement with its own tools and timeline.

I don't conduct a security penetration test. I check for obvious security problems — the kind that are visible in static code reading — but a proper security assessment involves active testing, not just reading. Serious exposure I find during the audit ends up in the report; I won't claim to have done a security audit when I've done a code review.

I don't talk to the development team. I talk only to the person who invited me in, until and unless we've agreed on a plan and I understand what my role is. Walking into a team's codebase and raising concerns directly with engineers I haven't been introduced to creates trust problems that are hard to undo.

I don't commit to a detailed timeline during the audit. The audit tells me what kind of work is required, and the report that follows gives a range. The detailed scope and timeline conversation comes after that.

What you receive

The output of two hours of structured audit is a document, not a conversation. It contains:

A summary of the current state: overall architecture, what's working, what's broken, and which category the problem falls into (stabilize, rewrite, or targeted rescue).

The top three risks, ranked by severity and consequence. Not a comprehensive list of everything I noticed — a prioritized view of what matters most. The three things I'd fix first if this were my codebase.

A recommendation with a cost range. Not a single number — a range that reflects the actual uncertainty. If stabilization is the right answer, something like: four to six weeks, €X–€Y. The range is honest; anyone who gives you a single precise number after two hours of audit is guessing with more confidence than the situation warrants.

Scope boundaries — what falls outside what I'd take on, with a note on who or what could cover it instead. Knowing the boundaries matters as much as knowing what's included.

An alternative path, if a better one exists. Sometimes the right move is hiring a senior in-house engineer for three months rather than engaging me. If the audit points there, that's what the report recommends.

When I decline

Not every rescue project is one I'll take.

If the codebase is in a state where reasonable rescue isn't possible — where the architectural problems are so fundamental that any repair would cost more than rebuilding from a clean foundation — I say so, and I don't take the work. Some codebases have accumulated enough contradictions that the only honest path is a fresh start.

If the founder wants a fix for a symptom rather than a cause — a patch over the leak rather than repair of the pipe — I can do that, but I label it for what it is: temporary relief, not structural improvement. Sometimes that's the right answer for where the business is. Sometimes it isn't.

If the business logic itself is unclear — if the founder cannot tell me what the system should do in the scenarios where it currently fails — that's a product definition problem before it's a technical one. I can't rescue a system toward requirements that haven't been established. We'd need to fix the product clarity first.

If the problem requires expertise I don't have — embedded systems, ML inference infrastructure, real-time trading at exchange scale — I decline and, where I can, suggest someone who specializes in it. Taking work outside my competence would be a worse outcome for you than passing on it.

This last category is part of due diligence in the other direction: I'm evaluating whether this is a good match, not just whether the codebase is salvageable. A senior developer who takes everything that pays is not someone you want doing due diligence on your most critical infrastructure.

The cost of the audit versus the cost of skipping it

Two hours of structured reading before committing to a rescue engagement is, almost always, the cheapest thing that happens in the entire project.

If those two hours reveal that the system needs stabilization rather than rewrite, and stabilization is €20k instead of €60k, the audit has paid for itself before the first line of rescue code is written. If those two hours reveal that I'm not the right person for this particular problem, you've saved the cost of finding that out six weeks into the engagement. If the audit confirms that the situation is exactly what you described and the rescue is straightforward — you go into the engagement with a calibrated view of what's involved, which is worth something when you're committing real money against an unknown.

The two hours are not about finding reasons to charge more. They're about making sure both of us know what we're committing to.

If this is the situation you're in — a codebase someone is telling you to rewrite, and you want a second opinion before you commit — that's what my technical due diligence service is for. If you've already decided you want it stabilized, that work continues in rescue projects. And if the handover is the first challenge — before the rescue even begins — agency codebase handover covers what the first five days look like in practice.