137Foundry

How to Prepare a Legacy Codebase for AI-Assisted Refactoring

Jumping into a legacy codebase with an AI coding assistant and no preparation produces predictably poor results. The AI generates plausible-looking refactors that miss critical business logic embedded in unexpected places. You spend more time verifying output than the AI saved you in generation time. And the refactored code, while cleaner-looking, may have subtle behavioral changes that surface in production six weeks later.

The difference between this outcome and a productive AI-assisted modernization session is preparation. Specifically: giving the AI the context it needs to reason correctly about your specific codebase rather than reasoning from generic patterns.

This guide covers the preparation steps that make AI-assisted legacy refactoring significantly safer and more productive.

Step 1: Establish Scope and Document It

Before any AI interaction, define the boundary of what you are working on. Legacy codebases have a way of expanding scope because everything touches everything. Resist this.

Choose a specific module, class, or set of related functions as your working scope. Write a plain-language description of what that scope is responsible for:

Scope: the discount calculation module (discount.py, approximately 400 lines)
This module is responsible for: calculating the final price a customer pays
after applying applicable discounts, promotions, and loyalty tier benefits.

It is NOT responsible for: fetching customer tier data (done by customer_service.py),
validating promo codes (done by promo_validator.py), or applying tax (done post-discount
by tax_calculator.py).

The most important business constraint: discounts do not stack additively.
A customer with a 20% loyalty discount and a 15% promo code gets 20% off, 
not 35% off. This is intentional and must be preserved in any refactoring.

This description becomes the context header you paste before every AI prompt related to this module. It costs twenty minutes to write; it saves you from re-explaining the same context in every prompt, and from chasing down errors that stem from the AI not knowing the "discounts don't stack" rule.

Step 2: Audit Dependencies Before Touching Anything

AI coding assistants will generate refactored code that changes function signatures, return types, or module interfaces without knowing what depends on them. Before you start refactoring, you need a dependency map.

For Python codebases, tools like Python's built-in ast module and import analysis scripts can generate call graphs. For JavaScript, ESLint and module analysis tools serve a similar purpose. GitHub advanced search can help you find all internal references to a specific function across a large repository.
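The Python side of this can be sketched with the standard-library ast module. The snippet below is a minimal call-site finder, not a full call-graph tool: `apply_discount` and the sample source are hypothetical stand-ins, and it only sees static calls (dynamic dispatch through dictionaries or getattr will not appear).

```python
import ast

def find_call_sites(source: str, func_name: str) -> list[dict]:
    """Walk a module's AST and report every call to `func_name`.

    Covers plain calls (func(...)) and attribute calls (obj.func(...)).
    Dynamic patterns -- getattr, callables stored in dicts, monkey-patching --
    will NOT be found and need manual identification.
    """
    tree = ast.parse(source)
    sites = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        callee = node.func
        name = None
        if isinstance(callee, ast.Name):         # apply_discount(...)
            name = callee.id
        elif isinstance(callee, ast.Attribute):  # pricing.apply_discount(...)
            name = callee.attr
        if name == func_name:
            sites.append({
                "line": node.lineno,
                "positional_args": len(node.args),
                "keyword_args": [kw.arg for kw in node.keywords],
            })
    return sites

# Hypothetical sample source to analyze:
code = """
total = apply_discount(cart, tier="gold")
print(pricing.apply_discount(cart, 0.2))
"""
print(find_call_sites(code, "apply_discount"))
```

Running this per file across the module's dependents gives you the file, line, and argument style for each call site, which is exactly the information the prompt below asks the AI to produce.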

The AI can help with this phase, but its output should be treated as a starting point:

Identify all the places this function is called in the following files.
For each call site, note:
1. The file and line number
2. How the return value is used (stored, compared, iterated over, etc.)
3. Whether the caller passes keyword arguments or positional arguments

[target function] [relevant surrounding files]

Review the AI's output carefully. Dynamic call patterns (calling functions stored in dictionaries, factory patterns, monkey-patching) will not appear in AI dependency analysis. These need manual identification.

The dependency map serves a critical purpose: before you change a function signature or return type, you know what you need to update. Without it, you are refactoring blind.

Step 3: Create a Test Baseline

Legacy code with no tests is the most dangerous to refactor because you have no automated way to verify that behavior is preserved. Before any refactoring, use AI to generate an initial test suite for the module you are working on.

This is one of the highest-value uses of AI assistance in legacy modernization. Even an imperfect AI-generated test suite is faster to produce than one written from scratch, and it provides a safety net that makes subsequent refactoring significantly lower-risk.

Important: AI-generated tests tend to cover the happy path and obvious error cases well but miss the edge cases that emerged from production incidents. After getting the AI-generated test suite, review your issue tracker, Git blame history, and incident reports for the module. Add tests for any bugs that were fixed in the module's history - those are the edge cases most likely to be reintroduced by refactoring.
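A common shape for this baseline is characterization testing: pin the module's current observable behavior, whatever it is, before changing anything. A minimal pytest-style sketch, using a hypothetical `apply_discount` stand-in and the non-stacking rule from the scope document:

```python
# Characterization tests pin the CURRENT behavior of the legacy module
# before refactoring. `apply_discount` is a stand-in for whatever your
# module actually exposes -- adapt names and signatures to your code.

def apply_discount(price: float, loyalty_pct: float, promo_pct: float) -> float:
    """Stand-in legacy implementation: the best single discount wins,
    discounts never stack (the business rule from the scope document)."""
    best = max(loyalty_pct, promo_pct)
    return round(price * (1 - best / 100), 2)

def test_best_discount_wins_no_stacking():
    # 20% loyalty + 15% promo must yield 20% off, not 35% off
    assert apply_discount(100.0, 20, 15) == 80.0

def test_no_discounts():
    assert apply_discount(100.0, 0, 0) == 100.0

def test_rounding_to_cents():
    # Pin rounding behavior -- a common source of silent refactor drift
    assert apply_discount(19.99, 10, 0) == 17.99
```

Tests like the rounding one are exactly what the incident-history review surfaces: behavior nobody would think to specify, but that callers depend on.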

Once your test baseline is in place, configure your CI pipeline to run these tests on every commit. This gives you immediate feedback when a refactoring breaks behavior.

Step 4: Identify and Document the Critical Paths

Not all code in a legacy system is equally risky to modify. The critical paths are the execution flows that:

  • Handle money or anything irreversible (payments, emails sent, database deletes)
  • Run under high load or in performance-sensitive paths
  • Have known security relevance (authentication, authorization, input validation)
  • Have produced incidents or bugs in the past

These are the paths where AI-generated refactors need the most careful human review. Document them explicitly before starting:

Critical paths in discount.py:
1. Lines 145-190: Final discount application to cart total - this writes to the order record
2. Lines 210-230: Promo code validation bypass for internal employee accounts - security-relevant
3. Lines 280-310: Bulk discount calculation - runs for every item in large orders, performance-sensitive

When AI-generated refactors touch lines in this list, they get extra review. When they do not, you can move faster. This simple classification reduces the time you spend being careful about everything and focuses attention where it matters.
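One lightweight way to operationalize this list is to scan each diff for hunks that overlap a documented critical range. A sketch, assuming unified diff output (e.g. `git diff -U0`) and the hypothetical line ranges from the document above:

```python
import re

# Critical line ranges per file, from the critical-path document above.
CRITICAL = {"discount.py": [(145, 190), (210, 230), (280, 310)]}

# Matches unified diff hunk headers like "@@ -150,2 +150,4 @@" and
# captures the new-file start line and line count.
HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")

def touched_critical(diff_text: str) -> list[str]:
    """Flag hunks whose new-file line span overlaps a critical range."""
    flagged, current_file = [], None
    for line in diff_text.splitlines():
        if line.startswith("+++ b/"):
            current_file = line[6:]
        m = HUNK.match(line)
        if m and current_file in CRITICAL:
            start = int(m.group(1))
            count = int(m.group(2) or "1")
            end = start + max(count - 1, 0)
            for lo, hi in CRITICAL[current_file]:
                if start <= hi and end >= lo:
                    flagged.append(f"{current_file}:{start}-{end} overlaps {lo}-{hi}")
    return flagged

diff = """\
+++ b/discount.py
@@ -150,2 +150,4 @@
@@ -400,1 +402,1 @@
"""
print(touched_critical(diff))
```

Wired into CI or a pre-review hook, a check like this routes diffs touching critical ranges to extra review automatically. Note that line-number-based ranges drift as the file changes, so the critical-path document needs updating as you go.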


Step 5: Set Up a Safe Experimentation Environment

Before merging any AI-assisted refactoring, you need a way to run the original and refactored code side-by-side and compare behavior. The ideal setup:

  • A feature branch where AI-assisted changes are isolated
  • Your test baseline running against both the original and the refactored code
  • If the module has external side effects (database writes, external API calls), a way to stub those out for comparison testing

Martin Fowler's branch-by-abstraction pattern is useful for large-scale refactoring: introduce a seam that lets you run old and new implementations in parallel and compare results before fully switching.
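A minimal version of that seam in Python: a wrapper that keeps the old implementation authoritative, runs the new one alongside on the same inputs, and logs any divergence. The two discount functions here are hypothetical stand-ins for your legacy and refactored code.

```python
import logging

logger = logging.getLogger("discount.parallel_run")

def parallel_run(old_impl, new_impl, *, compare=lambda a, b: a == b):
    """Branch-by-abstraction seam: the old implementation remains the
    source of truth; the new one runs alongside and divergences are
    logged. Switch the return value to `new_result` only after
    divergences stop appearing in production logs."""
    def wrapper(*args, **kwargs):
        old_result = old_impl(*args, **kwargs)
        try:
            new_result = new_impl(*args, **kwargs)
            if not compare(old_result, new_result):
                logger.warning("divergence: old=%r new=%r args=%r",
                               old_result, new_result, (args, kwargs))
        except Exception:
            logger.exception("new implementation raised; old result kept")
        return old_result  # old behavior stays authoritative for now
    return wrapper

# Hypothetical usage: legacy vs. refactored implementation.
def old_discount(price, pct):
    return price * (1 - pct / 100)

def new_discount(price, pct):
    return price - price * pct / 100

apply = parallel_run(old_discount, new_discount)
print(apply(100, 20))
```

The `compare` hook matters for modules with floating-point results or unordered collections, where strict equality would report spurious divergences.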

For simpler modules, a straightforward A/B test in a staging environment - routing a portion of traffic to the refactored implementation - gives you confidence before full deployment.

Putting It Together

The preparation sequence - scope definition, dependency audit, test baseline, critical path identification, safe environment setup - takes time. On a module of moderate complexity, expect to spend a day on preparation before writing a line of refactored code.

That investment pays back quickly. With context documents, a test baseline, and a dependency map in hand, each AI-assisted refactoring session produces output that is faster to review, safer to merge, and less likely to produce production incidents.

For the full framework on running these sessions - including prompting patterns for the refactoring phase itself - the guide on using AI coding assistants for legacy code modernization covers the end-to-end process.

137Foundry works with engineering teams on legacy modernization assessments and implementation. The 137Foundry AI automation services include preparation consulting for teams starting this process for the first time.

Prettier and ESLint are useful tools for establishing consistent code style as a baseline before starting structural refactoring - style differences in a diff make behavioral changes harder to spot. OWASP provides useful checklists for security-critical code review that apply directly to the critical path review step.

Legacy modernization done well is not fast. But with the right preparation, AI assistance makes it substantially less expensive than it used to be.
