
brian austin

How I use Claude Code to refactor legacy code — a complete workflow


Every codebase has that folder. The one nobody touches. The one where git blame shows the last commit was 4 years ago and the author has left the company.

I've been using Claude Code to systematically tackle legacy refactoring — and it's changed how I approach these sessions completely.

Here's my exact workflow.

The problem with legacy refactoring

Legacy code refactoring sessions are long. We're talking hours of:

  • Reading ancient code trying to understand intent
  • Tracing call chains through 12 files
  • Writing tests for code that was never designed to be tested
  • Making small changes and praying nothing breaks

This is exactly the kind of session where AI assistants shine — and exactly the kind where you hit rate limits at the worst possible moment.

Step 1: Understand before you touch

The single biggest mistake in legacy refactoring is jumping straight to changing code. I start every session with a "read-only" phase:

Read these files and tell me:
1. What is this code trying to do? (business intent, not technical description)
2. What are the hidden assumptions baked into this code?
3. What would break if I changed the core data structure?
4. What does this code do that the tests DON'T cover?

Files: [paste the legacy code]

This prompt consistently surfaces assumptions that would otherwise have taken me another two hours of reading to find.

Step 2: Map the blast radius

Before changing anything, I need to know what depends on this code:

Given this module:
[paste module]

And these files that import it:
[paste importers]

Create a dependency map showing:
- What each caller expects from this module
- Which callers would break with each type of change
- Which callers are most tightly coupled

This gives me a refactoring order — start with the most isolated code, work toward the most coupled.
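Before pasting "these files that import it" into the prompt, I need to collect them. A minimal sketch of that scan, written as a pure function over source text so it's easy to test (the module path and sample lines are invented for illustration; in a real repo you'd read files from disk or use git grep):

```javascript
// Sketch: find the lines in a source tree that import a given legacy module.
// Works on raw source text; module name and samples below are hypothetical.
function findImportLines(source, moduleName) {
  return source
    .split("\n")
    .filter(
      (line) =>
        (line.includes("require(") || line.includes("import ")) &&
        line.includes(moduleName)
    );
}

const sample = [
  "const invoice = require('../legacy/invoice');",
  "import { render } from './render';",
  "const tax = require('../legacy/invoice/tax');",
].join("\n");

// Both require() lines reference the legacy module; the render import does not.
console.log(findImportLines(sample, "legacy/invoice"));
```

The output of a scan like this becomes the "[paste importers]" section of the prompt above.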

Step 3: Write characterization tests first

Before changing a single line, I need tests that document current behavior:

This code has no tests. I need characterization tests that:
1. Document what it currently does (even if it's wrong)
2. Will catch any accidental behavior changes
3. Don't assume the code is correct — just capture what it does

Here's the code:
[paste code]

Here's a sample of real inputs/outputs I captured:
[paste examples]

Characterization tests are different from regular tests — they're not testing correctness, they're creating a safety net.

Step 4: Incremental refactoring with explicit constraints

Now I can actually start refactoring. The key is small, verifiable steps:

Refactor this function with these constraints:
1. Do NOT change the function signature or return type
2. Do NOT change the behavior — only the internal structure
3. Extract the [specific concern] into a named helper function
4. Add a comment for any non-obvious logic you keep

Original:
[paste function]

Characterization test output:
[paste test results]

Small constraints = predictable changes = less debugging.
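To make the constraints concrete, here's a sketch of what a constraint-respecting refactor looks like (the function and field names are invented): same signature, same behavior, with the validation concern extracted into a named helper.

```javascript
// Before: validation tangled into the main flow.
function processOrderBefore(order) {
  if (!order || !order.id || order.items.length === 0) return null;
  return { id: order.id, count: order.items.length };
}

// After: signature and behavior unchanged; validation extracted and named.
function isValidOrder(order) {
  // Non-obvious logic kept: empty item lists were always rejected here.
  return Boolean(order && order.id && order.items.length > 0);
}

function processOrder(order) {
  if (!isValidOrder(order)) return null;
  return { id: order.id, count: order.items.length };
}
```

The characterization tests from step 3 run against both versions and must produce identical results before the old version is deleted.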

Step 5: Handle the "why is this even here" moments

Every legacy codebase has mystery code. Dead conditionals. if (false) blocks. Commented-out code from 2019. I use:

This code appears to be dead/vestigial:
[paste suspicious code]

Context — it's called from:
[paste callers]

And was last changed:
[paste git log snippet]

Possible explanations:
1. Is this actually dead code?
2. Is this defensive code for an edge case?
3. Is this a feature flag that was never cleaned up?
4. What's the safest way to verify and remove it?

This conversation saves me from deleting "dead" code that turns out to be critical for some edge case in production.
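When the answer to "what's the safest way to verify and remove it?" isn't obvious, one hedged approach is a tombstone: instead of deleting the suspicious branch, log when it's reached in production and wait a full business cycle before removing it. A sketch (the handler and counter are illustrative, not from a real codebase; in production you'd ship hits to your logger or metrics instead of a Map):

```javascript
// Tombstone pattern: instrument a suspected-dead branch before deleting it.
const hits = new Map();

function tombstone(label) {
  // In production, emit to your logging/metrics pipeline instead.
  hits.set(label, (hits.get(label) || 0) + 1);
}

function legacyHandler(payload) {
  if (payload.format === "v1") {
    tombstone("legacyHandler:v1-branch"); // suspected dead; verify before removing
    return { legacy: true };
  }
  return { legacy: false };
}

legacyHandler({ format: "v2" });
console.log(hits.size === 0 ? "branch never hit: removal candidate" : "branch is live!");
```

If the tombstone stays silent through month-end, quarter-end, and whatever other cycles the system has, the branch is a much safer delete.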

The rate limit problem

Here's where it gets real: long refactoring sessions eat tokens fast.

A typical legacy module session goes:

  • Phase 1 (understand): 3,000 tokens
  • Phase 2 (blast radius): 4,000 tokens
  • Phase 3 (tests): 8,000 tokens
  • Phase 4-5 (actual refactoring, 3-4 functions): 15,000 tokens

That's 30,000 tokens for ONE module. If you're working on a legacy codebase with 50 modules, you're looking at serious quota pressure.

This is why I use SimplyLouie for my Claude sessions: a flat $2/month rate with no per-token anxiety. I can run these deep refactoring sessions without watching a usage meter.

The output format that works

For every refactored function, I ask Claude to output in this structure:

## [function name]

**Before:** [one-sentence description of old behavior]
**After:** [one-sentence description of new behavior]  
**Changed:** [what structurally changed]
**Preserved:** [what behavior was kept exactly]
**Risks:** [any edge cases to verify manually]

This becomes my PR description and the review checklist in one shot.

Results

Using this workflow on a 4-year-old Node.js codebase:

  • Refactored 12 core modules in 3 days (would have taken 2+ weeks before)
  • Found 3 actual bugs in the "working" legacy code during characterization
  • Zero production incidents from the refactoring
  • Test coverage went from 12% to 71% as a byproduct

The key insight: AI doesn't make legacy code easier to understand. It makes the process of understanding more systematic and faster.


If you're running long refactoring sessions and hitting Claude rate limits, check out SimplyLouie — it's what I use for unlimited Claude access at $2/month.
