How I use Claude Code to refactor legacy code safely — a complete workflow
Legacy code refactoring is one of the highest-risk activities in software development. You're changing code that works (sort of), with unclear dependencies, minimal tests, and the constant fear of breaking something in production.
I've been using Claude Code to make this process systematic and much safer. Here's my exact workflow.
The problem with legacy code refactoring
In my experience, refactoring attempts usually fail for one of three reasons:
- Unclear scope — you start changing one thing and discover it touches 40 other files
- Missing test coverage — you can't verify your changes didn't break anything
- Context loss — in a long refactoring session, you forget what you've already changed
Claude Code addresses all three, but only if you approach it systematically.
Phase 1: Map the blast radius before touching anything
Before writing a single line of new code, I ask Claude to map every dependency:
Don't change anything yet. First:
1. Read [the module/file I want to refactor]
2. Find every file that imports or calls functions from it
3. For each caller, identify which specific functions/exports it uses
4. Create a dependency map showing: [file] → [functions used] → [callers]
5. Flag any circular dependencies
6. Tell me the minimum safe refactoring surface
This single step has saved me from disasters multiple times. Before changing one "simple" utility function, I ran this map and discovered it was called in 23 different places across 8 modules.
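To make the shape of that dependency map concrete, here is a minimal sketch of the same idea in plain JavaScript. It is an illustration under stated assumptions, not what Claude runs internally: `mapBlastRadius` and the `sources` object are hypothetical names, the file contents are made up, and the regexes only understand named imports (`import { a } from` and `const { a } = require(...)`). Default imports, namespace imports, and re-exports would be missed.

```javascript
// Sketch: build a "function -> caller files" map for one module from raw source text.
// `sources` is a hypothetical in-memory map of file path -> file contents;
// in a real repo you would read these from disk.
// Assumes `moduleName` is a plain name with no regex metacharacters.
function mapBlastRadius(sources, moduleName) {
  // Path must be exactly the module name or end in "/<moduleName>", optional ".js".
  const modulePath = `['"](?:[^'"]*/)?${moduleName}(?:\\.js)?['"]`;
  // Matches: import { a, b as c } from './utils'
  const esmImport = new RegExp(`import\\s*\\{([^}]*)\\}\\s*from\\s*${modulePath}`, "g");
  // Matches: const { a, b: c } = require('./utils')
  const cjsRequire = new RegExp(`\\{([^}]*)\\}\\s*=\\s*require\\(\\s*${modulePath}\\s*\\)`, "g");

  const callersByFunction = new Map(); // function name -> Set of caller files

  for (const [file, text] of Object.entries(sources)) {
    for (const pattern of [esmImport, cjsRequire]) {
      for (const match of text.matchAll(pattern)) {
        for (const raw of match[1].split(",")) {
          // "b as c" and "b: c" both refer to the export named "b".
          const name = raw.trim().split(/\s+as\s+|\s*:\s*/)[0];
          if (!name) continue;
          if (!callersByFunction.has(name)) callersByFunction.set(name, new Set());
          callersByFunction.get(name).add(file);
        }
      }
    }
  }
  return callersByFunction;
}

// Made-up example files, for illustration only.
const sources = {
  "src/cart.js": "import { formatDate, slugify as slug } from './utils';",
  "src/report.js": "const { formatDate } = require('../lib/utils.js');",
  "src/other.js": "import { pad } from './string-utils';", // different module, ignored
};

for (const [fn, callers] of mapBlastRadius(sources, "utils")) {
  console.log(fn, "<-", [...callers].join(", "));
}
// formatDate <- src/cart.js, src/report.js
// slugify <- src/cart.js
```

A script like this is a useful cross-check on the map Claude produces: if the two disagree, one of them has missed a caller.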
Phase 2: Write characterization tests
Before refactoring, you need tests that capture current behavior — even if that behavior is wrong.
For each function in [module], write characterization tests that:
1. Document current input/output behavior exactly as-is
2. Include edge cases and error conditions
3. Don't judge whether the behavior is correct — just capture it
4. Use descriptive test names like 'currently returns null for empty input'
These tests should ALL pass before we start refactoring.
Characterization tests are a safety net. If your refactored code breaks one of these tests, you know you've changed behavior (intentionally or not).
Phase 3: Incremental extraction
Now the actual refactoring — but in small, verifiable steps:
Refactor [function X] only. Do not touch anything else.
Steps:
1. Create the new version of the function in a new file
2. Make the old function call the new one (strangler fig pattern)
3. Run the characterization tests — they should all pass
4. Show me the diff before applying it
The strangler fig pattern is critical for legacy code. You don't delete the old code up front; you route callers through the new code and remove the old implementation only once the new one is proven to work.
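A minimal sketch of that delegation step, with hypothetical names throughout (`formatDate`, `formatDateV2`, `formatDateLegacy`, and the `USE_NEW_IMPLEMENTATION` toggle are all assumptions for illustration). Both versions are shown in one file so the example runs on its own; in practice the new function would sit in the new file from step 1.

```javascript
// New implementation. In a real project this would live in its own file
// (say, a hypothetical dates.js) and be imported by the legacy module.
function formatDateV2(date) {
  const pad = (n) => String(n).padStart(2, "0");
  return `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`;
}

// Original body, kept under a new name so it can still be compared against
// (or switched back to) while the new code earns trust.
function formatDateLegacy(date) {
  var m = date.getMonth() + 1;
  var d = date.getDate();
  return date.getFullYear() + "-" + (m < 10 ? "0" + m : m) + "-" + (d < 10 ? "0" + d : d);
}

// Hypothetical escape hatch: flip to false to route back to the legacy body.
const USE_NEW_IMPLEMENTATION = true;

// The public entry point keeps its name and signature, so callers don't change.
function formatDate(date) {
  return USE_NEW_IMPLEMENTATION ? formatDateV2(date) : formatDateLegacy(date);
}

const sample = new Date(2024, 0, 5); // 5 January 2024, local time
console.log(formatDate(sample) === formatDateLegacy(sample)); // true
```

The toggle is optional. The part that matters is that every existing caller still calls `formatDate`, so the characterization tests exercise the new code without a single import changing.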
Phase 4: Handle the rate limit problem
Legacy refactoring sessions are long. You're reading dozens of files, tracking dozens of changes, running tests repeatedly. This is exactly the kind of session that hits Claude's rate limits.
When you hit a rate limit mid-refactoring, you need a reliable API endpoint to continue without losing context.
I use SimplyLouie as my backup endpoint — it's $2/month and exposes the same Claude API so I can continue the session immediately:
# Set the base URL and continue immediately
export ANTHROPIC_BASE_URL=https://simplylouie.com/api
claude --continue
The --continue flag resumes the most recent conversation in the current directory, so you pick up where you left off without rebuilding context.
Phase 5: Validation before merge
Before merging refactored code:
Validation checklist for this refactoring:
1. Run all characterization tests — do they pass?
2. Run any existing integration tests — do they pass?
3. Check for any console.log or debug code left in
4. Verify the public API surface is identical (same exports, same signatures)
5. Check for any hardcoded values that should be constants
6. Look for any TODO comments that need addressing
I also ask Claude to do a final consistency check:
Compare the original [module] with the refactored version.
List any behavioral differences — even small ones.
Are there any cases where the refactored version would return a different result?
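Item 4 of the checklist (identical public API surface) is also easy to check mechanically. The sketch below compares two export objects by name, type, and declared parameter count. `diffApiSurface` and the sample modules are hypothetical, and a function's `length` only counts parameters before the first default or rest parameter, so treat a clean result as a sanity check, not as proof that the signatures match.

```javascript
// Sketch: report differences between two modules' export objects.
function diffApiSurface(original, refactored) {
  const problems = [];
  const originalNames = new Set(Object.keys(original));
  const refactoredNames = new Set(Object.keys(refactored));

  for (const name of originalNames) {
    if (!refactoredNames.has(name)) {
      problems.push(`missing export: ${name}`);
    } else if (typeof original[name] !== typeof refactored[name]) {
      problems.push(`type changed: ${name}`);
    } else if (
      typeof original[name] === "function" &&
      original[name].length !== refactored[name].length
    ) {
      // Function.length is the number of declared parameters
      // before the first default or rest parameter.
      problems.push(
        `arity changed: ${name} (${original[name].length} -> ${refactored[name].length})`
      );
    }
  }
  for (const name of refactoredNames) {
    if (!originalNames.has(name)) problems.push(`unexpected new export: ${name}`);
  }
  return problems;
}

// Made-up stand-ins for require("./utils") before and after the refactor.
const before = { slugify: (s) => s, formatDate: (d, fmt) => "" };
const after = { slugify: (s, opts) => s, parseDate: (s) => null };

console.log(diffApiSurface(before, after));
// [ 'arity changed: slugify (1 -> 2)',
//   'missing export: formatDate',
//   'unexpected new export: parseDate' ]
```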
Real example: Refactoring a 400-line utility module
Here's how this played out on a real project. The module was a 400-line utils.js with mixed concerns: string manipulation, date formatting, API helpers, and data validation all jumbled together.
Step 1: Dependency map — Claude found 31 files importing from utils.js, using 14 different functions
Step 2: Characterization tests — 47 tests written in 8 minutes, all passing
Step 3: Extraction — Split into 4 focused modules: strings.js, dates.js, api-helpers.js, validators.js
Step 4: Migration — Updated 31 import statements using a single Claude prompt:
Update all 31 files that import from utils.js to import from the correct new module.
Here's the mapping: [function] → [new module]
Do them in batches of 10 and run the characterization tests after each batch.
Step 5: Validation — All 47 characterization tests passed. Zero behavioral regressions.
Total time: about 2 hours. Without Claude, this would have been a multi-day project with a high chance of introducing bugs.
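For a sense of what the Step 4 migration does to each file, here is a rough sketch of the rewrite as a standalone function. It is not the code Claude generated on that project. The function names in `mapping` are hypothetical (only the four target module names come from the project above), and the regex handles named ES imports only. An AST-based codemod would be more robust for anything beyond a simple case like this.

```javascript
// Hypothetical function -> new module mapping; the function names are made up.
const mapping = {
  slugify: "strings",
  formatDate: "dates",
  fetchJson: "api-helpers",
  isEmail: "validators",
};

// Rewrites `import { ... } from '<dir>/<oldModule>'` into one import per new module.
// Assumes `oldModule` is a plain name with no regex metacharacters.
function migrateImports(source, oldModule, mapping) {
  const pattern = new RegExp(
    `import\\s*\\{([^}]*)\\}\\s*from\\s*(['"])((?:[^'"]*/)?)${oldModule}(\\.js)?\\2;?`,
    "g"
  );
  return source.replace(pattern, (whole, names, quote, dir, ext = "") => {
    const byModule = new Map(); // new module -> import specifiers
    for (const raw of names.split(",")) {
      const spec = raw.trim();
      if (!spec) continue;
      const exported = spec.split(/\s+as\s+/)[0];
      // Unknown name: leave the whole import untouched for manual review.
      if (!Object.prototype.hasOwnProperty.call(mapping, exported)) return whole;
      const target = mapping[exported];
      if (!byModule.has(target)) byModule.set(target, []);
      byModule.get(target).push(spec);
    }
    if (byModule.size === 0) return whole;
    return [...byModule]
      .map(([mod, specs]) => `import { ${specs.join(", ")} } from ${quote}${dir}${mod}${ext}${quote};`)
      .join("\n");
  });
}

const input = "import { slugify, formatDate as fd } from './utils';\nconst x = 1;";
console.log(migrateImports(input, "utils", mapping));
// import { slugify } from './strings';
// import { formatDate as fd } from './dates';
// const x = 1;
```

Leaving unknown names untouched is a deliberate choice: a half-migrated import that still works is easier to spot and fix than one that silently points at the wrong module.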
The key insight
Legacy code refactoring with Claude Code isn't about asking Claude to "refactor this" — it's about using Claude as a systematic analysis and execution tool while you maintain strategic control.
You decide what to refactor and why. Claude maps the blast radius, writes the safety net tests, makes the incremental changes, and validates the results.
The workflow above has scaled well for me: I've used it on single 100-line files and on multi-module systems with hundreds of interdependencies.
Running into rate limits during long refactoring sessions? SimplyLouie is a $2/month Claude API proxy: same API, no usage limits, and it works with --continue so you can pick up where you left off.