How I use Claude Code to refactor legacy code without breaking everything
Legacy code refactoring is one of the most dangerous things you can do. You touch one thing, something else breaks. You think you understand the system, then you discover it depends on a side effect you didn't know existed.
I've been using Claude Code for legacy refactoring for several months now. Here's what actually works — and what doesn't.
The core problem with legacy code + AI
The naive approach: paste the old code into Claude and say "refactor this."
The problem: Claude doesn't know what the code is connected to. It doesn't know which parts are load-bearing and which are vestigial. It doesn't know about the one weird side effect that three other things depend on.
The result: a clean-looking refactor that breaks in production in ways that are hard to trace.
The workflow that actually works
Step 1: Map before you touch
Before any refactoring, I do a mapping session:
# Start Claude Code in the repo root
claude
# Then in the session:
> Analyze src/payments/processor.js and tell me:
> 1. What does this file export?
> 2. Where are those exports used across the codebase?
> 3. What external systems does this touch (DB, APIs, queues)?
> 4. What would break if this file changed?
> Do NOT change anything. Just map.
This produces a dependency map before Claude touches a single line.
Step 2: Write characterization tests first
Characterization tests document what the code currently does — not what it should do. They capture the current behavior, warts and all.
> Based on your analysis, write characterization tests for processor.js.
> These tests should capture the CURRENT behavior, including any quirks.
> Use the existing test framework (check package.json).
> Do NOT refactor anything yet. Just tests.
Now you have a safety net. If the refactor breaks anything, the tests catch it.
Step 3: Refactor in small slices
Never ask Claude to refactor an entire file at once. Slice it:
> Now refactor ONLY the error handling in processor.js.
> Keep everything else identical.
> Run the characterization tests after each change.
> Stop if any test fails.
Small slices = small blast radius if something goes wrong.
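As an illustration of what one slice looks like, here is a hypothetical `readCart` helper where only the error handling changes and the return contract stays identical. The function and its contract are invented for this sketch:

```javascript
// Before the slice: a bare catch that loses the error entirely.
function readCartBefore(json) {
  try {
    return JSON.parse(json).items;
  } catch (e) {
    return []; // silent fallback
  }
}

// After the slice: the failure is now observable, but the contract is unchanged.
function readCartAfter(json, log = () => {}) {
  try {
    return JSON.parse(json).items;
  } catch (e) {
    log('readCart failed: ' + e.message); // logged, NOT rethrown
    return []; // callers still get an array, same as before
  }
}
```

Changing the contract itself (say, rethrowing instead of returning `[]`) would be its own slice, with its own round of tests.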
Step 4: The "explain what changed" check
After each refactor slice:
> Explain exactly what you changed and why.
> List any behavior differences between the old code and new code.
> Are there any edge cases the old code handled that the new code doesn't?
This forces Claude to surface implicit behavior changes before they hit production.
The patterns that trip up Claude
Monkey-patched globals: If legacy code patches global objects, Claude might clean that up — breaking things that depend on the patched behavior.
Fix: explicitly tell Claude which globals exist and why.
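A minimal sketch of the trap, using an invented `formatMoney` patch:

```javascript
// Hypothetical: a legacy bootstrap file patches a global at startup.
global.formatMoney = (cents) => '$' + (cents / 100).toFixed(2);

// Elsewhere, modules call the patched global directly:
function renderInvoiceLine(cents) {
  return 'Total: ' + global.formatMoney(cents); // breaks if a "cleanup" removes the patch
}

console.log(renderInvoiceLine(1999)); // → Total: $19.99
```

A well-meaning refactor that deletes the bootstrap patch as "dead code" leaves every consumer throwing `TypeError: global.formatMoney is not a function`.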
Implicit ordering dependencies: Code that depends on execution order in non-obvious ways.
Fix: ask Claude to identify all stateful operations and their order before refactoring.
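A toy example of the ordering trap (the config and pricing functions are invented for illustration):

```javascript
// Hypothetical: two stateful steps whose ordering matters in a non-obvious way.
let taxRate = 0; // default until config loads

function loadConfig() { taxRate = 0.08; }
function priceWithTax(cents) { return Math.round(cents * (1 + taxRate)); }

// The legacy call site happens to run these in the right order:
loadConfig();
console.log(priceWithTax(1000)); // → 1080

// A refactor that "tidies up" initialization and prices before loading config
// would silently return 1000: no error, just wrong numbers.
```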
Error swallowing: Legacy code often catches and ignores errors. Claude's refactors tend to surface these as thrown exceptions — which can break callers that expected silent failures.
Fix: ask Claude to flag every caught-and-ignored error before touching them.
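A sketch of why surfacing a swallowed error breaks callers, with an invented cache helper:

```javascript
// Hypothetical: legacy swallows errors, and callers depend on the silent failure.
const cache = new Map();

function getCachedLegacy(key) {
  try {
    return JSON.parse(cache.get(key)); // throws on missing or corrupt entries
  } catch (e) {
    return null; // swallowed: callers were written to expect null, not a throw
  }
}

// This caller only works because of the swallow. A refactor that rethrows
// turns a routine cache miss into an unhandled exception here:
const user = getCachedLegacy('user:42') || { name: 'anonymous' };
console.log(user.name); // → anonymous
```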
Database side effects: Legacy code sometimes has DB writes buried in unexpected places (getters, validators, etc.).
Fix: grep for all DB calls before starting. Tell Claude exactly where they are.
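This is exactly the kind of thing a getter can hide. A contrived sketch, with a plain object standing in for a real database client:

```javascript
// Hypothetical: a getter with a buried write, the kind of side effect to grep for.
const db = { writes: [] }; // stand-in for a real database client

const user = {
  get profile() {
    // Surprise: reading the profile also records an audit entry (a hidden DB write).
    db.writes.push({ table: 'audit_log', userId: 42, at: Date.now() });
    return { id: 42, name: 'Ada' };
  },
};

const p = user.profile;        // looks like a pure read...
console.log(db.writes.length); // → 1 ...but it wrote to the "database"
```

If a refactor caches `user.profile` or drops the getter, the audit trail quietly stops being written, and nothing in the type signature warns you.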
The CLAUDE.md setup for legacy work
For any legacy refactoring project, I add a section to CLAUDE.md:
## Legacy Code Rules
- ALWAYS map dependencies before refactoring
- ALWAYS write characterization tests before changing behavior
- NEVER refactor more than one concern at a time
- NEVER change error handling behavior without explicit approval
- NEVER remove code that looks unused without grepping for dynamic requires
- Flag any code that touches: [list your critical systems]
This persists across sessions. Every new Claude Code session in this repo starts with these constraints.
The rate limit problem with legacy refactoring
Legacy refactoring sessions are token-hungry. You're feeding Claude:
- The old code
- The dependency map
- The characterization tests
- The refactored code
- The explanation of changes
For a large legacy module, this can easily hit 50,000-100,000 tokens per session. At Claude's standard rate limits, you'll hit the wall mid-refactor — exactly when you don't want interruptions.
I solved this by switching my ANTHROPIC_BASE_URL to SimplyLouie — it's a $2/month Claude API proxy that removes the per-session rate limiting. When you're in the middle of a complex legacy refactoring session, you don't want to stop and wait for limits to reset.
The checklist I run before every legacy refactoring session
□ CLAUDE.md has legacy code rules section
□ Characterization tests written and passing
□ Dependency map documented
□ Critical systems identified and listed
□ Small slice defined (not "refactor the whole thing")
□ Rate limits checked — do I have enough headroom?
□ Rollback plan: git stash, feature branch, or backup
What I've refactored successfully
- A 3,000-line Express middleware chain from 2019 (took 4 sessions)
- A payment processor with 47 implicit state dependencies
- A reporting system that mixed business logic with SQL in ways that would make a DBA cry
- An authentication module that had been patched by 6 different engineers over 5 years
In every case, the characterization-tests-first approach was what made it safe.
The anti-pattern to avoid
The worst thing you can do: ask Claude to "clean up" legacy code without constraints. It will produce beautiful, modern code that silently changes behavior in 3-4 places. You won't find out until something breaks in production at 2am.
Constraints are not limitations. They're what make Claude Code actually safe for legacy work.
If your legacy refactoring sessions are hitting Claude's rate limits mid-session, SimplyLouie is a $2/month API proxy that removes per-session limits. 7-day free trial, card required — but if a mid-session rate limit has ever cost you an hour of work, it pays for itself.