brian austin

How I use Claude Code to refactor legacy code — without breaking production

Legacy code refactoring is one of the highest-risk activities in software development. You're changing code that works — just badly — and you need to make it better without breaking anything.

I've been using Claude Code for legacy refactoring for the past few months and developed a workflow that makes it significantly safer. Here's exactly what I do.

The core problem with legacy refactoring

Legacy code refactoring fails for predictable reasons:

  1. No tests — you can't know if you broke something
  2. Hidden dependencies — changing one function breaks something 5 files away
  3. Context loss — the AI doesn't understand why the code was written this way
  4. Scope creep — refactoring one thing exposes 10 other problems
  5. Rate limits — long refactoring sessions hit token limits mid-task

My workflow addresses all five.

Step 1: Map before you touch

Before changing a single line, I have Claude Code build a complete map of the code I'm about to refactor.

Claude, I want to refactor [function/module/file]. Before we change anything:

1. List every function in this file
2. List every file that imports or calls these functions
3. List every external dependency (DB calls, API calls, filesystem)
4. Identify what has tests and what doesn't
5. Flag anything that looks load-bearing (error handling, auth checks, etc.)

Do NOT change anything yet. Just map.

This map becomes your refactoring contract. You know exactly what's in scope before you touch it.
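To make the idea concrete, here is a rough sketch of what such a map could look like as data. Everything in it is an assumption for illustration: `buildRefactorMap`, the file paths, and the regex-based scanning are hypothetical, and a regex like this misses arrow functions, methods, and dynamic calls. It only shows the shape of the result, not how Claude Code arrives at it.

```javascript
// Hypothetical sketch: build a rough "refactoring map" from in-memory sources.
// `sources` maps a file path to that file's text. All names are illustrative.
function buildRefactorMap(targetPath, sources) {
  // Functions declared in the target file (plain `function name(` only).
  const functions = [
    ...sources[targetPath].matchAll(/function\s+([A-Za-z_]\w*)\s*\(/g),
  ].map((match) => match[1]);

  // For each declared function, list the other files that appear to call it.
  const callers = {};
  for (const name of functions) {
    const callPattern = new RegExp(`\\b${name}\\s*\\(`);
    callers[name] = Object.keys(sources)
      .filter((path) => path !== targetPath && callPattern.test(sources[path]))
      .sort();
  }

  return { target: targetPath, functions, callers };
}

// Example usage with made-up files.
const exampleSources = {
  "legacy/auth.js": "function checkToken(t) { return !!t; }",
  "routes/login.js": "const ok = checkToken(req.token);",
};
console.log(JSON.stringify(buildRefactorMap("legacy/auth.js", exampleSources), null, 2));
```

A function with an empty caller list is a candidate for the safest first refactor; one with many callers deserves extra characterization tests.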

Step 2: Write characterization tests

Legacy code usually has no tests. Before refactoring, I write what Michael Feathers calls "characterization tests" — tests that document what the code currently does, right or wrong.

Now write characterization tests for [function]. These tests should:
- Capture the current behavior exactly (even if that behavior seems wrong)
- Cover all code paths visible in the function
- Use simple assertions, not mocks
- Be named like: test('currently returns null for empty input', ...)

Don't test what it SHOULD do. Test what it DOES do.

This gives you a safety net. If your refactoring changes the output on any path the tests cover, they will catch it.
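Here is a minimal sketch of what characterization cases can look like. The legacy function `legacyDisplayName` is invented for illustration, and the tiny `runCharacterization` helper is a stand-in I'm assuming so the example runs on its own; in a real project these cases would live in your test runner's `test(...)` calls.

```javascript
// Hypothetical legacy function, invented for this example.
function legacyDisplayName(user) {
  if (!user) return null;
  var name = user.first + " " + user.last;
  return name.trim().toUpperCase();
}

// Each case records what the function DOES today, quirks included.
const characterizationCases = [
  { name: "currently returns null for missing user", input: undefined, expected: null },
  {
    name: "currently upper-cases the full name",
    input: { first: "Ada", last: "Lovelace" },
    expected: "ADA LOVELACE",
  },
  {
    // Looks like a bug, but it is the current behavior, so it gets pinned.
    name: "currently leaks UNDEFINED when last name is missing",
    input: { first: "Ada" },
    expected: "ADA UNDEFINED",
  },
];

// Returns the names of cases whose actual output differs from the recorded one.
function runCharacterization(fn, cases) {
  return cases.filter((c) => fn(c.input) !== c.expected).map((c) => c.name);
}

const failures = runCharacterization(legacyDisplayName, characterizationCases);
console.log(failures.length === 0 ? "all characterization cases pass" : failures);
```

The third case is the important one: it pins a quirk so that a later "cleanup" cannot silently change it.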

Step 3: Refactor in small, verifiable chunks

Never ask Claude Code to refactor an entire module at once. The session will hit token limits, context will drift, and you'll end up with half-refactored code.

Instead:

Refactor only [specific function]. Rules:
- Keep the same function signature (don't change inputs/outputs)
- Don't touch any other functions
- Run the characterization tests after each change
- If any test fails, stop and tell me before continuing

One function at a time. Tests passing after each one.
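As a sketch of what "same signature, verified after each change" can mean in practice, the example below compares a legacy function with its refactored version over a fixed set of inputs. Both functions and the helper `findBehaviorDiffs` are hypothetical and exist only for illustration.

```javascript
// Hypothetical legacy function, invented for this example.
function legacyTotal(items) {
  var total = 0;
  for (var i = 0; i < items.length; i++) {
    if (items[i].qty > 0) {
      total = total + items[i].price * items[i].qty;
    }
  }
  return Math.round(total * 100) / 100;
}

// Refactored version: same signature, same inputs and outputs.
function refactoredTotal(items) {
  const sum = items
    .filter((item) => item.qty > 0)
    .reduce((acc, item) => acc + item.price * item.qty, 0);
  return Math.round(sum * 100) / 100;
}

// Returns the inputs for which the two implementations disagree.
// JSON comparison is adequate for JSON-friendly return values only.
function findBehaviorDiffs(oldFn, newFn, inputs) {
  return inputs.filter(
    (input) => JSON.stringify(oldFn(input)) !== JSON.stringify(newFn(input))
  );
}

const sampleInputs = [
  [],
  [{ price: 9.99, qty: 2 }],
  [{ price: 5, qty: 0 }, { price: 1.5, qty: 3 }],
  [{ price: 2, qty: -1 }],
];
const diffs = findBehaviorDiffs(legacyTotal, refactoredTotal, sampleInputs);
console.log(diffs.length === 0 ? "no behavior change detected" : diffs);
```

An empty result only says the two versions agree on these inputs, which is why the input list should mirror the characterization tests from Step 2.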

Step 4: Handle the "why was this written this way" problem

Legacy code often has bizarre patterns that exist for good historical reasons — a workaround for an old browser bug, a hack for a vendor API limitation, a performance optimization for hardware that no longer exists.

I always include this prompt before starting:

Before refactoring, look for comments, commit messages, or patterns that suggest
WHY this code exists in its current form. Flag anything that looks like an
intentional workaround rather than bad code. I'll decide whether to keep those
parts before you refactor them.

This prevents Claude from "helpfully" removing the one hack that keeps everything working.
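A cheap pre-check I could imagine running alongside that prompt is a scan for comments that smell like deliberate workarounds. The sketch below is hypothetical: the function name `findWorkaroundHints` and the marker list are assumptions, and a keyword scan will miss any workaround that nobody commented.

```javascript
// Marker words are an assumption; adjust them to your codebase's habits.
const WORKAROUND_MARKERS = /\b(HACK|WORKAROUND|XXX|FIXME|DO NOT REMOVE)\b/i;

// Returns { line, text } for each comment line containing a marker word.
// Comment detection is deliberately naive: it looks for "//" or "/*".
function findWorkaroundHints(source) {
  return source
    .split("\n")
    .map((text, index) => ({ line: index + 1, text: text.trim() }))
    .filter(
      ({ text }) =>
        (text.includes("//") || text.includes("/*")) && WORKAROUND_MARKERS.test(text)
    );
}

// Example usage with a made-up snippet.
const exampleSource = [
  "function pad(n) {",
  "  // HACK: vendor API rejects single-digit months",
  "  return n < 10 ? '0' + n : '' + n;",
  "}",
].join("\n");
console.log(findWorkaroundHints(exampleSource));
```

Anything this flags goes on the "ask before touching" list for the session.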

Step 5: The rate limit problem

Long refactoring sessions inevitably hit rate limits. This is especially painful mid-refactoring — you're halfway through a function, tests are failing, and the session ends.

My mitigation:

Before starting a long refactoring session:

We're about to do a long refactoring session. At any point where you've completed
a logical chunk (function, class, module), output a brief status report:
- What was refactored
- Current test status
- What remains
- Any open issues

This way if we hit rate limits, I can restart with context.

When you DO hit rate limits, having a separate API endpoint means you can continue immediately without waiting for rate limit resets. I use SimplyLouie as my backup endpoint — it's $2/month and proxies to the same Claude API, so my CLAUDE.md config and workflow carry over exactly.

The status report format means I can start a new session with:

Continuing refactoring session. Status so far: [paste status report]
Next: [what was remaining]

The new session picks up with the essential context intact, even after a rate limit.
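If you want the status report in a consistent shape, a small helper can render it. This is a sketch under assumptions: the field names (`refactored`, `testsPassing`, `testsTotal`, `remaining`, `openIssues`) and both function names are hypothetical, chosen to mirror the four bullets in the prompt above, and the example values are made up.

```javascript
// Renders the four-part status report as plain text.
function formatStatusReport(status) {
  const issues = status.openIssues.length > 0 ? status.openIssues.join("; ") : "none";
  return [
    `Refactored: ${status.refactored.join(", ")}`,
    `Tests: ${status.testsPassing}/${status.testsTotal} passing`,
    `Remaining: ${status.remaining.join(", ")}`,
    `Open issues: ${issues}`,
  ].join("\n");
}

// Builds the prompt used to resume in a fresh session.
function buildResumePrompt(status) {
  const next = status.remaining.length > 0 ? status.remaining[0] : "nothing, all chunks done";
  return [
    "Continuing refactoring session. Status so far:",
    formatStatusReport(status),
    `Next: ${next}`,
  ].join("\n");
}

// Example usage with made-up values.
console.log(
  buildResumePrompt({
    refactored: ["validateToken"],
    testsPassing: 12,
    testsTotal: 12,
    remaining: ["refreshSession", "logout"],
    openIssues: [],
  })
);
```

Saving the latest report to a file after each chunk means the resume prompt is one paste away when a session ends unexpectedly.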

Step 6: Integration testing before merge

Once all unit-level characterization tests pass, I do one final integration check:

The refactoring is complete. Now:
1. Check if any of the calling code (from our map in Step 1) needs updating
2. Look for any type mismatches between our refactored functions and their callers
3. Check for any database transactions that might have changed behavior
4. Write one integration test that exercises the full path through the refactored code

This catches the cross-file issues that unit tests miss.
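For item 4, here is a sketch of an integration check that runs a caller together with the refactored functions it depends on. All three functions are hypothetical stand-ins for an auth path, and `runIntegrationCheck` is an assumed helper rather than part of any test framework.

```javascript
// Hypothetical refactored functions, invented for this example.
function parseToken(header) {
  if (typeof header !== "string" || !header.startsWith("Bearer ")) return null;
  return header.slice("Bearer ".length);
}

function lookupUser(token, users) {
  const known = token !== null && Object.prototype.hasOwnProperty.call(users, token);
  return known ? users[token] : null;
}

// Hypothetical caller from the Step 1 map. It relies on lookupUser
// returning null (not undefined, not a thrown error) for unknown tokens.
function handleRequest(request, users) {
  const user = lookupUser(parseToken(request.headers.authorization), users);
  return user ? { status: 200, user } : { status: 401 };
}

// Exercises the full path and returns a list of failed expectations.
function runIntegrationCheck() {
  const users = { abc123: { id: 1 } };
  const problems = [];
  const ok = handleRequest({ headers: { authorization: "Bearer abc123" } }, users);
  if (ok.status !== 200 || ok.user.id !== 1) problems.push("valid token rejected");
  if (handleRequest({ headers: {} }, users).status !== 401) problems.push("missing header accepted");
  if (handleRequest({ headers: { authorization: "Bearer nope" } }, users).status !== 401) {
    problems.push("unknown token accepted");
  }
  return problems;
}

console.log(runIntegrationCheck());
```

The point of going through `handleRequest` instead of calling the helpers directly is that it exercises the contract between caller and callee, which is where refactors tend to leak.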

The CLAUDE.md additions for refactoring projects

I add these to my CLAUDE.md when starting a refactoring project:

## REFACTORING RULES
- Never refactor more than one function per message
- Always run tests after each change
- Never change function signatures without explicit permission
- Flag any workarounds or unusual patterns before removing them
- Output status report after each completed chunk
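The signature rule is the easiest one to verify mechanically. Below is a hedged sketch that compares parameter lists before and after a change. `extractSignatures` and `findSignatureChanges` are hypothetical names, and the regex handles only plain `function name(a, b)` declarations: it does not understand default values containing parentheses, and it says nothing about return values.

```javascript
// Maps each declared function name to its normalized parameter list.
function extractSignatures(source) {
  const signatures = new Map();
  for (const match of source.matchAll(/function\s+([A-Za-z_]\w*)\s*\(([^)]*)\)/g)) {
    const params = match[2]
      .split(",")
      .map((param) => param.trim())
      .filter(Boolean);
    signatures.set(match[1], params.join(", "));
  }
  return signatures;
}

// Returns names of functions whose parameters changed or that disappeared.
function findSignatureChanges(beforeSource, afterSource) {
  const before = extractSignatures(beforeSource);
  const after = extractSignatures(afterSource);
  return [...before.keys()].filter((name) => after.get(name) !== before.get(name));
}

// Example usage with made-up before/after sources.
const beforeSource = "function login(user, password) {}\nfunction logout(session) {}";
const afterSource = "function login(user, password, otp) {}\nfunction logout(session) {}";
console.log(findSignatureChanges(beforeSource, afterSource));
```

A non-empty result is a prompt to stop and ask whether the signature change was explicitly approved.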

What this workflow solves

| Problem | Solution |
| --- | --- |
| No tests | Characterization tests before touching anything |
| Hidden dependencies | Map phase before refactoring |
| Context loss | Checking for intentional workarounds first (Step 4), plus status reports at every checkpoint |
| Scope creep | One-function-at-a-time rule |
| Rate limits | Backup API endpoint + status report format |

Real results

Using this workflow, I recently refactored a 1,200-line authentication module that hadn't been touched in 4 years. The whole process took about 6 hours of Claude Code sessions. Zero production incidents. All 47 characterization tests still passing.

Without the workflow (my previous attempt, 3 months earlier): I got 60% through the refactoring, hit a rate limit, lost context, and spent 2 hours trying to recover the half-finished state. Eventually rolled back.

The difference: checkpoints, characterization tests, and having a reliable API endpoint when rate limits hit.


If you're doing serious refactoring work with Claude Code and hitting rate limits, SimplyLouie is $2/month — same Claude API, no rate limit anxiety.
