How I Cut a 3-Day Refactor Down to 4 Hours Using a Single AI Prompt Pattern
Last quarter I inherited a 4,000-line Node.js service — no tests, inconsistent error handling, callbacks mixed with promises. My estimate to the team: 3 days minimum.
I finished in 4 hours. Here's exactly what I did differently.
## The Setup
The service was a payment webhook handler. Spaghetti inside, but with well-defined inputs and outputs, so I knew what "correct" looked like. That constraint matters for what came next.
## The Prompt Pattern That Changed the Work
Instead of asking AI to rewrite the file (which produces confident garbage), I used a constraint-first decomposition prompt:
```
Context: I have a Node.js webhook handler, ~4,000 lines, mixed callbacks/promises,
no error boundaries. I cannot change the function signatures — external callers depend on them.

Task: Identify the 5 highest-risk sections I should refactor first, ranked by:
1. Likelihood of silent failure
2. How much other code depends on it

For each section, give me: the specific anti-pattern, a one-paragraph explanation of the risk,
and a before/after snippet using async/await. Do not refactor anything outside those 5 sections.
```
That last line — "do not refactor anything outside those 5 sections" — is the key. It forces the model to act like a scoped reviewer, not an eager rewriter.
## What Came Out
The model returned five ranked sections with concrete snippets. Two of them I already suspected. Three I had missed entirely — one was a .catch() that silently swallowed a database timeout and returned a 200 to Stripe anyway. That bug had been in production for months.
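That bug class is worth seeing in miniature. The snippet below is a hypothetical reconstruction of the anti-pattern and the async/await fix, not the actual production code; `savePayment` and the Express-style `res` are stand-ins:

```js
// Before (reconstruction): a rejected savePayment is swallowed and Stripe still
// gets a 200, so the event is never retried and the failure stays silent.
function handleInvoicePaid(event, res) {
  savePayment(event.data.object)
    .then(() => res.status(200).send("ok"))
    .catch(() => res.status(200).send("ok")); // DB timeout vanishes here
}

// After: same signature, async/await, and an explicit error boundary.
// A 500 tells Stripe to retry the event instead of dropping it.
async function handleInvoicePaid(event, res) {
  try {
    await savePayment(event.data.object);
    res.status(200).send("ok");
  } catch (err) {
    console.error("savePayment failed", err);
    res.status(500).send("retry");
  }
}
```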
I ran each suggested change independently, verified behavior, committed. No big-bang rewrite. No merge conflict nightmare.
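One way to make "verified behavior" concrete is a small characterization test per section: pin the current response for a known-good event, then re-run it after the change. A minimal sketch using Node's built-in test runner; `handleInvoicePaid`, `webhookHandler.js`, and the fake response object are illustrative, not the real code:

```js
// Run with: node --test
import { test } from "node:test";
import assert from "node:assert/strict";
import { handleInvoicePaid } from "./webhookHandler.js";

// Minimal Express-style response stub that records what the handler sent.
function fakeRes() {
  const res = { code: null, body: null };
  res.status = (code) => { res.code = code; return res; };
  res.send = (body) => { res.body = body; return res; };
  return res;
}

test("invoice.paid with a valid payload still returns 200", async () => {
  const res = fakeRes();
  await handleInvoicePaid({ data: { object: { id: "in_123", amount: 4200 } } }, res);
  assert.equal(res.code, 200);
});
```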
## The Numbers
| Metric | Before | After |
|---|---|---|
| Time to refactor | 3 days (estimated) | 4 hours (actual) |
| Test coverage | 0% | 74% (added alongside) |
| Silent failure points found | 2 (known) | 5 (3 new) |
## What Made It Work
- Constraints in the prompt — scope prevents hallucinated rewrites
- Explicit ranking criteria — "likelihood of silent failure" gets better output than "what's bad"
- Incremental commits — treated each AI suggestion as a PR, not a paste job
The difference isn't the AI. It's the prompting discipline. Most developers throw a file at a model and hope. Giving it a job description with guardrails is a different skill — and it compounds fast once you have a small library of patterns like this one.
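The pattern itself fits in a few lines once you factor out the variable parts. Here's a rough sketch of the constraint-first template as a plain function; the field names are my own, not any library's API:

```js
// Constraint-first decomposition prompt as a reusable template.
// The structure mirrors the prompt shown earlier; field names are illustrative.
function constraintFirstPrompt({ context, constraints, count, rankBy, perItem }) {
  return [
    `Context: ${context}`,
    `Constraints: ${constraints.join("; ")}.`,
    `Task: Identify the ${count} highest-risk sections I should refactor first, ranked by:`,
    ...rankBy.map((criterion, i) => `${i + 1}. ${criterion}`),
    `For each section, give me: ${perItem.join("; ")}.`,
    `Do not refactor anything outside those ${count} sections.`,
  ].join("\n");
}

// Rebuilding the webhook-handler prompt from this post:
const prompt = constraintFirstPrompt({
  context: "A Node.js webhook handler, ~4,000 lines, mixed callbacks/promises, no error boundaries.",
  constraints: ["I cannot change the function signatures; external callers depend on them"],
  count: 5,
  rankBy: ["Likelihood of silent failure", "How much other code depends on it"],
  perItem: [
    "the specific anti-pattern",
    "a one-paragraph explanation of the risk",
    "a before/after snippet using async/await",
  ],
});
console.log(prompt);
```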
If you want the full set of prompt patterns I've built around legacy code, reviews, and debugging, I've packaged them up here: AI Prompt Playbook for Developers.