Sathish

Posted on Apr 16

Cursor + Claude: a repeatable refactor workflow

#nextjs #webdev #typescript #javascript

I refactor with Cursor + Claude using a strict, diff-first loop.
I add a “tripwire” test first, so AI can’t lie.
I use one script to block dead imports + circular deps.
I ship smaller PRs by slicing refactors into phases.

Context

I build small SaaS apps. Usually solo. Usually fast.

Refactors are where I lose time. Not feature work. Refactors.

The failure mode is always the same. I ask an AI to “clean up” a folder. It touches 23 files. Tests still pass. Then I hit runtime and get TypeError: Cannot read properties of undefined. Brutal.

So I stopped doing vibe refactors.

Now I do repeatable refactors. Boring ones. With tripwires. With diffs I can actually review. Cursor + Claude still helps a lot. But only inside a tight workflow.

1) I write a tripwire test first. Always.

If I can’t prove behavior, I can’t refactor.

I don’t mean “increase coverage.” I mean a single test that fails loudly if I break the contract.

Most of my refactors are around data-shape drift. Objects coming from APIs. DB rows. JSON blobs. The tripwire is usually “given input X, output must be Y.”

Here’s a tiny example. A toUserDTO() function. I’ve broken this exact thing by renaming fields during a refactor.

// user.mapper.ts
export type UserRow = {
  id: string;
  email: string;
  created_at: string; // ISO
};

export type UserDTO = {
  id: string;
  email: string;
  createdAt: string;
};

export function toUserDTO(row: UserRow): UserDTO {
  // Keep mapping explicit. Avoid spreading unknown fields.
  return {
    id: row.id,
    email: row.email,
    createdAt: row.created_at,
  };
}

// user.mapper.test.ts
import { describe, expect, it } from "vitest";
import { toUserDTO } from "./user.mapper";

describe("toUserDTO", () => {
  it("maps snake_case to camelCase without changing values", () => {
    const dto = toUserDTO({
      id: "u_123",
      email: "a@b.com",
      created_at: "2026-04-16T10:11:12.000Z",
    });

    expect(dto).toEqual({
      id: "u_123",
      email: "a@b.com",
      createdAt: "2026-04-16T10:11:12.000Z",
    });
  });
});

Now the AI can refactor structure. Rename files. Extract helpers.

But it can’t “accidentally” change behavior without me noticing. The test becomes a landmine.

One thing that bit me — I used to ask Claude to “improve the test” too. Don’t. The test is the contract. Lock it.
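Since most of this breakage is data-shape drift, a second tripwire that only watches the key set can sit next to the value test. Here’s a minimal sketch. keyDrift and USER_DTO_KEYS are made-up names for illustration, not part of vitest or of the mapper above, and the mapper is repeated only so the snippet runs on its own.

```typescript
// Sketch only: keyDrift and USER_DTO_KEYS are hypothetical helpers.
type UserRow = { id: string; email: string; created_at: string };
type UserDTO = { id: string; email: string; createdAt: string };

function toUserDTO(row: UserRow): UserDTO {
  return { id: row.id, email: row.email, createdAt: row.created_at };
}

// The contract, written down once. Changing this list should be deliberate.
const USER_DTO_KEYS = ["id", "email", "createdAt"] as const;

// Reports keys that went missing and keys that leaked in.
function keyDrift(
  actual: object,
  expectedKeys: readonly string[],
): { missing: string[]; extra: string[] } {
  const actualKeys = Object.keys(actual);
  return {
    missing: expectedKeys.filter((k) => !actualKeys.includes(k)),
    extra: actualKeys.filter((k) => !expectedKeys.includes(k)),
  };
}

const drift = keyDrift(
  toUserDTO({
    id: "u_123",
    email: "a@b.com",
    created_at: "2026-04-16T10:11:12.000Z",
  }),
  USER_DTO_KEYS,
);
```

Inside a vitest test, the assertion would be that both missing and extra are empty. If a refactor starts spreading the row into the DTO, created_at shows up under extra and the test fails.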

2) I force Cursor to work in small, named phases

Big refactors fail because the AI loses the plot.

Cursor will happily apply a sweeping edit across a repo. Then you’re stuck reviewing 600 lines of diff with your brain half off.

My fix is dumb. I name phases and I don’t let the AI cross them.

Phase examples:

Phase A: move files, keep exports identical
Phase B: replace barrel exports
Phase C: delete dead code

I also keep a “refactor plan” file in the repo while I’m working. Not forever. Just during the refactor. Cursor’s context gets better, and I stop re-explaining myself.

<!-- REFACTOR_PLAN.md -->
# Refactor: src/lib -> src/core

## Phase A (no behavior changes)
- [ ] Move files under src/core
- [ ] Keep public exports identical
- [ ] Add temp re-exports to avoid breaking imports

## Phase B
- [ ] Update internal imports to new paths
- [ ] Remove temp re-exports

## Phase C
- [ ] Delete dead files
- [ ] Run typecheck + tests

## Non-goals
- No renaming DTO fields
- No changing runtime behavior

Then I prompt like this (inside Cursor):

“Do Phase A only. Don’t touch behavior. Keep exports identical. Update REFACTOR_PLAN.md checkbox when done.”

It sounds silly.

It works because I can review Phase A diffs quickly. Mostly file moves and export shims. If it breaks, it breaks in a small surface area.

And yeah — I’ve spent 4 hours on a refactor where I skipped this. Most of it was wrong.
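Phase boundaries can also be checked mechanically, not just by prompt. Here’s a rough sketch of a phase guard, assuming each phase is only allowed to touch certain path prefixes. The PHASES table, the prefixes, and outOfScope are all hypothetical. Adjust them to whatever the plan file says.

```typescript
// Sketch only: phase names and path prefixes below are hypothetical.
type PhaseRule = {
  // Path prefixes this phase is allowed to touch.
  allowedPrefixes: string[];
};

const PHASES: Record<string, PhaseRule> = {
  A: { allowedPrefixes: ["src/core/", "src/lib/", "REFACTOR_PLAN.md"] },
  C: { allowedPrefixes: ["src/lib/"] },
};

// Returns the changed files that fall outside the phase's allowed paths.
function outOfScope(phase: string, changedFiles: string[]): string[] {
  const rule = PHASES[phase];
  if (!rule) throw new Error(`unknown phase: ${phase}`);
  return changedFiles.filter(
    (file) => !rule.allowedPrefixes.some((prefix) => file.startsWith(prefix)),
  );
}

// Example: a Phase A change set that also touched a component.
const strays = outOfScope("A", [
  "src/core/user.mapper.ts",
  "src/lib/user.mapper.ts",
  "src/components/Header.tsx",
]);
```

The file list could come from git diff --name-only. Anything in strays is a file the phase had no business touching, which is a cue to revert it before reviewing the rest.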

3) I block dead imports and circular deps with one script

The most common refactor breakage I ship is import rot.

Stuff like:

a file moved but an old path still compiles somewhere
a circular dependency introduced by a new index.ts
a barrel export that re-exports server-only code into the client bundle

TypeScript won’t always save you. Especially with path aliases, dynamic imports, or unused exports.

So I run a dependency check script locally before I even open a PR.

This uses dependency-cruiser. It’s not perfect. But it catches the dumb stuff.

# install
npm i -D dependency-cruiser

// scripts/depcheck.mjs
import { cruise } from "dependency-cruiser";

const result = await cruise(["src"], {
  // Keep it strict. Refactors are when this matters.
  ruleSet: {
    forbidden: [
      {
        name: "no-circular",
        severity: "error",
        from: {},
        to: { circular: true },
      },
      {
        name: "no-orphans",
        severity: "warn",
        from: { orphan: true },
        to: {},
      },
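      {
        // Nothing outside src/legacy may import from it.
        // pathNot keeps legacy files free to import each other.
        name: "no-legacy-imports",
        severity: "error",
        from: { path: "^src/", pathNot: "^src/legacy/" },
        to: { path: "^src/legacy/" },
      },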
    ],
  },

  // Make output readable in CI logs.
  outputType: "err",
  validate: true,
});

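// The "err" reporter prints a summary line even when nothing is wrong,
// so non-empty output can't be the failure signal. exitCode counts
// error-severity violations; warn-level ones (orphans) print but don't fail.
if (typeof result.output === "string" && result.output.trim()) {
  console.error(result.output);
}

if (result.exitCode !== 0) {
  process.exit(1);
}

console.log("depcheck: ok");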

// package.json
{
  "scripts": {
    "depcheck": "node scripts/depcheck.mjs"
  }
}

This catches cycles like:

src/core/index.ts -> src/core/http.ts -> src/core/index.ts

And orphans like:

src/utils/oldThing.ts that nobody imports anymore and that imports nothing itself

Cursor + Claude helps here too. When depcheck fails, I paste the exact error output and ask for the smallest fix.

Smallest. Fix.

Not “refactor the architecture.”
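To make the no-circular and no-orphans rules less magical, here’s a toy version of both checks over a hand-written import graph. This is a sketch for intuition only. It is not how dependency-cruiser implements them, and ImportGraph, findCycle, and findOrphans are hypothetical names.

```typescript
// Toy import graph: module path -> modules it imports.
type ImportGraph = Record<string, string[]>;

// Depth-first search that returns the first cycle found, or null.
function findCycle(graph: ImportGraph): string[] | null {
  const done = new Set<string>(); // fully explored, known cycle-free
  const stack: string[] = []; // current import chain

  const visit = (mod: string): string[] | null => {
    const seenAt = stack.indexOf(mod);
    // Already on the current chain: we walked in a circle.
    if (seenAt !== -1) return [...stack.slice(seenAt), mod];
    if (done.has(mod)) return null;
    stack.push(mod);
    for (const dep of graph[mod] ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    done.add(mod);
    return null;
  };

  for (const mod of Object.keys(graph)) {
    const cycle = visit(mod);
    if (cycle) return cycle;
  }
  return null;
}

// Modules that import nothing and that nothing imports.
function findOrphans(graph: ImportGraph): string[] {
  const imported = new Set<string>();
  for (const mod of Object.keys(graph)) {
    for (const dep of graph[mod] ?? []) imported.add(dep);
  }
  return Object.keys(graph).filter(
    (mod) => (graph[mod] ?? []).length === 0 && !imported.has(mod),
  );
}

const graph: ImportGraph = {
  "src/core/index.ts": ["src/core/http.ts"],
  "src/core/http.ts": ["src/core/index.ts"],
  "src/utils/oldThing.ts": [],
};
```

On this graph, findCycle walks index.ts to http.ts and back to index.ts, and findOrphans flags oldThing.ts. The real tool builds the graph by parsing imports, which is the part you don’t want to write yourself.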

4) I do diff reviews like a paranoid person

AI makes code changes fast. Reviewing is the bottleneck.

So I changed how I review.

Rules I follow:

I don’t review by file tree. I review by diff chunks.
I search the diff for “export * from” and “index.ts”. Every time.
I grep for “any” and “as unknown as”. If those appear during a refactor, something’s off.

This is my local helper script. It’s basic. It saves me from myself.

# scripts/refactor-review.sh
set -euo pipefail

# 1) Show summary first.
git status --porcelain

# printf, because bash's echo prints "\n" literally unless given -e.
printf '\n--- DIFF STATS ---\n'
git diff --stat

printf '\n--- BARRELS / RE-EXPORTS ---\n'
# Find new/changed barrel exports in the diff
# (works even if ripgrep isn't installed)
git diff | grep -E "^\+.*export \* from|^\+.*from \"\.\/index\"" || true

printf '\n--- TYPE ESCAPES ---\n'
# Catch common 'make TS shut up' edits
# These are sometimes legit. Often they're hiding a broken import.
git diff | grep -E "^\+.*\bany\b|^\+.*as unknown as" || true

printf '\n--- DONE ---\n'

Run it:

bash scripts/refactor-review.sh

When that script prints anything under “TYPE ESCAPES”, I stop. I go fix the root cause.

Because the root cause is usually: I moved a file, broke an import, and the AI patched it with a cast.

That’s not a fix. That’s debt.
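If you’d rather run the same check from a Node script or CI step, the grep calls above translate to a small scanner over unified diff text. A sketch under assumptions: scanDiff, Finding, and the two patterns are hypothetical and mirror the shell script, nothing more.

```typescript
// Sketch only: scanDiff and its patterns are hypothetical.
type Finding = { kind: "barrel" | "type-escape"; line: string };

const BARREL = /export \* from|from ["']\.\/index["']/;
const TYPE_ESCAPE = /\bany\b|as unknown as/;

// Scans unified diff text and flags added lines only.
function scanDiff(diffText: string): Finding[] {
  const findings: Finding[] = [];
  for (const raw of diffText.split("\n")) {
    // Added lines start with "+". "+++ b/path" is a file header, not code.
    if (!raw.startsWith("+") || raw.startsWith("+++")) continue;
    const line = raw.slice(1);
    if (BARREL.test(line)) findings.push({ kind: "barrel", line });
    if (TYPE_ESCAPE.test(line)) findings.push({ kind: "type-escape", line });
  }
  return findings;
}

const sampleDiff = [
  "diff --git a/src/core/http.ts b/src/core/http.ts",
  "--- a/src/core/http.ts",
  "+++ b/src/core/http.ts",
  "@@ -1,3 +1,4 @@",
  '+export * from "./user.mapper";',
  "+const client = createClient() as unknown as HttpClient;",
  "-const client = createClient();",
  " export { client };",
].join("\n");

const findings = scanDiff(sampleDiff);
```

Here the scanner flags the new barrel export and the double cast, and ignores the removed line and the file header. Unlike the grep version, it skips “+++” headers, so a file path containing the word “any” won’t trip it.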

Results

My refactors got smaller.

Before this workflow, a “cleanup” often turned into 20 to 40 files changed in one sitting, with at least 1 runtime-only bug that I’d catch after deploying to preview.

This week, I did 3 refactor batches. Each stayed under 12 files changed. I hit npm run depcheck 9 times total and it caught 2 circular dependencies and 6 orphan files before they hit git history.

The bigger win: review time. I went from staring at diffs for ~45 minutes per refactor to ~15 minutes, because the changes were phased and the tripwire test stayed stable.

Key takeaways

Write one tripwire test before touching structure.
Refactor in named phases. Cursor needs boundaries.
Add a dependency check script. Run it a lot.
Treat “any” and double-casts as refactor smoke.
Keep exports boring until the very end.

Closing

Cursor + Claude can refactor fast. That’s the easy part.

The hard part is keeping refactors reviewable and reversible, especially when you’re solo and nobody’s catching your mistakes.

What’s your current “tripwire” for refactors — a specific test, a script like depcheck, or something else you run every single time?

Top comments (1)

APIBuilderHQ • Apr 20

The diff-first + tripwire test approach makes refactors feel much safer. Adding a dedicated “review context” layer before letting the model touch code has saved me from a lot of over-engineering. How do you usually catch cases where the AI starts going off-track during a refactor?