- I don’t “vibe code” blind. I run a checklist.
- I make Claude review diffs, not ideas.
- I automate checks: types, lint, tests, secret scan.
- I ship fewer regressions. With receipts.
Context
I build small SaaS projects. Usually solo. Usually fast.
Cursor + Claude makes that speed possible. Also dangerous.
My first month with AI-assisted coding was… brutal. I shipped code that looked right, compiled cleanly, and still broke an auth flow because I forgot one edge case. I spent four hours chasing it, most of them down the wrong path.
So I stopped asking AI to “build features”. I started using it like a very fast reviewer.
This post is my exact checklist. The thing I run before I merge. It’s boring. That’s the point.
1) I force Claude to review the diff. Not my vibes.
If I paste a whole file, Claude hallucinates context.
If I paste a diff, it behaves like a reviewer.
In Cursor, I select the git diff chunk. Then I ask one question:
“Review this diff. Find bugs. Find missing cases. Suggest tests. Don’t rewrite style.”
But I also give it structure. Otherwise it rambles.
Here’s the prompt template I keep in a snippet. I literally paste this.

```
You are reviewing a PR diff.

Rules:
- Focus on correctness, edge cases, security, and performance.
- Call out any behavior change.
- If you suggest a fix, show the minimal patch.
- Suggest at least 2 test cases.
- Don’t suggest refactors unless required for correctness.

Input:
<paste the diff here>

Output format:
1) High-risk issues (with line refs)
2) Medium-risk issues
3) Tests I should add
4) Minimal patch (only if needed)
```
One thing that bit me — Claude will “approve” code that fails typecheck if you don’t tell it you’re using TypeScript strict mode.
So I add one line when needed:
```
Project: TypeScript "strict": true. Runtime: Node 20.
```
That’s it. No poetry.
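For reference, the strict setup behind that one line looks roughly like this in `tsconfig.json` (a minimal sketch — `target` and `module` are my assumptions for Node 20, not a universal recipe):

```json
{
  "compilerOptions": {
    "strict": true,
    "noEmit": true,
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext"
  }
}
```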
2) I run a local CI script. Every time.
I don’t trust myself to remember commands.
So I made one script. It’s my merge gate.
This is tuned for Next.js + TypeScript. Works fine for plain Node too.
Create scripts/ci-local.mjs:
```js
#!/usr/bin/env node
import { execSync } from "node:child_process";

const run = (cmd) => {
  console.log(`\n$ ${cmd}`);
  execSync(cmd, { stdio: "inherit" });
};

try {
  // Fast fail first
  run("node -v");
  run("npm -v");

  // Deterministic install (CI-like)
  run("npm ci");

  // Quality gates
  run("npm run lint");
  run("npm run typecheck");
  run("npm test");

  console.log("\n✅ local CI passed");
} catch {
  console.error("\n❌ local CI failed");
  process.exit(1);
}
```
Then in package.json:
```json
{
  "scripts": {
    "typecheck": "tsc -p tsconfig.json --noEmit",
    "ci:local": "node scripts/ci-local.mjs"
  }
}
```
Now my flow is simple.
- Make changes with Cursor.
- Ask Claude to review the diff.
- Run `npm run ci:local`.
If it fails, I fix first. No new prompts. No scope creep.
And yeah, npm ci is slower. I still do it. I want the same pain CI will feel.
3) I make AI prove input/output behavior with tests
AI is great at writing code.
AI is also great at confidently changing behavior.
So I pin behavior with tiny tests. Always.
Even for “small” utility functions.
Here’s a real example pattern: normalize user input. Whitespace. Unicode. Case.
I’ve shipped bugs here. Twice.
src/lib/normalize.ts:
```ts
// Minimal, deterministic normalization.
// Keep it boring. Tests do the talking.
export function normalizeEmail(raw: string): string {
  return raw
    .trim()
    .toLowerCase()
    .normalize("NFKC");
}
```
src/lib/normalize.test.ts (Vitest):
```ts
import { describe, expect, it } from "vitest";
import { normalizeEmail } from "./normalize";

describe("normalizeEmail", () => {
  it("trims and lowercases", () => {
    expect(normalizeEmail(" Foo@Example.Com ")).toBe("foo@example.com");
  });

  it("normalizes unicode", () => {
    // Full-width Latin chars -> normal Latin (NFKC)
    expect(normalizeEmail("Ｆｏｏ＠Ｅｘａｍｐｌｅ．Ｃｏｍ")).toBe("foo@example.com");
  });
});
```
Cursor + Claude writes these tests fast.
But I decide the cases.
That’s the trick. Don’t let AI pick what “correct” means.
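To make that concrete, here are the kinds of cases I pick myself. The same `normalizeEmail` is inlined so this snippet stands alone; the ligature and non-breaking-space inputs are my own additions:

```typescript
// Same normalizeEmail as above, inlined so this snippet runs standalone.
function normalizeEmail(raw: string): string {
  return raw.trim().toLowerCase().normalize("NFKC");
}

// "ﬁ" is the single U+FB01 ligature character; NFKC expands it to "fi".
console.assert(normalizeEmail("ﬁx@Test.Com") === "fix@test.com");

// "\u00A0" is a non-breaking space; String.prototype.trim() removes it.
console.assert(normalizeEmail("\u00A0a@b.co\u00A0") === "a@b.co");
```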
If you don’t have tests set up, add Vitest (`npm i -D vitest`) and wire it into `package.json`: `"test": "vitest run"`.
Then you’re not guessing anymore.
4) I scan for secrets because AI loves copying them
This one’s embarrassing.
I once pasted a `.env` value into chat. Then I copied code back. Then I almost committed it.
No drama. Just real life.
Now I run a secret scan locally. Same command every time.
I use gitleaks because it’s dead simple.
Install:
- macOS: `brew install gitleaks`
Then run:
```bash
# Scan the repo (tracked + untracked)
gitleaks detect --source . --no-git --redact
```
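It will sometimes flag known-fake keys in test fixtures. gitleaks reads a `.gitleaks.toml` at the repo root; here’s a minimal allowlist sketch (the fixtures path is an example — point it at wherever your fake keys live):

```toml
# .gitleaks.toml — keep the default rules, skip known-safe paths
[extend]
useDefault = true

[allowlist]
description = "Test fixtures with intentionally fake keys"
paths = ['''tests/fixtures/.*''']
```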
Want it automated? Add a script.
scripts/secret-scan.mjs:
```js
#!/usr/bin/env node
import { execSync } from "node:child_process";

try {
  execSync("gitleaks version", { stdio: "inherit" });
} catch {
  console.error("gitleaks not found. Install it first: brew install gitleaks");
  process.exit(1);
}

try {
  execSync("gitleaks detect --source . --no-git --redact", { stdio: "inherit" });
  console.log("✅ secret scan passed");
} catch {
  console.error("❌ secret scan failed");
  process.exit(1);
}
```
Then chain it into local CI:

```js
run("node scripts/secret-scan.mjs");
```
This catches the dumb stuff.
The stuff you only notice after pushing.
5) I keep a “dumb log” so Claude stops repeating mistakes
Cursor chat resets. My brain resets too.
So I keep a file: NOTES.md.
Not docs. Not marketing. Just landmines.
Example entries from my real projects:
- “Next.js route handlers: don’t return 200 with empty body. It becomes a silent bug.”
- “Supabase RLS: always test with anon key + logged-in user. Both.”
- “Zod refine: return boolean, not string. I lost 40 minutes.”
Then I feed the relevant lines to Claude when it’s about to touch that area.
It sounds silly.
It saves hours.
And it makes AI feel consistent. Like a teammate that remembers.
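If you want to script the “feed the relevant lines” step, a tiny filter does it. This is a hypothetical helper — `filterNotes` is my name for it, not an existing tool:

```typescript
// Hypothetical helper (my own name, not a real tool): pull only the
// NOTES.md lines that mention the area Claude is about to touch.
export function filterNotes(notes: string, keyword: string): string[] {
  const needle = keyword.toLowerCase();
  return notes
    .split("\n")
    .filter((line) => line.toLowerCase().includes(needle));
}

const notes = [
  "- Supabase RLS: always test with anon key + logged-in user. Both.",
  "- Zod refine: return boolean, not string.",
].join("\n");

// Everything Supabase-related, ready to paste into the chat.
console.log(filterNotes(notes, "supabase"));
```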
Results
Before I used this checklist, I’d merge 6–10 PRs a week and usually spend 2–3 hours per week debugging avoidable regressions. Stuff like “works locally, breaks in prod” or “edge case returns 200 with wrong payload”.
After I started doing diff-based reviews + ci:local + 2–4 focused tests per change, that dropped to about 20–40 minutes a week. Not zero. Never zero. But way less chaos.
Also: I’ve caught 3 secret leaks locally with gitleaks that absolutely would’ve landed in git.
Key takeaways
- Make Claude review diffs. It reviews code, not stories.
- One command to rule your local checks: lint, types, tests.
- Write tests to freeze behavior. Especially input normalization.
- Run a secret scan because AI will paste whatever you paste.
- Keep a tiny landmine log (`NOTES.md`). Feed it back later.
Closing
Cursor + Claude makes me faster. It also makes me overconfident.
The checklist fixes that. Mostly.
If you already do AI-assisted coding: what’s your merge gate command, and what’s the one check you refuse to skip?
Top comments (1)
Love seeing practical checklists like this. Most AI code review advice targets enterprise teams with big budgets. Solo developers and small teams need patterns that work at their scale.
One addition I'd make to the checklist: risk-based triage before review. Not every PR needs the same depth of review. Before running your full checklist, score the change on two dimensions: (1) What did it touch? Config files and tests are low risk. Authentication, payments, and database migrations are high risk. (2) How complex is the change? A 10-line focused fix is different from a 200-line refactor across multiple files.
For low-risk + low-complexity changes, a quick skim might be enough. For high-risk + high-complexity, that's where you want the full Cursor + Claude review workflow. This simple triage saves significant time when you're reviewing 10+ PRs a day as a solo developer.
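That two-axis triage could be sketched like so (the area list and the 100-line cutoff are illustrative assumptions, not the commenter's exact numbers):

```typescript
// Illustrative triage sketch: score a change before deciding review depth.
// HIGH_RISK_AREAS and the line threshold are assumptions — tune per project.
const HIGH_RISK_AREAS = ["auth", "payments", "migrations"];

export function triageChange(
  touchedPaths: string[],
  linesChanged: number
): "quick-skim" | "full-review" {
  const highRisk = touchedPaths.some((p) =>
    HIGH_RISK_AREAS.some((area) => p.includes(area))
  );
  const highComplexity = linesChanged > 100; // arbitrary cutoff
  return highRisk || highComplexity ? "full-review" : "quick-skim";
}
```

A 10-line fix to `README.md` comes back `"quick-skim"`; anything touching auth, or any large diff, gets the full review workflow.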
Another pattern that's worked well: keep a trust log for your AI tools. Track which tool generated each PR and whether it needed significant revision. Over time you build intuition for what each tool handles well and where it consistently needs human intervention. That's informal trust scoring, and it scales better than trying to deeply review everything.