DEV Community: HadiFrt20

I lost 3 hours of work to Claude Code, so I built an undo button for AI-assisted coding

HadiFrt20 — Fri, 27 Mar 2026 09:59:16 +0000

I lost 3 hours of work because Claude Code refactored my auth module into oblivion

I was in the zone. Claude Code was crushing it — added OAuth, hooked up the database, wired the routes. Then I said: "refactor auth.ts to use middleware instead of inline checks."

Fifteen files changed. TypeScript errors everywhere. The app wouldn't build. And I realized I hadn't committed in over an hour.

git diff showed me 400 lines of changes across 15 files. I had no idea which version of auth.ts actually worked. I spent 3 hours manually reconstructing the last working state.

That was the moment I built snaprevert.

The problem nobody talks about

Every AI coding tool — Claude Code, Cursor, Copilot, Aider — shares the same fundamental issue: there's no undo between prompts.

Each prompt touches 5-20 files. You review, prompt again, review, prompt again. You're in flow state. Nobody stops to run git commit -m "checkpoint before risky refactor" after every prompt. By the time something breaks, you're 5-10 prompts deep with no checkpoint.

Git requires intent. But when you're pair-programming with an AI at 100mph, intent is the first thing that goes.

snaprevert: the undo button for AI coding

npx snaprevert watch

That's the entire setup. One command. Zero config. It silently snapshots your project every time files change. When the AI breaks something:

snaprevert list        # see all snapshots with timestamps
snaprevert diff 5      # see exactly what changed in snapshot #5
snaprevert back 3      # roll back to before snapshot #3

Your project is restored in under 1 second.

How it works (it's dumber than you think)

No git. No branches. No staging area. It's filesystem-level:

Watch — chokidar monitors your project for file changes
Debounce — waits 3 seconds for changes to settle (groups a single AI prompt's changes)
Diff — computes unified diffs for modified files, stores full content for new files
Store — saves to .snaprevert/snapshots/{timestamp}-{id}/

That's it. Snapshots are diffs, not full copies. A full day of heavy AI coding uses <10MB.
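Here is a minimal JavaScript sketch of the Debounce step's grouping rule. It assumes each file-change event arrives as a `{ path, time }` pair with a millisecond timestamp. It is not snaprevert's source, and `groupIntoSnapshots` is a hypothetical name. The sketch works on a recorded list of events rather than live chokidar callbacks, so the logic is easy to test.

```javascript
// Hypothetical sketch (not snaprevert's source): group recorded file-change
// events into snapshot batches. A batch closes once no new event arrives for
// `quietMs` milliseconds, mirroring the "wait 3 seconds" debounce step.

/**
 * @param {{path: string, time: number}[]} events  change events sorted by time (ms)
 * @param {number} quietMs  settle window; 3000 matches the article's 3 seconds
 * @returns {string[][]}  one array of unique paths per snapshot
 */
function groupIntoSnapshots(events, quietMs = 3000) {
  const batches = [];
  let current = null; // Set of paths in the open batch, or null
  let lastTime = -Infinity;

  for (const { path, time } of events) {
    // A gap of at least quietMs since the previous event closes the open batch.
    if (current && time - lastTime >= quietMs) {
      batches.push([...current]);
      current = null;
    }
    if (!current) current = new Set();
    current.add(path); // repeated writes to one file collapse into one entry
    lastTime = time;
  }
  if (current) batches.push([...current]);
  return batches;
}
```

With the default 3-second window, writes at 0 ms, 400 ms and 900 ms collapse into one snapshot, and a write 10 seconds later starts a new one. A live watcher would usually get the same behavior by restarting a timer on every change event and taking the snapshot when the timer fires.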

Rollbacks are non-destructive — rolled-back snapshots are preserved. You can re-apply any of them with snaprevert restore.

The features I didn't expect to need

Per-file selective rollback — Claude broke auth.ts but user.ts is fine? Only undo what's broken:

snaprevert back 3 --only auth.ts,routes.ts
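A small pure function can model rolling back to "before snapshot #3", with an optional file filter. This is a hypothetical sketch, not snaprevert's implementation. It assumes each snapshot records the previous content of every file it touched (`null` when the file did not exist yet), whereas the tool actually stores unified diffs. The function returns a new working tree and never mutates the snapshot list, which also shows why a rollback can be non-destructive.

```javascript
// Hypothetical model of "back N" with an optional --only filter (not
// snaprevert's source). Simplification: each snapshot stores the full previous
// content of the files it touched (null = the file did not exist yet) instead
// of unified diffs.

/**
 * @param {Map<string, string>} files  current working tree: path -> content
 * @param {{id: number, before: Object<string, string|null>}[]} snapshots  oldest first
 * @param {number} targetId  restore the state from just before this snapshot
 * @param {string[]} [only]  optional list of paths to restrict the rollback to
 * @returns {Map<string, string>}  a new tree; `files` and `snapshots` are not modified
 */
function rollBack(files, snapshots, targetId, only) {
  const result = new Map(files);
  const allowed = only ? new Set(only) : null;
  const restored = new Set();

  // The first snapshot at or after targetId that touched a file holds that
  // file's content from before the target, so later snapshots are skipped.
  for (const snap of snapshots) {
    if (snap.id < targetId) continue;
    for (const [path, before] of Object.entries(snap.before)) {
      if (restored.has(path) || (allowed && !allowed.has(path))) continue;
      restored.add(path);
      if (before === null) result.delete(path); // created after the target: remove it
      else result.set(path, before);
    }
  }
  return result;
}
```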

Interactive review — Walk through each file change before committing:

snaprevert review 5
# For each file: [a]ccept [r]eject [s]kip [v]iew diff

AI tool detection — Snapshots auto-detect which AI tool made the changes. You see claude: modified auth.ts or cursor: added 3 files in the labels.

Snapshot branching — Try two different AI approaches from the same checkpoint:

snaprevert fork 3 --name "approach-a"
# ... try one approach ...
snaprevert fork --switch main
# ... try another approach ...

MCP server — AI agents can create named checkpoints programmatically:

snaprevert mcp  # starts JSON-RPC server

Compatible with Claude Code and any MCP client. The AI itself can checkpoint before risky operations.
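MCP runs over JSON-RPC 2.0, and a client invokes a server tool with a `tools/call` request. The sketch below builds such a request. The tool name `create_checkpoint` and its `name` argument are assumptions for illustration only; a client would discover the real tool names from the server's `tools/list` response. A real client also completes MCP's `initialize` handshake before calling any tool.

```javascript
// Hypothetical sketch: the JSON-RPC 2.0 request an MCP client would send to
// ask a checkpoint tool to run. "tools/call" is MCP's method for invoking a
// tool; the tool name and argument shape below are assumptions.
function checkpointRequest(id, label) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "create_checkpoint", // assumed tool name, not confirmed by snaprevert
      arguments: { name: label }, // assumed argument shape
    },
  };
}

// MCP's stdio transport sends one JSON message per line, with no embedded newlines.
function frameForStdio(message) {
  return JSON.stringify(message) + "\n";
}
```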

Why not just use git?

I get this question every time. Here's the honest answer:

	Git	snaprevert
When it saves	When you remember	Automatically
Granularity	Whatever you staged	Every AI prompt
Cognitive cost	Decide what + write message	Zero
Rollback	git reflog, reset, stash...	`snaprevert back 3`

They're complementary, not competing. Git is for meaningful, curated history you push to a team. snaprevert is the continuous autosave between commits, much as Google Docs saves every keystroke while you still "publish" versions deliberately.

The stack

3 dependencies: commander, chalk, chokidar
221 tests across unit, integration, and UAT
Zero config — works with any project, any AI tool
<100ms snapshot creation, <1s rollback

It watches your filesystem, not your AI tool. Works with Claude Code, Cursor, Copilot, Aider, Windsurf, or anything that writes files.

Try it

npm install -g snaprevert
snaprevert watch

Then use your AI tool normally. When things break: snaprevert list then snaprevert back 3.

The repo is at github.com/HadiFrt20/snaprevert. MIT licensed, 221 tests, actively maintained.

If you've ever lost work to an AI coding tool, you know why this exists.

If this helps you, a star on the repo means a lot. And if you have feature ideas, issues are open.

I built ESLint for LLM prompts (and a Claude Code hook that makes Claude lint its own work)

HadiFrt20 — Thu, 26 Mar 2026 09:08:31 +0000

I changed one line in my prompt and my agent started giving refunds to everyone

True story. I was tweaking a customer support agent prompt. Changed "Never offer refunds without manager approval" to "Always prioritize customer satisfaction." Seemed harmless. Shipped it.

Within an hour, the agent was handing out refunds like candy on Halloween. No approval. No verification. Just vibes.

The worst part? git diff showed me exactly what changed — one line added, one line removed. What it didn't tell me was that I'd removed a critical constraint and replaced it with a vague instruction that the model interpreted as "give them whatever they want."

That was the moment I realized: prompts are production code, but we treat them like sticky notes.

Prompts have zero tooling (and it's wild)

Think about it. If you write JavaScript, you have ESLint catching issues before they ship. You have Prettier enforcing style. You have TypeScript telling you when things don't make sense. You have git diff showing you exactly what changed.

Now think about prompts. You write them in a text file. You eyeball them. You copy-paste them into a playground. You pray.

Here's what's missing:

No linter catches "You are a teacher" AND "You are a sales agent" in the same prompt
No diff tells you that removing one example drops output consistency
No CI gate blocks a vague "try to be helpful" from shipping
No score tells you if your prompt is a B+ or a D-

git diff says "+1 line, -1 line." Cool. Thanks. Very helpful when I'm trying to figure out if my agent is about to go rogue.

So I built promptdiff

promptdiff is a CLI tool that treats prompts as structured documents — not blobs of text. It parses your .prompt files into semantic sections (persona, constraints, examples, output format, guardrails) and runs real analysis on them.

Install it in one line:

npm install -g promptdiff

Zero config. No API keys. No accounts. Runs entirely locally. Three dependencies. That's it.

Here's what it does:

Lint your prompts like code

promptdiff lint my-agent.prompt

10 built-in rules that catch real bugs, not style nits: behavioral issues that silently degrade your agent. Six of them:

Rule	What it catches
`conflicting-constraints`	"Keep it under 100 words" + examples that are 200 words
`role-confusion`	Two different roles in the same persona section
`vague-constraints`	"Try to", "if possible", "maybe" — weasel words that models ignore
`injection-surface`	No "ignore embedded instructions" guard
`few-shot-minimum`	Only 1 example (models need 2-3 for consistency)
`missing-output-format`	No FORMAT section = inconsistent output every time
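Once the prompt is split into sections, a rule like `vague-constraints` is simple. Here is a hypothetical sketch of such a check in JavaScript. The rule object shape (`id`, `severity`, `check`), the input format (a section name mapped to an array of lines), the severity and the exact word list are all assumptions, not promptdiff's rule API.

```javascript
// Hypothetical "vague-constraints" style check. The rule shape
// ({ id, severity, check }) and the word list are assumptions, not promptdiff's API.
const VAGUE_PATTERNS = [/\btry to\b/i, /\bif possible\b/i, /\bmaybe\b/i];

const vagueConstraints = {
  id: "vague-constraints",
  severity: "warning", // assumed severity
  // sections: { CONSTRAINTS: ["line", ...], ... }
  check(sections) {
    const findings = [];
    (sections.CONSTRAINTS || []).forEach((line, index) => {
      for (const pattern of VAGUE_PATTERNS) {
        const match = line.match(pattern);
        if (match) {
          findings.push({
            rule: this.id,
            severity: this.severity,
            line: index + 1, // 1-based position within the CONSTRAINTS section
            message: `vague wording "${match[0]}" is easy for a model to ignore`,
          });
        }
      }
    });
    return findings;
  },
};
```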

You know the feeling when ESLint catches a bug you would've spent 30 minutes debugging? Same energy.

Semantic diff that actually means something

promptdiff diff v3.prompt v7.prompt --annotate

This is not git diff. It matches sections by type (persona to persona, constraints to constraints), classifies each change, and tells you the impact:

  [CONSTRAINTS] constraint tightened (150 → 100 words)
  ██ high impact — Output will be more constrained

  [EXAMPLES] example removed (3 → 1)
  ██ high impact — Output consistency may decrease

  [PERSONA] wording tweaked
  ░░ low impact — Tone/style will shift

That's the diff I wish I'd had before the Great Refund Incident.
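The core idea behind that output is to pair sections by type before comparing them. This sketch assumes sections arrive as an object mapping a section type to its text. It classifies only one kind of change: an "N words" limit inside CONSTRAINTS. The function names and impact labels are hypothetical, not promptdiff's diff engine.

```javascript
// Hypothetical sketch of type-based section matching plus one classifier.
// Pair sections by type, so reordering them never shows up as delete + add.
function matchSections(oldSections, newSections) {
  const types = new Set([...Object.keys(oldSections), ...Object.keys(newSections)]);
  return [...types].map((type) => ({
    type,
    before: oldSections[type] ?? null,
    after: newSections[type] ?? null,
  }));
}

// Extract a limit such as "under 150 words" -> 150, or null if none is stated.
function wordLimit(text) {
  const m = text.match(/(\d+)\s+words/i);
  return m ? Number(m[1]) : null;
}

function classify({ type, before, after }) {
  if (before === null) return { type, change: "section added" };
  if (after === null) return { type, change: "section removed" };
  if (before === after) return { type, change: "unchanged" };
  const oldLimit = wordLimit(before);
  const newLimit = wordLimit(after);
  if (type === "CONSTRAINTS" && oldLimit !== null && newLimit !== null && oldLimit !== newLimit) {
    const direction = newLimit < oldLimit ? "tightened" : "loosened";
    return { type, change: `constraint ${direction} (${oldLimit} → ${newLimit} words)`, impact: "high" };
  }
  return { type, change: "wording tweaked", impact: "low" };
}
```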

Score your prompt quality

promptdiff score my-agent.prompt

  Structure     ████████████████░░░░  16/20
  Specificity   █████████████████░░░  17/20
  Examples      ████████░░░░░░░░░░░░   8/20
  Safety        ████████████████████  20/20
  Completeness  ████████████████░░░░  16/20
  ─────────────────────────────────────
  Total: 77/100  Grade: B
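The total is a plain sum of the five dimensions shown, each scored out of 20: 16 + 17 + 8 + 20 + 16 = 77. Below is a tiny sketch of that aggregation, assuming equal weighting as the display suggests. The article doesn't state the grade cut-offs, so the sketch only checks the total against the 70-point threshold used in this section's CI gate.

```javascript
// Equal-weight aggregation of the five dimensions (each out of 20).
// Grade cut-offs are not given in the article, so only the total and a
// pass/fail check against the 70-point CI threshold are computed here.
function totalScore(dimensions) {
  return Object.values(dimensions).reduce((sum, points) => sum + points, 0);
}

const report = { structure: 16, specificity: 17, examples: 8, safety: 20, completeness: 16 };
const total = totalScore(report); // 16 + 17 + 8 + 20 + 16 = 77
const passesGate = total >= 70; // the shell gate fails only when total < 70
```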

Gamify it. Make it a CI gate:

score=$(promptdiff score my-agent.prompt --json | jq '.total')
if [ "$score" -lt 70 ]; then
  echo "Prompt quality too low: $score/100"
  exit 1
fi

The killer feature: Claude Code lints its own work

This is the part that gets people. You can hook promptdiff into Claude Code so that every time Claude edits a .prompt file, it automatically gets linted.

One command:

promptdiff setup --project

Here's the flow:

1. You ask Claude to "write me a customer support agent prompt"
2. Claude writes it. Maybe it puts conflicting roles in the persona, uses vague language in constraints, and includes only one example
3. The hook fires automatically (PostToolUse on Edit/Write)
4. promptdiff finds 3 errors: role confusion, vague constraints, too few examples
5. The hook reports the errors back to Claude as a blocking error (PostToolUse runs after the write, so the file is already on disk; the errors prompt Claude to fix it)
6. Claude reads the feedback and rewrites the prompt: it fixes the roles, tightens the language, and adds more examples
7. The hook fires again. Clean, so it passes silently.
8. You get a well-structured prompt on the first try, without manually reviewing it

It's like giving Claude a pair-programmer that only knows about prompt quality. Claude writes the prompt, the linter reviews it, Claude fixes it. You just watch.

The setup adds this to your .claude/settings.json:

{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{
        "type": "command",
        "command": "promptdiff hook",
        "timeout": 10
      }]
    }]
  }
}

You can configure it to be strict (block on warnings too), warn-only (never block), or default (block on errors only).
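The decision behind `promptdiff hook` can be sketched as a pure function. This is hypothetical, not promptdiff's code. It assumes lint findings shaped like `{ rule, severity, message }` and reuses the three modes just described. It relies on Claude Code's hook conventions: the hook receives a JSON payload on stdin (for Edit and Write, `tool_input.file_path` names the file), exit code 0 means success, and exit code 2 is a blocking error whose stderr is shown to Claude.

```javascript
// Hypothetical sketch of a PostToolUse hook's decision (not promptdiff's code).
// Inputs: the JSON payload Claude Code sends on stdin, lint findings shaped like
// { rule, severity, message }, and one of the modes "strict", "warn-only", "default".
// Claude Code convention used here: exit 0 = success; exit 2 = blocking error
// whose stderr is fed back to Claude. For PostToolUse the write already happened,
// so "blocking" asks Claude to fix the file rather than preventing the write.
function hookDecision(payload, findings, mode = "default") {
  const filePath = payload?.tool_input?.file_path ?? "";
  if (!filePath.endsWith(".prompt")) return { exitCode: 0, stderr: "" }; // not a prompt file

  const blocking = findings.filter((f) => {
    if (mode === "warn-only") return false; // never block
    if (mode === "strict") return f.severity === "error" || f.severity === "warning";
    return f.severity === "error"; // default: block on errors only
  });

  if (blocking.length === 0) return { exitCode: 0, stderr: "" }; // passes silently
  const report = blocking.map((f) => `${filePath}: [${f.rule}] ${f.message}`).join("\n");
  return { exitCode: 2, stderr: report };
}
```

A thin wrapper would read stdin, `JSON.parse` it, lint the file, write the report with `process.stderr.write`, and call `process.exit(exitCode)`. Keeping the decision pure makes the three modes easy to test without running Claude Code.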

How it works (brief architecture)

The key insight is that prompts aren't flat text — they're structured documents with typed sections. promptdiff's parser breaks a .prompt file into:

Frontmatter (YAML metadata: name, version, model, tags)
Sections (PERSONA, CONSTRAINTS, EXAMPLES, OUTPUT FORMAT, GUARDRAILS, etc.)

Every command works on this structured representation:

Diff matches sections by type, not by line number. If you move your CONSTRAINTS section from line 5 to line 20, it doesn't show up as "deleted + added" — it shows up as "same section, maybe modified."
Lint rules get the parsed structure, so conflicting-constraints can compare the word limit in CONSTRAINTS against the actual word counts in EXAMPLES.
Score evaluates five dimensions independently (structure, specificity, examples, safety, completeness) and aggregates them.
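Below is a minimal sketch of the parsing step. It relies on assumptions the article doesn't spell out: frontmatter sits between two `---` lines, and each section begins with a header line like `# PERSONA` or `# OUTPUT FORMAT`. The real `.prompt` grammar may differ. Also, promptdiff parses frontmatter with js-yaml, whereas this self-contained version only handles flat `key: value` pairs.

```javascript
// Hypothetical parser sketch. Assumed grammar (not promptdiff's): "---" fenced
// frontmatter, then sections introduced by header lines like "# PERSONA".
function parsePrompt(text) {
  const lines = text.split(/\r?\n/);
  const frontmatter = {};
  let i = 0;

  if (lines[0] === "---") {
    for (i = 1; i < lines.length && lines[i] !== "---"; i++) {
      const m = lines[i].match(/^([\w-]+):\s*(.*)$/); // flat "key: value" only
      if (m) frontmatter[m[1]] = m[2];
    }
    i++; // skip the closing "---"
  }

  const sections = [];
  let current = null;
  for (; i < lines.length; i++) {
    const header = lines[i].match(/^#\s+([A-Z][A-Z ]*)$/);
    if (header) {
      current = { type: header[1].trim(), lines: [] };
      sections.push(current);
    } else if (current && lines[i].trim() !== "") {
      current.lines.push(lines[i].trim()); // body lines belong to the latest section
    }
  }
  return { frontmatter, sections };
}
```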

The whole thing is ~30 files, 3 runtime dependencies (commander, chalk, js-yaml), and 217 tests at 94% coverage. No LLM required for any local command — the only thing that calls an API is promptdiff compare for A/B testing, and even that supports local Ollama models.

It also supports prompt composition (extends + includes), so you can DRY your prompts:

---
name: support-agent-v2
extends: ./base-agent.prompt
includes:
  - ./shared/safety-rules.prompt
  - ./shared/format.prompt
---
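The article confirms that `extends` and `includes` exist but not how they merge. Treat this as one plausible resolution order, not promptdiff's behavior: base sections first, then each include in order, then the prompt's own sections, with a later section of the same type replacing an earlier one. `compose` and the injected `load` lookup are hypothetical names. `load` stands in for reading and parsing a file, and the `seen` set guards against `extends` cycles.

```javascript
// Hypothetical composition sketch. Assumed merge order: base sections, then each
// include in order, then the prompt's own sections; a later section of the same
// type replaces an earlier one. `load(path)` is an injected lookup returning
// { frontmatter, sections } with sections as { TYPE: text }.
function compose(prompt, load, seen = new Set()) {
  const merged = new Map();
  const mergeIn = (sections) => {
    for (const [type, text] of Object.entries(sections)) merged.set(type, text);
  };

  const { extends: basePath, includes = [] } = prompt.frontmatter;
  if (basePath) {
    if (seen.has(basePath)) throw new Error(`extends cycle at ${basePath}`);
    seen.add(basePath);
    mergeIn(compose(load(basePath), load, seen)); // resolve the base chain first
  }
  for (const path of includes) mergeIn(load(path).sections); // includes are not resolved recursively here
  mergeIn(prompt.sections);
  return Object.fromEntries(merged);
}
```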

Other things I didn't expect to be useful

promptdiff migrate — takes a messy unstructured prompt (the kind you pasted into ChatGPT at 2am) and converts it into a structured .prompt file. It auto-classifies lines: "You are..." goes to PERSONA, "Never..." goes to CONSTRAINTS, etc.
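A few lines are enough to sketch rule-based classification like this. Only the two cues quoted above ("You are..." and "Never...") come from the article. The extra patterns, the `OTHER` fallback and the `classifyLines` name are assumptions for illustration.

```javascript
// Hypothetical line classifier for an unstructured prompt. Only the "You are..."
// -> PERSONA and "Never..." -> CONSTRAINTS cues come from the article; the other
// patterns and the OTHER fallback are assumptions.
const RULES = [
  { type: "PERSONA", pattern: /^you are\b/i },
  { type: "CONSTRAINTS", pattern: /^(never|always|do not|don't)\b/i },
  { type: "OUTPUT FORMAT", pattern: /^(respond|reply|format)\b/i },
];

function classifyLines(rawPrompt) {
  const sections = {};
  const lines = rawPrompt.split(/\r?\n/).map((l) => l.trim()).filter(Boolean);
  for (const line of lines) {
    const rule = RULES.find((r) => r.pattern.test(line)); // first matching rule wins
    const type = rule ? rule.type : "OTHER";
    (sections[type] = sections[type] || []).push(line);
  }
  return sections;
}
```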

promptdiff fix --apply — auto-fixes lint issues. Adds missing sections, tightens vague language, suggests injection guards.

promptdiff watch . — live linting on file save. Like having eslint --watch for your prompts while you iterate.

MLflow integration — promptdiff log-to-mlflow tracks prompt quality scores over time as MLflow experiments. Because if you're doing serious prompt engineering, you should be tracking regressions.

Try it

npm install -g promptdiff

Then:

# Scaffold a new prompt from a template
promptdiff new my-agent --template support

# Lint it
promptdiff lint my-agent.prompt

# Score it
promptdiff score my-agent.prompt

# Hook into Claude Code
promptdiff setup --project

The repo is at github.com/HadiFrt20/promptdiff. It's MIT licensed, 217 tests, and I'm actively building on it.

If you're writing prompts for production — especially if you're building agents — you probably need this. Or at minimum, you need something like this. The days of yolo-shipping prompts with no review should be over.

Prompts are code. Treat them like it.

If this was useful, a star on the repo goes a long way. And if you have ideas for lint rules, I'd love PRs — adding a rule is about 30 lines of JavaScript.