After 12 months running Claude Code on 14 parallel projects, the same regression cycle kept happening:
Claude: "All tests passing ✅"
Me: [merges PR]
Prod: [throws 500s]
Me: [2h debugging]
Tomorrow: [same cycle]
The model isn't lying on purpose — it's just optimistic about its own work. The fix isn't a better prompt. The fix is a workflow guard.
So I built verify-before-stop.sh — a Stop hook that blocks the session from ending until verification is actually logged. Here's the full implementation and why it works.
The pattern
Claude Code's Stop hook fires when the model tries to end a session. The hook receives JSON on stdin with stop_hook_active, transcript_path, session_id. If the hook exits non-zero with stderr message, Claude continues the turn — and the stderr becomes part of the model's context.
That's the leverage point. If the model claims "done" without proof, exit 2 with instructions for what proof is required.
The hook (50 lines)
#!/bin/bash
# verify-before-stop.sh
INPUT=$(cat)
# Avoid infinite loop — Claude Code sets stop_hook_active=true
# when continuing from a previous block
STOP_HOOK_ACTIVE=$(echo "$INPUT" | python3 -c \
"import sys,json; d=json.load(sys.stdin); print('true' if d.get('stop_hook_active') else 'false')" \
2>/dev/null)
if [ "$STOP_HOOK_ACTIVE" = "true" ]; then
exit 0
fi
VERIFY_LOG=".claude/state/stop-verify.log"
mkdir -p .claude/state
# No file changes → pure conversation → allow stop
HAS_CHANGES=$(git diff --name-only 2>/dev/null | head -5)
HAS_UNTRACKED=$(git ls-files --others --exclude-standard 2>/dev/null | grep -v '.claude/state/' | head -5)
if [ -z "$HAS_CHANGES" ] && [ -z "$HAS_UNTRACKED" ]; then
exit 0
fi
# Files changed → require VERIFIED log entry in last 5 min
if [ -f "$VERIFY_LOG" ]; then
FIVE_MIN_AGO=$(date -v-5M +%s 2>/dev/null || date -d '5 minutes ago' +%s 2>/dev/null || echo 0)
LAST_VERIFY=$(grep '|VERIFIED' "$VERIFY_LOG" 2>/dev/null | tail -1 | cut -d'|' -f1)
LAST_ACTION=$(grep '|VERIFY_ACTION' "$VERIFY_LOG" 2>/dev/null | tail -1 | cut -d'|' -f1)
if [ -n "$LAST_VERIFY" ] && [ "$LAST_VERIFY" -gt "$FIVE_MIN_AGO" ] 2>/dev/null; then
if [ -n "$LAST_ACTION" ] && [ "$LAST_ACTION" -gt "$FIVE_MIN_AGO" ] 2>/dev/null; then
echo "$(date +%s)|STOP_ALLOWED" >> "$VERIFY_LOG"
exit 0
fi
fi
fi
echo "$(date +%s)|STOP_BLOCKED" >> "$VERIFY_LOG"
echo "⛔ BLOCKED: files changed but no verification logged in last 5 min." >&2
echo "Required: log a VERIFY_ACTION + VERIFIED entry, e.g.:" >&2
echo ' echo "$(date +%s)|VERIFY_ACTION|ran npm test, 3 failures" >> .claude/state/stop-verify.log' >&2
echo ' echo "$(date +%s)|VERIFIED" >> .claude/state/stop-verify.log' >&2
exit 2
Install
Drop into .claude/hooks/ and wire it up:
// .claude/settings.json
{
"hooks": {
"Stop": [{
"matcher": "*",
"hooks": [
{ "type": "command", "command": "bash .claude/hooks/verify-before-stop.sh" }
]
}]
}
}
Restart your Claude Code session. Done.
How verification works in practice
When Claude tries to end after edits, the hook blocks with stderr telling it exactly what to log. The model then has to either:
# A) Actually verify
$ npm test
# ...test output...
$ echo "$(date +%s)|VERIFY_ACTION|npm test all green" >> .claude/state/stop-verify.log
$ echo "$(date +%s)|VERIFIED" >> .claude/state/stop-verify.log
# B) Admit it didn't verify
# Claude in next turn: "I couldn't actually run the tests — they require a
# dev DB. Here's what I changed; please run npm test manually before merging."
Either way, you stop getting false "done" reports.
4 design decisions worth defending
1. Why a 5-minute TTL on VERIFIED?
Long enough that legitimate verify→stop sequences don't trip on it. Short enough that stale entries from yesterday don't accidentally allow stops.
2. Why git diff as the change signal?
Catches both edits and untracked new files. Excludes .claude/state/ (the hook's own writes) to avoid self-triggering loops.
3. Why exit 2 (not 1)?
Claude Code treats exit 2 as a Stop block with stderr passed to the model. Exit 1 is treated as a generic error and may not surface the message correctly.
4. Why explicit VERIFY_ACTION + VERIFIED two-line pattern?
The VERIFY_ACTION line forces the model to commit what it verified in writing — not just claim it. Doubles as an audit trail you can grep later.
Watch out for the cap
Claude Code v2.1.143+ added a built-in safeguard: after ~8 consecutive Stop-hook blocks, the turn ends with a warning regardless. This is intentional — prevents broken hooks from infinite-looping the model. Override with CLAUDE_CODE_STOP_HOOK_BLOCK_CAP=20 if you have legitimate reason for more retries.
Design your hook to give the model enough information to comply on the next turn, not on the 8th turn.
What this saved me
After 6 months of using this hook across 14 projects:
- Eliminated "AI says tests pass, they didn't" regressions — the #1 source of "wait, I thought you said this worked" merges.
-
Forced explicit verification logging —
.claude/state/stop-verify.lognow reads like an audit trail. - Survived conversation compaction — the log file persists, so even after Claude Code summarizes the session, the verification record is still on disk.
Open source
Full source on GitHub: https://github.com/ianymu/claude-verify-before-stop
MIT license. Star it if you find it useful — easier to find when you need it again.
If you want the rest
I packaged this with 5 more hooks I built over those 12 months — cost-tracker.sh (logs every $ of Opus spend to costs.jsonl), block-secrets.sh (scans Write/Edit/Bash for sk-ant-*/JWT/AWS keys before commit), force-progress-update.sh (checkpoint every 5 actions to survive compaction), pre-compact-diary.sh (preserves WIP context), enforce-autoplan.sh (no code-write without a plan).
Full 6-pack is $49 launch price (regular $79), 30-day money-back, instant download via Polar: https://landing-ianymu.vercel.app
But honestly — verify-before-stop alone solves 80% of the pain. Start there. Use it free for a week, then decide if the rest is worth it.
Discussion
Curious if you've built similar workflow guards. The verify-before-stop pattern feels obvious in hindsight — what other "lies of completion" patterns have you caught?
Drop your own hooks in the comments. If they're solid I'll add them to the README with credit.
— Ian
Top comments (0)