I was running the most sophisticated CLAUDE.md I’d ever seen — a structured review pipeline that forces Claude through architecture, code quality, tests, and performance checks before touching anything. On paper it was brilliant. In practice I’d given myself a part-time job as a QA engineer for a robot.
So I ripped it out.
TL;DR: There are 3 layers to controlling Claude Code: review (catching mistakes after), enforcement (blocking mistakes during), and intent (preventing mistakes before). Most devs — including YC’s CEO — are stuck on layer 1. The real gains live in layer 3. Full Prompt Contracts breakdown.

The Prompt That Broke My Week
That review pipeline? It wasn’t mine. It came from Garry Tan — as in, the CEO of Y Combinator. He shared his Claude Code prompt on X a few weeks ago and it went viral. 3,000 likes. Dev Twitter lost its mind. And I get why — the man runs Y Combinator and still writes his own CLAUDE.md. That alone puts him ahead of 90% of tech CEOs who “use AI” meaning they dictate to an assistant who then types into ChatGPT.
His setup forces Claude into Plan Mode with a structured review pipeline: architecture first, code quality second, tests third, performance fourth. Each section surfaces up to 4 issues. Each issue gets 2–3 options with tradeoffs. Claude numbers everything, letters the options, and waits for your explicit go-ahead before touching a single line.
It’s thorough. It’s disciplined. It’s basically a senior engineer’s PR review checklist, automated.
I ran it on a Convex project I was refactoring — a scheduling engine with Clerk auth and Supabase for the parts Convex doesn’t cover. And honestly? The architecture review caught a coupling issue between my auth middleware and my API routes that I’d been pretending didn’t exist for a week. The performance pass found two N+1 queries I genuinely didn’t know about. Good stuff.
So why’d I rip it out?
Because after 48 hours I realized I’d become a full-time reviewer of Claude reviewing my code. Four sections. Up to 4 issues each. 2–3 options per issue. Each requiring my explicit “yes, option B, proceed.” On a medium feature, that’s 30–45 minutes of back-and-forth before anything changes. It’s like hiring a contractor then spending your entire day watching them through a window, nodding at every nail.
Garry Tan built a code review system for his AI. Which means he’s using AI to generate more review work for himself 🙃
The review treadmill is seductive because it feels productive. You’re reading, evaluating, making decisions. But you’re doing all of it downstream — after Claude already picked a direction. You’re course-correcting a car that’s already driving.
And the kicker? Claude doesn’t remember your corrections next session anyway.
Three Layers, One Hierarchy
After ditching Garry’s prompt I didn’t go back to vibes. I’ve been running Claude Code on production SaaS for months and I’ve seen every failure mode — the silent .env deletion, the midnight Supabase schema swap, the classic “I optimized your code by removing the part that worked.” What I’ve landed on is a hierarchy.
Layer 1: Review (Garry Tan’s approach) Let Claude code, then check the output. Structured review, Plan Mode, “ask me before you breathe.” Catches mistakes after they exist in your codebase.
Layer 2: Enforcement (Hooks) Shell scripts that fire automatically when Claude does something. Format on save, block dangerous commands, run typechecks. Catches mechanical mistakes during execution. No human attention required.
Layer 3: Intent (Prompt Contracts) You shape what Claude understands about your project before it writes anything. Architecture decisions, naming conventions, the WHY behind your stack. Prevents mistakes from being conceived.
Most devs live on Layer 1.
Some power users bolt on Layer 2.
Almost nobody builds Layer 3 properly — which is wild, because that’s where the compounding returns live.

Layer 2: The One That Pays For Itself Overnight
I know — I just told you Layer 3 is the endgame. But Layer 2 is where the fastest wins live, because it’s pure automation. Set it up once, never think about it again. Like a smoke detector, except instead of fire it detects Claude trying to rm -rf your project root.
The concept: your CLAUDE.md is the style guide. Hooks are the linter. One suggests, the other enforces.
My actual .claude/settings.json setup (partial):
```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/bash-firewall.sh" }
        ]
      },
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/protect-sensitive-files.sh" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "jq -r '.tool_input.file_path' | xargs npx prettier --write" }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "osascript -e 'display notification \"Claude needs your attention\" with title \"Claude Code\"'" }
        ]
      }
    ]
  }
}
```
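A note on the protocol, since it’s what makes all of this work: Claude Code pipes a JSON payload describing the tool call into the hook’s stdin, and the hook’s exit code is the verdict (0 allows the call, 2 blocks it and feeds stderr back to Claude). Here’s a standalone sketch of that contract, using a hardcoded fake payload and plain grep so it runs anywhere; the real hooks below parse stdin with jq:

```shell
#!/usr/bin/env bash
# Sketch of the hook contract: inspect the payload, choose an exit code.
# 0 = allow the tool call, 2 = block it (stderr is shown back to Claude).
payload='{"tool_input":{"command":"cat .env.local"}}'  # fake payload for demo

if echo "$payload" | grep -q '\.env'; then
  echo "Blocked: command touches a .env file" 1>&2
  verdict=2
else
  verdict=0
fi

echo "verdict: $verdict"
```

A real hook ends with `exit $verdict` instead of printing it; the print is just here so you can see the decision when you run the sketch by hand.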
And the bash firewall that saves my life about once a week:
```bash
#!/usr/bin/env bash
set -euo pipefail

# The hook payload arrives on stdin as JSON; pull out the command string.
cmd=$(jq -r '.tool_input.command // ""')

deny_patterns=(
  'rm[[:space:]]+-rf[[:space:]]+/'
  'git[[:space:]]+reset[[:space:]]+--hard'
  'git[[:space:]]+push[[:space:]]+--force'
  'DROP[[:space:]]+TABLE'
  'DELETE[[:space:]]+FROM'
)

for pat in "${deny_patterns[@]}"; do
  if echo "$cmd" | grep -Eiq "$pat"; then
    # Exit code 2 blocks the tool call; stderr explains why to Claude.
    echo "Blocked: matches denied pattern '$pat'." 1>&2
    exit 2
  fi
done
exit 0
```
The file protection hook is the one I added after Claude decided my .env.local was "unused" and tried to clean it up. Once. That was enough. You know that feeling when you watch your prod credentials disappear in real time? Yeah. Now Claude physically cannot touch .env*, package-lock.json, or anything in .git/.
```bash
#!/usr/bin/env bash
set -euo pipefail

# Pull the target path out of the hook payload on stdin.
file=$(jq -r '.tool_input.file_path // ""')

# Block edits to env files, the lockfile, and anything under .git/,
# whether the path is bare or nested inside a subdirectory.
deny_patterns=(
  '(^|/)\.env[^/]*$'
  '(^|/)package-lock\.json$'
  '(^|/)\.git/'
)

for pat in "${deny_patterns[@]}"; do
  if printf '%s\n' "$file" | grep -Eq "$pat"; then
    echo "Edits to '$file' are blocked." 1>&2
    exit 2
  fi
done
exit 0
```
The Stop hook with macOS notification sounds dumb until you’re running 3 Claude sessions in parallel and one of them’s been sitting idle for god knows how long while you’re deep in another terminal. That dead time adds up fast when you’re juggling branches.
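If you’re on Linux rather than macOS, the same Stop hook idea works with notify-send in place of osascript (a sketch, assuming a desktop environment with libnotify available):

```json
{
  "Stop": [
    {
      "hooks": [
        {
          "type": "command",
          "command": "notify-send 'Claude Code' 'Claude needs your attention'"
        }
      ]
    }
  ]
}
```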
The PostToolUse Prettier hook killed most of the formatting nitpicks on my PRs.
Not because Claude writes ugly code — it usually doesn’t — but “usually” is not a word you want anywhere near your formatting standards.
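Prettier isn’t the only thing worth wiring into PostToolUse. A typecheck after every edit catches a whole class of “compiles in Claude’s head, not on your machine” mistakes. A sketch, assuming a TypeScript project; swap in whatever checker your stack uses:

```json
{
  "matcher": "Edit|Write",
  "hooks": [
    { "type": "command", "command": "npx tsc --noEmit" }
  ]
}
```

Fair warning: a full typecheck on every edit can get slow on large codebases, so scope it down if that starts to bite.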
None of this is glamorous. Nobody’s writing viral tweets about "matcher": "Edit|Write" configs. But every hook you add is one less thing to remember, one less mistake to catch manually, one less "oh god what did it just do" moment. They compound silently while you sleep. Like compound interest, except it’s useful.
Layer 3: Where I Actually Disagree With Garry Tan
This is the layer that turned my Claude Code output from “impressive but needs babysitting” to “I scan the diff for 5 minutes and ship.”
I wrote a full breakdown of Prompt Contracts a few weeks ago — that article covers the theory and the original framework in detail. What I want to talk about here is the specific gap in Garry’s approach that Prompt Contracts fill.
Garry’s prompt tells Claude: “Review this plan. Flag DRY violations. Present options. Ask before proceeding.”
That’s Layer 1 thinking, executed with maximum sophistication. It’s the best version of “check your work after you’re done.” But it doesn’t address WHY Claude makes the wrong choices in the first place.
A Convex example from last week.
I have a project where mutations follow a strict naming convention: [domain].[action].[target]. So billing.create.invoice, auth.verify.session, scheduling.update.slot. The frontend team greps for these patterns in their API layer. Break the pattern, break their workflow.
With Garry’s prompt: Claude writes a mutation called createNewInvoice. The architecture review catches it as a "code organization concern." I pick option B ("rename to follow convention"), Claude renames it. 10 minutes spent on something that should never have happened.
With a Prompt Contract: Claude sees in my CLAUDE.md — “Convex mutations follow [domain].[action].[target] format. The frontend greps for billing.* to auto-generate their API client. Breaking this pattern breaks the build pipeline."
Claude writes billing.create.invoice the first time.
Zero review needed.
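For concreteness, here’s roughly what that contract looks like as a CLAUDE.md section (a sketch; the wording is illustrative, not a verbatim copy of my file):

```markdown
## Naming conventions (Convex)

- Mutations follow `[domain].[action].[target]`: `billing.create.invoice`,
  `auth.verify.session`, `scheduling.update.slot`.
- WHY: the frontend greps `billing.*` to auto-generate its API client.
  Breaking the pattern breaks their build pipeline.
- Do not invent new domain prefixes without asking first.
```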
One approach: you catch mistakes and fix them.
Other approach: the mistakes don’t happen because Claude gets the constraint. The difference compounds across every feature, every day, every session.
And before someone drops the “just give Claude context and respect instead of constraints 🤓” line — yeah, the WHY behind each decision IS the context. Saying “we use Clerk because we need webhook-based session sync across 3 services” IS giving Claude understanding. Prompt Contracts aren’t threats. They’re architecture documentation that happens to live in a file the AI reads on startup. But I digress.
The gap between “review AI output carefully” and “shape AI intent so there’s less to review” — that’s where the 10x lives.
What This Looks Like in Practice
My actual workflow on a new feature, right now, today:
- CLAUDE.md has my stack decisions with the reasoning, naming conventions, file structure rules, and a “things you will be tempted to do but shouldn’t” section (Layer 3)
- Hooks handle formatting, type checking, destructive command blocking, and notifications (Layer 2)
- I scan the diff for ~5 minutes. Occasionally Claude still surprises me. When it does, I add one line to CLAUDE.md so it never happens again (feeding Layer 3)
- Plan Mode / structured review: I use it maybe once a week. Big architectural decisions, complex refactors. Not for every feature. (Layer 1, sparingly)
With Garry’s prompt, step 4 was the entire workflow. Every feature. Every time. That’s a 30–45 minute tax on every ship cycle, vs my current ~5 minutes.
I don’t have clean before/after metrics — I wasn’t tracking revert rates when I started, because who does. What I can tell you is the feel of the workflow changed completely. Before: every git log had 2-3 "revert: undo Claude's helpful suggestion" commits per day. Now: maybe one a week, and it's usually because my Prompt Contract has a gap I haven't patched yet. Each revert teaches me where to tighten Layer 3, and the overall trend is down.
The Part Nobody Wants to Hear
Garry Tan’s prompt is great. It is legitimately the best structured review setup I’ve seen shared publicly. If you’re doing nothing right now — no CLAUDE.md, no hooks, just “vibe coding” and praying — go use his prompt. Seriously. It’s a massive upgrade over nothing.
But it’s designed for a world where you don’t trust your AI enough to let it work unsupervised. And here’s the irony: the reason you can’t trust it is because you haven’t given it the context it needs to be trustworthy. The fix isn’t more review. It’s better input.
The prompt isn’t the bottleneck. Your intent is.
The full Prompt Contracts framework is worth reading if you want the deep dive. Next week I’m publishing my actual complete CLAUDE.md — the one I use on every project, with every section explained. Follow if you don’t want to miss it.
And if you’re running Garry’s prompt and shipping well: keep going. Genuinely. But when the review treadmill starts feeling like a part-time job, you’ll know where the upgrade is. 🦞