You wrote it in CLAUDE.md. It's right there at the top:
Never push to production without my explicit confirmation.
Claude read it. Confirmed it understood. Then pushed to production without asking.
This happens to developers every week. And most of them assume it's a bug.
It is not a bug.
Why CLAUDE.md instructions don't equal CLAUDE.md behavior
When Claude Code reads your CLAUDE.md, it does not parse it the way a compiler parses a config file. It reads it the way a person reads a memo — then weighs it against everything else in context.
Your instruction is not a hard constraint. It is an input. A preference with weight.
Claude evaluates:
- What did the user ask for this turn?
- What does CLAUDE.md say?
- What does the immediate context suggest?
- What behavior is most likely to complete the task?
If the current task context pulls in a different direction, lower-weight instructions lose. Silently.
The rule "never push without confirmation" competes with "the user asked me to deploy this and seems to expect it to happen." When context is strong and the rule is weak, the rule loses.
The three patterns that cause silent rule violations
Pattern 1: Vague permission framing
Don't deploy without asking me first.
This is interpretable. "Asking" could mean many things. "First" is ambiguous. The agent reads this as a soft preference and makes a judgment call.
Pattern 2: Implicit scope
You wrote the rule once at the top of CLAUDE.md. Halfway through a long session, the context window has compacted. The rule is still technically present — but its weight in the model's attention has dropped.
Pattern 3: Competing instructions
Your user message says "finish the deployment." Your CLAUDE.md says "ask before deploying." One of these is explicit and present. The other is background context. Explicit wins.
What actually works
The rules that survive are specific, unconditional, and use language that leaves no room for interpretation:
Weak:
Don't push to production without my approval.
Stronger:
HARD RULE — NO EXCEPTIONS:
Do not run any deployment command (vercel deploy, railway up, fly deploy,
git push to main/prod) without first outputting:
"DEPLOYMENT PAUSE: Ready to deploy [description]. Confirm with: yes, deploy"
Wait for explicit confirmation before proceeding.
If you are uncertain whether an action is a deployment, treat it as one.
The difference:
- Explicit trigger condition
- Exact output required before pausing
- No ambiguity about what counts as deployment
- Explicit fallback behavior for edge cases
Rule placement matters more than you think
Claude Code reads your CLAUDE.md at session start. If your rules are buried after 200 lines of project context, they have less weight in the initial context than rules placed at the top.
For any rule that must survive context compaction and session length:
- Place it in the first 30 lines of CLAUDE.md
- Use ALL CAPS headers (not because it looks important — because it signals the instruction type)
- Repeat critical safety rules in a dedicated
## SAFETY RULESsection
The compaction problem
In long sessions, Claude Code compacts earlier context to preserve the active window. Your CLAUDE.md rules were injected at session start. By hour 3, compaction may have summarized them.
Mitigation:
- Split long sessions before context window fills (use
/compactmanually) - Keep CLAUDE.md under 300 lines — longer files get summarized less accurately
- Re-state critical rules in your message when the stakes are high
CLAUDE.md is not a safety system
This is the uncomfortable truth: CLAUDE.md is a behavioral guide, not an access control layer. If you need a hard guarantee that a command won't run, you need something outside the model — a wrapper script, a confirmation hook, or a deployment pipeline that requires human input at the infrastructure level.
CLAUDE.md can reduce unwanted behavior dramatically. Written correctly, it will stop most violations. But it cannot guarantee zero violations in all edge cases.
Use it as the first defense. Build real constraints at the system level for anything where failure is unacceptable.
The practical checklist
Before your next session:
- [ ] Production safety rules are in the first 30 lines of CLAUDE.md
- [ ] Each rule specifies exactly what to output before pausing
- [ ] No rule uses vague framing ("ask me first", "check before doing")
- [ ] You have a
## HARD RULES — NO EXCEPTIONSsection for critical constraints - [ ] CLAUDE.md is under 300 lines (if not, refactor)
- [ ] For actual zero-trust: confirm via infrastructure, not instructions
If you want a complete set of tested rules that handle production safety, scope creep, file boundaries, and confirmation flows — the CLAUDE.md Rules Pack ($27) includes 50+ production-tested rules and the structure logic behind them.
New to this? Start with the free starter — a working CLAUDE.md + cursor rules baseline, no cost.
The gap between "Claude read the rule" and "Claude followed the rule" is a design problem. The fix is in how you write rules, not in trusting that writing them was enough.
Top comments (0)