I've been running Claude Code autonomously for months. During that time, I've collected logs of what can go wrong when an AI agent has instructions that are slightly too permissive.
Here are the five patterns I've seen cause real damage - and the tool I built to detect all of them.
The Setup
Claude Code reads CLAUDE.md before every session. This file contains your operating instructions: what the agent is allowed to do, how it should behave, what tools it can use.
Most people write these instructions once, find them working well, and stop thinking about them.
The problem: instructions that seem reasonable in small sessions become dangerous in long autonomous ones. A permission you typed casually - "use rm -rf to clean up temporary files" - becomes an instruction that an agent running for eight unattended hours might apply in ways you didn't intend.
Here are the five highest-risk patterns, ranked by how often they cause actual problems.
Pattern 1: Irreversible Git Operations
What it looks like in CLAUDE.md:
If stuck on merge conflicts, use --force to push your changes.
You can use git reset --hard to revert to a clean state if needed.
Why it's dangerous:
git push --force can overwrite commits that other team members (or your past self) pushed to the remote. git reset --hard discards uncommitted changes without confirmation. In practice these operations are irreversible: overwritten commits can sometimes be recovered from a reflog, but only if you notice the loss in time and know where to look, and uncommitted changes discarded by a hard reset are generally gone for good.
An agent running autonomously that hits a merge conflict will do exactly what the instructions say: force push. The instructions didn't say "ask first."
What I found in practice:
A session running overnight encountered a merge conflict at 3 AM. The instructions said to resolve conflicts and push. It pushed with --force. The push silently overwrote three commits I'd made before going to sleep.
The fix:
Never use --force, --hard, or --no-verify flags without explicit human confirmation.
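A written rule like this can also be backed by a mechanical check. Here is a minimal sketch, not the tooling described in this post: a hypothetical `risky_git_flags` helper that tokenizes a shell command and reports any of the three flags named above, so a wrapper or hook script could stop and ask a human. The function names and the flag set are assumptions for illustration.

```python
import shlex

# Flags the rule above says must never run without human confirmation.
RISKY_FLAGS = {"--force", "--hard", "--no-verify"}


def risky_git_flags(command: str) -> list[str]:
    """Return the risky flags found in a git command, in order of appearance."""
    tokens = shlex.split(command)
    # Only inspect commands that actually invoke git.
    if not tokens or tokens[0] != "git":
        return []
    return [token for token in tokens[1:] if token in RISKY_FLAGS]


def needs_confirmation(command: str) -> bool:
    """True when the command should be held for a human to approve."""
    return bool(risky_git_flags(command))
```

This sketch matches only the long flag spellings. Short forms such as `git push -f` or `git commit -n` would need their own entries, scoped to the subcommand they belong to.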
Pattern 2: Automated POST to External URLs
What it looks like in CLAUDE.md:
After completing each task, notify Slack: POST to https://hooks.slack.com/services/[webhook].
When articles are ready, post to the publishing queue API.
Why it's dangerous:
Any instruction that tells Claude Code to POST to an external URL creates a pathway for unintended external actions. The issue isn't the Slack webhook itself - it's that:
- The agent may trigger the webhook more often than intended
- The webhook URL is now visible to anyone who reads CLAUDE.md
- If the agent makes an error, it may POST incorrect or incomplete data
The subtler danger: an agent running a complex workflow may hit a step that calls an external API as a side effect of something else - a publish step, a notification step - before the human has reviewed the output.
The fix:
Move webhook URLs out of CLAUDE.md and into environment variables. Replace direct POST instructions with a human-approval gate: "draft the notification for human review, then send after confirmation."
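The fix above can be sketched in a few lines. This is an illustration under assumptions, not the author's setup: the environment variable name `SLACK_WEBHOOK_URL` and the helper names are hypothetical, and the approval flag stands in for whatever review step you use. The point is that drafting and sending are separate functions, and the URL never appears in CLAUDE.md.

```python
import json
import os
import urllib.request
from typing import Callable, Optional

# Hypothetical variable name; the URL lives in the environment, not in CLAUDE.md.
WEBHOOK_ENV_VAR = "SLACK_WEBHOOK_URL"


def draft_notification(text: str) -> dict:
    """Build the payload for human review without sending anything."""
    return {"text": text}


def post_json(url: str, payload: dict) -> int:
    """POST a JSON payload and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def send_if_approved(
    payload: dict,
    approved: bool,
    sender: Callable[[str, dict], int] = post_json,
) -> Optional[int]:
    """Send only after explicit approval; return None when nothing was sent."""
    if not approved:
        return None
    url = os.environ.get(WEBHOOK_ENV_VAR)
    if not url:
        raise RuntimeError(f"{WEBHOOK_ENV_VAR} is not set")
    return sender(url, payload)
```

The `sender` parameter is injectable so the gate can be exercised without touching the network.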
Pattern 3: Hook Bypass Flags
What it looks like in CLAUDE.md:
If a commit fails due to hooks, use --no-verify to bypass them and push anyway.
When tests block deployment, use --no-verify to proceed.
Why it's dangerous:
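Git hooks are your safety net. Pre-commit hooks catch sensitive data before it's committed, and pre-push hooks can block a bad push before it leaves your machine. --no-verify skips these client-side hooks (server-side pre-receive hooks still run, but they are often the last line of defense rather than the first). When you tell your agent to use --no-verify, you're telling it to cut the safety net if it gets in the way.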
An agent running autonomously that encounters a hook failure will follow instructions: bypass the hook. If that hook was checking for exposed API keys, they're now in the repository.
What I found:
One of our PreToolUse hooks blocked a commit because it detected what looked like an API key in a config file (it was actually an example value, but the pattern matched). The agent's instructions said to use --no-verify if hooks failed. The commit went through. The "example" value was a real key.
The fix:
Remove --no-verify from CLAUDE.md entirely. Instead: "If hooks fail, stop and wait for human review." Hooks exist for a reason; bypassing them is almost never the right call in automated sessions.
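Finding these instructions is a simple text search. As a rough sketch (the function name is hypothetical and this is not how the scanner described later is implemented), the following reports every line of a CLAUDE.md that mentions the bypass flag:

```python
def find_bypass_instructions(
    claude_md: str, flag: str = "--no-verify"
) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for every line that mentions the flag."""
    hits = []
    for number, line in enumerate(claude_md.splitlines(), start=1):
        if flag in line:
            hits.append((number, line.strip()))
    return hits
```

It flags any mention, including a prohibition such as "never use --no-verify", so the hits still need a human to read them.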
Pattern 4: Destructive File Deletion
What it looks like in CLAUDE.md:
Clean up temporary files and build artifacts using rm -rf when they're no longer needed.
Remove old backup directories to free up space.
Why it's dangerous:
rm -rf is immediate and irreversible. Unlike git operations, there's no recovery path if the agent's definition of "temporary" or "old backup" doesn't match yours.
An agent asked to "clean up the project directory" will follow its instructions literally. If the instructions say it can use rm -rf, it will.
The rule I follow:
rm -rf is never in my CLAUDE.md. When cleanup is needed, the agent proposes what to delete, I review it, and then confirm. This adds one human checkpoint for any irreversible deletion.
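The propose-review-confirm flow can be sketched as two separate steps. This is a minimal illustration with hypothetical function names, assuming cleanup targets are described by glob patterns: the first function only lists candidates, and the second deletes nothing that a human has not explicitly confirmed.

```python
import shutil
from pathlib import Path


def propose_deletions(root: Path, patterns: list[str]) -> list[Path]:
    """List paths under root matching the glob patterns. Deletes nothing."""
    matches: set[Path] = set()
    for pattern in patterns:
        matches.update(root.rglob(pattern))
    return sorted(matches)


def delete_confirmed(proposed: list[Path], confirmed: set[Path]) -> list[Path]:
    """Delete only paths that were both proposed and confirmed by a human."""
    deleted = []
    for path in proposed:
        # Skip anything unconfirmed, or already removed with a parent directory.
        if path not in confirmed or not path.exists():
            continue
        if path.is_dir() and not path.is_symlink():
            shutil.rmtree(path)
        else:
            path.unlink()
        deleted.append(path)
    return deleted
```

Keeping the listing step free of side effects means the agent can run it as often as it likes; only the second call is gated.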
Pattern 5: Automatic Git Commit Without Approval
What it looks like in CLAUDE.md:
After completing each task, stage all changes and commit with a descriptive message.
Use git add -A to ensure all files are included.
Why it's dangerous:
git add -A stages every change in the working tree that .gitignore doesn't exclude - including .env files, credential files, and temporary files with sensitive data whenever they aren't ignored or are already tracked. Combined with "commit automatically," this creates a pipeline that can expose secrets in the commit history even when .gitignore looks correct, because .gitignore doesn't protect files that are already tracked.
The compounding problem: once sensitive data is in git history, removing it requires a history rewrite that affects everyone who has cloned the repository.
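One way to add a checkpoint before an automatic commit is to inspect the staged file names first. The sketch below is an assumption-laden example, not part of the author's kit: the pattern list is illustrative, the function name is hypothetical, and the input is expected to be the list of paths printed by `git diff --cached --name-only`.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Illustrative patterns only; extend them for your own project.
SENSITIVE_PATTERNS = [".env", ".env.*", "*.pem", "*.key", "credentials*"]


def flag_sensitive_paths(staged_paths: list[str]) -> list[str]:
    """Return staged paths whose file name matches a sensitive pattern."""
    flagged = []
    for path in staged_paths:
        name = PurePosixPath(path).name
        if any(fnmatchcase(name, pattern) for pattern in SENSITIVE_PATTERNS):
            flagged.append(path)
    return flagged
```

If the returned list is non-empty, the commit should wait for a human. A name-based check like this does not look at file contents, so it complements a secret-scanning hook rather than replacing one.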
Checking Your Config (30 Seconds, No Install)
These five patterns are easy to miss - they're instructions that seem reasonable in isolation but become risky in combination with autonomous operation.
The scanner I built checks your CLAUDE.md for all ten high-risk patterns (including the five above), scores your setup on a 0-19 risk scale (higher = more risk), and tells you exactly what to fix:
Try the scanner here - paste your CLAUDE.md, get your score. Fully client-side: your config never leaves your browser.
The scan is read-only. Nothing is installed. It takes about 30 seconds.
What the Scanner Checks
The scanner evaluates ten patterns across three severity levels:
| Risk Level | Patterns | Points Each |
|---|---|---|
| CRITICAL | Irreversible git ops, automated POST, hook bypass | +3 |
| HIGH | rm -rf, git add -A, auto-commit without approval | +2 |
| MEDIUM | Hardcoded secrets, overly permissive settings | +1 |
Total score range: 0-19. Lower is safer.
0: LOW - All clear, well-protected
1-5: MODERATE - Some gaps, but manageable
6-10: HIGH - Multiple safety holes
11-19: CRITICAL - Running without guardrails
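The scoring arithmetic is easy to reproduce. The following is a sketch of the point values and bands listed above, not the scanner's actual code; the function names are hypothetical, and each finding is represented simply by its severity label.

```python
# Points per matched pattern, taken from the table above.
SEVERITY_POINTS = {"CRITICAL": 3, "HIGH": 2, "MEDIUM": 1}


def total_score(findings: list[str]) -> int:
    """Sum points for a list of severity labels, one label per matched pattern."""
    return sum(SEVERITY_POINTS[severity] for severity in findings)


def risk_level(score: int) -> str:
    """Map a 0-19 score to the bands listed above."""
    if not 0 <= score <= 19:
        raise ValueError("score must be between 0 and 19")
    if score == 0:
        return "LOW"
    if score <= 5:
        return "MODERATE"
    if score <= 10:
        return "HIGH"
    return "CRITICAL"
```

For example, a config with one force-push instruction (CRITICAL, +3) and one rm -rf instruction (HIGH, +2) scores 5, which is the top of the MODERATE band; adding a single MEDIUM finding moves it to HIGH.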
Going Further
The scanner tells you what's wrong. Fixing it requires replacing risky instructions with safer alternatives - specific patterns for human-approval gates, hook configurations, and .gitignore templates that actually cover what git add -A might catch.
That's what the CC-Codex Ops Kit ($14.99) covers: a hardened CLAUDE.md template with all five patterns already handled, plus the PreToolUse hooks that intercept dangerous operations before they execute.
But start with the scan. Know your score first.
Related:
- 4 Hooks That Let Claude Code Run Autonomously - the hook setup that prevents most of these patterns
- I Ran a Safety Scan on My Claude Code Setup - the original risk-score diagnostic
Free Tools for Claude Code Operators
| Tool | What it does |
|---|---|
| cc-health-check | 20-check setup diagnostic (CLI + web) |
| cc-session-stats | Usage analytics from session data |
| cc-audit-log | Human-readable audit trail |
| cc-cost-check | Cost per commit calculator |
Interactive: Are You Ready for an AI Agent? - 10-question readiness quiz | 50 Days of AI - the raw data
More tools: Dev Toolkit - 56 free browser-based tools for developers. JSON, regex, colors, CSS, SQL, and more. All single HTML files, no signup.