Most developers stop at CLAUDE.md. That's layer 1. A production Claude Code harness needs 5 layers: memory, tools, permissions, hooks, and observability. Here's the full setup guide.
A Claude Code harness has 5 layers:
- Memory — CLAUDE.md, MEMORY.md, .claude/commands/
- Tools — MCP servers (sweet spot: 2–3)
- Permissions — settings.json allow/deny lists
- Hooks — PreToolUse/PostToolUse verification
- Observability — Decision logging, cost tracking, anomaly detection
Most developers only have layer 1. Setup order: 1→4→2→3→5 (guardrails before capabilities).
Why invest in the harness at all? Because LangChain gained 13.7 benchmark points from harness changes alone, jumping from 52.8% to 66.5% on the same model.
Layer 1: Memory (The Foundation)
Your CLAUDE.md is the project rules file. Claude Code loads it into context automatically, so its rules are in front of the model on every prompt.
What goes in memory:
- CLAUDE.md — 40–60 lines max. Project context, conventions, constraints.
- MEMORY.md — Long-term learning. "We discovered X fails without Y."
- .claude/commands/ — Reusable prompt templates as commands.
The ETH Zurich finding: CLAUDE.md alone caps improvement at ~4%. It's necessary but not sufficient.
The HumanLayer benchmark: Teams keeping CLAUDE.md under 60 lines saw better compliance than those writing 200-line manifestos. Shorter = clearer.
# Example CLAUDE.md structure
## Project Identity
- Framework: Next.js 15 + TypeScript
- Package manager: pnpm
- Architecture: API routes + React components
## You Are
- A full-stack developer shipping features
- Opinionated about patterns: prefer hooks > HOCs
- Balancing speed with maintainability
## Rules
1. Always include tests when modifying /lib
2. Use conventional commits for all commits
3. If suggesting breaking changes, warn first
4. Database migrations need rollback logic
## Code Conventions
- Folder structure: /pages, /components, /lib, /styles
- Component naming: PascalCase for React files
- API routes: camelCase for endpoint handlers
## What NOT to do
- Don't refactor without atomic commits
- Don't add dependencies without checking bundle impact
- Don't commit .env files
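The 40–60 line budget is easy to drift past, so it can help to check it mechanically. Here is a minimal sketch, assuming you want a warning rather than a hard failure; `check_line_budget` is a hypothetical helper, not something Claude Code ships.

```bash
#!/bin/bash
# Sketch: warn when a memory file exceeds a line budget.
# check_line_budget is a hypothetical helper, not part of Claude Code.
check_line_budget() {
  local file="$1"
  local max="${2:-60}"
  local lines
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 2
  fi
  # tr strips the padding some wc implementations add
  lines=$(wc -l < "$file" | tr -d ' ')
  if [ "$lines" -gt "$max" ]; then
    echo "$file: $lines lines (budget: $max)"
    return 1
  fi
  echo "$file: $lines lines, within budget"
}
```

Running `check_line_budget CLAUDE.md 60` from a pre-commit script or CI job is one way to use it; the exit status is non-zero when the file is over budget.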
Layer 2: Tools (Adding Capability)
Tools are MCP servers. Claude uses them to read files, run commands, query databases.
The HumanLayer finding: Too many tools cause agent confusion. Each tool is context overhead. Sweet spot: 2–3 MCP servers per project.
Not 20. Not "all available servers."
Which 2–3 tools?
- Filesystem tool — read/write/execute (almost always)
- One domain-specific tool — database, API, CLI
- Optional: Observability tool — logs, metrics
Example for a Next.js project:
- Filesystem (built-in)
- PostgreSQL client (query → fix migrations)
- GitHub API (check PR status → adjust approach)
More tools = more tokens + more decision fatigue for Claude.
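If you want the 2–3 server cap to be more than a good intention, a small guard can flag drift. This is a rough sketch; `check_server_count` is a hypothetical helper, and the default cap of 3 simply reflects the guidance above.

```bash
#!/bin/bash
# Sketch: enforce a cap on configured MCP servers.
# check_server_count is a hypothetical helper; the default cap of 3
# comes from the "2-3 servers" guidance, not from any Claude Code limit.
check_server_count() {
  local count="$1"
  local max="${2:-3}"
  if [ "$count" -gt "$max" ]; then
    echo "Too many MCP servers: $count configured, cap is $max"
    return 1
  fi
  echo "MCP servers: $count of $max"
}
```

How you obtain the count depends on where your servers are declared. Assuming they live in a project-level `.mcp.json` under an `mcpServers` key and `jq` is installed, something like `check_server_count "$(jq '.mcpServers | length' .mcp.json)"` would feed it the real number.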
Layer 3: Permissions (The Guardrails)
Permissions live in settings.json. Specify exactly what Claude is allowed to do.
Allowlist over denylist. It's safer to say "Claude can only modify these files" than "Claude cannot do X."
{
  "permissions": {
    "filesystem": {
      "allow": [
        "/src/**",
        "/public/**",
        "*.config.js",
        ".env.local"
      ],
      "deny": [
        "/node_modules/**",
        "/.git/**",
        "/build/**",
        ".env"
      ]
    },
    "execution": {
      "allow": ["npm run test", "npm run build"],
      "deny": ["rm -rf", "sudo *"]
    }
  }
}
Why this matters:
- Claude won't accidentally delete node_modules (been there)
- Can't run destructive commands without review
- Enforced at runtime, not a suggestion
Check settings.json into git. This becomes part of your project's DNA.
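To make "allowlist over denylist" concrete, here is a minimal sketch of default-deny path checking in shell. It is not how Claude Code evaluates permissions internally; `is_path_allowed` is a hypothetical helper whose patterns mirror the example config above.

```bash
#!/bin/bash
# Sketch of default-deny path checking: deny rules win, then the allowlist,
# and anything unmatched is rejected. is_path_allowed is a hypothetical
# helper; the patterns mirror the settings.json example.
is_path_allowed() {
  local path="${1#/}"   # treat "/src/x" and "src/x" the same
  case "$path" in
    node_modules/*|.git/*|build/*|.env) return 1 ;;     # explicit deny
  esac
  case "$path" in
    src/*|public/*|*.config.js|.env.local) return 0 ;;  # explicit allow
  esac
  return 1   # not on the allowlist: reject by default
}
```

The final `return 1` is the point: a path like `README.md` appears on neither list and is still rejected, which is what a pure denylist would get wrong.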
Layer 4: Hooks (Deterministic Enforcement)
Hooks are the most powerful layer. They run before and after Claude uses tools.
PreToolUse hook: Intercept tool calls, validate them, reject bad ones.
PostToolUse hook: Inspect results, catch anomalies, trigger alerts.
Boris Cherny of Anthropic calls verification "the most important thing" for quality. Hooks are that verification.
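#!/bin/bash
# Runs before every tool use.
# Assumes the tool name arrives as $1 and its parameters as $2.
TOOL=$1
PARAMS=$2
case "$TOOL" in
  "filesystem_write")
    # Block writes that touch dependencies, git internals, or env files
    if echo "$PARAMS" | grep -qE "(node_modules|\.git|\.env)"; then
      echo "REJECTED: Protected path"
      exit 1
    fi
    ;;
  "command_execute")
    # Fixed-string match (-F): the fork-bomb pattern contains regex metacharacters
    if echo "$PARAMS" | grep -qF -e "rm -rf" -e ":(){ :|:"; then
      echo "REJECTED: Dangerous command"
      exit 1
    fi
    ;;
esac
echo "APPROVED"
exit 0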
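#!/bin/bash
# Runs after every tool use.
# Assumes $1 is the tool name, $2 its output, and $3 the duration in whole seconds.
TOOL=$1
RESULT=$2
DURATION=${3:-0}
if (( DURATION > 30 )); then
  echo "⚠️ Slow tool: $TOOL took ${DURATION}s"
fi
# -q keeps matched lines out of the hook's own output
if echo "$RESULT" | grep -qiE "error|failed|undefined"; then
  # Quote $RESULT so line breaks survive and head can truncate to 20 lines
  echo "🔴 Tool failed: $(echo "$RESULT" | head -n 20)"
fi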
Where to set hooks:
- .claude/hooks/pre-tool-use.sh
- .claude/hooks/post-tool-use.sh
Hooks run on every tool call and can't be bypassed by a prompt. They're enforcement, not a suggestion.
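Since hooks are the enforcement layer, it is worth smoke-testing them like any other code. Below is a minimal sketch that runs a hook with sample inputs and checks the exit code. `expect_exit` is a hypothetical helper, and it assumes the `$1`/`$2` calling convention used by the scripts above.

```bash
#!/bin/bash
# Sketch: smoke-test a hook script by checking its exit code.
# expect_exit is a hypothetical helper; it assumes the hook takes the
# tool name as $1 and its parameters as $2.
expect_exit() {
  local expected="$1" hook="$2" tool="$3" params="$4"
  local actual=0
  bash "$hook" "$tool" "$params" >/dev/null 2>&1 || actual=$?
  if [ "$actual" -eq "$expected" ]; then
    echo "ok: $tool '$params' -> exit $actual"
  else
    echo "FAIL: $tool '$params' -> exit $actual, expected $expected"
    return 1
  fi
}
```

For example, `expect_exit 1 .claude/hooks/pre-tool-use.sh filesystem_write ".env"` should pass if the pre-tool hook rejects writes to env files, and `expect_exit 0 .claude/hooks/pre-tool-use.sh filesystem_write "src/index.ts"` should pass for an ordinary source file.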
Layer 5: Observability (Learning from Decisions)
Observability means: logging decisions, tracking costs, detecting anomalies.
What to log:
- Which tools Claude called and why
- Tokens used per session (cost tracking)
- Time spent on each decision
- Failures and retries
The HumanLayer insight: Surface only failures, not 4,000 lines of passing tests.
Most developers log everything. Better: log strategically.
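#!/bin/bash
# Log Claude's decisions.
# Assumes TOOL, STATUS, TOKENS, and DURATION are set by the calling hook.
LOG=.claude/logs/decisions.log
TODAY=$(date '+%Y-%m-%d')
mkdir -p "$(dirname "$LOG")"
echo "$(date '+%Y-%m-%d %H:%M:%S') | Tool: $TOOL | Status: $STATUS | Tokens: $TOKENS | Duration: ${DURATION}s" >> "$LOG"
# Sum today's tokens (4th pipe-separated field) and convert to USD.
# USD_PER_MTOK is a placeholder rate; set it to your model's actual pricing.
USD_PER_MTOK=${USD_PER_MTOK:-3.00}
TOTAL_COST=$(grep "^$TODAY" "$LOG" | awk -F' [|] ' -v rate="$USD_PER_MTOK" '{sub(/^Tokens: /, "", $4); sum += $4} END {printf "%.2f", sum * rate / 1000000}')
if (( $(echo "$TOTAL_COST > 5.00" | bc -l) )); then
  echo "💰 Cost alert: $TOTAL_COST USD today"
fi
# Count today's failures (an absolute count, not a rate)
FAILURES=$(grep "^$TODAY" "$LOG" | grep -c "FAILED")
if (( FAILURES > 5 )); then
  echo "🚨 High error count: $FAILURES failures today"
fi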
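The "surface only failures" advice can be applied with a small filter in front of noisy tool output. This is a sketch under assumptions: `failures_only` is a hypothetical helper, and the `FAIL`/`ERROR` keywords are a guess at your test runner's output format, so adjust them to match.

```bash
#!/bin/bash
# Sketch: keep only failure lines from noisy output, plus a one-line summary.
# failures_only is a hypothetical helper; the FAIL/ERROR keywords are an
# assumption about the test runner's output format.
failures_only() {
  local input total failed
  input=$(cat)
  total=$(printf '%s\n' "$input" | wc -l | tr -d ' ')
  # grep exits non-zero on zero matches; "|| true" keeps that from aborting callers
  failed=$(printf '%s\n' "$input" | grep -cE 'FAIL|ERROR' || true)
  printf '%s\n' "$input" | grep -E 'FAIL|ERROR' || true
  echo "$failed failing of $total lines"
}
```

Piping a test run through it, for example `npm run test 2>&1 | failures_only`, leaves the failing lines and a count instead of the full log.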
Setup Order Matters: 1 → 4 → 2 → 3 → 5
Why not 1 → 2 → 3 → 4 → 5?
Wrong order: Capabilities before guardrails
- Build CLAUDE.md ✅
- Add 10 MCP servers ⚠️
- Grant all permissions ⚠️
- No hooks (too late, broke things already)
- Now add observability (chaos already happened)
Right order: Guardrails first
- Build CLAUDE.md ✅ (memory/rules)
- Add hooks ✅ (enforcement before tools exist)
- Add 2–3 MCP servers ✅ (now hooks guard them)
- Restrict permissions ✅ (layered safety)
- Add observability ✅ (track what's working)
Adding hooks after tools is like adding seatbelts after the crash.
Production-Ready Harness: 10-Item Checklist
- [ ] CLAUDE.md exists, 40–60 lines, checked into git
- [ ] MEMORY.md set up with "lessons learned"
- [ ] .claude/commands/ has 3+ reusable prompts
- [ ] Max 3 MCP servers chosen and documented
- [ ] settings.json has allowlist (filesystem, execution)
- [ ] .claude/hooks/pre-tool-use.sh validates calls
- [ ] .claude/hooks/post-tool-use.sh inspects results
- [ ] .claude/logs/ directory exists + observability hook running
- [ ] Cost tracking implemented (tokens/session)
- [ ] Team knows where each file lives + how to update it
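The file-based items on this checklist can be scored automatically. Here is a minimal sketch, assuming the paths used in this guide; `harness_score` is a hypothetical helper, and `.claude/settings.json` as the settings location is an assumption you may need to adjust. It covers only the seven items that leave a file or directory behind, so server count, cost tracking, and team knowledge still need a human check.

```bash
#!/bin/bash
# Sketch: count how many harness artifacts from the checklist exist.
# harness_score is a hypothetical helper, and the settings path
# (.claude/settings.json) is an assumption; adjust paths to your layout.
harness_score() {
  local root="${1:-.}"
  local score=0
  local item
  for item in CLAUDE.md MEMORY.md .claude/commands .claude/settings.json \
              .claude/hooks/pre-tool-use.sh .claude/hooks/post-tool-use.sh \
              .claude/logs; do
    if [ -e "$root/$item" ]; then
      score=$((score + 1))
    else
      echo "missing: $item" >&2   # details go to stderr, the score to stdout
    fi
  done
  echo "$score/7"
}
```

Running `harness_score .` from the project root prints the score on stdout and lists whatever is missing on stderr.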
FAQ
Which layer do I need first?
Layer 1 (CLAUDE.md). Everything depends on clear memory. Start there.
Does this harness slow down Claude Code?
Not noticeably. Hooks add ~100–300ms per tool use, which is worth it for the safety. Observability has negligible cost.
What are the most important hooks?
PreToolUse (validation) and PostToolUse (anomaly detection). Those two prevent 80% of issues.
How many MCP servers is "too many"?
More than 5 becomes noise. More than 3 means you're probably adding tools you won't use. Start with 1–2, add more only when they solve a real workflow problem.
Can I skip permissions and just use hooks?
Technically yes, but you shouldn't. Permissions are defense-in-depth: permissions prevent mistakes, and hooks catch the ones that get through.
How do I update CLAUDE.md over time?
Record the reason for each change in MEMORY.md: "We added this rule because X failed." Over time, CLAUDE.md stabilizes.
Originally published on ShipWithAI. I write about Claude Code workflows, AI-assisted development, and building production systems with AI. Full blog + templates at shipwithai.io.
What's your harness score? Drop it in the comments. Do you have all 5 layers, or are you still at layer 1?