ShipWithAI

Posted on • Originally published at shipwithai.io

Beyond CLAUDE.md: 5 Layers Your AI Agent Harness Is Missing

Most developers stop at CLAUDE.md. That's layer 1. A production Claude Code harness needs 5 layers: memory, tools, permissions, hooks, and observability. Here's the full setup guide.

Claude Code harness has 5 layers:

  1. Memory — CLAUDE.md, MEMORY.md, .claude/commands/
  2. Tools — MCP servers (sweet spot: 2–3)
  3. Permissions — settings.json allow/deny lists
  4. Hooks — PreToolUse/PostToolUse verification
  5. Observability — Decision logging, cost tracking, anomaly detection

Most developers only have layer 1. Setup order: 1→4→2→3→5 (guardrails before capabilities).

Why? Because LangChain gained +13.7 benchmark points from harness changes alone — jumping from 52.8% to 66.5% on the same model.


Layer 1: Memory (The Foundation)

Your CLAUDE.md is the project rules file. Claude Code loads it into context automatically, so its rules apply to every prompt.

What goes in memory:

  • CLAUDE.md — 40–60 lines max. Project context, conventions, constraints.
  • MEMORY.md — Long-term learning. "We discovered X fails without Y."
  • .claude/commands/ — Reusable prompt templates as commands.
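Entries in `.claude/commands/` are plain markdown files; the filename becomes the slash command. A sketch with a hypothetical `/review` command (the checklist content is made up for illustration):

```shell
# Scaffold a reusable prompt template; review.md becomes the /review command
mkdir -p .claude/commands
cat > .claude/commands/review.md <<'EOF'
Review the staged diff for:
1. /lib changes missing tests
2. Commit messages that aren't conventional commits
3. Accidental edits to .env or node_modules
EOF
```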

The ETH Zurich finding: CLAUDE.md alone caps improvement at ~4%. It's necessary but not sufficient.

The HumanLayer benchmark: Teams keeping CLAUDE.md under 60 lines saw better compliance than those writing 200-line manifestos. Shorter = clearer.
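One way to keep that budget honest is a small guard you can call from a pre-commit hook. A sketch (the 60-line threshold comes from the benchmark above; adjust to taste):

```shell
# Fail if the rules file has grown past the 60-line budget
check_claude_md() {
  local lines
  lines=$(wc -l < "$1")
  if (( lines > 60 )); then
    echo "$1 is $lines lines (budget: 60), trim it"
    return 1
  fi
  echo "$1 OK ($lines lines)"
}

# e.g. in .git/hooks/pre-commit:  check_claude_md CLAUDE.md || exit 1
```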

# Example CLAUDE.md structure

## Project Identity
- Framework: Next.js 15 + TypeScript
- Package manager: pnpm
- Architecture: API routes + React components

## You Are
- A full-stack developer shipping features
- Opinionated about patterns: prefer hooks > HOCs
- Balancing speed with maintainability

## Rules
1. Always include tests when modifying /lib
2. Use conventional commits for all commits
3. If suggesting breaking changes, warn first
4. Database migrations need rollback logic

## Code Conventions
- Folder structure: /pages, /components, /lib, /styles
- Component naming: PascalCase for React files
- API routes: camelCase for endpoint handlers

## What NOT to do
- Don't refactor without atomic commits
- Don't add dependencies without checking bundle impact
- Don't commit .env files

Layer 2: Tools (Adding Capability)

Tools are MCP servers. Claude uses them to read files, run commands, query databases.

The HumanLayer finding: Too many tools cause agent confusion. Each tool is context overhead. Sweet spot: 2–3 MCP servers per project.

Not 20. Not "all available servers."

Which 2–3 tools?

  1. Filesystem tool — read/write/execute (almost always)
  2. One domain-specific tool — database, API, CLI
  3. Optional: Observability tool — logs, metrics

Example for a Next.js project:

  • Filesystem (built-in)
  • PostgreSQL client (query → fix migrations)
  • GitHub API (check PR status → adjust approach)

More tools = more tokens + more decision fatigue for Claude.
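Project-scoped servers can be declared in a `.mcp.json` checked into the repo, so teammates get the same toolset on checkout. A sketch using the reference MCP server packages (the connection string is a placeholder):

```shell
# Declare 2 project-scoped MCP servers in .mcp.json
cat > .mcp.json <<'EOF'
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    }
  }
}
EOF
```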


Layer 3: Permissions (The Guardrails)

Permissions live in .claude/settings.json. They specify exactly what Claude is allowed to do.

Allowlist over denylist. It's safer to say "Claude can only modify these files" than "Claude cannot do X."

{
  "permissions": {
    "allow": [
      "Edit(src/**)",
      "Edit(public/**)",
      "Edit(.env.local)",
      "Bash(npm run test)",
      "Bash(npm run build)"
    ],
    "deny": [
      "Edit(node_modules/**)",
      "Edit(build/**)",
      "Read(.env)",
      "Bash(rm -rf:*)",
      "Bash(sudo:*)"
    ]
  }
}

Why this matters:

  • Claude won't accidentally delete node_modules (been there)
  • Can't run destructive commands without review
  • Enforced at runtime, not a suggestion

Check .claude/settings.json into git. This becomes part of your project's DNA.
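Because the file is enforced at runtime, a malformed edit silently weakens it. A cheap guard is to parse it before committing (a sketch, assuming `python3` is on the PATH):

```shell
# Refuse to proceed if the permissions file is not valid JSON
check_settings() {
  if python3 -m json.tool "$1" > /dev/null 2>&1; then
    echo "$1 parses cleanly"
  else
    echo "$1 is missing or malformed"
    return 1
  fi
}

# e.g. in a pre-commit hook:  check_settings .claude/settings.json || exit 1
```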


Layer 4: Hooks (Deterministic Enforcement)

Hooks are the most powerful layer. They run before and after Claude uses tools.

PreToolUse hook: Intercept tool calls, validate them, reject bad ones.
PostToolUse hook: Inspect results, catch anomalies, trigger alerts.

Boris Cherny of Anthropic calls verification "the most important thing" for quality. Hooks are that verification.

#!/bin/bash
# Runs before every tool use

TOOL=$1
PARAMS=$2

case "$TOOL" in
  "filesystem_write")
    if echo "$PARAMS" | grep -Eq 'node_modules|\.git|\.env'; then
      echo "REJECTED: Protected path"
      exit 1
    fi
    ;;
  "command_execute")
    # Regex metacharacters escaped: matches "rm -rf" and the classic fork bomb
    if echo "$PARAMS" | grep -Eq 'rm -rf|:\(\)\{ :\|:'; then
      echo "REJECTED: Dangerous command"
      exit 1
    fi
    ;;
esac

echo "APPROVED"
exit 0
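Before wiring the script in, it's worth exercising the rejection logic in isolation. A standalone sketch of the same pattern (the JSON params are made up):

```shell
# Same protected-path check as the hook, as a directly testable function
validate_write() {
  if echo "$1" | grep -Eq 'node_modules|\.git|\.env'; then
    echo "REJECTED: Protected path"
    return 1
  fi
  echo "APPROVED"
}

validate_write '{"path": "node_modules/x.js"}'   # prints: REJECTED: Protected path
validate_write '{"path": "src/index.tsx"}'       # prints: APPROVED
```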
#!/bin/bash
# Runs after every tool use

TOOL=$1
RESULT=$2
DURATION=$3

if (( DURATION > 30 )); then
  echo "⚠️  Slow tool: $TOOL took ${DURATION}s"
fi

if echo "$RESULT" | grep -qiE 'error|failed|undefined'; then
  echo "🔴 Tool failed: $(echo "$RESULT" | head -n 20)"
fi

Where to set hooks:

  • .claude/hooks/pre-tool-use.sh
  • .claude/hooks/post-tool-use.sh

Hooks can't be talked around the way prompt instructions can. They're enforcement, not suggestion.
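In current Claude Code releases, hooks are registered in `.claude/settings.json` under a `hooks` key, with regex matchers over the built-in tool names (`Write`, `Edit`, `Bash`, etc.), and they receive tool details as JSON on stdin rather than positional arguments, so check the hooks docs for your version. A registration sketch (merge it with your permissions block):

```shell
# Register the two hook scripts; matchers are regexes over tool names
mkdir -p .claude/hooks
cat > .claude/settings.json <<'EOF'
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write|Edit|Bash",
        "hooks": [{ "type": "command", "command": ".claude/hooks/pre-tool-use.sh" }]
      }
    ],
    "PostToolUse": [
      {
        "matcher": ".*",
        "hooks": [{ "type": "command", "command": ".claude/hooks/post-tool-use.sh" }]
      }
    ]
  }
}
EOF
```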


Layer 5: Observability (Learning from Decisions)

Observability means: logging decisions, tracking costs, detecting anomalies.

What to log:

  • Which tools Claude called and why
  • Tokens used per session (cost tracking)
  • Time spent on each decision
  • Failures and retries

The HumanLayer insight: Surface only failures, not 4,000 lines of passing tests.

Most developers log everything. Better: log strategically.

#!/bin/bash
# Log Claude's decisions (TOOL, STATUS, TOKENS, DURATION set by the calling hook)

mkdir -p .claude/logs
echo "$(date '+%Y-%m-%d %H:%M:%S') | Tool: $TOOL | Status: $STATUS | Tokens: $TOKENS | Duration: ${DURATION}s" >> .claude/logs/decisions.log

# Sum the Tokens field (the last field of each line is duration, not tokens)
TOTAL_TOKENS=$(awk -F'Tokens: ' 'NF > 1 { split($2, a, " "); sum += a[1] } END { print sum + 0 }' .claude/logs/decisions.log)

# Rough estimate; swap in your model's actual per-million-token rate
TOTAL_COST=$(echo "$TOTAL_TOKENS * 3 / 1000000" | bc -l)
if (( $(echo "$TOTAL_COST > 5.00" | bc -l) )); then
  echo "💰 Cost alert: $TOTAL_COST USD today"
fi

ERROR_COUNT=$(grep -c "FAILED" .claude/logs/decisions.log)
if (( ERROR_COUNT > 5 )); then
  echo "🚨 High error rate: $ERROR_COUNT failures logged"
fi
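Once the log exists, simple one-liners answer "what is Claude actually doing?". A sketch (the sample entries stand in for real hook output):

```shell
# Sample entries in the format the logging hook writes
mkdir -p .claude/logs
cat > .claude/logs/decisions.log <<'EOF'
2025-06-01 10:00:00 | Tool: filesystem_write | Status: OK | Tokens: 1200 | Duration: 2s
2025-06-01 10:00:05 | Tool: command_execute | Status: FAILED | Tokens: 300 | Duration: 31s
2025-06-01 10:00:09 | Tool: filesystem_write | Status: OK | Tokens: 800 | Duration: 1s
EOF

# Calls per tool, most frequent first
awk -F' \\| ' '{ split($2, t, ": "); count[t[2]]++ }
  END { for (tool in count) print count[tool], tool }' \
  .claude/logs/decisions.log | sort -rn
# prints:
#   2 filesystem_write
#   1 command_execute
```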

Setup Order Matters: 1 → 4 → 2 → 3 → 5

Why not 1 → 2 → 3 → 4 → 5?

Wrong order: Capabilities before guardrails

  1. Build CLAUDE.md ✅
  2. Add 10 MCP servers ⚠️
  3. Grant all permissions ⚠️
  4. No hooks (too late, broke things already)
  5. Now add observability (chaos already happened)

Right order: Guardrails first

  1. Build CLAUDE.md ✅ (memory/rules)
  2. Add hooks ✅ (enforcement before tools exist)
  3. Add 2–3 MCP servers ✅ (now hooks guard them)
  4. Restrict permissions ✅ (layered safety)
  5. Add observability ✅ (track what's working)

Adding hooks after tools is like adding seatbelts after the crash.


Production-Ready Harness: 10-Item Checklist

  • [ ] CLAUDE.md exists, 40–60 lines, checked into git
  • [ ] MEMORY.md set up with "lessons learned"
  • [ ] .claude/commands/ has 3+ reusable prompts
  • [ ] Max 3 MCP servers chosen and documented
  • [ ] settings.json has allowlist (filesystem, execution)
  • [ ] .claude/hooks/pre-tool-use.sh validates calls
  • [ ] .claude/hooks/post-tool-use.sh inspects results
  • [ ] .claude/logs/ directory exists + observability hook running
  • [ ] Cost tracking implemented (tokens/session)
  • [ ] Team knows where each file lives + how to update it
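A throwaway script can grade most of this checklist automatically. A sketch covering the file-existence items:

```shell
# Audit the harness: check that the checklist's key files exist
audit_harness() {
  local f
  for f in CLAUDE.md MEMORY.md .claude/settings.json .mcp.json \
           .claude/hooks/pre-tool-use.sh .claude/hooks/post-tool-use.sh \
           .claude/logs; do
    if [ -e "$f" ]; then echo "✅ $f"; else echo "❌ $f missing"; fi
  done
}

audit_harness
```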

FAQ

Which layer do I need first?
Layer 1 (CLAUDE.md). Everything depends on clear memory. Start there.

Does this harness slow down Claude Code?
Barely. Hooks add ~100–300ms per tool use, a worthwhile trade for the safety. Observability logging has negligible cost.

What are the most important hooks?
PreToolUse (validation) and PostToolUse (anomaly detection). Those two prevent 80% of issues.

How many MCP servers is "too many"?
More than 5 becomes noise. More than 3 means you're probably adding tools you won't use. Start with 1–2, add more only when they solve a real workflow problem.

Can I skip permissions and just use hooks?
Technically yes, but no. Permissions are defense-in-depth. Hooks catch mistakes. Permissions prevent them.

How do I update CLAUDE.md over time?
Document it in MEMORY.md. "We added this rule because X failed." Over time, CLAUDE.md stabilizes.


Originally published on ShipWithAI. I write about Claude Code workflows, AI-assisted development, and building production systems with AI. Full blog + templates at shipwithai.io.

What's your harness score? Drop it in the comments. Do you have all 5 layers, or are you still at layer 1?
