Harness engineering is everything around your AI agent except the model: memory, tools, permissions, hooks, observability. LangChain gained 13.7 benchmark points by changing only the harness. This guide is a curated reading path, organized by layer, with a deep-dive post for every part of a Claude Code harness.
| Setup | What you get |
|---|---|
| Layer 1 only (what most devs have) | Advice the model may ignore |
| All 5 layers (Memory → Tools → Permissions → Hooks → Observability) | Enforcement the model cannot bypass |
LangChain jumped from 52.8% to 66.5% on Terminal Bench 2.0 by changing only the harness. Same model. 13.7 points of pure architecture gain (LangChain Blog, Feb 2026). Most Claude Code users stop at Layer 1. This guide is the reading path to the other four.
If you want the theory of harness engineering, read the pillar post. If you want the architecture deep-dive, read the 5 layers post. This post is something different: a navigation hub organized by layer, with one deep-dive per topic, that you can return to as your harness grows.
What is Claude Code harness engineering?
Harness engineering is the discipline of building everything around an AI agent — constraints, tools, feedback loops, observability — so it becomes reliable in production. For Claude Code, the harness is five layers: Memory (CLAUDE.md), Tools (MCP), Permissions (settings.json), Hooks (PreToolUse/PostToolUse), and Observability (session logs).
The formula: Agent = Model + Harness (Martin Fowler, Apr 2026).
The model is commodity. Every team on Sonnet 4.6 or Opus 4.7 gets the same raw capability. Your harness is what differentiates your team's output.
What are the 5 layers of a Claude Code harness?
| Layer | Purpose | Claude Code File |
|---|---|---|
| 1. Memory | What the agent knows | CLAUDE.md, MEMORY.md |
| 2. Tools | What it can reach | settings.json (MCP) |
| 3. Permissions | What it's allowed to do | settings.json allow/deny |
| 4. Hooks | What's enforced at runtime | PreToolUse/PostToolUse |
| 5. Observability | What you can see afterward | Session logs, cost tracking |
Layer 1: What does your agent know before you type?
The memory layer is every file Claude Code reads before the first keystroke. CLAUDE.md holds your project rules. MEMORY.md holds the evolving state. Most developers ship only a CLAUDE.md and treat it as a wishlist of aspirations.
Your AI Agent Forgets Everything. Here's the Fix. — MEMORY.md is a 200-line index that Claude reads at session start. Setup takes 5 minutes. Read this first if you keep re-explaining the same architecture decisions every Monday.
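If you want to try the MEMORY.md idea before reading the full post, a scaffold takes one command. This is a minimal sketch: the `.claude/` location and the section names are illustrative assumptions, not a Claude Code requirement, so adapt them to however that post structures its index.

```shell
# Scaffold a minimal MEMORY.md index (location and sections are illustrative)
memory_file=".claude/MEMORY.md"
mkdir -p "$(dirname "$memory_file")"
cat > "$memory_file" <<'EOF'
# MEMORY.md: session-start index

## Architecture decisions
- 2026-01-12: Postgres over SQLite (multi-writer support)

## Open threads
- Staging migration still flaky; see scripts/migrate.sh

## Conventions Claude keeps forgetting
- API handlers live in src/api/, never src/routes/
EOF
echo "Wrote $(wc -l < "$memory_file") lines to $memory_file"
```

The point is the shape, not the contents: a short index of decisions and open threads that Claude reads at session start, so you stop re-explaining the same architecture every Monday.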
Your CLAUDE.md Is an Instruction File. It Should Be a Failure Log. — Mitchell Hashimoto's AGENTS.md in Ghostty has zero aspirational lines. Every entry traces to a real agent mistake. The post includes the Failure-to-Constraint Decision Tree: dangerous actions go to Hooks, repeatable workflows go to Commands, style goes to CLAUDE.md.
Layer 4: What can the agent NOT do?
Hooks are the enforcement layer. Memory is advice. Hooks are law. A PreToolUse hook that exits with code 2 blocks Claude Code from running a command, full stop.
```shell
# PreToolUse hook: 6 lines that save you from yourself
if [[ "$TOOL_INPUT" == *"DROP TABLE"* ]] && [[ "$ENV" == "production" ]]; then
  echo "BLOCKED: destructive SQL in production" >&2
  exit 2
fi
exit 0
```
Which Claude Code Hook Do You Need? A Decision Guide — The 4 handler types (Deny, Log, Transform, Enrich), when to reach for PreToolUse vs PostToolUse, and which 3 hooks every production setup should have.
A PreToolUse hook exiting with code 2 is the only mechanism in Claude Code that unconditionally blocks a tool call. Instructions in CLAUDE.md can still be overridden by context or model reasoning. Hooks cannot be bypassed.
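Not every hook blocks. Of the 4 handler types in the decision guide, the Log handler is the gentlest place to start, and a sketch fits in a few lines. One assumption here, mirroring the PreToolUse example above: tool details arrive as `TOOL_NAME` / `TOOL_INPUT` environment variables. Check the hook payload your Claude Code version actually passes before relying on this.

```shell
#!/usr/bin/env bash
# PostToolUse "Log" handler sketch: append every tool call to an audit log.
# Assumes TOOL_NAME / TOOL_INPUT env vars, as in the PreToolUse example above;
# verify the payload format against your installed Claude Code version.
log_file="${CLAUDE_TOOL_LOG:-$HOME/.claude/tool-log.tsv}"
mkdir -p "$(dirname "$log_file")"
# One tab-separated line per call: timestamp, tool name, raw input
printf '%s\t%s\t%s\n' \
  "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  "${TOOL_NAME:-unknown}" \
  "${TOOL_INPUT:-}" >> "$log_file"
# A Log handler never blocks: it always finishes with status 0
```

A log like this is also the cheapest on-ramp to Layer 5, since it gives you a record of what the agent touched without changing its behavior.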
Layer 5: How do you know what your agent actually did?
Observability turns "my agent did something weird" into a reproducible bug report. One of LangChain's three harness improvements was a verification middleware that made the agent check its own work before marking a task complete.
Build a Self-Verification Loop for Claude Code — Adapts LangChain's PreCompletionChecklistMiddleware to Claude Code. Boris Cherny (creator of Claude Code) calls verification "probably the most important thing" for quality.
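The core idea is adaptable without any middleware: run a checklist of commands before the agent is allowed to claim completion, and block (exit code 2, the same convention as in Layer 4) if any check fails. This is a sketch of that pattern under assumptions; the check commands are placeholders for your project's own test and lint steps, not anything LangChain or the linked post prescribes verbatim.

```shell
# Self-verification sketch: run a checklist; if any check fails, block
# completion so the agent goes back and fixes it before claiming "done".
run_checklist() {
  local cmd
  for cmd in "$@"; do
    if ! eval "$cmd" >/dev/null 2>&1; then
      echo "VERIFY FAILED: $cmd" >&2
      return 2   # same blocking code as a PreToolUse deny
    fi
  done
  return 0
}

# In a real hook script, end with your project's checks, for example:
#   run_checklist "npm test --silent" "npm run lint --silent" || exit 2
```

The design choice worth copying is the exit code: failing checks use the same blocking signal as Layer 4, so the agent treats an unverified task the same way it treats a forbidden command.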
LangChain's three improvements mapped to layers: context injection (Layer 1), self-verification loops (Layer 5), and compute allocation (Layer 5). No single change explained the full +13.7-point gain; the improvements only paid off working together.
Why does this actually work?
Three independent data points show constraints beating capability:
- LangChain: +13.7 on Terminal Bench 2.0 with harness changes only
- OpenAI Codex: ~1 million lines of production code, zero human-written lines over five months, all inside heavily constrained harness environments
- Mitchell Hashimoto's Ghostty: every AGENTS.md line is a prevented failure
The Constraint Paradox: Less AI Freedom, Better Code — Breaks down all three data points with benchmark tables and the counterintuitive finding that running at maximum reasoning budget scored worse (53.9%) than high (63.6%). Read this when someone says "we just need a smarter model."
Why does this matter for your career?
84% of developers use AI tools. Only 29% trust the output. That 55-point gap is the senior engineer's new job. One harness committed to version control multiplies across your whole team. Writing a great CLAUDE.md for 10 developers pays off more than writing 10,000 lines of code yourself.
Senior Engineers Don't Write Code. They Build Harnesses. — The career case with a harness review checklist for your next PR and the 4-era evolution of where senior engineers add value.
Where should you start reading?
Three paths based on where you are today:
New to harness engineering. Start with the pillar post for the definition, then the 5 layers post for the architecture. Come back here for your next deep-dive.
You have a CLAUDE.md and want more rigor. Read the memory fix post first to add MEMORY.md, then the failure-log pattern to rewrite your existing CLAUDE.md. Those two posts cover all of Layer 1.
Your agent has scared you at least once. Skip to the hook decision guide and ship one PreToolUse guard before your next session. Then read the constraint paradox for why this actually works.
FAQ
What is Claude Code harness engineering?
Harness engineering for Claude Code is configuring five layers around the model (Memory, Tools, Permissions, Hooks, Observability) to make the agent reliable in production. The model is commodity. The harness is your differentiator.
Do I need all 5 layers to start?
No. Start with Memory (CLAUDE.md + MEMORY.md) and Hooks (one PreToolUse guard). Those two cover the most common failure modes. Add the rest as your team scales or when a specific incident motivates it.
How is harness engineering different from prompt engineering?
Prompt engineering shapes what the agent tries. Context engineering shapes what the agent knows. Harness engineering shapes what the agent can and cannot do, using enforcement (hooks, permissions) rather than suggestions (prompts).
Does this only apply to Claude Code?
The principles apply to any AI coding agent. The implementation details (CLAUDE.md, PreToolUse hooks, MCP config) are Claude Code-specific. Claude Code offers the most programmable harness surface in the market today.
Try it now: Pick one path above, open the first linked post, copy one code block into your .claude/ folder, and run one Claude Code session with the change applied. The compound benefit starts on session #2.
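If you want a one-shot scaffold for that `.claude/` folder, the sketch below wires a single PreToolUse guard into `settings.json`. The hooks schema (a matcher plus a command entry) follows Claude Code's documented settings format, but treat the exact field names as something to verify against your installed version; the guard path and its `rm -rf` check are illustrative choices.

```shell
# Scaffold a minimal harness: one PreToolUse guard wired into settings.json.
# Field names follow Claude Code's hooks schema as documented; verify against
# your installed version. The guard path and check are illustrative.
mkdir -p .claude/hooks
cat > .claude/hooks/guard.sh <<'EOF'
#!/usr/bin/env bash
if [[ "$TOOL_INPUT" == *"rm -rf"* ]]; then
  echo "BLOCKED: recursive delete" >&2
  exit 2
fi
exit 0
EOF
chmod +x .claude/hooks/guard.sh
cat > .claude/settings.json <<'EOF'
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/guard.sh" }
        ]
      }
    ]
  }
}
EOF
python3 -m json.tool .claude/settings.json >/dev/null && echo "settings.json valid"
```

Commit the folder once it works: a harness in version control is what lets the whole team inherit it.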
Which layer would you add first? Drop it in the comments.
Originally published on ShipWithAI. I write about Claude Code workflows, AI-assisted development, and shipping software faster with structured AI.