<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ShipWithAI</title>
    <description>The latest articles on DEV Community by ShipWithAI (@shipwithaiio).</description>
    <link>https://dev.to/shipwithaiio</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3878878%2Fd66b5c8e-e12a-4e3c-bf3b-b04ed48b4def.png</url>
      <title>DEV Community: ShipWithAI</title>
      <link>https://dev.to/shipwithaiio</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shipwithaiio"/>
    <language>en</language>
    <item>
      <title>Harness Engineering: Why the System Around AI Matters More Than the AI Itself</title>
      <dc:creator>ShipWithAI</dc:creator>
      <pubDate>Mon, 20 Apr 2026 12:03:07 +0000</pubDate>
      <link>https://dev.to/shipwithaiio/harness-engineering-why-the-system-around-ai-matters-more-than-the-ai-itself-1o9i</link>
      <guid>https://dev.to/shipwithaiio/harness-engineering-why-the-system-around-ai-matters-more-than-the-ai-itself-1o9i</guid>
      <description>&lt;p&gt;Harness engineering is everything around your AI agent except the model: memory, tools, permissions, hooks, observability. LangChain gained 13.7 benchmark points by changing only the harness (52.8% to 66.5%, same model). Most developers only have Layer 1 (CLAUDE.md). Production needs all 5.&lt;/p&gt;




&lt;p&gt;Two lines of config. Same AI model. Completely different reliability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# CLAUDE.md approach (can be ignored)&lt;/span&gt;
&lt;span class="s2"&gt;"Never delete production database tables."&lt;/span&gt;
&lt;span class="c"&gt;# Claude reads this, weighs it against 200K tokens of context, may ignore it.&lt;/span&gt;

&lt;span class="c"&gt;# Hook approach (always enforced)&lt;/span&gt;
&lt;span class="c"&gt;# PreToolUse hook: command contains "DROP TABLE" + env=production → exit 2 → BLOCKED.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first is advice. The second is enforcement.&lt;/p&gt;

&lt;p&gt;One lives in a markdown file that competes with thousands of other tokens for the model's attention. The other is a shell script that runs before every matching tool call, outside the model's control, so the model cannot choose to skip it. The gap between these two approaches is the gap most teams don't know exists.&lt;/p&gt;

&lt;p&gt;That gap has a name now: &lt;strong&gt;harness engineering&lt;/strong&gt;.&lt;/p&gt;
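
&lt;p&gt;The hook from the comparison above can be sketched in a few lines of bash. This is a minimal illustration, not a production guard: the function name &lt;code&gt;check_payload&lt;/code&gt;, the &lt;code&gt;APP_ENV&lt;/code&gt; variable, and the sample payload shape are assumptions made for the example, and the match is a plain text search over the raw payload rather than real SQL parsing.&lt;/p&gt;

```shell
#!/bin/bash
# Sketch of a PreToolUse guard against destructive SQL in production.
# Assumptions (not from the article): the hook payload is passed in as a
# JSON string, and the environment name comes from a hypothetical APP_ENV.

# check_payload PAYLOAD ENV_NAME
# Returns 2 (block) if PAYLOAD mentions DROP TABLE while ENV_NAME is
# "production"; returns 0 (allow) otherwise.
check_payload() {
  local payload="$1" env_name="$2"
  if [[ "$env_name" == "production" ]]; then
    # Case-insensitive text match; tolerates extra whitespace between words.
    if printf '%s' "$payload" | grep -qiE 'drop[[:space:]]+table'; then
      echo "BLOCKED: DROP TABLE is not allowed in production" >/dev/stderr
      return 2
    fi
  fi
  return 0
}

# Demo with a hand-written sample payload (hypothetical shape).
sample='{"tool_name":"Bash","tool_input":{"command":"psql -c \"DROP TABLE users\""}}'
if check_payload "$sample" "production"; then
  echo "allowed"
else
  echo "blocked"
fi
```

&lt;p&gt;In a real hook script the payload would be read from stdin, for example &lt;code&gt;check_payload "$(cat)" "${APP_ENV:-development}"&lt;/code&gt;, and the script would exit with the function's return code so that exit code 2 blocks the tool call.&lt;/p&gt;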




&lt;h2&gt;
  
  
  What is harness engineering? (And why prompt engineering isn't enough)
&lt;/h2&gt;

&lt;p&gt;Harness engineering is the discipline of building constraints, tools, feedback loops, and observability around an AI agent to make it reliable in production. The formula, popularized by &lt;a href="https://blog.langchain.com/improving-deep-agents-with-harness-engineering/" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt; and refined on &lt;a href="https://martinfowler.com/articles/harness-engineering.html" rel="noopener noreferrer"&gt;Martin Fowler's site&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Agent = Model + Harness&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The model is a commodity. The harness is your competitive advantage.&lt;/p&gt;

&lt;p&gt;Mitchell Hashimoto, creator of Terraform and Ghostty, defined the core idea: anytime you find an agent makes a mistake, you engineer a solution so the agent never makes that mistake again. In Ghostty's repository, each line in the AGENTS.md file corresponds to a specific past agent failure that's now prevented.&lt;/p&gt;

&lt;p&gt;The industry has moved through three distinct eras:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Era&lt;/th&gt;
&lt;th&gt;Years&lt;/th&gt;
&lt;th&gt;Focus&lt;/th&gt;
&lt;th&gt;Key Question&lt;/th&gt;
&lt;th&gt;Limitation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompt Engineering&lt;/td&gt;
&lt;td&gt;2022-2024&lt;/td&gt;
&lt;td&gt;Crafting better instructions&lt;/td&gt;
&lt;td&gt;"How do I phrase this?"&lt;/td&gt;
&lt;td&gt;Instructions get diluted in long contexts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context Engineering&lt;/td&gt;
&lt;td&gt;2025&lt;/td&gt;
&lt;td&gt;Curating what the model sees&lt;/td&gt;
&lt;td&gt;"What information does it need?"&lt;/td&gt;
&lt;td&gt;Knowing isn't doing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Harness Engineering&lt;/td&gt;
&lt;td&gt;2026&lt;/td&gt;
&lt;td&gt;Building systems around the agent&lt;/td&gt;
&lt;td&gt;"What can it do, and what can't it?"&lt;/td&gt;
&lt;td&gt;Emerging discipline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Prompt engineering shapes what the agent &lt;em&gt;tries&lt;/em&gt;. Context engineering shapes what the agent &lt;em&gt;knows&lt;/em&gt;. Harness engineering shapes what the agent &lt;strong&gt;can and cannot do&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  How did LangChain gain 13.7 benchmark points without changing the model?
&lt;/h2&gt;

&lt;p&gt;By improving three harness components, LangChain jumped from 52.8% to 66.5% on &lt;a href="https://www.tbench.ai/news/announcement-2-0" rel="noopener noreferrer"&gt;Terminal Bench 2.0&lt;/a&gt; (a benchmark of 89 real-world terminal tasks) while keeping the same model, gpt-5.2-codex. They went from Top 30 to Top 5. No fine-tuning. No model swap. Just harness changes.&lt;/p&gt;

&lt;p&gt;Here are the three changes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Context injection.&lt;/strong&gt; LangChain's &lt;code&gt;LocalContextMiddleware&lt;/code&gt; maps the environment upfront and injects it directly into the agent's context. Before this change, the agent wasted steps trying to understand its surroundings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Self-verification loops.&lt;/strong&gt; After each action, the agent verifies its output against task-specific criteria before moving on. Not just "run the tests." The agent checks whether the output matches what the task actually asked for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Compute allocation.&lt;/strong&gt; This one is counterintuitive: running at maximum reasoning budget (xhigh) scored only 53.9%, while the high setting scored 63.6%. More compute caused timeouts that hurt overall performance.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Before harness changes&lt;/td&gt;
&lt;td&gt;52.8%&lt;/td&gt;
&lt;td&gt;Baseline, Top 30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;After harness changes&lt;/td&gt;
&lt;td&gt;66.5%&lt;/td&gt;
&lt;td&gt;Top 5, +13.7pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max reasoning (xhigh)&lt;/td&gt;
&lt;td&gt;53.9%&lt;/td&gt;
&lt;td&gt;Only 1.1pp above baseline, timeouts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
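
&lt;p&gt;The self-verification idea in change 2 can be illustrated with a small retry loop. This is a generic sketch of the pattern, not LangChain's implementation: &lt;code&gt;verify_step&lt;/code&gt;, &lt;code&gt;write_line&lt;/code&gt;, and &lt;code&gt;has_three_lines&lt;/code&gt; are hypothetical names, and the "task-specific criterion" here is simply that a file has reached three lines.&lt;/p&gt;

```shell
#!/bin/bash
# Sketch of a verify-before-moving-on loop (illustrative names only).

# verify_step MAX_ATTEMPTS ACTION CHECK
# Runs ACTION, then CHECK. Repeats until CHECK succeeds or MAX_ATTEMPTS
# is reached. Returns 0 once verified, 1 if it gives up.
verify_step() {
  local max="$1" action="$2" check="$3" attempt=1
  while [ "$attempt" -le "$max" ]; do
    "$action"
    if "$check"; then
      echo "verified on attempt $attempt"
      return 0
    fi
    attempt=$((attempt + 1))
  done
  echo "gave up after $max attempts"
  return 1
}

# Demo: the action appends a row; the check demands at least three rows.
work_file=$(mktemp)
write_line() { echo "row" >> "$work_file"; }
has_three_lines() { [ "$(grep -c '' "$work_file")" -ge 3 ]; }

verify_step 5 write_line has_three_lines
rm -f "$work_file"
```

&lt;p&gt;The point of the pattern is that the check encodes what the task asked for, so "the command ran" and "the task is done" are tested separately.&lt;/p&gt;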

&lt;p&gt;If you're evaluating AI coding tools by comparing model benchmarks alone, you're measuring the wrong variable.&lt;/p&gt;




&lt;h2&gt;
  
  
  What are the 5 layers of an AI agent harness?
&lt;/h2&gt;

&lt;p&gt;A production harness has five layers. Most developers I talk to in the Claude Code community have Layer 1 and maybe part of Layer 2. That leaves three layers of reliability on the table.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What It Is&lt;/th&gt;
&lt;th&gt;Problem It Solves&lt;/th&gt;
&lt;th&gt;Claude Code Implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. Memory&lt;/td&gt;
&lt;td&gt;Persistent context across sessions&lt;/td&gt;
&lt;td&gt;Agent "forgets" your conventions every session&lt;/td&gt;
&lt;td&gt;CLAUDE.md, MEMORY.md, .claude/commands/&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Tools&lt;/td&gt;
&lt;td&gt;Extended capabilities beyond built-ins&lt;/td&gt;
&lt;td&gt;Agent can't access your APIs, databases, or services&lt;/td&gt;
&lt;td&gt;MCP servers, custom tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Permissions&lt;/td&gt;
&lt;td&gt;What the agent is allowed to do&lt;/td&gt;
&lt;td&gt;Agent edits sensitive files or runs dangerous commands&lt;/td&gt;
&lt;td&gt;settings.json allow/deny lists&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. Hooks&lt;/td&gt;
&lt;td&gt;Automated enforcement at lifecycle points&lt;/td&gt;
&lt;td&gt;Instructions get ignored under context pressure&lt;/td&gt;
&lt;td&gt;PreToolUse/PostToolUse hooks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5. Observability&lt;/td&gt;
&lt;td&gt;Knowing what the agent actually did&lt;/td&gt;
&lt;td&gt;No visibility into agent decisions or cost&lt;/td&gt;
&lt;td&gt;Session logs, cost tracking, action audit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Think of it like your CI/CD pipeline. You built that infrastructure once, and the whole team benefits on every push. A harness works the same way for AI agent sessions.&lt;/p&gt;

&lt;p&gt;OpenAI demonstrated this at scale. Their Codex team shipped roughly one million lines of production code, with zero lines written by human hands, over five months. Their harness included AGENTS.md files, reproducible dev environments, and mechanical invariants in CI. By their estimate, the work took roughly one-tenth the time a human team would have needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where is your harness right now?
&lt;/h2&gt;

&lt;p&gt;Run this checklist:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Do you have a CLAUDE.md with project conventions and constraints?&lt;/td&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Do you have MCP servers connecting Claude Code to external tools?&lt;/td&gt;
&lt;td&gt;Tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Do you have settings.json with explicit allow/deny lists?&lt;/td&gt;
&lt;td&gt;Permissions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Do you have at least one PreToolUse hook that blocks dangerous actions?&lt;/td&gt;
&lt;td&gt;Hooks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Can you see what Claude did in each session and how much it cost?&lt;/td&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Your score:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1/5&lt;/strong&gt;: You're in the majority. Most developers stop at CLAUDE.md.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2-3/5&lt;/strong&gt;: Ahead of most. You've started building real infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4-5/5&lt;/strong&gt;: Production-ready. You're doing harness engineering whether you knew the name or not.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Be honest about question 4. If the answer is no, your agent can still &lt;code&gt;rm -rf&lt;/code&gt; your project directory. CLAUDE.md says "don't do that." A hook actually prevents it.&lt;/p&gt;

&lt;p&gt;Here's why this matters: an ETH Zurich study (Feb 2026) tested context files across 138 real-world tasks from 12 Python repositories. Human-written context files improved agent success by only about 4%. LLM-generated ones actually &lt;em&gt;reduced&lt;/em&gt; success by ~3% while increasing inference costs by over 20%. Instructions alone aren't enough. You need enforcement layers.&lt;/p&gt;




&lt;h2&gt;
  
  
  How do you start building a harness today?
&lt;/h2&gt;

&lt;p&gt;You don't need all 5 layers at once. Start with three high-impact changes that take about 30 minutes total.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick Win 1: Create a MEMORY.md (5 minutes)
&lt;/h3&gt;

&lt;p&gt;MEMORY.md is a lightweight index that points to where knowledge lives in your project. Unlike CLAUDE.md (which holds static rules), MEMORY.md tracks evolving state: recent decisions, architectural changes, active work.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Auth&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;src/lib/auth/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; — Clerk, not NextAuth. Migrated March 2026.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;prisma/schema.prisma&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; — PostgreSQL on Supabase. All queries via Prisma.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Deploy&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;docs/deploy.md&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; — Vercel preview for PRs, production on main.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Testing&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;vitest.config.ts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; — Vitest unit, Playwright E2E. Min 80% coverage.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;API&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;src/app/api/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; — Server Actions preferred over API routes for mutations.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Quick Win 2: Add one PreToolUse guardrail hook (15 minutes)
&lt;/h3&gt;

&lt;p&gt;This hook blocks Claude Code from editing sensitive files. Copy-paste ready:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# .claude/hooks/block-sensitive-files.sh&lt;/span&gt;
&lt;span class="c"&gt;# Blocks edits to .env, credentials, and CI config&lt;/span&gt;

&lt;span class="nv"&gt;INPUT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;FILE_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$INPUT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.tool_input.file_path // empty'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nv"&gt;SENSITIVE&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="s1"&gt;'.env'&lt;/span&gt; &lt;span class="s1"&gt;'credentials'&lt;/span&gt; &lt;span class="s1"&gt;'.github/workflows'&lt;/span&gt; &lt;span class="s1"&gt;'secrets'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;pattern &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SENSITIVE&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FILE_PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$pattern&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"BLOCKED: Cannot edit sensitive file: &lt;/span&gt;&lt;span class="nv"&gt;$FILE_PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
    &lt;span class="nb"&gt;exit &lt;/span&gt;2
  &lt;span class="k"&gt;fi
done

&lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register it in &lt;code&gt;.claude/settings.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PreToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Edit|Write"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bash .claude/hooks/block-sensitive-files.sh"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Quick Win 3: Enable cost awareness (10 minutes)
&lt;/h3&gt;

&lt;p&gt;Track what each session costs so you notice anomalies early. Cost tracking is a feedback loop for you, and the agent needs one too. Boris Cherny, creator of Claude Code, calls verification "probably the most important thing" for quality:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Give Claude a way to verify its work. If Claude has that feedback loop, it will 2-3x the quality of the final result."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Start simple: review &lt;code&gt;~/.claude/projects/&lt;/code&gt; after each session to check what Claude did and how much it cost.&lt;/p&gt;
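
&lt;p&gt;A rough way to make that review less manual is a small summary script. The sketch below rests on assumptions that are not confirmed by this article: that session transcripts are stored as &lt;code&gt;*.jsonl&lt;/code&gt; files with one JSON event per line, and that tool calls appear as &lt;code&gt;"type":"tool_use"&lt;/code&gt; entries. It only covers the "what did Claude do" half and does not attempt to compute cost. The function name &lt;code&gt;summarize_sessions&lt;/code&gt; is hypothetical.&lt;/p&gt;

```shell
#!/bin/bash
# Sketch: per-session activity summary.
# Assumptions: transcripts are *.jsonl files under the given directory,
# one JSON event per line, with tool calls marked "type":"tool_use".

# summarize_sessions [ROOT_DIR]
# Prints one tab-separated line per transcript: path, events, tool calls.
summarize_sessions() {
  local root="${1:-${HOME:-.}/.claude/projects}"
  local file events tools
  find "$root" -type f -name '*.jsonl' 2>/dev/null | sort | while IFS= read -r file; do
    # grep -c '' counts lines; "|| true" keeps a zero count from failing.
    events=$(grep -c '' "$file" || true)
    # Count every tool_use occurrence, even several on the same line.
    tools=$(grep -oE '"type": *"tool_use"' "$file" | grep -c '' || true)
    printf '%s\t%s events\t%s tool calls\n' "$file" "$events" "$tools"
  done
}

# Demo: summarize the default location (prints nothing if it is absent).
summarize_sessions
```

&lt;p&gt;A session with an unusually high tool-call count is a reasonable place to start looking when a bill or a diff looks larger than expected.&lt;/p&gt;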




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the difference between harness engineering and prompt engineering?
&lt;/h3&gt;

&lt;p&gt;Prompt engineering shapes what the agent tries. Context engineering shapes what the agent knows. Harness engineering shapes what the agent can and cannot do. They're not replacements — they're layers. A production AI workflow uses all three, but harness engineering provides the strongest reliability guarantees because it uses enforcement (hooks, permissions) rather than suggestions (prompts, context).&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need harness engineering for Claude Code?
&lt;/h3&gt;

&lt;p&gt;Yes. Claude Code is itself a harness that Anthropic built around their model. But it's the &lt;em&gt;inner&lt;/em&gt; harness. You need an &lt;em&gt;outer&lt;/em&gt; harness tailored to your project: CLAUDE.md for conventions, hooks for guardrails, MCP servers for tools, permissions for safety boundaries, and observability for cost control.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is harness engineering only for Claude Code?
&lt;/h3&gt;

&lt;p&gt;No. The principles apply to any AI coding agent: Cursor, GitHub Copilot, OpenAI Codex, Windsurf, Cline. Claude Code happens to offer the most programmable harness surface (17 hook events, MCP protocol, skills system), which is why examples here use it. The concepts transfer directly to other tools.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Try it now:&lt;/strong&gt; Pick one quick win above and implement it before your next Claude Code session. Quick Win 2 is copy-paste ready and takes about 15 minutes.&lt;/p&gt;

&lt;p&gt;What's your harness score right now? Drop it in the comments — I'm curious how many devs have gone beyond Layer 1.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shipwithai.io/blog/harness-engineering-claude-code/" rel="noopener noreferrer"&gt;ShipWithAI&lt;/a&gt;. I write about Claude Code workflows, AI-assisted development, and shipping software faster with structured AI.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>productivity</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
