<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ShipWithAI</title>
    <description>The latest articles on DEV Community by ShipWithAI (@shipwithaiio).</description>
    <link>https://dev.to/shipwithaiio</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3878878%2Fd66b5c8e-e12a-4e3c-bf3b-b04ed48b4def.png</url>
      <title>DEV Community: ShipWithAI</title>
      <link>https://dev.to/shipwithaiio</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shipwithaiio"/>
    <language>en</language>
    <item>
      <title>The Complete Claude Code Harness Engineering Guide (5 Layers, 8 Deep-Dives)</title>
      <dc:creator>ShipWithAI</dc:creator>
      <pubDate>Fri, 08 May 2026 01:00:00 +0000</pubDate>
      <link>https://dev.to/shipwithaiio/the-complete-claude-code-harness-engineering-guide-5-layers-8-deep-dives-3d4j</link>
      <guid>https://dev.to/shipwithaiio/the-complete-claude-code-harness-engineering-guide-5-layers-8-deep-dives-3d4j</guid>
      <description>&lt;p&gt;Harness engineering is everything around your AI agent except the model: memory, tools, permissions, hooks, observability. LangChain gained 13.7 benchmark points changing only the harness. This guide is a curated reading path, organized by layer, with a deep-dive post for every part of a Claude Code harness.&lt;/p&gt;






&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Layer 1 only (what most devs have)
  → Advice the model may ignore

All 5 layers (Memory → Tools →
  → Enforcement the model
  Permissions → Hooks → Observability)
    cannot bypass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LangChain jumped from 52.8% to 66.5% on Terminal Bench 2.0 by changing only the harness. Same model. 13.7 points of pure architecture gain (&lt;a href="https://blog.langchain.com/improving-deep-agents-with-harness-engineering/" rel="noopener noreferrer"&gt;LangChain Blog, Feb 2026&lt;/a&gt;). Most Claude Code users stop at Layer 1. This guide is the reading path to the other four.&lt;/p&gt;

&lt;p&gt;If you want the &lt;em&gt;theory&lt;/em&gt; of harness engineering, read the &lt;a href="https://shipwithai.io/blog/harness-engineering-claude-code/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-harness-engineering-claude-code" rel="noopener noreferrer"&gt;pillar post&lt;/a&gt;. If you want the &lt;em&gt;architecture&lt;/em&gt; deep-dive, read the &lt;a href="https://shipwithai.io/blog/claude-code-harness-5-layers/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-claude-code-harness-5-layers" rel="noopener noreferrer"&gt;5 layers post&lt;/a&gt;. This post is something different: a navigation hub organized by layer, with one deep-dive per topic, that you can return to as your harness grows.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Claude Code harness engineering?
&lt;/h2&gt;

&lt;p&gt;Harness engineering is the discipline of building everything around an AI agent — constraints, tools, feedback loops, observability — so it becomes reliable in production. For Claude Code, the harness is five layers: Memory (CLAUDE.md), Tools (MCP), Permissions (settings.json), Hooks (PreToolUse/PostToolUse), and Observability (session logs).&lt;/p&gt;

&lt;p&gt;The formula: &lt;strong&gt;Agent = Model + Harness&lt;/strong&gt; (&lt;a href="https://martinfowler.com/articles/harness-engineering.html" rel="noopener noreferrer"&gt;Martin Fowler, Apr 2026&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;The model is commodity. Every team on Sonnet 4.6 or Opus 4.7 gets the same raw capability. Your harness is what differentiates your team's output.&lt;/p&gt;




&lt;h2&gt;
  
  
  What are the 5 layers of a Claude Code harness?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Claude Code File&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. Memory&lt;/td&gt;
&lt;td&gt;What the agent knows&lt;/td&gt;
&lt;td&gt;CLAUDE.md, MEMORY.md&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Tools&lt;/td&gt;
&lt;td&gt;What it can reach&lt;/td&gt;
&lt;td&gt;settings.json (MCP)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Permissions&lt;/td&gt;
&lt;td&gt;What it's allowed to do&lt;/td&gt;
&lt;td&gt;settings.json allow/deny&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. Hooks&lt;/td&gt;
&lt;td&gt;What's enforced at runtime&lt;/td&gt;
&lt;td&gt;PreToolUse/PostToolUse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5. Observability&lt;/td&gt;
&lt;td&gt;What you can see afterward&lt;/td&gt;
&lt;td&gt;Session logs, cost tracking&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
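&lt;p&gt;Layers 2 through 4 all live in the same file. A minimal sketch of a &lt;code&gt;.claude/settings.json&lt;/code&gt; covering them (the commands and script path are placeholders, not a recommended policy):&lt;/p&gt;

```json
{
  "permissions": {
    "allow": ["Bash(npm test:*)"],
    "deny": ["Read(.env)", "Bash(rm -rf:*)"]
  },
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [{ "type": "command", "command": ".claude/hooks/guard.sh" }]
      }
    ]
  }
}
```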




&lt;h2&gt;
  
  
  Layer 1: What does your agent know before you type?
&lt;/h2&gt;

&lt;p&gt;The memory layer is every file Claude Code reads before the first keystroke. CLAUDE.md holds your project rules. MEMORY.md holds the evolving state. Most developers ship only a CLAUDE.md and treat it as a wishlist of aspirations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://shipwithai.io/blog/claude-code-memory-md-fix/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-claude-code-memory-md-fix" rel="noopener noreferrer"&gt;Your AI Agent Forgets Everything. Here's the Fix.&lt;/a&gt;&lt;/strong&gt; — MEMORY.md is a 200-line index that Claude reads at session start. Setup takes 5 minutes. Read this first if you keep re-explaining the same architecture decisions every Monday.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://shipwithai.io/blog/claude-md-failure-log-pattern/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-claude-md-failure-log-pattern" rel="noopener noreferrer"&gt;Your CLAUDE.md Is an Instruction File. It Should Be a Failure Log.&lt;/a&gt;&lt;/strong&gt; — Mitchell Hashimoto's AGENTS.md in Ghostty has zero aspirational lines. Every entry traces to a real agent mistake. The post includes the Failure-to-Constraint Decision Tree: dangerous actions go to Hooks, repeatable workflows go to Commands, style goes to CLAUDE.md.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 4: What can the agent NOT do?
&lt;/h2&gt;

&lt;p&gt;Hooks are the enforcement layer. Memory is advice. Hooks are law. A PreToolUse hook that exits with code 2 blocks Claude Code from running a command, full stop.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# PreToolUse hook: 6 lines that save you from yourself&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$TOOL_INPUT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="s2"&gt;"DROP TABLE"&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$ENV&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"production"&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"BLOCKED: destructive SQL in production"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
    &lt;span class="nb"&gt;exit &lt;/span&gt;2
&lt;span class="k"&gt;fi
&lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;a href="https://shipwithai.io/blog/claude-code-hook-decision-guide/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-claude-code-hook-decision-guide" rel="noopener noreferrer"&gt;Which Claude Code Hook Do You Need? A Decision Guide&lt;/a&gt;&lt;/strong&gt; — The 4 handler types (Deny, Log, Transform, Enrich), when to reach for PreToolUse vs PostToolUse, and which 3 hooks every production setup should have.&lt;/p&gt;

&lt;p&gt;A PreToolUse hook exiting with code 2 is the only mechanism in Claude Code that unconditionally blocks a tool call. Instructions in CLAUDE.md can still be overridden by context or model reasoning. Hooks cannot be bypassed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 5: How do you know what your agent actually did?
&lt;/h2&gt;

&lt;p&gt;Observability turns "my agent did something weird" into a reproducible bug report. One of LangChain's three harness improvements was a verification middleware that made the agent check its own work before marking a task complete.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://shipwithai.io/blog/claude-code-self-verification-loop/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-claude-code-self-verification-loop" rel="noopener noreferrer"&gt;Build a Self-Verification Loop for Claude Code&lt;/a&gt;&lt;/strong&gt; — Adapts LangChain's PreCompletionChecklistMiddleware to Claude Code. Boris Cherny (creator of Claude Code) calls verification "probably the most important thing" for quality.&lt;/p&gt;

&lt;p&gt;LangChain's three improvements mapped to layers: context injection (Layer 1), self-verification loops (Layer 5), and compute allocation (Layer 5). No single change explained the full +13.7-point gain; the improvements only paid off together.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why does this actually work?
&lt;/h2&gt;

&lt;p&gt;Three independent data points show that constraints beat raw capability:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;LangChain&lt;/strong&gt;: +13.7 on Terminal Bench 2.0 with harness changes only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Codex&lt;/strong&gt;: ~1 million lines of production code, zero human-written lines over five months, all inside heavily constrained harness environments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mitchell Hashimoto's Ghostty&lt;/strong&gt;: every AGENTS.md line is a prevented failure&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://shipwithai.io/blog/harness-engineering-constraint-paradox/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-harness-engineering-constraint-paradox" rel="noopener noreferrer"&gt;The Constraint Paradox: Less AI Freedom, Better Code&lt;/a&gt;&lt;/strong&gt; — Breaks down all three data points with benchmark tables and the counterintuitive finding that running at maximum reasoning budget scored &lt;em&gt;worse&lt;/em&gt; (53.9%) than high (63.6%). Read this when someone says "we just need a smarter model."&lt;/p&gt;




&lt;h2&gt;
  
  
  Why does this matter for your career?
&lt;/h2&gt;

&lt;p&gt;84% of developers use AI tools. Only 29% trust the output. That 55-point gap is the senior engineer's new job. One harness committed to version control multiplies across your whole team. Writing a great CLAUDE.md for 10 developers pays off more than writing 10,000 lines of code yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://shipwithai.io/blog/harness-engineering-senior-developer-guide/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-harness-engineering-senior-developer-guide" rel="noopener noreferrer"&gt;Senior Engineers Don't Write Code. They Build Harnesses.&lt;/a&gt;&lt;/strong&gt; — The career case with a harness review checklist for your next PR and the 4-era evolution of where senior engineers add value.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where should you start reading?
&lt;/h2&gt;

&lt;p&gt;Three paths based on where you are today:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;New to harness engineering.&lt;/strong&gt; Start with the &lt;a href="https://shipwithai.io/blog/harness-engineering-claude-code/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-harness-engineering-claude-code" rel="noopener noreferrer"&gt;pillar post&lt;/a&gt; for the definition, then the &lt;a href="https://shipwithai.io/blog/claude-code-harness-5-layers/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-claude-code-harness-5-layers" rel="noopener noreferrer"&gt;5 layers post&lt;/a&gt; for the architecture. Come back here for your next deep-dive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You have a CLAUDE.md and want more rigor.&lt;/strong&gt; Read &lt;a href="https://shipwithai.io/blog/claude-code-memory-md-fix/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-claude-code-memory-md-fix" rel="noopener noreferrer"&gt;the memory fix post&lt;/a&gt; first to add MEMORY.md, then &lt;a href="https://shipwithai.io/blog/claude-md-failure-log-pattern/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-claude-md-failure-log-pattern" rel="noopener noreferrer"&gt;the failure-log pattern&lt;/a&gt; to rewrite your existing CLAUDE.md. Those two posts cover all of Layer 1.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your agent has scared you at least once.&lt;/strong&gt; Skip to the &lt;a href="https://shipwithai.io/blog/claude-code-hook-decision-guide/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-claude-code-hook-decision-guide" rel="noopener noreferrer"&gt;hook decision guide&lt;/a&gt; and ship one PreToolUse guard before your next session. Then read &lt;a href="https://shipwithai.io/blog/harness-engineering-constraint-paradox/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-harness-engineering-constraint-paradox" rel="noopener noreferrer"&gt;the constraint paradox&lt;/a&gt; for why this actually works.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Claude Code harness engineering?
&lt;/h3&gt;

&lt;p&gt;Harness engineering for Claude Code is configuring five layers around the model (Memory, Tools, Permissions, Hooks, Observability) to make the agent reliable in production. The model is commodity. The harness is your differentiator.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need all 5 layers to start?
&lt;/h3&gt;

&lt;p&gt;No. Start with Memory (CLAUDE.md + MEMORY.md) and Hooks (one PreToolUse guard). Those two cover the most common failure modes. Add the rest as your team scales or when a specific incident motivates it.&lt;/p&gt;

&lt;h3&gt;
  
  
  How is harness engineering different from prompt engineering?
&lt;/h3&gt;

&lt;p&gt;Prompt engineering shapes what the agent tries. Context engineering shapes what the agent knows. Harness engineering shapes what the agent &lt;em&gt;can and cannot do&lt;/em&gt;, using enforcement (hooks, permissions) rather than suggestions (prompts).&lt;/p&gt;

&lt;h3&gt;
  
  
  Does this only apply to Claude Code?
&lt;/h3&gt;

&lt;p&gt;The principles apply to any AI coding agent. The implementation details (CLAUDE.md, PreToolUse hooks, MCP config) are Claude Code-specific. Claude Code currently offers one of the most programmable harness surfaces of any coding agent.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Try it now:&lt;/strong&gt; Pick one path above, open the first linked post, copy one code block into your &lt;code&gt;.claude/&lt;/code&gt; folder, and run one Claude Code session with the change applied. The compound benefit starts on session #2.&lt;/p&gt;

&lt;p&gt;Which layer would you add first? Drop it in the comments.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shipwithai.io/blog/claude-code-harness-engineering-guide/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-claude-code-harness-engineering-guide" rel="noopener noreferrer"&gt;ShipWithAI&lt;/a&gt;. I write about Claude Code workflows, AI-assisted development, and shipping software faster with structured AI.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>claude</category>
    </item>
    <item>
      <title>Hardening Your npm CI in 5 Concrete Layers</title>
      <dc:creator>ShipWithAI</dc:creator>
      <pubDate>Thu, 07 May 2026 14:20:33 +0000</pubDate>
      <link>https://dev.to/shipwithaiio/hardening-your-npm-ci-in-5-concrete-layers-309f</link>
      <guid>https://dev.to/shipwithaiio/hardening-your-npm-ci-in-5-concrete-layers-309f</guid>
      <description>&lt;h2&gt;
  
  
  Intro
&lt;/h2&gt;

&lt;p&gt;Your CI pipeline installs dependencies far more often than any developer’s laptop, and that frequency makes it your biggest npm attack surface. The Bitwarden incident showed what that means in practice: a hijacked GitHub Action pulled a malicious CLI for roughly 90 minutes and harvested every credential on the runner. Below is the 5‑layer playbook we dog‑fooded at ShipWithAI to stop that class of attack.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Most CI configs still look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;   &lt;span class="c1"&gt;# mutable tag&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v4&lt;/span&gt; &lt;span class="c1"&gt;# mutable tag&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm install&lt;/span&gt;           &lt;span class="c1"&gt;# silent version bumps&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm publish&lt;/span&gt;           &lt;span class="c1"&gt;# uses stored NPM_TOKEN&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The red flags are obvious: mutable tags, &lt;code&gt;npm install&lt;/code&gt;, long‑lived tokens, no lockfile validation, and no dependency review. Each one is a foothold for an attacker.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution Walkthrough
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Layer 1 – Enforce &lt;code&gt;npm ci&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;npm ci&lt;/code&gt; installs &lt;strong&gt;only&lt;/strong&gt; from the lockfile and fails on any mismatch. It also wipes &lt;code&gt;node_modules&lt;/code&gt; first, guaranteeing a clean slate. Replace every &lt;code&gt;npm install&lt;/code&gt; with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install deps&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm ci --ignore-scripts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Commit a project‑level &lt;code&gt;.npmrc&lt;/code&gt; with &lt;code&gt;ignore-scripts=true&lt;/code&gt;, &lt;code&gt;save-exact=true&lt;/code&gt;, and &lt;code&gt;audit-level=moderate&lt;/code&gt; so every runner inherits the same defaults.&lt;/p&gt;
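&lt;p&gt;That committed &lt;code&gt;.npmrc&lt;/code&gt; is only three lines:&lt;/p&gt;

```ini
# .npmrc, committed at the repo root
ignore-scripts=true
save-exact=true
audit-level=moderate
```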

&lt;h3&gt;
  
  
  Layer 2 – Validate lockfile integrity
&lt;/h3&gt;

&lt;p&gt;Add &lt;code&gt;lockfile-lint&lt;/code&gt; to the workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Lint lockfile&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npx lockfile-lint --allowed-hosts npmjs.com --validate-https&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This blocks PRs that tamper with the lockfile source URLs.&lt;/p&gt;
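&lt;p&gt;For context, the tampering this catches is often a single changed &lt;code&gt;resolved&lt;/code&gt; URL, easy to miss in a 3,000-line lockfile diff (package and host below are hypothetical):&lt;/p&gt;

```json
"node_modules/left-pad": {
  "version": "1.3.0",
  "resolved": "https://registry.evil-mirror.example/left-pad-1.3.0.tgz",
  "integrity": "sha512-..."
}
```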

&lt;h3&gt;
  
  
  Layer 3 – Dependency review action
&lt;/h3&gt;

&lt;p&gt;GitHub’s &lt;code&gt;dependency-review-action&lt;/code&gt; flags new or changed dependencies before merge:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dependency review&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github/dependency-review-action@v2&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;allow-scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;runtime,development&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Layer 4 – Pin actions to SHA
&lt;/h3&gt;

&lt;p&gt;Instead of &lt;code&gt;actions/setup-node@v4&lt;/code&gt;, use the exact SHA of the release you’ve vetted:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@d3b0c5f...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a tag gets hijacked, your workflow stays on the trusted commit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 5 – OIDC trusted publishing
&lt;/h3&gt;

&lt;p&gt;Replace the static &lt;code&gt;NPM_TOKEN&lt;/code&gt; secret with OIDC trusted publishing. Configure a trusted publisher for your package on npmjs.com, then let the workflow request a short‑lived token:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Publish&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm/publish-action@v2&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;token-type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;oidc&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub issues a short‑lived token that expires with the job, eliminating long‑lived credential leakage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;Switching to &lt;code&gt;npm ci&lt;/code&gt; alone caught three silent version bumps in the first week. Adding the full stack stopped a malicious lockfile PR from ever reaching merge and removed the need to store a permanent NPM token.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic installs&lt;/strong&gt; (&lt;code&gt;npm ci&lt;/code&gt;) are non‑negotiable for CI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate lockfiles&lt;/strong&gt; before they touch the runner.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review deps&lt;/strong&gt; on every PR.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pin actions&lt;/strong&gt; to immutable SHAs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Publish with OIDC&lt;/strong&gt; to avoid static secrets.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion &amp;amp; CTA
&lt;/h2&gt;

&lt;p&gt;These five layers are easy to copy‑paste into any repo and give you a solid defense against the kind of supply‑chain hijack that hit Bitwarden. Follow me for more concrete SDLC hardening tips and feel free to drop your CI questions in the comments.&lt;/p&gt;

&lt;p&gt;Originally published at &lt;a href="https://shipwithai.io/blog/npm-ci-security-team-playbook/" rel="noopener noreferrer"&gt;https://shipwithai.io/blog/npm-ci-security-team-playbook/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>npm</category>
      <category>githubactions</category>
    </item>
    <item>
      <title>Which Claude Code Hook Do You Need? A Decision Guide</title>
      <dc:creator>ShipWithAI</dc:creator>
      <pubDate>Wed, 06 May 2026 01:00:00 +0000</pubDate>
      <link>https://dev.to/shipwithaiio/which-claude-code-hook-do-you-need-a-decision-guide-21h5</link>
      <guid>https://dev.to/shipwithaiio/which-claude-code-hook-do-you-need-a-decision-guide-21h5</guid>
      <description>&lt;p&gt;Claude Code has 4 hook handler types (command, prompt, agent, http) and 21 lifecycle events. Most developers default to command hooks on PreToolUse. This decision guide helps you pick the right type for the right event, and tells you which 3 to implement first.&lt;/p&gt;




&lt;p&gt;Two configs. Same goal: block a force push to main. Different reliability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Command hook (deterministic, &amp;lt;5ms)&lt;/span&gt;
&lt;span class="nv"&gt;COMMAND&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.tool_input.command // empty'&lt;/span&gt; &amp;lt; /dev/stdin&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$COMMAND&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-qE&lt;/span&gt; &lt;span class="s1"&gt;'git push.*(--force|-f).*main'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"BLOCKED: force push to main"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
    &lt;span class="nb"&gt;exit &lt;/span&gt;2
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Prompt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;hook&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;(non-deterministic,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;300-2000&lt;/span&gt;&lt;span class="err"&gt;ms)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Block this if it looks like a force push to a production branch"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The command hook is 5 lines of bash. It runs in under 5ms. It catches every &lt;code&gt;git push --force main&lt;/code&gt; without exception.&lt;/p&gt;

&lt;p&gt;The prompt hook calls an LLM. It takes 300-2000ms. It might decide &lt;code&gt;--force-with-lease&lt;/code&gt; is safe enough to allow.&lt;/p&gt;

&lt;p&gt;Both are "hooks." Choosing the wrong type turns a guardrail into a suggestion. CLAUDE.md instructions achieve 70-90% compliance. Hooks achieve 100% — but only when you pick the right one.&lt;/p&gt;




&lt;h2&gt;
  
  
  What are the 4 Claude Code hook handler types?
&lt;/h2&gt;

&lt;p&gt;Each type trades speed for intelligence differently. Pick the wrong type and your 100% guardrail drops to a probabilistic suggestion.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Handler&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;th&gt;Deterministic?&lt;/th&gt;
&lt;th&gt;Codebase Access?&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;command&lt;/td&gt;
&lt;td&gt;&amp;lt;5ms&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No (stdin only)&lt;/td&gt;
&lt;td&gt;Guardrails, formatting, logging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;prompt&lt;/td&gt;
&lt;td&gt;300-2000ms&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Nuanced decisions on Stop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;agent&lt;/td&gt;
&lt;td&gt;2-10s&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (full tools)&lt;/td&gt;
&lt;td&gt;Deep verification, architecture&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;http&lt;/td&gt;
&lt;td&gt;50-500ms&lt;/td&gt;
&lt;td&gt;Yes (your server)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Team policies, centralized audit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Command hooks&lt;/strong&gt; are shell scripts. They read JSON from stdin, run fast, and return deterministic results. Use them for anything you can express as a string match, path check, or regex.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt hooks&lt;/strong&gt; call an LLM to make a judgment call. Only use them when the decision genuinely requires reasoning, like evaluating subagent output quality on &lt;code&gt;SubagentStop&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent hooks&lt;/strong&gt; spawn a full Claude Code session that can read files, search code, and run tools. Reserve them for verification tasks that need codebase context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HTTP hooks&lt;/strong&gt; POST to your server. Useful for centralized team policies and audit logging.&lt;/p&gt;

&lt;p&gt;The critical rule: &lt;strong&gt;never use prompt-based hooks for safety boundaries.&lt;/strong&gt; Prompt hooks involve LLM judgment, and LLMs can be wrong. Safety boundaries need deterministic command hooks.&lt;/p&gt;




&lt;h2&gt;
  
  
  When should you use CLAUDE.md vs a hook vs both?
&lt;/h2&gt;

&lt;p&gt;Use CLAUDE.md for conventions the agent should follow. Use hooks for rules the agent must never break. Use both when you want the agent to understand WHY while the hook enforces WHAT.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Is this a HARD constraint (must NEVER be violated)?
├── YES → Can you test it with a string/path/regex check?
│         ├── YES → Command hook (PreToolUse)
│         └── NO  → Does it need codebase context?
│                   ├── YES → Agent hook
│                   └── NO  → Prompt hook or HTTP hook
└── NO  → Is it a preference or convention?
              ├── YES → CLAUDE.md (~70-90% compliance)
              └── NO  → Is it a repeatable workflow?
                        ├── YES → Skill or .claude/commands/
                        └── NO  → You probably don't need it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When should you use both? When the constraint is structural (hook enforces it) but the agent also benefits from understanding the reasoning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hook&lt;/strong&gt;: PreToolUse blocks &lt;code&gt;git push --force&lt;/code&gt; to main&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CLAUDE.md&lt;/strong&gt;: "We use &lt;code&gt;--force-with-lease&lt;/code&gt; instead of &lt;code&gt;--force&lt;/code&gt; because a force push overwrote a teammate's commits in March 2026"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The hook prevents the bad action. The CLAUDE.md helps the agent choose the right alternative.&lt;/p&gt;
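&lt;p&gt;Wiring the hook half of that pair is one &lt;code&gt;settings.json&lt;/code&gt; entry. A sketch, assuming the documented hooks schema (the script path is a placeholder):&lt;/p&gt;

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/block-force-push.sh" }
        ]
      }
    ]
  }
}
```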




&lt;h2&gt;
  
  
  Which hook events should you implement first?
&lt;/h2&gt;

&lt;p&gt;Start with the first three events below. The table ranks all seven by priority and setup time:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Priority&lt;/th&gt;
&lt;th&gt;Event&lt;/th&gt;
&lt;th&gt;Handler&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;th&gt;Setup Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1st&lt;/td&gt;
&lt;td&gt;PreToolUse&lt;/td&gt;
&lt;td&gt;command&lt;/td&gt;
&lt;td&gt;Block dangerous actions&lt;/td&gt;
&lt;td&gt;15 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2nd&lt;/td&gt;
&lt;td&gt;PostToolUse&lt;/td&gt;
&lt;td&gt;command&lt;/td&gt;
&lt;td&gt;Auto-format, log actions&lt;/td&gt;
&lt;td&gt;20 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3rd&lt;/td&gt;
&lt;td&gt;Stop&lt;/td&gt;
&lt;td&gt;agent&lt;/td&gt;
&lt;td&gt;Verify work before done&lt;/td&gt;
&lt;td&gt;30 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4th&lt;/td&gt;
&lt;td&gt;SessionStart&lt;/td&gt;
&lt;td&gt;command&lt;/td&gt;
&lt;td&gt;Load env vars, context&lt;/td&gt;
&lt;td&gt;10 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5th&lt;/td&gt;
&lt;td&gt;SubagentStop&lt;/td&gt;
&lt;td&gt;prompt&lt;/td&gt;
&lt;td&gt;Validate subagent output&lt;/td&gt;
&lt;td&gt;20 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6th&lt;/td&gt;
&lt;td&gt;PermissionRequest&lt;/td&gt;
&lt;td&gt;command&lt;/td&gt;
&lt;td&gt;Auto-approve safe patterns&lt;/td&gt;
&lt;td&gt;15 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7th&lt;/td&gt;
&lt;td&gt;PreCompact&lt;/td&gt;
&lt;td&gt;command&lt;/td&gt;
&lt;td&gt;Preserve context on compact&lt;/td&gt;
&lt;td&gt;15 min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Your first hook — a PreToolUse command hook that blocks force pushes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# .claude/hooks/block-force-push.sh&lt;/span&gt;
&lt;span class="c"&gt;# Blocks git push --force and -f to main/master/production&lt;/span&gt;

&lt;span class="nv"&gt;COMMAND&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.tool_input.command // empty'&lt;/span&gt; &amp;lt; /dev/stdin&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$COMMAND&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-qE&lt;/span&gt; &lt;span class="s1"&gt;'git push.*(--force|-f)'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$COMMAND&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-qE&lt;/span&gt; &lt;span class="s1"&gt;'(main|master|production)'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"BLOCKED: force push to protected branch"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
    &lt;span class="nb"&gt;exit &lt;/span&gt;2
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register it in &lt;code&gt;.claude/settings.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PreToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bash .claude/hooks/block-force-push.sh"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
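&lt;p&gt;The second-priority event from the table, PostToolUse, follows the same shape. A minimal sketch of an auto-format hook (the file name and extension list are illustrative; assumes &lt;code&gt;jq&lt;/code&gt; and Prettier are available):&lt;/p&gt;

```shell
#!/bin/bash
# .claude/hooks/format-on-edit.sh (hypothetical name)
# PostToolUse sketch: re-format the file Claude just wrote or edited.
# Assumes Prettier; swap in your own formatter.

# Extensions the formatter should handle (illustrative list)
should_format() {
  case "$1" in
    *.js|*.jsx|*.ts|*.tsx|*.json|*.css) return 0 ;;
    *) return 1 ;;
  esac
}

# Hook input arrives as JSON on stdin; only read it when piped
if [ ! -t 0 ]; then
  FILE=$(jq -r '.tool_input.file_path // empty' 2>/dev/null)
  if [ -f "$FILE" ]; then
    if should_format "$FILE"; then
      # PostToolUse cannot undo the edit, so never block; ignore failures
      npx prettier --write "$FILE" >/dev/null 2>/dev/null || true
    fi
  fi
fi
```

&lt;p&gt;Register it the same way as the blocker above, under a &lt;code&gt;"PostToolUse"&lt;/code&gt; key instead of &lt;code&gt;"PreToolUse"&lt;/code&gt;, with a matcher such as &lt;code&gt;"Edit|Write"&lt;/code&gt;.&lt;/p&gt;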






&lt;h2&gt;
  
  
  How do you handle multiple hooks on the same event?
&lt;/h2&gt;

&lt;p&gt;Hooks on the same event run in definition order. For PreToolUse, the strictest decision wins: deny beats defer, defer beats ask, ask beats allow. If any hook denies, the action is blocked regardless of what other hooks return.&lt;/p&gt;

&lt;p&gt;Chain hooks from fastest to slowest to minimize latency:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PreToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bash .claude/hooks/block-force-push.sh"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bash .claude/hooks/validate-paths.sh"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bash .claude/hooks/log-action.sh"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Decision precedence hierarchy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;deny   → Action blocked. Feedback sent to model.
defer  → Action paused (headless mode). External UI resumes.
ask    → User prompted for confirmation.
allow  → Action proceeds. Skips built-in permission check.
(none) → Default behavior. Built-in permission check runs.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What are the most common hook mistakes?
&lt;/h2&gt;

&lt;p&gt;A handful of mistakes account for most "my hook doesn't work" reports: wrong exit codes, wrong paths or matchers, and invalid settings JSON.&lt;/p&gt;

&lt;h3&gt;
  
  
  Exit code cheat sheet
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Exit Code&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Model Sees Feedback?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Success (parse JSON from stdout)&lt;/td&gt;
&lt;td&gt;Yes, if JSON provided&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Block action (stderr becomes feedback)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Any other&lt;/td&gt;
&lt;td&gt;Silent error (logged in verbose only)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The exit 1 vs exit 2 distinction is the #1 gotcha. Exit 1 means "my hook crashed." Claude Code logs it quietly and continues. Exit 2 means "I'm deliberately blocking this action."&lt;/p&gt;

&lt;h3&gt;
  
  
  Debug workflow
&lt;/h3&gt;

&lt;p&gt;Test any hook manually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'{"tool_name":"Bash","tool_input":{"command":"git push --force main"}}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    | bash .claude/hooks/block-force-push.sh
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Exit code: &lt;/span&gt;&lt;span class="nv"&gt;$?&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the hook doesn't run at all, check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Path correct?&lt;/strong&gt; Command path is relative to project root, not the hooks directory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Matcher correct?&lt;/strong&gt; &lt;code&gt;"matcher": "Bash"&lt;/code&gt; matches the tool name, not the command content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Settings level?&lt;/strong&gt; Project &lt;code&gt;.claude/settings.json&lt;/code&gt; overrides user &lt;code&gt;~/.claude/settings.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File executable?&lt;/strong&gt; Run &lt;code&gt;chmod +x .claude/hooks/your-hook.sh&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON valid?&lt;/strong&gt; A syntax error in settings.json silently disables all hooks&lt;/li&gt;
&lt;/ul&gt;
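&lt;p&gt;The last item on that checklist is worth automating. A small sketch (the &lt;code&gt;validate_settings&lt;/code&gt; helper name is made up; it just wraps &lt;code&gt;jq&lt;/code&gt;):&lt;/p&gt;

```shell
#!/bin/bash
# Check a settings file for JSON syntax errors before trusting it;
# a malformed settings.json silently disables every hook.
validate_settings() {
  if jq empty "$1" 2>/dev/null; then
    echo "valid"
  else
    echo "INVALID JSON: $1"
  fi
}

# usage: validate_settings .claude/settings.json
```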




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What are the 4 Claude Code hook handler types?
&lt;/h3&gt;

&lt;p&gt;Command (shell scripts, &amp;lt;5ms, deterministic), prompt (LLM judgment, 300-2000ms), agent (multi-turn verification with codebase access, 2-10s), and http (webhooks, 50-500ms). Use command hooks for guardrails and formatting. Use prompt or agent hooks for nuanced decisions that require reasoning.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I use CLAUDE.md or a hook for security rules?
&lt;/h3&gt;

&lt;p&gt;Hooks. CLAUDE.md instructions achieve 70-90% compliance because they compete with 200K tokens of context. A PreToolUse command hook achieves 100% compliance because it runs outside the LLM's reasoning chain. Use CLAUDE.md to explain WHY. Use hooks to enforce WHAT.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the difference between PreToolUse and PostToolUse hooks?
&lt;/h3&gt;

&lt;p&gt;PreToolUse runs BEFORE a tool executes and can block it (exit code 2) or modify its input. PostToolUse runs AFTER execution and cannot undo the action, but it can auto-format code, log what happened, or inject feedback. PreToolUse for prevention, PostToolUse for reaction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can Claude Code hooks run in headless mode?
&lt;/h3&gt;

&lt;p&gt;Yes. All hook types work in headless mode (&lt;code&gt;claude -p&lt;/code&gt;). PreToolUse hooks can return &lt;code&gt;permissionDecision: "defer"&lt;/code&gt; to pause execution for external UI collection. This makes hooks fully compatible with CI/CD pipelines and SDK-based workflows.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Try it now:&lt;/strong&gt; Copy the force-push blocker script into &lt;code&gt;.claude/hooks/block-force-push.sh&lt;/code&gt;, register it in &lt;code&gt;.claude/settings.json&lt;/code&gt;, make it executable with &lt;code&gt;chmod +x&lt;/code&gt;, and test it with the debug command above. Verify exit code 2. You now have one production-ready guardrail.&lt;/p&gt;

&lt;p&gt;Which hook event would you implement first? Drop it in the comments.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shipwithai.io/blog/claude-code-hook-decision-guide/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-claude-code-hook-decision-guide" rel="noopener noreferrer"&gt;ShipWithAI&lt;/a&gt;. I write about Claude Code workflows, AI-assisted development, and shipping software faster with structured AI.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>4 Lines in ~/.npmrc That Block 80% of npm Supply Chain Attacks</title>
      <dc:creator>ShipWithAI</dc:creator>
      <pubDate>Mon, 04 May 2026 01:00:00 +0000</pubDate>
      <link>https://dev.to/shipwithaiio/4-lines-in-npmrc-that-block-80-of-npm-supply-chain-attacks-1acp</link>
      <guid>https://dev.to/shipwithaiio/4-lines-in-npmrc-that-block-80-of-npm-supply-chain-attacks-1acp</guid>
      <description>&lt;p&gt;Four lines in &lt;code&gt;~/.npmrc&lt;/code&gt; block the most common npm supply chain attacks before they execute. Setup takes 30 seconds. This is the bare-minimum defense for anyone letting Claude Code or Cursor run &lt;code&gt;npm install&lt;/code&gt; on their machine.&lt;/p&gt;




&lt;p&gt;These four lines are on my laptop right now. I added them the morning the axios news broke and forgot about them. Since then, every &lt;code&gt;npm install&lt;/code&gt; Claude Code has run on my machine, across five side projects, has skipped lifecycle scripts by default. Zero breakage. Zero effort.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# ~/.npmrc
&lt;/span&gt;&lt;span class="py"&gt;ignore-scripts&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;save-exact&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;audit-level&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;moderate&lt;/span&gt;
&lt;span class="py"&gt;fund&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In 2025, attackers published &lt;strong&gt;454,648 malicious npm packages&lt;/strong&gt; — roughly half a million in a single year (&lt;a href="https://www.sonatype.com/blog/open-source-malware-index-q4-2025-automation-overwhelms-ecosystems" rel="noopener noreferrer"&gt;Sonatype Open Source Malware Index, 2026&lt;/a&gt;). The four lines above block the most common payload mechanism (lifecycle scripts) for every project on your laptop, including whatever Claude Code ran at 2am last night.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why is your default npm setup unsafe in 2026?
&lt;/h2&gt;

&lt;p&gt;npm ships with lifecycle scripts enabled by default. That means any package, direct or transitive, can execute arbitrary code on your machine during &lt;code&gt;npm install&lt;/code&gt; — before you ever type &lt;code&gt;require()&lt;/code&gt;. Over 99% of all open source malware now targets npm.&lt;/p&gt;

&lt;p&gt;Here's the same attack pattern, compressed across eight years:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Year&lt;/th&gt;
&lt;th&gt;Incident&lt;/th&gt;
&lt;th&gt;Payload vector&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2018&lt;/td&gt;
&lt;td&gt;event-stream (Bitcoin wallet stealer, 2M/wk)&lt;/td&gt;
&lt;td&gt;postinstall&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025 Sep&lt;/td&gt;
&lt;td&gt;Shai-Hulud worm, 18 packages, 2.6B/wk downloads&lt;/td&gt;
&lt;td&gt;postinstall&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026 Mar&lt;/td&gt;
&lt;td&gt;&lt;code&gt;axios@1.14.1&lt;/code&gt; RAT, 100M/wk downloads&lt;/td&gt;
&lt;td&gt;postinstall&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Three incidents across eight years. Same mechanism every time. npm's official response each time is to unpublish the package and write a blog post. No structural change to how &lt;code&gt;postinstall&lt;/code&gt; works.&lt;/p&gt;

&lt;p&gt;The uncomfortable part: 84% of developers use AI coding tools, and 41% of code written in 2025 was AI-generated or AI-assisted. AI agents install packages at machine speed, with approval fatigue doing the rest. The human review step that used to catch weird dependencies has already been deleted from most workflows.&lt;/p&gt;




&lt;h2&gt;
  
  
  What each line does
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;ignore-scripts=true&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Disables &lt;code&gt;preinstall&lt;/code&gt;, &lt;code&gt;install&lt;/code&gt;, and &lt;code&gt;postinstall&lt;/code&gt; lifecycle scripts for every &lt;code&gt;npm install&lt;/code&gt;. The OWASP NPM Security Cheat Sheet calls this the single most effective mitigation against malicious or compromised packages. The &lt;code&gt;axios@1.14.1&lt;/code&gt; RAT, the Shai-Hulud worm, event-stream's Bitcoin stealer — all needed this mechanism to execute. Turn lifecycle scripts off globally and the default delivery vehicle is gone.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;save-exact=true&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Pins exact versions in &lt;code&gt;package.json&lt;/code&gt; whenever you add a package. Without it, &lt;code&gt;npm install axios&lt;/code&gt; writes &lt;code&gt;"axios": "^1.14.0"&lt;/code&gt;, a caret range that resolves to &lt;code&gt;1.14.1&lt;/code&gt; on the next clean install. With &lt;code&gt;save-exact=true&lt;/code&gt;, the same command writes &lt;code&gt;"axios": "1.14.0"&lt;/code&gt;. A hijacked patch release cannot silently promote itself into your lockfile.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;audit-level=moderate&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Sets the severity threshold at which &lt;code&gt;npm audit&lt;/code&gt; exits non-zero: moderate and higher fail, low-severity noise doesn't. That makes the audit usable as a blocking check in CI or a Claude Code session, failing loud on real advisories rather than scrolling past as warnings.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;fund=false&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Removes the "N packages are looking for funding" message from every install. Cosmetic, but it matters. When your install output is 80% funding notices, the warnings that actually matter (audit, deprecation, peer dependency conflicts) get buried. Signal hygiene is a security layer.&lt;/p&gt;

&lt;p&gt;Verify your config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm config get ignore-scripts save-exact audit-level fund
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected output: &lt;code&gt;true&lt;/code&gt;, &lt;code&gt;true&lt;/code&gt;, &lt;code&gt;moderate&lt;/code&gt;, &lt;code&gt;false&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why does this work for most npm attacks?
&lt;/h2&gt;

&lt;p&gt;The dominant payload pattern in 2025 and 2026 npm attacks is a lifecycle script that runs during install. Disabling those scripts breaks the default delivery vehicle.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Attack vector&lt;/th&gt;
&lt;th&gt;Real example&lt;/th&gt;
&lt;th&gt;Line that blocks it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;postinstall RAT&lt;/td&gt;
&lt;td&gt;&lt;code&gt;axios@1.14.1&lt;/code&gt; (2026)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ignore-scripts=true&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Silent minor/patch hijack&lt;/td&gt;
&lt;td&gt;Maintainer account takeover&lt;/td&gt;
&lt;td&gt;&lt;code&gt;save-exact=true&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Known CVE buried as warning&lt;/td&gt;
&lt;td&gt;Any reported advisory&lt;/td&gt;
&lt;td&gt;&lt;code&gt;audit-level=moderate&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Warning fatigue hiding alerts&lt;/td&gt;
&lt;td&gt;Every install, all day&lt;/td&gt;
&lt;td&gt;&lt;code&gt;fund=false&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What does this NOT protect against?
&lt;/h2&gt;

&lt;p&gt;Honest boundaries. This config blocks the most common vector, not every vector:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Malicious code in the package's main module.&lt;/strong&gt; Anything that runs on &lt;code&gt;require()&lt;/code&gt; or &lt;code&gt;import&lt;/code&gt; is unaffected by &lt;code&gt;ignore-scripts&lt;/code&gt;. If the package is actively imported by your code, the payload runs at runtime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Toolchain exploits.&lt;/strong&gt; &lt;code&gt;--ignore-scripts&lt;/code&gt; stops npm lifecycle hooks, but git still runs during install, and external binaries still execute if the install process invokes them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Typosquatting and slopsquatting.&lt;/strong&gt; AI assistants sometimes hallucinate package names that attackers have preemptively registered. OWASP flags this as the fastest-growing npm attack class in 2026.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Packages already in &lt;code&gt;node_modules&lt;/code&gt;.&lt;/strong&gt; The four lines only protect future installs. Clean rebuild recommended: &lt;code&gt;rm -rf node_modules package-lock.json &amp;amp;&amp;amp; npm install&lt;/code&gt;.&lt;/p&gt;
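&lt;p&gt;Before the clean rebuild, you can check what's already there. A sketch that lists installed packages declaring install-time scripts (the function name is made up; assumes &lt;code&gt;jq&lt;/code&gt;):&lt;/p&gt;

```shell
#!/bin/bash
# List installed packages that declare preinstall/install/postinstall
# scripts: the payload vector a malicious update would use.
# (Scoped @scope/* packages live one level deeper; extend -maxdepth for them.)
scan_lifecycle_scripts() {
  find "$1" -maxdepth 2 -name package.json 2>/dev/null |
    while read -r pkg; do
      if jq -e '.scripts | has("preinstall") or has("install") or has("postinstall")' \
           "$pkg" >/dev/null 2>/dev/null; then
        dirname "$pkg"
      fi
    done
}

# usage: scan_lifecycle_scripts node_modules
```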

&lt;p&gt;Roughly 20% of recent high-impact npm malware executes outside lifecycle scripts through runtime &lt;code&gt;require()&lt;/code&gt; or compromised main modules. Treat &lt;code&gt;.npmrc&lt;/code&gt; as necessary-but-not-sufficient.&lt;/p&gt;




&lt;h2&gt;
  
  
  What breaks when you set &lt;code&gt;ignore-scripts=true&lt;/code&gt;?
&lt;/h2&gt;

&lt;p&gt;A small set of packages genuinely need lifecycle scripts to compile native binaries or download platform assets. The usual suspects: &lt;code&gt;bcrypt&lt;/code&gt;, &lt;code&gt;node-sass&lt;/code&gt;, &lt;code&gt;sharp&lt;/code&gt;, &lt;code&gt;esbuild&lt;/code&gt;, &lt;code&gt;puppeteer&lt;/code&gt;, and &lt;code&gt;canvas&lt;/code&gt;. You will notice immediately because they fail loud, not silent.&lt;/p&gt;

&lt;p&gt;Fix per-package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install normally, then rebuild the one package that needs it&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;sharp
npm rebuild sharp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For projects with multiple native-compile dependencies, use an allow-list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--save-dev&lt;/span&gt; @lavamoat/allow-scripts
npx allow-scripts auto
npx allow-scripts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;th&gt;Why it needs scripts&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bcrypt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Native C++ compilation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sharp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Binary download + native bindings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;node-sass&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;LibSass native build&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;esbuild&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Platform binary download&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;puppeteer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Chromium download&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;canvas&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cairo/Pango native bindings&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Ninety percent of projects never hit any of these. The ones that do fail on the first CI run after the config change, and you fix them once.&lt;/p&gt;




&lt;h2&gt;
  
  
  Upgrading to hook-based defense
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;~/.npmrc&lt;/code&gt; is the user-scoped floor. The next layer is process-level enforcement: intercepting every &lt;code&gt;npm install&lt;/code&gt; Claude Code tries to run, auditing it before the command executes, and blocking the call if it's missing &lt;code&gt;--ignore-scripts&lt;/code&gt; or pointing at a new unreviewed dependency.&lt;/p&gt;

&lt;p&gt;With 41% of all code now AI-generated or AI-assisted, the agent — not the human — is the primary &lt;code&gt;npm install&lt;/code&gt; trigger. That's a &lt;code&gt;PreToolUse&lt;/code&gt; hook in &lt;code&gt;.claude/settings.json&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://shipwithai.io/blog/claude-code-npm-supply-chain-hooks/" rel="noopener noreferrer"&gt;hook post&lt;/a&gt; covers the three-layer setup: PreToolUse audit, PostToolUse lockfile diff, and CLAUDE.md enforcement rules.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Will &lt;code&gt;ignore-scripts=true&lt;/code&gt; break my builds?
&lt;/h3&gt;

&lt;p&gt;Usually no for pure-JavaScript dependencies, which is 90%+ of a typical React or Node project. Yes for native-compile packages like &lt;code&gt;bcrypt&lt;/code&gt;, &lt;code&gt;sharp&lt;/code&gt;, and &lt;code&gt;esbuild&lt;/code&gt;. Fix is &lt;code&gt;npm rebuild &amp;lt;pkg&amp;gt;&lt;/code&gt; per package or &lt;code&gt;@lavamoat/allow-scripts&lt;/code&gt; for a team-wide allow-list.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I commit &lt;code&gt;.npmrc&lt;/code&gt; to my repo?
&lt;/h3&gt;

&lt;p&gt;Personal config goes in &lt;code&gt;~/.npmrc&lt;/code&gt; (never committed, user defaults). Project-level &lt;code&gt;.npmrc&lt;/code&gt; at the repo root can be committed as long as it contains no secrets. Registry auth tokens belong only in &lt;code&gt;~/.npmrc&lt;/code&gt;, never in the repo.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does this work for pnpm and yarn?
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;.npmrc&lt;/code&gt; is shared. pnpm reads &lt;code&gt;ignore-scripts=true&lt;/code&gt; natively. Yarn classic also reads &lt;code&gt;.npmrc&lt;/code&gt;. Yarn Berry uses &lt;code&gt;.yarnrc.yml&lt;/code&gt; instead, and the equivalent setting is &lt;code&gt;enableScripts: false&lt;/code&gt;. Bun also honors &lt;code&gt;.npmrc&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is &lt;code&gt;npm audit&lt;/code&gt; still useful if I set &lt;code&gt;audit-level=moderate&lt;/code&gt;?
&lt;/h3&gt;

&lt;p&gt;Yes, and it becomes more useful. The flag changes audit from warn-mode to block-mode on CVEs at moderate severity or higher. Audit still only catches &lt;em&gt;published&lt;/em&gt; CVEs. For zero-days, you need the hook layer from the &lt;a href="https://shipwithai.io/blog/claude-code-npm-supply-chain-hooks/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-claude-code-npm-supply-chain-hooks" rel="noopener noreferrer"&gt;hooks post&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Try it now:&lt;/strong&gt; Open a terminal and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"ignore-scripts=true&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;save-exact=true&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;audit-level=moderate&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;fund=false"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/.npmrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify with &lt;code&gt;npm config get ignore-scripts save-exact audit-level fund&lt;/code&gt;. Total time: under 30 seconds.&lt;/p&gt;

&lt;p&gt;Ready for the process-level defense? Read → &lt;a href="https://shipwithai.io/blog/claude-code-npm-supply-chain-hooks/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-claude-code-npm-supply-chain-hooks" rel="noopener noreferrer"&gt;Stop npm Supply Chain Attacks with Claude Code Hooks&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shipwithai.io/blog/npm-install-security-30-seconds/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-npm-install-security-30-seconds" rel="noopener noreferrer"&gt;ShipWithAI&lt;/a&gt;. I write about Claude Code workflows, AI-assisted development, and shipping software faster with structured AI.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>security</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Claude Code Forgets Everything Between Sessions. MEMORY.md Fixes That</title>
      <dc:creator>ShipWithAI</dc:creator>
      <pubDate>Sat, 02 May 2026 01:00:00 +0000</pubDate>
      <link>https://dev.to/shipwithaiio/claude-code-forgets-everything-between-sessions-memorymd-fixes-that-1flb</link>
      <guid>https://dev.to/shipwithaiio/claude-code-forgets-everything-between-sessions-memorymd-fixes-that-1flb</guid>
<description>&lt;p&gt;Claude Code resets context every session. MEMORY.md gives it persistent memory of your project's evolving state in a 200-line index file. Setup takes 5 minutes. One prompt at the end of each session keeps it current.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Session 1: "This project uses Clerk for auth, not NextAuth."
# Session 2: "As I mentioned, we use Clerk..."
# Session 3: "We migrated to Clerk in March. Stop suggesting NextAuth."
# Session 4: "READ THE CLAUDE.MD. We use Clerk."
# Session 5: "..."
# Session 6: *opens CLAUDE.md, adds it in bold, all caps*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sound familiar? Developers spend 10-15 minutes per session rebuilding context that was clear yesterday (&lt;a href="https://cleanaim.com/silent-wiring/problems/context-loss/" rel="noopener noreferrer"&gt;CleanAim, 2026&lt;/a&gt;). Over a month of daily sessions, that's 5-10 hours of repeating yourself.&lt;/p&gt;

&lt;p&gt;The fix is one file. MEMORY.md is a lightweight index that Claude Code reads at session start. Not a conversation log. Not a code dump. A table of contents for your project's current state.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://shipwithai.io/blog/why-claude-md-matters/" rel="noopener noreferrer"&gt;CLAUDE.md holds your static rules&lt;/a&gt; (conventions, build commands, constraints). MEMORY.md holds your evolving state (recent migrations, active decisions, what changed last week). They're both part of &lt;a href="https://shipwithai.io/blog/harness-engineering-claude-code/?utm_source=copy&amp;amp;utm_medium=subtack&amp;amp;utm_campaign=blog-harness-engineering-claude-code" rel="noopener noreferrer"&gt;Layer 1 in the harness engineering framework&lt;/a&gt;, and most developers only have the first half.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why does Claude Code forget everything between sessions?
&lt;/h2&gt;

&lt;p&gt;Claude Code starts each session with a fresh context window. It reads CLAUDE.md and MEMORY.md at startup, but nothing else carries over from previous conversations. The &lt;code&gt;--continue&lt;/code&gt; flag resumes one specific conversation, but decisions spread across multiple sessions are lost unless you write them down.&lt;/p&gt;

&lt;p&gt;Here's the gap most developers hit:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;CLAUDE.md&lt;/th&gt;
&lt;th&gt;MEMORY.md&lt;/th&gt;
&lt;th&gt;--continue&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Persists across sessions&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Last session only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Content type&lt;/td&gt;
&lt;td&gt;Static rules&lt;/td&gt;
&lt;td&gt;Evolving state&lt;/td&gt;
&lt;td&gt;Full conversation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Who updates it&lt;/td&gt;
&lt;td&gt;You (manually)&lt;/td&gt;
&lt;td&gt;You + Claude&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Size limit&lt;/td&gt;
&lt;td&gt;No hard limit&lt;/td&gt;
&lt;td&gt;200 lines / 25KB&lt;/td&gt;
&lt;td&gt;Context window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Conventions, constraints&lt;/td&gt;
&lt;td&gt;Decisions, migrations, active work&lt;/td&gt;
&lt;td&gt;Resuming interrupted work&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;CLAUDE.md doesn't change session to session. It says "use Vitest for tests" and that's true tomorrow too. But "we migrated from Prisma to Drizzle last Tuesday" is evolving state. It matters for a month, then it's old news. That kind of context belongs in MEMORY.md.&lt;/p&gt;

&lt;p&gt;Claude Code does have auto memory since v2.0.64. The AutoDream feature consolidates learnings after 24+ hours and 5+ sessions. But auto memory captures broad patterns, not your specific decision to use TanStack Query over SWR on April 5th. MEMORY.md is the manual complement where you control exactly what persists.&lt;/p&gt;




&lt;h2&gt;
  
  
  How do you set up MEMORY.md in 5 minutes?
&lt;/h2&gt;

&lt;p&gt;Create a file called MEMORY.md in your project root with 5-10 pointer entries, each under 150 characters. Each entry points to where information lives in your project, not the information itself. Claude Code loads this file automatically at session start.&lt;/p&gt;

&lt;p&gt;Here's a realistic template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Project State (updated 2026-04-17)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Auth&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;src/lib/auth/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; - Clerk since March 2026. Migrated from NextAuth.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;prisma/schema.prisma&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; - PostgreSQL on Supabase. Drizzle ORM.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Deploy&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;docs/deploy.md&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; - Vercel preview for PRs, production on main.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Testing&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;vitest.config.ts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; - Vitest unit + Playwright E2E. 80% min.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;API&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;src/app/api/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; - Server Actions for mutations. API routes for webhooks only.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Payments&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;src/lib/stripe/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; - Stripe checkout. Webhooks at /api/webhooks/stripe.
&lt;span class="p"&gt;-&lt;/span&gt; [WIP] Dashboard redesign in progress. Branch: feature/dashboard-v2.
&lt;span class="p"&gt;-&lt;/span&gt; [Bug] Rate limiter false positives on /api/search. Issue #234.
&lt;span class="p"&gt;-&lt;/span&gt; [Decision] Chose TanStack Query over SWR, April 5. See docs/decisions/004.md.
&lt;span class="p"&gt;-&lt;/span&gt; [Deprecated] Old /api/v1/ routes. Remove after May 1 deadline.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each entry is a pointer. "Clerk since March 2026" tells Claude the auth system and when it changed. If Claude needs details, it reads &lt;code&gt;src/lib/auth/&lt;/code&gt;. The entry doesn't dump the auth implementation into MEMORY.md.&lt;/p&gt;

&lt;p&gt;One critical constraint: &lt;strong&gt;MEMORY.md is capped at 200 lines or 25KB, whichever is smaller.&lt;/strong&gt; Entries beyond line 200 are silently dropped with no warning. Keep it lean.&lt;/p&gt;
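&lt;p&gt;The cap is easy to check before a session. A minimal sketch of a pre-session check (the thresholds mirror the 200-line / 25KB cap above; the one-line demo file is only created so the check runs anywhere):&lt;/p&gt;

```shell
# Hypothetical pre-session check for the MEMORY.md cap.
# Creates a one-line demo file if MEMORY.md is absent, so the check is runnable.
[ -f MEMORY.md ] || printf '%s\n' '- [Demo] placeholder entry' > MEMORY.md

lines=$(wc -l MEMORY.md | awk '{print $1}')
bytes=$(wc -c MEMORY.md | awk '{print $1}')

# 200 lines or 25KB (25600 bytes), whichever is hit first.
if [ "$lines" -gt 200 ] || [ "$bytes" -gt 25600 ]; then
  echo "MEMORY.md over the cap: ${lines} lines, ${bytes} bytes -- prune it"
else
  echo "MEMORY.md OK: ${lines} lines, ${bytes} bytes"
fi
```

Drop it in a pre-commit hook or an alias if you want the reminder automated.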




&lt;h2&gt;
  
  
  What makes a good MEMORY.md entry vs a bad one?
&lt;/h2&gt;

&lt;p&gt;Good entries are pointers under 150 characters that tell Claude where to look. Bad entries dump content that belongs in source files. The ETH Zurich AGENTbench study found that longer context files actually reduce agent success by ~3% while increasing costs by up to 19% (&lt;a href="https://www.marktechpost.com/2026/02/25/new-eth-zurich-study-proves-your-ai-coding-agents-are-failing-because-your-agents-md-files-are-too-detailed/" rel="noopener noreferrer"&gt;Gloaguen et al., 2026&lt;/a&gt;). Less is more.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Bad Entry (content dump)&lt;/th&gt;
&lt;th&gt;Good Entry (pointer)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Auth uses Clerk with middleware at src/middleware.ts that checks session cookies and redirects unauthenticated users to /sign-in with a custom error page&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;[Auth](src/lib/auth/) - Clerk since March 2026. See middleware.ts.&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Database is PostgreSQL 16 on Supabase with connection pooling via pgBouncer, schema managed by Drizzle ORM using push strategy for migrations&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;[DB](src/db/schema.ts) - PostgreSQL/Supabase, Drizzle ORM.&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;The old API routes at /api/v1/users, /api/v1/products, and /api/v1/orders are deprecated and scheduled for removal in the next sprint after May 1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;[Deprecated] /api/v1/ routes. Remove after May 1.&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The bad entries average 25-30 words. The good entries average 8-12 words. Both give Claude the same actionable information.&lt;/p&gt;

&lt;p&gt;Why do short pointers work better? 80% of tokens in typical agent sessions are wasted on "finding things" rather than doing things. Pointers eliminate the finding. Claude reads "Clerk since March 2026" and goes straight to the auth code instead of spending 3 turns figuring out the auth stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Categories that belong in MEMORY.md:&lt;/strong&gt; decisions made (with dates), active migrations or refactors, work in progress (branch names, issue numbers), known bugs (with tracking links), deprecation deadlines, recent architecture changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What does NOT belong:&lt;/strong&gt; static rules (→ &lt;a href="https://shipwithai.io/blog/why-claude-md-matters/" rel="noopener noreferrer"&gt;CLAUDE.md&lt;/a&gt;), code snippets (→ source files), architecture docs (→ &lt;code&gt;docs/&lt;/code&gt; directory), dangerous action prevention (→ &lt;a href="https://shipwithai.io/blog/claude-code-hook-decision-guide/" rel="noopener noreferrer"&gt;Hooks&lt;/a&gt;).&lt;/p&gt;




&lt;h2&gt;
  
  
  How do you keep MEMORY.md current without complex hooks?
&lt;/h2&gt;

&lt;p&gt;At the end of each session, give Claude one prompt. Claude reads the current MEMORY.md, adds or updates relevant entries, removes stale ones, and keeps it under the 200-line limit. No hooks, no automation, no third-party tools. One prompt, ten seconds.&lt;/p&gt;

&lt;p&gt;Here's the prompt (copy-paste ready):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Update MEMORY.md with what you learned this session: new decisions,
changed architecture, resolved bugs, anything future sessions should
know. Keep entries under 150 chars. Remove anything no longer relevant.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the entire workflow. Claude knows what changed because it just did the work. It writes the entries in the pointer format it already sees in the file. You review the diff, approve or tweak, and the next session starts with updated context.&lt;/p&gt;

&lt;p&gt;Do this at the end of sessions where something meaningful changed. Skip it for quick lookups or small fixes where nothing new was decided.&lt;/p&gt;

&lt;p&gt;Why manual beats auto-update hooks for this: hooks add complexity, can generate noisy entries, and aren't proven for memory quality. The manual prompt lets you review what gets added. You stay in control of what your agent remembers.&lt;/p&gt;




&lt;h2&gt;
  
  
  When should you prune MEMORY.md?
&lt;/h2&gt;

&lt;p&gt;Prune monthly, same cadence as &lt;a href="https://shipwithai.io/blog/claude-md-failure-log-pattern/" rel="noopener noreferrer"&gt;CLAUDE.md pruning&lt;/a&gt;. Remove entries older than 30 days that are no longer relevant. Graduate stable entries to CLAUDE.md. The 200-line limit is hard, and entries beyond it vanish silently.&lt;/p&gt;

&lt;p&gt;Four questions per entry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;For each MEMORY.md entry, ask:
1. Still true? → NO → Delete it
2. Stable for 30+ days? → YES → Graduate to CLAUDE.md
3. Duplicate of CLAUDE.md? → YES → Remove from MEMORY.md
4. Would a new teammate need this? → NO → Delete it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The graduation pattern is important. "Migrated from Prisma to Drizzle, April 2" is a MEMORY.md entry for the first month. After 30 days, the migration is old news. Graduate it to CLAUDE.md as a static rule: "ORM: Drizzle (not Prisma)." Then delete it from MEMORY.md.&lt;/p&gt;

&lt;p&gt;If your MEMORY.md grows past 150 lines, you're overdue for pruning. &lt;a href="https://www.humanlayer.dev/blog/writing-a-good-claude-md" rel="noopener noreferrer"&gt;HumanLayer keeps their CLAUDE.md under 60 lines&lt;/a&gt; for the same reason: fewer lines means higher signal per line.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is MEMORY.md in Claude Code?
&lt;/h3&gt;

&lt;p&gt;MEMORY.md is a project-level index file that Claude Code reads at the start of every session. It provides persistent memory of your project's evolving state: recent decisions, active work, migrations, and known issues. Each entry should be a pointer under 150 characters. The file is capped at 200 lines or 25KB.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the difference between CLAUDE.md and MEMORY.md?
&lt;/h3&gt;

&lt;p&gt;CLAUDE.md holds static rules that rarely change: tech stack, naming conventions, build commands, constraints. MEMORY.md holds evolving state that changes between sessions: recent migrations, active decisions, work in progress, known bugs. Think of CLAUDE.md as the constitution and MEMORY.md as the changelog.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Claude Code have auto memory?
&lt;/h3&gt;

&lt;p&gt;Yes. Since v2.0.64, Claude Code has auto memory (AutoDream) that consolidates learnings after 24+ hours and 5+ sessions. It captures broad patterns automatically. But it doesn't track project-specific decisions like "chose TanStack Query over SWR on April 5." Use MEMORY.md for critical project state and let auto memory handle general patterns.&lt;/p&gt;

&lt;h3&gt;
  
  
  How many lines can MEMORY.md have?
&lt;/h3&gt;

&lt;p&gt;200 lines or 25KB, whichever is smaller. Entries beyond line 200 are silently dropped with no warning. Keep your file under 150 lines and prune monthly. Each entry should be a pointer under 150 characters. If your MEMORY.md consistently exceeds 150 lines, graduate stable entries to CLAUDE.md and delete resolved items.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Try it now:&lt;/strong&gt; Create &lt;code&gt;MEMORY.md&lt;/code&gt; in your project root. Write 5 pointer entries covering: auth, database, deploy, testing, and one active decision. Keep each under 150 characters. Start a new Claude Code session and verify it references your entries. At the end, run: "Update MEMORY.md with what you learned this session."&lt;/p&gt;
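&lt;p&gt;If you want a one-command starting point, this sketch writes a skeleton to fill in. Every path and detail below is a placeholder to swap for your own project's:&lt;/p&gt;

```shell
# Hypothetical starter MEMORY.md -- every path and stack detail is a placeholder.
printf '%s\n' \
  '## Project State (updated YYYY-MM-DD)' \
  '- [Auth](src/lib/auth/) - auth provider and when it changed.' \
  '- [DB](path/to/schema) - database and ORM.' \
  '- [Deploy](docs/deploy.md) - where previews and production run.' \
  '- [Testing](path/to/test.config) - frameworks and coverage floor.' \
  '- [Decision] One active decision, with a date.' \
  > MEMORY.md

wc -l MEMORY.md   # well under the 200-line cap
```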

&lt;p&gt;How many lines is your MEMORY.md? Drop it in the comments.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shipwithai.io/blog/claude-code-memory-md-fix/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-claude-code-memory-md-fix" rel="noopener noreferrer"&gt;ShipWithAI&lt;/a&gt;. I write about Claude Code workflows, AI-assisted development, and shipping software faster with structured AI.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>claude</category>
    </item>
    <item>
      <title>Harness Engineering Is the New Senior Developer Skill (Here's Why)</title>
      <dc:creator>ShipWithAI</dc:creator>
      <pubDate>Thu, 30 Apr 2026 01:00:00 +0000</pubDate>
      <link>https://dev.to/shipwithaiio/harness-engineering-is-the-new-senior-developer-skill-heres-why-4hef</link>
      <guid>https://dev.to/shipwithaiio/harness-engineering-is-the-new-senior-developer-skill-heres-why-4hef</guid>
      <description>&lt;p&gt;The highest-leverage activity for senior engineers in 2026 isn't writing code. It's building the 5-layer harness (memory, tools, permissions, hooks, observability) that makes every team member's AI output reliable. One harness, committed to version control, serves 10 developers.&lt;/p&gt;






&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;84% of developers use AI coding tools.
29% trust what they produce.

That 55-point gap is the senior engineer's new job.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not a new model. Not a better prompt. A better system around the model.&lt;/p&gt;

&lt;p&gt;The gap between adoption and trust exists because developers adopted AI tools without building the systems to verify, constrain, and correct their output. The tool works fine. The harness is missing. And building that harness is the new leverage point for senior engineers.&lt;/p&gt;

&lt;p&gt;This post is the capstone of the &lt;a href="https://shipwithai.io/blog/harness-engineering-claude-code/" rel="noopener noreferrer"&gt;Harness Engineering series&lt;/a&gt;. Previous posts covered each layer of the system. This one answers the career question: why should you, specifically, care about any of it?&lt;/p&gt;




&lt;h2&gt;
  
  
  Why is AI adoption high but trust low?
&lt;/h2&gt;

&lt;p&gt;Developer AI tool adoption reached 84% in 2025, with 51% using AI tools daily (&lt;a href="https://survey.stackoverflow.co/2025/ai" rel="noopener noreferrer"&gt;Stack Overflow Developer Survey, 2025&lt;/a&gt;). But trust in AI-generated code dropped from 40% to 29% over the same period (&lt;a href="https://shiftmag.dev/state-of-code-2025-7978/" rel="noopener noreferrer"&gt;ShiftMag, 2025&lt;/a&gt;). Adoption climbed while trust fell. That divergence tells you everything.&lt;/p&gt;

&lt;p&gt;The pattern looks like this: developer installs AI tool, generates code, eyeballs it, ships it. Works for prototypes. Breaks in production. After the third rollback, trust erodes. After the fifth, the team lead starts asking why they're paying for this.&lt;/p&gt;

&lt;p&gt;The problem isn't the model. The model generates reasonable code most of the time. The problem is that nothing verifies the output, nothing constrains the dangerous actions, and nothing remembers what went wrong last session.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Without harness:
Developer → AI generates code → eyeball it → ship it → hope
Trust trajectory: down

With harness:
Developer → AI generates code → hooks verify → constraints block bad actions → memory prevents repeat mistakes
Trust trajectory: up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tool is the same in both cases. The system around it isn't.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where does senior engineer leverage live now?
&lt;/h2&gt;

&lt;p&gt;The leverage point for senior engineers has moved through four eras in roughly six years. Each shift multiplied output and made the previous skill table stakes.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Era&lt;/th&gt;
&lt;th&gt;Years&lt;/th&gt;
&lt;th&gt;What You Optimize&lt;/th&gt;
&lt;th&gt;Your Leverage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Write good code&lt;/td&gt;
&lt;td&gt;Pre-2023&lt;/td&gt;
&lt;td&gt;Algorithms, architecture&lt;/td&gt;
&lt;td&gt;Your typing speed and design skill&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write good prompts&lt;/td&gt;
&lt;td&gt;2023-2024&lt;/td&gt;
&lt;td&gt;Instructions to the model&lt;/td&gt;
&lt;td&gt;How well you phrase requests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Curate good context&lt;/td&gt;
&lt;td&gt;2025&lt;/td&gt;
&lt;td&gt;What the model sees&lt;/td&gt;
&lt;td&gt;CLAUDE.md, context windows, RAG&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Build good harnesses&lt;/td&gt;
&lt;td&gt;2026&lt;/td&gt;
&lt;td&gt;The system around the model&lt;/td&gt;
&lt;td&gt;Hooks, verification, constraints, memory&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each era didn't replace the previous one. It absorbed it. You still need to write good code. You still need good prompts. You still need good context. But the leverage multiplier is now in the harness layer, not the layers below it.&lt;/p&gt;

&lt;p&gt;LangChain proved this with numbers. Same model (gpt-5.2-codex), same prompts, same context window. Three harness changes: context injection, self-verification loops, and compute budget management. Result: 52.8% to 66.5% on &lt;a href="https://www.vals.ai/benchmarks/terminal-bench-2" rel="noopener noreferrer"&gt;Terminal Bench 2.0&lt;/a&gt;, a jump from Top 30 to Top 5.&lt;/p&gt;

&lt;p&gt;The model was never the bottleneck. The harness was.&lt;/p&gt;




&lt;h2&gt;
  
  
  What does a 5-layer harness system look like?
&lt;/h2&gt;

&lt;p&gt;A production harness has five layers: memory, tools, permissions, hooks, and observability. Each layer compounds the reliability of the layers below it. Building them in order (1, then 4, then 2, then 3, then 5) produces the fastest ROI. Most developers stop at Layer 1.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. Memory&lt;/td&gt;
&lt;td&gt;Persistent context&lt;/td&gt;
&lt;td&gt;"Use Clerk not NextAuth" persists across sessions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Tools&lt;/td&gt;
&lt;td&gt;Extended capabilities&lt;/td&gt;
&lt;td&gt;MCP server for database queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Permissions&lt;/td&gt;
&lt;td&gt;Safety boundaries&lt;/td&gt;
&lt;td&gt;Block &lt;code&gt;rm -rf&lt;/code&gt;, allow &lt;code&gt;npm test&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. Hooks&lt;/td&gt;
&lt;td&gt;Verification loops&lt;/td&gt;
&lt;td&gt;PostToolUse runs ESLint after every file edit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5. Observability&lt;/td&gt;
&lt;td&gt;Audit + cost tracking&lt;/td&gt;
&lt;td&gt;Token cost alerts at $2/session&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here's why the order matters. Memory (Layer 1) is free. You create a CLAUDE.md file with your project's rules, and every session starts with the right context. That alone eliminates the "explaining Clerk for the 6th time" problem.&lt;/p&gt;

&lt;p&gt;Hooks (Layer 4) come next because they enforce rules that memory can only suggest. A CLAUDE.md line saying "run tests before committing" gets ignored under pressure. A &lt;a href="https://shipwithai.io/blog/claude-code-self-verification-loop/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-claude-code-self-verification-loop" rel="noopener noreferrer"&gt;PostToolUse hook&lt;/a&gt; that runs &lt;code&gt;npx eslint --quiet&lt;/code&gt; after every file edit cannot be bypassed. Memory advises. Hooks enforce.&lt;/p&gt;
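&lt;p&gt;For reference, a hook like that lives in &lt;code&gt;.claude/settings.json&lt;/code&gt;. A minimal sketch, assuming ESLint is already set up in the repo (the &lt;code&gt;Edit|Write&lt;/code&gt; matcher and the lint command are illustrative; adjust both to your stack):&lt;/p&gt;

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npx eslint --quiet ." }
        ]
      }
    ]
  }
}
```

A non-zero exit code from the command feeds the failure back to the agent, which is what turns a formatting hook into an enforcement loop.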

&lt;p&gt;The rest fills in from there. Tools extend what the agent can do. Permissions restrict what it's allowed to do. Observability tells you what it actually did.&lt;/p&gt;

&lt;p&gt;One afternoon of setup. Every session after that is more reliable.&lt;/p&gt;




&lt;h2&gt;
  
  
  How does one harness multiply a team of 10?
&lt;/h2&gt;

&lt;p&gt;A harness committed to version control gives every developer on the team the same verification loops, the same constraints, and the same memory. One staff engineer's afternoon of harness work replaces 10 developers' daily context-rebuilding. OpenAI's Codex team shipped 1,500 PRs with just 3 engineers using this principle (&lt;a href="https://martinfowler.com/articles/exploring-gen-ai/harness-engineering.html" rel="noopener noreferrer"&gt;Fowler, 2026&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Three levels of multiplication:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Individual harness&lt;/strong&gt;: Your CLAUDE.md, your hooks, your MEMORY.md. It lives in the repo. Every &lt;code&gt;git clone&lt;/code&gt; inherits it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.claude/
    settings.json      # Hook configs, permission rules
CLAUDE.md              # Static rules, constraints, failure log
MEMORY.md              # Evolving state, active decisions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Team harness&lt;/strong&gt;: Shared MCP servers, shared hook configs, shared MEMORY.md entries for active migrations. When you add a constraint after a production incident, every team member gets it on their next &lt;code&gt;git pull&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Organizational harness&lt;/strong&gt;: Standard hook templates across repositories. Compliance hooks that prevent secrets in commits and block force pushes to main. The security team writes it once, every repo inherits it.&lt;/p&gt;

&lt;p&gt;The multiplication math is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Without harness:
10 developers x 15 min/session rebuilding context = 2.5 hours/day wasted
Monthly: ~50 hours lost

With harness:
Setup: 4 hours (one staff engineer, one afternoon)
Daily savings: 2.5 hours
ROI positive: day 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is why staff engineer job descriptions at major tech companies increasingly mention "developer experience" and "tooling." Harness engineering is developer experience for the AI era. You're not writing code. You're building the system that makes everyone else's AI-generated code reliable.&lt;/p&gt;




&lt;h2&gt;
  
  
  What should you review in a harness instead of just code?
&lt;/h2&gt;

&lt;p&gt;Code review catches bugs in implementation. Harness review catches bugs in the system that produces implementation. When AI-authored code reached 41% of all new code in 2026 (&lt;a href="https://modall.ca/blog/ai-in-software-development-trends-statistics" rel="noopener noreferrer"&gt;Modall, 2026&lt;/a&gt;), reviewing the system that generates it became as important as reviewing the code itself.&lt;/p&gt;

&lt;p&gt;Here's a harness review checklist. Use it alongside your existing code review process:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Harness Review Checklist:

Memory:
[ ] CLAUDE.md reflects current tech stack and constraints
[ ] MEMORY.md has been pruned in the last 30 days
[ ] No stale entries pointing to removed files or old decisions

Hooks:
[ ] PostToolUse verification exists for file edits
[ ] Stop hook exists for destructive commands
[ ] Hook configs are committed to version control (not local-only)

Constraints:
[ ] Allowed commands list matches CI/CD requirements
[ ] No wildcard permissions on production-affecting tools
[ ] Sensitive files (.env, credentials) excluded from agent access

Cost:
[ ] Session cost alerts configured
[ ] Context window usage monitored
[ ] Unnecessary files excluded from context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add this checklist to your PR template. It takes 2 minutes to run and catches the class of bugs that code review can't see: configuration drift, missing enforcement, stale context.&lt;/p&gt;




&lt;h2&gt;
  
  
  Build your first team harness
&lt;/h2&gt;

&lt;p&gt;The fastest path from zero to working team harness takes six steps and about 30 minutes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pick one repo your team uses daily&lt;/li&gt;
&lt;li&gt;Audit the CLAUDE.md: does it reflect current tech stack? Add 3 constraints from recent bugs using the &lt;a href="https://shipwithai.io/blog/claude-md-failure-log-pattern/" rel="noopener noreferrer"&gt;failure log pattern&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Add one PostToolUse hook: ESLint after file edits. Copy the config from the &lt;a href="https://shipwithai.io/blog/claude-code-self-verification-loop/" rel="noopener noreferrer"&gt;verification loop post&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Create MEMORY.md with 5 pointer entries for active work&lt;/li&gt;
&lt;li&gt;Commit the harness files: &lt;code&gt;CLAUDE.md&lt;/code&gt;, &lt;code&gt;MEMORY.md&lt;/code&gt;, &lt;code&gt;.claude/settings.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Run the harness review checklist above in your next PR review&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every &lt;code&gt;git pull&lt;/code&gt; now gives your entire team the same system. One afternoon of setup. Compounding returns from day 2.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is harness engineering for AI coding agents?
&lt;/h3&gt;

&lt;p&gt;Harness engineering is the practice of building the system around an AI model (memory, tools, permissions, hooks, observability) to make the agent reliable in production. The term was formalized by &lt;a href="https://martinfowler.com/articles/exploring-gen-ai/harness-engineering.html" rel="noopener noreferrer"&gt;Birgitta Bockeler on Martin Fowler's site&lt;/a&gt; and &lt;a href="https://openai.com/index/harness-engineering/" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt; in early 2026. The core formula: Agent = Model + Harness. The model is a commodity. The harness is your competitive advantage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do senior engineers still write code with AI agents?
&lt;/h3&gt;

&lt;p&gt;Yes. But the leverage point has shifted. Senior engineers spend more time building harnesses (CLAUDE.md, hooks, verification loops, MCP servers) that make every team member's AI output more reliable. Writing code is still part of the job. It's just no longer the highest-leverage activity.&lt;/p&gt;

&lt;h3&gt;
  
  
  How long does it take to set up a Claude Code harness?
&lt;/h3&gt;

&lt;p&gt;A basic harness (CLAUDE.md + one verification hook + MEMORY.md) takes about 30 minutes, so it pays for itself within a day or two. A full 5-layer system takes 2-4 hours; for a team of 3+ developers saving 15 minutes per session each, even the full build is ROI-positive within a week.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can harness engineering work for any AI coding tool?
&lt;/h3&gt;

&lt;p&gt;The principles (persistent memory, verification loops, constraints, observability) apply to any agent. The implementation differs by tool. Claude Code has hooks and CLAUDE.md. GitHub Copilot has &lt;code&gt;.github/copilot-instructions.md&lt;/code&gt;. Cursor has &lt;code&gt;.cursorrules&lt;/code&gt;. The harness pattern is universal. The config files are tool-specific.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Try it now:&lt;/strong&gt; Pick one repo, add CLAUDE.md + one PostToolUse hook + MEMORY.md. Commit. Every &lt;code&gt;git pull&lt;/code&gt; gives your team the same harness. Setup: 30 minutes. ROI: day 2.&lt;/p&gt;

&lt;p&gt;What does your team's harness look like today? Drop it in the comments.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shipwithai.io/blog/harness-engineering-senior-developer-guide/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-harness-engineering-senior-developer-guide" rel="noopener noreferrer"&gt;ShipWithAI&lt;/a&gt;. I write about Claude Code workflows, AI-assisted development, and shipping software faster with structured AI.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>claude</category>
    </item>
    <item>
      <title>How to Build a Self-Verification Loop in Claude Code (3 Layers, 20 Minutes)</title>
      <dc:creator>ShipWithAI</dc:creator>
      <pubDate>Tue, 28 Apr 2026 01:00:00 +0000</pubDate>
      <link>https://dev.to/shipwithaiio/how-to-build-a-self-verification-loop-in-claude-code-3-layers-20-minutes-m1p</link>
      <guid>https://dev.to/shipwithaiio/how-to-build-a-self-verification-loop-in-claude-code-3-layers-20-minutes-m1p</guid>
      <description>&lt;p&gt;Claude Code's Stop hook blocks the agent from finishing until verification passes. Combine it with PostToolUse feedback injection to build a 3-layer verification loop (syntax, intent, regression) in 20 minutes. The result: the agent can't say "done" until it actually is.&lt;/p&gt;




&lt;p&gt;Two hook setups. Same Claude Code session. Different outcomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# What most devs have: a formatting hook&lt;/span&gt;
&lt;span class="c"&gt;# PostToolUse: runs prettier after file edits&lt;/span&gt;

&lt;span class="c"&gt;# What this post builds: a verification loop&lt;/span&gt;
&lt;span class="c"&gt;# PostToolUse: checks syntax on every file change&lt;/span&gt;
&lt;span class="c"&gt;# Stop: blocks completion until tests pass + intent verified&lt;/span&gt;
&lt;span class="c"&gt;# Result: agent can't say "done" until it actually is&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first catches formatting. The second catches logic errors, missed requirements, and broken tests before the agent claims it's finished.&lt;/p&gt;

&lt;p&gt;LangChain's &lt;code&gt;PreCompletionChecklistMiddleware&lt;/code&gt; is the most documented example of this pattern. It contributed to a &lt;a href="https://shipwithai.io/blog/harness-engineering-claude-code/" rel="noopener noreferrer"&gt;13.7-point benchmark gain using harness changes alone&lt;/a&gt;. This post builds the Claude Code equivalent using hooks.&lt;/p&gt;




&lt;h2&gt;
  
  
  What does "verification" actually mean for an AI coding agent?
&lt;/h2&gt;

&lt;p&gt;Verification means checking that the agent's output matches the task's intent, not just that the code compiles. Only 3% of developers report high trust in AI-generated code (&lt;a href="https://www.qodo.ai/reports/state-of-ai-code-quality/" rel="noopener noreferrer"&gt;Qodo, State of AI Code Quality, 2025&lt;/a&gt;). Most developers stop at syntax checks (lint, format, type-check). Production verification needs two more layers.&lt;/p&gt;

&lt;p&gt;Three verification layers, each catching a different class of failure:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Checks&lt;/th&gt;
&lt;th&gt;Catches&lt;/th&gt;
&lt;th&gt;Misses&lt;/th&gt;
&lt;th&gt;Hook&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. Syntax&lt;/td&gt;
&lt;td&gt;Code compiles, formats&lt;/td&gt;
&lt;td&gt;Typos, type errors&lt;/td&gt;
&lt;td&gt;Logic bugs&lt;/td&gt;
&lt;td&gt;PostToolUse command&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Intent&lt;/td&gt;
&lt;td&gt;Output matches request&lt;/td&gt;
&lt;td&gt;Wrong approach, missing features&lt;/td&gt;
&lt;td&gt;Regressions&lt;/td&gt;
&lt;td&gt;Stop prompt/agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Regression&lt;/td&gt;
&lt;td&gt;Existing tests pass&lt;/td&gt;
&lt;td&gt;Broken functionality, side effects&lt;/td&gt;
&lt;td&gt;Untested requirements&lt;/td&gt;
&lt;td&gt;Stop command&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;"Run the tests" only covers Layer 3. Tests verify what you wrote tests for, not what you asked the agent to do. If you asked Claude to add pagination and it added sorting instead, every test still passes. Layer 2 catches that.&lt;/p&gt;

&lt;p&gt;Spotify's Honk system demonstrates this at scale: 1,500+ PRs merged through verification loops, handling roughly 50% of all PRs automatically (&lt;a href="https://engineering.atspotify.com/2025/12/feedback-loops-background-coding-agents-part-3" rel="noopener noreferrer"&gt;Spotify Engineering, Dec 2025&lt;/a&gt;). Their key design choice: the agent doesn't know how verification works. It just gets pass/fail feedback. That separation keeps the agent focused on the task, not on gaming the verifier.&lt;/p&gt;




&lt;h2&gt;
  
  
  How does Claude Code's Stop hook work?
&lt;/h2&gt;

&lt;p&gt;The Stop hook fires every time Claude finishes responding. Exit code 2 blocks Claude from stopping and forces it to continue working. This single mechanism prevents the agent from saying "done" when it isn't.&lt;/p&gt;

&lt;p&gt;Here's the critical part most tutorials skip: the &lt;code&gt;stop_hook_active&lt;/code&gt; field.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# .claude/hooks/verify-before-stop.sh&lt;/span&gt;
&lt;span class="nv"&gt;INPUT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# CRITICAL: prevent infinite verification loops&lt;/span&gt;
&lt;span class="c"&gt;# When true, Claude is already in a forced-continuation state&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$INPUT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.stop_hook_active'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"true"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0  &lt;span class="c"&gt;# Let Claude stop — don't loop forever&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Run tests — block stop if they fail&lt;/span&gt;
npm &lt;span class="nb"&gt;test &lt;/span&gt;2&amp;gt;&amp;amp;1 &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Tests failing. Fix before completing."&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
    &lt;span class="nb"&gt;exit &lt;/span&gt;2
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without checking &lt;code&gt;stop_hook_active&lt;/code&gt;, the hook blocks every stop attempt. Claude fixes the tests, tries to stop, gets blocked again, fixes more, tries to stop, gets blocked again. Infinite loop. Always check this field.&lt;/p&gt;

&lt;p&gt;Two ways to send feedback back to the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exit 2 + stderr&lt;/strong&gt;: The stderr message appears as feedback. Claude reads it, acts on it, then tries to stop again.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exit 0 + JSON with &lt;code&gt;additionalContext&lt;/code&gt;&lt;/strong&gt;: Inject context into the agent's next turn without blocking. Good for warnings that don't require immediate action.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feedback via &lt;code&gt;additionalContext&lt;/code&gt; is capped at 10,000 characters. If your test output is longer, filter it. &lt;a href="https://www.humanlayer.dev/blog/skill-issue-harness-engineering-for-coding-agents" rel="noopener noreferrer"&gt;HumanLayer learned this the hard way&lt;/a&gt;: 4,000 lines of passing tests flooded the context window and the agent lost track of the task. Surface failures only.&lt;/p&gt;
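&lt;p&gt;A sketch combining both lessons: non-blocking feedback, trimmed before injection. The coverage command is a placeholder, and &lt;code&gt;jq -n --arg&lt;/code&gt; builds the JSON so quotes or newlines in the output can't produce an invalid payload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/bin/bash
# Hypothetical Stop hook: surface a trimmed report without blocking the stop.
# (In a real hook the event JSON arrives on stdin: INPUT=$(cat).)
REPORT=$(npm run coverage 2&amp;gt;&amp;amp;1 | tail -20)   # stay well under the 10,000-char cap

# Exit 0 + JSON: inject context into the next turn, don't force continuation
jq -n --arg ctx "Coverage (last 20 lines):
$REPORT" '{additionalContext: $ctx}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;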




&lt;h2&gt;
  
  
  How do you build a 3-layer verification loop?
&lt;/h2&gt;

&lt;p&gt;Compose three hooks across two events: a PostToolUse command hook for syntax (Layer 1), a Stop command hook for regression (Layer 3), and a Stop prompt hook for intent (Layer 2). Each runs automatically. The agent gets feedback and self-corrects.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Syntax verification (PostToolUse)
&lt;/h3&gt;

&lt;p&gt;Runs after every Write or Edit tool call. Checks lint and type errors on the changed file. Fast, deterministic, zero tokens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# .claude/hooks/verify-syntax.sh&lt;/span&gt;
&lt;span class="nv"&gt;INPUT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;FILE_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$INPUT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.tool_input.file_path // empty'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Skip non-JS/TS files&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FILE_PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;~ &lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;ts|tsx|js|jsx&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Run ESLint on the changed file, surface errors only&lt;/span&gt;
&lt;span class="nv"&gt;LINT_OUTPUT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;npx eslint &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FILE_PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--quiet&lt;/span&gt; 2&amp;gt;&amp;amp;1&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;LINT_EXIT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$?&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;$LINT_EXIT&lt;/span&gt; &lt;span class="nt"&gt;-ne&lt;/span&gt; 0 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;additionalContext&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Lint errors in &lt;/span&gt;&lt;span class="nv"&gt;$FILE_PATH&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="nv"&gt;$LINT_OUTPUT&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}"&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key detail: this hook returns exit 0, not exit 2. PostToolUse hooks can't undo the file write. Instead, the &lt;code&gt;additionalContext&lt;/code&gt; field injects the lint errors into Claude's next turn. Claude sees the errors and fixes them on its own. One caveat: building that JSON with &lt;code&gt;echo&lt;/code&gt; breaks if the lint output contains quotes or raw newlines; for anything beyond a quick sketch, construct the payload with &lt;code&gt;jq -n --arg&lt;/code&gt; instead.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: Intent verification (Stop prompt hook)
&lt;/h3&gt;

&lt;p&gt;Runs when Claude tries to stop. Asks an LLM to check whether the original request was actually addressed. This is the Claude Code equivalent of LangChain's &lt;code&gt;PreCompletionChecklistMiddleware&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Review what was accomplished in this session. Check if all requirements from the user's original request were addressed. If anything is incomplete or missing, respond with {&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;decision&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;block&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;reason&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Incomplete: &amp;lt;what remains&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}. If everything looks complete, respond with {&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;decision&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;allow&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For complex tasks, swap the prompt hook for an agent hook. Agent hooks spawn a subagent that can Read files, Grep the codebase, and run Bash commands. More thorough, but adds 2-10 seconds.&lt;/p&gt;
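&lt;p&gt;A sketch of what that swap might look like, assuming the agent hook accepts the same &lt;code&gt;prompt&lt;/code&gt; field as the prompt hook (verify the schema against the current hooks reference before relying on it):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "type": "agent",
  "prompt": "Verify the user's original request was fully addressed. Read the changed files, grep for affected call sites, and run the relevant tests before deciding. Respond with {\"decision\": \"block\", \"reason\": \"...\"} or {\"decision\": \"allow\"}."
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;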

&lt;h3&gt;
  
  
  Layer 3: Regression verification (Stop command hook)
&lt;/h3&gt;

&lt;p&gt;Runs when Claude tries to stop. Deterministic check: do the tests pass? Does the build succeed?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# .claude/hooks/verify-regression.sh&lt;/span&gt;
&lt;span class="nv"&gt;INPUT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Anti-loop protection, MANDATORY&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$INPUT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.stop_hook_active'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"true"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Run tests&lt;/span&gt;
&lt;span class="nv"&gt;TEST_OUTPUT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;npm &lt;span class="nb"&gt;test &lt;/span&gt;2&amp;gt;&amp;amp;1&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;$?&lt;/span&gt; &lt;span class="nt"&gt;-ne&lt;/span&gt; 0 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nv"&gt;TRIMMED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$TEST_OUTPUT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-50&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Tests failing. Fix before completing:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="nv"&gt;$TRIMMED&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
    &lt;span class="nb"&gt;exit &lt;/span&gt;2
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Run build&lt;/span&gt;
&lt;span class="nv"&gt;BUILD_OUTPUT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;npm run build 2&amp;gt;&amp;amp;1&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;$?&lt;/span&gt; &lt;span class="nt"&gt;-ne&lt;/span&gt; 0 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nv"&gt;TRIMMED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BUILD_OUTPUT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-30&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Build failing:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="nv"&gt;$TRIMMED&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
    &lt;span class="nb"&gt;exit &lt;/span&gt;2
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The complete configuration
&lt;/h3&gt;

&lt;p&gt;All three layers in one &lt;code&gt;.claude/settings.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PostToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Write|Edit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bash .claude/hooks/verify-syntax.sh"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Stop"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bash .claude/hooks/verify-regression.sh"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Review what was accomplished. Check if all requirements from the user's original request were addressed. If incomplete, respond with {&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;decision&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;block&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;reason&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;what remains&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}. If complete, respond with {&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;decision&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;allow&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}."&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Stop hooks run in definition order. Put the fast command hook (Layer 3) first. If tests fail, there's no point running the slower prompt hook (Layer 2).&lt;/p&gt;

&lt;p&gt;Boris Cherny, creator of Claude Code, reports that verification feedback loops improve quality significantly: "Give Claude a way to verify its work. If Claude has that feedback loop, it will 2-3x the quality of the final result" (&lt;a href="https://x.com/bcherny/status/2007179861115511237" rel="noopener noreferrer"&gt;X thread, 2026&lt;/a&gt;).&lt;/p&gt;




&lt;h2&gt;
  
  
  What's the cost of running verification hooks?
&lt;/h2&gt;

&lt;p&gt;Verification hooks add roughly 10-20% token overhead per session, primarily from the prompt/agent Stop hooks. Command hooks cost zero tokens and under 5 seconds of wall time. But skipping verification costs significantly more: teams lose an average of 7 hours per week per engineer to AI-related inefficiency, and AI code rework rates hit 20-30% when AI-generated code exceeds 40% of the codebase (&lt;a href="https://blog.exceeds.ai/industry-benchmarks-ai-code-productivity/" rel="noopener noreferrer"&gt;Exceeds AI, 2026&lt;/a&gt;).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Without Verification&lt;/th&gt;
&lt;th&gt;With Verification&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Token cost per session&lt;/td&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;td&gt;+10-20%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rework rate&lt;/td&gt;
&lt;td&gt;20-30%&lt;/td&gt;
&lt;td&gt;~5-10% (estimated)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time lost per week&lt;/td&gt;
&lt;td&gt;~7 hours&lt;/td&gt;
&lt;td&gt;~2-3 hours (estimated)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Done" means done&lt;/td&gt;
&lt;td&gt;Sometimes&lt;/td&gt;
&lt;td&gt;Almost always&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You don't need all three layers at once. Layer 3 alone (the test-runner Stop hook) is the highest-ROI single addition. It's about 20 lines of bash, costs zero tokens, and catches the most common failure: the agent says "done" while tests are broken.&lt;/p&gt;




&lt;h2&gt;
  
  
  When should you use each verification layer?
&lt;/h2&gt;

&lt;p&gt;Use Layer 1 (syntax) always. It's free, catches the obvious, and runs in under 2 seconds. Use Layer 3 (regression) when your project has a test suite. It's the highest-ROI single hook. Use Layer 2 (intent) for complex or multi-step tasks where the agent might solve the wrong problem entirely.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Layer 1 (Syntax)&lt;/th&gt;
&lt;th&gt;Layer 2 (Intent)&lt;/th&gt;
&lt;th&gt;Layer 3 (Regression)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prototyping&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Solo dev, daily work&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Team project&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (prompt)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production hotfix&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (agent)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;How to adopt gradually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Week 1&lt;/strong&gt;: Add the Layer 3 Stop hook (test runner). Copy the &lt;code&gt;verify-regression.sh&lt;/code&gt; script above. This single hook catches the most common failure mode.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Week 2&lt;/strong&gt;: Add the Layer 1 PostToolUse hook (syntax). Copy &lt;code&gt;verify-syntax.sh&lt;/code&gt;. Now lint errors get fixed automatically instead of piling up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When you hit an intent failure&lt;/strong&gt;: Add the Layer 2 prompt hook. You'll know you need it when Claude completes a task that passes all tests but doesn't match what you asked for.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This follows the &lt;a href="https://shipwithai.io/blog/claude-md-failure-log-pattern/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-claude-md-failure-log-pattern" rel="noopener noreferrer"&gt;failure-first method&lt;/a&gt;: add constraints after real failures, not before imagined ones.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is a self-verification loop in Claude Code?
&lt;/h3&gt;

&lt;p&gt;A self-verification loop is a system of hooks that automatically checks Claude Code's output at multiple levels (syntax, intent, regression) before allowing the agent to finish. It uses PostToolUse hooks for per-file checks and Stop hooks for task-completion verification. The agent receives feedback and self-corrects without manual review.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does verification slow down Claude Code?
&lt;/h3&gt;

&lt;p&gt;Command hooks add only their command's runtime: near-instant for a single-file lint check, as long as the suite takes for a full test run. Prompt hooks add 300-2000ms per Stop event. Agent hooks add 2-10 seconds. The Stop-event hooks fire once when Claude tries to stop, not on every tool call. The overhead is minimal compared to the 7 hours per week teams lose to AI-related rework.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the Stop hook in Claude Code?
&lt;/h3&gt;

&lt;p&gt;The Stop hook fires every time Claude finishes responding. Exit code 2 blocks Claude from stopping and forces it to continue with feedback from stderr. The &lt;code&gt;stop_hook_active&lt;/code&gt; field prevents infinite loops by signaling when Claude is already in a forced-continuation state.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I prevent infinite loops in verification hooks?
&lt;/h3&gt;

&lt;p&gt;Always check the &lt;code&gt;stop_hook_active&lt;/code&gt; field in your Stop hook. When the value is &lt;code&gt;true&lt;/code&gt;, Claude is already in a forced-continuation state from a previous block. Return exit 0 to let it stop. Without this check, the hook blocks every stop attempt indefinitely, creating an infinite loop that burns tokens until the session times out.&lt;/p&gt;
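&lt;p&gt;You can smoke-test the guard locally, without a live session, by piping in the JSON a Stop hook receives. The &lt;code&gt;guard&lt;/code&gt; function below inlines just the anti-loop check; in practice, pipe into your real hook script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/bin/bash
# Minimal stand-in for a Stop hook's anti-loop guard
guard() {
  local input
  input=$(cat)
  if [ "$(echo "$input" | jq -r '.stop_hook_active')" = "true" ]; then
    return 0   # forced-continuation state: let Claude stop
  fi
  return 2     # normal stop attempt: block (stand-in for the real checks)
}

RC=0; printf '{"stop_hook_active": true}' | guard || RC=$?
echo "forced continuation: exit $RC"    # exit 0: Claude may stop

RC=0; printf '{"stop_hook_active": false}' | guard || RC=$?
echo "normal stop attempt: exit $RC"    # exit 2: stop is blocked
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;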

&lt;h3&gt;
  
  
  What is harness engineering?
&lt;/h3&gt;

&lt;p&gt;Harness engineering is the discipline of building constraints, tools, feedback loops, and observability around an AI agent to make it reliable in production. The formula: Agent = Model + Harness. Self-verification loops are one harness engineering example. For the full framework, see &lt;a href="https://shipwithai.io/blog/harness-engineering-claude-code/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-harness-engineering-claude-code" rel="noopener noreferrer"&gt;Harness Engineering: The System Around AI Matters More Than AI&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Try it now:&lt;/strong&gt; Copy &lt;code&gt;verify-regression.sh&lt;/code&gt; into &lt;code&gt;.claude/hooks/&lt;/code&gt;, add the Stop hook config to &lt;code&gt;.claude/settings.json&lt;/code&gt;, make it executable with &lt;code&gt;chmod +x&lt;/code&gt;, and ask Claude to make a code change. Watch the Stop hook fire when tests fail. Confirm the agent fixes the issue before completing.&lt;/p&gt;

&lt;p&gt;What layer would you add first — syntax, intent, or regression? Drop it in the comments.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shipwithai.io/blog/claude-code-self-verification-loop/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-claude-code-self-verification-loop" rel="noopener noreferrer"&gt;ShipWithAI&lt;/a&gt;. I write about Claude Code workflows, AI-assisted development, and shipping software faster with structured AI.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>claude</category>
    </item>
    <item>
      <title>The Constraint Paradox: Why Less AI Freedom Produces Better Code</title>
      <dc:creator>ShipWithAI</dc:creator>
      <pubDate>Sun, 26 Apr 2026 01:00:00 +0000</pubDate>
      <link>https://dev.to/shipwithaiio/the-constraint-paradox-why-less-ai-freedom-produces-better-code-7c1</link>
      <guid>https://dev.to/shipwithaiio/the-constraint-paradox-why-less-ai-freedom-produces-better-code-7c1</guid>
      <description>&lt;h2&gt;
  
  
  LangChain jumped from 52.8% to 66.5% on Terminal Bench 2.0 by constraining their agent, not upgrading the model. Running at maximum reasoning budget actually scored &lt;em&gt;worse&lt;/em&gt; than a capped one. Three data points prove it: freedom is the enemy of AI agent reliability.
&lt;/h2&gt;

&lt;p&gt;Two approaches. Same model. Different results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Approach A: Give the agent more freedom&lt;/span&gt;
→ Upgrade model, add more tools, increase context window
→ Remove guardrails so it &lt;span class="s2"&gt;"moves faster"&lt;/span&gt;
→ Result: unpredictable, rolls back 3x per session

&lt;span class="c"&gt;# Approach B: Give the agent more constraints&lt;/span&gt;
→ Same model, same tools, same context
→ Add: verification loop, compute budget, context injection
→ Result: 52.8% → 66.5% on Terminal Bench 2.0 &lt;span class="o"&gt;(&lt;/span&gt;LangChain, 2026&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every time a team complains about Claude Code "doing the wrong thing," I ask the same question: what stopped it from doing that? The answer is always &lt;em&gt;nothing&lt;/em&gt;. The agent had the capability. Nothing prevented the action.&lt;/p&gt;

&lt;p&gt;The instinct is to want a smarter model. The fix is a tighter &lt;a href="https://shipwithai.io/blog/harness-engineering-constraint-paradox/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-harness-engineering-constraint-paradox" rel="noopener noreferrer"&gt;harness&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  This is the Constraint Paradox: &lt;strong&gt;the more you restrict what your AI agent can do, the better it performs at what it should do.&lt;/strong&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Why does everyone assume "smarter model" is the answer?
&lt;/h2&gt;

&lt;p&gt;Developers instinctively optimize for agent capability. Smarter model + more tools + fewer restrictions = better output. But this assumption conflates capability with reliability, and they're fundamentally not the same thing.&lt;/p&gt;

&lt;p&gt;A senior developer with no code review, no CI/CD, no linting, and full production access will ship worse code than a junior developer working inside a strict pipeline. Not because the senior is less capable. Because unrestricted capability doesn't self-organize toward correct behavior. It just has a larger surface area for mistakes.&lt;/p&gt;

&lt;p&gt;AI agents have the same problem, magnified. An LLM doesn't have intuition for "this feels wrong." It doesn't pause before a destructive command and think "wait, should I really do this?" Constraints provide that intuition externally.&lt;/p&gt;

&lt;p&gt;OpenAI demonstrated this at scale. Their Codex team shipped roughly one million lines of production code with zero human-written lines over five months. Codex didn't succeed because it used a smarter model. It succeeded because it ran inside one of the most constrained environments in the industry: AGENTS.md files, reproducible dev environments, CI invariants, and mechanical verification.&lt;/p&gt;

&lt;p&gt;The question isn't "how smart is your model?" The question is "how tight is your harness?"&lt;/p&gt;




&lt;h2&gt;
  
  
  Three data points that prove constraints beat capability
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Evidence 1: LangChain Terminal Bench 2.0
&lt;/h3&gt;

&lt;p&gt;LangChain improved their coding agent from 52.8% to 66.5% on &lt;a href="https://www.vals.ai/benchmarks/terminal-bench-2" rel="noopener noreferrer"&gt;Terminal Bench 2.0&lt;/a&gt; by changing only the harness. Same model (gpt-5.2-codex). No fine-tuning. No model swap. Three harness changes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context injection&lt;/strong&gt; via &lt;code&gt;LocalContextMiddleware&lt;/code&gt; — map the environment upfront&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-verification loop&lt;/strong&gt; via &lt;code&gt;PreCompletionChecklistMiddleware&lt;/code&gt; — verify before marking complete&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compute budget management&lt;/strong&gt; — cap reasoning to prevent timeouts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The counterintuitive part: running at maximum reasoning budget (xhigh) scored 53.9%, &lt;em&gt;barely above the original baseline&lt;/em&gt; and roughly ten points below the high setting's 63.6%.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Baseline (no harness changes)&lt;/td&gt;
&lt;td&gt;52.8%&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Harness changes + high reasoning&lt;/td&gt;
&lt;td&gt;66.5%&lt;/td&gt;
&lt;td&gt;+13.7pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Harness changes + xhigh reasoning&lt;/td&gt;
&lt;td&gt;53.9%&lt;/td&gt;
&lt;td&gt;+1.1pp (timeouts)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;More thinking didn't help. Better constraints did.&lt;/p&gt;

&lt;h3&gt;
  
  
  Evidence 2: Mitchell Hashimoto's AGENTS.md
&lt;/h3&gt;

&lt;p&gt;Mitchell Hashimoto (creator of Terraform, Vagrant, Ghostty) treats his AGENTS.md as a failure log. Every single line exists because the agent made that specific mistake at least once:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Each line in that file is based on a bad agent behavior, and it almost completely resolved them all" — &lt;a href="https://mitchellh.com/writing/my-ai-adoption-journey" rel="noopener noreferrer"&gt;mitchellh.com, 2026&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ghostty is one of the most productive AI-assisted codebases in the open source world. Hashimoto estimates agents run in the background for 10-20% of his working day. And the whole operation runs on one of the most constrained harnesses around. Not despite the constraints. Because of them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Evidence 3: Claude Code's permission model
&lt;/h3&gt;

&lt;p&gt;Claude Code defaults to read-only. You must explicitly allow write access, file creation, and command execution. This isn't a limitation. It's a design decision.&lt;/p&gt;

&lt;p&gt;Instead of evaluating every possible action (including destructive ones), the agent operates within a bounded set of safe actions. When it needs to do something outside that set, it asks. That asking catches mistakes before they happen.&lt;/p&gt;

&lt;p&gt;Compare this to an agent with full file system access from the start. It never pauses. It never asks. It just does — including &lt;code&gt;rm -rf&lt;/code&gt; when it thinks cleanup is needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why do constraints actually improve AI agent output?
&lt;/h2&gt;

&lt;p&gt;Three mechanisms:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mechanism 1: Constraints reduce the search space.&lt;/strong&gt; An unconstrained agent evaluates every possible action, including destructive ones. A constrained agent only evaluates valid actions. Same reason chess engines play better with opening books: eliminating bad moves early means more compute spent on good ones.&lt;/p&gt;

&lt;p&gt;LangChain's &lt;code&gt;LocalContextMiddleware&lt;/code&gt; is search space reduction in practice. Instead of the agent spending steps figuring out its environment, the middleware injects that context upfront.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mechanism 2: Constraints clarify intent.&lt;/strong&gt; When you tell an agent "don't modify files in /config," you're not just preventing a bad action. You're giving the agent information about what matters. Constraints are communication that's harder to misinterpret than instructions.&lt;/p&gt;

&lt;p&gt;An instruction says: "Be careful with config files." That's ambiguous. A constraint says: a Hook blocks all writes to &lt;code&gt;/config/**&lt;/code&gt;. No ambiguity. No interpretation required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mechanism 3: Hard stops beat soft warnings.&lt;/strong&gt; A &lt;a href="https://shipwithai.io/blog/claude-code-hooks-guide/" rel="noopener noreferrer"&gt;Hook&lt;/a&gt; that blocks &lt;code&gt;git push --force&lt;/code&gt; doesn't require the agent to "decide" whether to follow the rule. The rule is enforced. The agent doesn't waste tokens weighing the instruction against other context.&lt;/p&gt;
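
&lt;p&gt;As a sketch, that hard stop is a few lines of shell in a PreToolUse hook (the &lt;code&gt;tool_input.command&lt;/code&gt; field is what Claude Code passes for Bash tool calls; tune the pattern to your own workflow):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# PreToolUse guard: deterministically reject force pushes.
cmd=$(jq -r '.tool_input.command // ""')
if printf '%s' "$cmd" | grep -qE 'git push.*--force([[:space:]]|$)'; then
  echo "Blocked: use --force-with-lease instead of --force" &gt;&amp;2
  exit 2  # exit 2 blocks the tool call and shows the message to Claude
fi
exit 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;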

&lt;p&gt;LangChain's &lt;code&gt;PreCompletionChecklistMiddleware&lt;/code&gt; is a hard stop. The agent &lt;em&gt;cannot&lt;/em&gt; mark a task complete without running verification. It doesn't "decide" whether to verify. Verification is mandatory.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Enforcement&lt;/th&gt;
&lt;th&gt;Mechanism&lt;/th&gt;
&lt;th&gt;Compliance&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Instruction&lt;/td&gt;
&lt;td&gt;Soft context, weighted by LLM&lt;/td&gt;
&lt;td&gt;60-70%&lt;/td&gt;
&lt;td&gt;"Don't force push"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hook&lt;/td&gt;
&lt;td&gt;Shell script, pre-action&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;Block force push&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Middleware&lt;/td&gt;
&lt;td&gt;Code in agent pipeline&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;Forced verification&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Won't constraints slow down development?
&lt;/h2&gt;

&lt;p&gt;No. Unconstrained agents waste more time recovering from mistakes than constrained agents spend on guardrail checks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Unconstrained session&lt;/span&gt;
Agent runs → mistake at min 15 → rollback → retry → 50 min total
Useful work: 15 min &lt;span class="o"&gt;(&lt;/span&gt;30% efficiency&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Constrained session&lt;/span&gt;
Agent runs → blocked at min 15 → redirects → completes → 25 min total
Useful work: 25 min &lt;span class="o"&gt;(&lt;/span&gt;100% efficiency&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A single bad agent decision (deleted file, force push, broken migration) costs 30 minutes of recovery. A Hook check takes 5 milliseconds.&lt;/p&gt;

&lt;p&gt;LangChain's &lt;code&gt;LoopDetectionMiddleware&lt;/code&gt; makes this concrete. It detects when the agent is stuck in repetitive edits and forces it to reconsider its approach. Without this constraint, the agent burns through tokens re-editing the same file. With it, the agent backs up and tries a different strategy.&lt;/p&gt;

&lt;p&gt;The real cost isn't the constraint. The real cost is the recovery from what would have happened without it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where should you constrain (and where not)?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Constrain (high cost to undo)&lt;/th&gt;
&lt;th&gt;Don't constrain (low cost)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;File deletion, &lt;code&gt;rm -rf&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Variable naming choices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git push --force&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Algorithm selection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production database writes&lt;/td&gt;
&lt;td&gt;Refactoring approach&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;.env&lt;/code&gt; and secrets edits&lt;/td&gt;
&lt;td&gt;Comment style&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CI/CD pipeline changes&lt;/td&gt;
&lt;td&gt;Test structure&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Over-constraining is a real risk. If every file is protected, every command requires approval, and every edit needs pre-authorization, you've built a system that accomplishes nothing. The goal isn't zero risk. The goal is zero &lt;em&gt;unrecoverable&lt;/em&gt; risk.&lt;/p&gt;

&lt;p&gt;Claude Code's permission model gets this balance right. Read is unrestricted. Write requires approval. Destructive commands require explicit allowlisting. The agent explores freely but can't break things without your sign-off.&lt;/p&gt;
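
&lt;p&gt;In &lt;code&gt;.claude/settings.json&lt;/code&gt;, that balance reads something like this (rule syntax per the Claude Code permissions docs; the specific entries are illustrative, not a recommended set):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "permissions": {
    "allow": ["Bash(npm test)", "Bash(npm run build)"],
    "deny": ["Bash(git push --force*)", "Read(./.env)", "Edit(./.env)"]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;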




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the constraint paradox in AI agents?
&lt;/h3&gt;

&lt;p&gt;The constraint paradox is the counterintuitive finding that restricting an AI agent's capabilities produces better output than giving it more freedom. LangChain demonstrated this by gaining 13.7 benchmark points through harness constraints alone. The mechanism: constraints reduce the agent's search space, clarify intent, and enforce rules deterministically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does more compute always improve AI agent performance?
&lt;/h3&gt;

&lt;p&gt;No. LangChain's benchmark data shows running at maximum reasoning budget (xhigh) scored 53.9%, worse than the high setting at 63.6%. More compute caused timeouts that hurt overall performance. The optimal approach is budgeted compute with hard verification stops, not unlimited reasoning.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the difference between constraining and limiting an AI agent?
&lt;/h3&gt;

&lt;p&gt;Constraining means removing dangerous or wasteful actions while preserving the ability to solve the problem. Limiting means removing a capability entirely. A Hook that blocks &lt;code&gt;rm -rf&lt;/code&gt; is a constraint. Removing file system access altogether is a limitation. Constraints improve reliability. Limitations reduce usefulness.&lt;/p&gt;

&lt;h3&gt;
  
  
  How many constraints should an AI agent harness have?
&lt;/h3&gt;

&lt;p&gt;Enough to prevent unrecoverable mistakes, not so many the agent can't work. The rule of thumb: constrain any action that would take more than 5 minutes to undo. Leave everything else to the agent's judgment. Start with 3-5 constraints and add only after real failures, following the &lt;a href="https://shipwithai.io/blog/claude-md-failure-log-pattern/" rel="noopener noreferrer"&gt;failure log pattern&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Try it now:&lt;/strong&gt; Open &lt;code&gt;.claude/settings.json&lt;/code&gt; and check your current permission config. If the agent has unrestricted write access, add one PreToolUse Hook that blocks edits to &lt;code&gt;.env&lt;/code&gt; and &lt;code&gt;credentials&lt;/code&gt;. Test it: ask Claude Code to edit your &lt;code&gt;.env&lt;/code&gt; file and confirm the hook blocks it.&lt;/p&gt;
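
&lt;p&gt;The wiring for that Hook is a short &lt;code&gt;hooks&lt;/code&gt; entry (schema per the Claude Code docs; &lt;code&gt;protect-secrets.sh&lt;/code&gt; is a hypothetical script name — it should read &lt;code&gt;tool_input.file_path&lt;/code&gt; from stdin and exit 2 when the path matches &lt;code&gt;.env&lt;/code&gt; or &lt;code&gt;credentials&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/protect-secrets.sh"
          }
        ]
      }
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;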

&lt;p&gt;What's your take — have you seen constraints improve your agent's output? Drop it in the comments.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shipwithai.io/blog/harness-engineering-constraint-paradox/?utm_source=copy&amp;amp;utm_medium=devto&amp;amp;utm_campaign=blog-harness-engineering-constraint-paradox" rel="noopener noreferrer"&gt;ShipWithAI&lt;/a&gt;. I write about Claude Code workflows, AI-assisted development, and shipping software faster with structured AI.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Your CLAUDE.md Is an Instruction File. It Should Be a Failure Log.</title>
      <dc:creator>ShipWithAI</dc:creator>
      <pubDate>Fri, 24 Apr 2026 01:00:00 +0000</pubDate>
      <link>https://dev.to/shipwithaiio/your-claudemd-is-an-instruction-file-it-should-be-a-failure-log-2i5c</link>
      <guid>https://dev.to/shipwithaiio/your-claudemd-is-an-instruction-file-it-should-be-a-failure-log-2i5c</guid>
      <description>&lt;h2&gt;
  
  
  CLAUDE.md instructions get followed ~60-70% of the time. Mitchell Hashimoto's AGENTS.md in Ghostty has zero aspirational lines — every entry traces to a real agent mistake. Use the Failure-to-Constraint Decision Tree: dangerous actions go to Hooks, repeatable workflows go to Commands, style/convention goes to CLAUDE.md.
&lt;/h2&gt;

&lt;p&gt;Two CLAUDE.md files. Same project. Different philosophies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# ❌ Before: instruction-first CLAUDE.md (typical)&lt;/span&gt;
&lt;span class="c"&gt;# 47 lines of well-meaning rules&lt;/span&gt;
- &lt;span class="s2"&gt;"Be careful with production database."&lt;/span&gt;
- &lt;span class="s2"&gt;"Always write tests."&lt;/span&gt;
- &lt;span class="s2"&gt;"Use TypeScript strict mode."&lt;/span&gt;
- &lt;span class="s2"&gt;"Follow our naming conventions."&lt;/span&gt;
&lt;span class="c"&gt;# Claude reads these, weighs them against 200K tokens... follows ~65%.&lt;/span&gt;

&lt;span class="c"&gt;# ✅ After: failure-first CLAUDE.md (Hashimoto method)&lt;/span&gt;
&lt;span class="c"&gt;# 12 lines, each traced to a specific incident&lt;/span&gt;
- &lt;span class="s2"&gt;"NEVER use git push --force. Use --force-with-lease."&lt;/span&gt;
  &lt;span class="c"&gt;# Failure: 2026-03-12, force push overwrote teammate's commits on feature/auth&lt;/span&gt;
- &lt;span class="s2"&gt;"Run npm test before ANY git commit. No exceptions."&lt;/span&gt;
  &lt;span class="c"&gt;# Failure: 2026-02-28, broken import pushed to main, CI caught 20min later&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One file has 47 lines of advice. The other has 12 lines of scars. Which one does the agent actually follow?&lt;/p&gt;

&lt;p&gt;The answer isn't close. The 12-line file wins every time, because every line carries weight. Every line exists for a reason the model can evaluate. The 47-line file is a wishlist. The 12-line file is a &lt;a href="https://shipwithai.io/blog/harness-engineering-claude-code/" rel="noopener noreferrer"&gt;harness&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why do most CLAUDE.md files fail?
&lt;/h2&gt;

&lt;p&gt;Most CLAUDE.md files fail because developers write them like job descriptions: aspirational, comprehensive, bloated. LLMs don't execute instructions like code executes functions. They &lt;em&gt;weigh&lt;/em&gt; each instruction against the full context window. More lines means more dilution, which means lower compliance per line.&lt;/p&gt;

&lt;p&gt;The data backs this up. An &lt;a href="https://arxiv.org/html/2602.11988v1" rel="noopener noreferrer"&gt;ETH Zurich study&lt;/a&gt; (Gloaguen et al., 2026) tested context files across 138 real GitHub issues and found that LLM-generated agentfiles actually &lt;em&gt;reduced&lt;/em&gt; success rates by 0.5-2% while increasing inference costs by 20-23%. Even developer-provided files only improved performance by ~4% on average. The typical developer-written file averaged 641 words across 9.7 sections.&lt;/p&gt;

&lt;p&gt;That's a lot of instructions for a 4% gain.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;200-line CLAUDE.md&lt;/th&gt;
&lt;th&gt;40-line CLAUDE.md&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Instructions&lt;/td&gt;
&lt;td&gt;~200&lt;/td&gt;
&lt;td&gt;~40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compliance&lt;/td&gt;
&lt;td&gt;~60-70%&lt;/td&gt;
&lt;td&gt;~85-90%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maintenance&lt;/td&gt;
&lt;td&gt;Monthly pruning needed&lt;/td&gt;
&lt;td&gt;Self-maintaining&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Frontier LLMs can follow approximately 150-200 instructions with reasonable consistency. Your 200-line CLAUDE.md already exceeds that budget &lt;em&gt;before&lt;/em&gt; counting the system prompt (another ~50 instructions). Community benchmarks put compliance at 60-70% for files over 200 lines. That's a coin flip for your most important rules.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is the Mitchell Hashimoto method for AGENTS.md?
&lt;/h2&gt;

&lt;p&gt;Mitchell Hashimoto (creator of Terraform, Vagrant, and now Ghostty) treats AGENTS.md as a failure log, not an instruction file. Every single line in Ghostty's AGENTS.md exists because the agent made that specific mistake at least once. No line is aspirational. Every line is a scar from a real incident.&lt;/p&gt;

&lt;p&gt;In his own words:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Each line in that file is based on a bad agent behavior, and it almost completely resolved them all" — &lt;a href="https://mitchellh.com/writing/my-ai-adoption-journey" rel="noopener noreferrer"&gt;mitchellh.com, 2026&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The mental model shift matters:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instruction-first&lt;/th&gt;
&lt;th&gt;Failure-first&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"What should the agent do?"&lt;/td&gt;
&lt;td&gt;"What has the agent broken?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Proactive, aspirational&lt;/td&gt;
&lt;td&gt;Reactive, evidence-based&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High volume, low signal&lt;/td&gt;
&lt;td&gt;Low volume, high signal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Added before problems occur&lt;/td&gt;
&lt;td&gt;Added after problems occur&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dilutes over time&lt;/td&gt;
&lt;td&gt;Strengthens over time&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Instructions are wishes. Constraints are lessons. LLMs don't need more wishes. They need fewer, sharper constraints with concrete context about &lt;em&gt;why&lt;/em&gt; each one exists.&lt;/p&gt;




&lt;h2&gt;
  
  
  How do you build CLAUDE.md from failures instead of imagination?
&lt;/h2&gt;

&lt;p&gt;Start with a minimal CLAUDE.md containing only your project overview and tech stack. Run the agent on real tasks. When it breaks something, convert that failure into a constraint. Then route the constraint to the right layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Start minimal
&lt;/h3&gt;

&lt;p&gt;Your initial CLAUDE.md should be 5-10 lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Project: Acme SaaS&lt;/span&gt;
TypeScript, Next.js 15, Drizzle ORM, deployed on Vercel.

&lt;span class="gu"&gt;## Build&lt;/span&gt;
npm run build &amp;amp;&amp;amp; npm test
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No rules. No conventions. No aspirational guidelines. Just enough context for the agent to understand what it's working on.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Run the agent, observe failures
&lt;/h3&gt;

&lt;p&gt;Use the agent for real work. Don't preemptively add rules. When the agent makes a mistake, write down exactly what happened:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What&lt;/strong&gt;: force-pushed to main&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When&lt;/strong&gt;: 2026-03-12&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Impact&lt;/strong&gt;: overwrote teammate's commits on feature/auth&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 3: Convert the failure into a constraint
&lt;/h3&gt;

&lt;p&gt;Turn the incident into a specific, testable rule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;NEVER use &lt;span class="sb"&gt;`git push --force`&lt;/span&gt;. Use &lt;span class="sb"&gt;`--force-with-lease`&lt;/span&gt;.
&lt;span class="gh"&gt;# 2026-03-12: force push overwrote teammate's commits on feature/auth&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pattern is always the same: &lt;strong&gt;CONSTRAINT + REASON + FAILURE DATE&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Route it with the decision tree
&lt;/h3&gt;

&lt;p&gt;Not every constraint belongs in CLAUDE.md. This decision tree is the most important takeaway from this post:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent made a mistake
    │
    ├── Is the action irreversible or dangerous?
    │   YES → Hook (PreToolUse block)
    │   Examples: delete production files, force push, edit .env
    │
    ├── Is it a repeatable workflow the agent should automate?
    │   YES → Command or Skill (.claude/commands/)
    │   Examples: run tests after refactor, update changelog
    │
    └── Is it a style, convention, or context issue?
        YES → CLAUDE.md constraint
        Examples: naming conventions, test patterns, commit format
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you take one thing from this post, take the decision tree. It replaces the instinct of "something went wrong, let me add a line to CLAUDE.md" with a structured routing decision.&lt;/p&gt;




&lt;h2&gt;
  
  
  What does a CLAUDE.md look like before and after?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Before: instruction-first (47 lines)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Project: Acme SaaS&lt;/span&gt;

&lt;span class="gu"&gt;## Rules&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Be careful with production database.
&lt;span class="p"&gt;-&lt;/span&gt; Always write tests.
&lt;span class="p"&gt;-&lt;/span&gt; Use TypeScript strict mode.
&lt;span class="p"&gt;-&lt;/span&gt; Follow naming conventions.
&lt;span class="p"&gt;-&lt;/span&gt; Don't use deprecated APIs.
&lt;span class="p"&gt;-&lt;/span&gt; Keep functions under 50 lines.
&lt;span class="p"&gt;-&lt;/span&gt; Use ESLint and Prettier.
&lt;span class="p"&gt;-&lt;/span&gt; Comment complex logic.
&lt;span class="p"&gt;-&lt;/span&gt; Don't hardcode environment variables.
&lt;span class="p"&gt;-&lt;/span&gt; Use meaningful variable names.
&lt;span class="gh"&gt;# ... 37 more aspirational rules like these&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every line is reasonable. None is specific. The agent reads all 47, retains maybe 30, and consistently follows maybe 25.&lt;/p&gt;

&lt;h3&gt;
  
  
  After: failure-first (18 lines)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Project: Acme SaaS&lt;/span&gt;
TypeScript, Next.js 15, Drizzle ORM, Vercel.

&lt;span class="gu"&gt;## Build&lt;/span&gt;
npm run build &amp;amp;&amp;amp; npm test

&lt;span class="gu"&gt;## Constraints (each from a real failure)&lt;/span&gt;

NEVER use &lt;span class="sb"&gt;`git push --force`&lt;/span&gt;. Use &lt;span class="sb"&gt;`--force-with-lease`&lt;/span&gt;.
&lt;span class="gh"&gt;# 2026-03-12: force push overwrote teammate's commits on feature/auth&lt;/span&gt;

Run &lt;span class="sb"&gt;`npm test`&lt;/span&gt; before ANY git commit.
&lt;span class="gh"&gt;# 2026-02-28: broken import shipped to main, CI caught 20min later&lt;/span&gt;

Schema migrations: always generate with &lt;span class="sb"&gt;`drizzle-kit generate`&lt;/span&gt;.
&lt;span class="gh"&gt;# 2026-03-05: hand-written migration missed NOT NULL, broke staging&lt;/span&gt;

API routes: validate input with zod schemas, never trust req.body.
&lt;span class="gh"&gt;# 2026-03-18: unvalidated input caused 500 errors for 2 hours&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;18 lines. 4 constraints. Each one backed by a real incident with a date. The agent knows not just &lt;em&gt;what&lt;/em&gt; to avoid but &lt;em&gt;why&lt;/em&gt;, which makes the constraint stickier in context.&lt;/p&gt;




&lt;h2&gt;
  
  
  How do you categorize failures into the right layer?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Enforcement&lt;/th&gt;
&lt;th&gt;Compliance&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hook&lt;/td&gt;
&lt;td&gt;Deterministic (shell script)&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;Block &lt;code&gt;git push --force&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Command&lt;/td&gt;
&lt;td&gt;Deterministic (executed)&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;Run tests after refactor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLAUDE.md&lt;/td&gt;
&lt;td&gt;Probabilistic (LLM context)&lt;/td&gt;
&lt;td&gt;60-90%&lt;/td&gt;
&lt;td&gt;Use camelCase naming&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Category A: Structural failures → Hook.&lt;/strong&gt; File deletion, sensitive config edits, force pushes. For irreversible actions, you need 100% enforcement, not 60-70%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Category B: Style and convention failures → CLAUDE.md.&lt;/strong&gt; Variable naming, comment style, test patterns, commit format. Low-stakes if violated occasionally.&lt;/p&gt;

&lt;p&gt;Write them as failure-derived constraints:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="p"&gt;-&lt;/span&gt; Use camelCase for variables, PascalCase for components.
  # 2026-03-20: agent used snake_case in 3 React components, broke style consistency
&lt;span class="p"&gt;-&lt;/span&gt; Test files go in __tests__/ next to the source file, not in a top-level test/ dir.
  # 2026-02-15: agent created test/api/users.test.ts, missed by our jest config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Category C: Workflow failures → Commands/Skills.&lt;/strong&gt; "Always run tests after refactor." "Always update the changelog after API changes." These are repeatable processes. Don't remind the agent. Automate it.&lt;/p&gt;




&lt;h2&gt;
  
  
  How do you keep CLAUDE.md lean over time?
&lt;/h2&gt;

&lt;p&gt;Prune monthly. &lt;a href="https://www.humanlayer.dev/blog/writing-a-good-claude-md" rel="noopener noreferrer"&gt;HumanLayer's production CLAUDE.md is under 60 lines&lt;/a&gt;. Bloat is the number one killer of CLAUDE.md effectiveness.&lt;/p&gt;

&lt;p&gt;Monthly pruning checklist:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;For each constraint in CLAUDE.md, ask:

1. Has the agent triggered this constraint in the past 3 months?
   NO → candidate for removal

2. Has this constraint graduated to a Hook?
   YES → remove from CLAUDE.md (now enforced, not suggested)

3. Is this a workflow that could be a Command instead?
   YES → move to .claude/commands/, remove from CLAUDE.md

4. Can I name the specific failure behind this line?
   NO → delete it (it's aspirational, not evidence-based)

5. Does the agent already do this correctly without the instruction?
   YES → delete it (you're wasting instruction budget)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I did this exercise on a 90-line CLAUDE.md last month. It dropped to 23 lines. The agent's compliance on the remaining rules went up noticeably within the first session. Fewer rules, better followed.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the difference between CLAUDE.md and AGENTS.md?
&lt;/h3&gt;

&lt;p&gt;CLAUDE.md is Claude Code's project-level instruction file, loaded automatically at session start. AGENTS.md is an &lt;a href="https://agents.md/" rel="noopener noreferrer"&gt;emerging open standard&lt;/a&gt; backed by OpenAI Codex, Amp, Google Jules, and Cursor that serves the same purpose but is agent-agnostic. Both are repository-level context files. If you use Claude Code, write CLAUDE.md. If you want cross-agent compatibility, also add an AGENTS.md. The failure-first methodology applies to both.&lt;/p&gt;
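
&lt;p&gt;A common low-effort convention (not an official requirement of either format): keep CLAUDE.md as the source of truth and symlink it so both readers see the same file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# One file, two names: Claude Code reads CLAUDE.md,
# AGENTS.md-aware tools read the symlink
ln -s CLAUDE.md AGENTS.md
git add AGENTS.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;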

&lt;h3&gt;
  
  
  Should I start CLAUDE.md from scratch or use a template?
&lt;/h3&gt;

&lt;p&gt;Start from scratch with only three things: project name, tech stack, build commands. Then build it through the failure-first workflow: run the agent, observe mistakes, add constraints one at a time. Templates encourage instruction-first thinking, which is the exact problem this post addresses.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can the agent override or ignore CLAUDE.md constraints?
&lt;/h3&gt;

&lt;p&gt;Yes. CLAUDE.md is "soft" context. The LLM weighs it against other context but can ignore it. Compliance runs 60-70% with large files, higher with lean files. For constraints that must be followed 100% of the time, use &lt;a href="https://shipwithai.io/blog/claude-code-hook-decision-guide/" rel="noopener noreferrer"&gt;Hooks&lt;/a&gt; instead. Hooks run as shell scripts and physically block the action. The model cannot bypass them.&lt;/p&gt;

&lt;h3&gt;
  
  
  How many lines should CLAUDE.md have?
&lt;/h3&gt;

&lt;p&gt;As few as possible. Research suggests LLMs follow ~150-200 instructions consistently, but that budget is shared with the system prompt (~50 instructions). Aim for 30-60 lines of failure-derived constraints plus a minimal project overview. If your file exceeds 100 lines, audit it with the failure-first test: can you name the specific incident behind each line?&lt;/p&gt;
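&lt;p&gt;The line-count audit is easy to automate as a pre-commit or CI guard (the 100-line threshold is this article's heuristic, not a hard rule):&lt;/p&gt;

```shell
# Pre-commit / CI guard: fail when CLAUDE.md drifts past the audit
# threshold. 100 lines is a heuristic, not a hard rule.
lines=$(wc -l < CLAUDE.md 2>/dev/null || echo 0)
if [ "$lines" -gt 100 ]; then
  echo "CLAUDE.md is $lines lines: run the failure-first audit" >&2
  exit 1
fi
echo "CLAUDE.md: $lines lines, within budget"
```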




&lt;p&gt;&lt;strong&gt;Try it now:&lt;/strong&gt; Open your CLAUDE.md right now. For each line, write the specific failure that caused you to add it. If you can't name the incident, delete the line.&lt;/p&gt;

&lt;p&gt;How many lines survived? Drop your before/after count in the comments.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shipwithai.io/blog/claude-md-failure-log-pattern/" rel="noopener noreferrer"&gt;ShipWithAI&lt;/a&gt;. I write about Claude Code workflows, AI-assisted development, and shipping software faster with structured AI.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Beyond CLAUDE.md: 5 Layers Your AI Agent Harness Is Missing</title>
      <dc:creator>ShipWithAI</dc:creator>
      <pubDate>Wed, 22 Apr 2026 01:00:00 +0000</pubDate>
      <link>https://dev.to/shipwithaiio/beyond-claudemd-5-layers-your-ai-agent-harness-is-missing-475h</link>
      <guid>https://dev.to/shipwithaiio/beyond-claudemd-5-layers-your-ai-agent-harness-is-missing-475h</guid>
      <description>&lt;p&gt;Most developers stop at CLAUDE.md. That's layer 1. A production Claude Code harness needs 5 layers: memory, tools, permissions, hooks, and observability. Here's the full setup guide.&lt;/p&gt;

&lt;p&gt;Claude Code harness has 5 layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt; — CLAUDE.md, MEMORY.md, .claude/commands/&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tools&lt;/strong&gt; — MCP servers (sweet spot: 2–3)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Permissions&lt;/strong&gt; — settings.json allow/deny lists&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hooks&lt;/strong&gt; — PreToolUse/PostToolUse verification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt; — Decision logging, cost tracking, anomaly detection&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most developers only have layer 1. &lt;strong&gt;Setup order: 1→4→2→3→5&lt;/strong&gt; (guardrails before capabilities).&lt;/p&gt;

&lt;p&gt;Why? Because LangChain gained +13.7 benchmark points from harness changes alone — jumping from 52.8% to 66.5% on the same model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 1: Memory (The Foundation)
&lt;/h2&gt;

&lt;p&gt;Your CLAUDE.md is the project rules file. Claude loads it automatically at the start of every session, but treats it as soft context: it follows it most of the time, not all of the time. That gap is why the other four layers exist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What goes in memory:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CLAUDE.md&lt;/strong&gt; — 40–60 lines max. Project context, conventions, constraints.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MEMORY.md&lt;/strong&gt; — Long-term learning. "We discovered X fails without Y."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;.claude/commands/&lt;/strong&gt; — Reusable prompt templates as commands.&lt;/li&gt;
&lt;/ul&gt;
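&lt;p&gt;A command is just a markdown file whose body becomes the prompt: the filename becomes the slash command, and &lt;code&gt;$ARGUMENTS&lt;/code&gt; expands to whatever you type after it. A sketch (&lt;code&gt;changelog.md&lt;/code&gt; is an illustrative name, not a built-in):&lt;/p&gt;

```shell
# Create a reusable /changelog command. The filename becomes the
# slash command name; $ARGUMENTS expands to the text typed after it.
mkdir -p .claude/commands
cat > .claude/commands/changelog.md <<'EOF'
Update CHANGELOG.md for the change described in: $ARGUMENTS
Follow the existing entry format. Do not rewrite past entries.
EOF
```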

&lt;p&gt;&lt;strong&gt;The ETH Zurich finding:&lt;/strong&gt; CLAUDE.md alone caps improvement at ~4%. It's necessary but not sufficient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The HumanLayer benchmark:&lt;/strong&gt; Teams keeping CLAUDE.md under 60 lines saw better compliance than those writing 200-line manifestos. Shorter = clearer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Example CLAUDE.md structure&lt;/span&gt;

&lt;span class="gu"&gt;## Project Identity&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Framework: Next.js 15 + TypeScript
&lt;span class="p"&gt;-&lt;/span&gt; Package manager: pnpm
&lt;span class="p"&gt;-&lt;/span&gt; Architecture: API routes + React components

&lt;span class="gu"&gt;## You Are&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; A full-stack developer shipping features
&lt;span class="p"&gt;-&lt;/span&gt; Opinionated about patterns: prefer hooks &amp;gt; HOCs
&lt;span class="p"&gt;-&lt;/span&gt; Balancing speed with maintainability

&lt;span class="gu"&gt;## Rules&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Always include tests when modifying /lib
&lt;span class="p"&gt;2.&lt;/span&gt; Use conventional commits for all commits
&lt;span class="p"&gt;3.&lt;/span&gt; If suggesting breaking changes, warn first
&lt;span class="p"&gt;4.&lt;/span&gt; Database migrations need rollback logic

&lt;span class="gu"&gt;## Code Conventions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Folder structure: /pages, /components, /lib, /styles
&lt;span class="p"&gt;-&lt;/span&gt; Component naming: PascalCase for React files
&lt;span class="p"&gt;-&lt;/span&gt; API routes: camelCase for endpoint handlers

&lt;span class="gu"&gt;## What NOT to do&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Don't refactor without atomic commits
&lt;span class="p"&gt;-&lt;/span&gt; Don't add dependencies without checking bundle impact
&lt;span class="p"&gt;-&lt;/span&gt; Don't commit .env files
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Layer 2: Tools (Adding Capability)
&lt;/h2&gt;

&lt;p&gt;Tools are how Claude acts: built-in tools cover files and shell commands, and MCP servers add everything else, like databases, APIs, and external services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The HumanLayer finding:&lt;/strong&gt; Too many tools cause agent confusion. Each tool is context overhead. Sweet spot: &lt;strong&gt;2–3 MCP servers per project&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Not 20. Not "all available servers."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which 2–3 tools?&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Filesystem tool&lt;/strong&gt; — read/write/execute (almost always)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One domain-specific tool&lt;/strong&gt; — database, API, CLI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optional: Observability tool&lt;/strong&gt; — logs, metrics&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example for a Next.js project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filesystem (built-in)&lt;/li&gt;
&lt;li&gt;PostgreSQL client (query → fix migrations)&lt;/li&gt;
&lt;li&gt;GitHub API (check PR status → adjust approach)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More tools = more tokens + more decision fatigue for Claude.&lt;/p&gt;
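&lt;p&gt;Project-scoped servers can be declared in a &lt;code&gt;.mcp.json&lt;/code&gt; at the repo root so the whole team shares the same 2–3. A sketch (the package names are illustrative; check the current MCP registry for your stack):&lt;/p&gt;

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/dev"]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    }
  }
}
```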




&lt;h2&gt;
  
  
  Layer 3: Permissions (The Guardrails)
&lt;/h2&gt;

&lt;p&gt;Permissions live in &lt;code&gt;settings.json&lt;/code&gt; (project-level: &lt;code&gt;.claude/settings.json&lt;/code&gt;). Each rule names a tool plus an optional specifier, like &lt;code&gt;Bash(npm run test:*)&lt;/code&gt; or &lt;code&gt;Edit(src/**)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Allowlist over denylist.&lt;/strong&gt; It's safer to say "Claude can only modify these files" than "Claude cannot do X."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "permissions": {
    "allow": [
      "Edit(src/**)",
      "Write(src/**)",
      "Bash(npm run test:*)",
      "Bash(npm run build)"
    ],
    "deny": [
      "Read(./.env)",
      "Read(./.env.*)",
      "Bash(rm -rf:*)",
      "Bash(sudo:*)"
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude won't accidentally delete node_modules (been there)&lt;/li&gt;
&lt;li&gt;Can't run destructive commands without review&lt;/li&gt;
&lt;li&gt;Enforced at runtime, not a suggestion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Check settings.json into git.&lt;/strong&gt; This becomes part of your project's DNA.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 4: Hooks (Deterministic Enforcement)
&lt;/h2&gt;

&lt;p&gt;Hooks are the most powerful layer. They run &lt;em&gt;before&lt;/em&gt; and &lt;em&gt;after&lt;/em&gt; Claude uses tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PreToolUse hook:&lt;/strong&gt; Intercept tool calls, validate them, reject bad ones.&lt;br&gt;
&lt;strong&gt;PostToolUse hook:&lt;/strong&gt; Inspect results, catch anomalies, trigger alerts.&lt;/p&gt;

&lt;p&gt;Boris Cherny of Anthropic calls verification "the most important thing" for quality. Hooks are that verification.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Runs before every tool use&lt;/span&gt;

&lt;span class="nv"&gt;TOOL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;
&lt;span class="nv"&gt;PARAMS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$2&lt;/span&gt;

&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="nv"&gt;$TOOL&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;
  &lt;span class="s2"&gt;"filesystem_write"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PARAMS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s2"&gt;"(node_modules|&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;git|&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;env)"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
      &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"REJECTED: Protected path"&lt;/span&gt;
      &lt;span class="nb"&gt;exit &lt;/span&gt;1
    &lt;span class="k"&gt;fi&lt;/span&gt;
    &lt;span class="p"&gt;;;&lt;/span&gt;
  &lt;span class="s2"&gt;"command_execute"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PARAMS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s2"&gt;"(rm -rf|:(){ :|:)"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
      &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"REJECTED: Dangerous command"&lt;/span&gt;
      &lt;span class="nb"&gt;exit &lt;/span&gt;1
    &lt;span class="k"&gt;fi&lt;/span&gt;
    &lt;span class="p"&gt;;;&lt;/span&gt;
&lt;span class="k"&gt;esac&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"APPROVED"&lt;/span&gt;
&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Runs after every tool use&lt;/span&gt;

&lt;span class="nv"&gt;TOOL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;
&lt;span class="nv"&gt;RESULT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$2&lt;/span&gt;
&lt;span class="nv"&gt;DURATION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$3&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;((&lt;/span&gt; DURATION &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; 30 &lt;span class="o"&gt;))&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"⚠️  Slow tool: &lt;/span&gt;&lt;span class="nv"&gt;$TOOL&lt;/span&gt;&lt;span class="s2"&gt; took &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DURATION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;s"&lt;/span&gt;
&lt;span class="k"&gt;fi

if &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RESULT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s2"&gt;"error&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;failed&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;undefined"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🔴 Tool failed: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$RESULT&lt;/span&gt; | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-20&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Where to put the hook scripts:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;.claude/hooks/pre-tool-use.sh&lt;/li&gt;
&lt;li&gt;.claude/hooks/post-tool-use.sh&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model cannot bypass hooks. They're enforcement, not advice.&lt;/p&gt;
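&lt;p&gt;Claude Code discovers hooks through &lt;code&gt;settings.json&lt;/code&gt;, not by file path alone. A sketch of the registration (the structure follows Claude Code's hooks config; verify event and field names against the current docs):&lt;/p&gt;

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash|Write|Edit",
        "hooks": [
          { "type": "command", "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/pre-tool-use.sh" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "",
        "hooks": [
          { "type": "command", "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/post-tool-use.sh" }
        ]
      }
    ]
  }
}
```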




&lt;h2&gt;
  
  
  Layer 5: Observability (Learning from Decisions)
&lt;/h2&gt;

&lt;p&gt;Observability means: logging decisions, tracking costs, detecting anomalies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to log:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which tools Claude called and why&lt;/li&gt;
&lt;li&gt;Tokens used per session (cost tracking)&lt;/li&gt;
&lt;li&gt;Time spent on each decision&lt;/li&gt;
&lt;li&gt;Failures and retries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The HumanLayer insight:&lt;/strong&gt; Surface only failures, not 4,000 lines of passing tests.&lt;/p&gt;

&lt;p&gt;Most developers log everything. Better: log strategically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Log Claude's decisions&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="s1"&gt;'+%Y-%m-%d %H:%M:%S'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt; | Tool: &lt;/span&gt;&lt;span class="nv"&gt;$TOOL&lt;/span&gt;&lt;span class="s2"&gt; | Status: &lt;/span&gt;&lt;span class="nv"&gt;$STATUS&lt;/span&gt;&lt;span class="s2"&gt; | Tokens: &lt;/span&gt;&lt;span class="nv"&gt;$TOKENS&lt;/span&gt;&lt;span class="s2"&gt; | Duration: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DURATION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;s"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; .claude/logs/decisions.log

&lt;span class="nv"&gt;TOTAL_COST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"Tokens:"&lt;/span&gt; .claude/logs/decisions.log | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{sum+=$NF} END {print sum}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;((&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$TOTAL_COST&lt;/span&gt;&lt;span class="s2"&gt; &amp;gt; 5.00"&lt;/span&gt; | bc &lt;span class="nt"&gt;-l&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="o"&gt;))&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"💰 Cost alert: &lt;/span&gt;&lt;span class="nv"&gt;$TOTAL_COST&lt;/span&gt;&lt;span class="s2"&gt; USD today"&lt;/span&gt;
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nv"&gt;ERROR_RATE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"FAILED"&lt;/span&gt; .claude/logs/decisions.log | &lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;((&lt;/span&gt; ERROR_RATE &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; 5 &lt;span class="o"&gt;))&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🚨 High error rate detected: &lt;/span&gt;&lt;span class="nv"&gt;$ERROR_RATE&lt;/span&gt;&lt;span class="s2"&gt; failures in last hour"&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Setup Order Matters: 1 → 4 → 2 → 3 → 5
&lt;/h2&gt;

&lt;p&gt;Why not 1 → 2 → 3 → 4 → 5?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong order: Capabilities before guardrails&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build CLAUDE.md ✅&lt;/li&gt;
&lt;li&gt;Add 10 MCP servers ⚠️&lt;/li&gt;
&lt;li&gt;Grant all permissions ⚠️&lt;/li&gt;
&lt;li&gt;No hooks (too late, broke things already)&lt;/li&gt;
&lt;li&gt;Now add observability (chaos already happened)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Right order: Guardrails first&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build CLAUDE.md ✅ (memory/rules)&lt;/li&gt;
&lt;li&gt;Add hooks ✅ (enforcement before tools exist)&lt;/li&gt;
&lt;li&gt;Add 2–3 MCP servers ✅ (now hooks guard them)&lt;/li&gt;
&lt;li&gt;Restrict permissions ✅ (layered safety)&lt;/li&gt;
&lt;li&gt;Add observability ✅ (track what's working)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Adding hooks after tools is like adding seatbelts after the crash.&lt;/p&gt;




&lt;h2&gt;
  
  
  Production-Ready Harness: 10-Item Checklist
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[ ] CLAUDE.md exists, 40–60 lines, checked into git&lt;/li&gt;
&lt;li&gt;[ ] MEMORY.md set up with "lessons learned"&lt;/li&gt;
&lt;li&gt;[ ] .claude/commands/ has 3+ reusable prompts&lt;/li&gt;
&lt;li&gt;[ ] Max 3 MCP servers chosen and documented&lt;/li&gt;
&lt;li&gt;[ ] settings.json has allowlist (filesystem, execution)&lt;/li&gt;
&lt;li&gt;[ ] .claude/hooks/pre-tool-use.sh validates calls&lt;/li&gt;
&lt;li&gt;[ ] .claude/hooks/post-tool-use.sh inspects results&lt;/li&gt;
&lt;li&gt;[ ] .claude/logs/ directory exists + observability hook running&lt;/li&gt;
&lt;li&gt;[ ] Cost tracking implemented (tokens/session)&lt;/li&gt;
&lt;li&gt;[ ] Team knows where each file lives + how to update it&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Which layer do I need first?&lt;/strong&gt;&lt;br&gt;
Layer 1 (CLAUDE.md). Everything depends on clear memory. Start there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does this harness slow down Claude Code?&lt;/strong&gt;&lt;br&gt;
No. Hooks add ~100–300ms per tool use. Worth it for the safety. Observability has negligible cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are the most important hooks?&lt;/strong&gt;&lt;br&gt;
PreToolUse (validation) and PostToolUse (anomaly detection). Those two prevent 80% of issues.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How many MCP servers is "too many"?&lt;/strong&gt;&lt;br&gt;
More than 5 becomes noise. More than 3 means you're probably adding tools you won't use. Start with 1–2, add more only when they solve a real workflow problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I skip permissions and just use hooks?&lt;/strong&gt;&lt;br&gt;
Technically yes, but no. Permissions are defense-in-depth. Hooks catch mistakes. Permissions prevent them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I update CLAUDE.md over time?&lt;/strong&gt;&lt;br&gt;
Document it in MEMORY.md. "We added this rule because X failed." Over time, CLAUDE.md stabilizes.&lt;/p&gt;




&lt;p&gt;Originally published on &lt;a href="https://shipwithai.io/blog/claude-code-harness-5-layers/" rel="noopener noreferrer"&gt;ShipWithAI&lt;/a&gt;. I write about Claude Code workflows, AI-assisted development, and building production systems with AI. Full blog + templates at shipwithai.io.&lt;/p&gt;

&lt;p&gt;What's your harness score? Drop it in the comments. Do you have all 5 layers, or are you still at layer 1?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>claudecode</category>
      <category>shipwithai</category>
    </item>
    <item>
      <title>Harness Engineering: Why the System Around AI Matters More Than the AI Itself</title>
      <dc:creator>ShipWithAI</dc:creator>
      <pubDate>Mon, 20 Apr 2026 12:03:07 +0000</pubDate>
      <link>https://dev.to/shipwithaiio/harness-engineering-why-the-system-around-ai-matters-more-than-the-ai-itself-1o9i</link>
      <guid>https://dev.to/shipwithaiio/harness-engineering-why-the-system-around-ai-matters-more-than-the-ai-itself-1o9i</guid>
      <description>&lt;p&gt;Harness engineering is everything around your AI agent except the model: memory, tools, permissions, hooks, observability. LangChain gained 13.7 benchmark points by changing only the harness (52.8% to 66.5%, same model). Most developers only have Layer 1 (CLAUDE.md). Production needs all 5.&lt;/p&gt;




&lt;p&gt;Two lines of config. Same AI model. Completely different reliability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# CLAUDE.md approach (can be ignored)&lt;/span&gt;
&lt;span class="s2"&gt;"Never delete production database tables."&lt;/span&gt;
&lt;span class="c"&gt;# Claude reads this, weighs it against 200K tokens of context, may ignore it.&lt;/span&gt;

&lt;span class="c"&gt;# Hook approach (always enforced)&lt;/span&gt;
&lt;span class="c"&gt;# PreToolUse hook: command contains "DROP TABLE" + env=production → exit 2 → BLOCKED.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first is advice. The second is enforcement.&lt;/p&gt;

&lt;p&gt;One lives in a markdown file that competes with thousands of other tokens for the model's attention. The other is a shell script that runs before every command and cannot be bypassed. The gap between these two approaches is the gap most teams don't know exists.&lt;/p&gt;

&lt;p&gt;That gap has a name now: &lt;strong&gt;harness engineering&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is harness engineering? (And why prompt engineering isn't enough)
&lt;/h2&gt;

&lt;p&gt;Harness engineering is the discipline of building constraints, tools, feedback loops, and observability around an AI agent to make it reliable in production. The formula, popularized by &lt;a href="https://blog.langchain.com/improving-deep-agents-with-harness-engineering/" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt; and refined on &lt;a href="https://martinfowler.com/articles/harness-engineering.html" rel="noopener noreferrer"&gt;Martin Fowler's site&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Agent = Model + Harness&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The model is a commodity. The harness is your competitive advantage.&lt;/p&gt;

&lt;p&gt;Mitchell Hashimoto, creator of Terraform and Ghostty, defined the core idea: anytime you find an agent makes a mistake, you engineer a solution so the agent never makes that mistake again. In Ghostty's repository, each line in the AGENTS.md file corresponds to a specific past agent failure that's now prevented.&lt;/p&gt;

&lt;p&gt;The industry has moved through three distinct eras:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Era&lt;/th&gt;
&lt;th&gt;Years&lt;/th&gt;
&lt;th&gt;Focus&lt;/th&gt;
&lt;th&gt;Key Question&lt;/th&gt;
&lt;th&gt;Limitation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompt Engineering&lt;/td&gt;
&lt;td&gt;2022-2024&lt;/td&gt;
&lt;td&gt;Crafting better instructions&lt;/td&gt;
&lt;td&gt;"How do I phrase this?"&lt;/td&gt;
&lt;td&gt;Instructions get diluted in long contexts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context Engineering&lt;/td&gt;
&lt;td&gt;2025&lt;/td&gt;
&lt;td&gt;Curating what the model sees&lt;/td&gt;
&lt;td&gt;"What information does it need?"&lt;/td&gt;
&lt;td&gt;Knowing isn't doing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Harness Engineering&lt;/td&gt;
&lt;td&gt;2026&lt;/td&gt;
&lt;td&gt;Building systems around the agent&lt;/td&gt;
&lt;td&gt;"What can it do, and what can't it?"&lt;/td&gt;
&lt;td&gt;Emerging discipline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Prompt engineering shapes what the agent &lt;em&gt;tries&lt;/em&gt;. Context engineering shapes what the agent &lt;em&gt;knows&lt;/em&gt;. Harness engineering shapes what the agent &lt;strong&gt;can and cannot do&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  How did LangChain gain 13.7 benchmark points without changing the model?
&lt;/h2&gt;

&lt;p&gt;By improving three harness components, LangChain jumped from 52.8% to 66.5% on &lt;a href="https://www.tbench.ai/news/announcement-2-0" rel="noopener noreferrer"&gt;Terminal Bench 2.0&lt;/a&gt; (a benchmark of 89 real-world terminal tasks) while keeping the same model, gpt-5.2-codex. They went from Top 30 to Top 5. No fine-tuning. No model swap. Just harness changes.&lt;/p&gt;

&lt;p&gt;Here are the three changes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Context injection.&lt;/strong&gt; LangChain's &lt;code&gt;LocalContextMiddleware&lt;/code&gt; maps the environment upfront and injects it directly into the agent's context. Before this change, the agent wasted steps trying to understand its surroundings.&lt;/p&gt;
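&lt;p&gt;A Claude Code analogue of this idea (my own sketch, not LangChain's &lt;code&gt;LocalContextMiddleware&lt;/code&gt;) is a SessionStart hook that prints an environment snapshot; how its output reaches the agent's context depends on your Claude Code version, so verify against the hooks docs:&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: emit an environment snapshot at session start so the agent
# doesn't burn steps discovering its surroundings. The specific probes
# (git branch, lockfile check) are examples; tailor them to your stack.
env_snapshot() {
  echo "## Environment snapshot"
  echo "- cwd: $PWD"
  echo "- branch: $(git branch --show-current 2>/dev/null || echo 'n/a')"
  echo "- package manager: $( [ -f pnpm-lock.yaml ] && echo pnpm || echo npm )"
}

env_snapshot
```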

&lt;p&gt;&lt;strong&gt;2. Self-verification loops.&lt;/strong&gt; After each action, the agent verifies its output against task-specific criteria before moving on. Not just "run the tests." The agent checks whether the output matches what the task actually asked for.&lt;/p&gt;
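&lt;p&gt;The shape of such a loop can be sketched in shell (again my own sketch, not LangChain's implementation): run a task-specific check after each action and return a compact pass/fail summary the agent can act on:&lt;/p&gt;

```shell
#!/bin/sh
# Sketch of a self-verification step: run any check command and report
# a short result; on failure, surface only the tail of the output so the
# feedback stays small in context.
verify() { # verify CMD [ARGS...] -> 0 on pass, 1 on fail
  if out=$("$@" 2>&1); then
    echo "VERIFY PASS: $*"
  else
    echo "VERIFY FAIL: $*"
    echo "$out" | tail -n 5  # only the tail, to keep context small
    return 1
  fi
}

verify true                     # stands in for e.g. `npm test`
verify false || echo "agent should retry using the failure tail above"
```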

&lt;p&gt;&lt;strong&gt;3. Compute allocation.&lt;/strong&gt; This one is counterintuitive: running at maximum reasoning budget (xhigh) scored only 53.9%, while the high setting scored 63.6%. More compute caused timeouts that hurt overall performance.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Before harness changes&lt;/td&gt;
&lt;td&gt;52.8%&lt;/td&gt;
&lt;td&gt;Baseline, Top 30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;After harness changes (high reasoning)&lt;/td&gt;
&lt;td&gt;66.5%&lt;/td&gt;
&lt;td&gt;Top 5, +13.7pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max reasoning (xhigh)&lt;/td&gt;
&lt;td&gt;53.9%&lt;/td&gt;
&lt;td&gt;Worse than baseline, timeouts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you're evaluating AI coding tools by comparing model benchmarks alone, you're measuring the wrong variable.&lt;/p&gt;




&lt;h2&gt;
  
  
  What are the 5 layers of an AI agent harness?
&lt;/h2&gt;

&lt;p&gt;A production harness has five layers. Most developers I talk to in the Claude Code community have Layer 1 and maybe part of Layer 2. That leaves three layers of reliability on the table.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What It Is&lt;/th&gt;
&lt;th&gt;Problem It Solves&lt;/th&gt;
&lt;th&gt;Claude Code Implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. Memory&lt;/td&gt;
&lt;td&gt;Persistent context across sessions&lt;/td&gt;
&lt;td&gt;Agent "forgets" your conventions every session&lt;/td&gt;
&lt;td&gt;CLAUDE.md, MEMORY.md, .claude/commands/&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Tools&lt;/td&gt;
&lt;td&gt;Extended capabilities beyond built-ins&lt;/td&gt;
&lt;td&gt;Agent can't access your APIs, databases, or services&lt;/td&gt;
&lt;td&gt;MCP servers, custom tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Permissions&lt;/td&gt;
&lt;td&gt;What the agent is allowed to do&lt;/td&gt;
&lt;td&gt;Agent edits sensitive files or runs dangerous commands&lt;/td&gt;
&lt;td&gt;settings.json allow/deny lists&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. Hooks&lt;/td&gt;
&lt;td&gt;Automated enforcement at lifecycle points&lt;/td&gt;
&lt;td&gt;Instructions get ignored under context pressure&lt;/td&gt;
&lt;td&gt;PreToolUse/PostToolUse hooks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5. Observability&lt;/td&gt;
&lt;td&gt;Knowing what the agent actually did&lt;/td&gt;
&lt;td&gt;No visibility into agent decisions or cost&lt;/td&gt;
&lt;td&gt;Session logs, cost tracking, action audit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Think of it like your CI/CD pipeline. You built that infrastructure once, and the whole team benefits on every push. A harness works the same way for AI agent sessions.&lt;/p&gt;
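&lt;p&gt;Layer 3 is often the cheapest to add. A minimal &lt;code&gt;settings.json&lt;/code&gt; permissions block might look like this (a sketch; the specific rules are examples, and you should confirm the current rule syntax against the Claude Code settings docs):&lt;/p&gt;

```json
{
  "permissions": {
    "allow": [
      "Bash(npm run test:*)",
      "Bash(npm run lint)"
    ],
    "deny": [
      "Read(.env*)",
      "Bash(rm -rf *)"
    ]
  }
}
```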

&lt;p&gt;OpenAI demonstrated this at scale. Their Codex team shipped roughly one million lines of production code, with zero lines written by human hands, over five months. Their harness included AGENTS.md files, reproducible dev environments, and mechanical invariants in CI. The work took roughly one-tenth of the time a human team would have needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where is your harness right now?
&lt;/h2&gt;

&lt;p&gt;Run this checklist:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Do you have a CLAUDE.md with project conventions and constraints?&lt;/td&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Do you have MCP servers connecting Claude Code to external tools?&lt;/td&gt;
&lt;td&gt;Tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Do you have settings.json with explicit allow/deny lists?&lt;/td&gt;
&lt;td&gt;Permissions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Do you have at least one PreToolUse hook that blocks dangerous actions?&lt;/td&gt;
&lt;td&gt;Hooks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Can you see what Claude did in each session and how much it cost?&lt;/td&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Your score:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;0-1/5&lt;/strong&gt;: You're in the majority. Most developers stop at CLAUDE.md.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2-3/5&lt;/strong&gt;: Ahead of most. You've started building real infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4-5/5&lt;/strong&gt;: Production-ready. You're doing harness engineering whether you knew the name or not.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Be honest about question 4. If the answer is no, your agent can still &lt;code&gt;rm -rf&lt;/code&gt; your project directory. CLAUDE.md says "don't do that." A hook actually prevents it.&lt;/p&gt;

&lt;p&gt;Here's why this matters: an ETH Zurich study (Feb 2026) tested context files across 138 real-world tasks from 12 Python repositories. Human-written context files improved agent success by only about 4%. LLM-generated ones actually &lt;em&gt;reduced&lt;/em&gt; success by about 3% while increasing inference costs by over 20%. Instructions alone aren't enough. You need enforcement layers.&lt;/p&gt;




&lt;h2&gt;
  
  
  How do you start building a harness today?
&lt;/h2&gt;

&lt;p&gt;You don't need all 5 layers at once. Start with three high-impact changes that take less than 30 minutes total.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick Win 1: Create a MEMORY.md (5 minutes)
&lt;/h3&gt;

&lt;p&gt;MEMORY.md is a lightweight index that points to where knowledge lives in your project. Unlike CLAUDE.md (which holds static rules), MEMORY.md tracks evolving state: recent decisions, architectural changes, active work.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Auth&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;src/lib/auth/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; — Clerk, not NextAuth. Migrated March 2026.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;prisma/schema.prisma&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; — PostgreSQL on Supabase. All queries via Prisma.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Deploy&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;docs/deploy.md&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; — Vercel preview for PRs, production on main.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Testing&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;vitest.config.ts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; — Vitest unit, Playwright E2E. Min 80% coverage.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;API&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;src/app/api/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; — Server Actions preferred over API routes for mutations.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Quick Win 2: Add one PreToolUse guardrail hook (15 minutes)
&lt;/h3&gt;

&lt;p&gt;This hook blocks Claude Code from editing sensitive files. Copy-paste ready:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# .claude/hooks/block-sensitive-files.sh&lt;/span&gt;
&lt;span class="c"&gt;# Blocks edits to .env, credentials, and CI config&lt;/span&gt;

&lt;span class="nv"&gt;INPUT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;FILE_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$INPUT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.tool_input.file_path // empty'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nv"&gt;SENSITIVE&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="s1"&gt;'.env'&lt;/span&gt; &lt;span class="s1"&gt;'credentials'&lt;/span&gt; &lt;span class="s1"&gt;'.github/workflows'&lt;/span&gt; &lt;span class="s1"&gt;'secrets'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;pattern &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SENSITIVE&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FILE_PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$pattern&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"BLOCKED: Cannot edit sensitive file: &lt;/span&gt;&lt;span class="nv"&gt;$FILE_PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
    &lt;span class="nb"&gt;exit &lt;/span&gt;2
  &lt;span class="k"&gt;fi
done

&lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register it in &lt;code&gt;.claude/settings.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PreToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Edit|Write"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bash .claude/hooks/block-sensitive-files.sh"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
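&lt;p&gt;Before trusting the hook, you can smoke-test its matching rule on its own. This is a standalone restatement of the same check the script runs, so you can see what it would block without wiring anything up:&lt;/p&gt;

```shell
#!/bin/sh
# Standalone smoke test of the matching rule from block-sensitive-files.sh:
# a path is blocked if it contains any of the sensitive substrings.
would_block() { # would_block FILE_PATH -> 0 if blocked, 1 if allowed
  for pattern in '.env' 'credentials' '.github/workflows' 'secrets'; do
    case "$1" in *"$pattern"*) return 0 ;; esac
  done
  return 1
}

would_block ".env.local"       && echo "blocked: .env.local"
would_block "src/secrets.ts"   && echo "blocked: src/secrets.ts"
would_block "src/app/page.tsx" || echo "allowed: src/app/page.tsx"
```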



&lt;h3&gt;
  
  
  Quick Win 3: Enable cost awareness (10 minutes)
&lt;/h3&gt;

&lt;p&gt;Track what each session costs and what it actually did, so you notice anomalies early. Observability also closes the feedback loop that makes verification possible; Boris Cherny, creator of Claude Code, calls verification "probably the most important thing" for quality:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Give Claude a way to verify its work. If Claude has that feedback loop, it will 2-3x the quality of the final result."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Start simple: review &lt;code&gt;~/.claude/projects/&lt;/code&gt; after each session to check what Claude did and how much it cost.&lt;/p&gt;
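&lt;p&gt;A rough way to script that review (the JSONL layout and field names here are assumptions about the current log format; check what you actually find under &lt;code&gt;~/.claude/projects/&lt;/code&gt; before relying on it):&lt;/p&gt;

```shell
#!/bin/sh
# Sum output tokens per session transcript. Assumes one JSONL file per
# session with assistant entries carrying .message.usage.output_tokens --
# verify the schema against your own logs first. Requires jq.
sum_session_tokens() { # sum_session_tokens DIR
  for f in "$1"/*.jsonl; do
    [ -e "$f" ] || continue
    tokens=$(jq -s '[.[] | .message.usage.output_tokens // 0] | add' "$f")
    printf '%s: %s output tokens\n' "$(basename "$f")" "$tokens"
  done
}

sum_session_tokens "$HOME/.claude/projects/my-app"  # hypothetical project dir
```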




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the difference between harness engineering and prompt engineering?
&lt;/h3&gt;

&lt;p&gt;Prompt engineering shapes what the agent tries. Context engineering shapes what the agent knows. Harness engineering shapes what the agent can and cannot do. They're not replacements — they're layers. A production AI workflow uses all three, but harness engineering provides the strongest reliability guarantees because it uses enforcement (hooks, permissions) rather than suggestions (prompts, context).&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need harness engineering for Claude Code?
&lt;/h3&gt;

&lt;p&gt;Yes. Claude Code is itself a harness that Anthropic built around their model. But it's the &lt;em&gt;inner&lt;/em&gt; harness. You need an &lt;em&gt;outer&lt;/em&gt; harness tailored to your project: CLAUDE.md for conventions, hooks for guardrails, MCP servers for tools, permissions for safety boundaries, and observability for cost control.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is harness engineering only for Claude Code?
&lt;/h3&gt;

&lt;p&gt;No. The principles apply to any AI coding agent: Cursor, GitHub Copilot, OpenAI Codex, Windsurf, Cline. Claude Code happens to offer the most programmable harness surface (17 hook events, MCP protocol, skills system), which is why examples here use it. The concepts transfer directly to other tools.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Try it now:&lt;/strong&gt; Pick one quick win above and implement it before your next Claude Code session. Quick Win 2 is copy-paste ready; the script and registration above are all you need.&lt;/p&gt;

&lt;p&gt;What's your harness score right now? Drop it in the comments — I'm curious how many devs have gone beyond Layer 1.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shipwithai.io/blog/harness-engineering-claude-code/" rel="noopener noreferrer"&gt;ShipWithAI&lt;/a&gt;. I write about Claude Code workflows, AI-assisted development, and shipping software faster with structured AI.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>productivity</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
