<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Penloom Studio</title>
    <description>The latest articles on DEV Community by Penloom Studio (@penloom_studio_829b7817d3).</description>
    <link>https://dev.to/penloom_studio_829b7817d3</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4005603%2F59d74cdd-defd-42ac-889b-3ed5c07b4f13.png</url>
      <title>DEV Community: Penloom Studio</title>
      <link>https://dev.to/penloom_studio_829b7817d3</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/penloom_studio_829b7817d3"/>
    <language>en</language>
    <item>
      <title>Your CLAUDE.md is too long — and that's why Claude Code ignores it</title>
      <dc:creator>Penloom Studio</dc:creator>
      <pubDate>Sat, 27 Jun 2026 18:15:51 +0000</pubDate>
      <link>https://dev.to/penloom_studio_829b7817d3/your-claudemd-is-too-long-and-thats-why-claude-code-ignores-it-1c2e</link>
      <guid>https://dev.to/penloom_studio_829b7817d3/your-claudemd-is-too-long-and-thats-why-claude-code-ignores-it-1c2e</guid>
      <description>&lt;p&gt;Everyone hits the same wall with Claude Code. You add a rule to &lt;code&gt;CLAUDE.md&lt;/code&gt;. It works. You add ten more. They mostly work. You add forty more — and now Claude is cheerfully ignoring the rule you care about most, the one that's been sitting there since day one. So you make it &lt;strong&gt;LOUDER&lt;/strong&gt;, in caps, with three exclamation points. It still gets skipped.&lt;/p&gt;

&lt;p&gt;The instinct is to write more. The fix is almost always to write &lt;em&gt;less&lt;/em&gt;. Here's why, and what to do instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Instruction-following has a budget, and you're overdrawn
&lt;/h2&gt;

&lt;p&gt;This is the part most CLAUDE.md guides skip. Frontier models reliably follow on the order of &lt;strong&gt;150–200 instructions at once&lt;/strong&gt; — and adherence to any single rule &lt;em&gt;drops&lt;/em&gt; as you stack more on top of it. It isn't a cliff; it's a slow tax. Every line you add makes every other line a little less likely to be honored.&lt;/p&gt;

&lt;p&gt;Now subtract what you don't control: Claude Code's own system prompt already spends a chunk of that budget before your file is even read. So the working budget for &lt;em&gt;your&lt;/em&gt; project rules is smaller than the headline number — and a sprawling 300-line CLAUDE.md isn't 300 rules followed, it's maybe the first 150 followed well and the rest treated as ambience.&lt;/p&gt;

&lt;p&gt;The mental model that fixes everything downstream: &lt;strong&gt;CLAUDE.md is a budget, not a wishlist.&lt;/strong&gt; You are not writing documentation. You are spending a scarce attention allowance, and every line competes with every other line.&lt;/p&gt;

&lt;h2&gt;
  
  
  The test for every line: would you bet $5 it's followed?
&lt;/h2&gt;

&lt;p&gt;Go through your file line by line and ask one question of each rule: &lt;em&gt;would I bet money this fires every time it's relevant?&lt;/em&gt; Three outcomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Yes, and it's load-bearing&lt;/strong&gt; — keep it. This is what the budget is for.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nice to have, but I wouldn't bet on it&lt;/strong&gt; — cut it, or move it to a referenced file (below). It's diluting the rules you &lt;em&gt;would&lt;/em&gt; bet on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It absolutely must happen every time&lt;/strong&gt; — then it doesn't belong in CLAUDE.md at all. Make it a hook.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That third category is the one people get wrong, so let's be concrete about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advisory vs. deterministic: know which one you need
&lt;/h2&gt;

&lt;p&gt;A CLAUDE.md rule is a &lt;em&gt;request&lt;/em&gt;. A good model honors it most of the time — call it ~80%. For style preferences, naming conventions, "explain before you refactor," 80% is completely fine; the cost of a miss is low.&lt;/p&gt;

&lt;p&gt;But some rules can't tolerate a 1-in-5 miss: never commit secrets, never touch &lt;code&gt;production.env&lt;/code&gt;, always run the test suite before declaring done. For those, prose is the wrong tool no matter how forcefully you phrase it. You want a &lt;strong&gt;hook&lt;/strong&gt; — a shell command Claude Code runs automatically at a fixed point in its lifecycle. A hook doesn't ask the model to cooperate; it fires deterministically, every time, whether the model "felt like it" or not.&lt;/p&gt;

&lt;p&gt;Here's the graduation in practice. The CLAUDE.md version — advisory, ~80%:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# in CLAUDE.md&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Never edit .env files or anything under .git/. Never commit secrets.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The hook version — deterministic, 100%. A &lt;code&gt;PreToolUse&lt;/code&gt; hook that &lt;em&gt;blocks&lt;/em&gt; the write before it happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PreToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Edit|Write"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"node -e &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;const p=process.env.CLAUDE_TOOL_INPUT_FILE_PATH||''; if(/(^|&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;/)&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;.env|(^|&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;/)&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;.git&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;/|secrets/i.test(p)){console.error('Blocked: protected path '+p);process.exit(2)}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Exit code &lt;code&gt;2&lt;/code&gt; tells Claude Code the action was refused and feeds the reason back to the model, so it adapts instead of failing silently. The rule it replaced can now leave CLAUDE.md entirely — you just bought back a line of budget &lt;em&gt;and&lt;/em&gt; upgraded a must-happen rule from 80% to 100%. That's the trade you're looking for on every critical line.&lt;/p&gt;

&lt;h2&gt;
  
  
  Every rule needs a reason — or it doesn't generalize
&lt;/h2&gt;

&lt;p&gt;"Use tabs, not spaces" is a rule that applies exactly once, literally. "Use tabs, not spaces — our linter rejects spaces and the build fails" is a rule the model can &lt;em&gt;reason from&lt;/em&gt;: it now knows the underlying constraint and will make the right call in situations you never spelled out. A rule with a reason generalizes. A bare command gets followed only in the exact case you wrote, and quietly dropped the moment the context shifts an inch. Reasons aren't padding — they're what makes a rule worth its place in the budget.&lt;/p&gt;

&lt;h2&gt;
  
  
  Progressive disclosure: keep the hot path short
&lt;/h2&gt;

&lt;p&gt;You don't actually have to choose between "short file" and "thorough guidance." Keep CLAUDE.md to the rules that apply to &lt;em&gt;most&lt;/em&gt; tasks, and push the rest into focused files you reference by path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# CLAUDE.md (the always-loaded hot path — keep it lean)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Run &lt;span class="sb"&gt;`npm test`&lt;/span&gt; before declaring any task done.
&lt;span class="p"&gt;-&lt;/span&gt; API conventions live in docs/api-guidelines.md — read it before touching routes/.
&lt;span class="p"&gt;-&lt;/span&gt; DB migrations: follow docs/migrations.md exactly.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude pulls &lt;code&gt;docs/migrations.md&lt;/code&gt; into context only when it's actually doing a migration. The detailed guidance still exists, fully — it just isn't taxing your budget on the 90% of tasks where it's irrelevant. Your CLAUDE.md stays a tight, high-adherence core; the depth lives one hop away, loaded on demand.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 20-minute audit
&lt;/h2&gt;

&lt;p&gt;Open your CLAUDE.md right now and do this pass:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Count the lines.&lt;/strong&gt; Past ~150 and climbing? That alone explains a lot of "it ignored me."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delete every rule you wouldn't bet $5 on.&lt;/strong&gt; Be ruthless. A diluted file follows nothing well.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Find the must-happen rules and make them hooks.&lt;/strong&gt; Secrets, destructive commands, test-before-done. Move them out of prose.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add a reason to every surviving rule.&lt;/strong&gt; If you can't articulate why, you don't need the rule.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Move task-specific depth into referenced files.&lt;/strong&gt; Keep the hot path lean.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Almost everyone who does this honestly ends up with a &lt;em&gt;shorter&lt;/em&gt; file that Claude follows &lt;em&gt;more&lt;/em&gt; reliably. That's not a paradox — it's the budget working as intended.&lt;/p&gt;




&lt;h2&gt;
  
  
  The one-paragraph version
&lt;/h2&gt;

&lt;p&gt;Instruction-following is a scarce budget, not a wishlist. Models reliably follow ~150–200 rules and adherence drops as you pile on more, so every line in CLAUDE.md should be one you'd bet money on. Cut what you wouldn't bet on, give every survivor a reason so it generalizes, push task-specific depth into referenced files, and graduate the must-happen rules out of advisory prose into deterministic hooks. Shorter file, higher adherence.&lt;/p&gt;

&lt;p&gt;If you want CLAUDE.md templates that already follow this — plus five ready-to-paste hooks (including the secrets guard above), subagents, and slash-command skills — I put together a &lt;strong&gt;&lt;a href="https://penloomstudio.com/index.html" rel="noopener noreferrer"&gt;Claude Code Power-User Pack&lt;/a&gt;&lt;/strong&gt;. And if you're just getting started, the &lt;strong&gt;&lt;a href="https://penloomstudio.com/field-guide.html" rel="noopener noreferrer"&gt;free field guide&lt;/a&gt;&lt;/strong&gt; covers seven reliability rules and three paste-able guardrails, no email required.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What's the one rule your CLAUDE.md keeps ignoring? Reply and tell me — the patterns repeat more than you'd think.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>llm</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Why your AI agent is flaky — and 7 rules that make it reliable</title>
      <dc:creator>Penloom Studio</dc:creator>
      <pubDate>Sat, 27 Jun 2026 16:44:23 +0000</pubDate>
      <link>https://dev.to/penloom_studio_829b7817d3/why-your-ai-agent-is-flaky-and-7-rules-that-make-it-reliable-481d</link>
      <guid>https://dev.to/penloom_studio_829b7817d3/why-your-ai-agent-is-flaky-and-7-rules-that-make-it-reliable-481d</guid>
      <description>&lt;p&gt;You built an AI agent. In the demo it was magic. In the wild it loops, hallucinates a tool call, "forgets" the format you asked for twice, and occasionally does something mildly alarming with your filesystem.&lt;/p&gt;

&lt;p&gt;Here's the uncomfortable truth after shipping a lot of these: &lt;strong&gt;a flaky agent is almost never a model problem.&lt;/strong&gt; It's a &lt;em&gt;specification&lt;/em&gt; problem. The model is doing exactly what your prompt, your tools, and your control loop told it to do — which turns out to be far less than you thought you said.&lt;/p&gt;

&lt;p&gt;Below are seven rules that consistently move an agent from "cool demo" to "I trust this on real work." None of them require a bigger model. Three of them are paste-able guardrails for Claude Code specifically.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Make the success criteria machine-checkable, not vibes
&lt;/h2&gt;

&lt;p&gt;"Summarize this well" is unfalsifiable. The agent can't tell when it's done, and neither can you. Replace every vibe with something a script could check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;❌ "Return a good summary."&lt;/li&gt;
&lt;li&gt;✅ "Return 3–5 bullet points, each ≤ 20 words, each starting with a verb. Output only the bullets, no preamble."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The reliability win isn't the format — it's that &lt;em&gt;failure becomes detectable&lt;/em&gt;. If you can write an &lt;code&gt;assert&lt;/code&gt; for the output, you can build a retry around it. If you can't, you have no idea how often it's wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Give tools narrow contracts and loud errors
&lt;/h2&gt;

&lt;p&gt;Most "the agent called the wrong tool" bugs are actually "the tool description was ambiguous" bugs. A tool named &lt;code&gt;search&lt;/code&gt; that sometimes means web search and sometimes means database lookup &lt;em&gt;will&lt;/em&gt; get confused. Two rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One tool, one job, one unambiguous name (&lt;code&gt;search_web&lt;/code&gt;, not &lt;code&gt;search&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;When a tool fails, return &lt;em&gt;why&lt;/em&gt; in plain language the model can act on: &lt;code&gt;"error: 'date' must be YYYY-MM-DD, got '6/27'"&lt;/code&gt; beats a stack trace or a silent &lt;code&gt;null&lt;/code&gt;. The model can recover from a sentence; it cannot recover from &lt;code&gt;undefined&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. Bound the loop — always
&lt;/h2&gt;

&lt;p&gt;Every autonomous agent needs three hard limits or it will eventually run forever: a &lt;strong&gt;max step count&lt;/strong&gt;, a &lt;strong&gt;wall-clock timeout&lt;/strong&gt;, and a &lt;strong&gt;no-progress detector&lt;/strong&gt; (if the last two actions are identical, stop). The model will not reliably stop itself. This is your job, in code, outside the prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Prefer deterministic guardrails over polite requests
&lt;/h2&gt;

&lt;p&gt;Anything you put in the prompt is a &lt;em&gt;request&lt;/em&gt;. The model usually honors it. "Usually" is not a security model. If an action is dangerous or irreversible, gate it in code, not in English.&lt;/p&gt;

&lt;p&gt;Concretely, in &lt;strong&gt;Claude Code&lt;/strong&gt; you can do this with hooks — deterministic scripts that run before a tool call and can block it. Here are the three I put in almost every project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Guardrail A — block edits to sensitive paths.&lt;/strong&gt; A &lt;code&gt;PreToolUse&lt;/code&gt; hook that refuses any write to &lt;code&gt;.env&lt;/code&gt;, &lt;code&gt;.git/&lt;/code&gt;, or secrets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PreToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Edit|Write"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"node -e &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;const p=process.env.CLAUDE_TOOL_INPUT_FILE_PATH||''; if(/(^|&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;/)&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;.env|(^|&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;/)&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;.git&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;/|secrets/i.test(p)){console.error('Blocked: protected path '+p);process.exit(2)}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Exit code &lt;code&gt;2&lt;/code&gt; tells Claude Code the action was blocked and feeds the reason back to the model — so it adjusts instead of silently failing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Guardrail B — never let "rm -rf" through.&lt;/strong&gt; A &lt;code&gt;PreToolUse&lt;/code&gt; matcher on &lt;code&gt;Bash&lt;/code&gt; that hard-fails on obviously destructive commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"node -e &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;const c=process.env.CLAUDE_TOOL_INPUT_COMMAND||''; if(/rm&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s+-rf|git&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s+reset&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s+--hard|:&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;(&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;)&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;{/.test(c)){console.error('Blocked destructive command');process.exit(2)}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Guardrail C — auto-format after every edit&lt;/strong&gt;, so the agent's output is consistent without you asking each time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Edit|Write"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"prettier --write &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;$CLAUDE_TOOL_INPUT_FILE_PATH&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; 2&amp;gt;/dev/null || true"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pattern that matters here: &lt;strong&gt;the model proposes, deterministic code disposes.&lt;/strong&gt; Hooks run whether or not the model "felt like" honoring an instruction.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Make state explicit and external
&lt;/h2&gt;

&lt;p&gt;If your agent's "memory" is just the growing chat transcript, it will drift — early instructions get diluted, and cost climbs every turn. Keep the durable facts (the task, the constraints, what's been done) in a small structured object you re-inject each step, and let the transcript be disposable. An agent that re-reads its own goal every loop stays on task far longer than one trusting a 40-message context to hold.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Test it like software, because it is
&lt;/h2&gt;

&lt;p&gt;You wouldn't ship a function with zero tests; don't ship an agent with zero either. Build a tiny eval set — even 10–15 representative inputs with checkable expected properties (rule #1 makes this possible). Run it on every prompt change. The first time a "harmless" prompt tweak silently breaks 3 of 12 cases, you'll understand why this is the highest-leverage 30 minutes in the whole project.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Fail loudly to a human, never silently to the user
&lt;/h2&gt;

&lt;p&gt;When the agent is uncertain or a guardrail trips, the worst outcome is confidently shipping a wrong answer. Design an explicit "I'm not sure, here's why" path. A reliable agent that occasionally says &lt;em&gt;"I couldn't verify X, stopping"&lt;/em&gt; earns more trust than a confident one that's wrong 5% of the time on work that matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  The one-paragraph version
&lt;/h2&gt;

&lt;p&gt;Reliability is not a model you buy; it's a discipline you impose. Make success checkable (#1, #6), make tools unambiguous and loud (#2), bound the loop (#3), enforce the dangerous stuff in code rather than in prose (#4), externalize state (#5), and fail loud to a human (#7). Do those and an ordinary model behaves like a dependable one.&lt;/p&gt;

&lt;p&gt;If you want this as a printable checklist plus three more ready-to-paste Claude Code guardrails, I wrote a free field guide — no email required: &lt;strong&gt;&lt;a href="https://earnest-sundae-f82754.netlify.app/field-guide.html" rel="noopener noreferrer"&gt;Ship a Reliable AI Agent — the Free Field Guide&lt;/a&gt;&lt;/strong&gt;. It's the 10-minute version of everything above.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What's the failure mode that bit you hardest building agents? I collect these — the patterns repeat more than you'd think.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>llm</category>
      <category>claude</category>
    </item>
  </channel>
</rss>
