<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Peter Huang</title>
    <description>The latest articles on DEV Community by Peter Huang (@peterverse180).</description>
    <link>https://dev.to/peterverse180</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3960793%2F190b55f0-4768-4527-8d4b-4aea9a5369f7.png</url>
      <title>DEV Community: Peter Huang</title>
      <link>https://dev.to/peterverse180</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/peterverse180"/>
    <language>en</language>
    <item>
      <title>Writing a CLAUDE.md that Claude actually follows</title>
      <dc:creator>Peter Huang</dc:creator>
      <pubDate>Wed, 10 Jun 2026 06:19:16 +0000</pubDate>
      <link>https://dev.to/peterverse180/writing-a-claudemd-that-claude-actually-follows-4llo</link>
      <guid>https://dev.to/peterverse180/writing-a-claudemd-that-claude-actually-follows-4llo</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — CLAUDE.md is powerful, but most people fill it with vague preferences that Claude acknowledges and then ignores. The instructions that stick are specific, verifiable, and binary — not stylistic. This post shows the difference, with examples.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The CLAUDE.md that got ignored
&lt;/h2&gt;

&lt;p&gt;My first CLAUDE.md had 30 lines. Things like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Write clean, readable code.
- Keep functions small and focused.
- Follow best practices.
- Be concise in explanations.
- Don't over-engineer solutions.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude read it. Every session it would acknowledge the file. And then it would write 400-line explanations, functions that did four things, and solutions with three layers of abstraction nobody asked for.&lt;/p&gt;

&lt;p&gt;I blamed Claude. But the real problem was me: I'd written instructions that I couldn't have enforced even if I were reviewing a junior engineer's PR. "Be concise" isn't actionable. It's an aspiration. And aspirations don't constrain LLM behavior — they just get absorbed into the same ambient politeness that produces confident, plausible, wrong output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why most CLAUDE.md instructions don't stick
&lt;/h2&gt;

&lt;p&gt;An LLM follows instructions the way a very eager-to-please person follows vague advice: by interpreting them in whatever way is consistent with what they were already going to do. "Write clean code" is consistent with almost anything. "Don't over-engineer" depends entirely on what counts as over-engineering, which the model will assess using its own judgment, which is the thing you were trying to override.&lt;/p&gt;

&lt;p&gt;The instructions that actually change behavior share a few properties:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. They're binary, not scalar.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Be concise" is scalar — there's a spectrum and the model decides where on it to land. "Do not write an introduction paragraph that summarizes what you're about to do" is binary. Either there's an intro paragraph or there isn't. Binary instructions are enforceable; scalar ones get interpreted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. They name a specific artifact or action.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Keep functions small" is vague. "If a function exceeds 40 lines, split it before returning" names a specific thing Claude can check. The more you describe the output rather than the quality, the more Claude can follow it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. They have no wiggle room for judgment.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Prefer composition over inheritance" invites Claude to decide when composition is preferable. "Do not use class inheritance in this repo — use composition" closes the judgment loop. Instructions that leave room for discretion will be filled with the model's discretion.&lt;/p&gt;

&lt;h2&gt;
  
  
  A before/after on common instructions
&lt;/h2&gt;

&lt;p&gt;Here's a CLAUDE.md rewrite that applies those three properties:&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; &lt;code&gt;- Write concise explanations.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt; &lt;code&gt;- When I ask a question, answer in 3 sentences or fewer unless I explicitly ask you to elaborate. No bullet lists in conversational responses.&lt;/code&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; &lt;code&gt;- Follow the project's code style.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt; &lt;code&gt;- Before writing any code, read the nearest existing file in that directory and match its import style, spacing, and naming exactly. Do not introduce new patterns.&lt;/code&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; &lt;code&gt;- Don't add features I didn't ask for.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt; &lt;code&gt;- Do not modify any file I didn't mention in my request. If you think an adjacent change would help, describe it in a sentence at the end — do not make it.&lt;/code&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; &lt;code&gt;- Be careful with database changes.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt; &lt;code&gt;- Never generate a migration without a rollback. Every migration file must have both an \&lt;/code&gt;up&lt;code&gt;and a \&lt;/code&gt;down&lt;code&gt;function before you show it to me.&lt;/code&gt;&lt;/p&gt;




&lt;p&gt;The right column isn't longer because it's more thorough. It's longer because specificity takes more words. But each instruction on the right has exactly one interpretation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two categories of CLAUDE.md instructions
&lt;/h2&gt;

&lt;p&gt;After rewriting mine a few times, I've found that instructions fall into two useful categories:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Constraints on output&lt;/strong&gt; — things Claude must or must not produce. These work well when they're binary and verifiable at a glance. "Do not add comments to code unless I ask" is a constraint on output. So is "all functions must have return type annotations" or "never produce a file over 200 lines without splitting."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Procedural instructions&lt;/strong&gt; — steps Claude must take before doing the main task. "Before writing a function, read the test file for that module" is procedural. So is "before answering a question about the codebase, run grep for the relevant symbol and cite the file:line." Procedural instructions work because they give Claude a concrete first step that shapes what it does next.&lt;/p&gt;

&lt;p&gt;What doesn't work well: value statements, quality adjectives, relative preferences ("prefer X", "try to Y", "consider Z"). These are absorbed. They don't constrain.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to put at the top
&lt;/h2&gt;

&lt;p&gt;CLAUDE.md instructions have an effective range. The further down the file, the more likely they are to be deprioritized under a long task. Put the most important constraints at the top — not as an introduction or preamble, but as the first actual rules Claude encounters.&lt;/p&gt;

&lt;p&gt;My current CLAUDE.md opens with three lines that are binary, specific, and cover the mistakes I'd otherwise see every session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Rules (always follow)&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; Do not modify files I did not explicitly mention. If a related change seems useful, list it at the end and wait.
&lt;span class="p"&gt;2.&lt;/span&gt; Do not add code comments unless I ask. Well-named variables explain themselves.
&lt;span class="p"&gt;3.&lt;/span&gt; When showing diffs or edits, show only the changed lines plus 2 lines of context — no full-file reprints.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Those three lines have changed my sessions more than the other 40 instructions combined.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing whether it works
&lt;/h2&gt;

&lt;p&gt;You'll know your CLAUDE.md is working when you stop seeing the behavior you wrote it to prevent. That sounds obvious, but most people write CLAUDE.md once, never check whether it changed anything, and assume it did.&lt;/p&gt;

&lt;p&gt;A practical test: after updating CLAUDE.md, trigger the exact scenario the new instruction was meant to handle. Write a request that would previously have produced the unwanted behavior. If it still appears, the instruction isn't binary enough, isn't specific enough, or is too far down the file. Revise until it's gone.&lt;/p&gt;

&lt;p&gt;CLAUDE.md is not documentation. It's configuration. Treat it like a test suite: if the behavior you're trying to prevent still appears, the config is wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  The free agent + where to go deeper
&lt;/h2&gt;

&lt;p&gt;If you're building a real CLAUDE.md-driven workflow, the same design philosophy applies to subagents: narrow scope, binary constraints, explicit refusal rules. The free &lt;code&gt;shipping-coach&lt;/code&gt; agent in this repo is a working example of that approach — it's a 50-line &lt;code&gt;.md&lt;/code&gt; file worth reading for its structure, not just its functionality:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://github.com/allcanprophesy-ops/claude-code-shipping-coach" rel="noopener noreferrer"&gt;github.com/allcanprophesy-ops/claude-code-shipping-coach&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A few more agents built the same way (&lt;code&gt;pr-surgeon&lt;/code&gt;, &lt;code&gt;regression-sentinel&lt;/code&gt;) are linked from the README if you want to see how the pattern scales beyond a single agent. But start by reading your own CLAUDE.md and asking: which of these instructions is binary? Which ones leave room for the model to decide? Fix the second category first.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's the one instruction in your CLAUDE.md that you've tried multiple phrasings for and it still doesn't work? Drop it in the comments — I'll suggest a binary rewrite.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claude</category>
      <category>ai</category>
      <category>productivity</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Why Claude Code never runs your subagent</title>
      <dc:creator>Peter Huang</dc:creator>
      <pubDate>Wed, 03 Jun 2026 06:30:07 +0000</pubDate>
      <link>https://dev.to/peterverse180/why-claude-code-never-runs-your-subagent-4afm</link>
      <guid>https://dev.to/peterverse180/why-claude-code-never-runs-your-subagent-4afm</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; ‚Äî Claude Code's &lt;code&gt;description&lt;/code&gt; field is not a human-readable summary. It's the routing signal the dispatcher reads to decide whether to invoke your subagent. Write it like documentation and your agent never fires. Write it in trigger phrases and it runs when it should.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The agent that sat there doing nothing
&lt;/h2&gt;

&lt;p&gt;I wrote a subagent I was proud of. Solid instructions, tight scope, the works. Dropped it in &lt;code&gt;~/.claude/agents/&lt;/code&gt;, opened a project, did exactly the kind of task it was built for ‚Äî and nothing happened. Claude just did the work itself, inline, ignoring the agent entirely.&lt;/p&gt;

&lt;p&gt;I assumed I'd misconfigured something. I hadn't. The agent was fine. The problem was the one field I'd treated as an afterthought: the &lt;code&gt;description&lt;/code&gt;. I'd written it the way you'd write a README line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;An expert code reviewer that analyzes your code for quality and bugs.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That sentence is for a human browsing a list. It is almost useless to the thing that actually decides whether to call my agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The description is a routing signal, not a summary
&lt;/h2&gt;

&lt;p&gt;Here's the part the docs don't emphasize enough: Claude Code has a &lt;strong&gt;dispatcher&lt;/strong&gt; that, on each turn, looks at what you asked for and decides whether any subagent should handle it. The only thing it has to make that decision ‚Äî before reading the agent's body at all ‚Äî is the &lt;code&gt;description&lt;/code&gt; field.&lt;/p&gt;

&lt;p&gt;So the description isn't documentation. It's a &lt;strong&gt;classifier input&lt;/strong&gt;. Its entire job is to help the dispatcher answer one question: &lt;em&gt;"Does this user's request match this agent?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;"An expert code reviewer that analyzes your code for quality and bugs" answers that badly. It describes an &lt;em&gt;identity&lt;/em&gt; ("an expert reviewer"), not a &lt;em&gt;trigger&lt;/em&gt;. The dispatcher is trying to match your request ‚Äî "can you check this before I push?" ‚Äî against the description, and "expert code reviewer" gives it very little surface to match on.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a triggering description looks like
&lt;/h2&gt;

&lt;p&gt;Compare. Here's the same agent, rewritten so the dispatcher can actually route to it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Use when the user is about to commit, push, or open a PR and&lt;/span&gt;
  &lt;span class="s"&gt;wants a final check on their changes. Triggers on "review my changes",&lt;/span&gt;
  &lt;span class="s"&gt;"is this ready", "check before I push", "anything I missed". Reviews the&lt;/span&gt;
  &lt;span class="s"&gt;current diff for bugs and risk ‚Äî not style or naming.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Look at what changed:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;It leads with *when to use it&lt;/strong&gt;* ‚Äî "Use when the user is about to commit, push, or open a PR." The dispatcher is matching situations, so describe the situation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It lists literal trigger phrases&lt;/strong&gt; ‚Äî "review my changes", "is this ready", "check before I push". These are the actual words a user types. Giving the dispatcher real phrasings to match against is the single biggest improvement you can make.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It states an anti-trigger&lt;/strong&gt; ‚Äî "not style or naming." This stops the agent from grabbing requests that belong to a different agent (or to Claude itself). Scoping &lt;em&gt;out&lt;/em&gt; is as valuable as scoping in.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's the whole trick. The body of the agent can stay exactly the same. Just the description determines whether it ever gets a chance to run.&lt;/p&gt;

&lt;h2&gt;
  
  
  The anatomy of a description that fires
&lt;/h2&gt;

&lt;p&gt;A reliable pattern, in order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;description: Use [PROACTIVELY] when [situation]. Triggers on [literal user
phrases]. [What it does in one clause]. [Anti-trigger: when NOT to use it].
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;Use PROACTIVELY when...&lt;/code&gt;&lt;/strong&gt; ‚Äî the word &lt;code&gt;PROACTIVELY&lt;/code&gt; is a real signal. It nudges the dispatcher to offer the agent without the user explicitly asking, for situations where that's wanted (a pre-commit check, say). Use it deliberately; don't sprinkle it on everything or every agent becomes pushy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;Triggers on "..."&lt;/code&gt;&lt;/strong&gt; ‚Äî the highest-leverage part. Put 3‚Äì5 real phrasings a user would actually type. Think about how &lt;em&gt;you&lt;/em&gt; would ask for this thing on a tired Friday, not how you'd formally describe it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One clause on what it does&lt;/strong&gt; ‚Äî enough for the dispatcher to disambiguate between two similar agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An anti-trigger&lt;/strong&gt; ‚Äî "not X", "does not handle Y". Prevents overlap and false fires.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to test whether it routes
&lt;/h2&gt;

&lt;p&gt;Don't guess. After writing the description, open Claude Code and &lt;em&gt;say the trigger phrase you put in the description.&lt;/em&gt; If the agent fires, the routing works. If it doesn't, your description and your real phrasing have drifted apart ‚Äî fix the description to match how you actually talk.&lt;/p&gt;

&lt;p&gt;A good loop: use the agent for a week, notice the times you &lt;em&gt;wanted&lt;/em&gt; it and it didn't fire, and add those exact phrasings to the &lt;code&gt;Triggers on&lt;/code&gt; list. The description is a living classifier; tune it against reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  See it in a real agent
&lt;/h2&gt;

&lt;p&gt;If you want a working reference, my free, MIT-licensed Claude Code agent has a description written exactly this way ‚Äî situation, trigger phrases, anti-triggers, and a deliberate &lt;code&gt;PROACTIVELY&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;üëâ &lt;strong&gt;&lt;a href="https://github.com/allcanprophesy-ops/claude-code-shipping-coach" rel="noopener noreferrer"&gt;github.com/allcanprophesy-ops/claude-code-shipping-coach&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp &lt;/span&gt;shipping-coach.md ~/.claude/agents/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open the file and read the frontmatter ‚Äî the &lt;code&gt;description&lt;/code&gt; is doing real routing work, and it's the part worth copying before anything else. A few more agents built the same way (&lt;code&gt;pr-surgeon&lt;/code&gt;, &lt;code&gt;regression-sentinel&lt;/code&gt;, &lt;code&gt;test-gap-hunter&lt;/code&gt;) are linked from the README, but start by reading one description closely. It's the most important line in the file.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you written a Claude Code agent that just never fires? Paste its &lt;code&gt;description&lt;/code&gt; in the comments and I'll suggest how to make it trigger.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claude</category>
      <category>ai</category>
      <category>productivity</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Your AI writes PR descriptions from your commit messages. That's the bug.</title>
      <dc:creator>Peter Huang</dc:creator>
      <pubDate>Sun, 31 May 2026 06:37:43 +0000</pubDate>
      <link>https://dev.to/peterverse180/your-ai-writes-pr-descriptions-from-your-commit-messages-thats-the-bug-795</link>
      <guid>https://dev.to/peterverse180/your-ai-writes-pr-descriptions-from-your-commit-messages-thats-the-bug-795</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; ‚Äî Commit messages describe your &lt;em&gt;intentions&lt;/em&gt;. The diff describes &lt;em&gt;reality&lt;/em&gt;. They drift apart over the life of a branch, and most AI PR-description tools summarize the wrong one. A good PR agent reads the diff. Here's the design, and a free agent to start from.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The PR description that lied
&lt;/h2&gt;

&lt;p&gt;A reviewer pinged me on a PR last year: &lt;em&gt;"The description says this adds rate limiting, but I'm looking at the diff and it also changes how we hash session tokens. Was that intentional?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It was intentional. I'd done both on the same branch. But my PR description only mentioned the rate limiting ‚Äî because I'd generated it from my commit messages, and my commit messages were &lt;code&gt;wip&lt;/code&gt;, &lt;code&gt;rate limit middleware&lt;/code&gt;, &lt;code&gt;fix&lt;/code&gt;, &lt;code&gt;fix again&lt;/code&gt;, and &lt;code&gt;tweak&lt;/code&gt;. The token-hashing change rode in under &lt;code&gt;fix again&lt;/code&gt;. The tool that wrote my description faithfully summarized my commits, and my commits faithfully hid half of what I'd actually done.&lt;/p&gt;

&lt;p&gt;The reviewer caught it. But the whole point of a PR description is that the reviewer &lt;em&gt;shouldn't have to&lt;/em&gt; reconstruct the change from the diff themselves. That's the job I was supposed to have done.&lt;/p&gt;

&lt;h2&gt;
  
  
  Commits are intentions; the diff is reality
&lt;/h2&gt;

&lt;p&gt;Here's the core problem with generating PR descriptions from commit messages: &lt;strong&gt;a commit message is what you believed you were doing at the moment you typed it.&lt;/strong&gt; It's written before the change is finished, often mid-thought, frequently as &lt;code&gt;fix&lt;/code&gt; or &lt;code&gt;address review comments&lt;/code&gt;. It captures intent, badly, at a point in time.&lt;/p&gt;

&lt;p&gt;The diff against the base branch is different. It's the complete, current, factual statement of &lt;em&gt;what will actually change when this merges.&lt;/em&gt; Every renamed function, every altered return shape, every migration, every accidental &lt;code&gt;console.log&lt;/code&gt; ‚Äî all of it is in the diff, and none of it lies, because the diff is the thing being merged.&lt;/p&gt;

&lt;p&gt;When an AI tool paraphrases your commit messages, it's summarizing a summary ‚Äî one written hastily, by you, before you were done. Garbage in, plausible-sounding garbage out. The description reads fine. It's just not true.&lt;/p&gt;

&lt;h2&gt;
  
  
  A PR agent should read the diff
&lt;/h2&gt;

&lt;p&gt;The fix is almost embarrassingly simple to state: have the agent read &lt;code&gt;git diff &amp;lt;base&amp;gt;...HEAD&lt;/code&gt;, not &lt;code&gt;git log&lt;/code&gt;. But there are a few design choices that separate a PR agent you trust from one that produces filler.&lt;/p&gt;

&lt;p&gt;Here's the shape of one (this is a real Claude Code subagent; the structure matters more than the exact wording):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pr-surgeon&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;when&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;about&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;open&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;or&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;push&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;pull&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;request.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Reads&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the"&lt;/span&gt;
  &lt;span class="s"&gt;actual diff against the base branch and writes a tight PR title + body&lt;/span&gt;
  &lt;span class="s"&gt;with a real test plan. Triggers on "open a PR", "PR description",&lt;/span&gt;
  &lt;span class="s"&gt;"ready to merge".&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Bash, Read, Grep, Glob&lt;/span&gt;
&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;inherit&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

You write PR descriptions that reviewers actually read.

&lt;span class="gu"&gt;## Procedure&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; Find the base branch (gh pr view --json baseRefName, else origin HEAD,
   else main).
&lt;span class="p"&gt;2.&lt;/span&gt; Read the FULL diff: &lt;span class="sb"&gt;`git diff &amp;lt;base&amp;gt;...HEAD`&lt;/span&gt;. Also read the commit
   list ‚Äî but the diff is the source of truth. Commit messages lie; diffs
   don't.
&lt;span class="p"&gt;3.&lt;/span&gt; Identify the ONE thing this PR does. If it does more than one thing,
   say so explicitly under a "Note to reviewer" heading ‚Äî do not hide it.

&lt;span class="gu"&gt;## Output&lt;/span&gt;

&lt;span class="gu"&gt;### Title  ‚Äî &amp;lt;70 chars, imperative, matches the repo's recent PR style&amp;gt;&lt;/span&gt;
&lt;span class="gu"&gt;### What changed ‚Äî 2‚Äì5 bullets, each a concrete behavior change, not a file&lt;/span&gt;
&lt;span class="gu"&gt;### Why ‚Äî the user-visible reason. If you can't find it, ASK. Don't invent.&lt;/span&gt;
&lt;span class="gu"&gt;### Test plan ‚Äî a checklist a reviewer can actually run. Real commands.&lt;/span&gt;
&lt;span class="gu"&gt;### Risk / rollback ‚Äî one line: what breaks if this is wrong, how to revert.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things in there are doing the real work:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. "The diff is the source of truth. Commit messages lie."&lt;/strong&gt; This single instruction is the whole thesis. The agent is allowed to &lt;em&gt;read&lt;/em&gt; the commits for context, but it must reconcile them against the diff and trust the diff. That's what would have caught my token-hashing change ‚Äî it's in the diff whether or not a commit mentions it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. "If it does more than one thing, say so explicitly."&lt;/strong&gt; Agents, like people, want to present a clean single-purpose story. An honest PR agent surfaces scope creep instead of smoothing it over. The "Note to reviewer" heading is where the token-hashing change would have been forced into the open.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. "Why ‚Äî if you can't find it, ASK. Don't invent."&lt;/strong&gt; This is the anti-hallucination guard. The diff tells you &lt;em&gt;what&lt;/em&gt; changed but rarely &lt;em&gt;why&lt;/em&gt;. A weaker agent fills that vacuum with confident fiction ("this refactor improves maintainability"). A good one admits the gap and asks you, because a made-up rationale is worse than a blank.&lt;/p&gt;

&lt;h2&gt;
  
  
  The section everyone skips: the test plan
&lt;/h2&gt;

&lt;p&gt;Most PR descriptions ‚Äî human or AI ‚Äî stop at "what changed." But the highest-value part of a PR description for the person reviewing it is the &lt;strong&gt;test plan&lt;/strong&gt;: a concrete, runnable checklist of how to verify the change does what it claims.&lt;/p&gt;

&lt;p&gt;Not "tested locally." That's noise. A real test plan looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- [ ] `npm run test:rate-limit` passes
- [ ] Hit `/api/login` 6√ó in 10s ‚Üí 6th returns 429
- [ ] Existing session tokens still validate after deploy (no forced logout)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That last line is exactly the kind of thing a diff-reading agent can generate and a commit-summarizing agent never could ‚Äî because the token-hashing change is &lt;em&gt;in the diff&lt;/em&gt;, so the agent knows to tell the reviewer to check that existing sessions survive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build your own, or start from a free one
&lt;/h2&gt;

&lt;p&gt;The pattern generalizes to any "summarize a change" agent: &lt;strong&gt;read the artifact that represents reality (the diff, the schema, the built output), not the artifact that represents intention (commit messages, ticket titles, your own memory).&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want a starting point, my free, MIT-licensed Claude Code agent is a good template for the &lt;em&gt;structure&lt;/em&gt; of a focused agent ‚Äî tool scoping, fixed output format, explicit refusal rules:&lt;/p&gt;

&lt;p&gt;üëâ &lt;strong&gt;&lt;a href="https://github.com/allcanprophesy-ops/claude-code-shipping-coach" rel="noopener noreferrer"&gt;github.com/allcanprophesy-ops/claude-code-shipping-coach&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp &lt;/span&gt;shipping-coach.md ~/.claude/agents/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's a pre-merge checker rather than a PR writer, but it's built on the same bones, and reading one well-structured agent file teaches you more than any amount of theory. The &lt;code&gt;pr-surgeon&lt;/code&gt; agent sketched above ‚Äî plus a few others (&lt;code&gt;regression-sentinel&lt;/code&gt;, &lt;code&gt;test-gap-hunter&lt;/code&gt;) ‚Äî are linked from that repo's README if you'd rather not build from scratch. But honestly: read the free one first, then decide.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's the worst PR description you've ever had to review ‚Äî or write? I'm collecting examples of where "summarize the commits" goes wrong. Drop one in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claude</category>
      <category>ai</category>
      <category>git</category>
      <category>productivity</category>
    </item>
    <item>
      <title>The best Claude Code agents are defined by what they refuse to do</title>
      <dc:creator>Peter Huang</dc:creator>
      <pubDate>Sun, 31 May 2026 06:09:35 +0000</pubDate>
      <link>https://dev.to/peterverse180/the-best-claude-code-agents-are-defined-by-what-they-refuse-to-do-13p2</link>
      <guid>https://dev.to/peterverse180/the-best-claude-code-agents-are-defined-by-what-they-refuse-to-do-13p2</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; ‚Äî When I write a Claude Code subagent, the most important part isn't the instructions for what it &lt;em&gt;should&lt;/em&gt; do. It's the list of what it must &lt;strong&gt;refuse&lt;/strong&gt; to do. This post explains why, and walks through a real ~50-line agent (free, MIT) that catches the embarrassing stuff in your diff before you merge.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The agent that was too helpful
&lt;/h2&gt;

&lt;p&gt;The first useful-sounding Claude Code subagent I ever wrote was a "code reviewer." The prompt was the obvious thing: &lt;em&gt;"You are a senior engineer. Review this diff thoroughly. Comment on bugs, style, naming, architecture, performance, security, and test coverage."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It worked, technically. It produced a review. A long one. Every single time.&lt;/p&gt;

&lt;p&gt;And that was the problem. A review that flags 23 things trains you to read zero of them. The signal ‚Äî &lt;em&gt;"you left a hardcoded API key in here"&lt;/em&gt; ‚Äî was buried on line 14 between &lt;em&gt;"consider extracting this into a helper"&lt;/em&gt; and &lt;em&gt;"this variable name could be more descriptive."&lt;/em&gt; I started skimming its output. Then I started ignoring it. An agent you ignore is worse than no agent, because you've paid the latency and convinced yourself you "have review covered."&lt;/p&gt;

&lt;p&gt;The fix wasn't a better "what to do" list. It was the opposite.&lt;/p&gt;

&lt;h2&gt;
  
  
  Refusal lists
&lt;/h2&gt;

&lt;p&gt;Here's the reframe that made my agents actually useful: &lt;strong&gt;a good agent has exactly one job, and an explicit list of things it will not do ‚Äî even when it could.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The "will not do" list is the load-bearing part. LLMs are eager. Given any opening to be more thorough, more helpful, more comprehensive, they take it. Left unconstrained, every agent drifts toward the same bloated generalist that comments on everything. The refusal list is what holds the agent to a sharp edge.&lt;/p&gt;

&lt;p&gt;Concretely, a refusal list looks like this (from a pre-merge check agent):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Rules&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Do not autofix. The user fixes; you verify.
&lt;span class="p"&gt;-&lt;/span&gt; Do not comment on naming, design, architecture, or "could be cleaner."
  Other tools do that. Your job is narrower.
&lt;span class="p"&gt;-&lt;/span&gt; Do not pad the report. If there are no blockers, say so in one line.
  A short honest report beats a long padded one.
&lt;span class="p"&gt;-&lt;/span&gt; Do not run anything destructive (db resets, --fix flags that rewrite
  files) without explicit user request.
&lt;span class="p"&gt;-&lt;/span&gt; If a check tool isn't installed, say "skipped: &lt;span class="nt"&gt;&amp;lt;reason&amp;gt;&lt;/span&gt;" ‚Äî do not
  fake a pass.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every line there is closing a door the model would otherwise wander through. "Don't pad the report" exists because the model &lt;em&gt;wants&lt;/em&gt; to look thorough. "Don't fake a pass" exists because the model &lt;em&gt;wants&lt;/em&gt; to give you good news. You are not describing a task; you are fencing in a behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  A real example: a 90-second pre-merge check
&lt;/h2&gt;

&lt;p&gt;Let me make this concrete with an agent I actually use on every project. It has one job: catch the stuff that would embarrass you in PR review, before you open the PR. Leftover &lt;code&gt;console.log&lt;/code&gt;. A hardcoded key. A &lt;code&gt;.skip&lt;/code&gt; you forgot to remove. A &lt;code&gt;.DS_Store&lt;/code&gt; in the diff.&lt;/p&gt;

&lt;p&gt;Here's the shape of it (the full file is &lt;a href="https://github.com/allcanprophesy-ops/claude-code-shipping-coach" rel="noopener noreferrer"&gt;on GitHub, MIT&lt;/a&gt; ‚Äî link at the end):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;shipping-coach&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;as&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;final&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;pre-merge&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;pre-deploy&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;check.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Runs&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;fast,"&lt;/span&gt;
  &lt;span class="s"&gt;opinionated checklist over the diff. Triggers on "ready to ship",&lt;/span&gt;
  &lt;span class="s"&gt;"pre-flight", "before I merge", "final check".&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Bash, Read, Grep, Glob&lt;/span&gt;
&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;inherit&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

You are the last set of eyes before code ships. Be fast, be specific,
be hard to argue with.

&lt;span class="gu"&gt;## Checklist (run in parallel where possible)&lt;/span&gt;

&lt;span class="gu"&gt;### 1. Debug residue&lt;/span&gt;
Search the diff for: console.log, print(, debugger, .only(, .skip(,
new TODO/FIXME. Report only matches ADDED by this diff.

&lt;span class="gu"&gt;### 2. Secret leaks&lt;/span&gt;
Search for API key shapes (sk-, ghp_, AKIA, AIza), .env contents,
credentials in URLs, private keys. Treat any match as stop-the-line.

&lt;span class="gu"&gt;### 3. Type / lint / test status&lt;/span&gt;
Detect the project's check commands from package.json / Makefile /
pyproject.toml. Run typecheck, then lint, then tests. Stop on first fail.

&lt;span class="gu"&gt;### 4. Tracked junk&lt;/span&gt;
Check for .DS_Store, .env, node_modules/, build output that snuck
past .gitignore.

&lt;span class="gu"&gt;## Output format&lt;/span&gt;

&lt;span class="gu"&gt;## Pre-ship report (took &amp;lt;Xs&amp;gt;)&lt;/span&gt;
&lt;span class="gu"&gt;### Blockers (N)        &amp;lt;- empty heading if none, never omit it&lt;/span&gt;
&lt;span class="gu"&gt;### Worth a look (N)&lt;/span&gt;
&lt;span class="gu"&gt;### Passed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three design choices in there are worth calling out, because they're the difference between an agent you trust and one you mute:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. "Report only matches ADDED by this diff."&lt;/strong&gt; Without this, the agent flags every pre-existing &lt;code&gt;console.log&lt;/code&gt; in the repo and the report is instantly noise. Scope is the diff, not the codebase. One sentence, huge signal difference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. A fixed output format with a "Blockers" section that's never omitted.&lt;/strong&gt; Even when there are zero blockers, the heading stays (showing "Blockers (0)"). This sounds pedantic but it's a trust mechanism ‚Äî you learn the report's shape, so you can read it in two seconds and know exactly where to look. Variable-shape output forces re-reading every time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. &lt;code&gt;tools: Bash, Read, Grep, Glob&lt;/code&gt; ‚Äî and nothing else.&lt;/strong&gt; No &lt;code&gt;Write&lt;/code&gt;, no &lt;code&gt;Edit&lt;/code&gt;. The agent &lt;em&gt;cannot&lt;/em&gt; modify your files even if it wanted to, because you didn't give it the tools. Tool scoping is a guardrail you enforce at the schema level, not a promise you hope the prompt keeps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try building your own
&lt;/h2&gt;

&lt;p&gt;The pattern generalizes. Pick any narrow job ‚Äî writing a PR description from the actual diff, finding which behaviors your change might break, auditing dependencies ‚Äî and write the agent as:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;One job&lt;/strong&gt;, stated in a sentence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A &lt;code&gt;description&lt;/code&gt; that the dispatcher will actually route to&lt;/strong&gt; ‚Äî write it in trigger phrases ("when the user says X"), not abstract capability claims.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool scoping&lt;/strong&gt; ‚Äî give it only the tools the job needs. Withhold &lt;code&gt;Write&lt;/code&gt;/&lt;code&gt;Edit&lt;/code&gt; from anything that should only &lt;em&gt;report&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An output format&lt;/strong&gt; ‚Äî so results are pipeable and skimmable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A refusal list&lt;/strong&gt; ‚Äî the doors you're closing. This is the part everyone skips and it's the part that matters.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Drop the file in &lt;code&gt;~/.claude/agents/&lt;/code&gt; and Claude Code picks it up automatically. No framework, no config.&lt;/p&gt;

&lt;h2&gt;
  
  
  The free agent + where to go deeper
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;shipping-coach&lt;/code&gt; agent above is free and MIT-licensed ‚Äî the whole thing is one ~50-line &lt;code&gt;.md&lt;/code&gt; file you can read, fork, and modify:&lt;/p&gt;

&lt;p&gt;üëâ &lt;strong&gt;&lt;a href="https://github.com/allcanprophesy-ops/claude-code-shipping-coach" rel="noopener noreferrer"&gt;github.com/allcanprophesy-ops/claude-code-shipping-coach&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Install is one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp &lt;/span&gt;shipping-coach.md ~/.claude/agents/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then in any repo with uncommitted changes, just say &lt;em&gt;"run the pre-flight check on my diff."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If the pattern clicks and you want more agents built the same way ‚Äî &lt;code&gt;pr-surgeon&lt;/code&gt; (writes PR descriptions from the actual diff), &lt;code&gt;regression-sentinel&lt;/code&gt; (reads a diff asking only "what could this break?"), &lt;code&gt;test-gap-hunter&lt;/code&gt;, and a few others ‚Äî they're linked from the repo's README. But the free one is genuinely standalone; start there.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's the one task in your workflow you wish was automated but isn't? I'm collecting edge cases where a narrow agent would help ‚Äî drop them in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claude</category>
      <category>ai</category>
      <category>productivity</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
