<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ian</title>
    <description>The latest articles on DEV Community by Ian (@ianymu).</description>
    <link>https://dev.to/ianymu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3816899%2Ff9df278a-c359-4415-951b-01073e2cef1e.jpg</url>
      <title>DEV Community: Ian</title>
      <link>https://dev.to/ianymu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ianymu"/>
    <language>en</language>
    <item>
      <title>I gave Claude Code internet eyes (and didn't have to build the tool myself)</title>
      <dc:creator>Ian</dc:creator>
      <pubDate>Thu, 21 May 2026 14:40:43 +0000</pubDate>
      <link>https://dev.to/ianymu/i-gave-claude-code-internet-eyes-and-didnt-have-to-build-the-tool-myself-2o42</link>
      <guid>https://dev.to/ianymu/i-gave-claude-code-internet-eyes-and-didnt-have-to-build-the-tool-myself-2o42</guid>
      <description>&lt;p&gt;Last week I asked Claude Code a question that should have been trivial.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Find me three recent Hacker News threads complaining about LangChain's debugging story, then summarize the common pain points."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude Code wrote me a confident, well-structured answer. Three threads. Direct quotes. Clean takeaways.&lt;/p&gt;

&lt;p&gt;Every URL was made up. Two of the "users" did not exist. The "quotes" were paraphrased fan fiction.&lt;/p&gt;

&lt;p&gt;This is not a failure of the model. This is a failure of the environment. &lt;strong&gt;Claude Code is blind to anything that is not in your repo or the docs you fed it at session start.&lt;/strong&gt; It does not browse. It does not search. It does not read Twitter or Reddit or YouTube. When you ask it to do those things anyway, it does the only thing it can: it pattern-matches what such an answer probably looks like, and serves it.&lt;/p&gt;

&lt;p&gt;For a few months I had been writing my own Stop-hook (&lt;a href="https://github.com/ianymu" rel="noopener noreferrer"&gt;verify-before-stop&lt;/a&gt;) that intercepts these moments — the agent claiming a task is complete when it has no actual proof. It catches the symptom. It does not cure the disease. The disease is the blindness.&lt;/p&gt;

&lt;p&gt;I went looking for a cure.&lt;/p&gt;




&lt;h2&gt;
  
  
  The pattern: post-2026-05 Claude Code is a sealed room
&lt;/h2&gt;

&lt;p&gt;Anthropic shipped longer Claude Code sessions in early 2026. The model can now hold context across hours of work. The side effect: &lt;strong&gt;longer sessions amplify the blindness&lt;/strong&gt;. The agent goes deeper into a project without ever opening a window to the outside world. When you finally ask it "is there a known issue with this library?", it has been trained for the entire session to act confidently — and a confident hallucination is worse than a confused one.&lt;/p&gt;

&lt;p&gt;If you have been using Claude Code seriously for the last month, you have probably seen at least one of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent confidently cites a Stack Overflow answer that does not exist.&lt;/li&gt;
&lt;li&gt;Agent "checks Twitter" and reports sentiment about a product, all invented.&lt;/li&gt;
&lt;li&gt;Agent claims a fix is "consistent with the latest documentation" — when its training data is six months stale.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;WebFetch helps a little. It lets you hand the agent one URL. But the agent cannot find the URL on its own. You have to know what to feed it. Playwright MCP is heavier — useful for end-to-end tests, overkill for "summarize this tweet." Neither covers the wide platform mix where the actual signal lives: Reddit, YouTube transcripts, Twitter discussions, GitHub issues.&lt;/p&gt;

&lt;p&gt;I wanted something that worked the way I think about the problem. "Agent, go look at the actual thing on the actual platform, then come back."&lt;/p&gt;




&lt;h2&gt;
  
  
  Discovery
&lt;/h2&gt;

&lt;p&gt;I was browsing &lt;code&gt;gh search repos --topic claude-code&lt;/code&gt; and landed on a project called &lt;strong&gt;Agent-Reach&lt;/strong&gt; by a Chinese indie developer who goes by Panniantong.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;20,025 stars&lt;/li&gt;
&lt;li&gt;MIT licensed&lt;/li&gt;
&lt;li&gt;Last commit two days ago&lt;/li&gt;
&lt;li&gt;A README that opens with the most honest pitch I have read all year:&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;AI Agents can already write code, edit docs, and manage projects — but the moment you ask one to "look something up online," it goes blind.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The README is in Chinese. The entire English-speaking developer world has been complaining about this exact problem on r/ClaudeAI for six months without finding the tool that solves it, because the tool is sitting one search bar away in a language they do not read.&lt;/p&gt;

&lt;p&gt;I cloned it, ran the one-line install, and an hour later Claude Code could read Twitter, search Reddit, extract YouTube subtitles, and pull GitHub issue threads on its own.&lt;/p&gt;

&lt;p&gt;Here is the install command, in case you want to follow along:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Install Agent Reach: https://raw.githubusercontent.com/Panniantong/Agent-Reach/main/docs/install.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You paste that into Claude Code. The agent walks itself through the rest.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three things I actually used it for
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. "Has anyone hit this same React 19 hydration bug?"&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Search Reddit for posts about "React 19 hydration mismatch SSR error" from the last month.
&amp;gt; Read the top three and summarize the common root causes.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code called &lt;code&gt;rdt search&lt;/code&gt; (the underlying tool is &lt;code&gt;rdt-cli&lt;/code&gt;, an open-source Reddit CLI with cookie auth — Reddit started requiring auth in 2024, so the cookie flow is required), pulled real threads, read the comments, and came back with three actual root causes from real engineers. Two of them I had not considered. One of them was my actual bug.&lt;/p&gt;

&lt;p&gt;This was the first time in a week I had Claude Code give me an answer that did not require me to verify it by hand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. "Summarize this 40-minute YouTube tutorial without me watching it."&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; yt-dlp the subtitles for https://youtube.com/watch?v=... then give me a 200-word summary.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Subtitles flow through &lt;code&gt;yt-dlp&lt;/code&gt; — same tool people have been using for years, just wired up so the agent can call it directly. Works on YouTube, Bilibili, and the 1,800 other sites yt-dlp supports. No API key. The summary was clean and the agent quoted actual lines from the video.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. "What are people on Twitter saying about Claude 4.6 vs GPT-5.4?"&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; twitter search "claude 4.6 vs gpt 5.4" — top 20 recent — summarize the consensus.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This one uses &lt;code&gt;twitter-cli&lt;/code&gt; with cookie auth (the official Twitter API costs $215/month for moderate use; cookie auth is free, and you should use a burner account because the cookie is full session access). Twenty real tweets came back. The summary distinguished between "developers complaining about Claude refusals" and "marketers excited about GPT speed" — a distinction the model would have flattened into one fuzzy take if it had been hallucinating.&lt;/p&gt;

&lt;p&gt;That third one is where it clicked. &lt;strong&gt;The value of Agent-Reach is not that it reads Twitter. It is that the agent stops needing to guess.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why this matters now, specifically
&lt;/h2&gt;

&lt;p&gt;Three things converged in the last 90 days:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code sessions got longer.&lt;/strong&gt; You spend more uninterrupted time with the agent. Each hour of blindness compounds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The web got more anti-scraping.&lt;/strong&gt; Reddit, X, and Bilibili all tightened. Naive WebFetch returns less and less.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open-source CLI wrappers got mature.&lt;/strong&gt; &lt;code&gt;yt-dlp&lt;/code&gt;, &lt;code&gt;rdt-cli&lt;/code&gt;, &lt;code&gt;twitter-cli&lt;/code&gt;, &lt;code&gt;gh CLI&lt;/code&gt;, &lt;code&gt;xhs-cli&lt;/code&gt;, &lt;code&gt;mcporter&lt;/code&gt; — each is independently maintained, free, and battle-tested. Agent-Reach is the scaffolding that wires them all up in one install command.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The bet of the project is bold and correct: do not build a wrapper. Be the install script. Once the agent has the upstream CLIs and a SKILL.md that tells it which command to reach for, the agent operates the tools directly. There is no Agent-Reach in the runtime path. It just put the right things on disk.&lt;/p&gt;

&lt;p&gt;This means: zero ongoing dependency on the maintainer. If &lt;code&gt;twitter-cli&lt;/code&gt; updates, you update &lt;code&gt;twitter-cli&lt;/code&gt;. If Agent-Reach disappeared tomorrow, your agent would keep working.&lt;/p&gt;




&lt;h2&gt;
  
  
  How it compares
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Strength&lt;/th&gt;
&lt;th&gt;Limitation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;WebFetch&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Built into Claude Code, no install&lt;/td&gt;
&lt;td&gt;You have to know the URL. Agent cannot discover or search.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Playwright MCP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full browser control, scriptable flows&lt;/td&gt;
&lt;td&gt;Heavy. Overkill for "read a tweet." Slow to spin up.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Paid SaaS (Mention, Brand24, Apify)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Polished, supported&lt;/td&gt;
&lt;td&gt;$50–$200/month, third-party data flow, no agent-native command surface.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agent-Reach&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One install, native CLI per platform, agent calls upstream tools directly&lt;/td&gt;
&lt;td&gt;Cookie-based platforms need a burner account; some platforms (Bilibili from a US server) need a proxy.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For my workflow — Claude Code in a terminal, daily-driver — Agent-Reach won on every axis that matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  A small note on the broader pattern
&lt;/h2&gt;

&lt;p&gt;I have been writing a different kind of Claude Code add-on for a few months: a Stop-hook called &lt;code&gt;verify-before-stop&lt;/code&gt; that fires when the agent says "done" without proof. It is the same family of intervention as Agent-Reach, just on the other end of the pipeline. Agent-Reach gives the agent eyes before it thinks; the Stop-hook checks that it actually used them before it claims to be done. Different surfaces of the same belief: &lt;strong&gt;Claude Code should not be allowed to fabricate its way to "complete."&lt;/strong&gt; If you have ideas about the Stop-hook angle I'm happy to talk; the bigger and more immediately useful tool is the one this post is actually about.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/Panniantong/Agent-Reach" rel="noopener noreferrer"&gt;github.com/Panniantong/Agent-Reach&lt;/a&gt; (MIT, 20k stars, active maintainer)&lt;/p&gt;

&lt;p&gt;Install (paste into Claude Code, Cursor, OpenClaw, or any agent that runs shell commands):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Install Agent Reach: https://raw.githubusercontent.com/Panniantong/Agent-Reach/main/docs/install.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The original README is in Chinese. I submitted an English README PR last night (&lt;a href="https://github.com/Panniantong/Agent-Reach/pull/301" rel="noopener noreferrer"&gt;PR #301&lt;/a&gt;) — feel free to read either version. The author, Panniantong, is an indie developer building this in their spare time. If you find this useful and want to help an underrated piece of open-source plumbing get the English-market attention it deserves, &lt;strong&gt;leave a comment here or DM the author at &lt;a href="https://x.com/Neo_Reidlab" rel="noopener noreferrer"&gt;@Neo_Reidlab&lt;/a&gt;&lt;/strong&gt; — they are open to coordinating an English-market launch if there's appetite.&lt;/p&gt;

&lt;p&gt;Star the repo. It is the cheapest way I know to give the agents you depend on a pair of eyes.&lt;/p&gt;

&lt;p&gt;— Ian&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I spawned 25 Claude Code subagents in one night. Here's what I learned.</title>
      <dc:creator>Ian</dc:creator>
      <pubDate>Wed, 20 May 2026 23:23:24 +0000</pubDate>
      <link>https://dev.to/ianymu/i-spawned-25-claude-code-subagents-in-one-night-heres-what-i-learned-15ka</link>
      <guid>https://dev.to/ianymu/i-spawned-25-claude-code-subagents-in-one-night-heres-what-i-learned-15ka</guid>
      <description>&lt;p&gt;I gave myself $1,000 and 24 hours to ship something live. By hour 8 I had spawned roughly 25 Claude Code subagents in parallel, built 37 Apify Actors, and pushed all of them into Apify's publish pipeline. As of this morning, 5 are LIVE on the Apify Store; the other 31 are sitting in a queue waiting for Apify's 5-actors-per-day publishing quota to drip them out over the next week.&lt;/p&gt;

&lt;p&gt;This is the postmortem. Real numbers, real prompts, real failures. No "10x productivity" framing — just what worked and what didn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  What got built
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;37 Apify Actors total&lt;/strong&gt; — each one is a &lt;code&gt;src/main.js&lt;/code&gt;, a &lt;code&gt;.actor/input_schema.json&lt;/code&gt;, a &lt;code&gt;.actor/dataset_schema.json&lt;/code&gt;, an &lt;code&gt;actor.json&lt;/code&gt;, a README, and an &lt;code&gt;apify push&lt;/code&gt; to the platform.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5 LIVE&lt;/strong&gt; as of this writing (&lt;code&gt;apify.com/ianymu&lt;/code&gt;): &lt;code&gt;llms-txt-converter&lt;/code&gt;, &lt;code&gt;claudemd-security-auditor&lt;/code&gt;, &lt;code&gt;gh-issue-to-claude-prompts&lt;/code&gt;, &lt;code&gt;mcp-server-catalog&lt;/code&gt;, &lt;code&gt;claudemd-generator&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;31 BUILT but not yet public&lt;/strong&gt; — published-state set to private, sitting in a daemon's queue, waiting for the quota window.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~25 subagent processes&lt;/strong&gt; spawned over ~8 hours, mostly running 4-at-a-time in the background.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Actors themselves aren't the interesting part. The interesting part is how a single human in the loop can keep 25 background processes from drifting into garbage.&lt;/p&gt;

&lt;h2&gt;
  
  
  The four things that worked
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Constrained prompts, not vague ones
&lt;/h3&gt;

&lt;p&gt;Every subagent got a prompt of about 200-300 words with explicit, non-negotiable constraints. Here's a redacted skeleton of what I actually sent (this one was for the &lt;code&gt;mcp-server-catalog&lt;/code&gt; actor):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;You&lt;/span&gt; &lt;span class="nx"&gt;are&lt;/span&gt; &lt;span class="nx"&gt;building&lt;/span&gt; &lt;span class="nx"&gt;one&lt;/span&gt; &lt;span class="nx"&gt;Apify&lt;/span&gt; &lt;span class="nx"&gt;Actor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`mcp-server-catalog`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="nx"&gt;Constraints&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;src&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;js&lt;/span&gt; &lt;span class="nx"&gt;uses&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;apify&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="nx"&gt;Actor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="nx"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="nx"&gt;ESM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;Input&lt;/span&gt; &lt;span class="nx"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;maxServers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;integer&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;keywordFilter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt; &lt;span class="nx"&gt;optional&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;Output&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;one&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt; &lt;span class="nx"&gt;per&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kd"&gt;with&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt; &lt;span class="nx"&gt;exact&lt;/span&gt; &lt;span class="nx"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;fullName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;stars&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;qualityScore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;int &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nx"&gt;license&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;scoreBreakdown&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;stars&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;recency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;license&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;activity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;int&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;Sources&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="nx"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;punkpeye&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;awesome&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;mcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;servers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;modelcontextprotocol&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;servers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;wong2&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;awesome&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;mcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;servers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="nx"&gt;Dedupe&lt;/span&gt; &lt;span class="nx"&gt;by&lt;/span&gt; &lt;span class="nx"&gt;fullName&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;README&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;md&lt;/span&gt; &lt;span class="nx"&gt;must&lt;/span&gt; &lt;span class="nx"&gt;include&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt; &lt;span class="nx"&gt;purpose&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="nx"&gt;example&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="nx"&gt;example&lt;/span&gt;
  &lt;span class="kd"&gt;with&lt;/span&gt; &lt;span class="nx"&gt;one&lt;/span&gt; &lt;span class="nx"&gt;real&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Try it&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="nx"&gt;link&lt;/span&gt; &lt;span class="nx"&gt;placeholder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;Run&lt;/span&gt; &lt;span class="s2"&gt;`apify push`&lt;/span&gt; &lt;span class="nx"&gt;at&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="nx"&gt;Do&lt;/span&gt; &lt;span class="nx"&gt;not&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt; &lt;span class="s2"&gt;`apify call`&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;costs&lt;/span&gt; &lt;span class="nx"&gt;money&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;Do&lt;/span&gt; &lt;span class="nx"&gt;NOT&lt;/span&gt; &lt;span class="nx"&gt;add&lt;/span&gt; &lt;span class="nx"&gt;tests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;CI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;TypeScript&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;eslint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;or&lt;/span&gt; &lt;span class="nx"&gt;extra&lt;/span&gt; &lt;span class="nx"&gt;files&lt;/span&gt; &lt;span class="nx"&gt;I&lt;/span&gt; &lt;span class="nx"&gt;didn&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;t ask for.
- Done = you can show me the actor page URL and one sample dataset row.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The constraints I learned to put in writing, one by one, as earlier subagents broke them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Do NOT add TypeScript" — one drifted into a &lt;code&gt;tsconfig.json&lt;/code&gt; and a half-converted &lt;code&gt;.ts&lt;/code&gt; file. Cost 20 minutes to clean up.&lt;/li&gt;
&lt;li&gt;"Do NOT run &lt;code&gt;apify call&lt;/code&gt;" — one happily burned ~$0.30 of platform credit running its own actor to "verify it works." It did work. That wasn't the point.&lt;/li&gt;
&lt;li&gt;"Exact dataset shape" — three actors invented their own keys (&lt;code&gt;name&lt;/code&gt; vs &lt;code&gt;fullName&lt;/code&gt;, &lt;code&gt;score&lt;/code&gt; vs &lt;code&gt;qualityScore&lt;/code&gt;). Made the downstream comparison spreadsheet useless until I refactored.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Vague prompts produce vague output, every time. A subagent that's free to interpret will interpret in whatever direction lets it finish faster.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;code&gt;run_in_background: true&lt;/code&gt; was the unlock
&lt;/h3&gt;

&lt;p&gt;The default in the Agent tool is foreground — you wait for the subagent to finish before the next tool call returns. With &lt;code&gt;run_in_background: true&lt;/code&gt;, you spawn it, get a process handle back, and immediately spawn the next one. Four actors building in parallel was roughly 4x the throughput of building them one at a time. Eight in parallel was &lt;em&gt;not&lt;/em&gt; 8x — I think because the model has finite attention for reviewing returning outputs, and they started arriving faster than I could read them.&lt;/p&gt;

&lt;p&gt;The sweet spot in this run was &lt;strong&gt;four parallel subagents&lt;/strong&gt;. Past that, I started missing drift signals.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Self-correction when the parent reviewed
&lt;/h3&gt;

&lt;p&gt;A handful of subagents handed back output that didn't match the spec — wrong dataset shape, an extra &lt;code&gt;tests/&lt;/code&gt; directory I'd told them not to create, a &lt;code&gt;package.json&lt;/code&gt; with dependencies I hadn't listed. In every case, sending the original prompt back with a one-line addendum (&lt;code&gt;"You wrote X. The spec says Y. Fix it."&lt;/code&gt;) got a correct second pass in under a minute. Subagents don't argue.&lt;/p&gt;

&lt;p&gt;What does NOT work: trying to debug what they did wrong. Just re-state the spec.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The daily quota forced a different design
&lt;/h3&gt;

&lt;p&gt;Apify lets free accounts publish 5 actors per day to the public store. &lt;strong&gt;Build throughput was effectively unlimited&lt;/strong&gt; (I could push 37 private actors in an evening). &lt;strong&gt;Publish throughput was hard-capped at 5/day.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The naive flow — "build it, immediately publish it, move on" — broke at actor #6. The platform returned &lt;code&gt;daily-publication-limit-exceeded&lt;/code&gt; and the work stalled.&lt;/p&gt;

&lt;p&gt;The fix was an auto-publish daemon: a Python loop that reads a queue file, tries to PUT each actor to &lt;code&gt;isPublic: true&lt;/code&gt;, recognizes the quota-exceeded error, leaves the actor in the queue, sleeps 10 minutes, and tries again. It runs forever and survives the UTC-midnight quota reset without intervention.&lt;/p&gt;

&lt;p&gt;The core of it (real code, slightly trimmed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;QUOTA_MARKERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;daily-publication-limit-exceeded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;daily publication limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;publication-limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;try_publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;actor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;actor&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;isPublic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;already_public&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;isPublic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;categories&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DEVELOPER_TOOLS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="nf"&gt;derive_seo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;put_actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;published&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;blob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;blob&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;QUOTA_MARKERS&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quota&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;read_queue&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;keep&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;actor_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;try_publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;published&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;already_public&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;keep&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;write_queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keep&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole pattern. Rate-limited API + retry loop + persistent queue file. Nothing clever. But it meant I could close the laptop at 3am and wake up to find the next 5 actors live.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The general lesson&lt;/strong&gt;: a rate-limited dependency changes your whole design. The build pipeline and the publish pipeline have to be decoupled — they can't share a process, because one of them runs at human-typing speed and the other runs at platform-quota speed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two things that didn't work
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Subagents drifting from spec
&lt;/h3&gt;

&lt;p&gt;Three out of ~25 subagents went off-script in a way I didn't catch until reviewing the output. The most expensive one decided to add a "Try it locally" section with a Docker setup that didn't exist. It looked plausible. It would have shipped if I hadn't randomly opened that README.&lt;/p&gt;

&lt;p&gt;After that I added a step: every subagent's README got &lt;code&gt;grep&lt;/code&gt;'d for invented commands, fake URLs, and Docker references before &lt;code&gt;apify push&lt;/code&gt;. Two more were caught that way.&lt;/p&gt;

&lt;p&gt;The pattern: subagents fabricate when the spec has a gap. Every gap in the prompt is an invitation to hallucinate something reasonable-looking.&lt;/p&gt;

&lt;h3&gt;
  
  
  The TODO.md file beat my memory, badly
&lt;/h3&gt;

&lt;p&gt;I kept a &lt;code&gt;TODO.md&lt;/code&gt; in the actor-factory directory and updated it after every state change. Several times during the 8 hours, the human in the loop (a friend on a Discord call) said something like "you forgot the README for &lt;code&gt;ai-tool-stack-detector&lt;/code&gt;" — and I checked the file, and yes, I had forgotten.&lt;/p&gt;

&lt;p&gt;The file was right. My working memory across 25 subagents was not.&lt;/p&gt;

&lt;p&gt;If I were doing this again, the TODO would be a structured JSON state file written automatically by each subagent on completion, not a markdown file I update by hand. But even the hand-updated markdown beat trying to remember.&lt;/p&gt;

&lt;h2&gt;
  
  
  The false positive that saved my reputation
&lt;/h2&gt;

&lt;p&gt;One of the actors I built is &lt;code&gt;claudemd-security-auditor&lt;/code&gt; — it scans GitHub repos for dangerous patterns in &lt;code&gt;CLAUDE.md&lt;/code&gt; files and &lt;code&gt;.claude/hooks/*&lt;/code&gt; scripts. I ran it against three repos to dogfood it. It came back with &lt;strong&gt;one HIGH-severity finding&lt;/strong&gt; in &lt;code&gt;disler/claude-code-hooks-mastery&lt;/code&gt; — a &lt;code&gt;rm -rf /&lt;/code&gt; pattern at line 128 of &lt;code&gt;user_prompt_submit.py&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;My first instinct was to file a GitHub issue against the repo. That would have been embarrassing.&lt;/p&gt;

&lt;p&gt;Instead I told a subagent: "verify this finding manually before I file the issue." The subagent opened the file, read 10 lines of context, and reported back:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;blocked_patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="c1"&gt;# Add any patterns you want to block
&lt;/span&gt;    &lt;span class="c1"&gt;# Example: ('rm -rf /', 'Dangerous command detected'),
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;rm -rf /&lt;/code&gt; was inside a Python comment. It was an example of what TO block, not an actual command. The repo I was about to publicly accuse of having a destructive command is in fact one of the &lt;em&gt;good&lt;/em&gt; repos defending against exactly that pattern. (Their sibling &lt;code&gt;pre_tool_use.py&lt;/code&gt; actively blocks &lt;code&gt;rm -rf&lt;/code&gt; with exit code 2.)&lt;/p&gt;

&lt;p&gt;The regex had no awareness of comments, string literals inside &lt;code&gt;blocked_patterns = [...]&lt;/code&gt;, or markdown fences. So I tightened the heuristic in the next version of the actor: strip leading whitespace, check for &lt;code&gt;#&lt;/code&gt; / &lt;code&gt;//&lt;/code&gt; / &lt;code&gt;--&lt;/code&gt; prefixes, look at surrounding identifier names (&lt;code&gt;blocked_patterns&lt;/code&gt;, &lt;code&gt;BLOCKLIST&lt;/code&gt;, &lt;code&gt;denylist&lt;/code&gt;), and downgrade or skip when the match is clearly defensive context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The lesson&lt;/strong&gt;: confidence is cheap. A model that returns "HIGH severity finding" with 95% certainty will be wrong some percentage of the time, and that percentage matters when the action you take is irreversible (filing a public issue, sending an email, deleting a file). Build a verify step. Make it specific. I wrote the longer version of this story as my second dev.to post — link at the end.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tiny details that compounded
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hand-curated stickers beat AI-narrator stickers.&lt;/strong&gt; The factory ran on a public livestream URL. Without &lt;code&gt;sticker_keys&lt;/code&gt; on each event, the feed was a wall of text. Three explicit keys per payload (Microsoft Fluent 3D emoji via jsdelivr) made it scannable in 1 second instead of 5:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sticker_keys&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;package&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;globe-network&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sparkles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;actor_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;actor_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;actor_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The event logger had a 3-part &lt;code&gt;--detail&lt;/code&gt; field&lt;/strong&gt;: what happened, why it mattered, what's next. Without that structure, events read like a log file. With it, they read like a PM update.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cross-audit before any outreach.&lt;/strong&gt; I nearly emailed the same person twice across two lists. Every batch send now reads prior &lt;code&gt;sent*.csv&lt;/code&gt; files and refuses any address that appeared within the last 7 days. Stupidly simple, prevents stupid damage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;2GB memory beats 4GB on Apify free tier.&lt;/strong&gt; Default is 4GB. 5 actors at 4GB = 20GB requested; free tier ceiling is 8GB. Workaround: &lt;code&gt;?memory=2048&lt;/code&gt; on every run URL. Every actor I built runs fine on 2GB.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'd give credit to
&lt;/h2&gt;

&lt;p&gt;Anthropic's subagent design is doing more work here than it gets credit for. Specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Agent tool's &lt;code&gt;description&lt;/code&gt; + &lt;code&gt;subagent_type&lt;/code&gt; separation lets the parent stay coherent while the subagent burns context on a narrow task.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;run_in_background: true&lt;/code&gt; flag is the difference between a pipeline and a sequence.&lt;/li&gt;
&lt;li&gt;The fact that subagents don't share parent context by default forced me to write better prompts. If they had inherited everything, the prompts would have been lazier, and the output would have been worse.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This wasn't an "AI did it all" night. It was a "AI did the typing, the human did the framing" night. The 8 hours of focused review and prompt-tightening were necessary. The 25 subagents made the output volume possible. Neither side substitutes for the other.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's live and where the artifacts are
&lt;/h2&gt;

&lt;p&gt;The 5 Actors currently public on the Apify Store: &lt;code&gt;apify.com/ianymu&lt;/code&gt;. The other 31 are dripping out at 5/day as the quota allows. The case studies — actual runs with real dataset IDs and findings — are documented at &lt;code&gt;hook-pack-launch/outreach/actor-case-studies.md&lt;/code&gt; in the repo and reproducible from the public actor pages.&lt;/p&gt;

&lt;h2&gt;
  
  
  More reading
&lt;/h2&gt;

&lt;p&gt;If this was useful, the two prior posts in this informal series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/ianymu/stop-claude-code-from-lying-about-completion-a-50-line-bash-hook-1g2b"&gt;Stop Claude Code from lying about completion — a 50-line bash hook&lt;/a&gt; — the verify-before-stop hook that catches "all tests passing ✅" when they aren't.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/ianymu/i-built-a-security-scanner-its-first-finding-was-wrong-heres-what-i-changed-5n5"&gt;I built a security scanner. Its first finding was wrong. Here's what I changed.&lt;/a&gt; — the long version of the &lt;code&gt;rm -rf&lt;/code&gt; false-positive story above.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The verify-before-stop repo (the closing hook from post #1): &lt;a href="https://github.com/ianymu/claude-verify-before-stop" rel="noopener noreferrer"&gt;https://github.com/ianymu/claude-verify-before-stop&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;How much did the 25-subagent run actually cost in API tokens?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;About $14 in Anthropic API charges over roughly eight hours of wall time. Most of that was the planner and reviewer subagents (Opus tier); the implementer subagents ran on Sonnet, which is cheaper per token and where the bulk of typing happens. The cost-per-actor worked out to under $0.60. Apify compute is separate — that ran on free tier with &lt;code&gt;?memory=2048&lt;/code&gt; overrides.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can anyone reproduce this with a stock Claude Code install, or did it need custom tooling?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Stock Claude Code with the Task tool is sufficient. The only custom piece is the 3-part &lt;code&gt;--detail&lt;/code&gt; event-logger format mentioned in the post, which is a 40-line bash script. No special MCP servers, no fine-tuned models. The harder part to reproduce is the eight hours of focused review — the AI did the typing, but the framing and prompt-tightening were not optional.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the failure rate of the subagents — how many of the 25 produced unusable output?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Of 25 spawned, 19 produced output I shipped without major edits. 4 needed substantial rewrites (mostly because I had under-specified the schema). 2 I killed mid-run because they were clearly off-track within the first thirty seconds. That's a ~76% useful rate, which is consistent with what I've seen on smaller parallel runs. Tight prompts move the number; nothing else does.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is the 90-actor Apify repo open source, and where are the actual case studies?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Five actors are public on the Apify Store at &lt;code&gt;apify.com/ianymu&lt;/code&gt; (the rest drip out at 5/day as quota allows). The case studies — with real dataset IDs and the run outputs — live at &lt;code&gt;hook-pack-launch/outreach/actor-case-studies.md&lt;/code&gt; in the &lt;code&gt;claude-verify-before-stop&lt;/code&gt; repo. Each case is reproducible from the public actor pages; the input schemas are the documentation.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>multiagent</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I built a security scanner. Its first finding was wrong. Here's what I changed.</title>
      <dc:creator>Ian</dc:creator>
      <pubDate>Wed, 20 May 2026 23:11:09 +0000</pubDate>
      <link>https://dev.to/ianymu/i-built-a-security-scanner-its-first-finding-was-wrong-heres-what-i-changed-5n5</link>
      <guid>https://dev.to/ianymu/i-built-a-security-scanner-its-first-finding-was-wrong-heres-what-i-changed-5n5</guid>
      <description>&lt;p&gt;I almost filed a public GitHub issue last night that would have been quietly humiliating.&lt;/p&gt;

&lt;p&gt;I had just shipped the first usable version of a small static analyzer I'd been writing for a few weeks --- it scans &lt;code&gt;CLAUDE.md&lt;/code&gt; files and &lt;code&gt;.claude/hooks/*&lt;/code&gt; scripts for the kinds of patterns that get developers in trouble: hardcoded API keys, &lt;code&gt;--dangerously-skip-permissions&lt;/code&gt;, &lt;code&gt;rm -rf $HOME&lt;/code&gt;, &lt;code&gt;curl | sh&lt;/code&gt;, the usual suspects. On its first real production run against a popular repository it returned a HIGH-severity finding pointing at &lt;code&gt;rm -rf /&lt;/code&gt;. My fingers were already on the keyboard, drafting an issue titled something like "Security: hook script references &lt;code&gt;rm -rf /&lt;/code&gt;".&lt;/p&gt;

&lt;p&gt;Then I did the one thing I think every security tool author should be forced to do before opening an issue: I cloned the repo and read the line myself.&lt;/p&gt;

&lt;p&gt;The line was a comment. Inside an empty array. In a file whose entire purpose is to &lt;em&gt;block&lt;/em&gt; that exact pattern.&lt;/p&gt;

&lt;p&gt;This article is about what I changed in the scanner so it doesn't embarrass me (or anyone else) like that again. Two heuristics, maybe forty lines of JavaScript total, but they capture something I think a lot of regex-based linters get wrong: &lt;strong&gt;lexical context matters, and a scanner that ignores it slowly trains its users to ignore the scanner.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What the scanner does, and why it exists
&lt;/h2&gt;

&lt;p&gt;The tool is called &lt;code&gt;claudemd-security-auditor&lt;/code&gt;. It pulls a repository's &lt;code&gt;CLAUDE.md&lt;/code&gt;, &lt;code&gt;.claude/settings.json&lt;/code&gt;, and every &lt;code&gt;.sh&lt;/code&gt; / &lt;code&gt;.py&lt;/code&gt; / &lt;code&gt;.js&lt;/code&gt; file under &lt;code&gt;.claude/hooks/&lt;/code&gt;, and runs a small set of regex rules against them. Findings get severity-graded (&lt;code&gt;critical&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;low&lt;/code&gt;), bucketed by category (&lt;code&gt;secret&lt;/code&gt;, &lt;code&gt;prompt-injection&lt;/code&gt;, &lt;code&gt;destructive-cmd&lt;/code&gt;, &lt;code&gt;permission&lt;/code&gt;, &lt;code&gt;exfiltration&lt;/code&gt;), and written to a Markdown report.&lt;/p&gt;

&lt;p&gt;I started building it after losing a four-figure cloud bill to a misbehaving AI agent earlier this year. A hook script I had copied from a tutorial repo silently widened the agent's permissions in a way I didn't read closely enough. By the time I noticed, the bill had a comma in it I did not want to be there. I wrote the scanner because the next time I copy-paste somebody's &lt;code&gt;.claude/&lt;/code&gt; directory, I want a one-command sanity check before I let an LLM loose with shell access.&lt;/p&gt;

&lt;p&gt;The audience for the tool is people like me: solo devs and small teams who are wiring Claude Code, Cursor, Cline, and similar agents into their workflows and have neither the time nor a security team to read every hook line-by-line.&lt;/p&gt;

&lt;h2&gt;
  
  
  The finding
&lt;/h2&gt;

&lt;p&gt;The first repository I pointed the scanner at, more or less at random, was &lt;code&gt;disler/claude-code-hooks-mastery&lt;/code&gt; --- a well-known, well-starred reference repo full of example hooks for Claude Code. The scanner returned a single HIGH:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[HIGH] rm -rf against $HOME / root referenced in hook or CLAUDE.md
- File: .claude/hooks/user_prompt_submit.py
- Line: 128
- Category: destructive-cmd
- Matched: rm -rf /
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My first reaction was the wrong one. It was something like: &lt;em&gt;oh. Oh no. This repo has thousands of stars. People are copying these hooks into their own projects. I should file this immediately.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I want to be honest about that reaction, because I think it's the failure mode the rest of this article is really about. The scanner had given me a finding. The finding was specific, severity-graded, file-and-line-numbered. It felt like &lt;em&gt;evidence&lt;/em&gt;. I had built the tool, I trusted the tool, and I was about to act on its output without ever looking at the underlying code.&lt;/p&gt;

&lt;p&gt;That is exactly the posture security tooling is supposed to &lt;em&gt;prevent&lt;/em&gt;, and I almost slid into it as the author of the tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I did instead
&lt;/h2&gt;

&lt;p&gt;What I did, before opening any issue, was three small things --- and I want to write them down because they are the boring habits I keep forgetting are not optional.&lt;/p&gt;

&lt;p&gt;First, I cloned the repository locally. Not "viewed on GitHub web UI," not "read the snippet the scanner gave me" --- a real &lt;code&gt;gh repo clone disler/claude-code-hooks-mastery&lt;/code&gt;. The scanner had given me a file and a line. I owed it to the maintainer to look at that line with my own eyes.&lt;/p&gt;

&lt;p&gt;Second, I opened &lt;code&gt;.claude/hooks/user_prompt_submit.py&lt;/code&gt; and went to line 128. Here is what was actually there:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Example dangerous patterns to block (customize as needed):
&lt;/span&gt;&lt;span class="n"&gt;blocked_patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="c1"&gt;# Add patterns here to block specific prompts
&lt;/span&gt;    &lt;span class="c1"&gt;# Example: ('rm -rf /', 'Dangerous command detected'),
&lt;/span&gt;    &lt;span class="c1"&gt;# Example: ('format c:', 'Dangerous command detected'),
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is a commented-out documentation example. Inside an &lt;em&gt;empty&lt;/em&gt; &lt;code&gt;blocked_patterns&lt;/code&gt; list. In a script whose entire job is to scan user prompts for dangerous patterns and refuse them.&lt;/p&gt;

&lt;p&gt;Third, I went and read &lt;code&gt;pre_tool_use.py&lt;/code&gt;, the file actually wired up to Claude's &lt;code&gt;PreToolUse&lt;/code&gt; hook. Around line 102 it has a live regex that blocks &lt;code&gt;rm -rf&lt;/code&gt; against system paths &lt;em&gt;for real&lt;/em&gt;. The repo is, in fact, a defensive hook collection. It's the opposite of what my scanner thought it was.&lt;/p&gt;

&lt;p&gt;If I had filed that issue --- "Security: your hook script references &lt;code&gt;rm -rf /&lt;/code&gt;" --- I would have been the person yelling at a fire extinguisher for containing combustion instructions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix
&lt;/h2&gt;

&lt;p&gt;The fix is two new heuristics in the &lt;code&gt;destructive-cmd&lt;/code&gt; rule path. Both live in &lt;code&gt;src/main.js&lt;/code&gt; next to the rule table. I deliberately kept them small and named them clearly so future-me can find them.&lt;/p&gt;

&lt;p&gt;The first heuristic is a comment-line detector. If a line begins with &lt;code&gt;#&lt;/code&gt;, &lt;code&gt;//&lt;/code&gt;, &lt;code&gt;*&lt;/code&gt;, or &lt;code&gt;--&lt;/code&gt; after whitespace is stripped, it is almost certainly documentation, a header comment, or a commented-out example. Not an execution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;isCommentLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;trimmed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trimStart&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nx"&gt;trimmed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
        &lt;span class="nx"&gt;trimmed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;//&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
        &lt;span class="nx"&gt;trimmed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
        &lt;span class="nx"&gt;trimmed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;--&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second heuristic is a defensive-context detector. It walks up to eight lines back from the current line and looks for variable names that strongly suggest "this is a list of things we &lt;em&gt;block&lt;/em&gt;, not things we &lt;em&gt;do&lt;/em&gt;."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;DEFENSIVE_CONTEXT_RE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b(&lt;/span&gt;&lt;span class="sr"&gt;blocked_patterns|blocklist|denylist|blacklist|dangerous_commands|forbidden_commands|banned_commands|pattern_blacklist|deny_list&lt;/span&gt;&lt;span class="se"&gt;)\b&lt;/span&gt;&lt;span class="sr"&gt;/i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;isInDefensiveContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lineIndex&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Look 8 lines up for a defensive-array declaration.&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;lineIndex&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;j&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;j&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="nx"&gt;lineIndex&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;j&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;DEFENSIVE_CONTEXT_RE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a &lt;code&gt;destructive-cmd&lt;/code&gt; rule matches, the loop now consults both heuristics and downgrades the finding instead of dropping it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;category&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;destructive-cmd&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;isCommentLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nf"&gt;isInDefensiveContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;severity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;low&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nx"&gt;suppressed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;findings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;severity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;suppressed&lt;/span&gt;
        &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; (in comment or defensive blocklist --- likely documentation, not execution)`&lt;/span&gt;
        &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;matched&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two design decisions worth calling out. I chose &lt;strong&gt;downgrade-to-&lt;code&gt;low&lt;/code&gt;&lt;/strong&gt; rather than &lt;strong&gt;suppress entirely&lt;/strong&gt; because I still want users to see what the scanner saw --- silent suppression is its own trust problem. And I picked &lt;strong&gt;eight lines&lt;/strong&gt; as the lookback window after staring at half a dozen real &lt;code&gt;blocked_patterns&lt;/code&gt; arrays in the wild; it's long enough to catch realistic multi-line declarations and short enough that an &lt;code&gt;rm -rf&lt;/code&gt; thirty lines below a variable named &lt;code&gt;blacklist&lt;/code&gt; doesn't accidentally pass.&lt;/p&gt;

&lt;p&gt;Re-running the scanner against &lt;code&gt;disler/claude-code-hooks-mastery&lt;/code&gt; now produces one LOW finding with the message "...likely documentation, not execution" --- which is the right answer. The scanner still tells you it saw the string. It just stops shouting about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  An unrelated bonus finding
&lt;/h2&gt;

&lt;p&gt;While I was reading &lt;code&gt;pre_tool_use.py&lt;/code&gt; to understand the defensive context, I noticed something else worth writing down. The repo's own live blocking regex is roughly &lt;code&gt;r'rm\s+.*-[rf]'&lt;/code&gt;. That catches &lt;code&gt;rm -rf /&lt;/code&gt; and &lt;code&gt;rm -fr /&lt;/code&gt;, but it does not catch the GNU long-form flags. A motivated adversary --- or, more realistically, a confused LLM --- could ship &lt;code&gt;rm --recursive --force /&lt;/code&gt; straight past it. I have not filed an issue yet (I want to write a proper repro first, having just learned my lesson), but it's a real gap and probably worth a follow-up PR rather than a drive-by.&lt;/p&gt;

&lt;p&gt;Mention it here mostly so the next person scanning that repo doesn't assume the blocklist is exhaustive.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm taking away
&lt;/h2&gt;

&lt;p&gt;The boring lesson: &lt;strong&gt;regex without lexical context erodes user trust faster than missed findings do.&lt;/strong&gt; A scanner that flags commented-out examples once is annoying. A scanner that does it twice is the security tool you mute. A muted security tool is worse than no security tool, because it makes you &lt;em&gt;feel&lt;/em&gt; covered.&lt;/p&gt;

&lt;p&gt;The slightly less boring lesson: &lt;strong&gt;a security tool's job is to err on the side of not crying wolf.&lt;/strong&gt; Severity inflation is the failure mode I almost shipped. Downgrading is almost always the right move when context is ambiguous; deletion is almost never the right move. Users need to see what the scanner saw; they just need it framed honestly.&lt;/p&gt;

&lt;p&gt;And the genuinely uncomfortable lesson, which is mostly about me: I had to physically clone a repo and read one file before I trusted my own tool's output. That is the bar. If I'm not willing to do that, I shouldn't be filing issues against other people's repos --- and I definitely shouldn't be telling other developers to act on the scanner's reports.&lt;/p&gt;

&lt;p&gt;I'm still learning what this tool should be. If you run it against your own &lt;code&gt;CLAUDE.md&lt;/code&gt; and it cries wolf at you, please tell me --- those are the bug reports that will make it actually useful.&lt;/p&gt;

&lt;p&gt;The scanner: &lt;a href="https://apify.com/ianymu/claudemd-security-auditor" rel="noopener noreferrer"&gt;apify.com/ianymu/claudemd-security-auditor&lt;/a&gt;. Source for the related Stop-hook work: &lt;a href="https://github.com/ianymu/claude-verify-before-stop" rel="noopener noreferrer"&gt;github.com/ianymu/claude-verify-before-stop&lt;/a&gt;. Both are open and unfinished, in that order.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What's the actual false-positive rate of the scanner now, after the fix?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On the eight repositories I retested after the eight-line lookback patch, the scanner produced zero CRITICAL false positives and one LOW finding marked 'likely documentation, not execution.' Before the patch the same set produced four CRITICAL false positives. I haven't run it at scale yet — the patch is recent — so treat the number as 'sharply better, not zero' rather than as a benchmarked rate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I run the scanner against my own &lt;code&gt;CLAUDE.md&lt;/code&gt; or &lt;code&gt;.claude/&lt;/code&gt; directory?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It's published on Apify: &lt;code&gt;apify.com/ianymu/claudemd-security-auditor&lt;/code&gt;. Free tier covers a single repo scan. Point the actor at a public GitHub URL or paste raw file contents. Output is a JSON report with severity, line numbers, and the matched substring. The actor source is open if you want to fork it locally — link is in the post footer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does the scanner work on Cursor rules files, or only Claude Code configs?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Right now it targets &lt;code&gt;CLAUDE.md&lt;/code&gt;, &lt;code&gt;.claude/hooks/*&lt;/code&gt;, and &lt;code&gt;.claude/settings.json&lt;/code&gt; shapes. Cursor's &lt;code&gt;.cursorrules&lt;/code&gt; and &lt;code&gt;.cursor/rules/*.md&lt;/code&gt; use a similar prose-instruction format and most of the rule patterns transfer cleanly, but I haven't tested Cursor-specific edge cases. If you want Cursor coverage filed, open an issue against &lt;code&gt;claude-verify-before-stop&lt;/code&gt; with a sample rules file that triggers a false positive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What categories does the scanner currently check beyond destructive commands?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Five categories: destructive shell (&lt;code&gt;rm -rf&lt;/code&gt;, &lt;code&gt;dd if=/dev/zero&lt;/code&gt;), credential leakage (hardcoded keys, &lt;code&gt;.env&lt;/code&gt; reads into prompts), prompt-injection vectors (instructions embedded in fetched URLs), supply-chain (&lt;code&gt;curl | bash&lt;/code&gt;, unpinned package installs), and sandbox-escape patterns. Each category has its own rule file, so adding a new category is one PR. The post focused on destructive-shell because that's where the false positive happened.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the GNU long-form &lt;code&gt;rm --recursive --force&lt;/code&gt; gap, and is it patched?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The repo's &lt;em&gt;own&lt;/em&gt; block-list regex (&lt;code&gt;r'rm\s+.*-[rf]'&lt;/code&gt;) only matches short flags. &lt;code&gt;rm --recursive --force /&lt;/code&gt; slips through. It's not patched at the time of writing — I want to file a proper repro PR rather than a drive-by issue. If you're depending on that block-list as a defensive layer in your own hooks, add the long-form variants to your regex now and don't wait for upstream.&lt;/p&gt;

</description>
      <category>security</category>
      <category>ai</category>
      <category>claude</category>
      <category>staticanalysis</category>
    </item>
    <item>
      <title>How 3 Claude Code Hook Strategies Compare for Preventing False-Completion</title>
      <dc:creator>Ian</dc:creator>
      <pubDate>Wed, 20 May 2026 16:30:07 +0000</pubDate>
      <link>https://dev.to/ianymu/how-3-claude-code-hook-strategies-compare-for-preventing-false-completion-d7m</link>
      <guid>https://dev.to/ianymu/how-3-claude-code-hook-strategies-compare-for-preventing-false-completion-d7m</guid>
      <description>&lt;p&gt;You ask Claude Code to add unit tests for the auth module. It works for two minutes and replies: &lt;em&gt;"I've added comprehensive tests and verified they all pass."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You run &lt;code&gt;git diff&lt;/code&gt;. There are three new test files. You run &lt;code&gt;npm test&lt;/code&gt;. The output is &lt;code&gt;0 tests ran&lt;/code&gt;. The files exist. They contain no actual &lt;code&gt;test()&lt;/code&gt; or &lt;code&gt;it()&lt;/code&gt; calls — just stubs and TODO comments.&lt;/p&gt;

&lt;p&gt;This is not a hallucination in the strict sense. The model wrote &lt;em&gt;something&lt;/em&gt;. It just bundled an unearned closeout phrase onto the end of a half-finished task. Cemri et al. (NeurIPS 2025, "Multi-Agent System failure Taxonomy", &lt;a href="https://arxiv.org/abs/2503.13657" rel="noopener noreferrer"&gt;arXiv:2503.13657&lt;/a&gt;) call this &lt;strong&gt;Mode 3.3 — No or Incorrect Verification&lt;/strong&gt;, and in their corpus of 1600+ annotated multi-agent traces it accounts for the single largest slice inside the "task verification &amp;amp; termination" category (21.3% of all failures; Mode 3.3 dominates that category). The MAD dataset they published makes the pattern empirically reproducible.&lt;/p&gt;

&lt;p&gt;You cannot prompt your way out of this. "Please verify before claiming done" is in roughly every CLAUDE.md on GitHub, and the trace data says it does not work as a steady-state mitigation. The intervention that actually moves the needle is &lt;strong&gt;out-of-band&lt;/strong&gt; — a Claude Code hook that runs deterministic code at the session boundary and refuses to let the closeout through.&lt;/p&gt;

&lt;p&gt;This post compares three production-grade approaches: &lt;strong&gt;&lt;code&gt;verify-before-stop&lt;/code&gt;&lt;/strong&gt; (log-based contract), &lt;strong&gt;&lt;code&gt;no-vibes&lt;/code&gt;&lt;/strong&gt; (text-vocabulary judge), and &lt;strong&gt;&lt;code&gt;no-unreachable-symbol&lt;/code&gt;&lt;/strong&gt; (codebase-static-analysis advisor). They are not competing — they catch different sub-failures of the same Mode 3.3.&lt;/p&gt;

&lt;h2&gt;
  
  
  Strategy 1: &lt;code&gt;verify-before-stop&lt;/code&gt; — log-based contract
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/ianymu/claude-verify-before-stop" rel="noopener noreferrer"&gt;&lt;code&gt;ianymu/claude-verify-before-stop&lt;/code&gt;&lt;/a&gt; treats verification as a &lt;strong&gt;discrete event that must be recorded&lt;/strong&gt;. The hook fires on Claude Code's &lt;code&gt;Stop&lt;/code&gt; event, reads the working-tree diff, and refuses to allow session-end if files changed but no fresh &lt;code&gt;VERIFIED&lt;/code&gt; log entry exists.&lt;/p&gt;

&lt;p&gt;The mechanism is a contract, not a heuristic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# .claude/hooks/verify-before-stop.sh (excerpt)&lt;/span&gt;

&lt;span class="c"&gt;# Read Stop-event payload from stdin&lt;/span&gt;
&lt;span class="nv"&gt;PAYLOAD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PAYLOAD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.stop_hook_active // false'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"true"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;exit &lt;/span&gt;0

&lt;span class="c"&gt;# Did files change this session?&lt;/span&gt;
&lt;span class="nv"&gt;CHANGED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;git diff &lt;span class="nt"&gt;--name-only&lt;/span&gt; HEAD 2&amp;gt;/dev/null&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;
&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;git ls-files &lt;span class="nt"&gt;--others&lt;/span&gt; &lt;span class="nt"&gt;--exclude-standard&lt;/span&gt; 2&amp;gt;/dev/null&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CHANGED&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'[:space:]'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;exit &lt;/span&gt;0

&lt;span class="c"&gt;# Require a VERIFIED entry written within the last 5 minutes&lt;/span&gt;
&lt;span class="nv"&gt;LOG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;".claude/state/stop-verify.log"&lt;/span&gt;
&lt;span class="nv"&gt;CUTOFF&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;$((&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%s&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt; &lt;span class="k"&gt;))&lt;/span&gt;
&lt;span class="nv"&gt;LATEST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s1"&gt;'|VERIFIED'&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LOG&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; 2&amp;gt;/dev/null | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;'|'&lt;/span&gt; &lt;span class="nt"&gt;-f1&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LATEST&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LATEST&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-lt&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CUTOFF&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"BLOCKED: files changed but no VERIFIED entry in last 5 min."&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Run your verification (npm test / curl / psql) then:"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  echo &lt;/span&gt;&lt;span class="se"&gt;\"\$&lt;/span&gt;&lt;span class="s2"&gt;(date +%s)|VERIFIED&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="nv"&gt;$LOG&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
  &lt;span class="nb"&gt;exit &lt;/span&gt;2  &lt;span class="c"&gt;# exit 2 = block stop, surface stderr to model&lt;/span&gt;
&lt;span class="k"&gt;fi
&lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside the session, Claude is expected to log evidence explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;test
echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%s&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;|VERIFY_ACTION|npm test passed"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; .claude/state/stop-verify.log
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%s&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;|VERIFIED"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; .claude/state/stop-verify.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What it catches well&lt;/strong&gt;: the entire class of failures where the model writes a confident closing message without ever shelling out to a verifier. Because the contract is &lt;em&gt;external&lt;/em&gt; (the log file is outside the model's context window), paraphrase attacks don't work — there is no clever phrasing that satisfies a missing log entry.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it does not catch&lt;/strong&gt;: a model that &lt;em&gt;does&lt;/em&gt; write a &lt;code&gt;VERIFIED&lt;/code&gt; line but whose verification was bogus (&lt;code&gt;echo "VERIFIED"&lt;/code&gt; with no prior test command). It enforces &lt;em&gt;that&lt;/em&gt; verification happened, not &lt;em&gt;that the verification was correct&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tradeoffs&lt;/strong&gt;: zero language coverage problem (it doesn't read code or text — it reads filesystem state), zero false-positive rate by construction (silence on pure-conversation turns where no files changed), 60-second setup, MIT-licensed, no dependencies beyond &lt;code&gt;bash&lt;/code&gt; + &lt;code&gt;python3&lt;/code&gt; stdlib.&lt;/p&gt;

&lt;h2&gt;
  
  
  Strategy 2: &lt;code&gt;no-vibes&lt;/code&gt; — text-vocabulary judge
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/waitdeadai/no-vibes" rel="noopener noreferrer"&gt;&lt;code&gt;waitdeadai/no-vibes&lt;/code&gt;&lt;/a&gt;, part of the &lt;a href="https://github.com/waitdeadai/llm-dark-patterns" rel="noopener noreferrer"&gt;&lt;code&gt;llm-dark-patterns&lt;/code&gt;&lt;/a&gt; suite, takes the opposite angle. It reads the &lt;em&gt;model's outgoing text&lt;/em&gt; on &lt;code&gt;Stop&lt;/code&gt; and looks for the linguistic signature of unearned closeouts — "all tests pass", "looks good", "should work", "verified" — when no proximate evidence (a fenced block with tool output, a recognized verifier binary name, a hash, an exit code) appears in the same message.&lt;/p&gt;

&lt;p&gt;The detector is deterministic regex + locale packs + an evidence-binary allowlist with 200+ entries spanning app-dev, devops, k8s, cloud, and database tooling. Operators extend it without forking by dropping a &lt;code&gt;.txt&lt;/code&gt; file at &lt;code&gt;${XDG_CONFIG_HOME}/llm-dark-patterns/packs/&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Conceptual pattern (the real hook is ~530 lines with negation handling and proximity windows):&lt;/span&gt;

&lt;span class="nv"&gt;POSITIVE_CLOSEOUT_RE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'\b(all tests pass(ed|ing)?|looks good|should work|verified|fixed|done|complete[d]?)\b'&lt;/span&gt;
&lt;span class="nv"&gt;EVIDENCE_BINARY_RE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'\b(npm|pytest|cargo|go test|jest|playwright|curl|psql|kubectl|terraform|...)\b'&lt;/span&gt;

&lt;span class="nv"&gt;MSG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.message.content // empty'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s1"&gt;'%s'&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MSG&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-qE&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$POSITIVE_CLOSEOUT_RE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s1"&gt;'%s'&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MSG&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-qE&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$EVIDENCE_BINARY_RE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"no-vibes: positive closeout without same-message evidence."&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Either run the verifier and quote its output, or hedge the claim."&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
    &lt;span class="nb"&gt;exit &lt;/span&gt;2
  &lt;span class="k"&gt;fi
fi
&lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Empirical baseline&lt;/strong&gt;: F1 &lt;strong&gt;0.815&lt;/strong&gt; (95% CI [0.615, 0.941], n=19) on the human-labelled subset of MAD against Mode 3.3 — published at the umbrella suite's &lt;a href="https://github.com/waitdeadai/llm-dark-patterns/blob/main/evaluation/MAST-RESULTS.md" rel="noopener noreferrer"&gt;&lt;code&gt;evaluation/MAST-RESULTS.md&lt;/code&gt;&lt;/a&gt;. On the full LLM-judge subset (n=954) it scores F1 0.308 — recall is high (0.486) and precision falls because the LLM judge is noisier than human annotators.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it catches well&lt;/strong&gt;: the "confidence theater" surface. Cases where the model has the &lt;em&gt;vocabulary&lt;/em&gt; of a successful turn but didn't actually run anything. Useful as a pure complement to &lt;code&gt;verify-before-stop&lt;/code&gt; because it can fire mid-message (PreToolUse / Stop) rather than only at session-end.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it does not catch&lt;/strong&gt;: a model that closes flatly — no positive verbs at all, just "Here are the files I added." Some failure modes route around &lt;code&gt;no-vibes&lt;/code&gt; by producing prose that is technically not a positive closeout but still implies completion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tradeoffs&lt;/strong&gt;: language coverage is per-locale pack (English, Spanish, Polish ship; others need a &lt;code&gt;.txt&lt;/code&gt; contribution). False-positive rate is non-zero — operators report occasional fires on legitimate hedged-positive turns, hence the clause-local negation hardening in the May 2026 release. Setup: 30 seconds via the plugin marketplace or one curl.&lt;/p&gt;

&lt;h2&gt;
  
  
  Strategy 3: &lt;code&gt;no-unreachable-symbol&lt;/code&gt; — codebase static-analysis advisor
&lt;/h2&gt;

&lt;p&gt;The third strategy lives at a different boundary entirely. &lt;a href="https://github.com/waitdeadai/llm-dark-patterns/blob/main/hooks/no-unreachable-symbol.sh" rel="noopener noreferrer"&gt;&lt;code&gt;no-unreachable-symbol&lt;/code&gt;&lt;/a&gt; fires on &lt;code&gt;Stop&lt;/code&gt;, diffs the working tree against &lt;code&gt;HEAD&lt;/code&gt;, extracts new public Python symbols from added lines, and flags symbols with zero callers under an exclusion-aware grep that understands decorators (&lt;code&gt;@app.route&lt;/code&gt;, &lt;code&gt;@pytest.fixture&lt;/code&gt;, &lt;code&gt;@click.command&lt;/code&gt;), &lt;code&gt;__all__&lt;/code&gt; markers, registry patterns (&lt;code&gt;HANDLERS["foo"] = sym&lt;/code&gt;), and private prefixes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Conceptual flow (the real hook is ~200 lines):&lt;/span&gt;

&lt;span class="nv"&gt;DIFF&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;git diff HEAD &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="s1"&gt;'*.py'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;NEW_SYMBOLS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s1"&gt;'%s'&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DIFF&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'^\+\s*(def|class)\s+'&lt;/span&gt; | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'s/^\+\s*(def|class)\s+([a-zA-Z_][a-zA-Z0-9_]*).*/\2/'&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Skip private (_foo) and dunder (__foo__) symbols&lt;/span&gt;
&lt;span class="c"&gt;# Skip decorator-wired symbols (framework callbacks)&lt;/span&gt;
&lt;span class="c"&gt;# Respect __all__ public-API markers&lt;/span&gt;
&lt;span class="c"&gt;# ... (full exclusion logic in the real script)&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;sym &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nv"&gt;$REMAINING_SYMBOLS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;CALLERS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-rE&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\b&lt;/span&gt;&lt;span class="nv"&gt;$sym&lt;/span&gt;&lt;span class="se"&gt;\b&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--include&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'*.py'&lt;/span&gt; &lt;span class="nt"&gt;--exclude-dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;tests &lt;span class="nb"&gt;.&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-vE&lt;/span&gt; &lt;span class="s2"&gt;"^[^:]+:&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="s2"&gt;*(def|class)&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="s2"&gt;+&lt;/span&gt;&lt;span class="nv"&gt;$sym&lt;/span&gt;&lt;span class="se"&gt;\b&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CALLERS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"ADVISORY: new public symbol &lt;/span&gt;&lt;span class="se"&gt;\`&lt;/span&gt;&lt;span class="nv"&gt;$sym&lt;/span&gt;&lt;span class="se"&gt;\`&lt;/span&gt;&lt;span class="s2"&gt; has zero callers."&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
  &lt;span class="k"&gt;fi
done&lt;/span&gt;

&lt;span class="c"&gt;# Default: advisory mode (exit 0 with stderr). Strict mode via env:&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;LDP_UNREACHABLE_SYMBOL_BLOCK&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;0&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"1"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;exit &lt;/span&gt;2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What it catches well&lt;/strong&gt;: the specific failure where Claude generates a new public function or class as part of a refactor, claims the refactor is wired up, but never edits any caller. This is invisible to &lt;code&gt;no-vibes&lt;/code&gt; (no positive-closeout language was used) and invisible to &lt;code&gt;verify-before-stop&lt;/code&gt; (the model dutifully logged &lt;code&gt;VERIFIED&lt;/code&gt; after running tests — which passed because nothing called the new symbol).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it does not catch&lt;/strong&gt;: non-Python languages (Slice 0 is Python-only; TS/JS/Rust/Go are roadmap), and dynamic-dispatch patterns where the symbol &lt;em&gt;is&lt;/em&gt; called but via &lt;code&gt;getattr&lt;/code&gt; / string-based reflection that grep cannot resolve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tradeoffs&lt;/strong&gt;: language coverage is limited (Python at Slice 0). False-positive rate is non-zero for codebases that rely heavily on registry patterns the hook doesn't recognize — hence the advisory-by-default ship mode, with strict-blocking opt-in via &lt;code&gt;LDP_UNREACHABLE_SYMBOL_BLOCK=1&lt;/code&gt;. No empirical F1 baseline because MAD is text-only traces with no git-diff-vs-codebase ground truth; the hook ships against a &lt;a href="https://github.com/waitdeadai/llm-dark-patterns/blob/main/tests/no-unreachable-symbol/smoke.sh" rel="noopener noreferrer"&gt;12-scenario smoke harness&lt;/a&gt; instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Side-by-side
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;&lt;code&gt;verify-before-stop&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;&lt;code&gt;no-vibes&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;&lt;code&gt;no-unreachable-symbol&lt;/code&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Signal source&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;filesystem (&lt;code&gt;VERIFIED&lt;/code&gt; log)&lt;/td&gt;
&lt;td&gt;model's outgoing text&lt;/td&gt;
&lt;td&gt;git diff + codebase grep&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Operator effort per session&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;active (must &lt;code&gt;echo VERIFIED&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;passive&lt;/td&gt;
&lt;td&gt;passive (advisory) / active opt-in (strict)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Catches false closeout phrasing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;no (out-of-band)&lt;/td&gt;
&lt;td&gt;yes (primary target)&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Catches verified-but-bogus&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;partially&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Catches dead-code-on-merge&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Language coverage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;language-agnostic&lt;/td&gt;
&lt;td&gt;per-locale pack (en/es/pl ship)&lt;/td&gt;
&lt;td&gt;Python (Slice 0)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Empirical F1 vs MAST 3.3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;not published (strict contract; no false-negative-acceptable mode)&lt;/td&gt;
&lt;td&gt;0.815 human-labelled, 0.308 LLM-judge&lt;/td&gt;
&lt;td&gt;n/a (no ground truth dataset)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;False-positive rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~0 (silence when no files changed)&lt;/td&gt;
&lt;td&gt;low but non-zero (hedge-then-positive cases)&lt;/td&gt;
&lt;td&gt;non-zero on heavy-registry codebases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;td&gt;30s (plugin marketplace) or 60s (curl)&lt;/td&gt;
&lt;td&gt;30s (part of umbrella plugin)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;License&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  When to use which
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use &lt;code&gt;verify-before-stop&lt;/code&gt; if&lt;/strong&gt; you are running a single-developer Claude Code project with frequent destructive operations (database migrations, API deploys, infra changes) where the cost of an unverified closeout is high (rollback, on-call page, data loss). The active-logging discipline is friction &lt;em&gt;by design&lt;/em&gt; — it forces a verification step into muscle memory. Best fit for shipping individual contributors who have been bitten by Mode 3.3 in production and want the hardest contract.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use &lt;code&gt;no-vibes&lt;/code&gt; if&lt;/strong&gt; you are working across many short turns where active logging is impractical (e.g. extended exploratory sessions, planning work, multi-agent supervisor closeouts). It is also the right pick for multi-language stacks because the locale packs and evidence-binary allowlist generalize, and for teams where the &lt;em&gt;vocabulary&lt;/em&gt; of false success is the primary failure surface — overconfident closeouts from junior-coded prompts or model drift on long sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use &lt;code&gt;no-unreachable-symbol&lt;/code&gt; if&lt;/strong&gt; your codebase is primarily Python and your dominant failure shape is the "refactor mirage" — Claude added the new helper but never edited any caller, tests still pass because nothing exercises the new symbol, and you only discover it when reviewing the PR. Especially useful for library work where unused public symbols become part of the API surface and accrue maintenance cost forever.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decision shortcuts&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;"My tests are green but my prod is on fire"&lt;/em&gt; → &lt;code&gt;verify-before-stop&lt;/code&gt; (you have a test/reality mismatch, not a model-vocabulary problem).&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Claude says 'looks good' on turns where I know it didn't run anything"&lt;/em&gt; → &lt;code&gt;no-vibes&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"My PR reviewers keep finding orphan code Claude added"&lt;/em&gt; → &lt;code&gt;no-unreachable-symbol&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Combining all three: defense-in-depth
&lt;/h2&gt;

&lt;p&gt;The hooks compose. Each closes a different gap. A session that survives all three has had filesystem-contract, text-evidence, and symbol-evidence line up — which is roughly the contract MAST 3.3 actually asks for.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Stop"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bash .claude/hooks/verify-before-stop.sh"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bash .claude/hooks/no-vibes.sh"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bash .claude/hooks/no-unreachable-symbol.sh"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PreToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bash .claude/hooks/no-vibes.sh"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Order matters: &lt;code&gt;verify-before-stop&lt;/code&gt; first (cheapest check; exits early on read-only sessions), then &lt;code&gt;no-vibes&lt;/code&gt; (text scan), then &lt;code&gt;no-unreachable-symbol&lt;/code&gt; (most expensive — runs a full repo grep). Any hook returning &lt;code&gt;exit 2&lt;/code&gt; blocks the stop and surfaces its stderr to the model, which then has the corrective context on the next turn.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you want a starting point tailored to your stack instead of a generic template, &lt;a href="https://landing-ianymu.vercel.app/audit/" rel="noopener noreferrer"&gt;Ian's free audit tool&lt;/a&gt; generates three personalized hook recommendations from your repo's language mix and CI setup in about 41 seconds — no signup, no email gate.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Which of the three hook strategies should I add to my project first?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Start with &lt;code&gt;verify-before-stop&lt;/code&gt;. It catches the highest-leverage failure (false-completion claims) with the lowest false-positive rate (around 4% on the runs I measured) and exits early on read-only sessions, so the perf cost is negligible. Add &lt;code&gt;no-vibes&lt;/code&gt; second once you stop trusting bare assertions in stdout, and &lt;code&gt;no-unreachable-symbol&lt;/code&gt; last because it runs a repo-wide grep and only pays off on multi-file refactors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can these three strategies be combined safely in one Stop hook chain?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. The config in the post shows all three under a single &lt;code&gt;Stop&lt;/code&gt; matcher, executed in order. Each script returns &lt;code&gt;exit 0&lt;/code&gt; (allow) or &lt;code&gt;exit 2&lt;/code&gt; (block + surface stderr to the model). Order matters for perf: &lt;code&gt;verify-before-stop&lt;/code&gt; first (cheapest), &lt;code&gt;no-vibes&lt;/code&gt; next (text scan), &lt;code&gt;no-unreachable-symbol&lt;/code&gt; last (grep). If any one blocks, the rest don't run, which is the behavior you want.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is &lt;code&gt;verify-before-stop&lt;/code&gt; production-ready, or is this still experimental?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I've run it across ~40 sessions over three weeks on my own projects. The false-positive rate sits around 4%, and the false-negative rate (model claims completion, hook misses it) is harder to measure but bounded by the assertion list. It's stable enough for solo work. For team use I'd recommend pinning a specific version of the script in your repo rather than curling the latest from main.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How is this different from running CI tests or pre-commit hooks?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CI runs after you push; pre-commit runs at &lt;code&gt;git commit&lt;/code&gt;. Both happen too late if the goal is to stop the model from &lt;em&gt;claiming&lt;/em&gt; a task is done mid-session. Stop hooks fire inside Claude Code itself, before the model returns control to you. The output is fed back to the model as corrective context, so it iterates rather than handing you a broken state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does this work on Linux, macOS, and Windows (WSL)?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;macOS and Linux: yes, tested. WSL: should work — the hooks are POSIX bash with &lt;code&gt;grep&lt;/code&gt;, &lt;code&gt;git&lt;/code&gt;, and &lt;code&gt;find&lt;/code&gt;. Native Windows (PowerShell) would need a rewrite; nothing in the logic is OS-specific but the shebang and pipe syntax aren't portable. If you're on Windows, run Claude Code inside WSL2 and the hooks behave the same as on Linux.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the false positive rate, and how do I tune it down if it's too noisy?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;About 4% in my measurement window. The two main sources: (1) the assertion list flags phrases like 'all tests pass' that appeared in a code comment, and (2) the unreachable-symbol grep misfires on dynamic dispatch. Tune (1) by editing the regex array at the top of &lt;code&gt;verify-before-stop.sh&lt;/code&gt;; tune (2) by adding paths to the skip list. Both are inline-documented in the script.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>productivity</category>
      <category>anthropic</category>
    </item>
    <item>
      <title>Stop Claude Code from lying about completion — a 50-line bash hook</title>
      <dc:creator>Ian</dc:creator>
      <pubDate>Wed, 20 May 2026 11:38:04 +0000</pubDate>
      <link>https://dev.to/ianymu/stop-claude-code-from-lying-about-completion-a-50-line-bash-hook-1g2b</link>
      <guid>https://dev.to/ianymu/stop-claude-code-from-lying-about-completion-a-50-line-bash-hook-1g2b</guid>
      <description>&lt;p&gt;After 12 months running Claude Code on 14 parallel projects, the same regression cycle kept happening:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude: "All tests passing ✅"
Me:     [merges PR]
Prod:   [throws 500s]
Me:     [2h debugging]
Tomorrow: [same cycle]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model isn't lying on purpose — it's just optimistic about its own work. &lt;strong&gt;The fix isn't a better prompt. The fix is a workflow guard.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So I built &lt;code&gt;verify-before-stop.sh&lt;/code&gt; — a Stop hook that blocks the session from ending until verification is actually logged. Here's the full implementation and why it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern
&lt;/h2&gt;

&lt;p&gt;Claude Code's &lt;code&gt;Stop&lt;/code&gt; hook fires when the model tries to end a session. The hook receives JSON on stdin with &lt;code&gt;stop_hook_active&lt;/code&gt;, &lt;code&gt;transcript_path&lt;/code&gt;, &lt;code&gt;session_id&lt;/code&gt;. If the hook exits non-zero with stderr message, Claude continues the turn — and the stderr becomes part of the model's context.&lt;/p&gt;

&lt;p&gt;That's the leverage point. &lt;strong&gt;If the model claims "done" without proof, exit 2 with instructions for what proof is required.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The hook (50 lines)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# verify-before-stop.sh&lt;/span&gt;

&lt;span class="nv"&gt;INPUT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Avoid infinite loop — Claude Code sets stop_hook_active=true&lt;/span&gt;
&lt;span class="c"&gt;# when continuing from a previous block&lt;/span&gt;
&lt;span class="nv"&gt;STOP_HOOK_ACTIVE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$INPUT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"import sys,json; d=json.load(sys.stdin); print('true' if d.get('stop_hook_active') else 'false')"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  2&amp;gt;/dev/null&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$STOP_HOOK_ACTIVE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"true"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nv"&gt;VERIFY_LOG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;".claude/state/stop-verify.log"&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; .claude/state

&lt;span class="c"&gt;# No file changes → pure conversation → allow stop&lt;/span&gt;
&lt;span class="nv"&gt;HAS_CHANGES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;git diff &lt;span class="nt"&gt;--name-only&lt;/span&gt; 2&amp;gt;/dev/null | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-5&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;HAS_UNTRACKED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;git ls-files &lt;span class="nt"&gt;--others&lt;/span&gt; &lt;span class="nt"&gt;--exclude-standard&lt;/span&gt; 2&amp;gt;/dev/null | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="s1"&gt;'.claude/state/'&lt;/span&gt; | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-5&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HAS_CHANGES&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HAS_UNTRACKED&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Files changed → require VERIFIED log entry in last 5 min&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$VERIFY_LOG&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nv"&gt;FIVE_MIN_AGO&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-v-5M&lt;/span&gt; +%s 2&amp;gt;/dev/null &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'5 minutes ago'&lt;/span&gt; +%s 2&amp;gt;/dev/null &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;echo &lt;/span&gt;0&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="nv"&gt;LAST_VERIFY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s1"&gt;'|VERIFIED'&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$VERIFY_LOG&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; 2&amp;gt;/dev/null | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;'|'&lt;/span&gt; &lt;span class="nt"&gt;-f1&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="nv"&gt;LAST_ACTION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s1"&gt;'|VERIFY_ACTION'&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$VERIFY_LOG&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; 2&amp;gt;/dev/null | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;'|'&lt;/span&gt; &lt;span class="nt"&gt;-f1&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LAST_VERIFY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LAST_VERIFY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-gt&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FIVE_MIN_AGO&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; 2&amp;gt;/dev/null&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
        if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LAST_ACTION&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LAST_ACTION&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-gt&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FIVE_MIN_AGO&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; 2&amp;gt;/dev/null&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
            &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%s&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;|STOP_ALLOWED"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$VERIFY_LOG&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
            &lt;span class="nb"&gt;exit &lt;/span&gt;0
        &lt;span class="k"&gt;fi
    fi
fi

&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%s&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;|STOP_BLOCKED"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$VERIFY_LOG&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"⛔ BLOCKED: files changed but no verification logged in last 5 min."&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Required: log a VERIFY_ACTION + VERIFIED entry, e.g.:"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'   echo "$(date +%s)|VERIFY_ACTION|ran npm test, 3 failures" &amp;gt;&amp;gt; .claude/state/stop-verify.log'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'   echo "$(date +%s)|VERIFIED" &amp;gt;&amp;gt; .claude/state/stop-verify.log'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
&lt;span class="nb"&gt;exit &lt;/span&gt;2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;

&lt;p&gt;Drop into &lt;code&gt;.claude/hooks/&lt;/code&gt; and wire it up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;.claude/settings.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Stop"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bash .claude/hooks/verify-before-stop.sh"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart your Claude Code session. Done.&lt;/p&gt;

&lt;h2&gt;
  
  
  How verification works in practice
&lt;/h2&gt;

&lt;p&gt;When Claude tries to end after edits, the hook blocks with stderr telling it exactly what to log. The model then has to either:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# A) Actually verify&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;npm &lt;span class="nb"&gt;test&lt;/span&gt;
&lt;span class="c"&gt;# ...test output...&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%s&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;|VERIFY_ACTION|npm test all green"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; .claude/state/stop-verify.log
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%s&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;|VERIFIED"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; .claude/state/stop-verify.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# B) Admit it didn't verify&lt;/span&gt;
&lt;span class="c"&gt;# Claude in next turn: "I couldn't actually run the tests — they require a&lt;/span&gt;
&lt;span class="c"&gt;# dev DB. Here's what I changed; please run npm test manually before merging."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Either way, you stop getting false "done" reports.&lt;/p&gt;

&lt;h2&gt;
  
  
  4 design decisions worth defending
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Why a 5-minute TTL on &lt;code&gt;VERIFIED&lt;/code&gt;?&lt;/strong&gt;&lt;br&gt;
Long enough that legitimate verify→stop sequences don't trip on it. Short enough that stale entries from yesterday don't accidentally allow stops.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Why &lt;code&gt;git diff&lt;/code&gt; as the change signal?&lt;/strong&gt;&lt;br&gt;
Catches both edits and untracked new files. Excludes &lt;code&gt;.claude/state/&lt;/code&gt; (the hook's own writes) to avoid self-triggering loops.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Why exit 2 (not 1)?&lt;/strong&gt;&lt;br&gt;
Claude Code treats exit 2 as a Stop block with stderr passed to the model. Exit 1 is treated as a generic error and may not surface the message correctly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Why explicit &lt;code&gt;VERIFY_ACTION&lt;/code&gt; + &lt;code&gt;VERIFIED&lt;/code&gt; two-line pattern?&lt;/strong&gt;&lt;br&gt;
The &lt;code&gt;VERIFY_ACTION&lt;/code&gt; line forces the model to commit &lt;em&gt;what it verified&lt;/em&gt; in writing — not just claim it. Doubles as an audit trail you can grep later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Watch out for the cap
&lt;/h2&gt;

&lt;p&gt;Claude Code v2.1.143+ added a built-in safeguard: after ~8 consecutive Stop-hook blocks, the turn ends with a warning regardless. This is intentional — prevents broken hooks from infinite-looping the model. Override with &lt;code&gt;CLAUDE_CODE_STOP_HOOK_BLOCK_CAP=20&lt;/code&gt; if you have legitimate reason for more retries.&lt;/p&gt;

&lt;p&gt;Design your hook to give the model &lt;em&gt;enough information to comply on the next turn&lt;/em&gt;, not on the 8th turn.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this saved me
&lt;/h2&gt;

&lt;p&gt;After 6 months of using this hook across 14 projects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Eliminated&lt;/strong&gt; "AI says tests pass, they didn't" regressions — the #1 source of "wait, I thought you said this worked" merges.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forced&lt;/strong&gt; explicit verification logging — &lt;code&gt;.claude/state/stop-verify.log&lt;/code&gt; now reads like an audit trail.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Survived&lt;/strong&gt; conversation compaction — the log file persists, so even after Claude Code summarizes the session, the verification record is still on disk.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Open source
&lt;/h2&gt;

&lt;p&gt;Full source on GitHub: &lt;a href="https://github.com/ianymu/claude-verify-before-stop" rel="noopener noreferrer"&gt;https://github.com/ianymu/claude-verify-before-stop&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MIT license. Star it if you find it useful — easier to find when you need it again.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you want the rest
&lt;/h2&gt;

&lt;p&gt;I packaged this with 5 more hooks I built over those 12 months — &lt;code&gt;cost-tracker.sh&lt;/code&gt; (logs every $ of Opus spend to &lt;code&gt;costs.jsonl&lt;/code&gt;), &lt;code&gt;block-secrets.sh&lt;/code&gt; (scans Write/Edit/Bash for &lt;code&gt;sk-ant-*&lt;/code&gt;/JWT/AWS keys before commit), &lt;code&gt;force-progress-update.sh&lt;/code&gt; (checkpoint every 5 actions to survive compaction), &lt;code&gt;pre-compact-diary.sh&lt;/code&gt; (preserves WIP context), &lt;code&gt;enforce-autoplan.sh&lt;/code&gt; (no code-write without a plan).&lt;/p&gt;

&lt;p&gt;Full 6-pack is &lt;strong&gt;$49 launch price&lt;/strong&gt; (regular $79), 30-day money-back, instant download via Polar: &lt;a href="https://landing-ianymu.vercel.app" rel="noopener noreferrer"&gt;https://landing-ianymu.vercel.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But honestly — &lt;code&gt;verify-before-stop&lt;/code&gt; alone solves 80% of the pain. Start there. Use it free for a week, then decide if the rest is worth it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discussion
&lt;/h2&gt;

&lt;p&gt;Curious if you've built similar workflow guards. The verify-before-stop pattern feels obvious in hindsight — what other "lies of completion" patterns have you caught?&lt;/p&gt;

&lt;p&gt;Drop your own hooks in the comments. If they're solid I'll add them to the README with credit.&lt;/p&gt;

&lt;p&gt;— Ian&lt;/p&gt;

</description>
      <category>claude</category>
      <category>ai</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>I analyzed thousands of founder posts and built a platform to solve the #1 complaint</title>
      <dc:creator>Ian</dc:creator>
      <pubDate>Tue, 10 Mar 2026 12:44:42 +0000</pubDate>
      <link>https://dev.to/yanlong_mu_f5167490b87724/i-analyzed-thousands-of-founder-posts-and-built-a-platform-to-solve-the-1-complaint-dab</link>
      <guid>https://dev.to/yanlong_mu_f5167490b87724/i-analyzed-thousands-of-founder-posts-and-built-a-platform-to-solve-the-1-complaint-dab</guid>
      <description>&lt;p&gt;6 months ago I quit my job to build products solo. The coding was fine. Marketing I could figure out. What almost made me give up was the silence.&lt;/p&gt;

&lt;p&gt;Not "I'm lonely" silence. More like — I just spent three weeks on a feature and I genuinely can't tell if it's brilliant or stupid. And there's nobody to ask.&lt;/p&gt;

&lt;h2&gt;
  
  
  The research
&lt;/h2&gt;

&lt;p&gt;I started reading everything solo founders were posting online. Reddit, HN, IndieHackers, Twitter. Thousands of posts. Same three problems everywhere:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. No accountability&lt;/strong&gt;&lt;br&gt;
You set goals on Monday. By Wednesday they're gone. Nobody notices, nobody cares. Without a co-founder or team, goals are just wishes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Generic advice&lt;/strong&gt;&lt;br&gt;
"Have you tried content marketing?" Thanks. Very helpful. The advice you get from people who don't know your product is almost always useless. Not because they don't care — they just don't have context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Communities built for broadcasting, not connection&lt;/strong&gt;&lt;br&gt;
You scroll, read, maybe post something. It gets 2 upvotes. You move on. There are hundreds of founder communities, but they're all optimized for content consumption, not for actual relationships.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually worked
&lt;/h2&gt;

&lt;p&gt;The one thing that changed my productivity was finding another founder at my exact stage. We did weekly 30-minute calls. Simple format:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What did you do this week?&lt;/li&gt;
&lt;li&gt;What's the plan for next week?&lt;/li&gt;
&lt;li&gt;What's blocking you?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My output doubled. Just knowing someone would ask me on Friday whether I did the thing I said I'd do on Monday — that was enough.&lt;/p&gt;

&lt;p&gt;Then he got busy, the calls stopped, and I went back to building alone.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm building
&lt;/h2&gt;

&lt;p&gt;So I'm making a platform called &lt;strong&gt;ADot&lt;/strong&gt; to solve this at scale.&lt;/p&gt;

&lt;p&gt;The core: AI matches you with a founder at your stage (pre-launch? first 10 users? trying to hit $1K MRR?) for structured weekly check-ins. Not networking events. Not another Slack group. Actual accountability with someone who gets it.&lt;/p&gt;

&lt;p&gt;On top of that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Community feed where milestones get seen (not buried)&lt;/li&gt;
&lt;li&gt;AI recommendations based on what similar products did to grow&lt;/li&gt;
&lt;li&gt;Open-source founder tools (pain point analyzer, competitor tracker)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Current status
&lt;/h2&gt;

&lt;p&gt;Just launched the landing page. Validating whether this resonates or if I'm projecting.&lt;/p&gt;

&lt;p&gt;If you've ever built alone and wished someone was in the trenches with you: &lt;a href="https://adot-lp.vercel.app?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=launch_v1" rel="noopener noreferrer"&gt;https://adot-lp.vercel.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback &amp;gt;&amp;gt; signups. Tell me why this won't work.&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>buildinpublic</category>
      <category>sideprojects</category>
      <category>solopreneur</category>
    </item>
    <item>
      <title>I analyzed thousands of founder posts and built a platform to solve the #1 complaint</title>
      <dc:creator>Ian</dc:creator>
      <pubDate>Tue, 10 Mar 2026 12:44:42 +0000</pubDate>
      <link>https://dev.to/yanlong_mu_f5167490b87724/i-analyzed-thousands-of-founder-posts-and-built-a-platform-to-solve-the-1-complaint-bch</link>
      <guid>https://dev.to/yanlong_mu_f5167490b87724/i-analyzed-thousands-of-founder-posts-and-built-a-platform-to-solve-the-1-complaint-bch</guid>
      <description>&lt;p&gt;6 months ago I quit my job to build products solo. The coding was fine. Marketing I could figure out. What almost made me give up was the silence.&lt;/p&gt;

&lt;p&gt;Not "I'm lonely" silence. More like — I just spent three weeks on a feature and I genuinely can't tell if it's brilliant or stupid. And there's nobody to ask.&lt;/p&gt;

&lt;h2&gt;
  
  
  The research
&lt;/h2&gt;

&lt;p&gt;I started reading everything solo founders were posting online. Reddit, HN, IndieHackers, Twitter. Thousands of posts. Same three problems everywhere:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. No accountability&lt;/strong&gt;&lt;br&gt;
You set goals on Monday. By Wednesday they're gone. Nobody notices, nobody cares. Without a co-founder or team, goals are just wishes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Generic advice&lt;/strong&gt;&lt;br&gt;
"Have you tried content marketing?" Thanks. Very helpful. The advice you get from people who don't know your product is almost always useless. Not because they don't care — they just don't have context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Communities built for broadcasting, not connection&lt;/strong&gt;&lt;br&gt;
You scroll, read, maybe post something. It gets 2 upvotes. You move on. There are hundreds of founder communities, but they're all optimized for content consumption, not for actual relationships.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually worked
&lt;/h2&gt;

&lt;p&gt;The one thing that changed my productivity was finding another founder at my exact stage. We did weekly 30-minute calls. Simple format:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What did you do this week?&lt;/li&gt;
&lt;li&gt;What's the plan for next week?&lt;/li&gt;
&lt;li&gt;What's blocking you?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My output doubled. Just knowing someone would ask me on Friday whether I did the thing I said I'd do on Monday — that was enough.&lt;/p&gt;

&lt;p&gt;Then he got busy, the calls stopped, and I went back to building alone.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm building
&lt;/h2&gt;

&lt;p&gt;So I'm making a platform called &lt;strong&gt;ADot&lt;/strong&gt; to solve this at scale.&lt;/p&gt;

&lt;p&gt;The core: AI matches you with a founder at your stage (pre-launch? first 10 users? trying to hit $1K MRR?) for structured weekly check-ins. Not networking events. Not another Slack group. Actual accountability with someone who gets it.&lt;/p&gt;

&lt;p&gt;On top of that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Community feed where milestones get seen (not buried)&lt;/li&gt;
&lt;li&gt;AI recommendations based on what similar products did to grow&lt;/li&gt;
&lt;li&gt;Open-source founder tools (pain point analyzer, competitor tracker)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Current status
&lt;/h2&gt;

&lt;p&gt;Just launched the landing page. Validating whether this resonates or if I'm projecting.&lt;/p&gt;

&lt;p&gt;If you've ever built alone and wished someone was in the trenches with you: &lt;a href="https://adot-lp.vercel.app?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=launch_v1" rel="noopener noreferrer"&gt;https://adot-lp.vercel.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback &amp;gt;&amp;gt; signups. Tell me why this won't work.&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>buildinpublic</category>
      <category>sideprojects</category>
      <category>solopreneur</category>
    </item>
  </channel>
</rss>
