<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: sisyphusse1-ops</title>
    <description>The latest articles on DEV Community by sisyphusse1-ops (@sisyphusse1ops).</description>
    <link>https://dev.to/sisyphusse1ops</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3923876%2Facc79695-9c8c-4afe-ab6e-4ba0c9f25ad5.png</url>
      <title>DEV Community: sisyphusse1-ops</title>
      <link>https://dev.to/sisyphusse1ops</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sisyphusse1ops"/>
    <language>en</language>
    <item>
      <title>I shipped cc-audit as a GitHub Action. Now your CLAUDE.md gets linted on every PR.</title>
      <dc:creator>sisyphusse1-ops</dc:creator>
      <pubDate>Sun, 10 May 2026 23:21:14 +0000</pubDate>
      <link>https://dev.to/sisyphusse1ops/i-shipped-cc-audit-as-a-github-action-now-your-claudemd-gets-linted-on-every-pr-5fal</link>
      <guid>https://dev.to/sisyphusse1ops/i-shipped-cc-audit-as-a-github-action-now-your-claudemd-gets-linted-on-every-pr-5fal</guid>
      <description>&lt;p&gt;Quick follow-up to my &lt;a href="https://dev.to/sisyphusse1ops/i-scored-92-public-claudemd-files-against-a-12-rule-baseline-median-score-512-2971"&gt;earlier post&lt;/a&gt; about scanning 492 public &lt;code&gt;CLAUDE.md&lt;/code&gt; files. Takeaway from that scan: median compliance with the 12-rule baseline was &lt;strong&gt;3/12&lt;/strong&gt;. The top-missed rules were rules 9, 10, 12, and 1 — the behavior-file equivalent of skipping unit tests.&lt;/p&gt;

&lt;p&gt;The fix is easy: run a linter. The harder part is remembering to run it.&lt;/p&gt;

&lt;p&gt;So I packaged cc-audit as a &lt;strong&gt;GitHub Action&lt;/strong&gt;. Drop three lines into your repo's workflow, and every push that touches &lt;code&gt;CLAUDE.md&lt;/code&gt; or &lt;code&gt;AGENTS.md&lt;/code&gt; gets an automatic report in the run summary — plus a hard fail if someone ever pastes a real API key into the behavior file.&lt;/p&gt;

&lt;h2&gt;
  
  
  The workflow
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/cc-audit.yml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cc-audit&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CLAUDE.md'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AGENTS.md'&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CLAUDE.md'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AGENTS.md'&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;audit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sisyphusse1-ops/cc-audit@v1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you get
&lt;/h2&gt;

&lt;p&gt;Every matching push/PR runs cc-audit against the file. The run summary shows:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;File&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rules covered&lt;/td&gt;
&lt;td&gt;7 / 12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compliance score&lt;/td&gt;
&lt;td&gt;58 %&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Leaked secrets&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Status&lt;/td&gt;
&lt;td&gt;warn&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
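The summary table is ordinary GitHub Actions plumbing: a step appends markdown to the file named by the GITHUB_STEP_SUMMARY environment variable. A minimal sketch of that mechanism (illustrative only, not cc-audit's actual reporting code):

```python
import os

def write_summary(metrics):
    """Append a markdown metrics table to the Actions run summary file."""
    lines = ["| Metric | Value |", "| --- | --- |"]
    for key, value in metrics.items():
        lines.append("| {} | {} |".format(key, value))
    body = "\n".join(lines) + "\n"
    path = os.environ.get("GITHUB_STEP_SUMMARY")  # set by the Actions runner
    if path:
        with open(path, "a") as handle:
            handle.write(body)
    return body
```

Anything appended there renders as markdown on the run's summary page, so the table shows up without any extra reporting dependency.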

&lt;p&gt;The step fails with a loud &lt;code&gt;::error::&lt;/code&gt; annotation if any leaked-secret pattern is detected — OpenAI keys, Anthropic keys, GitHub PATs, AWS access keys, Stripe live keys, postgres URLs with credentials. Placeholder-aware, so &lt;code&gt;&amp;lt;YOUR_KEY&amp;gt;&lt;/code&gt; and &lt;code&gt;sk-example-...&lt;/code&gt; don't trigger false positives.&lt;/p&gt;
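Placeholder-aware matching boils down to checking the text around each hit before calling it a leak. A toy sketch with simplified patterns (the real cc-audit pattern set is stricter and longer):

```python
import re

# Illustrative patterns only -- not cc-audit's actual rule set.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{32,}"),   # OpenAI-style keys
    re.compile(r"ghp_[A-Za-z0-9]{36}"),   # GitHub PATs
    re.compile(r"AKIA[0-9A-Z]{16}"),      # AWS access key IDs
]

# Substrings that mark a hit as a placeholder, not a live credential.
PLACEHOLDER_HINTS = ("example", "your_key", "your-key", "xxx", "***")

def find_leaks(text):
    """Return matches that look like real secrets, skipping placeholders."""
    leaks = []
    for pattern in SECRET_PATTERNS:
        for match in pattern.finditer(text):
            window = text[max(0, match.start() - 20):match.end() + 20].lower()
            if not any(hint in window for hint in PLACEHOLDER_HINTS):
                leaks.append(match.group())
    return leaks
```

The context window is the whole trick: a key-shaped string sitting next to "example" is documentation, not an incident.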

&lt;p&gt;By default it doesn't fail the build on mere rule-coverage warnings, because a 7/12 file isn't "broken" — it's just not thorough. You can flip that with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sisyphusse1-ops/cc-audit@v1&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;fail-on-warning&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Auto-install the baseline
&lt;/h2&gt;

&lt;p&gt;There's also a companion action for the &lt;a href="https://github.com/sisyphusse1-ops/claude-code-pro-pack" rel="noopener noreferrer"&gt;claude-code-pro-pack&lt;/a&gt; itself. If your repo doesn't have a &lt;code&gt;CLAUDE.md&lt;/code&gt; / &lt;code&gt;AGENTS.md&lt;/code&gt; yet, this installs the 12-rule baseline in one step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sisyphusse1-ops/claude-code-pro-pack@v1&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;flavor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;both&lt;/span&gt;            &lt;span class="c1"&gt;# claude | agents | both&lt;/span&gt;
    &lt;span class="na"&gt;install-templates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="c1"&gt;# also copy templates/ and examples/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's polite — skips files that already exist unless you pass &lt;code&gt;overwrite: true&lt;/code&gt;.&lt;/p&gt;
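The skip-unless-overwrite behavior is simple enough to sketch. This is illustrative, not the installer action's actual code:

```python
import shutil
from pathlib import Path

def install_baseline(src_dir, dest_dir,
                     flavors=("CLAUDE.md", "AGENTS.md"), overwrite=False):
    """Copy baseline files, skipping any that already exist unless overwrite."""
    installed, skipped = [], []
    for name in flavors:
        dest = Path(dest_dir) / name
        if dest.exists() and not overwrite:
            skipped.append(name)  # be polite: never clobber an existing file
            continue
        shutil.copyfile(Path(src_dir) / name, dest)
        installed.append(name)
    return installed, skipped
```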

&lt;h2&gt;
  
  
  End-to-end demo
&lt;/h2&gt;

&lt;p&gt;I shipped a demo repo that uses both actions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;→ &lt;a href="https://github.com/sisyphusse1-ops/ccpp-demo" rel="noopener noreferrer"&gt;github.com/sisyphusse1-ops/ccpp-demo&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Check the Actions tab — you'll see real runs installing the pack, then linting it. The install workflow is &lt;code&gt;workflow_dispatch&lt;/code&gt; so you can fork the repo, trigger the install on your fork, and watch the same thing happen on your own files.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why bother
&lt;/h2&gt;

&lt;p&gt;Three reasons I wrote this and why you might want to run it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Behavior drift.&lt;/strong&gt; CLAUDE.md files get edited casually by whoever's on-call for the agent that week. Compliance scores drift down over months. A linter in CI catches it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secret leaks.&lt;/strong&gt; The 492-file scan found zero real leaks, which is great — but the base rate of pasting &lt;code&gt;.env&lt;/code&gt; contents into docs is nonzero across the wider population. A 40 ms check on every PR catches it before it hits the default branch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Onboarding.&lt;/strong&gt; New engineer opens your repo. CI report in the PR summary shows them the 12-rule baseline exists, which rules your file covers, and which it doesn't. The explanation is in the action output, not in a wiki page they won't find.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Install time
&lt;/h2&gt;

&lt;p&gt;Workflow file: 3 lines.&lt;br&gt;&lt;br&gt;
CI overhead per run: 20-30 seconds on &lt;code&gt;ubuntu-latest&lt;/code&gt; (no Docker image pull, just &lt;code&gt;checkout&lt;/code&gt; + Python stdlib).&lt;br&gt;&lt;br&gt;
Token cost: zero.&lt;br&gt;&lt;br&gt;
Cost to break your build: zero if no secrets leaked.&lt;/p&gt;

&lt;h2&gt;
  
  
  Repos
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;cc-audit&lt;/strong&gt; (linter + action) — &lt;a href="https://github.com/sisyphusse1-ops/cc-audit" rel="noopener noreferrer"&gt;github.com/sisyphusse1-ops/cc-audit&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;claude-code-pro-pack&lt;/strong&gt; (baseline rules + installer action) — &lt;a href="https://github.com/sisyphusse1-ops/claude-code-pro-pack" rel="noopener noreferrer"&gt;github.com/sisyphusse1-ops/claude-code-pro-pack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ccpp-demo&lt;/strong&gt; (both in action, end-to-end) — &lt;a href="https://github.com/sisyphusse1-ops/ccpp-demo" rel="noopener noreferrer"&gt;github.com/sisyphusse1-ops/ccpp-demo&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All three MIT.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If this saves you a merge review, or catches a leaked key, let me know. That's the use case I optimized for.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>github</category>
      <category>ai</category>
      <category>devops</category>
      <category>actions</category>
    </item>
    <item>
      <title>I scored 492 public CLAUDE.md files against a 12-rule baseline. Median: 3/12.</title>
      <dc:creator>sisyphusse1-ops</dc:creator>
      <pubDate>Sun, 10 May 2026 23:06:40 +0000</pubDate>
      <link>https://dev.to/sisyphusse1ops/i-scored-92-public-claudemd-files-against-a-12-rule-baseline-median-score-512-2971</link>
      <guid>https://dev.to/sisyphusse1ops/i-scored-92-public-claudemd-files-against-a-12-rule-baseline-median-score-512-2971</guid>
      <description>&lt;p&gt;Last week I wrote a tiny Python linter — &lt;a href="https://github.com/sisyphusse1-ops/cc-audit" rel="noopener noreferrer"&gt;cc-audit&lt;/a&gt; — that scores a &lt;code&gt;CLAUDE.md&lt;/code&gt; or &lt;code&gt;AGENTS.md&lt;/code&gt; file against twelve behavior rules for AI coding agents. I ran it against 492 real public CLAUDE.md files pulled from GitHub code search.&lt;/p&gt;

&lt;p&gt;Here's what the ecosystem actually looks like.&lt;/p&gt;

&lt;h2&gt;
  
  
  Methodology
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Pulled the first 500 public &lt;code&gt;CLAUDE.md&lt;/code&gt; filename matches from GitHub code search&lt;/li&gt;
&lt;li&gt;492 were fetchable at scan time (8 had been moved, renamed, or gated behind forks)&lt;/li&gt;
&lt;li&gt;Each file was scored on 12 behavior rules via keyword-signal matching (does the file address each rule?)&lt;/li&gt;
&lt;li&gt;Separately scanned for leaked secrets (API keys, database URLs, private keys) with placeholder-aware filtering&lt;/li&gt;
&lt;/ul&gt;
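Keyword-signal matching is as blunt as it sounds. A toy version with invented signal lists (the real cc-audit lists differ):

```python
# Illustrative signal lists -- the real cc-audit lists are longer.
RULE_SIGNALS = {
    "read_adjacent_code": ("adjacent", "existing code", "nearby"),
    "run_tests": ("run tests", "test suite", "pytest"),
    "scoped_edits": ("out of scope", "scope", "unrelated files"),
}

def score(text):
    """Count rules the file addresses: any signal keyword counts as coverage."""
    lowered = text.lower()
    covered = [rule for rule, signals in RULE_SIGNALS.items()
               if any(sig in lowered for sig in signals)]
    return len(covered), covered
```

A hit means the file mentions the concern at all, which is why the scores below measure coverage, not quality.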

&lt;p&gt;The 12 rules come from the &lt;a href="https://github.com/sisyphusse1-ops/claude-code-pro-pack" rel="noopener noreferrer"&gt;claude-code-pro-pack&lt;/a&gt; baseline (Karpathy's original 4 + 8 more covering agent-orchestration failure modes):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read adjacent / existing code before writing new code&lt;/li&gt;
&lt;li&gt;Don't invent APIs, imports, or file paths&lt;/li&gt;
&lt;li&gt;Surface partial success — never silent-fail&lt;/li&gt;
&lt;li&gt;Cap per-task token budget; stop and ask when hit&lt;/li&gt;
&lt;li&gt;Match the project's existing style and conventions&lt;/li&gt;
&lt;li&gt;One task per run; don't bundle unrelated changes&lt;/li&gt;
&lt;li&gt;Surface conflicting patterns instead of averaging them&lt;/li&gt;
&lt;li&gt;Run tests before declaring done&lt;/li&gt;
&lt;li&gt;Don't edit out of scope without saying so&lt;/li&gt;
&lt;li&gt;Summarize every tool call's effect in one line&lt;/li&gt;
&lt;li&gt;Stop and ask if stuck or ambiguous&lt;/li&gt;
&lt;li&gt;Visible fail states — never hide errors&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Files scanned:&lt;/strong&gt; 492&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Size:&lt;/strong&gt; min 11 B, median 3.9 KB, mean 7.5 KB, max 167 KB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance:&lt;/strong&gt; median 3/12, mean 3.54/12, max 10/12&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Perfect (12/12) scores:&lt;/strong&gt; 0&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero-score files:&lt;/strong&gt; 41 (8%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Top quartile (≥9/12):&lt;/strong&gt; 11 files (2.2%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Files with leaked production secrets:&lt;/strong&gt; 0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The one-sentence version: the median CLAUDE.md covers a quarter of the behavior rules that matter. The top 2% cover three-quarters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Most-missed rules (out of 492)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Rule&lt;/th&gt;
&lt;th&gt;Files missing&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Don't edit out of scope&lt;/td&gt;
&lt;td&gt;482&lt;/td&gt;
&lt;td&gt;98%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Summarize tool calls&lt;/td&gt;
&lt;td&gt;464&lt;/td&gt;
&lt;td&gt;94%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;Visible fail states&lt;/td&gt;
&lt;td&gt;448&lt;/td&gt;
&lt;td&gt;91%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Read adjacent code&lt;/td&gt;
&lt;td&gt;446&lt;/td&gt;
&lt;td&gt;91%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Surface partial success&lt;/td&gt;
&lt;td&gt;414&lt;/td&gt;
&lt;td&gt;84%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Don't invent APIs&lt;/td&gt;
&lt;td&gt;383&lt;/td&gt;
&lt;td&gt;78%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;One task per run&lt;/td&gt;
&lt;td&gt;361&lt;/td&gt;
&lt;td&gt;73%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Token budget / stop-and-ask&lt;/td&gt;
&lt;td&gt;350&lt;/td&gt;
&lt;td&gt;71%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;Stop and ask if stuck&lt;/td&gt;
&lt;td&gt;272&lt;/td&gt;
&lt;td&gt;55%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Surface pattern conflicts&lt;/td&gt;
&lt;td&gt;252&lt;/td&gt;
&lt;td&gt;51%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Match project style&lt;/td&gt;
&lt;td&gt;222&lt;/td&gt;
&lt;td&gt;45%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Run tests&lt;/td&gt;
&lt;td&gt;66&lt;/td&gt;
&lt;td&gt;13%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Most-hit rules
&lt;/h2&gt;

&lt;p&gt;The one rule nearly everyone covers is &lt;strong&gt;run tests&lt;/strong&gt; — only 13% missed it. That tracks. Every CLAUDE.md template floating around for the last year includes some version of "run the tests."&lt;/p&gt;

&lt;p&gt;The second-most-covered is &lt;strong&gt;match project style&lt;/strong&gt; (55% coverage), mostly because it's also the rule people quote from Karpathy's original.&lt;/p&gt;

&lt;p&gt;Everything else sits in the "some files remember, most don't" zone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the top misses cost you real time
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Rule 9 (don't edit out of scope) — missed by 98% of files.&lt;/strong&gt; Without this, an agent "helpfully" reformats your whole file while fixing a one-line bug. Resulting PR: 500 lines of noise wrapping 3 lines of fix. Reviewers drown; real changes get lost. Costs a single sentence to add.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule 10 (summarize tool calls) — missed by 94%.&lt;/strong&gt; Without this, you get verbose explanations of "what I'm about to do" and very little "what I actually did." In a long session you lose the thread. One sentence: &lt;em&gt;"After every tool call, write one line: what you changed and which file."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule 12 (visible fail states) — missed by 91%.&lt;/strong&gt; This is the "migration completed successfully" problem in a different skin — the agent hides a failure in a paragraph of success prose, or just doesn't surface the stack trace. Fix: &lt;em&gt;"When anything fails, quote the error verbatim and stop. Never paraphrase."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule 1 (read adjacent code first) — missed by 91%.&lt;/strong&gt; Top cause of duplicate functions and inconsistent patches. An agent that doesn't read adjacent code will happily implement a utility that already exists three lines away, or patch one half of a codebase in a style that conflicts with the other half.&lt;/p&gt;

&lt;p&gt;Rules 9, 10, 12, and 1 are each one sentence. Adding all four moves a median file from 3/12 to 7/12.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the zero-score files looked like
&lt;/h2&gt;

&lt;p&gt;41 files scored 0/12. They split into two shapes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A single paragraph.&lt;/strong&gt; Often something like "This project uses Python. Be careful." — and that's the entire file. A project description wearing a CLAUDE.md name tag.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A README dump.&lt;/strong&gt; The entire &lt;code&gt;README.md&lt;/code&gt; copy-pasted in verbatim with no behavior rules at all. Good project context, zero agent guidance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Neither shape is worthless for onboarding. But neither does anything to reduce agent failure modes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the top quartile did differently
&lt;/h2&gt;

&lt;p&gt;The 11 files scoring ≥9/12 shared four patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explicit tool-calling preferences (&lt;em&gt;"use &lt;code&gt;rg&lt;/code&gt; not &lt;code&gt;grep&lt;/code&gt;"&lt;/em&gt;, &lt;em&gt;"use &lt;code&gt;fd&lt;/code&gt; not &lt;code&gt;find&lt;/code&gt;"&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Named failure modes to avoid (&lt;em&gt;"don't claim migration success if rows were skipped"&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;A scoped-edits rule (&lt;em&gt;"don't touch files outside the current task without asking first"&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;A style-matching rule (&lt;em&gt;"check 3 nearby files before choosing formatting"&lt;/em&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those four additions alone explain most of the gap between median and top quartile.&lt;/p&gt;

&lt;h2&gt;
  
  
  What about leaked secrets?
&lt;/h2&gt;

&lt;p&gt;I was genuinely curious whether people paste real API keys into CLAUDE.md files. They mostly don't.&lt;/p&gt;

&lt;p&gt;Of 492 files scanned:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;0 real leaked secrets&lt;/strong&gt; matching strict patterns (OpenAI keys, Anthropic keys, Google API keys, AWS access keys, GitHub tokens, Stripe live keys)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4 postgres connection strings&lt;/strong&gt; that looked like secrets at first match — all of them turned out to be localhost + dummy users (&lt;code&gt;user:password@localhost&lt;/code&gt;), i.e. example config that would only "work" against someone's local dev box&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1 literal placeholder&lt;/strong&gt; (&lt;code&gt;postgresql://USER:***@HOST/DATABASE&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The placeholder filter in the scanner caught most &lt;code&gt;sk-example&lt;/code&gt;, &lt;code&gt;&amp;lt;YOUR_KEY&amp;gt;&lt;/code&gt;, and &lt;code&gt;***&lt;/code&gt;-style examples. Whatever paranoia you had about CLAUDE.md being a secret-leak vector: this data says it isn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to actually do
&lt;/h2&gt;

&lt;p&gt;If you maintain a CLAUDE.md or AGENTS.md, these are the highest-leverage edits you can make in ninety seconds:&lt;/p&gt;

&lt;p&gt;Add these four sentences anywhere in the file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- When fixing a bug, don't edit files outside the immediate scope unless you say so first.
- After every tool call, write one line: what you changed and which file.
- If anything fails, quote the error verbatim and stop. Never paraphrase failures.
- Before writing new code, read the adjacent 20–40 lines of existing code in the same file.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the ninety-second edit isn't enough context, the full 12-rule baseline as a drop-in:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;→ &lt;a href="https://github.com/sisyphusse1-ops/claude-code-pro-pack" rel="noopener noreferrer"&gt;github.com/sisyphusse1-ops/claude-code-pro-pack&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want to score your existing one:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;→ &lt;a href="https://github.com/sisyphusse1-ops/cc-audit" rel="noopener noreferrer"&gt;github.com/sisyphusse1-ops/cc-audit&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/sisyphusse1-ops/cc-audit/main/cc_audit.py &lt;span class="nt"&gt;-o&lt;/span&gt; cc_audit.py
python3 cc_audit.py CLAUDE.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One file, stdlib only, 40 ms on a 10 KB file.&lt;/p&gt;

&lt;h2&gt;
  
  
  Methodology caveats
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The rule check is a keyword-signal pass. It checks whether the file mentions each concern, not whether the wording is good. A file that mentions "tests" and "scope" gets credit for those rules even if the phrasing would embarrass you.&lt;/li&gt;
&lt;li&gt;The 3/12 median is a floor for coverage, not a ceiling on quality.&lt;/li&gt;
&lt;li&gt;A thoughtful 6/12 file easily beats a formulaic 10/12 one.&lt;/li&gt;
&lt;li&gt;I deliberately did not score for: accurate project facts, prose quality, tone, or structure — only behavior-rule coverage.&lt;/li&gt;
&lt;li&gt;GitHub code search returns fewer than the full 23,484 indexed CLAUDE.md files; a different 492 would shift the numbers a little but not the shape.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Raw data
&lt;/h2&gt;

&lt;p&gt;The full per-file results are in &lt;a href="https://github.com/sisyphusse1-ops/cc-audit/blob/main/data/scan-500.json" rel="noopener noreferrer"&gt;the scan-500 JSON&lt;/a&gt; on the cc-audit repo. Each entry has repo name, file size, and compliance score.&lt;/p&gt;
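If you want to recompute the headline stats from that JSON yourself, something like this works. The per-entry field name is a guess; check the file for the actual key:

```python
import json
import statistics

def summarize(scan_path):
    """Recompute headline stats from the per-file scan results."""
    entries = json.load(open(scan_path))
    scores = [e["score"] for e in entries]  # field name assumed
    return {
        "files": len(scores),
        "median": statistics.median(scores),
        "mean": round(statistics.mean(scores), 2),
        "zero": sum(1 for s in scores if s == 0),
    }
```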




&lt;p&gt;&lt;em&gt;If this landed, send it to the one person you know who writes behavior files for AI coding agents. There's a decent chance their current file scores 3/12 and four extra sentences would push it to 7/12.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>webdev</category>
      <category>devops</category>
    </item>
    <item>
      <title>I built a coding agent that runs on Gemma 4 — here's what 2B parameters can actually do</title>
      <dc:creator>sisyphusse1-ops</dc:creator>
      <pubDate>Sun, 10 May 2026 22:56:37 +0000</pubDate>
      <link>https://dev.to/sisyphusse1ops/i-built-a-coding-agent-that-runs-on-gemma-4-heres-what-2b-parameters-can-actually-do-a80</link>
      <guid>https://dev.to/sisyphusse1ops/i-built-a-coding-agent-that-runs-on-gemma-4-heres-what-2b-parameters-can-actually-do-a80</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/sisyphusse1-ops/gemma-coder" rel="noopener noreferrer"&gt;&lt;strong&gt;gemma-coder&lt;/strong&gt;&lt;/a&gt; — a single-file Python CLI that turns Gemma 4 into an agentic coding assistant. It reads your &lt;code&gt;CLAUDE.md&lt;/code&gt; or &lt;code&gt;AGENTS.md&lt;/code&gt; rulebook, uses a model-agnostic XML tool protocol, and ships the 12-rule &lt;a href="https://github.com/sisyphusse1-ops/claude-code-pro-pack" rel="noopener noreferrer"&gt;claude-code-pro-pack&lt;/a&gt; baseline as the default behavior file.&lt;/p&gt;

&lt;p&gt;The interesting part isn't the loop — it's that the whole thing works against &lt;strong&gt;Gemma 4 E2B (2 billion effective parameters)&lt;/strong&gt; running locally. The same file runs against 31B in the cloud for power users, E4B on a phone, E2B on a Raspberry Pi 5. Same protocol, same rulebook, different scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Gemma 4 E2B specifically
&lt;/h2&gt;

&lt;p&gt;The obvious submission path is to reach for 31B and flex. I went the other way. Three reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. E2B is the one that demonstrates the Gemma 4 story.&lt;/strong&gt; Running &lt;em&gt;server-grade&lt;/em&gt; models in the cloud is boring — OpenAI and Anthropic do that better. Running a 2B-effective model on hardware that sits in your living room is the unique capability unlock Google shipped this month. If a submission doesn't exercise that, it's a different model's submission wearing Gemma's name.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. It forces better engineering.&lt;/strong&gt; A 31B model tolerates sloppy prompts. E2B doesn't. Every line of the system prompt has to earn its place. That's a better stress test for the agent architecture, and the fixes you make for E2B make the 31B path faster and cheaper too.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Privacy and offline.&lt;/strong&gt; Coding agents handle codebases with credentials, client IP, unreleased features. An agent that runs fully local is the only agent my legal team hasn't twitched at. E2B makes "fully local" practical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Smallest clean run I captured:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;gemma-coder &lt;span class="s2"&gt;"Add a one-line docstring to every function in src.py"&lt;/span&gt;

━━━ step 1/12 ━━━
Read src.py → identify fns.
→ tool: read_file&lt;span class="o"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;"path"&lt;/span&gt;: &lt;span class="s2"&gt;"src.py"&lt;/span&gt;&lt;span class="o"&gt;})&lt;/span&gt;
← &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"content"&lt;/span&gt;: &lt;span class="s2"&gt;"def add(a, b):&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;    return a + b&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;def sub(a, b):&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;    return a - b&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;, ...&lt;span class="o"&gt;}&lt;/span&gt;

━━━ step 2/12 ━━━
Add docstrings → rewrite src.py.
→ tool: write_file&lt;span class="o"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;"path"&lt;/span&gt;: &lt;span class="s2"&gt;"src.py"&lt;/span&gt;, &lt;span class="s2"&gt;"content"&lt;/span&gt;: &lt;span class="s2"&gt;"def add(...):&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;    &lt;/span&gt;&lt;span class="se"&gt;\"\"\"&lt;/span&gt;&lt;span class="s2"&gt;Add two numbers.&lt;/span&gt;&lt;span class="se"&gt;\"\"\"&lt;/span&gt;&lt;span class="s2"&gt;..."&lt;/span&gt;&lt;span class="o"&gt;})&lt;/span&gt;
← &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"bytes"&lt;/span&gt;: 124&lt;span class="o"&gt;}&lt;/span&gt;

━━━ step 3/12 ━━━
Done.
→ tool: &lt;span class="k"&gt;done&lt;/span&gt;&lt;span class="o"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;"summary"&lt;/span&gt;: &lt;span class="s2"&gt;"Added one-line docstrings to add() and sub() in src.py"&lt;/span&gt;&lt;span class="o"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three steps. No re-reads, no wasted calls. That's what "narrow tool scope + rulebook baseline" buys you on a 2B model.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Tool protocol
&lt;/h3&gt;

&lt;p&gt;Gemma 4 doesn't have native OpenAI-style function calling. Instead of fighting that, I treated it as a feature: the CLI uses a simple XML-framed JSON contract that every capable LLM can follow.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;&amp;lt;tool&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"read_file"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"src/main.py"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;&amp;lt;/tool&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Results come back as &lt;code&gt;&amp;lt;tool_result&amp;gt;...&amp;lt;/tool_result&amp;gt;&lt;/code&gt; in the next user turn. Six tools total: &lt;code&gt;read_file&lt;/code&gt;, &lt;code&gt;write_file&lt;/code&gt;, &lt;code&gt;search&lt;/code&gt;, &lt;code&gt;run&lt;/code&gt;, &lt;code&gt;patch&lt;/code&gt;, &lt;code&gt;done&lt;/code&gt;. That's it.&lt;/p&gt;

&lt;p&gt;Benefit: the same loop runs against &lt;strong&gt;any&lt;/strong&gt; LLM that can obey the format. I tested the same file against Gemma 4 31B, Qwen 2.5 Coder 32B, and Llama 3.3. All three worked. That portability is a byproduct of respecting Gemma 4's actual capabilities instead of bolting on an abstraction.&lt;/p&gt;
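Extracting the framed JSON takes one regex and &#96;json.loads&#96;. A sketch, not gemma-coder's actual parser (the angle bracket is spelled &#96;chr(60)&#96; purely as a string-building choice):

```python
import json
import re

LT = chr(60)  # literal left angle bracket
TOOL_RE = re.compile(LT + r"tool>\s*(\{.*?\})\s*" + LT + r"/tool>", re.S)

def parse_tool_call(reply):
    """Pull the first framed JSON tool call out of a model reply, or None."""
    match = TOOL_RE.search(reply)
    if match is None:
        return None  # the model replied with prose, not a tool call
    call = json.loads(match.group(1))
    return call["name"], call.get("args", {})
```

Because the frame is plain text, any model that can copy a format can drive the loop; that is where the cross-model portability comes from.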

&lt;h3&gt;
  
  
  Rulebook-first system prompt
&lt;/h3&gt;

&lt;p&gt;The system prompt is short by design: tool schema + the project's &lt;code&gt;CLAUDE.md&lt;/code&gt; (or &lt;code&gt;AGENTS.md&lt;/code&gt;) dropped in verbatim. No framework prose, no chain-of-thought incantations, no "you are a helpful assistant."&lt;/p&gt;
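Assembling that prompt is a few lines. A sketch under stated assumptions: the schema wording is invented here, and the CLAUDE.md-before-AGENTS.md discovery order is a guess:

```python
from pathlib import Path

# Invented schema text for illustration -- the real wording differs.
TOOL_SCHEMA = (
    "You have six tools: read_file, write_file, search, run, patch, done. "
    "Reply with exactly one framed JSON tool call per turn."
)

def build_system_prompt(repo_root):
    """Tool schema plus the project rulebook verbatim; no framework prose."""
    for name in ("CLAUDE.md", "AGENTS.md"):  # discovery order assumed
        rulebook = Path(repo_root) / name
        if rulebook.exists():
            return TOOL_SCHEMA + "\n\n" + rulebook.read_text()
    return TOOL_SCHEMA  # no rulebook found: schema alone
```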

&lt;p&gt;The 12-rule pack that ships as the default rulebook closes the four most common Gemma 4 failure modes I saw in testing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Token spirals&lt;/strong&gt; — rule 4 caps per-task token budget so the model doesn't loop on the same 4KB of context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silent partial failures&lt;/strong&gt; — rule 12 requires visible fail states; no more "migration completed" when it skipped rows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two-pattern pollution&lt;/strong&gt; — rule 7 forces the agent to surface conflicts between codebase patterns instead of averaging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adjacent-code blindness&lt;/strong&gt; — rule 8 mandates reading surrounding code before writing; fixes duplicate-function drift&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't abstract. Each rule earned its place from a specific failure in actual runs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Retry-with-backoff
&lt;/h3&gt;

&lt;p&gt;Cloud gateways can return transient 5xx mid-session. &lt;code&gt;call_openrouter&lt;/code&gt; wraps the HTTP call with 3-attempt exponential backoff (3s / 9s / 27s). Not glamorous, but it's the difference between a flaky demo and a shippable tool.&lt;/p&gt;
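&lt;p&gt;The pattern is small enough to show whole. A sketch of the backoff wrapper, assuming urllib-style errors; the real &lt;code&gt;call_openrouter&lt;/code&gt; may differ in details:&lt;/p&gt;

```python
import time
import urllib.error

def with_retries(fn, attempts=3, base_delay=3):
    """Retry fn with exponential backoff on transient 5xx errors.

    Sketch of the pattern described above, not gemma-coder's exact code.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except urllib.error.HTTPError as err:
            retryable = err.code >= 500          # only server-side errors
            final = attempt == attempts - 1      # last attempt: give up
            if not retryable or final:
                raise
            time.sleep(base_delay * 3 ** attempt)  # 3s, 9s, 27s
```

&lt;p&gt;Only 5xx responses retry; a 4xx (bad key, malformed request) propagates immediately, because waiting 27 seconds won't fix your API key.&lt;/p&gt;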

&lt;h2&gt;
  
  
  What Gemma 4 E2B actually can and can't do
&lt;/h2&gt;

&lt;p&gt;What it handles cleanly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rename a function across 2-3 files&lt;/li&gt;
&lt;li&gt;Add docstrings and type hints&lt;/li&gt;
&lt;li&gt;Fix a failing unit test when the fix is local&lt;/li&gt;
&lt;li&gt;Draft a README section from existing code&lt;/li&gt;
&lt;li&gt;Apply a lint-style pattern fix consistently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What makes it struggle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-file refactors with cross-file dependency tracking (context pressure kills it around 50k tokens)&lt;/li&gt;
&lt;li&gt;Novel architecture decisions (it's 2B params, not 100B — manage expectations)&lt;/li&gt;
&lt;li&gt;Long-running debugging where each step depends on the last&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For the "boring 80%" of coding agent work, E2B is remarkable. For the exciting 20%, use a bigger model. Now there's a CLI that lets you pick.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# OpenRouter free tier, no local setup&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/sisyphusse1-ops/gemma-coder/main/gemma_coder.py &lt;span class="nt"&gt;-o&lt;/span&gt; gemma_coder.py
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENROUTER_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-or-...
python3 gemma_coder.py &lt;span class="s2"&gt;"your task here"&lt;/span&gt;

&lt;span class="c"&gt;# or local Ollama&lt;/span&gt;
ollama pull gemma4:e2b
python3 gemma_coder.py &lt;span class="nt"&gt;--provider&lt;/span&gt; ollama &lt;span class="nt"&gt;--model&lt;/span&gt; gemma4:e2b &lt;span class="s2"&gt;"your task here"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One file. Python stdlib only. No framework.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;Repo: &lt;strong&gt;github.com/sisyphusse1-ops/gemma-coder&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Companion projects referenced:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/sisyphusse1-ops/claude-code-pro-pack" rel="noopener noreferrer"&gt;claude-code-pro-pack&lt;/a&gt; — the 12-rule baseline it loads&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/sisyphusse1-ops/cc-audit" rel="noopener noreferrer"&gt;cc-audit&lt;/a&gt; — lints any CLAUDE.md against those rules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All three are MIT.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;I chose &lt;strong&gt;Gemma 4 E2B&lt;/strong&gt; because the submission is fundamentally about answering: &lt;em&gt;can a 2B-effective-parameter model actually drive a useful coding agent?&lt;/em&gt; Using 31B would have sidestepped the question. The value of the project is precisely that it exercises the smallest Gemma 4 variant and finds the envelope where it succeeds.&lt;/p&gt;

&lt;p&gt;What E2B unlocked:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Runs on a Raspberry Pi 5.&lt;/strong&gt; 5W of power, $75 of hardware, no cloud dependency, no API keys, no rate limits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy by default.&lt;/strong&gt; Credentials, client code, unreleased features stay on the machine. "Fully local" stops being a wish-list item.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forces rulebook discipline.&lt;/strong&gt; The constraint of a small model made every part of the system prompt earn its place. Result: a cleaner tool protocol and a rulebook that transfers directly to larger models too.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Model selection was not "which is biggest." It was "which Gemma 4 variant makes the strongest argument for the unique capability the family ships."&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Thanks for reading. If you try it, open an issue with your model + task + result — I'm collecting real-world envelope data for a follow-up post on where each Gemma 4 variant tops out.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>ai</category>
    </item>
    <item>
      <title>I read 31 pages of Anthropic prompting guidance so you don't have to — here's what actually changes with Claude 4.7</title>
      <dc:creator>sisyphusse1-ops</dc:creator>
      <pubDate>Sun, 10 May 2026 22:56:10 +0000</pubDate>
      <link>https://dev.to/sisyphusse1ops/i-read-31-pages-of-anthropic-prompting-guidance-so-you-dont-have-to-heres-what-actually-changes-1kd9</link>
      <guid>https://dev.to/sisyphusse1ops/i-read-31-pages-of-anthropic-prompting-guidance-so-you-dont-have-to-heres-what-actually-changes-1kd9</guid>
      <description>&lt;h2&gt;
  
  
  The short version
&lt;/h2&gt;

&lt;p&gt;Claude Opus 4.7 follows prompts &lt;strong&gt;literally&lt;/strong&gt;. Generic 4.6-era prompts like "review this contract" or "summarize this report" underperform now, not because the model got worse but because 4.7 stopped guessing at unstated structure.&lt;/p&gt;

&lt;p&gt;Six shifts you need to internalize, plus a rewrite checklist you can apply to any existing prompt in under a minute.&lt;/p&gt;

&lt;h2&gt;
  
  
  The six shifts
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Name every output. Name every boundary.
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;4.6-era:&lt;/strong&gt; &lt;code&gt;Review this contract.&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4.7-ready:&lt;/strong&gt; &lt;code&gt;Review this contract. Flag risks per clause. Rate severity 1-5. Suggest one rewrite per risky clause. Return as a table with columns: Clause | Risk | Severity | Rewrite.&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;4.7 does exactly what the sentence says. If you don't name the columns, you get whatever columns it picks. If you don't cap severity levels, you get adjective soup.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Length scales with input now. Cap it explicitly.
&lt;/h3&gt;

&lt;p&gt;Long input plus the word &lt;code&gt;summarize&lt;/code&gt; used to give you a roughly fixed-length summary. Now it gives you a long summary. Because the input was long.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Old:&lt;/strong&gt; &lt;code&gt;Summarize this report.&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New:&lt;/strong&gt; &lt;code&gt;Summarize this report in exactly 5 bullets. Each bullet under 15 words. First word of each bullet is an action verb.&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Negative instructions don't stick. Say what TO do.
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;Don't use jargon&lt;/code&gt; is still in the context. 4.7 just doesn't reliably change behavior from it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Old:&lt;/strong&gt; &lt;code&gt;Don't use jargon. Don't sound like a marketer.&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New:&lt;/strong&gt; &lt;code&gt;Write in plain English a 16-year-old could read aloud. Use short concrete words. Replace "leverage" with "use". Replace "scalable" with "works at any size".&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rule of thumb: every negative instruction rewrites as a positive one plus a concrete swap example.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Action verbs ship specific artifacts.
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;Can you help me with the email?&lt;/code&gt; produces a helpful-but-vague paragraph. Action verbs produce a draft.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Old:&lt;/strong&gt; &lt;code&gt;Can you help me with the email?&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Open Gmail. Find &amp;lt;contact&amp;gt; and read our last thread.
  Draft the reply email. Final draft. Send-ready.
  Goal: book a 30-min meeting by Friday.
  Length: under 90 words.
  Tone: confident, casual, specific.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each verb at the top (&lt;code&gt;Open&lt;/code&gt;, &lt;code&gt;Find&lt;/code&gt;, &lt;code&gt;Draft&lt;/code&gt;) commits 4.7 to producing a shippable artifact, not discussing one.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Fewer tool calls, more reasoning between. Ask for aggressive search if you need it.
&lt;/h3&gt;

&lt;p&gt;4.7 calls tools less aggressively than 4.6 did. It reasons more between calls. Usually this is a quality lift. Sometimes you explicitly want broad search.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Use web search aggressively. Verify every claim with at least 2 sources before answering.&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Colder default tone. Name the warmth if you want it back.
&lt;/h3&gt;

&lt;p&gt;4.7 dropped the "great question!" energy and most emojis. If your product brand needs warmer voice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Use a warm, conversational tone. Acknowledge my framing before answering.&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even better — paste 2-3 reference sentences in the voice you want. 4.7 matches rhythm well.&lt;/p&gt;

&lt;h2&gt;
  
  
  The one phrase that keeps delivering
&lt;/h2&gt;

&lt;p&gt;Anthropic's own 4.7 doc includes this line, and it has become the single highest-leverage addition you can staple onto any creative or open-ended prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Go beyond the basics. Polish like it's a real client deliverable.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Paired with a section-by-section brief, it consistently pushes 4.7 past the literal-minimum output. I've tested this on landing pages, PR documents, legal memos, and code refactors. Same pattern, same lift.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full landing page example:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build a landing page for my AI consultancy.

Sections (in order):
- Hero (headline + subheadline + CTA)
- Logo bar (6 client placeholders)
- 3 case-study cards (problem / what I did / result)
- Service blocks (4)
- Testimonial carousel (3 quotes)
- About (180-word bio + headshot placeholder)
- Newsletter signup
- Footer

Style: editorial, serif headlines, sans-serif body, generous whitespace.
Animations: subtle on scroll. No purple gradients.

Go beyond the basics. Polish like it's a real client deliverable.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The rewrite checklist
&lt;/h2&gt;

&lt;p&gt;Run any prompt through this before sending it to 4.7:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Every output is &lt;strong&gt;named&lt;/strong&gt; (format, columns, order, length)&lt;/li&gt;
&lt;li&gt;[ ] Every length is &lt;strong&gt;capped&lt;/strong&gt; (words, bullets, rows)&lt;/li&gt;
&lt;li&gt;[ ] Zero negative instructions — every "don't / no / avoid" rewritten as "do X with example"&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Action verbs first&lt;/strong&gt; (Open, Draft, Build, Flag, Summarize — not "can you help…")&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Tool-use preference stated&lt;/strong&gt; if it matters (&lt;code&gt;Use web search aggressively&lt;/code&gt; or &lt;code&gt;Answer from training, no tools&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Tone named&lt;/strong&gt;, with 2-3 reference sentences if you want warmth back&lt;/li&gt;
&lt;li&gt;[ ] For creative work, the quality lift phrase is appended&lt;/li&gt;
&lt;/ul&gt;
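&lt;p&gt;If you build prompts programmatically, the checklist maps to named parts. A hypothetical helper for illustration; the field names mirror the checklist above, not any Anthropic API:&lt;/p&gt;

```python
def build_prompt(task, outputs, caps, tone, tool_pref="", polish=False):
    """Assemble a 4.7-ready prompt from the checklist's named parts.

    Hypothetical sketch: every output named, every length capped,
    tone stated, quality-lift phrase appended for creative work.
    """
    parts = [task, "Output: " + outputs, "Limits: " + caps, "Tone: " + tone]
    if tool_pref:
        parts.append(tool_pref)
    if polish:
        parts.append("Go beyond the basics. Polish like it's a real client deliverable.")
    return "\n".join(parts)
```

&lt;p&gt;Forcing yourself through named fields is the point: a prompt that can't fill them in is a prompt 4.7 will interpret literally and minimally.&lt;/p&gt;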

&lt;h2&gt;
  
  
  One place the pattern helps the most: agent behavior files
&lt;/h2&gt;

&lt;p&gt;If you use Claude Code, Codex, or Cursor with a &lt;code&gt;CLAUDE.md&lt;/code&gt; or &lt;code&gt;AGENTS.md&lt;/code&gt; in your project root, the same rules apply to those files. Negative instructions ("don't use jargon", "don't hallucinate") age poorly. Rewriting them as positive imperatives with concrete examples measurably improves compliance.&lt;/p&gt;

&lt;p&gt;I bundled a 12-rule &lt;code&gt;CLAUDE.md&lt;/code&gt; template (Karpathy's original 4 + 8 more covering agent-orchestration failure modes) plus a few working skills. It's a drop-in. Free, MIT:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;→ github.com/sisyphusse1-ops/claude-code-pro-pack&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And a tiny Python linter that scores any existing &lt;code&gt;CLAUDE.md&lt;/code&gt; against the 12 rules and flags leaked secrets:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;→ github.com/sisyphusse1-ops/cc-audit&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Both are single-commit additions to your repo. No install, no framework.&lt;/p&gt;
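&lt;p&gt;The secret-flagging half is worth stealing even if you skip the rules. A sketch of the idea using a few well-known key prefixes; these patterns are illustrative, not cc-audit's actual rule set:&lt;/p&gt;

```python
import re

# Illustrative patterns only: a couple of recognizable key-prefix shapes.
SECRET_PATTERNS = [
    re.compile(r"sk-ant-[A-Za-z0-9_-]{20,}"),  # Anthropic-style keys
    re.compile(r"sk-or-[A-Za-z0-9_-]{20,}"),   # OpenRouter-style keys
    re.compile(r"AKIA[0-9A-Z]{16}"),           # AWS access key IDs
]

def flag_secrets(text):
    """Return (line_number, match) pairs for likely leaked keys."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for pat in SECRET_PATTERNS:
            m = pat.search(line)
            if m:
                hits.append((lineno, m.group(0)))
    return hits
```

&lt;p&gt;Anything this flags in a behavior file is a hard fail by definition: there is no legitimate reason for a live credential to sit in &lt;code&gt;CLAUDE.md&lt;/code&gt;.&lt;/p&gt;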

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Anthropic's 31-page Claude 4.7 prompting guide (PDF, official)&lt;/li&gt;
&lt;li&gt;Ruben Hassid's digest at ruben.substack.com/p/prompt-47&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;If this landed, share with one person who's still prompting 4.7 like it's 4.6. That's the thing that actually helps the work.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>anthropic</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
