<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: dualform</title>
    <description>The latest articles on DEV Community by dualform (@dualform).</description>
    <link>https://dev.to/dualform</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3983367%2Fe1135311-8f4c-4c62-9e4e-138ae7335d54.png</url>
      <title>DEV Community: dualform</title>
      <link>https://dev.to/dualform</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dualform"/>
    <language>en</language>
    <item>
      <title>Make your AI code review refuse to fake a PASS</title>
      <dc:creator>dualform</dc:creator>
      <pubDate>Sun, 14 Jun 2026 04:22:29 +0000</pubDate>
      <link>https://dev.to/dualform/make-your-ai-code-review-refuse-to-fake-a-pass-1e98</link>
      <guid>https://dev.to/dualform/make-your-ai-code-review-refuse-to-fake-a-pass-1e98</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4cuobh24fxpt8ap39k0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4cuobh24fxpt8ap39k0.png" alt="review-audit on a changed file" width="799" height="511"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When you ask a coding agent to "review this change," you often get a confident &lt;strong&gt;PASS&lt;/strong&gt; — one that skipped axes, cited no evidence, and never ran the tests. A green light you can't trust is worse than no review: it tells you to ship.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;review-audit&lt;/code&gt; is a small Claude Code skill built around one rule: &lt;strong&gt;an axis is "audited" only when it can show evidence.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;A single, read-only pass over your change across six axes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;correctness&lt;/strong&gt; — boundary / null / type / timezone mistakes, with a reproducing input where possible&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;wiring (anti-Potemkin)&lt;/strong&gt; — new code that's defined but never imported or called is &lt;em&gt;dead&lt;/em&gt;, not done; it greps the call sites and counts references&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;security&lt;/strong&gt; — secrets, command/SQL injection, unsafe eval or deserialize&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;test efficacy&lt;/strong&gt; — tests that would pass even against an unimplemented stub (tautologies, no asserts, happy-path only)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;spec compliance&lt;/strong&gt; — when a spec exists, it checks the acceptance criteria&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;regression&lt;/strong&gt; — it actually runs the tests/build and reports the exit code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For each axis it writes down &lt;em&gt;how&lt;/em&gt; it checked: a &lt;code&gt;file:line&lt;/code&gt;, a grep result, a command and its exit code. "I didn't check this" is a first-class output, not a silent gap. And &lt;strong&gt;an unexamined axis cannot be part of a PASS.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Read-only, by design
&lt;/h2&gt;

&lt;p&gt;It proposes fixes but does not apply them. A before/after checksum confirms your tree was not touched — the audit can't quietly "fix" something and then call it reviewed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why evidence matters
&lt;/h2&gt;

&lt;p&gt;The failure mode of AI review isn't being wrong — it's being &lt;em&gt;plausibly&lt;/em&gt; right. "Looks fine" reads like a pass. So the skill won't PASS regression or wiring on "looks fine": regression needs a real run with an exit code; wiring needs concrete grep or &lt;code&gt;file:line&lt;/code&gt;. No evidence, no PASS on that axis.&lt;/p&gt;

&lt;p&gt;It runs in the calling agent's own context — no sub-agent fan-out — so it stays cheap enough to run on every change. When one pass genuinely isn't enough (a release gate, a high-risk change), it tells you, in its own output, to escalate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/dualform-labs/review-audit.git
&lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; review-audit/skills/review-audit ~/.claude/skills/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then give Claude Code a change that &lt;em&gt;looks&lt;/em&gt; fine but has an unverified path — say, a function that's defined but never called — and run &lt;code&gt;/review-audit&lt;/code&gt;. Watch it flag the wiring, list each axis as audited / partial / not-audited, and refuse to PASS what it didn't verify.&lt;/p&gt;

&lt;p&gt;One prompt file, no dependencies, Apache-2.0. No network calls, no telemetry, no bypass-permissions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/dualform-labs/review-audit" rel="noopener noreferrer"&gt;https://github.com/dualform-labs/review-audit&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;More: &lt;a href="https://dualformai.com/review-audit" rel="noopener noreferrer"&gt;https://dualformai.com/review-audit&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Companion skill: &lt;a href="https://github.com/dualform-labs/spec-skill" rel="noopener noreferrer"&gt;spec&lt;/a&gt; — decide the build before the agent writes code.&lt;/p&gt;

&lt;p&gt;— a dualform project&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Two tiny Claude Code skills that fixed my two biggest agent problems</title>
      <dc:creator>dualform</dc:creator>
      <pubDate>Sun, 14 Jun 2026 03:57:16 +0000</pubDate>
      <link>https://dev.to/dualform/two-tiny-claude-code-skills-that-fixed-my-two-biggest-agent-problems-13fh</link>
      <guid>https://dev.to/dualform/two-tiny-claude-code-skills-that-fixed-my-two-biggest-agent-problems-13fh</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fppfgr262fd2gjcndv3vn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fppfgr262fd2gjcndv3vn.png" alt=" " width="799" height="511"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Two open-source skills for Claude Code. Each is a single prompt file, Apache-2.0, no dependencies. Repos at the bottom.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Working with a coding agent, I kept hitting the same two failure modes. Not "the model can't write code" — it writes code fine. The failures were upstream and downstream of the code: &lt;strong&gt;the agent guessing on an ambiguous task&lt;/strong&gt;, and &lt;strong&gt;me trusting a review that hadn't actually checked anything.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So I built one small skill for each. Here's what they do and why they're shaped the way they are.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem 1: agents guess, then you redo the work
&lt;/h2&gt;

&lt;p&gt;Hand a vague task to an agent and you watch the same thing happen. It guesses. It drifts. It quietly makes a call you'd have made differently — and you find out after the code is written, when changing your mind is expensive. The cost isn't the typing. It's the rework.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;spec&lt;/code&gt;&lt;/strong&gt; moves the decisions to the front. You run &lt;code&gt;/spec &amp;lt;one-line idea&amp;gt;&lt;/code&gt;, it reads your repo and the conversation, then asks only what it &lt;em&gt;couldn't&lt;/em&gt; infer — a short batch of multiple-choice questions, each with a recommended default you can reject in a tap. The answers go into one spec file with a fixed 13-section template. Then it builds straight through, pausing only when a choice is genuinely yours to make.&lt;/p&gt;

&lt;p&gt;Two things keep it honest:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A self-check before you see anything.&lt;/strong&gt; The draft is handed to a fresh-context sub-agent with &lt;em&gt;only&lt;/em&gt; the spec file, asked "could you build this as-is?" If something's still ambiguous, that section gets fixed before you're shown the spec. (No sub-agents available? It falls back to a self-audit that has to quote a grounding line per section.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anti-Potemkin completion.&lt;/strong&gt; Every acceptance criterion must be an executable command or a numbered visual check — not "file exists" or "imports OK." A feature is &lt;code&gt;done&lt;/code&gt; only after it ran on a real case and the output was shown.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The whole skill is one prompt file (&lt;code&gt;SKILL.md&lt;/code&gt;). No build, no dependencies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem 2: "review this" returns a PASS that checked nothing
&lt;/h2&gt;

&lt;p&gt;Ask an AI to "review this change" and you can get a confident, plausible &lt;strong&gt;PASS&lt;/strong&gt; — that skipped half the checks, cited no evidence, and never ran the tests. A green light you can't trust is worse than no review.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;review-audit&lt;/code&gt;&lt;/strong&gt; is a read-only, single-pass audit over your change across six axes: correctness, wiring (built-but-never-called / dead code), security, test efficacy, spec compliance, and regression. The discipline is simple and strict:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An axis is marked &lt;strong&gt;audited&lt;/strong&gt; only when the report shows concrete &lt;code&gt;file:line&lt;/code&gt; + grep/run evidence. "I didn't check this" is a first-class output, not a silent gap.&lt;/li&gt;
&lt;li&gt;An unexamined axis &lt;strong&gt;cannot&lt;/strong&gt; be part of a PASS.&lt;/li&gt;
&lt;li&gt;It's read-only: it proposes fixes but doesn't apply them, and a before/after checksum confirms your tree wasn't touched.&lt;/li&gt;
&lt;li&gt;Regression needs a real run with an exit code; wiring needs concrete grep / &lt;code&gt;file:line&lt;/code&gt;. "Looks fine" isn't evidence.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It runs in the calling agent's own context — no sub-agent fan-out — so it stays cheap enough to run on every change. When one pass genuinely isn't enough (a release gate, a high-risk change), it tells you, in its own output, to escalate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;

&lt;p&gt;Both are one prompt file each.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# spec&lt;/span&gt;
git clone https://github.com/dualform-labs/spec-skill.git
&lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; spec-skill/skills/spec ~/.claude/skills/

&lt;span class="c"&gt;# review-audit&lt;/span&gt;
git clone https://github.com/dualform-labs/review-audit.git
&lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; review-audit/skills/review-audit ~/.claude/skills/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then in Claude Code: &lt;code&gt;/spec a menu-bar app that warns me when my Mac is thermally throttled&lt;/code&gt;, or &lt;code&gt;/review-audit&lt;/code&gt; on a change before you call it done. Output language is &lt;code&gt;auto&lt;/code&gt; / &lt;code&gt;ja&lt;/code&gt; / &lt;code&gt;en&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;No network calls (Claude Code only), no telemetry, no bypass-permissions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest notes
&lt;/h2&gt;

&lt;p&gt;These are prompt-file skills, not magic. Single-pass detection in &lt;code&gt;review-audit&lt;/code&gt; is model-dependent; if you need per-run proof of detection power or fresh-context adversarial verification, that's the heavier &lt;code&gt;review-audit-pro&lt;/code&gt; tier (coming soon). And &lt;code&gt;spec&lt;/code&gt; won't make a bad idea good — it just makes the decisions explicit before code is written, where they're cheap to change.&lt;/p&gt;

&lt;p&gt;If you try them, I'd genuinely like to hear where they break or annoy you.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;spec: &lt;a href="https://github.com/dualform-labs/spec-skill" rel="noopener noreferrer"&gt;https://github.com/dualform-labs/spec-skill&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;review-audit: &lt;a href="https://github.com/dualform-labs/review-audit" rel="noopener noreferrer"&gt;https://github.com/dualform-labs/review-audit&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;dualformai.com/spec-skill · dualformai.com/review-audit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;— a dualform project&lt;/p&gt;

</description>
      <category>agents</category>
      <category>claude</category>
      <category>productivity</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
