Two tiny Claude Code skills that fixed my two biggest agent problems

#agents #claude #productivity #showdev

Two open-source skills for Claude Code. Each is a single prompt file, Apache-2.0, no dependencies. Repos at the bottom.

Working with a coding agent, I kept hitting the same two failure modes. Not "the model can't write code" — it writes code fine. The failures were upstream and downstream of the code: the agent guessing on an ambiguous task, and me trusting a review that hadn't actually checked anything.

So I built one small skill for each. Here's what they do and why they're shaped the way they are.

Problem 1: agents guess, then you redo the work

Hand a vague task to an agent and you watch the same thing happen. It guesses. It drifts. It quietly makes a call you'd have made differently — and you find out after the code is written, when changing your mind is expensive. The cost isn't the typing. It's the rework.

spec moves the decisions to the front. You run /spec <one-line idea>, it reads your repo and the conversation, then asks only what it couldn't infer — a short batch of multiple-choice questions, each with a recommended default you can reject in a tap. The answers go into one spec file with a fixed 13-section template. Then it builds straight through, pausing only when a choice is genuinely yours to make.

Two things keep it honest:

A self-check before you see anything. The draft is handed to a fresh-context sub-agent with only the spec file, asked "could you build this as-is?" If something's still ambiguous, that section gets fixed before you're shown the spec. (No sub-agents available? It falls back to a self-audit that has to quote a grounding line per section.)
Anti-Potemkin completion. Every acceptance criterion must be an executable command or a numbered visual check — not "file exists" or "imports OK." A feature is done only after it ran on a real case and the output was shown.

The whole skill is one prompt file (SKILL.md). No build, no dependencies.

Problem 2: "review this" returns a PASS that checked nothing

Ask an AI to "review this change" and you can get a confident, plausible PASS — that skipped half the checks, cited no evidence, and never ran the tests. A green light you can't trust is worse than no review.

review-audit is a read-only, single-pass audit over your change across six axes: correctness, wiring (built-but-never-called / dead code), security, test efficacy, spec compliance, and regression. The discipline is simple and strict:

An axis is marked audited only when the report shows concrete file:line + grep/run evidence. "I didn't check this" is a first-class output, not a silent gap.
An unexamined axis cannot be part of a PASS.
It's read-only: it proposes fixes but doesn't apply them, and a before/after checksum confirms your tree wasn't touched.
Regression needs a real run with an exit code; wiring needs concrete grep / file:line. "Looks fine" isn't evidence.

It runs in the calling agent's own context — no sub-agent fan-out — so it stays cheap enough to run on every change. When one pass genuinely isn't enough (a release gate, a high-risk change), it tells you, in its own output, to escalate.

Install

Both are one prompt file each.

# spec
git clone https://github.com/dualform-labs/spec-skill.git
cp -r spec-skill/skills/spec ~/.claude/skills/

# review-audit
git clone https://github.com/dualform-labs/review-audit.git
cp -r review-audit/skills/review-audit ~/.claude/skills/

Then in Claude Code: /spec a menu-bar app that warns me when my Mac is thermally throttled, or /review-audit on a change before you call it done. Output language is auto / ja / en.

No network calls (Claude Code only), no telemetry, no bypass-permissions.

Honest notes

These are prompt-file skills, not magic. Single-pass detection in review-audit is model-dependent; if you need per-run proof of detection power or fresh-context adversarial verification, that's the heavier review-audit-pro tier (coming soon). And spec won't make a bad idea good — it just makes the decisions explicit before code is written, where they're cheap to change.

If you try them, I'd genuinely like to hear where they break or annoy you.

spec: https://github.com/dualform-labs/spec-skill
review-audit: https://github.com/dualform-labs/review-audit
dualformai.com/spec-skill · dualformai.com/review-audit

— a dualform project

Top comments (1)

Mehmet Can Farsak • Jun 14

Really like the spec-skill approach — moving decisions upfront before code is written is such an underrated pattern. I see the same problem but at an even earlier stage: when you ask an agent to brainstorm or explore ideas, it defaults to implementation mode and starts coding instead of ideating.

I built Brainstorm-Mode (mehmetcanfarsak/Brainstorm-Mode on GitHub) that tackles this with PreToolUse hooks — it keeps agents in ideation mode by blocking premature tool calls. Three modes (divergent, actionable, academic) let you pick the right headspace. Feels like a natural companion to your spec-first workflow — brainstorm properly first, then spec, then implement.