I built a Claude skill that keeps your AI coding tools from contradicting each other — and I need beta testers

Frederick — Wed, 20 May 2026 15:02:11 +0000

If you use more than one AI coding tool — Claude Code, Cursor, Copilot, Windsurf — you've probably hit this:

You ask one to build a feature. It does something reasonable. You ask another to extend it. It contradicts the first. You ask a third to clean up. Now you have three different interpretations of what the system should do.

This isn't a bug in any of the tools. It's a missing source of truth.

What I built

A Claude skill called spec-driven-development that generates three files before any code is written:

requirements.md  — what the system must do (REQ-xxx IDs, acceptance criteria)
design.md        — how it will be built (data models, endpoints, file structure)
tasks.md         — atomic ordered steps, each linked to a requirement

Then it generates matching AI config files for every tool you use:

CLAUDE.md                          ← Claude Code reads this automatically
.cursorrules                       ← Cursor
.windsurfrules                     ← Windsurf
.github/copilot-instructions.md    ← GitHub Copilot
.aider.conf.yml                    ← Aider

Each config file contains the same Universal Instruction Block — identical constraint rules pointing every agent at the same spec files. They can't drift because they all defer to the same authority.

The session continuity problem

There's a fourth file: CONTEXT.md. It's a session journal. When your context window fills and you start a fresh Claude Code session, Claude reads CONTEXT.md first and announces:

"Session 4 resuming. Last session we completed TASK-005 (JWT middleware). Active task is TASK-007 — POST /tasks implementation. Ready to continue."

No re-explaining. No lost context. Just continuation.

It works for existing codebases too

If you already have code but no specs, the retrofit workflow reverse-engineers them from what you describe. Fields that weren't explicitly confirmed get marked [TO VERIFY]. The first phase of tasks.md is always "Spec Verification" — tasks that confirm the spec actually matches the live code before any new work starts.

How I validated it

I didn't just ship it and hope. I built a proper test suite:

Phase 2A — Static assertions (67 checks)
A Python script that checks SKILL.md and reference files for structural correctness. Runs in GitHub Actions CI on every push.

Phase 2B — Behavioral tests (15 prompts)
Run in a live Claude Code session. For each prompt, Claude simulates a full response before looking at the assertions — blind evaluation. Tests include "continue where we left off" (CONTEXT.md present) and "what are we working on?" (CONTEXT.md absent).

Phase 2C — Generation quality (53 checks)
Three full end-to-end flows: greenfield project, retrofit codebase, cross-AI configuration. Claude Code generates real files, a Python checker validates every file. These run in CI against committed fixtures.

Total: 135 assertions. All passing. CI is green.

The test suite ships with the skill. Every future change must pass before merging.

What I still need

The 135 assertions were written by me, so they test what I anticipated. What they don't test: a stranger saying "help me get organised" or "scaffold me a project" — phrasing I didn't think of.

That's the beta.

I'm looking for 5 testers:

Profile	What you'll do
Developer starting a new project	Use the skill to spec it from scratch
Solo dev with an active side project	Retrofit or greenfield — whatever fits
Team lead, multiple AI tools on the team	Generate cross-AI configs for your project
Existing codebase, no specs	Retrofit your system into SDD
Power user of 3+ AI tools	Configure all of them, compare consistency

All you do is use it naturally on your real work and file GitHub Issues when something doesn't work. One issue per problem. Include the exact phrase you used — that's the most valuable data.

Repo (MIT): https://github.com/FredAntB/Spec-Driven-Development