If you use more than one AI coding tool — Claude Code, Cursor, Copilot, Windsurf — you've probably hit this:
You ask one to build a feature. It does something reasonable. You ask another to extend it. It contradicts the first. You ask a third to clean up. Now you have three different interpretations of what the system should do.
This isn't a bug in any of the tools. It's a missing source of truth.
What I built
A Claude skill called spec-driven-development that generates three files before any code is written:
requirements.md — what the system must do (REQ-xxx IDs, acceptance criteria)
design.md — how it will be built (data models, endpoints, file structure)
tasks.md — atomic ordered steps, each linked to a requirement
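The link between tasks and requirements is what makes the three files checkable. Here is a minimal sketch of that traceability check, assuming three-digit IDs (the `REQ-\d{3}` and `TASK-\d{3}` patterns, the function name, and the sample text are illustrative guesses, not the skill's actual format):

```python
import re

# Assumed ID shapes: the post only shows "REQ-xxx" and IDs like "TASK-005".
REQ_ID = re.compile(r"\bREQ-\d{3}\b")
TASK_ID = re.compile(r"\bTASK-\d{3}\b")


def untraceable_tasks(requirements_md: str, tasks_md: str) -> list[str]:
    """Return task lines that cite no requirement, or cite an unknown one."""
    known = set(REQ_ID.findall(requirements_md))
    problems = []
    for line in tasks_md.splitlines():
        if not TASK_ID.search(line):
            continue  # not a task line
        cited = set(REQ_ID.findall(line))
        if not cited or not cited <= known:
            problems.append(line.strip())
    return problems


# Hypothetical sample content, for illustration only.
requirements = """\
REQ-001 Users can create a task.
REQ-002 Requests must carry a valid JWT.
"""
tasks = """\
TASK-001 Add JWT middleware (REQ-002)
TASK-002 Implement POST /tasks (REQ-001)
TASK-003 Add rate limiting (REQ-009)
TASK-004 Tidy the README
"""

for line in untraceable_tasks(requirements, tasks):
    print("untraceable:", line)
```

In this sample, TASK-003 cites a requirement that doesn't exist and TASK-004 cites none, so both get flagged.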
Then it generates matching AI config files for every tool you use:
CLAUDE.md ← Claude Code reads this automatically
.cursorrules ← Cursor
.windsurfrules ← Windsurf
.github/copilot-instructions.md ← GitHub Copilot
.aider.conf.yml ← Aider
Each config file contains the same Universal Instruction Block — identical constraint rules pointing every agent at the same spec files. They can't drift because they all defer to the same authority.
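To make "can't drift" concrete, here is a rough sketch of a drift check: it reports any config file that doesn't contain the shared block verbatim. The block text and the `files_missing_block` helper are placeholders I made up for illustration; the skill's real wording and embedding format may differ.

```python
# Placeholder wording, NOT the skill's real Universal Instruction Block.
INSTRUCTION_BLOCK = (
    "Read requirements.md, design.md and tasks.md before changing code.\n"
    "If a request conflicts with the spec, stop and ask.\n"
)


def files_missing_block(configs: dict[str, str], block: str = INSTRUCTION_BLOCK) -> list[str]:
    """Return the config files whose content does not include the block verbatim."""
    return sorted(name for name, text in configs.items() if block not in text)


# In-memory stand-ins for the files on disk (hypothetical content).
configs = {
    "CLAUDE.md": "# Project rules\n" + INSTRUCTION_BLOCK,
    ".cursorrules": INSTRUCTION_BLOCK,
    ".windsurfrules": INSTRUCTION_BLOCK.replace("stop and ask", "use your judgment"),
}

print(files_missing_block(configs))  # → ['.windsurfrules']
```

A verbatim substring match is deliberately strict: any hand edit to one tool's copy shows up immediately.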
The session continuity problem
There's a fourth file: CONTEXT.md. It's a session journal. When your context window fills and you start a fresh Claude Code session, Claude reads CONTEXT.md first and announces:
"Session 4 resuming. Last session we completed TASK-005 (JWT middleware). Active task is TASK-007 — POST /tasks implementation. Ready to continue."
No re-explaining. No lost context. Just continuation.
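For a feel of how little state that takes, here is a small sketch that turns a journal into the resume announcement. The `key: value` layout and the field names (`session`, `completed`, `active`) are assumptions for the example; the real CONTEXT.md format may look nothing like this.

```python
def parse_journal(text: str) -> dict[str, str]:
    """Parse 'key: value' lines (an assumed, simplified journal layout)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip().lower()] = value.strip()
    return fields


def resume_message(text: str) -> str:
    """Build the announcement for the session that follows the journaled one."""
    fields = parse_journal(text)
    next_session = int(fields["session"]) + 1
    return (
        f"Session {next_session} resuming. "
        f"Last session we completed {fields['completed']}. "
        f"Active task is {fields['active']}. Ready to continue."
    )


# Hypothetical journal content.
journal = """\
session: 3
completed: TASK-005 (JWT middleware)
active: TASK-007 (POST /tasks implementation)
"""

print(resume_message(journal))
```

The point is that three fields are enough to restart work; everything else lives in the spec files.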
It works for existing codebases too
If you already have code but no specs, the retrofit workflow reverse-engineers them from what you describe. Fields that weren't explicitly confirmed get marked [TO VERIFY]. The first phase of tasks.md is always "Spec Verification" — tasks that confirm the spec actually matches the live code before any new work starts.
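Because the marker is a literal string, the verification backlog is easy to list. A minimal sketch, assuming the marker appears inline on the line it qualifies (the sample `design.md` content and the `unverified_items` name are invented for illustration):

```python
MARKER = "[TO VERIFY]"


def unverified_items(spec_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs for every line carrying the marker."""
    return [
        (number, line.strip())
        for number, line in enumerate(spec_text.splitlines(), start=1)
        if MARKER in line
    ]


# Hypothetical retrofit output.
design = """\
User.email: string, unique
User.role: enum(admin, member) [TO VERIFY]
Task.due_date: datetime, nullable [TO VERIFY]
"""

for number, line in unverified_items(design):
    print(f"design.md:{number}: {line}")
```

Each hit is a candidate for a "Spec Verification" task: check the field against the live code, then drop the marker.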
How I validated it
I didn't just ship it and hope. I built a proper test suite:
Phase 2A — Static assertions (67 checks)
A Python script that checks SKILL.md and reference files for structural correctness. Runs in GitHub Actions CI on every push.
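As a rough idea of what a static assertion looks like, here is a sketch of a tiny check runner. The four checks are illustrative stand-ins (the post doesn't list the real 67), and `run_checks` is a hypothetical name:

```python
from typing import Callable

Check = tuple[str, Callable[[str], bool]]

# Illustrative checks only; not the suite's actual assertions.
CHECKS: list[Check] = [
    ("mentions requirements.md", lambda text: "requirements.md" in text),
    ("mentions design.md", lambda text: "design.md" in text),
    ("mentions tasks.md", lambda text: "tasks.md" in text),
    ("mentions CONTEXT.md", lambda text: "CONTEXT.md" in text),
]


def run_checks(text: str, checks: list[Check] = CHECKS) -> list[str]:
    """Return the descriptions of the checks that failed."""
    return [description for description, passes in checks if not passes(text)]


# Stand-in for the contents of SKILL.md.
skill_md = "Generate requirements.md, design.md and tasks.md before writing code."

failures = run_checks(skill_md)
print(f"{len(CHECKS) - len(failures)}/{len(CHECKS)} passed; failed: {failures}")
```

In CI, a script like this would finish with a non-zero exit code when `failures` is non-empty, which is what makes the GitHub Actions step fail.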
Phase 2B — Behavioral tests (15 prompts)
Run in a live Claude Code session. For each prompt, Claude simulates a full response before looking at the assertions — blind evaluation. Tests include "continue where we left off" (CONTEXT.md present) and "what are we working on?" (CONTEXT.md absent).
Phase 2C — Generation quality (53 checks)
Three full end-to-end flows: greenfield project, retrofit codebase, cross-AI configuration. Claude Code generates real files, and a Python checker validates each one. These checks run in CI against committed fixtures.
Total: 135 assertions. All passing. CI is green.
The test suite ships with the skill. Every future change must pass before merging.
What I still need
The 135 assertions were written by me, so they test what I anticipated. What they don't test: a stranger saying "help me get organised" or "scaffold me a project" — phrasing I didn't think of.
That's the beta.
I'm looking for 5 testers:
| Profile | What you'll do |
|---|---|
| Developer starting a new project | Use the skill to spec it from scratch |
| Solo dev with an active side project | Retrofit or greenfield — whatever fits |
| Team lead, multiple AI tools on the team | Generate cross-AI configs for your project |
| Existing codebase, no specs | Retrofit your system into SDD |
| Power user of 3+ AI tools | Configure all of them, compare consistency |
All you do is use it naturally on your real work and file GitHub Issues when something doesn't work. One issue per problem. Include the exact phrase you used — that's the most valuable data.
Repo (MIT): https://github.com/FredAntB/Spec-Driven-Development
Open an issue titled "[Beta] I'd like to test" and describe which profile fits you. I'll get back to you within 24 hours.