I’ve been using AI to code for some time now: Copilot, Claude Code, Codex, Pi. I’ve shipped code with them. I’ve also spent more time than I’d like to admit fixing the things they confidently produced.
A few months ago I started seeing a pattern I couldn’t ignore. The bugs weren’t random. They were the same bugs, over and over, across different projects. AI would write code that looked right. It would compile. The obvious test case would pass. And then it would fail on the edge case nobody asked about.
I thought it was my prompting. So I got better at prompting. The bugs got more subtle, not fewer.
Then I created my own workflow, focused on human gates, planning, and review. If you want to check that article out here is the link
Then I came across spec-driven development (SDD). It was an obvious upgrade for my workflow, so I built my own version around it in Claude Code. I’m going to walk you through what I built and what I learned. I’m still figuring it out. This is not a “here’s the answer” post. This is a “here’s where I am” post.
🍥The Experiment That Made Me Care
A few weeks ago I ran an experiment. I built a refund endpoint for a Spring Boot project. Standard stuff. I decided to add a partial refund feature so I could compare vibe coding and SDD.
I prompted Claude Code with a reasonable description. Out came code that compiled and ran. I tested a single refund. It worked.
Then I ran two refund requests at the same time on the same order. Both succeeded. The order’s total was $100. The total refunded came out to $150.
The customer would have walked away with an extra $50.
The code wasn’t broken in the usual sense. It had a check. It compared the refund amount to the order total. The check was just wrong under concurrent load: a classic race condition. The AI didn’t know to handle concurrency because I didn’t tell it to handle concurrency. I didn’t tell it because I was using it as a search engine, not a pair programmer. And I wasn’t thinking about it because the prompt-and-go workflow doesn’t make you think about it.
That’s the moment I stopped blaming prompts.
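To make the failure concrete, here is a minimal sketch of a check-then-write race in plain Java. It is not the code Claude Code generated. The class name `RacyRefundDemo`, the in-memory counter, and the $90 and $60 request amounts are all assumptions, chosen so the totals match the ones above. The `CyclicBarrier` is a demo-only hook that forces the bad interleaving every time, so the result is deterministic.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of a check-then-write race; not the original endpoint.
class RacyRefundDemo {
    private final long orderTotalCents;
    private final AtomicLong refundedCents = new AtomicLong();
    // Demo-only hook: holds both requests between their check and their write.
    private final CyclicBarrier betweenCheckAndWrite = new CyclicBarrier(2);

    RacyRefundDemo(long orderTotalCents) {
        this.orderTotalCents = orderTotalCents;
    }

    boolean refund(long amountCents) {
        // Check: correct when only one request is in flight.
        if (refundedCents.get() + amountCents > orderTotalCents) {
            return false;
        }
        try {
            // Both requests pass the check before either one writes.
            betweenCheckAndWrite.await(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("demo barrier failed", e);
        }
        // Write: the value that was checked is stale by now.
        refundedCents.addAndGet(amountCents);
        return true;
    }

    long totalRefundedCents() {
        return refundedCents.get();
    }

    // Runs a $90 and a $60 refund concurrently against a $100 order.
    static long runTwoConcurrentRefunds() {
        RacyRefundDemo order = new RacyRefundDemo(10_000);
        Thread first = new Thread(() -> order.refund(9_000));
        Thread second = new Thread(() -> order.refund(6_000));
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return order.totalRefundedCents();
    }

    public static void main(String[] args) {
        // Prints "Refunded cents: 15000" for an order worth 10000 cents.
        System.out.println("Refunded cents: " + runTwoConcurrentRefunds());
    }
}
```

Note that the counter itself is an `AtomicLong`. Each individual read and write is safe; the bug is that the check and the write are two separate steps.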
✨What I Actually Realized
If you look at the SDD approach, you are not just writing a prompt. When you start building a feature, you should already know in your head how you will build it.
That’s how we did coding previously, right?
You get a feature. You create a design. You put it somewhere: Obsidian, Notion, a notepad, your notebook, whatever. You draw the full picture. You know in your head what code you’ll add, what design pattern you’ll use, what the edge cases are. Once you know everything, you start coding.
That’s what spec-driven development is asking you to do with AI.
You’re clarifying everything to the agent. You’re not the audience watching it work. You’re the one who is driving it. You should know how this feature should be built, what changes should be done, what the requirements are, what the design changes in the code will be.
The AI is incredibly capable at translating clear specs into working code. It’s bad at extracting intent from vague prompts. Once you accept that, the whole approach inverts.
🔍SDD Is Not New (And That’s the Point)
I want to be honest here because I see a lot of takes painting spec-driven development as some breakthrough AI-era methodology. It’s not.
CORBA’s IDL files in the nineties pioneered the idea that a spec generates the code for interfaces. Protocol Buffers carried that pattern forward in 2001. Test-Driven Development from the late nineties established “write the contract first.” Behavior-Driven Development in 2006 made specs readable in plain English.
SDD takes those ideas and applies them to entire features, scaled up by LLMs. One researcher named Bryan Finster put it bluntly in a January 2026 paper: “SDD is not a revolution. It’s just BDD with branding.”
He’s mostly right. The branding does matter, because it reminds practitioners that specs should be authoritative, not advisory.
The reason this works now and didn’t work in the 2000s with UML codegen is that natural language plus LLMs can bridge the gap that diagrams and codegen compilers never could. We’re not inventing the methodology. We’re finally making it viable.
Spec Kit and Kiro
There are two real tools getting attention right now.
GitHub’s Spec Kit, open-sourced September 2025. It’s a CLI that installs slash commands and templates into your existing project. You run it, and your AI agent (Claude Code, Copilot, Cursor, over thirty of them) gets a structured spec-driven workflow. Good if you want to get started fast.
AWS’s Kiro. A full IDE built on Code OSS, with spec-driven development as a first-class primitive. Three documents per feature (requirements, design, tasks) with human approval gates between each. Good if you want the IDE experience.
Both are solid. Honestly. If you want to try SDD today, install one of them.
But if you’re a developer reading this, I assume you have your own way of working. Over your career you’ll try different workflows to find what suits your style. Instead of adopting someone else’s structure, you can create your own, something that complements how you think.
That’s what I did.
🧰What I Built
I built a 14-phase workflow inside Claude Code using its native primitives — subagents, slash commands, hooks, and a status tracking file. No external tools. No third-party install. Just Claude Code’s own capabilities, composed deliberately.
Eleven Claude subagents, each with one job:
- A repo-init agent that reads my codebase and writes a project.md. Tech stack, build commands, test commands, conventions. Every later agent reads this.
- An issue-fetch agent that pulls a ticket from GitHub and creates a working folder.
- Requirements clarification agents that handle business questions. They ask me questions in batches until the requirements are clear.
- A requirements agent that drafts a requirements.md from my answers. I review it and leave comments; the agent resolves them and presents the file again until I am satisfied with the requirements.
- Technical design clarification agents that handle technical questions. They ask me questions in batches until the design is clear.
- A technical design agent that drafts a design.md. I review it and leave comments, sometimes with pseudocode to steer the design the way I like; the agent presents the file again until I am satisfied with the design.
- A task planner that breaks the design into ordered tasks with explicit test-first requirements.
- A TDD implementation agent that runs one task at a time (red, green, refactor), logging every test command and result to a traceability.md file.
- A review agent that audits the diff against requirements, design, tests, conventions, security, and maintainability.
- A review resolution agent that fixes the findings the human accepts, plus any custom findings the human adds. The human should review all the code at this point too: even with all of this there are still mistakes in the implementation, and as the dev you are responsible for them and for keeping them to a minimum.
- A human resolution review agent that shows the human what was resolved and asks for final approval. This is the gate where you review the diff again if you have time for extra caution.
- A final summary step that runs once the human approves and updates the final summary.md file.
- A PR agent that drafts the commit message and pull request body.
Above all of them sits an orchestrator, a thirteenth file, that parses my plain-English messages and routes them to the right subagent. I never type slash commands during the workflow. I just say “approve requirements” or “accept findings 1 and 2, reject 3 and all also add this finding” or “raise PR” and the orchestrator handles it.
I’m not claiming this is the best structure. It’s mine. It fits how I work. Yours would look different and should.
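The human gates are the part that matters most, and the rule behind them is simple enough to sketch. The toy Java state machine below is not how the workflow is implemented; the real thing lives in Claude Code agent files and a status file. The phase names are a simplified subset of the list above, and the class `GatedWorkflow` and its methods are hypothetical. It only shows the rule: a gated phase cannot be left until a human has approved it.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of human approval gates between workflow phases.
class GatedWorkflow {
    enum Phase { REQUIREMENTS, DESIGN, TASKS, IMPLEMENTATION, REVIEW, PULL_REQUEST }

    // Phases whose output a human must approve before the workflow moves on.
    private static final Set<Phase> HUMAN_GATES =
            EnumSet.of(Phase.REQUIREMENTS, Phase.DESIGN, Phase.REVIEW);

    private final Set<Phase> approved = EnumSet.noneOf(Phase.class);
    private Phase current = Phase.REQUIREMENTS;

    Phase current() {
        return current;
    }

    // Records a human approval; only the phase in progress can be approved.
    void approve(Phase phase) {
        if (phase != current) {
            throw new IllegalStateException("Cannot approve " + phase + " while in " + current);
        }
        approved.add(phase);
    }

    // Moves to the next phase, refusing to skip an unapproved gate.
    Phase advance() {
        if (current == Phase.PULL_REQUEST) {
            throw new IllegalStateException("Workflow already finished");
        }
        if (HUMAN_GATES.contains(current) && !approved.contains(current)) {
            throw new IllegalStateException(current + " needs human approval first");
        }
        current = Phase.values()[current.ordinal() + 1];
        return current;
    }

    public static void main(String[] args) {
        GatedWorkflow workflow = new GatedWorkflow();
        try {
            workflow.advance();
        } catch (IllegalStateException e) {
            System.out.println("Blocked: " + e.getMessage());
        }
        workflow.approve(Phase.REQUIREMENTS);
        System.out.println("Now in: " + workflow.advance());
    }
}
```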
🏁What It Caught On the Refund Bug
I ran the same refund task through this workflow.
The very first clarification agent before any code
None of these were in the ticket description. None would have been caught by vibe coding. Every single one would become a bug if missed.
That’s the whole thing. The clarification phase is where the bugs that ship in production get caught before any code exists.
The agent didn’t catch the race condition. I caught it, because the workflow forced me to think about concurrency before I let the AI write anything.
I’m the one who solved the bug. The workflow just made sure I didn’t skip the question.
🔍The Results with the Spec-Driven Development Flow
I ran two refund requests at the same time on the same order again. This time one succeeded and one did not, as expected. The order’s total was $100. The total refunded came out to $90.
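One way to get that behaviour is to make the check and the write a single atomic step. The sketch below does this in plain Java with a compare-and-set loop. It is an illustration under assumptions, not the code the workflow produced: the class name `AtomicRefundDemo` and the $90 and $60 amounts are hypothetical, and in a real Spring Boot service the same guarantee would usually have to come from the database (a row lock or a conditional update, for example) rather than from an in-memory counter.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: check and write happen as one atomic step.
class AtomicRefundDemo {
    private final long orderTotalCents;
    private final AtomicLong refundedCents = new AtomicLong();

    AtomicRefundDemo(long orderTotalCents) {
        this.orderTotalCents = orderTotalCents;
    }

    boolean refund(long amountCents) {
        if (amountCents <= 0) {
            return false;
        }
        while (true) {
            long current = refundedCents.get();
            long next = current + amountCents;
            if (next > orderTotalCents) {
                return false; // would exceed the order total
            }
            // Succeeds only if nobody changed the total since it was read.
            if (refundedCents.compareAndSet(current, next)) {
                return true;
            }
            // Another refund won the race: loop to re-read and re-check.
        }
    }

    long totalRefundedCents() {
        return refundedCents.get();
    }

    // Returns {number of accepted refunds, total refunded in cents}.
    static long[] runTwoConcurrentRefunds() {
        AtomicRefundDemo order = new AtomicRefundDemo(10_000);
        AtomicInteger accepted = new AtomicInteger();
        CountDownLatch startGate = new CountDownLatch(1);
        long[] amounts = {9_000, 6_000};
        Thread[] requests = new Thread[amounts.length];
        for (int i = 0; i < amounts.length; i++) {
            long amount = amounts[i];
            requests[i] = new Thread(() -> {
                try {
                    startGate.await(); // release both requests together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (order.refund(amount)) {
                    accepted.incrementAndGet();
                }
            });
            requests[i].start();
        }
        startGate.countDown();
        try {
            for (Thread request : requests) {
                request.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return new long[] {accepted.get(), order.totalRefundedCents()};
    }
}
```

Whichever request gets there first wins, so in this sketch the total is $90 or $60 depending on timing, but never more than the $100 order total.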
✨The Honest Trade-offs
This workflow has overhead. Real overhead.
A vibe-coded refund endpoint takes me maybe ten minutes. The spec-driven version through this workflow takes closer to forty minutes: clarification rounds, requirements review, design clarification, design review, then the actual TDD implementation.
For a one-off script? Not worth it. Friday afternoon prototype? Vibe code it. Exploring something where you don’t know what you want yet? Vibe code it.
But for code that handles money, code that lives in production, code that other people will read and maintain, the forty minutes upfront saves multiples on rework, debugging, and shipped bugs. The tests aren’t an afterthought. The design doc isn’t fiction. The next person reading the code, including future me, has the requirements, the design, the tasks, and the full test history sitting right there in the repo.
I’m not telling you to use my workflow. I’m telling you to think about which work in your life deserves which approach.
🍥Why I Wrote This
This post isn’t a tutorial. It’s me sharing where I am.
I’ve been hearing a lot of “vibe coding is dead” and “spec-driven is the future” takes lately. I think both are slightly wrong. Vibe coding is the right tool for some work. Spec-driven is the right tool for other work. The skill is knowing which is which.
I’m continuously improving my workflow. If you’ve built something similar, or if you think mine can be improved somewhere, please tell me. That’s the whole point of writing this in public.