Kunal Sharda

Spec-driven development with AI is real now. The stale spec is the part nobody fixed.

Spec-driven development won the argument. A year ago, writing a spec before you let an agent touch the code sounded like process for its own sake. Now GitHub's Spec Kit has tens of thousands of stars, AWS shipped Kiro around the same idea, and Claude Code, Cursor, and most of the rest have some version of write-the-spec-first baked in. The method is settled. What almost nobody talks about is the failure that shows up three weeks later, when the spec you carefully wrote no longer matches the thing you shipped, and your agent is now confidently building from a document that lies.

I came to this from years of automation work in regulated banking, where "just prompt it and see" was never going to fly past a risk team. So when the spec-driven development (SDD) tooling landed I was the easy convert. The pitch is clean. The spec is the source of truth, the code is a regenerable output, and the agent builds from intent instead of from a paragraph you pasted into a chat window. Spec Kit even formalizes it into a flow (Spec, then Plan, then Tasks, then Implement), anchored by a "constitution" file of principles that are not supposed to change. Kiro walks you through requirements, then design, then tasks before it writes a line. Both are good. I am not here to dunk on them.

Here is where it cracked for me.

The spec is excellent at time zero. You write it, the agent builds from it, and the first-pass result is genuinely better than vibe coding. Then you ship. An edge case comes back from support, someone patches the code directly to stop the bleeding, and the spec is now quietly wrong. Nobody updates it, because updating a separate document is unrewarded work that no sprint board tracks. Two sprints later a new feature lands next to the old one, the agent reads the old spec to orient itself, and it builds on top of a description of a system that stopped existing a month ago. The model did not hallucinate. The spec did.

The fix is not a better spec format

I tried that route first. Better templates, a tidier folder, a longer constitution. It bought me about a week. The actual fix is making spec drift visible the same way a failing test is visible. A spec earns its keep only when it is tied to the thing that proves it true, which is a test, and when breaking that tie shows up somewhere people already look. That starts with writing acceptance criteria that an agent and a test runner can both read, not prose that only a human can interpret on a good day.

# checkout.feature  (lives next to the code, not in a wiki)
@story:PLAN-412 @verifies:checkout_guest.spec.ts
Feature: Guest checkout

  Scenario: Valid payment creates an order
    Given a guest with items in their cart
    When they submit a valid card
    Then an order is created
    And a receipt email is sent

That file is the spec and the test contract at the same time. The @verifies tag points at the test that proves the scenario, and the @story tag points back at the work that requested it. Now you can add a CI step that greps every .feature for its @verifies target and fails the build when the test is missing or red. Drift stops being something you hope a reviewer notices. It turns into a broken build, which is the one signal engineers cannot ignore.
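
A minimal sketch of that CI check, assuming the feature files and the latest test results have already been loaded into memory. Every name here (`extractVerifiesTargets`, `findDrift`, `DriftIssue`, the result shape) is hypothetical and made up for illustration; it is not taken from Spec Kit, Kiro, or any other tool.

```typescript
// Hypothetical drift check. All names and shapes are illustrative assumptions.

type TestStatus = "passed" | "failed";

interface DriftIssue {
  featureFile: string;
  target: string | null; // null when the feature has no @verifies tag at all
  reason: "untagged" | "missing" | "red";
}

// Collect every @verifies:<target> tag from the tag lines of a .feature file.
function extractVerifiesTargets(featureText: string): string[] {
  const targets: string[] = [];
  for (const rawLine of featureText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line.startsWith("@")) continue; // Gherkin tags sit on lines starting with @
    const tagPattern = /@verifies:(\S+)/g;
    let match: RegExpExecArray | null;
    while ((match = tagPattern.exec(line)) !== null) {
      targets.push(match[1]);
    }
  }
  return targets;
}

// Compare each feature's targets against the latest test results.
// `features` maps a feature file name to its text.
// `results` maps a test file name to its status; an absent key means no such test ran.
function findDrift(
  features: Record<string, string>,
  results: Record<string, TestStatus | undefined>
): DriftIssue[] {
  const issues: DriftIssue[] = [];
  for (const featureFile of Object.keys(features)) {
    const targets = extractVerifiesTargets(features[featureFile]);
    if (targets.length === 0) {
      issues.push({ featureFile, target: null, reason: "untagged" });
      continue;
    }
    for (const target of targets) {
      const status = results[target];
      if (status === undefined) {
        issues.push({ featureFile, target, reason: "missing" });
      } else if (status === "failed") {
        issues.push({ featureFile, target, reason: "red" });
      }
    }
  }
  return issues;
}
```

In a real pipeline you would read the `.feature` files from disk, build `results` from your test runner's report, and exit non-zero whenever `findDrift` returns anything. Treating an untagged feature as drift is a design choice of this sketch: it stops a spec from dodging the check by dropping its tag.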

You can do all of this today with files in your repo and a few lines of CI. No vendor required. If you only take one thing from this post, take that: link the spec to a test, and make the broken link fail loudly.

Where one repo stops being enough

The repo-local version works until the rest of delivery gets involved, and it always does. The story lives in Jira or Linear. The architecture decision that made the feature risky lives in a diagram nobody has opened since kickoff. The prose spec lives in Notion. The test lives in CI. Each tool is good at its slice. Notion is a pleasant place to write the narrative spec and get a team to actually agree on it. Linear is the cleanest pure tracker I have used. Jira will bend to almost any workflow you can dream up, which is exactly why big orgs keep it. None of them hold the spec, the story, the decision, and the test as linked nodes. So the linkage that makes SDD durable is the precise thing your stack drops on the floor between tabs.

There is a second reason linkage matters now, and it is the agent itself. A spec is only as useful as what the agent can see at the moment it runs. Paste the spec into context and it works for one task, then goes stale inside the same session as the agent edits around it. What you actually want is for the agent to query the live spec, the linked story, and the current test status, so it builds from what is true today instead of what you typed last month. That is the whole "context is the model, not the prompt" idea, and it is why I stopped thinking of a spec as a document and started thinking of it as a node with edges.
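
To make "a node with edges" concrete, here is a rough sketch of what an agent-facing lookup could return. It assumes a tiny in-memory graph; the node kinds, edge relations, and the `buildAgentContext` function are all hypothetical names invented for this example, and they do not describe how any particular product stores its data.

```typescript
// Hypothetical spec graph. Node kinds, relations, and function names are illustrative.

type NodeKind = "spec" | "story" | "test";

interface GraphNode {
  id: string;
  kind: NodeKind;
  summary: string;
  status?: "passed" | "failed"; // only meaningful for test nodes
}

interface Edge {
  from: string; // id of the spec node
  to: string;   // id of the linked story or test node
  relation: "requested-by" | "verified-by";
}

interface AgentContext {
  spec: GraphNode;
  stories: GraphNode[];
  tests: GraphNode[];
  // true when an edge points nowhere, no test is linked, or any linked test is not passing
  stale: boolean;
}

// Resolve a spec plus everything linked to it, as of right now.
function buildAgentContext(
  specId: string,
  nodes: GraphNode[],
  edges: Edge[]
): AgentContext {
  const byId: Record<string, GraphNode | undefined> = {};
  for (const node of nodes) byId[node.id] = node;

  const spec = byId[specId];
  if (spec === undefined || spec.kind !== "spec") {
    throw new Error("No spec node with id " + specId);
  }

  const stories: GraphNode[] = [];
  const tests: GraphNode[] = [];
  let danglingEdge = false;
  for (const edge of edges) {
    if (edge.from !== specId) continue;
    const target = byId[edge.to];
    if (target === undefined) {
      danglingEdge = true; // the link exists but the thing it points at is gone
      continue;
    }
    if (edge.relation === "requested-by") stories.push(target);
    if (edge.relation === "verified-by") tests.push(target);
  }

  const stale =
    danglingEdge || tests.length === 0 || tests.some((t) => t.status !== "passed");
  return { spec, stories, tests, stale };
}
```

The `stale` flag is what the agent needs at runtime: it can read the spec, see that it is stale, and treat the text as suspect instead of building on it.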

I went far enough down this hole that I built a product around it, so read the next sentence with the appropriate amount of salt. Stride keeps plan, design, tests, and process on one connected graph, and runs an MCP server so Claude Code and Codex read the real stories and tests instead of a snapshot you pasted. That is my bias, said out loud. But you do not need my tool to get most of the value here. You need the spec wired to a test, and you need drift to break the build. A .feature file and one CI check get you embarrassingly far before you have to buy anything.

The part I still have not solved cleanly is the prose half of a spec. Acceptance criteria keep behavior honest because a test can check them. The why behind a decision, the tradeoff you weighed and rejected, the constraint from a compliance review, that reasoning does not reduce to a Given/When/Then, and it rots the quietest of all. So I will hand it to you, because I am genuinely trying to steal whatever is working. If you are running spec-driven development with an agent right now, how do you keep the intent and the reasoning from going stale, not just the acceptance tests?
