Spec-Driven Development Renamed an Old Problem. It Didn't Solve It

#ai #discuss #opensource #productivity

TL;DR: Spec Kit, OpenSpec, BMAD, Kiro — all of it is built on the assumption that a spec can stay the source of truth. It can't, for the same reason design docs and wikis never stayed current either. I think the interesting unsolved problem in this space isn't "more rigorous specs," it's "specs that don't require a human to remember to update them." Curious if people running these in production agree.

The pitch everyone's making right now

Spec-driven development has become the default answer to "AI agents write plausible-looking code that's subtly wrong." The idea: stop prompting, write a structured spec, let the agent execute against it, review the diff. GitHub Spec Kit has 90K+ stars and /speckit.specify → /speckit.plan → /speckit.tasks → /speckit.implement is basically a known workflow at this point. OpenSpec does the same thing lighter, with delta specs and openspec validate --strict. BMAD goes the other direction and simulates a full 12-agent team. AWS built an entire IDE (Kiro) around the idea. Tessl raised $125M on it.

It's not hype for no reason — it works better than raw prompting for a lot of real cases. But I've been going deep on this for the last few weeks, reading every practitioner write-up I can find instead of just the vendor pages, and one problem keeps showing up that none of these tools actually solve.

The problem: keeping the spec true is the same problem we've had since READMEs

We've tried to keep documentation in sync with code for 20+ years — design docs, architecture wikis, READMEs. It's never worked, for a boring reason: editing the code is fast and gets you something shipped, editing the doc is slow and gets you nothing visible. When a team is moving, the doc loses every time.

SDD just gives the same artifact a new name and calls it "the source of truth." But the asymmetry is identical. You're three tasks into a spec, the agent hits a constraint nobody documented, fixes it inline, ships it — and the spec still describes the old plan. Multiple people running this in production describe specs drifting within days, not months. None of the major frameworks have a real answer for this beyond "discipline" — which is exactly the thing that's failed every previous time we asked for it.
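One way to make that asymmetry visible is to count how many code-only commits have landed since anything under the spec directory last changed. The sketch below is a toy under stated assumptions: the `Commit` shape, the `specs/` prefix, and the sample history are all hypothetical, and a real version would read from version control instead of an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """A simplified commit: an id plus the paths it touched (hypothetical shape)."""
    sha: str
    paths: tuple[str, ...]


def code_commits_since_spec_update(
    history: list[Commit],
    spec_prefix: str = "specs/",  # assumed location of spec files
) -> int:
    """Count commits that changed code after the most recent spec change.

    `history` is ordered oldest to newest. A commit counts as a spec update
    if any touched path starts with `spec_prefix`; any other non-empty
    commit counts as code moving without the spec.
    """
    drift = 0
    for commit in history:
        if any(p.startswith(spec_prefix) for p in commit.paths):
            drift = 0  # the spec was touched, so reset the counter
        elif commit.paths:
            drift += 1  # code moved, spec did not
    return drift


# Made-up history for illustration only.
history = [
    Commit("a1", ("specs/checkout.md", "src/checkout.py")),
    Commit("b2", ("src/checkout.py",)),
    Commit("c3", ("src/payments.py", "src/checkout.py")),
    Commit("d4", ("src/payments.py",)),
]
print(code_commits_since_spec_update(history))  # 3
```

This only measures staleness, not truth: a commit that touches the spec resets the counter even if the spec is still wrong, so at best it's a smoke alarm for the "discipline" problem, not a fix for it.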

Even the premise has a real critique

Kent Beck's objection (picked up by Martin Fowler in January) is worth sitting with: most SDD write-ups assume you write the whole spec before implementation starts, which quietly assumes you won't learn anything during implementation that should change the spec. That's a strange assumption to build a methodology on, since it's the opposite of the feedback-loop principle XP was built around. Fowler's point was that AI should speed up the feedback loop, not become an excuse to front-load more documentation. ThoughtWorks' Radar flagged basically the same risk: heavy upfront specs trending toward big-bang releases, which is the exact failure mode agile exists to prevent.

A spec can pass every check and still be wrong

One account that stuck with me: a dev ran a real multi-month project fully spec-driven. Thorough specs, the agent built exactly what was specified, every acceptance criterion passed. The system was still wrong, because the spec couldn't capture the tacit, runtime, cross-system knowledge that only shows up once you're actually building. He changed one infra decision mid-project and the entire downstream spec graph broke, because everything after that point had silently inherited an assumption that no longer held.

Independently, Scott Logic ran a real SDD workflow against a real project and reported that it was ~10x slower, with more ceremony and the same number of bugs as their normal process. I don't think that means SDD is useless. I think it means the rigor itself isn't the bottleneck people assume it is.
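The inherited-assumption failure in the multi-month account above can be pictured as a graph walk: if each spec records what it builds on, changing one node invalidates everything downstream of it. A minimal sketch, with a made-up spec graph and hypothetical names throughout:

```python
from collections import deque

# Hypothetical spec graph: each spec lists the specs it builds on.
DEPENDS_ON = {
    "infra/queue": [],
    "orders/ingest": ["infra/queue"],
    "orders/retry": ["orders/ingest"],
    "billing/invoice": ["orders/ingest"],
    "ui/dashboard": [],
}


def invalidated_by(changed: str, depends_on: dict[str, list[str]]) -> list[str]:
    """Return every spec that transitively builds on `changed`, in BFS order."""
    # Invert the edges so we can walk from a spec to its dependents.
    dependents: dict[str, list[str]] = {name: [] for name in depends_on}
    for spec, parents in depends_on.items():
        for parent in parents:
            dependents.setdefault(parent, []).append(spec)

    seen: set[str] = set()
    order: list[str] = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(invalidated_by("infra/queue", DEPENDS_ON))
# ['orders/ingest', 'orders/retry', 'billing/invoice']
```

The catch is that this only works when the dependency is written down. In the account above the assumption was inherited silently, so there was no edge to walk, which is arguably the whole problem.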

What I think is actually underexplored

Not another framework asking teams to write more documents more carefully. Something closer to: infer and update the spec from what the code and the running system actually do, the same way you'd treat generated docs or a lockfile — maintained automatically, not maintained by someone remembering to do it.

I don't have a working version of this, and I'm not sure it's even tractable at the granularity you'd need for it to be useful (anyone who's tried to auto-generate accurate architecture diagrams from a real codebase knows how that usually goes). I'm putting this out mostly because I'd rather find out it's a dead end from people who've actually hit this wall than after building something.
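To make the "treat it like a lockfile" idea concrete at the smallest possible scale, here is a toy sketch that derives an interface from source with Python's standard `ast` module and diffs it against a hand-written spec. Everything domain-specific here is an assumption for illustration: the `SOURCE` snippet, the `WRITTEN_SPEC` shape, and the function names are invented, and function signatures are a far shallower notion of "spec" than the behavioral one these tools care about.

```python
import ast

# Invented source code standing in for a real module.
SOURCE = '''
def create_order(customer_id, items, idempotency_key=None):
    ...

def cancel_order(order_id):
    ...

def _audit(event):
    ...
'''

# Hypothetical hand-written spec: public function name -> parameter names.
WRITTEN_SPEC = {
    "create_order": ["customer_id", "items"],
    "cancel_order": ["order_id"],
    "refund_order": ["order_id", "amount"],
}


def observed_interface(source: str) -> dict[str, list[str]]:
    """Extract public top-level function names and their positional parameters."""
    tree = ast.parse(source)
    return {
        node.name: [arg.arg for arg in node.args.args]
        for node in tree.body
        if isinstance(node, ast.FunctionDef) and not node.name.startswith("_")
    }


def drift_report(spec: dict[str, list[str]], observed: dict[str, list[str]]) -> list[str]:
    """List every place where the written spec and the observed code disagree."""
    report = []
    for name in sorted(spec.keys() | observed.keys()):
        if name not in observed:
            report.append(f"{name}: in spec, not in code")
        elif name not in spec:
            report.append(f"{name}: in code, not in spec")
        elif spec[name] != observed[name]:
            report.append(f"{name}: spec says {spec[name]}, code has {observed[name]}")
    return report


for line in drift_report(WRITTEN_SPEC, observed_interface(SOURCE)):
    print(line)
# create_order: spec says ['customer_id', 'items'], code has ['customer_id', 'items', 'idempotency_key']
# refund_order: in spec, not in code
```

Regenerating `WRITTEN_SPEC` from `observed_interface` on every commit would be the lockfile move. The open question is whether anything richer than signatures (intent, invariants, cross-system behavior) can be inferred this way without producing noise.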

Genuinely curious, if you're running any of these in prod:

Does your spec survive past week 2-3, or does it quietly become aspirational?
When it drifts, what's the actual trigger — a missed edge case, a changed dependency, a scope call nobody backported into the spec?
Has anyone tried generating/updating specs from code or telemetry instead of writing them forward? Did it work, or did it just produce noise?
If you've used openspec validate --strict or Spec Kit's constitution model in anger — where did it actually hold up, and where did it just add ceremony?

I'm not pitching anything here. I'm genuinely trying to figure out whether this is a real gap or whether I'm missing a tool or pattern that already handles it.