Rost

Posted on Jul 4 • Originally published at glukhov.org

What Is Spec-Driven Development? The Spec as Source of Truth

#aicoding #architecture #documentation #llm

Spec-Driven Development is one of those ideas that software engineers have reached for before
and then set aside when the effort stopped paying off.

What changed in 2025 is that AI coding
agents arrived and made the absence of explicit intent expensive. Prompts are ephemeral.
Agent sessions reset. Code changes but the reasoning behind it disappears. The spec is the
artifact that stops that from happening.

The Spec Is Becoming the Source of Truth

For most of software development history, the spec was either a temporary planning artifact
or an afterthought. Requirements lived in tickets, design decisions lived in chat threads,
and the code was the ground truth. Documentation described what existed after the fact.

Spec-Driven Development inverts that relationship. The specification becomes the primary
artifact. Code is what gets generated or verified against the spec, not the other way around.

This is not a new idea. Formal methods, design-by-contract, and BDD all contain versions of
it. What is new is the practical motivation: AI coding agents need explicit, durable context
to produce correct and consistent output. Prompts are too ephemeral. The spec is the only
artifact that can carry intent across agent sessions, across team members, and across time.

What Spec-Driven Development Actually Means

Spec-Driven Development, usually shortened to SDD, is a workflow where a versioned
specification guides or generates implementation. The specification is written and reviewed
before the agent writes code. It captures:

What to build -- user problem, goals, and non-goals
What correct behavior looks like -- acceptance criteria, edge cases, error states
How to build it -- architecture decisions, data model, API contracts, security constraints
How to verify it -- test strategy, validation rules, traceability back to requirements
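The four areas above can be pictured as one record. The sketch below is a minimal Python illustration with assumed names (`FeatureSpec`, `AcceptanceCriterion`, and the `AC-1` ID scheme are hypothetical, not taken from any SDD tool). It shows that each area is a distinct, checkable field rather than free-floating prose.

```python
from dataclasses import dataclass, field


@dataclass
class AcceptanceCriterion:
    id: str          # hypothetical ID scheme, e.g. "AC-1"
    statement: str   # one verifiable statement of correct behavior


@dataclass
class FeatureSpec:
    # What to build
    problem: str
    goals: list[str]
    non_goals: list[str]
    # What correct behavior looks like
    acceptance_criteria: list[AcceptanceCriterion]
    # How to build it (architecture decisions, contracts, constraints)
    design_decisions: list[str] = field(default_factory=list)
    # How to verify it: criterion ID -> names of tests that cover it
    verification: dict[str, list[str]] = field(default_factory=dict)

    def missing_areas(self) -> list[str]:
        """Name every area that is still empty, in declaration order."""
        areas = {
            "problem": self.problem.strip(),
            "goals": self.goals,
            "non_goals": self.non_goals,
            "acceptance_criteria": self.acceptance_criteria,
            "design_decisions": self.design_decisions,
            "verification": self.verification,
        }
        return [name for name, value in areas.items() if not value]


spec = FeatureSpec(
    problem="Users need a support ticket to reset a password.",
    goals=["Self-service password reset by email"],
    non_goals=["SMS-based reset"],
    acceptance_criteria=[AcceptanceCriterion("AC-1", "Reset links expire after 30 minutes")],
)
print(spec.missing_areas())  # ['design_decisions', 'verification']
```

A spec like this one has stated what to build and what correct looks like, but not yet how to build or verify it, which is exactly what a reviewer would want flagged before an agent starts.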

The spec is not a one-time document. It is updated when reality differs from the design.
When the agent discovers something during implementation that the spec got wrong, the spec
is corrected before continuing. The spec stays honest because it is treated like code.

Recent academic work formalizes this framing: researchers describe SDD as treating
specifications as the source of truth and code as generated or verified against them. The
practical interpretation is that the spec is the reviewed, durable record of intent that
any human or AI tool can read and trust.

Three terms capture different points on the spec-use spectrum:

Spec-first means writing the full specification before any implementation begins. This
is the strictest interpretation and the one closest to waterfall if not done carefully.

Spec-anchored means keeping a specification in sync with implementation throughout the
feature lifecycle. The spec is updated as decisions change. This is the most practical
version for most teams.

Spec-as-source means generating or validating implementation from the spec, either
through AI agents or through tooling that checks code against spec constraints. Tools such as
GitHub Spec Kit and Kiro are moving in this direction.

Why SDD Matters Now

The honest answer is that SDD is not compelling for a solo developer building a one-day
script. The overhead is not worth it.

SDD becomes valuable when three conditions are present: the feature is large enough to span
multiple sessions, the agent needs to make decisions that affect the architecture, and the
work will be reviewed or continued by someone else.

All three conditions are increasingly common with AI-assisted development.

LLMs need context, not just prompts. A model that receives a vague prompt makes vague
decisions. A model that receives a reviewed specification with explicit constraints, non-goals,
and acceptance criteria makes better decisions and is easier to course-correct when it drifts.
This connects to how retrieval and representation work: giving an agent a versioned spec is a form of structured retrieval of project intent.

Code generation is cheap; deciding what to build is still hard. The bottleneck in AI-assisted
development is no longer typing -- it is knowing what to build and how to constrain the
agent. SDD shifts the effort to where it matters: specifying intent clearly before generation
begins.

Prompts are ephemeral. The agent does not remember what you told it in the last session.
A versioned specification stored in the repository does. Every new session can read the same
spec and implement against the same intent without re-establishing context from scratch.

Vibe coding works until it does not. For prototypes and throwaway work, ad-hoc prompting is
faster and appropriate. As soon as a feature requires security constraints, multi-file
architecture decisions, or a team handoff, the absence of a spec becomes the main source of
drift and defects. See the comparison in SDD vs Vibe Coding for when each approach applies.

Core Artifacts

A practical SDD workflow produces four main artifacts. Each one reduces a specific kind of
ambiguity before it reaches the agent.

Requirements specification. The problem statement, the users affected, the goals, the
explicit non-goals, and the acceptance criteria. Non-goals are as important as goals: they
tell the agent what not to build and prevent scope creep. Acceptance criteria should be precise
enough that each one maps to at least one test.

Design specification. The architectural decisions relevant to this feature: affected
modules, data model changes, API contracts, migrations, security constraints, observability
requirements, and known failure modes. This is not a full system design document -- it is the
subset of architecture decisions needed to implement this feature correctly.

Task plan. A sequence of small implementation tasks, each with explicit dependencies,
expected file changes, and validation criteria. Tasks are small enough to implement in a
single agent session and verify with a focused diff. Each task has a human review checkpoint.
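A task plan with explicit dependencies and expected file changes is enough to check two things mechanically: that tasks run in dependency order, and that a diff stays inside the files its task declared. The sketch below assumes a hypothetical plan format (`deps`, `files`, `validate`) and uses Python's standard `graphlib` (3.9+) and `fnmatch`. It is an illustration, not a feature of any particular SDD tool.

```python
from fnmatch import fnmatch
from graphlib import TopologicalSorter

# Hypothetical task plan for a password-reset feature.
PLAN = {
    "T1": {"deps": [], "files": ["migrations/*.sql"],
           "validate": "migration applies and rolls back"},
    "T2": {"deps": ["T1"], "files": ["auth/tokens.py", "tests/test_tokens.py"],
           "validate": "token expiry unit tests pass"},
    "T3": {"deps": ["T2"], "files": ["api/reset.py", "tests/test_reset.py"],
           "validate": "AC-1 and AC-2 tests pass"},
    "T4": {"deps": ["T1"], "files": ["docs/reset.md"],
           "validate": "human review of the docs page"},
}


def execution_order(plan: dict) -> list[str]:
    """Order tasks so each comes after its dependencies.

    TopologicalSorter raises graphlib.CycleError if the plan has a cycle,
    which is itself a sign the plan needs another review.
    """
    graph = {task: spec["deps"] for task, spec in plan.items()}
    return list(TopologicalSorter(graph).static_order())


def unexpected_changes(plan: dict, task: str, changed: list[str]) -> list[str]:
    """Return files in a diff that the task did not declare it would change."""
    patterns = plan[task]["files"]
    return [path for path in changed if not any(fnmatch(path, p) for p in patterns)]


order = execution_order(PLAN)
print(order)
print(unexpected_changes(PLAN, "T2", ["auth/tokens.py", "billing/invoice.py"]))  # ['billing/invoice.py']
```

The second check is the "focused diff" idea in code: anything outside the task's declared files is not necessarily wrong, but it is exactly what a human reviewer should look at first.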

Traceability record. A mapping from acceptance criteria to tests, from design decisions
to affected files, and from tasks to commits. This is what separates SDD from documentation
that becomes stale: traceability makes it possible to verify that the spec was actually
implemented, not just written.
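The acceptance-criteria half of a traceability record can be checked in a few lines. This sketch assumes the record is a plain mapping from hypothetical criterion IDs to test names. It reports criteria with no covering test, and trace entries that point at criteria the spec no longer contains, which is one sign that the spec and the tests have drifted apart.

```python
# Criteria as they currently appear in the spec (hypothetical IDs and wording).
criteria = {
    "AC-1": "Reset links expire after 30 minutes",
    "AC-2": "Unknown email addresses get the same response as known ones",
    "AC-3": "Reset requests are rate limited per account",
}

# Traceability record: criterion ID -> tests that cover it.
trace = {
    "AC-1": ["test_reset_link_expires"],
    "AC-2": ["test_unknown_email_same_response"],
    "AC-9": ["test_legacy_flow"],  # refers to a criterion the spec no longer has
}


def check_traceability(criteria: dict, trace: dict) -> tuple[list[str], list[str]]:
    """Return (criteria with no covering test, trace entries with no criterion)."""
    uncovered = sorted(c for c in criteria if not trace.get(c))
    orphaned = sorted(c for c in trace if c not in criteria)
    return uncovered, orphaned


uncovered, orphaned = check_traceability(criteria, trace)
print("uncovered:", uncovered)  # uncovered: ['AC-3']
print("orphaned:", orphaned)    # orphaned: ['AC-9']
```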

These artifacts do not have to be heavyweight. A simple feature might produce a single
two-page markdown document covering all four areas. The format matters less than the habit
of writing intent down before implementation begins.
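When the spec is a single markdown file, even a crude check that the expected sections exist helps keep the habit honest. The section names and the `## ` heading convention below are assumptions for illustration; a team would substitute whatever template it actually uses.

```python
import re

# Assumed template: one second-level heading per area.
REQUIRED_SECTIONS = ["Problem", "Goals", "Non-goals", "Acceptance criteria",
                     "Design", "Tasks", "Traceability"]


def missing_sections(markdown: str) -> list[str]:
    """Return required section names that have no '## ' heading in the document."""
    headings = {m.group(1).strip().lower()
                for m in re.finditer(r"^##\s+(.+)$", markdown, flags=re.MULTILINE)}
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]


spec_md = """# Password reset
## Problem
Users need a support ticket to reset a password.
## Goals
- Self-service reset by email
## Non-goals
- SMS-based reset
## Acceptance criteria
- AC-1: Reset links expire after 30 minutes.
## Design
- New reset_tokens table; tokens stored hashed.
"""

print(missing_sections(spec_md))  # ['Tasks', 'Traceability']
```

A check like this says nothing about whether the content is good; it only catches the case where a whole area was skipped.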

How SDD Differs from Documentation

The most common confusion is treating SDD artifacts as documentation. They are not
documentation in the conventional sense.

Documentation describes. It tells you what the system does, how to use it, and what
it contains. It is written after the fact and updated when the system changes.

Specs constrain. A spec tells the agent what it is allowed to build and what it is
not allowed to do. It is authoritative before implementation begins. It is validated after
implementation completes. A spec that describes what was actually built -- rather than
constraining what should be built -- has already failed its purpose.

Executable specs guide generation and validation. The best SDD specs are close enough
to machine-readable that an agent can implement against them and a test suite can verify
them. An acceptance criterion written as "the endpoint must reject unauthenticated requests with
a 401 response" is an executable spec; one written as "the endpoint is secure" is documentation.
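The 401 criterion translates almost directly into a test. In this sketch `handle_get_profile` is a hypothetical stand-in for the real endpoint, and the bearer-token check is an assumed authentication scheme. The point is that the test class name and docstring carry the criterion ID, so the test traces back to the spec. Saved as a file such as `test_profile.py`, it runs with `python -m unittest`.

```python
import unittest


def handle_get_profile(headers: dict) -> tuple[int, dict]:
    """Hypothetical endpoint handler, used only to illustrate the criterion."""
    token = headers.get("Authorization", "")
    if not token.startswith("Bearer "):
        return 401, {"error": "unauthenticated"}
    return 200, {"name": "example"}


class TestAC1RejectUnauthenticated(unittest.TestCase):
    """AC-1: the endpoint must reject unauthenticated requests with a 401 response."""

    def test_missing_header_returns_401(self):
        status, _ = handle_get_profile({})
        self.assertEqual(status, 401)

    def test_wrong_scheme_returns_401(self):
        status, _ = handle_get_profile({"Authorization": "Basic abc"})
        self.assertEqual(status, 401)
```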

Decision Records -- ADRs, PDRs, and DDRs -- are complementary to SDD artifacts but serve a different
purpose. Decision records capture why a choice was made and what was rejected. SDD
specifications capture what to build and how to verify it. Both belong in the repository.
Together they give AI agents the full picture: the current intent and the reasoning behind it.

How SDD Differs from TDD

Test-Driven Development and Spec-Driven Development are often confused because both produce
explicit artifacts before code exists. The difference is the starting point.

TDD starts with tests. You write a failing test that describes the behavior you want,
then write the minimum code to make it pass. TDD is a feedback loop at the unit level. It
produces good tests but does not answer the question of whether you are building the right
thing.

SDD starts with intent. Before tests exist, before the architecture is decided, the
spec answers: who has this problem, what does correct behavior look like, what is explicitly
out of scope. The spec then informs what tests to write, which is why good SDD and good TDD
are complementary rather than competing.

A practical way to think about it: SDD drives TDD. The acceptance criteria in the spec
become the test scenarios. The design spec identifies the integration boundaries that need
contract tests. The task plan identifies which unit behaviors need test coverage before the
agent implements them.

How SDD Differs from BDD

Behavior-Driven Development uses natural-language scenarios -- typically in Gherkin format --
to describe expected behavior from the user's perspective. These scenarios bridge the gap
between business intent and technical implementation.

SDD is broader. It includes behavior descriptions (which can use BDD-style language or plain
prose) but also covers architecture decisions, data models, security constraints, task
planning, and traceability. BDD can be a useful format for writing acceptance criteria inside
an SDD requirements spec. The spec is the container; BDD scenarios are one way to write what
goes inside it.
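When acceptance criteria are written as Given/When/Then scenarios inside the spec, a simple review aid is to flag scenarios that leave out a step. The parser below is a deliberately small sketch with an assumed text layout. It is not the Gherkin grammar and not the API of any BDD tool; tools such as Cucumber or behave handle the full format.

```python
# Acceptance criteria written BDD-style inside a requirements spec (illustrative).
SCENARIOS = """
Scenario: Expired reset link
  Given a reset link issued 31 minutes ago
  When the user opens the link
  Then the page says the link has expired

Scenario: Unknown email
  Given no account exists for "a@example.com"
  When a reset is requested for "a@example.com"
"""


def parse_scenarios(text: str) -> dict[str, list[str]]:
    """Map each scenario title to the step keywords it uses, in order."""
    scenarios: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Scenario:"):
            current = line.removeprefix("Scenario:").strip()
            scenarios[current] = []
        elif current and line:
            # The first word of a step line is its keyword (Given, When, Then, And...).
            scenarios[current].append(line.split(maxsplit=1)[0])
    return scenarios


def incomplete(scenarios: dict[str, list[str]]) -> list[str]:
    """Return titles of scenarios missing a Given, When, or Then step."""
    required = {"Given", "When", "Then"}
    return [title for title, steps in scenarios.items() if not required <= set(steps)]


print(incomplete(parse_scenarios(SCENARIOS)))  # ['Unknown email']
```

The second scenario never says what should happen, so it constrains nothing yet; that is the kind of gap worth catching during spec review rather than after the agent has guessed.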

The distinction matters in practice: BDD tooling focuses on making scenarios executable.
SDD practice focuses on making intent durable -- across tools, across sessions, and across
team members.

How SDD Differs from Formal Methods

Formal methods use mathematical notation and automated verification to prove properties of
software systems. They are extremely rigorous and extremely expensive for most production
development contexts.

SDD does not require formal notation. A markdown file with acceptance criteria and architecture
decisions is a specification. It constrains without being mathematically formal. The level of
rigor scales with the stakes: a spec for a billing service should be more precise and more
carefully reviewed than a spec for a documentation page.

The relationship is a spectrum:

Informal prose spec (minimum viable SDD)
Structured markdown with acceptance criteria and non-goals
Machine-readable spec with schema validation
Contract tests derived directly from the spec
Formal spec with automated proof
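The middle of that spectrum can be sketched in one example: a machine-readable fragment of a design spec declares a response contract, and a contract check is derived from it. The spec layout, the type names, and `contract_violations` are hypothetical; in practice teams often reach for JSON Schema or OpenAPI tooling rather than hand-rolled checks.

```python
import json

# Hypothetical machine-readable fragment of a design spec.
SPEC = json.loads("""
{
  "endpoint": "GET /profile",
  "response": {"id": "int", "email": "str", "created_at": "str"}
}
""")

# Map the spec's type names onto Python types.
TYPES = {"int": int, "str": str, "bool": bool}


def contract_violations(spec: dict, payload: dict) -> list[str]:
    """Compare a response payload against the contract declared in the spec."""
    problems = []
    for name, type_name in spec["response"].items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], TYPES[type_name]):
            actual = type(payload[name]).__name__
            problems.append(f"{name}: expected {type_name}, got {actual}")
    # Fields the implementation returns but the spec never declared.
    for name in sorted(payload.keys() - spec["response"].keys()):
        problems.append(f"undeclared field: {name}")
    return problems


sample = {"id": "42", "email": "a@example.com", "internal_flag": True}
for problem in contract_violations(SPEC, sample):
    print(problem)
```

One caveat of this simple approach: Python's `bool` is a subclass of `int`, so the check would accept `True` where `int` is declared. A real contract check would need to handle cases like that explicitly.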

Most teams operate in the middle of that spectrum. The goal is not mathematical rigor -- it
is making intent explicit enough that an AI agent can implement against it and a human
reviewer can verify the result.

Benefits of Spec-Driven Development

Less intent drift. The spec is the reference. When the agent drifts -- and it will -- the
reviewer has something to compare the implementation against. Without a spec, drift is
invisible until something breaks.

Better AI outputs. Agents given explicit constraints, non-goals, and acceptance criteria
produce implementations that are closer to what was intended and easier to correct when they
miss. Context quality directly determines output quality.

Easier review. A pull request attached to a spec is easier to review than a pull request
that requires the reviewer to reconstruct the intent from the code. The spec is the review
checklist.

Team alignment. When multiple people or agents are working on the same feature, the spec
is the shared contract. Without it, each contributor optimizes locally and the pieces may
not fit.

Better test planning. Acceptance criteria in the spec map directly to test cases. Test
coverage becomes a spec coverage question: is every acceptance criterion covered by at least
one test?

Durable handoff. When a feature changes hands -- between engineers, between agent sessions,
between sprints -- the spec is the handoff artifact. It captures what was decided, what was
out of scope, and what remains to be validated.

Costs of Spec-Driven Development

Upfront effort. Writing a good spec before writing any code takes time. For small
features, this overhead is real and sometimes not worth it.

False confidence. A spec that exists but is not validated against the implementation
gives a false sense of correctness. Stale specs are sometimes worse than no spec: they
mislead reviewers and agents that read them.

Stale specs. Specs drift when the team treats them as planning artifacts rather than
living documents. Updating the spec when implementation differs from design is not optional
-- it is what separates SDD from documentation that accumulates and rots.

Generated bureaucracy. AI agents can generate exhaustive task lists and verbose
specifications quickly. A 200-task spec generated in thirty seconds is not a useful spec;
it is bureaucracy. Good SDD requires judgment about what to specify and what to
leave implicit.

Tool lock-in. Some SDD tools are opinionated about format, file structure, and workflow.
A spec written in a proprietary format is harder to carry across tools than a markdown file
with clear headers and acceptance criteria.

Conclusion

Spec-Driven Development is not a new methodology. It is an old discipline becoming practical
again because the cost of implicit intent is now visible in AI-generated code.

The discipline is simple: write down what you intend to build, reviewed and versioned, before
the agent builds it. Keep that record honest by updating it when reality differs. Use it as
the reference for review, testing, and handoff.

The spec is not magic. A spec that is not validated becomes the most expensive kind of
documentation: one that misleads confidently. Good SDD is the practice of keeping specs
honest -- small enough to maintain, precise enough to constrain, and durable enough to
outlast any single agent session.

Useful Links

Decision Records for AI-Driven Software Development -- ADRs, PDRs, and DDRs that complement SDD specs by capturing why decisions were made
Spec-Driven Development vs Vibe Coding: Waterfall? -- when to add specs and when to keep prompting freely
What is Vibe Coding -- Meaning, Tools, Benefits, and Risks -- an overview of vibe coding
App Architecture in Production -- an overview of architecture, documentation, testing, and integration patterns
Unit Testing in Go: Structure and Best Practices -- turning SDD acceptance criteria into executable tests
Unit Testing in Python: Complete Guide -- test-writing practices that map to SDD acceptance criteria
Python Design Patterns for Clean Architecture -- code structure practices that SDD helps preserve
Retrieval vs Representation in Knowledge Management -- how explicit specs relate to AI context and retrieval
GitHub Spec Kit documentation -- a portable open-source SDD toolkit
Martin Fowler on Spec-Driven Development tools -- careful analysis of Kiro, Spec Kit, and Tessl
