Juhani Ränkimies

Posted on Mar 9

Change Management When Specs Are Living Documents

#ai

If your spec is the source of truth, what happens when the truth needs to change?

This is the question most spec-first approaches handle poorly. Either the spec becomes stale (everyone ignores it after the first sprint), or updating it becomes ceremony (three documents to change before you can write a line of code). Both failure modes kill the approach.

Living specs need living change management. But the change management has to be lightweight enough that developers actually use it, and structured enough that AI can reason about what's happening.

There's a third failure mode that's less obvious: coupling work breakdown to specs. Most spec-first approaches do this -- user stories have subtasks, BDD features have scenarios that double as work items, spec documents accumulate task lists. This works for the initial build when one team writes one spec and implements it. It breaks when the spec is a living document.

A living spec gets touched by multiple changes over time. CR-001 adds acceptance criteria. CR-007 modifies one of them. CR-012 deprecates another. Each change has its own scope, its own tasks, its own timeline. If tasks live on the spec, you end up with competing task lists from different changes colliding on the same document. The spec becomes a project management artifact instead of a contract.

The fix is recognizing that specs and change management are orthogonal concerns:

The spec defines what must be true (current contract)
The change record defines what's changing and why (delta)
Tasks belong to the change, not the spec

This separation is why change records in this approach live in their own folders (changes/CR-001-slug/) with their own task breakdowns, rather than being embedded in or attached to spec documents. A spec can be touched by many changes. A change can touch many specs. Neither owns the other.
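
A minimal sketch of that separation, assuming a simple in-memory model. The class names, field names, and the spec path are hypothetical and only illustrate the shape: the spec carries the contract and no tasks, while each change record carries its own tasks and a list of the specs it touches.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """The current contract: what must be true. Deliberately holds no tasks."""
    path: str
    acceptance_criteria: list[str] = field(default_factory=list)


@dataclass
class ChangeRecord:
    """The delta: what is changing and why. Owns its own task breakdown."""
    cr_id: str
    why: str
    touched_specs: list[str] = field(default_factory=list)  # spec paths
    tasks: list[str] = field(default_factory=list)


def changes_touching(spec: Spec, changes: list[ChangeRecord]) -> list[str]:
    """Return the IDs of every change that touches the given spec."""
    return [c.cr_id for c in changes if spec.path in c.touched_specs]


# Hypothetical data: two changes touch the same spec, each with its own tasks.
spec = Spec("features/login/spec.md", ["AC-1", "AC-2"])
changes = [
    ChangeRecord("CR-001", "add acceptance criteria", [spec.path], ["write AC-1 test"]),
    ChangeRecord("CR-007", "modify one criterion", [spec.path], ["update AC-2 test"]),
]
print(changes_touching(spec, changes))  # prints ['CR-001', 'CR-007']
```

Because the link runs from the change to the spec, closing or abandoning a change never requires cleaning task lists out of the spec itself.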

Why git history isn't enough

The instinctive response is: "We have git. Changes are tracked. What more do we need?"

Git history tracks what changed. It doesn't track:

Why the change was made
Whether the change was intentional or accidental drift
Which contract (behavioral commitment) was affected
Whether proof obligations changed
What the migration or compatibility implications are
Whether this change is part of a larger effort or standalone

When a human reads a git log, they can sometimes reconstruct this from commit messages and PR descriptions. When AI reads a git log, it gets a sequence of diffs with variable-quality commentary. It has no reliable way to distinguish "intentional contract change" from "implementation refactor" from "accidental behavior change."

Explicit change records solve this by classifying intent alongside implementation.

The change record

A change record is a lightweight document that captures what's changing and why. It lives in its own folder alongside optional supporting material:

docs/changes/
  CR-001-repository-bootstrap/
    change.md
    tasks.md          # optional: implementation breakdown
  CR-002-quality-baseline/
    change.md
    tasks.md

Each change record covers problem, motivation, scope (in/out), proposed contract delta, affected artifacts, and risks. It also classifies the change -- is this adding new behavior, modifying existing behavior, or an internal quality improvement? Is it breaking for consumers? Does it deprecate something? These classifications let both humans and AI reason quickly about impact without reading the full record.
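
One way the classification could be made machine-readable is sketched below. This is an assumption about representation, not the format the approach prescribes: the enum values, field names, and the "AC-3" identifier are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ChangeKind(Enum):
    NEW_BEHAVIOR = "new behavior"
    MODIFIED_BEHAVIOR = "modified behavior"
    INTERNAL_QUALITY = "internal quality improvement"


@dataclass(frozen=True)
class Classification:
    kind: ChangeKind
    breaking: bool = False            # breaking for consumers?
    deprecates: tuple[str, ...] = ()  # identifiers of deprecated items


def impact_summary(cr_id: str, c: Classification) -> str:
    """Build a one-line impact summary without reading the full record."""
    parts = [c.kind.value]
    if c.breaking:
        parts.append("BREAKING for consumers")
    if c.deprecates:
        parts.append("deprecates " + ", ".join(c.deprecates))
    return f"{cr_id}: " + "; ".join(parts)


# Hypothetical change that modifies behavior and deprecates one criterion.
print(impact_summary("CR-012", Classification(ChangeKind.MODIFIED_BEHAVIOR, deprecates=("AC-3",))))
# prints: CR-012: modified behavior; deprecates AC-3
```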

When is a change record required?

Not every commit needs a change record. The overhead would kill adoption. The rule is a six-question test:

A change record is required if any answer is yes:

Does public or externally observable behavior change?
Does any acceptance criterion change, get added, removed, or deprecated?
Do proof obligations change materially?
Does the implementation introduce a new component, boundary crossing, runtime dependency, or integration?
Does responsibility move between modules or layers in a way future work must understand?
Does rollout, migration, or compatibility handling need explicit coordination?

If all answers are no, it's a simple fix. Fix it, commit it, move on.

Examples of simple fixes that don't need a CR:

Typo or comment correction
Bug fix that restores already-specified behavior within the same module
Local performance improvement with no contract or structure change
Test stabilization when the proof obligation itself doesn't change
Logging improvement within an existing flow

Examples of small changes that still need a CR:

Adding a new config flag visible to consumers
Changing timeout behavior that users can feel
Introducing a new database table or integration
Splitting logic into a new module that changes responsibility boundaries
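
The six-question test reduces to "any yes means a CR". A minimal sketch, with hypothetical field names standing in for the six questions:

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ChangeTest:
    """One boolean per question; every answer defaults to 'no'."""
    observable_behavior_changes: bool = False
    acceptance_criteria_change: bool = False
    proof_obligations_change: bool = False
    new_component_or_integration: bool = False
    responsibility_moves: bool = False
    rollout_needs_coordination: bool = False


def needs_change_record(answers: ChangeTest) -> bool:
    """A change record is required if any answer is yes."""
    return any(getattr(answers, f.name) for f in fields(answers))


typo_fix = ChangeTest()
new_config_flag = ChangeTest(observable_behavior_changes=True)

print(needs_change_record(typo_fix))         # prints False
print(needs_change_record(new_config_flag))  # prints True
```

Answering the questions still takes human or AI judgment; the code only encodes the decision rule once the answers exist.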

The change package as workbench

Here's where the approach differs from traditional change management.

During active implementation, the change record is the workbench -- the central coordination surface. But the spec and verification map move ahead to the target state before implementation catches up.

The sequence:

1. Create the change record. Document what's changing and why.
2. Update spec.md to the target state. The spec now describes what the system should do after the change, not what it currently does. This intentionally creates a temporary mismatch.
3. Update verification.yaml. The proof map reflects the new contract. Some tests don't exist yet.
4. Write or update tests. Tests align with the new spec. They may fail because the old implementation doesn't satisfy the new contract.
5. Implement. Make the tests pass.
6. Verify. All tests pass, all quality gates pass.
7. Re-establish alignment. Confirm that specs, verification map, tests, and implementation all agree.

The key insight: specs move first. The living spec represents the intended target contract during active work, not the current implementation state. The change record tracks what's in flight. When the change completes, everything realigns.
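
The sequence can be sketched as an ordered set of stages, where each value records the last completed step. The stage names and helper functions are hypothetical; the only thing taken from the flow above is the ordering and the window in which spec and code are expected to disagree.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Last completed step of a change, in the order described above."""
    CHANGE_RECORD_CREATED = 1
    SPEC_UPDATED = 2
    VERIFICATION_MAP_UPDATED = 3
    TESTS_WRITTEN = 4
    IMPLEMENTED = 5
    VERIFIED = 6
    ALIGNED = 7


def advance(current: Stage) -> Stage:
    """Move to the next stage; steps cannot be skipped or reordered."""
    if current is Stage.ALIGNED:
        raise ValueError("change is already complete")
    return Stage(current + 1)


def spec_may_differ_from_code(current: Stage) -> bool:
    """Mismatch is expected once the spec has moved and until code catches up."""
    return Stage.SPEC_UPDATED <= current < Stage.IMPLEMENTED


print(advance(Stage.CHANGE_RECORD_CREATED).name)       # prints SPEC_UPDATED
print(spec_may_differ_from_code(Stage.TESTS_WRITTEN))  # prints True
print(spec_may_differ_from_code(Stage.VERIFIED))       # prints False
```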

This avoids the two common failure modes:

Spec becomes stale (avoided when the flow is followed -- spec moves before code)
Spec change feels like ceremony (it's part of the change flow, not separate from it)

Temporary misalignment is intentional

This point deserves emphasis because it's counterintuitive.

In the middle of a change, the spec says one thing and the implementation does another. Tests may be failing. That's fine. The change record exists to track this in-progress state. The spec is the target, not a description of current reality.

The rule is: temporary in-progress misalignment can be acceptable. Done means alignment is restored.

This only works with discipline. The change must close. Alignment must be verified. Specs that moved ahead but never got implemented are drift, not progress.
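
One way to keep that discipline visible is to flag changes whose spec moved ahead long ago but which never closed. The sketch below is an assumption, not part of the approach as described: the 14-day threshold, the field names, and the CR identifiers are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class OpenChange:
    cr_id: str
    spec_updated_on: date  # when the spec moved to the target state
    closed: bool           # True once alignment has been re-established


def stalled(changes: list[OpenChange], today: date, max_open_days: int = 14) -> list[str]:
    """Return IDs of unclosed changes whose spec moved more than max_open_days ago."""
    return [
        c.cr_id
        for c in changes
        if not c.closed and (today - c.spec_updated_on).days > max_open_days
    ]


changes = [
    OpenChange("CR-020", date(2026, 3, 8), closed=False),   # 22 days open
    OpenChange("CR-021", date(2026, 3, 25), closed=False),  # 5 days open
    OpenChange("CR-022", date(2026, 3, 1), closed=True),    # closed, ignored
]
print(stalled(changes, today=date(2026, 3, 30)))  # prints ['CR-020']
```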

A real example

From the SitePilot2 bootstrap, CR-002 (development quality baseline) shows this pattern in a real change:

Problem: Even with a Rust project in place, contributors lack a shared local workflow. Quality expectations are documented but not enforceable.

Proposed contract delta:

Spec delta: The repository gains an explicit local development workflow. Quality policy documentation changes from hypothetical commands to actual baseline commands. The repository baseline includes a Bun-first Playwright toolchain for browser proof.

Verification delta: Add proof that documented local commands execute successfully. Add proof that quality and tech-stack docs reflect the implemented toolchain. Add proof that sample Playwright tests pass.

What happened:

Playwright scaffolding was added
CI workflow was created running all quality checks
Local developer commands were defined and documented
quality-policy.md was updated from aspirational to actual
docs/tech-stack.md was updated to reflect real tooling

Status log:

2026-03-08 - created
2026-03-08 - activated: Playwright baseline scaffolding added
2026-03-08 - updated: normalized CI to Bun-based installation
2026-03-08 - updated: defined canonical Cargo commands and Bun wrappers
2026-03-08 - updated: expanded CI to run full quality baseline
2026-03-08 - verified: all docs synchronized, bun run verify passes
2026-03-08 - merged

Each status log entry is a sentence. The whole change record is readable in two minutes. But it captures intent, scope, contract delta, and verification in a way that git history alone never would.
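
Because each entry follows the same "date - status: note" shape, the log is also easy to parse. A minimal sketch, assuming every line uses exactly the layout shown above (the type and function names are hypothetical):

```python
from datetime import date
from typing import NamedTuple


class StatusEntry(NamedTuple):
    day: date
    status: str
    note: str  # empty when the entry has no note


def parse_status_line(line: str) -> StatusEntry:
    """Parse 'YYYY-MM-DD - status[: note]' into a StatusEntry."""
    day_text, _, rest = line.partition(" - ")
    status, _, note = rest.partition(":")
    return StatusEntry(date.fromisoformat(day_text.strip()), status.strip(), note.strip())


entry = parse_status_line("2026-03-08 - verified: all docs synchronized, bun run verify passes")
print(entry.status)  # prints verified
print(entry.note)    # prints all docs synchronized, bun run verify passes
```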

Branch alignment

Branches map one-to-one to change records:

One CR -> one branch
One branch -> one PR
One PR -> one observable contract delta

Naming convention: cr/001-repository-bootstrap, cr/002-development-quality-baseline.

This keeps spec changes, proof changes, and code changes visible as one coherent unit. A PR reviewer can look at the change record to understand what contract delta they're reviewing, rather than trying to reconstruct intent from diffs.
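
If branch names are derived mechanically from change-record folder names, the one-to-one mapping can be checked in a script. The sketch below assumes the folder slug and the branch slug are identical, which is an assumption for illustration rather than a rule stated here; the function name and regular expression are hypothetical.

```python
import re

# Matches folder names such as 'CR-001-repository-bootstrap'.
_FOLDER = re.compile(r"^CR-(\d+)-([a-z0-9]+(?:-[a-z0-9]+)*)$")


def branch_for(folder: str) -> str:
    """Derive the branch name 'cr/<number>-<slug>' from a change-record folder name."""
    m = _FOLDER.match(folder)
    if m is None:
        raise ValueError(f"not a change-record folder name: {folder!r}")
    number, slug = m.groups()
    return f"cr/{number}-{slug}"


print(branch_for("CR-001-repository-bootstrap"))  # prints cr/001-repository-bootstrap
```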

Mixed-change escalation

Sometimes work starts as a simple fix and expands. You're fixing a bug, and you realize the fix requires changing an acceptance criterion. Or you're refactoring, and you discover the module boundary needs to move.

The rule: if work starts as a simple fix but expands into a contract or structural change:

Stop treating it as exempt
Create a CR immediately
Capture the already-discovered delta
If the new scope is material, re-evaluate readiness before continuing

This prevents the common pattern where "quick fixes" silently evolve into undocumented contract changes.

Supporting artifacts

Not every change needs the same documentation depth. Trigger rules keep overhead proportional to complexity:

design.md is mandatory when:

More than one viable implementation approach exists
A new dependency, component, or architectural boundary is introduced
The change spans multiple feature packages
A security, performance, or reliability tradeoff must be chosen

tasks.md is mandatory when:

Implementation requires more than three dependent steps
Rollout or migration must happen in sequence
Multiple specs must change in a controlled order

ADR input is mandatory when:

The change revisits a durable technology choice
The change alters a cross-cutting architectural rule

Simple additive changes may need only change.md. Complex cross-cutting changes may need all supporting artifacts. The trigger rules prevent both under-documentation and over-documentation.
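
The trigger rules translate directly into a small decision function. A minimal sketch, with hypothetical field names standing in for each trigger:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeTraits:
    multiple_viable_approaches: bool = False
    new_dependency_or_boundary: bool = False
    spans_multiple_feature_packages: bool = False
    tradeoff_must_be_chosen: bool = False
    dependent_steps: int = 0
    sequenced_rollout_or_migration: bool = False
    specs_change_in_controlled_order: bool = False
    revisits_technology_choice: bool = False
    alters_cross_cutting_rule: bool = False


def required_artifacts(t: ChangeTraits) -> list[str]:
    """Return the documents a change needs; change.md is always included."""
    required = ["change.md"]
    if (t.multiple_viable_approaches or t.new_dependency_or_boundary
            or t.spans_multiple_feature_packages or t.tradeoff_must_be_chosen):
        required.append("design.md")
    if (t.dependent_steps > 3 or t.sequenced_rollout_or_migration
            or t.specs_change_in_controlled_order):
        required.append("tasks.md")
    if t.revisits_technology_choice or t.alters_cross_cutting_rule:
        required.append("ADR input")
    return required


print(required_artifacts(ChangeTraits()))
# prints ['change.md']
print(required_artifacts(ChangeTraits(new_dependency_or_boundary=True, dependent_steps=5)))
# prints ['change.md', 'design.md', 'tasks.md']
```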

What this costs

Honest accounting:

A simple change record takes ~10 minutes to write
A complex one with tasks.md takes ~30 minutes
Updating specs before implementation takes ~5-15 minutes depending on the change
Re-establishing alignment at the end takes ~5 minutes

Against this: every future developer (human or AI) who touches the affected feature can understand what changed, why, and how the contract evolved. The audit trail is explicit, not reconstructed from git archaeology.

The tradeoff sharpens when AI is doing most of the implementation. AI can generate code in seconds. The bottleneck is ensuring that code satisfies the right contract. Change records make the contract explicit. Without them, you're back to reviewing diffs and hoping you can reconstruct intent -- which is the trust problem from Part 1.

Drift: the real enemy

The biggest risk to living specs isn't that they change -- it's that they drift. Code changes without spec updates. Tests pass but don't cover new behavior. Quality gates exist but aren't enforced.

Drift is silent. Each individual change is small enough to ignore. Over weeks, the spec describes a system that doesn't exist anymore.

Detection examples:

Acceptance criteria with no linked tests
Tests tagged to deleted criteria
Behavior changes with no spec or change-record update
Quality checks required by policy but absent in CI
ADR commitments violated by module structure

Some drift can be caught deterministically (missing test linkage, absent CI checks). Some requires judgment (does this behavior change constitute a contract change?). The approach uses automation where possible and review questions where not.
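
The deterministic part can be as simple as set arithmetic over criterion identifiers. The sketch below assumes tests are tagged with the IDs of the acceptance criteria they prove; the tagging scheme, the function name, and the sample identifiers are hypothetical.

```python
def find_linkage_drift(
    criteria: set[str], test_tags: dict[str, set[str]]
) -> dict[str, list[str]]:
    """Compare criteria in the spec with the criteria each test claims to prove."""
    covered = set().union(*test_tags.values())
    return {
        # Criteria that no test is tagged to.
        "criteria_without_tests": sorted(criteria - covered),
        # Tests that reference at least one criterion no longer in the spec.
        "tests_tagged_to_deleted_criteria": sorted(
            name for name, tags in test_tags.items() if tags - criteria
        ),
    }


criteria = {"AC-1", "AC-2", "AC-3"}
test_tags = {
    "test_login_ok": {"AC-1"},
    "test_legacy_export": {"AC-9"},  # AC-9 is no longer in the spec
}
print(find_linkage_drift(criteria, test_tags))
# prints {'criteria_without_tests': ['AC-2', 'AC-3'], 'tests_tagged_to_deleted_criteria': ['test_legacy_export']}
```

A check like this covers only the first two detection examples; behavior changes with no spec update still need the review questions.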

The PR review checklist makes this concrete:

Did the contract change? If yes, where is the change record?
Did acceptance criteria change? If yes, where is the updated proof mapping?
Does the implementation satisfy the updated contract without introducing undocumented behavior?

These questions are as important when reviewing AI-generated PRs as human-generated ones. More important, arguably, because AI doesn't reliably flag its own contract violations.
