Aleksandr

Posted on Jul 4

Handoff-Driven Development

#ai #llm #productivity #documentation

Spec-driven development, improved. This is SDD + handoffs — the cutting edge of best practice for solo development and small teams alike.
It is not a story about a swarm of agents autonomously shipping a product through an abstract "plan → code → review → test → deploy" loop, syncing with each other via workflows. We don't have a private datacenter or half a million dollars for tokens.

At the end there is a link to a template repository you can clone — or simply point at — and tell Opus (Fable): "set up the same specification system for me." It handles the rest on its own.

The problems we solve

1. Context cleanliness. The model's context must contain what the task needs — and must not contain anything extra. Extra material raises cognitive load and degrades the solution. The right context is the right answer, the right code, the right decision. Context is also finite and paid for — the question "what NOT to load" matters no less than "what to load."

2. Switching between tasks. The context of a single task can be assembled by hand, right in the prompt. But there are usually several tasks, and they run in parallel: auditing and fixing server code, designing the client app, migrating to a new CI/CD, conceptual planning of a new feature. Each has its own set of relevant documents, its own done/remaining state, its own accepted decisions.
Sessions are finite too: even within one task, context given at the start of a session is no longer current when carried into another session. Switching between tasks and between sessions costs us time, quality, and tokens.

Part one: SDD

Maps

Starting position: a solution monorepo with a motley set of projects — applications, microservices, shared packages, devops manifests.

Each such subproject has its own specs/ folder with its own local specification and its own local map. And at the root sits the root specification map, which links to the subproject maps. The result is a hierarchy, and it delivers the main thing: from the root you can write a prompt about any part of the solution — the model finds what it needs on its own, descending through the maps in two or three hops without loading anything extra along the way.

Two or three levels of hierarchy: root → subproject → package. More is questionable: every level is a hop the reader (the LLM) must make before reaching actual content.

The reachability invariant: every specification file is reachable through a chain of markdown links from at least one map (a bare text path doesn't count as a link — neither the IDE nor the audit sees it). An unreachable file is an orphan; a document nobody can find does not exist. Orphans are caught by a dedicated audit script.
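A minimal sketch of what such an orphan check could look like, not the template repository's actual `specs-audit.py`. It assumes spec files are the `.md` files that live under any `specs/` folder, and that only inline `[text](path.md)` links count; `linked_files` and `find_orphans` are hypothetical names chosen for illustration.

```python
import re
from pathlib import Path

# Inline markdown links: [text](target). Bare text paths are deliberately ignored.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def linked_files(md_file: Path) -> set[Path]:
    """Return the existing local .md files that md_file links to."""
    targets = set()
    for raw in LINK_RE.findall(md_file.read_text(encoding="utf-8")):
        target = raw.split("#", 1)[0]  # drop in-page anchors
        if not target or "://" in target or not target.endswith(".md"):
            continue  # skip pure anchors, external URLs, non-spec targets
        resolved = (md_file.parent / target).resolve()
        if resolved.is_file():
            targets.add(resolved)
    return targets


def find_orphans(repo_root: Path, root_map: Path) -> list[Path]:
    """List spec files (assumed: *.md under a specs/ folder) unreachable from root_map."""
    reachable: set[Path] = set()
    queue = [root_map.resolve()]
    while queue:
        current = queue.pop()
        if current in reachable:
            continue
        reachable.add(current)
        queue.extend(linked_files(current))
    all_specs = {
        p.resolve()
        for p in repo_root.rglob("*.md")
        if "specs" in p.relative_to(repo_root).parts
    }
    return sorted(all_specs - reachable)
```

Calling `find_orphans(Path("."), Path("specs/specs-map.md"))` from the repository root would then return the files that no map can reach; a non-empty result can fail a CI step.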

Genres

Every specification file belongs to a genre. The genre idea comes from Diátaxis, but the taxonomy is our own.

as-built — a description of existing code, including architecture overviews. Obliged to follow the code — or to honestly state that it is stale (more on that below). The level of detail is convenient to think of in C4 terms: the root map and root as-built specs are System Context and Containers (the solution as a whole and its container applications); subproject specs are Container (the internals of one container: domains, services, stores); the third level is Component. The Code level is not given to specs: code describes itself best.

plan — an implementation plan for a specific track or feature. The heart of a working session: implementation revolves around the plan. Fundamentally temporary: a completed plan posing as a current document is disinformation. After implementation the plan dies: by default — a tombstone (its value has already been distilled into as-built and the backlog; tombstones below); superseded — if the plan has a successor; archive — if it retains reference value as decision history.

reference — above-the-code reference material: research, rationales, working notebooks. Like as-built, only not about code: not obliged to follow reality, valuable as material.

vision — above-the-code vision of a product or feature. The code doesn't exist yet, or isn't obliged to conform. Business plans and product hypotheses go here too: everything that is "what we are building," not "how it works." Roughly like plan, but not for code.

log — the journal of architecture decisions, aka ADR (Architecture Decision Records). Append-only: each entry is "context → decision → consequences," and entries carry their own statuses. It answers the eternal question "why is this done so strangely here" — in a paragraph, not an archaeological expedition through git history.

Statuses

The second axis, orthogonal to genre: where the document is in its lifecycle.

| Status | Meaning |
| --- | --- |
| `draft` | Brainstorm/proposal, not accepted |
| `accepted` | Accepted for implementation (for plan/vision) |
| `current` | Up to date; for as-built, matches the code |
| `stale` | Known drift from code/reality, awaiting reconciliation |
| `superseded` | Replaced by another document; a link to the successor is mandatory |
| `archive` | Historical, unmaintained, reference value only |

Important! The most valuable status in the table is stale. The classic requirement "always update documentation together with the code" cannot always be met. The rule in Handoff-Driven Development (HDD) is more honest: code and its as-built spec are edited in the same session, and if you can't update the spec right now, you mark it stale. The debt isn't paid, but it is explicitly recorded. A model that sees stale knows not to trust the document and to re-verify it against the code. No disinformation occurred; what was recorded is information about unreliability.

So every specification file carries one line under its title:

Status: as-built / current · verified: 2026-07-03

Genre, status, and verification date. Note: verified: is the date of the last reconciliation of the content against the code, not the date of a text edit. Fixing a typo is not verifying. "Current, verified yesterday" and "current, verified six months ago" read very differently.
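The status line is regular enough to check mechanically. The sketch below parses it and flags documents that deserve a re-check. The names `SpecStatus`, `parse_status_line`, and `needs_reverification` are hypothetical, and both the optional `verified:` field and the 90-day threshold are assumptions of this example, not rules from the article.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

GENRES = {"as-built", "plan", "reference", "vision", "log"}
STATUSES = {"draft", "accepted", "current", "stale", "superseded", "archive"}

# Matches e.g. "Status: as-built / current · verified: 2026-07-03".
STATUS_LINE_RE = re.compile(
    r"^Status:\s*(?P<genre>[\w-]+)\s*/\s*(?P<status>[\w-]+)"
    r"(?:\s*·\s*verified:\s*(?P<verified>\d{4}-\d{2}-\d{2}))?\s*$"
)


@dataclass(frozen=True)
class SpecStatus:
    genre: str
    status: str
    verified: Optional[date]  # assumption: the date may be absent


def parse_status_line(line: str) -> SpecStatus:
    """Parse the one-line status header; raise ValueError if it is malformed."""
    match = STATUS_LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a status line: {line!r}")
    genre, status = match["genre"], match["status"]
    if genre not in GENRES or status not in STATUSES:
        raise ValueError(f"unknown genre or status: {genre!r} / {status!r}")
    verified = match["verified"]
    return SpecStatus(genre, status, date.fromisoformat(verified) if verified else None)


def needs_reverification(spec: SpecStatus, today: date, max_age_days: int = 90) -> bool:
    """True for stale docs, and for 'current' docs verified too long ago or never."""
    if spec.status == "stale":
        return True
    if spec.status != "current":
        return False
    if spec.verified is None:
        return True
    return (today - spec.verified).days > max_age_days
```

With this, "current, verified yesterday" and "current, verified six months ago" stop looking identical to tooling: the second one gets flagged.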

Tombstones

A document has a lifecycle — so it must have a death. If documentation is never deleted ("might come in handy"), it accumulates. And then a graveyard poses as a library — a direct hit to context cleanliness.

The death protocol is two steps. The document is deleted physically — you already have an archive, it's called git. In the map, its place is taken by a tombstone, one line: "spec-server-audit.md — deleted 2026-07-03, git history."

What the tombstone solves: a dead specification file cannot accidentally end up in a live context. At the same time, the map records that the file was deleted and when, so nobody assumes it was simply lost. If it is suddenly needed for something (it isn't), it can be pulled from git rather than keeping 150 kilobytes in the live catalog.
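The map half of the death protocol can be sketched as a small text transformation. This is an illustration under assumptions: `bury` is a hypothetical helper, the map is assumed to list one link per line, and the tombstone's exact punctuation is this example's choice. Deleting the file itself is left to `git rm`.

```python
import re
from datetime import date


def bury(map_text: str, filename: str, deleted_on: date) -> str:
    """Replace the map line that links to filename with a one-line tombstone."""
    # Match "](...path/filename)" or "](filename)", with an optional #anchor.
    link_re = re.compile(
        r"\]\((?:[^)]*/)?" + re.escape(filename) + r"(?:#[^)]*)?\)"
    )
    tombstone = f"- {filename}: deleted {deleted_on.isoformat()}, git history"
    out, replaced = [], False
    for line in map_text.splitlines(keepends=True):
        if link_re.search(line):
            newline = "\n" if line.endswith("\n") else ""
            out.append(tombstone + newline)
            replaced = True
        else:
            out.append(line)
    if not replaced:
        raise ValueError(f"no map line links to {filename}")
    return "".join(out)
```

Because the tombstone is plain text rather than a link, a reachability audit no longer expects the file to exist.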

Backlogs

One more effective feature: two files, backlog.md and backlog-resolved.md. The invariant: a task lives in exactly one of them. Opened a task — a line in the backlog; closed it — the line moves to resolved together with a description of the solution.

Why this is good for AI. The open backlog is a compact, always-current answer to "what isn't done yet," cheap to load into context in full. The resolved one is a precedent base: before fixing a bug, the model can grep resolved and discover that something similar has already been fixed, and how, instead of reinventing the solution (or reintroducing the old bug). The single-location invariant guarantees that a task is never listed as "open" in one file and "closed" in the other at the same time, an inconsistency that trips up a model more easily than a human.

For a model, such a tracker in git is (perhaps surprisingly, though on reflection obviously) more convenient than an MCP to Jira: a file loads into context in one read and greps offline, while Jira over MCP means a tool call per action, pagination, and JSON wrappers. That costs tokens, latency, and noise: lower context quality, lower solution quality. The main thing is atomicity: a task closes in the same commit as the code and the spec, one transaction of truth.

Nothing stops you from keeping Jira itself for stakeholders and their dashboards. An honest caveat: keeping the two in sync is out of HDD's scope.
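The single-location invariant and the "grep resolved first" habit are both easy to script. The sketch below assumes each task is a list item that starts with an ID such as `BL-12`; that ID convention, and the names `task_ids`, `duplicated_tasks`, and `find_precedents`, are hypothetical and not taken from the template repository.

```python
import re

# Assumed convention: "- BL-12: short description" (or "* BL-12 ...").
TASK_ID_RE = re.compile(r"^\s*[-*]\s+(?P<id>[A-Z]+-\d+)\b", re.MULTILINE)


def task_ids(text: str) -> set[str]:
    """Collect the task IDs that open list items in a backlog file."""
    return set(TASK_ID_RE.findall(text))


def duplicated_tasks(backlog_text: str, resolved_text: str) -> set[str]:
    """IDs that violate the invariant by appearing in both files."""
    return task_ids(backlog_text) & task_ids(resolved_text)


def find_precedents(resolved_text: str, keyword: str) -> list[str]:
    """Case-insensitive, grep-like search over resolved tasks."""
    needle = keyword.lower()
    return [
        line.strip()
        for line in resolved_text.splitlines()
        if needle in line.lower()
    ]
```

A pre-commit hook could read `backlog.md` and `backlog-resolved.md`, call `duplicated_tasks`, and reject the commit when the result is non-empty.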

Part two: HANDOFF

If SDD is about storing and organizing specifications, then Handoff is about context management — about passing context between sessions and between tasks. Here the LLM is, in effect, taking over a watch. And for that we need a watch log.

The index: TRACKS.md

A track is a task: it evolves, it gets superseded by subsequent tasks, it splits into subtasks (new tracks). Eventually, it simply gets done.
The root specs/ holds TRACKS.md — an index of live tracks, one line per track:

- **Server audit fixes** — [handoff](server/specs/handoff-server-audit-fixes.md) —
  active, 2026-07-02, next: error channel in concept.service.ts

The gist in one phrase, a link to the handoff, a status (active / paused — and why / blocked — and on what), the date of last touch, the next step. No content — pointers only. The whole file reads in a fraction of a second and answers "what is in progress and what to pick up." Switching between tasks is choosing a handoff.
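Since every index line follows the same shape, it can be parsed into a record, for example to list tracks that have not been touched in a while. This is a sketch that assumes entries look exactly like the example above (bold title, a link labelled `handoff`, then status, date, and `next:`); `Track` and `parse_tracks` are hypothetical names.

```python
import re
from dataclasses import dataclass
from datetime import date

# Fields are separated by an em dash (U+2014), as in the example entry.
ENTRY_RE = re.compile(
    r"^- \*\*(?P<title>.+?)\*\*\s+\u2014\s+"
    r"\[handoff\]\((?P<handoff>[^)]+)\)\s+\u2014\s+"
    r"(?P<status>.+?),\s*(?P<touched>\d{4}-\d{2}-\d{2}),\s*next:\s*(?P<next>.+)$"
)


@dataclass(frozen=True)
class Track:
    title: str
    handoff: str    # path to the handoff file, relative to TRACKS.md
    status: str     # "active", or "paused ..." / "blocked ..." with the reason
    touched: date   # date of last touch
    next_step: str


def parse_tracks(text: str) -> list[Track]:
    """Parse TRACKS.md entries; indented lines continue the previous entry."""
    entries: list[str] = []
    for line in text.splitlines():
        if line.startswith("- "):
            entries.append(line.strip())
        elif line.strip() and line[0].isspace() and entries:
            entries[-1] += " " + line.strip()
    tracks = []
    for entry in entries:
        match = ENTRY_RE.match(entry)
        if match is None:
            raise ValueError(f"malformed track entry: {entry!r}")
        tracks.append(
            Track(
                title=match["title"],
                handoff=match["handoff"],
                status=match["status"],
                touched=date.fromisoformat(match["touched"]),
                next_step=match["next"],
            )
        )
    return tracks
```

Raising on a malformed entry is a deliberate choice here: an index that silently drops a track would defeat its purpose.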

The handoff

The handoff itself is a file, handoff-<slug>.md, living next to its work zone: a server track's — in the server specs, a product track's — in the root ones. One file per track, edited in place; git keeps the versions. Inside:

- **What to load at session start**: a numbered list with a "why" for every item. And, mandatorily, an anti-list: what NOT to load in full (heavy artifacts: grep, don't read). A handoff is a recipe for assembling context for the task: the solution to problem 1 at the track level.
- **Task and methodology**: the worked-out working protocol, verbatim. For example: recon before editing → report and propose → wait for approval → fix with a test, where the new test is first run against the pre-fix code, to prove it catches the bug at all.
- **State**: done / remaining / noticed along the way but out of scope.
- **Methodological lessons**: problems that keep recurring, or actions required from the developer.
- **Do not re-decide**: the track's decisions, each with a one-line rationale. Revisiting a decision doesn't erase it silently, but marks it: ⚠, date, evidence.
- **First step of the new session**: a concrete entry point: what to read, what to scout, what to ask the human.
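A handoff with a missing section is an easy mistake to catch before a session ends. The sketch below assumes each of the six parts listed above appears as a markdown heading with exactly that title; the real `handoff-template.md` may structure them differently, and `missing_sections` is a hypothetical helper.

```python
import re

# Section titles as listed in the article; the heading form is an assumption.
REQUIRED_SECTIONS = (
    "What to load at session start",
    "Task and methodology",
    "State",
    "Methodological lessons",
    "Do not re-decide",
    "First step of the new session",
)

HEADING_RE = re.compile(r"^#{1,6}\s+(.+?)\s*$", re.MULTILINE)


def missing_sections(handoff_text: str) -> list[str]:
    """Return the required section titles that have no heading in the handoff."""
    present = {title.casefold() for title in HEADING_RE.findall(handoff_text)}
    return [s for s in REQUIRED_SECTIONS if s.casefold() not in present]
```

An empty list means the handoff has every part a fresh session needs; anything else names what to fill in before closing the session.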

Rhythm and closure

A working day with this construction looks like this. Session start: the model reads TRACKS.md, loads the handoff of the chosen track, and from it — exactly the specified context. Session end: update the handoff (state, lessons, decisions) and the line in the index (date, next). That's all.

Closing a track. The handoff is deleted — but through distillation: the "do not re-decide" decisions and everything useful move into the permanent specs; a one-or-two-sentence summary goes into TRACKS-LOG.md, the journal of closed tracks; the line is removed from the index. New handoffs are cloned from the template and filled in with the task from the prompt. Old ones are deleted — operational activity heavily pollutes a repository.

The system is closed by a single instruction in the agent config (for Claude Code that's CLAUDE.md): starting work on a track — load its handoff; ending a session — update the handoff and its line in TRACKS.md. It works — Claude keeps getting better and better at discipline.

How this differs from Spec Kit and Kiro

Spec-driven is currently associated with GitHub Spec Kit and Amazon Kiro — so what makes HDD different? For Kit and Kiro, the specification is generation input: specify → plan → tasks → feature code. The spec looks forward and effectively dies after the merge. In HDD, specs look at the existing system and are obliged to remain true, while the handoff holds the truth about unfinished work. The Kit/Kiro flow fits into HDD's plan genre — with the difference that after implementation the plan dies (tombstone), while as-built lives on.

HDD's borrowings: lifecycle statuses come from ADR/RFC practice; genres from Diátaxis; map zoom levels from C4; "documentation in the repo + a mechanical audit" is docs-as-code, taken as is. What is original here: the "genre × status × verified" triple on every file, the reachability invariant, and the handoff layer as a whole.

Limitations

Once more: this is not a story about a swarm of agents; it is about solo development, or a small team on a regular monthly subscription. This is the setting where MCPs to Confluence and Jira are excessive and counterproductive. Solo development, meanwhile, is increasingly common, and LLMs make it possible for one person to run several projects.

The template repository

Everything described is packaged into a template repository: github.com/yetanothervan/handoff-driven-development

Inside: spec-specs.md — the system's rules in one document; a specs-map.md starter; handoff-template.md; TRACKS.md and TRACKS-LOG.md; the audit script specs-audit.py; a ready-made block for CLAUDE.md.

To apply it: clone the repo — or simply give your agent the link with this prompt, verbatim:

Look at the repository https://github.com/yetanothervan/handoff-driven-development Let's apply the same specification system in my project: lay out the maps and rules across our structure, create TRACKS.md, and propose the first tracks.

It lays everything out on its own from there.

Top comments (1)

Troels Roennow • Jul 13

I actually wrote a guide on progressive loading for autonomous development, and this hits the exact same nerve. The challenge is identical: how do we stop the model from drowning in stale context?

Your reachability invariant and the handoff file are just more explicit versions of the same mechanism. Solo dev with AI works best when you treat context like a scarce resource. Every extra file loaded is a tax on accuracy.

Thanks for sharing, I'll go through your repo to see what I can learn!