Specification-First Agentic Development: A Methodology for Structured, Traceable AI-Assisted Development

holger leichsenring — Sun, 12 Apr 2026 20:04:59 +0000

I like clean code. Most of the programs I've written over the years are reasonably well structured. Sure, every couple of years there's that moment when I look at old code and think: I've evolved, great. Stagnation is death. But the code follows the "right" rules, whatever that means at the time. Most importantly, it follows a common thread.

As a freelancer switching between IIoT applications, web apps, message-based backends, infrastructure automation, and pipelines, I mostly have the luxury of being able to forget what I did and understand it again quickly just by reading the code and the folder structure.

But what I very rarely do is document the why.

The Religious Wars of Documentation

You know the debate. Using IDE features to auto-generate docs for classes and methods that add exactly zero value:

/// <summary>
/// Gets a blue collar worker 
/// </summary>
public class GetBlueCollarWorkerRequestHandler(
    ILogger<GetBlueCollarWorkerRequestHandler> logger,
    IBlueCollarWorkerAdapter blueCollarWorkerAdapter) 
    : IRequestHandler<GetBlueCollarWorkerRequest, GetBlueCollarWorkerResponse>

When the method is called Handle and the class is called GetBlueCollarWorkerRequestHandler, I really don't need "Gets a blue collar worker" written above it.

So personally I only document two things:

Official interfaces — Swagger/REST APIs, NuGet/npm packages, libs
Problematic areas — when I write something genuinely non-obvious, with links to sources

Sometimes that's enough. But it doesn't help you get an overview of why the program works the way it does. Documenting the "why" takes time. I use arc42 with Architecture Decision Records, but those live in a separate repository and cover only architectural thinking, not the implementation decisions made feature by feature.

Result: six months after shipping, nobody remembers why anything was built the way it was.

How Developer Work Has Changed

We've all been through the evolution:

Stack Overflow copy-paste (okay, I did it)
ChatGPT copy-paste (I lied, I did both)
Claude/Codex in the IDE writing the code for me

When I started with AI-assisted coding, I ran into the usual problems:

Claude just implements things. Lots of code. Am I still going to read all of it?
Using claude.md and coding principles — Claude sometimes just ignores them
Well-structured code → Claude produces good results. Bad code → Claude makes it worse
Beyond a certain complexity, Claude starts doing weird things
"Just let it do its thing" is a terrible idea, even when you're working on three things in parallel
Context-switching between parallel topics is hard for humans, harder for AI

And then there were the recurring annoyances with IDE-based Claude:

After context is lost, I waste a lot of tokens just getting back to where I was
Long chat threads create cluttered history — and once it's gone, re-explaining everything is exhausting
Every restart, I'm explaining the entire project from scratch again

Specification-First Agentic Development

I felt my approach wasn't good enough. It wasn't leveraging the one thing AI does without complaining and that I, as a developer, was never willing to do consistently: document everything.

The Core Idea

Instead of staying in the IDE and trying to keep track, I needed a more structured approach:

Everything gets written down
The AI needs to know where it was and what to do next
I need to track what's changed, what's planned, and what's done

The Phase Workflow

Here's how it looks in practice:

Discuss new things in Claude Web (not the IDE)
Build a rough plan and create a .md document from the conversation
Move the file to the IDE — let Claude double-check the document, ask questions, resolve ambiguities
Move to planned/

When it's time to implement, Claude always knows exactly which phases exist and how to handle them.

The Folder Structure

.agentsmith/phases/
├── done/       # completed phases (historical reference)
├── active/     # phase currently being worked on (max 1)
└── planned/    # upcoming phases with requirements
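
The folder lifecycle above can be sketched as a small helper. This is only an illustration under stated assumptions, not the tooling used in the project: the function names `init_phase_dirs` and `move_phase` are hypothetical, and the "max 1" rule for active/ is enforced here by refusing a second file.

```python
from pathlib import Path

STATES = ("planned", "active", "done")


def init_phase_dirs(root: Path) -> None:
    """Create planned/, active/ and done/ under the phases root if missing."""
    for state in STATES:
        (root / state).mkdir(parents=True, exist_ok=True)


def move_phase(root: Path, filename: str, src: str, dst: str) -> Path:
    """Move one phase file between state folders, keeping at most one active."""
    if src not in STATES or dst not in STATES:
        raise ValueError(f"unknown state: {src!r} -> {dst!r}")
    source = root / src / filename
    if not source.is_file():
        raise FileNotFoundError(source)
    # Assumption: "max 1" means active/ may never hold two phase files at once.
    if dst == "active" and any((root / "active").glob("*.md")):
        raise RuntimeError("active/ already holds a phase; finish it first")
    target = root / dst / filename
    source.rename(target)
    return target
```

Called as `move_phase(Path(".agentsmith/phases"), "p01-example.md", "planned", "active")` (the file name is made up), it fails loudly when a second phase is about to become active, which is the one invariant the folder layout relies on.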

The `claude.md` Instructions

## Implementation Workflow (follow this order for every phase)

1. Write phase prompt first — create planned/p{NN}-slug.md BEFORE writing any code
2. Move to active — when starting work
3. Enter plan mode — explore codebase, design approach, get approval before coding
4. Implement step by step — contracts first, then implementation, then DI, then tests
5. Build after each step — fix errors immediately
6. Run ALL tests — 0 failures before moving on
7. Log decisions — append to decisions.md (what, alternatives, why) — MANDATORY
8. Update context.yaml — move phase from active to done
9. Move to done
10. Commit — one commit per phase
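
Step 1 names phase files planned/p{NN}-slug.md. A minimal sketch of how the next free name could be derived from the three folders follows; `slugify` and `next_phase_filename` are hypothetical helpers, and the two-digit zero padding is an assumption based on names like p01.

```python
import re
from pathlib import Path

# Assumed naming scheme: p<number>-<lowercase-slug>.md
PHASE_FILE = re.compile(r"^p(\d+)-[a-z0-9-]+\.md$")


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def next_phase_filename(root: Path, title: str) -> str:
    """Return the next free p{NN}-slug.md name across planned/, active/ and done/."""
    numbers = [
        int(match.group(1))
        for path in root.glob("*/p*.md")
        if (match := PHASE_FILE.match(path.name))
    ]
    slug = slugify(title)
    if not slug:
        raise ValueError("title must contain at least one letter or digit")
    # Highest existing number plus one; an empty tree starts at p01.
    return f"p{max(numbers, default=0) + 1:02d}-{slug}.md"
```

Scanning all three folders rather than only planned/ keeps numbers unique even after phases have moved to done/.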

The Decision Log

After every phase, Claude appends to decisions.md in the repo:

## p66: Docs Enhancement — Self-Documentation & Multi-Agent Orchestration

- [Architecture] DESIGN.md placed in docs/ not project root — it is a docs-site
  concern, not product code
- [Tooling] CSS-only theme overrides via extra_css, no custom MkDocs templates —
  keeps MkDocs upgrades safe
- [TradeOff] Content first, styling second — missing content is a blocker,
  imperfect styling is not

## p67: API Scan Compression & ZAP Fix

- [Architecture] Category slicing (auth/design/runtime) instead of finding
  compression — findings are already compact at ~90 chars/piece, compression
  would lose information
- [Implementation] Skip DAST skills on ZAP failure via ZapFailed flag — avoids
  wasting 2 LLM calls on empty input

Six months later, when nobody remembers why anything was done the way it was — it's all right there.
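
The log format above is regular enough to generate. The following is a sketch of how an entry could be appended, assuming each phase gets a `## pNN: title` heading followed by `- [Category] text` bullets; the function names are hypothetical, and in the workflow described here it is Claude, not a script, that writes these entries.

```python
from pathlib import Path


def format_decisions(phase: str, title: str, decisions: list[tuple[str, str]]) -> str:
    """Render one phase section: a heading plus one '- [Category] text' bullet each."""
    if not decisions:
        raise ValueError("log at least one decision per phase")
    lines = [f"## {phase}: {title}", ""]
    lines += [f"- [{category}] {text}" for category, text in decisions]
    return "\n".join(lines) + "\n"


def append_decisions(log: Path, phase: str, title: str,
                     decisions: list[tuple[str, str]]) -> None:
    """Append a section to the log, separated from earlier content by a blank line."""
    section = format_decisions(phase, title, decisions)
    existing = log.read_text(encoding="utf-8") if log.exists() else ""
    # No separator for a new file; otherwise ensure exactly one blank line.
    separator = "" if not existing else ("\n" if existing.endswith("\n") else "\n\n")
    with log.open("a", encoding="utf-8") as handle:
        handle.write(separator + section)
```

Appending instead of rewriting keeps the log's history intact, so each commit's diff shows only the decisions of the phase that was just finished.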

Saving Tokens with `context.yaml`

With all these documents in the repo, I don't want Claude to re-read everything from scratch every time. That's what context.yaml is for.

It describes the architecture, stack, integrations, quality rules, and — critically — a compressed summary of every phase:

meta:
  project: agent-smith
  version: 1.0.0
  type: [agent, pipeline]
  purpose: "Self-hosted AI orchestration framework: code, legal, security, workflows."

stack:
  runtime: .NET 8
  lang: C#
  infra: [Docker, K8s, Redis]
  testing: [xUnit, Moq, FluentAssertions]
  sdks: [Anthropic, OpenAI, Google-Gemini, Octokit, LibGit2Sharp]

state:
  done:
    p01: "Solution structure, domain entities, contracts, YAML config loader"
    p02: "Command/Handler pattern: 9 context records, 9 handler stubs, CommandExecutor"
    p03: "Providers: AzureDevOps+GitHub tickets, Local+GitHub source, Claude agentic loop"
    p04: "Pipeline execution: IntentParser, PipelineExecutor, ProcessTicketUseCase, DI wiring"
    ...

Just by reading this file, Claude knows immediately which features have been implemented, and which phase document to open for details.

An Interesting Parallel: Karpathy's Knowledge Base

Andrej Karpathy recently wrote about using LLMs to build personal knowledge bases: collecting external material into a raw/ directory, letting the LLM compile it into a linked markdown wiki, then running Q&A against it.

The structural parallel is obvious. But there's a key difference:

Karpathy collects external knowledge — papers, articles, datasets
Specification-First Agentic Development persists internal knowledge while building the product

The documentation isn't a separate artifact you create after the fact. It's generated as a side effect of the development process itself.

A Real Example: The Documentation Site

As all features, bugfixes, ideas, and decisions are already documented in the repo, it's not surprising that Claude can generate full technical documentation rapidly — and accurately.

Phase 53 of my project was exactly this:

# Phase 53: Documentation Site

## Goal: Technical documentation at docs.agent-smith.org

Complete file structure, MkDocs Material setup, GitHub Actions
deployment, README reduction to essentials...

How long did it take? About 15 minutes. Because all the information was already there — in the phase files, the decision log, the context.yaml. Claude just had to synthesize it.

I really celebrated that one.

Summary

Specification-First Agentic Development is just a way of structuring the work. It defines phases directly in the repository, alongside the code, producing a consistent development pattern that includes the plan, the decisions, and the reasoning.

The benefits:

Fewer wasted tokens — context.yaml gives Claude what it needs without re-reading everything
Parallelism — multiple phases can be planned and tracked simultaneously
Traceability — every decision is logged with alternatives and rationale
Restartability — restart your machine, restart Claude, pick up exactly where you left off
Self-documentation — the docs practically write themselves

It's not rocket science. It's just discipline — finally enforced by a patient AI that never complains about writing things down.

GitHub repo with the template and how-to explanation.
Agent Smith implementation, where the idea was born: GitHub repo
Originally posted on CodingSoul

DEV Community: holger leichsenring