I like clean code. Most of the programs I've written over the years are reasonably well structured. Sure, there's always that moment every two years when I look at old code and think: good, I've evolved. Stagnation is death. But the code follows the "right" rules, whatever that means at the time. Most importantly, it follows a common thread.
As a freelancer switching between IIOT applications, web apps, message-based backends, infrastructure automation, and pipelines, I'm mostly in the luxury position of being able to forget what I did and understand it again quickly just by reading the lines and the folder structure.
But what I very rarely do is document the why.
The Religious Wars of Documentation
You know the debate. Using IDE features to auto-generate docs for classes and methods that add exactly zero value:
```csharp
/// <summary>
/// Gets a blue collar worker
/// </summary>
public class GetBlueCollarWorkerRequestHandler(
    ILogger<GetBlueCollarWorkerRequestHandler> logger,
    IBlueCollarWorkerAdapter blueCollarWorkerAdapter)
    : IRequestHandler<GetBlueCollarWorkerRequest, GetBlueCollarWorkerResponse>
```
When the method is called Handle and the class is called GetBlueCollarWorkerRequestHandler, I really don't need "Gets a Blue Collar Worker" written above it.
So personally I only document two things:
- Official interfaces — Swagger/REST APIs, NuGet/npm packages, libs
- Problematic areas — when I write something genuinely non-obvious, with links to sources
Sometimes that's enough. But it doesn't help you get an overview of why the program works the way it does. Documentation of the "why" takes time. I use Arc42 with Architecture Decision Records — but that's in a separate repository, and it only covers architectural thoughts, not implementation decisions made feature-by-feature.
Result: six months after shipping, nobody remembers why anything was built the way it was.
How Developer Work Has Changed
We've all been through the evolution:
- Stack Overflow copy-paste (okay, I did it)
- ChatGPT copy-paste (I lied, I did both)
- Claude/Codex in the IDE writing the code for me
When I started with AI-assisted coding, I ran into the usual problems:
- Claude just implements things. Lots of code. Am I still going to read all of it?
- Using `claude.md` and coding principles: Claude sometimes just ignores them
- Well-structured code → Claude produces good results. Bad code → Claude makes it worse
- Beyond a certain complexity, Claude starts doing weird things
- "Just let him do" is a terrible idea, even when you're working on three things in parallel
- Context-switching between parallel topics is hard for humans, harder for AI
And then there were the recurring annoyances with IDE-based Claude:
- After context is lost, I waste a lot of tokens just getting back to where I was
- Long chat threads create cluttered history — and once it's gone, re-explaining everything is exhausting
- Every restart, I'm explaining the entire project from scratch again
Specification-First Agentic Development
I felt my approach wasn't good enough. It wasn't leveraging what AI is genuinely good at, specifically: documenting whatever you ask, without complaining, which is something I as a developer was never willing to do consistently.
The Core Idea
Instead of staying in the IDE and trying to keep track, I needed a more structured approach:
- Everything gets written down
- The AI needs to know where it was and what to do next
- I need to track what's changed, what's planned, and what's done
The Phase Workflow
Here's how it looks in practice:
- Discuss new things in Claude Web (not the IDE)
- Build a rough plan and create a `.md` document from the conversation
- Move the file to the IDE: let Claude double-check the document, ask questions, resolve ambiguities
- Move it to `planned/`

When it's time to implement, Claude always knows exactly which phases exist and how to handle them.
The Folder Structure
```
.agentsmith/phases/
├── done/     # completed phases (historical reference)
├── active/   # phase currently being worked on (max 1)
└── planned/  # upcoming phases with requirements
```
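The folder convention above can be enforced mechanically. Here's a minimal sketch in Python (the helper names `activate`/`complete` are my illustration, not part of any real tooling) that moves a phase file through the lifecycle and guards the "max 1 active" invariant:

```python
from pathlib import Path
import shutil

PHASES = Path(".agentsmith/phases")

def activate(slug: str) -> Path:
    """Move a phase file from planned/ to active/, enforcing max 1 active."""
    active = list((PHASES / "active").glob("*.md"))
    if active:
        raise RuntimeError(f"phase already active: {active[0].name}")
    src = PHASES / "planned" / f"{slug}.md"
    dst = PHASES / "active" / src.name
    shutil.move(src, dst)
    return dst

def complete(slug: str) -> Path:
    """Move the finished phase from active/ to done/."""
    src = PHASES / "active" / f"{slug}.md"
    dst = PHASES / "done" / src.name
    shutil.move(src, dst)
    return dst
```

In practice Claude does these moves itself as part of the workflow; a script like this is only useful as a sanity check.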
The claude.md Instructions
```md
## Implementation Workflow (follow this order for every phase)
1. Write phase prompt first — create planned/p{NN}-slug.md BEFORE writing any code
2. Move to active — when starting work
3. Enter plan mode — explore codebase, design approach, get approval before coding
4. Implement step by step — contracts first, then implementation, then DI, then tests
5. Build after each step — fix errors immediately
6. Run ALL tests — 0 failures before moving on
7. Log decisions — append to decisions.md (what, alternatives, why) — MANDATORY
8. Update context.yaml — move phase from active to done
9. Move to done
10. Commit — one commit per phase
```
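For illustration, a phase prompt created in step 1 might look something like this. The phase number, name, and contents here are entirely hypothetical, not taken from the actual repo:

```md
# Phase 12: Retry Policy for Provider Calls

## Goal
Transient failures against the ticket providers should be retried
with exponential backoff instead of failing the pipeline.

## Requirements
- Wrap provider calls in a retry decorator (max 3 attempts)
- Log every retry with the attempt number
- Unit tests for success-after-retry and permanent-failure paths

## Out of Scope
- Circuit breaking
```

The point is that the requirements, not the code, are the unit of planning: Claude reads this file, enters plan mode against it, and everything downstream (decisions, context summary, commit) refers back to it.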
The Decision Log
After every phase, Claude appends to decisions.md in the repo:
```md
## p66: Docs Enhancement — Self-Documentation & Multi-Agent Orchestration
- [Architecture] DESIGN.md placed in docs/ not project root — it is a docs-site
  concern, not product code
- [Tooling] CSS-only theme overrides via extra_css, no custom MkDocs templates —
  keeps MkDocs upgrades safe
- [TradeOff] Content first, styling second — missing content is a blocker,
  imperfect styling is not

## p67: API Scan Compression & ZAP Fix
- [Architecture] Category slicing (auth/design/runtime) instead of finding
  compression — findings are already compact at ~90 chars/piece, compression
  would lose information
- [Implementation] Skip DAST skills on ZAP failure via ZapFailed flag — avoids
  wasting 2 LLM calls on empty input
```
Six months later, when nobody remembers why anything was done the way it was — it's all right there.
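The append itself is trivial, which is exactly why it works. A sketch in Python (the function and its signature are my invention for illustration; in the real workflow Claude writes the markdown directly):

```python
from pathlib import Path

def log_decision(phase: str, title: str, entries: list[tuple[str, str]],
                 path: Path = Path("decisions.md")) -> str:
    """Append one phase's decisions to decisions.md.

    entries: (category, text) pairs, e.g. ("Architecture", "why X over Y").
    Returns the appended markdown block.
    """
    lines = [f"## {phase}: {title}"]
    lines += [f"- [{cat}] {text}" for cat, text in entries]
    block = "\n".join(lines) + "\n"
    with path.open("a", encoding="utf-8") as f:
        f.write(block)
    return block
```

Append-only is the design choice that matters: decisions are never rewritten, so the log stays an honest history rather than a curated one.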
Saving Tokens with context.yaml
Having all these documents in the repo, I don't want Claude to re-read everything from scratch every time. That's what context.yaml is for.
It describes the architecture, stack, integrations, quality rules, and — critically — a compressed summary of every phase:
```yaml
meta:
  project: agent-smith
  version: 1.0.0
  type: [agent, pipeline]
  purpose: "Self-hosted AI orchestration framework: code, legal, security, workflows."
stack:
  runtime: .NET 8
  lang: C#
  infra: [Docker, K8s, Redis]
  testing: [xUnit, Moq, FluentAssertions]
  sdks: [Anthropic, OpenAI, Google-Gemini, Octokit, LibGit2Sharp]
state:
  done:
    p01: "Solution structure, domain entities, contracts, YAML config loader"
    p02: "Command/Handler pattern: 9 context records, 9 handler stubs, CommandExecutor"
    p03: "Providers: AzureDevOps+GitHub tickets, Local+GitHub source, Claude agentic loop"
    p04: "Pipeline execution: IntentParser, PipelineExecutor, ProcessTicketUseCase, DI wiring"
    ...
```
Claude knows directly what features have been implemented just by reading this file — and knows which phase document to look at for details.
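To make the token-saving concrete: the whole point is that a few compact lines replace re-reading every phase document. A sketch of that compression, assuming the YAML has already been parsed into a dict (e.g. with PyYAML's `safe_load`); the `context_preamble` helper and the abridged data below are my illustration, not part of the actual setup:

```python
def context_preamble(ctx: dict) -> str:
    """Compress the parsed context.yaml into a short session preamble."""
    meta, state = ctx["meta"], ctx["state"]
    lines = [f"Project: {meta['project']} v{meta['version']}",
             f"Purpose: {meta['purpose']}",
             "Done phases:"]
    lines += [f"  {pid}: {summary}" for pid, summary in state["done"].items()]
    return "\n".join(lines)

# Abridged mirror of the context.yaml shown above
ctx = {
    "meta": {"project": "agent-smith", "version": "1.0.0",
             "purpose": "Self-hosted AI orchestration framework"},
    "state": {"done": {
        "p01": "Solution structure, domain entities, contracts, YAML config loader",
        "p02": "Command/Handler pattern: 9 context records, 9 handler stubs",
    }},
}
```

One line per phase is usually enough for Claude to decide whether it needs to open the full phase document at all.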
An Interesting Parallel: Karpathy's Knowledge Base
Andrej Karpathy recently wrote about using LLMs to build personal knowledge bases: collecting external material into a raw/ directory, letting the LLM compile it into a linked markdown wiki, then running Q&A against it.
The structural parallel is obvious. But there's a key difference:
- Karpathy collects external knowledge — papers, articles, datasets
- Specification-First Agentic Development persists internal knowledge while building the product
The documentation isn't a separate artifact you create after the fact. It's generated as a side effect of the development process itself.
A Real Example: The Documentation Site
Since all features, bugfixes, ideas, and decisions are already documented in the repo, it's no surprise that Claude can generate full technical documentation rapidly and accurately.
Phase 53 of my project was exactly this:
```md
# Phase 53: Documentation Site

## Goal: Technical documentation at docs.agent-smith.org

Complete file structure, MkDocs Material setup, GitHub Actions
deployment, README reduction to essentials...
```
How long did it take? About 15 minutes. Because all the information was already there — in the phase files, the decision log, the context.yaml. Claude just had to synthesize it.
I really celebrated that one.
Summary
Specification-First Agentic Development is just a way of structuring the work. It defines phases directly in the repository, producing a consistent development record that includes the plan, the decisions, and the reasoning.
The benefits:
- Fewer wasted tokens — context.yaml gives Claude what it needs without re-reading everything
- Parallelism — multiple phases can be planned and tracked simultaneously
- Traceability — every decision is logged with alternatives and rationale
- Restartability — restart your machine, restart Claude, pick up exactly where you left off
- Self-documentation — the docs practically write themselves
It's not rocket science. It's just discipline — finally enforced by a patient AI that never complains about writing things down.
- GitHub repo with the template and a how-to explanation
- Agent Smith, the implementation where the idea was born (GitHub repo)
Originally posted on CodingSoul