I shipped a crypto fintech solo last year: 13 apps, three databases, Kubernetes, real money, in about 70 days, with AI agents doing most of the typing. The thing that made it possible was not a better model. It was refusing to prompt.
Here is the problem with prompting a capable agent. It has no memory between sessions. Every conversation starts at zero. Ask it to "add authentication" and it will confidently write 500 lines that compile, pass a lint, and solve a slightly different problem than the one you had, using an architecture you never agreed to. The more capable the model, the further a vague instruction carries it in the wrong direction.
So I stopped prompting and started specifying. I packaged the workflow I settled on into an open-source kit for the Pi coding agent, called pi-sdd-kit. This post is about the two design decisions that actually earned their keep. The rest is detail.
The shape of it
Spec-driven development (SDD) inverts the usual order. The specification is the primary artifact and the code is a consequence of it, not the other way around. In pi-sdd-kit that means every feature moves through a fixed pipeline, with a human approval gate between each phase:
IDEA → PLAN → PRD → SPEC → TASKS → EXEC → REVIEW
Each phase is a slash command (/skill:sdd-prd, /skill:sdd-spec, and so on). The agent cannot advance to the next phase without your sign-off. That is the whole idea. Everything below is how the sign-off is made real instead of aspirational.
Decision 1: steering docs are the memory the agent does not have
Most people put their coding conventions in a CLAUDE.md (or AGENTS.md) and call it context. That is a start. It is not enough, because conventions tell the agent how to write code and say nothing about what it is building or why the architecture is shaped the way it is.
pi-sdd-kit splits context into a layer that lasts. Steering docs live in .ai/steering/ and load every session:
.ai/steering/
product.md what it is, who it is for, what it is explicitly not
tech-stack.md the stack and, more importantly, the reason for each choice
conventions.md patterns: how routes, errors, and auth are structured
principles.md the non-negotiables ("all money math is integer, never float")
The tech-stack.md line that pays for itself is not the dependency list, it is the rationale: "We use PostgreSQL because payment records need ACID guarantees." That one sentence stops the agent from suggesting SQLite three sessions later when you add a service. The steering folder is the reason a solo developer can hold a 13-app system in their head: they do not. The spec holds it.
These files change rarely. When they do, it is because you made a deliberate architectural decision, and updating the file is how that decision becomes permanent context.
Decision 2: .status is the only gate, and file existence is not approval
This is the part I would defend hardest. Each feature spec folder has a one-line .status file:
.ai/sdd/specs/001-user-auth/.status
# contents, in order over the feature's life:
# requirements:approved
# design:approved
# tasks:approved
The agent reads that file before it does anything, and the rule is blunt: a design.md sitting on disk is not a green light. A completed tasks.md is not a green light. The only green light is the .status token. The agent prompt is explicit: do not write code before tasks:approved appears.
This sounds obvious until you watch an eager agent see a finished-looking spec in the directory and race straight into implementation. A file and an approved file look identical to something scanning the folder. The status token removes that ambiguity completely. Gates that live in your head are culture; a gate that lives in a file the agent must read is a mechanism. Only the mechanism survives a deadline.
The small thing that removes most ambiguity: EARS
Functional requirements are written in EARS, the Easy Approach to Requirements Syntax from requirements engineering. It is a handful of sentence templates:
WHEN a task is completed, THE SYSTEM SHALL record the timestamp and the user.
IF the amount is <= 0, THE SYSTEM SHALL reject with 422 "amount must be positive".
WHILE a task is archived, THE SYSTEM SHALL NOT allow edits.
WHEN, IF, WHILE, SHALL. It reads like a contract because it is one, and it leaves the agent almost no room to interpret. This is a 30-year-old format used by Airbus and NASA, and it turns out to be exactly what an amnesiac collaborator needs.
Try it
pi install npm:@felipefontoura/pi-sdd-kit
# then, in Pi:
/reload
/skill:sdd-init
/skill:sdd-prd # write requirements
/skill:sdd-spec # design, after you approve requirements
/skill:sdd-tasks # break into 2-4h tasks
/skill:sdd-exec # implement, only after tasks:approved
/skill:sdd-review # verify against the spec
Repo, with the full command reference and templates: github.com/felipefontoura/pi-sdd-kit
Honest caveats
- It is plain markdown underneath, so the method is not tied to Pi. The skills and slash commands are. If you use Claude Code, GitHub's Spec Kit or AWS Kiro are the equivalents; I wrote a comparison here.
- SDD is not new. It is the latest point on a 30-year line: TDD drove code from tests, BDD from behavior examples, SDD from an approved spec. This is one opinionated, file-based take where the gates are mechanical.
- "13 apps in 70 days solo" is proof of principle, not a controlled study. I have 25 years of experience and an external deadline was doing real work. Speed was measured; long-term quality was not audited. The full case study says this plainly, which is why I trust it.
If you have tried spec-driven development and bounced off it, I would genuinely like to hear where the gates felt like overkill. That is the part I am least sure about.
Top comments (0)