<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Felipe</title>
    <description>The latest articles on DEV Community by Felipe (@felipefontoura).</description>
    <link>https://dev.to/felipefontoura</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4014389%2F728f0d5e-23c3-42ae-818c-07c0aad0fa17.jpg</url>
      <title>DEV Community: Felipe</title>
      <link>https://dev.to/felipefontoura</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/felipefontoura"/>
    <language>en</language>
    <item>
      <title>I built a spec-driven workflow for my AI coding agent. Here is what actually mattered.</title>
      <dc:creator>Felipe</dc:creator>
      <pubDate>Sat, 04 Jul 2026 04:47:51 +0000</pubDate>
      <link>https://dev.to/felipefontoura/i-built-a-spec-driven-workflow-for-my-ai-coding-agent-here-is-what-actually-mattered-4dkk</link>
      <guid>https://dev.to/felipefontoura/i-built-a-spec-driven-workflow-for-my-ai-coding-agent-here-is-what-actually-mattered-4dkk</guid>
      <description>&lt;p&gt;I shipped a crypto fintech solo last year: 13 apps, three databases, Kubernetes, real money, in about 70 days, with AI agents doing most of the typing. The thing that made it possible was not a better model. It was refusing to prompt.&lt;/p&gt;

&lt;p&gt;Here is the problem with prompting a capable agent. It has no memory between sessions. Every conversation starts at zero. Ask it to "add authentication" and it will confidently write 500 lines that compile, pass the linter, and solve a slightly different problem than the one you had, using an architecture you never agreed to. The more capable the model, the further a vague instruction carries it in the wrong direction.&lt;/p&gt;

&lt;p&gt;So I stopped prompting and started specifying. I packaged the workflow I settled on into an open-source kit for the &lt;a href="https://www.npmjs.com/package/@earendil-works/pi-coding-agent" rel="noopener noreferrer"&gt;Pi coding agent&lt;/a&gt;, called &lt;a href="https://github.com/felipefontoura/pi-sdd-kit" rel="noopener noreferrer"&gt;pi-sdd-kit&lt;/a&gt;. This post is about the two design decisions that actually earned their keep. The rest is detail.&lt;/p&gt;

&lt;h2&gt;The shape of it&lt;/h2&gt;

&lt;p&gt;Spec-driven development (SDD) inverts the usual order. The specification is the primary artifact and the code is a consequence of it, not the other way around. In pi-sdd-kit that means every feature moves through a fixed pipeline, with a human approval gate between consecutive phases:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IDEA → PLAN → PRD → SPEC → TASKS → EXEC → REVIEW
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each phase is a slash command (&lt;code&gt;/skill:sdd-prd&lt;/code&gt;, &lt;code&gt;/skill:sdd-spec&lt;/code&gt;, and so on). The agent cannot advance to the next phase without your sign-off. That is the whole idea. Everything below is how the sign-off is made real instead of aspirational.&lt;/p&gt;

&lt;h2&gt;Decision 1: steering docs are the memory the agent does not have&lt;/h2&gt;

&lt;p&gt;Most people put their coding conventions in a &lt;code&gt;CLAUDE.md&lt;/code&gt; (or &lt;code&gt;AGENTS.md&lt;/code&gt;) and call it context. That is a start. It is not enough, because conventions tell the agent &lt;em&gt;how&lt;/em&gt; to write code and say nothing about &lt;em&gt;what&lt;/em&gt; it is building or &lt;em&gt;why&lt;/em&gt; the architecture is shaped the way it is.&lt;/p&gt;

&lt;p&gt;pi-sdd-kit adds a layer of context that outlasts any single session. Steering docs live in &lt;code&gt;.ai/steering/&lt;/code&gt; and are loaded at the start of every session:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.ai/steering/
  product.md       what it is, who it is for, what it is explicitly not
  tech-stack.md    the stack and, more importantly, the reason for each choice
  conventions.md   patterns: how routes, errors, and auth are structured
  principles.md    the non-negotiables ("all money math is integer, never float")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;tech-stack.md&lt;/code&gt; line that pays for itself is not the dependency list; it is the rationale: "We use PostgreSQL because payment records need ACID guarantees." That one sentence stops the agent from suggesting SQLite three sessions later when you add a service. A solo developer does not hold a 13-app system in their head. The steering folder holds it.&lt;/p&gt;

&lt;p&gt;These files change rarely. When they do, it is because you made a deliberate architectural decision, and updating the file is how that decision becomes permanent context.&lt;/p&gt;
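
&lt;p&gt;To make "load every session" concrete, here is a minimal Python sketch of what such a loading step could look like. It is not taken from pi-sdd-kit: the function name &lt;code&gt;build_session_context&lt;/code&gt; and the idea of joining the files into a single preamble are assumptions for illustration. Only the four file names and the &lt;code&gt;.ai/steering/&lt;/code&gt; path come from the listing above.&lt;/p&gt;

```python
from pathlib import Path
import tempfile

# File names come from the steering listing in this post.
# The loader itself is hypothetical, not pi-sdd-kit's implementation.
STEERING_FILES = ["product.md", "tech-stack.md", "conventions.md", "principles.md"]


def build_session_context(project_root: Path) -> str:
    """Join the steering docs into one preamble, failing loudly on gaps."""
    steering_dir = project_root / ".ai" / "steering"
    missing = [name for name in STEERING_FILES if not (steering_dir / name).is_file()]
    if missing:
        # A silent gap would mean the agent starts the session without context.
        raise FileNotFoundError("missing steering docs: " + ", ".join(missing))
    sections = []
    for name in STEERING_FILES:
        body = (steering_dir / name).read_text(encoding="utf-8").strip()
        sections.append("## " + name + "\n\n" + body)
    return "\n\n".join(sections)


if __name__ == "__main__":
    # Demo against a throwaway project with placeholder steering docs.
    with tempfile.TemporaryDirectory() as tmp:
        steering = Path(tmp) / ".ai" / "steering"
        steering.mkdir(parents=True)
        for name in STEERING_FILES:
            (steering / name).write_text("placeholder for " + name + "\n", encoding="utf-8")
        print(build_session_context(Path(tmp)))
```

&lt;p&gt;Raising on a missing file is a deliberate choice in this sketch: an agent that starts without &lt;code&gt;principles.md&lt;/code&gt; is back to guessing.&lt;/p&gt;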

&lt;h2&gt;Decision 2: &lt;code&gt;.status&lt;/code&gt; is the only gate, and file existence is not approval&lt;/h2&gt;

&lt;p&gt;This is the part I would defend hardest. Each feature spec folder has a one-line &lt;code&gt;.status&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.ai/sdd/specs/001-user-auth/.status
# contents, in order over the feature's life:
#   requirements:approved
#   design:approved
#   tasks:approved
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent reads that file before it does anything, and the rule is blunt: a &lt;code&gt;design.md&lt;/code&gt; sitting on disk is &lt;strong&gt;not&lt;/strong&gt; a green light. A completed &lt;code&gt;tasks.md&lt;/code&gt; is &lt;strong&gt;not&lt;/strong&gt; a green light. The only green light is the &lt;code&gt;.status&lt;/code&gt; token. The agent prompt is explicit: do not write code before &lt;code&gt;tasks:approved&lt;/code&gt; appears.&lt;/p&gt;

&lt;p&gt;This sounds obvious until you watch an eager agent see a finished-looking spec in the directory and race straight into implementation. A file and an approved file look identical to something scanning the folder. The status token removes that ambiguity completely. Gates that live in your head are culture; a gate that lives in a file the agent must read is a mechanism. Only the mechanism survives a deadline.&lt;/p&gt;
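
&lt;p&gt;The rule is small enough to sketch in a few lines of Python. This is an illustration of the idea, not pi-sdd-kit's code: the function names are hypothetical, and it assumes the file holds exactly one &lt;code&gt;phase:approved&lt;/code&gt; token, as in the listing above.&lt;/p&gt;

```python
from pathlib import Path
from typing import Optional
import tempfile

# Phase names taken from the .status listing in this post.
PHASES = ["requirements", "design", "tasks"]


def approved_phase(spec_dir: Path) -> Optional[str]:
    """Return the phase named in .status if it is approved, else None."""
    status_file = spec_dir / ".status"
    if not status_file.is_file():
        return None
    token = status_file.read_text(encoding="utf-8").strip()
    phase, _, state = token.partition(":")
    if phase in PHASES and state == "approved":
        return phase
    return None


def may_write_code(spec_dir: Path) -> bool:
    """Allow implementation only when the token is tasks:approved.

    Whether design.md or tasks.md exists on disk is deliberately ignored.
    """
    return approved_phase(spec_dir) == "tasks"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        spec = Path(tmp)
        (spec / "tasks.md").write_text("- [ ] add login route\n", encoding="utf-8")
        print(may_write_code(spec))  # False: tasks.md exists, but nothing is approved
        (spec / ".status").write_text("tasks:approved\n", encoding="utf-8")
        print(may_write_code(spec))  # True
```

&lt;p&gt;Note what the check never does: it never lists the directory. The only input is the token.&lt;/p&gt;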

&lt;h2&gt;The small thing that removes most ambiguity: EARS&lt;/h2&gt;

&lt;p&gt;Functional requirements are written in EARS, the Easy Approach to Requirements Syntax from requirements engineering. It is a handful of sentence templates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WHEN a task is completed, THE SYSTEM SHALL record the timestamp and the user.
IF the amount is &amp;lt;= 0, THE SYSTEM SHALL reject with 422 "amount must be positive".
WHILE a task is archived, THE SYSTEM SHALL NOT allow edits.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;WHEN&lt;/code&gt;, &lt;code&gt;IF&lt;/code&gt;, &lt;code&gt;WHILE&lt;/code&gt;, &lt;code&gt;SHALL&lt;/code&gt;. It reads like a contract because it is one, and it leaves the agent almost no room to interpret. EARS was published in 2009 by Alistair Mavin and colleagues at Rolls-Royce, has been adopted by organizations such as Airbus and NASA, and turns out to be exactly what an amnesiac collaborator needs.&lt;/p&gt;
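
&lt;p&gt;Because the templates are so rigid, a requirement line can be checked mechanically. The Python sketch below is an assumption-laden illustration, not part of pi-sdd-kit: the regular expression and the &lt;code&gt;parse_ears&lt;/code&gt; helper are hypothetical, and they cover only the three patterns shown above (full EARS has more). The optional &lt;code&gt;THEN&lt;/code&gt; is accepted because the EARS template for unwanted behavior is usually written "IF ..., THEN ...".&lt;/p&gt;

```python
import re
from typing import NamedTuple, Optional

# Covers only the WHEN / IF / WHILE patterns used in this post.
EARS_LINE = re.compile(
    r"^(WHEN|IF|WHILE) (.+?), (?:THEN )?THE SYSTEM (SHALL NOT|SHALL) (.+)\.$"
)


class Requirement(NamedTuple):
    keyword: str      # WHEN, IF, or WHILE
    condition: str    # the trigger or state
    negated: bool     # True for SHALL NOT
    response: str     # what the system must (or must not) do


def parse_ears(line: str) -> Optional[Requirement]:
    """Return the parts of an EARS-style line, or None if it does not fit."""
    match = EARS_LINE.match(line.strip())
    if match is None:
        return None
    keyword, condition, modal, response = match.groups()
    return Requirement(keyword, condition, modal == "SHALL NOT", response)


if __name__ == "__main__":
    lines = [
        "WHEN a task is completed, THE SYSTEM SHALL record the timestamp and the user.",
        "WHILE a task is archived, THE SYSTEM SHALL NOT allow edits.",
        "The system should handle errors gracefully.",
    ]
    for line in lines:
        print(parse_ears(line))
```

&lt;p&gt;The third line comes back as &lt;code&gt;None&lt;/code&gt;, which is the point: "should handle errors gracefully" is exactly the kind of sentence that lets an agent improvise.&lt;/p&gt;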

&lt;h2&gt;Try it&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pi &lt;span class="nb"&gt;install &lt;/span&gt;npm:@felipefontoura/pi-sdd-kit
&lt;span class="c"&gt;# then, in Pi:&lt;/span&gt;
/reload
/skill:sdd-init
/skill:sdd-prd      &lt;span class="c"&gt;# write requirements&lt;/span&gt;
/skill:sdd-spec     &lt;span class="c"&gt;# design, after you approve requirements&lt;/span&gt;
/skill:sdd-tasks    &lt;span class="c"&gt;# break into 2-4h tasks&lt;/span&gt;
/skill:sdd-exec     &lt;span class="c"&gt;# implement, only after tasks:approved&lt;/span&gt;
/skill:sdd-review   &lt;span class="c"&gt;# verify against the spec&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Repo, with the full command reference and templates: &lt;a href="https://github.com/felipefontoura/pi-sdd-kit" rel="noopener noreferrer"&gt;github.com/felipefontoura/pi-sdd-kit&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Honest caveats&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;It is plain markdown underneath, so the &lt;em&gt;method&lt;/em&gt; is not tied to Pi. The skills and slash commands are. If you use Claude Code, GitHub's Spec Kit is the closest equivalent, and AWS Kiro builds a similar spec-first flow into its own IDE; I wrote a comparison &lt;a href="https://felipefontoura.com/articles/spec-driven-development-with-claude-code" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;SDD is not new. It is the latest point on a 30-year line: TDD drove code from tests, BDD from behavior examples, SDD from an approved spec. This is one opinionated, file-based take where the gates are mechanical.&lt;/li&gt;
&lt;li&gt;"13 apps in 70 days solo" is proof of principle, not a controlled study. I have 25 years of experience and an external deadline was doing real work. Speed was measured; long-term quality was not audited. The &lt;a href="https://felipefontoura.com/articles/spec-driven-development-case-study" rel="noopener noreferrer"&gt;full case study&lt;/a&gt; says this plainly, which is why I trust it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have tried spec-driven development and bounced off it, I would genuinely like to hear where the gates felt like overkill. That is the part I am least sure about.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>opensource</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
