
Chris


Self-Refining Agents in Spec-Driven Development

I’ve been experimenting with a spec-driven workflow, and I accidentally discovered something I didn’t expect: the agent started reviewing and improving its own work.

What I discovered isn't new in terms of agentic AI; in fact, it's the whole point of agentic AI. But how I stumbled across it in my test was interesting nonetheless.

The Basic Idea

I created a spec.prompt.md file. This prompt accepts a ticket number and the pasted contents of the technical specifications. Then I run a command like:

/spec <ticket-number> <pasted-contents>

Originally, each time I ran /spec, it would overwrite everything and start from scratch. That worked, but it didn’t allow me to iterate or compare changes between passes.

So I added two modes:

-o = overwrite

-i = iterate
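
To make the two modes concrete, here's a minimal sketch of how the command's arguments could be parsed. The function name `parse_spec_args` and the dict shape are my own illustration, not the actual prompt runner:

```python
def parse_spec_args(raw: str) -> dict:
    """Split '/spec [-o|-i] <ticket> <contents>' into its parts.

    Hypothetical sketch: defaults to the original overwrite behavior
    when no flag is given, mirroring how /spec worked before the modes.
    """
    rest = raw.split(maxsplit=1)[1]  # drop the leading "/spec"
    mode = "overwrite"
    if rest.startswith(("-o ", "-i ")):
        mode = "iterate" if rest.startswith("-i") else "overwrite"
        rest = rest.split(maxsplit=1)[1]
    # Ticket number is the first token; everything after is the pasted spec.
    ticket, _, contents = rest.partition(" ")
    return {"mode": mode, "ticket": ticket, "contents": contents.strip()}
```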

The Iterate Model

When I run:

/spec -i 1234 <contents>

It creates a folder structure like this:

/specs/1234/
    p01/
        spec.md
        plan.md
        implementation.md
        files...
    p02/
        spec.md
        plan.md
        implementation.md
        files...
    p03/
        ...
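
The pNN numbering can be derived automatically from what's already on disk. A minimal sketch, assuming a local specs root; the helper name `next_pass_dir` is mine, not part of the actual prompt:

```python
from pathlib import Path

def next_pass_dir(specs_root: str, ticket: str) -> Path:
    """Create and return the next pass folder for a ticket.

    First run yields /specs/<ticket>/p01, the next p02, and so on.
    """
    ticket_dir = Path(specs_root) / ticket
    # Find existing pass folders named pNN (two digits).
    existing = sorted(p.name for p in ticket_dir.glob("p[0-9][0-9]") if p.is_dir())
    n = int(existing[-1][1:]) + 1 if existing else 1
    new_dir = ticket_dir / f"p{n:02d}"
    new_dir.mkdir(parents=True)
    return new_dir
```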

The original intent was to give myself a comparison between p01 and p02, to see whether changes to the spec were implemented correctly. But something unexpected happened.

For my first test of -i, I didn't change the spec at all. I just ran the same spec again:

/spec -i 1234 <original contents>

I expected it to regenerate everything so I could compare Pass 1 to Pass 2. Instead, Pass 2 did something even smarter.

When reviewing Pass 2, I couldn't find the original build scripts, and the bulk of the work appeared to be missing. I was confused about why I found only a single file.

Then, it clicked: it had reviewed its previous pass to identify any gaps in the duplicated spec I provided, determined it was mostly correct, and refined its original pass.

That’s when this stopped being just a prompt and started looking like an agent. I didn’t tell it to rewrite everything. I didn’t tell it to only fix what was wrong. I didn’t tell it to review its previous implementation. It chose to.

Its self-refinement process made me rethink my own process.

The Process

Currently I have these prompts:

  • spec.prompt.md
  • spec-implement.prompt.md
  • spec-testing.prompt.md
  • a backend-engineer agent that the prompts above use

Right now, I manually run each step after the previous one completes. But the long-term goal is for the agent to understand the workflow and run the steps itself.

Pass Workflow

Pass 1 — Spec → Plan → Implement

/spec 1234 <contents from Jira>

The agent:

  • Reads the ticket
  • Writes the spec
  • Creates a plan
  • Implements the code
  • Saves everything into p01
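
If I eventually automate this, Pass 1 could be driven by something like the sketch below. The `agent` object and its `generate` method are placeholders for whatever model call the prompt runner actually makes, and the prompt strings are illustrative:

```python
from pathlib import Path

def run_pass_one(pass_dir: Path, ticket_contents: str, agent) -> None:
    """Pass 1: spec -> plan -> implement, each artifact saved into the pass folder.

    `agent` is any object with a generate(prompt) -> str method (hypothetical).
    """
    # Read the ticket contents and write the spec.
    spec = agent.generate(f"Write a spec for this ticket:\n{ticket_contents}")
    (pass_dir / "spec.md").write_text(spec)
    # Turn the spec into a plan.
    plan = agent.generate(f"Create an implementation plan for:\n{spec}")
    (pass_dir / "plan.md").write_text(plan)
    # Implement the plan.
    impl = agent.generate(f"Implement this plan:\n{plan}")
    (pass_dir / "implementation.md").write_text(impl)
```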

Pass 2 — Self-Refinement

/spec -i 1234

The agent:

  • Re-reads the spec
  • Reviews Pass 1
  • Fixes gaps
  • Improves implementation
  • Adds missing tests
  • Refactors if needed
  • Saves into p02

This becomes a self-refinement loop:

Implement → Review → Refine → Review → Refine
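
The loop could be expressed as a small driver. Here `implement`, `review`, and `refine` stand in for the actual prompt invocations, and the cap on passes is my own safety valve, not something the agent enforces:

```python
def refine_until_converged(implement, review, refine, max_passes: int = 5):
    """Implement once, then alternate review/refine until review finds no gaps.

    Sketch only: each callable is a placeholder for a prompt/agent invocation.
    """
    artifact = implement()
    for _ in range(max_passes):
        gaps = review(artifact)      # e.g. spec requirements not yet met
        if not gaps:
            break                    # agent believes implementation matches spec
        artifact = refine(artifact, gaps)
    return artifact
```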

The agent continues iterating until it believes the implementation matches the spec.

Pass 3 — Spec Updates from Engineer

After reviewing, the engineer may realize:

  • Something was unclear
  • Requirements changed
  • Edge cases were missed
  • Naming should be improved
  • Logic should be handled differently

The engineer updates the spec, then runs:

/spec -u 1234 <updated contents>

Now the agent:

  • Compares original spec vs updated spec
  • Compares p01 vs p02 vs p03 etc.
  • Determines what changed in the spec and how that changes the code
  • Implements only what’s different, not everything
  • Creates a new pass

This becomes iterative spec-driven development.
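
One way to sketch the "implement only what's different" step is a plain text diff of the two specs. This uses Python's difflib; the real agent presumably compares the specs semantically rather than line by line:

```python
import difflib

def spec_delta(old_spec: str, new_spec: str) -> list:
    """Return only the lines added or changed in the updated spec,
    so a new pass can target just the difference."""
    diff = difflib.unified_diff(
        old_spec.splitlines(), new_spec.splitlines(), lineterm="")
    # Keep '+' lines (additions/changes), skipping the '+++' file header.
    return [line[1:] for line in diff
            if line.startswith("+") and not line.startswith("+++")]
```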

Final Phase — Testing Instructions

When the engineer believes the implementation is ready for validation:

/spec-testing 1234

The agent then:

  • Reviews the latest pass
  • Identifies all changed files
  • Provides test scenarios to validate the changes
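
Identifying the changed files between two passes could be as simple as a directory comparison. A sketch under my own assumptions (pass folders like p01/p02 on disk), not the actual implementation:

```python
import filecmp
from pathlib import Path

def changed_files(prev_pass: Path, latest_pass: Path) -> list:
    """List files in the latest pass that are new or differ from the previous pass."""
    changed = []
    for f in sorted(latest_pass.rglob("*")):
        if not f.is_file():
            continue
        rel = f.relative_to(latest_pass)
        prev = prev_pass / rel
        # New file, or contents differ from the earlier pass.
        if not prev.exists() or not filecmp.cmp(f, prev, shallow=False):
            changed.append(str(rel))
    return changed
```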

At that point, the engineer tests the changes rather than writing everything from scratch.

Expect the Unexpected and "Just Keep Swimming"

The interesting part of this experiment wasn’t that the agent wrote code. It was that it started reviewing, refining, and improving its own work in passes — the same way a developer does.

I didn’t set out to build this agent or this workflow. I set out to write a better prompt. Then I wanted it to do a little more, and a little more. Somewhere along the way, the prompt turned into a process.

This isn’t just prompt engineering anymore.
It’s process engineering.
