
Chris


Self-Refining Agents in Spec-Driven Development

I’ve been experimenting with a spec-driven workflow, and I accidentally discovered something I didn’t expect: the agent started reviewing and improving its own work.

What I discovered isn't new in terms of agentic AI; in fact, it's the whole point of agentic AI. But how I stumbled across it in my test was interesting nonetheless.

The Basic Idea

I created a spec.prompt.md file. This prompt accepts a ticket number and the pasted contents of the technical specifications. Then I run a command like:

/spec <ticket-number> <pasted-contents>

Originally, each time I ran /spec, it would overwrite everything and start from scratch. That worked, but it didn’t allow me to iterate or compare changes between passes.

So I added two modes:

-o = overwrite

-i = iterate
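
To make the two modes concrete, here's a minimal sketch of how the command's arguments could be parsed. The function name `parse_spec_args` and the dict shape are my own illustration, not the actual prompt runner:

```python
def parse_spec_args(raw: str) -> dict:
    """Split '/spec [-o|-i] <ticket> <contents>' into its parts.

    Hypothetical sketch: defaults to the original overwrite behavior
    when no flag is given, mirroring how /spec worked before the modes.
    """
    rest = raw.split(maxsplit=1)[1]  # drop the leading "/spec"
    mode = "overwrite"
    if rest.startswith(("-o ", "-i ")):
        mode = "iterate" if rest.startswith("-i") else "overwrite"
        rest = rest.split(maxsplit=1)[1]
    # Ticket number is the first token; everything after is the pasted spec.
    ticket, _, contents = rest.partition(" ")
    return {"mode": mode, "ticket": ticket, "contents": contents.strip()}
```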

The Iterate Model

When I run:

/spec -i 1234 <contents>

It creates a folder structure like this:

/specs/1234/
    p01/
        spec.md
        plan.md
        implementation.md
        files...
    p02/
        spec.md
        plan.md
        implementation.md
        files...
    p03/
        ...
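
The pNN numbering can be derived automatically from what's already on disk. A minimal sketch, assuming a local specs root; the helper name `next_pass_dir` is mine, not part of the actual prompt:

```python
from pathlib import Path

def next_pass_dir(specs_root: str, ticket: str) -> Path:
    """Create and return the next pass folder for a ticket.

    First run yields /specs/<ticket>/p01, the next p02, and so on.
    """
    ticket_dir = Path(specs_root) / ticket
    # Find existing pass folders named pNN (two digits).
    existing = sorted(p.name for p in ticket_dir.glob("p[0-9][0-9]") if p.is_dir())
    n = int(existing[-1][1:]) + 1 if existing else 1
    new_dir = ticket_dir / f"p{n:02d}"
    new_dir.mkdir(parents=True)
    return new_dir
```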

The original intent was to give myself a comparison between p01 and p02, to see whether changes to the spec were implemented correctly. But something unexpected happened.

For my first test of -i, I didn't change the spec at all. I just ran the same spec again:

/spec -i 1234 <original contents>

I expected it to regenerate everything so I could compare Pass 1 to Pass 2. Instead, Pass 2 did something even smarter.

When reviewing Pass 2, I couldn't find the original build scripts, and the bulk of the work appeared to be missing. I was confused about why I found only a single file.

Then, it clicked: it had reviewed its previous pass to identify any gaps in the duplicated spec I provided, determined it was mostly correct, and refined its original pass.

That’s when this stopped being just a prompt and started looking like an agent. I didn’t tell it to rewrite everything. I didn’t tell it to only fix what was wrong. I didn’t tell it to review its previous implementation. It chose to.

Its self-refinement process made me rethink my own process.

The Process

Currently I have these prompts:

  • spec.prompt.md
  • spec-implement.prompt.md
  • spec-testing.prompt.md
  • a backend-engineer agent that the prompts above use

Right now, I manually run each step after the previous one completes. But the long-term goal is for the agent to understand the workflow and run the steps itself.

Pass Workflow

Pass 1 — Spec → Plan → Implement

/spec 1234 <contents from Jira>

The agent:

  • Reads the ticket
  • Writes the spec
  • Creates a plan
  • Implements the code
  • Saves everything into p01
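
If I eventually automate this, Pass 1 could be driven by something like the sketch below. The `agent` object and its `generate` method are placeholders for whatever model call the prompt runner actually makes, and the prompt strings are illustrative:

```python
from pathlib import Path

def run_pass_one(pass_dir: Path, ticket_contents: str, agent) -> None:
    """Pass 1: spec -> plan -> implement, each artifact saved into the pass folder.

    `agent` is any object with a generate(prompt) -> str method (hypothetical).
    """
    # Read the ticket contents and write the spec.
    spec = agent.generate(f"Write a spec for this ticket:\n{ticket_contents}")
    (pass_dir / "spec.md").write_text(spec)
    # Turn the spec into a plan.
    plan = agent.generate(f"Create an implementation plan for:\n{spec}")
    (pass_dir / "plan.md").write_text(plan)
    # Implement the plan.
    impl = agent.generate(f"Implement this plan:\n{plan}")
    (pass_dir / "implementation.md").write_text(impl)
```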

Pass 2 — Self-Refinement

/spec -i 1234

The agent:

  • Re-reads the spec
  • Reviews Pass 1
  • Fixes gaps
  • Improves implementation
  • Adds missing tests
  • Refactors if needed
  • Saves into p02

This becomes a self-refinement loop:

Implement → Review → Refine → Review → Refine
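
The loop could be expressed as a small driver. Here `implement`, `review`, and `refine` stand in for the actual prompt invocations, and the cap on passes is my own safety valve, not something the agent enforces:

```python
def refine_until_converged(implement, review, refine, max_passes: int = 5):
    """Implement once, then alternate review/refine until review finds no gaps.

    Sketch only: each callable is a placeholder for a prompt/agent invocation.
    """
    artifact = implement()
    for _ in range(max_passes):
        gaps = review(artifact)      # e.g. spec requirements not yet met
        if not gaps:
            break                    # agent believes implementation matches spec
        artifact = refine(artifact, gaps)
    return artifact
```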

The agent continues iterating until it believes the implementation matches the spec.

Pass 3 — Spec Updates from Engineer

After reviewing, the engineer may realize:

  • Something was unclear
  • Requirements changed
  • Edge cases were missed
  • Naming should be improved
  • Logic should be handled differently

The engineer updates the spec, then runs:

/spec -u 1234 <updated contents>

Now the agent:

  • Compares original spec vs updated spec
  • Compares p01 vs p02 vs p03 etc.
  • Determines what changed in the spec and how that changes the code
  • Implements only what’s different, not everything
  • Creates a new pass

This becomes iterative spec-driven development.
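
One way to sketch the "implement only what's different" step is a plain text diff of the two specs. This uses Python's difflib; the real agent presumably compares the specs semantically rather than line by line:

```python
import difflib

def spec_delta(old_spec: str, new_spec: str) -> list:
    """Return only the lines added or changed in the updated spec,
    so a new pass can target just the difference."""
    diff = difflib.unified_diff(
        old_spec.splitlines(), new_spec.splitlines(), lineterm="")
    # Keep '+' lines (additions/changes), skipping the '+++' file header.
    return [line[1:] for line in diff
            if line.startswith("+") and not line.startswith("+++")]
```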

Final Phase — Testing Instructions

When the engineer believes the implementation is ready for validation:

/spec-testing 1234

The agent then:

  • Reviews the latest pass
  • Identifies all changed files
  • Provides test scenarios to validate the changes
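
Identifying the changed files between two passes could be as simple as a directory comparison. A sketch under my own assumptions (pass folders like p01/p02 on disk), not the actual implementation:

```python
import filecmp
from pathlib import Path

def changed_files(prev_pass: Path, latest_pass: Path) -> list:
    """List files in the latest pass that are new or differ from the previous pass."""
    changed = []
    for f in sorted(latest_pass.rglob("*")):
        if not f.is_file():
            continue
        rel = f.relative_to(latest_pass)
        prev = prev_pass / rel
        # New file, or contents differ from the earlier pass.
        if not prev.exists() or not filecmp.cmp(f, prev, shallow=False):
            changed.append(str(rel))
    return changed
```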

At that point, the engineer tests the changes rather than writing everything from scratch.

Expect the Unexpected and "Just Keep Swimming"

The interesting part of this experiment wasn’t that the agent wrote code. It was that it started reviewing, refining, and improving its own work in passes — the same way a developer does.

I didn’t set out to build this agent or this workflow. I set out to write a better prompt. Then I wanted it to do a little more, and a little more. Somewhere along the way, the prompt turned into a process.

This isn’t just prompt engineering anymore.
It’s process engineering.
