Oleg Timkiv

Posted on Jun 6

My Experiment with Specification-Driven Development Using AI Agents

#ai #agents #productivity #career

In most development projects, the pattern is painfully familiar: business drops half a page of vague requirements, and within a day they’re asking, “Is it done yet?” Between those two points lies the developer’s real job - clarifying, formalizing, negotiating, and only then writing code. Over the past couple of years, this process has evolved more dramatically than the underlying technologies themselves.

The Expanding Role of Developers in 2026

Today’s developer is no longer just a coder. The role now includes:

Analyzing business requirements
Translating them into technical specifications
Working with AI tools as capable collaborators
Maintaining architectural integrity and alignment with business goals

Classic roles - analysts, architects, QA - haven’t disappeared. Instead, boundaries are blurring and responsibilities are shifting. In smaller teams especially, developers increasingly act as the bridge between business, architecture, and implementation.

A Typical Task

Here’s a representative request from the business or support team:

“We need to implement book order placement. There should be a service that accepts orders and checks stock availability.”

As usual, the description lacks critical details: contracts, resilience requirements, performance constraints, error handling, and so on. That’s expected - this is just the starting point. The real work begins with requirement clarification.

The core idea I wanted to test in this experiment: the first artifact of development should no longer be code, but a specification. Code becomes the consequence of an approved specification, not the first step.

Specification-Driven Development (SDD)

The vision was a Specification-Driven Development layer with these core concepts:

Commands - define the process flow
Agents - represent thinking and decision-making at each stage
Skills - encode engineering best practices and coding standards
Spec - the single source of truth for the system

The ideal pipeline would look like a chain of agent-driven stages:

/clarify → /analyze → /acceptance → /constrain → /generate-plan → /develop → /review

In practice, I didn’t run the full ambitious version. What I actually implemented was significantly simpler - and more effective.

What I Actually Built: Three Phases with Approval Gates

Instead of many specialized agents, I reduced the system to a single /flow command and a strict linear process with three phases and mandatory human approval gates between them:

Phase	Artifact	Depends On
Analysis	`01-analysis.md`	-
Design	`02-design.md`	Analysis = APPROVED
Implementation	Implementation code	Design = APPROVED

Key rules that proved more important than the number of agents:

A phase cannot start until its dependency is APPROVED
State is stored in the front matter of each artifact (single source of truth)
Every state transition is logged in an append-only audit log
If an approved artifact changes, all dependent artifacts are automatically marked STALE and require re-review

Each phase uses a role-based agent (analyst → architect → developer), but the final decision to advance remains with the human developer.
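The gate rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code from the experiment: it assumes each artifact begins with a `---`-delimited front matter block containing a `status:` field, and the names `DEPENDS_ON`, `read_status`, and `can_start` are hypothetical.

```python
import re

# Hypothetical dependency map mirroring the phase table: phase -> prerequisite phase.
DEPENDS_ON = {"analysis": None, "design": "analysis", "implementation": "design"}

# Assumed front matter layout: a '---' delimited block at the very top of the file.
FRONT_MATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def read_status(artifact_text: str) -> str:
    """Return the `status` value from the front matter, or DRAFT if it is absent."""
    match = FRONT_MATTER.match(artifact_text)
    if not match:
        return "DRAFT"
    for line in match.group(1).splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "status":
            return value.strip()
    return "DRAFT"


def can_start(phase: str, artifacts: dict[str, str]) -> bool:
    """A phase may start only when its prerequisite artifact is APPROVED."""
    prerequisite = DEPENDS_ON[phase]
    if prerequisite is None:
        return True
    text = artifacts.get(prerequisite)
    return text is not None and read_status(text) == "APPROVED"
```

Because the status is read from the artifact itself, there is no separate state store that can drift out of sync with the files under review.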

The flow looks like this:
/flow → generates .spec/01-analysis.md → STOP (waiting for approval)
→ Developer: approve / reject with comments
→ generates .spec/02-design.md → STOP
→ generates implementation → STOP
This hard STOP at each gate is what keeps the prototype safe: the AI cannot race from vague requirements straight to production code.

The Real Run: Where the Gates Proved Their Value

The process did not succeed on the first attempt - and that was exactly the point.

The Analysis phase was rejected during review. The feedback was architecturally sound:

Remove technical implementation details (EF Core, PostgreSQL, InMemory, HttpClientFactory, specific timeouts) from the analysis - it should focus on “what,” not “how”
Add a Domain Entities section
Add a Use Cases section

After revisions (reject → revise → submit), version 2 of the analysis passed review. The entire history was preserved in the audit log:

01-analysis v1 | READY_FOR_REVIEW → REJECTED | remove arch decisions; add entities and use cases; fix error format
01-analysis v2 | REJECTED → DRAFT → READY_FOR_REVIEW → APPROVED
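Log lines like these could be produced by a small helper that validates each transition before recording it. The sketch below is an assumption-laden illustration: the transition table is inferred from the states that appear in this post (including STALE), the entry format simply mimics the lines above, and `ALLOWED`, `AuditLog`, and `transition` are hypothetical names.

```python
# Assumed transition table, inferred from the states mentioned in the post.
ALLOWED = {
    "DRAFT": {"READY_FOR_REVIEW"},
    "READY_FOR_REVIEW": {"APPROVED", "REJECTED"},
    "REJECTED": {"DRAFT"},
    "APPROVED": {"STALE"},
    "STALE": {"DRAFT"},
}


class AuditLog:
    """In-memory append-only log: entries can be added but not edited or removed."""

    def __init__(self) -> None:
        self._entries: list[str] = []

    def append(self, entry: str) -> None:
        self._entries.append(entry)

    @property
    def entries(self) -> tuple[str, ...]:
        # Hand out an immutable copy so callers cannot rewrite history.
        return tuple(self._entries)


def transition(log, artifact, version, current, target, comment=""):
    """Validate a state change, record it in the log, and return the new state."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    entry = f"{artifact} v{version} | {current} → {target}"
    if comment:
        entry += f" | {comment}"
    log.append(entry)
    return target
```

An illegal jump such as DRAFT straight to APPROVED raises before anything is written, so the log only ever contains transitions that passed the gate.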

This was the clearest win of the approach: we caught architectural issues at the documentation level, before writing a single line of code. Fixing a section of the spec takes minutes. Rewriting a service built on flawed assumptions takes hours.

What Was Generated

Once both Analysis and Design were approved, the AI generated two clean ASP.NET Core 9 services following Clean Architecture. The solution compiled with zero errors and zero warnings.

Where the Initial Vision Diverged from Reality

This is perhaps the most valuable part for readers - where the ambitious prompt list met real constraints:

Intended in Prompt	What Was Actually Built	Reason
Minimal APIs	Controllers	Explicit layer boundaries in Clean Architecture (documented in Design)
Refit client	Typed HttpClient via IHttpClientFactory	Better control over timeouts and error handling
Serilog + OpenTelemetry	Serilog only	OpenTelemetry deferred to avoid bloating the first iteration
OpenAPI 3.1	Swashbuckle (standard)	Sufficient for the prototype

Lesson: A wish list in a prompt is not architecture. Real decisions must be captured and justified in the Design document. Some items were intentionally dropped (e.g., events were out of scope), others deferred. It was much better to see these trade-offs in the specification than to discover them during implementation.

What Still Needs Work

Several important areas remain:

Verifiable Behavior - Add unit tests (e.g., MockStockPolicy, PlaceOrderHandler, validators) and integration tests via WebApplicationFactory (happy path, 400 errors, 503 on stock service failure). Without green tests, acceptance criteria remain just words.
Observability - Introduce OpenTelemetry traces for the Order → Stock service call chain.
Process Engine - Currently, approvals and STALE cascading are manual and convention-based. The next step is to build a real orchestrator for gates and audit logging.
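As a starting point for the process engine item, here is one way the STALE cascade might be automated. It is only a sketch under assumptions: it detects changes by comparing a SHA-256 hash recorded at approval time, the `DEPENDENTS` map and the function names are hypothetical, and it deliberately leaves open what should happen to the changed artifact itself.

```python
import hashlib

# Hypothetical downstream map: artifact -> artifacts that depend on it.
DEPENDENTS = {
    "01-analysis.md": ["02-design.md"],
    "02-design.md": ["implementation"],
    "implementation": [],
}


def fingerprint(content: str) -> str:
    """Stable hash of an artifact's content, recorded when it is approved."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def cascade_stale(changed: str, statuses: dict[str, str]) -> list[str]:
    """Mark every APPROVED artifact downstream of `changed` as STALE."""
    marked = []
    queue = list(DEPENDENTS.get(changed, []))
    while queue:
        name = queue.pop(0)
        if statuses.get(name) == "APPROVED":
            statuses[name] = "STALE"
            marked.append(name)
        # Keep walking so deeper dependents are checked as well.
        queue.extend(DEPENDENTS.get(name, []))
    return marked


def check_artifact(name, content, approved_hashes, statuses):
    """If an approved artifact no longer matches its approved hash, cascade STALE."""
    if statuses.get(name) != "APPROVED":
        return []
    if approved_hashes.get(name) == fingerprint(content):
        return []
    return cascade_stale(name, statuses)
```

A check like this could run from a Git hook or a CI step, which would replace the current convention-based cascade with an enforced one.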

Key Takeaways

The value came not from a swarm of agents, but from a strict state machine with gates and an audit log. Three phases + hard stops + audit trail proved more powerful than five roles and eight commands.
Specification-level reviews catch errors cheaply - the rejected analysis phase demonstrated this beautifully.
Technology wish lists from prompts are not architectural decisions. They must be validated and documented in the Design artifact.
Verifiable behavior remains king: if you can’t test it, you don’t have it.

Important Context

This is not a mature industry standard - it’s an experimental workflow for leveraging AI in development. The goal is to understand where this spec-centric pipeline accelerates delivery, where it adds unnecessary overhead, and how to refine it without sacrificing quality.

I used Claude Opus 4.8 with a /flow command and a detailed task description for two services: Order Service and Stock Service with specific integration and mocked stock logic.

What’s Next

This was the first experiment validating a shift from code-centric to specification-centric development. Future explorations will examine:

The line between useful specification and bureaucracy
Automating the /flow process toward full CI/CD integration
Tighter coupling between .spec files, Git, and actual code artifacts
Scaling the model to team environments
Failure modes under high-velocity development

You can explore the full codebase here: https://github.com/zig8953/BBShop

I’d love to hear your thoughts - especially if you’ve experimented with similar spec-first or AI-augmented workflows.
