<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Oleg Timkiv</title>
    <description>The latest articles on DEV Community by Oleg Timkiv (@oleg_timkiv_a28d9ab1866c5).</description>
    <link>https://dev.to/oleg_timkiv_a28d9ab1866c5</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3971436%2Fc16b9b0b-060a-411b-9c2a-09aff9806c60.jpg</url>
      <title>DEV Community: Oleg Timkiv</title>
      <link>https://dev.to/oleg_timkiv_a28d9ab1866c5</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/oleg_timkiv_a28d9ab1866c5"/>
    <language>en</language>
    <item>
      <title>My Experiment with Specification-Driven Development Using AI Agents</title>
      <dc:creator>Oleg Timkiv</dc:creator>
      <pubDate>Sat, 06 Jun 2026 15:28:56 +0000</pubDate>
      <link>https://dev.to/oleg_timkiv_a28d9ab1866c5/my-experiment-with-specification-driven-development-using-ai-agents-6j7</link>
      <guid>https://dev.to/oleg_timkiv_a28d9ab1866c5/my-experiment-with-specification-driven-development-using-ai-agents-6j7</guid>
      <description>&lt;p&gt;In most development projects, the pattern is painfully familiar: business drops half a page of vague requirements, and within a day they’re asking, “Is it done yet?” Between those two points lies the developer’s real job - clarifying, formalizing, negotiating, and only then writing code. Over the past couple of years, this process has evolved more dramatically than the underlying technologies themselves.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Expanding Role of Developers in 2026
&lt;/h2&gt;

&lt;p&gt;Today’s developer is no longer just a coder. The role now includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Analyzing business requirements&lt;/li&gt;
&lt;li&gt;Translating them into technical specifications&lt;/li&gt;
&lt;li&gt;Working with AI tools as capable collaborators&lt;/li&gt;
&lt;li&gt;Maintaining architectural integrity and alignment with business goals&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Classic roles - analysts, architects, QA - haven’t disappeared. Instead, boundaries are blurring and responsibilities are shifting. In smaller teams especially, developers increasingly act as the bridge between business, architecture, and implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Typical Task
&lt;/h2&gt;

&lt;p&gt;Here’s a representative request from the business or support team:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“We need to implement book order placement. There should be a service that accepts orders and checks stock availability.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As usual, the description lacks critical details: contracts, resilience requirements, performance constraints, error handling, and so on. That’s expected - this is just the starting point. The real work begins with requirement clarification.&lt;/p&gt;

&lt;p&gt;The core idea I wanted to test in this experiment: &lt;strong&gt;the first artifact of development should no longer be code, but a specification&lt;/strong&gt;. Code becomes the &lt;em&gt;consequence&lt;/em&gt; of an approved specification, not the first step.&lt;/p&gt;

&lt;h2&gt;
  
  
  Specification-Driven Development (SDD)
&lt;/h2&gt;

&lt;p&gt;The vision was a Specification-Driven Development layer with these core concepts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Commands&lt;/strong&gt; - define the process flow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agents&lt;/strong&gt; - represent thinking and decision-making at each stage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills&lt;/strong&gt; - encode engineering best practices and coding standards&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spec&lt;/strong&gt; - the single source of truth for the system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ideal pipeline would look like a chain of agent-driven stages:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/clarify → /analyze → /acceptance → /constrain → /generate-plan → /develop → /review&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In practice, I didn’t run the full ambitious version. What I actually implemented was significantly simpler - and more effective.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Actually Built: Three Phases with Approval Gates
&lt;/h2&gt;

&lt;p&gt;Instead of many specialized agents, I reduced the system to a single &lt;code&gt;/flow&lt;/code&gt; command and a strict linear process with three phases and mandatory human approval gates between them:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;Artifact&lt;/th&gt;
&lt;th&gt;Depends On&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Analysis&lt;/td&gt;
&lt;td&gt;&lt;code&gt;01-analysis.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Design&lt;/td&gt;
&lt;td&gt;&lt;code&gt;02-design.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Analysis = APPROVED&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Implementation&lt;/td&gt;
&lt;td&gt;Implementation code&lt;/td&gt;
&lt;td&gt;Design = APPROVED&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Key rules that proved more important than the number of agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A phase cannot start until its dependency is &lt;code&gt;APPROVED&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;State is stored in the front matter of each artifact (single source of truth)&lt;/li&gt;
&lt;li&gt;Every state transition is logged in an append-only audit log&lt;/li&gt;
&lt;li&gt;If an approved artifact changes, all dependent artifacts are automatically marked &lt;code&gt;STALE&lt;/code&gt; and require re-review&lt;/li&gt;
&lt;/ul&gt;
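
&lt;p&gt;The gate and &lt;code&gt;STALE&lt;/code&gt; rules above can be sketched in a few lines. This is a minimal illustration, not the code from the experiment: the &lt;code&gt;Artifact&lt;/code&gt; class, the &lt;code&gt;DRAFT&lt;/code&gt; default, and both function names are assumptions made for the example.&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Only APPROVED and STALE come from the rules above; DRAFT as the
# starting status is an assumption for this sketch.
APPROVED = "APPROVED"
STALE = "STALE"
DRAFT = "DRAFT"


@dataclass
class Artifact:
    name: str
    status: str = DRAFT
    depends_on: list = field(default_factory=list)  # names of upstream artifacts


def can_start(artifact, artifacts):
    """A phase may start only when every dependency is APPROVED."""
    return all(artifacts[dep].status == APPROVED for dep in artifact.depends_on)


def mark_dependents_stale(changed_name, artifacts):
    """Mark everything downstream of a changed artifact as STALE, transitively."""
    stale = []
    queue = [changed_name]
    while queue:
        current = queue.pop(0)
        for art in artifacts.values():
            if current in art.depends_on and art.status != STALE:
                art.status = STALE
                stale.append(art.name)
                queue.append(art.name)
    return stale


artifacts = {
    "01-analysis": Artifact("01-analysis", status=APPROVED),
    "02-design": Artifact("02-design", status=APPROVED, depends_on=["01-analysis"]),
    "implementation": Artifact("implementation", depends_on=["02-design"]),
}
```

&lt;p&gt;With this setup, &lt;code&gt;can_start(artifacts["implementation"], artifacts)&lt;/code&gt; holds until the analysis changes. Calling &lt;code&gt;mark_dependents_stale("01-analysis", artifacts)&lt;/code&gt; then flags both the design and the implementation, which closes the gate again until they are re-reviewed.&lt;/p&gt;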

&lt;p&gt;Each phase uses a role-based agent (analyst → architect → developer), but the final decision to advance remains with the human developer.&lt;/p&gt;

&lt;p&gt;The flow looks like this:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/flow → generates .spec/01-analysis.md → STOP (waiting for approval)&lt;/code&gt;&lt;br&gt;
&lt;code&gt;→ Developer: approve / reject with comments&lt;/code&gt;&lt;br&gt;
&lt;code&gt;→ generates .spec/02-design.md → STOP&lt;/code&gt;&lt;br&gt;
&lt;code&gt;→ generates implementation → STOP&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This hard &lt;code&gt;STOP&lt;/code&gt; at each gate is what keeps the prototype safe: the AI cannot race from vague requirements straight to production code.&lt;/p&gt;
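
&lt;p&gt;Since state lives in each artifact’s front matter, the decision to generate or stop can be derived from the files alone. The sketch below assumes a simple &lt;code&gt;key: value&lt;/code&gt; front matter block delimited by &lt;code&gt;---&lt;/code&gt; lines with a &lt;code&gt;status&lt;/code&gt; key. The actual file layout is not shown in this post, so the format and the names &lt;code&gt;read_front_matter&lt;/code&gt; and &lt;code&gt;next_step&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
# Phase order from the table above; artifact keys are illustrative.
PHASES = ["01-analysis", "02-design", "implementation"]


def read_front_matter(text):
    """Parse 'key: value' pairs between two '---' lines at the top of a file."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def next_step(documents):
    """Decide what the flow should do next, given a name-to-text mapping."""
    for phase in PHASES:
        if phase not in documents:
            return "generate " + phase
        status = read_front_matter(documents[phase]).get("status")
        if status != "APPROVED":
            # Hard stop: nothing downstream runs until a human approves.
            return "STOP: " + phase + " is " + str(status)
    return "done"


analysis = "---\nstatus: READY_FOR_REVIEW\nversion: 1\n---\n# Analysis\n"
approved = analysis.replace("READY_FOR_REVIEW", "APPROVED")
```

&lt;p&gt;Here &lt;code&gt;next_step({"01-analysis": analysis})&lt;/code&gt; reports a stop on the analysis, while passing &lt;code&gt;approved&lt;/code&gt; instead moves on to generating the design. The point of the design is that the gate is computed from the artifacts themselves rather than from any separate state store.&lt;/p&gt;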

&lt;h2&gt;
  
  
  The Real Run: Where the Gates Proved Their Value
&lt;/h2&gt;

&lt;p&gt;The process did &lt;strong&gt;not&lt;/strong&gt; succeed on the first attempt - and that was exactly the point.&lt;/p&gt;

&lt;p&gt;The Analysis phase was rejected during review. The feedback was architecturally sound:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove technical implementation details (EF Core, PostgreSQL, InMemory, HttpClientFactory, specific timeouts) from the analysis - it should focus on “what,” not “how”&lt;/li&gt;
&lt;li&gt;Add a &lt;strong&gt;Domain Entities&lt;/strong&gt; section&lt;/li&gt;
&lt;li&gt;Add a &lt;strong&gt;Use Cases&lt;/strong&gt; section&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After revisions (reject → revise → submit), version 2 of the analysis passed review. The entire history was preserved in the audit log:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;01-analysis v1&lt;/code&gt; | READY_FOR_REVIEW → REJECTED | remove arch decisions; add entities and use cases; fix error format&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;01-analysis v2&lt;/code&gt; | REJECTED → DRAFT → READY_FOR_REVIEW → APPROVED&lt;/li&gt;
&lt;/ul&gt;
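
&lt;p&gt;The transitions in that log suggest a small state machine. As a rough sketch under stated assumptions: the table below covers the transitions visible in the log plus &lt;code&gt;APPROVED&lt;/code&gt; to &lt;code&gt;STALE&lt;/code&gt; from the rules earlier, while &lt;code&gt;STALE&lt;/code&gt; back to &lt;code&gt;DRAFT&lt;/code&gt; is a guess. The &lt;code&gt;AuditLog&lt;/code&gt; class and &lt;code&gt;transition&lt;/code&gt; function are hypothetical names, not part of the prototype.&lt;/p&gt;

```python
from datetime import datetime, timezone

# Allowed moves per state. STALE to DRAFT is an assumption.
ALLOWED = {
    "DRAFT": {"READY_FOR_REVIEW"},
    "READY_FOR_REVIEW": {"APPROVED", "REJECTED"},
    "REJECTED": {"DRAFT"},
    "APPROVED": {"STALE"},
    "STALE": {"DRAFT"},
}


class AuditLog:
    """Append-only: entries can be added and read, never edited or removed."""

    def __init__(self):
        self._entries = []

    def append(self, artifact, old, new, comment=""):
        stamp = datetime.now(timezone.utc).isoformat()
        self._entries.append((stamp, artifact, old, new, comment))

    def entries(self):
        return tuple(self._entries)  # immutable snapshot for readers


def transition(artifact, current, target, log, comment=""):
    """Validate a state change, record it, and return the new state."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError("illegal transition: " + current + " to " + target)
    log.append(artifact, current, target, comment)
    return target


# Replay the history of the analysis artifact described above.
log = AuditLog()
state = "READY_FOR_REVIEW"
state = transition("01-analysis v1", state, "REJECTED", log, "remove arch decisions")
for target in ("DRAFT", "READY_FOR_REVIEW", "APPROVED"):
    state = transition("01-analysis v2", state, target, log)
```

&lt;p&gt;An attempt to skip review, such as going from &lt;code&gt;DRAFT&lt;/code&gt; straight to &lt;code&gt;APPROVED&lt;/code&gt;, raises a &lt;code&gt;ValueError&lt;/code&gt; and leaves the log untouched.&lt;/p&gt;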

&lt;p&gt;This was the clearest win of the approach: I caught architectural issues at the documentation level, before writing a single line of code. Fixing a section of the spec takes minutes. Rewriting a service built on flawed assumptions takes hours.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Was Generated
&lt;/h2&gt;

&lt;p&gt;Once both Analysis and Design were approved, the AI generated two clean ASP.NET Core 9 services following Clean Architecture. The solution compiled with zero errors and zero warnings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the Initial Vision Diverged from Reality
&lt;/h2&gt;

&lt;p&gt;This is perhaps the most valuable part for readers - where the ambitious prompt list met real constraints:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Intended in Prompt&lt;/th&gt;
&lt;th&gt;What Was Actually Built&lt;/th&gt;
&lt;th&gt;Reason&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Minimal APIs&lt;/td&gt;
&lt;td&gt;Controllers&lt;/td&gt;
&lt;td&gt;Explicit layer boundaries in Clean Architecture (documented in Design)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Refit client&lt;/td&gt;
&lt;td&gt;Typed HttpClientFactory&lt;/td&gt;
&lt;td&gt;Better control over timeouts and error handling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Serilog + OpenTelemetry&lt;/td&gt;
&lt;td&gt;Serilog only&lt;/td&gt;
&lt;td&gt;OpenTelemetry deferred to avoid bloating the first iteration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAPI 3.1&lt;/td&gt;
&lt;td&gt;Swashbuckle (standard)&lt;/td&gt;
&lt;td&gt;Sufficient for the prototype&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Lesson&lt;/strong&gt;: A wish list in a prompt is not architecture. Real decisions must be captured and justified in the Design document. Some items were intentionally dropped (e.g., events were out of scope), and others were deferred. It was much better to see these trade-offs in the specification than to discover them during implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Still Needs Work
&lt;/h2&gt;

&lt;p&gt;Several important areas remain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Verifiable Behavior&lt;/strong&gt; - Add unit tests (e.g., for &lt;code&gt;MockStockPolicy&lt;/code&gt;, &lt;code&gt;PlaceOrderHandler&lt;/code&gt;, and the validators) and integration tests via &lt;code&gt;WebApplicationFactory&lt;/code&gt; (happy path, 400 errors, 503 on stock service failure). Without green tests, acceptance criteria remain just words.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Observability&lt;/strong&gt; - Introduce OpenTelemetry traces for the Order → Stock service call chain.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Process Engine&lt;/strong&gt; - Currently, approvals and &lt;code&gt;STALE&lt;/code&gt; cascading are manual and convention-based. The next step is to build a real orchestrator for gates and audit logging.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The value came not from a swarm of agents, but from a strict state machine with gates and an audit log. Three phases + hard stops + audit trail proved more powerful than five roles and eight commands.&lt;/li&gt;
&lt;li&gt;Specification-level reviews catch errors cheaply - the rejected analysis phase demonstrated this beautifully.&lt;/li&gt;
&lt;li&gt;Technology wish lists from prompts are not architectural decisions. They must be validated and documented in the Design artifact.&lt;/li&gt;
&lt;li&gt;Verifiable behavior remains king: if you can’t test it, you don’t have it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Important Context
&lt;/h2&gt;

&lt;p&gt;This is not a mature industry standard - it’s an experimental workflow for leveraging AI in development. The goal is to understand where this spec-centric pipeline accelerates delivery, where it adds unnecessary overhead, and how to refine it without sacrificing quality.&lt;/p&gt;

&lt;p&gt;I used &lt;strong&gt;Claude Opus 4.8&lt;/strong&gt; with a &lt;code&gt;/flow&lt;/code&gt; command and a detailed task description for two services: Order Service and Stock Service with specific integration and mocked stock logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Next
&lt;/h2&gt;

&lt;p&gt;This was the first experiment validating a shift from code-centric to specification-centric development. Future explorations will examine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The line between useful specification and bureaucracy&lt;/li&gt;
&lt;li&gt;Automating the &lt;code&gt;/flow&lt;/code&gt; process toward full CI/CD integration&lt;/li&gt;
&lt;li&gt;Tighter coupling between &lt;code&gt;.spec&lt;/code&gt; files, Git, and actual code artifacts&lt;/li&gt;
&lt;li&gt;Scaling the model to team environments&lt;/li&gt;
&lt;li&gt;Failure modes under high-velocity development&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can explore the full codebase here: &lt;a href="https://github.com/zig8953/BBShop" rel="noopener noreferrer"&gt;https://github.com/zig8953/BBShop&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’d love to hear your thoughts - especially if you’ve experimented with similar spec-first or AI-augmented workflows.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>career</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
