<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nirsa</title>
    <description>The latest articles on DEV Community by Nirsa (@nirsa_aa5eea4f3dcf42a8e14).</description>
    <link>https://dev.to/nirsa_aa5eea4f3dcf42a8e14</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3920140%2F63cdd6cd-6997-4db6-9fee-032d076945e1.png</url>
      <title>DEV Community: Nirsa</title>
      <link>https://dev.to/nirsa_aa5eea4f3dcf42a8e14</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nirsa_aa5eea4f3dcf42a8e14"/>
    <language>en</language>
    <item>
      <title>Why AI coding agents fail with incomplete specs</title>
      <dc:creator>Nirsa</dc:creator>
      <pubDate>Tue, 19 May 2026 07:53:15 +0000</pubDate>
      <link>https://dev.to/nirsa_aa5eea4f3dcf42a8e14/why-ai-coding-agents-fail-with-incomplete-specs-3e74</link>
      <guid>https://dev.to/nirsa_aa5eea4f3dcf42a8e14/why-ai-coding-agents-fail-with-incomplete-specs-3e74</guid>
      <description>&lt;p&gt;AI coding agents like Codex and Claude Code are getting surprisingly good at writing code.&lt;/p&gt;

&lt;p&gt;But after using them in real projects, I noticed something:&lt;/p&gt;

&lt;p&gt;Most failures were not caused by the model.&lt;/p&gt;

&lt;p&gt;They were caused by incomplete specs.&lt;/p&gt;

&lt;p&gt;When a specification has gaps, the AI fills them in with plausible assumptions. At first the generated code often looks correct, but over time the implementation slowly drifts away from the intended behavior.&lt;/p&gt;

&lt;p&gt;I kept running into issues like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;missing auth boundaries&lt;/li&gt;
&lt;li&gt;unclear tenant ownership rules&lt;/li&gt;
&lt;li&gt;retry and race-condition problems&lt;/li&gt;
&lt;li&gt;webhook duplication edge cases&lt;/li&gt;
&lt;li&gt;requirements enforced only on the client side&lt;/li&gt;
&lt;li&gt;implementation drift between spec and actual behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Eventually it becomes difficult to tell whether the bug is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;in the implementation,&lt;/li&gt;
&lt;li&gt;in the specification,&lt;/li&gt;
&lt;li&gt;or in the original requirement itself.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  A Real Failure Case
&lt;/h2&gt;

&lt;p&gt;One issue I repeatedly saw was missing tenant ownership validation.&lt;/p&gt;

&lt;p&gt;A spec would describe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;authentication&lt;/li&gt;
&lt;li&gt;API structure&lt;/li&gt;
&lt;li&gt;expected responses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;but never explicitly define ownership constraints.&lt;/p&gt;

&lt;p&gt;The AI agent would generate code that correctly authenticated the user, but still allowed cross-tenant access because ownership validation was never part of the specification.&lt;/p&gt;
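
&lt;p&gt;To make the gap concrete, here is a minimal Python sketch. The &lt;code&gt;User&lt;/code&gt; and &lt;code&gt;Invoice&lt;/code&gt; records, the in-memory store, and both handler functions are hypothetical and only illustrate the pattern. The first handler does exactly what such a spec asks for (reject unauthenticated users). The second adds the ownership rule the spec never stated.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id: str
    tenant_id: str


@dataclass
class Invoice:
    id: str
    tenant_id: str
    amount_cents: int


# Hypothetical in-memory store standing in for a real database.
INVOICES = {
    "inv-1": Invoice(id="inv-1", tenant_id="tenant-a", amount_cents=1200),
    "inv-2": Invoice(id="inv-2", tenant_id="tenant-b", amount_cents=9900),
}


def get_invoice_auth_only(user: Optional[User], invoice_id: str):
    # Implements a spec that only says "the user must be authenticated".
    if user is None:
        raise PermissionError("not authenticated")
    return INVOICES[invoice_id]  # returns the invoice whatever tenant owns it


def get_invoice_with_ownership(user: Optional[User], invoice_id: str):
    # Adds the rule the spec never stated: invoices are scoped to a tenant.
    if user is None:
        raise PermissionError("not authenticated")
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice.tenant_id != user.tenant_id:
        # Answer "not found" for other tenants' records so the response
        # does not reveal which invoice IDs exist.
        raise LookupError("invoice not found")
    return invoice


alice = User(id="u-1", tenant_id="tenant-a")

# The auth-only version hands Alice another tenant's invoice...
print(get_invoice_auth_only(alice, "inv-2").tenant_id)  # tenant-b

# ...while the ownership-aware version refuses it.
try:
    get_invoice_with_ownership(alice, "inv-2")
except LookupError as err:
    print(err)  # invoice not found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Both handlers are "correct" against an auth-only spec; only the second one respects a tenant boundary, and nothing in the spec told the agent to write it.&lt;/p&gt;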

&lt;p&gt;The implementation looked reasonable at first glance.&lt;/p&gt;

&lt;p&gt;But the security boundary itself was undefined.&lt;/p&gt;

&lt;p&gt;That was the moment I realized the problem was often not "bad code generation."&lt;/p&gt;

&lt;p&gt;It was ambiguous requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Idea
&lt;/h2&gt;

&lt;p&gt;I started building an open-source tool called &lt;strong&gt;SpecGuard&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The goal is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Review requirements before they become input to an AI coding agent.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Instead of reviewing generated code after implementation, SpecGuard tries to catch ambiguous or incomplete requirements earlier in the workflow.&lt;/p&gt;

&lt;p&gt;This is heavily inspired by problems I encountered while experimenting with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI-assisted development&lt;/li&gt;
&lt;li&gt;LLM coding agents&lt;/li&gt;
&lt;li&gt;spec-driven development workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Intended Workflow
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Write spec
    ↓
Run SpecGuard
    ↓
Fix NOT_READY findings
    ↓
Hand spec to AI coding agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  What SpecGuard Checks
&lt;/h2&gt;

&lt;p&gt;SpecGuard mainly looks for ambiguous or missing areas such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;auth and permission boundaries&lt;/li&gt;
&lt;li&gt;tenant ownership rules&lt;/li&gt;
&lt;li&gt;idempotency and replay safety&lt;/li&gt;
&lt;li&gt;race conditions&lt;/li&gt;
&lt;li&gt;expiration and revocation handling&lt;/li&gt;
&lt;li&gt;state transitions&lt;/li&gt;
&lt;li&gt;webhook/background retry behavior&lt;/li&gt;
&lt;li&gt;requirements relying only on client-side validation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The output is one of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;READY&lt;/li&gt;
&lt;li&gt;READY_WITH_WARNINGS&lt;/li&gt;
&lt;li&gt;NOT_READY&lt;/li&gt;
&lt;/ul&gt;

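
&lt;p&gt;Idempotency and replay safety are a good example of behavior a spec has to state explicitly. The following is a minimal Python sketch, not SpecGuard code: it assumes each webhook delivery carries a unique event ID (a hypothetical &lt;code&gt;event_id&lt;/code&gt; field) and records processed IDs so that a retried delivery does not apply the same side effect twice.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import threading


class WebhookProcessor:
    """Applies each webhook event at most once, keyed by its event ID."""

    def __init__(self):
        # In-memory for the sketch; a real service would use durable storage,
        # for example a table with a unique constraint on the event ID.
        self._processed = set()
        self._lock = threading.Lock()
        self.balance_cents = 0

    def handle(self, event):
        event_id = event["event_id"]
        with self._lock:
            # Check, apply, and record under one lock so two concurrent
            # deliveries of the same event cannot both pass the check.
            if event_id in self._processed:
                return "duplicate_ignored"
            self.balance_cents += event["amount_cents"]
            self._processed.add(event_id)
            return "applied"


processor = WebhookProcessor()
payment = {"event_id": "evt_123", "amount_cents": 500}

print(processor.handle(payment))  # applied
print(processor.handle(payment))  # duplicate_ignored (provider retry)
print(processor.balance_cents)    # 500
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
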
&lt;h2&gt;
  
  
  Why the Default Mode Does Not Use an LLM
&lt;/h2&gt;

&lt;p&gt;I intentionally made the default path non-LLM.&lt;/p&gt;

&lt;p&gt;I wanted spec validation to behave more like linting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deterministic&lt;/li&gt;
&lt;li&gt;reproducible&lt;/li&gt;
&lt;li&gt;CI-friendly&lt;/li&gt;
&lt;li&gt;cheap to run repeatedly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An optional OpenAI/Codex-based review mode adds a deeper layer on top, but for now I treat it as secondary rather than as the foundation of the workflow.&lt;/p&gt;
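
&lt;p&gt;To illustrate why a non-LLM default is attractive, here is a hypothetical Python sketch of a lint-style readiness check. It is not SpecGuard's actual engine: the keyword rules, the "critical" and "warning" severities, and the mapping to READY, READY_WITH_WARNINGS, and NOT_READY are assumptions chosen only to show that the same spec text always yields the same verdict.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    code: str
    severity: str   # "critical" or "warning" (illustrative levels)
    trigger: str    # regex: the rule applies if the spec mentions this topic
    required: str   # regex: ...and then the spec must also mention this
    message: str


# Hypothetical rules; a real rule set would be far richer.
RULES = [
    Rule("OWN-001", "critical", r"\btenant", r"\bown(s|ed|er|ers|ership)?\b",
         "Tenant data is mentioned but ownership rules are not defined."),
    Rule("IDEM-001", "warning", r"\bwebhook", r"\bidempot|\bduplicate",
         "Webhooks are mentioned but duplicate delivery handling is not."),
]


def review(spec_text):
    text = spec_text.lower()
    findings = [
        rule for rule in RULES
        if re.search(rule.trigger, text) and not re.search(rule.required, text)
    ]
    # Any critical finding blocks the handoff; warnings alone do not.
    if any(f.severity == "critical" for f in findings):
        verdict = "NOT_READY"
    elif findings:
        verdict = "READY_WITH_WARNINGS"
    else:
        verdict = "READY"
    return verdict, findings


spec = "Users in a tenant can view invoices. Payment webhooks update invoice status."
verdict, findings = review(spec)
print(verdict)                     # NOT_READY
print([f.code for f in findings])  # ['OWN-001', 'IDEM-001']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the rules are plain pattern checks, rerunning them on the same spec gives the same result every time, which is what makes this style of check usable as a CI gate. The trade-off is that keyword rules miss requirements phrased in unexpected ways.&lt;/p&gt;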
&lt;h2&gt;
  
  
  Codex Plugin
&lt;/h2&gt;

&lt;p&gt;In v0.4.0 I added an MVP Codex plugin.&lt;/p&gt;

&lt;p&gt;Install:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;spec-guard
specguard &lt;span class="nt"&gt;--help&lt;/span&gt;

codex plugin marketplace add KoreaNirsa/spec-guard &lt;span class="nt"&gt;--ref&lt;/span&gt; main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Create an example spec package:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;specguard example copy specs/your-feature-name &lt;span class="nt"&gt;--force&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Inside Codex, the plugin can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;run SpecGuard analysis&lt;/li&gt;
&lt;li&gt;read generated results&lt;/li&gt;
&lt;li&gt;summarize READY/NOT_READY state&lt;/li&gt;
&lt;li&gt;explain main findings and next actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The plugin itself does not reimplement the engine.&lt;/p&gt;

&lt;p&gt;It wraps the existing CLI workflow.&lt;/p&gt;
&lt;h2&gt;
  
  
  GitHub PR Review Workflow
&lt;/h2&gt;

&lt;p&gt;SpecGuard also includes a GitHub Actions-based PR review workflow.&lt;/p&gt;

&lt;p&gt;When a spec package changes in a PR, it can automatically run SpecGuard Review and leave findings directly on the PR.&lt;/p&gt;

&lt;p&gt;The OpenAI review path currently uses GitHub secrets such as:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SPECGUARD_OPENAI_API_KEY
SPECGUARD_PR_REVIEW_MODEL
SPECGUARD_REVIEW_SPEC_PATHS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
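
&lt;p&gt;The "run only when a spec package changes" step can be sketched as a simple path filter. This is a hypothetical Python helper, not part of SpecGuard: it takes the changed file paths of a PR and a list of spec directories, and returns the spec packages that need review. How &lt;code&gt;SPECGUARD_REVIEW_SPEC_PATHS&lt;/code&gt; is actually formatted is not covered here, so the list of directories is passed in directly, and the one-package-per-subdirectory layout is an assumption.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pathlib import PurePosixPath


def changed_spec_packages(changed_files, spec_roots):
    """Return the spec package directories touched by a PR, sorted.

    changed_files: paths relative to the repo root, e.g. from `git diff --name-only`.
    spec_roots: directories holding one spec package per subdirectory
                (an assumed layout, such as "specs/").
    """
    packages = set()
    for changed in changed_files:
        path = PurePosixPath(changed)
        for root in spec_roots:
            root_path = PurePosixPath(root)
            # Count files inside a package directory, not loose files in the root.
            if root_path in path.parents and path.parent != root_path:
                relative = path.relative_to(root_path)
                packages.add(str(root_path / relative.parts[0]))
    return sorted(packages)


changed = [
    "specs/billing/spec.md",
    "specs/billing/technical-design.md",
    "specs/README.md",
    "src/app.py",
]
print(changed_spec_packages(changed, ["specs"]))  # ['specs/billing']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An empty result would mean the workflow can skip the review step for that PR.&lt;/p&gt;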

&lt;h2&gt;
  
  
  Current Status
&lt;/h2&gt;

&lt;p&gt;This project is still very early and pre-beta.&lt;/p&gt;

&lt;p&gt;I do not expect it to perfectly judge every specification.&lt;/p&gt;

&lt;p&gt;Right now I am mainly interested in feedback around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what kinds of specs this workflow fits well&lt;/li&gt;
&lt;li&gt;where deterministic checks break down&lt;/li&gt;
&lt;li&gt;which findings feel too noisy or too weak&lt;/li&gt;
&lt;li&gt;whether PR enforcement would fit real engineering workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are already using AI coding agents in production workflows, I’d genuinely like to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what kinds of spec failures you see most often&lt;/li&gt;
&lt;li&gt;where deterministic validation breaks down&lt;/li&gt;
&lt;li&gt;and whether something like this would actually fit your development workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’m especially interested in situations where the generated implementation looked correct, but the requirement itself was underspecified.&lt;/p&gt;

&lt;p&gt;Feedback, issues, and PRs are all welcome.&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/KoreaNirsa" rel="noopener noreferrer"&gt;
        KoreaNirsa
      &lt;/a&gt; / &lt;a href="https://github.com/KoreaNirsa/spec-guard" rel="noopener noreferrer"&gt;
        spec-guard
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Validation-First Workflow (VFW) for AI-assisted development
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/KoreaNirsa/spec-guard/assets/spec_guard_banner.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2FKoreaNirsa%2Fspec-guard%2FHEAD%2Fassets%2Fspec_guard_banner.png" alt="SpecGuard banner"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;SpecGuard&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;SpecGuard blocks weak specs before AI coding agents turn them into defective code.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SpecGuard is a Validation-First Workflow (VFW) for AI-assisted development.
It turns specs into reviewed, testable, implementation-ready packages before AI coding begins.&lt;/p&gt;
&lt;p&gt;It is not a prompt-to-code generator. SpecGuard helps you prepare an approved spec package before Codex, Claude Code, or another external coding agent writes application code.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Demo Video&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/KoreaNirsa/spec-guard/assets/specguard-demo-v0.4.0.gif"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2FKoreaNirsa%2Fspec-guard%2FHEAD%2Fassets%2Fspecguard-demo-v0.4.0.gif" alt="SpecGuard demo walkthrough"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/KoreaNirsa/spec-guard/assets/specguard-demo-v0.4.0.mp4" rel="noopener noreferrer"&gt;Watch the full-resolution MP4 demo&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The demo follows this flow:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Install SpecGuard with &lt;code&gt;pip install spec-guard&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Copy the example spec with &lt;code&gt;specguard example copy your-feature-name --force&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Insert a vulnerable spec. In v0.3.0, the packaged example intentionally includes a vulnerable spec by default so users can see a blocking SpecGuard Review.&lt;/li&gt;
&lt;li&gt;Review the SpecGuard findings.&lt;/li&gt;
&lt;li&gt;Fix the weak areas directly, or ask an AI assistant to strengthen the spec by giving it the SpecGuard Review findings.&lt;/li&gt;
&lt;li&gt;Run SpecGuard Review again and confirm it reaches READY…&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/KoreaNirsa/spec-guard" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>opensource</category>
      <category>devtools</category>
      <category>github</category>
    </item>
    <item>
      <title>The problem wasn't that the AI wrote bad code — weak specs caused unstable implementations</title>
      <dc:creator>Nirsa</dc:creator>
      <pubDate>Fri, 08 May 2026 13:02:31 +0000</pubDate>
      <link>https://dev.to/nirsa_aa5eea4f3dcf42a8e14/the-problem-wasnt-that-the-ai-wrote-bad-code-weak-specs-caused-unstable-implementations-j0p</link>
      <guid>https://dev.to/nirsa_aa5eea4f3dcf42a8e14/the-problem-wasnt-that-the-ai-wrote-bad-code-weak-specs-caused-unstable-implementations-j0p</guid>
      <description>&lt;p&gt;Recently I’ve been experimenting a lot with AI-assisted development workflows using tools like Codex and Claude Code.&lt;/p&gt;

&lt;p&gt;At first, I assumed most implementation failures came from the AI itself.&lt;/p&gt;

&lt;p&gt;But after repeatedly testing spec-driven workflows, I noticed something different:&lt;/p&gt;

&lt;p&gt;The problem wasn't that the AI wrote bad code. The problem was that weak specs caused unstable implementations.&lt;/p&gt;

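&lt;p&gt;Ambiguous requirements often led to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unstable architecture&lt;/li&gt;
&lt;li&gt;inconsistent contracts&lt;/li&gt;
&lt;li&gt;missing ownership boundaries&lt;/li&gt;
&lt;li&gt;unsafe delete/update behavior&lt;/li&gt;
&lt;li&gt;implementation drift&lt;/li&gt;
&lt;li&gt;features expanding outside original intent&lt;/li&gt;
&lt;/ul&gt;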

&lt;p&gt;In many cases, the AI was actually trying to follow the provided specification. The issue was that the specification itself was incomplete, unsafe, or unclear.&lt;/p&gt;
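
&lt;p&gt;"Unsafe delete/update behavior" usually means the spec never said which changes are allowed from which state. Here is a minimal, hypothetical Python sketch of what pinning that down can look like: the order statuses and the allowed-transition table are invented for illustration, and any transition not listed is rejected instead of being left for the agent to guess.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical order lifecycle; the statuses are illustrative only.
ALLOWED_TRANSITIONS = {
    "draft": {"submitted", "deleted"},
    "submitted": {"paid", "cancelled"},
    "paid": {"refunded"},
    "cancelled": set(),
    "refunded": set(),
    "deleted": set(),
}


class InvalidTransition(Exception):
    pass


def transition(current, target):
    """Return the new status, or raise if the table does not allow the change."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"cannot move from {current} to {target}")
    return target


print(transition("draft", "submitted"))  # submitted
try:
    transition("paid", "deleted")  # deleting a paid order is not in the table
except InvalidTransition as err:
    print(err)  # cannot move from paid to deleted
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;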

&lt;p&gt;That led me to start experimenting with what I’ve been calling VFW (Validation-First Workflow).&lt;/p&gt;

&lt;p&gt;The core idea is simple: before AI coding starts, validate whether the specification is actually implementation-ready.&lt;/p&gt;

&lt;p&gt;As part of that experiment, I started building a small OSS project called SpecGuard:&lt;br&gt;
&lt;a href="https://github.com/KoreaNirsa/spec-guard" rel="noopener noreferrer"&gt;https://github.com/KoreaNirsa/spec-guard&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;SpecGuard is not a code generator.&lt;/p&gt;

&lt;p&gt;Instead, it acts more like a validation-first guard layer for spec-driven / AI-assisted development workflows.&lt;/p&gt;

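&lt;p&gt;The current version, v0.3.0, supports features such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;readiness review for spec packages&lt;/li&gt;
&lt;li&gt;Critical / Major / Minor findings&lt;/li&gt;
&lt;li&gt;low review mode&lt;/li&gt;
&lt;li&gt;implementation handoff artifacts&lt;/li&gt;
&lt;li&gt;experimental PR drift review&lt;/li&gt;
&lt;li&gt;heuristic-first review flow&lt;/li&gt;
&lt;/ul&gt;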

&lt;p&gt;A typical workflow looks like this:&lt;br&gt;
Discovery → spec.md → technical-design.md → SpecGuard Review → readiness validation → implementation handoff → external coding agent → Pull Request → SpecGuard PR review&lt;/p&gt;

&lt;p&gt;The project is still very experimental and immature in many areas.&lt;/p&gt;

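&lt;p&gt;Known limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;heuristic false positives / false negatives&lt;/li&gt;
&lt;li&gt;limited benchmark coverage&lt;/li&gt;
&lt;li&gt;small real-world validation set&lt;/li&gt;
&lt;li&gt;review calibration still evolving&lt;/li&gt;
&lt;li&gt;UX/docs still rough&lt;/li&gt;
&lt;/ul&gt;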

&lt;p&gt;Right now this is still much closer to a demo-stage OSS project than a mature production tool.&lt;/p&gt;

&lt;p&gt;But I’d like to continue evolving it toward something practical enough for real engineering workflows.&lt;/p&gt;

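&lt;p&gt;I’m especially interested in exploring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spec-Driven Development&lt;/li&gt;
&lt;li&gt;validation-first workflows&lt;/li&gt;
&lt;li&gt;contract validation&lt;/li&gt;
&lt;li&gt;AI-assisted engineering&lt;/li&gt;
&lt;li&gt;PR review automation&lt;/li&gt;
&lt;li&gt;CI/CD validation gates&lt;/li&gt;
&lt;li&gt;harness/evaluation engineering&lt;/li&gt;
&lt;/ul&gt;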

&lt;p&gt;Feedback and contributors are very welcome.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
      <category>softwareengineering</category>
    </item>
  </channel>
</rss>
