Nirsa

Why AI coding agents fail with incomplete specs

AI coding agents like Codex and Claude Code are getting surprisingly good at writing code.

But after using them in real projects, I noticed something:

Most failures were not caused by the model.

They were caused by incomplete specs.

When a specification has gaps, the AI fills them in with plausible assumptions. At first the generated code often looks correct, but over time the implementation slowly drifts away from the intended behavior.

I kept running into issues like:

  • missing auth boundaries
  • unclear tenant ownership rules
  • retry and race-condition problems
  • webhook duplication edge cases
  • requirements enforced only on the client side
  • implementation drift between spec and actual behavior

Eventually it becomes difficult to tell whether the bug is:

  • in the implementation,
  • in the specification,
  • or in the original requirement itself.

A Real Failure Case

One issue I repeatedly saw was missing tenant ownership validation.

A spec would describe:

  • authentication
  • API structure
  • expected responses

but never explicitly define ownership constraints.

The AI agent would generate code that correctly authenticated the user, but still allowed cross-tenant access because ownership validation was never part of the specification.

The implementation looked reasonable at first glance.

But the security boundary itself was undefined.

That was the moment I realized the problem was often not "bad code generation."

It was ambiguous requirements.
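
To make the gap concrete, here is a minimal hypothetical sketch in Python. It is not taken from SpecGuard or from any real project: the `User`, `Invoice`, and `INVOICES` names and both functions are assumptions made up for illustration. The first function is what a spec that only says "authenticated users can fetch an invoice by ID" plausibly produces. The second adds the ownership rule the spec never stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: str
    tenant_id: str


@dataclass(frozen=True)
class Invoice:
    id: str
    tenant_id: str
    amount_cents: int


# Hypothetical in-memory store standing in for a real database.
INVOICES = {
    "inv-1": Invoice(id="inv-1", tenant_id="tenant-a", amount_cents=1200),
    "inv-2": Invoice(id="inv-2", tenant_id="tenant-b", amount_cents=9900),
}


class Forbidden(Exception):
    """Raised when an authenticated user requests another tenant's data."""


def get_invoice_unscoped(user: User, invoice_id: str) -> Invoice:
    # The caller is authenticated (we have a User), but tenant ownership
    # is never checked, so any tenant's invoice can be read.
    return INVOICES[invoice_id]


def get_invoice_scoped(user: User, invoice_id: str) -> Invoice:
    # Same lookup, plus the rule the spec should have stated:
    # "a user may only read invoices that belong to their own tenant".
    invoice = INVOICES[invoice_id]
    if invoice.tenant_id != user.tenant_id:
        raise Forbidden(f"{user.id} cannot access invoice {invoice_id}")
    return invoice
```

Both functions satisfy a spec that only covers authentication, API structure, and expected responses. Only an explicit ownership requirement tells an agent (or a reviewer) that the first one is wrong.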

The Idea

I started building an open-source tool called SpecGuard.

The goal is simple:

Review requirements before they become input to an AI coding agent.

Instead of reviewing generated code after implementation, SpecGuard tries to catch ambiguous or incomplete requirements earlier in the workflow.

This is heavily inspired by problems I encountered while experimenting with:

  • AI-assisted development
  • LLM coding agents
  • spec-driven development workflows

Intended Workflow

Write spec
    ↓
Run SpecGuard
    ↓
Fix NOT_READY findings
    ↓
Hand spec to AI coding agent

What SpecGuard Checks

Main validation areas

SpecGuard mainly looks for ambiguous or missing requirements in areas such as:

  • auth and permission boundaries
  • tenant ownership rules
  • idempotency and replay safety
  • race conditions
  • expiration and revocation handling
  • state transitions
  • webhook/background retry behavior
  • requirements relying only on client-side validation
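
As one illustration of why these areas need to be spelled out, consider webhook retries. The sketch below is a hypothetical example, not SpecGuard code: `WebhookProcessor`, `handle_payment_event`, and the in-memory set are assumptions for illustration only. Unless the spec says that a redelivered event must not be applied twice, nothing tells the agent to write the duplicate check.

```python
from dataclasses import dataclass, field


@dataclass
class WebhookProcessor:
    # Hypothetical in-memory state. A real service would need a durable
    # store (for example a unique constraint on the event ID).
    seen_event_ids: set[str] = field(default_factory=set)
    credited_cents: int = 0

    def handle_payment_event(self, event_id: str, amount_cents: int) -> bool:
        """Apply a payment event once; return False for a duplicate delivery."""
        if event_id in self.seen_event_ids:
            # Retried or replayed delivery: acknowledge it, change nothing.
            return False
        self.seen_event_ids.add(event_id)
        self.credited_cents += amount_cents
        return True
```

Delivering the same event ID twice credits the account only once. Without the idempotency requirement in the spec, an implementation that credits on every delivery would look equally reasonable.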

The output is one of:

  • READY
  • READY_WITH_WARNINGS
  • NOT_READY

Why the Default Mode Does Not Use an LLM

I intentionally made the default path non-LLM.

I wanted spec validation to behave more like linting:

  • deterministic
  • reproducible
  • CI-friendly
  • cheap to run repeatedly
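
What "lint-like" means here can be sketched with a toy rule-based checker. This is a simplified illustration under loud assumptions, not SpecGuard's actual engine or rule set: the `Rule` class, the three regex rules, and the `review` function are all hypothetical. It only shows how a check with no model call can map a spec to one of the three statuses and return the same result on every run.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    trigger: str    # regex: the spec mentions this topic
    required: str   # regex: if so, the spec should also state this
    blocking: bool  # True -> NOT_READY, False -> warning only


# Hypothetical rules for illustration; not SpecGuard's real rule set.
RULES = (
    Rule("tenant-ownership", r"\btenants?\b",
         r"\bowner(ship)?\b|\bbelongs? to\b", True),
    Rule("webhook-idempotency", r"\bwebhooks?\b",
         r"\bidempoten(t|cy)\b|\bdeduplicat", True),
    Rule("token-expiry", r"\btokens?\b", r"\bexpir", False),
)


def review(spec_text: str) -> tuple[str, list[str]]:
    """Return (verdict, sorted IDs of the rules that fired) for a spec."""
    text = spec_text.lower()
    fired = [
        rule for rule in RULES
        if re.search(rule.trigger, text) and not re.search(rule.required, text)
    ]
    if any(rule.blocking for rule in fired):
        verdict = "NOT_READY"
    elif fired:
        verdict = "READY_WITH_WARNINGS"
    else:
        verdict = "READY"
    return verdict, sorted(rule.rule_id for rule in fired)
```

Because `review` is a pure function of the spec text, it can run on every commit at negligible cost. The obvious trade-off is that keyword rules like these miss anything phrased in a way the patterns do not anticipate.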

LLM-based review exists as an optional deeper layer, not the foundation. There is an optional OpenAI/Codex-based review mode, but for now I treat it as secondary to the default workflow.

Codex Plugin

In v0.4.0 I added an MVP Codex plugin.

Install:

pip install spec-guard
specguard --help

codex plugin marketplace add KoreaNirsa/spec-guard --ref main

Create an example spec package:

specguard example copy specs/your-feature-name --force

Inside Codex, the plugin can:

  • run SpecGuard analysis
  • read generated results
  • summarize READY/NOT_READY state
  • explain main findings and next actions

The plugin itself does not reimplement the engine.

It wraps the existing CLI workflow.

GitHub PR Review Workflow

SpecGuard also includes a GitHub Actions-based PR review workflow.

When a spec package changes in a PR, it can automatically run SpecGuard Review and leave findings directly on the PR.

The OpenAI review path currently uses GitHub secrets such as:

SPECGUARD_OPENAI_API_KEY
SPECGUARD_PR_REVIEW_MODEL
SPECGUARD_REVIEW_SPEC_PATHS

Current Status

This project is still very early and pre-beta.

I do not expect it to perfectly judge every specification.

Right now I am mainly interested in feedback around:

  • what kinds of specs this workflow fits well
  • where deterministic checks break down
  • which findings feel too noisy or too weak
  • whether PR enforcement would fit real engineering workflows

If you are already using AI coding agents in production workflows, I’d genuinely like to know:

  • what kinds of spec failures you see most often
  • and whether something like this would actually fit your development workflow

I’m especially interested in situations where the generated implementation looked correct, but the requirement itself was underspecified.

Feedback, issues, and PRs are all welcome.

KoreaNirsa / spec-guard

Validation-First Workflow (VFW) for AI-assisted development

SpecGuard

SpecGuard blocks weak specs before AI coding agents turn them into defective code.

SpecGuard is a Validation-First Workflow (VFW) for AI-assisted development. It turns specs into reviewed, testable, implementation-ready packages before AI coding begins.

It is not a prompt-to-code generator. SpecGuard helps you prepare an approved spec package before an external Codex, Claude Code, or another coding agent writes application code.

Demo Video

The demo follows this flow:

  1. Install SpecGuard with pip install spec-guard.
  2. Copy the example spec with specguard example copy your-feature-name --force.
  3. Insert a vulnerable spec. In v0.3.0, the packaged example intentionally includes a vulnerable spec by default so users can see a blocking SpecGuard Review.
  4. Review the SpecGuard findings.
  5. Fix the weak areas directly, or ask an AI assistant to strengthen the spec by giving it the SpecGuard Review findings.
  6. Run SpecGuard Review again and confirm it reaches READY…
