DEV Community

Ariel Frischer

You're a slop coder. Autospec is for professionals only.

If you type "add user auth" into Claude and ship whatever comes back, you're not engineering. You're contributing to AI slop. Stop it.

Andrej Karpathy coined "vibe coding" in early 2025 — type a prompt, accept the output, move on. It felt like a superpower. Then the data came in. Experienced developers using AI tools were 19% slower on real codebases [1], and AI co-authored PRs had 1.7x more major issues [2]. Faster keystrokes, worse software.

The models keep improving — but better generation doesn't fix misaligned intent or the cascade of design decisions that follow. That's what autospec solves.

Vibe coding fails at alignment, not generation

Modern models can reason about architecture, decompose problems, and generate plausible code. None of that matters if they're solving the wrong problem.

When you type "add user auth," the model guesses: OAuth or email/password? Sessions or JWTs? Middleware placement? Error response format? You discover which guesses were wrong after the code exists. That's the misalignment problem. No amount of model intelligence fixes it because the model never had your intent in the first place.

Spec-driven development solves this. The workflow is spec → plan → tasks → implement. Instead of jumping straight to code, the first step generates a spec.yaml — a structured artifact with requirements, acceptance criteria, edge cases, and constraints, all shaped by your project's constitution.yaml. From there you iterate on the spec: edit it by hand, or use autospec clarify to open an interactive session where you and the AI refine scope, resolve ambiguities, and tighten requirements until the spec actually captures your intent. Only then does planning and implementation begin, carrying that alignment forward.

How autospec enforces this

autospec is a streamlined open-source spec-driven workflow that orchestrates Claude Code and/or OpenCode agents.

Constitution first. A constitution defines your project's non-negotiable rules — quality standards, architectural constraints, security requirements — with explicit priority levels and enforcement mechanisms. autospec infers initial principles from your codebase (Makefile targets, CI config, README) and you refine from there. Every command runs under these constraints.

# .autospec/constitution.yaml (trimmed)
preamble: |
  autospec is a Go CLI that orchestrates AI-driven specification workflows.
  These principles ensure code quality, maintainability, and reliable execution.

principles:
  - name: "Test-First Development"
    id: "PRIN-001"
    priority: "NON-NEGOTIABLE"
    description: "Tests written before implementation. Tests define behavior."
    enforcement:
      - mechanism: "CI pipeline"
        description: "Build fails if tests fail"
    exceptions:
      - "Prototype/spike code explicitly marked as such"

  - name: "Idiomatic Go"
    id: "PRIN-002"
    priority: "MUST"
    description: "Follow Go community conventions."
    enforcement:
      - mechanism: "Code review and linting"
        description: "golangci-lint + reviewer verification"

  - name: "Performance Standards"
    id: "PRIN-003"
    priority: "MUST"
    description: "Validation <10ms, config <100ms, user ops <1s."

  - name: "Actionable Errors"
    id: "PRIN-007"
    priority: "MUST"
    description: "Errors include context, expected vs actual, and fix hints."

sections:
  - name: "Go Idioms"
    content: |
      Error handling: Wrap with context using fmt.Errorf("doing X: %w", err).
      Table tests: Use map[string]struct{} with t.Run and t.Parallel().
      Functions: Keep under 40 lines, extract helpers as needed.
      Interfaces: Accept interfaces, return concrete types.

Every principle has an ID, a priority level (NON-NEGOTIABLE, MUST, SHOULD, MAY), enforcement mechanisms, and documented exceptions. The constitution also includes project-specific sections — coding idioms, naming conventions, quality gates — that get injected into every autospec session so the AI operates under the same constraints your team does.
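As a rough sketch of what "programmatically enforceable" means here (this is illustrative Go, not autospec's actual source; the `Principle` struct and function names are assumptions based on the YAML above), validating a parsed constitution entry might look like:

```go
package main

import (
	"fmt"
	"regexp"
)

// Principle mirrors one entry under `principles:` in constitution.yaml.
// Field names are inferred from the YAML shown above, not autospec's real types.
type Principle struct {
	Name     string
	ID       string
	Priority string
}

// The four priority levels named in the article.
var validPriorities = map[string]bool{
	"NON-NEGOTIABLE": true, "MUST": true, "SHOULD": true, "MAY": true,
}

var idPattern = regexp.MustCompile(`^PRIN-\d{3}$`)

// ValidatePrinciple collects every problem found, so a caller can report
// all constitution errors at once instead of failing on the first.
func ValidatePrinciple(p Principle) []string {
	var errs []string
	if p.Name == "" {
		errs = append(errs, "principle is missing a name")
	}
	if !idPattern.MatchString(p.ID) {
		errs = append(errs, fmt.Sprintf("%q: id %q does not match PRIN-NNN", p.Name, p.ID))
	}
	if !validPriorities[p.Priority] {
		errs = append(errs, fmt.Sprintf("%q: unknown priority %q", p.Name, p.Priority))
	}
	return errs
}

func main() {
	bad := Principle{Name: "Test-First Development", ID: "PRIN-1", Priority: "CRITICAL"}
	for _, e := range ValidatePrinciple(bad) {
		fmt.Println("constitution error:", e)
	}
}
```

The point is that a structured constitution can fail fast with specific, actionable errors, which markdown conventions cannot.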

Structured stages. The core workflow runs spec → plan → tasks → implement. From a plain-English feature description, autospec generates a spec.yaml, then a plan.yaml, then tasks.yaml. Code only gets written after all three exist and are valid.

Here's what a real spec.yaml looks like for "add user authentication":

# specs/001-user-auth/spec.yaml (trimmed)
feature:
  branch: "001-user-auth"
  status: "Draft"
  input: "Add user authentication to the application"

user_stories:
  - id: "US-001"
    title: "User can log in with email and password"
    priority: "P1"
    as_a: "registered user"
    i_want: "to log in with my email and password"
    so_that: "I can access my account"
    acceptance_scenarios:
      - given: "I have a registered account"
        when: "I submit valid credentials"
        then: "I am logged in and redirected to dashboard"

requirements:
  functional:
    - id: "FR-001"
      description: "MUST support email/password authentication"
      testable: true
      acceptance_criteria: "Users can log in with valid email and password"
    - id: "FR-002"
      description: "MUST hash passwords before storage"
      testable: true
      acceptance_criteria: "Passwords are stored using bcrypt with cost factor 12"
  non_functional:
    - id: "NFR-002"
      category: "security"
      description: "Must rate limit login attempts"
      measurable_target: "Max 5 attempts per minute per IP"

edge_cases:
  - scenario: "User enters email with different case"
    expected_behavior: "Email comparison is case-insensitive"
  - scenario: "Session token expires during active use"
    expected_behavior: "User is prompted to log in again"

out_of_scope:
  - "OAuth/social login integration"
  - "Two-factor authentication"

Every assumption, constraint, edge case, and requirement is explicit YAML — not markdown you eyeball, but structured data you can validate programmatically before the next stage runs.

Per-phase isolation. Tasks in tasks.yaml are grouped into phases — logical units like "setup," "core logic," "tests." Each phase runs in a fresh context window, so the agent isn't dragging 10,000 tokens of prior work into every call. We estimate a 38-task feature drops from ~$257 to ~$42 (83% cost reduction) with this approach, and it prevents context degradation — phase 4 executes with the same clarity as phase 1. As each task completes, autospec updates its status directly in tasks.yaml — progress is always visible and resumable. If a session gets interrupted, run autospec implement and it picks up exactly where you left off.
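The resumable phase loop can be sketched in a few lines of Go (again illustrative, with assumed `Task`/`Phase` shapes and `runTask` standing in for spawning an isolated agent session):

```go
package main

import "fmt"

// Task mirrors one entry in tasks.yaml; Phase groups them.
// These shapes are assumptions based on the article, not autospec's types.
type Task struct {
	ID     string
	Status string // "pending" or "done"
}

type Phase struct {
	Name  string
	Tasks []Task
}

// RunPhases executes each phase's pending tasks in order and marks them done
// as they complete. Because status lives in the data itself, a rerun skips
// finished work — that is what makes an interrupted run resumable.
func RunPhases(phases []Phase, runTask func(Task) error) error {
	for pi := range phases {
		for ti := range phases[pi].Tasks {
			t := &phases[pi].Tasks[ti]
			if t.Status == "done" {
				continue // already completed in a previous run
			}
			if err := runTask(*t); err != nil {
				return fmt.Errorf("phase %q, task %s: %w", phases[pi].Name, t.ID, err)
			}
			t.Status = "done" // autospec would write this back to tasks.yaml here
		}
	}
	return nil
}

func main() {
	phases := []Phase{
		{Name: "setup", Tasks: []Task{{ID: "T001", Status: "done"}, {ID: "T002", Status: "pending"}}},
		{Name: "core", Tasks: []Task{{ID: "T003", Status: "pending"}}},
	}
	_ = RunPhases(phases, func(t Task) error {
		fmt.Println("running", t.ID, "in a fresh session")
		return nil
	})
}
```

Each `runTask` call starting from a clean context is where the token savings come from: the agent sees the current task plus the relevant artifacts, not the full transcript of everything before it.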

Non-interactive by default. No back-and-forth chatting, no manual approvals mid-session. The AI gets instructions and builds. You review artifacts between stages, not during. Interactive mode (autospec clarify) exists for when you actually want a conversation to refine the spec.

Why autospec over GitHub Spec Kit?

autospec was inspired by GitHub Spec Kit and stays true to its core workflow: spec → plan → tasks → implement. That flow is the right idea. But Spec Kit's execution has real gaps that autospec closes:

| | GitHub Spec Kit | autospec |
| --- | --- | --- |
| Output format | Markdown | YAML — machine-readable, schema-validated |
| Validation | Manual review | Automatic, with retry logic on failure |
| Context efficiency | Full prompt each time | Per-phase/task session isolation (80%+ cost savings) |
| Phase orchestration | Manual | Automated, with dependency ordering |
| Status tracking | Manual | Auto-updates spec.yaml and tasks.yaml as work progresses |
| Implementation | Shell scripts | Go binary — type-safe, single install, cross-platform |

The biggest difference is validation. Because every artifact is structured YAML, autospec can programmatically validate each stage before the next one runs. Schema validation catches missing fields, invalid references, and structural errors — things you can't check against markdown. When validation fails, autospec feeds the specific errors back into the next AI call and the model self-corrects. No manual prompt editing.

The other difference is streamlined developer productivity. autospec runs the agent in non-interactive mode by default — no waiting on chat responses, no accepting edits one by one, no answering a stream of clarifying questions. It just generates what's needed at every stage. You review artifacts between stages, not during. When you do want a conversation — to refine scope or resolve ambiguities — every stage is also available as a Claude Code slash command (/autospec.clarify, /autospec.specify, etc.) for interactive use.

Vibe coding has no feedback loop except you re-reading output and rewording prompts. Spec-driven development has automated quality gates at every stage.

When specs are worth it

Not everything needs a spec. Here's the quick decision test:

Skip autospec — the task has zero design decisions and you can finish in under 30 minutes. Fix a typo, bump a dependency, add a nil check, rename a variable. Just do it.

Use autospec — the task involves 3+ design decisions and touches 3+ files. Adding rate limiting to an API? You're choosing between token bucket and sliding window, deciding on storage, bypass rules, error formats. A webhook delivery system? Signature scheme, retry policy, timeout handling. These are the tasks where vibe coding silently makes the wrong choice on decision #3 and you find out two hours later. Autospec surfaces all of them in the spec before any code exists.

Split first — the task would take more than two days or bundles 3+ independent features. "Add OAuth with Google, GitHub, SAML, and LDAP" is four specs, not one. Split by feature slice, layer, or user journey, then run autospec on each part.

AWS warned in 2026 that review capacity — not developer output — is now the bottleneck in delivery. With 46% of new code AI-generated, pipelines weren't designed for this volume. Specs give reviewers something to review besides a thousand-line diff. They read the spec, verify the intent, then check that the code matches. That's a fundamentally different (and faster) review loop.

Get started

GitHub: github.com/ariel-frischer/autospec

curl -fsSL https://raw.githubusercontent.com/ariel-frischer/autospec/main/install.sh | sh
cd your-project/                       # any git repo
autospec init                          # project setup: agent, permissions, constitution
autospec run -s "your feature here"   # generate spec → review it
autospec clarify                       # optional: interactive refinement with Claude
autospec run -pti                      # plan → tasks → implement

The first run takes a few minutes longer than vibe coding. Every run after saves you hours of debugging and rework.

Stop slop coding. Start using autospec.


[1] METR, "Early 2025 AI & Experienced Open-Source Dev Study"
[2] CodeRabbit, "State of AI vs Human Code Generation Report"
