yohey-w

Prompt Context Harness. I Did All Three. Just Hand It Requirements. CoDD — Coherence-Driven Development

The Problem Nobody Solved

Over the past two years, AI-assisted development went through three methodology shifts:

  1. Prompt Engineering (2023-2024) — How to ask AI the right question
  2. Context Engineering (2024-2025) — How to give AI the right background knowledge (Anthropic's take)
  3. Harness Engineering (2025-2026) — How to constrain AI with rules, hooks, and guardrails (coined by Mitchell Hashimoto)

I went through all three. I run a multi-agent system with 10 AI agents operating as a feudal Japanese military hierarchy (yes, really — details here). Anthropic's Building Effective Agents paper showed that the most effective agent implementations are simple, composable patterns. That philosophy runs through everything I built. After implementing every methodology on the list, I found a problem none of them address — and built a tool to solve it. The result: on a client LMS project, I handed AI requirements and constraints. It generated 18 design docs, all code, all tests. When design decisions changed mid-project, nothing broke.

What happens when something changes mid-development?

It's not just "requirements change." During any real project, things shift constantly:

  • Requirements get added or modified — which design docs are affected?
  • You revise the API design — what about the calling code and tests?
  • Database schema changes — are your migrations safe? What about RLS policies?
  • Tech stack decision changes — test strategy and infra design both break
  • A new constraint surfaces — how far does the ripple go?

Prompt engineering doesn't answer this. Context engineering doesn't answer this. Harness engineering doesn't answer this. They're all about "how to make AI build things." None of them address "how to keep things coherent when something changes."


The Context Engineering Ceiling

But there's a deeper issue worth unpacking. Context engineering's core tool is the persistent context file — CLAUDE.md, AGENTS.md, instructions. One file, all the background knowledge, always loaded.

That works for small projects. What happens when the project outgrows a single context window?

Requirements, system design, API specs, database design, test strategy, deployment config — you can't fit all of that into one file. And what overflows the context window is invisible to the AI. Invisible means nonexistent.

The architectural answer is structured design documents — which I'll introduce later in this post.

  • Decompose knowledge into structured documents — not one monolith file
  • Frontmatter declares what to read — the dependency graph identifies relevant docs
  • Load only what's needed — impact analysis tells you exactly which docs are affected
Prompt Engineering   → Cram everything into one prompt
Context Engineering  → Write everything into one persistent file (has limits)
???                  → Structured doc set + dependency graph → load only what's relevant

If CLAUDE.md is "everything in one place," structured design docs are "the right context at the right time." Keep reading to see what fills the ???.
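As a toy illustration of "the right context at the right time": once the dependency graph exists, loading only a doc's upstream dependencies is a short traversal. The node IDs and graph here are invented for the example, not taken from any real project:

```python
from collections import deque

# Toy graph extracted from frontmatter: doc -> docs it depends on.
# Node IDs are invented for this example.
DEPENDS_ON = {
    "design:api-design": ["design:system-design", "req:requirements"],
    "design:system-design": ["req:requirements"],
    "req:requirements": [],
}

def context_for(doc: str) -> list[str]:
    """Return the doc plus its transitive upstream docs -- the minimal
    set worth loading into the context window for a task on `doc`."""
    seen, queue = {doc}, deque([doc])
    while queue:
        for dep in DEPENDS_ON.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)

print(context_for("design:api-design"))
```

An unrelated doc never enters the result, so it never costs context-window space.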


The Landscape: Everyone Says "Spec First"

The "spec-driven" space is growing fast:

  • Spec Kit (GitHub official, 83k+ stars) — Write specs first, then generate code. Extensions include V-Model, Verify, Sync. GitHub Blog declares that "code serves specifications," not the other way around.
  • OpenSpec (35k+ stars) — Artifact dependency graphs and /opsx:verify. Positions itself as the lightweight alternative to Spec Kit.
  • Google Conductor — Persistent Markdown for plan-before-build with Automated Reviews.
  • cc-spec-driven — Claude Code plugin with bidirectional dependency tracking and quick impact analysis.

All of them agree: write specs first. And all of them share the same blind spot:

None of them tell you what to do when the spec changes.

They teach you how to start. They don't teach you how to keep going.


CoDD: Coherence-Driven Development

Nobody else was solving this, so I built it.

CoDD — Coherence-Driven Development. A methodology and CLI tool for maintaining design coherence when requirements change.

pip install codd-dev

CoDD sits as Layer 3 of harness engineering:

Harness (CLAUDE.md, Hooks, Skills)  ← Rules, guardrails, workflow
  └─ CoDD (methodology)             ← Operates within the harness
       └─ Design docs (docs/*.md)   ← Artifacts CoDD generates and maintains

It does three things:

  1. Build a dependency graph — Declare dependencies between design artifacts via Markdown frontmatter
  2. Analyze change impact — When anything changes, trace the graph to identify what's affected
  3. Maintain coherence — AI updates the affected docs, code, and tests
# Build the dependency graph from frontmatter
codd scan

# See what's affected by your last commit
codd impact

That's it. Two commands to answer "what broke?"


Vibe Coding vs Harness Engineering vs CoDD

Andrej Karpathy coined "vibe coding" in February 2025 — "see things, say things, run things, and vibe." Building without understanding the code. Here's how the three approaches compare across evaluation axes:

| Axis | Vibe Coding | Harness Engineering (existing) | CoDD |
|---|---|---|---|
| Initial speed | Excellent. "Build this" and go | Good after setup | Slower. Design docs come first |
| Change resilience | Poor. Rebuild from scratch | Moderate. Guardrails exist, but humans trace impact | Excellent. Dependency graph traces it automatically |
| Design traceability | None. Nobody knows why it's built this way | Partial. Rules in CLAUDE.md | Full. Frontmatter declares all dependencies |
| Setup cost | Zero | Medium. Hooks, CLAUDE.md, MCP config | Medium. Init + frontmatter in design docs |
| Large projects | Breaks down | Manageable with rules | Scales via dependency graph |
| AI runaway risk | High. AI builds the wrong thing perfectly | Moderate. Guardrails constrain | Low. Divergence from design is detectable |
| Best fit | Prototypes, throwaway code | Mid-size steady-state projects | Projects where things keep changing |

Vibe coding isn't bad. It's perfect for prototypes. Existing harness engineering works fine for stable requirements.

The question is what happens in real projects where things keep changing — requirements, API designs, database schemas, tech decisions. Vibe coding says "rebuild." Existing harness says "a human tracks the impact." CoDD says "the graph already knows."

Want speed? Vibe code. Want durability? CoDD. Want both? Prototype with vibe, build with CoDD.


Core Philosophy

Derive, Don't Configure

CoDD's central principle: if you know the upstream, the downstream is self-evident.

If system_design.md says "Next.js + Supabase," the test strategy is vitest + Playwright. No config file needed. Architecture determines testing. Always.

system_design.md → "Next.js + Supabase"
→ Test strategy: vitest (unit) + Playwright (E2E). Zero config.

system_design.md → "FastAPI + Python"
→ Test strategy: pytest + httpx. Zero config.

system_design.md → "CLI tool in Go"
→ Test strategy: go test. Zero config.

The world has too many config files. CoDD doesn't scatter configuration. Requirements + constraints in, design + implementation + tests out. The AI derives everything from the upstream artifacts and current best practices.

V-Model, Finally Achievable

The V-Model is the ideal development model:

Left side (Design)              Right side (Verification)
Requirements              ←→  System Test / E2E
  System Design           ←→  Integration Test
    Detailed Design (API)  ←→  Unit Test
      Implementation

It breaks down in practice because humans can't track which documents to update when requirements change. CoDD automates that change propagation. That's what makes the V-Model sustainable.

Waves — Execution Order Along the Dependency Graph

CoDD processes design documents in units called "Waves." Changes propagate from upstream to downstream through the dependency graph like waves — hence the name.

Wave 1: Requirements (no dependencies — the origin)
Wave 2: System Design (depends on requirements)
Wave 3: Detailed Design / API Design (depends on system design)
Wave 4: Test Strategy / Infrastructure (depends on detailed design)

Waves work in two directions:

  • Generation (upstream → downstream) — You can't write system design before requirements are defined. You can't write API design before system design exists. Wave 1 gets finalized first, then Wave 2, and so on.
  • Impact analysis (downstream → upstream lookup) — When DB design changes, trace the graph in reverse: "what depends on this?" to identify everything affected.

Both directions require the right order. Waves derive that order automatically from the dependency graph.
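Deriving wave numbers from the graph is a small computation: a doc with no dependencies is Wave 1, and every other doc sits one wave past its deepest dependency. A minimal sketch, with an invented toy graph:

```python
from functools import cache

# Toy graph: doc -> docs it depends on (invented node IDs).
DEPENDS_ON = {
    "req:requirements": [],
    "design:system-design": ["req:requirements"],
    "design:api-design": ["design:system-design"],
    "test:test-strategy": ["design:api-design"],
    "infra:infrastructure": ["design:api-design"],
}

@cache
def wave(node: str) -> int:
    """Wave 1 = no dependencies; otherwise one past the deepest dependency."""
    deps = DEPENDS_ON[node]
    return 1 if not deps else 1 + max(wave(d) for d in deps)

for node in sorted(DEPENDS_ON, key=wave):
    print(f"Wave {wave(node)}: {node}")
```

Generation walks waves in ascending order; impact analysis walks the same graph in reverse.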

Frontmatter as Single Source of Truth

Dependencies are declared in Markdown frontmatter:

---
codd:
  node_id: "design:api-design"
  depends_on:
    - id: "design:system-design"
      relation: derives_from
    - id: "req:lms-requirements-v2.0"
      relation: implements
---

graph.db is a cache. codd scan regenerates it every time. The frontmatter is the source of truth. No dual-maintained config files.

Important: humans don't write this frontmatter manually. AI does. The design docs, the frontmatter, the dependency declarations — AI generates all of it. Humans define requirements and constraints. That's it.
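For intuition, here is a deliberately simplified sketch of the scan step: pull the node ID and dependency edges out of a doc's frontmatter. CoDD presumably parses the YAML properly; this regex version is illustrative only and handles just the exact shape shown above:

```python
import re

# Sample doc with the frontmatter shape from this post.
DOC = """---
codd:
  node_id: "design:api-design"
  depends_on:
    - id: "design:system-design"
      relation: derives_from
    - id: "req:lms-requirements-v2.0"
      relation: implements
---
# API Design
"""

def parse_edges(markdown: str) -> list[tuple[str, str, str]]:
    """Extract (source, relation, target) edges from a frontmatter block.
    Simplified: real frontmatter parsing should use a YAML library."""
    m = re.match(r"---\n(.*?)\n---", markdown, re.DOTALL)
    if not m:
        return []
    fm = m.group(1)
    node = re.search(r'node_id:\s*"([^"]+)"', fm).group(1)
    deps = re.findall(r'- id:\s*"([^"]+)"\s*\n\s*relation:\s*(\w+)', fm)
    return [(node, relation, target) for target, relation in deps]

print(parse_edges(DOC))
```

Run over every doc, these edges become the graph that codd scan caches in graph.db.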


Commands

| Command | Status | What it does |
|---|---|---|
| codd init | Stable | Project initialization |
| codd scan | Stable | Build dependency graph |
| codd impact | Stable | Change impact analysis (Green/Amber/Gray bands) |
| codd validate | Alpha | Frontmatter consistency check |
| codd generate | Experimental | Generate design docs in Wave order |
| codd plan | Experimental | Wave execution status |
| codd verify | Experimental | V-Model verification (type check + test → design traceability) |
| codd implement | Experimental | Design → code generation |

Impact analysis returns three confidence bands:

  • Green — High confidence. Definitely affected.
  • Amber — Medium confidence. Check needed.
  • Gray — Low confidence. Possible indirect impact.
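The exact banding heuristics are CoDD internals, but a caricature shows the shape of the decision. The thresholds and the "test docs always get review" rule below are my assumptions, chosen to be consistent with the demo output later in this post, not CoDD's actual logic:

```python
def band(node_id: str, confidence: float) -> str:
    """Toy banding rule. ASSUMPTION: the thresholds and the
    'always review test/ops docs' rule are invented for illustration."""
    kind = node_id.split(":")[0]          # e.g. "design", "test", "req"
    if confidence < 0.5:
        return "Gray"                     # possible indirect impact
    if kind in {"test", "operations"}:
        return "Amber"                    # human review before updating
    return "Green" if confidence >= 0.8 else "Amber"

print(band("design:api-design", 0.90))   # high-confidence design doc
print(band("test:test-strategy", 0.90))  # test docs stay human-gated
```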

CLI vs Claude Code Skills

CoDD has two usage modes: run the CLI directly, or integrate as Claude Code Skills.

CLI (works with any editor)

codd scan
codd impact

Run from the terminal. No editor or harness dependency. Works with Cursor, Copilot, or anything else. Good for CI/CD pipelines too.

Claude Code Skills (recommended: integrates into the dev loop)

CoDD ships with slash-command Skills for Claude Code:

/codd-init      → Project initialization
/codd-scan      → Build dependency graph
/codd-impact    → Change impact analysis
/codd-validate  → Frontmatter consistency check
/codd-generate  → Generate design docs in Wave order

With Skills, building the same TaskFlow task management app looks like this:

You:  /codd-init
      → Claude: codd init --project-name "taskflow" --language "typescript" \
                  --requirements spec.txt

You:  /codd-generate
      → Claude: codd generate --wave 2 --path .
      → Claude reads every generated doc, checks scope, validates frontmatter
      → "Wave 2 design docs reviewed. Proceed to Wave 3?"

You:  yes

You:  /codd-generate
      → Claude: codd generate --wave 3 --path .

You:  /codd-scan
      → Claude: codd scan --path .
      → "7 documents, 15 edges. No warnings."

You:  (edit requirements — add SSO + audit logging)

You:  /codd-impact
      → Claude: codd impact --path .
      → Green Band: auto-updates system-design, api-design, db-design, auth-design
      → Amber Band: "test-strategy is affected. Update it?"

The key difference is that HITL (human-in-the-loop) gates are automatic. /codd-generate pauses between waves for approval. /codd-impact follows the Green/Amber/Gray protocol — auto-updating safe changes, asking before risky ones. You just make decisions.

Now combine Skills with hooks for zero-maintenance coherence:

// .claude/settings.json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{
        "type": "command",
        "async": true,
        "command": "cd \"$CLAUDE_PROJECT_DIR\" && codd scan --path ."
      }]
    }]
  }
}
  • PostToolUse hook: Every time AI edits a file, codd scan runs automatically. The dependency graph stays current.
  • pre-commit hook: Wire codd validate into git hooks so broken coherence can't be committed.

Once you add that hook, you never run codd scan manually again. Graph maintenance becomes invisible. Your workflow is just: edit files normally, run /codd-impact when you want to know what's affected. That's it.

CLI is the manual tool. Skills are the integrated workflow. CLI alone is enough, but Skills + hooks mean coherence is maintained without you ever thinking about it. Set it once. The hook does the rest.
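The pre-commit gate can be a few lines. Here is a sketch of what a Python hook script (a hypothetical .git/hooks/pre-commit file) might look like; it assumes codd is on PATH:

```python
#!/usr/bin/env python3
# Sketch of a .git/hooks/pre-commit gate: refuse the commit if
# `codd validate` reports broken coherence. Illustrative only.
import subprocess
import sys

def run_gate(cmd: list[str]) -> int:
    """Run a check command; forward its output when it fails."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        sys.stderr.write(result.stdout + result.stderr)
    return result.returncode

# In the actual hook file, the script would end with:
#     sys.exit(run_gate(["codd", "validate"]))
```

Git aborts the commit on any nonzero exit, so broken frontmatter never lands in history.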

Full setup details: Claude Code x CoDD Setup Guide


5-Minute Demo — See CoDD in Action

We'll build TaskFlow, a task management app. Write requirements in plain text, let CoDD + AI handle the rest.

Step 1: Write requirements (any format — txt, md, doc)

# TaskFlow — Requirements

## Functional Requirements
- User auth (email + Google OAuth)
- Workspace management (teams, roles, invites)
- Task CRUD with assignees, labels, due dates
- Real-time updates (WebSocket)
- File attachments (S3)
- Notification system (in-app + email)

## Constraints
- Next.js + Prisma + PostgreSQL
- Row-level security for workspace isolation
- All API endpoints rate-limited

Save as spec.txt. No special formatting needed.

Step 2: Initialize CoDD

pip install codd-dev
mkdir taskflow && cd taskflow && git init
codd init --project-name "taskflow" --language "typescript" \
  --requirements spec.txt

CoDD adds frontmatter automatically — node_id, type, dependency metadata. You never touch it.

Step 3: AI generates design docs

codd generate --wave 2   # System design + API design
codd generate --wave 3   # DB design + Auth design
codd generate --wave 4   # Test strategy

wave_config is auto-generated from your requirements. Each design doc gets CoDD frontmatter — all derived, nothing manual.

Step 4: Build the graph → Change requirements → See impact

codd scan
# → 7 docs, 15 edges, zero config

PM asks for SSO and audit logging. Open docs/requirements/requirements.md and add:

## Additional Requirements (v1.1)
- SAML SSO (enterprise customers)
- Audit logging (record & export all operations)

Save the file and ask CoDD what's affected:

codd impact    # detects uncommitted changes automatically
Changed files: 1
  - docs/requirements/requirements.md → req:taskflow-requirements

## Green Band (high confidence, auto-propagate)
| Target                  | Depth | Confidence |
|-------------------------|-------|------------|
| design:system-design    | 1     | 0.90       |
| design:api-design       | 1     | 0.90       |
| detail:db-design        | 2     | 0.90       |
| detail:auth-design      | 2     | 0.90       |

## Amber Band (must review)
| Target                  | Depth | Confidence |
|-------------------------|-------|------------|
| test:test-strategy      | 2     | 0.90       |

2 lines changed → 6 out of 7 docs affected. Green band: AI auto-updates. Amber: human reviews. You know exactly what to fix before anything breaks.
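The traversal behind that table is a reverse walk of the dependency graph. A sketch, with the reverse adjacency hand-built to mirror the demo (the real tool derives it by inverting the frontmatter edges; confidences omitted):

```python
from collections import deque

# Reverse adjacency: doc -> docs that depend on it. Hand-built here
# to mirror the demo's dependency structure.
DEPENDENTS = {
    "req:taskflow-requirements": ["design:system-design", "design:api-design"],
    "design:system-design": ["detail:db-design", "detail:auth-design"],
    "design:api-design": ["test:test-strategy"],
}

def impact(changed: str) -> dict[str, int]:
    """Breadth-first walk downstream from the changed doc.
    Values are dependency depths, as in the impact table."""
    depths, queue = {}, deque([(changed, 0)])
    while queue:
        node, d = queue.popleft()
        for dependent in DEPENDENTS.get(node, []):
            if dependent not in depths:
                depths[dependent] = d + 1
                queue.append((dependent, d + 1))
    return depths

print(impact("req:taskflow-requirements"))
```

Depth 1 targets are direct dependents of the requirements; depth 2 targets are reached through the design docs, matching the table above.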


Real-World Usage

I've used CoDD on a production web app — 18 design docs connected by a dependency graph.

Every design doc, every line of code, every test was generated by AI following CoDD. I defined requirements and constraints. That's it. You can add HITL (human-in-the-loop) checkpoints at each Wave if you want, or let AI run end-to-end and review only the final output. Your call.

docs/
├── requirements/       # Requirements (plain text — the only human input)
├── design/             # System design (6 files, AI-generated)
├── detailed_design/    # Detailed design (4 files, AI-generated)
├── governance/         # ADRs (3 files)
├── plan/               # Implementation plan
├── test/               # Acceptance criteria, test strategy
├── operations/         # Runbooks
└── infra/              # Infrastructure design

18 Markdown files connected by a dependency graph.

As development progressed, design decisions naturally evolved. The initial RLS approach for tenant isolation needed refinement as detailed design revealed better table structures. That changed the DB design, which changed the API design, which changed the test strategy.

This isn't a "requirements change." It's the normal process of refining decisions as you learn more. It happens on every project.

Without CoDD, a human tracks which of 18 interconnected docs need updating. With CoDD, codd impact lists the affected artifacts with confidence bands. AI auto-updates Green band docs and surfaces Amber band items for human review.

Changes stop being scary. That's CoDD's real value. When the cost of changing a design decision drops, you can make provisional decisions early and refine them later — without the design rotting from stale assumptions that nobody updated.


What's Next

scan, impact, and validate are stable and production-tested. generate, verify, and implement are experimental. CoDD is public alpha today.

Near-term roadmap:

  • Typed dependency relations — requires, affects, verifies, implements as distinct relation types
  • Brownfield support (codd extract) — Reverse-generate dependency-graphed design docs from existing codebases. Bring CoDD to legacy code maintenance and enhancement, not just greenfield
  • codd verify completion — Automated coherence verification across design ↔ code ↔ tests
  • Multi-harness integration examples — Documented workflows for Claude Code, Copilot, and Cursor

Long-term, CoDD aims to stand alongside TDD, BDD, and DDD as a development methodology. Not test-driven. Not behavior-driven. Not domain-driven. Coherence-driven. Development where anything can change without the design falling apart.


The Evolution, Summarized

| Methodology | Problem it solves | Era |
|---|---|---|
| Prompt Engineering | How to craft AI input | 2023-2024 |
| Context Engineering | How to manage AI's background knowledge | 2024-2025 |
| Harness Engineering | How to constrain AI execution | 2025-2026 |
| └─ CoDD | How to maintain coherence when things change | 2026+ |

CoDD is part of harness engineering — the layer nobody built yet. Existing harness engineering solves "how do I constrain AI execution." CoDD solves "how do I keep what AI built from falling apart," within the same harness framework.

Spec Kit and OpenSpec answer "how do I start?"
CoDD answers "how do I keep going when things change?"

pip install codd-dev

GitHub: yohey-w/codd-dev


So — what does your project do when things change mid-development?


I also write about multi-agent systems and AI development in Japanese on Zenn. If you read Japanese (or are curious enough to translate), the CoDD deep-dive is here.
