yohey-w

Prompt Context Harness. I Did All Three. Just Hand It Requirements. CoDD — Coherence-Driven Development

The Problem Nobody Solved

Over the past two years, AI-assisted development went through three methodology shifts:

  1. Prompt Engineering (2023-2024) — How to ask AI the right question
  2. Context Engineering (2024-2025) — How to give AI the right background knowledge (Anthropic's take)
  3. Harness Engineering (2025-2026) — How to constrain AI with rules, hooks, and guardrails (coined by Mitchell Hashimoto)

I went through all three. I run a multi-agent system with 10 AI agents operating as a feudal Japanese military hierarchy (yes, really — details here). Anthropic's Building Effective Agents paper showed that the most effective agent implementations are simple, composable patterns. That philosophy runs through everything I built. After implementing every methodology on the list, I found a problem none of them address — and built a tool to solve it. The result: on a client LMS project, I handed AI requirements and constraints. It generated 18 design docs, all code, all tests. When design decisions changed mid-project, nothing broke.

What happens when something changes mid-development?

It's not just "requirements change." During any real project, things shift constantly:

  • Requirements get added or modified — which design docs are affected?
  • You revise the API design — what about the calling code and tests?
  • Database schema changes — are your migrations safe? What about RLS policies?
  • Tech stack decision changes — test strategy and infra design both break
  • A new constraint surfaces — how far does the ripple go?

Prompt engineering doesn't answer this. Context engineering doesn't answer this. Harness engineering doesn't answer this. They're all about "how to make AI build things." None of them address "how to keep things coherent when something changes."


The Context Engineering Ceiling

But there's a deeper issue worth unpacking. Context engineering's core tool is the persistent context file — CLAUDE.md, AGENTS.md, instructions. One file, all the background knowledge, always loaded.

That works for small projects. What happens when the project outgrows a single context window?

Requirements, system design, API specs, database design, test strategy, deployment config — you can't fit all of that into one file. And what overflows the context window is invisible to the AI. Invisible means nonexistent.

The architectural answer is structured design documents — which I'll introduce later in this post.

  • Decompose knowledge into structured documents — not one monolith file
  • Frontmatter declares what to read — the dependency graph identifies relevant docs
  • Load only what's needed — impact analysis tells you exactly which docs are affected
Prompt Engineering   → Cram everything into one prompt
Context Engineering  → Write everything into one persistent file (has limits)
???                  → Structured doc set + dependency graph → load only what's relevant

If CLAUDE.md is "everything in one place," structured design docs are "the right context at the right time." Keep reading to see what fills the ???.
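As a toy illustration of "the right context at the right time": once the dependency graph exists, loading only a doc's upstream dependencies is a short traversal. The node IDs and graph here are invented for the example, not taken from any real project:

```python
from collections import deque

# Toy graph extracted from frontmatter: doc -> docs it depends on.
# Node IDs are invented for this example.
DEPENDS_ON = {
    "design:api-design": ["design:system-design", "req:requirements"],
    "design:system-design": ["req:requirements"],
    "req:requirements": [],
}

def context_for(doc: str) -> list[str]:
    """Return the doc plus its transitive upstream docs -- the minimal
    set worth loading into the context window for a task on `doc`."""
    seen, queue = {doc}, deque([doc])
    while queue:
        for dep in DEPENDS_ON.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)

print(context_for("design:api-design"))
```

An unrelated doc never enters the result, so it never costs context-window space.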


The Landscape: Everyone Says "Spec First"

The "spec-driven" space is growing fast:

  • Spec Kit (GitHub official, 83k+ stars) — Write specs first, then generate code. Extensions include V-Model, Verify, Sync. GitHub Blog declares that "code serves specifications," not the other way around.
  • OpenSpec (35k+ stars) — Artifact dependency graphs and /opsx:verify. Positions itself as the lightweight alternative to Spec Kit.
  • Google Conductor — Persistent Markdown for plan-before-build with Automated Reviews.
  • cc-spec-driven — Claude Code plugin with bidirectional dependency tracking and quick impact analysis.

All of them agree: write specs first. And all of them share the same blind spot:

None of them tell you what to do when the spec changes.

They teach you how to start. They don't teach you how to keep going.


CoDD: Coherence-Driven Development

Nobody else was solving this, so I built it.

CoDD — Coherence-Driven Development. A methodology and CLI tool for maintaining design coherence when requirements change.

pip install codd-dev

CoDD sits as Layer 3 of harness engineering:

Harness (CLAUDE.md, Hooks, Skills)  ← Rules, guardrails, workflow
  └─ CoDD (methodology)             ← Operates within the harness
       └─ Design docs (docs/*.md)   ← Artifacts CoDD generates and maintains

It does three things:

  1. Build a dependency graph — Declare dependencies between design artifacts via Markdown frontmatter
  2. Analyze change impact — When anything changes, trace the graph to identify what's affected
  3. Maintain coherence — AI updates the affected docs, code, and tests
# Build the dependency graph from frontmatter
codd scan

# See what's affected by your last commit
codd impact

That's it. Two commands to answer "what broke?"


Vibe Coding vs Harness Engineering vs CoDD

Andrej Karpathy coined "vibe coding" in February 2025 — "see things, say things, run things, and vibe." Building without understanding the code. Here's how the three approaches compare across evaluation axes:

| Axis | Vibe Coding | Harness Engineering (existing) | CoDD |
|---|---|---|---|
| Initial speed | Excellent. "Build this" and go | Good after setup | Slower. Design docs come first |
| Change resilience | Poor. Rebuild from scratch | Moderate. Guardrails exist, but humans trace impact | Excellent. Dependency graph traces it automatically |
| Design traceability | None. Nobody knows why it's built this way | Partial. Rules in CLAUDE.md | Full. Frontmatter declares all dependencies |
| Setup cost | Zero | Medium. Hooks, CLAUDE.md, MCP config | Medium. Init + frontmatter in design docs |
| Large projects | Breaks down | Manageable with rules | Scales via dependency graph |
| AI runaway risk | High. AI builds the wrong thing perfectly | Moderate. Guardrails constrain | Low. Divergence from design is detectable |
| Best fit | Prototypes, throwaway code | Mid-size steady-state projects | Projects where things keep changing |

Vibe coding isn't bad. It's perfect for prototypes. Existing harness engineering works fine for stable requirements.

The question is what happens in real projects where things keep changing — requirements, API designs, database schemas, tech decisions. Vibe coding says "rebuild." Existing harness says "a human tracks the impact." CoDD says "the graph already knows."

Want speed? Vibe code. Want durability? CoDD. Want both? Prototype with vibe, build with CoDD.


Core Philosophy

Derive, Don't Configure

CoDD's central principle: if you know the upstream, the downstream is self-evident.

If system_design.md says "Next.js + Supabase," the test strategy is vitest + Playwright. No config file needed. Architecture determines testing. Always.

system_design.md → "Next.js + Supabase"
→ Test strategy: vitest (unit) + Playwright (E2E). Zero config.

system_design.md → "FastAPI + Python"
→ Test strategy: pytest + httpx. Zero config.

system_design.md → "CLI tool in Go"
→ Test strategy: go test. Zero config.

The world has too many config files. CoDD doesn't scatter configuration. Requirements + constraints in, design + implementation + tests out. The AI derives everything from the upstream artifacts and current best practices.

V-Model, Finally Achievable

The V-Model is the ideal development model:

Left side (Design)              Right side (Verification)
Requirements              ←→  System Test / E2E
  System Design           ←→  Integration Test
    Detailed Design (API)  ←→  Unit Test
      Implementation

It breaks down in practice because humans can't track which documents to update when requirements change. CoDD automates that change propagation. That's what makes the V-Model sustainable.

Waves — Execution Order Along the Dependency Graph

CoDD processes design documents in units called "Waves." Changes propagate from upstream to downstream through the dependency graph like waves — hence the name.

Wave 1: Requirements (no dependencies — the origin)
Wave 2: System Design (depends on requirements)
Wave 3: Detailed Design / API Design (depends on system design)
Wave 4: Test Strategy / Infrastructure (depends on detailed design)

Waves work in two directions:

  • Generation (upstream → downstream) — You can't write system design before requirements are defined. You can't write API design before system design exists. Wave 1 gets finalized first, then Wave 2, and so on.
  • Impact analysis (downstream → upstream lookup) — When DB design changes, trace the graph in reverse: "what depends on this?" to identify everything affected.

Both directions require the right order. Waves derive that order automatically from the dependency graph.
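Deriving wave numbers from the graph is a small computation: a doc with no dependencies is Wave 1, and every other doc sits one wave past its deepest dependency. A minimal sketch, with an invented toy graph:

```python
from functools import cache

# Toy graph: doc -> docs it depends on (invented node IDs).
DEPENDS_ON = {
    "req:requirements": [],
    "design:system-design": ["req:requirements"],
    "design:api-design": ["design:system-design"],
    "test:test-strategy": ["design:api-design"],
    "infra:infrastructure": ["design:api-design"],
}

@cache
def wave(node: str) -> int:
    """Wave 1 = no dependencies; otherwise one past the deepest dependency."""
    deps = DEPENDS_ON[node]
    return 1 if not deps else 1 + max(wave(d) for d in deps)

for node in sorted(DEPENDS_ON, key=wave):
    print(f"Wave {wave(node)}: {node}")
```

Generation walks waves in ascending order; impact analysis walks the same graph in reverse.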

Frontmatter as Single Source of Truth

Dependencies are declared in Markdown frontmatter:

---
codd:
  node_id: "design:api-design"
  depends_on:
    - id: "design:system-design"
      relation: derives_from
    - id: "req:lms-requirements-v2.0"
      relation: implements
---

graph.db is a cache. codd scan regenerates it every time. The frontmatter is the source of truth. No dual-maintained config files.

Important: humans don't write this frontmatter manually. AI does. The design docs, the frontmatter, the dependency declarations — AI generates all of it. Humans define requirements and constraints. That's it.
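For intuition, here is a deliberately simplified sketch of the scan step: pull the node ID and dependency edges out of a doc's frontmatter. CoDD presumably parses the YAML properly; this regex version is illustrative only and handles just the exact shape shown above:

```python
import re

# Sample doc with the frontmatter shape from this post.
DOC = """---
codd:
  node_id: "design:api-design"
  depends_on:
    - id: "design:system-design"
      relation: derives_from
    - id: "req:lms-requirements-v2.0"
      relation: implements
---
# API Design
"""

def parse_edges(markdown: str) -> list[tuple[str, str, str]]:
    """Extract (source, relation, target) edges from a frontmatter block.
    Simplified: real frontmatter parsing should use a YAML library."""
    m = re.match(r"---\n(.*?)\n---", markdown, re.DOTALL)
    if not m:
        return []
    fm = m.group(1)
    node = re.search(r'node_id:\s*"([^"]+)"', fm).group(1)
    deps = re.findall(r'- id:\s*"([^"]+)"\s*\n\s*relation:\s*(\w+)', fm)
    return [(node, relation, target) for target, relation in deps]

print(parse_edges(DOC))
```

Run over every doc, these edges become the graph that codd scan caches in graph.db.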


Commands

| Command | Status | What it does |
|---|---|---|
| codd init | Stable | Project initialization |
| codd scan | Stable | Build dependency graph |
| codd impact | Stable | Change impact analysis (Green/Amber/Gray bands) |
| codd validate | Alpha | Frontmatter consistency check |
| codd generate | Experimental | Generate design docs in Wave order |
| codd plan | Experimental | Wave execution status |
| codd verify | Experimental | V-Model verification (type check + test → design traceability) |
| codd implement | Experimental | Design → code generation |

Impact analysis returns three confidence bands:

  • Green — High confidence. Definitely affected.
  • Amber — Medium confidence. Check needed.
  • Gray — Low confidence. Possible indirect impact.
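The exact banding heuristics are CoDD internals, but a caricature shows the shape of the decision. The thresholds and the "test docs always get review" rule below are my assumptions, chosen to be consistent with the demo output later in this post, not CoDD's actual logic:

```python
def band(node_id: str, confidence: float) -> str:
    """Toy banding rule. ASSUMPTION: the thresholds and the
    'always review test/ops docs' rule are invented for illustration."""
    kind = node_id.split(":")[0]          # e.g. "design", "test", "req"
    if confidence < 0.5:
        return "Gray"                     # possible indirect impact
    if kind in {"test", "operations"}:
        return "Amber"                    # human review before updating
    return "Green" if confidence >= 0.8 else "Amber"

print(band("design:api-design", 0.90))   # high-confidence design doc
print(band("test:test-strategy", 0.90))  # test docs stay human-gated
```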

CLI vs Claude Code Skills

CoDD has two usage modes: run the CLI directly, or integrate as Claude Code Skills.

CLI (works with any editor)

codd scan
codd impact

Run from the terminal. No editor or harness dependency. Works with Cursor, Copilot, or anything else. Good for CI/CD pipelines too.

Claude Code Skills (recommended: integrates into the dev loop)

CoDD ships with slash-command Skills for Claude Code:

/codd-init      → Project initialization
/codd-scan      → Build dependency graph
/codd-impact    → Change impact analysis
/codd-validate  → Frontmatter consistency check
/codd-generate  → Generate design docs in Wave order

With Skills, building the same TaskFlow task management app looks like this:

You:  /codd-init
      → Claude: codd init --project-name "taskflow" --language "typescript" \
                  --requirements spec.txt

You:  /codd-generate
      → Claude: codd generate --wave 2 --path .
      → Claude reads every generated doc, checks scope, validates frontmatter
      → "Wave 2 design docs reviewed. Proceed to Wave 3?"

You:  yes

You:  /codd-generate
      → Claude: codd generate --wave 3 --path .

You:  /codd-scan
      → Claude: codd scan --path .
      → "7 documents, 15 edges. No warnings."

You:  (edit requirements — add SSO + audit logging)

You:  /codd-impact
      → Claude: codd impact --path .
      → Green Band: auto-updates system-design, api-design, db-design, auth-design
      → Amber Band: "test-strategy is affected. Update it?"

The key difference is that HITL (human-in-the-loop) gates are automatic. /codd-generate pauses between waves for approval. /codd-impact follows the Green/Amber/Gray protocol — auto-updating safe changes, asking before risky ones. You just make decisions.

Now combine Skills with hooks for zero-maintenance coherence:

// .claude/settings.json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{
        "type": "command",
        "async": true,
        "command": "cd \"$CLAUDE_PROJECT_DIR\" && codd scan --path ."
      }]
    }]
  }
}
  • PostToolUse hook: Every time AI edits a file, codd scan runs automatically. The dependency graph stays current.
  • pre-commit hook: Wire codd validate into git hooks so broken coherence can't be committed.

Once you add that hook, you never run codd scan manually again. Graph maintenance becomes invisible. Your workflow is just: edit files normally, run /codd-impact when you want to know what's affected. That's it.

CLI is the manual tool. Skills are the integrated workflow. CLI alone is enough, but Skills + hooks mean coherence is maintained without you ever thinking about it. Set it once. The hook does the rest.
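The pre-commit gate can be a few lines. Here is a sketch of what a Python hook script (a hypothetical .git/hooks/pre-commit file) might look like; it assumes codd is on PATH:

```python
#!/usr/bin/env python3
# Sketch of a .git/hooks/pre-commit gate: refuse the commit if
# `codd validate` reports broken coherence. Illustrative only.
import subprocess
import sys

def run_gate(cmd: list[str]) -> int:
    """Run a check command; forward its output when it fails."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        sys.stderr.write(result.stdout + result.stderr)
    return result.returncode

# In the actual hook file, the script would end with:
#     sys.exit(run_gate(["codd", "validate"]))
```

Git aborts the commit on any nonzero exit, so broken frontmatter never lands in history.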

Full setup details: Claude Code x CoDD Setup Guide


5-Minute Demo — See CoDD in Action

We'll build TaskFlow, a task management app. Write requirements in plain text, let CoDD + AI handle the rest.

Step 1: Write requirements (any format — txt, md, doc)

# TaskFlow — Requirements

## Functional Requirements
- User auth (email + Google OAuth)
- Workspace management (teams, roles, invites)
- Task CRUD with assignees, labels, due dates
- Real-time updates (WebSocket)
- File attachments (S3)
- Notification system (in-app + email)

## Constraints
- Next.js + Prisma + PostgreSQL
- Row-level security for workspace isolation
- All API endpoints rate-limited

Save as spec.txt. No special formatting needed.

Step 2: Initialize CoDD

pip install codd-dev
mkdir taskflow && cd taskflow && git init
codd init --project-name "taskflow" --language "typescript" \
  --requirements spec.txt

CoDD adds frontmatter automatically — node_id, type, dependency metadata. You never touch it.

Step 3: AI generates design docs

codd generate --wave 2   # System design + API design
codd generate --wave 3   # DB design + Auth design
codd generate --wave 4   # Test strategy

wave_config is auto-generated from your requirements. Each design doc gets CoDD frontmatter — all derived, nothing manual.

Step 4: Build the graph → Change requirements → See impact

codd scan
# → 7 docs, 15 edges, zero config

PM asks for SSO and audit logging. Open docs/requirements/requirements.md and add:

## Additional Requirements (v1.1)
- SAML SSO (enterprise customers)
- Audit logging (record & export all operations)

Save the file and ask CoDD what's affected:

codd impact    # detects uncommitted changes automatically
Changed files: 1
  - docs/requirements/requirements.md → req:taskflow-requirements

## Green Band (high confidence, auto-propagate)
| Target                  | Depth | Confidence |
|-------------------------|-------|------------|
| design:system-design    | 1     | 0.90       |
| design:api-design       | 1     | 0.90       |
| detail:db-design        | 2     | 0.90       |
| detail:auth-design      | 2     | 0.90       |

## Amber Band (must review)
| Target                  | Depth | Confidence |
|-------------------------|-------|------------|
| test:test-strategy      | 2     | 0.90       |

2 lines changed → 6 out of 7 docs affected. Green band: AI auto-updates. Amber: human reviews. You know exactly what to fix before anything breaks.
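The traversal behind that table is a reverse walk of the dependency graph. A sketch, with the reverse adjacency hand-built to mirror the demo (the real tool derives it by inverting the frontmatter edges; confidences omitted):

```python
from collections import deque

# Reverse adjacency: doc -> docs that depend on it. Hand-built here
# to mirror the demo's dependency structure.
DEPENDENTS = {
    "req:taskflow-requirements": ["design:system-design", "design:api-design"],
    "design:system-design": ["detail:db-design", "detail:auth-design"],
    "design:api-design": ["test:test-strategy"],
}

def impact(changed: str) -> dict[str, int]:
    """Breadth-first walk downstream from the changed doc.
    Values are dependency depths, as in the impact table."""
    depths, queue = {}, deque([(changed, 0)])
    while queue:
        node, d = queue.popleft()
        for dependent in DEPENDENTS.get(node, []):
            if dependent not in depths:
                depths[dependent] = d + 1
                queue.append((dependent, d + 1))
    return depths

print(impact("req:taskflow-requirements"))
```

Depth 1 targets are direct dependents of the requirements; depth 2 targets are reached through the design docs, matching the table above.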


Real-World Usage

I've used CoDD on a production web app — 18 design docs connected by a dependency graph.

Every design doc, every line of code, every test was generated by AI following CoDD. I defined requirements and constraints. That's it. You can add HITL (human-in-the-loop) checkpoints at each Wave if you want, or let AI run end-to-end and review only the final output. Your call.

docs/
├── requirements/       # Requirements (plain text — the only human input)
├── design/             # System design (6 files, AI-generated)
├── detailed_design/    # Detailed design (4 files, AI-generated)
├── governance/         # ADRs (3 files)
├── plan/               # Implementation plan
├── test/               # Acceptance criteria, test strategy
├── operations/         # Runbooks
└── infra/              # Infrastructure design

18 Markdown files connected by a dependency graph.

As development progressed, design decisions naturally evolved. The initial RLS approach for tenant isolation needed refinement as detailed design revealed better table structures. That changed the DB design, which changed the API design, which changed the test strategy.

This isn't a "requirements change." It's the normal process of refining decisions as you learn more. It happens on every project.

Without CoDD, a human tracks which of 18 interconnected docs need updating. With CoDD, codd impact lists the affected artifacts with confidence bands. AI auto-updates Green band docs and surfaces Amber band items for human review.

Changes stop being scary. That's CoDD's real value. When the cost of changing a design decision drops, you can make provisional decisions early and refine them later — without the design rotting from stale assumptions that nobody updated.


What's Next

scan, impact, and validate are stable and production-tested. generate, verify, and implement are experimental. CoDD is public alpha today.

Near-term roadmap:

  • Typed dependency relations — requires, affects, verifies, implements as distinct relation types
  • Brownfield support (codd extract) — Reverse-generate dependency-graphed design docs from existing codebases. Bring CoDD to legacy code maintenance and enhancement, not just greenfield
  • codd verify completion — Automated coherence verification across design ↔ code ↔ tests
  • Multi-harness integration examples — Documented workflows for Claude Code, Copilot, and Cursor

Long-term, CoDD aims to stand alongside TDD, BDD, and DDD as a development methodology. Not test-driven. Not behavior-driven. Not domain-driven. Coherence-driven. Development where anything can change without the design falling apart.


The Evolution, Summarized

| Methodology | Problem it solves | Era |
|---|---|---|
| Prompt Engineering | How to craft AI input | 2023-2024 |
| Context Engineering | How to manage AI's background knowledge | 2024-2025 |
| Harness Engineering | How to constrain AI execution | 2025-2026 |
| └─ CoDD | How to maintain coherence when things change | 2026+ |

CoDD is part of harness engineering — the layer nobody built yet. Existing harness engineering solves "how do I constrain AI execution." CoDD solves "how do I keep what AI built from falling apart," within the same harness framework.

Spec Kit and OpenSpec answer "how do I start?"
CoDD answers "how do I keep going when things change?"

pip install codd-dev

GitHub: yohey-w/codd-dev


So — what does your project do when things change mid-development?


I also write about multi-agent systems and AI development in Japanese on Zenn. If you read Japanese (or are curious enough to translate), the CoDD deep-dive is here.
