The Problem Nobody Solved
Over the past two years, AI-assisted development went through three methodology shifts:
- Prompt Engineering (2023-2024) — How to ask AI the right question
- Context Engineering (2024-2025) — How to give AI the right background knowledge (Anthropic's take)
- Harness Engineering (2025-2026) — How to constrain AI with rules, hooks, and guardrails (coined by Mitchell Hashimoto)
I went through all three. I run a multi-agent system with 10 AI agents operating as a feudal Japanese military hierarchy (yes, really — details here). Anthropic's Building Effective Agents paper showed that the most effective agent implementations are simple, composable patterns. That philosophy runs through everything I built.

After implementing every methodology on the list, I found a problem none of them address — and built a tool to solve it. The result: on a client LMS project, I handed AI requirements and constraints. It generated 18 design docs, all code, all tests. When design decisions changed mid-project, nothing broke.
What happens when something changes mid-development?
It's not just "requirements change." During any real project, things shift constantly:
- Requirements get added or modified — which design docs are affected?
- You revise the API design — what about the calling code and tests?
- Database schema changes — are your migrations safe? What about RLS policies?
- Tech stack decision changes — test strategy and infra design both break
- A new constraint surfaces — how far does the ripple go?
Prompt engineering doesn't answer this. Context engineering doesn't answer this. Harness engineering doesn't answer this. They're all about "how to make AI build things." None of them address "how to keep things coherent when something changes."
The Context Engineering Ceiling
But there's a deeper issue worth unpacking. Context engineering's core tool is the persistent context file — CLAUDE.md, AGENTS.md, instructions. One file, all the background knowledge, always loaded.
That works for small projects. What happens when the project outgrows a single context window?
Requirements, system design, API specs, database design, test strategy, deployment config — you can't fit all of that into one file. And what overflows the context window is invisible to the AI. Invisible means nonexistent.
The architectural answer is structured design documents — which I'll introduce later in this post.
- Decompose knowledge into structured documents — not one monolith file
- Frontmatter declares what to read — the dependency graph identifies relevant docs
- Load only what's needed — impact analysis tells you exactly which docs are affected
Prompt Engineering → Cram everything into one prompt
Context Engineering → Write everything into one persistent file (has limits)
??? → Structured doc set + dependency graph → load only what's relevant
If CLAUDE.md is "everything in one place," structured design docs are "the right context at the right time." Keep reading to see what fills the ???.
The Landscape: Everyone Says "Spec First"
The "spec-driven" space is growing fast:
- Spec Kit (GitHub official, 83k+ stars) — Write specs first, then generate code. Extensions include V-Model, Verify, Sync. GitHub Blog declares that "code serves specifications," not the other way around.
- OpenSpec (35k+ stars) — Artifact dependency graphs and `/opsx:verify`. Positions itself as the lightweight alternative to Spec Kit.
- Google Conductor — Persistent Markdown for plan-before-build with Automated Reviews.
- cc-spec-driven — Claude Code plugin with bidirectional dependency tracking and quick impact analysis.
All of them agree: write specs first. And all of them share the same blind spot:
None of them tell you what to do when the spec changes.
They teach you how to start. They don't teach you how to keep going.
CoDD: Coherence-Driven Development
Nobody else was solving this, so I built it myself.
CoDD — Coherence-Driven Development. A methodology and CLI tool for maintaining design coherence when requirements change.
pip install codd-dev
CoDD sits as Layer 3 of harness engineering:
Harness (CLAUDE.md, Hooks, Skills) ← Rules, guardrails, workflow
└─ CoDD (methodology) ← Operates within the harness
└─ Design docs (docs/*.md) ← Artifacts CoDD generates and maintains
It does three things:
- Build a dependency graph — Declare dependencies between design artifacts via Markdown frontmatter
- Analyze change impact — When anything changes, trace the graph to identify what's affected
- Maintain coherence — AI updates the affected docs, code, and tests
# Build the dependency graph from frontmatter
codd scan
# See what's affected by your last commit
codd impact
That's it. Two commands to answer "what broke?"
Vibe Coding vs Harness Engineering vs CoDD
Andrej Karpathy coined "vibe coding" in February 2025 — "see things, say things, run things, and vibe." Building without understanding the code. Here's how the three approaches compare across evaluation axes:
| Axis | Vibe Coding | Harness Engineering (existing) | CoDD |
|---|---|---|---|
| Initial speed | Excellent. "Build this" and go | Good after setup | Slower. Design docs come first |
| Change resilience | Poor. Rebuild from scratch | Moderate. Guardrails exist, but humans trace impact | Excellent. Dependency graph traces it automatically |
| Design traceability | None. Nobody knows why it's built this way | Partial. Rules in CLAUDE.md | Full. Frontmatter declares all dependencies |
| Setup cost | Zero | Medium. Hooks, CLAUDE.md, MCP config | Medium. Init + frontmatter in design docs |
| Large projects | Breaks down | Manageable with rules | Scales via dependency graph |
| AI runaway risk | High. AI builds the wrong thing perfectly | Moderate. Guardrails constrain | Low. Divergence from design is detectable |
| Best fit | Prototypes, throwaway code | Mid-size steady-state projects | Projects where things keep changing |
Vibe coding isn't bad. It's perfect for prototypes. Existing harness engineering works fine for stable requirements.
The question is what happens in real projects where things keep changing — requirements, API designs, database schemas, tech decisions. Vibe coding says "rebuild." Existing harness says "a human tracks the impact." CoDD says "the graph already knows."
Want speed? Vibe code. Want durability? CoDD. Want both? Prototype with vibe, build with CoDD.
Core Philosophy
Derive, Don't Configure
CoDD's central principle: if you know the upstream, the downstream is self-evident.
If system_design.md says "Next.js + Supabase," the test strategy is vitest + Playwright. No config file needed. Architecture determines testing. Always.
system_design.md → "Next.js + Supabase"
→ Test strategy: vitest (unit) + Playwright (E2E). Zero config.
system_design.md → "FastAPI + Python"
→ Test strategy: pytest + httpx. Zero config.
system_design.md → "CLI tool in Go"
→ Test strategy: go test. Zero config.
The world has too many config files. CoDD doesn't scatter configuration. Requirements + constraints in, design + implementation + tests out. The AI derives everything from the upstream artifacts and current best practices.
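A toy sketch makes the principle concrete. The lookup table and function below are hypothetical illustrations, not CoDD internals; in practice the AI does the deriving, not a hardcoded map:

```python
# Toy illustration of "derive, don't configure". The mapping is hypothetical:
# in CoDD the AI derives downstream choices from upstream artifacts, so no
# table like this is ever written down or configured.
STACK_TO_TESTS = {
    "Next.js + Supabase": "vitest (unit) + Playwright (E2E)",
    "FastAPI + Python": "pytest + httpx",
    "CLI tool in Go": "go test",
}

def derive_test_strategy(stack: str) -> str:
    """Downstream is self-evident once upstream is known. Zero config."""
    return STACK_TO_TESTS[stack]

print(derive_test_strategy("Next.js + Supabase"))
# vitest (unit) + Playwright (E2E)
```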
V-Model, Finally Achievable
The V-Model is the ideal development model:
Left side (Design) Right side (Verification)
Requirements ←→ System Test / E2E
System Design ←→ Integration Test
Detailed Design (API) ←→ Unit Test
Implementation
It breaks down in practice because humans can't track which documents to update when requirements change. CoDD automates that change propagation. That's what makes the V-Model sustainable.
Waves — Execution Order Along the Dependency Graph
CoDD processes design documents in units called "Waves." Changes propagate from upstream to downstream like waves through the dependency graph — that's the name.
Wave 1: Requirements (no dependencies — the origin)
Wave 2: System Design (depends on requirements)
Wave 3: Detailed Design / API Design (depends on system design)
Wave 4: Test Strategy / Infrastructure (depends on detailed design)
Waves work in two directions:
- Generation (upstream → downstream) — You can't write system design before requirements are defined. You can't write API design before system design exists. Wave 1 gets finalized first, then Wave 2, and so on.
- Impact analysis (downstream → upstream lookup) — When DB design changes, trace the graph in reverse: "what depends on this?" to identify everything affected.
Both directions require the right order. Waves derive that order automatically from the dependency graph.
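For the curious, both traversals are standard graph operations. Here is a minimal Python sketch (illustrative only, not `codd`'s source; the node IDs are invented to match the examples in this post):

```python
from graphlib import TopologicalSorter

# Frontmatter-declared edges: document -> the upstream documents it depends on.
DEPENDS_ON = {
    "design:system-design": {"req:requirements"},
    "design:api-design": {"design:system-design"},
    "detail:db-design": {"design:system-design"},
    "test:test-strategy": {"design:api-design", "detail:db-design"},
}

# Generation order (upstream -> downstream): a plain topological sort.
print(list(TopologicalSorter(DEPENDS_ON).static_order()))
# ['req:requirements', 'design:system-design', ...]

# Impact analysis (reverse lookup): everything that transitively depends on a node.
def affected_by(node: str) -> set[str]:
    direct = {doc for doc, ups in DEPENDS_ON.items() if node in ups}
    for doc in list(direct):
        direct |= affected_by(doc)
    return direct

print(affected_by("design:system-design"))
# {'design:api-design', 'detail:db-design', 'test:test-strategy'}
```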
Frontmatter as Single Source of Truth
Dependencies are declared in Markdown frontmatter:
---
codd:
  node_id: "design:api-design"
  depends_on:
    - id: "design:system-design"
      relation: derives_from
    - id: "req:lms-requirements-v2.0"
      relation: implements
---
graph.db is a cache. codd scan regenerates it every time. The frontmatter is the source of truth. No dual-maintained config files.
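To see why no second config file is needed, here's a rough sketch of what a frontmatter scan can look like. This is a simplified stand-in (PyYAML, naive `---` splitting), not `codd scan` itself:

```python
# Simplified stand-in for a frontmatter scan -- NOT codd's implementation.
# Shows that Markdown frontmatter alone can carry the whole dependency graph.
from pathlib import Path
import yaml  # PyYAML

def scan_edges(docs_dir: str = "docs") -> list[tuple[str, str, str]]:
    edges = []
    for path in Path(docs_dir).rglob("*.md"):
        text = path.read_text(encoding="utf-8")
        if not text.startswith("---"):
            continue  # no frontmatter, not a graph node
        meta = yaml.safe_load(text.split("---", 2)[1]) or {}
        codd = meta.get("codd", {})
        for dep in codd.get("depends_on", []):
            edges.append((codd["node_id"], dep["relation"], dep["id"]))
    return edges

for src, rel, dst in scan_edges():
    print(f"{src} --[{rel}]--> {dst}")
```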
Important: humans don't write this frontmatter manually. AI does. The design docs, the frontmatter, the dependency declarations — AI generates all of it. Humans define requirements and constraints. That's it.
Commands
| Command | Status | What it does |
|---|---|---|
| `codd init` | Stable | Project initialization |
| `codd scan` | Stable | Build dependency graph |
| `codd impact` | Stable | Change impact analysis (Green/Amber/Gray bands) |
| `codd validate` | Alpha | Frontmatter consistency check |
| `codd generate` | Experimental | Generate design docs in Wave order |
| `codd plan` | Experimental | Wave execution status |
| `codd verify` | Experimental | V-Model verification (type check + test → design traceability) |
| `codd implement` | Experimental | Design → code generation |
Impact analysis returns three confidence bands:
- Green — High confidence. Definitely affected.
- Amber — Medium confidence. Check needed.
- Gray — Low confidence. Possible indirect impact.
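The cutoffs below are invented for illustration (CoDD scores confidence from the graph itself), but they show the shape of the triage protocol:

```python
# Hypothetical cutoffs, for illustration only; CoDD derives confidence from
# the dependency graph (depth, relation types), not from fixed thresholds.
def band(confidence: float) -> str:
    if confidence >= 0.8:
        return "Green"  # high confidence: safe to auto-propagate
    if confidence >= 0.5:
        return "Amber"  # medium confidence: human review needed
    return "Gray"       # low confidence: possible indirect impact

assert band(0.90) == "Green"
assert band(0.60) == "Amber"
assert band(0.20) == "Gray"
```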
CLI vs Claude Code Skills
CoDD has two usage modes: run the CLI directly, or integrate as Claude Code Skills.
CLI (works with any editor)
codd scan
codd impact
Run from the terminal. No editor or harness dependency. Works with Cursor, Copilot, or anything else. Good for CI/CD pipelines too.
Claude Code Skills (recommended: integrates into the dev loop)
CoDD ships with slash-command Skills for Claude Code:
/codd-init → Project initialization
/codd-scan → Build dependency graph
/codd-impact → Change impact analysis
/codd-validate → Frontmatter consistency check
/codd-generate → Generate design docs in Wave order
With Skills, building the TaskFlow task management app (the same app as in the demo below) looks like this:
You: /codd-init
→ Claude: codd init --project-name "taskflow" --language "typescript" \
--requirements spec.txt
You: /codd-generate
→ Claude: codd generate --wave 2 --path .
→ Claude reads every generated doc, checks scope, validates frontmatter
→ "Wave 2 design docs reviewed. Proceed to Wave 3?"
You: yes
You: /codd-generate
→ Claude: codd generate --wave 3 --path .
You: /codd-scan
→ Claude: codd scan --path .
→ "7 documents, 15 edges. No warnings."
You: (edit requirements — add SSO + audit logging)
You: /codd-impact
→ Claude: codd impact --path .
→ Green Band: auto-updates system-design, api-design, db-design, auth-design
→ Amber Band: "test-strategy is affected. Update it?"
The key difference is that HITL (Human-in-the-Loop) gates are automatic. /codd-generate pauses between waves for approval. /codd-impact follows the Green/Amber/Gray protocol — auto-updating safe changes, asking before risky ones. You just make decisions.
Now combine Skills with hooks for zero-maintenance coherence:
// .claude/settings.json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{
        "type": "command",
        "async": true,
        "command": "cd \"$CLAUDE_PROJECT_DIR\" && codd scan --path ."
      }]
    }]
  }
}
- PostToolUse hook: Every time AI edits a file, `codd scan` runs automatically. The dependency graph stays current.
- pre-commit hook: Wire `codd validate` into git hooks so broken coherence can't be committed (a minimal hook sketch follows below).
Once you add that hook, you never run codd scan manually again. Graph maintenance becomes invisible. Your workflow is just: edit files normally, run /codd-impact when you want to know what's affected. That's it.
CLI is the manual tool. Skills are the integrated workflow. CLI alone is enough, but Skills + hooks mean coherence is maintained without you ever thinking about it. Set it once. The hook does the rest.
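The pre-commit side needs nothing exotic either. A minimal sketch of a plain Git hook, assuming only that `codd validate` exits non-zero on failure (any hook framework works just as well):

```python
#!/usr/bin/env python3
# .git/hooks/pre-commit -- minimal sketch. Assumes `codd validate` exits
# non-zero when frontmatter coherence is broken; any hook runner works too.
import subprocess
import sys

result = subprocess.run(["codd", "validate"])
if result.returncode != 0:
    print("codd validate failed: fix coherence before committing.")
    sys.exit(1)
```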
Full setup details: Claude Code x CoDD Setup Guide
5-Minute Demo — See CoDD in Action
We'll build TaskFlow, a task management app. Write requirements in plain text, let CoDD + AI handle the rest.
Step 1: Write requirements (any format — txt, md, doc)
# TaskFlow — Requirements
## Functional Requirements
- User auth (email + Google OAuth)
- Workspace management (teams, roles, invites)
- Task CRUD with assignees, labels, due dates
- Real-time updates (WebSocket)
- File attachments (S3)
- Notification system (in-app + email)
## Constraints
- Next.js + Prisma + PostgreSQL
- Row-level security for workspace isolation
- All API endpoints rate-limited
Save as spec.txt. No special formatting needed.
Step 2: Initialize CoDD
pip install codd-dev
mkdir taskflow && cd taskflow && git init
codd init --project-name "taskflow" --language "typescript" \
--requirements spec.txt
CoDD adds frontmatter automatically — node_id, type, dependency metadata. You never touch it.
Step 3: AI generates design docs
codd generate --wave 2 # System design + API design
codd generate --wave 3 # DB design + Auth design
codd generate --wave 4 # Test strategy
wave_config is auto-generated from your requirements. Each design doc gets CoDD frontmatter — all derived, nothing manual.
Step 4: Build the graph → Change requirements → See impact
codd scan
# → 7 docs, 15 edges, zero config
PM asks for SSO and audit logging. Open docs/requirements/requirements.md and add:
## Additional Requirements (v1.1)
- SAML SSO (enterprise customers)
- Audit logging (record & export all operations)
Save the file and ask CoDD what's affected:
codd impact # detects uncommitted changes automatically
Changed files: 1
- docs/requirements/requirements.md → req:taskflow-requirements
## Green Band (high confidence, auto-propagate)
| Target | Depth | Confidence |
|-------------------------|-------|------------|
| design:system-design | 1 | 0.90 |
| design:api-design | 1 | 0.90 |
| detail:db-design | 2 | 0.90 |
| detail:auth-design | 2 | 0.90 |
## Amber Band (must review)
| Target | Depth | Confidence |
|-------------------------|-------|------------|
| test:test-strategy | 2 | 0.90 |
Two added requirement lines → six of seven docs in play: the edited requirements file plus five downstream artifacts. Green band: AI auto-updates. Amber: human reviews. You know exactly what to fix before anything breaks.
Real-World Usage
I've used CoDD on a production web app — 18 design docs connected by a dependency graph.
Every design doc, every line of code, every test was generated by AI following CoDD. I defined requirements and constraints. That's it. You can add HITL (human-in-the-loop) checkpoints at each Wave if you want, or let AI run end-to-end and review only the final output. Your call.
docs/
├── requirements/ # Requirements (plain text — the only human input)
├── design/ # System design (6 files, AI-generated)
├── detailed_design/ # Detailed design (4 files, AI-generated)
├── governance/ # ADRs (3 files)
├── plan/ # Implementation plan
├── test/ # Acceptance criteria, test strategy
├── operations/ # Runbooks
└── infra/ # Infrastructure design
18 Markdown files connected by a dependency graph.
As development progressed, design decisions naturally evolved. The initial RLS approach for tenant isolation needed refinement as detailed design revealed better table structures. That changed the DB design, which changed the API design, which changed the test strategy.
This isn't a "requirements change." It's the normal process of refining decisions as you learn more. It happens on every project.
Without CoDD, a human tracks which of 18 interconnected docs need updating. With CoDD, codd impact lists the affected artifacts with confidence bands. AI auto-updates Green band docs and surfaces Amber band items for human review.
Changes stop being scary. That's CoDD's real value. When the cost of changing a design decision drops, you can make provisional decisions early and refine them later — without the design rotting from stale assumptions that nobody updated.
What's Next
scan, impact, and validate are stable and production-tested. generate, verify, and implement are experimental. CoDD is public alpha today.
Near-term roadmap:
- Typed dependency relations — `requires`, `affects`, `verifies`, `implements` as distinct relation types
- Brownfield support (`codd extract`) — Reverse-generate dependency-graphed design docs from existing codebases. Bring CoDD to legacy code maintenance and enhancement, not just greenfield
- `codd verify` completion — Automated coherence verification across design ↔ code ↔ tests
- Multi-harness integration examples — Documented workflows for Claude Code, Copilot, and Cursor
Long-term, CoDD aims to stand alongside TDD, BDD, and DDD as a development methodology. Not test-driven. Not behavior-driven. Not domain-driven. Coherence-driven. Development where anything can change without the design falling apart.
The Evolution, Summarized
| Methodology | Problem it solves | Era |
|---|---|---|
| Prompt Engineering | How to craft AI input | 2023-2024 |
| Context Engineering | How to manage AI's background knowledge | 2024-2025 |
| Harness Engineering | How to constrain AI execution | 2025-2026 |
| └─ CoDD | How to maintain coherence when things change | 2026+ |
CoDD is part of harness engineering — the layer nobody built yet. Existing harness engineering solves "how do I constrain AI execution." CoDD solves "how do I keep what AI built from falling apart," within the same harness framework.
Spec Kit and OpenSpec answer "how do I start?"
CoDD answers "how do I keep going when things change?"
pip install codd-dev
So — what does your project do when things change mid-development?
I also write about multi-agent systems and AI development in Japanese on Zenn. If you read Japanese (or are curious enough to translate), the CoDD deep-dive is here.