DEV Community: Reymond

How I Set Up Codex for Spec-Driven Development

Reymond — Wed, 06 May 2026 08:07:17 +0000

I wanted Codex to feel like a reliable teammate, not a fast autocomplete that occasionally rewrites half my repo.

The shift that worked for me was simple:

No approved spec, no code changes.

This post walks through my actual setup, based on the init.md blueprint in spec-driven-template-codex, and how I use it day to day when building features.

The System in One Flow

User request
   |
   v
spec-architect drafts task spec
   |
   v
human approval gate
   |
   v
agent-router picks specialist
   |
   v
specialist implements inside scope_in only
   |
   v
validation (npm run verify)
   |
   v
commit with spec deletion + evidence
   |
   v
full-branch PR review

This ordering matters more than any individual prompt trick.

What I Build First in a New Repo

My init.md breaks setup into explicit tasks. In practice, I treat them as six foundation layers.

1) Project Standard Files (`CODEX.md` + `AGENTS.md`)

I keep two top-level files:

CODEX.md is the canonical contract.
AGENTS.md is the loader that tells Codex to follow that contract.

CODEX.md carries the rules that I do not want to renegotiate per session:

command list (dev, build, lint, verify)
architecture boundaries
domain routing table
commit policy
the hard workflow gates

I keep this file direct and non-negotiable. If a rule is optional, I remove it.

2) Behavioral Blueprint in `.codex/WORKFLOW.md`

This file is where behavior is encoded, not implied.

My key block is:

first principle: never implement without an approved spec
spec-first gate on every request
architect mode when no spec exists
mandatory subagent chain (spec-architect -> agent-router -> specialist)
model enforcement (model + model_reasoning_effort on every agent)
evidence gate tied to deleted specs

I use .codex/STRATEGY.md as the stable "why" and .codex/WORKFLOW.md as the executable "how".

3) Agent Topology in `.codex/agents/*.toml`

I split responsibilities so one agent is not making every decision end-to-end.

Core agents:

spec-architect: plans and drafts specs only
agent-router: reads approved specs and dispatches
domain specialists: implement only in scope_in
pr-reviewer: branch-level quality gate

A detail that made my setup much more predictable: every agent file pins both model and model_reasoning_effort. I do not allow inheritance.

My usual pattern:

strongest reasoning for architecture and review
medium reasoning for implementation specialists
lower-cost, fast routing for dispatch-only work

4) Spec Template as the Unit of Work

Each task is a TASK-YYYY-MM-DD-###.spec.md file with strict front matter:

goal
scope_in and scope_out
constraints
validation
status
collaborators and design flags when needed
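Front matter only helps if it is actually there. Here is a small Python sketch of a validator. It assumes the front matter sits between `---` lines and uses only flat `key: value` pairs and `- item` lists (a real setup would more likely use a YAML parser). The required fields come from the list above, the function names are hypothetical, and the optional collaborator and design flags are not checked.

```python
import re

# File-name convention from the post: TASK-YYYY-MM-DD-###.spec.md
SPEC_NAME = re.compile(r"^TASK-\d{4}-\d{2}-\d{2}-\d{3}\.spec\.md$")
REQUIRED_FIELDS = ("goal", "scope_in", "scope_out", "constraints", "validation", "status")


def parse_front_matter(text: str) -> dict[str, object]:
    """Parse a tiny YAML subset: `key: value` lines and `- item` list entries."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("spec must start with a '---' front matter block")
    data: dict[str, object] = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return data  # end of front matter; the spec body is ignored
        if stripped.startswith("- ") and isinstance(data.get(current_key), list):
            data[current_key].append(stripped[2:].strip())  # item of the last list key
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            data[current_key] = value.strip() or []  # an empty value starts a list
    raise ValueError("front matter block is never closed")


def validate_spec(filename: str, text: str) -> list[str]:
    """Return problems found; an empty list means the spec looks well-formed."""
    errors = []
    if not SPEC_NAME.match(filename):
        errors.append(f"bad file name: {filename}")
    fields = parse_front_matter(text)
    for name in REQUIRED_FIELDS:
        if not fields.get(name):  # missing key, empty string, or empty list
            errors.append(f"missing or empty field: {name}")
    return errors
```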

The point is not bureaucracy. The point is forcing clarity before edits begin.

I keep tasks small enough to finish in around 30 minutes. If I cannot describe it that tightly, it usually means I am hiding complexity.

5) Hard Guardrails with Hooks

This is where workflow stops being "best effort."

I add .codex/hooks/workflow-guard.sh and wire it through .codex/hooks.json (or inline in .codex/config.toml).

The guard blocks patterns that silently damage quality:

git commit --no-verify
broad staging like git add .
commit attempts without staged spec deletion
missing required agent files
missing model or model_reasoning_effort fields
missing or invalid evidence JSON for deleted specs
mismatch between evidence model values and pinned agent models

The key point is that policy is enforced at command time instead of depending on anyone to remember it.

6) Evidence + Memory

For every completed spec, I track chain evidence in:

.codex/evidence/agent-chain/<spec-id>.json

I record:

agent name
model used
chain step (architect, router, specialist)
timestamp
success status
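The evidence gate can then compare each record against the expected chain and the pinned agent models. This sketch assumes a hypothetical file shape: a JSON object with a `chain` array whose entries carry `agent`, `model`, `step`, `timestamp`, and `success`. `validate_evidence` is a hypothetical name, and model names are placeholders.

```python
import json

EXPECTED_STEPS = ["architect", "router", "specialist"]


def validate_evidence(raw: str, pinned_models: dict[str, str]) -> list[str]:
    """Check one evidence file's chain order, success flags, and models."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    entries = record.get("chain", []) if isinstance(record, dict) else []
    errors = []
    steps = [entry.get("step") for entry in entries]
    if steps != EXPECTED_STEPS:
        errors.append(f"expected steps {EXPECTED_STEPS}, got {steps}")
    for entry in entries:
        agent = entry.get("agent")
        if not entry.get("success"):
            errors.append(f"{agent}: step not marked successful")
        if not entry.get("timestamp"):
            errors.append(f"{agent}: missing timestamp")
        pinned = pinned_models.get(agent)  # model pinned in the agent's own file
        if pinned is None:
            errors.append(f"{agent}: no pinned agent file")
        elif entry.get("model") != pinned:
            errors.append(f"{agent}: model {entry.get('model')!r} does not match pinned {pinned!r}")
    return errors
```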

I also initialize .codex/memory/ for persistent preferences and constraints so sessions start with context instead of re-discovery.

My Day-to-Day Execution Pattern

Once the repo is bootstrapped, feature work becomes very repeatable.

Step 1: Request -> Draft Spec

I start by spawning spec-architect and asking it to create or update a spec.

If there is no approved spec, no implementation is allowed.

Step 2: Approve Before Code

I keep the status flow explicit:

draft -> approved -> in_progress -> done|blocked

Approval is where I catch wrong assumptions early, before diff churn begins.
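The status flow is small enough to encode as an explicit transition table, which makes "no approved spec, no code changes" checkable. This is a minimal Python sketch with hypothetical names. Treating `done` and `blocked` as terminal is an assumption, since the flow above does not say how a blocked spec is reopened.

```python
# Allowed status changes, following draft -> approved -> in_progress -> done|blocked.
ALLOWED_TRANSITIONS = {
    "draft": {"approved"},
    "approved": {"in_progress"},
    "in_progress": {"done", "blocked"},
    "done": set(),     # terminal
    "blocked": set(),  # treated as terminal here (an assumption)
}


def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the change skips or reverses a step."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal status change: {current} -> {new}")
    return new


def may_implement(status: str) -> bool:
    """Code changes are allowed only once a spec has been approved."""
    return status in ("approved", "in_progress")
```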

Step 3: Route by Domain

I spawn agent-router on the approved spec.

If the domain is clear, route to one specialist.
If the domain is mixed, split the spec first.
Run specialists in parallel only when their owned paths truly do not overlap.
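These routing rules map to two checks: one domain per spec, and parallel dispatch only when owned paths are disjoint. This Python sketch assumes owned paths are directory prefixes and that a domain-to-specialist mapping exists. `Spec`, `route`, and `can_run_in_parallel` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Spec:
    spec_id: str
    domains: set[str]      # domains the spec touches; the router wants exactly one
    owned_paths: set[str]  # directory prefixes the spec is allowed to edit


def paths_overlap(a: str, b: str) -> bool:
    """True if one directory prefix contains the other (src/api vs src/api/retry)."""
    a, b = a.rstrip("/") + "/", b.rstrip("/") + "/"
    return a.startswith(b) or b.startswith(a)


def route(spec: Spec, specialists: dict[str, str]) -> str:
    """Return the single specialist for a spec, refusing mixed-domain specs."""
    if len(spec.domains) != 1:
        raise ValueError(f"{spec.spec_id} mixes domains {sorted(spec.domains)}; split it first")
    (domain,) = spec.domains
    return specialists[domain]


def can_run_in_parallel(specs: list[Spec]) -> bool:
    """Parallel dispatch is allowed only if no two specs share an owned path."""
    for i, first in enumerate(specs):
        for second in specs[i + 1:]:
            if any(paths_overlap(p, q) for p in first.owned_paths for q in second.owned_paths):
                return False
    return True
```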

Step 4: Implement Only Inside Scope

Specialists are constrained by spec boundaries.

No "while I'm here" changes.
No opportunistic refactors outside scope.

This keeps diffs reviewable and rollback-friendly.
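Scope violations are easy to detect after the fact by comparing changed files against the spec's globs. This sketch assumes scope_in and scope_out are written as glob patterns and that the changed-file list comes from something like `git diff --name-only`. `out_of_scope` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed: list[str], scope_in: list[str], scope_out: list[str]) -> list[str]:
    """Return changed files that fall outside scope_in or inside scope_out."""
    def matches(path: str, patterns: list[str]) -> bool:
        # fnmatchcase's "*" also matches "/", so "src/catalog/*" covers subfolders.
        return any(fnmatchcase(path, pattern) for pattern in patterns)

    return [path for path in changed if not matches(path, scope_in) or matches(path, scope_out)]
```

A non-empty result is the "agent touched unrelated files" signal: re-scope and rerun rather than commit.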

Step 5: Validate and Commit Under Policy

I run npm run verify, then commit with strict formatting.

My commit gate expects a completed spec lifecycle: the spec deletion is staged with the commit, and matching evidence exists whenever the workflow requires it.
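Both conditions can be checked from the staged change list. This Python sketch reads the output of `git diff --cached --name-status` (a status letter, a tab, then the path). The evidence location follows the `.codex/evidence/agent-chain/<spec-id>.json` convention, and `commit_gate` is a hypothetical name.

```python
import re
from pathlib import Path

SPEC_PATH = re.compile(r"(?:^|/)(TASK-\d{4}-\d{2}-\d{2}-\d{3})\.spec\.md$")


def commit_gate(name_status: str, evidence_dir: str) -> list[str]:
    """Check staged changes (output of `git diff --cached --name-status`)."""
    deleted_specs = []
    for line in name_status.splitlines():
        status, _, path = line.partition("\t")
        match = SPEC_PATH.search(path)
        if status == "D" and match:  # only deletions of spec files count
            deleted_specs.append(match.group(1))
    if not deleted_specs:
        return ["no staged spec deletion in this commit"]
    return [
        f"missing evidence for {spec_id}"
        for spec_id in deleted_specs
        if not (Path(evidence_dir) / f"{spec_id}.json").is_file()
    ]
```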

Step 6: Run PR-Level Review

After feature specs are done, I run a full-branch review.

That catches regressions that are invisible when you only inspect one task at a time.

What Changed After I Adopted This

Three practical improvements stood out.

1) Fewer accidental repo-wide edits

Explicit scope_in stopped many "small change" cascades.

2) Faster reviews

Review conversation shifted from "what happened?" to "is this the right behavior?" because intent was already encoded in specs.

3) Better handoffs across days

When I pause and resume later, I continue from spec status and evidence instead of reconstructing context from raw diffs.

Common Failure Modes I Guard Against

"This is too small for a spec"

Small tasks are where process drift starts. I still create a tiny spec.

"Let's skip verify once"

If verify is painful, optimize verify. Skipping it just moves failure later.

"Agent touched unrelated files"

I treat that as workflow failure, not a harmless side effect. I re-scope and rerun.

"We can commit now and clean evidence later"

I avoid deferred compliance. Evidence exists to prove the actual chain that happened.

Minimal Setup Order If You Want to Copy This

If you are starting fresh, this is the shortest safe sequence:

1. Create CODEX.md and AGENTS.md
2. Add specs/templates/TASK.spec.template.md
3. Add .codex/WORKFLOW.md and .codex/STRATEGY.md
4. Create core agents in .codex/agents/
5. Enable hooks in .codex/config.toml and wire workflow-guard.sh
6. Add evidence schema path under .codex/evidence/agent-chain/
7. Test blocked and allowed commit scenarios

If you skip the final step (testing blocked and allowed commits), your rules are probably not real yet.

Final Takeaway

My Codex setup works because it converts process from documentation into enforcement:

specs define intent
agents separate responsibilities
hooks enforce non-negotiable policies
evidence proves what actually ran
PR review validates system-level safety

I still iterate on prompts, but prompts are now the smallest part of the system.

The bigger win is having a workflow that stays stable even when tasks, tools, or models change.

How I Set Up My Claude Workflow (Spec-Driven and Easy to Follow)

Reymond — Tue, 05 May 2026 14:31:38 +0000

I used to jump straight into code with Claude. It felt fast, but I kept paying for it later: missed edge cases, messy commits, and "what changed?" moments during review.

So I rebuilt my setup around one idea:

Slow down before coding, so coding goes faster.

This post explains exactly how I set up my workflow from my init.md template, why each part exists, and what alternatives I might try later.

The Workflow in One Picture

User request
   |
   v
spec-architect (creates task spec)
   |
   v
Human approval (required)
   |
   v
agent-router (chooses the right specialist)
   |
   v
specialist agent(s) implement only inside scope
   |
   v
validation (npm run verify)
   |
   v
single-line commit + delete finished spec
   |
   v
PR reviewer checks full branch diff

If you only remember one thing, remember this:

No spec, no code.

Why I Changed My Old Approach

My old approach was "ask Claude, get code, patch later." It works for small one-offs, but on real projects it breaks down:

Context drift: the assistant forgets intent across long sessions.
Scope creep: "small fix" turns into touching five unrelated files.
Weak handoffs: if I return the next day, I lose the reasoning trail.
Review pain: reviewers see diffs, but not the decision process.

The new setup solves that by forcing an explicit path from request to commit.

The Core Pieces I Set Up

My template has 11 setup tasks. In plain language, they boil down to six pillars:

A project contract (CLAUDE.md)
Agent roles (architect, router, specialists, reviewer)
A strict spec template
Behavioral playbook (.claude/CLAUDE.md)
Hard enforcement (permissions + hooks)
Memory that persists across sessions

1) Project Contract (`CLAUDE.md`)

This file tells Claude:

which commands exist (dev, test, verify)
architecture boundaries
routing table for domains
commit policy and safety rules

I treat this like an engineering contract, not notes.

2) Agent Roles Instead of One "Do Everything" Assistant

I split responsibilities into clear workers:

spec-architect: turns request into atomic tasks (<= 30 min each)
agent-router: dispatches to the right domain specialist
specialists: implement only inside owned paths
pr-reviewer: reviews full branch diff at the end

This prevents one agent from improvising across the whole repo.

3) Spec Template (the Unit of Work)

Each change gets a TASK-YYYY-MM-DD-###.spec.md file with fields like:

goal
scope_in
scope_out
constraints
validation
status

A spec is tiny, explicit, and reviewable.

4) Behavioral Playbook

I keep a second file (.claude/CLAUDE.md) that says what to do in real time:

start protocol
pre-action gate
exact sequence for code-change requests
what counts as a workflow violation

Think of the root CLAUDE.md as policy and .claude/CLAUDE.md as the execution checklist.

5) Two-Layer Guardrails

This is the part that made the biggest difference.

Layer 1 blocks risky patterns before execution (deny list).
Layer 2 inspects runtime command context with a hook script.

Attempted command
   |
   +--> Layer 1: permissions deny list
   |      - blocks: --no-verify, --force push, git add .
   |
   +--> Layer 2: workflow-guard.sh
          - blocks: commit without spec
          - blocks: artifact staging
          - blocks: multiline or co-authored-by commit formats

This means I don't rely on memory or discipline alone. The system enforces the behavior.

6) Memory System

I keep persistent memory files for:

user preferences
team conventions
project constraints
repeated corrections

So each session starts with context, not from zero.

The Exact Request-to-Commit Flow

Here is my real flow for every code change.

Step 1: Spec First

I ask Claude to create spec(s) from the request.

Example request:

"Add retry logic to scraper API and surface retry count in admin UI."

Typical output from spec-architect:

Spec A (tools-domain): API retry behavior
Spec B (admin-domain + design collaborator): retry count UI

If a task can't fit in about 30 minutes, it gets split.

Step 2: Human Approval Gate

Specs stay in draft until I approve them.

This is where I fix assumptions before code exists, which is much cheaper than fixing code later.

Step 3: Route to the Right Specialist

agent-router reads the approved spec and dispatches.

Sequential by default
Parallel only if specs are truly independent and touch non-overlapping paths

Step 4: Implement Inside Scope

The specialist agent implements only inside scope_in, respects scope_out, and then runs validation.

No "while I'm here" edits.

Step 5: Commit Discipline

One spec equals one commit.

run validation
explicit git add <file1> <file2>
single-line commit format: type(scope): description
delete completed spec file

Then move to the next spec.
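The single-line format is straightforward to enforce with a regular expression. In this sketch the allowed `type` values and the lowercase-scope rule are assumptions borrowed from Conventional Commits style, not from the template, and `check_commit_message` is a hypothetical name.

```python
import re

# Assumed type list (Conventional Commits style); adjust to your own policy.
SUBJECT = re.compile(r"^(feat|fix|docs|refactor|test|chore|perf|build|ci)\([a-z0-9][a-z0-9-]*\): \S.*$")


def check_commit_message(message: str) -> list[str]:
    """Return policy violations; an empty list means the message is acceptable."""
    errors = []
    lines = message.rstrip("\n").split("\n")  # tolerate one trailing newline
    if len(lines) > 1:
        errors.append("commit message must be a single line")
    if "co-authored-by" in message.lower():
        errors.append("Co-authored-by trailers are not allowed")
    if not SUBJECT.match(lines[0]):
        errors.append("subject must match type(scope): description")
    return errors
```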

Step 6: PR-Level Review

After all specs are done, pr-reviewer checks the full branch diff against the base branch.

This catches cross-spec regressions that per-task review can miss.

A Simple End-to-End Example

Let me show one tiny scenario:

Request: "Add CSV import button in catalog page"

Spec 1 (catalog-domain)
- goal: add button + file input + happy-path upload
- scope_in: catalog page + upload component
- scope_out: auth module, billing module
- validation: npm run verify

Spec 2 (backend)
- goal: endpoint that accepts CSV uploads + returns validation errors
- scope_in: api route + service
- scope_out: db schema

Flow:
spec-architect -> approval -> router -> catalog specialist + backend specialist -> verify -> commits -> PR review

Because scope is explicit, unrelated modules don't get touched by accident.

Why This Workflow Works (for Me)

It separates thinking from typing

Specs force me to design first, code second.

It makes context explicit

I don't depend on "assistant memory vibes." Scope and constraints are written down.

It improves recoverability

If I stop mid-work, I can resume from specs and status instantly.

It reduces blast radius

Owned paths and scope boundaries prevent silent repo-wide changes.

It makes reviews faster

Reviewers can trace: request -> spec -> diff -> validation.

What Usually Breaks and How I Handle It

"This is just a one-liner"

One-liners are where policy violations start. I still create a small spec.

"Can we skip validation this time?"

No. If validation is too slow, optimize validation. Don't skip it.

"The agent touched out-of-scope files"

I stop and correct it with a new, properly scoped spec. I don't merge "almost correct" behavior.

My Current Folder Layout

project/
├── CLAUDE.md
├── specs/
│   ├── TASK-2026-05-04-001.spec.md
│   └── templates/task.spec.template.md
└── .claude/
    ├── CLAUDE.md
    ├── settings.local.json
    ├── hooks/workflow-guard.sh
    └── agents/
        ├── spec-architect.md
        ├── agent-router.md
        ├── pr-reviewer.md
        ├── backend-specialist.md
        ├── database-specialist.md
        ├── ui-ux-frontend-design-specialist.md
        └── *-domain-specialist.md
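A layout like this can drift, so a quick existence check helps. This Python sketch treats the core files above as required. Which files are mandatory is an assumption (the specialist agents are left optional), and `missing_layout` is a hypothetical name.

```python
from pathlib import Path

# Files this sketch treats as mandatory; specialist agents are left optional.
REQUIRED_FILES = [
    "CLAUDE.md",
    "specs/templates/task.spec.template.md",
    ".claude/CLAUDE.md",
    ".claude/settings.local.json",
    ".claude/hooks/workflow-guard.sh",
    ".claude/agents/spec-architect.md",
    ".claude/agents/agent-router.md",
    ".claude/agents/pr-reviewer.md",
]


def missing_layout(root: str) -> list[str]:
    """Return required paths (relative to root) that do not exist as files."""
    base = Path(root)
    return [rel for rel in REQUIRED_FILES if not (base / rel).is_file()]
```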

Alternatives I Want to Explore Next

This setup works well, but I still want to experiment.

Alternative 1: Lightweight Mode for Tiny Repos

For personal throwaway projects:

keep spec + approval
collapse router + specialist into one constrained specialist
keep hooks mandatory

This may cut overhead while preserving safety.

Alternative 2: Stronger CI-Based Enforcement

Right now guardrails run locally in Claude Code. Next step is mirroring the same checks in CI:

reject non-conforming commit messages
reject artifact staging patterns
reject PRs without validation status

That makes enforcement team-wide, not machine-local.

Alternative 3: Test-First Specs

I currently define validation commands in specs. I want to push this further:

require a failing test case in the spec for bug fixes
require new behavior to map to new test cases

This makes completion criteria even more objective.
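A test-first rule could be a simple field check on the parsed spec. This sketches the idea only. The `type`, `failing_test`, and `test_cases` fields are hypothetical additions to the spec template, not fields it currently has.

```python
def check_test_first(spec: dict[str, object]) -> list[str]:
    """Hypothetical test-first rule keyed on a `type` field in the spec."""
    kind = spec.get("type")
    if kind == "fix" and not spec.get("failing_test"):
        return ["bug-fix spec must name a failing test that reproduces the bug"]
    if kind == "feat" and not spec.get("test_cases"):
        return ["feature spec must map new behavior to test cases"]
    return []
```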

Alternative 4: Domain Risk Scoring

Not all tasks need the same process depth. I want spec-architect to assign a risk score automatically:

low risk: single-spec path
high risk: mandatory collaborator + expanded review checklist

Alternative 5: Multi-Tool Strategy

I use Claude as primary orchestrator. I may test a hybrid path where:

Claude handles decomposition + routing
another tool handles narrow code transforms
same spec/hook rules apply

The key is keeping the workflow stable even if tools change.

Practical Advice if You Want to Copy This

Start small. Do this in order:

1. Add the spec template
2. Add pre-action gate rules in CLAUDE.md
3. Add hook + deny list
4. Add router and one domain specialist
5. Expand domain specialists only when needed

If you skip hard enforcement, the workflow eventually degrades.

Final Takeaway

My Claude workflow is not "just prompting better." It is a small operating system for code changes:

specs define intent
agents separate responsibilities
hooks enforce rules
review closes the loop

It works because it is explicit, constrained, and difficult to bypass.

If you are already shipping production code with AI assistants, this is the shift I recommend first: make the process executable, not just documented.
