TL;DR
- I built a sub-agent workflow framework for Claude Code that solved context exhaustion through specialized agents and structured workflows
- For 8 months, Codex CLI had no sub-agents — the framework was Claude Code-only
- Codex finally shipped sub-agent support — I expected days of migration; it took an afternoon
- What surprised me most: if you design workflows around agent roles and context separation rather than tool-specific features, your investment survives platform shifts
The 8-Month Wait
Back in July 2025, I released the first version of this workflow as a Claude Code boilerplate. By October 2025, it had evolved into a full sub-agent framework — specialized agents for every phase of development, from requirements analysis through TDD implementation to quality gates. The idea was pretty simple: break complex coding tasks into specialized roles (requirement analyzer, technical designer, task executor, quality fixer...), give each agent a fresh context, and orchestrate them through structured handoffs. No single agent ever hits the context ceiling because no single agent tries to do everything.
The problem? Codex CLI had no sub-agent capability. Codex had been around since mid-2025, and I wanted the same workflow there too. So I kept trying to bridge the gap.
First, I built an MCP server in August 2025 that let any MCP-compatible tool — Codex, Cursor, whatever — define and spawn sub-agents through a standard protocol. It worked, but MCP added a layer of indirection that wasn't there in Claude Code's native sub-agents.
Then in December 2025, Codex shipped experimental Agent Skills support. I saw an opening and built sub-agents-skills — cross-LLM sub-agent orchestration packaged as Agent Skills, routing tasks to Codex, Claude Code, Cursor, or Gemini. Closer, but still not native sub-agents.
Through all of this, my main development stayed on Claude Code. The context separation and the small context windows of the time made it the clear choice for serious work. Codex filled a supporting role — I used it for skills refinement and as an objective reviewer on complex implementations, a fresh set of eyes from a different LLM.
I don't use hooks extensively — I prefer keeping tasks small and baking quality gates into the completion criteria themselves. So what I was really waiting for was native sub-agent support in Codex, which would let the full orchestration workflow run without workarounds.
On March 16, 2026, Codex CLI shipped sub-agent support. During pre-release validation, I noticed something encouraging: Codex followed the workflow stopping points more strictly than expected. If the behavior stabilizes, it could be a viable primary development tool, not just a supporting one.
The port took almost no effort.
What "Near-Zero Migration" Actually Looks Like
When I say "the same framework," I mean it. The core architecture didn't change:
User Request
↓
requirement-analyzer → scale determination [STOP for confirmation]
↓
technical-designer → Design Doc
↓
document-reviewer [STOP for approval]
↓
work-planner → phased task breakdown [STOP]
↓
task-decomposer → atomic task files
↓
Per-task 4-step cycle:
task-executor → escalation check → quality-fixer → git commit
22 sub-agents. 26 skills. The same stopping points, the same quality gates, the same TDD enforcement.
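The orchestration above is, at its core, a short loop with hard stops. Here's a schematic Python sketch of that control flow — `spawn` and `confirm` are hypothetical stand-ins for whatever the host CLI provides, not real framework APIs:

```python
def run_workflow(request, spawn, confirm):
    # spawn(agent, payload) runs a sub-agent in a fresh context and
    # returns its artifact; confirm(msg) blocks until the user approves.
    analysis = spawn("requirement-analyzer", request)
    confirm("Scale determination")          # [STOP for confirmation]
    design = spawn("technical-designer", analysis)
    spawn("document-reviewer", design)
    confirm("Design approval")              # [STOP for approval]
    plan = spawn("work-planner", design)
    confirm("Work plan")                    # [STOP]
    tasks = spawn("task-decomposer", plan)
    for task in tasks:
        # Per-task 4-step cycle (escalation check and git commit
        # elided for brevity)
        result = spawn("task-executor", task)
        spawn("quality-fixer", result)
```

The point of the sketch: nothing in this loop names a platform. Any CLI that can spawn an agent and pause for approval can host it.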
What changed was the container format, not the content:
| Aspect | Claude Code | Codex CLI |
|---|---|---|
| Agent definitions | Markdown with YAML frontmatter (`agents/*.md`) | TOML files (`.codex/agents/*.toml`) |
| Skills location | `skills/` | `.agents/skills/` |
| Tool declarations | Explicit in frontmatter (`tools: Read, Grep, Glob...`) | Not needed (inferred from sandbox mode) |
| Skill references | Comma-separated names | `[[skills.config]]` arrays |
| Config directory | `.claude/` | `.codex/` |
That's it. The agent instructions — the actual substance of what each agent knows and does — are the same. The workflow logic is the same. The quality criteria are the same.
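To make the container difference concrete, here is the same (abridged, hypothetical) agent in both envelopes. The YAML frontmatter fields match Claude Code's documented sub-agent format; the TOML layout is my illustration of the patterns above, not a verbatim file from the framework:

```markdown
---
name: requirement-analyzer
description: Determines task type and scale from a user request
tools: Read, Grep, Glob
---
Extract the task type, determine scale (1-2 files = Small,
3-5 = Medium, 6+ = Large), identify ADR necessity, and output
structured JSON.
```

```toml
# .codex/agents/requirement-analyzer.toml (field names illustrative)
name = "requirement-analyzer"
description = "Determines task type and scale from a user request"
# No tool declarations: capabilities are inferred from sandbox mode

instructions = """
Extract the task type, determine scale (1-2 files = Small,
3-5 = Medium, 6+ = Large), identify ADR necessity, and output
structured JSON.
"""

[[skills.config]]
name = "requirement-analysis"
```

The natural-language body is identical in both; only the envelope changes.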
Why This Worked: Design Decisions That Paid Off
The port was this easy for a simple reason: three design choices I made early on.
1. Natural Language as the Interface Layer
Every sub-agent's behavior is defined in natural language instructions, not in platform-specific tool calls. The requirement-analyzer isn't wired to Claude Code's Agent tool or Codex's spawn_agent — it follows a written protocol: "Extract task type, determine scale (1-2 files = Small, 3-5 = Medium, 6+ = Large), identify ADR necessity, output structured JSON."
This means the instructions work on any LLM-powered agent system that can read text and follow procedures. In practice, that turned out to be enough. The framework is fundamentally a set of well-written job descriptions, not a set of API integrations.
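That written protocol is concrete enough to express in ordinary code, which is exactly why it ports cleanly. A minimal Python sketch of the scale rule and the structured output — the function names and JSON fields are illustrative, mirroring the session transcript later in the post:

```python
import json

def determine_scale(file_count: int) -> str:
    # The written protocol: 1-2 files = Small, 3-5 = Medium, 6+ = Large
    if file_count <= 2:
        return "small"
    if file_count <= 5:
        return "medium"
    return "large"

def analyze(route: str, file_count: int, needs_adr: bool) -> str:
    # "Output structured JSON", as the agent's instructions specify
    return json.dumps({
        "route": route,
        "scale": determine_scale(file_count),
        "adr": "required" if needs_adr else "not required",
    })

print(analyze("fullstack", 4, False))
```

Any LLM that can follow that description produces the same decision; no platform API appears anywhere in the protocol.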
2. Context Separation as Architecture
The core insight from the original article still applies: each agent runs in a fresh context without inheriting bias from previous steps. The document-reviewer doesn't know what the technical-designer was "thinking" — it just reviews the output. The investigator explores without confirmation bias from whoever reported the bug.
This isn't a Claude Code feature or a Codex feature. It's an architectural pattern that happens to be implementable on both platforms once they support sub-agents.
3. Structured Handoffs Over Shared State
Agents communicate through artifacts (documents, JSON outputs, task files), not through shared memory or conversation threading. The technical-designer writes a Design Doc. The work-planner reads that Design Doc. Neither needs to know which platform spawned the other.
docs/
├── prd/ # PRD artifacts
├── adr/ # Architecture decision records
├── design/ # Design documents
├── plans/ # Work plans
│ └── tasks/ # Atomic task files (1 commit each)
This file-based protocol turned out to be surprisingly platform-agnostic.
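The handoff itself is almost embarrassingly simple: one agent writes an artifact, the next reads it, and the file path is the entire contract. A minimal Python illustration — the paths, file names, and document content are hypothetical, not the framework's actual schema:

```python
import tempfile
from pathlib import Path

def designer_writes(design_dir: Path) -> Path:
    # The technical-designer's entire output is a file on disk
    design_dir.mkdir(parents=True, exist_ok=True)
    doc = design_dir / "login-backend-design.md"
    doc.write_text("# Design Doc\n\nContract: { ok: true } on success\n")
    return doc

def planner_reads(doc: Path) -> str:
    # The work-planner starts from the artifact alone: no shared
    # memory, no knowledge of which platform spawned the designer
    return doc.read_text()

docs_root = Path(tempfile.mkdtemp())  # stand-in for the project's docs/ tree
doc = designer_writes(docs_root / "design")
plan_input = planner_reads(doc)
```

Because the contract is a file, either side can be re-run, reviewed, or swapped to a different CLI without the other side noticing.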
The Framework in Action
codex-workflows ships as an npm package:
npx codex-workflows install
This drops the full agent and skill infrastructure into your project. From there, you invoke recipe skills that orchestrate the sub-agents:
| Recipe | What It Does |
|---|---|
| `$recipe-implement` | Full lifecycle: requirements → design → implementation → quality |
| `$recipe-design` | Requirements → design documents only |
| `$recipe-build` | Execute pre-planned tasks autonomously |
| `$recipe-diagnose` | Problem investigation → root cause → solutions |
| `$recipe-front-build` | React/TypeScript implementation with frontend-specific agents |
| `$recipe-fullstack-implement` | Cross-layer features with layer-aware task routing |
Each recipe automatically loads the right foundational skills (coding standards, testing principles, documentation criteria) and spawns the right sequence of agents with the right stopping points.
Here's what a real session looks like. I ran `$recipe-implement Add a login screen with email/password and a simple auth API` in Codex CLI against a fresh project with placeholder files:
1. Requirement analysis — the first mandatory stop:
• Spawned Plato [requirement-analyzer] (gpt-5.4-mini medium)
• Updated requirement analysis is now confirmed:
- Scope: email/password login form, local mock auth abstraction,
hardcoded success credentials demo@example.com / password123,
inline success message only
- Route: fullstack
- Scale: medium
- ADR: not required
Please confirm this analysis and answer these three questions:
1. Should the auth API be a real HTTP endpoint, or is a local/mock
API abstraction enough?
2. What credentials should succeed initially?
3. On successful login, should it only show success, navigate
somewhere, or persist auth state too?
The workflow stops and waits. No implementation starts until I confirm scope.
2. Parallel design — backend and frontend at the same time:
• Spawned Volta [technical-designer] (gpt-5.4-mini medium)
└ Create a backend Design Doc...
• Spawned Pasteur [technical-designer-frontend] (gpt-5.4-mini medium)
└ Create a frontend Design Doc...
• Spawned Epicurus [document-reviewer] (gpt-5.4-mini medium)
└ Review login-backend-design.md...
• Spawned Bohr [document-reviewer] (gpt-5.4-mini medium)
└ Review login-frontend-design.md...
• Spawned Euclid [design-sync] (gpt-5.4-mini medium)
└ Verify consistency between Design Docs...
Five agents in parallel: two designers, two reviewers, one cross-layer sync checker. Each running in its own fresh context.
3. Design gate — another mandatory stop:
• The design gate is now clean enough to stop for approval.
- Backend design: approved
- Frontend design: approved
- Cross-layer sync: NO_CONFLICTS
Aligned contract:
- Success: { ok: true }
- Failure: { ok: false, error }
- Inline success copy is UI-owned, not backend-owned
Please approve the design docs so I can move to
acceptance-test generation and the work plan.
4. Autonomous execution after batch approval:
› batch approval
• Spawned Ohm [task-decomposer] (gpt-5.4-mini medium)
• Verification passed:
- npm test
- npm run build
After batch approval, the framework decomposed the work plan into tasks and executed them autonomously — no more stopping points until the quality gates pass.
The whole flow from $recipe-implement to green tests took one session. The same flow, the same stopping points, the same agent roles that I've been running on Claude Code for months.
Getting Started
The framework is open source:
- Codex CLI version: codex-workflows
- Claude Code version: claude-code-workflows
If you're already using the Claude Code version, the Codex version follows the same patterns. If you're new to both, pick whichever CLI you're already using — the workflow knowledge transfers either way.
npx codex-workflows install
The whole port amounted to changing config file formats and directory conventions. The agent instructions — the part that actually matters — didn't need a single edit. That's the thing I'd want to know if I were deciding whether to invest time in workflow design for AI coding tools.
If you've been running sub-agent workflows with either Claude Code or Codex CLI, I'd be curious how your setup compares. What worked? What broke?