TL;DR
agentflow turns Claude Code into a coordinated team of specialized agents. A feature request flows through a fixed pipeline:
BA → branch → plan → implement → review → verify → test → ship
The slow multi-agent middle runs as one dynamic workflow that decomposes the work into file-disjoint waves and fans agents out over them in parallel. A human steps in at only two points: approving the spec and approving the push.
Table of Contents
- The problem with "one big agent"
- The shape of it
- Two halves: planning & execution
- Inside the workflow: 6 phases
- The gate
- Discipline as a feature
- Skills
- Try it
- Why I built it this way
The problem with "one big agent"
Most people use AI coding agents the same way: open a chat, describe a feature, and let a single agent run wild across the codebase. It works for toy tasks. It falls apart on real ones.
You've probably hit all of these:
| Symptom | What actually happens |
|---|---|
| No plan | The agent edits 14 files in random order, and two edits stomp on each other. |
| No review | Whatever it writes is whatever ships. Nobody checks for a swallowed exception or a leaked secret. |
| No tests | "Done" means "it compiled once on my machine." |
| No discipline | It refactors code you didn't ask about, deletes "dead" code that wasn't dead, and force-pushes because it felt confident. |
The instinct is to write a longer system prompt. But a single context window doing everything (planning, coding, reviewing, testing, git) is a generalist with no separation of concerns. The reviewer and the author are the same entity, so the review is theater.
agentflow takes the opposite bet: give each concern its own agent, wire them into a fixed pipeline, and keep humans only at the decision points that actually matter.
The shape of it
A feature request flows through this pipeline:
User request
    │
    ▼
┌─────────┐   asks questions, writes OpenSpec specs      ⟵ interactive
│   @ba   │
└─────────┘
    │
    ▼
┌─────────┐   creates feature branch from `dev`          ⟵ interactive
│ @devops │
└─────────┘
    │
    ▼
┌──────────────────────────────────────────────┐
│  opsx-feature-core   (background workflow)   │
│  Plan → Implement → Review → Verify →        │
│  Critic → Test                               │
└──────────────────────────────────────────────┘
    │
    ▼
gate: no blocking findings AND tests pass?
    │  no  → surface findings → re-run / fix (max 3 rounds)
    │  yes
    ▼
user approves push → @devops opens PR → `dev`            ⟵ interactive
    │
    ▼
report
Notice where the human is: at the two ends only. You approve the spec, and you approve the push. Everything in between, the tedious and error-prone part, runs autonomously. (A workflow can't pause to ask a question, so the interactive ends live outside it on purpose.)
You kick the whole thing off with one command:
/opsx:feature
Two halves: planning & execution
agentflow pairs two things usually missing from "one big agent" setups.
1️⃣ OpenSpec: structured planning
Before any code is written, the BA agent (running on Opus, because requirements gathering is the high-leverage step) asks you 2–3 rounds of questions, reads your codebase, and writes structured artifacts:
openspec/changes/<change-id>/
├── proposal.md                  # what & why
├── design.md                    # how
├── tasks.md                     # the durable checklist (source of truth)
└── specs/<capability>/spec.md   # delta requirements + scenarios
This is the contract. The agents downstream don't get to improvise scope; they implement this.
2️⃣ opsx-feature-core: the execution workflow
The heart of agentflow: a single Claude Code workflow that automates everything between "specs exist" and "ready to push." It runs six phases, covered in the next section.
Inside the workflow: 6 phases
① Plan: decompose into waves
A dev-lead agent reads the OpenSpec change and emits a structured task plan (not prose). The key idea is file-disjoint, dependency-ordered waves:
- Tasks within a wave touch disjoint files, so they are safe to run in parallel.
- Dependent tasks go in a later wave, so prerequisites always finish first.
The plan is forced into a JSON schema so it's machine-consumable:
const PLAN_SCHEMA = {
  type: 'object',
  properties: {
    waves: {
      type: 'array',
      description: 'Ordered waves. Tasks within a wave touch DISJOINT files ' +
        '(safe to run in parallel). Dependent tasks go in a later wave.',
      items: { /* tasks: id, title, description, repo, layer, files, commit */ },
    },
  },
  required: ['waves'],
};
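A JSON schema can force the shape of the plan, but it cannot by itself prove that tasks in a wave really touch disjoint files. The sketch below shows one way that invariant could be checked before any agent starts. It is an illustration, not agentflow's code: `validatePlan` is a hypothetical helper, and it assumes each wave carries a `tasks` array whose entries have the `id` and `files` fields named in the schema comment.

```javascript
// Hypothetical helper (not part of agentflow): verifies that no two tasks
// in the same wave claim the same file. Assumes wave.tasks[*].id and
// wave.tasks[*].files exist, as hinted by the schema comment.
function validatePlan(plan) {
  const errors = [];

  plan.waves.forEach((wave, waveIndex) => {
    const fileOwner = new Map(); // file path -> id of the task that claimed it

    for (const task of wave.tasks) {
      for (const file of task.files) {
        if (fileOwner.has(file)) {
          errors.push(
            `wave ${waveIndex}: ${file} is claimed by both ` +
              `${fileOwner.get(file)} and ${task.id}`
          );
        } else {
          fileOwner.set(file, task.id);
        }
      }
    }
  });

  return errors; // an empty array means every wave is file-disjoint
}
```

The same file may appear again in a later wave; only overlap inside a single wave counts as a conflict, because waves never run at the same time.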
Design note: an earlier version routed tasks through an external issue DB (Beads). The planner imported tasks, then DEV agents pulled them back out. That was slow, and it put a dependency on the critical path. Now the planner emits tasks as structured output and the workflow fans agents over it directly, in memory. No external DB blocks the build.
② Implement: parallel DEV agents per wave
For each wave, dev-be (Python/FastAPI) and dev-fe (React/Vite) pick up their tasks in parallel. This is safe because the planner guaranteed disjoint files. Each task is self-contained; DEV agents never re-read the full spec. They commit on the branch. They never push.
Waves run sequentially; tasks within a wave run concurrently. Wall-clock cost scales with the number of waves, not the number of tasks.
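The "sequential waves, concurrent tasks" rule maps naturally onto a loop around `Promise.all`. This is a minimal sketch of that scheduling pattern only; `runWaves` and the `runTask` callback (a stand-in for dispatching one DEV agent) are hypothetical names, and the real workflow may schedule agents differently.

```javascript
// Sketch of the scheduling pattern, not agentflow's implementation.
// `runTask` is a hypothetical async callback that runs one task.
async function runWaves(waves, runTask) {
  const results = [];

  for (const wave of waves) {
    // All tasks in this wave start together...
    const waveResults = await Promise.all(
      wave.tasks.map((task) => runTask(task))
    );
    // ...and the next wave starts only after every one of them has finished.
    results.push(...waveResults);
  }

  return results;
}
```

Because the loop awaits each wave before starting the next, total time is roughly the sum of the slowest task in each wave, which is why adding tasks to an existing wave is cheap and adding a wave is not.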
③ Review: four specialists fan out
Once the diff exists, four reviewers run simultaneously, each with a narrow mandate:
| Reviewer | Hunts for |
|---|---|
| `python-reviewer` | Python correctness, idioms, type issues |
| `typescript-reviewer` | TS/React correctness and patterns |
| `security-reviewer` | OWASP Top 10, leaked secrets |
| `silent-failure-hunter` | Swallowed errors, empty catch, bad fallbacks |
The reviewer is not the author. That separation is what makes the review real instead of self-congratulatory.
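One way to picture the fan-out is four concurrent calls whose results are merged into a single findings list. The sketch below uses `Promise.allSettled` so that one crashing reviewer does not discard what the other three found. The reviewer names come from the table above; `reviewDiff`, the `runReviewer` callback, and the shape of a finding are assumptions made for illustration.

```javascript
const REVIEWERS = [
  'python-reviewer',
  'typescript-reviewer',
  'security-reviewer',
  'silent-failure-hunter',
];

// Hypothetical sketch: `runReviewer(name, diff)` is assumed to be an async
// function that resolves to an array of finding objects for one reviewer.
async function reviewDiff(diff, runReviewer) {
  const settled = await Promise.allSettled(
    REVIEWERS.map((name) => runReviewer(name, diff))
  );

  const findings = [];
  const failedReviewers = [];

  settled.forEach((outcome, index) => {
    const reviewer = REVIEWERS[index];
    if (outcome.status === 'fulfilled') {
      // Tag each finding with the reviewer that raised it.
      for (const finding of outcome.value) findings.push({ reviewer, ...finding });
    } else {
      // Keep track of reviewers that crashed so they can be re-run.
      failedReviewers.push(reviewer);
    }
  });

  return { findings, failedReviewers };
}
```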
④ Verify: adversarial voting
Reviewers produce false positives. So every CRITICAL/HIGH finding is put to a vote: three independent "perspective lenses" judge whether it's real. A finding is kept only on a ≥ 2/3 vote; otherwise it is dropped as noise. This phase runs on a cheaper model, because each vote is a focused yes/no judgment rather than open-ended analysis.
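The voting rule is small enough to write down. The sketch below is an illustration under stated assumptions: the lens names are placeholders (the post does not name them), `judge` is a hypothetical callback returning `true` when a lens believes the finding is real, and findings below HIGH are passed through untouched, which is a guess about behavior the post does not specify. A real judge would be an asynchronous model call; it is synchronous here to keep the rule itself in focus.

```javascript
// Placeholder lens names; the actual perspectives are not given in the post.
const LENSES = ['lens-a', 'lens-b', 'lens-c'];
const VOTED_SEVERITIES = new Set(['CRITICAL', 'HIGH']);

// True when at least two thirds of the votes are "yes".
// Integer arithmetic avoids floating-point comparison against 2/3.
function passesVote(votes) {
  const yes = votes.filter(Boolean).length;
  return votes.length > 0 && yes * 3 >= votes.length * 2;
}

// `judge(finding, lens)` is a hypothetical callback returning a boolean.
function verifyFindings(findings, judge) {
  return findings.filter((finding) => {
    // Assumption: only CRITICAL/HIGH findings are voted on.
    if (!VOTED_SEVERITIES.has(finding.severity)) return true;
    const votes = LENSES.map((lens) => judge(finding, lens));
    return passesVote(votes);
  });
}
```

With three lenses, the threshold means a finding survives with two or three "yes" votes and is dropped with zero or one.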
⑤ Critic: completeness check
A completeness critic asks the opposite question: what did the reviewers miss? It hunts specifically for HIGH/CRITICAL issues that slipped through the four specialists. This is the "what's not here" pass that single-agent setups never do.
⑥ Test: write missing tests, then run them
The tester agent writes the unit tests that should exist for the new code, then runs the full suite. No green suite, no gate.
The gate, and why it matters
After the workflow runs, there's a hard gate:
blockingFindings === 0 AND tests pass → ready to push
If not, the workflow surfaces the findings and loops, re-running or applying targeted fixes, for up to 3 rounds before escalating to you. There is no path where unreviewed, untested code reaches a "ready" state.
And then it stops. agentflow never pushes or opens a PR on its own. It commits locally and waits. Only after you approve does the devops agent push branches and open one PR per repo into dev.
Leaving work local-only is acceptable. Pushing without permission is not.
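The gate and its retry loop can be sketched in a few lines. This is an illustration rather than agentflow's code: `runUntilGreen`, `runWorkflow`, and `applyFixes` are hypothetical names, the result shape (`blockingFindings`, `testsPass`) is assumed from the gate condition above, and "up to 3 rounds" is read here as three workflow runs in total.

```javascript
const MAX_ROUNDS = 3;

// The gate condition from the post: zero blocking findings AND a green suite.
function gatePasses(result) {
  return result.blockingFindings === 0 && result.testsPass === true;
}

// `runWorkflow` and `applyFixes` are hypothetical async callbacks.
async function runUntilGreen(runWorkflow, applyFixes) {
  for (let round = 1; round <= MAX_ROUNDS; round++) {
    const result = await runWorkflow(round);
    if (gatePasses(result)) {
      // "Ready" only means ready: nothing is pushed without user approval.
      return { status: 'ready-to-push', round };
    }
    // No point fixing after the final round; the user takes over instead.
    if (round < MAX_ROUNDS) await applyFixes(result);
  }
  return { status: 'escalate-to-user', round: MAX_ROUNDS };
}
```

Note that neither outcome pushes anything. Both return a status for a human to act on, which mirrors the rule that the push is always a separate, approved step.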
Discipline as a first-class feature
The part I'm most opinionated about isn't the orchestration; it's the constraints. agentflow bakes in rules that stop agents from doing the annoying things agents do:
- Surgical changes only: touch only what the task requires. Don't "improve" adjacent code. Every changed line traces to the task.
- Surface, don't assume: if a task has multiple interpretations, present them instead of silently picking one.
- Mention dead code, don't delete it: unrelated dead code gets flagged, not removed.
- Never push without approval: full stop.
These live in a tool-agnostic AGENTS.md (so Cursor, Aider, or Codex read the same rules), which CLAUDE.md imports for Claude Code specifically.
Skills: changing how agents work
Beyond the agents, agentflow ships ~48 behavioral skills: small instruction modules that change how an agent works rather than what it does. Each agent loads only what's relevant to its role:
- Engineering discipline: `test-driven-development`, `systematic-debugging`, `verification-before-completion`
- Backend: `python-patterns`, `fastapi-templates`, `postgres-patterns`, `api-design-principles`
- Frontend & design: `frontend-patterns`, `vercel-react-best-practices`, `web-design-guidelines`
- Review & security: `requesting-code-review`, `security-review`
Plus 17 single-purpose commands (/polish · /harden · /audit · /simplify · /optimize and more) for tightening existing UI or code on demand.
Try it in 3 steps
agentflow is a template; nothing is tied to a specific product. You copy it in and replace a handful of placeholders.
# 1. Clone agentflow
git clone https://github.com/VoVuongThanhDat/agentflow.git
# 2. Copy the toolkit into your project
cp -r agentflow/.claude /path/to/your-project/.claude
cp agentflow/CLAUDE.md /path/to/your-project/CLAUDE.md
cp agentflow/AGENTS.md /path/to/your-project/AGENTS.md
# 3. Fill in the Target Repos table + placeholders in AGENTS.md, then:
# /opsx:feature
Replace <PROJECT_NAME> / <specs-repo> / <backend-repo> / <frontend-repo>, fill the Target Repos table (one row per repo agents may touch), then run /opsx:feature, describe what you want, approve the spec, and let the pipeline run. Approve the push when the gate goes green.
Single repo? Treat the specs repo and your code repo as the same directory. The agents handle both layouts.
Why I built it this way
The lesson, for me: autonomy and control aren't opposites. They're a question of where you put the human.
Put a human in the middle of every code edit and you've built a slow pair-programmer.
Remove the human entirely and you've built a liability.
agentflow puts the human at exactly two points, the spec and the push, and makes the machine earn its way past a real gate in between.
Links
- Repo: github.com/VoVuongThanhDat/agentflow
- Built with OpenSpec · Superpowers · Claude Code
- MIT licensed
If you try it, I'd love to hear what your pipeline produced. Drop a comment!