Võ Vương Thành Đạt

Posted on Jun 1

From Spec to PR: agentflow's Dynamic Multi-Agent Workflow for Claude Code

#ai #webdev #programming #productivity

🚀 TL;DR

agentflow turns Claude Code into a coordinated team of specialized agents. A feature request flows through a fixed pipeline:
BA → branch → plan → implement → review → verify → critic → test → ship
The slow multi-agent middle runs as one dynamic workflow that decomposes the work into file-disjoint waves and fans agents out over it in parallel. A human steps in at only two points: approving the spec and approving the push.

😵 The problem with "one big agent"

Most people use AI coding agents the same way: open a chat, describe a feature, and let a single agent run wild across the codebase. It works for toy tasks. It falls apart on real ones.

You've probably hit all of these:

❌ Symptom	What actually happens
No plan	The agent edits 14 files in random order — and two edits stomp on each other.
No review	Whatever it writes is whatever ships. Nobody checks for a swallowed exception or a leaked secret.
No tests	"Done" means "it compiled once on my machine."
No discipline	It refactors code you didn't ask about, deletes "dead" code that wasn't dead, and force-pushes because it felt confident.

The instinct is to write a longer system prompt. But a single context window doing everything — planning, coding, reviewing, testing, git — is a generalist with no separation of concerns. The reviewer and the author are the same entity, so the review is theater.

💡 agentflow takes the opposite bet: give each concern its own agent, wire them into a fixed pipeline, and keep humans only at the decision points that actually matter.

🗺️ The shape of it

A feature request flows through this pipeline:

  User request
       │
       ▼
  ┌─────────┐   asks questions, writes OpenSpec specs        ⟵ interactive
  │  @ba    │
  └─────────┘
       │
       ▼
  ┌─────────┐   creates feature branch from `dev`            ⟵ interactive
  │ @devops │
  └─────────┘
       │
       ▼
  ┌──────────────────────────────────────────────┐
  │  opsx-feature-core  (background workflow)      │
  │  Plan → Implement → Review → Verify →          │
  │  Critic → Test                                 │
  └──────────────────────────────────────────────┘
       │
       ▼
  ◇ gate: no blocking findings AND tests pass? ◇
       │ no  → surface findings → re-run / fix (max 3 rounds)
       │ yes
       ▼
  user approves push → @devops opens PR → `dev`    ⟵ interactive
       │
       ▼
  📋 report

Notice where the human is: at the two ends only. You approve the spec, and you approve the push. Everything in between — the tedious, error-prone part — runs autonomously. (A workflow can't pause to ask a question, so the interactive ends live outside it on purpose.)

You kick the whole thing off with one command:

/opsx:feature

🧩 Two halves: planning & execution

agentflow pairs two things usually missing from "one big agent" setups.

1️⃣ OpenSpec — structured planning

Before any code is written, the BA agent (running on Opus, because requirements gathering is the high-leverage step) asks you 2–3 rounds of questions, reads your codebase, and writes structured artifacts:

openspec/changes/<change-id>/
├── proposal.md   # what & why
├── design.md     # how
├── tasks.md      # the durable checklist  ← source of truth
└── specs/<capability>/spec.md   # delta requirements + scenarios

This is the contract. The agents downstream don't get to improvise scope — they implement this.

2️⃣ opsx-feature-core — the execution workflow

The heart of agentflow: a single Claude Code workflow that automates everything between "specs exist" and "ready to push." It runs six phases ↓

⚙️ Inside the workflow: 6 phases

`① Plan` — decompose into waves

A dev-lead agent reads the OpenSpec change and emits a structured task plan (not prose). The key idea is file-disjoint, dependency-ordered waves:

🟢 Tasks within a wave touch disjoint files → safe to run in parallel.
🔵 Dependent tasks go in a later wave → prerequisites always finish first.

The plan is forced into a JSON schema so it's machine-consumable:

const PLAN_SCHEMA = {
  type: 'object',
  properties: {
    waves: {
      type: 'array',
      description: 'Ordered waves. Tasks within a wave touch DISJOINT files ' +
        '(safe to run in parallel). Dependent tasks go in a later wave.',
      items: { /* tasks: id, title, description, repo, layer, files, commit */ },
    },
  },
  required: ['waves'],
}
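
The disjoint-files rule is easy to check mechanically before any agent starts. Below is a minimal sketch of such a check, not agentflow's own code: `findFileConflicts` and `examplePlan` are hypothetical, and it assumes each wave is an object with a `tasks` array whose entries carry `id` and `files`.

```javascript
// Hypothetical helper, not part of agentflow.
// Assumes each wave looks like { tasks: [{ id, files: [...] }, ...] }.
function findFileConflicts(plan) {
  const conflicts = [];
  plan.waves.forEach((wave, waveIndex) => {
    const owner = new Map(); // file path -> id of the first task that claimed it
    for (const task of wave.tasks) {
      for (const file of task.files) {
        if (owner.has(file)) {
          // Two tasks in the same wave want the same file: not parallel-safe.
          conflicts.push({ wave: waveIndex, file, tasks: [owner.get(file), task.id] });
        } else {
          owner.set(file, task.id);
        }
      }
    }
  });
  return conflicts;
}

// Illustrative plan: wave 0 is disjoint, wave 1 has two tasks sharing a file.
const examplePlan = {
  waves: [
    { tasks: [
      { id: 'T1', files: ['api/models.py'] },
      { id: 'T2', files: ['web/src/Form.tsx'] },
    ] },
    { tasks: [
      { id: 'T3', files: ['api/routes.py'] },
      { id: 'T4', files: ['api/routes.py', 'api/schemas.py'] },
    ] },
  ],
};

console.log(findFileConflicts(examplePlan));
// → [ { wave: 1, file: 'api/routes.py', tasks: [ 'T3', 'T4' ] } ]
```

A non-empty result would mean the plan has to be re-emitted, or the second task moved to a later wave, before implementation begins.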

🛠️ Design note: an earlier version routed tasks through an external issue DB (Beads) — the planner imported tasks, then DEV agents pulled them back out. Slow, and a dependency on the critical path. Now the planner emits tasks as structured output and the workflow fans agents over it directly, in memory. No external DB blocks the build.

`② Implement` — parallel DEV agents per wave

For each wave, dev-be (Python/FastAPI) and dev-fe (React/Vite) pick up their tasks in parallel — safe, because the planner guaranteed disjoint files. Each task is self-contained; DEV agents never re-read the full spec. They commit on the branch. They never push.

⏱️ Waves run sequentially; tasks within a wave run concurrently. Wall-clock time therefore scales with the number of waves (each wave takes as long as its slowest task), not with the total number of tasks.
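
The scheduling described here can be sketched in a few lines. This is an illustration under assumptions, not the workflow's real implementation: `runWaves`, `estimateWallClock`, and `fakeAgent` are hypothetical names, and a timer stands in for a DEV agent.

```javascript
// Hypothetical scheduler sketch, not agentflow's code.
// `runTask` stands in for dispatching one DEV agent and returns a promise.
async function runWaves(waves, runTask) {
  const results = [];
  for (const wave of waves) {
    // Every task in the wave starts together; the loop only moves on
    // once the slowest one has finished.
    const waveResults = await Promise.all(wave.tasks.map((task) => runTask(task)));
    results.push(...waveResults);
  }
  return results;
}

// Wall-clock estimate: the slowest task of each wave, summed over waves.
function estimateWallClock(waves) {
  return waves.reduce(
    (total, wave) => total + Math.max(...wave.tasks.map((task) => task.ms)),
    0,
  );
}

// Simulated agent: resolves with the task id after `task.ms` milliseconds.
const fakeAgent = (task) =>
  new Promise((resolve) => setTimeout(() => resolve(task.id), task.ms));

const demoWaves = [
  { tasks: [{ id: 'T1', ms: 30 }, { id: 'T2', ms: 10 }] },
  { tasks: [{ id: 'T3', ms: 20 }] },
];

console.log(estimateWallClock(demoWaves)); // 50, versus 60 if run one by one
const demoRun = runWaves(demoWaves, fakeAgent);
demoRun.then((ids) => console.log(ids)); // [ 'T1', 'T2', 'T3' ]
```

`Promise.all` rejects as soon as any task fails, which suits a sketch; a real orchestrator would likely want per-task error handling.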

`③ Review` — four specialists fan out

Once the diff exists, four reviewers run simultaneously, each with a narrow mandate:

👀 Reviewer	Hunts for
`python-reviewer`	Python correctness, idioms, type issues
`typescript-reviewer`	TS/React correctness and patterns
`security-reviewer`	OWASP Top 10, leaked secrets
`silent-failure-hunter`	Swallowed errors, empty `catch`, bad fallbacks

⚠️ The reviewer is not the author. That separation is what makes the review real instead of self-congratulatory.

`④ Verify` — adversarial voting

Reviewers produce false positives, so every CRITICAL/HIGH finding is put to a vote: three independent "perspective lenses" judge whether it is real. A finding is kept only if at least two of the three lenses confirm it; otherwise it is dropped as noise. This phase runs on a cheaper model, since each vote is a focused yes/no judgment rather than open-ended analysis.
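
The voting rule itself is small enough to show directly. A minimal sketch, assuming each finding carries a `severity` string and a `votes` array of booleans (one per lens); these field names and the pass-through of lower severities are assumptions for illustration, not confirmed details of agentflow.

```javascript
// Hypothetical sketch of the 2/3 vote. `votes` holds one boolean per lens
// (true means "this finding is real").
function isConfirmed(votes) {
  const yes = votes.filter(Boolean).length;
  // Integer form of yes / total >= 2/3, which avoids floating-point comparison.
  return yes * 3 >= votes.length * 2;
}

// Only CRITICAL/HIGH findings are voted on; assumption: everything else passes through.
function verifyFindings(findings) {
  return findings.filter((finding) => {
    const needsVote = finding.severity === 'CRITICAL' || finding.severity === 'HIGH';
    return !needsVote || isConfirmed(finding.votes);
  });
}

const sampleFindings = [
  { id: 'F1', severity: 'HIGH', votes: [true, true, false] },      // 2 of 3: kept
  { id: 'F2', severity: 'CRITICAL', votes: [true, false, false] }, // 1 of 3: dropped
  { id: 'F3', severity: 'LOW', votes: [] },                        // not voted on
];

console.log(verifyFindings(sampleFindings).map((finding) => finding.id));
// → [ 'F1', 'F3' ]
```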

`⑤ Critic` — completeness check

A completeness critic asks the opposite question: what did the reviewers miss? It hunts specifically for HIGH/CRITICAL issues that slipped through the four specialists — the "what's not here" pass that single-agent setups never do.

`⑥ Test` — write missing tests, then run them

The tester agent writes the unit tests that should exist for the new code, then runs the full suite. No green suite, no passing the gate.

🚦 The gate, and why it matters

After the workflow runs, there's a hard gate:

  blockingFindings === 0   AND   tests pass   ⇒   ✅ ready to push

If not, the workflow surfaces the findings and loops — re-running or applying targeted fixes — for up to 3 rounds before escalating to you. There is no path where unreviewed, untested code reaches a "ready" state.

And then it stops. agentflow never pushes or opens a PR on its own. It commits locally and waits. Only after you approve does the devops agent push branches and open one PR per repo into dev.

🔒 Leaving work local-only is acceptable. Pushing without permission is not.
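
The gate and its retry loop can be sketched as follows. This is a rough illustration, not the actual workflow: `runUntilGreen`, `runWorkflow`, `applyFixes`, and the `testsPass` field are hypothetical, and only `blockingFindings` and the three-round limit come from the description above. Note that the sketch only reports a status; it never pushes.

```javascript
// Hypothetical sketch of the gate; names other than `blockingFindings` are assumptions.
const MAX_ROUNDS = 3;

function gatePasses(result) {
  return result.blockingFindings === 0 && result.testsPass;
}

// `runWorkflow(round)` and `applyFixes(result)` stand in for the real phases.
async function runUntilGreen(runWorkflow, applyFixes) {
  let result;
  for (let round = 1; round <= MAX_ROUNDS; round++) {
    result = await runWorkflow(round);
    if (gatePasses(result)) {
      // Ready means "waiting for human approval", never "pushed".
      return { status: 'ready-to-push', round, result };
    }
    // Fix between rounds; after the last round, hand over to the human instead.
    if (round < MAX_ROUNDS) await applyFixes(result);
  }
  return { status: 'escalate-to-human', round: MAX_ROUNDS, result };
}

// Simulated runs: the first round has findings, the second is clean.
const simulated = [
  { blockingFindings: 2, testsPass: false },
  { blockingFindings: 0, testsPass: true },
];
const gateDemo = runUntilGreen(
  async (round) => simulated[round - 1],
  async () => {},
);
gateDemo.then((outcome) => console.log(outcome.status, outcome.round));
// → ready-to-push 2
```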

🧱 Discipline as a first-class feature

The part I'm most opinionated about isn't the orchestration — it's the constraints. agentflow bakes in rules that stop agents from doing the annoying things agents do:

✂️ Surgical changes only — touch only what the task requires. Don't "improve" adjacent code. Every changed line traces to the task.
🙋 Surface, don't assume — if a task has multiple interpretations, present them instead of silently picking one.
🏷️ Mention dead code, don't delete it — unrelated dead code gets flagged, not removed.
🛑 Never push without approval — full stop.

These live in a tool-agnostic AGENTS.md (so Cursor, Aider, or Codex read the same rules), which CLAUDE.md imports for Claude Code specifically.

🎛️ Skills: changing how agents work

Beyond the agents, agentflow ships ~48 behavioral skills — small instruction modules that change how an agent works rather than what it does. Each agent loads only what's relevant to its role:

🧪 Engineering discipline — test-driven-development, systematic-debugging, verification-before-completion
🐍 Backend — python-patterns, fastapi-templates, postgres-patterns, api-design-principles
⚛️ Frontend & design — frontend-patterns, vercel-react-best-practices, web-design-guidelines
🔐 Review & security — requesting-code-review, security-review

Plus 17 single-purpose commands — /polish · /harden · /audit · /simplify · /optimize and more — for tightening existing UI or code on demand.

📦 Try it in 3 steps

agentflow is a template — nothing is tied to a specific product. You copy it in and replace a handful of placeholders.

# 1. Clone agentflow
git clone https://github.com/VoVuongThanhDat/agentflow.git

# 2. Copy the toolkit into your project
cp -r agentflow/.claude   /path/to/your-project/.claude
cp    agentflow/CLAUDE.md  /path/to/your-project/CLAUDE.md
cp    agentflow/AGENTS.md  /path/to/your-project/AGENTS.md

# 3. Fill in the Target Repos table + placeholders in AGENTS.md, then:
#    /opsx:feature

Replace <PROJECT_NAME> / <specs-repo> / <backend-repo> / <frontend-repo>, fill the Target Repos table (one row per repo agents may touch), then run /opsx:feature, describe what you want, approve the spec, and let the pipeline run. Approve the push when the gate goes green.

🧷 Single repo? Treat the specs repo and your code repo as the same directory — the agents handle both layouts.

🤔 Why I built it this way

The lesson, for me: autonomy and control aren't opposites — they're a question of where you put the human.

Put a human in the middle of every code edit → you've built a slow pair-programmer.
Remove the human entirely → you've built a liability.

agentflow puts the human at exactly two points — the spec and the push — and makes the machine earn its way past a real gate in between.

🔗 Links

⭐ Repo: github.com/VoVuongThanhDat/agentflow
🧩 Built with OpenSpec · Superpowers · Claude Code
📄 MIT licensed

If you try it, I'd love to hear what your pipeline produced — drop a comment! 👇
