From Spec to PR: agentflow's Dynamic Multi-Agent Workflow for Claude Code

🚀 TL;DR

agentflow turns Claude Code into a coordinated team of specialized agents. A feature request flows through a fixed pipeline:
BA → branch → plan → implement → review → verify → test → ship
The slow multi-agent middle runs as one dynamic workflow that decomposes the work into file-disjoint waves and fans agents over it in parallel. A human only steps in at two points: approving the spec, and approving the push.



😡 The problem with "one big agent"

Most people use AI coding agents the same way: open a chat, describe a feature, and let a single agent run wild across the codebase. It works for toy tasks. It falls apart on real ones.

You've probably hit all of these:

| ❌ Symptom | What actually happens |
|---|---|
| No plan | The agent edits 14 files in random order, and two edits stomp on each other. |
| No review | Whatever it writes is whatever ships. Nobody checks for a swallowed exception or a leaked secret. |
| No tests | "Done" means "it compiled once on my machine." |
| No discipline | It refactors code you didn't ask about, deletes "dead" code that wasn't dead, and force-pushes because it felt confident. |

The instinct is to write a longer system prompt. But a single context window doing everything (planning, coding, reviewing, testing, git) is a generalist with no separation of concerns. The reviewer and the author are the same entity, so the review is theater.

💡 agentflow takes the opposite bet: give each concern its own agent, wire them into a fixed pipeline, and keep humans only at the decision points that actually matter.


🗺️ The shape of it

A feature request flows through this pipeline:

  User request
       │
       ▼
  ┌─────────┐   asks questions, writes OpenSpec specs        ⟵ interactive
  │  @ba    │
  └─────────┘
       │
       ▼
  ┌─────────┐   creates feature branch from `dev`            ⟵ interactive
  │ @devops │
  └─────────┘
       │
       ▼
  ┌────────────────────────────────────────────────┐
  │  opsx-feature-core  (background workflow)      │
  │  Plan → Implement → Review → Verify →          │
  │  Critic → Test                                 │
  └────────────────────────────────────────────────┘
       │
       ▼
  ◇ gate: no blocking findings AND tests pass? ◇
       │ no  → surface findings → re-run / fix (max 3 rounds)
       │ yes
       ▼
  user approves push → @devops opens PR → `dev`    ⟵ interactive
       │
       ▼
  📋 report

Notice where the human is: at the two ends only. You approve the spec, and you approve the push. Everything in between (the tedious, error-prone part) runs autonomously. (A workflow can't pause to ask a question, so the interactive ends live outside it on purpose.)

You kick the whole thing off with one command:

/opsx:feature

🧩 Two halves: planning & execution

agentflow pairs two things usually missing from "one big agent" setups.

1️⃣ OpenSpec: structured planning

Before any code is written, the BA agent (running on Opus, because requirements gathering is the high-leverage step) asks you 2–3 rounds of questions, reads your codebase, and writes structured artifacts:

openspec/changes/<change-id>/
├── proposal.md   # what & why
├── design.md     # how
├── tasks.md      # the durable checklist  ← source of truth
└── specs/<capability>/spec.md   # delta requirements + scenarios

This is the contract. The agents downstream don't get to improvise scope; they implement this.

2️⃣ opsx-feature-core: the execution workflow

The heart of agentflow: a single Claude Code workflow that automates everything between "specs exist" and "ready to push." It runs six phases ↓


⚙️ Inside the workflow: 6 phases

① Plan: decompose into waves

A dev-lead agent reads the OpenSpec change and emits a structured task plan (not prose). The key idea is file-disjoint, dependency-ordered waves:

  • 🟢 Tasks within a wave touch disjoint files → safe to run in parallel.
  • 🔵 Dependent tasks go in a later wave → prerequisites always finish first.

The plan is forced into a JSON schema so it's machine-consumable:

const PLAN_SCHEMA = {
  type: 'object',
  properties: {
    waves: {
      type: 'array',
      description: 'Ordered waves. Tasks within a wave touch DISJOINT files ' +
        '(safe to run in parallel). Dependent tasks go in a later wave.',
      items: { /* tasks: id, title, description, repo, layer, files, commit */ },
    },
  },
  required: ['waves'],
}
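The "disjoint files within a wave" rule is easy to check mechanically. Here is a minimal sketch of such a check, not agentflow's actual code. It assumes each wave looks like `{ tasks: [...] }` and each task carries an `id` and a `files` array, which is my reading of the schema comment above; `findWaveConflicts` and `samplePlan` are hypothetical names.

```javascript
// Hypothetical helper, not part of agentflow.
// Assumes wave = { tasks: [...] } and task = { id, files: [...] }.
function findWaveConflicts(plan) {
  const conflicts = [];
  plan.waves.forEach((wave, waveIndex) => {
    const owner = new Map(); // file path -> id of the first task that claimed it
    for (const task of wave.tasks) {
      for (const file of task.files) {
        if (owner.has(file)) {
          conflicts.push({ wave: waveIndex, file, tasks: [owner.get(file), task.id] });
        } else {
          owner.set(file, task.id);
        }
      }
    }
  });
  return conflicts;
}

// Illustrative plan: wave 1 is invalid because T3 and T4 both edit api/routes.py.
const samplePlan = {
  waves: [
    { tasks: [
      { id: 'T1', files: ['api/models.py'] },
      { id: 'T2', files: ['web/src/Form.tsx'] },
    ] },
    { tasks: [
      { id: 'T3', files: ['api/routes.py', 'api/models.py'] },
      { id: 'T4', files: ['api/routes.py'] },
    ] },
  ],
};

console.log(findWaveConflicts(samplePlan));
```

Note that T3 touching `api/models.py` is not flagged: T1 edited that file in an earlier wave, and the rule only forbids overlap inside a single wave.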

🛠️ Design note: an earlier version routed tasks through an external issue DB (Beads): the planner imported tasks, then DEV agents pulled them back out. That was slow, and it put a dependency on the critical path. Now the planner emits tasks as structured output and the workflow fans agents over it directly, in memory. No external DB blocks the build.

② Implement: parallel DEV agents per wave

For each wave, dev-be (Python/FastAPI) and dev-fe (React/Vite) pick up their tasks in parallel, which is safe because the planner guaranteed disjoint files. Each task is self-contained; DEV agents never re-read the full spec. They commit on the branch. They never push.

⏱️ Waves run sequentially; tasks within a wave run concurrently. Wall-clock time scales with the number of waves (each wave lasts as long as its slowest task), not with the number of tasks.
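To make the "sequential waves, concurrent tasks" shape concrete, here is a rough sketch in plain JavaScript. It is not the workflow's real code: `runWaves`, `fakeAgent`, and `demoPlan` are hypothetical, and `runTask` stands in for however a DEV agent actually gets dispatched.

```javascript
// Hypothetical runner: one await per wave, all tasks of a wave in flight at once.
async function runWaves(plan, runTask) {
  const results = [];
  for (const wave of plan.waves) {
    // The next wave does not start until every task in this one has finished.
    const waveResults = await Promise.all(wave.tasks.map((task) => runTask(task)));
    results.push(waveResults);
  }
  return results;
}

// Fake agent that only sleeps, so the timing is visible.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const fakeAgent = async (task) => {
  await sleep(50);
  return `${task.id} done`;
};

const demoPlan = {
  waves: [
    { tasks: [{ id: 'T1' }, { id: 'T2' }, { id: 'T3' }] },
    { tasks: [{ id: 'T4' }] },
  ],
};

const started = Date.now();
const demoRun = runWaves(demoPlan, fakeAgent).then((results) => {
  console.log(results, `${Date.now() - started} ms elapsed`);
  return results;
});
```

With four tasks in two waves, the elapsed time should land near two sleeps rather than four, which is the whole point of grouping by wave.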

③ Review: four specialists fan out

Once the diff exists, four reviewers run simultaneously, each with a narrow mandate:

| 👀 Reviewer | Hunts for |
|---|---|
| python-reviewer | Python correctness, idioms, type issues |
| typescript-reviewer | TS/React correctness and patterns |
| security-reviewer | OWASP Top 10, leaked secrets |
| silent-failure-hunter | Swallowed errors, empty catch, bad fallbacks |

⚠️ The reviewer is not the author. That separation is what makes the review real instead of self-congratulatory.
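As an illustration of the fan-out, here is one way four reviewers could run at once and have their findings merged. This is a sketch under my own assumptions, not agentflow's implementation: `reviewDiff` and `fakeReviewer` are hypothetical, the finding shape (`severity`, `message`) is invented for the example, and using `Promise.allSettled` so that one crashed reviewer doesn't discard the others' findings is my design choice.

```javascript
// Reviewer names come from the table above; everything else is hypothetical.
const REVIEWERS = [
  'python-reviewer',
  'typescript-reviewer',
  'security-reviewer',
  'silent-failure-hunter',
];

async function reviewDiff(diff, runReviewer) {
  // allSettled waits for every reviewer, whether it resolved or threw.
  const settled = await Promise.allSettled(
    REVIEWERS.map((name) => runReviewer(name, diff)),
  );
  const findings = [];
  const failedReviewers = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      for (const finding of outcome.value) {
        findings.push({ reviewer: REVIEWERS[i], ...finding });
      }
    } else {
      failedReviewers.push(REVIEWERS[i]);
    }
  });
  return { findings, failedReviewers };
}

// Fake reviewer for demonstration only.
const fakeReviewer = async (name, diff) => {
  if (name === 'typescript-reviewer') throw new Error('reviewer crashed');
  if (name === 'security-reviewer' && diff.includes('API_KEY=')) {
    return [{ severity: 'HIGH', message: 'possible hardcoded secret' }];
  }
  return [];
};

const reviewRun = reviewDiff('+ API_KEY="abc123"', fakeReviewer);
reviewRun.then((report) => console.log(report));
```

Reporting `failedReviewers` separately keeps a crashed reviewer from being mistaken for a clean review.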

④ Verify: adversarial voting

Reviewers produce false positives, so every CRITICAL/HIGH finding is put to a vote: three independent "perspective lenses" judge whether it's real. A finding is kept only if at least 2 of the 3 lenses vote that it's real; otherwise it's dropped as noise. The vote runs on a cheaper model, since it's a focused yes/no call rather than open-ended analysis.
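The voting rule itself is tiny. A minimal sketch, assuming findings carry an `id` and a `severity`, and that each lens is a yes/no judge: the lens names (`lens-a` and so on), `verifyFindings`, and `fakeJudge` are placeholders I made up, and letting lower-severity findings skip the vote is my assumption, since the article only says CRITICAL/HIGH are voted on.

```javascript
// Placeholder lens names; the real lenses are not described in this post.
const DEFAULT_LENSES = ['lens-a', 'lens-b', 'lens-c'];

async function verifyFindings(findings, judge, lenses = DEFAULT_LENSES) {
  const kept = [];
  for (const finding of findings) {
    const needsVote = finding.severity === 'CRITICAL' || finding.severity === 'HIGH';
    if (!needsVote) {
      kept.push(finding); // assumption: lower severities pass through unvoted
      continue;
    }
    const votes = await Promise.all(lenses.map((lens) => judge(lens, finding)));
    const yes = votes.filter((vote) => vote === true).length;
    // Keep only on a two-thirds majority (2 of 3 with the default lenses).
    if (yes * 3 >= lenses.length * 2) {
      kept.push({ ...finding, votes: `${yes}/${lenses.length}` });
    }
  }
  return kept;
}

// Scripted ballots: F1 gets two yes votes, F2 gets one.
const ballots = { F1: ['lens-a', 'lens-b'], F2: ['lens-a'] };
const fakeJudge = async (lens, finding) => ballots[finding.id].includes(lens);

const verifyRun = verifyFindings(
  [
    { id: 'F1', severity: 'HIGH' },
    { id: 'F2', severity: 'CRITICAL' },
    { id: 'F3', severity: 'LOW' },
  ],
  fakeJudge,
);
verifyRun.then((kept) => console.log(kept.map((f) => f.id))); // F1 and F3 survive
```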

⑤ Critic: completeness check

A completeness critic asks the opposite question: what did the reviewers miss? It hunts specifically for HIGH/CRITICAL issues that slipped through the four specialists: the "what's not here" pass that single-agent setups never do.

⑥ Test: write missing tests, then run them

The tester agent writes the unit tests that should exist for the new code, then runs the full suite. No green suite, no gate.


🚦 The gate, and why it matters

After the workflow runs, there's a hard gate:

  blockingFindings === 0   AND   tests pass   ⇒   ✅ ready to push

If not, the workflow surfaces the findings and loops, re-running or applying targeted fixes, for up to 3 rounds before escalating to you. There is no path where unreviewed, untested code reaches a "ready" state.
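In code, the gate plus the retry loop could look roughly like this. It's a sketch, not the real orchestration: `runUntilGreen`, `runWorkflow`, and `applyFixes` are hypothetical names, the `testsPass` field is invented, and I'm assuming the first run counts as round 1 of the 3.

```javascript
const MAX_ROUNDS = 3;

// The gate from above: zero blocking findings AND a passing test suite.
function gatePasses(result) {
  return result.blockingFindings === 0 && result.testsPass === true;
}

async function runUntilGreen(runWorkflow, applyFixes) {
  let result;
  for (let round = 1; round <= MAX_ROUNDS; round += 1) {
    result = await runWorkflow(round);
    if (gatePasses(result)) {
      return { status: 'ready-to-push', round, result };
    }
    if (round < MAX_ROUNDS) {
      await applyFixes(result); // targeted fixes before the next attempt
    }
  }
  // Out of rounds: hand the findings to the human instead of pushing.
  return { status: 'escalate-to-human', round: MAX_ROUNDS, result };
}

// Scripted run: round 1 fails the gate, round 2 passes it.
const scripted = [
  { blockingFindings: 2, testsPass: false },
  { blockingFindings: 0, testsPass: true },
];
const gateRun = runUntilGreen(
  async (round) => scripted[round - 1],
  async () => {},
);
gateRun.then((outcome) => console.log(outcome.status, 'after round', outcome.round));
```

The function only ever returns one of two states, and neither one pushes; that stays a human decision.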

And then it stops. agentflow never pushes or opens a PR on its own. It commits locally and waits. Only after you approve does the devops agent push branches and open one PR per repo into dev.

🔒 Leaving work local-only is acceptable. Pushing without permission is not.


🧱 Discipline as a first-class feature

The part I'm most opinionated about isn't the orchestration; it's the constraints. agentflow bakes in rules that stop agents from doing the annoying things agents do:

  • ✂️ Surgical changes only: touch only what the task requires. Don't "improve" adjacent code. Every changed line traces to the task.
  • 🙋 Surface, don't assume: if a task has multiple interpretations, present them instead of silently picking one.
  • 🏷️ Mention dead code, don't delete it: unrelated dead code gets flagged, not removed.
  • 🛑 Never push without approval, full stop.

These live in a tool-agnostic AGENTS.md (so Cursor, Aider, or Codex read the same rules), which CLAUDE.md imports for Claude Code specifically.


🎛️ Skills: changing how agents work

Beyond the agents, agentflow ships ~48 behavioral skills: small instruction modules that change how an agent works rather than what it does. Each agent loads only what's relevant to its role:

  • 🧪 Engineering discipline: test-driven-development, systematic-debugging, verification-before-completion
  • 🐍 Backend: python-patterns, fastapi-templates, postgres-patterns, api-design-principles
  • ⚛️ Frontend & design: frontend-patterns, vercel-react-best-practices, web-design-guidelines
  • 🔍 Review & security: requesting-code-review, security-review

Plus 17 single-purpose commands (/polish · /harden · /audit · /simplify · /optimize and more) for tightening existing UI or code on demand.


📦 Try it in 3 steps

agentflow is a template; nothing is tied to a specific product. You copy it in and replace a handful of placeholders.

# 1. Clone agentflow
git clone https://github.com/VoVuongThanhDat/agentflow.git

# 2. Copy the toolkit into your project
cp -r agentflow/.claude   /path/to/your-project/.claude
cp    agentflow/CLAUDE.md  /path/to/your-project/CLAUDE.md
cp    agentflow/AGENTS.md  /path/to/your-project/AGENTS.md

# 3. Fill in the Target Repos table + placeholders in AGENTS.md, then:
#    /opsx:feature

Replace <PROJECT_NAME> / <specs-repo> / <backend-repo> / <frontend-repo>, fill the Target Repos table (one row per repo agents may touch), then run /opsx:feature, describe what you want, approve the spec, and let the pipeline run. Approve the push when the gate goes green.

🧷 Single repo? Treat the specs repo and your code repo as the same directory; the agents handle both layouts.


🤔 Why I built it this way

The lesson, for me: autonomy and control aren't opposites; they're a question of where you put the human.

Put a human in the middle of every code edit → you've built a slow pair-programmer.
Remove the human entirely → you've built a liability.

agentflow puts the human at exactly two points, the spec and the push, and makes the machine earn its way past a real gate in between.


🔗 Links

If you try it, I'd love to hear what your pipeline produced. Drop a comment! 👇
