DEV Community

林宗賢


I built a 9-agent AI dev team in a Claude Code plugin — here's what happened

The moment I realized AI coding assistants were broken

I was building a side project: a simple task manager app. I opened Claude Code and typed:

"Add user authentication with email and password login"

…and hit enter.

Twenty minutes later, I had code. A lot of code. Authentication logic, routes, middleware, even some basic tests.

But there was a problem.

The frontend (written by me, on a different day) assumed a different API shape. The tests only covered the happy path. There was no architecture decision to reference; I had just picked JWT because it felt right. And the docker-compose.yml? It didn't exist yet.

I had AI-generated code, but no real software development workflow.


What was actually missing

Good software isn't just code. Before, during, and after writing it, you need:

  • A spec that everyone (including future-you) agrees on
  • An architecture decision that explains the why
  • Backend and frontend designed to talk to each other
  • Tests that prove things actually work
  • A code review that catches security holes before they ship
  • A deployment config that someone can actually run

Normally, a team handles all of this. A PM writes the spec. An architect proposes options. Engineers implement and review each other's work. A DevOps person sets up CI/CD.

What if AI could fill all those roles?


Building the pipeline

I built claude-dev-pipeline — a Claude Code plugin that orchestrates a team of specialized AI agents, each with a specific job.


The idea was simple: instead of one big AI doing everything, use multiple agents in sequence — each expert at one job — with you approving the output at every important gate.

Nine agents, nine phases:

| #  | Agent       | Role                                                             | Output                   |
|----|-------------|------------------------------------------------------------------|--------------------------|
| 0  | Discovery   | Clarifies vague requirements via dialogue                        | Confirmed requirement    |
| 1  | Exploration | Scans existing codebase in parallel                              | .pipeline/exploration.md |
| 2  | PM          | Writes a structured PRD with user stories & acceptance criteria  | .pipeline/pm.md          |
| 3  | Architect   | Proposes 2–3 architecture options with trade-offs                | .pipeline/architect.md   |
| 4a | Backend     | Implements REST APIs, services, repositories                     | src/backend/             |
| 4b | Frontend    | Implements React UI, hooks, API client                           | src/frontend/            |
| 5  | QA          | Writes and runs unit, integration, and E2E tests                 | tests/                   |
| 6  | Reviewer    | Audits code for security, bugs, and quality (confidence ≥ 80)    | .pipeline/review.md      |
| 7  | DevOps      | Creates Dockerfile, docker-compose, GitHub Actions CI/CD         | deploy/                  |

Nine agents and their roles

Phases 4a and 4b run in parallel.

The full flow:

User requirement
      │
 [Discovery]      ← asks clarifying questions if vague
      │  confirmed requirement
 [Exploration]    ← scans codebase in parallel (2 sub-agents)
      │  exploration.md
    [PM]          ← you review & approve PRD
      │  pm.md
 [Architect]      ← you choose 1 of 3 architecture options
      │  architect.md
    ┌─┴───────┐
[Backend] [Frontend]  ← run in parallel
    └─┬───────┘
     [QA]
      │  tests green
 [Reviewer]       ← pipeline pauses if Critical issues > 3
      │  review.md clean
  [DevOps]
      │
   🎉 Done

Pipeline flow diagram
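To make the ordering concrete, here is a minimal Python sketch that encodes the diagram as a list of stages. It is not taken from the plugin (which is built from Claude Code prompts and commands); `Stage`, `PIPELINE`, and `stages_from` are hypothetical names. A stage holding two agents marks agents that may run concurrently, and `stages_from` approximates what resuming with `run --from qa` involves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of the pipeline; agents in the same stage may run concurrently."""
    phase: str
    agents: tuple[str, ...]


# Hypothetical encoding of the flow diagram, earliest stage first.
PIPELINE: list[Stage] = [
    Stage("0", ("discovery",)),
    Stage("1", ("exploration",)),
    Stage("2", ("pm",)),
    Stage("3", ("architect",)),
    Stage("4", ("backend", "frontend")),  # phases 4a and 4b share a stage
    Stage("5", ("qa",)),
    Stage("6", ("reviewer",)),
    Stage("7", ("devops",)),
]


def stages_from(agent: str) -> list[Stage]:
    """Return the stages still to run when resuming at the stage containing `agent`."""
    for i, stage in enumerate(PIPELINE):
        if agent in stage.agents:
            return PIPELINE[i:]
    raise ValueError(f"unknown agent: {agent!r}")


if __name__ == "__main__":
    total = sum(len(s.agents) for s in PIPELINE)
    print(f"{total} agents in {len(PIPELINE)} stages")
    print([s.agents for s in stages_from("qa")])
```

Keeping the parallel pair inside one stage means "resume from frontend" naturally reruns Backend too, since both depend on the same architecture document.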


A few design decisions I'm proud of

Human-in-the-loop at every critical gate.
You approve the PRD before architecture begins. You choose one of three architecture options before code is written. The Reviewer pauses everything if it finds more than 3 critical issues. AI does the work; you stay in control.
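The architecture choice and the Reviewer's stop rule fit in a few lines. This is a hypothetical Python sketch: the `pick` callback stands in for however the plugin actually asks you to choose, the option names are invented, and `MAX_CRITICAL = 3` mirrors the "more than 3 critical issues" rule above.

```python
from typing import Callable

MAX_CRITICAL = 3  # the Reviewer pauses the pipeline above this many critical findings


def choose_architecture(options: list[str], pick: Callable[[list[str]], int]) -> str:
    """Ask a human (via `pick`) to select one proposed option before any code is written."""
    index = pick(options)
    if not 0 <= index < len(options):
        raise ValueError(f"choice {index} is outside 0..{len(options) - 1}")
    return options[index]


def reviewer_should_pause(severities: list[str]) -> bool:
    """Return True when more than MAX_CRITICAL findings are marked critical."""
    critical = sum(1 for s in severities if s.lower() == "critical")
    return critical > MAX_CRITICAL


if __name__ == "__main__":
    options = ["session cookies", "JWT", "OAuth provider"]  # hypothetical options
    print(choose_architecture(options, lambda opts: 1))
    print(reviewer_should_pause(["critical"] * 3 + ["minor"]))  # 3 is not more than 3
    print(reviewer_should_pause(["critical"] * 4))
```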

Backend and Frontend run in parallel.
Both agents work from the same architecture document, so their outputs are designed to fit together. Running them in parallel shaves real time off the pipeline, and the shared document is what heads off the classic "your API doesn't match what I expected" problem.
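Here is a minimal asyncio sketch of that idea: both calls receive the same architecture text and are awaited together with `asyncio.gather`. `run_agent` is a hypothetical placeholder that just sleeps and echoes its input; it is not how the plugin invokes Claude.

```python
import asyncio


async def run_agent(role: str, architecture: str) -> str:
    """Hypothetical stand-in for one agent call: waits briefly, then echoes its input."""
    await asyncio.sleep(0.01)
    return f"{role} built against: {architecture}"


async def implement(architecture: str) -> tuple[str, str]:
    """Give Backend and Frontend the same document and run them concurrently."""
    backend, frontend = await asyncio.gather(
        run_agent("backend", architecture),
        run_agent("frontend", architecture),
    )
    return backend, frontend


if __name__ == "__main__":
    shared = "POST /api/login returns {token}"  # hypothetical API contract
    for result in asyncio.run(implement(shared)):
        print(result)
```

The total wait is roughly the slower of the two calls rather than their sum, which is where the time saving comes from.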

Every agent writes a structured artifact.
The PM writes .pipeline/pm.md. The Architect writes .pipeline/architect.md. These files become the living memory of your project — persistent knowledge that survives across pipeline runs and future features.
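A minimal sketch of the artifact convention, assuming each artifact is plain Markdown text. The `write_artifact` helper is hypothetical; only the `.pipeline/<agent>.md` naming comes from the plugin.

```python
import tempfile
from pathlib import Path


def write_artifact(root: Path, agent: str, body: str) -> Path:
    """Save one agent's output as <root>/.pipeline/<agent>.md and return its path."""
    target = root / ".pipeline" / f"{agent}.md"
    target.parent.mkdir(parents=True, exist_ok=True)  # create .pipeline/ on first use
    target.write_text(body, encoding="utf-8")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_artifact(Path(tmp), "pm", "# PRD: user login\n")
        print(path.relative_to(tmp))
```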

Git auto-commits after each approved phase.
Every milestone is tracked:

pipeline: PM — add PRD for <feature>
pipeline: Architect — add architecture for <feature>
pipeline: Implement <feature> (backend + frontend)
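A hypothetical helper that produces commits shaped like the PM and Architect messages above. It shells out to the standard `git add` and `git commit -m` commands; whether the plugin commits exactly this way is an assumption. The implementation commit uses a different subject, so `commit_phase` accepts any message.

```python
import subprocess

EM_DASH = "\u2014"  # the example messages separate agent and summary with an em dash


def phase_message(agent: str, summary: str) -> str:
    """Build a subject like the PM and Architect messages shown above."""
    return f"pipeline: {agent} {EM_DASH} {summary}"


def commit_phase(message: str, paths: list[str]) -> None:
    """Stage `paths` and commit them; check=True raises CalledProcessError on failure."""
    subprocess.run(["git", "add", "--", *paths], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)


if __name__ == "__main__":
    print(phase_message("Architect", "add architecture for login"))
```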

How to try it

# 1. Clone the repo
git clone https://github.com/airwaves778899/claude-dev-pipeline.git

# 2. Register as a local marketplace
claude plugin marketplace add "C:\path\to\claude-dev-pipeline"   # Windows
claude plugin marketplace add "/path/to/claude-dev-pipeline"     # macOS/Linux

# 3. Install
claude plugin install claude-dev-pipeline

# 4. Verify
claude plugin list
# should show: ✓ claude-dev-pipeline  enabled

Then, inside any project with Claude Code:

/claude-dev-pipeline:dev-pipeline start "Add user authentication with email + password login"

You can also target a single agent or resume from a specific phase:

/claude-dev-pipeline:dev-pipeline run --agent architect
/claude-dev-pipeline:dev-pipeline run --from qa
/claude-dev-pipeline:dev-pipeline status

Stack profiles let you skip the tech-stack configuration prompt:

/dev-pipeline start "Add payment processing" --stack python
/dev-pipeline start "Build mobile onboarding" --stack flutter

Supported: ts-node (default), ts-react, python, go, flutter.
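The post doesn't show how `--stack` is parsed. As a hypothetical illustration of the contract it describes (five named profiles, with `ts-node` when none is given), Python's standard `argparse` can express it with `choices` and `default`; `parse_start` is an invented name.

```python
import argparse

# The profiles listed in the post; ts-node is used when --stack is omitted.
STACKS = ("ts-node", "ts-react", "python", "go", "flutter")


def parse_start(argv: list[str]) -> argparse.Namespace:
    """Parse a hypothetical `start "<requirement>" [--stack NAME]` argument list."""
    parser = argparse.ArgumentParser(prog="dev-pipeline start")
    parser.add_argument("requirement", help="feature request in plain language")
    parser.add_argument("--stack", choices=STACKS, default="ts-node")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_start(["Add payment processing", "--stack", "python"])
    print(args.requirement, args.stack)
```

An unknown profile makes `argparse` print an error and exit, so a typo fails fast instead of silently falling back to the default.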


The unexpected hard part

I thought the hardest part would be writing the agent prompts. It wasn't.

The hardest part was handoffs.

Each agent needs to know exactly what the previous agent decided. The PM agent's output has to be structured in a way the Architect can actually parse. The Architect's decision has to be specific enough that Backend and Frontend can implement without contradicting each other.

I went through many iterations. The current solution: every agent reads the previous .pipeline/*.md files as context, and writes its own output in a documented schema. Structured handoffs, not vibes.
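A minimal sketch of such a handoff, assuming artifacts are Markdown and that the "documented schema" amounts to a set of required headings. The heading names in `REQUIRED_SECTIONS` and both helper functions are hypothetical; the plugin's real schema may look quite different.

```python
import tempfile
from pathlib import Path

# .pipeline/*.md artifacts in the order their phases run.
ARTIFACTS = ("exploration", "pm", "architect", "review")

# Hypothetical schema: headings each artifact must contain.
REQUIRED_SECTIONS = {
    "pm": ("## User Stories", "## Acceptance Criteria"),
    "architect": ("## Options", "## Trade-offs"),
}


def missing_sections(agent: str, text: str) -> list[str]:
    """Return the required headings absent from an agent's output."""
    return [h for h in REQUIRED_SECTIONS.get(agent, ()) if h not in text]


def build_context(pipeline_dir: Path) -> str:
    """Concatenate the artifacts that exist so far, earliest phase first."""
    parts = []
    for name in ARTIFACTS:
        path = pipeline_dir / f"{name}.md"
        if path.exists():
            parts.append(f"--- {name}.md ---\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    d = Path(tempfile.mkdtemp())
    (d / "pm.md").write_text("## User Stories\n- As a user I can log in\n", encoding="utf-8")
    print(missing_sections("pm", (d / "pm.md").read_text(encoding="utf-8")))
    print(build_context(d))
```

Rejecting an artifact with missing sections before the next agent starts is what turns "structured handoffs" into something checkable.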

The second hard part was encoding my own opinions into prompts. When I write code alone, I make dozens of micro-decisions automatically. Teaching an agent to make those same decisions consistently — and to explain its reasoning — took real effort. It's essentially writing a very opinionated style guide for each role.


What I learned

Structure beats raw intelligence. A well-prompted agent that always produces a specific output format is more useful than a powerful model that does something different every time.

Approval gates are not friction — they're the whole point. The value of this pipeline isn't speed. It's that you understand every decision that was made. You approved the PRD. You chose the architecture. You reviewed the tests. When something breaks in production, you know why — because you were part of every decision.

AI agents need to read before they write. The Exploration agent was a late addition, but it turned out to be essential. Without it, the Backend agent would generate code with no awareness of existing patterns, naming conventions, or architecture choices already in the codebase. Reading first changed everything.
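The real Exploration agent is a model reading code, so the heuristics below are only a stand-in. This Python sketch shows the shape the flow diagram describes: two hypothetical sub-agents (`scan_layout` and `scan_naming`) run concurrently in a thread pool, and their findings are merged into one report that later agents could read.

```python
import tempfile
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def scan_layout(root: Path) -> Counter:
    """Hypothetical sub-agent 1: count files by extension."""
    return Counter(p.suffix for p in root.rglob("*") if p.is_file() and p.suffix)


def scan_naming(root: Path) -> Counter:
    """Hypothetical sub-agent 2: tally file-name styles with crude heuristics."""
    styles: Counter = Counter()
    for p in root.rglob("*"):
        if not p.is_file():
            continue
        if "_" in p.stem:
            styles["snake_case"] += 1
        elif "-" in p.stem:
            styles["kebab-case"] += 1
        elif p.stem[:1].isupper():
            styles["PascalCase"] += 1
        else:
            styles["other"] += 1  # includes camelCase and single words
    return styles


def explore(root: Path) -> dict[str, Counter]:
    """Run both scans concurrently and merge their findings."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        layout = pool.submit(scan_layout, root)
        naming = pool.submit(scan_naming, root)
        return {"extensions": layout.result(), "naming": naming.result()}


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    for name in ("user_service.py", "UserCard.tsx", "api-client.ts"):
        (root / name).write_text("", encoding="utf-8")
    print(explore(root))
```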


What's next

The project is open source (MIT) and actively evolving.

Recent additions in v3.0.0 include:

  • Security Agent — runs an OWASP Top 10 audit between QA and Reviewer
  • Troubleshooter Agent — structured bug-fix loop: Reproduce → Isolate → Diagnose → Fix → Verify, triggered with /dev-pipeline fix "description of the bug"
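The Troubleshooter's loop can be sketched as an ordered list of steps that restarts when any step fails. In this hypothetical Python version, `run_step` stands in for the agent's work; restarting from Reproduce and capping the loop at three attempts are assumptions, not documented plugin behavior.

```python
from typing import Callable

STEPS = ("reproduce", "isolate", "diagnose", "fix", "verify")


def troubleshoot(run_step: Callable[[str, int], bool], max_attempts: int = 3) -> list[str]:
    """Walk the five steps in order; if any step fails, start a new attempt.

    `run_step(step, attempt)` returns True when the step succeeded.
    Returns a log of "attempt:step:result" entries.
    """
    log: list[str] = []
    for attempt in range(1, max_attempts + 1):
        for step in STEPS:
            ok = run_step(step, attempt)
            log.append(f"{attempt}:{step}:{'ok' if ok else 'fail'}")
            if not ok:
                break  # abandon this attempt and start over from "reproduce"
        else:
            return log  # every step passed, including verify
    raise RuntimeError(f"bug not fixed after {max_attempts} attempts")


if __name__ == "__main__":
    # Simulated bug whose first fix fails verification.
    print(troubleshoot(lambda step, attempt: not (step == "verify" and attempt == 1)))
```

Ending every attempt with Verify keeps a fix from being declared done just because the code changed.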

I'm planning to add support for multi-repo pipelines and a web-based progress dashboard.


⭐ If this resonates, give it a star:
github.com/airwaves778899/claude-dev-pipeline

I'm curious: what's the most painful part of your own dev workflow that you wish an AI could handle? Drop it in the comments.


This plugin is built on Claude Code, Anthropic's CLI tool for agentic coding. The plugin system lets you define custom agents and slash commands that Claude Code can load and orchestrate.
