林宗賢

Posted on May 29

I built a 9-agent AI dev team in a Claude Code plugin — here's what happened

#claudecode #ai #programming #devtools

The moment I realized AI coding assistants were broken

I was building a side project — a simple task manager app. I opened Claude Code, typed:

"Add user authentication with email and password login"

…and hit enter.

Twenty minutes later, I had code. A lot of code. Authentication logic, routes, middleware, even some basic tests.

But there was a problem.

The frontend (me, on a different day) had assumed a different API shape. The tests only covered the happy path. There was no architecture decision to reference — I just picked JWT because it felt right. And the docker-compose.yml? It didn't exist yet.

I had AI-generated code, but no real software development workflow.

What was actually missing

Good software isn't just code. Before you write a single line, you need:

A spec that everyone (including future-you) agrees on
An architecture decision that explains the why
Backend and frontend designed to talk to each other
Tests that prove things actually work
A code review that catches security holes before they ship
A deployment config that someone can actually run

Normally, a team handles all of this. A PM writes the spec. An architect proposes options. Engineers implement and review each other's work. A DevOps person sets up CI/CD.

What if AI could fill all those roles?

Building the pipeline

I built claude-dev-pipeline — a Claude Code plugin that orchestrates a team of specialized AI agents, each with a specific job.

The idea was simple: instead of one big AI doing everything, use multiple agents in sequence — each expert at one job — with you approving the output at every important gate.

Nine agents across eight numbered phases (phase 4 is split into 4a and 4b):

#	Agent	Role	Output
0	Discovery	Clarifies vague requirements via dialogue	Confirmed requirement
1	Exploration	Scans existing codebase in parallel	`.pipeline/exploration.md`
2	PM	Writes a structured PRD with user stories & acceptance criteria	`.pipeline/pm.md`
3	Architect	Proposes 2–3 architecture options with trade-offs	`.pipeline/architect.md`
4a	Backend	Implements REST APIs, services, repositories	`src/backend/`
4b	Frontend	Implements React UI, hooks, API client	`src/frontend/`
5	QA	Writes and runs unit, integration, and E2E tests	`tests/`
6	Reviewer	Audits code for security, bugs, and quality (confidence ≥ 80)	`.pipeline/review.md`
7	DevOps	Creates Dockerfile, docker-compose, GitHub Actions CI/CD	`deploy/`

Phases 4a and 4b run in parallel.

The full flow:

        User requirement
               │
          [Discovery]       ← asks clarifying questions if vague
               │  confirmed requirement
         [Exploration]      ← scans codebase in parallel (2 sub-agents)
               │  exploration.md
             [PM]           ← you review & approve PRD
               │  pm.md
          [Architect]       ← you choose one of the proposed architecture options
               │  architect.md
          ┌────┴────┐
     [Backend]  [Frontend]  ← run in parallel
          └────┬────┘
             [QA]
               │  tests green
          [Reviewer]        ← pipeline pauses if Critical issues > 3
               │  review.md clean
           [DevOps]
               │
            🎉 Done

A few design decisions I'm proud of

Human-in-the-loop at every critical gate.
You approve the PRD before architecture begins. You choose one of the proposed architecture options before any code is written. The Reviewer pauses the pipeline if it finds more than 3 Critical issues. The AI does the work; you stay in control.
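To make the Reviewer gate concrete, here is a minimal Python sketch of the rule as I've described it. It is not the plugin's actual code: the `Finding` type, the severity labels, and the choice to apply the confidence filter before counting Critical issues are all assumptions made for illustration. Only the two thresholds (confidence ≥ 80, more than 3 Critical issues) come from the pipeline itself.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 80  # from the agent table: confidence >= 80
MAX_CRITICAL = 3           # from the flow: pause if Critical issues > 3


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of one Reviewer finding."""
    severity: str    # assumed labels: "critical", "major", "minor"
    confidence: int  # 0-100
    message: str


def reportable(findings):
    """Keep only findings the Reviewer is confident enough to report."""
    return [f for f in findings if f.confidence >= CONFIDENCE_THRESHOLD]


def should_pause(findings):
    """Return True when reportable Critical findings exceed the limit."""
    critical = [f for f in reportable(findings) if f.severity == "critical"]
    return len(critical) > MAX_CRITICAL
```

With this sketch, four confident Critical findings pause the run, while three do not.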

Backend and Frontend run in parallel.
Both agents work from the same architecture document, so the API contract they implement is the same on both sides. Running them in parallel shaves real time off the pipeline, and the shared document heads off the classic "your API doesn't match what I expected" problem.
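The fan-out and fan-in shape of this phase can be sketched in a few lines of Python. This is only an illustration of the pattern, not how the plugin invokes agents: `run_agent` is a hypothetical stand-in that echoes its input, and the real work is delegated to Claude Code.

```python
import asyncio


async def run_agent(name: str, architecture: str) -> str:
    """Hypothetical stand-in for one agent; it only echoes its inputs."""
    await asyncio.sleep(0)  # placeholder for the agent's real work
    return f"{name} implemented against: {architecture}"


async def implementation_phase(architecture: str) -> dict:
    """Run Backend and Frontend concurrently from the same architecture text."""
    backend, frontend = await asyncio.gather(
        run_agent("backend", architecture),
        run_agent("frontend", architecture),
    )
    return {"backend": backend, "frontend": frontend}
```

The point of the sketch is that both calls receive the identical `architecture` argument, and the phase only completes once both have returned.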

Every agent writes a structured artifact.
The PM writes .pipeline/pm.md. The Architect writes .pipeline/architect.md. These files become the living memory of your project — persistent knowledge that survives across pipeline runs and future features.

Git auto-commits after each approved phase.
Every milestone is tracked:

pipeline: PM — add PRD for <feature>
pipeline: Architect — add architecture for <feature>
pipeline: Implement <feature> (backend + frontend)
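
As a rough sketch of how such an auto-commit step could work (assuming Python and plain `git` on the PATH; the function names and phase keys are hypothetical, only the three message formats come from the examples above):

```python
import subprocess

DASH = "\u2014"  # the dash character used in the example messages

# Templates copied from the three example commit messages.
COMMIT_TEMPLATES = {
    "pm": "pipeline: PM {dash} add PRD for {feature}",
    "architect": "pipeline: Architect {dash} add architecture for {feature}",
    "implementation": "pipeline: Implement {feature} (backend + frontend)",
}


def commit_message(phase: str, feature: str) -> str:
    """Build the commit message for a phase (phase keys are assumed names)."""
    return COMMIT_TEMPLATES[phase].format(dash=DASH, feature=feature)


def commit_phase(phase: str, feature: str, approved: bool) -> bool:
    """Stage and commit everything, but only after the human approved the phase."""
    if not approved:
        return False
    subprocess.run(["git", "add", "--all"], check=True)
    subprocess.run(["git", "commit", "-m", commit_message(phase, feature)], check=True)
    return True
```

Tying the commit to the approval flag means an unapproved phase never produces a milestone in the history.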

How to try it

# 1. Clone the repo
git clone https://github.com/airwaves778899/claude-dev-pipeline.git

# 2. Register as a local marketplace
claude plugin marketplace add "C:\path\to\claude-dev-pipeline"   # Windows
claude plugin marketplace add "/path/to/claude-dev-pipeline"     # macOS/Linux

# 3. Install
claude plugin install claude-dev-pipeline

# 4. Verify
claude plugin list
# should show: ✓ claude-dev-pipeline  enabled

Then, inside any project with Claude Code:

/claude-dev-pipeline:dev-pipeline start "Add user authentication with email + password login"

You can also target a single agent or resume from a specific phase:

/claude-dev-pipeline:dev-pipeline run --agent architect
/claude-dev-pipeline:dev-pipeline run --from qa
/claude-dev-pipeline:dev-pipeline status
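
Conceptually, `--agent` and `--from` just select a slice of the phase order. A small Python sketch of that selection, assuming lowercase phase identifiers (only `architect` and `qa` appear in the commands above; the other names and the function itself are hypothetical):

```python
# Phase order from the agent table; identifiers other than
# "architect" and "qa" are assumed spellings.
PHASES = [
    "discovery", "exploration", "pm", "architect",
    "backend", "frontend", "qa", "reviewer", "devops",
]


def phases_to_run(agent=None, start_from=None):
    """Return the phases selected by --agent or --from (all phases if neither)."""
    if agent is not None and start_from is not None:
        raise ValueError("use either agent or start_from, not both")
    chosen = agent if agent is not None else start_from
    if chosen is not None and chosen not in PHASES:
        raise ValueError(f"unknown phase: {chosen}")
    if agent is not None:
        return [agent]
    if start_from is not None:
        return PHASES[PHASES.index(start_from):]
    return list(PHASES)
```

For example, `phases_to_run(start_from="qa")` selects QA, Reviewer, and DevOps in that order.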

Stack profiles let you skip the tech-stack configuration prompt:

/dev-pipeline start "Add payment processing" --stack python
/dev-pipeline start "Build mobile onboarding" --stack flutter

Supported: ts-node (default), ts-react, python, go, flutter.

The unexpected hard part

I thought the hardest part would be writing the agent prompts. It wasn't.

The hardest part was handoffs.

Each agent needs to know exactly what the previous agent decided. The PM agent's output has to be structured in a way the Architect can actually parse. The Architect's decision has to be specific enough that Backend and Frontend can implement without contradicting each other.

I went through many iterations. The current solution: every agent reads the previous .pipeline/*.md files as context, and writes its own output in a documented schema. Structured handoffs, not vibes.
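
Here is a minimal Python sketch of that handoff idea, under stated assumptions. The artifact filenames come from the agent table; the function names, the `## filename` separator, and the heading-based schema check are hypothetical and simpler than what the plugin does.

```python
from pathlib import Path

# Artifact filenames from the agent table, in pipeline order.
ARTIFACT_ORDER = ["exploration.md", "pm.md", "architect.md", "review.md"]


def load_handoff_context(pipeline_dir: Path) -> str:
    """Concatenate the existing .pipeline artifacts in pipeline order."""
    parts = []
    for name in ARTIFACT_ORDER:
        path = pipeline_dir / name
        if path.exists():
            text = path.read_text(encoding="utf-8").strip()
            parts.append(f"## {name}\n{text}")
    return "\n\n".join(parts)


def missing_sections(markdown: str, required: list) -> list:
    """List required headings that are absent from an artifact (assumed schema check)."""
    headings = {
        line.lstrip("#").strip()
        for line in markdown.splitlines()
        if line.startswith("#")
    }
    return [section for section in required if section not in headings]
```

A check like `missing_sections` is what turns "documented schema" into something enforceable: if the PM artifact lacks an acceptance-criteria heading, the Architect should not start.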

The second hard part was encoding my own opinions into prompts. When I write code alone, I make dozens of micro-decisions automatically. Teaching an agent to make those same decisions consistently — and to explain its reasoning — took real effort. It's essentially writing a very opinionated style guide for each role.

What I learned

Structure beats raw intelligence. A well-prompted agent that always produces a specific output format is more useful than a powerful model that does something different every time.

Approval gates are not friction — they're the whole point. The value of this pipeline isn't speed. It's that you understand every decision that was made. You approved the PRD. You chose the architecture. You reviewed the tests. When something breaks in production, you know why — because you were part of every decision.

AI agents need to read before they write. The Exploration agent was a late addition, but it turned out to be essential. Without it, the Backend agent would generate code with no awareness of existing patterns, naming conventions, or architecture choices already in the codebase. Reading first changed everything.

What's next

The project is open source (MIT) and actively evolving.

Recent additions in v3.0.0 include:

Security Agent — runs an OWASP Top 10 audit between QA and Reviewer
Troubleshooter Agent — structured bug-fix loop: Reproduce → Isolate → Diagnose → Fix → Verify, triggered with /dev-pipeline fix "description of the bug"

I'm planning to add support for multi-repo pipelines and a web-based progress dashboard.

⭐ If this resonates, give it a star:
github.com/airwaves778899/claude-dev-pipeline

I'm curious: what's the most painful part of your own dev workflow that you wish an AI could handle? Drop it in the comments.

This plugin is built on Claude Code, Anthropic's CLI tool for agentic coding. The plugin system lets you define custom agents and slash commands that Claude Code can load and orchestrate.
