<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mir Majeed</title>
    <description>The latest articles on DEV Community by Mir Majeed (@mirmajeed1).</description>
    <link>https://dev.to/mirmajeed1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F534109%2Fcecc6960-5c5a-4dde-b9e0-0472b0829a99.png</url>
      <title>DEV Community: Mir Majeed</title>
      <link>https://dev.to/mirmajeed1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mirmajeed1"/>
    <language>en</language>
    <item>
      <title>I Built a 10-Agent AI Product Team in Claude Code - Part I</title>
      <dc:creator>Mir Majeed</dc:creator>
      <pubDate>Thu, 30 Apr 2026 19:29:10 +0000</pubDate>
      <link>https://dev.to/mirmajeed1/i-built-a-10-agent-ai-product-team-in-claude-code-1hmm</link>
      <guid>https://dev.to/mirmajeed1/i-built-a-10-agent-ai-product-team-in-claude-code-1hmm</guid>
      <description>&lt;p&gt;I run an AI product team with 10 specialized agents. A researcher, a PRD writer, a designer who shows me HTML mockups of 3 different UI approaches, an engineer who follows strict TDD, a security auditor that gets cross-model code review, and a few more.&lt;/p&gt;

&lt;p&gt;Each agent proposes alternatives before committing. Every stage has a "Grill Me" session where the AI challenges &lt;em&gt;my&lt;/em&gt; assumptions. The whole thing runs under a single Claude Max subscription. No extra API costs.&lt;/p&gt;

&lt;p&gt;This started as an OpenClaw setup. It's now 10 markdown files.&lt;/p&gt;

&lt;h2&gt;The problem&lt;/h2&gt;

&lt;p&gt;If you're building a real product, not vibe-coding a side project, you need structure. Someone to validate the idea. Someone to write the spec. Someone to code it. Someone &lt;em&gt;else&lt;/em&gt; to break it. Someone to make sure nobody shipped hardcoded API keys to production.&lt;/p&gt;

&lt;p&gt;That's a team. One big LLM can't replace a team. Context drift, decision fatigue, and the inability to challenge its own work mean you end up with mediocre output even from the smartest model.&lt;/p&gt;

&lt;p&gt;So I built mine as a team of agents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Stage 1: Ideation
Stage 2: Research &amp;amp; Feasibility
Stage 3: PRD
Stage 4: Design &amp;amp; Architecture
Stage 5: Engineering
Stage 6: QA &amp;amp; Staging
Stage 7: Production Deployment
Stage 8: Go-To-Market
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each stage has quality gates. An auditor reviews every stage. The lead agent never lets bad work advance.&lt;/p&gt;

&lt;p&gt;The question was which tool to orchestrate it.&lt;/p&gt;

&lt;h2&gt;What I was using: OpenClaw&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; is an open-source AI assistant framework with 300K+ GitHub stars. It runs as a local gateway, connects to messaging platforms (WhatsApp, Telegram, Discord, Slack), and lets you define specialized agents with personas, tools, and skills.&lt;/p&gt;

&lt;p&gt;My setup had 9 agents: Athina as orchestrator plus 8 specialized agents. It worked. But the friction was real.&lt;/p&gt;

&lt;p&gt;Gateway server, WebSocket connections, port configuration, agent process management. Real infrastructure overhead.&lt;/p&gt;

&lt;p&gt;Every agent session hits the API. Token costs add up fast when 8 agents are doing serious work.&lt;/p&gt;

&lt;p&gt;Sub-agents could only talk through the orchestrator. Never directly to each other.&lt;/p&gt;

&lt;p&gt;No native TDD. No design alternatives. No cross-model review.&lt;/p&gt;

&lt;p&gt;When agent handoffs failed, finding out which one and why was a slog.&lt;/p&gt;

&lt;p&gt;Then I discovered Claude Code's &lt;a href="https://code.claude.com/docs/en/agent-teams" rel="noopener noreferrer"&gt;Agent Teams feature&lt;/a&gt; and took the chance to redesign the workflow.&lt;/p&gt;

&lt;h2&gt;Enter Claude Code Agent Teams&lt;/h2&gt;

&lt;p&gt;Agent Teams lets you coordinate multiple Claude Code sessions as a team. One session is the lead. The others are specialized teammates. Each runs in its own context window. They can message each other directly.&lt;/p&gt;

&lt;p&gt;Each agent is a markdown file. Drop them in &lt;code&gt;.claude/agents/&lt;/code&gt;, set up &lt;code&gt;settings.json&lt;/code&gt;, and you have a team.&lt;/p&gt;

&lt;p&gt;Here's the structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;your-project/
├── CLAUDE.md                         ← Shared architecture, coding standards
└── .claude/
    ├── settings.json                 ← Enables Agent Teams, sets lead agent
    ├── project-config.md             ← Project identity, paths, IDs
    └── agents/
        ├── athina.md                 ← Lead orchestrator
        ├── scout.md                  ← Market researcher
        ├── spectra.md                ← PRD writer
        ├── pixel.md                  ← Designer &amp;amp; architect
        ├── builder.md                ← Engineer
        ├── auditor.md                ← Compliance reviewer
        ├── bugsy.md                  ← QA tester
        ├── piper.md                  ← DevOps
        ├── nova.md                   ← Marketing
        └── quill.md                  ← Content writer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;10 agents. 5 on Opus 4.6 (Athina, Scout, Spectra, Pixel, Builder), 5 on Sonnet 4.6 (Auditor, Bugsy, Piper, Nova, Quill). Opus for the agents that do open-ended reasoning, Sonnet for the ones that follow checklists and templates. No infrastructure beyond markdown files in a folder.&lt;/p&gt;

&lt;p&gt;This also keeps token cost down. Sonnet runs cheaper than Opus, and the procedural agents don't need Opus-level reasoning, so there's no point paying for it on every Auditor pass and every QA run.&lt;/p&gt;
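&lt;p&gt;For reference, each of those files is just markdown with YAML frontmatter on top. A rough sketch of what a Sonnet teammate can look like (the body here is illustrative, not my actual prompt):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
name: bugsy
description: QA test engineer. Tests Stage 6 builds against live staging.
model: sonnet
---

You are Bugsy, the QA test engineer.
Run the stage checklist, the Playwright E2E suite, and the
security test matrix. Report findings to Athina.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The frontmatter is where the Opus/Sonnet split lives: change one line and the agent runs on a different model.&lt;/p&gt;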

&lt;h2&gt;Meet the team&lt;/h2&gt;

&lt;h3&gt;Athina, lead PM and orchestrator&lt;/h3&gt;

&lt;p&gt;The brain of the operation. She runs as the default agent on every session.&lt;/p&gt;

&lt;p&gt;She coordinates all other agents and enforces sequential stage execution. She manages Linear issues (creates them before any agent spawns, hard rule). She updates &lt;code&gt;00_PROJECT_CONTEXT.md&lt;/code&gt; in Obsidian at 8 touchpoints per stage. She runs Grill Me sessions where she challenges &lt;em&gt;my&lt;/em&gt; assumptions. She pushes for velocity and never lets me get stuck on a decision. And she shows a Pipeline Dashboard on every session startup.&lt;/p&gt;

&lt;p&gt;Her personality is warm and cheerful. She gives time-appropriate greetings, celebrates wins, and calls out blockers. She works as an accountability partner more than a router.&lt;/p&gt;

&lt;p&gt;A snippet from her system prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;### Velocity &amp;amp; Momentum (Keep Mir Moving)&lt;/span&gt;
You are Mir's accountability partner. Your job is to minimize delays.

&lt;span class="gs"&gt;**After completing any stage work:**&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Don't just report completion, immediately propose the next action
&lt;span class="p"&gt;-&lt;/span&gt; "Stage 2 is done and Auditor passed! Ready for Grill Me?"
&lt;span class="p"&gt;-&lt;/span&gt; Never end a message with just a status update.

&lt;span class="gs"&gt;**When waiting on Mir for approval gates:**&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Present the decision with all context so Mir can decide immediately
&lt;span class="p"&gt;-&lt;/span&gt; If Mir doesn't respond: gently nudge
&lt;span class="p"&gt;-&lt;/span&gt; "Hey Mir, it's been 3 days since Pixel finished the specs. 
   The team is ready to start building. Want to review now?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Scout, market researcher&lt;/h3&gt;

&lt;p&gt;Stage 2. Validates the product idea, analyzes competitors, defines target personas, recommends Go/No-Go.&lt;/p&gt;

&lt;h3&gt;Spectra, PRD writer&lt;/h3&gt;

&lt;p&gt;Stage 3. Before writing the PRD, she proposes 2-3 product approaches with scope, timeline, and risk tradeoffs. I pick one. Then she writes the PRD for the chosen direction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Spectra: "I see three viable directions:

  Approach A: Full Platform — 25 REQs, 8 weeks, max market share
  Approach B: Focused MVP — 15 REQs, 4 weeks, fast validation
  Approach C: API-First — 12 REQs, 3 weeks, developer market

  I recommend Approach B for fastest time to market. Your call."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This kills scope creep before it starts. I never commit to a direction without seeing alternatives.&lt;/p&gt;

&lt;h3&gt;Pixel, designer and architect&lt;/h3&gt;

&lt;p&gt;Stage 4. The most involved agent in the pipeline. She works in three phases.&lt;/p&gt;

&lt;p&gt;First, architecture alternatives. She proposes 2-3 approaches (monolith vs modular monolith vs event-driven) with cost estimates and tradeoffs.&lt;/p&gt;

&lt;p&gt;Second, UI/UX alternatives with HTML mockups. She generates 3 self-contained HTML files I open in Chrome to compare visually.&lt;/p&gt;

&lt;p&gt;Third, full specs. Once I pick architecture and UI direction, she writes the design spec and tech spec.&lt;/p&gt;

&lt;p&gt;The HTML mockups are what I want to highlight. Instead of describing UI approaches in text, Pixel produces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;project-root/
  mockups/
    approach-a-dashboard.html    ← like Linear/Stripe
    approach-b-wizard.html       ← like TurboTax
    approach-c-chat.html         ← like ChatGPT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each is a single file with Tailwind CDN, all 6 key screens (landing, auth, primary, detail, settings, empty state) on one scrollable page, with realistic placeholder data.&lt;/p&gt;

&lt;p&gt;I open all three in Chrome tabs, compare side by side, pick one. Total time: 5 minutes.&lt;/p&gt;
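&lt;p&gt;Stripped down, each mockup file follows one skeleton (a sketch; the ids and class names here are invented, but cdn.tailwindcss.com is the real Tailwind Play CDN):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;!doctype html&amp;gt;
&amp;lt;html&amp;gt;
&amp;lt;head&amp;gt;
  &amp;lt;script src="https://cdn.tailwindcss.com"&amp;gt;&amp;lt;/script&amp;gt;
&amp;lt;/head&amp;gt;
&amp;lt;body class="bg-gray-50"&amp;gt;
  &amp;lt;!-- all six screens stacked on one scrollable page --&amp;gt;
  &amp;lt;section id="landing"&amp;gt;...&amp;lt;/section&amp;gt;
  &amp;lt;section id="auth"&amp;gt;...&amp;lt;/section&amp;gt;
  &amp;lt;!-- primary, detail, settings, empty state follow --&amp;gt;
&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;No build step, no dev server. Double-click the file and it renders.&lt;/p&gt;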

&lt;h3&gt;Builder, full-stack engineer&lt;/h3&gt;

&lt;p&gt;Stage 5. This is where I added the &lt;a href="https://github.com/obra/superpowers" rel="noopener noreferrer"&gt;Superpowers plugin&lt;/a&gt;. Builder follows strict TDD with micro-task planning.&lt;/p&gt;

&lt;p&gt;Work breaks into 20-40 micro-tasks of 2-5 minutes each. For each task: write a failing test first (RED), write minimum code to pass (GREEN), refactor, commit. Two-stage review per task: spec compliance, then code quality. Subagents handle the autonomous execution and can run for hours without my input.&lt;/p&gt;

&lt;p&gt;A snippet from &lt;code&gt;builder.md&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;### Test-Driven Development (RED-GREEN-REFACTOR)&lt;/span&gt;
For EVERY micro-task, strictly follow this order:
&lt;span class="p"&gt;1.&lt;/span&gt; RED — Write a failing test first. Run it. Confirm it fails.
&lt;span class="p"&gt;2.&lt;/span&gt; GREEN — Write the MINIMUM code to make the test pass.
&lt;span class="p"&gt;3.&lt;/span&gt; REFACTOR — Clean up if needed, keeping tests green.
&lt;span class="p"&gt;4.&lt;/span&gt; COMMIT — One commit per micro-task with its test.

⛔ Never write implementation code before its test.
   If you catch yourself writing code first, STOP, delete it,
   write the test first, then rewrite the code.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By the time the PR reaches the auditor, every line of code has a test that was written first.&lt;/p&gt;
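&lt;p&gt;To make the loop concrete, a single micro-task pair can look like this (a sketch in Python; the function and test are invented for illustration, not taken from my pipeline):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# RED: the test is written and committed first. Run it, confirm it
# fails (slugify does not exist yet), and only then write the code.
def test_slugify_lowercases_and_hyphenates():
    assert slugify("BidScore Beta") == "bidscore-beta"

# GREEN: the minimum implementation that makes the test pass,
# committed together with its test as one micro-task.
def slugify(title):
    return title.strip().lower().replace(" ", "-")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Multiply that by 20-40 micro-tasks and you get a git history where every implementation commit carries the test that forced it into existence.&lt;/p&gt;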

&lt;h3&gt;Auditor, compliance and security&lt;/h3&gt;

&lt;p&gt;Runs after every stage from 2 through 7. Reviews against a stage-specific checklist. For code stages (5-7), Athina runs OpenAI Codex first via the &lt;code&gt;/codex:adversarial-review&lt;/code&gt; plugin, then passes those findings to Auditor.&lt;/p&gt;

&lt;p&gt;Claude writes the code. GPT reviews it. Then Auditor layers her own checklist and my Grill Me concerns on top. Three-layer review.&lt;/p&gt;

&lt;p&gt;For Stage 5, she also verifies TDD compliance by checking the git history to confirm tests were committed &lt;em&gt;before&lt;/em&gt; implementation code. If they weren't, that's a High finding.&lt;/p&gt;
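&lt;p&gt;The check itself is prompt-driven, but the rule it enforces is mechanical. A sketch of the logic in Python (the &lt;code&gt;src/&lt;/code&gt; and &lt;code&gt;tests/&lt;/code&gt; naming convention is an assumption for illustration, not part of my setup):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Commits come oldest-first, each as the list of paths it touched.
# A src/ file must never appear before its matching tests/ file;
# landing in the same commit is fine, which matches the
# one-commit-per-micro-task rule (test and code ship together).
def tdd_order_ok(commits):
    seen_tests = set()
    for paths in commits:
        seen_tests.update(p for p in paths if p.startswith("tests/"))
        for p in paths:
            if p.startswith("src/"):
                expected = "tests/test_" + p.split("/")[-1]
                if expected not in seen_tests:
                    return False
    return True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;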

&lt;h3&gt;Bugsy, QA test engineer&lt;/h3&gt;

&lt;p&gt;Stage 6. Tests against live staging, not against static code. Uses Playwright for E2E (&lt;code&gt;npx playwright test&lt;/code&gt;). Runs security tests covering IDOR, auth bypass, XSS, SQL injection, rate limits, and webhook replay attacks.&lt;/p&gt;

&lt;h3&gt;Piper, DevOps engineer&lt;/h3&gt;

&lt;p&gt;Stages 6-7. Splits work into autonomous prep (env validation, Docker builds, migration dry-runs, deploy script generation) and attended deploy (Azure approval prompts that need me at the computer). Piper always tells me how long the attended portion will take so I can plan around it.&lt;/p&gt;

&lt;h3&gt;Nova and Quill, marketing and content&lt;/h3&gt;

&lt;p&gt;Stage 8. Go-to-market strategy and SEO content writing. They tag in once the product is live.&lt;/p&gt;

&lt;h2&gt;The pipeline dashboard&lt;/h2&gt;

&lt;p&gt;This is what I see every time I open Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;👋 Good morning, Mir! Hope you're having a great start to the day!

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 BidScore — Pipeline Dashboard
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✅ Stage 1: Ideation — Complete
✅ Stage 2: Research — Complete (Auditor: PASS)
✅ Stage 3: PRD — Complete (22 REQs, Auditor: PASS)
✅ Stage 4: Design &amp;amp; Architecture — Complete (Auditor: PASS)
🔄 Stage 5: Engineering — IN PROGRESS
⬜ Stage 6: QA &amp;amp; Staging
⬜ Stage 7: Production Deployment
⬜ Stage 8: Go-To-Market

📍 Current: Stage 5 — Builder is implementing Phase A (code)
⏳ Blocker: None
📋 Next: Builder finishes → unit tests → E2E scaffold → 
         Grill Me → Codex review → Auditor
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Builder should be wrapping up the implementation soon. 
Want me to check on progress? 🚀
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Athina reads &lt;code&gt;00_PROJECT_CONTEXT.md&lt;/code&gt; (which she keeps updated at 8 touchpoints per stage) and renders this on startup. I never have to ask "where are we?"&lt;/p&gt;

&lt;h2&gt;What's actually different&lt;/h2&gt;

&lt;p&gt;The workflow changed in a few real ways.&lt;/p&gt;

&lt;h3&gt;Design alternatives at Stages 3 and 4&lt;/h3&gt;

&lt;p&gt;Spectra proposes 2-3 product approaches before writing the PRD. Pixel proposes 2-3 architecture options &lt;em&gt;and&lt;/em&gt; 2-3 UI approaches with HTML mockups. Nothing gets locked in until I've seen the options side by side.&lt;/p&gt;

&lt;h3&gt;Grill Me sessions&lt;/h3&gt;

&lt;p&gt;After every stage from 2 through 7, before the auditor runs, Athina challenges me:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Is the market size real? What's the weakest assumption? Which REQs will be hardest to build? Are the success metrics realistic? What edge cases are we ignoring?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;She saves my responses to &lt;code&gt;grillme_stage[N].md&lt;/code&gt; and feeds them to the auditor. If I flagged a concern during Grill Me and the deliverable doesn't address it, the auditor marks it as a High finding.&lt;/p&gt;
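&lt;p&gt;The saved file is plain markdown; its shape is roughly this (format and contents invented for illustration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Grill Me: Stage 3

Q: Which REQs will be hardest to build?
A (Mir): REQ-14, the scoring engine. I'm worried about data quality.

Carry-forward for Auditor:
- Confirm the PRD addresses data quality for REQ-14
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;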

&lt;h3&gt;TDD enforcement via Superpowers&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/obra/superpowers" rel="noopener noreferrer"&gt;Superpowers plugin&lt;/a&gt; (146K+ stars) auto-triggers TDD on Stage 5. RED-GREEN-REFACTOR per micro-task, with subagent-driven execution. Every line of code has a test written before it.&lt;/p&gt;

&lt;h3&gt;Cross-model code review&lt;/h3&gt;

&lt;p&gt;For Stages 5-7, Athina runs the Codex plugin (&lt;code&gt;/codex:adversarial-review&lt;/code&gt;) to get OpenAI's GPT to challenge Claude's code from a different model's perspective. Findings go to the auditor on top of the checklist review and Grill Me concerns.&lt;/p&gt;

&lt;p&gt;Even with the mixed Opus/Sonnet split, every agent in the pipeline is still a Claude model. Same training data, same RLHF biases, same failure modes. Codex (GPT) brings genuinely independent review that catches what same-family review misses.&lt;/p&gt;

&lt;h3&gt;Centralized project configuration&lt;/h3&gt;

&lt;p&gt;Nothing is hardcoded. A &lt;code&gt;.claude/project-config.md&lt;/code&gt; file stores all project-specific details: name, vault paths, Linear IDs, GitHub repo, target customers, monetization model. On first run, Athina interviews me to collect every required field. She refuses to work until everything is filled. All 10 agents read from this one file.&lt;/p&gt;

&lt;p&gt;The setup is portable. Drop it in a new project, run Claude Code, and Athina configures everything fresh through the interview.&lt;/p&gt;
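&lt;p&gt;For illustration, the shape of that file (field names are mine; the real file carries more sections):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Project Config
product_name: BidScore
obsidian_vault: (set during Athina's interview)
linear_team_id: (set during Athina's interview)
github_repo: (set during Athina's interview)
target_customers: (set during Athina's interview)
monetization_model: (set during Athina's interview)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;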

&lt;h3&gt;Athina's personality&lt;/h3&gt;

&lt;p&gt;She greets me by time of day, celebrates wins, calls out blockers, pushes for velocity. When I want to skip a quality gate, she pushes back warmly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I totally understand wanting to move fast! But skipping the Auditor on Stage 5 could mean we ship a security gap to staging. How about we run a quick audit focused just on the critical gates? Should take 5 minutes."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It sounds small, but having an AI that &lt;em&gt;wants&lt;/em&gt; to ship and pushes me to keep moving has changed how often I actually finish things.&lt;/p&gt;

&lt;h2&gt;OpenClaw vs Claude Code Agent Teams&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;OpenClaw&lt;/th&gt;
&lt;th&gt;Claude Code Agent Teams&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Setup&lt;/td&gt;
&lt;td&gt;Gateway + WebSocket&lt;/td&gt;
&lt;td&gt;Markdown files in &lt;code&gt;.claude/&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;API tokens per agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Flat under Max subscription&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Communication&lt;/td&gt;
&lt;td&gt;Through orchestrator only&lt;/td&gt;
&lt;td&gt;Direct between teammates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TDD&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;Enforced via Superpowers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code review&lt;/td&gt;
&lt;td&gt;Single&lt;/td&gt;
&lt;td&gt;Codex + Auditor + Grill Me&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Design alternatives&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;2-3 with HTML mockups&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Version control&lt;/td&gt;
&lt;td&gt;Code on GitHub&lt;/td&gt;
&lt;td&gt;Agent definitions on GitHub&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Background running&lt;/td&gt;
&lt;td&gt;Always-on&lt;/td&gt;
&lt;td&gt;When Claude Code is open&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Cost is the biggest factor here.&lt;/strong&gt; With OpenClaw, every agent session burns API tokens. 9 agents working a real product through 8 stages adds up fast, especially when Builder is iterating through Stage 5 and the Auditor is rerunning checks after each fix. Claude Code Agent Teams runs flat under the Max subscription. Same workflow, same agents, same number of stages, but the bill stops climbing.&lt;/p&gt;

&lt;p&gt;The honest tradeoff: OpenClaw can run 24/7 with messaging platform integration. Claude Code Agent Teams runs when you open it. For products you're actively building, that's fine.&lt;/p&gt;

&lt;h2&gt;Closing thoughts&lt;/h2&gt;

&lt;p&gt;Two things surprised me building this.&lt;/p&gt;

&lt;p&gt;The first was how much personality matters. Athina pushing me to keep moving, celebrating wins, calling out blockers: that emotional layer is what turned this from a tool I tried once into a tool I use every day. I wouldn't have predicted that.&lt;/p&gt;

&lt;p&gt;The second was alternatives before commitments. Spectra proposing 3 product directions or Pixel showing me 3 UI mockups before locking anything in has saved me weeks I would have wasted going down the wrong path.&lt;/p&gt;

&lt;p&gt;Next post: the parts of this that are working better than I expected, and the parts I'm already rethinking.&lt;/p&gt;

&lt;p&gt;Until then, happy shipping.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with Claude Code, the Superpowers plugin, the Codex plugin, and OpenClaw inspiration.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>openclaw</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
