vikash ruhil
How I Built an 8-Agent AI Team That Ships Code Autonomously

A plugin system where AI agents plan, build, review, and deploy — like a real engineering team



The Problem Nobody Talks About

Every developer using AI assistants hits the same wall.

You ask an AI to "add user authentication." It generates code. Maybe good code. But it doesn't know your project's patterns. It doesn't check if the code is secure. It doesn't create a pull request. It doesn't track what was done or what comes next.

You're still the glue holding everything together.

I wanted something different. I wanted AI agents that work like a real engineering team — where one agent plans the work, another builds it, another reviews it, and another tries to break it. Where knowledge accumulates across sessions. Where "add user authentication" triggers an entire workflow that ends with a reviewed, tested pull request.

So I built AI Agent Manager.



What Is AI Agent Manager?

AI Agent Manager is a Claude Code plugin that provides 8 specialized AI agents for software development. Each agent has a distinct role, distinct tools, and a distinct personality — just like teammates on a real engineering team.

It's not a framework. It's not a SaaS product. It's a set of Markdown prompt files with YAML frontmatter that plug directly into Claude Code's CLI. No servers. No APIs. No infrastructure. Just agents that understand your codebase and get work done.

Here's what makes it different from "just using AI":

  • Agents have roles. A Code Reviewer doesn't write features. A Worker doesn't do planning.
  • Agents have memory. They remember your project's patterns across sessions.
  • Agents collaborate. The output of one becomes the input of the next.
  • Work is tracked. Every task, subtask, and review decision is recorded.
  • Execution is parallel. Multiple workers build simultaneously using git worktrees.

Meet the Team



1. Launch Pad — The Readiness Planner

Before any code is written, Launch Pad takes your raw goal and prepares it for execution. It runs a 6-phase process:

VALIDATE > DISCOVER > ANALYZE > DECOMPOSE > PACKAGE > REFINE & SAVE

Give it "add user authentication" and it will:

  • Scan your codebase to understand existing patterns
  • Identify which files will be impacted
  • Estimate parallelism opportunities
  • Produce a "Supervisor-Ready Brief" — a structured document that tells the Supervisor exactly what to do

Think of it as the architect who draws blueprints before construction begins.

```
/launch-pad goal: "add JWT authentication with refresh tokens"
```

2. Supervisor — The Parallel Orchestrator

The Supervisor is the project manager. It takes a task (or a Launch Pad brief) and drives it to completion through 6 phases:

INIT > ACQUIRE > PLAN > EXECUTE > FINALIZE > LOOP

The magic is in the EXECUTE phase. The Supervisor analyzes subtasks, identifies which can run in parallel, spins up isolated git worktrees, and dispatches Workers to build simultaneously. No file conflicts. No merge chaos.

When workers finish, it merges everything sequentially, runs reviews, creates a PR, and moves to the next task.

```
/supervisor job: .supervisor/jobs/2025-01-15-jwt-auth.md
```

3. Context-Keeper — The Memory Manager

Every system needs state management. The Context-Keeper is the sole writer of the Supervisor's state file. It runs on a fast, lightweight model (Haiku) with a 3-turn limit — just enough to read, update, and confirm.

This externalized state means the Supervisor itself holds only ~800 tokens of context, leaving room for actual work.

4. Worker — The Builder

Workers are the hands. Each one operates in an isolated git worktree, implementing a single subtask. They have no access to git operations (no commits, no pushes) — they just write code. This isolation is intentional: a Worker can't accidentally break the main branch.

When done, they produce a structured WORKER_RESULT block that the Supervisor validates.
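The plugin defines the exact shape of that block; purely as a hypothetical illustration (field names invented here, not taken from the plugin), it might carry information like:

```
WORKER_RESULT:
  subtask: BD-12a
  status: complete
  files_changed:
    - src/auth/login.ts
    - src/auth/refresh.ts
  notes: "Followed existing token-service pattern; tests added"
```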

5. Product Owner — The Requirements Translator

Vague requirements kill projects. The Product Owner takes business problems and translates them into structured user stories with acceptance criteria in Given/When/Then format.

```
/product-owner problem: "users are abandoning checkout"
```

It reads your domain context, runs discovery, and produces stories that any developer (human or AI) can implement.
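For the checkout-abandonment example above, a resulting story might read something like this (an illustrative sketch, not the agent's literal output format):

```
Story: Guest checkout
  As a shopper without an account,
  I want to pay without registering,
  so that I can finish checkout quickly.

Acceptance criteria:
  Given a shopper with items in the cart
  When they choose "Continue as guest"
  Then payment completes without account creation
```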

6. Orchestrator — The Task Architect

The Orchestrator breaks goals into dependency graphs:

  • EPIC — The big feature
  • TASK — Implementation work (30-60 minutes each)
  • SUBTASK — Review gates that block the next task

Every implementation task gets a paired review subtask. You can't skip reviews. This is quality enforcement by design.
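Applied to the JWT example from earlier, a dependency graph might look like this (hypothetical task names and estimates):

```
EPIC: JWT authentication
├── TASK 1: token issuance service       (~45 min)
│   └── SUBTASK: code review — gates TASK 2
├── TASK 2: refresh-token rotation       (~40 min)
│   └── SUBTASK: code review — gates TASK 3
└── TASK 3: auth middleware integration  (~30 min)
    └── SUBTASK: code review — gates the PR
```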

7. Code Reviewer — The Quality Gatekeeper

After code is written, the Code Reviewer inspects it against your project's patterns. It outputs one of three decisions:

  • PASS — Ship it
  • FAIL — Fix these issues and come back
  • NEEDS_HUMAN — I found something that requires human judgment

It checks type safety, security, performance, test coverage, and pattern alignment. And because it has persistent memory, it gets better at reviewing your specific codebase over time.

```
/code-reviewer src/auth/
```

8. Red Team Reviewer — The Adversarial Auditor

The Red Team Reviewer's job is to break things. It attacks assumptions, verifies claims against actual documentation, and explores 6 attack vectors to find what would fail in production.

Findings are rated: FATAL, CRITICAL, WARNING, WEAKNESS.

```
/red-team-reviewer --focus security
```


The Architecture: Surprisingly Simple

Here's what surprises people: the entire system is just Markdown files.

```
ai-agent-manager-plugin/
├── agents/           # 8 Markdown prompt files
├── commands/         # Slash command entry points
├── skills/           # 35 focused implementation guides
└── hooks/            # Quality gate automation
```

Each agent is a .md file with YAML frontmatter that specifies:

  • Which tools it can use
  • Which model it runs on
  • Which skills are pre-loaded
  • Whether it has persistent memory
```yaml
---
tools: [Read, Glob, Grep, Bash, Write, Edit]
model: sonnet
memory: project
skills:
  - supervisor-readiness
  - context-setup
  - quality-checklist
---
```

No Docker. No Kubernetes. No microservices. Just prompt engineering with structure.


Skills: Reusable Knowledge Packets

Skills are the secret weapon. Instead of agents re-discovering patterns every session, skills pre-inject focused knowledge at spawn time.

There are 35 skills covering:

  • Framework patterns — NestJS, Next.js, API Gateway
  • Workflow patterns — state management, async orchestration, context summarization
  • Quality patterns — commit conventions, review checklists, pattern detection
  • Testing patterns — Playwright E2E
  • Database patterns — TypeORM, Drizzle ORM, MySQL

When the Supervisor spawns, it already knows 5 skills. The Code Reviewer already knows quality criteria. No file reads needed. No context wasted.



Parallel Execution: The Git Worktree Trick

This is the technical insight that makes everything work.

When the Supervisor needs to run 3 workers simultaneously, it can't have them all editing src/auth/login.ts at the same time. Traditional approaches use file locks or merge strategies. I used git worktrees.

A git worktree is a separate working directory linked to the same repository. Each worker gets its own worktree, its own branch, its own filesystem. They can build in parallel without knowing about each other.

```
project/                    # Main worktree (Supervisor)
../project-BD-12a/          # Worker A worktree
../project-BD-12b/          # Worker B worktree (blocked, waiting)
../project-BD-12c/          # Worker C worktree
```

When workers finish, the Supervisor merges branches sequentially into the feature branch. Conflicts are rare because the Orchestrator designed the subtasks to touch different files.

And for simple tasks? A single subtask skips worktrees entirely and builds directly. No overhead when it's not needed.
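The worktree flow described above can be demonstrated with plain git commands. This is a self-contained sketch of the pattern — the repo layout and branch names are illustrative, not the plugin's actual naming scheme:

```shell
#!/bin/sh
set -e

# Throwaway repo to demonstrate the pattern
tmp=$(mktemp -d)
cd "$tmp"
git init -q project
cd project
git config user.email demo@example.com
git config user.name demo
git commit -q --allow-empty -m "init"

# Supervisor: one isolated worktree + branch per parallel subtask
git worktree add -b subtask-a ../project-a >/dev/null 2>&1
git worktree add -b subtask-b ../project-b >/dev/null 2>&1

# Workers build in isolation; different files, no shared state
echo "login code"  > ../project-a/login.ts
echo "logout code" > ../project-b/logout.ts
(cd ../project-a && git add . && git commit -q -m "subtask A")
(cd ../project-b && git add . && git commit -q -m "subtask B")

# Supervisor merges the branches sequentially into a feature branch
git checkout -q -b feature
git merge -q --no-edit subtask-a
git merge -q --no-edit subtask-b
ls   # both files now present on the feature branch
```

Because the Orchestrator assigns disjoint files to each subtask, the sequential merges at the end are normally conflict-free.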


Persistent Memory: Agents That Learn

Four agents have persistent memory:

  • Launch Pad — Remembers common file impacts per goal type
  • Code Reviewer — Remembers recurring issues and codebase conventions
  • Red Team Reviewer — Remembers past vulnerabilities and attack patterns
  • Product Owner — Remembers domain terminology and stakeholder preferences

Memory is stored in .claude/agent-memory/ and accumulates automatically. The Code Reviewer that's seen your codebase 50 times catches things a fresh reviewer never would.



Plan-First Philosophy

The biggest lesson I learned building this system: planning is not overhead — it's the highest-leverage activity.

Early versions jumped straight into execution. The Supervisor would take a goal, generate tasks, and start building. It worked for simple things. For complex features, it produced fragmented, inconsistent code.

The Launch Pad agent changed everything. By spending 60 seconds analyzing the codebase, estimating file impacts, and packaging a structured brief, the execution phase became dramatically more reliable.

The workflow is now:

```
Raw goal → Launch Pad → Supervisor-Ready Brief → Supervisor → Shipped PR
```

That brief is typically 200-400 lines of structured analysis. It tells the Supervisor:

  • Exactly which files to touch
  • Which subtasks can run in parallel
  • What patterns to follow
  • What risks to watch for

The Supervisor then skips its own discovery phases (saving ~500 tokens of context) and goes straight to execution.


Quality Gates: Trust But Verify

Every piece of code goes through at least two quality gates, with an optional third for pre-launch audits:

Gate 1 — Plugin Hooks (automated, no extra agents):

  • SubagentStop: Verifies workers produced valid results
  • TaskCompleted: Prevents premature task closure

Gate 2 — Code Reviewer (full review):

  • Pattern matching against your codebase
  • Security checks
  • Clear PASS/FAIL/NEEDS_HUMAN decision

Gate 3 — Red Team Reviewer (optional, pre-launch):

  • Adversarial audit
  • Exploration of 6 attack vectors
  • Severity-rated findings

The hook system is lightweight — it uses a fast model with a 30-second timeout. Just enough to catch obvious failures before the expensive review step.



What I'd Do Differently

1. Start with the state management. I retrofitted externalized state in v3. It should have been there from v1. The Context-Keeper pattern (dedicated agent for state mutations) solved coordination bugs that plagued earlier versions.

2. Make skills smaller. Some skills are 200+ lines. The best ones are under 100. Focused knowledge beats comprehensive documentation.

3. Test the prompts, not just the code. Agent behavior is determined by prompts, and prompts are code. I should have built prompt regression tests earlier.


Getting Started

AI Agent Manager is open source and works with any Claude Code installation.

GitHub: github.com/vikashruhilgit/ai-agent-manager

```
# From the ai-agent-manager directory
/plugin marketplace add ./
/plugin install ai-agent-manager-plugin@ai-agent-manager-marketplace

# In your project
/launch-pad goal: "describe what you want to build"
/supervisor  # Let it run
```

It works with any programming language, any framework, any project structure. The agents read your CLAUDE.md to understand your specific patterns.



The Bigger Picture

AI Agent Manager isn't about replacing developers. It's about giving developers a team.

Most of us work solo or in small teams. We context-switch between planning, coding, reviewing, and debugging. Each switch costs focus. Each role requires a different mindset.

What if your planning mindset was always available? What if code review happened instantly after every change? What if security audits were a command away, not a quarterly event?

That's what an AI team gives you. Not replacement. Amplification.

The agents aren't perfect. They hallucinate sometimes. They miss edge cases. They need human judgment for the hard decisions (that's what NEEDS_HUMAN is for).

But they're tireless, consistent, and they remember everything. And they're getting better every session.


AI Agent Manager is open source. Star it on GitHub: github.com/vikashruhilgit/ai-agent-manager
