A plugin system where AI agents plan, build, review, and deploy — like a real engineering team
The Problem Nobody Talks About
Every developer using AI assistants hits the same wall.
You ask an AI to "add user authentication." It generates code. Maybe good code. But it doesn't know your project's patterns. It doesn't check if the code is secure. It doesn't create a pull request. It doesn't track what was done or what comes next.
You're still the glue holding everything together.
I wanted something different. I wanted AI agents that work like a real engineering team — where one agent plans the work, another builds it, another reviews it, and another tries to break it. Where knowledge accumulates across sessions. Where "add user authentication" triggers an entire workflow that ends with a reviewed, tested pull request.
So I built AI Agent Manager.
What Is AI Agent Manager?
AI Agent Manager is a Claude Code plugin that provides 8 specialized AI agents for software development. Each agent has a distinct role, distinct tools, and a distinct personality — just like teammates on a real engineering team.
It's not a framework. It's not a SaaS product. It's a set of Markdown prompt files with YAML frontmatter that plug directly into Claude Code's CLI. No servers. No APIs. No infrastructure. Just agents that understand your codebase and get work done.
Here's what makes it different from "just using AI":
- Agents have roles. A Code Reviewer doesn't write features. A Worker doesn't do planning.
- Agents have memory. They remember your project's patterns across sessions.
- Agents collaborate. The output of one becomes the input of the next.
- Work is tracked. Every task, subtask, and review decision is recorded.
- Execution is parallel. Multiple workers build simultaneously using git worktrees.
Meet the Team
1. Launch Pad — The Readiness Planner
Before any code is written, Launch Pad takes your raw goal and prepares it for execution. It runs a 6-phase process:
VALIDATE → DISCOVER → ANALYZE → DECOMPOSE → PACKAGE → REFINE & SAVE
Give it "add user authentication" and it will:
- Scan your codebase to understand existing patterns
- Identify which files will be impacted
- Estimate parallelism opportunities
- Produce a "Supervisor-Ready Brief" — a structured document that tells the Supervisor exactly what to do
Think of it as the architect who draws blueprints before construction begins.
```
/launch-pad goal: "add JWT authentication with refresh tokens"
```
2. Supervisor — The Parallel Orchestrator
The Supervisor is the project manager. It takes a task (or a Launch Pad brief) and drives it to completion through 6 phases:
INIT → ACQUIRE → PLAN → EXECUTE → FINALIZE → LOOP
The magic is in the EXECUTE phase. The Supervisor analyzes subtasks, identifies which can run in parallel, spins up isolated git worktrees, and dispatches Workers to build simultaneously. No file conflicts. No merge chaos.
When workers finish, it merges everything sequentially, runs reviews, creates a PR, and moves to the next task.
```
/supervisor job: .supervisor/jobs/2025-01-15-jwt-auth.md
```
3. Context-Keeper — The Memory Manager
Every system needs state management. The Context-Keeper is the sole writer of the Supervisor's state file. It runs on a fast, lightweight model (Haiku) with a 3-turn limit — just enough to read, update, and confirm.
This externalized state means the Supervisor itself holds only ~800 tokens of context, leaving room for actual work.
4. Worker — The Builder
Workers are the hands. Each one operates in an isolated git worktree, implementing a single subtask. They have no access to git operations (no commits, no pushes) — they just write code. This isolation is intentional: a Worker can't accidentally break the main branch.
When done, they produce a structured WORKER_RESULT block that the Supervisor validates.
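The plugin defines the actual schema, but a result block of this kind might look roughly like the following (the field names here are illustrative, not the plugin's real format):

```
WORKER_RESULT:
  subtask: BD-12a
  status: complete
  files_changed:
    - src/auth/login.ts
  notes: "Implemented login handler; local checks pass"
```

The point is that the Supervisor parses a predictable structure instead of free-form prose, so validation can be mechanical.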
5. Product Owner — The Requirements Translator
Vague requirements kill projects. The Product Owner takes business problems and translates them into structured user stories with acceptance criteria in Given/When/Then format.
```
/product-owner problem: "users are abandoning checkout"
```
It reads your domain context, runs discovery, and produces stories that any developer (human or AI) can implement.
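As an illustration, the checkout-abandonment problem above might yield a story like this (the scenario itself is invented for this example, not output from the agent):

```gherkin
Feature: Guest checkout
  Shoppers should be able to pay without creating an account.

  Scenario: Shopper completes checkout as a guest
    Given a shopper with items in their cart
    When they choose "Continue as guest" at checkout
    Then they can complete payment without registering
    And an order confirmation is sent to their email
```

Acceptance criteria in this shape are unambiguous enough that a Worker agent can implement against them and a Code Reviewer can verify against them.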
6. Orchestrator — The Task Architect
The Orchestrator breaks goals into dependency graphs:
- EPIC — The big feature
- TASK — Implementation work (30-60 minutes each)
- SUBTASK — Review gates that block the next task
Every implementation task gets a paired review subtask. You can't skip reviews. This is quality enforcement by design.
7. Code Reviewer — The Quality Gatekeeper
After code is written, the Code Reviewer inspects it against your project's patterns. It outputs one of three decisions:
- PASS — Ship it
- FAIL — Fix these issues and come back
- NEEDS_HUMAN — I found something that requires human judgment
It checks type safety, security, performance, test coverage, and pattern alignment. And because it has persistent memory, it gets better at reviewing your specific codebase over time.
```
/code-reviewer src/auth/
```
8. Red Team Reviewer — The Adversarial Auditor
The Red Team Reviewer's job is to break things. It attacks assumptions, verifies claims against actual documentation, and explores 6 attack vectors to find what would fail in production.
Findings are rated: FATAL, CRITICAL, WARNING, WEAKNESS.
```
/red-team-reviewer --focus security
```
The Architecture: Surprisingly Simple
Here's what surprises people: the entire system is just Markdown files.
```
ai-agent-manager-plugin/
├── agents/     # 8 Markdown prompt files
├── commands/   # Slash command entry points
├── skills/     # 35 focused implementation guides
└── hooks/      # Quality gate automation
```
Each agent is a .md file with YAML frontmatter that specifies:
- Which tools it can use
- Which model it runs on
- Which skills are pre-loaded
- Whether it has persistent memory
```yaml
---
tools: [Read, Glob, Grep, Bash, Write, Edit]
model: sonnet
memory: project
skills:
  - supervisor-readiness
  - context-setup
  - quality-checklist
---
```
No Docker. No Kubernetes. No microservices. Just prompt engineering with structure.
Skills: Reusable Knowledge Packets
Skills are the secret weapon. Instead of agents re-discovering patterns every session, skills pre-inject focused knowledge at spawn time.
There are 35 skills covering:
- Framework patterns — NestJS, Next.js, API Gateway
- Workflow patterns — state management, async orchestration, context summarization
- Quality patterns — commit conventions, review checklists, pattern detection
- Testing patterns — Playwright E2E
- Database patterns — TypeORM, Drizzle ORM, MySQL
When the Supervisor spawns, it already knows 5 skills. The Code Reviewer already knows quality criteria. No file reads needed. No context wasted.
Parallel Execution: The Git Worktree Trick
This is the technical insight that makes everything work.
When the Supervisor needs to run 3 workers simultaneously, it can't have them all editing src/auth/login.ts at the same time. Traditional approaches use file locks or merge strategies. I used git worktrees.
A git worktree is a separate working directory linked to the same repository. Each worker gets its own worktree, its own branch, its own filesystem. They can build in parallel without knowing about each other.
```
project/              # Main worktree (Supervisor)
../project-BD-12a/    # Worker A worktree
../project-BD-12b/    # Worker B worktree (blocked, waiting)
../project-BD-12c/    # Worker C worktree
```
When workers finish, the Supervisor merges branches sequentially into the feature branch. Conflicts are rare because the Orchestrator designed the subtasks to touch different files.
And for simple tasks? A single subtask skips worktrees entirely and builds directly. No overhead when it's not needed.
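The underlying git mechanics can be sketched in a few commands. This is a minimal, self-contained demonstration in a throwaway repo, not the plugin's actual implementation; the branch and directory names are illustrative:

```shell
set -e
# Throwaway repo standing in for your project
tmp=$(mktemp -d) && cd "$tmp"
git init -q project && cd project
git -c user.email=demo@example.com -c user.name=demo \
  commit -q --allow-empty -m "init"
git checkout -q -b feature/auth

# Supervisor: one isolated worktree + branch per worker
git worktree add -q ../worker-a -b worker/a
git worktree add -q ../worker-b -b worker/b

# Workers build in parallel, each touching different files
(cd ../worker-a && echo login > login.ts && git add . && \
  git -c user.email=demo@example.com -c user.name=demo commit -q -m "login")
(cd ../worker-b && echo tokens > tokens.ts && git add . && \
  git -c user.email=demo@example.com -c user.name=demo commit -q -m "tokens")

# Supervisor: merge results sequentially into the feature branch
git -c user.email=demo@example.com -c user.name=demo merge -q --no-edit worker/a
git -c user.email=demo@example.com -c user.name=demo merge -q --no-edit worker/b

# Clean up worktrees and branches once merged
git worktree remove ../worker-a
git worktree remove ../worker-b
git branch -q -d worker/a worker/b
```

Because each worker commits on its own branch in its own directory, the "merge sequentially" step is the only point where their work meets, and well-partitioned subtasks make those merges trivial.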
Persistent Memory: Agents That Learn
Four agents have persistent memory:
- Launch Pad — Remembers common file impacts per goal type
- Code Reviewer — Remembers recurring issues and codebase conventions
- Red Team Reviewer — Remembers past vulnerabilities and attack patterns
- Product Owner — Remembers domain terminology and stakeholder preferences
Memory is stored in .claude/agent-memory/ and accumulates automatically. The Code Reviewer that's seen your codebase 50 times catches things a fresh reviewer never would.
Plan-First Philosophy
The biggest lesson I learned building this system: planning is not overhead — it's the highest-leverage activity.
Early versions jumped straight into execution. The Supervisor would take a goal, generate tasks, and start building. It worked for simple things. For complex features, it produced fragmented, inconsistent code.
The Launch Pad agent changed everything. By spending 60 seconds analyzing the codebase, estimating file impacts, and packaging a structured brief, the execution phase became dramatically more reliable.
The workflow is now:
Raw goal → Launch Pad → Supervisor-Ready Brief → Supervisor → Shipped PR
That brief is typically 200-400 lines of structured analysis. It tells the Supervisor:
- Exactly which files to touch
- Which subtasks can run in parallel
- What patterns to follow
- What risks to watch for
The Supervisor then skips its own discovery phases (saving ~500 tokens of context) and goes straight to execution.
Quality Gates: Trust But Verify
Every piece of code goes through at least two quality gates:
Gate 1 — Plugin Hooks (automated, no extra agents):
- SubagentStop — Verifies workers produced valid results
- TaskCompleted — Prevents premature task closure
Gate 2 — Code Reviewer (full review):
- Pattern matching against your codebase
- Security checks
- Clear PASS/FAIL/NEEDS_HUMAN decision
Gate 3 — Red Team Reviewer (optional, pre-launch):
- Adversarial audit
- Exploration of 6 attack vectors
- Severity-rated findings
The hook system is lightweight — it uses a fast model with a 30-second timeout. Just enough to catch obvious failures before the expensive review step.
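Claude Code hooks are declared in JSON. A SubagentStop gate along the lines described above might look roughly like this; the exact schema, and the validation script it points at, are assumptions on my part rather than the plugin's actual config, so check the Claude Code hooks documentation for the real shape:

```json
{
  "hooks": {
    "SubagentStop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "scripts/validate-worker-result.sh",
            "timeout": 30
          }
        ]
      }
    ]
  }
}
```

The timeout is what keeps this gate cheap: a hook that hangs gets killed rather than blocking the pipeline.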
What I'd Do Differently
1. Start with the state management. I retrofitted externalized state in v3. It should have been there from v1. The Context-Keeper pattern (dedicated agent for state mutations) solved coordination bugs that plagued earlier versions.
2. Make skills smaller. Some skills are 200+ lines. The best ones are under 100. Focused knowledge beats comprehensive documentation.
3. Test the prompts, not just the code. Agent behavior is determined by prompts, and prompts are code. I should have built prompt regression tests earlier.
Getting Started
AI Agent Manager is open source and works with any Claude Code installation.
GitHub: github.com/vikashruhilgit/ai-agent-manager
```
# From the ai-agent-manager directory
/plugin marketplace add ./
/plugin install ai-agent-manager-plugin@ai-agent-manager-marketplace

# In your project
/launch-pad goal: "describe what you want to build"
/supervisor   # Let it run
```
It works with any programming language, any framework, any project structure. The agents read your CLAUDE.md to understand your specific patterns.
The Bigger Picture
AI Agent Manager isn't about replacing developers. It's about giving developers a team.
Most of us work solo or in small teams. We context-switch between planning, coding, reviewing, and debugging. Each switch costs focus. Each role requires a different mindset.
What if your planning mindset was always available? What if code review happened instantly after every change? What if security audits were a command away, not a quarterly event?
That's what an AI team gives you. Not replacement. Amplification.
The agents aren't perfect. They hallucinate sometimes. They miss edge cases. They need human judgment for the hard decisions (that's what NEEDS_HUMAN is for).
But they're tireless, consistent, and they remember everything. And they're getting better every session.
AI Agent Manager is open source. Star it on GitHub: github.com/vikashruhilgit/ai-agent-manager







