Dorothy J Aubrey

Posted on Feb 23 • Originally published at github.com

Mother CLAUDE: Custom Agents, or How We Accidentally Built a Team

#ai #productivity #automation #devrel

TL;DR: Custom agents let you split one generalist AI into a team of specialists—each with its own context, tools, and personality. Every major AI coding tool now supports them. And if you've been building a documentation system like the one in this series, you already have the blueprints for your agents. You just didn't know it yet.

Who this is for: Anyone who's gotten comfortable with AI coding assistants and is ready for the next step—giving them specialized roles instead of asking one AI to do everything.

Part 7 of the Designing AI Teammates series. Yes, Part 7. I know Part 6 was literally called "Clean Your Room and Eat Your Vegetables" and had a whole "thank you for reading all six parts" sign-off. I put a bow on it. I said "now go eat your vegetables." I was done. I'd been saving that line since the first article — six parts of setup for one perfect closing metaphor.

And then—like every project I've ever shipped—I immediately thought of One. More. Thing. (The last. I swear. For now.)

In my defense, this one isn't scope creep. This is the post-credits scene where Nick Fury shows up and says "I'd like to talk to you about the Avengers Initiative." Except instead of assembling superheroes, we're assembling... markdown files with YAML frontmatter. Which is very on brand for this series, and so much more exciting, yes?

Here's what happened: we'd been aware of custom agents for a while—had even set up a few simple ones. Claude Code introduced them around mid-2025, and by now every major tool has some version. But I was thinking about them as a feature to configure, not as something we'd already built.

Then one day I was staring at our Mother CLAUDE documentation and it hit me: we'd already written the agents. We just hadn't promoted them yet.

Custom Agents: The Feature That's Been Hiding in Plain Sight

Custom agents aren't new. They've been rolling out across AI coding tools since mid-2025. By now, every major tool has some version:

Tool	What They Call It	Format	Available Since
Claude Code	Custom subagents	`.claude/agents/*.md` (YAML frontmatter + markdown)	~Mid 2025
Cursor	Subagents + Skills	`SKILL.md` + subagent config	Early 2026
Windsurf	Agent instructions	`AGENTS.md` (plain markdown)	2025
GitHub Copilot	Custom agents	`.github-private` repo configs	2025-2026
Cline	MCP extensions	Model Context Protocol tools	2025

The implementations vary. The concept is identical: stop asking one AI to do everything, and start giving specialists their own context, tools, and instructions.

And yes—they're still just markdown files. Our tagline lives on.

So why am I writing about this now? Because having the feature and using it well are different things. The agents we built weren't designed in a vacuum—they fell naturally out of the documentation system we'd been building for months. That's the interesting part.

The Problem Custom Agents Solve

Throughout this series, we've been giving Claude more responsibility:

Part	What We Did	How
Part 1	Taught it our codebase	CLAUDE.md
Part 2	Gave it memory	Session handoffs
Part 3	Made it automatic	Hooks
Part 4	Assigned quality enforcement	Checkpoint checklist
Part 5	Gave it permission to push back	Collaboration preferences
Part 6	Wrapped it all in a metaphor	Vegetables

All of that worked. But it created a new problem: one AI doing everything in the same context window.

Picture your most productive coworker. Now imagine asking them to simultaneously:

Manage your Jira tickets
Review code quality against a 30-item checklist
Search across 9 repositories for cross-repo impact
AND build the feature you actually need

No human team would operate this way. You'd have a project manager, a code reviewer, and a platform architect—each with clear responsibility boundaries and their own specialty.

Custom agents apply that same organizational structure to AI.

Each agent gets:

Context isolation — Its own context window, no interference between tasks
Specific tools — The Jira agent doesn't need file editing; the code reviewer doesn't need curl
A focused system prompt — Instructions for one job, not twenty
Automatic delegation — The AI routes tasks to the right specialist without anyone asking

The result is lower cognitive load for everyone. Engineers stop context-switching between "build the feature" and "manage the ticket" and "check the quality." Each concern has an owner.

The Moment It Clicked

We were working on a multi-repo platform—five repositories across PHP, React Native, Node.js, and GitHub Actions. Mother CLAUDE had grown to include:

Jira integration instructions (project codes, API patterns, smart commits)
A checkpoint checklist for quality reviews
A cross-project path map showing how all repos connect
Role-based collaboration preferences

All of it lived in CLAUDE.md files and shared documentation. It worked. But every session, Claude was loading all of this context—the Jira patterns, the quality checklist, the repo map—even when it only needed one piece.

Then we looked at the custom agents feature and had the obvious-in-retrospect realization:

Each section of our documentation was already a specialist agent. We just hadn't given them their own rooms yet.

The Jira instructions? That's a project management agent.
The checkpoint checklist? That's a quality review agent.
The cross-repo path map? That's an architecture analysis agent.

We didn't need to invent agents from scratch. We needed to promote existing documentation into standalone specialists.

What We Built

Three agents, each extracted from documentation we already had. Here's how they work—sanitized so you can adapt them for your own projects.

Agent 1: The Project Manager

What it does: Manages tickets, links commits to issues, creates tickets when work isn't tracked.

Extracted from: The Jira integration section of our shared CLAUDE.md.

---
name: project-tracker
description: Ticket management and smart commits. Use when creating,
  updating, or querying project tickets, or when preparing commits
  that should reference tracked work.
tools: Bash, Read, Grep, Glob
model: sonnet
---

You are a project management agent.

## Connection Details

- **Instance**: `your-instance.atlassian.net`
- **Auth**: Basic auth using `$JIRA_EMAIL` and `$JIRA_TOKEN` environment variables
- **API**: REST API v3

## Determining the Right Project

Detect from the current working directory:
- Path contains `frontend` → FE
- Path contains `backend` → BE
- Path contains `infrastructure` → INFRA

## Smart Commits

**Every commit should reference a ticket.** This is a core workflow rule.

1. Check if there's an existing ticket for the work being done
2. If no ticket exists, create one
3. Format the commit message with the ticket key at the start
4. After committing, add a brief comment to the ticket
5. If the commit completes the work, transition the ticket to Done

Why it's an agent, not a CLAUDE.md instruction: This needs Bash access for API calls, runs independently from feature work, and benefits from its own context window so ticket management doesn't pollute your coding session.

Agent 2: The Code Reviewer

What it does: Runs instant retrospectives against your team's quality standards.

Extracted from: The checkpoint checklist (Part 4 of this series).

---
name: retro
description: Quality checkpoint and code review. Use when a feature
  is complete, before creating a commit, at end of session, after
  fixing a bug, or after a large refactoring.
tools: Read, Grep, Glob, Bash
model: sonnet
---

You are a quality review agent. You run instant retrospectives.

## The Meta Question

Ask this first: **"If a new developer (or a fresh AI session) looked
at this code tomorrow, would they understand it without anyone
explaining anything?"**

## What to Review

### 1. Get Context

Check git diff for staged and unstaged changes.
Read the changed files.

### 2. Run the Checklist

**Architecture & Design**
- Does each file/function have a single clear responsibility?
- Are there hardcoded values that should be constants or config?

**Code Quality**
- Any magic numbers or strings?
- Duplicated logic that should be extracted?
- Dead code or unused imports?

**Security**
- SQL injection risk? (Parameterized queries only)
- XSS risk? (Escape user input)
- Hardcoded credentials?

**Testing**
- Do changed components have corresponding tests?
- Are test names descriptive?

## How to Report

**Critical** (must fix before commit)
**Warning** (should fix soon)
**Suggestion** (consider)
**What's Good** (always include — call out smart decisions)

## Tone

Be direct but constructive. You're a teammate, not a critic.
If everything looks clean, say so — don't manufacture issues.

Why it's an agent, not a hook: A hook can trigger a review, but it can't reason about code quality. The retro agent reads files, understands context, applies judgment, and produces a structured report. That requires an agent's reasoning capabilities.

Agent 3: The Architect

What it does: Searches across all your repositories to analyze cross-project impact.

Extracted from: The project paths table in our shared CLAUDE.md.

---
name: cross-repo
description: Search and analyze across all repositories. Use when
  checking impact of changes or understanding how code connects
  across projects.
tools: Read, Grep, Glob, Bash
model: sonnet
---

You are a cross-repository analysis agent.

## Repository Locations

| Repo | Path | Stack |
|------|------|-------|
| Frontend | `/path/to/frontend` | React/TypeScript |
| Backend API | `/path/to/api` | Node.js/Express |
| Admin Dashboard | `/path/to/admin` | PHP |
| CI/CD | `/path/to/pipelines` | GitHub Actions |
| Shared Docs | `/path/to/docs` | Markdown |

## How the Repos Connect

Frontend  ──calls──>  Backend API
    │                      │
    │ built by             │ monitored by
    v                      v
  CI/CD              Admin Dashboard

## Common Tasks

### Impact Analysis
When asked "what would break if I change X?":
1. Search ALL repos for references
2. Check code AND documentation references
3. Flag cross-repo contracts (API endpoints, config keys)
4. Report findings grouped by repo with file paths

### Sync Check
When asked "are the repos in sync?":
1. Check that API endpoints match what the frontend calls
2. Check that feature flag names match between API and frontend
3. Check that shared documentation is up to date

Why it's an agent, not a script: Cross-repo analysis requires reading files across multiple directories, reasoning about connections, and producing a nuanced report. A grep script finds text matches. An agent understands meaning.

The Design Decision: Agent vs. Hook vs. CLAUDE.md

This is the question nobody else seems to be answering. You have three ways to extend your AI's behavior. When do you use which?

Mechanism	Best For	Example
CLAUDE.md	Always-on context, preferences, standards	"Use NativeWind for styling"
Hooks	Automatic triggers, simple actions, no reasoning	Auto-save session handoff on compact
Custom Agents	Specialized tasks requiring reasoning + tools	Quality review, cross-repo analysis

Use CLAUDE.md when:

The information is relevant to every task
It's preferences or standards, not a workflow
It fits in under 100 lines (context cost matters)
Example: "Dorothy appreciates proactive suggestions"

Use Hooks when:

The action should be automatic and invisible
No reasoning is needed—just trigger → script → done
Speed matters (hooks should complete in seconds)
Example: Generate session handoff when context compacts

Use Custom Agents when:

The task requires reading files, searching code, and making judgments
It benefits from isolated context (separate from your main work)
It needs specific tools that other tasks don't
It would pollute your main context if done inline
Example: "Review this code against our quality checklist"

The Spectrum

CLAUDE.md          Hooks              Agents
│                  │                  │
│ Passive context  │ Automatic        │ Active reasoning
│ Always loaded    │ Event-triggered  │ On-demand
│ No tools         │ Scripts only     │ Full tool access
│ Zero cost        │ Low cost         │ Higher cost
│                  │                  │
└──── Simpler ─────┴──── More capable ┘

Rules of thumb:

If Claude needs to know it → CLAUDE.md
If it should happen automatically without thinking → Hook
If it requires judgment and tools → Agent

How Agents Get Invoked

This is one of the best parts: you don't have to manually call them.

When you create an agent with a good description, your AI assistant matches tasks to agents automatically. The description field in the agent definition is the key:

---
name: retro
description: Quality checkpoint and code review. Use when a feature
  is complete, before creating a commit, at end of session...
---

When you say "let's do a quick review before committing," Claude sees the word "review," matches it to the retro agent's description, and delegates automatically.

You can also invoke them explicitly—Claude Code uses @retro or the /agents command, Cursor uses subagent configuration, Windsurf reads AGENTS.md. The syntax varies; the concept is the same.

The Tool-Agnostic Pattern

If you've been following this series, you know our recurring theme: it's all just markdown files.

Custom agents continue that pattern. Regardless of which tool you use:

Write a markdown file describing the agent's role
Include the knowledge it needs (extracted from your docs)
Specify its tools (what it's allowed to do)
Place it where your tool expects to find it

The format varies slightly:

Claude Code (~/.claude/agents/retro.md):

---
name: retro
description: Quality review agent
tools: Read, Grep, Glob, Bash
model: sonnet
---
Your system prompt here...

Windsurf (AGENTS.md):

## Code Review Agent
When reviewing code, follow these steps...

Cursor: Uses SKILL.md files and subagent configuration in project settings.

GitHub Copilot: Custom agent definitions in .github-private repository.

The content is transferable even if the container isn't. If you write a solid quality review prompt for Claude Code, you can adapt it for Cursor or Windsurf in minutes. The intelligence is in the instructions, not the format.

The Progression: Documentation → Agents

Looking back at the whole series, there's a clear evolution:

Part 1: Write documentation        (passive knowledge)
    ↓
Part 2: Persist it across sessions (memory)
    ↓
Part 3: Automate the workflow      (hooks)
    ↓
Part 4: Assign responsibilities    (quality enforcement)
    ↓
Part 5: Shape the relationship     (permission to act)
    ↓
Part 6: Wrap it in a metaphor      (vegetables)
    ↓
Part 7: Split into specialists     (custom agents)  ← you are here

Each step built on the last. And here's the insight that surprised us:

We didn't build agents. We promoted documentation.

The Jira agent isn't new logic we invented. It's the Jira section of CLAUDE.md, given its own context window and tool access.

The retro agent isn't a new quality system. It's the checkpoint checklist from Part 4, given the ability to read files and report findings.

The cross-repo agent isn't a new architecture tool. It's the project paths table from Mother CLAUDE, given permission to search across directories.

If you've been building good documentation, you're already building agents. You just haven't extracted them yet.

Scoping: Global vs. Project

One decision you'll face: should an agent be available everywhere or just in one project?

Scope	Location (Claude Code)	When to Use
Global	`~/.claude/agents/`	Cross-project workflows (Jira, cross-repo analysis)
Project	`.claude/agents/`	Project-specific standards (your app's review rules)

Our three agents are all global—they work across every project. But you might want project-specific agents for things like:

A migration agent that knows your specific database schema
A testing agent that knows your specific test framework patterns
A deploy agent that knows your specific CI/CD pipeline

Start global. Move to project-specific when you need specialization that doesn't apply everywhere.

Model Selection: Not Everything Needs the Big Brain

Most agent frameworks let you choose which model powers each agent. This matters for cost and speed:

Task	Model	Why
Quality review	Sonnet/GPT-4o	Needs nuance and judgment
Jira ticket management	Sonnet	Needs to follow API patterns correctly
Cross-repo search	Sonnet	Needs to reason about connections
Session handoff generation	Haiku/GPT-4o-mini	Structured summarization—doesn't need genius

Rule of thumb: Use the smartest model for tasks requiring judgment. Use the cheapest model for tasks that are mostly structured formatting.

What We Didn't Expect

1. Context Window Relief

The biggest surprise wasn't the agents themselves—it was what happened to the main session. With Jira management, quality reviews, and cross-repo searches handled by specialists, the main context window stays focused on actual feature work.

Before: One cluttered conversation trying to do everything for everyone.
After: A clean session focused on building, with specialists handling the side quests.

2. Automatic Delegation Actually Works

We expected engineers would need to manually invoke agents. In practice, Claude (and Cursor, and others) routes tasks to agents automatically based on the description. Someone says "let's review this before committing" and the retro agent spins up. "Check if this change breaks the API" and the cross-repo agent takes over.

The descriptions are the routing mechanism. Write good descriptions, get good routing.

3. Agents Reference Your Documentation

Because our agents were extracted from Mother CLAUDE, they naturally reference the same standards. The retro agent checks against the checkpoint checklist that lives in shared docs. The cross-repo agent uses the same path table that Mother CLAUDE maintains.

The agents don't replace the documentation. They operationalize it.

How to Start

If You Have Mother CLAUDE (or any documentation system):

Identify the sections of your docs that describe workflows, not just context
Extract each workflow into its own markdown file
Add the agent frontmatter (name, description, tools)
Place it in your tool's agent directory
Test it by describing a task that matches the agent's description

If You're Starting From Scratch:

Pick one recurring task you do in every session (code review, ticket management, etc.)
Write down the steps you follow (this is your agent's system prompt)
Specify what tools the agent needs
Save it as a markdown file in the right location
Iterate based on what the agent gets right and wrong

The Mother CLAUDE Agent Starter Kit

We've added sanitized versions of all three agents to the Mother CLAUDE repo:

mother-claude/
├── agents/
│   ├── project-tracker.md    # Jira/ticket management
│   ├── retro.md              # Quality review
│   ├── cross-repo.md         # Multi-repo analysis
│   ├── migration-planner.md  # SQL migration validation
│   └── docs-writer.md        # Documentation maintenance

These are templates. Fill in your project paths, your ticket system details, your quality standards. The structure is ready; the specifics are yours.

What Else Could You Build?

We started with three agents. We've since added two more—and the ideas keep coming. Here's what we've found works as agents versus what's better left to other tools.

Agents We've Added

Migration Planner — We kept shipping SQL migrations that referenced columns that didn't exist yet, or forgot foreign key constraints. A grep script catches typos. An agent reads your schema, understands relationships, and validates that your migration will actually run.

---
name: migration-planner
description: Validate SQL migrations against the current schema.
  Use before running migrations or when planning schema changes.
tools: Read, Grep, Glob, Bash
---

Docs Writer — Code changes, documentation doesn't. The docs writer watches what you changed and updates the relevant CLAUDE.md, shared docs, and README sections. It knows your documentation architecture (because it was extracted from it).

---
name: docs-writer
description: Update project documentation when code changes.
  Use after features, refactors, or API changes.
tools: Read, Grep, Glob, Edit, Write
---

Agents Worth Considering

Agent	What It Does	Why It Needs Reasoning
Security auditor	OWASP checks, secret scanning, dependency vulnerabilities	Needs to understand context, not just pattern match
Test analyst	Run tests, analyze failures, suggest fixes	Why did a test fail, not just which test failed
Dependency manager	Outdated packages, breaking changes, upgrade paths	Needs judgment about risk and compatibility

What's Better as Hooks (Not Agents)

Tool	Why Not an Agent
Linting (ESLint, PHPStan)	Just run the command — no reasoning needed
Formatting (Prettier, php-cs-fixer)	Pre-commit hook, done
Type checking (tsc, phpstan)	Automated pass/fail

The dividing line: if it needs to read, reason, and report — it's an agent. If it just needs to run — it's a hook.

Common Objections

"This seems like overkill for a small team"

Fair point for simple projects. But if your team is working across multiple repos, or if the same quality checks keep getting skipped under deadline pressure, or if ticket management is eating into engineering time—agents pay for themselves fast.

Start with one. The retro agent is the easiest win: extract your quality checklist, give it file access, and let it enforce standards consistently regardless of who's coding or when.

"Won't this cost more in API tokens?"

Each agent uses its own context window, so yes—there's a cost. But agent context windows are typically smaller and more focused than your main session. A retro agent that reads 5 changed files and produces a review costs a fraction of what your main session costs over an hour.

The trade-off: slightly higher token cost for significantly better focus and quality.

"My tool doesn't support custom agents yet"

The pattern still works. Write the agent as a markdown file. When you need that specialist's help, paste the relevant instructions into your session. It's manual, but the documentation is ready for when your tool catches up.

Remember: they're just markdown files.

"What about MCP servers?"

MCP (Model Context Protocol) servers and custom agents solve different problems. MCP gives your AI tools — structured access to external APIs like Jira, GitHub, or databases. Agents give your AI reasoning — a focused context, specific instructions, and judgment about how to use those tools.

They can work together: an agent can use MCP tools. But here's the thing — for a lot of use cases, an agent with curl and environment variables is simpler and more reliable than an MCP server.

Our Jira agent exists because the Atlassian MCP server kept dropping authorization. We replaced it with an agent that uses curl + $JIRA_EMAIL + $JIRA_TOKEN. No external dependency. No mysterious auth failures. Just HTTP calls we can debug ourselves.

If your MCP servers are rock solid, great — use them. If they're flaky, an agent with direct API calls might be the more pragmatic choice.

"How is this different from just having a really good CLAUDE.md?"

CLAUDE.md loads everything into one context. Agents get their own context. The difference is focus.

When your CLAUDE.md grows to include Jira patterns, quality checklists, cross-repo maps, AND project-specific context—you're paying context costs for everything, all the time. Agents load only when needed, with only what they need.

Think of it this way: CLAUDE.md is your team's shared wiki. Agents are individual team members who've read the wiki and specialize in their area.

The Core Insight

Documentation is the training data for your agents. Organizational structure is the design pattern.

This isn't about prompts or model selection or clever tricks. It's about applying the same principles you'd use to build a high-functioning human team — clear responsibility boundaries, context isolation, separation of concerns — to your AI tooling.

Every section of your CLAUDE.md that describes a workflow is a specialist waiting to be born. Every checklist is a quality agent. Every integration guide is an automation agent. Every cross-project map is an architecture agent.

The tools — Claude Code, Cursor, Windsurf, Copilot — just gave us the runtime.

We thought Part 6 was the end. Turns out, it was the foundation. The "vegetables" we'd been eating (documentation, handoffs, quality checks, collaboration preferences) weren't just good habits. They were blueprints for a team of specialists—and the tools had been ready for months. We just needed to look at our own docs from the right angle.

Mother CLAUDE isn't just a documentation system anymore. She's a team lead.

This article was written collaboratively with Claude, who—true to the Permission Effect from Part 5—suggested three structural changes to this article that I didn't ask for. All three made it better. The system continues to work.

Resources

The Mother CLAUDE system, including sanitized agent templates, is open source:

GitHub: github.com/Kobumura/mother-claude
Agent templates: agents/ directory
All articles: articles/devto/
Hook scripts: hooks/

Fork it, adapt it, build your own team.

Licensed under CC BY 4.0. Free to use and adapt with attribution to Dorothy J. Aubrey.

Top comments (2)

Comment deleted

Dorothy J Aubrey • Feb 23

So, it's actually very interesting and timely that you brought this up. RIGHT after I posted this article, this showed up in my feed dev.to/dannwaneri/who-said-what-to... which is a different and much more serious problem than what I was initially addressing.

Our agents work well because they have cleanly separated tasks - the retro agent reviews code, the Jira agent manages tickets, the cross-repo agent searches. They don't step on each other because they're doing different jobs. At the orchestration level, it's simple: the main session delegates, gets results back, decides what's next. And, I also have high, electric fence guardrails that my coding agents work within.

But Daniel is pointing at something underneath that that is now really bouncing around my brain: the infrastructure itself doesn't reliably track who said what. Shared session IDs across subagents, transcript gaps, permission bypass, turn-order confusion. Those are architectural issues that affect everyone using subagents, REGARDLESS of how cleanly you've separated responsibilities, so now I feel like this is coming back at me like a boomerang sometime soon. :D becuase we haven't been bitten by this yet -- in all likelihood because part of my concern, on the org level, is to make sure we eat our veggies and keep our room clean -- so our agents are read-heavy (search, review, query) rather than write-heavy. The retro agent reads code and reports findings. The cross-repo agent searches files. The Jira agent makes API calls to an external system. None of them are making commits or editing files in parallel with the main session.

Our main use of coding agents is in mobile apps (React Native) so our uses are very small and isolated and we don't get a lot of churning as far as changes in our libraries (and one of our uses is actually a build-your-own-app-bring-your-own-ai-assistant -- and these are the vibe users for whom I originally started writing these articles :) ) Also, on our platform, because I found early on that sometimes my overly eager AI assistant brought some weird offerings to the party, we have what a lot of people would consider OVER testing -- so I think we catch bad code early, when it's still quick, cheap, and easy to fix. But if I had agents that were writing code alongside each other Daniel's concerns would keep me up at night. The turn-order confusion he describes -- Claude acting on its own output as if it were a user instruction -- that's not a coordination problem you can design around with better agent descriptions. That's a platform issue.

I think the honest answer to your question is: inter-agent coordination works fine at the task level if your agents have clear boundaries. But at the infrastructure level, the traceability and accountability problems Daniel documents are real and unresolved. For now, our mitigation is keeping agents focused on reading/analyzing/querying rather than making concurrent writes. But now that I have some more insight into this, I realize that's a workaround, not a solution.