Andrej Karpathy's vision of orchestrating AI agents is now a terminal command away
A year ago, Andrej Karpathy coined "vibe coding"—the gleefully reckless way of prompting AI, accepting everything it spits out, and iterating by pasting error messages back in. It was fun. It was fast. It was chaos.
Last week, Karpathy declared that era over:
"Agentic engineering: AI does the implementation, human owns the architecture, quality, and correctness."
The shift is fundamental. You're no longer prompting a single AI. You're orchestrating a team of agents—coding assistants that execute, test, and refine code while you act as architect, reviewer, and decision-maker.
Karpathy described this at Y Combinator's AI Startup School:
"You're not just coding with AI, you're managing a team of AIs, each playing a specific role in your pipeline. It's like Zapier meets GitHub Copilot meets chaos."
The punchline? Claude Code just shipped this.
It's called Agent Teams, and it turns Karpathy's vision of agentic engineering into something you can run from your terminal today.
What: Your First Agent Swarm
Karpathy talked about "partial autonomy"—tools with an "autonomy slider" that let you dial in how much control AI has, from simple suggestions to fully automated generation. Agent Teams are exactly this, cranked to eleven.
One Claude session acts as the team lead. It spawns teammates—separate Claude instances, each in their own context window—that work independently on assigned tasks. They communicate directly with each other, share a task list, and synthesize findings.
| Karpathy's Vision | Claude Code's Implementation |
|---|---|
| Orchestrating multiple AI agents | Team Lead coordinates teammates |
| Each agent plays a specific role | Teammates spawned with specialized prompts |
| Fast generation-verification loops | Hooks for quality gates at task completion |
| Human owns architecture + correctness | Lead reviews plans, approves before implementation |
| Autonomy slider | Delegate mode restricts lead to coordination-only |
This is Karpathy's "managing a team of AIs" made real. Not one agent doing everything. Multiple specialized agents, coordinated through a shared task list, communicating directly to converge on solutions.
Karpathy also described LLMs as "brilliant interns with perfect recall but no judgment." Agent Teams address this by:
- Requiring plan approval before implementation
- Enforcing quality gates via hooks
- Keeping humans in the loop as decision-makers
The team lead is still Claude. But the human owns the architecture.
How: Building Your First Agent Team
Enable the Feature
Add the experimental flag to your `settings.json`:

```json
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
```
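If you'd rather not persist the flag in `settings.json`, the same environment variable can be set in the shell for a single session. A minimal sketch (it assumes the `claude` CLI reads the variable from the environment, just as it would from the settings `env` block):

```shell
# Set the experimental flag for the current shell session only,
# instead of persisting it in settings.json.
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
```

Any `claude` session launched from this shell will then have Agent Teams enabled.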
Start a Team
Describe what you want in natural language:
```
Create an agent team to refactor our authentication module.
Spawn three teammates:
- One for frontend auth components
- One for backend API endpoints
- One for test coverage
```
Claude creates the team, spawns teammates, assigns work via a shared task list, and synthesizes results.
The Architecture
| Component | Role |
|---|---|
| Team Lead | Creates team, spawns teammates, coordinates work |
| Teammates | Separate Claude instances working assigned tasks |
| Task List | Shared work items teammates claim and complete |
| Mailbox | Messaging for direct inter-agent communication |
Display Modes
In-process mode (default): All teammates run inside your terminal. Use Shift+Up/Down to select and message teammates directly.
Split-pane mode: Each teammate gets its own pane. Requires tmux or iTerm2.
```json
{
  "teammateMode": "tmux"
}
```
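Taken together with the env flag from earlier, a complete `settings.json` for split-pane teams might look like this (assuming both keys sit at the top level of the settings file, as in the individual snippets):

```json
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  },
  "teammateMode": "tmux"
}
```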
Control Patterns
Specify models and count:
```
Create a team with 4 teammates using Sonnet.
```
Require plan approval:
```
Spawn an architect teammate to refactor auth.
Require plan approval before any changes.
```
The lead reviews plans and either approves or rejects with feedback—exactly like a senior engineer reviewing a junior's approach.
Delegate mode (Shift+Tab twice): Restricts the lead to coordination-only. Pure orchestration, no implementation.
Talk directly to teammates: In split-pane mode, click any pane. In in-process mode, Shift+Up/Down to select, then type.
Quality Gates with Hooks
Karpathy emphasized "fast generation-verification loops where AI generates work and humans quickly audit it." Hooks enforce this:
```json
{
  "hooks": {
    "TeammateIdle": [{
      "hooks": [{
        "type": "prompt",
        "prompt": "Did the teammate complete all acceptance criteria? Exit code 2 to send back."
      }]
    }],
    "TaskCompleted": [{
      "hooks": [{
        "type": "prompt",
        "prompt": "Are tests passing? Documentation updated? Exit code 2 to prevent completion."
      }]
    }]
  }
}
```
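The prompt-style hooks above delegate the check to a model, but Claude Code hooks can also run shell commands, where an exit code of 2 likewise blocks the action and feeds stderr back to the agent. A minimal sketch of such a gate script (the `run_gate` helper and its verification command are hypothetical placeholders, not Claude Code APIs):

```shell
#!/bin/sh
# Sketch of a command-style quality gate. A hook command that exits
# with code 2 blocks the action; its stderr is returned to the agent
# as feedback. `run_gate` is a hypothetical helper for this post.
run_gate() {
  # $1: the verification command to run (e.g. your test runner).
  if ! sh -c "$1" >/dev/null 2>&1; then
    echo "Quality gate failed: sending task back to teammate" >&2
    return 2
  fi
  return 0
}

# Example: gate on any command; swap in your project's test suite.
run_gate "exit 0"
```

Wire a script like this into a `TaskCompleted` hook and failing checks bounce the task back to the teammate instead of letting it close.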
The autonomy slider in action—you control how much verification happens at each step.
When: Use Cases That Work
Karpathy said we're in the "decade of agents" rather than just the "year of agents." Agent Teams are most effective when parallel exploration adds real value.
Research and Review
A single reviewer gravitates toward one issue type at a time. Split the work:
```
Create an agent team to review PR #142:
- One focused on security implications
- One checking performance impact
- One validating test coverage
```
Each applies a different filter. The lead synthesizes across all three.
Competing Hypotheses (Scientific Debate)
When root cause is unclear, make teammates adversarial:
```
Users report the app exits after one message.
Spawn 5 teammates to investigate different hypotheses.
Have them debate and try to disprove each other's theories.
Update the findings doc with consensus.
```
The theory that survives multiple challenges is more likely correct.
New Modules or Features
Teammates each own a separate piece:
```
Create a team for the new notification system:
- Frontend components
- Backend services
- Database schema and migrations
```
Cross-Layer Coordination
Changes spanning frontend, backend, and tests—each owned by a different teammate with the lead coordinating handoffs.
When NOT to Use
Karpathy noted that fully autonomous agents still "get tripped up just a couple of steps into a job." Agent Teams add coordination overhead and use significantly more tokens. Avoid them for:
- Sequential tasks where each step depends on the previous
- Same-file edits where teammates would conflict
- High-dependency work requiring constant context sharing
- Simple tasks where coordination overhead exceeds benefit
For these, use subagents instead—they spawn within a single session and report back.
| | Subagents | Agent Teams |
|---|---|---|
| Context | Results return to caller | Fully independent |
| Communication | Report to main agent only | Message each other |
| Coordination | Main agent manages all | Self-coordination via task list |
| Best for | Focused tasks, only result matters | Complex work needing discussion |
| Token cost | Lower | Higher |
The Compound Effect
Karpathy's vision of Software 3.0 is that "the role of the engineer shifts from direct author to orchestrator." With Agent Teams + Skills + Hooks, this compounds:
Week 1: Single Claude session. Manual coordination. Frequent context switches.
Week 4: Agent Teams for complex work. Teammates inherit shared CLAUDE.md learnings. Task lists capture institutional knowledge.
Week 12: Your project runs like Karpathy's agentic engineering vision—specialized agents, coordinated workflows, quality gates enforced by hooks, learnings persisted across sessions.
The pieces are all here:
- Skills package domain knowledge that grows over time
- Hooks create verification loops
- Agent Teams enable parallel orchestration
- CLAUDE.md stores accumulated learnings every teammate inherits
The Bottom Line
Karpathy said agentic engineering is "a serious engineering discipline involving autonomous agents." It's professionally legible. You can say it to your VP of Engineering without embarrassment.
Agent Teams make this concrete:
Vibe coding = YOLO
Agentic engineering = AI does implementation, human owns architecture
The infrastructure is here. Multiple Claude instances. Shared task lists. Direct inter-agent messaging. Quality gates via hooks. Human-in-the-loop plan approval.
The decade of agents starts with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1.
Start with research and review. The rest follows.
Resources
- Agent Teams Documentation
- Subagents Documentation
- Hooks Reference
- Andrej Karpathy: Software Is Changing (Again) - YC AI Startup School
- Addy Osmani: Agentic Engineering
How are you using multi-agent orchestration in your workflows? Share your patterns in the comments.