Ashok Naik

Agent Teams: Agentic Engineering Comes to Claude Code

Andrej Karpathy's vision of orchestrating AI agents is now a terminal command away


A year ago, Andrej Karpathy coined "vibe coding"—the gleefully reckless way of prompting AI, accepting everything it spits out, and iterating by pasting error messages back in. It was fun. It was fast. It was chaos.

Last week, Karpathy declared that era over:

"Agentic engineering: AI does the implementation, human owns the architecture, quality, and correctness."

The shift is fundamental. You're no longer prompting a single AI. You're orchestrating a team of agents—coding assistants that execute, test, and refine code while you act as architect, reviewer, and decision-maker.

Karpathy described this at Y Combinator's AI Startup School:

"You're not just coding with AI, you're managing a team of AIs, each playing a specific role in your pipeline. It's like Zapier meets GitHub Copilot meets chaos."

The punchline? Claude Code just shipped this.

It's called Agent Teams, and it turns Karpathy's vision of agentic engineering into something you can run from your terminal today.


What: Your First Agent Swarm

Karpathy talked about "partial autonomy"—tools with an "autonomy slider" that let you dial in how much control AI has, from simple suggestions to fully automated generation. Agent Teams are exactly this, cranked to eleven.

One Claude session acts as the team lead. It spawns teammates—separate Claude instances, each in their own context window—that work independently on assigned tasks. They communicate directly with each other, share a task list, and synthesize findings.

| Karpathy's Vision | Claude Code's Implementation |
|---|---|
| Orchestrating multiple AI agents | Team lead coordinates teammates |
| Each agent plays a specific role | Teammates spawned with specialized prompts |
| Fast generation-verification loops | Hooks for quality gates at task completion |
| Human owns architecture + correctness | Lead reviews plans, approves before implementation |
| Autonomy slider | Delegate mode restricts lead to coordination-only |

This is Karpathy's "managing a team of AIs" made real. Not one agent doing everything. Multiple specialized agents, coordinated through a shared task list, communicating directly to converge on solutions.

Karpathy also described LLMs as "brilliant interns with perfect recall but no judgment." Agent Teams address this by:

  • Requiring plan approval before implementation
  • Enforcing quality gates via hooks
  • Keeping humans in the loop as decision-makers

The team lead is still Claude. But the human owns the architecture.


How: Building Your First Agent Team

Enable the Feature

// settings.json
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
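If you'd rather not persist the flag in settings.json, the same variable (shown again at the end of this post) can be exported for a single shell session, a minimal sketch:

```shell
# Enable Agent Teams for the current shell session only,
# rather than persisting the flag in settings.json
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1

# claude   # then launch Claude Code from this shell as usual
```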

Start a Team

Describe what you want in natural language:

Create an agent team to refactor our authentication module.
Spawn three teammates:
- One for frontend auth components
- One for backend API endpoints
- One for test coverage

Claude creates the team, spawns teammates, assigns work via a shared task list, and synthesizes results.

The Architecture

| Component | Role |
|---|---|
| Team Lead | Creates team, spawns teammates, coordinates work |
| Teammates | Separate Claude instances working assigned tasks |
| Task List | Shared work items teammates claim and complete |
| Mailbox | Messaging for direct inter-agent communication |

Display Modes

In-process mode (default): All teammates run inside your terminal. Use Shift+Up/Down to select and message teammates directly.

Split-pane mode: Each teammate gets its own pane. Requires tmux or iTerm2.

{
  "teammateMode": "tmux"
}
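Since split-pane mode depends on an external multiplexer, it's worth confirming one is installed before flipping the setting. A small sketch, nothing Claude Code specific:

```shell
# Split-pane mode requires tmux (or iTerm2); check for tmux before
# setting "teammateMode": "tmux" in settings.json
if command -v tmux >/dev/null 2>&1; then
  mode="tmux"          # multiplexer available: split-pane mode will work
else
  mode="in-process"    # no tmux: stick with the default in-process mode
fi
echo "teammate mode candidate: $mode"
```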

Control Patterns

Specify models and count:

Create a team with 4 teammates using Sonnet.

Require plan approval:

Spawn an architect teammate to refactor auth.
Require plan approval before any changes.

The lead reviews plans and either approves or rejects with feedback—exactly like a senior engineer reviewing a junior's approach.

Delegate mode (Shift+Tab twice): Restricts the lead to coordination-only. Pure orchestration, no implementation.

Talk directly to teammates: In split-pane mode, click any pane. In in-process mode, Shift+Up/Down to select, then type.

Quality Gates with Hooks

Karpathy emphasized "fast generation-verification loops where AI generates work and humans quickly audit it." Hooks enforce this:

{
  "hooks": {
    "TeammateIdle": [{
      "hooks": [{
        "type": "prompt",
        "prompt": "Did the teammate complete all acceptance criteria? Exit code 2 to send back."
      }]
    }],
    "TaskCompleted": [{
      "hooks": [{
        "type": "prompt",
        "prompt": "Are tests passing? Documentation updated? Exit code 2 to prevent completion."
      }]
    }]
  }
}

The autonomy slider in action—you control how much verification happens at each step.
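The prompt-based hooks above delegate the judgment to the model. If your setup also accepts command-style hooks (Claude Code's standard hook events do; treating that as true for these team events is an assumption), the same gate can be a plain script. This hypothetical helper just encodes the exit-code contract described above:

```shell
# gate: run any verification command and map its result onto the hook
# contract from the hooks above (exit 0 = pass, exit 2 = send work back)
gate() {
  if "$@"; then
    return 0                            # checks passed; allow completion
  else
    echo "quality gate failed: $*" >&2  # surface the failure to the lead
    return 2                            # code 2 blocks task completion
  fi
}

# Placeholder check; a real hook would run your test suite here instead
gate true
```

A real TaskCompleted hook script would end with something like `gate npm test; exit $?`, so a teammate can only mark the task done once the suite is green.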


When: Use Cases That Work

Karpathy said we're in the "decade of agents" rather than just the "year of agents." Agent Teams are most effective when parallel exploration adds real value.

Research and Review

A single reviewer gravitates toward one issue type at a time. Split the work:

Create an agent team to review PR #142:
- One focused on security implications
- One checking performance impact
- One validating test coverage

Each applies a different filter. The lead synthesizes across all three.

Competing Hypotheses (Scientific Debate)

When root cause is unclear, make teammates adversarial:

Users report the app exits after one message.
Spawn 5 teammates to investigate different hypotheses.
Have them debate and try to disprove each other's theories.
Update the findings doc with consensus.

The theory that survives multiple challenges is more likely correct.

New Modules or Features

Teammates each own a separate piece:

Create a team for the new notification system:
- Frontend components
- Backend services
- Database schema and migrations

Cross-Layer Coordination

Changes that span frontend, backend, and tests can each be owned by a different teammate, with the lead coordinating handoffs.


When NOT to Use

Karpathy noted that fully autonomous agents still "get tripped up just a couple of steps into a job." Agent Teams add coordination overhead and use significantly more tokens. Avoid them for:

  • Sequential tasks where each step depends on the previous
  • Same-file edits where teammates would conflict
  • High-dependency work requiring constant context sharing
  • Simple tasks where coordination overhead exceeds benefit

For these, use subagents instead—they spawn within a single session and report back.

| | Subagents | Agent Teams |
|---|---|---|
| Context | Results return to caller | Fully independent |
| Communication | Report to main agent only | Message each other |
| Coordination | Main agent manages all | Self-coordination via task list |
| Best for | Focused tasks, only result matters | Complex work needing discussion |
| Token cost | Lower | Higher |

The Compound Effect

Karpathy's vision of Software 3.0 is that "the role of the engineer shifts from direct author to orchestrator." With Agent Teams + Skills + Hooks, this compounds:

Week 1: Single Claude session. Manual coordination. Frequent context switches.

Week 4: Agent Teams for complex work. Teammates inherit shared CLAUDE.md learnings. Task lists capture institutional knowledge.

Week 12: Your project runs like Karpathy's agentic engineering vision—specialized agents, coordinated workflows, quality gates enforced by hooks, learnings persisted across sessions.

The pieces are all here:

  • Skills package domain knowledge that grows over time
  • Hooks create verification loops
  • Agent Teams enable parallel orchestration
  • CLAUDE.md stores accumulated learnings every teammate inherits

The Bottom Line

Karpathy said agentic engineering is "a serious engineering discipline involving autonomous agents." It's professionally legible. You can say it to your VP of Engineering without embarrassment.

Agent Teams make this concrete:

Vibe coding = YOLO
Agentic engineering = AI does implementation, human owns architecture

The infrastructure is here. Multiple Claude instances. Shared task lists. Direct inter-agent messaging. Quality gates via hooks. Human-in-the-loop plan approval.

The decade of agents starts with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1.

Start with research and review. The rest follows.



How are you using multi-agent orchestration in your workflows? Share your patterns in the comments.