Ashok Naik

Agent Teams: Agentic Engineering Comes to Claude Code

Andrej Karpathy's vision of orchestrating AI agents is now a terminal command away


A year ago, Andrej Karpathy coined "vibe coding"—the gleefully reckless way of prompting AI, accepting everything it spits out, and iterating by pasting error messages back in. It was fun. It was fast. It was chaos.

Last week, Karpathy declared that era over:

"Agentic engineering: AI does the implementation, human owns the architecture, quality, and correctness."

The shift is fundamental. You're no longer prompting a single AI. You're orchestrating a team of agents—coding assistants that execute, test, and refine code while you act as architect, reviewer, and decision-maker.

Karpathy described this at Y Combinator's AI Startup School:

"You're not just coding with AI, you're managing a team of AIs, each playing a specific role in your pipeline. It's like Zapier meets GitHub Copilot meets chaos."

The punchline? Claude Code just shipped this.

It's called Agent Teams, and it turns Karpathy's vision of agentic engineering into something you can run from your terminal today.


What: Your First Agent Swarm

Karpathy talked about "partial autonomy"—tools with an "autonomy slider" that let you dial in how much control AI has, from simple suggestions to fully automated generation. Agent Teams are exactly this, cranked to eleven.

One Claude session acts as the team lead. It spawns teammates—separate Claude instances, each in their own context window—that work independently on assigned tasks. They communicate directly with each other, share a task list, and synthesize findings.

| Karpathy's Vision | Claude Code's Implementation |
|---|---|
| Orchestrating multiple AI agents | Team lead coordinates teammates |
| Each agent plays a specific role | Teammates spawned with specialized prompts |
| Fast generation-verification loops | Hooks for quality gates at task completion |
| Human owns architecture + correctness | Lead reviews plans, approves before implementation |
| Autonomy slider | Delegate mode restricts lead to coordination-only |

This is Karpathy's "managing a team of AIs" made real. Not one agent doing everything. Multiple specialized agents, coordinated through a shared task list, communicating directly to converge on solutions.

Karpathy also described LLMs as "brilliant interns with perfect recall but no judgment." Agent Teams address this by:

  • Requiring plan approval before implementation
  • Enforcing quality gates via hooks
  • Keeping humans in the loop as decision-makers

The team lead is still Claude. But the human owns the architecture.


How: Building Your First Agent Team

Enable the Feature

// settings.json
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
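If you'd rather not persist the flag in settings.json, the same variable (shown again at the end of this post) can be exported for a single shell session, a minimal sketch:

```shell
# Enable Agent Teams for the current shell session only,
# rather than persisting the flag in settings.json
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1

# claude   # then launch Claude Code from this shell as usual
```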

Start a Team

Describe what you want in natural language:

Create an agent team to refactor our authentication module.
Spawn three teammates:
- One for frontend auth components
- One for backend API endpoints
- One for test coverage

Claude creates the team, spawns teammates, assigns work via a shared task list, and synthesizes results.

The Architecture

| Component | Role |
|---|---|
| Team Lead | Creates team, spawns teammates, coordinates work |
| Teammates | Separate Claude instances working assigned tasks |
| Task List | Shared work items teammates claim and complete |
| Mailbox | Messaging for direct inter-agent communication |

Display Modes

In-process mode (default): All teammates run inside your terminal. Use Shift+Up/Down to select and message teammates directly.

Split-pane mode: Each teammate gets its own pane. Requires tmux or iTerm2.

{
  "teammateMode": "tmux"
}
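Since split-pane mode depends on an external multiplexer, it's worth confirming one is installed before flipping the setting. A small sketch, nothing Claude Code specific:

```shell
# Split-pane mode requires tmux (or iTerm2); check for tmux before
# setting "teammateMode": "tmux" in settings.json
if command -v tmux >/dev/null 2>&1; then
  mode="tmux"          # multiplexer available: split-pane mode will work
else
  mode="in-process"    # no tmux: stick with the default in-process mode
fi
echo "teammate mode candidate: $mode"
```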

Control Patterns

Specify models and count:

Create a team with 4 teammates using Sonnet.

Require plan approval:

Spawn an architect teammate to refactor auth.
Require plan approval before any changes.

The lead reviews plans and either approves or rejects with feedback—exactly like a senior engineer reviewing a junior's approach.

Delegate mode (Shift+Tab twice): Restricts the lead to coordination-only. Pure orchestration, no implementation.

Talk directly to teammates: In split-pane mode, click any pane. In in-process mode, Shift+Up/Down to select, then type.

Quality Gates with Hooks

Karpathy emphasized "fast generation-verification loops where AI generates work and humans quickly audit it." Hooks enforce this:

{
  "hooks": {
    "TeammateIdle": [{
      "hooks": [{
        "type": "prompt",
        "prompt": "Did the teammate complete all acceptance criteria? Exit code 2 to send back."
      }]
    }],
    "TaskCompleted": [{
      "hooks": [{
        "type": "prompt",
        "prompt": "Are tests passing? Documentation updated? Exit code 2 to prevent completion."
      }]
    }]
  }
}

The autonomy slider in action—you control how much verification happens at each step.
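The prompt-based hooks above delegate the judgment to the model. If your setup also accepts command-style hooks (Claude Code's standard hook events do; treating that as true for these team events is an assumption), the same gate can be a plain script. This hypothetical helper just encodes the exit-code contract described above:

```shell
# gate: run any verification command and map its result onto the hook
# contract from the hooks above (exit 0 = pass, exit 2 = send work back)
gate() {
  if "$@"; then
    return 0                            # checks passed; allow completion
  else
    echo "quality gate failed: $*" >&2  # surface the failure to the lead
    return 2                            # code 2 blocks task completion
  fi
}

# Placeholder check; a real hook would run your test suite here instead
gate true
```

A real TaskCompleted hook script would end with something like `gate npm test; exit $?`, so a teammate can only mark the task done once the suite is green.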


When: Use Cases That Work

Karpathy said we're in the "decade of agents" rather than just the "year of agents." Agent Teams are most effective when parallel exploration adds real value.

Research and Review

A single reviewer gravitates toward one issue type at a time. Split the work:

Create an agent team to review PR #142:
- One focused on security implications
- One checking performance impact
- One validating test coverage

Each applies a different filter. The lead synthesizes across all three.

Competing Hypotheses (Scientific Debate)

When root cause is unclear, make teammates adversarial:

Users report the app exits after one message.
Spawn 5 teammates to investigate different hypotheses.
Have them debate and try to disprove each other's theories.
Update the findings doc with consensus.

The theory that survives multiple challenges is more likely correct.

New Modules or Features

Teammates each own a separate piece:

Create a team for the new notification system:
- Frontend components
- Backend services
- Database schema and migrations

Cross-Layer Coordination

Changes that span frontend, backend, and tests can each be owned by a different teammate, with the lead coordinating handoffs.


When NOT to Use

Karpathy noted that fully autonomous agents still "get tripped up just a couple of steps into a job." Agent Teams add coordination overhead and use significantly more tokens. Avoid them for:

  • Sequential tasks where each step depends on the previous
  • Same-file edits where teammates would conflict
  • High-dependency work requiring constant context sharing
  • Simple tasks where coordination overhead exceeds benefit

For these, use subagents instead—they spawn within a single session and report back.

| | Subagents | Agent Teams |
|---|---|---|
| Context | Results return to caller | Fully independent |
| Communication | Report to main agent only | Message each other |
| Coordination | Main agent manages all | Self-coordination via task list |
| Best for | Focused tasks, only result matters | Complex work needing discussion |
| Token cost | Lower | Higher |

The Compound Effect

Karpathy's vision of Software 3.0 is that "the role of the engineer shifts from direct author to orchestrator." With Agent Teams + Skills + Hooks, this compounds:

Week 1: Single Claude session. Manual coordination. Frequent context switches.

Week 4: Agent Teams for complex work. Teammates inherit shared CLAUDE.md learnings. Task lists capture institutional knowledge.

Week 12: Your project runs like Karpathy's agentic engineering vision—specialized agents, coordinated workflows, quality gates enforced by hooks, learnings persisted across sessions.

The pieces are all here:

  • Skills package domain knowledge that grows over time
  • Hooks create verification loops
  • Agent Teams enable parallel orchestration
  • CLAUDE.md stores accumulated learnings every teammate inherits

The Bottom Line

Karpathy said agentic engineering is "a serious engineering discipline involving autonomous agents." It's professionally legible. You can say it to your VP of Engineering without embarrassment.

Agent Teams make this concrete:

Vibe coding = YOLO
Agentic engineering = AI does implementation, human owns architecture

The infrastructure is here. Multiple Claude instances. Shared task lists. Direct inter-agent messaging. Quality gates via hooks. Human-in-the-loop plan approval.

The decade of agents starts with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1.

Start with research and review. The rest follows.



How are you using multi-agent orchestration in your workflows? Share your patterns in the comments.