The Problem with Traditional Agent Architecture
Most AI agent systems follow the same pattern: a powerful model (like Opus) orchestrates everything — reading context, planning tasks, distributing work, and synthesizing results. The smaller models just execute subtasks.
This works, but it's expensive. The orchestrator consumes premium tokens even for trivial decisions.
Enter the Advisor Strategy
Released on April 9, 2026, Anthropic's Advisor Strategy (`advisor_20260301`) flips this pattern:
- Executor (Sonnet or Haiku): Drives the entire task end-to-end
- Advisor (Opus): Only intervenes when the executor encounters hard decisions
- Auto-escalation: The executor model automatically decides when to call the advisor
The advisor generates short plans — typically 400-700 tokens — instead of processing the full context. Think of it like a junior developer who works independently and only pings the senior architect when truly stuck.
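To make the escalation idea concrete, here is a conceptual sketch of the control flow, not the actual API internals: the executor handles routine steps itself and hands only low-confidence decisions to the advisor, up to a fixed budget. The confidence threshold and step structure are illustrative assumptions.

```python
# Conceptual sketch of auto-escalation (NOT the real API internals).
# The executor works through steps and escalates only hard decisions,
# capped by an advisor budget (mirroring max_uses).

def run_task(steps, advisor_budget=3):
    """Simulate an executor that escalates low-confidence steps to an advisor."""
    advisor_calls = 0
    log = []
    for step in steps:
        if step["confidence"] < 0.5 and advisor_calls < advisor_budget:
            advisor_calls += 1  # escalate: advisor returns a short plan
            log.append(f"advisor:{step['name']}")
        else:
            log.append(f"executor:{step['name']}")
    return log, advisor_calls

steps = [
    {"name": "parse input", "confidence": 0.9},
    {"name": "choose schema", "confidence": 0.3},  # hard decision
    {"name": "write output", "confidence": 0.8},
]
log, calls = run_task(steps)
# Only the one low-confidence step reaches the advisor.
```

The point of the sketch: escalation is per-decision, not per-task, which is why simple tasks incur zero advisor cost.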
Benchmark Results
| Benchmark | Combination | Performance | Cost |
|---|---|---|---|
| SWE-bench Multilingual | Sonnet + Advisor | +2.7pp vs Sonnet solo | -11.9% |
| BrowseComp | Haiku + Advisor | 41.2% (vs 19.7% solo) | — |
| Cost extreme | Haiku + Advisor | -29% vs Sonnet solo | -85% |
The Haiku + Advisor combination is the standout for batch workloads: you give up about 29% of Sonnet-solo quality but cut cost by 85%.
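The cost side of that trade-off is easy to sanity-check with back-of-the-envelope math. The per-million-token prices below are hypothetical placeholders, not official pricing; the point is only that cheap executor tokens plus a small number of expensive advisor tokens can land near the reported savings.

```python
# Blended-cost sketch using HYPOTHETICAL prices (not official pricing).
SONNET_PRICE = 3.00   # $/M tokens, assumed
HAIKU_PRICE = 0.25    # $/M tokens, assumed
OPUS_PRICE = 15.00    # $/M tokens, assumed

task_tokens = 1_000_000   # executor tokens for a batch job
advisor_tokens = 20_000   # a handful of short 400-700 token plans

sonnet_solo = task_tokens / 1e6 * SONNET_PRICE
haiku_advised = (task_tokens / 1e6 * HAIKU_PRICE
                 + advisor_tokens / 1e6 * OPUS_PRICE)

savings = 1 - haiku_advised / sonnet_solo
print(f"Sonnet solo: ${sonnet_solo:.2f}, Haiku+Advisor: ${haiku_advised:.2f}")
print(f"Savings: {savings:.0%}")
```

Under these assumed prices the savings come out around 82%, in the same ballpark as the 85% figure above; the exact number depends on real pricing and how often the advisor fires.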
Implementation
It's a single tool addition to the Messages API:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",  # executor
    max_tokens=16384,
    tools=[
        {
            "type": "advisor_20260301",
            "name": "advisor",
            "model": "claude-opus-4-6",  # advisor
            "max_uses": 3,  # limit calls
        },
        # your existing tools work alongside
    ],
    messages=[{"role": "user", "content": "Complex task..."}],
    betas=["advisor-tool-2026-03-01"],
)
```
Key points:
- No extra roundtrips: Handoff happens within a single API request
- Auto-decision: The executor decides when to call the advisor
- Cost tracking: Advisor tokens are reported separately in the usage block
- max_uses: Cap the number of advisor calls for predictable costs
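Since advisor tokens are reported separately, you can split per-request spend by role. The field names below (`advisor_input_tokens`, `advisor_output_tokens`) are assumptions for illustration, not confirmed API fields; check the beta documentation for the real usage schema.

```python
# Sketch of separating advisor cost from executor cost in the usage block.
# The "advisor_*" field names are ASSUMPTIONS, not confirmed API fields.

def split_usage(usage: dict) -> dict:
    executor_tokens = usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
    advisor_tokens = (usage.get("advisor_input_tokens", 0)
                      + usage.get("advisor_output_tokens", 0))
    return {"executor": executor_tokens, "advisor": advisor_tokens}

# Example with a mocked usage payload:
usage = {
    "input_tokens": 4000,
    "output_tokens": 1200,
    "advisor_input_tokens": 900,   # hypothetical field
    "advisor_output_tokens": 550,  # hypothetical field
}
print(split_usage(usage))  # → {'executor': 5200, 'advisor': 1450}
```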
Best Use Cases
Good fit:
- Coding agents (Sonnet writes code, Opus reviews architecture)
- Multi-step research pipelines
- Bulk document extraction (Haiku processes, Opus designs structure)
Not a fit:
- Single-turn Q&A (nothing to plan = no advisor needed)
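For the bulk-extraction case above, the pattern is a cheap Haiku executor per document with a tightly capped advisor. This sketch only builds the request payload (no API call); the tool type and Opus model id come from the example earlier, while the Haiku model id and prompt are assumptions.

```python
# Payload builder for batch extraction: Haiku executes, Opus advises once.
# The Haiku model id below is an ASSUMPTION; the tool config mirrors the
# example shown earlier in the post.

def extraction_request(document: str) -> dict:
    return {
        "model": "claude-haiku-4-5",      # assumed Haiku model id
        "max_tokens": 4096,
        "tools": [{
            "type": "advisor_20260301",
            "name": "advisor",
            "model": "claude-opus-4-6",   # advisor designs the structure
            "max_uses": 1,                # one plan per document keeps cost flat
        }],
        "messages": [{"role": "user",
                      "content": f"Extract key fields as JSON:\n{document}"}],
        "betas": ["advisor-tool-2026-03-01"],
    }

req = extraction_request("Invoice: ACME Corp, total $120.00")
```

Capping `max_uses` at 1 makes per-document cost predictable, which matters most in exactly these batch workloads.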
Real-World Feedback
- Bolt CEO: "Architecture decisions are night-and-day on complex tasks, zero overhead on simple ones"
- Eve Legal ML Engineer: "Frontier model quality at 5x lower cost" using Haiku + Opus
Getting Started (3 Steps)
1. Add the beta header: `anthropic-beta: advisor-tool-2026-03-01`
2. Add `advisor_20260301` to your tools array
3. Tune `max_uses` based on task complexity (start with 2-3)
The cost war in AI agents is officially on. What's your current approach to model routing and cost optimization?