The Problem with Traditional Agent Architecture
Most AI agent systems follow the same pattern: a powerful model (like Opus) orchestrates everything — reading context, planning tasks, distributing work, and synthesizing results. The smaller models just execute subtasks.
This works, but it's expensive. The orchestrator consumes premium tokens even for trivial decisions.
Enter the Advisor Strategy
Released on April 9, 2026, Anthropic's Advisor Strategy (`advisor_20260301`) flips this pattern:
- Executor (Sonnet or Haiku): Drives the entire task end-to-end
- Advisor (Opus): Only intervenes when the executor encounters hard decisions
- Auto-escalation: The executor model automatically decides when to call the advisor
The advisor generates short plans — typically 400-700 tokens — instead of processing the full context. Think of it like a junior developer who works independently and only pings the senior architect when truly stuck.
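To make the escalation idea concrete, here is a conceptual sketch of the control flow, not the actual API internals: the executor handles routine steps itself and hands only low-confidence decisions to the advisor, up to a fixed budget. The confidence threshold and step structure are illustrative assumptions.

```python
# Conceptual sketch of auto-escalation (NOT the real API internals).
# The executor works through steps and escalates only hard decisions,
# capped by an advisor budget (mirroring max_uses).

def run_task(steps, advisor_budget=3):
    """Simulate an executor that escalates low-confidence steps to an advisor."""
    advisor_calls = 0
    log = []
    for step in steps:
        if step["confidence"] < 0.5 and advisor_calls < advisor_budget:
            advisor_calls += 1  # escalate: advisor returns a short plan
            log.append(f"advisor:{step['name']}")
        else:
            log.append(f"executor:{step['name']}")
    return log, advisor_calls

steps = [
    {"name": "parse input", "confidence": 0.9},
    {"name": "choose schema", "confidence": 0.3},  # hard decision
    {"name": "write output", "confidence": 0.8},
]
log, calls = run_task(steps)
# Only the one low-confidence step reaches the advisor.
```

The point of the sketch: escalation is per-decision, not per-task, which is why simple tasks incur zero advisor cost.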
Benchmark Results
| Benchmark | Combination | Performance | Cost |
|---|---|---|---|
| SWE-bench Multilingual | Sonnet + Advisor | +2.7pp vs Sonnet solo | -11.9% |
| BrowseComp | Haiku + Advisor | 41.2% (vs 19.7% solo) | — |
| Cost extreme | Haiku + Advisor | -29% vs Sonnet solo | -85% |
The Haiku + Advisor combination is the standout for batch workloads: you give up about 29% of Sonnet-solo quality but cut cost by 85%.
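The cost side of that trade-off is easy to sanity-check with back-of-the-envelope math. The per-million-token prices below are hypothetical placeholders, not official pricing; the point is only that cheap executor tokens plus a small number of expensive advisor tokens can land near the reported savings.

```python
# Blended-cost sketch using HYPOTHETICAL prices (not official pricing).
SONNET_PRICE = 3.00   # $/M tokens, assumed
HAIKU_PRICE = 0.25    # $/M tokens, assumed
OPUS_PRICE = 15.00    # $/M tokens, assumed

task_tokens = 1_000_000   # executor tokens for a batch job
advisor_tokens = 20_000   # a handful of short 400-700 token plans

sonnet_solo = task_tokens / 1e6 * SONNET_PRICE
haiku_advised = (task_tokens / 1e6 * HAIKU_PRICE
                 + advisor_tokens / 1e6 * OPUS_PRICE)

savings = 1 - haiku_advised / sonnet_solo
print(f"Sonnet solo: ${sonnet_solo:.2f}, Haiku+Advisor: ${haiku_advised:.2f}")
print(f"Savings: {savings:.0%}")
```

Under these assumed prices the savings come out around 82%, in the same ballpark as the 85% figure above; the exact number depends on real pricing and how often the advisor fires.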
Implementation
It's a single tool addition to the Messages API:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",  # executor
    max_tokens=16384,
    tools=[
        {
            "type": "advisor_20260301",
            "name": "advisor",
            "model": "claude-opus-4-6",  # advisor
            "max_uses": 3,  # limit calls
        },
        # your existing tools work alongside
    ],
    messages=[{"role": "user", "content": "Complex task..."}],
    betas=["advisor-tool-2026-03-01"],
)
```
Key points:
- No extra roundtrips: Handoff happens within a single API request
- Auto-decision: The executor decides when to call the advisor
- Cost tracking: Advisor tokens are reported separately in the usage block
- max_uses: Cap the number of advisor calls for predictable costs
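Since advisor tokens are reported separately, you can split per-request spend by role. The field names below (`advisor_input_tokens`, `advisor_output_tokens`) are assumptions for illustration, not confirmed API fields; check the beta documentation for the real usage schema.

```python
# Sketch of separating advisor cost from executor cost in the usage block.
# The "advisor_*" field names are ASSUMPTIONS, not confirmed API fields.

def split_usage(usage: dict) -> dict:
    executor_tokens = usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
    advisor_tokens = (usage.get("advisor_input_tokens", 0)
                      + usage.get("advisor_output_tokens", 0))
    return {"executor": executor_tokens, "advisor": advisor_tokens}

# Example with a mocked usage payload:
usage = {
    "input_tokens": 4000,
    "output_tokens": 1200,
    "advisor_input_tokens": 900,   # hypothetical field
    "advisor_output_tokens": 550,  # hypothetical field
}
print(split_usage(usage))  # → {'executor': 5200, 'advisor': 1450}
```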
Best Use Cases
Good fit:
- Coding agents (Sonnet writes code, Opus reviews architecture)
- Multi-step research pipelines
- Bulk document extraction (Haiku processes, Opus designs structure)
Not a fit:
- Single-turn Q&A (nothing to plan = no advisor needed)
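For the bulk-extraction case above, the pattern is a cheap Haiku executor per document with a tightly capped advisor. This sketch only builds the request payload (no API call); the tool type and Opus model id come from the example earlier, while the Haiku model id and prompt are assumptions.

```python
# Payload builder for batch extraction: Haiku executes, Opus advises once.
# The Haiku model id below is an ASSUMPTION; the tool config mirrors the
# example shown earlier in the post.

def extraction_request(document: str) -> dict:
    return {
        "model": "claude-haiku-4-5",      # assumed Haiku model id
        "max_tokens": 4096,
        "tools": [{
            "type": "advisor_20260301",
            "name": "advisor",
            "model": "claude-opus-4-6",   # advisor designs the structure
            "max_uses": 1,                # one plan per document keeps cost flat
        }],
        "messages": [{"role": "user",
                      "content": f"Extract key fields as JSON:\n{document}"}],
        "betas": ["advisor-tool-2026-03-01"],
    }

req = extraction_request("Invoice: ACME Corp, total $120.00")
```

Capping `max_uses` at 1 makes per-document cost predictable, which matters most in exactly these batch workloads.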
Real-World Feedback
- Bolt CEO: "Architecture decisions are night-and-day on complex tasks, zero overhead on simple ones"
- Eve Legal ML Engineer: "Frontier model quality at 5x lower cost" using Haiku + Opus
Getting Started (3 Steps)
1. Add the beta header: `anthropic-beta: advisor-tool-2026-03-01`
2. Add `advisor_20260301` to your tools array
3. Tune `max_uses` based on task complexity (start with 2-3)
The cost war in AI agents is officially on. What's your current approach to model routing and cost optimization?