# Claude Code Advisor: Opus Steering Sonnet Inside a Single API Call
Anthropic shipped the Advisor Strategy on April 9, 2026. It's not a new model. It's a pattern: a fast executor (Haiku/Sonnet) does the work, and at decision points it calls a stronger advisor (Opus) for strategic guidance — all inside a single `/v1/messages` request, handled server-side.
The numbers are the headline.
## The Benchmark Story
**SWE-bench Multilingual** (real coding tasks):
- Sonnet 4.6 solo: 72.1%
- Sonnet 4.6 + Opus advisor: 74.8% (+2.7pp)
- Cost: 11.9% cheaper (yes, higher quality and cheaper)
**BrowseComp** (web research):
- Haiku 4.5 solo: 19.7%
- Haiku 4.5 + Opus advisor: 41.2% (2x+)
- Cost: 85% cheaper than Sonnet solo
That last one is the jaw-dropper. You get Haiku latency, Opus judgment, and a price point below Sonnet.
## How It Works
The Advisor runs server-side. From the client's perspective, it's just another entry in the `tools` array:
```python
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",  # executor
    max_tokens=4096,
    tools=[
        {
            "type": "advisor_20260301",
            "model": "claude-opus-4-7",  # must be >= executor
            "max_uses": 5,
            "caching": {"type": "ephemeral", "ttl": "5m"}
        }
    ],
    extra_headers={
        "anthropic-beta": "advisor-tool-2026-03-01"
    },
    messages=[{"role": "user", "content": "Refactor the payment module."}]
)

# Per-call token breakdown in usage.iterations[]
for it in response.usage.iterations:
    print(it.model, it.input_tokens, it.output_tokens)
```
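Because `usage.iterations[]` breaks tokens out per model, you can reconstruct the blended cost of a request client-side. A minimal sketch, assuming hypothetical per-million-token prices (check Anthropic's pricing page for real numbers; the iteration dicts here mirror the fields printed above):

```python
# Hypothetical per-million-token prices in USD -- placeholders, not
# Anthropic's actual price sheet.
PRICES = {
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},
    "claude-opus-4-7": {"input": 15.00, "output": 75.00},
}

def blended_cost(iterations):
    """Sum USD cost across executor and advisor iterations."""
    total = 0.0
    for it in iterations:
        price = PRICES[it["model"]]
        total += it["input_tokens"] / 1e6 * price["input"]
        total += it["output_tokens"] / 1e6 * price["output"]
    return total

# Example: one executor pass plus one short advisor consultation.
cost = blended_cost([
    {"model": "claude-sonnet-4-6", "input_tokens": 8000, "output_tokens": 2000},
    {"model": "claude-opus-4-7", "input_tokens": 9000, "output_tokens": 300},
])
print(f"${cost:.4f}")
```

The point of the breakdown: advisor calls are short (a few hundred output tokens of guidance), so even at Opus prices they stay a small slice of the total.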
What happens internally:
- Executor emits a `server_tool_use` block (`name="advisor"`, `input={}`)
- Anthropic's server forwards the current context to the advisor
- Advisor (Opus 4.7) runs extended thinking and returns guidance
- An `advisor_tool_result` block lands in the stream
- Executor continues
The advisor does not call tools. The advisor does not respond to the user. It only whispers strategy to the executor.
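If you want to audit what the advisor actually said, you can pair each advisor call with its result when post-processing the response. A sketch based on the block names described above (the block shapes are my assumption, not a documented schema):

```python
def advisor_turns(content_blocks):
    """Pair each advisor server_tool_use block with an advisor_tool_result.

    Assumes blocks are dicts with a "type" key, matching the block names
    described in the flow above; zip relies on results arriving in the
    same order as the calls they answer.
    """
    calls = [b for b in content_blocks
             if b["type"] == "server_tool_use" and b.get("name") == "advisor"]
    results = [b for b in content_blocks
               if b["type"] == "advisor_tool_result"]
    return list(zip(calls, results))

# Illustrative block stream: one advisor consultation, then normal text.
blocks = [
    {"type": "server_tool_use", "name": "advisor", "input": {}},
    {"type": "advisor_tool_result", "content": "1. Read the tests first."},
    {"type": "text", "text": "Here is the refactoring plan."},
]
for call, result in advisor_turns(blocks):
    print(result["content"])
```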
## When to Call the Advisor
Anthropic's official prompting guide identifies four moments:
- Before substantive work — before writing, before committing to an interpretation
- When the task feels complete — but persist the deliverable first
- When stuck — same error recurring, approach not converging
- When considering a change of approach
What not to call it for: pure mechanical work, single-turn Q&A, workloads that need top-tier intelligence on every turn.
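One way to operationalize those four moments is to spell them out in the executor's system prompt. A sketch (the wording is mine, not Anthropic's guide verbatim):

```python
# Hypothetical system-prompt fragment encoding the four call moments
# and the exclusions listed above.
ADVISOR_POLICY = """\
Call the advisor tool only at these moments:
1. Before substantive work, before committing to an interpretation.
2. When the task feels complete (persist the deliverable first).
3. When stuck: the same error recurs or the approach is not converging.
4. When considering a change of approach.
Do not call it for pure mechanical work or single-turn Q&A."""

print(ADVISOR_POLICY)
```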
## The Trimming Trick
Add this line to your system prompt:
"The advisor should respond in under 100 words and use enumerated steps, not explanations."
Per Anthropic's own measurements, this yields a 35-45% token reduction on advisor calls with no quality degradation. Pair it with `max_uses` to cap per-request spend.
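Putting the trimming line and the spend cap together, a small request builder keeps both in one place. A sketch against the beta surface shown earlier (field names copied from that example; the defaults are my choices):

```python
TRIM_LINE = ("The advisor should respond in under 100 words "
             "and use enumerated steps, not explanations.")

def advisor_kwargs(user_prompt, max_uses=3):
    """Build messages.create kwargs with the trimming trick and a use cap."""
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 4096,
        "system": TRIM_LINE,
        "tools": [{
            "type": "advisor_20260301",
            "model": "claude-opus-4-7",
            "max_uses": max_uses,  # hard cap on advisor consultations
        }],
        "extra_headers": {"anthropic-beta": "advisor-tool-2026-03-01"},
        "messages": [{"role": "user", "content": user_prompt}],
    }

kwargs = advisor_kwargs("Refactor the payment module.")
print(kwargs["tools"][0]["max_uses"])
```

Then `client.messages.create(**kwargs)` issues the request.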
## Claude Code Users: One Command
If you're using Claude Code CLI, skip the SDK entirely:
```
/advisor
```
That's it. The toggle flips, and every subsequent agent loop uses the Executor-Advisor pattern.
## Error Codes to Know
| Code | Cause | Fix |
|---|---|---|
| `max_uses_exceeded` | Per-request cap hit | Raise `max_uses` or tighten the executor prompt |
| `overloaded` | Advisor capacity | Retry with backoff; fall back to Opus 4.6 |
| `context_mismatch` | You stripped `advisor_tool_result` blocks | Keep the full assistant response in multi-turn history |
| `invalid_advisor_model` | Advisor weaker than executor | Always use Opus 4.7 as advisor |
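The `overloaded` row implies a retry-then-fallback loop. A minimal sketch, where `send` is your own function that issues the request with a given advisor model; the `RuntimeError("overloaded")` signaling is a stand-in for the SDK's real error class:

```python
import time

def call_with_fallback(send, backoffs=(1, 2, 4)):
    """Retry on advisor 'overloaded' errors with backoff, then fall back
    to the previous-generation advisor model, per the table above."""
    for delay in backoffs:
        try:
            return send("claude-opus-4-7")
        except RuntimeError as err:
            if "overloaded" not in str(err):
                raise  # some other failure: don't mask it
            time.sleep(delay)
    return send("claude-opus-4-6")  # fallback advisor
```

In production, catch the SDK's actual overloaded exception instead of `RuntimeError`, and pass a `send` that calls `client.messages.create` with the given advisor model in the tool config.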
## Customer Evidence
- Bolt (Eric Simmons, CEO): "Architectural decisions on complex tasks visibly improved. The plans and trajectories are completely different."
- Eve Legal (Anuraj Pandey): "Haiku 4.5 with Opus 4.6 advisory hit frontier quality at 5x less cost on structured document extraction."
## The Takeaway for Indie Developers
The model-selection dilemma — "Opus quality but at Haiku prices" — just got structurally solved. If you're running long-horizon coding agents, multi-step research pipelines, or Computer Use flows, this pattern changes your cost curve.
Three lines of code to try it. One CLI command if you're in Claude Code.
Reference: Anthropic Blog — The Advisor Strategy