- Anthropic's Advisor Strategy lets Sonnet or Haiku consult Opus mid-task, keeping near-Opus quality at executor-level pricing
- Sonnet plus Opus advisor improved 2.7 points on SWE-bench Multilingual while cutting per-task cost by 11.9%
- Haiku plus Opus advisor more than doubled BrowseComp scores from 19.7% to 41.2% while costing 85% less than Sonnet
- One API call handles everything with zero orchestration; the executor invokes the advisor like any other tool
- Best for agentic workflows where most turns are routine but a few critical decisions need frontier-level reasoning
Running Claude Opus on every API call is like hiring a senior architect to hang drywall. You get perfect walls, but you also get a massive invoice. Anthropic just shipped a fix for that problem, and it changes the economics of building with Claude.
The Advisor Strategy, now in public beta, lets a cheaper executor model (Sonnet or Haiku) call on Opus for guidance only when it actually needs help. Everything else runs at the lower rate. The result: near-Opus intelligence at a fraction of the cost.
How the Advisor Strategy Works
The concept is straightforward. You designate Sonnet or Haiku as your executor, the model that handles the full task end to end. You add Opus as the advisor. The executor runs the conversation, calls tools, processes results, and generates output. When it hits a decision it cannot confidently make alone, it pauses and consults the advisor.
The advisor reads the full conversation transcript. Every message, every tool call, every result. It produces a short plan or correction, typically 400 to 700 tokens, then hands control back to the executor. The executor continues with the advice baked into its context.
Here is what makes this different from a simple "ask the smart model" pattern: the advisor never calls tools. It never produces user-facing output. It only provides strategic guidance. The executor stays in control the entire time, which means you do not need orchestration logic, task decomposition, or worker pools. One model does the work. The other model thinks about the work.
All of this happens inside a single API request. No extra round trips. No state management on your side. The executor emits a server_tool_use block, Anthropic runs a separate inference pass on Opus server-side, and the result flows back as an advisor_tool_result block. Your application sees one request, one response.
```python
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    betas=["advisor-tool-2026-03-01"],
    tools=[{
        "type": "advisor_20260301",
        "name": "advisor",
        "model": "claude-opus-4-6",
        "max_uses": 3,
    }],
    messages=[{"role": "user", "content": "Your task here"}],
)
```
Four fields, and that is the entire configuration. type and name are fixed strings, model picks your advisor, and max_uses caps how many times the executor can ask for help per request.
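On the way back, the advisor exchange surfaces as the typed content blocks described earlier: server_tool_use for the executor's request and advisor_tool_result for the guidance. Here is a minimal sketch of separating them; the block type names come from the API description above, but the exact dict shapes are an assumption.

```python
def split_blocks(content):
    """Group response content blocks by their role in the advisor round trip."""
    advisor_calls, advisor_results, text_parts = [], [], []
    for block in content:
        kind = block.get("type")
        if kind == "server_tool_use":
            advisor_calls.append(block)       # executor asking for advice
        elif kind == "advisor_tool_result":
            advisor_results.append(block)     # Opus guidance, produced server-side
        elif kind == "text":
            text_parts.append(block.get("text", ""))
    return advisor_calls, advisor_results, "".join(text_parts)
```

Useful mostly for logging and cost attribution; your application logic can ignore the advisor blocks entirely, since the guidance is already reflected in the executor's output.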
The supported pairings right now: Sonnet 4.6 as executor with Opus 4.6 as advisor, Haiku 4.5 with Opus 4.6, or even Opus 4.6 with Opus 4.6 (useful when you want a second opinion within the same request). The advisor must be at least as capable as the executor. Invalid pairs return a clear error.
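Since invalid pairings are rejected server-side, it is cheap to validate them before sending the request. A sketch of that check, using the pairs listed above; the helper is our own scaffolding, not part of the SDK.

```python
# Supported executor/advisor pairings, per the beta's documented constraints.
ALLOWED_PAIRS = {
    ("claude-sonnet-4-6", "claude-opus-4-6"),
    ("claude-haiku-4-5", "claude-opus-4-6"),
    ("claude-opus-4-6", "claude-opus-4-6"),   # second opinion within one request
}

def check_pair(executor: str, advisor: str) -> None:
    """Raise before the API does if the advisor is weaker than the executor."""
    if (executor, advisor) not in ALLOWED_PAIRS:
        raise ValueError(f"Unsupported advisor pairing: {executor} + {advisor}")
```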
The Numbers That Actually Matter
Benchmarks are noisy, but these tell a clear story.
Sonnet with Opus advisor scored 2.7 percentage points higher on SWE-bench Multilingual compared to Sonnet alone. That is a meaningful improvement on a benchmark designed to test real-world coding across multiple languages. The cost per agentic task dropped 11.9% at the same time. Better results, lower bill.
The Haiku numbers are more dramatic. BrowseComp, a benchmark for web research and browsing tasks, jumped from 19.7% (Haiku solo) to 41.2% (Haiku plus Opus advisor). That is more than double the performance. The cost? 85% less than running Sonnet solo for the same tasks.
Think about what that means for high-volume applications. If you process thousands of research queries per day, Haiku plus advisor gives you 2x the quality at roughly one-sixth the price of the model you were probably using. The advisor typically generates 1,400 to 1,800 total tokens (including thinking), so even at Opus rates, each consultation adds a predictable and small cost.
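The per-consultation math is easy to sanity-check yourself. The 1,400 to 1,800 token range is from the figures above; the per-million-token price below is a placeholder, not Anthropic's actual Opus rate.

```python
def consultation_cost(total_tokens: int, price_per_mtok: float) -> float:
    """Dollar cost of one advisor consultation at a given per-million-token rate."""
    return total_tokens / 1_000_000 * price_per_mtok

# Illustrative only: plug in your real Opus output rate.
low = consultation_cost(1_400, price_per_mtok=15.0)
high = consultation_cost(1_800, price_per_mtok=15.0)
```

Even at frontier-model rates, two or three consultations per task add cents, not dollars, which is why the pattern pencils out for high-volume workloads.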
Terminal-Bench 2.0 and BrowseComp2 showed similar patterns. The advisor consistently lifts the executor's performance on tasks that require multi-step reasoning while barely touching the bill for straightforward turns. The model gets smarter only when it asks to get smarter.
The billing is clean too. Executor tokens bill at executor rates. Advisor tokens bill at Opus rates. The API response includes a detailed usage.iterations array that breaks down exactly which tokens came from which model. No guessing what you are paying for.
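Attributing spend per model is then a matter of summing over that array. A sketch, assuming each iteration entry carries a model name and token counts; those field names are an assumption about the response shape, not documented API fields.

```python
from collections import defaultdict

def tokens_by_model(iterations):
    """Sum billed input/output tokens per model from usage.iterations."""
    totals = defaultdict(lambda: {"input": 0, "output": 0})
    for it in iterations:
        totals[it["model"]]["input"] += it.get("input_tokens", 0)
        totals[it["model"]]["output"] += it.get("output_tokens", 0)
    return dict(totals)
```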
When This Pattern Wins (and When It Doesn't)
The advisor pattern shines in long-horizon agentic workloads. Coding agents that read files, run tests, and iterate on solutions. Research pipelines that browse the web and synthesize information. Multi-step workflows where 90% of the turns are mechanical but the remaining 10% determine whether the whole thing succeeds or fails.
Anthropic's own testing found the biggest gains when the executor calls the advisor early (after initial exploration but before committing to an approach) and late (after writing code or producing output, before declaring the task done). Two to three advisor calls per task seems to be the sweet spot.
The advisor is a weaker fit for single-turn question answering. If every request needs Opus-level reasoning, just use Opus. There are no savings from adding a middleman when the middleman gets called every turn. The same goes for very short tasks where the next action is obvious from context. The advisor adds value when there is a real decision to make.
One subtlety worth knowing: the advisor does not stream. When the executor calls for advice, the response stream pauses while Opus runs its inference pass. For interactive chat applications where latency matters on every token, this pause could be noticeable. For background agentic tasks running autonomously, it is completely irrelevant. Your agent does not care about a two-second pause if it saves you 60% on compute.
There is also a caching layer that matters for longer conversations. Enable prompt caching on the advisor with a simple config flag, and the second and third advisor calls in a conversation reuse the transcript prefix from the first call. That slashes advisor input costs for multi-turn agent loops. The break-even is roughly three advisor calls per conversation. Below that, caching overhead exceeds savings. Above it, you save progressively more per call.
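The article describes the caching opt-in as a simple config flag on the advisor. The exact key below ("cache_control") is an assumption patterned on Anthropic's existing prompt-caching API, so treat it as a sketch rather than the documented syntax.

```python
# Advisor tool definition with caching enabled (flag name is an assumption).
advisor_tool = {
    "type": "advisor_20260301",
    "name": "advisor",
    "model": "claude-opus-4-6",
    "max_uses": 3,
    "cache_control": {"type": "ephemeral"},  # reuse the transcript prefix
}
```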
The max_uses parameter lets you hard-cap cost per request. Set it to 3, and the executor gets three consultations before the advisor starts returning errors. The executor handles those errors gracefully and continues without further advice. For conversation-level caps, you track calls client-side and remove the advisor tool from the tools array when you hit your budget.
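The conversation-level cap described above can be sketched in a few lines: count advisor invocations across prior turns and withhold the tool once the budget is spent. The helper names and budget value are ours, not SDK features.

```python
ADVISOR_BUDGET = 5  # max advisor calls per conversation (illustrative)

def count_advisor_calls(messages):
    """Count server_tool_use blocks across all assistant messages so far."""
    return sum(
        1
        for m in messages
        if m["role"] == "assistant"
        for b in m["content"]
        if isinstance(b, dict) and b.get("type") == "server_tool_use"
    )

def tools_for_turn(base_tools, advisor_tool, calls_so_far):
    """Include the advisor only while the conversation budget allows."""
    if calls_so_far < ADVISOR_BUDGET:
        return base_tools + [advisor_tool]
    return base_tools  # budget spent: executor continues unaided
```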
What This Changes for AI Application Architecture
Before the advisor pattern, you had three choices for building AI-powered tools. Run a cheap model and accept lower quality. Run an expensive model and eat the cost. Or build a complex routing layer that sends different requests to different models based on heuristics you have to maintain.
The advisor pattern is a fourth option: let the cheap model decide when it needs help. This is fundamentally different from model routing because the decision happens mid-generation, not before it. The executor has context that no external router could have. It knows what tools it already called, what results it got, and where it feels uncertain. That makes the advisor-calling decision far more accurate than any prompt-classification approach.
For teams building agents, this means you can start with Sonnet plus advisor as your default and only switch to full Opus for tasks where benchmarks show the advisor pattern is not enough. That is a much easier engineering decision than trying to predict task difficulty up front.
Building With It Today
The advisor tool is in public beta with the advisor-tool-2026-03-01 beta header. It works with every SDK (Python, TypeScript, Go, Ruby, PHP, C#) and composes with existing tools. You can run web search, code execution, and custom tools alongside the advisor in the same request.
For coding agents, Anthropic recommends a specific system prompt pattern: tell the executor to call the advisor before substantive work, after producing output, and whenever it feels stuck. Their internal testing showed this pattern produces the highest intelligence-to-cost ratio.
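One way to phrase that pattern as a system prompt: consult early, consult late, consult when stuck. The exact wording below is ours, not Anthropic's published prompt.

```python
# Illustrative system prompt for a coding agent with an advisor tool attached.
SYSTEM_PROMPT = (
    "You have an advisor tool backed by a stronger model. "
    "Call it after your initial exploration but before committing to an "
    "approach, again after producing output but before declaring the task "
    "done, and any time you are stuck."
)
```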
The practical advice for prompting the advisor itself: keep instructions short. A single line telling the advisor to respond in under 100 words with enumerated steps cut advisor output tokens by 35 to 45% in Anthropic's testing without changing call frequency or advice quality. Fewer tokens from the advisor means lower cost with no quality loss.
Pair the advisor with medium effort settings on the executor for maximum savings, or default effort for maximum intelligence. Both configurations outperform the executor running alone.
For teams migrating from full Opus deployments, the transition path is clean. Swap the model parameter from Opus to Sonnet, add the advisor tool definition, and run your eval suite. In most cases you will see equivalent or better results at lower cost. For the minority of tasks where the advisor pattern underperforms, keep those on direct Opus and route everything else through the advisor pattern.
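The whole migration fits in a two-line diff of the request parameters. A before/after sketch with illustrative values:

```python
# Before: everything runs on Opus.
before = {"model": "claude-opus-4-6", "max_tokens": 4096}

# After: Sonnet executes, Opus advises on demand.
after = {
    "model": "claude-sonnet-4-6",             # 1. swap the executor to Sonnet
    "max_tokens": 4096,
    "betas": ["advisor-tool-2026-03-01"],     # 2. opt into the beta
    "tools": [{                               # 3. add the advisor definition
        "type": "advisor_20260301",
        "name": "advisor",
        "model": "claude-opus-4-6",
        "max_uses": 3,
    }],
}
```

Then run your eval suite against both configurations and keep the direct-Opus path only for the tasks where the advisor pattern measurably falls short.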
If you are building agentic applications and your Opus bills make you wince, this is the pattern you have been waiting for. One API change, no architecture overhaul, and your costs drop while your quality holds or improves. That is the kind of upgrade that makes the accountants and the engineers happy at the same time.