speedy_devv

Claude Opus 4.7: What Actually Changed for Agentic Coding

Anthropic shipped Opus 4.7 on April 16, 2026. Same $5/$25 pricing as 4.6. Same 1M context. Same 128K output ceiling. Different model entirely when you put it in an agentic coding loop.

If you run Claude Code or build agent pipelines on the Claude API, this post is the migration brief I wish someone had handed me on release day. Benchmarks, the five behavioral shifts that actually matter, the new API features, and the breaking changes you will hit if you just flip the model ID.

The Benchmark Table First

| Benchmark              | Opus 4.6 | Opus 4.7              |
|------------------------|----------|-----------------------|
| CursorBench            | 58%      | 70%                   |
| Rakuten-SWE-Bench      | Baseline | 3x resolution rate    |
| XBOW visual-acuity     | 54.5%    | 98.5%                 |
| OfficeQA Pro errors    | Baseline | 21% fewer             |
| BigLaw Bench (Harvey)  | Lower    | 90.9% at high effort  |
| Notion Agent errors    | Baseline | 1/3 the errors        |
| Factory Droids         | Baseline | +10-15% task success  |
| Bolt long-running apps | Baseline | +10% best case        |
| CodeRabbit recall      | Baseline | +10%                  |

Rakuten's 3x is the one I keep coming back to. That is real production SWE tasks, not a synthetic eval. CursorBench moving from 58 to 70 is what you actually feel in a Claude Code session: fewer rounds per feature.

Five Behavioral Shifts

1. Self-Verification Is New

Prior Claude models did not verify their own assumptions before acting unless you prompted them to. 4.7 does.

Vercel reports the model runs proofs on systems code before starting work. Hex reports it flags missing data instead of inventing plausible-but-wrong fallbacks, and resists dissonant-data traps that 4.6 falls for.

In a Claude Code session this looks like:

Before I write this migration, let me verify the actual shape
of the response object, because my assumption here might be wrong.

[reads file]
[runs grep]

Confirmed. Writing migration now.

You did not ask for that step. The model added it.

2. Long Runs Stay Coherent

Devin reports 4.7 works coherently for hours on hard problems instead of giving up. Genspark measured loop rates: prior models looped indefinitely on roughly 1 in 18 queries, while 4.7 posts the highest quality-per-tool-call ratio they have ever measured.

Notion saw tool errors cut to a third of 4.6's rate on multi-step workflows. That is not a small optimization. That is a different failure mode.

3. Fewer Tool Calls, More Thinking

Default behavior shifted. 4.7 thinks more and acts less. Ramp reports less need for step-by-step guidance on cross-tool, cross-codebase debugging.

If you want the old tool-heavy behavior, raise effort:

claude --model claude-opus-4-7 --effort high
# or
claude --model claude-opus-4-7 --effort xhigh

4. Instructions Get Read Literally

Prompts that relied on 4.6 quietly filling in the gaps will hit unexpected behavior on 4.7. The fix is usually shorter and more explicit, not longer.

5. Fewer Subagents by Default

If your orchestrator relied on 4.6 fanning out aggressively to specialists, 4.7 will pull that back. You can still request parallel subagent work explicitly.

The xhigh Effort Tier

Adaptive thinking gained a fifth setting. Ordering is now:

low < medium < high < xhigh < max

xhigh sits between high and max. Claude Code raised the default to xhigh on all plans.

For API work, Anthropic recommends starting at xhigh and dialing down:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=8192,
    thinking={"type": "adaptive", "effort": "xhigh"},
    messages=[{"role": "user", "content": "Your agentic task here"}]
)

Hex's quote is the cost-math line worth memorizing: "low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6." Same quality, lower tier, fewer tokens.

Task Budgets (Public Beta)

Task budgets are the feature I keep recommending first: an advisory cap on a full agentic loop covering thinking, tool calls, tool results, and the final output. The model sees a running countdown and self-paces against it.

Minimum value is 20k tokens. Distinct from max_tokens, which is a hard ceiling the model never sees.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=8192,
    extra_headers={
        "anthropic-beta": "task-budgets-2026-03-13"
    },
    task_budget=200000,
    thinking={"type": "adaptive", "effort": "xhigh"},
    messages=[{"role": "user", "content": "Build this feature end to end"}]
)

The model can now plan: "I have 200k tokens to spend. I will use 30k on exploration, 90k on writing, 40k on review, and leave a buffer." A hard max_tokens just cuts off mid-thought.

Vision: 3x Resolution

First Claude model with high-resolution image support. Max image resolution went from 1568px / 1.15MP on prior models to 2576px / 3.75MP on 4.7. Roughly 3x the pixel budget.

What changes in practice:

  • Dense screenshots where small text has to be readable
  • Complex diagrams with nested labels
  • High-fidelity design mockups
  • Pointing, measuring, counting, bounding-box localization
  • Coordinates returned are 1-to-1 with real image pixels (no scale conversion)

Cost note: a full-resolution image can consume up to roughly 4,784 tokens, versus roughly 1,600 on prior models. Downsample when you do not need the fidelity.
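The downsampling decision is easy to automate. Here is a minimal sketch: the ~pixels/750 token heuristic is my own extrapolation from the figures above (1.15MP landing near ~1,600 tokens, 3.75MP near ~4,784), not an official formula, and the 1.15MP default budget simply mirrors the prior-generation cap.

```python
import math

def estimate_image_tokens(width: int, height: int) -> int:
    """Rough per-image token cost. The ~pixels/750 heuristic is an
    assumption extrapolated from the numbers quoted above."""
    return math.ceil(width * height / 750)

def downsample_dims(width: int, height: int,
                    max_pixels: int = 1_150_000) -> tuple[int, int]:
    """Scale dimensions down to a pixel budget (default: the prior
    1.15MP cap), preserving aspect ratio. No-op if the image fits."""
    pixels = width * height
    if pixels <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / pixels)
    return max(1, int(width * scale)), max(1, int(height * scale))
```

Feed the resulting dimensions to whatever image library you already use (e.g. Pillow's `Image.resize`) before base64-encoding the upload.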

Breaking Changes in the API

This is the section you have to read before flipping the model ID.

Extended Thinking Is Removed

# 4.6 and earlier (still valid on 4.6)
thinking={"type": "enabled", "budget_tokens": 8192}

# 4.7 (old shape returns 400)
thinking={"type": "adaptive", "effort": "xhigh"}

Adaptive thinking is now off by default on 4.7. A request with no thinking field runs with no thinking at all.

Sampling Parameters Are Removed

Setting temperature, top_p, or top_k to non-default values returns a 400:

# Returns 400 on 4.7
response = client.messages.create(
    model="claude-opus-4-7",
    temperature=0.7,  # <-- remove this
    top_p=0.9,        # <-- remove this
    messages=[...]
)

Drop them. Use prompting to shape behavior.
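If one code path serves both 4.6 and 4.7, a defensive scrubber at the call site is simpler than forking the request builder. The parameter list below is just the three named above.

```python
# Sampling parameters 4.7 rejects with a 400 when set to non-defaults.
REMOVED_SAMPLING_PARAMS = ("temperature", "top_p", "top_k")

def scrub_request(kwargs: dict) -> dict:
    """Return a copy of the request kwargs with the removed sampling
    parameters stripped, so shared call sites work on 4.6 and 4.7."""
    return {k: v for k, v in kwargs.items()
            if k not in REMOVED_SAMPLING_PARAMS}
```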

Thinking Content Is Omitted by Default

Thinking blocks still appear in the response stream but their thinking field is empty unless you opt in:

thinking={
    "type": "adaptive",
    "effort": "xhigh",
    "display": "summarized"
}

If your UI streamed thinking to users on 4.6, the new default reads as a long pause before output begins. Add display: "summarized" back.

Prefill Is Blocked

Prefilling assistant messages returns a 400. Use structured outputs or output_config.format instead. This one carries over from 4.6.

New Tokenizer

The same input can now map to anywhere from 1.0x to 1.35x the prior token count, depending on content. Re-run /v1/messages/count_tokens against your representative payloads and re-baseline:

  • max_tokens ceilings
  • Compaction triggers
  • Client-side token estimates
  • Cost projections
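A sketch of the re-baselining loop, assuming the Python SDK's `client.messages.count_tokens` wrapper for that endpoint; the scaling helper is pure stdlib so you can run it against logged counts without credentials.

```python
import math

def scaled_limits(limits: dict[str, int], ratio: float) -> dict[str, int]:
    """Scale existing token ceilings by a measured old->new ratio,
    rounding up so the new limits never undershoot."""
    return {name: math.ceil(value * ratio) for name, value in limits.items()}

def measured_ratio(model: str, messages: list, old_count: int) -> float:
    """Re-count a representative payload on the new model and compare
    against the count baselined on 4.6. Requires ANTHROPIC_API_KEY."""
    import anthropic  # local import keeps scaled_limits dependency-free
    client = anthropic.Anthropic()
    new = client.messages.count_tokens(model=model, messages=messages)
    return new.input_tokens / old_count
```

Run `measured_ratio` over a handful of your real payloads, take the worst case, and feed it to `scaled_limits` for your max_tokens ceilings, compaction triggers, and cost projections.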

Automated Migration

Inside Claude Code, most of the above is handled for you. On the API side:

claude /claude-api migrate

This applies the changes across your codebase.

New Product Features

/ultrareview runs a review session reading through changes and flagging bugs a careful human reviewer would catch. Pro and Max users get three free ultrareviews. I have been using it as a pre-commit gate.

Auto mode for Max. Previously gated to Team and Enterprise. Now extends to Max subscribers. Longer runs with fewer interruptions, and with less risk than fully skipping permissions.

File-system memory improvements. Agents keeping a scratchpad, notes file, or structured memory store across turns make cleaner notes and actually leverage them on later tasks.
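The scratchpad pattern itself is nothing exotic. Here is a minimal sketch of the kind of append-only notes file an agent tool can expose; the one-bullet-per-line format is an illustrative choice of mine, not anything 4.7 mandates.

```python
from pathlib import Path

class Scratchpad:
    """Minimal append-only notes file an agent can keep across turns."""

    def __init__(self, path: str = "agent_notes.md"):
        self.path = Path(path)

    def note(self, text: str) -> None:
        # Append one bullet per observation so later turns can grep it.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(f"- {text}\n")

    def recall(self) -> str:
        # Return everything noted so far, or empty if nothing yet.
        return self.path.read_text(encoding="utf-8") if self.path.exists() else ""
```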

Pricing Did Not Move

Input:        $5.00 / 1M tokens
Output:      $25.00 / 1M tokens
Prompt cache: up to 90% savings
Batch:        50% savings

Two things will nudge your bill up:

  1. New tokenizer (up to 1.35x input token count on some content)
  2. High-resolution images (up to ~4,784 tokens each)

Task budgets and the effort parameter are how you trade back.

On GitHub Copilot, 4.7 went GA the same day on Pro+, Business, and Enterprise, carrying a 7.5x premium request multiplier through April 30 as promotional pricing.

Switching Claude Code to 4.7

# Set as default
claude config set model claude-opus-4-7

# Or override per session
claude --model claude-opus-4-7

The opus alias in Claude Code points at it on current releases. Available on claude.ai, Claude Platform API, AWS Bedrock (research preview), Google Cloud Vertex AI, Microsoft Foundry, and GitHub Copilot.

When to Reach for It

Reach for 4.7 when reasoning depth, long agent runs, or high-resolution screenshots are what the job needs. Sonnet 4.6 is still the right call for smaller, faster work where speed and cost decide the tradeoff.

Full breakdown on the webapp: buildthisnow.com/blog/models/claude-opus-4-7

Build your own agentic SaaS with Claude Code in 48 hours: buildthisnow.com
