TL;DR
Anthropic dropped Claude Opus 4.7 yesterday (April 16, 2026). Pricing is identical to 4.6 ($5/$25 per M tokens), but SWE-bench Pro rose from 53.4% to 64.3%, passing GPT-5.4 (57.7%). The practical changes that matter: Claude Code now defaults to the new xhigh effort tier automatically, a /ultrareview slash command runs dedicated code-review sessions, and the new tokenizer counts the same input as 1.0–1.35x as many tokens. You will probably need to bump max_tokens to 64k+ and retest prompts you wrote for 4.6.
What Anthropic actually shipped
Model ID: claude-opus-4-7
Release: 2026-04-16 (GA)
Price: $5/M input, $25/M output (unchanged from 4.6)
Availability: Claude API · Amazon Bedrock · GCP Vertex AI · Microsoft Foundry
If you're used to thinking of Opus releases as marginal, 4.7 is not that kind of release.
Benchmark deltas that aren't just noise
| Benchmark | Opus 4.6 | Opus 4.7 | Delta |
|---|---|---|---|
| SWE-bench Verified | — | 87.6% | — |
| SWE-bench Pro | 53.4% | 64.3% | +10.9pp |
| CursorBench | 58% | 70% | +12pp |
| XBOW visual accuracy | 54.5% | 98.5% | +44pp |
| Databricks OfficeQA Pro | — | err -21% | — |
| Rakuten SWE-Bench | — | 3x tasks | — |
The one I'd focus on is XBOW. Vision accuracy going from 54.5% to 98.5% isn't benchmark farming — it's the difference between "computer-use agents are a demo" and "I can actually ship one."
The xhigh effort tier (and why Claude Code just got better for free)
Effort levels used to be low | medium | high | max. 4.7 adds xhigh between high and max.
```python
# Anthropic SDK
from anthropic import Anthropic

client = Anthropic()
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=64000,  # bump from 32k; xhigh produces longer reasoning
    thinking={"type": "enabled", "budget_tokens": 30000},
    effort="xhigh",  # new tier
    messages=[{"role": "user", "content": "..."}],
)
```
The invisible change: Claude Code now defaults every plan to xhigh automatically. You don't set anything; quality goes up with no change to your CLI config.
/ultrareview — a review session that actually pushes back
/ultrareview
This runs a dedicated review pass that flags bugs and design issues. Pro/Max users get 3 free sessions per month.
After testing, Vercel's team noted a "new behavior of starting from proofs for systems code." In practice, it's more willing to say "this invariant isn't being preserved here" than prior Claude versions, which tended toward agreement.
Migration: things that will bite you
1. Token usage goes up, sometimes a lot
The new tokenizer counts the same input as 1.0x–1.35x as many tokens, depending on content type. Code-heavy inputs trend toward the high end.
```python
# Before (4.6)
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=32000,
    messages=messages,
)

# After (4.7): needs headroom
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=64000,  # xhigh reasoning expands late-turn output
    messages=messages,
)
```
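For a quick sanity check on how much headroom you actually need, you can scale a 4.6 token count by the top of the quoted 1.0x–1.35x inflation range. A minimal sketch (the helper name and the integer-math approach are mine, not Anthropic's):

```python
def opus47_budget(tokens_46: int, inflation_pct: int = 135) -> int:
    """Worst-case 4.7 token count for an input that measured
    `tokens_46` tokens under the 4.6 tokenizer.

    Defaults to the 1.35x top of the quoted 1.0x-1.35x range.
    """
    if not 100 <= inflation_pct <= 135:
        raise ValueError("quoted inflation range is 1.00x-1.35x")
    # ceil(tokens * pct / 100) in integer math, to avoid float rounding
    return (tokens_46 * inflation_pct + 99) // 100
```

A 32k-token 4.6 prompt budgets to 43,200 tokens at the worst case, which is one reason the 64k max_tokens bump in the snippet above isn't paranoia.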
Anthropic claims efficiency improved at all effort levels (shorter output for the same quality). My anecdotal tests back this up for coding tasks, but it varies. Don't assume; measure.
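Measuring is cheap: run an identical prompt against both model IDs and diff the `usage` block each API response carries (`input_tokens` and `output_tokens` are real fields on the Anthropic SDK's response; the helper name is mine):

```python
def usage_delta(before: dict, after: dict) -> dict:
    """Percent change in token usage between two runs of the same
    prompt, e.g. response.usage counts on 4.6 vs 4.7."""
    return {
        key: round(100 * (after[key] - before[key]) / before[key], 1)
        for key in ("input_tokens", "output_tokens")
    }

# Hypothetical counts from a pair of runs:
usage_delta(
    {"input_tokens": 1000, "output_tokens": 2000},  # 4.6
    {"input_tokens": 1250, "output_tokens": 1800},  # 4.7
)
# → {'input_tokens': 25.0, 'output_tokens': -10.0}
```

If the output delta is negative while the input delta is positive, you're seeing the claimed efficiency win offsetting tokenizer inflation.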
2. Instruction-following is now strict
This is the hidden gotcha: 4.7 executes prompts literally. Implicit assumptions you leaned on with 4.6 ("obviously Claude will skip step X if condition Y") no longer hold.
```python
# A prompt that used to "just work"
"Refactor this function and add tests if needed."
# 4.7 behavior: always adds tests, even when not needed
# 4.6 behavior: would judge whether tests were warranted
```
Anthropic's own migration guide recommends re-tuning prompts and agentic harnesses. Plan the time.
3. Task Budget (public beta) — let the model see its own budget
```python
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=64000,
    task_budget={"remaining_tokens": 50000},  # beta
    messages=messages,
)
```
You tell Claude how many tokens are left. It prioritizes and gracefully winds down as the budget drops. Actually useful for long agentic loops that used to die mid-step.
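The announcement doesn't spell out who does the bookkeeping, so here's my assumption: in an agent loop, you subtract each turn's reported usage yourself and pass the shrinking number back in via `task_budget`. A minimal sketch (`remaining_budget` is a hypothetical helper of mine):

```python
def remaining_budget(start: int, turn_usage: list) -> int:
    """Tokens left after a sequence of agent turns, where each entry
    mirrors a response's usage counts. Floors at zero so the value
    handed to task_budget never goes negative."""
    spent = sum(u["input_tokens"] + u["output_tokens"] for u in turn_usage)
    return max(start - spent, 0)
```

Each loop iteration would then pass `task_budget={"remaining_tokens": remaining_budget(start, history)}` to the next `client.messages.create` call.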
Safety direction: intentionally narrower
Worth noting because it's unusual: 4.7 is the first Anthropic model where cyber capabilities were deliberately scaled back. A new Cyber Verification Program exists for legitimate security researchers (vuln research, pentesting, red teaming) to regain expanded access through a vetting process.
If you're in security and hit capability walls you didn't see in 4.6, that's why.
What partners are saying (filter for patterns, not hype)
The repeated keywords across partner quotes are what matter:
- "stays coherent for hours" (Cognition/Devin)
- "doesn't give up on hard problems" (Cognition)
- "passes implicit-requirement tests" (Notion Agent)
- "starts from proofs for systems code" (Vercel)
- "loop resistance, consistency, graceful error recovery" (Genspark)
Translation: the model doesn't just benchmark better, it behaves differently in long-running agentic setups.
Tonight's checklist
- `pip install -U anthropic` (or update your Claude Code client)
- Run one existing prompt against `claude-opus-4-7` and diff the output
- Bump `max_tokens` in every API call to 64k
- Try `/ultrareview` on a PR you're not sure about
- Before rolling 4.7 into a production agent: budget 1–2 hours for harness re-tuning
FAQ
Is there a free tier?
No. API pricing is identical to 4.6 ($5/M input, $25/M output). Claude Pro and Max plans get Claude Code access including the new xhigh default.
Do I need to change my model ID?
Yes. claude-opus-4-7. The old claude-opus-4-6 ID still works during the deprecation window.
Is Mythos Preview the same thing?
No. Mythos Preview is an unreleased Anthropic model available only via limited preview. Opus 4.7 is the strongest generally available model.
Will 4.7 break my existing Claude Code setup?
Almost certainly not — Claude Code handles the transition. What may feel different is faster quality improvement because xhigh is on by default. Your prompts may still need re-tuning.