TL;DR
Anthropic dropped Claude Opus 4.7 yesterday (April 16, 2026). Pricing is identical to 4.6 ($5/$25 per M tokens), but SWE-bench Pro rose from 53.4% to 64.3%, passing GPT-5.4 (57.7%). The practical changes that matter: Claude Code now defaults to the new xhigh effort tier automatically, a /ultrareview slash command runs dedicated code-review sessions, and the new tokenizer counts the same input as 1.0–1.35x as many tokens. You will probably need to bump max_tokens to 64k+ and retest prompts you wrote for 4.6.
What Anthropic actually shipped
Model ID: claude-opus-4-7
Release: 2026-04-16 (GA)
Price: $5/M input, $25/M output (unchanged from 4.6)
Availability: Claude API · Amazon Bedrock · GCP Vertex AI · Microsoft Foundry
If you're used to thinking of Opus releases as marginal, 4.7 is not that kind of release.
Benchmark deltas that aren't just noise
| Benchmark | Opus 4.6 | Opus 4.7 | Delta |
|---|---|---|---|
| SWE-bench Verified | — | 87.6% | — |
| SWE-bench Pro | 53.4% | 64.3% | +10.9pp |
| CursorBench | 58% | 70% | +12pp |
| XBOW visual accuracy | 54.5% | 98.5% | +44pp |
| Databricks OfficeQA Pro | — | err -21% | — |
| Rakuten SWE-Bench | — | 3x tasks | — |
The one I'd focus on is XBOW. Vision accuracy going from 54.5% to 98.5% isn't benchmark farming — it's the difference between "computer-use agents are a demo" and "I can actually ship one."
The xhigh effort tier (and why Claude Code just got better for free)
Effort levels used to be low | medium | high | max. 4.7 adds xhigh between high and max.
```python
# Anthropic SDK
from anthropic import Anthropic

client = Anthropic()
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=64000,  # bump from 32k; xhigh produces longer reasoning
    thinking={"type": "enabled", "budget_tokens": 30000},
    effort="xhigh",  # new tier
    messages=[{"role": "user", "content": "..."}],
)
```
The invisible change: Claude Code now defaults every plan to xhigh automatically. You don't set anything; quality goes up with no change to your CLI config.
/ultrareview — a review session that actually pushes back
/ultrareview
This runs a dedicated review pass that flags bugs and design issues. Pro/Max users get 3 free sessions per month.
After testing, Vercel's team noted a "new behavior of starting from proofs for systems code." In practice, it's more willing to say "this invariant isn't being preserved here" than prior Claude versions, which tended toward agreement.
Migration: things that will bite you
1. Token usage goes up, sometimes a lot
The new tokenizer counts the same input as 1.0x–1.35x as many tokens, depending on content type. Code-heavy inputs trend toward the high end.
```python
# Before (4.6)
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=32000,
    messages=messages,
)

# After (4.7): needs headroom
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=64000,  # xhigh reasoning expands late-turn output
    messages=messages,
)
```
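For a quick sanity check on how much headroom you actually need, you can scale a 4.6 token count by the top of the quoted 1.0x–1.35x inflation range. A minimal sketch (the helper name and the integer-math approach are mine, not Anthropic's):

```python
def opus47_budget(tokens_46: int, inflation_pct: int = 135) -> int:
    """Worst-case 4.7 token count for an input that measured
    `tokens_46` tokens under the 4.6 tokenizer.

    Defaults to the 1.35x top of the quoted 1.0x-1.35x range.
    """
    if not 100 <= inflation_pct <= 135:
        raise ValueError("quoted inflation range is 1.00x-1.35x")
    # ceil(tokens * pct / 100) in integer math, to avoid float rounding
    return (tokens_46 * inflation_pct + 99) // 100
```

A 32k-token 4.6 prompt budgets to 43,200 tokens at the worst case, which is one reason the 64k max_tokens bump in the snippet above isn't paranoia.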
Anthropic claims efficiency improved at all effort levels (shorter output for the same quality). My anecdotal tests back this up for coding tasks, but it varies. Don't assume; measure.
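Measuring is cheap: run an identical prompt against both model IDs and diff the `usage` block each API response carries (`input_tokens` and `output_tokens` are real fields on the Anthropic SDK's response; the helper name is mine):

```python
def usage_delta(before: dict, after: dict) -> dict:
    """Percent change in token usage between two runs of the same
    prompt, e.g. response.usage counts on 4.6 vs 4.7."""
    return {
        key: round(100 * (after[key] - before[key]) / before[key], 1)
        for key in ("input_tokens", "output_tokens")
    }

# Hypothetical counts from a pair of runs:
usage_delta(
    {"input_tokens": 1000, "output_tokens": 2000},  # 4.6
    {"input_tokens": 1250, "output_tokens": 1800},  # 4.7
)
# → {'input_tokens': 25.0, 'output_tokens': -10.0}
```

If the output delta is negative while the input delta is positive, you're seeing the claimed efficiency win offsetting tokenizer inflation.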
2. Instruction-following is now strict
This is the hidden gotcha: 4.7 executes prompts literally. Implicit assumptions you leaned on with 4.6 ("obviously Claude will skip step X if condition Y") no longer hold.
```python
# A prompt that used to "just work"
"Refactor this function and add tests if needed."
# 4.7 behavior: always adds tests, even when not needed
# 4.6 behavior: would judge whether tests were warranted
```
Anthropic's own migration guide recommends re-tuning prompts and agentic harnesses. Plan the time.
3. Task Budget (public beta) — let the model see its own budget
```python
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=64000,
    task_budget={"remaining_tokens": 50000},  # beta
    messages=messages,
)
```
You tell Claude how many tokens are left. It prioritizes and gracefully winds down as the budget drops. Actually useful for long agentic loops that used to die mid-step.
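The announcement doesn't spell out who does the bookkeeping, so here's my assumption: in an agent loop, you subtract each turn's reported usage yourself and pass the shrinking number back in via `task_budget`. A minimal sketch (`remaining_budget` is a hypothetical helper of mine):

```python
def remaining_budget(start: int, turn_usage: list) -> int:
    """Tokens left after a sequence of agent turns, where each entry
    mirrors a response's usage counts. Floors at zero so the value
    handed to task_budget never goes negative."""
    spent = sum(u["input_tokens"] + u["output_tokens"] for u in turn_usage)
    return max(start - spent, 0)
```

Each loop iteration would then pass `task_budget={"remaining_tokens": remaining_budget(start, history)}` to the next `client.messages.create` call.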
Safety direction: intentionally narrower
Worth noting because it's unusual: 4.7 is the first Anthropic model where cyber capabilities were deliberately scaled back. A new Cyber Verification Program exists for legitimate security researchers (vuln research, pentesting, red teaming) to regain expanded access through a vetting process.
If you're in security and hit capability walls you didn't see in 4.6, that's why.
What partners are saying (filter for patterns, not hype)
The repeated keywords across partner quotes are what matter:
- "stays coherent for hours" (Cognition/Devin)
- "doesn't give up on hard problems" (Cognition)
- "passes implicit-requirement tests" (Notion Agent)
- "starts from proofs for systems code" (Vercel)
- "loop resistance, consistency, graceful error recovery" (Genspark)
Translation: the model doesn't just benchmark better, it behaves differently in long-running agentic setups.
Tonight's checklist
- `pip install -U anthropic` (or update your Claude Code client)
- Run one existing prompt against `claude-opus-4-7` and diff the output
- Bump `max_tokens` in every API call to 64k
- Try `/ultrareview` on a PR you're not sure about
- Before rolling 4.7 into a production agent: budget 1–2 hours for harness re-tuning
FAQ
Is there a free tier?
No. API pricing is identical to 4.6 ($5/M input, $25/M output). Claude Pro and Max plans get Claude Code access including the new xhigh default.
Do I need to change my model ID?
Yes. claude-opus-4-7. The old claude-opus-4-6 ID still works during the deprecation window.
Is Mythos Preview the same thing?
No. Mythos Preview is an unreleased Anthropic model available only via limited preview. Opus 4.7 is the strongest generally available model.
Will 4.7 break my existing Claude Code setup?
Almost certainly not — Claude Code handles the transition. What may feel different is faster quality improvement because xhigh is on by default. Your prompts may still need re-tuning.