# Claude Opus 4.7 Just Dropped: 87.6% SWE-bench, Breaking API Changes, and the Hidden Cost Increase
Anthropic released Claude Opus 4.7 yesterday (April 16, 2026). The benchmarks are impressive. The breaking changes are aggressive. And the "unchanged pricing" comes with an asterisk most coverage is ignoring.
I've been tracking AI model releases for the past year. Here's the no-BS breakdown.
## The Numbers That Matter
| Benchmark | Opus 4.6 | Opus 4.7 | Change |
|---|---|---|---|
| SWE-bench Verified | 80.8% | 87.6% | +6.8 pts |
| SWE-bench Pro | 53.4% | 64.3% | +10.9 pts |
| CursorBench | 58% | 70% | +12 pts |
| GPQA Diamond | 91.3% | 94.2% | +2.9 pts |
| Visual Acuity | 54.5% | 98.5% | +44 pts |
The coding improvements are real. Opus 4.7 now solves 3x more production coding tasks than 4.6. If you use Claude Code or Cursor daily, you'll feel the difference immediately.
Vision went from mediocre to near-perfect. 98.5% visual acuity with 3.75 MP support (3x the previous resolution). Screenshot analysis, document OCR, and computer use just got dramatically better.
## How It Stacks Up (April 2026 Frontier Models)
| Model | SWE-bench Verified | SWE-bench Pro | GPQA Diamond | Price (in/out per MTok) |
|---|---|---|---|---|
| Opus 4.7 | 87.6% | 64.3% | 94.2% | $5 / $25 |
| GPT-5.4 | ~83% | 57.7% | 94.4% | $2.50 / $15 |
| Gemini 3.1 Pro | 80.6% | 54.2% | 94.3% | $2 / $12 |
Opus 4.7 leads on coding by a wide margin. General reasoning (GPQA) is effectively a three-way tie. Price-wise, Gemini 3.1 Pro is 60% cheaper on input and 52% cheaper on output.
The question isn't which model is "best." It's which model is best for your task at your budget.
## The Breaking Changes Nobody's Talking About
If you're running Opus 4.6 in production, do not just swap the model ID. Three things will break:
### 1. Temperature/top_p/top_k → 400 Error

```python
# THIS WILL FAIL ON OPUS 4.7
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.7,  # 400 error
    top_p=0.9,        # 400 error
)
```
Anthropic removed all sampling parameters. Their guidance: "use prompting to guide behavior." This is a bold move. Every other frontier model still supports temperature.
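If you have many call sites, a small compatibility shim can strip the rejected parameters before the request goes out. This is a sketch, not an official migration path; the helper name is mine, and only the parameter names come from the errors above:

```python
# Parameters Opus 4.7 rejects with a 400 error (per the example above).
REMOVED_PARAMS = {"temperature", "top_p", "top_k"}

def sanitize_for_opus_4_7(params: dict) -> dict:
    """Return a copy of the request kwargs without the removed sampling knobs."""
    return {k: v for k, v in params.items() if k not in REMOVED_PARAMS}

legacy = {
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "temperature": 0.7,
    "top_p": 0.9,
}
safe = sanitize_for_opus_4_7(legacy)
# safe == {"model": "claude-opus-4-7", "max_tokens": 1024}
```

Dropping the parameters silently changes sampling behavior, so log a warning when the shim removes something rather than letting it pass unnoticed.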
### 2. Extended Thinking Budgets → Gone

```python
# BEFORE (will crash)
thinking = {"type": "enabled", "budget_tokens": 32000}

# AFTER (works)
thinking = {"type": "adaptive"}
```
Adaptive thinking is the only option now. Anthropic says it "reliably outperforms extended thinking" in their evaluations. Maybe. But removing the choice entirely is frustrating for teams that tuned their budget_tokens carefully.
### 3. Thinking Content Hidden by Default

Streaming now shows a long pause before output begins: thinking happens, but you can't see it. Add `"display": "summarized"` to get it back:

```python
thinking = {"type": "adaptive", "display": "summarized"}
```
## The Hidden Cost Increase
Anthropic says "pricing remains the same as Opus 4.6: $5/$25 per MTok."
Technically true. Practically misleading.
Opus 4.7 uses a new tokenizer. The same text now maps to 1.0-1.35x as many tokens. Your prompts didn't change. Your bill did.
A prompt that cost $1.00 on Opus 4.6 now costs $1.00-$1.35 on Opus 4.7. At scale, that's up to a 35% effective price increase with no announcement, no changelog entry, just a buried note in the docs.
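The arithmetic is simple but worth spelling out, because the increase hides entirely inside token counts. A back-of-envelope sketch, assuming per-token prices stay fixed and only tokenization inflates:

```python
def effective_bill(old_bill_usd: float, token_inflation: float) -> float:
    """New spend when the same text tokenizes to `token_inflation`x as many
    tokens at unchanged per-token prices."""
    return old_bill_usd * token_inflation

# A team spending $10,000/month on Opus 4.6, at the two ends of the
# 1.0-1.35x range reported for the new tokenizer:
best_case = effective_bill(10_000, 1.0)    # unchanged
worst_case = effective_bill(10_000, 1.35)  # roughly $13,500, a 35% jump
```

The actual inflation factor depends on your content mix (code vs. prose vs. non-English text tends to tokenize differently), so measure it on your own prompts before budgeting.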
How to control costs:

- **Use the `effort` parameter.** Start with `high` instead of `xhigh` or `max`. For most tasks, `high` effort on Opus 4.7 still outperforms Opus 4.6 at `max`.
- **Use prompt caching.** Cached reads are $0.50/MTok, 10x cheaper than standard input.
- **Route by task.** Not every prompt needs a $5/$25 model. Use Opus 4.7 for complex coding and agentic work. Use Gemini 3.1 Pro ($2/$12) or GPT-5.4 Mini ($0.75/$4.50) for simpler tasks.
- **Use a multi-model gateway.** Instead of hardcoding one model, route each request to the best model for that task. One API endpoint, switch models by changing a parameter.
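Task routing can be as simple as a lookup table keyed on a coarse task label. The model names below are the ones discussed in this post; the routing table, labels, and fallback are my own illustration, not anyone's recommended defaults:

```python
# Minimal "route by task" sketch: map a coarse task label to a model ID.
ROUTES = {
    "coding": "claude-opus-4-7",    # hard coding / agentic work
    "agentic": "claude-opus-4-7",
    "summarize": "gemini-3.1-pro",  # cheaper model for simpler tasks
    "extract": "gpt-5.4-mini",
}
DEFAULT_MODEL = "gemini-3.1-pro"    # fallback for unrecognized labels

def pick_model(task: str) -> str:
    """Choose a model ID for a request, falling back to the cheap default."""
    return ROUTES.get(task, DEFAULT_MODEL)

pick_model("coding")    # "claude-opus-4-7"
pick_model("chitchat")  # unknown label, falls back to "gemini-3.1-pro"
```

In practice, the hard part isn't the lookup, it's classifying requests reliably; many teams start with explicit labels set by the calling code rather than trying to infer the task from the prompt.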
## New Features Worth Knowing
**Task Budgets (Beta):** An advisory token cap across full agentic loops. The model sees a countdown and self-moderates. Useful for controlling runaway agent costs:

```python
output_config={
    "effort": "high",
    "task_budget": {"type": "tokens", "total": 128000},
}
```
**`xhigh` Effort Level:** New option between `high` and `max`. Fine-grained control over the quality-cost tradeoff.

**High-Res Vision:** 2,576px max (was 1,568px), with 1:1 pixel coordinates, so no more scale-factor math.

**Better Memory:** Agents that maintain scratchpads across turns work noticeably better.
## The Mythos Question
Anthropic has publicly conceded that Opus 4.7 trails their unreleased Mythos model. Mythos has 10 trillion parameters and is described as more capable across the board.
So why release Opus 4.7 at all? Because Mythos isn't GA (generally available). It's behind safety reviews and access controls. Opus 4.7 is what you can actually use in production today. Think of it as Anthropic's "safe frontier" — the most capable model they're comfortable releasing broadly.
## My Recommendation
**If you're on Opus 4.6:** Upgrade, but plan the migration. The breaking changes are real. Budget a day for testing.

**If you're on Sonnet 4.6 ($3/$15):** Stay unless you need the coding quality jump. Sonnet handles 90% of tasks fine at 40% lower cost.

**If you're optimizing costs:** Use Opus 4.7 selectively for hard problems. Route everything else to cheaper models through a unified API gateway: one endpoint gives you access to Opus 4.7, GPT-5.4, Gemini 3.1 Pro, and 150+ models without managing separate integrations.

**If you're starting fresh:** Don't lock into one provider. The frontier changes every 2-3 months. Build with model flexibility from day one.
What's your experience with Opus 4.7 so far? Drop your benchmarks in the comments — especially if you're seeing different results on real-world tasks vs. the official numbers.