DevToolsPicks

Posted on • Originally published at devtoolpicks.com

Claude Opus 4.7 Just Launched: What Changed and Is It Actually Worth It?


Anthropic launched Claude Opus 4.7. It is live in Claude.ai for Pro, Max, Team, and Enterprise plans, on the API as claude-opus-4-7, and on AWS Bedrock, Google Vertex AI, and Microsoft Foundry.

Short verdict upfront: if you are on Claude Max and using Opus for coding, switch today. The SWE-bench Pro numbers are a real jump, not marketing. If you are on GPT-5.4 or Gemini 3.1 Pro and happy with your workflow, there is no emergency. Opus 4.7 wins on coding benchmarks but trades blows elsewhere.

The one thing nobody is talking about: the new tokenizer can use up to 35% more tokens for the same text. That matters if you pay per token on the API.

Here is the full breakdown.

Quick Verdict

| Model | Best For | Price (input/output per 1M tokens) | Rating |
| --- | --- | --- | --- |
| Claude Opus 4.7 | Coding, long agentic tasks, vision work | $5 / $25 | 9.5/10 |
| Claude Opus 4.6 | Still fine if you are not pushing limits | $5 / $25 | 8.5/10 |
| GPT-5.4 | General reasoning, some coding tasks | $5 / $25 (similar range) | 9/10 |
| Gemini 3.1 Pro | Multilingual, long-context reasoning | Varies by tier | 8.5/10 |

The price on Opus 4.7 is the same as Opus 4.6. That part matters. Anthropic did not charge more for the upgrade, which is unusual for a flagship jump.

What Actually Changed in Opus 4.7

The benchmark chart Anthropic posted tells most of the story. Opus 4.7 jumps meaningfully over Opus 4.6 on every coding and agentic benchmark, and beats GPT-5.4 and Gemini 3.1 Pro on the majority of them.

Coding benchmarks

| Benchmark | Opus 4.7 | Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro | Mythos Preview |
| --- | --- | --- | --- | --- | --- |
| SWE-bench Pro (agentic coding) | 64.3% | 53.4% | 57.7% | 54.2% | 77.8% |
| SWE-bench Verified | 87.6% | 80.8% | n/a | 80.6% | 93.9% |
| Terminal-Bench 2.0 | 69.4% | 65.4% | 75.1% (self-reported harness) | 68.5% | 82.0% |

The SWE-bench Pro jump from 53.4% to 64.3% is the headline. That is roughly a 20% relative improvement on the benchmark Anthropic cares about most. For agentic coding work, this is the biggest single-release jump since Opus 4.5.

Terminal-Bench 2.0 is the one place where Opus 4.7 does not lead outright. GPT-5.4 scores higher at 75.1%, but that number comes from OpenAI's own harness, not the standard one. It is apples to oranges; treat it with caution.

Reasoning and other benchmarks

  • Humanity's Last Exam (multidisciplinary): 46.9% no tools, 54.7% with tools
  • Agentic search (BrowseComp): 79.3%
  • Scaled tool use (MCP-Atlas): 77.3%
  • Agentic computer use (OSWorld Verified): 78.0%
  • Agentic financial analysis (Finance Agent v1.1): 64.4%
  • Cybersecurity vulnerability reproduction (CyberGym): 73.1%
  • Graduate-level reasoning (GPQA Diamond): 94.2%
  • Visual reasoning (CharXiv Reasoning): 91.0%
  • Multilingual Q&A (MMMLU): 91.5%

Opus 4.7 leads on most of these. Gemini 3.1 Pro edges ahead on agentic search (85.9%) and ties on some multilingual tasks. Everywhere else, Opus 4.7 is in front.

Vision is the sleeper upgrade

This is the underrated part. Anthropic says Opus 4.7 sees images at more than three times the resolution of Opus 4.6. XBOW's visual-acuity benchmark (used for autonomous penetration testing) jumped from 54.5% on Opus 4.6 to 98.5% on Opus 4.7. That is not incremental. That is a different model for any workflow involving screenshots, UI inspection, PDF processing, or visual document work.

If you are building an app that reads invoices, parses dashboards, or does computer-use automation, this one change may matter more than all the coding numbers combined.

New features in the API

Opus 4.7 ships with three new developer-facing controls worth knowing about:

  1. xhigh effort level. Slots between high and max in the effort parameter. Finer control over the reasoning-vs-latency tradeoff. Use max when you need the absolute best result and do not care about speed. Use xhigh when you need almost that but do not want the latency tax.
  2. Task budgets (beta). Lets Claude prioritize work and manage costs across long runs. Useful for autonomous agents where you would otherwise burn through tokens on a loop.
  3. Adaptive thinking is the default. thinking: {type: "enabled"} with budget_tokens is now deprecated on Opus 4.7. Use thinking: {type: "adaptive"} and let Claude decide how much to think per query.
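Put together, a request using the new controls might look like the sketch below. This only assembles the JSON body rather than calling the API, and the parameter names (thinking type "adaptive", effort "xhigh") are taken from the description above; verify them against the official API reference before relying on them.

```python
# Sketch of an Opus 4.7 request body using adaptive thinking and
# the xhigh effort level. Parameter names follow the post above
# and are assumptions until checked against Anthropic's docs.

def build_opus_47_request(prompt: str, effort: str = "xhigh") -> dict:
    """Assemble the JSON body for a messages.create-style call."""
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 4096,
        # Adaptive thinking is the default on 4.7: the model decides
        # how much to think per query, so no budget_tokens field.
        "thinking": {"type": "adaptive"},
        # xhigh sits between high and max: near-best reasoning
        # without the full latency tax of max.
        "effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }

request = build_opus_47_request("Refactor this function for readability.")
```

The point of the dict-building helper is that the deprecated budget_tokens knob never appears; effort is the only lever you tune.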

New features in Claude Code

  • /ultrareview command. Runs a dedicated review session that flags issues a careful human reviewer would catch. This sits on top of the AI Code Review feature Anthropic shipped earlier this year.
  • Auto mode for Max users. Reduces interruptions on longer tasks. You set it once, Claude keeps going.
  • Routines (shipped two days earlier, pairs well with 4.7). Scheduled, API, and GitHub-triggered Claude Code sessions that run on Anthropic's cloud infrastructure. Hand off a bug fix at 2am, come back to a draft PR.

We covered Routines and the desktop redesign in depth in the Claude Code desktop redesign post. If you are trying to make sense of how Skills, MCP connectors, and plugins fit around Opus 4.7, we broke that down in Claude Skills vs MCP Connectors vs Plugins.

The Tokenizer Change Nobody Is Mentioning

Buried in the Opus 4.7 API docs: "Opus 4.7 uses a new tokenizer compared to previous models, contributing to its improved performance on a wide range of tasks. This new tokenizer may use up to 35% more tokens for the same fixed text."

Read that again. Same text, up to 35% more tokens billed.

Opus 4.7 and Opus 4.6 both cost $5 input and $25 output per million tokens. On paper that looks identical. In practice, if your workload lands in the "up to 35% more tokens" range, your real cost per task can rise by roughly a third on the same prompt.

Anthropic says the new tokenizer contributes to the performance improvement, which is believable. More tokens means the model can represent finer-grained concepts internally. But as a developer deciding whether to switch your production workload, you need to test this on your actual prompts before flipping the switch. Do not assume the price tag is the whole price.

For interactive use in Claude.ai on Pro or Max, this does not affect you directly. Your subscription covers it. For API usage in production, benchmark your real workload on both models for a week and compare the actual invoice.

Opus 4.7 vs GPT-5.4

On coding, Opus 4.7 wins. SWE-bench Pro 64.3% to 57.7%, and the Verified number was not published for GPT-5.4 in the Anthropic chart (usually a sign the competitor did not score well).

On general reasoning, it is closer. GPT-5.4 holds up on multi-step agent benchmarks and outscores Opus on self-reported Terminal-Bench. But you have to trust OpenAI's harness for that one.

Honest take: if your primary use case is shipping code, ship on Opus 4.7 or Claude Code. If you are doing general research, writing, brainstorming, or mixed workflows, the choice between GPT-5.4 and Opus 4.7 still comes down to what you are already paying for and which tool you prefer. Neither is definitively better for everything.

We wrote a full breakdown of this tradeoff in ChatGPT Pro $100 vs Claude Max vs Cursor last week. Opus 4.7 strengthens the Claude Max case but does not flip it for everyone.

Opus 4.7 vs Gemini 3.1 Pro

Gemini 3.1 Pro is in a weird spot. It holds its own on multilingual benchmarks and has a strong long-context story. But on the coding work that Claude has historically led in, Opus 4.7 now extends the gap rather than closing it.

If you are already building on Vertex AI and using Gemini for cost reasons, Opus 4.7 is also available on Vertex AI. You can test it side by side in the same console without moving clouds.

If you are building multilingual apps or doing research work where Gemini's long-context handling has served you well, there is no urgent reason to switch. Opus 4.7 is better on most benchmarks, but "better on most benchmarks" and "worth a migration" are different statements.

Who Should Switch to Opus 4.7 Right Now

Switch today:

  • You are on Claude Max and use Opus heavily for coding. No reason to wait. It is in the model selector. Pick it.
  • You build agentic workflows with multi-step plans. The SWE-bench Pro jump and the "verifies its own outputs before reporting back" behavior are real. Longer-running tasks are where 4.7 pulls away from 4.6.
  • You process images, screenshots, PDFs, or do computer-use automation. The 3x resolution vision upgrade is the biggest sleeper change in this release.
  • You use Claude Code and pay for Max. /ultrareview and Auto mode land today. Worth setting up.

Test before switching:

  • API users in production. The new tokenizer can eat 35% more tokens. Benchmark on your real prompts first. If your workload is already expensive on Opus 4.6, that 35% could bite.
  • Teams on Opus 4.6 with stable pipelines. If your prompts are tuned and your outputs are consistent, verify that 4.7 produces the same quality before swapping the model string in production.

Do not bother yet:

  • You are on Sonnet 4.6 and happy. Sonnet 4.6 is still the price-performance sweet spot for most indie hacker workloads. Opus 4.7 is the flagship, not the default.
  • You are on the free tier of Claude.ai. Opus 4.7 is Pro, Max, Team, and Enterprise only. Sonnet is what you get on free.
  • You are on GPT-5.4 for non-coding tasks. If ChatGPT is working for your research, writing, or daily driver use, nothing in this release forces a switch. Revisit when Sonnet 4.8 lands (likely May based on Anthropic's release pattern).

Honest Cons

This is where the DevToolPicks treatment differs from the release-day hype. Every tool has downsides, Opus 4.7 included.

  1. The tokenizer tax. Same $5/$25 rate, but up to 35% more tokens on the same text. For API users, this is a real cost increase disguised as a price match.
  2. Opus is still expensive relative to Sonnet. Most indie hacker workloads should probably still run on Sonnet 4.6 for cost reasons. Opus is the tool you reach for when the task actually requires it.
  3. Mythos overshadows it. Anthropic's benchmark chart includes Mythos Preview, which beats Opus 4.7 on almost every benchmark. Mythos is not publicly available. Seeing a model that is meaningfully better than 4.7 sitting in a closed preview makes 4.7 feel like a midpoint rather than a destination.
  4. Benchmarks are not your workload. SWE-bench Pro is a standardized benchmark. Your codebase is not. The 20% relative jump on SWE-bench Pro does not guarantee a 20% improvement on your actual bugs. Test before you commit.
  5. Prefilling and some older API patterns are deprecated. If you have code built on budget_tokens or thinking: {type: "enabled"}, you need to migrate to adaptive thinking. Small change but not zero.

Pricing and How to Access It

Claude.ai (interactive)

Included in your existing plan:

  • Pro ($20/month): Opus 4.7 available in the model selector with standard limits
  • Max ($100-200/month): Higher limits, /ultrareview, Auto mode
  • Team: Same Opus 4.7 access plus admin features
  • Enterprise: Same plus compliance and data residency

Free tier does not include Opus 4.7. Sonnet only.

API (claude-opus-4-7)

  • Input: $5 per 1M tokens
  • Output: $25 per 1M tokens
  • Prompt caching: Up to 90% cost savings on cache reads
  • Batch processing: 50% discount on both input and output
  • Context window: 1M tokens at standard pricing
  • US-only inference: 1.1x multiplier via the inference_geo parameter
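The caching and batch numbers above compound quickly. A rough sketch of effective input cost, assuming cache reads bill at 10% of base and that the 50% batch discount stacks with caching (verify that stacking against the pricing docs before budgeting on it):

```python
# Effective input cost per 1M tokens under the discounts listed
# above: 90% off cache reads, 50% off batch jobs. Whether the two
# discounts stack is an assumption worth verifying.

BASE_INPUT = 5.00  # $ per 1M input tokens

def effective_input_cost(cached_fraction: float, batch: bool) -> float:
    """Blend cache-read pricing (10% of base) with uncached input,
    then apply the batch discount if applicable."""
    per_million = (cached_fraction * BASE_INPUT * 0.10
                   + (1 - cached_fraction) * BASE_INPUT)
    return per_million * (0.5 if batch else 1.0)

# 80% cache hits, batched: (0.4 + 1.0) * 0.5 = $0.70 per 1M tokens
blended = effective_input_cost(0.8, batch=True)
```

At an 80% cache-hit rate with batching, effective input cost drops from $5 to about $0.70 per million tokens, which is more than enough headroom to absorb the tokenizer tax.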

Cloud platforms

  • AWS Bedrock: Live in us-east-1 and us-west-2. Model ID anthropic.claude-opus-4-7.
  • Google Vertex AI: Available via base_model: anthropic-claude-opus-4-7. Regional and multi-region endpoints include a 10% premium.
  • Microsoft Foundry: Available per Anthropic's announcement.

FAQ

Is Opus 4.7 faster than Opus 4.6?

Not necessarily. Opus 4.7 is more thorough and catches more issues during planning. Anthropic says this accelerates overall execution on complex tasks because there are fewer mistakes to fix, but for simple one-shot queries you may see similar or slightly longer response times. If you need raw speed, Opus 4.6 fast mode ($30/$150 per MTok) still exists.

Should I upgrade if I am already paying for Claude Max?

Yes, immediately. It is included. Open the model selector, pick Opus 4.7, done. There is no reason to stay on 4.6 for interactive use if you are on Max.

Is Opus 4.7 better than GPT-5.4 overall?

Better on coding benchmarks. Trades blows on general reasoning. Nobody is definitively better across every task yet. If you write code for a living, Opus 4.7 is the stronger pick today. If you use AI for research or writing, GPT-5.4 is still competitive.

What is Claude Mythos and why is it in the benchmark chart?

Mythos is Anthropic's more powerful unreleased model, held back from public release due to safety and cybersecurity concerns. It is in the chart because it scored higher than Opus 4.7 on several benchmarks, signaling that Anthropic has more capability internally than they are shipping. For now, Opus 4.7 is what you can actually use.

Will my existing Opus 4.6 API code work with 4.7?

Mostly yes. Change claude-opus-4-6 to claude-opus-4-7 and you are most of the way there. Two things to update: migrate from thinking: {type: "enabled"} with budget_tokens to thinking: {type: "adaptive"} with the effort parameter, and remove any prefilled assistant messages, which are no longer supported and will return a 400 error.
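Those changes can be captured in a small helper. This is a hypothetical sketch following the migration steps described above, not an official tool; field names mirror the request shape discussed earlier in the post.

```python
# Hypothetical helper for migrating a request body from Opus 4.6
# conventions to Opus 4.7, per the changes described above.

def migrate_request(body: dict) -> dict:
    """Return a copy of an Opus 4.6 request updated for Opus 4.7."""
    new = dict(body)
    if new.get("model") == "claude-opus-4-6":
        new["model"] = "claude-opus-4-7"
    # budget_tokens-based thinking is deprecated; switch to adaptive.
    if new.get("thinking", {}).get("type") == "enabled":
        new["thinking"] = {"type": "adaptive"}
    # Prefilled assistant messages now return a 400; fail fast locally.
    if new.get("messages") and new["messages"][-1]["role"] == "assistant":
        raise ValueError("Remove the assistant prefill before calling Opus 4.7")
    return new

migrated = migrate_request({
    "model": "claude-opus-4-6",
    "thinking": {"type": "enabled", "budget_tokens": 8000},
    "messages": [{"role": "user", "content": "hi"}],
})
```

Running it once over your stored request templates surfaces the prefill problem in your own code instead of as a 400 in production.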

When will Sonnet 4.7 launch?

Anthropic's release pattern has Sonnet following Opus by one to four weeks. Internal references to Sonnet 4.8 (not 4.7) have appeared in leaked Claude Code source code, suggesting the next Sonnet release may skip the .7 version entirely and land as 4.8 in May 2026. No official announcement yet.

Final Recommendation

If you have Claude Max: switch to Opus 4.7 in the model selector right now. It is better for coding, better for agentic work, and included in what you are already paying for.

If you use the API in production: test Opus 4.7 on your real prompts for a week before swapping. The tokenizer change is the only part of this release that might cost you money you did not expect. Same price per token, but potentially up to 35% more tokens per task.

If you are on Pro: same answer as Max. Switch in the model selector; it is already there.

If you are on free: your path to Opus 4.7 is upgrading to Pro at $20/month. Whether that is worth it depends on how much you already use Claude. If you are hitting Sonnet limits weekly, yes. If Sonnet handles everything you throw at it, stay free.

If you are on GPT-5.4 and happy: Opus 4.7 is better for coding on paper, but switching your entire workflow for a benchmark jump is rarely worth it. Wait for Sonnet 4.8 in May, compare price-performance then, and decide.

The real story of this release is not Opus 4.7 beating GPT-5.4 by a few percentage points on SWE-bench. It is that Anthropic shipped a meaningful coding upgrade at the same price as the previous model, in the same week as Claude Code Routines, and with Mythos sitting unreleased in the wings. That is the pattern to watch. Opus 4.7 is where it sits today.
