DEV Community

Om Shree
Everything You Need to Know About Claude Opus 4.7

Anthropic dropped Claude Opus 4.7 yesterday. It's a direct upgrade to Opus 4.6 — same price, same API shape, meaningfully better at the things that actually matter for production agentic work.

Here's what changed and what you actually need to know before migrating.

The Core Improvements

Coding and agentic tasks

This is where the biggest gains are. Opus 4.7 is noticeably better on hard, long-running coding problems — the kind where Opus 4.6 would stall, loop, or hand back something half-finished.

Cursor saw a 70% pass rate on their internal benchmark, up from 58% with Opus 4.6. CodeRabbit saw a 10%+ recall improvement on difficult PRs. Notion's agent team reported 14% better task completion with fewer tokens and a third as many tool errors. Rakuten's SWE-Bench testing showed Opus 4.7 resolving 3x more production tasks.

What's actually different under the hood: the model is better at verifying its own outputs before reporting back. It catches its own logical faults during planning. It pushes through tool failures that used to stop the previous model cold. For agentic workflows, that consistency matters more than raw benchmark numbers.

Instruction following — with a catch

Opus 4.7 is substantially more literal about following instructions. That sounds straightforwardly good, and it mostly is. But there's a real migration implication: prompts written for earlier Claude models assumed some loose interpretation. Opus 4.7 takes instructions at face value. If your prompt says something ambiguous, you'll get a more literal result than you expected.

Worth auditing your existing prompts before switching over.
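To make the audit concrete, here's a hypothetical before/after — the prompts and the `has_explicit_constraints` heuristic are mine, not from Anthropic — showing the kind of ambiguity Opus 4.6 would paper over and Opus 4.7 will take literally:

```python
# Hypothetical example: an ambiguous instruction vs. an explicit one.
# Opus 4.6 tended to infer intent; Opus 4.7 follows the wording as written.

ambiguous = "Summarize the document briefly."  # "briefly" is undefined
explicit = (
    "Summarize the document in at most 3 bullet points, "
    "each under 20 words. Do not include a preamble."
)

def has_explicit_constraints(prompt: str) -> bool:
    """Crude audit heuristic: flag prompts that quantify their constraints."""
    return any(ch.isdigit() for ch in prompt)

print(has_explicit_constraints(ambiguous))  # False
print(has_explicit_constraints(explicit))   # True
```

A digit check is obviously not a complete audit — it's just a quick way to sweep a prompt library for instructions that never say how long, how many, or how strict.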

Vision: 3x the resolution

Opus 4.7 now accepts images up to 2,576 pixels on the long edge, roughly 3.75 megapixels. Previous Claude models topped out at about 1.15 megapixels. This is a model-level change — you don't need to change anything in your API calls. Images just get processed at higher fidelity automatically.

What this unlocks in practice: dense screenshots for computer-use agents, complex technical diagrams, chemical structures, any visual work where the detail actually matters. XBOW, which builds autonomous penetration testing tools, saw their visual acuity benchmark go from 54.5% with Opus 4.6 to 98.5%. That's not a marginal improvement — that's a different class of capability.

One note: higher resolution means more tokens consumed. If you don't need the extra fidelity, downsample before sending.
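If you want to downsample, the math is simple: scale so the long edge fits the 2,576-pixel cap while preserving aspect ratio. This sketch only computes the target dimensions — the actual resize would go through an image library like Pillow:

```python
def downsample_dims(width: int, height: int, max_long_edge: int = 2576) -> tuple[int, int]:
    """Return dimensions scaled so the long edge fits within max_long_edge,
    preserving aspect ratio. No-op if the image is already small enough."""
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height  # already within the model's limit
    scale = max_long_edge / long_edge
    return round(width * scale), round(height * scale)

# A 4K screenshot (3840x2160) exceeds the cap and gets scaled down:
print(downsample_dims(3840, 2160))  # (2576, 1449)
```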

Memory across sessions

Opus 4.7 is better at using filesystem-based memory. It carries notes forward across long multi-session work and uses them to reduce the setup overhead on new tasks. For anyone running multi-day agentic workflows, this is genuinely useful.
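The pattern itself is worth seeing. This is a minimal sketch of filesystem memory — the file name and JSON schema are illustrative choices of mine, not an Anthropic API — where a session persists notes that the next session reloads:

```python
import json
import tempfile
from pathlib import Path

def load_notes(memory_file: Path) -> list[str]:
    """Load prior-session notes, or an empty list on first run."""
    if memory_file.exists():
        return json.loads(memory_file.read_text())
    return []

def save_note(memory_file: Path, note: str) -> None:
    """Append a note so the next session starts with this context."""
    notes = load_notes(memory_file)
    notes.append(note)
    memory_file.write_text(json.dumps(notes))

with tempfile.TemporaryDirectory() as d:
    mem = Path(d) / "agent_memory.json"
    save_note(mem, "repo uses pnpm, not npm")  # session 1 ends
    print(load_notes(mem))                     # session 2 starts with context
```

The point is that setup knowledge ("this repo uses pnpm") survives between runs, so the model doesn't rediscover it from scratch each time.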

New API Features Launching Alongside

xhigh effort level

There's a new effort tier between high and max. The full ladder is now: low → medium → high → xhigh → max. In Claude Code, Anthropic has raised the default to xhigh for all plans.

For coding and agentic use cases, Anthropic recommends starting with high or xhigh. Max effort is there for the hardest problems where you want to throw everything at it.
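Here's a sketch of selecting a tier per request. The exact parameter name and shape (`"effort"` as a top-level field) is an assumption on my part — check Anthropic's API reference for the real syntax before using it:

```python
# The five-tier ladder from the release; "effort" as a request field
# is assumed here, not confirmed API syntax.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

def build_request(prompt: str, effort: str = "xhigh") -> dict:
    """Build a request payload with a validated effort tier."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}")
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 4096,
        "effort": effort,  # assumed field name
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Refactor the parser module.", effort="high")
print(req["effort"])  # high
```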

Task budgets (public beta)

Developers can now set token spend budgets on the API, giving Claude a way to allocate effort across longer runs rather than burning all its compute on early steps. Useful for agentic pipelines where you want the model to prioritize intelligently.
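To illustrate the idea client-side (the API field itself is in public beta, so this is only the allocation logic, not the beta syntax): ration the remaining budget evenly across the steps still to come, so early steps can't exhaust it.

```python
def per_step_allowance(total_budget: int, spent: int, steps_left: int) -> int:
    """Even share of the remaining token budget for the next step."""
    if steps_left <= 0:
        return 0
    return max(0, (total_budget - spent) // steps_left)

# 100k-token budget, 30k already spent, 7 steps to go -> 10k for the next step
print(per_step_allowance(100_000, 30_000, 7))  # 10000
```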

/ultrareview in Claude Code

A new slash command that produces a dedicated review session — reads through your changes and flags bugs and design issues a careful reviewer would catch. Pro and Max users get three free ultrareviews to try it out.

Auto mode extended to Max users

Auto mode lets Claude make tool-use decisions on your behalf, so you can run longer tasks with fewer interruptions. Previously limited, now available to Max plan users.

The Cybersecurity Angle

This one is worth understanding properly.

Last week Anthropic announced Project Glasswing, which assessed AI risks in cybersecurity. They stated they'd keep Claude Mythos Preview limited and test new cyber safeguards on less capable models first.

Opus 4.7 is the first model in that pipeline. Its cyber capabilities are intentionally less advanced than Mythos Preview — Anthropic experimented with selectively reducing these during training. And it ships with automatic safeguards that detect and block prohibited or high-risk cybersecurity requests.

If you do legitimate security work — vulnerability research, penetration testing, red-teaming — there's a new Cyber Verification Program you can apply to join. That gets you access to the capabilities that would otherwise be blocked.

Pricing and Availability

Same as Opus 4.6: $5 per million input tokens, $25 per million output tokens.

Available via Claude.ai, the API (claude-opus-4-7), Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.

Migration Notes

Two things that affect token usage when moving from Opus 4.6:

First, Opus 4.7 uses an updated tokenizer. The same input can map to roughly 1.0–1.35x as many tokens, depending on content type. This varies — code and structured text tend toward the higher end.

Second, the model thinks more at higher effort levels, especially on later turns in agentic settings. More output tokens per complex task.

Anthropic's own testing shows the net effect is favorable on coding evaluations, but the right move is to measure it on your actual traffic before committing. They've published a migration guide at platform.claude.com/docs/en/about-claude/models/migration-guide.
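For a rough sense of the worst case, here's a back-of-envelope estimate using the published prices and the stated 1.0–1.35x input-token range (the traffic volumes are made up for illustration):

```python
# Published Opus pricing: $5/M input tokens, $25/M output tokens.
INPUT_RATE = 5 / 1_000_000    # $ per input token
OUTPUT_RATE = 25 / 1_000_000  # $ per output token

def monthly_cost(input_tokens: int, output_tokens: int,
                 input_multiplier: float = 1.0) -> float:
    """Estimated spend, with the tokenizer change modeled as an input multiplier."""
    return input_tokens * input_multiplier * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical 100M input / 20M output tokens per month:
baseline = monthly_cost(100_000_000, 20_000_000)        # $1000
worst = monthly_cost(100_000_000, 20_000_000, 1.35)     # $1175, +17.5%
print(round(baseline), round(worst))
```

Note this only models the input side; the extra thinking tokens at higher effort levels add to the output term, which is why measuring on real traffic matters.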

Should You Upgrade?

For straightforward API usage, yes. Same price, better results across coding, vision, and long-horizon tasks. The tokenizer change means costs may shift slightly but the model is more efficient in how it uses those tokens.

For production agentic pipelines, audit your prompts first. The stricter instruction following is a feature, but it will surface ambiguities in prompts that Opus 4.6 quietly papered over. Fix those before flipping the switch.


I cover Anthropic model releases and agentic AI infrastructure on our YouTube channel. MCP Weekly drops every Monday.
