Pricing stays identical at 5 USD per million input tokens and 25 USD per million output tokens; the 1 million token context window stays; the model ID changes from claude-opus-4-6 to claude-opus-4-7.
Vision jumped 3x in resolution, from 1.15 megapixels to 3.75 megapixels, with 1:1 pixel coordinate mapping for computer use.
Effort got a new xhigh slot between high and max; extended thinking budgets are gone; adaptive thinking is off by default.
The sampling parameters temperature, top_p, and top_k are completely removed, and the new tokenizer uses up to 35 percent more tokens on the same text.
The behavior shifts are real: more literal instruction following, fewer tool calls, fewer subagents, and responses that scale to task complexity.
Migrate today: three breaking changes will crash 4.6 code the moment you flip the model ID, and the retirement clock is already running.
Anthropic shipped Claude Opus 4.7 yesterday. I have been running it against Opus 4.6 on real workloads for 24 hours, and there is enough changing under the hood that anyone still on 4.6 needs to read the migration notes carefully. This is the full Opus 4.6 vs 4.7 comparison: every breaking change, every behavior shift, every parameter that got renamed or deleted, and the pricing math that actually matters.
The short version. 4.7 is smarter, sees more, and calls tools less. 4.6 is being retired. If you are running production code on 4.6 today, you have maybe a quarter before you need to migrate anyway, and three of the changes will crash your app the moment you flip the model ID.
Pricing, Context, and Model Family
Start with the thing that did not change, because it sets the tone. Pricing is identical. 5 USD per million input tokens, 25 USD per million output tokens, same 1 million token context window, same availability on the Messages API, Amazon Bedrock, Google Vertex AI, and Microsoft Foundry.
The model ID changes from claude-opus-4-6 to claude-opus-4-7. Both run under the same Opus 4 family branding. There is no long context pricing premium on either model (which matters because Gemini and GPT still charge more past certain thresholds).
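Since the rates are identical across both models, the only thing that moves your bill is token counts. A minimal sketch of the per-request math at the published Opus rates:

```python
# Back-of-envelope cost estimate at the published Opus rates
# (5 USD per million input tokens, 25 USD per million output tokens).
# Identical for 4.6 and 4.7; only the token counts differ.

INPUT_USD_PER_M = 5.00
OUTPUT_USD_PER_M = 25.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at Opus pricing."""
    return (input_tokens / 1_000_000) * INPUT_USD_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_USD_PER_M

# Example: a 40k-token prompt with an 8k-token response.
print(round(estimate_cost(40_000, 8_000), 2))  # 0.4
```

The asymmetry matters for 4.7 in particular: output is 5x the price of input, so the roughly 15 percent fewer output tokens I saw on identical prompts (covered below) buys back more than the tokenizer's input-side increase on many workloads.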
On Claude Code and the Claude apps, 4.7 is now the default. 4.6 is still selectable for now, but keep the retirement history in mind: Anthropic already retired Opus 4 and Sonnet 4 effective June 15 this year. The cadence on model retirements is tightening. Two quarters of production life per model is the new planning horizon.
Vision: 3x Resolution and Clean Coordinates
The biggest underrated upgrade in 4.7 is what the model can actually see. Opus 4.6 accepted images up to 1,568 pixels on the long edge, roughly 1.15 megapixels. Opus 4.7 takes images up to 2,576 pixels on the long edge, roughly 3.75 megapixels. That is 3x more visual data on the same image slot.
For computer use, screenshot analysis, and document understanding, this changes what the model can actually detect. In 4.6 you had to downsample screenshots and lose small UI text. In 4.7 that text is still legible. I tested the same full resolution screenshot of a dense admin dashboard on both models. 4.6 missed three buttons and got one menu label wrong. 4.7 read every element correctly and identified the active tab from a 2 pixel indicator.
The second vision change is coordinate mapping. 4.6 returned image coordinates based on its internal downsampled view, which meant your agent code had to multiply by a scale factor to map back to the original pixel grid. 4.7 returns coordinates that match the input image directly. The math is gone. If you are building computer use agents or screen reading tools, delete the scaling logic.
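Concretely, this is the kind of 4.6-era fixup you can now delete. A sketch of the old scale-back logic, assuming your agent tracked the model's downsampled view size (names here are illustrative, not from any SDK):

```python
def scale_back(coord, original_size, model_view_size):
    """4.6-era fixup: map a coordinate from the model's downsampled
    view back to the original pixel grid. Sizes are (width, height)."""
    sx = original_size[0] / model_view_size[0]
    sy = original_size[1] / model_view_size[1]
    return (coord[0] * sx, coord[1] * sy)

# 4.6: a click at (100, 50) reported against a 1568-wide view maps
# back to a 3136-wide screenshot as (200, 100).
print(scale_back((100, 50), (3136, 1600), (1568, 800)))  # (200.0, 100.0)

# 4.7: coordinates already match the input image, so the identity is
# all you need -- scale_back and the view-size bookkeeping both go.
```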
The tradeoff is token cost. Higher resolution images use more tokens at the same quality setting. If you are sending images where 1.15 megapixels was already enough detail, downsample before upload or you will feel it on the bill.
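If you go the downsample route, the target-size math is simple: cap the long edge at the old 4.6 limit and preserve aspect ratio. A pure-math sketch (feed the result to whatever image library you already use; the 1,568-pixel default is the 4.6 limit quoted above):

```python
def downsample_size(width: int, height: int, max_long_edge: int = 1568):
    """Return (width, height) shrunk so the long edge is at most
    max_long_edge, preserving aspect ratio. No-op if already small enough."""
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return (width, height)
    scale = max_long_edge / long_edge
    return (round(width * scale), round(height * scale))

# A full-resolution 4.7-sized image shrunk back to 4.6-era dimensions.
print(downsample_size(2576, 1448))  # (1568, 881)
print(downsample_size(800, 600))   # (800, 600) -- untouched
```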
Effort, Thinking, and the xhigh Slot
The effort parameter changed shape. 4.6 had four effort levels (low, medium, high, max). 4.7 has five (low, medium, high, xhigh, max). The new xhigh slot sits between high and max, giving finer control over the intelligence versus latency tradeoff.
Anthropic's own recommendation is that coding and agentic workloads should start at xhigh instead of high. In my testing on a 15 step code refactor, xhigh produced the same quality output as max but ran about 20 percent faster. High was noticeably faster but missed two edge cases that xhigh caught. The new slot is worth rewriting default configs for.
Extended thinking budgets are gone. This is the biggest breaking change. On 4.6, you could pass thinking: {"type": "enabled", "budget_tokens": 32000} to give the model a reasoning budget. That parameter shape returns a 400 error on 4.7. Only adaptive thinking is supported now, and it is off by default.
# Opus 4.6
thinking = {"type": "enabled", "budget_tokens": 32000}
# Opus 4.7
thinking = {"type": "adaptive"}
output_config = {"effort": "high"}
Adaptive thinking means the model decides how much to think based on the task. You influence the ceiling through the effort parameter instead of a token count. For most use cases this is an improvement because you stop guessing at budgets. For anyone who relied on precise control over thinking tokens, it is a real change in mental model.
Thinking content is also hidden by default on 4.7. Thinking blocks still exist in the response stream, but their content is empty unless you opt in. If your product shows reasoning to users, you need to add "display": "summarized" to the thinking config or users will see a long pause before output starts.
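Putting the two thinking changes together, a small config helper is enough to keep both cases straight. This is a sketch based on the parameter shapes above; treat the exact keys as illustrative of the 4.7 schema rather than exhaustive:

```python
def thinking_config(show_reasoning: bool) -> dict:
    """Build the 4.7 thinking config. Adaptive is the only supported
    mode; budget_tokens from 4.6 would return a 400 here."""
    cfg = {"type": "adaptive"}
    if show_reasoning:
        # Without this, thinking blocks stream with empty content and
        # users just see a long pause before output starts.
        cfg["display"] = "summarized"
    return cfg

print(thinking_config(True))   # {'type': 'adaptive', 'display': 'summarized'}
print(thinking_config(False))  # {'type': 'adaptive'}
```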
Sampling Parameters: Completely Removed
This one will bite anyone who copy pasted older code. Temperature, top_p, and top_k are not deprecated on 4.7. They are removed. Setting any of them to a non default value returns a 400 error.
# Opus 4.6 (works)
response = client.messages.create(
    model="claude-opus-4-6",
    temperature=0.7,
    top_p=0.9,
    messages=[...],
)

# Opus 4.7 (400 error)
response = client.messages.create(
    model="claude-opus-4-7",
    temperature=0.7,  # error
    top_p=0.9,        # error
    messages=[...],
)
The fix is to delete these parameters entirely. Anthropic's position is that sampling parameters were never a reliable tool for controlling output on Claude (unlike some other model families). If you were using temperature=0 for determinism, that never guaranteed byte identical responses anyway. Use prompting to control style, tone, and verbosity.
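If stale temperature=0 calls are scattered across utility scripts, a one-line filter at your request boundary is the cheapest migration. A minimal sketch (the function name is mine, not from any SDK):

```python
# Parameters that 4.7 rejects with a 400 when set to non-default values.
REMOVED_PARAMS = ("temperature", "top_p", "top_k")

def migrate_request_kwargs(kwargs: dict) -> dict:
    """Return a copy of the request kwargs with removed sampling
    parameters stripped out. The original dict is left untouched."""
    return {k: v for k, v in kwargs.items() if k not in REMOVED_PARAMS}

old = {"model": "claude-opus-4-7", "temperature": 0.7,
       "top_p": 0.9, "max_tokens": 4096}
print(migrate_request_kwargs(old))
# {'model': 'claude-opus-4-7', 'max_tokens': 4096}
```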
The tokenizer also changed. 4.7 uses a new tokenizer that produces anywhere from the same count up to about 1.35x the tokens on the same text. Code heavy prompts hit the high end of that range, plain English stays near the low end. Two concrete consequences. First, your existing max_tokens values may now cut off responses earlier than they used to, so add headroom. Second, your compaction triggers (if you use them) fire earlier than expected. Use the /v1/messages/count_tokens endpoint to get accurate 4.7 counts before shipping.
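Until you have real count_tokens numbers for your prompts, scaling old max_tokens values by the worst-case growth factor is a safe interim default. A sketch under that assumption:

```python
import math

# Worst-case growth from the 1.0x-1.35x tokenizer range above; swap in
# a measured ratio from /v1/messages/count_tokens once you have one.
TOKENIZER_GROWTH = 1.35

def with_headroom(old_max_tokens: int, growth: float = TOKENIZER_GROWTH) -> int:
    """Scale a 4.6-era max_tokens so 4.7's denser tokenization does
    not truncate responses that used to fit."""
    return math.ceil(old_max_tokens * growth)

print(with_headroom(4096))  # 5530
```

The same factor applies to compaction thresholds: divide them by the growth ratio rather than multiplying, since they fire when a count is exceeded.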
Behavior Shifts You Will Notice
Not breaking, but real. 4.7 behaves differently from 4.6 even on prompts that stay identical.
More literal instruction following. 4.7 does exactly what the prompt asks and does not generalize beyond it. 4.6 would often infer requests you did not make (helpful sometimes, annoying others). If your prompts leaned on the model filling in obvious gaps, you need to be more explicit on 4.7.
Responses scale to complexity. 4.6 had a fairly fixed verbosity. 4.7 matches response length to how complex the task actually is. Simple questions get shorter answers, complex analysis gets longer responses. Net effect on my workloads is about 15 percent fewer output tokens on identical prompts, which offsets some of the tokenizer increase on input.
Fewer tool calls. 4.7 reasons more and calls tools less by default. If your agent depends on heavy tool use, raise the effort level. On 4.6 at high effort, my test agent made 11 tool calls to complete a research task. On 4.7 at high effort, it made 7 and finished with the same output.
Fewer subagents. Multi agent systems that relied on 4.6 spawning helper agents see fewer by default on 4.7. You can steer this through prompting if you want parallelism back.
Tone is more direct. 4.7 uses less validation language, fewer emojis, more opinions. If your prompt scaffolding was built to counteract 4.6's warmth, you can probably strip that.
Cybersecurity safeguards. 4.7 actively screens for prohibited security requests. Legitimate security researchers can apply to Anthropic's Cyber Verification Program for expanded access.
Task budgets are also new in 4.7 (beta). You can set a total token count for an entire agentic loop and the model sees a running countdown as it works. Minimum 20,000 tokens, advisory not hard cap, only available through the beta header task-budgets-2026-03-13. 4.6 never had this.
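A sketch of what a task-budget request might look like, using the beta header name above. The payload keys and the assumption that the beta flag rides on the standard anthropic-beta header are illustrative, not a documented schema:

```python
MIN_TASK_BUDGET = 20_000  # documented minimum for the beta

def task_budget_request(budget_tokens: int) -> dict:
    """Build a hypothetical task-budget request: headers plus a body
    fragment carrying the advisory budget for the whole agentic loop."""
    if budget_tokens < MIN_TASK_BUDGET:
        raise ValueError("task budgets have a 20,000 token minimum")
    return {
        "headers": {"anthropic-beta": "task-budgets-2026-03-13"},
        "body": {"model": "claude-opus-4-7", "task_budget": budget_tokens},
    }

print(task_budget_request(50_000)["headers"])
```

Remember the budget is advisory, not a hard cap: the model sees the countdown and paces itself, but you still need max_tokens and your own loop limits as the actual backstop.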
Bottom Line
Opus 4.7 is not a drop in replacement for 4.6. Three parameter shapes (temperature, top_p, extended thinking budgets) return errors the moment you change the model ID without updating the config. The tokenizer counts differently. The effort parameter grew a new slot. Vision grew 3x. Thinking content hides by default.
On the upside, pricing is identical, context stays at 1 million tokens, availability is the same across all four major clouds, and the model is smarter in almost every dimension that matters for production work. My migration from 4.6 to 4.7 on the raxxo.shop stack took about 40 minutes, most of which was chasing stale temperature=0 calls in old utility scripts.
If you are running any production code on claude-opus-4-6, do the migration now. Read the three breaking changes, update your configs, run your prompt regression suite, switch the model ID. The longer you wait, the more stale code accumulates that references parameters that no longer exist. And the retirement clock is already running on 4.6 whether you like it or not.