Claude Opus 4.7 vs 4.6

#claude #claudeopus #ai #developertools

The price page hasn't moved. Five dollars per million input. Twenty-five per million output. Identical to Opus 4.6.

Run the same Claude Code call on both, though, and 4.7 hands you a bigger bill. A representative agent call with 25K of context and 4K of output went from $0.225 on 4.6 to $0.2805 on 4.7. That's a 24.7 percent jump for the exact same job, the exact same prompt, the exact same code shipped.

Nothing in the rate card explains it. The new tokenizer does. And one day before this writeup, Claude Code v2.1.142 made 4.7 the fast-mode default. So your next session is paying the new rate unless you go opt back.

The Tokenizer Is the Story

Simon Willison ran one system prompt through Anthropic's own count_tokens API on both models. Opus 4.6 returned 5,039 tokens. Opus 4.7 returned 7,335. That's a 1.46x ratio on plain English text, and it sits eleven points above the 1.0 to 1.35x ceiling Anthropic published themselves.

OpenRouter then ran the comparison across one million real production requests. Their finding: 12 to 27 percent more billed tokens on typical 2K to 25K prompts after prompt caching. Without caching, 32 to 45 percent more.

Same English in. More tokens counted. Same per-token rate. Bigger bill out.

What the Worked Math Looks Like

A standard Claude Code build call. The agent reads roughly 25K tokens of context (your repo plus a spec) and writes roughly 4K tokens of code.

On Opus 4.6:

input:  25,000 tokens × $5/1M  = $0.125
output:  4,000 tokens × $25/1M = $0.100
total                          = $0.225

On Opus 4.7, the same English source gets recounted by the new tokenizer. OpenRouter measured native input inflation at 1.34x for 25K to 50K prompts, and 13 to 30 percent longer completions at long context. Apply both:

input:  25,000 × 1.34 = 33,500 tokens × $5/1M  = $0.1675
output:  4,000 × 1.13 =  4,520 tokens × $25/1M = $0.1130
total                                          = $0.2805

A 24.7 percent effective increase. Nothing in the menu warned you. The tokenizer ate your margin.

Run that across a working day. A solo founder pushing 200 agent calls daily moves from $45 to $56. About $330 a month in spend you didn't opt into.

Two cases flip. Long-context calls at 128K and up get prompt caching to absorb roughly 93 percent of the inflation, so the gap shrinks to about 15 percent. Short calls under 2K tokens actually get cheaper by 1.6 percent, because completions on 4.7 are about 62 percent shorter at that size.

The mid range, where most agent calls sit, is where it hurts.

The kicker: prompt caching has to be on for the 12 to 27 percent figure to hold. Without it you're staring at 32 to 45 percent more billed tokens for the same English text. A lot of indie setups don't have caching wired up because the docs treat it as a power-user feature. On 4.7 it's not optional anymore. It's the difference between a quiet price hike and a loud one.

The Default Flip Nobody Warned You About

Claude Code v2.1.142 shipped on 2026-05-14. The changelog buries the change two bullets deep: fast mode now runs on Opus 4.7 by default. Every /fast you typed yesterday on 4.6 fires on 4.7 today.

Fast mode is six times the per-token price of standard mode. Multiply that by the new tokenizer ratio on long English prompts and a fast call now bills closer to seven and a half times the cost of standard 4.6, depending on prompt size.

You can pin it back. Anthropic shipped a new env var alongside the default flip:

# Pin Claude Code fast mode to Opus 4.6 (not the new 4.7 default).
# Active as of Claude Code v2.1.142 (2026-05-14).
export CLAUDE_CODE_OPUS_4_6_FAST_MODE_OVERRIDE=1

# Verify:
claude --version   # expect 2.1.142 or later
claude /fast       # confirm fast mode is on
# Fast mode now runs on Opus 4.6 regardless of any other env var.

A few notes on the flag. It takes precedence over CLAUDE_CODE_ENABLE_OPUS_4_7_FAST_MODE if both are set. Drop it in ~/.zshrc for global stickiness, or your project .env for per-repo control. Confirm it still ships in your version with:

claude --help | grep OPUS_4_6

Anthropic typically keeps these flags around for six to twelve months after a default flip.

Where 4.7 Earns Its Keep

The benchmark gains are real, and they cluster.

SWE-bench Pro climbed +10.9 points (53.4 to 64.3). That's the largest jump on the table for hard, multi-file agent coding work. CharXiv (no tools) gained +13.0 points (69.1 to 82.1), so chart and figure parsing got a real upgrade. SWE-bench Verified, Terminal-Bench 2.0, GPQA Diamond, OSWorld-Verified all moved up too.

One regression stands out. BrowseComp, the agentic web research benchmark, dropped from 83.7 to 79.3 percent. That's 4.4 points worse on 4.7 than on 4.6.

Stay on 4.7 when the call profile matches at least one of these:

Hard SWE work where the +10.9 SWE-bench Pro lift compounds across long tasks
High-resolution images, charts, or PDFs where +13.0 CharXiv shows up in your output
Prompts at 128K and up where caching captures most of the tokenizer inflation
Plan-then-execute agents where one stronger plan saves five weaker build cycles

When to Pin Back to 4.6

Three workloads regress or break even on 4.7 once you account for cost. Override fast mode back to 4.6 when:

Prompts sit between 2K and 25K and you don't use prompt caching. This is the worst tokenizer cost zone.
The job involves agentic web research. BrowseComp dropped 4.4 points.
You're on a Claude Max plan and your session caps now burn down 1.3 to 3 times faster than they did last month.

Evaluator and judgment work doesn't need the SWE-bench Pro lift. Linters and doc writers don't gain from CharXiv. Those agents produce identical output on 4.6 at a real discount.

Per-Agent Model Selection Is the Real Move

Most teams pick one model per project. The 4.7 cost shift makes per-agent selection the better default.

A planner reads a spec, decides the file layout, picks the build order. SWE-bench Pro +10.9 lands here. Keep planners on 4.7.

A backend agent or designer writes code. Same SWE-bench Pro lift. Keep them on 4.7.

An evaluator reads code and finds gaps. Judgment work. Pin to 4.6 with the override flag and pocket the difference.

A linter, a formatter, a doc writer. Mechanical work. Pin to 4.6.

A test agent runs the test plan and reports back. Mechanical too. Pin to 4.6.

Five agents pinned to 4.6. Three on 4.7. The build still gets the planner and builder lifts where they matter. The cheap loops stay cheap.

The Verdict

Same rate card. Different bill.

Opus 4.7 prints better numbers on the benchmarks that matter for hard agent coding. It costs more for the same English prompt almost everywhere except the very short tail. And the default fast-mode flip means the new bill is already running on your laptop right now if you haven't set the override.

Pin 4.6 for evaluators, linters, and doc writers. Keep 4.7 for planners and SWE-heavy builders. Set the override flag once, forget about it, and let each agent role buy what it actually needs.

Full benchmark table, sources, and the override config: https://www.buildthisnow.com/blog/models/claude-opus-4-7-vs-4-6