Vikas Sabbi

The smarter the model, the more it saves.

There is a belief, reasonable on its face, that has been quietly circulating among Claude Code power users since WOZCODE launched: that the plugin's advantage would fade as the underlying models got smarter. The argument goes something like this — if Claude can think more clearly, plan more efficiently, and make fewer mistakes on its own, why would it need a layer of optimized tooling on top? Intelligence, the thinking goes, is the best efficiency.
We believed this too. Every Opus release has been a stress test for that assumption. When Opus 4.7 dropped, we ran our benchmark again expecting the gap to narrow. It did not.

What we actually measured
The setup was deliberately boring. Same TypeScript codebase we have used for months. Same fifteen prompts covering the kind of work developers do every day — fixing a 500 error, splitting an oversized service class, adding JWT typing, wiring up Jest, and similar tasks. The only two variables were the model (Opus 4.6 or 4.7) and whether WOZCODE was installed. Everything else, including Claude Code's default settings, was left exactly as Anthropic ships it.
The results across all four combinations came back clean:
Opus 4.6 on vanilla Claude Code cost $11.62 per run. With WOZCODE installed, that same suite cost $6.88 — a 41% reduction. Those numbers were already enough to justify the plugin, but they were not the surprise.
The surprise was Opus 4.7.
Vanilla Claude Code on the new model cost $20.92 for the same suite. That is an 80% increase over Opus 4.6 — nearly double, on prompts that had not changed. With WOZCODE installed, Opus 4.7 cost $7.73. A 12% increase over 4.6, and a 63% reduction against vanilla 4.7 on the same run.
The dollar gap between vanilla and WOZCODE grew from $4.74 to $13.19 per run. Not narrowed. Grew.
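Every percentage above falls straight out of the four per-run totals. A quick sketch of the arithmetic, with the dollar figures copied from the benchmark (the variable names are mine):

```python
# Per-run benchmark totals from the post (USD).
costs = {
    ("4.6", "vanilla"): 11.62,
    ("4.6", "wozcode"): 6.88,
    ("4.7", "vanilla"): 20.92,
    ("4.7", "wozcode"): 7.73,
}

def pct_change(old, new):
    """Percent change from old to new, rounded to the nearest whole percent."""
    return round((new - old) / old * 100)

# Savings from the plugin on each model.
savings_46 = pct_change(costs[("4.6", "vanilla")], costs[("4.6", "wozcode")])  # -41
savings_47 = pct_change(costs[("4.7", "vanilla")], costs[("4.7", "wozcode")])  # -63

# Cost of the model upgrade under each configuration.
upgrade_vanilla = pct_change(costs[("4.6", "vanilla")], costs[("4.7", "vanilla")])  # +80
upgrade_wozcode = pct_change(costs[("4.6", "wozcode")], costs[("4.7", "wozcode")])  # +12

# Dollar gap between vanilla and WOZCODE per run, on each model.
gap_46 = round(costs[("4.6", "vanilla")] - costs[("4.6", "wozcode")], 2)  # 4.74
gap_47 = round(costs[("4.7", "vanilla")] - costs[("4.7", "wozcode")], 2)  # 13.19
```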

Speed followed the same pattern
Cost was not the only thing that moved. Wall-clock time for vanilla Claude Code on Opus 4.6 was 28 minutes and 31 seconds. On Opus 4.7, that crept up to 35 minutes and 2 seconds — nearly seven minutes longer for the same output. WOZCODE on Opus 4.7 finished in 26 minutes and 21 seconds, faster than vanilla on the older model.
The turn count tells the story more starkly. Vanilla Claude Code needed 161 tool calls to clear the benchmark suite on Opus 4.7. WOZCODE needed 52. Same tasks, same model, less than a third of the round trips.

Why a better model amplifies better tools
The conventional wisdom had the relationship backwards. A smarter model does not make tooling irrelevant. It makes good tooling more valuable, because it can actually use it.
WOZCODE replaces several of Claude Code's built-in tools with versions designed around a different philosophy: do more meaningful work per call rather than more calls. Its combined search and read collapses what would otherwise be a grep followed by multiple file reads into a single operation. Its batched editor applies changes across an entire codebase in one call instead of file by file. Its AST-aware truncation returns function signatures during exploration and full file bodies only when the model has narrowed down where it needs to be. Its live SQL tool runs queries directly against connected databases, replacing what vanilla Claude Code handles through a Bash subprocess and several rounds of context-heavy output parsing.
None of those optimizations are useful if the model using them cannot plan. Batched edits only save turns when the model can reason about ten changes at once before making any of them. Combined search and read only helps when the model arrives at a call already knowing what it is looking for. Smarter, more deliberate planning is exactly what separates Opus 4.7 from its predecessor — and that is precisely what WOZCODE's tooling is designed to reward.
Vanilla Claude Code's primitives do not offer that surface. There is no "plan ten changes and execute them together" call to plan into. A smarter model still has to reach for the same per-file, per-operation interface, so its gains show up as denser, more expensive individual turns rather than fewer total calls. Combined with Anthropic's new xhigh effort default and a tokenizer update that increases token counts on the same inputs, that is where the 80% cost increase comes from. The intelligence is real. The tooling just cannot capture it.
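The post does not show WOZCODE's internals, so purely as an illustration of the batched-versus-per-file distinction, here is a hypothetical sketch in which every name and interface is invented for the example. The point it demonstrates is narrow: when the model can plan a list of edits before executing any of them, the number of tool calls stops scaling with the number of files touched.

```python
from dataclasses import dataclass

@dataclass
class Edit:
    """One planned change: a file path plus the text to find and replace."""
    path: str
    find: str
    replace: str

# Per-file interface: each edit costs its own round trip to the tool layer.
def apply_one(edit: Edit, files: dict[str, str]) -> int:
    files[edit.path] = files[edit.path].replace(edit.find, edit.replace)
    return 1  # one tool call consumed

# Batched interface: the model plans everything first, then spends one call.
def apply_batch(edits: list[Edit], files: dict[str, str]) -> int:
    for e in edits:
        files[e.path] = files[e.path].replace(e.find, e.replace)
    return 1  # still one tool call, regardless of how many edits

files = {"a.ts": "const x = 1;", "b.ts": "const x = 1;"}
plan = [Edit("a.ts", "x", "y"), Edit("b.ts", "x", "y")]

# Two files touched: two calls per-file, one call batched.
calls_per_file = sum(apply_one(e, dict(files)) for e in plan)
calls_batched = apply_batch(plan, files)
```

With ten planned changes the gap becomes ten calls versus one, which is the shape of the 161-versus-52 result above.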

What Anthropic said and what we measured
Anthropic noted at the Opus 4.7 launch that real-world spend would increase beyond the per-token price due to the tokenizer change and higher default effort. Their estimate was 20 to 30 percent over Opus 4.6 at max effort. Our measured number on vanilla Claude Code at leave-defaults — the configuration most users will run without touching a setting — was 80 percent. Some of that is workload-specific; prompts involving cross-file reasoning spend more on output tokens when the model thinks harder. But the direction is clear, and the magnitude for typical development workloads will sit well above the headline estimate.

The trajectory, not just the number
The more interesting question is not what happened with Opus 4.7 specifically, but what this result suggests about Opus 4.8, 5.0, and whatever comes after.
The savings gap widened by 22 percentage points moving from 4.6 to 4.7. If the underlying mechanism holds — better planners extract more value from tools designed for planning — then each new model release widens the gap rather than closing it. WOZCODE's tooling does not change shape between model versions. Its advantage compounds because the thing that makes its tools effective, the model's ability to plan into them, keeps improving while vanilla Claude Code's interface stays the same.
For teams on Claude's flat subscription plans, the dollar figure does not change when they upgrade to 4.7. But usage caps fill faster when vanilla Claude Code burns through 161 turns on a task that WOZCODE completes in 52. The effective capacity of a Max plan is roughly three times larger with WOZCODE installed, assuming similar task distribution.
For teams on API billing or pass-through pricing, the arithmetic is straightforward. Upgrading from Opus 4.6 to 4.7 adds about 85 cents per benchmark run with WOZCODE installed. Without it, the same upgrade adds over nine dollars per run. Installing the plugin and upgrading the model in the same week leaves you strictly ahead across every dimension that matters — cost, speed, and turns consumed.

Installing WOZCODE
Two terminal commands. No signup or account needed, no code leaves your machine, and the free plan includes $100 per month in Claude Code savings.
```shell
claude plugin marketplace add WithWoz/wozcode-plugin
claude plugin install woz@wozcode-marketplace
```
Opus 4.7 is the best model Anthropic has shipped. It thinks harder, plans better, and produces meaningfully stronger results. The tools it uses should be able to keep up.

Benchmark methodology: identical TypeScript codebase, leave-defaults preset, both runs completed April 28, 2026. Leave-defaults means the benchmark runner does not override Claude Code's effort or thinking settings, so each model runs with whatever Claude Code ships as its default configuration. Per-prompt breakdowns and raw run logs are available on request.
