Anthropic shipped Opus 4.8 on May 28. The benchmark tables got posted as usual: it beats the previous model on SWE-Bench Pro, and so on. Everyone screenshots those.
I got stuck on the small thing sitting next to the benchmarks: the effort dial.
And this post isn't an announcement, it's a question. Hence #discuss.
First, let's be clear about what changed
The concrete things that shipped with Opus 4.8:
- Effort control moved into the UI. It used to be an API parameter — a flag only developers touched. Now it sits next to the model selector in claude.ai and Cowork. Levels: low, medium, high, xhigh (that's the name in Claude Code; "extra" in claude.ai), max.
- The default changed. 4.7 defaulted to xhigh. 4.8 defaults to high. And Anthropic says: on coding tasks, 4.8 at high spends roughly the same tokens as 4.7's default — but scores better on every coding benchmark. So it's a "think less, ship better" kind of tuning.
- There's a claim on the honesty side. Per Anthropic's own evals, 4.8 is around four times less likely to let a flaw in code it wrote pass unremarked. Early testers also report it flags its own uncertainties more often and makes fewer unsupported claims.
- Dynamic workflows (research preview). Claude Code can now run hundreds of parallel subagents in a single session; it's said to handle codebase-scale migrations across hundreds of thousands of lines from kickoff to merge, with the existing test suite as its bar. (Enterprise/Team/Max plans.)
- Fast mode runs at 2.5x speed, and costs a third of what it did on previous models.
Pricing is flat vs 4.7.
I saw the honesty behavior first-hand: the model confidently reported a specific violation with a line number, then pushed back on itself, re-ran the actual check, and corrected the made-up line numbers in the same turn. That matters more to me than any benchmark bar.
OK. Those are the features. Now the actual point.
The real question: does this change the routine?
The claim is this: you can't optimize tokens, speed, and quality at the same time. It's a triangle. Pull one corner and another stretches.
What Opus 4.8 does is take the choice on that triangle away from the model and hand it to you. Adaptive thinking said "I'll decide how much to think." The effort dial says "no, you tell me."
And here I'm split in two.
On one hand: this is a genuine workflow change. There's now a tiny decision before every task — "does this want high, or xhigh?" Cranking max on a hard refactor you leave running async is a different world from blowing through a typo fix on low. For someone who uses it with discipline, that's a real difference, from rate-limit management to output quality.
On the other hand: how many people will actually touch the dial? Most will leave it on default. And honestly — one tester warned that maxing out the highest settings on dynamic, looping tasks makes token burn "staggering." So used wrong, this isn't a feature, it's a trap.
There's also this: effort won't rescue a bad prompt. A vague instruction at max effort gets you a very expensive, very thorough answer to the wrong question. So maybe the thing that changed isn't the model — it's the discipline expected from us.
What I'm putting on the table
What I want to see in the comments:
- Will you actually use the effort dial, or leave it on default? Be honest. Because the gap between "feature exists" and "feature gets used" is enormous in this field.
- Which task gets which level? What's your practical rule? I'm in the "daily = high, long async = xhigh, max almost never" camp. You?
- The honesty claim: if the model learned to say "I'm not sure," is that real trust, or just more convincing uncertainty? Will you review its code less now, or more?
- The big picture: we're shifting from making the model smarter to letting the user dial the model's effort. Is that the right direction? Or is it a polished name for offloading complexity onto the end user?
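One way to pin down a rule like "daily = high, long async = xhigh, max almost never" is to write it as a tiny function. This is a minimal Python sketch, not anything Anthropic ships: `Task`, its fields, and `pick_effort` are hypothetical names made up for illustration. It only returns a level name and does not call any model or API.

```python
from dataclasses import dataclass

# Effort levels as named in this post, lowest to highest.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


@dataclass
class Task:
    """Hypothetical task description; the fields are illustrative only."""
    description: str
    trivial: bool = False     # e.g. a typo fix
    runs_async: bool = False  # a long job left running unattended
    allow_max: bool = False   # explicit opt-in for the top setting


def pick_effort(task: Task) -> str:
    """Map a task to an effort level: daily = high, long async = xhigh."""
    if task.allow_max:
        # "max almost never": only when someone deliberately asks for it.
        return "max"
    if task.trivial:
        return "low"
    if task.runs_async:
        return "xhigh"
    return "high"


if __name__ == "__main__":
    print(pick_effort(Task("fix typo in README", trivial=True)))      # low
    print(pick_effort(Task("review a small PR")))                     # high
    print(pick_effort(Task("overnight refactor", runs_async=True)))   # xhigh
```

The point of the sketch is that `max` is never reached by default; it has to be requested. If your rule differs, the branches are where the disagreement lives.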
I genuinely haven't decided. So I'm leaving it here.
The effort dial: game changer, or a slider bolted onto something we were already doing?
Comments open. Feel free to be blunt.

