DEV Community

RAXXO Studios
RAXXO Studios

Posted on • Originally published at raxxo.shop

Opus 4.8 vs Sonnet vs Haiku: How I Route Work in 2026

  • Flat 5/25 price, a 3x cheaper Fast mode, and abstention reset the routing math in 2026

  • I reach for Opus 4.8 when a silent wrong answer is expensive, citing SWE-Bench Pro 69.2

  • Sonnet wins on well-scoped, high-volume, latency-sensitive jobs at mid relative cost

  • Haiku takes the cheapest, fastest work like classification and rough drafts

  • Fast mode runs about 2.5x faster and effort scales from low to max

For two years the rule was simple. Reach for the cheap tier by default, escalate to the top tier only when the cheap one failed. In 2026 I flipped that for hard work. Here is the routing logic I actually run, and the three changes in Opus 4.8 that moved the line.

What changed the routing math in 2026

The old math was loss-driven. Top-tier intelligence cost a premium, so I rationed it. I would start small, watch the cheap model stumble on a multi-file refactor, then pay the upgrade tax and redo the prompt. That is two passes for one job.

Three things shifted this year. First, the list price on the top tier held flat. Opus 4.8 is still 5 per million tokens in, 25 per million out, same as 4.7. No long-context surcharge either, even at the full 1M window. So the per-task ceiling stopped climbing while the capability kept rising.

Second, the new Fast mode runs about 2.5x faster and costs roughly a third of the old fast-tier price. That is close to 3x cheaper for the latency-sensitive slice of my work. Speed used to be a reason to drop down a tier. Now I can keep the heavier model and just flip Fast mode on.

Third, and this is the one that actually changed my defaults, abstention. Opus 4.8 is about 4x less likely to let its own code bugs pass unremarked than prior models, and it says "I am not sure" instead of confidently guessing. Reviewer Simon Willison called it honesty by abstention. That lowers the cost of being wrong, because the model flags the risky spot instead of burying it.

Put those together and the expensive failure mode (a confident wrong answer I ship and debug later) gets rarer at the top tier, while the price of using it did not move. For the full spec breakdown of what shipped, see the Opus 4.8 launch roundup. Here I am only interested in how it changes which model I pick.

When I reach for Opus 4.8

My trigger is one question. How expensive is a silent wrong answer here? If the cost is high, Opus 4.8, every time.

Hard agentic coding is the clearest case. Anything that spans many files, holds a lot of context, and has to keep a plan straight across dozens of tool calls. The benchmark that maps to this for me is SWE-Bench Pro, where Opus 4.8 scores 69.2, up from 64.3 on 4.7. On that same test it beats GPT-5.5 at 58.6 and Gemini 3.1 Pro at 54.2. SWE-Bench Verified lands at 88.6. Those numbers track what I feel on real refactors across my repos.

The deeper reason is abstention. On a 15-file migration, the worst outcome is not a model that gets stuck. It is a model that quietly invents a function signature, ships it green, and hands me a bug that surfaces three commits later. Opus 4.8 is 4x less likely to do that. It stops and tells me the part it cannot verify. On work where I am the only reviewer, that flag is worth more than a small speed gain.

I also default here for anything that reasons across my whole 15-repo setup, which I described in running a studio from one CLAUDE.md file. One coherent plan across a large context is exactly where the top tier earns the price.

To be fair, it is not a clean sweep. On Terminal-Bench 2.1 it loses to GPT-5.5, 74.6 to 78.2, and GPQA is a slight step back versus 4.7 (93.6 vs 94.2). I keep those in view. But for multi-file reasoning where a quiet mistake is costly, 4.8 is my default, not my escalation. This is the same logic I worked through in the six workflows that justify the Opus tier.

When Sonnet is the right call

Sonnet is my workhorse for everything that is well-scoped. If I can hand the model a clear spec, a single file, and a tight definition of done, the extra reasoning of the top tier is wasted budget. Sonnet sits in the middle on relative cost (cheaper than Opus, pricier than Haiku) and that middle is where most of my volume lives.

The pattern I look for: the task is bounded and I already know roughly what good output looks like. Writing a component against an existing design system. Adding a test to a function I just wrote. Translating a tidy spec into code with no ambiguity to resolve. There is no hidden multi-step plan that can drift, so abstention matters less and I would rather have the lower per-task spend.

Latency-sensitive loops are the other Sonnet zone. Anything in a tight write-run-read cycle where I am iterating fast and each round needs to come back quickly. The thinking budget of Opus is overhead I feel as waiting. Sonnet keeps the loop snappy.

I also lean Sonnet for high-volume batch work where the per-call spend compounds. Twenty similar edits across a codebase, each one simple. The savings stack up across the batch, and because each task is small the risk of a costly silent error is low to begin with. That is the whole point of Anthropic's advisor strategy: push capable reasoning down the price ladder so the mid tier handles more of the day.

The honest test: if I would not bother reviewing the output line by line because the task is too contained to go wrong, Sonnet is correct. The moment I catch myself wanting to double-check every line, that is the signal to move up.

When Haiku wins

Haiku is the cheapest and fastest tier, and that is exactly its job. I do not ask it to reason across files or hold a long plan. I ask it to do one small thing, fast, thousands of times.

Classification is the home run. Tagging articles by topic. Sorting incoming items into buckets. Scoring something against a rubric. The task is narrow, the output is short, and correctness is easy to spot-check in aggregate. Paying top-tier rates here would be like hiring a senior engineer to alphabetize a list.

Rough drafts are the other Haiku lane. First-pass copy I am going to rewrite anyway. Boilerplate I will edit by hand. A quick summary I just need to skim. When the output is a starting point, not a finished artifact, speed beats polish and the cheapest tier wins on volume.

The trap I avoid is letting Haiku creep into work that needs judgment. The exact moment it stops being right is when one wrong answer matters. Haiku does not get the 4x abstention benefit of Opus 4.8, so on high-stakes single calls it can guess where the top tier would have flagged uncertainty. For a batch of 5,000 tags, a few misses wash out. For one code change that ships to production, a single confident miss is the whole problem. So the routing rule is volume and disposability, not just price. If the work is cheap to verify and cheap to redo, Haiku. If a single output carries weight, move up.

The two dials that change everything

Picking the model is half the decision. The two settings on top of it matter just as much, and people leave them on defaults.

Dial one is Fast mode. It runs about 2.5x faster and costs roughly a third of the old fast-tier price, close to 3x cheaper than before. It is available on Opus 4.8, 4.7, and 4.6. This is what lets me keep a heavy model on latency-sensitive work instead of dropping a tier for speed. My habit now: pick the model for the difficulty, then flip Fast mode on whenever I am in an interactive loop and waiting on each turn.

Dial two is effort. The levels run low, medium, high, xhigh, and max. They trade thinking time against speed and spend. For Opus 4.8 the practical default lands around high. For hard agentic coding, the kind that spans many files, many people start at xhigh and let it reason longer before acting. I keep this light: more effort for genuinely hard planning, less for tasks where the answer is mostly lookup.

Here is the table I route by.

Task type
Model
Fast / Effort

Hard multi-file refactor, agentic coding
Opus 4.8
Fast off, xhigh effort

Architecture or planning across many repos
Opus 4.8
Fast off, high effort

Interactive coding loop, fast iteration
Opus 4.8
Fast on, high effort

Well-scoped single-file code, tests
Sonnet
Fast on, medium effort

High-volume batch edits, latency-sensitive
Sonnet
Fast on, low/medium effort

Classification, tagging, rough drafts
Haiku
Fast on, low effort

The research-preview Dynamic Workflows feature (fanning out hundreds of parallel subagents) sits on top of all this for Max, Team, and Enterprise plans, but that is its own topic and the roundup covers it.

Bottom Line

My rule of thumb fits one line: route by how expensive a silent wrong answer is, not by how cheap the model is. When a quiet mistake is costly (hard multi-file coding, anything I am the only reviewer for), Opus 4.8 is now my default, because abstention lowers the cost of being wrong while the flat 5/25 price held the ceiling steady. When the work is well-scoped, Sonnet. When it is cheap, disposable, and high-volume, Haiku. Then set the two dials: Fast mode for any loop where I am waiting, more effort for genuinely hard planning. That is the whole framework. The top tier stopped being a luxury I rationed and became a sensible default for hard work, which is a real shift from where I sat a year ago. If you want to see how I apply this routing across real client and product builds, here is what my one-person studio actually ships.

Top comments (0)