Claude Sonnet 5's New Tokenizer: 41% More Tokens per Prompt

#claude #caching #devops #ai

claude-sonnet-5 is live on the Synthorai gateway, and right now it is cheap: $2 / $10 per million input / output tokens, which is 2.5× under Opus 4.8 and below Sonnet 4.6. Enjoy it while it lasts. That is introductory pricing through August 31, 2026; on September 1 the rate returns to $3 / $15, the same sticker as Sonnet 4.6.

If you cache against the Claude line, the caching and TTL contract is a drop-in carry-over. Cost is where you have to look twice, and the reason is how Sonnet 5 counts tokens. It ships with a new tokenizer that turns the same English text into about 41% more input tokens than Sonnet 4.6, and token count is what you are billed on and rate-limited by. The sticker price is only half the bill.

Here is what that token change touches, before any code change or quality question enters the picture:

Cost per prompt. At the standard rate, the same English prompt costs about 41% more than on Sonnet 4.6, since identical text is billed as more tokens at the same per-token price.
Every token-based estimate. A per-call budget, or a local-tokenizer count, sized against 4.6 undercounts on Sonnet 5: real usage runs about 41% above the old estimate. Meter the live usage, not a local guess.
Context-window headroom. The same document eats about 41% more of the window, so long-context and RAG calls fit less real text per request.
Rate limits. A tokens-per-minute cap drains about 41% faster for the same workload, trimming throughput.
Cache eligibility (a small upside). The 1,024-token minimum is easier to clear, so a prefix that sat just under it on 4.6 may become cacheable on Sonnet 5.
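
The estimate, headroom, and rate-limit effects above all reduce to one ratio. A minimal sketch, assuming the counts measured in this post (1,594 tokens on Sonnet 4.6 vs 2,245 on Sonnet 5 for the same text) carry over to your prompts, which they may not; the function names and the example limits are hypothetical.

```python
# Ratio measured in this post for one English prefix; re-measure on your own text.
TOKENS_SONNET_46 = 1_594
TOKENS_SONNET_5 = 2_245
INFLATION = TOKENS_SONNET_5 / TOKENS_SONNET_46  # about 1.41


def rescale_estimate(tokens_on_46: int) -> int:
    """Project a token count sized against Sonnet 4.6 onto Sonnet 5."""
    return round(tokens_on_46 * INFLATION)


def effective_capacity(limit_tokens: int) -> int:
    """Express a Sonnet 5 token limit (window or TPM cap) in 4.6-sized tokens."""
    return int(limit_tokens / INFLATION)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(rescale_estimate(10_000))      # a 10K-token estimate made on 4.6
    print(effective_capacity(200_000))   # a 200K limit, in 4.6-sized tokens
```

Treat the ratio as a planning figure only: it comes from a single English prefix, and code or non-English text may inflate differently.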

The rest of the post puts measured numbers on each: price, the caching economics, and the token-count shift.

Prices, caching, TTL, and token counts measured against https://synthorai.io/ (Anthropic-native /v1/messages) on 2026-07-01. Per-token prices are derived from the usage cost on live calls; the intro/standard rates and the August 31 expiry are from Anthropic's announcement. Reproduce against your own prompt before quoting.

Availability

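import os
from anthropic import Anthropic

# Placeholders so the snippet runs; a prefix this short is below the cacheable
# minimum, so substitute your real, stable system prompt.
SYSTEM_PROMPT = "You are a helpful assistant."
question = "What changed in this release?"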

anth = Anthropic(
    api_key=os.environ["SYNTHORAI_KEY"],
    base_url="https://synthorai.io/",   # SDK appends /v1/messages
)

msg = anth.messages.create(
    model="claude-sonnet-5",            # the only line that changes
    max_tokens=512,
    system=[
        {"type": "text", "text": SYSTEM_PROMPT,
         "cache_control": {"type": "ephemeral"}},
    ],
    messages=[{"role": "user", "content": question}],
)
print(msg.usage)   # cache_creation_input_tokens, cache_read_input_tokens, cost

Swap the model field and nothing in your caching path moves. The mechanics behind cache_control are in the caching tutorial; the architecture of why the cache exists is in Part 1 of the series.
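
To tell at a glance whether a call wrote or read the cache, it can help to fold the usage counters into one summary. A sketch, assuming the standard Anthropic usage fields (input_tokens, cache_creation_input_tokens, cache_read_input_tokens, output_tokens); it takes a plain dict so it runs without a network call, and summarize_usage is a hypothetical helper, not part of the SDK.

```python
def summarize_usage(usage: dict) -> dict:
    """Fold Anthropic-style usage counters into totals and a cache-hit share."""
    fresh = usage.get("input_tokens") or 0
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    total_in = fresh + written + read
    return {
        "total_input_tokens": total_in,
        "cache_state": "read" if read else ("write" if written else "none"),
        "cached_share": read / total_in if total_in else 0.0,
        "output_tokens": usage.get("output_tokens") or 0,
    }


if __name__ == "__main__":
    # Made-up counters shaped like a warm turn; not measured values.
    warm = {"input_tokens": 20, "cache_creation_input_tokens": 0,
            "cache_read_input_tokens": 2000, "output_tokens": 100}
    print(summarize_usage(warm))
```

How you turn msg.usage into a dict depends on your SDK version; this sketch leaves that step to the caller.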

Price: cheap now, back to Sonnet 4.6's rate in September

Per-token pricing on the gateway, derived from the usage cost on plain (uncached) calls:

Model	Input ($/M)	Output ($/M)
`claude-sonnet-5` (intro, through Aug 31)	$2.00	$10.00
`claude-sonnet-5` (standard, from Sep 1)	$3.00	$15.00
`claude-sonnet-4-6`	$3.00	$15.00
`claude-opus-4-8`	$5.00	$25.00

The intro rate is a real discount, and against Opus 4.8 it is the durable part of the story: even at the standard $3 / $15, Sonnet 5 stays cheaper than Opus, and the two share a tokenizer (more on that below), so the comparison is clean at both prices.

Against Sonnet 4.6 the discount is temporary. On September 1 the sticker price is identical, so any "Sonnet 5 is cheaper than 4.6" plan built on today's number expires with the promo. And as the token-count section below shows, at equal sticker price Sonnet 5 is actually the pricier of the two for the same text.

We don't publish capability benchmarks we haven't run; whether Sonnet 5's quality justifies its cost over 4.6 is your eval, not ours.

Caching and TTL: a drop-in

The caching contract is identical to the rest of the Claude line. We ran a cold write / warm read sequence with a stable 2.2K-token prefix, varying the user message each call so no response-level cache could contaminate the result. Cost per turn, with Sonnet 5 at the current intro price:

Model	Cold turn (cache write)	Warm turn (cache read)	Cold → warm
`claude-sonnet-5` (intro)	$0.0069	$0.0017	4.0×
`claude-sonnet-4-6`	$0.0079	$0.0024	3.3×
`claude-opus-4-8`	$0.0172	$0.0043	4.0×

The invariants hold as they do across the Opus line:

Read discount ≈ 90%. A warm cache read costs about 10% of the input price, matching Anthropic's documented "up to 90%" cached-read savings. With the 5-minute write premium, break-even is one hit.
1-hour TTL works the same. cache_control: {"type": "ephemeral", "ttl": "1h"} is accepted on Sonnet 5, and the usage object splits the buckets as before: cache_creation.ephemeral_5m_input_tokens vs ephemeral_1h_input_tokens. The 1-hour write premium is about 2× no-cache (vs about 1.25× for the 5-minute write); reads stay ≈10% regardless of TTL.
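
The break-even arithmetic can be checked from the multipliers quoted above. A rough sketch, assuming write premiums of 1.25× (5-minute) and 2× (1-hour) and a 0.1× read, all relative to the uncached input price and applied to the cached prefix only; the function names are hypothetical.

```python
WRITE_MULT = {"5m": 1.25, "1h": 2.0}   # cache-write premium vs. uncached input
READ_MULT = 0.10                        # cache-read price vs. uncached input


def cached_cost(hits: int, ttl: str = "5m") -> float:
    """Prefix cost in units of one uncached pass: one write plus `hits` reads."""
    return WRITE_MULT[ttl] + READ_MULT * hits


def uncached_cost(hits: int) -> float:
    """Same traffic without caching: the prefix is paid in full on every call."""
    return 1.0 * (1 + hits)


def break_even_hits(ttl: str = "5m") -> int:
    """Smallest number of cache hits at which caching is cheaper overall."""
    hits = 0
    while cached_cost(hits, ttl) >= uncached_cost(hits):
        hits += 1
    return hits


if __name__ == "__main__":
    for ttl in ("5m", "1h"):
        print(ttl, break_even_hits(ttl))
```

Under these assumptions the 5-minute write pays for itself on the first hit, while the 1-hour write needs a second hit before it comes out ahead.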

One caveat on the table: the Sonnet 5 dollars are at the intro rate. From September 1, multiply the Sonnet 5 figures by 1.5 ($2 → $3 input, $10 → $15 output). A warm Sonnet 5 turn that costs $0.0017 today is about $0.0026 in September, still under Opus 4.8's $0.0043, but no longer under Sonnet 4.6's $0.0024.

The token-count catch

Here is what makes the September reset bite twice. The same system text reports about 41% more input tokens on Sonnet 5 than on Sonnet 4.6.

Model	Input tokens (identical text)	Input cost at standard price
`claude-sonnet-4-6`	1,594	$0.0048
`claude-sonnet-5`	2,245	$0.0067
`claude-opus-4-8`	2,245	$0.0112

Sonnet 5 tokenizes the same English prompt as 2,245 tokens, the identical count Opus 4.8 reports, and well above Sonnet 4.6's 1,594. Sonnet 5 shipped with the newer tokenizer the Opus line adopted at 4.7.
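
The cost column in the table is just tokens times the standard input rate. A small sketch that reproduces it from the counts and rates quoted in this post; the dictionary layout and function name are illustrative.

```python
# Standard input rates in dollars per million tokens, and measured token
# counts for the same text, both as quoted in this post.
INPUT_RATE = {
    "claude-sonnet-4-6": 3.00,
    "claude-sonnet-5": 3.00,
    "claude-opus-4-8": 5.00,
}
TOKENS = {
    "claude-sonnet-4-6": 1_594,
    "claude-sonnet-5": 2_245,
    "claude-opus-4-8": 2_245,
}


def input_cost(model: str) -> float:
    """Uncached input cost in dollars for the measured prefix."""
    return TOKENS[model] * INPUT_RATE[model] / 1_000_000


if __name__ == "__main__":
    for model in TOKENS:
        print(f"{model}: ${input_cost(model):.4f}")
    ratio = input_cost("claude-sonnet-5") / input_cost("claude-sonnet-4-6")
    print(f"Sonnet 5 vs 4.6: {ratio - 1:.0%} more")
```

Swapping in your own token counts from the live usage object gives the same comparison for your prompt.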

Put the price and the token count together and the picture is clear:

During the intro period, the 41% token bump is offset by the 33% lower rate ($2 vs $3), so the same uncached prompt costs about what it did on 4.6 (roughly 6% less for this prefix), and warm turns run cheaper thanks to the discounted output.
From September 1, the rate matches 4.6 but the token count does not. The same English prompt costs about 41% more on Sonnet 5 than on Sonnet 4.6 ($0.0067 vs $0.0048 for this prefix), because identical text is simply counted as more tokens at the same per-token price.

Against Opus 4.8 there is no such catch: the tokenizer is the same (2,245 = 2,245), so Sonnet 5 is cleanly cheaper at both the intro rate (2.5×) and the standard rate (1.67×).
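
The two pricing periods can be laid side by side. A sketch, assuming the rates and token counts quoted in this post and looking only at uncached input for the measured prefix; the names are illustrative.

```python
TOKENS_46, TOKENS_NEW = 1_594, 2_245   # same text, old vs. new tokenizer


def dollars(tokens: int, rate_per_million: float) -> float:
    """Uncached input cost in dollars."""
    return tokens * rate_per_million / 1_000_000


sonnet_46 = dollars(TOKENS_46, 3.00)
sonnet_5_intro = dollars(TOKENS_NEW, 2.00)
sonnet_5_standard = dollars(TOKENS_NEW, 3.00)
opus_48 = dollars(TOKENS_NEW, 5.00)

if __name__ == "__main__":
    print(f"Sonnet 5 intro vs 4.6:         {sonnet_5_intro / sonnet_46:.2f}x")
    print(f"Sonnet 5 standard vs 4.6:      {sonnet_5_standard / sonnet_46:.2f}x")
    print(f"Opus 4.8 vs Sonnet 5 intro:    {opus_48 / sonnet_5_intro:.2f}x")
    print(f"Opus 4.8 vs Sonnet 5 standard: {opus_48 / sonnet_5_standard:.2f}x")
```

A ratio below 1 in the first line means the intro rate more than absorbs the extra tokens; the second line is the September picture.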

So budget the September bill, not the July one: the per-token rate rises 1.5× on September 1, and the higher token count is already baked in today. And read cache_creation_input_tokens / cache_read_input_tokens from the live response rather than a local tokenizer that may still be on the old vocabulary.

Sonnet 5 vs Opus 4.8: the durable win

This is the comparison the launch changes for keeps. Sonnet 5 and Opus 4.8 share a tokenizer, so on any prompt the token counts are identical and the cost difference is purely the rate: 2.5× cheaper at the intro price, 1.67× cheaper at the standard price, on cold turns, warm turns, input, and output alike. A warm cached turn is $0.0017 vs $0.0043 today; even in September it is roughly $0.0026 vs $0.0043.

For a high-volume caching agent loop where the prefix repeats every turn, that gap compounds. The decision is the usual one: run your own eval, and if Sonnet 5 clears your quality bar, the gateway math favors it durably, not just until August. If it doesn't, Opus 4.8 is one model field away with the same caching code.
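
To see how the gap compounds over a session, here is a rough sketch using the per-turn costs from the caching table. It assumes one cold turn followed by warm turns with the cache never expiring, and that the September Sonnet 5 figures are the intro figures times 1.5; the function name and the 50-turn session are hypothetical.

```python
# Per-turn dollars from the caching table in this post: (cold, warm).
TURN_COST = {
    "claude-sonnet-5 (intro)": (0.0069, 0.0017),
    "claude-opus-4-8": (0.0172, 0.0043),
}
# Assumption: the standard rate scales every Sonnet 5 figure by 1.5.
TURN_COST["claude-sonnet-5 (standard)"] = tuple(
    cost * 1.5 for cost in TURN_COST["claude-sonnet-5 (intro)"]
)


def session_cost(model: str, turns: int) -> float:
    """One cold turn, then warm turns, assuming the cache stays alive."""
    if turns < 1:
        return 0.0
    cold, warm = TURN_COST[model]
    return cold + warm * (turns - 1)


if __name__ == "__main__":
    for model in TURN_COST:
        print(f"{model}: ${session_cost(model, 50):.4f} for 50 turns")
```

Real sessions with pauses longer than the TTL will pay more cold turns, which narrows or widens the gap depending on traffic shape.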

Migration checklist

✅ Caching code carries over verbatim. cache_control markers, breakpoint count, ttl: "1h", and usage field names are all identical to the Opus line.
✅ TTL choices carry over. 5m for live/session workloads, 1h for bursty or agent-with-pauses work.
✅ Discount economics carry over. ≈90% read discount, ≈1.25× write premium (5m), ≈2× write premium (1h).
⚠️ Mark September 1 on the budget. The intro rate ends Aug 31; Sonnet 5 goes to $3 / $15. Model the 1.5× step-up before it lands.
⚠️ Re-measure token counts (from 4.6 or earlier). Same text, about 41% more tokens on Sonnet 5. At standard pricing that makes the same prompt pricier than 4.6, not cheaper.
⚠️ Trust the live usage object. Read *_input_tokens and cost from the response, not a cached estimate from the old generation.
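
One way to act on the last two items is to compare each live count against the estimate you budgeted with. A minimal sketch, assuming you already hold an estimate from an older tokenizer and the input counters from the response's usage object; check_estimate and the 10% tolerance are hypothetical choices.

```python
def check_estimate(estimated_tokens: int, usage: dict, tolerance: float = 0.10) -> dict:
    """Compare a pre-call token estimate with the live usage counters."""
    live = (
        (usage.get("input_tokens") or 0)
        + (usage.get("cache_creation_input_tokens") or 0)
        + (usage.get("cache_read_input_tokens") or 0)
    )
    drift = live / estimated_tokens - 1 if estimated_tokens else float("inf")
    return {"live_tokens": live, "drift": drift, "stale": abs(drift) > tolerance}


if __name__ == "__main__":
    # Estimate sized on Sonnet 4.6 (1,594) vs. the count this post measured
    # on Sonnet 5 (2,245), shown here as if it were a cold cache write.
    report = check_estimate(1_594, {"cache_creation_input_tokens": 2_245})
    print(report)
```

A stale flag is a cue to re-measure the prompt and update the budget, not proof of a billing problem.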

Bottom line

Sonnet 5 is a strong deal on a clock. Against Opus 4.8 it is durably 1.67× cheaper (2.5× during the intro period) with a drop-in caching path, which makes it the obvious first thing to eval for any Opus workload that isn't quality-critical. Against Sonnet 4.6 the win is only the introductory discount: on September 1 the price matches 4.6, and the new tokenizer means the same prompt actually costs more. Take the discount, but size your budget on the September numbers and confirm your token counts against the live usage object before you promise finance anything.

For the full caching playbook, see the four-part series starting with How KV Cache & TTL Work and the working Python tutorial.

FAQ

Is Sonnet 5 cheaper than Sonnet 4.6?
Only during the introductory period. Through August 31, 2026 it is $2 / $10 vs 4.6's $3 / $15. From September 1 it is $3 / $15, the same rate. And because the same text counts as about 41% more tokens on Sonnet 5, at the standard price the same prompt costs more than on 4.6.

When does the intro price end?
August 31, 2026, per Anthropic's announcement. On September 1 the rate becomes $3 per million input and $15 per million output tokens.

How much cheaper is Sonnet 5 than Opus 4.8?
2.5× at the intro rate, 1.67× at the standard rate, on both input and output. They share a tokenizer, so token counts match and the difference is purely the rate, at both prices.

Do I need to change my cache_control code?
No. Marker syntax, breakpoint limit, and TTL options are identical to the Opus line. Change the model field and nothing else. Warm reads are ≈10% of the input price; the 1-hour write is ≈2× no-cache, the 5-minute write ≈1.25×.

Is Sonnet 5 a drop-in replacement for Opus 4.8?
On the caching, TTL, and cost surface, migration is trivial and it is cheaper at both prices. On quality, run your own eval; we don't publish capability benchmarks we haven't run. For model-quality claims, see Anthropic's model card.

Verification: price, caching, TTL, and token-count figures measured against https://synthorai.io/ on 2026-07-01 using the Anthropic-native /v1/messages path, single tenant. Per-token prices are derived from usage cost on plain calls; cost-per-turn is a small-sample median with a 2.2K-token cached prefix and reflects the current intro rate. Intro pricing and the August 31, 2026 expiry are from Anthropic's Sonnet 5 announcement; discount/premium ratios cross-checked against Anthropic Prompt Caching docs. Your numbers will vary with prompt, region, and load.