Sonnet 5 vs GLM-5.2 vs everyone: how to pick the cheapest LLM API in 2026

#ai #llm #machinelearning #webdev

Two frontier-class models just launched weeks apart — Anthropic's Claude Sonnet 5
(closed, $2/$10 per 1M launch pricing) and Z.AI's GLM-5.2 (open-weight, MIT, ~$1.40/
$4.40 across hosts) — and the first question everyone asks is "which is cheaper?"
The honest answer: it depends on your token mix, your tier, and whether cached
input matters. Here's a repeatable way to answer it for your case, using live,
verified pricing.

1. Normalize everything to $/1M tokens

Providers quote prices in incompatible units — per-1K, per-1M, sometimes per-image
or per-character — and split input, output, and cached-input. Before you can
compare anything, convert all of it to dollars per 1 million input tokens and
per 1 million output tokens. (This is the single biggest source of "wait, that's
cheaper than I thought" errors.)
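
Here's a minimal sketch of that normalization step. The unit labels (`per_1k`, `per_1m`) and the function name are my own placeholders, not any provider's schema, and per-image or per-character quotes are deliberately left out because they need a model-specific conversion.

```python
# Tokens covered by each quoted unit. Per-image and per-character prices
# are not handled here; they need their own conversion.
TOKENS_PER_UNIT = {"per_1k": 1_000, "per_1m": 1_000_000}


def to_usd_per_million(price: float, unit: str) -> float:
    """Convert a quoted token price to USD per 1M tokens."""
    if unit not in TOKENS_PER_UNIT:
        raise ValueError(f"unsupported unit: {unit!r}")
    return price * 1_000_000 / TOKENS_PER_UNIT[unit]


# Hypothetical quotes: the same rate expressed two different ways.
print(to_usd_per_million(0.002, "per_1k"))
print(to_usd_per_million(2.00, "per_1m"))
```

Run input, output, and cached-input rates through the same function so every number in your comparison shares one unit.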

2. Separate the question by tier

Comparing a frontier flagship to a budget model on price alone is meaningless.
Bucket first, then compare within a bucket:

Flagship / frontier: the spread is real. The cheapest flagship-class model right now is about $1 / $2 per 1M (in/out); the priciest frontier tier runs up to $30 / $180. That is a 30x spread on input and 90x on output inside the same nominal tier, which is exactly why you bucket first. Sonnet 5 lands mid-tier on price despite frontier capability; GLM-5.2 is the cheapest open option at that level.
Budget / fast: the floor is far lower than most people assume — ~$0.017-$0.05 / 1M input for capable small models.
Embeddings: a near-commodity at ~$0.02 / 1M across several providers.
Open-weight, multi-host: the same open model (GLM-5.2, DeepSeek, Qwen) is often served by several providers at different prices — compare hosts, not just models.
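
To make the bucketing concrete, here's a rough sketch that groups models by tier and measures the price spread inside each bucket. The model names are placeholders. The two flagship rows use the extremes quoted above; the budget rows use the quoted input range, but their output prices are invented purely for illustration.

```python
from collections import defaultdict

# (name, tier, input $/1M, output $/1M). Names are placeholders.
MODELS = [
    ("flagship-low", "flagship", 1.00, 2.00),      # cheapest flagship quoted above
    ("flagship-high", "flagship", 30.00, 180.00),  # priciest frontier tier quoted above
    ("budget-a", "budget", 0.017, 0.07),           # output price is made up
    ("budget-b", "budget", 0.05, 0.20),            # output price is made up
]


def spread_by_tier(models):
    """Return {tier: (input_spread, output_spread)} as max/min price ratios."""
    by_tier = defaultdict(list)
    for _name, tier, in_price, out_price in models:
        by_tier[tier].append((in_price, out_price))
    result = {}
    for tier, prices in by_tier.items():
        inputs = [p[0] for p in prices]
        outputs = [p[1] for p in prices]
        result[tier] = (max(inputs) / min(inputs), max(outputs) / min(outputs))
    return result


for tier, (in_spread, out_spread) in spread_by_tier(MODELS).items():
    print(f"{tier}: {in_spread:.1f}x input spread, {out_spread:.1f}x output spread")
```

Comparing only within a bucket keeps a budget model's low price from masking a capability gap you actually care about.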

3. Weight by your actual token ratio

A summarizer is input-heavy; a code generator is output-heavy. Output usually
costs 3-5x input, so a model that looks cheap on input can lose on a
generation-heavy workload. Multiply each rate by your real volume — don't eyeball
the sticker price.
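
A small sketch of why the ratio matters. Both models and both workloads below are hypothetical numbers I picked to show the ranking flipping; swap in your own rates and volumes.

```python
def workload_cost(in_tokens_m, out_tokens_m, in_price, out_price):
    """USD cost given token volumes in millions and prices in $/1M tokens."""
    return in_tokens_m * in_price + out_tokens_m * out_price


# Two hypothetical models: A is cheaper on input, B is cheaper on output.
MODEL_A = {"in": 1.00, "out": 5.00}
MODEL_B = {"in": 1.50, "out": 3.00}

# Hypothetical monthly volumes in millions of tokens: (input, output).
WORKLOADS = {"summarizer": (100, 5), "code_generator": (20, 60)}


def cheapest(workload):
    """Return (winning model label, cost per model) for a named workload."""
    in_m, out_m = WORKLOADS[workload]
    costs = {
        "A": workload_cost(in_m, out_m, MODEL_A["in"], MODEL_A["out"]),
        "B": workload_cost(in_m, out_m, MODEL_B["in"], MODEL_B["out"]),
    }
    return min(costs, key=costs.get), costs


for name in WORKLOADS:
    winner, costs = cheapest(name)
    print(name, "->", winner, costs)
```

With these made-up numbers, model A wins the input-heavy summarizer ($125 vs $165) and model B wins the output-heavy code generator ($210 vs $320), even though A has the lower input sticker price.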

4. Don't forget cached input

For RAG and agent loops you re-send the same context constantly. Cached-input
pricing is often a huge discount — Sonnet 5's cache hits are 90% cheaper than
fresh input ($0.20 vs $2.00 /1M) — and it can flip the ranking entirely. If your
workload is cache-heavy, rank by cached-input price, not raw input. (There's a
live ranking of caching-capable APIs if you want the current order.)
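
One way to fold caching into the comparison is a blended input rate. This is a simplified sketch: it assumes a fixed share of input tokens hits the cache and ignores any cache-write surcharge or expiry rules a provider may apply, so treat the result as a rough estimate rather than a bill.

```python
def effective_input_price(fresh_price, cached_price, cache_hit_fraction):
    """Blended $/1M input price when a share of input tokens hits the cache."""
    if not 0.0 <= cache_hit_fraction <= 1.0:
        raise ValueError("cache_hit_fraction must be between 0 and 1")
    return (1 - cache_hit_fraction) * fresh_price + cache_hit_fraction * cached_price


# Sonnet 5 launch rates quoted above: $2.00 fresh, $0.20 cached per 1M.
# The hit fractions are illustrative, not measured.
for hit in (0.0, 0.5, 0.9):
    blended = effective_input_price(2.00, 0.20, hit)
    print(f"{hit:.0%} cache hits -> ${blended:.2f} per 1M input")
```

At a 90% hit rate the blended input price drops from $2.00 to about $0.38 per 1M, which is the kind of shift that can reorder a shortlist.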

5. Use live data, not a blog post's snapshot

Prices move: Sonnet 5's own launch pricing reverts from $2/$10 to $3/$15 on Sep 1.
A table you screenshot today can be wrong next month. I maintain Model Price
Watch, which tracks 159 models across 24 providers and re-verifies prices against
each provider's official pricing page 3x a day. If you'd rather script it, there's
a free no-key JSON API: https://modelpricewatch.com/api/v1/models.json.
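
If you do script it, a cautious starting point is to fetch the JSON and look at its shape before relying on any field. The sketch below uses only the Python standard library and deliberately assumes nothing about the response schema; the helper names are my own.

```python
import json
import urllib.request

PRICING_URL = "https://modelpricewatch.com/api/v1/models.json"


def fetch_pricing(url=PRICING_URL, timeout=10.0):
    """Download and parse the pricing JSON. Raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_shape(payload):
    """Describe the top-level structure so you can map the fields yourself."""
    if isinstance(payload, dict):
        return {"type": "object", "keys": sorted(payload)}
    if isinstance(payload, list):
        first = payload[0] if payload else None
        keys = sorted(first) if isinstance(first, dict) else []
        return {"type": "array", "length": len(payload), "first_item_keys": keys}
    return {"type": type(payload).__name__}


# Usage (makes a network request):
#   print(summarize_shape(fetch_pricing()))
```

Once you know which fields hold the input, output, and cached-input rates, you can feed them into whatever cost function you use for your own workload.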

Worked example: for a chat product doing ~2M input / 0.5M output tokens a day, run
those numbers through a cost calculator across your shortlist — and if you re-send
a big system prompt each call, add the cached-input rate. The difference between
Sonnet 5 with caching and a naive flagship default can be the majority of your bill.
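
Here's that worked example as a quick sketch, using only prices quoted in this post. The 80% cached share is my assumption for a product that re-sends a large system prompt, and no cached rate is applied to GLM-5.2 because I haven't quoted one here.

```python
def daily_cost(in_m, out_m, in_price, out_price, cached_price=None, cached_share=0.0):
    """Daily USD cost; token volumes in millions, prices in $/1M tokens."""
    if cached_price is None:
        # No cached rate known: bill all input at the fresh price.
        cached_share = 0.0
        cached_price = in_price
    fresh = in_m * (1 - cached_share) * in_price
    cached = in_m * cached_share * cached_price
    return fresh + cached + out_m * out_price


IN_M, OUT_M = 2.0, 0.5  # daily volumes from the worked example
CACHED_SHARE = 0.8      # assumption: 80% of input is a re-sent system prompt

costs = {
    "Sonnet 5, no caching": daily_cost(IN_M, OUT_M, 2.00, 10.00),
    "Sonnet 5, 80% cached": daily_cost(IN_M, OUT_M, 2.00, 10.00, 0.20, CACHED_SHARE),
    "GLM-5.2 (~$1.40/$4.40)": daily_cost(IN_M, OUT_M, 1.40, 4.40),
    "Priciest frontier tier": daily_cost(IN_M, OUT_M, 30.00, 180.00),
}

for name, usd in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name}: ${usd:.2f}/day")
```

Under these assumptions the daily bill comes to roughly $5.00 for GLM-5.2, $6.12 for Sonnet 5 with caching, $9.00 for Sonnet 5 without it, and $150.00 at the priciest frontier rates.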

Disclosure: I build and maintain Model Price Watch. The method above works with
any pricing source — I just happen to keep one current.