hiyoyo

Posted on May 28

Claude API Cheatsheet 2026 — Models, Pricing, Limits in One Place

#claude #llm #cheatsheet #ai

Claude API Cheatsheet 2026 — Models, Pricing, Limits in One Place

All information verified from Anthropic's official documentation as of May 2026.

Updated May 29, 2026: Claude Opus 4.8 released. Model ID, pricing, and tips updated.

Models

Model	ID	Context	Best For
Claude Opus 4.8	`claude-opus-4-8`	1M	Complex reasoning, hardest tasks
Claude Sonnet 4.6	`claude-sonnet-4-6`	1M	Most production workloads
Claude Haiku 4.5	`claude-haiku-4-5-20251001`	200K	Fast, simple tasks

💡 Opus 4.8 and Sonnet 4.6 both support 1M token context at flat rate — no surcharge.

API Pricing (per million tokens)

Model	Input	Output	Batch Input	Batch Output
Opus 4.8 (regular)	$5.00	$25.00	$2.50	$12.50
Opus 4.8 (fast mode)	$10.00	$50.00	—	—
Sonnet 4.6	$3.00	$15.00	$1.50	$7.50
Haiku 4.5	$1.00	$5.00	$0.50	$2.50

⚠️ Batch API = 50% discount, but processes within 24 hours (async only).

💡 Fast mode runs at 2.5× speed. Now 3× cheaper than fast mode was for previous Opus models.

¹ fast mode note: Opus 4.8 fast mode ($10/$50) is separately priced from the older Opus 4.6/4.7 fast mode ($30/$150). The fast mode docs now confirm speed: "fast" works with Opus 4.8, but explicit 4.8 pricing is not yet listed there. Prices above are from the official release announcement and Anthropic's pricing page.

Prompt Caching

Cache Type	Cost
Cache write (5 min TTL)	1.25x input rate
Cache write (1 hour TTL)	2x input rate
Cache hit	~0.1x input rate (up to 90% savings)

💡 Best for: system prompts, repeated context, long documents.

💡 Opus 4.8 lowers the minimum cacheable tokens from 4,096 to 1,024 — making caching practical for shorter system prompts too.

Subscription Plans (not API)

Plan	Monthly	Annual	Notes
Free	$0	$0	Sonnet 4.6, rolling 5hr limit
Pro	$20/mo	$17/mo	All models + Claude Code
Max 5x	$100/mo	—	5x Pro usage
Max 20x	$200/mo	—	20x Pro usage
Team Standard	$25/seat	$20/seat	Min 5 seats
Team Premium	$125/seat	$100/seat	Includes Claude Code & Cowork

⚠️ Subscriptions ≠ API access. API is always billed separately per token.

Basic API Call

import anthropic

client = anthropic.Anthropic(api_key="your_key")

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
print(message.content[0].text)

Batch API

import anthropic

client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "request-1",
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Hello!"}]
            }
        }
    ]
)
print(batch.id)

Mid-task System Updates (NEW in Opus 4.8)

The Messages API now accepts system entries inside the messages array.
This lets you update Claude's instructions mid-task without breaking the prompt cache.

messages=[
    {"role": "user", "content": "Start the migration."},
    {"role": "assistant", "content": "Starting now..."},
    {"role": "system", "content": "Updated instruction: also update test files."},
    {"role": "user", "content": "Continue."},
]

💡 Useful for updating permissions, token budgets, or environment context while an agent is running.

Key Tips

Haiku → simple classification, summaries, high-volume tasks
Sonnet → most production use cases, best price/performance
Opus → complex reasoning, long-horizon agentic tasks (~1.7x more expensive than Sonnet, 5x more than Haiku)
Use Batch API for non-realtime workloads (50% cheaper)
Use prompt caching for repeated system prompts (up to 90% cheaper)
Opus 4.8 defaults to high effort. Use xhigh (Claude Code) or max for harder tasks — more tokens, better results. Rate limits in Claude Code have been increased to accommodate this.
Opus 4.8 fast mode runs at 2.5× speed at $10/$50 per million tokens — 3× cheaper than fast mode was on previous Opus models.
Opus 4.8 is ~4× less likely than Opus 4.7 to let code flaws pass unremarked (per Anthropic's evaluations).
Migrating from Opus 4.7? No new breaking changes have been officially announced for 4.8. Constraints introduced in 4.7 (non-default temperature/top_p/top_k returning 400, adaptive thinking replacing extended thinking) are expected to carry over. Tokenizer is assumed unchanged, but actual token cost may vary — verify with your own inputs.
Extended thinking: not supported on Opus 4.8 (confirmed in Anthropic's model overview). Use adaptive thinking instead.

Sources: Anthropic news — Introducing Claude Opus 4.8 · Claude API docs

If this was useful, a ❤️ helps more than you'd think.

Top comments (5)

Harjot Singh • May 31

A one-place cheatsheet for models/pricing/limits is genuinely handy - bookmarking. One suggestion that would make it even more useful: a "best model for this job" column rather than just listing them, because the practical question isn't "what does each cost" but "which one should I actually reach for given the task." Haiku-tier for classification/extraction, Sonnet for most coding, Opus for the gnarly reasoning - the price table only becomes decisions when paired with that mapping.

The limits section is the underrated one too - rate limits and context windows shape architecture as much as price does (they're why batching and context-scoping aren't optional at scale). Having all three in one place is exactly the reference you need to make routing decisions, which is the whole game in multi-model setups like Moonshift (a multi-agent pipeline that ships a prompt to a deployed SaaS) - we pick the cheapest model that clears each task's bar, and a cheatsheet like yours is literally the lookup table that informs that. Really useful resource. Are you keeping it updated as pricing shifts? This space moves fast enough that a maintained cheatsheet becomes a go-to reference.

hiyoyo • May 31

Thanks for the detailed feedback — the "best model for this task" column is a genuinely good idea and I'll add it. The Haiku/Sonnet/Opus routing breakdown is exactly the kind of practical layer that makes a price table actually useful for decisions.

As for keeping it updated — I use Opus myself on a daily basis, so as long as Anthropic keeps shipping, I'll keep the cheatsheet current.

Interesting to hear about Moonshift — picking the cheapest model that clears each task's bar is exactly the right framing for multi-model routing.

Harjot Singh • May 31

A "best model for this task" column would genuinely be the most useful thing in the whole cheatsheet, because the real cost lever isn't the per-token price, it's not defaulting every call to the flagship. Cheap model for the easy 80%, frontier only where it's actually needed. If you add it, the framing I'd use is cost-per-completed-task, not per-token, since a pricier model that one-shots beats a cheap one you retry three times. That routing decision is most of how I keep my own builds to a few dollars. Solid reference either way, bookmarked.

Harjot Singh • May 31

Glad the column idea landed, the "best model for this task" row is the part people actually act on, the rest is reference. And "cheapest model that clears the bar" only works if you can define the bar per task, which is the quietly hard part: for codegen the bar is "does it compile and pass tests," for summarization it's fuzzier and you fall back to a judge. The cheatsheet plus that decision column is basically the manual version of what Moonshift automates per step. One ask for the table: a "fails silently vs degrades gracefully" note per model would be gold, because the real risk isn't price, it's a cheap model being confidently wrong on a task you assumed it could handle. Bookmarking this.

hiyoyo • Jun 1

The "fails silently vs degrades gracefully" column is a genuinely interesting idea, but hard to put in a table — it varies too much by task and prompt style.

In my experience, Claude will tell you when it doesn't know something, and if you clearly know more than it does, it'll just ask you to explain. Gemini... not so much, it'll confidently make things up. But honestly, I think a lot of this comes down to how you talk to the model. Push any LLM hard enough with an aggressive prompt and it starts hallucinating. Keep it calm and honest and Claude at least will stay honest back.

The cost-per-task framing is good though — I'll work that into the cheatsheet update.