All information verified from Anthropic's official documentation as of May 2026.
Updated May 29, 2026: Claude Opus 4.8 released. Model ID, pricing, and tips updated.
Models
| Model | ID | Context | Best For |
|---|---|---|---|
| Claude Opus 4.8 | claude-opus-4-8 | 1M | Complex reasoning, hardest tasks |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | 1M | Most production workloads |
| Claude Haiku 4.5 | claude-haiku-4-5-20251001 | 200K | Fast, simple tasks |
💡 Opus 4.8 and Sonnet 4.6 both support 1M token context at flat rate — no surcharge.
API Pricing (per million tokens)
| Model | Input | Output | Batch Input | Batch Output |
|---|---|---|---|---|
| Opus 4.8 (regular) | $5.00 | $25.00 | $2.50 | $12.50 |
| Opus 4.8 (fast mode) ¹ | $10.00 | $50.00 | — | — |
| Sonnet 4.6 | $3.00 | $15.00 | $1.50 | $7.50 |
| Haiku 4.5 | $1.00 | $5.00 | $0.50 | $2.50 |
⚠️ Batch API = 50% discount, but processes within 24 hours (async only).
💡 Fast mode runs at 2.5× speed and is now 3× cheaper than fast mode was for previous Opus models.
¹ Fast mode note: Opus 4.8 fast mode ($10/$50) is priced separately from the previous fast mode for Opus 4.6/4.7 ($30/$150). The fast mode documentation page has not yet been updated to reflect Opus 4.8. Pricing figures are sourced from the official release announcement and the Anthropic pricing page.
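The rates in the table translate directly into a per-request estimate. The following is a minimal sketch, assuming the regular (non-fast-mode) prices listed above; `PRICES`, `BATCH_DISCOUNT`, and `estimate_cost` are hypothetical names for illustration and are not part of the Anthropic SDK.

```python
# Rates in USD per million tokens, copied from the pricing table above.
PRICES = {
    "claude-opus-4-8": {"input": 5.00, "output": 25.00},
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},
    "claude-haiku-4-5-20251001": {"input": 1.00, "output": 5.00},
}

BATCH_DISCOUNT = 0.5  # Batch API is billed at 50% of the regular rate.


def estimate_cost(model, input_tokens, output_tokens, batch=False):
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    rates = PRICES[model]
    cost = (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000
    return cost * BATCH_DISCOUNT if batch else cost


if __name__ == "__main__":
    # Compare 200K input tokens and 4K output tokens across the three models.
    for model in PRICES:
        regular = estimate_cost(model, 200_000, 4_000)
        batched = estimate_cost(model, 200_000, 4_000, batch=True)
        print(f"{model}: ${regular:.4f} regular, ${batched:.4f} batch")
```

For that workload the estimate comes to about $1.10 on Opus 4.8, $0.66 on Sonnet 4.6, and $0.22 on Haiku 4.5, each halved when sent through the Batch API.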
Prompt Caching
| Cache Type | Cost |
|---|---|
| Cache write (5 min TTL) | 1.25x input rate |
| Cache write (1 hour TTL) | 2x input rate |
| Cache hit | ~0.1x input rate (up to 90% savings) |
💡 Best for: system prompts, repeated context, long documents.
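To see when caching pays off, the multipliers above can be applied to a shared prompt prefix. This is a rough sketch under two stated assumptions: a cache hit costs exactly 0.1x the input rate (the table gives it as approximate), and every request after the first arrives within the TTL and hits the cache. `prefix_cost` is a hypothetical helper, not an SDK function.

```python
# Multipliers on the regular input rate, taken from the table above.
CACHE_WRITE_5MIN = 1.25
CACHE_WRITE_1HOUR = 2.0
CACHE_HIT = 0.1  # assumption: the table lists this as approximate


def prefix_cost(prefix_tokens, requests, input_rate, write_multiplier=CACHE_WRITE_5MIN):
    """Return (uncached_usd, cached_usd) for a prefix reused across `requests` calls.

    input_rate is the regular input price in USD per million tokens.
    Assumes the first request writes the cache and all later ones hit it.
    """
    per_request = prefix_tokens * input_rate / 1_000_000
    uncached = per_request * requests
    cached = per_request * (write_multiplier + CACHE_HIT * (requests - 1))
    return uncached, cached


if __name__ == "__main__":
    # A 50K-token system prompt reused 100 times on Sonnet 4.6 ($3.00 input).
    uncached, cached = prefix_cost(50_000, 100, 3.00)
    print(f"uncached: ${uncached:.4f}, cached: ${cached:.4f}")
```

Under these assumptions the prefix costs about $15.00 uncached versus about $1.67 cached. With a single request the cached path is more expensive, because the write multiplier is paid without any hits to offset it.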
Subscription Plans (not API)
| Plan | Monthly | Annual | Notes |
|---|---|---|---|
| Free | $0 | $0 | Sonnet 4.6, rolling 5hr limit |
| Pro | $20/mo | $17/mo | All models + Claude Code |
| Max 5x | $100/mo | — | 5x Pro usage |
| Max 20x | $200/mo | — | 20x Pro usage |
| Team Standard | $25/seat | $20/seat | Min 5 seats |
| Team Premium | $125/seat | $100/seat | Includes Claude Code & Cowork |
⚠️ Subscriptions ≠ API access. API is always billed separately per token.
Basic API Call
import anthropic
client = anthropic.Anthropic(api_key="your_key")  # replace with a real API key
message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,  # upper bound on generated tokens
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)
print(message.content[0].text)  # first content block holds the text reply
Batch API
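import anthropic
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "request-1",  # your own ID, used to match results later
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Hello!"}],
            },
        }
    ]
)
print(batch.id)  # keep this ID to check status and fetch results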
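Because batches are asynchronous, a typical flow builds many requests at once and then polls until processing ends. The sketch below is one way to structure that; `build_batch_requests` and `wait_for_batch` are hypothetical helpers, and the client is passed in so the functions can be exercised without network access. It relies on `client.messages.batches.retrieve` and the batch's `processing_status` field from the Python SDK.

```python
import time


def build_batch_requests(prompts, model="claude-sonnet-4-6", max_tokens=1024):
    """Turn a list of prompt strings into Batch API request entries."""
    return [
        {
            "custom_id": f"request-{i}",  # used to match results to inputs
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts, start=1)
    ]


def wait_for_batch(client, batch_id, poll_seconds=60):
    """Poll until the batch reports processing_status == "ended", then return it."""
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        time.sleep(poll_seconds)
```

A caller would pass `build_batch_requests(prompts)` as the `requests` argument to `client.messages.batches.create`, then call `wait_for_batch(client, batch.id)` before fetching results. Since a batch may take up to 24 hours, a long polling interval or a scheduled job is usually a better fit than a tight loop.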
Mid-task System Updates (NEW in Opus 4.8)
The Messages API now accepts system entries inside the messages array.
This lets you update Claude's instructions mid-task without breaking the prompt cache.
messages=[
    {"role": "user", "content": "Start the migration."},
    {"role": "assistant", "content": "Starting now..."},
    {"role": "system", "content": "Updated instruction: also update test files."},
    {"role": "user", "content": "Continue."},
]
💡 Useful for updating permissions, token budgets, or environment context while an agent is running.
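In an agent loop, this pattern amounts to appending a system entry to the running history instead of rewriting earlier turns. A minimal sketch follows, assuming the message shape described in this section; `add_system_update` is a hypothetical helper, and the reasoning that an append-only history leaves the cached prefix intact is an inference from how prefix caching generally works rather than a documented guarantee.

```python
def add_system_update(messages, instruction):
    """Return a new history with a mid-task system entry appended.

    Earlier entries are left untouched, so the existing prefix stays
    byte-for-byte identical (hypothetical helper, not an SDK function).
    """
    return [*messages, {"role": "system", "content": instruction}]


history = [
    {"role": "user", "content": "Start the migration."},
    {"role": "assistant", "content": "Starting now..."},
]

# Inject the new instruction, then let the task continue.
updated = add_system_update(history, "Updated instruction: also update test files.")
updated.append({"role": "user", "content": "Continue."})
```

The resulting `updated` list matches the `messages` array shown above and can be passed to `client.messages.create` as usual.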
Key Tips
- Haiku → simple classification, summaries, high-volume tasks
- Sonnet → most production use cases, best price/performance
- Opus → complex reasoning, long-horizon agentic tasks (about 1.7x Sonnet's price and 5x Haiku's)
- Use Batch API for non-realtime workloads (50% cheaper)
- Use prompt caching for repeated system prompts (up to 90% cheaper)
- Opus 4.8 defaults to high effort. Use xhigh (Claude Code) or max for harder tasks: more tokens, better results. Rate limits in Claude Code have been increased to accommodate this.
- Opus 4.8 fast mode runs at 2.5× speed at $10/$50 per million tokens, 3× cheaper than fast mode was on previous Opus models.
- Opus 4.8 is ~4× less likely than Opus 4.7 to let code flaws pass unremarked (per Anthropic's evaluations).
- Migrating from Opus 4.7? No new breaking changes have been officially announced for 4.8. Constraints introduced in 4.7 (non-default temperature/top_p/top_k returning 400, adaptive thinking replacing extended thinking) are expected to carry over. The tokenizer is assumed unchanged, but actual token cost may vary, so verify with your own inputs.
- Extended thinking: not supported on Opus 4.8 (confirmed in Anthropic's model overview). Use adaptive thinking instead.
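For code migrating from older models, one defensive option is to drop the restricted sampling parameters before a request is sent. The sketch below assumes the constraint described in the migration tip applies to `claude-opus-4-8`; `RESTRICTED_MODELS`, `RESTRICTED_PARAMS`, and `strip_restricted_params` are hypothetical names, and the model set should be adjusted to whatever your own testing confirms.

```python
# Assumption: per the migration tip, non-default values for these parameters
# return HTTP 400 on Opus 4.7 and are expected to do the same on Opus 4.8.
RESTRICTED_PARAMS = ("temperature", "top_p", "top_k")
RESTRICTED_MODELS = {"claude-opus-4-8"}


def strip_restricted_params(request):
    """Return a copy of the request kwargs without restricted sampling params.

    Requests for models outside RESTRICTED_MODELS are copied unchanged.
    """
    if request.get("model") not in RESTRICTED_MODELS:
        return dict(request)
    return {k: v for k, v in request.items() if k not in RESTRICTED_PARAMS}


request = {
    "model": "claude-opus-4-8",
    "max_tokens": 1024,
    "temperature": 0.2,  # carried over from an older integration
    "messages": [{"role": "user", "content": "Hello!"}],
}
safe_request = strip_restricted_params(request)
```

The cleaned dictionary can then be unpacked into the call, for example `client.messages.create(**safe_request)`. Silently dropping parameters changes behavior, so logging what was removed may be preferable in production code.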
Sources: Anthropic news — Introducing Claude Opus 4.8 · Claude API docs