DEV Community

Cover image for Claude API Cheatsheet 2026 — Models, Pricing, Limits in One Place
hiyoyo
hiyoyo

Posted on

Claude API Cheatsheet 2026 — Models, Pricing, Limits in One Place

All information verified from Anthropic's official documentation as of May 2026.

Updated May 29, 2026: Claude Opus 4.8 released. Model ID, pricing, and tips updated.


Models

Model ID Context Best For
Claude Opus 4.8 claude-opus-4-8 1M Complex reasoning, hardest tasks
Claude Sonnet 4.6 claude-sonnet-4-6 1M Most production workloads
Claude Haiku 4.5 claude-haiku-4-5-20251001 200K Fast, simple tasks

💡 Opus 4.8 and Sonnet 4.6 both support 1M token context at flat rate — no surcharge.


API Pricing (per million tokens)

Model Input Output Batch Input Batch Output
Opus 4.8 (regular) $5.00 $25.00 $2.50 $12.50
Opus 4.8 (fast mode) ¹ $10.00 $50.00
Sonnet 4.6 $3.00 $15.00 $1.50 $7.50
Haiku 4.5 $1.00 $5.00 $0.50 $2.50

⚠️ Batch API = 50% discount, but processes within 24 hours (async only).

💡 Fast mode runs at 2.5× speed. Now 3× cheaper than fast mode was for previous Opus models.

¹ Fast mode note: Opus 4.8 fast mode ($10/$50) is priced separately from the previous fast mode for Opus 4.6/4.7 ($30/$150). The fast mode documentation page has not yet been updated to reflect Opus 4.8. Pricing figures are sourced from the official release announcement and the Anthropic pricing page.


Prompt Caching

Cache Type Cost
Cache write (5 min TTL) 1.25x input rate
Cache write (1 hour TTL) 2x input rate
Cache hit ~0.1x input rate (up to 90% savings)

💡 Best for: system prompts, repeated context, long documents.


Subscription Plans (not API)

Plan Monthly Annual Notes
Free $0 $0 Sonnet 4.6, rolling 5hr limit
Pro $20/mo $17/mo All models + Claude Code
Max 5x $100/mo 5x Pro usage
Max 20x $200/mo 20x Pro usage
Team Standard $25/seat $20/seat Min 5 seats
Team Premium $125/seat $100/seat Includes Claude Code & Cowork

⚠️ Subscriptions ≠ API access. API is always billed separately per token.


Basic API Call

import anthropic

client = anthropic.Anthropic(api_key="your_key")

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
print(message.content[0].text)
Enter fullscreen mode Exit fullscreen mode

Batch API

import anthropic

client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "request-1",
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Hello!"}]
            }
        }
    ]
)
print(batch.id)
Enter fullscreen mode Exit fullscreen mode

Mid-task System Updates (NEW in Opus 4.8)

The Messages API now accepts system entries inside the messages array.
This lets you update Claude's instructions mid-task without breaking the prompt cache.

messages=[
    {"role": "user", "content": "Start the migration."},
    {"role": "assistant", "content": "Starting now..."},
    {"role": "system", "content": "Updated instruction: also update test files."},
    {"role": "user", "content": "Continue."},
]
Enter fullscreen mode Exit fullscreen mode

💡 Useful for updating permissions, token budgets, or environment context while an agent is running.


Key Tips

  • Haiku → simple classification, summaries, high-volume tasks
  • Sonnet → most production use cases, best price/performance
  • Opus → complex reasoning, long-horizon agentic tasks (~1.7x more expensive than Sonnet, 5x more than Haiku)
  • Use Batch API for non-realtime workloads (50% cheaper)
  • Use prompt caching for repeated system prompts (up to 90% cheaper)
  • Opus 4.8 defaults to high effort. Use xhigh (Claude Code) or max for harder tasks — more tokens, better results. Rate limits in Claude Code have been increased to accommodate this.
  • Opus 4.8 fast mode runs at 2.5× speed at $10/$50 per million tokens — 3× cheaper than fast mode was on previous Opus models.
  • Opus 4.8 is ~4× less likely than Opus 4.7 to let code flaws pass unremarked (per Anthropic's evaluations).
  • Migrating from Opus 4.7? No new breaking changes have been officially announced for 4.8. Constraints introduced in 4.7 (non-default temperature/top_p/top_k returning 400, adaptive thinking replacing extended thinking) are expected to carry over. Tokenizer is assumed unchanged, but actual token cost may vary — verify with your own inputs.
  • Extended thinking: not supported on Opus 4.8 (confirmed in Anthropic's model overview). Use adaptive thinking instead.

Sources: Anthropic news — Introducing Claude Opus 4.8 · Claude API docs

Top comments (0)