Claude API Cheatsheet 2026 — Models, Pricing, Limits in One Place

#claude #api #llm #cheatsheet

All information was verified against Anthropic's official documentation as of May 2026.

Models

Model	ID	Context	Best For
Claude Opus 4.7	`claude-opus-4-7`	1M	Complex reasoning, hardest tasks
Claude Sonnet 4.6	`claude-sonnet-4-6`	1M	Most production workloads
Claude Haiku 4.5	`claude-haiku-4-5-20251001`	200K	Fast, simple tasks

💡 Opus 4.7 and Sonnet 4.6 both support the 1M-token context at a flat rate, with no long-context surcharge.
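As a quick way to apply the table above, here is a minimal sketch that lists which models can hold a given prompt. The helper name `models_that_fit` and the exact window sizes (200,000 and 1,000,000 tokens, read from the "200K" and "1M" labels) are assumptions for illustration, not part of any SDK.

```python
# Context windows in tokens, taken from the Models table.
# Exact values are an assumption: "200K" -> 200_000, "1M" -> 1_000_000.
CONTEXT_WINDOWS = {
    "claude-haiku-4-5-20251001": 200_000,
    "claude-sonnet-4-6": 1_000_000,
    "claude-opus-4-7": 1_000_000,
}


def models_that_fit(prompt_tokens: int, reserved_output_tokens: int = 0) -> list[str]:
    """Return model IDs whose context window covers prompt plus reserved output."""
    needed = prompt_tokens + reserved_output_tokens
    return [model for model, window in CONTEXT_WINDOWS.items() if needed <= window]


if __name__ == "__main__":
    # A 250K-token prompt no longer fits Haiku 4.5.
    print(models_that_fit(250_000))
```

The dictionary is ordered cheapest first, so the first entry in the result is also the cheapest option that fits.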

API Pricing (per million tokens)

Model	Input	Output	Batch Input	Batch Output
Opus 4.7	$5.00	$25.00	$2.50	$12.50
Sonnet 4.6	$3.00	$15.00	$1.50	$7.50
Haiku 4.5	$1.00	$5.00	$0.50	$2.50

⚠️ Batch API = 50% discount, but it is async only: results arrive within 24 hours, not in real time.
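The pricing table turns into a small estimator. This is a sketch only: the short model keys, the `Price` class, and `estimate_cost` are hypothetical names, and the numbers are copied from the table above rather than fetched from anywhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


# Standard (non-batch) rates from the pricing table.
PRICES = {
    "opus-4.7": Price(5.00, 25.00),
    "sonnet-4.6": Price(3.00, 15.00),
    "haiku-4.5": Price(1.00, 5.00),
}

BATCH_DISCOUNT = 0.5  # Batch API is billed at 50% of the standard rate.


def estimate_cost(model: str, input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Estimate the USD cost of a workload, ignoring prompt caching."""
    price = PRICES[model]
    cost = (
        input_tokens * price.input_per_mtok
        + output_tokens * price.output_per_mtok
    ) / 1_000_000
    return cost * BATCH_DISCOUNT if batch else cost


if __name__ == "__main__":
    # 2M input + 500K output tokens on Sonnet 4.6, standard vs. batch.
    print(estimate_cost("sonnet-4.6", 2_000_000, 500_000))
    print(estimate_cost("sonnet-4.6", 2_000_000, 500_000, batch=True))
```

With the table's rates, that workload comes to $13.50 at the standard rate and $6.75 through the Batch API.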

Prompt Caching

Cache Type	Cost
Cache write (5 min TTL)	1.25x input rate
Cache write (1 hour TTL)	2x input rate
Cache hit	~0.1x input rate (up to 90% savings)

💡 Best for: system prompts, repeated context, long documents.
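A rough sketch of how caching is requested and what it saves. The `cache_control` block with type `"ephemeral"` is the Messages API field for marking a cacheable prefix, as I understand it; the helper names `build_cached_request` and `relative_input_cost` are made up for this example, and the cost model assumes every call after the first is a cache hit within the 5-minute TTL.

```python
def build_cached_request(system_prompt: str, user_text: str,
                         model: str = "claude-sonnet-4-6") -> dict:
    """Build keyword arguments for messages.create with a cacheable system prompt."""
    return {
        "model": model,
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Marks everything up to this block as a cacheable prefix.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }


# Multipliers from the Prompt Caching table (relative to the input rate).
CACHE_WRITE_5M = 1.25
CACHE_HIT = 0.1


def relative_input_cost(calls: int) -> float:
    """Cost of the shared prefix with caching, relative to sending it uncached.

    Assumes one cache write followed by (calls - 1) cache hits.
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    return (CACHE_WRITE_5M + CACHE_HIT * (calls - 1)) / calls


if __name__ == "__main__":
    request = build_cached_request("You are a contracts analyst...", "Summarize clause 4.")
    print(sorted(request))
    print(relative_input_cost(10))
```

The dictionary can be passed as `client.messages.create(**request)`. Under these assumptions a single call costs more than no caching (1.25x), the second call already brings the average below 1x, and the ratio approaches 0.1x as the number of calls grows.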

Subscription Plans (not API)

Plan	Monthly	Annual	Notes
Free	$0	$0	Sonnet 4.6, rolling 5hr limit
Pro	$20/mo	$17/mo	All models + Claude Code
Max 5x	$100/mo	—	5x Pro usage
Max 20x	$200/mo	—	20x Pro usage
Team Standard	$25/seat	$20/seat	Min 5 seats
Team Premium	$125/seat	$100/seat	Includes Claude Code & Cowork

⚠️ Subscriptions ≠ API access. API is always billed separately per token.

Basic API Call

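import os

import anthropic

# Read the key from the environment instead of hardcoding it in source.
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,  # upper bound on generated output tokens
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
# content is a list of blocks; the first one holds the text reply here.
print(message.content[0].text)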
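Indexing `content[0]` works for a plain text reply, but a response can contain several blocks. The following sketch collects all text blocks and the token usage you are billed for. `summarize_response` is a hypothetical helper; it relies only on the `content`, `stop_reason`, and `usage` attributes of the returned message object.

```python
def summarize_response(message) -> dict:
    """Join all text blocks of a Messages API response and report token usage."""
    text = "".join(
        block.text for block in message.content if block.type == "text"
    )
    return {
        "text": text,
        "stop_reason": message.stop_reason,  # e.g. "end_turn" or "max_tokens"
        "input_tokens": message.usage.input_tokens,
        "output_tokens": message.usage.output_tokens,
    }
```

A `stop_reason` of `"max_tokens"` means the reply was cut off at the `max_tokens` limit, which is worth checking before using the text.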

Batch API

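import anthropic

# With no arguments, the client reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        {
            # custom_id is your own label for matching results to requests.
            "custom_id": "request-1",
            # params takes the same fields as a regular messages.create call.
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Hello!"}]
            }
        }
    ]
)
# Keep this ID: you need it to check status and fetch results later.
print(batch.id)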
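Creating a batch only returns an ID; the results have to be fetched once processing ends. Below is a minimal polling sketch. `wait_for_batch` is a hypothetical helper, and it assumes the SDK's `batches.retrieve` and `batches.results` methods and the `processing_status == "ended"` state behave as documented at the time of writing.

```python
import time


def wait_for_batch(client, batch_id: str, poll_seconds: float = 60.0,
                   sleep=time.sleep) -> dict:
    """Poll a message batch until it ends, then map custom_id -> reply text.

    Requests that did not succeed (errored, canceled, expired) map to None.
    """
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            break
        sleep(poll_seconds)

    outputs = {}
    for entry in client.messages.batches.results(batch_id):
        if entry.result.type == "succeeded":
            outputs[entry.custom_id] = entry.result.message.content[0].text
        else:
            outputs[entry.custom_id] = None
    return outputs
```

Usage would look like `wait_for_batch(client, batch.id)`. The `sleep` parameter exists only so the loop can be tested without waiting; in production a longer polling interval is usually fine, since batches are not meant for real-time work.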

Key Tips

Haiku → simple classification, summaries, high-volume tasks
Sonnet → most production use cases, best price/performance
Opus → complex reasoning only (about 1.7x the price of Sonnet and 5x the price of Haiku)
Use Batch API for non-realtime workloads (50% cheaper)
Use prompt caching for repeated system prompts (up to 90% cheaper)
Opus 4.7 uses a new tokenizer that may consume up to 35% more tokens than Opus 4.6 for the same text, so revisit your max_tokens values and cost estimates accordingly.
Opus 4.7 rejects non-default temperature, top_p, and top_k (returns 400 error) — omit these parameters entirely.
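The last two tips can be wrapped in a small guard applied before each request. This is a sketch under the assumptions stated in the tips themselves (the 400 error on sampling parameters and the up-to-35% token growth); `prepare_params` is a hypothetical helper, not an SDK function.

```python
OPUS_47 = "claude-opus-4-7"
UNSUPPORTED_SAMPLING_KEYS = ("temperature", "top_p", "top_k")
TOKENIZER_HEADROOM_PERCENT = 35  # worst case quoted in the tips above


def prepare_params(params: dict, scale_max_tokens: bool = False) -> dict:
    """Return a copy of request params that is safe to send to Opus 4.7.

    Other models are passed through unchanged.
    """
    cleaned = dict(params)
    if cleaned.get("model") != OPUS_47:
        return cleaned

    # Drop sampling parameters instead of risking a 400 error.
    for key in UNSUPPORTED_SAMPLING_KEYS:
        cleaned.pop(key, None)

    # Optionally raise max_tokens by 35%, rounded up, using integer math.
    if scale_max_tokens and "max_tokens" in cleaned:
        scaled = cleaned["max_tokens"] * (100 + TOKENIZER_HEADROOM_PERCENT)
        cleaned["max_tokens"] = (scaled + 99) // 100
    return cleaned


if __name__ == "__main__":
    params = {"model": OPUS_47, "max_tokens": 1024, "temperature": 0.2,
              "messages": [{"role": "user", "content": "Hello!"}]}
    print(prepare_params(params, scale_max_tokens=True))
```

The result can be passed as `client.messages.create(**prepare_params(params))`. Scaling `max_tokens` is off by default because 35% is an upper bound, not a typical value.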