If you're building with the Claude API, pricing will be one of your first questions. The good news: it's entirely token-based, predictable, and — with the right techniques — very affordable even at scale. This guide breaks down exactly what you'll pay and how to minimize it.
How Claude API Pricing Works
Claude charges per token — roughly 4 characters of English text. Every API call has two costs:
- Input tokens — your prompt, system message, and conversation history
- Output tokens — the model's response (priced at 5× the input rate on every current tier)
Prices are listed per 1 million tokens (MTok).
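As a quick sanity check on the ~4-characters-per-token rule, you can ballpark token counts locally before sending anything (a rough sketch only; recent SDK versions also expose a server-side token-counting endpoint for exact numbers):

```python
def rough_token_count(text: str) -> int:
    """Ballpark estimate: ~4 characters of English text per token.
    Use the API's token-counting endpoint when you need exact counts."""
    return max(1, len(text) // 4)

prompt = "Summarize the following meeting notes in three bullet points."
print(rough_token_count(prompt))  # ~15 tokens for a 61-character prompt
```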
Claude Model Pricing (May 2025)
| Model | Input (per MTok) | Output (per MTok) | Best for |
|---|---|---|---|
| Claude Haiku 4.5 | $0.80 | $4.00 | High-volume, simple tasks |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Balanced — most projects |
| Claude Opus 4.7 | $15.00 | $75.00 | Complex reasoning, low volume |
Rule of thumb: Start with Sonnet. Switch to Haiku for repetitive tasks (classification, extraction). Use Opus only when you need the extra reasoning power.
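That rule of thumb can live in code as a tiny router. This is a sketch: the task categories are hypothetical, and the model ID strings are assumptions matching the names in the table above:

```python
def pick_model(task: str) -> str:
    """Route a task category to a model tier, following the rule of thumb:
    Haiku for repetitive work, Opus only for heavy reasoning, Sonnet otherwise."""
    haiku_tasks = {"classification", "extraction", "routing"}
    opus_tasks = {"complex-reasoning", "research"}
    if task in haiku_tasks:
        return "claude-haiku-4-5"
    if task in opus_tasks:
        return "claude-opus-4-7"
    return "claude-sonnet-4-6"

print(pick_model("classification"))  # claude-haiku-4-5
print(pick_model("chat"))            # claude-sonnet-4-6
```

Centralizing the choice in one function also makes it trivial to A/B a cheaper tier later: change one mapping instead of hunting model strings across the codebase.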
Real-World Cost Estimates
Let's make this concrete. Here's a Python helper to estimate monthly costs:
```python
def estimate_cost(input_tokens: int, output_tokens: int, model: str = "sonnet") -> float:
    prices = {
        "haiku": (0.80, 4.00),   # per 1M tokens (input, output)
        "sonnet": (3.00, 15.00),
        "opus": (15.00, 75.00),
    }
    inp, out = prices[model]
    return (input_tokens / 1_000_000 * inp) + (output_tokens / 1_000_000 * out)

# Example: 1,000 API calls, avg 500 input + 300 output tokens each
calls = 1000
cost = estimate_cost(calls * 500, calls * 300, model="sonnet")
print(f"Monthly estimate: ${cost:.2f}")
# → Monthly estimate: $6.00
```
Some typical scenarios on Sonnet:
| Use Case | Calls/day | Tokens/call | Cost/day |
|---|---|---|---|
| Simple chatbot | 1,000 | 500 in / 300 out | ~$6.00 |
| Document summarizer | 200 | 3,000 in / 500 out | ~$3.30 |
| Code review tool | 100 | 2,000 in / 1,000 out | ~$2.10 |
| Classification pipeline | 10,000 | 200 in / 50 out | ~$13.50 |
Cut Costs with Prompt Caching (Up to 90% Off)
Prompt caching is the single biggest lever for reducing API costs. When you mark a portion of your prompt with cache_control, Claude stores it server-side for 5 minutes. Subsequent requests that reuse that prefix pay only 10% of normal input cost for the cached portion.
Cache writes cost 25% more than normal input tokens, but the very first cache hit more than recovers that premium, and every hit after that saves 90% on the cached portion. Perfect for:
- Long system prompts repeated across many calls
- Large document contexts (PDFs, codebases) queried multiple times
- Few-shot example sets used in a session
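The break-even arithmetic is easy to verify yourself. With writes at 1.25× the base input rate and reads at 0.10×, a cached prefix pays for itself on its first reuse (a sketch using the Sonnet input rate from the table above):

```python
def input_cost(prefix_tokens: int, reuses: int, rate_per_mtok: float = 3.00,
               cached: bool = True) -> float:
    """Input-side cost of sending the same prefix (1 + reuses) times.
    Cached: one write at 1.25x the base rate, then each reuse reads at 0.10x."""
    per_tok = rate_per_mtok / 1_000_000
    if cached:
        return prefix_tokens * per_tok * (1.25 + 0.10 * reuses)
    return prefix_tokens * per_tok * (1 + reuses)

# A 10,000-token system prompt reused 9 times within the cache TTL:
print(f"uncached: ${input_cost(10_000, 9, cached=False):.2f}")  # → uncached: $0.30
print(f"cached:   ${input_cost(10_000, 9):.2f}")                # → cached:   $0.06
```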
```python
import anthropic

client = anthropic.Anthropic()

# System prompt cached after first request (5-minute TTL)
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful coding assistant with deep Python expertise.",
            "cache_control": {"type": "ephemeral"},  # cache this prefix
        }
    ],
    messages=[{"role": "user", "content": "Explain list comprehensions."}],
)

# Check cache performance via the usage object on the response
usage = response.usage
print(f"Input tokens: {usage.input_tokens}")
print(f"Cache write tokens: {usage.cache_creation_input_tokens}")
print(f"Cache read tokens: {usage.cache_read_input_tokens}")
```
Batch API: 50% Discount for Async Workloads
The Batch API processes requests asynchronously (within 24 hours) at 50% off standard pricing. Ideal for:
- Overnight data processing pipelines
- Bulk content generation or classification
- Offline analysis that doesn't need real-time responses
For Sonnet, that's $1.50/MTok input and $7.50/MTok output; batched Haiku drops all the way to $0.40/$2.00.
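Batch jobs are submitted as a list of independent requests, each tagged with a `custom_id` so that results (which can come back in any order) can be matched to inputs. A sketch of building that payload; the `custom_id`/`params` shape follows the Message Batches API, but treat the specifics here as assumptions to check against the docs:

```python
# Hypothetical bulk-summarization job: one batch request per document.
docs = ["Doc one text...", "Doc two text...", "Doc three text..."]

requests = [
    {
        "custom_id": f"summary-{i}",  # used to match results back to inputs
        "params": {
            "model": "claude-sonnet-4-6",
            "max_tokens": 256,
            "messages": [
                {"role": "user", "content": f"Summarize: {text}"}
            ],
        },
    }
    for i, text in enumerate(docs)
]

print(len(requests), requests[0]["custom_id"])  # → 3 summary-0
```

You would then submit the list via the SDK's batches endpoint (e.g. `client.messages.batches.create(requests=requests)`) and poll until the batch completes, paying half the standard per-token rates on every request in it.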
Practical Tips to Keep Costs Low
- Log token usage — every response includes usage.input_tokens and usage.output_tokens. Track them from day one.
- Use max_tokens — always set a reasonable limit to prevent runaway outputs.
- Cache aggressively — anything over 1,024 tokens that repeats is a caching candidate.
- Right-size your model — Haiku at $0.80/MTok handles most classification, extraction, and short Q&A tasks perfectly well.
- Use Batch API for anything that doesn't need real-time responses.
- Compress history in long conversations — summarize older turns instead of passing them verbatim.
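The first tip is worth automating from day one. A minimal sketch of a running tracker; the `usage` argument stands in for the `response.usage` object returned by each API call, and the default rates are the Sonnet prices from the table above:

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Running totals of token usage across API calls."""
    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0

    def record(self, usage) -> None:
        # `usage` is the response.usage object from an API call
        self.input_tokens += usage.input_tokens
        self.output_tokens += usage.output_tokens
        self.calls += 1

    def cost(self, in_rate: float = 3.00, out_rate: float = 15.00) -> float:
        """Spend so far at per-MTok rates (Sonnet defaults)."""
        return (self.input_tokens * in_rate + self.output_tokens * out_rate) / 1_000_000
```

Call `tracker.record(response.usage)` after every request and log `tracker.cost()` periodically; cost regressions (a prompt that quietly doubled in size, a runaway retry loop) show up immediately instead of on the invoice.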
Bottom Line
Claude API pricing is straightforward and highly controllable. For most developer projects, you're looking at a few dollars a day at moderate scale, and considerably less with batching and caching. With prompt caching and the right model tier, costs stay low even as usage grows. The key is to measure from the start: token counts come back with every response, so there's no excuse not to track them.
Originally published at kalyna.pro