A friend of mine deployed a customer support chatbot using GPT-4o. Three days later: $340 in OpenAI charges. He had no idea where it came from. He thought a few thousand API calls would cost maybe $10.
That's the LLM API cost trap — and it gets almost every developer the first time, because nobody actually teaches you the math before you ship.
This article fixes that. We'll cover:
- What tokens actually are (the explanation that's actually useful)
- Why input and output tokens are priced differently — and why it matters
- A side-by-side pricing table for GPT-4o, Claude, Gemini, and others (mid-2026)
- The exact formula to estimate your monthly bill before you deploy
- 5 silent cost killers in production AI apps
🧮 If you just want the number fast: LLM API Cost Calculator — free, no signup, runs in your browser.
What Is a Token? (The Version That Actually Helps)
Every article says "a token is ~4 characters or 0.75 words." Technically true. Practically useless.
Here's what you actually need to know: a token is the smallest chunk of text an LLM processes. The tokenizer splits your text using a vocabulary of ~100,000 patterns. Common words are usually 1 token. Rare or long words split into multiple tokens.
Real examples:
"Hello" → 1 token
"Hello world" → 2 tokens
"internationalization" → 4 tokens
{"name": "Muhammad"} → ~7 tokens
A 500-word article → ~650–700 tokens
Why does this matter for cost? Every API call charges you for:
- Every token you send (prompt + conversation history + system prompt)
- Every token the model generates back
That friend with the $340 bill? He was passing the full 20-message conversation history on every single turn. By message 20, each API call was using 4,000+ tokens in context before the model even started replying.
Input Tokens vs Output Tokens — The Pricing Split
This is the distinction most developers miss and it costs them the most money.
Providers split pricing into:
- Input tokens — everything you send to the model
- Output tokens — everything the model generates back
Output tokens are almost always 3–5x more expensive than input tokens. Because generating a token requires the model to run a full forward pass for every single character it produces (autoregressive generation). Reading input is one pass. Writing output is N passes.
Practical impact:
| Prompt Style | Input Tokens | Output Tokens | Cost Ratio |
|---|---|---|---|
| "Summarize in 3 bullets" | 850 | 120 | Low output cost |
| "Write a detailed analysis" | 850 | 600 | 5x more output cost |
Same input. Radically different bill. At 10,000 calls/month, that's hundreds of dollars difference from one word in your prompt.
The lever most developers ignore: control output length with max_tokens, not just prompt length.
LLM API Pricing Table — Mid-2026
⚠️ Pricing changes frequently. Always verify at openai.com/api/pricing and anthropic.com/pricing.
| Model | Input / 1M tokens | Output / 1M tokens | Context |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 128K |
| GPT-4o mini | $0.15 | $0.60 | 128K |
| Claude Sonnet 4 | $3.00 | $15.00 | 200K |
| Claude Haiku 4.5 | $0.80 | $4.00 | 200K |
| Gemini 1.5 Pro | $1.25 | $5.00 | 1M+ |
| Gemini 1.5 Flash | $0.075 | $0.30 | 1M+ |
| Llama 3.3 70B (Groq) | $0.59 | $0.79 | 128K |
GPT-4o mini output tokens are 16x cheaper than GPT-4o. For classification, routing, or simple Q&A — this is the switch that changes unit economics entirely.
The Formula
Cost per call =
(input_tokens / 1_000_000 × input_rate) +
(output_tokens / 1_000_000 × output_rate)
Worked example — Document summarizer on Claude Sonnet 4:
Document: 3,000 tokens (input)
System prompt: 200 tokens (input)
Summary output: 400 tokens (output)
Cost per call:
Input: (3,200 / 1,000,000) × $3.00 = $0.0096
Output: (400 / 1,000,000) × $15.00 = $0.0060
Total: $0.0156
Monthly (5,000 summaries): $78
Now add conversation history — context grows to 8,000 tokens per call → $195/month. Switch to flagship model → $600+. The math compounds fast.
How to Count Tokens Before You Send
Don't guess — count. Here's how for each provider:
OpenAI — tiktoken:
import tiktoken
encoder = tiktoken.encoding_for_model("gpt-4o")
tokens = encoder.encode("Your prompt text here")
print(f"Token count: {len(tokens)}")
Install: pip install tiktoken. Runs locally, no API call needed. Full docs on GitHub.
Claude — token counting endpoint:
// No actual generation — just counts
const response = await anthropic.messages.countTokens({
model: "claude-sonnet-4-6",
messages: [{ role: "user", content: yourPrompt }],
})
console.log(response.input_tokens)
See Anthropic's token counting docs for tool use + system prompt edge cases.
Quick estimate (English text only):
- Characters ÷ 4 ≈ tokens
- Words ÷ 0.75 ≈ tokens
- Accuracy drops 20–40% for non-Latin scripts (Arabic, Hindi, Chinese)
5 Mistakes Silently Inflating Your Bill
These show up in almost every AI app I've reviewed. Fix them and you'll typically cut costs 40–60%.
1. Sending Full Conversation History Every Turn
Each turn adds more input tokens. By turn 20 in a chat, you're paying for 19 previous exchanges you already paid for. Implement a sliding window — keep last N turns only, or summarize old context.
2. Bloated System Prompts
A 2,000-token system prompt sent with every call = 100M tokens of overhead per day at 50k requests. Cut ruthlessly. Every sentence needs to earn its place.
3. No max_tokens Set
Without a ceiling, the model will be verbose. For classification tasks: 50–100 tokens. For summaries: 200–400 tokens. Always set this.
4. Flagship Model for Everything
Is your email categorization task worth 16x the cost of GPT-4o mini? Route simple tasks to cheaper models. Reserve GPT-4o / Claude Sonnet for tasks that actually need it. Most teams see 60–70% cost reduction from this one change.
5. Not Using Prompt Caching
If you're sending the same large reference document or knowledge base with every request, you're overpaying. Both Anthropic and OpenAI offer prompt caching in 2026. Anthropic's implementation can save up to 90% on cached input tokens.
Real-World Monthly Cost Estimates
| App Type | Setup | Monthly Cost |
|---|---|---|
| Support chatbot (2k conversations/day) | GPT-4o mini, 8 turns avg | ~$18–25 |
| Same chatbot | GPT-4o | ~$280–350 |
| Code review assistant (500 PRs) | Claude Sonnet 4 | ~$23 |
| Doc summarizer (10k docs) | GPT-4o mini | ~$18–22 |
| Content generator (1k articles) | GPT-4o | ~$263 |
Pattern: output-heavy tasks + expensive models = highest cost. Build your own estimate with the LLM API Cost Calculator — plug in your numbers and it gives you the monthly projection instantly.
Monitoring in Production
Cost dashboard alone isn't enough — you find out after the damage. Set up:
- OpenAI: Hard spend limits + soft alert thresholds in account settings
- Anthropic: Usage API for daily spend data + console budget alerts (available 2026)
-
App-level: Log
input_tokensandoutput_tokensfrom every API response into your own DB - Per-user limits: Rate limits or credit systems at application layer — don't let a single user's session spike your bill
Treat LLM API cost like database query cost. You wouldn't ship a query without understanding its performance profile.
Quick Reference
Token estimate: word_count / 0.75 OR char_count / 4
Cost per call: (in_tok/1M × in_rate) + (out_tok/1M × out_rate)
Biggest cost lever: max_tokens ceiling + model routing
Best cheap models: GPT-4o mini ($0.60/1M out), Gemini Flash ($0.30/1M out)
Free calculator: webtoolshub.online/tools/llm-api-cost-calculator
The math isn't complicated once you know where to look. The $340 chatbot bill wasn't a pricing problem — it was a context management problem. Now you know what to check before you deploy.
What's your biggest AI API cost optimization? Drop it in the comments — always curious what's working for people in production.

Top comments (0)