Muhammad Awais

Posted on Jun 16 • Originally published at webtoolshub.online

How to Calculate LLM API Cost: Tokens, Pricing & the Formula Every Dev Needs (2026)

#ai #llm #javascript #webdev

A friend of mine deployed a customer support chatbot using GPT-4o. Three days later: $340 in OpenAI charges. He had no idea where it came from. He thought a few thousand API calls would cost maybe $10.

That's the LLM API cost trap — and it gets almost every developer the first time, because nobody actually teaches you the math before you ship.

This article fixes that. We'll cover:

What tokens actually are (the explanation that's actually useful)
Why input and output tokens are priced differently — and why it matters
A side-by-side pricing table for GPT-4o, Claude, Gemini, and others (mid-2026)
The exact formula to estimate your monthly bill before you deploy
5 silent cost killers in production AI apps

🧮 If you just want the number fast: LLM API Cost Calculator — free, no signup, runs in your browser.

What Is a Token? (The Version That Actually Helps)

Every article says "a token is ~4 characters or 0.75 words." Technically true. Practically useless.

Here's what you actually need to know: a token is the smallest chunk of text an LLM processes. The tokenizer splits your text using a vocabulary of ~100,000 patterns. Common words are usually 1 token. Rare or long words split into multiple tokens.

Real examples:

"Hello"              → 1 token
"Hello world"        → 2 tokens
"internationalization" → 4 tokens
{"name": "Muhammad"} → ~7 tokens
A 500-word article   → ~650–700 tokens

Why does this matter for cost? Every API call charges you for:

Every token you send (prompt + conversation history + system prompt)
Every token the model generates back

That friend with the $340 bill? He was passing the full 20-message conversation history on every single turn. By message 20, each API call was using 4,000+ tokens in context before the model even started replying.

Input Tokens vs Output Tokens — The Pricing Split

This is the distinction most developers miss and it costs them the most money.

Providers split pricing into:

Input tokens — everything you send to the model
Output tokens — everything the model generates back

Output tokens are almost always 3–5x more expensive than input tokens. Because generating a token requires the model to run a full forward pass for every single character it produces (autoregressive generation). Reading input is one pass. Writing output is N passes.

Practical impact:

Prompt Style	Input Tokens	Output Tokens	Cost Ratio
"Summarize in 3 bullets"	850	120	Low output cost
"Write a detailed analysis"	850	600	5x more output cost

Same input. Radically different bill. At 10,000 calls/month, that's hundreds of dollars difference from one word in your prompt.

The lever most developers ignore: control output length with max_tokens, not just prompt length.

LLM API Pricing Table — Mid-2026

⚠️ Pricing changes frequently. Always verify at openai.com/api/pricing and anthropic.com/pricing.

Model	Input / 1M tokens	Output / 1M tokens	Context
GPT-4o	$2.50	$10.00	128K
GPT-4o mini	$0.15	$0.60	128K
Claude Sonnet 4	$3.00	$15.00	200K
Claude Haiku 4.5	$0.80	$4.00	200K
Gemini 1.5 Pro	$1.25	$5.00	1M+
Gemini 1.5 Flash	$0.075	$0.30	1M+
Llama 3.3 70B (Groq)	$0.59	$0.79	128K

GPT-4o mini output tokens are 16x cheaper than GPT-4o. For classification, routing, or simple Q&A — this is the switch that changes unit economics entirely.

The Formula

Cost per call =
  (input_tokens  / 1_000_000 × input_rate) +
  (output_tokens / 1_000_000 × output_rate)

Worked example — Document summarizer on Claude Sonnet 4:

Document:      3,000 tokens  (input)
System prompt:   200 tokens  (input)
Summary output:  400 tokens  (output)

Cost per call:
  Input:  (3,200 / 1,000,000) × $3.00  = $0.0096
  Output: (400   / 1,000,000) × $15.00 = $0.0060
  Total:                                  $0.0156

Monthly (5,000 summaries): $78

Now add conversation history — context grows to 8,000 tokens per call → $195/month. Switch to flagship model → $600+. The math compounds fast.

How to Count Tokens Before You Send

Don't guess — count. Here's how for each provider:

OpenAI — tiktoken:

import tiktoken

encoder = tiktoken.encoding_for_model("gpt-4o")
tokens = encoder.encode("Your prompt text here")
print(f"Token count: {len(tokens)}")

Install: pip install tiktoken. Runs locally, no API call needed. Full docs on GitHub.

Claude — token counting endpoint:

// No actual generation — just counts
const response = await anthropic.messages.countTokens({
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: yourPrompt }],
})
console.log(response.input_tokens)

See Anthropic's token counting docs for tool use + system prompt edge cases.

Quick estimate (English text only):

Characters ÷ 4 ≈ tokens
Words ÷ 0.75 ≈ tokens
Accuracy drops 20–40% for non-Latin scripts (Arabic, Hindi, Chinese)

5 Mistakes Silently Inflating Your Bill

These show up in almost every AI app I've reviewed. Fix them and you'll typically cut costs 40–60%.

1. Sending Full Conversation History Every Turn

Each turn adds more input tokens. By turn 20 in a chat, you're paying for 19 previous exchanges you already paid for. Implement a sliding window — keep last N turns only, or summarize old context.

2. Bloated System Prompts

A 2,000-token system prompt sent with every call = 100M tokens of overhead per day at 50k requests. Cut ruthlessly. Every sentence needs to earn its place.

3. No `max_tokens` Set

Without a ceiling, the model will be verbose. For classification tasks: 50–100 tokens. For summaries: 200–400 tokens. Always set this.

4. Flagship Model for Everything

Is your email categorization task worth 16x the cost of GPT-4o mini? Route simple tasks to cheaper models. Reserve GPT-4o / Claude Sonnet for tasks that actually need it. Most teams see 60–70% cost reduction from this one change.

5. Not Using Prompt Caching

If you're sending the same large reference document or knowledge base with every request, you're overpaying. Both Anthropic and OpenAI offer prompt caching in 2026. Anthropic's implementation can save up to 90% on cached input tokens.

Real-World Monthly Cost Estimates

App Type	Setup	Monthly Cost
Support chatbot (2k conversations/day)	GPT-4o mini, 8 turns avg	~$18–25
Same chatbot	GPT-4o	~$280–350
Code review assistant (500 PRs)	Claude Sonnet 4	~$23
Doc summarizer (10k docs)	GPT-4o mini	~$18–22
Content generator (1k articles)	GPT-4o	~$263

Pattern: output-heavy tasks + expensive models = highest cost. Build your own estimate with the LLM API Cost Calculator — plug in your numbers and it gives you the monthly projection instantly.

Monitoring in Production

Cost dashboard alone isn't enough — you find out after the damage. Set up:

OpenAI: Hard spend limits + soft alert thresholds in account settings
Anthropic: Usage API for daily spend data + console budget alerts (available 2026)
App-level: Log input_tokens and output_tokens from every API response into your own DB
Per-user limits: Rate limits or credit systems at application layer — don't let a single user's session spike your bill

Treat LLM API cost like database query cost. You wouldn't ship a query without understanding its performance profile.

Quick Reference

Token estimate:     word_count / 0.75  OR  char_count / 4
Cost per call:      (in_tok/1M × in_rate) + (out_tok/1M × out_rate)
Biggest cost lever: max_tokens ceiling + model routing
Best cheap models:  GPT-4o mini ($0.60/1M out), Gemini Flash ($0.30/1M out)
Free calculator:    webtoolshub.online/tools/llm-api-cost-calculator

The math isn't complicated once you know where to look. The $340 chatbot bill wasn't a pricing problem — it was a context management problem. Now you know what to check before you deploy.

What's your biggest AI API cost optimization? Drop it in the comments — always curious what's working for people in production.

DEV Community

How to Calculate LLM API Cost: Tokens, Pricing & the Formula Every Dev Needs (2026)

What Is a Token? (The Version That Actually Helps)

Input Tokens vs Output Tokens — The Pricing Split

LLM API Pricing Table — Mid-2026

The Formula

How to Count Tokens Before You Send

5 Mistakes Silently Inflating Your Bill

1. Sending Full Conversation History Every Turn

2. Bloated System Prompts

3. No `max_tokens` Set

4. Flagship Model for Everything

5. Not Using Prompt Caching

Real-World Monthly Cost Estimates

Monitoring in Production

Quick Reference

Top comments (0)

What Is a Token? (The Version That Actually Helps)

Input Tokens vs Output Tokens — The Pricing Split

LLM API Pricing Table — Mid-2026

The Formula

How to Count Tokens Before You Send

5 Mistakes Silently Inflating Your Bill

1. Sending Full Conversation History Every Turn

2. Bloated System Prompts

3. No max_tokens Set

4. Flagship Model for Everything

5. Not Using Prompt Caching

Real-World Monthly Cost Estimates

Monitoring in Production

Quick Reference

3. No `max_tokens` Set