I Updated My LLM Token Counter to Include Every Major 2026 Model (with Real Prices)

#webdev #ai #showdev #tools

Most token counter tools I find online are still listing GPT-4 Turbo and Gemini 1.5 Pro. Meanwhile I'm pricing DeepSeek V3, Claude 3.7, and o3 calls and getting wildly wrong estimates.

So I updated mine. Here's what's in it now and why I built it this way.

What changed

The tool now covers 20 models across 7 providers:

Provider	Models
OpenAI	GPT-4o, GPT-4o mini, GPT-4.1, GPT-4.1 mini, o3, o3-mini, o4-mini
Anthropic	Claude 3.7 Sonnet, Claude 3.5 Sonnet, Claude 3.5 Haiku
Google	Gemini 2.0 Flash, Gemini 2.0 Flash Lite, Gemini 2.5 Pro
DeepSeek	DeepSeek V3, DeepSeek R1
Meta	Llama 3.3 70B, Llama 3.1 405B
Mistral	Mistral Large, Mistral Small
Alibaba	Qwen 2.5 72B

Prices are from official docs as of March 2026.

The "output multiplier" problem

Every token counter I've seen hardcodes output = 2× input for cost estimation.

That's reasonable for chatbots. It's completely wrong for:

Summarization: output is usually 0.2–0.5× input
Code generation: can be 3–5× input
RAG answers: typically 0.3× input
Chain-of-thought reasoning (o3, R1): can be 10× input

So I added a slider: output multiplier 0.5× to 10×, default 2×. Drag it to match your actual use case before looking at cost estimates.

Why browser-only matters more than I expected

I originally built this as browser-only because I didn't want to run a backend. Turns out that's actually a feature people care about.

A few people have told me they paste actual production prompts into token counters — prompts that contain API keys, customer data, or internal system prompts. If the counter is server-side, that's a data leak waiting to happen. No-backend isn't just a technical choice, it's a trust signal.

The tradeoff is accuracy. Real tokenization requires the actual tokenizer library (tiktoken, sentencepiece, etc.). I'm using a BPE approximation that's ±5-10% accurate for English text. For CJK and code it can drift more.

I'm transparent about this in the UI: the token visualization is labeled "approximate, BPE-style" and there's a note about verifying with the SDK for production.

The provider filter

With 20 models, the grid was getting unmanageable. I added a filter bar: All / OpenAI / Anthropic / Google / DeepSeek / Meta / Mistral / Alibaba.

Selecting a provider filters both the model cards and the context window bars (but the cost comparison table still shows all models — you want to compare across providers when deciding which model to use).

Interesting price observations

A few things that surprised me when updating prices:

DeepSeek V3 at $0.27/1M input is cheaper than GPT-4o mini ($0.15/1M input) if you're doing longer context work. But for tiny calls, GPT-4o mini wins. Context length × volume matters more than headline price.

Gemini 2.0 Flash Lite ($0.075/1M) is the cheapest capable model on the list. If you're building high-volume pipelines where quality doesn't need to be frontier-level, it's hard to argue against.

o3 at $10/1M input + $40/1M output is genuinely expensive. A 50K token chain-of-thought response costs ~$2. For agentic loops that think before acting, the output multiplier slider matters a lot — set it to 5-8× for reasoning models.