I Built a Token Counter That Works Offline — 19 Models, File Drop, Cost Estimator

#ai #showdev #webdev #llm

title: "I Built a Token Counter That Works Offline — 19 Models, File Drop, Cost Estimator"
published: true
description: "A free, 100% client-side token counter for GPT-4.1, Claude 3.7, Gemini 2.5 Pro, DeepSeek R1, and 15 more models. No API key. No server. Drop a file and see your token count instantly."
tags: showdev, webdev, ai, llm

canonical_url: https://citriac.github.io/token-counter.html

Every time I start writing a prompt, I hit the same friction: how many tokens is this?

I'm building LLM-powered stuff on a 2014 MacBook (yes, really), and I don't want to:

Send my prompt text to some third-party server
Open the OpenAI playground just to count tokens
Write Python every time I want a quick estimate

So I built Token Counter 2026 — a fully offline, zero-dependency token counter that runs in your browser.

What it does

19 models with real pricing (March 2026):

OpenAI: GPT-4o, GPT-4.1, GPT-4.1 mini, o3, o3-mini, o4-mini
Anthropic: Claude 3.7 Sonnet, Claude 3.5 Sonnet, Claude 3.5 Haiku
Google: Gemini 2.5 Pro, Gemini 2.0 Flash, Gemini 2.0 Flash Lite
DeepSeek: V3, R1
Meta: Llama 3.3 70B, Llama 3.1 405B
Mistral: Large, Small
Alibaba: Qwen 2.5 72B

Features:

📊 Token count, character count, words, chars-per-token ratio
💰 Cost estimator with adjustable output multiplier (0.5× to 10×)
📐 Context window usage bars — see instantly if you're approaching the limit
🎨 Token visualization — colored chips showing how your text gets tokenized
📂 Drop any file (.txt, .md, .json, .py, .js, etc.) to count immediately
📋 Paste from clipboard with one click
🔒 100% local — nothing sent anywhere, works offline

Why "approximate" tokenization?

The honest answer: exact tokenization requires running tiktoken (OpenAI) or sentencepiece in the browser, which means WASM bundles and complexity. For most use cases — estimating whether your prompt fits a context window, comparing model costs — a ±5% approximation is plenty.

The tool uses a BPE-style heuristic that handles:

English text (accurate to within ~5%)
CJK characters (counted separately, as they typically tokenize as 1-2 tokens each)
Code (handles whitespace and symbol splitting)
JSON and structured data

For production use, always verify with the official SDK. For planning and estimation, this is fast enough.

Real use case: comparing prompt costs

I was planning a content pipeline that calls GPT-4o-mini on every article. With 2,000-word articles:

System prompt: ~300 tokens
Article text:  ~2,500 tokens
Total input:   ~2,800 tokens
Output:        ~500 tokens (summary + tags)

At GPT-4o-mini pricing ($0.15/1M input, $0.60/1M output):

Input: $0.00042
Output: $0.00030
Total: $0.00072 per article

For 1,000 articles/month: $0.72. Worth it, no need to optimize.

The tool makes this comparison across all 19 models instantly — I can see that Gemini 2.0 Flash Lite would cost 1/8th as much, while DeepSeek V3 is roughly 10× cheaper than GPT-4o.

The context window bars are underrated

My favorite feature: the context usage bars for all filtered models update in real time as you type. When I'm building an agent with a long system prompt, I can immediately see:

"This prompt uses 8.3% of GPT-4.1's 1M token window, but 42% of Mistral Small's 32K window"

That kind of cross-model awareness is genuinely useful when choosing between providers.

Under the hood

Pure HTML/CSS/JS. No build step, no framework, no npm. The whole thing is one 900-line HTML file.

The tokenizer is a BPE approximation implemented in ~60 lines:

Split on whitespace, punctuation, CJK boundaries
Chunks longer words into ~4-character pieces (mimicking cl100k's typical subword lengths)
Model-specific adjustments: Gemini gets ~10% fewer tokens (more aggressive merging), Llama 3 gets ~5% more

It's not perfect. It's also instant, offline, and doesn't require loading a 5MB WASM binary.