Stop guessing your AI API bill: a quick guide to token cost math

#ai #api #tutorial #webdev

You can ship an LLM feature in an afternoon. Figuring out what it costs to run usually happens later, when the invoice shows up and someone asks why. A few minutes of token math up front avoids most of that.

Here is how the pricing works and how to estimate it.

Tokens, not words

Providers bill per token, not per word or per request. A token is roughly 4 characters of English text. By that rule of thumb, "Hello world" is about 3 tokens and 750 words comes to about 1,000 tokens. Input and output are billed separately, and output is almost always the pricier side.
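Those rules of thumb fit in a few lines of Python. This is a rough sketch, not a tokenizer. The helper names `tokens_from_chars` and `tokens_from_words` are hypothetical, and the ratios only hold for ordinary English prose, so a real tokenizer may count "Hello world" differently.

```python
import math

def tokens_from_chars(text: str) -> int:
    """Rough estimate for English prose: about 4 characters per token."""
    return math.ceil(len(text) / 4)

def tokens_from_words(word_count: int) -> int:
    """Rough estimate from a word count: about 1,000 tokens per 750 words (4 per 3)."""
    return math.ceil(word_count * 4 / 3)

print(tokens_from_chars("Hello world"))  # 11 characters -> 3
print(tokens_from_words(750))            # -> 1000
```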

At the time of writing, GPT-4o is $2.50 per million input tokens and $10.00 per million output tokens. That 4x gap is the part people underestimate once responses get long.

The formula

Per request, the cost is:

cost = (input_tokens / 1M * input_price) + (output_tokens / 1M * output_price)

Multiply by monthly volume and you have the bill.
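Here is the same formula as a small Python sketch. The function names are hypothetical. Prices are passed in as dollars per million tokens, which matches how providers quote them.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Dollar cost of one request; prices are dollars per 1M tokens."""
    return (input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price)

def monthly_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float,
                 requests_per_month: int) -> float:
    """Per-request cost multiplied by monthly volume."""
    return request_cost(input_tokens, output_tokens,
                        input_price, output_price) * requests_per_month

# 800 input / 400 output tokens, GPT-4o prices, 50,000 requests a month
print(f"${monthly_cost(800, 400, 2.50, 10.00, 50_000):,.2f}")  # $300.00
```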

Take a support bot: 800 input tokens (system prompt plus the user message) and 400 output tokens per reply, 50,000 requests a month, on GPT-4o.

Input: 800 x 50,000 = 40M tokens x $2.50/M = $100
Output: 400 x 50,000 = 20M tokens x $10.00/M = $200
Total: $300/month

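Run the same workload on GPT-4.1 Mini ($0.40 input and $1.60 output per million tokens at the time of writing) and the bill drops to about $48/month, roughly a sixth of the GPT-4o figure. That one comparison is often what decides the model.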
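A price table turns the model comparison into a loop instead of a spreadsheet. This sketch assumes GPT-4.1 Mini's list prices of $0.40 input and $1.60 output per million tokens at the time of writing. Prices change, so treat the table as placeholder data and refresh it from the provider's pricing page.

```python
# Dollars per 1M tokens as (input, output). Assumed list prices at the time
# of writing; check the provider's pricing page before relying on them.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4.1-mini": (0.40, 1.60),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int,
                 requests_per_month: int) -> float:
    """Monthly dollar cost for a fixed per-request token profile."""
    input_price, output_price = PRICES[model]
    total_in = input_tokens * requests_per_month
    total_out = output_tokens * requests_per_month
    return total_in / 1_000_000 * input_price + total_out / 1_000_000 * output_price

# Same support-bot profile: 800 in, 400 out, 50,000 requests a month.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 800, 400, 50_000):,.2f}/month")
```

Adding a model is one more dictionary entry, and the rest of the comparison stays the same.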

Where it goes wrong

Three things bite people repeatedly:

The system prompt counts every time. A 600-token system prompt isn't a one-time cost. You pay for it on every single request. Trim it.
Output is the expensive half. Setting max_tokens sensibly is the cheapest optimization there is.
Words lie. Code, JSON, and non-English text tokenize very differently from prose. Count real tokens; don't eyeball word counts.
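The first two pitfalls are easy to put numbers on. This sketch makes several assumptions: the 600-token system prompt from the list, 50,000 requests a month on GPT-4o, a max_tokens cap of 400, and one completion per request. `per_month` is a hypothetical helper.

```python
def per_month(tokens_per_request: int, requests_per_month: int,
              price_per_million: float) -> float:
    """Monthly dollars for a fixed number of tokens on every request."""
    return tokens_per_request * requests_per_month / 1_000_000 * price_per_million

# Assumed workload: 50,000 requests/month on GPT-4o ($2.50 in, $10.00 out per 1M).
REQUESTS = 50_000
GPT4O_INPUT, GPT4O_OUTPUT = 2.50, 10.00

# The system prompt is re-sent, and re-billed, on every request.
system_prompt = per_month(600, REQUESTS, GPT4O_INPUT)
# With one completion per request, max_tokens=400 caps the output bill.
output_ceiling = per_month(400, REQUESTS, GPT4O_OUTPUT)

print(f"system prompt alone: ${system_prompt:,.2f}/month")            # $75.00
print(f"output ceiling at max_tokens=400: ${output_ceiling:,.2f}/month")  # $200.00
```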

Tools that do the math

I got tired of redoing this per model, so I've been using Vortenza's free AI calculators. The OpenAI API Cost Calculator lets you pick a model and drop in your tokens and monthly volume. There's a Claude API Cost Calculator for Anthropic models, and an AI Token Counter for when you want the actual token count of an input instead of a guess. No signup, runs in the browser.

The calculator isn't really the point, though. The point is doing the estimate while you're still designing the feature. Cost is a design constraint, same as latency. Treat it like one and the invoice stops being a surprise.
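One way to treat cost as a design constraint is to invert the formula: start from a monthly budget and ask how many requests it buys. `max_monthly_requests` is a hypothetical helper. It uses Python's `decimal` module with string inputs, so a budget that divides evenly isn't undercounted by float rounding.

```python
from decimal import Decimal

def max_monthly_requests(budget: str, input_tokens: int, output_tokens: int,
                         input_price: str, output_price: str) -> int:
    """Requests that fit in a monthly budget; prices are dollars per 1M tokens.

    Values are passed as strings so Decimal represents them exactly.
    """
    per_request = (Decimal(input_tokens) * Decimal(input_price)
                   + Decimal(output_tokens) * Decimal(output_price)) / Decimal(1_000_000)
    return int(Decimal(budget) // per_request)

print(max_monthly_requests("300", 800, 400, "2.50", "10.00"))  # 50000
```

With the support-bot profile, a $300 budget buys 50,000 GPT-4o requests, which matches the forward calculation.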
