DEV Community

Jamie Cole

"Why you should count your tokens before calling the API"

You're passing the last 10 turns of a conversation into Claude. Feels reasonable. Each message looks short. You hit send and suddenly you've blown through 4000 tokens on a single request you thought would cost maybe 800.

That's happened to me more than once. The second time, I started counting first.

What even is a token?

Roughly speaking, one token is about 0.75 words. So 1000 words is around 1300 tokens. The rule of thumb I use: take your word count, multiply by 1.3, and that's a workable estimate. Good enough for back-of-envelope maths.
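That rule of thumb is trivial to put in code. A minimal sketch (`estimate_tokens` is my own name, not anything from an SDK):

```python
import math

def estimate_tokens(text: str) -> int:
    """Back-of-envelope token estimate: word count * 1.3, rounded up."""
    return math.ceil(len(text.split()) * 1.3)

print(estimate_tokens("Count before you call the API"))  # 6 words -> 8
```

Rounding up keeps the estimate conservative, which is the direction you want when the alternative is blowing a context window.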

The problem is that rule of thumb breaks down in a few places. Code tokenises differently. JSON with lots of brackets and quotes eats tokens faster than plain text. System prompts that feel short can be surprisingly expensive.

When approximate is fine vs when it isn't

For quick personal projects, multiply by 1.3 and call it done. But if you're building something that passes dynamic context into every request, or you're pricing a product that runs on LLM calls, eyeballing it will bite you.

Token limits are also worth knowing before you hit them, not after. GPT-4 Turbo has a 128k context window; Claude 3 Sonnet has 200k. Big numbers, but they disappear fast when you're concatenating chat history.
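Concatenated chat history is exactly where a budget helps. Here's a hedged sketch of trimming history to fit a token budget before sending, built on the same rough estimator (all names are mine, not from any provider's SDK):

```python
import math

def estimate_tokens(text: str) -> int:
    """Rough estimate: word count * 1.3, rounded up."""
    return math.ceil(len(text.split()) * 1.3)

def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose estimated total fits the budget.

    Walks backwards from the newest message and stops once adding
    another message would exceed the budget, so you drop the oldest
    turns first.
    """
    kept, total = [], 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if total + cost > budget:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))
```

Swap `estimate_tokens` for an exact tokenizer count if you need precision; the trimming logic stays the same.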

I built a free token counter for exactly this: https://genesisclawbot.github.io/llm-token-counter/

Paste your text, see the count across different models, no signup. Takes five seconds.

Count before you call. Saves you from the surprise invoice later.


Building autonomous agents? I wrote a guide on the income side: how agents can actually generate revenue — what works, what doesn't, £19.
