Tokens: Why ChatGPT Can't Count the R's in 'Strawberry'

#llm #ai #machinelearning #beginners

You see words. A language model sees tokens — chunks of text, usually a few characters each. Everything starts here. Day 2 of my AIFromZero series.

Text gets shattered into tokens

"unbelievable" → ["un", "bel", "iev", "able"]   (4 tokens, not 1 word, not 12 letters; the exact split varies by tokenizer)

Before any "thinking" happens, your text is chopped into tokens, and each token is looked up in a fixed vocabulary and replaced by an integer ID. Those numbers are what the model actually processes.
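Here's a minimal sketch of the chop-then-number step. It uses greedy longest-match over a tiny made-up vocabulary (`VOCAB` and `tokenize` are hypothetical names). That's a simplification: GPT-style tokenizers use byte-pair encoding (BPE), which builds pieces from learned merges, but the output has the same shape: a list of pieces and a list of integer IDs.

```python
# Hypothetical toy vocabulary: piece -> integer ID. Real tokenizers learn
# on the order of 100k pieces from data; these four entries are made up.
VOCAB = {"un": 0, "bel": 1, "iev": 2, "able": 3}


def tokenize(text: str, vocab: dict[str, int]) -> list[str]:
    """Greedy longest match: at each position, take the longest piece in the vocab."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {i}")
    return pieces


pieces = tokenize("unbelievable", VOCAB)
ids = [VOCAB[p] for p in pieces]  # the numbers the model actually receives
print(pieces)  # ['un', 'bel', 'iev', 'able']
print(ids)     # [0, 1, 2, 3]
```

Greedy longest match is closer to WordPiece-style tokenizers than to BPE; it's used here only because it fits in a few lines.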

Why not words or letters?

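Letters: too fine. Sequences get several times longer (more compute, less text fits in the context window), and the model has to rebuild every word from individual letters.
Whole words: too many. There are millions, and any word missing from the vocabulary (a typo, a new name) can't be represented at all.
Subword tokens: the sweet spot. Common words get 1 token; rare words split into reusable pieces. A fixed vocabulary of roughly 100k pieces (GPT-4's tokenizer is about that size) can encode any text, because anything unfamiliar falls back to smaller pieces, down to individual bytes.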
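To make the trade-off concrete, here's a toy comparison on a sentence with a typo. The vocabularies and function names are hypothetical, and the subword splitter is the same greedy longest-match simplification, not a real BPE tokenizer. The point is the failure modes: the word vocabulary loses the typo to `<unk>`, letters never fail but give the longest sequence, and subwords fall back to smaller pieces.

```python
# Hypothetical toy vocabularies, made up for illustration.
WORD_VOCAB = {"the", "cat", "is", "unbelievable"}
LETTERS = set("abcdefghijklmnopqrstuvwxyz")
SUBWORD_VOCAB = {"the", "cat", "is", "un", "bel", "iev", "able"} | LETTERS


def char_tokens(text: str) -> list[str]:
    # One token per non-space character: never unknown, but long sequences.
    return [c for c in text if c != " "]


def word_tokens(text: str) -> list[str]:
    # One token per word; anything outside the vocabulary collapses to <unk>.
    return [w if w in WORD_VOCAB else "<unk>" for w in text.split()]


def subword_tokens(text: str) -> list[str]:
    # Greedy longest match within each word; single letters are the fallback,
    # so any lowercase word can still be spelled out in smaller pieces.
    pieces = []
    for word in text.split():
        i = 0
        while i < len(word):
            for j in range(len(word), i, -1):
                if word[i:j] in SUBWORD_VOCAB:
                    pieces.append(word[i:j])
                    i = j
                    break
            else:
                raise ValueError(f"cannot tokenize {word!r} at position {i}")
    return pieces


text = "the cat is unbelieveable"  # note the typo
print(len(char_tokens(text)))  # 21
print(word_tokens(text))       # ['the', 'cat', 'is', '<unk>']
print(subword_tokens(text))    # ['the', 'cat', 'is', 'un', 'bel', 'iev', 'e', 'able']
```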

The ~4-chars rule (and why it costs you)

In English, ~4 characters ≈ 1 token, or ~0.75 words per token (about 1.3 tokens per word). This is how everything is priced and limited:

API bills are per token, counting both the prompt (input) and the reply (output); output tokens are often priced higher.
A "context window" (how much it can read at once) is measured in tokens — 1,000 tokens ≈ 750 words.
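To see how the rule of thumb turns into money and limits, here's a back-of-the-envelope calculator. It's a sketch under loud assumptions: the prices and the 128k window are made-up placeholders (check your provider's pricing and model docs), the helper names are hypothetical, and a real app would count tokens with the model's actual tokenizer instead of dividing by 4.

```python
# Rough English-only rules of thumb; real counts depend on the tokenizer.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75

# Hypothetical prices (USD per 1M tokens) and context limit, for illustration only.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 10.00
CONTEXT_WINDOW = 128_000


def estimate_tokens(text: str) -> float:
    """Estimate tokens from character count (the ~4-chars rule)."""
    return len(text) / CHARS_PER_TOKEN


def words_that_fit(tokens: int) -> float:
    """Approximate English words in a token budget (~0.75 words per token)."""
    return tokens * WORDS_PER_TOKEN


def request_cost(prompt_tokens: int, reply_tokens: int) -> float:
    """Bill = input tokens at the input rate + output tokens at the output rate."""
    return (prompt_tokens * INPUT_PRICE_PER_M
            + reply_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def fits_in_context(prompt_tokens: int, max_reply_tokens: int) -> bool:
    # Assumes, as for many models, that prompt and reply share one window.
    return prompt_tokens + max_reply_tokens <= CONTEXT_WINDOW


prompt = "x" * 8_000  # stand-in for an 8,000-character English prompt
prompt_tokens = round(estimate_tokens(prompt))
print(prompt_tokens)                        # 2000
print(words_that_fit(1_000))                # 750.0
print(request_cost(prompt_tokens, 500))     # 0.01
print(fits_in_context(prompt_tokens, 500))  # True
```

Splitting input and output rates matters because a short prompt with a long reply can cost more than the reverse.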

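Verbose prompts burn tokens, and so does long chat history: chat APIs are typically stateless, so each new message resends the whole conversation so far. Concise prompting is a real cost lever.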
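Why does long chat history add up so fast? With a typical stateless chat API, each request carries the whole conversation so far. Here's a sketch that counts billed input tokens under that assumption, ignoring any prompt caching a provider may offer. The message sizes are made up, and `chat_input_tokens` is a hypothetical helper.

```python
def chat_input_tokens(exchanges: list[tuple[int, int]]) -> int:
    """Total input tokens billed over a chat, assuming each request resends the full history.

    exchanges: (user_message_tokens, reply_tokens) for each turn.
    """
    history = 0  # tokens already in the conversation
    billed = 0   # input tokens summed over every request
    for user, reply in exchanges:
        billed += history + user  # this request carries all prior turns plus the new message
        history += user + reply   # the reply becomes part of the history for the next turn
    return billed


turns = [(100, 200)] * 10  # hypothetical: 10 turns, 100-token questions, 200-token replies
print(chat_input_tokens(turns))              # 14500
print(chat_input_tokens([(100, 200)] * 20))  # 59000
```

Ten exchanges bill 14,500 input tokens even though you only typed 1,000. Doubling to 20 exchanges bills 59,000, about four times as much: under this assumption, cost grows roughly with the square of the conversation length.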

The strawberry problem

The model never sees s-t-r-a-w-b-e-r-r-y. It sees a couple of tokens, something like straw + berry (the exact split depends on the tokenizer), each arriving as a single ID. The individual letters are buried inside those tokens, so counting characters is genuinely hard for it. It's not dumb; it just doesn't read letters.
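Here's a tiny sketch of the model's-eye view. The two-token split and the IDs 1001 and 1002 are made up for illustration; a real tokenizer may split the word differently and will use different numbers.

```python
# Hypothetical two-token split with made-up IDs; real tokenizers differ.
VOCAB = {"straw": 1001, "berry": 1002}
ID_TO_PIECE = {token_id: piece for piece, token_id in VOCAB.items()}

word = "strawberry"
model_input = [VOCAB["straw"], VOCAB["berry"]]  # what the model actually receives

print(word.count("r"))  # 3: trivial when you can see the characters
print(model_input)      # [1001, 1002]: no letters anywhere in the input

# Counting letters from IDs requires knowing how each token is spelled.
# This lookup table makes that easy here; the model has no such table and
# has to pick up spellings indirectly from its training data.
spelled = "".join(ID_TO_PIECE[token_id] for token_id in model_input)
print(spelled.count("r"))  # 3
```

The counting itself is trivial either way; the hard part for the model is the spelling lookup in the middle.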