DEV Community

Devanshu Biswas

Tokens: Why ChatGPT Can't Count the R's in 'Strawberry'

You see words. A language model sees tokens — chunks of text, usually a few characters each. Everything starts here. Day 2 of my AIFromZero series.

Text gets shattered into tokens

"unbelievable" → ["un", "bel", "iev", "able"]   (4 tokens, not 1 word, not 12 letters)

Before any "thinking" happens, your text is chopped into tokens, and each token is mapped to an integer ID. Those IDs are what the model actually processes.
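To make the chopping step concrete, here is a minimal sketch of a tokenizer that splits text by greedy longest-match against a tiny vocabulary and then maps each piece to a number. The vocabulary, the IDs, and the function names `tokenize` and `encode` are all made up for illustration. Real tokenizers learn their vocabulary from data and typically use a different splitting algorithm, so treat this as a picture of the idea rather than how any particular model does it.

```python
# Hypothetical toy vocabulary: token string -> integer ID.
# The four pieces come from the "unbelievable" example; the IDs are invented.
TOY_VOCAB = {"un": 0, "bel": 1, "iev": 2, "able": 3}

# Single lowercase letters as a fallback, so any lowercase word can be split.
for offset, letter in enumerate("abcdefghijklmnopqrstuvwxyz"):
    TOY_VOCAB[letter] = 4 + offset


def tokenize(text, vocab=TOY_VOCAB):
    """Split text into vocabulary pieces, always taking the longest match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until one is in the vocab.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


def encode(text, vocab=TOY_VOCAB):
    """Turn text into the list of integer IDs a model would receive."""
    return [vocab[token] for token in tokenize(text, vocab)]


print(tokenize("unbelievable"))  # ['un', 'bel', 'iev', 'able']
print(encode("unbelievable"))    # [0, 1, 2, 3]
```

The model only ever gets the second list. The strings are there for humans.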

Why not words or letters?

  • Letters: too fine — the model would relearn spelling everywhere.
  • Whole words: too many — millions, plus every typo and name.
  • Subword tokens: the sweet spot. Common words = 1 token; rare words split into reusable pieces. A fixed vocabulary (roughly 100k tokens in many current models) can cover any text.
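The three options above can be compared side by side on one sentence. This is a rough sketch with hand-picked, hypothetical vocabularies (`WORD_VOCAB`, `SUBWORD_VOCAB`) and a simple longest-match splitter, chosen only to show the trade-off: letters give long sequences, whole words break on anything unseen, and subwords handle a new word by reusing known pieces.

```python
import string

# Hypothetical vocabularies, picked by hand for this illustration.
WORD_VOCAB = {"the", "cat", "is", "happy"}
SUBWORD_VOCAB = WORD_VOCAB | {"un", "ness"} | set(string.ascii_lowercase)


def by_letters(text):
    """One unit per letter (spaces dropped)."""
    return [ch for ch in text if ch != " "]


def by_words(text, vocab=WORD_VOCAB):
    """One unit per word; anything outside the vocabulary becomes <unk>."""
    return [word if word in vocab else "<unk>" for word in text.split()]


def by_subwords(text, vocab=SUBWORD_VOCAB):
    """Longest-match split of each word into known pieces.

    Single letters are in the vocabulary, so lowercase a-z input always
    splits; other characters would raise StopIteration in this sketch.
    """
    tokens = []
    for word in text.split():
        i = 0
        while i < len(word):
            j = next(j for j in range(len(word), i, -1) if word[i:j] in vocab)
            tokens.append(word[i:j])
            i = j
    return tokens


text = "the cat is unhappy"
print(len(by_letters(text)))  # 15
print(by_words(text))         # ['the', 'cat', 'is', '<unk>']
print(by_subwords(text))      # ['the', 'cat', 'is', 'un', 'happy']
```

"unhappy" was never in either vocabulary. The word-level scheme loses it entirely, while the subword scheme rebuilds it from "un" and "happy".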

The ~4-chars rule (and why it costs you)

In English, ~4 characters ≈ 1 token, which works out to ~0.75 words per token (about 1.3 tokens per word). This is how everything is priced and limited:

  • API bills are per token (prompt + reply).
  • A "context window" (how much text the model can read at once) is measured in tokens: 1,000 tokens ≈ 750 words.

Verbose prompts and long chat history burn tokens. Concise prompting is a real cost lever.
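Here is a back-of-the-envelope sketch of that arithmetic. It uses only the rules of thumb from this section (4 characters per token, 0.75 words per token). The function names are hypothetical, and the price is a placeholder you pass in yourself, not any provider's real rate. For an exact count you would run the model's own tokenizer instead of estimating.

```python
import math

CHARS_PER_TOKEN = 4      # rough English average, per the rule of thumb
WORDS_PER_TOKEN = 0.75   # equivalently, 1,000 tokens is about 750 words


def estimate_tokens(text):
    """Approximate token count from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(prompt, reply, price_per_1k_tokens):
    """Approximate bill for one exchange: prompt tokens plus reply tokens.

    price_per_1k_tokens is a placeholder supplied by the caller.
    """
    total_tokens = estimate_tokens(prompt) + estimate_tokens(reply)
    return total_tokens * price_per_1k_tokens / 1000


def words_that_fit(context_tokens):
    """Roughly how many English words fit in a context window."""
    return int(context_tokens * WORDS_PER_TOKEN)


print(estimate_tokens("a" * 400))  # 100
print(words_that_fit(1000))        # 750
```

Because chat history is resent with each request, the prompt side of `estimate_cost` grows with every turn of a long conversation.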

The strawberry problem

The model never sees s-t-r-a-w-b-e-r-r-y. It sees a couple of tokens, something like straw + berry (the exact split depends on the tokenizer). The individual letters are buried inside those tokens, so counting characters is genuinely hard for it. It's not dumb; it just doesn't read letters.
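A small sketch of the gap between what your code sees and what the model sees. The straw + berry split is the illustrative one from the text, and the IDs in `TOY_IDS` are invented. A real tokenizer may split the word differently and will use different numbers.

```python
word = "strawberry"

# With the raw characters in hand, counting is trivial.
print(word.count("r"))  # 3

# Illustrative split from the text; real tokenizers may split differently.
tokens = ["straw", "berry"]

# Hypothetical IDs. This list is all the model actually receives.
TOY_IDS = {"straw": 1001, "berry": 1002}
ids = [TOY_IDS[token] for token in tokens]
print(ids)  # [1001, 1002]

# Nothing in [1001, 1002] says how many r's each piece contains.
# The model has to have learned that fact about each token separately.
r_per_token = {token: token.count("r") for token in tokens}
print(r_per_token)  # {'straw': 1, 'berry': 2}
```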

Tokens are step 1 of everything

Tokenize → turn each token into a vector (embeddings, next) → run through the transformer → predict the next token. Every LLM starts exactly here.
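The four steps can be laid out as a toy pipeline to show the shape of the data at each stage. Everything here is a stand-in: the vocabulary is four hand-picked pieces, the embeddings are random rather than learned, and `toy_model` just averages vectors where a real transformer would sit. The prediction it prints is meaningless because nothing is trained.

```python
import random

# Step 1: tokens -> IDs (toy vocabulary, assumed for illustration).
VOCAB = ["un", "bel", "iev", "able"]
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}

# Step 2: one vector per token ID. Random here; learned in a real model.
DIM = 3
rng = random.Random(0)
EMBEDDINGS = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]


def embed(ids):
    """Look up the vector for each token ID."""
    return [EMBEDDINGS[i] for i in ids]


def toy_model(vectors):
    """Step 3 stand-in: averages the vectors. NOT a transformer."""
    count = len(vectors)
    return [sum(vec[d] for vec in vectors) / count for d in range(DIM)]


def predict_next(ids):
    """Step 4: score every vocabulary entry and return the best ID."""
    hidden = toy_model(embed(ids))
    scores = [sum(h * e for h, e in zip(hidden, emb)) for emb in EMBEDDINGS]
    return max(range(len(VOCAB)), key=scores.__getitem__)


ids = [TOKEN_TO_ID[token] for token in ["un", "bel", "iev"]]
next_id = predict_next(ids)
print(ids, "->", VOCAB[next_id])
```

Text goes in as IDs and the answer comes out as an ID. That is why tokenization shapes everything downstream.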

🔤 Type anything and watch it tokenize live: https://dev48v.infy.uk/ai/days/day2-tokens.html

Day 2 of AIFromZero.