Introduction
Although we interact with LLMs using natural language, these models never process raw text directly. Before a prompt reaches the model, it is converted into a sequence of tokens, the fundamental units that the model understands.
Tokenization is one of the earliest stages of the inference pipeline and influences everything from context windows and API pricing to latency and memory usage.
What Is a Token?
A token is the smallest unit of text processed by a language model. It is not necessarily a word. Depending on the tokenizer, a token may represent:
- an entire word
- part of a word
- punctuation
- whitespace
- numbers
- symbols
- emojis
Different models use different tokenizers, so the same text may be split differently depending on the model.
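To make this concrete, here is a minimal sketch of how two vocabularies can split the same text differently. It uses a greedy longest-match rule, which is a simplification chosen for readability and not the algorithm production tokenizers use (those typically rely on learned schemes such as byte-pair encoding). The `tokenize` function and both vocabularies are invented for illustration.

```python
def tokenize(text, vocab):
    """Split `text` greedily, always taking the longest piece found in `vocab`."""
    tokens = []
    i = 0
    max_len = max(len(piece) for piece in vocab)
    while i < len(text):
        # Try the longest candidate first, then shrink until one matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            # No vocabulary piece matched: emit the single character as-is.
            tokens.append(text[i])
            i += 1
    return tokens


# Two hypothetical vocabularies (assumptions, not taken from any real model).
VOCAB_A = {"token", "ization", " is", " fun"}
VOCAB_B = {"tok", "en", "iz", "ation", " ", "is", "fun"}

text = "tokenization is fun"
tokens_a = tokenize(text, VOCAB_A)
tokens_b = tokenize(text, VOCAB_B)

print(tokens_a)  # ['token', 'ization', ' is', ' fun']
print(tokens_b)  # ['tok', 'en', 'iz', 'ation', ' ', 'is', ' ', 'fun']
```

The same 19 characters become 4 tokens under one vocabulary and 8 under the other, which is why token counts are only comparable between models that share a tokenizer.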
Why Tokens?
Language models operate on numbers, not text. Before the transformer can perform any computation, the input must be converted into a numerical representation.
The preprocessing pipeline looks like this:
Raw Text
│
▼
Tokenizer
│
▼
Tokens
│
▼
Token IDs
│
▼
Embedding Layer
│
▼
Embedding Vectors
│
▼
Transformer
The tokenizer splits the input into tokens, and each token is mapped to a unique integer called a token ID. The token IDs are then passed through the model's embedding layer, which converts them into dense vectors. Those vectors are the actual input to the transformer.
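The last two stages of the pipeline can be sketched in a few lines. This is a toy illustration, not how any particular model is implemented: the vocabulary, the IDs, and the 3-dimensional embedding values are all made up, and real models learn embedding tables with far more rows and dimensions.

```python
# Hypothetical vocabulary: token string -> token ID (IDs invented for this sketch).
VOCAB = {"Hello": 0, ",": 1, " world": 2, "!": 3}

# Hypothetical embedding table: row i holds the vector for token ID i.
# Real models learn these values during training.
EMBEDDINGS = [
    [0.1, 0.3, -0.2],   # ID 0: "Hello"
    [0.0, -0.5, 0.4],   # ID 1: ","
    [0.7, 0.2, 0.1],    # ID 2: " world"
    [-0.3, 0.6, 0.0],   # ID 3: "!"
]


def to_ids(tokens):
    """Look up each token's integer ID in the vocabulary."""
    return [VOCAB[token] for token in tokens]


def embed(token_ids):
    """Use each ID as a row index into the embedding table."""
    return [EMBEDDINGS[token_id] for token_id in token_ids]


tokens = ["Hello", " world", "!"]
token_ids = to_ids(tokens)
vectors = embed(token_ids)

print(token_ids)  # [0, 2, 3]
print(vectors)    # [[0.1, 0.3, -0.2], [0.7, 0.2, 0.1], [-0.3, 0.6, 0.0]]
```

Note that the embedding step is a plain table lookup: the ID carries no meaning beyond selecting a row.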
A Real Example
Rather than a hypothetical example, let's look at how OpenAI's tokenizer processes a real sentence.
Input:
I have no enemies.
OpenAI tokenizes it to:
["I", " have", " no", " enemies", "."]
with the following token IDs:
[40, 679, 860, 33974, 13]
These values were generated by the OpenAI Tokenizer for the "GPT-5.x & O1/3" models.
The transformer never sees the original sentence; it receives only the corresponding sequence of token IDs.
Token IDs
After tokenization, every token is replaced with an integer.
Conceptually:
" have" → 679
" no" → 860
" enemies" → 33974
...
The exact numbers differ between models because each tokenizer has its own vocabulary. These integers are not meaningful by themselves; they simply act as indices into the model's embedding table.
From Tokens to Predictions
Once the token IDs have been converted into embeddings, the transformer begins inference.
At each generation step, the model predicts a probability distribution over the next token.
A token is selected from that distribution and appended to the existing sequence, and the process repeats until a stopping condition is reached, such as an end-of-sequence token or a length limit.
Prompt
│
▼
Tokenizer
│
▼
Token IDs
│
▼
Embeddings
│
▼
Transformer
│
▼
Predict Next Token
│
▼
Append Token
│
└──────────────┐
▼
Repeat
This autoregressive loop is how every modern decoder-only LLM generates text.
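The loop itself is simple enough to sketch. In the toy version below, a lookup table stands in for the transformer: it returns a distribution based only on the last token, whereas a real model conditions on the whole sequence. The table, the `EOS` ID, and the function names are assumptions made for this example, and greedy selection (always taking the most likely token) is just one of several possible decoding strategies.

```python
EOS = 0  # assumed end-of-sequence token ID

# Hypothetical stand-in for the model: last token ID -> {next token ID: probability}.
NEXT_TOKEN_PROBS = {
    1: {2: 0.9, 3: 0.1},
    2: {3: 0.6, EOS: 0.4},
    3: {EOS: 0.8, 1: 0.2},
}


def predict_next(sequence):
    """Return a probability distribution over the next token ID."""
    return NEXT_TOKEN_PROBS[sequence[-1]]


def generate(prompt_ids, max_new_tokens=10):
    """Greedy autoregressive decoding: predict, append, repeat."""
    sequence = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = predict_next(sequence)
        next_id = max(probs, key=probs.get)  # pick the most likely token
        if next_id == EOS:
            break  # stopping condition: end-of-sequence token
        sequence.append(next_id)
    return sequence


generated = generate([1])
print(generated)  # [1, 2, 3]
```

Both stopping conditions appear here: the loop ends when the end-of-sequence token is chosen or when `max_new_tokens` is exhausted.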
Try It Yourself
OpenAI Tokenizer: https://platform.openai.com/tokenizer