In Part 1, we learned what an LLM is and how it generates text.
Now let’s go deeper into how models like ChatGPT actually process language internally.
This article covers:
- What a token really is
- How tokenization works
- Encoding & decoding with Python
- Vector embeddings
- Positional encoding
- Self-attention & multi-head attention
1. What Is a Token?
A token is a small chunk of text mapped to a number (an ID) that the model can work with.
Example:
A → 1
B → 2
C → 3
So if you type:
B D E → 2 4 5
(extending the same pattern, D → 4 and E → 5)
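Here's a minimal sketch of that idea in Python, using a hypothetical letter-to-ID table (real tokenizers map subwords, not single letters, and use far larger vocabularies):

# Toy vocabulary: each letter gets a made-up ID (illustrative only)
vocab = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

text = "B D E"
token_ids = [vocab[letter] for letter in text.split()]
print(token_ids)  # [2, 4, 5]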
LLMs don’t understand words.
They understand numbers.
This process of converting text → numbers is called tokenization.
2. What Is Tokenization?
Tokenization means:
Converting user input into a sequence of numbers that the model can process.
Workflow:
Text → Tokens → Model → Tokens → Text
Example:
Input:
"Hey there, my name is Piyush"
Internally, it becomes a list of token IDs, something like:
[20264, 1428, 225216, 3274, ...]
(illustrative; the exact IDs depend on the tokenizer)
These numbers go into the transformer, which predicts the next token again and again.
👉 Note: every model has its own tokenizer, so the same text can map to different token IDs in different models.
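You can see this with tiktoken by encoding the same text under two different encodings (cl100k_base is used by GPT-4, o200k_base by GPT-4o; the exact IDs you get will differ):

import tiktoken

text = "Hey there"
for name in ["cl100k_base", "o200k_base"]:  # GPT-4 vs GPT-4o vocabularies
    enc = tiktoken.get_encoding(name)
    print(name, enc.encode(text))  # same text, different token IDs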
3. Encoding & Decoding Tokens in Python
Using the tiktoken library:
import tiktoken

# Load the tokenizer that matches the model's vocabulary
encoder = tiktoken.encoding_for_model("gpt-4o")

text = "Hey there, my name is Prabhas Kumar"

# encode(): text → list of integer token IDs
tokens = encoder.encode(text)
print(tokens)

# decode(): token IDs → back to readable text
decoded = encoder.decode(tokens)
print(decoded)
What happens:
- encode() → converts text → tokens
- decode() → converts tokens → readable text
This encode/decode step is exactly what happens at the entry and exit of models like ChatGPT.
4. Vector Embeddings – Giving Words Meaning
Tokens alone are just numbers.
Embeddings give them meaning.
An embedding is a vector (list of numbers) that represents the semantic meaning of a word.
Example idea:
- Dog and Cat → close together (both animals)
- Paris and Delhi → close together (both capital cities)
- Eiffel Tower and India Gate → close together (both landmarks)
Words with similar meaning are placed near each other in vector space.
That’s how LLMs understand relationships like:
Paris → Eiffel Tower
India → Taj Mahal
This is called semantic similarity.
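Here's a toy sketch of semantic similarity using hand-picked 3-dimensional vectors (real embeddings are learned by the model and have hundreds or thousands of dimensions):

import numpy as np

def cosine_similarity(a, b):
    # Close to 1.0 → similar meaning; close to 0.0 → unrelated
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hand-picked toy vectors, for illustration only
dog   = np.array([0.9, 0.8, 0.1])
cat   = np.array([0.8, 0.9, 0.2])
paris = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(dog, cat))    # high (~0.99): similar meanings
print(cosine_similarity(dog, paris))  # low  (~0.30): unrelated meanings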
5. Positional Encoding – Order Matters
Consider:
- "Dog ate cat"
- "Cat ate dog"
Same words.
Different meaning.
Embeddings alone don’t know position.
So the model adds positional encoding.
Positional encoding tells the model:
- This word is first
- This word is second
- This word is third
So the model understands order and structure.
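One classic recipe is the sinusoidal positional encoding from the original Transformer paper; here's a minimal sketch (many modern models learn positional embeddings instead):

import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # Each position gets a unique sine/cosine pattern, so the model
    # can tell "first" from "second" from "third"
    positions = np.arange(seq_len)[:, np.newaxis]   # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]  # (1, d_model/2)
    angles = positions / (10000 ** (dims / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions: cosine
    return pe

# These vectors are added to the token embeddings before attention
print(sinusoidal_positional_encoding(seq_len=3, d_model=8).shape)  # (3, 8)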
6. Self-Attention – Words Talking to Each Other
Self-attention lets tokens influence each other.
Example:
- "river bank"
- "ICICI bank"
Same word: bank
Different meaning.
Self-attention allows:
- "river" → changes meaning of "bank"
- "ICICI" → changes meaning of "bank"
So context decides meaning.
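Here's a minimal sketch of scaled dot-product self-attention with random matrices (in a real model, the query/key/value projections are learned during training):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(Q, K, V):
    # Each token scores every other token, then takes a weighted
    # average of the values: this is how "river" reshapes "bank"
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq, seq) relevance scores
    weights = softmax(scores)        # each row sums to 1
    return weights @ V               # context-mixed token vectors

rng = np.random.default_rng(0)
seq_len, d_k = 2, 4                  # e.g. the tokens "river", "bank"
Q = rng.normal(size=(seq_len, d_k))  # queries: what each token looks for
K = rng.normal(size=(seq_len, d_k))  # keys: what each token offers
V = rng.normal(size=(seq_len, d_k))  # values: the content that gets mixed
print(self_attention(Q, K, V).shape) # (2, 4)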
7. Multi-Head Attention – Looking at Many Angles
Multi-head attention runs several self-attention operations in parallel, each with its own learned weights, so the model can look at:
- Meaning
- Position
- Context
- Relationships
all at the same time, with each head free to focus on a different aspect.
Like a human observing many things at once.
This gives the model a deep understanding of the sentence.
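A rough sketch of the idea: split the feature dimension into several smaller heads, run attention in each head independently, and concatenate the results (the learned projection matrices of a real implementation are omitted here for brevity):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def multi_head_attention(X, num_heads):
    # Each head sees a different slice of the features, so each
    # head can specialize in a different kind of relationship
    heads = np.split(X, num_heads, axis=-1)
    outputs = [attention(h, h, h) for h in heads]  # Q = K = V = X here
    return np.concatenate(outputs, axis=-1)        # stitch the heads back

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 8))  # 3 tokens, 8 features
print(multi_head_attention(X, num_heads=2).shape)  # (3, 8)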
8. Final Flow of an LLM
- User enters text
- Tokenization → numbers
- Embeddings → meaning
- Positional encoding → order
- Self + Multi-head attention → context
- Linear + Softmax → probability of next token (sketched below)
- Decode → readable output
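Here's a toy sketch of the Linear + Softmax step, with made-up logits over a four-word vocabulary (a real model scores every token in a vocabulary of 100,000+):

import numpy as np

# Made-up scores (logits) for a hypothetical 4-token vocabulary
vocab = {0: "the", 1: "cat", 2: "sat", 3: "mat"}
logits = np.array([1.2, 3.5, 0.3, 2.1])

# Softmax turns raw scores into probabilities that sum to 1
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Greedy decoding: pick the most likely next token
next_id = int(np.argmax(probs))
print(vocab[next_id], round(float(probs[next_id]), 2))  # cat 0.72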
Final Thoughts
LLMs don’t know language.
They predict tokens based on probability and patterns.
Yet the result feels intelligent because:
- Tokens carry meaning (embeddings)
- Order is preserved (positional encoding)
- Context is understood (attention)
And that’s the magic behind ChatGPT.