How Tokenization, Embeddings & Attention Work in LLMs (Part 2)

In Part 1, we learned what an LLM is and how it generates text.
Now let’s go deeper into how models like ChatGPT actually process language internally.

This article covers:

  • What a token really is
  • How tokenization works
  • Encoding & decoding with Python
  • Vector embeddings
  • Positional encoding
  • Self-attention & multi-head attention

1. What Is a Token?

A token is a small piece of text (a word, part of a word, or a character) that gets mapped to a number the model understands.

Example:

A → 1  
B → 2  
C → 3 

So if you type:
B D E → it becomes → 2 4 5

LLMs don’t understand words.
They understand numbers.

This process of converting text → numbers is called tokenization.
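
Here is a tiny sketch of that idea in Python, with a made-up letter-to-number vocabulary (real tokenizers are far larger and split text into sub-words rather than single letters):

# Made-up toy vocabulary: each letter maps to an ID.
vocab = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

text = "B D E"
tokens = [vocab[ch] for ch in text.split()]

print(tokens)  # [2, 4, 5]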

2. What Is Tokenization?

Tokenization means:

Converting user input into a sequence of numbers that the model can process.

Workflow:

Text → Tokens → Model → Tokens → Text

Example:

Input:

"Hey there, my name is Piyush"

Internally becomes:

[20264, 1428, 225216, 3274, ...]

These numbers go into the transformer, which predicts the next token again and again.

👉 Note: every model has its own tokenizer (vocabulary and rules), so the same text can map to different token IDs in different models.

3. Encoding & Decoding Tokens in Python

Using the tiktoken library:

import tiktoken

# Load the tokenizer (vocabulary + rules) that gpt-4o uses.
encoder = tiktoken.encoding_for_model("gpt-4o")

text = "Hey there, my name is Prabhas Kumar"
tokens = encoder.encode(text)      # text → list of token IDs

print(tokens)

decoded = encoder.decode(tokens)   # token IDs → original text
print(decoded)

What happens:

  • encode() → converts text → tokens
  • decode() → converts tokens → readable text

This is the same encode/decode step that sits at the input and output of models like ChatGPT.
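
Remember the earlier note that every model has its own tokenizer? The same sentence produces different token IDs under different encodings. A quick check with tiktoken (gpt-4o uses the o200k_base encoding, while older GPT-4 models use cl100k_base):

import tiktoken

text = "Hey there, my name is Prabhas Kumar"

gpt4o_enc = tiktoken.encoding_for_model("gpt-4o")   # o200k_base under the hood
gpt4_enc = tiktoken.get_encoding("cl100k_base")     # used by older GPT-4 / GPT-3.5 models

print(gpt4o_enc.encode(text))   # one set of token IDs
print(gpt4_enc.encode(text))    # different IDs, possibly a different count

# See which piece of text each gpt-4o token covers.
print([gpt4o_enc.decode([t]) for t in gpt4o_enc.encode(text)])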

4. Vector Embeddings – Giving Words Meaning

Tokens alone are just numbers.
Embeddings give them meaning.

An embedding is a vector (list of numbers) that represents the semantic meaning of a word.

Example idea:

  • Dog and Cat → close together
  • Paris and India → close together
  • Eiffel Tower and India Gate → close together

Words with similar meaning are placed near each other in vector space.

That’s how LLMs understand relationships like:

Paris → Eiffel Tower  
India → Taj Mahal

This is called semantic similarity.
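
To make "close together" concrete, here is a tiny sketch with made-up 3-dimensional vectors and cosine similarity (real embeddings come from the model and have hundreds or thousands of dimensions):

import numpy as np

def cosine_similarity(a, b):
    # close to 1.0 → same direction (similar meaning), close to 0 → unrelated
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up toy vectors just to illustrate the idea.
dog = np.array([0.9, 0.8, 0.1])
cat = np.array([0.85, 0.75, 0.2])
car = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(dog, cat))  # high → "dog" and "cat" sit close together
print(cosine_similarity(dog, car))  # low  → "dog" and "car" sit far apart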

5. Positional Encoding – Order Matters

Consider:

  • "Dog ate cat"
  • "Cat ate dog"

Same words.
Different meaning.

Embeddings alone don’t know position.
So the model adds positional encoding.

Positional encoding tells the model:

  • This word is first
  • This word is second
  • This word is third

So the model understands order and structure.
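
One classic way to do this is the sinusoidal positional encoding from the original Transformer paper ("Attention Is All You Need"). A minimal NumPy sketch, using a tiny sequence length and embedding size:

import numpy as np

def positional_encoding(seq_len, d_model):
    # Even dimensions get a sine wave, odd dimensions get a cosine wave,
    # each at a different frequency, so every position gets a unique pattern.
    positions = np.arange(seq_len)[:, np.newaxis]      # (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]           # (1, d_model)
    angle_rates = 1 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

# Each row gets added to the embedding of the token at that position.
print(positional_encoding(seq_len=3, d_model=8).round(2))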

6. Self-Attention – Words Talking to Each Other

Self-attention lets tokens influence each other.

Example:

  • "river bank"
  • "ICICI bank"

Same word: bank
Different meaning.

Self-attention allows:

  • "river" → changes meaning of "bank"
  • "ICICI" → changes meaning of "bank"

So context decides meaning.
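
Under the hood this is scaled dot-product attention: every token builds a query, a key, and a value vector; the query–key scores decide how much of each other token's value gets mixed in. A minimal NumPy sketch with made-up weights:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how strongly each token attends to the others
    weights = softmax(scores)                # each row sums to 1
    return weights @ V                       # context-aware token representations

# Made-up numbers: 2 tokens (say "river", "bank"), embedding size 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (2, 4)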

7. Multi-Head Attention – Looking at Many Angles

Multi-head attention runs several attention operations in parallel. Each head can focus on a different aspect of the sentence, such as:

  • Meaning
  • Position
  • Context
  • Relationships

all at the same time.

Like a human observing many things at once.

This gives the model a deep understanding of the sentence.
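
A rough NumPy sketch of the idea: split each embedding into a few smaller chunks ("heads"), run attention inside each chunk, then stitch the results back together. Real multi-head attention also uses separate learned Q/K/V and output projections per head, which are left out here for brevity:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, num_heads):
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        chunk = X[:, h * d_head:(h + 1) * d_head]   # this head's slice of the embedding
        scores = chunk @ chunk.T / np.sqrt(d_head)
        heads.append(softmax(scores) @ chunk)       # one head's output
    return np.concatenate(heads, axis=-1)           # back to (seq_len, d_model)

X = np.random.default_rng(1).normal(size=(3, 8))    # 3 tokens, embedding size 8
print(multi_head_attention(X, num_heads=2).shape)   # (3, 8)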

8. Final Flow of an LLM

  • User enters text
  • Tokenization → numbers
  • Embeddings → meaning
  • Positional encoding → order
  • Self + multi-head attention → context
  • Linear + Softmax → probability of next token
  • Decode → readable output
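
A tiny sketch of that final step, with made-up scores (logits) for a 5-token vocabulary:

import numpy as np

logits = np.array([2.1, 0.3, -1.0, 4.5, 0.8])  # made-up output of the final linear layer

probs = np.exp(logits - logits.max())
probs /= probs.sum()                    # softmax → probabilities that sum to 1

next_token_id = int(np.argmax(probs))   # greedy pick: the most likely next token
print(probs.round(3), next_token_id)    # token 3 wins here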

Final Thoughts

LLMs don’t know language.
They predict tokens based on probability and patterns.

Yet the result feels intelligent because:

  • Tokens carry meaning (embeddings)
  • Order is preserved (positional encoding)
  • Context is understood (attention)

And that’s the magic behind ChatGPT.
