How Tokenization, Embeddings & Attention Work in LLMs (Part 2)

In Part 1, we learned what an LLM is and how it generates text.
Now let’s go deeper into how models like ChatGPT actually process language internally.

This article covers:

  • What a token really is
  • How tokenization works
  • Encoding & decoding with Python
  • Vector embeddings
  • Positional encoding
  • Self-attention & multi-head attention

1. What Is a Token?

A token is a small piece of text (a word, part of a word, or a character) that gets mapped to a number the model understands.

Example:

A → 1  
B → 2  
C → 3 

So if you type:
B D E → it becomes → 2 4 5

LLMs don’t understand words.
They understand numbers.

This process of converting text → numbers is called tokenization.
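
Here is a tiny sketch of that idea in Python, with a made-up letter-to-number vocabulary (real tokenizers are far larger and split text into sub-words rather than single letters):

# Made-up toy vocabulary: each letter maps to an ID.
vocab = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

text = "B D E"
tokens = [vocab[ch] for ch in text.split()]

print(tokens)  # [2, 4, 5]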

2. What Is Tokenization?

Tokenization means:

Converting user input into a sequence of numbers that the model can process.

Workflow:

Text → Tokens → Model → Tokens → Text

Example:

Input:

"Hey there, my name is Piyush"

Internally becomes:

[20264, 1428, 225216, 3274, ...]

These numbers go into the transformer, which predicts the next token again and again.

👉 Note: every model has its own tokenizer (vocabulary and rules), so the same text can map to different token IDs in different models.

3. Encoding & Decoding Tokens in Python

Using the tiktoken library:

import tiktoken

# Load the tokenizer (vocabulary + rules) that gpt-4o uses.
encoder = tiktoken.encoding_for_model("gpt-4o")

text = "Hey there, my name is Prabhas Kumar"
tokens = encoder.encode(text)      # text → list of token IDs

print(tokens)

decoded = encoder.decode(tokens)   # token IDs → original text
print(decoded)

What happens:

  • encode() → converts text → tokens
  • decode() → converts tokens → readable text

This is the same encode/decode step that sits at the input and output of models like ChatGPT.
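
Remember the earlier note that every model has its own tokenizer? The same sentence produces different token IDs under different encodings. A quick check with tiktoken (gpt-4o uses the o200k_base encoding, while older GPT-4 models use cl100k_base):

import tiktoken

text = "Hey there, my name is Prabhas Kumar"

gpt4o_enc = tiktoken.encoding_for_model("gpt-4o")   # o200k_base under the hood
gpt4_enc = tiktoken.get_encoding("cl100k_base")     # used by older GPT-4 / GPT-3.5 models

print(gpt4o_enc.encode(text))   # one set of token IDs
print(gpt4_enc.encode(text))    # different IDs, possibly a different count

# See which piece of text each gpt-4o token covers.
print([gpt4o_enc.decode([t]) for t in gpt4o_enc.encode(text)])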

4. Vector Embeddings – Giving Words Meaning

Tokens alone are just numbers.
Embeddings give them meaning.

An embedding is a vector (list of numbers) that represents the semantic meaning of a word.

Example idea:

  • Dog and Cat → close together
  • Paris and India → close together
  • Eiffel Tower and India Gate → close together

Words with similar meaning are placed near each other in vector space.

That’s how LLMs understand relationships like:

Paris → Eiffel Tower  
India → Taj Mahal

This is called semantic similarity.
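
To make "close together" concrete, here is a tiny sketch with made-up 3-dimensional vectors and cosine similarity (real embeddings come from the model and have hundreds or thousands of dimensions):

import numpy as np

def cosine_similarity(a, b):
    # close to 1.0 → same direction (similar meaning), close to 0 → unrelated
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up toy vectors just to illustrate the idea.
dog = np.array([0.9, 0.8, 0.1])
cat = np.array([0.85, 0.75, 0.2])
car = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(dog, cat))  # high → "dog" and "cat" sit close together
print(cosine_similarity(dog, car))  # low  → "dog" and "car" sit far apart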

5. Positional Encoding – Order Matters

Consider:

  • "Dog ate cat"
  • "Cat ate dog"

Same words.
Different meaning.

Embeddings alone don’t know position.
So the model adds positional encoding.

Positional encoding tells the model:

  • This word is first
  • This word is second
  • This word is third

So the model understands order and structure.
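
One classic way to do this is the sinusoidal positional encoding from the original Transformer paper ("Attention Is All You Need"). A minimal NumPy sketch, using a tiny sequence length and embedding size:

import numpy as np

def positional_encoding(seq_len, d_model):
    # Even dimensions get a sine wave, odd dimensions get a cosine wave,
    # each at a different frequency, so every position gets a unique pattern.
    positions = np.arange(seq_len)[:, np.newaxis]      # (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]           # (1, d_model)
    angle_rates = 1 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

# Each row gets added to the embedding of the token at that position.
print(positional_encoding(seq_len=3, d_model=8).round(2))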

6. Self-Attention – Words Talking to Each Other

Self-attention lets tokens influence each other.

Example:

  • "river bank"
  • "ICICI bank"

Same word: bank
Different meaning.

Self-attention allows:

  • "river" → changes meaning of "bank"
  • "ICICI" → changes meaning of "bank"

So context decides meaning.
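
Under the hood this is scaled dot-product attention: every token builds a query, a key, and a value vector; the query–key scores decide how much of each other token's value gets mixed in. A minimal NumPy sketch with made-up weights:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how strongly each token attends to the others
    weights = softmax(scores)                # each row sums to 1
    return weights @ V                       # context-aware token representations

# Made-up numbers: 2 tokens (say "river", "bank"), embedding size 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (2, 4)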

7. Multi-Head Attention – Looking at Many Angles

Multi-head attention runs several attention operations in parallel. Each head can focus on a different aspect of the sentence, such as:

  • Meaning
  • Position
  • Context
  • Relationships

all at the same time.

Like a human observing many things at once.

This gives the model a deep understanding of the sentence.
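
A rough NumPy sketch of the idea: split each embedding into a few smaller chunks ("heads"), run attention inside each chunk, then stitch the results back together. Real multi-head attention also uses separate learned Q/K/V and output projections per head, which are left out here for brevity:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, num_heads):
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        chunk = X[:, h * d_head:(h + 1) * d_head]   # this head's slice of the embedding
        scores = chunk @ chunk.T / np.sqrt(d_head)
        heads.append(softmax(scores) @ chunk)       # one head's output
    return np.concatenate(heads, axis=-1)           # back to (seq_len, d_model)

X = np.random.default_rng(1).normal(size=(3, 8))    # 3 tokens, embedding size 8
print(multi_head_attention(X, num_heads=2).shape)   # (3, 8)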

8. Final Flow of an LLM

  • User enters text
  • Tokenization → numbers
  • Embeddings → meaning
  • Positional encoding → order
  • Self + multi-head attention → context
  • Linear + Softmax → probability of next token
  • Decode → readable output
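
A tiny sketch of that final step, with made-up scores (logits) for a 5-token vocabulary:

import numpy as np

logits = np.array([2.1, 0.3, -1.0, 4.5, 0.8])  # made-up output of the final linear layer

probs = np.exp(logits - logits.max())
probs /= probs.sum()                    # softmax → probabilities that sum to 1

next_token_id = int(np.argmax(probs))   # greedy pick: the most likely next token
print(probs.round(3), next_token_id)    # token 3 wins here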

Final Thoughts

LLMs don’t know language.
They predict tokens based on probability and patterns.

Yet the result feels intelligent because:

  • Tokens carry meaning (embeddings)
  • Order is preserved (positional encoding)
  • Context is understood (attention)

And that’s the magic behind ChatGPT.
