<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Govind kumar</title>
    <description>The latest articles on DEV Community by Govind kumar (@expert-jha).</description>
    <link>https://dev.to/expert-jha</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3812357%2F18296814-8eb8-4c8b-9686-ddf8961affb1.png</url>
      <title>DEV Community: Govind kumar</title>
      <link>https://dev.to/expert-jha</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/expert-jha"/>
    <language>en</language>
    <item>
      <title>From Words to Intelligence: How LLMs Actually Work (Without the Math Headache)</title>
      <dc:creator>Govind kumar</dc:creator>
      <pubDate>Sun, 08 Mar 2026 03:15:14 +0000</pubDate>
      <link>https://dev.to/expert-jha/from-words-to-intelligence-how-llms-actually-work-without-the-math-headache-156</link>
      <guid>https://dev.to/expert-jha/from-words-to-intelligence-how-llms-actually-work-without-the-math-headache-156</guid>
      <description>&lt;p&gt;Large Language Models often feel magical. You type a sentence, and suddenly an AI writes code, explains physics, or drafts emails.&lt;/p&gt;

&lt;p&gt;But under the hood, the system is doing something surprisingly structured.&lt;/p&gt;

&lt;p&gt;Let’s walk through the &lt;strong&gt;core building blocks of modern AI models&lt;/strong&gt; in a simple and fun way.&lt;/p&gt;

&lt;h1&gt;1. Tokenization — Breaking Language into Pieces&lt;/h1&gt;

&lt;p&gt;Before an AI understands text, it must &lt;strong&gt;split the sentence into smaller units called tokens&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Example sentence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"I love artificial intelligence"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tokenized form might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;["I", "love", "artificial", "intelligence"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sometimes tokens are even smaller:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;["art", "ificial", "intelli", "gence"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;How the text gets split depends on the model’s tokenizer and its &lt;strong&gt;vocabulary size&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;Vocabulary Size&lt;/h3&gt;

&lt;p&gt;This is the number of distinct tokens the model’s tokenizer can represent.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Approx Vocabulary&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Small models&lt;/td&gt;
&lt;td&gt;~30k tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Modern LLMs&lt;/td&gt;
&lt;td&gt;100k+ tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Think of it like a &lt;strong&gt;dictionary the AI uses to read text&lt;/strong&gt;.&lt;/p&gt;
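
&lt;p&gt;You can try this yourself. Here is a minimal sketch using the open-source &lt;code&gt;tiktoken&lt;/code&gt; library; the exact splits and token IDs shown in the comments are illustrative and depend on which encoding you load.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pip install tiktoken  (OpenAI's open-source BPE tokenizer)
import tiktoken

# cl100k_base is one publicly available encoding with roughly 100k tokens
enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("I love artificial intelligence")
print(ids)                             # e.g. [40, 3021, 21075, 11478]
print([enc.decode([i]) for i in ids])  # e.g. ['I', ' love', ' artificial', ' intelligence']
print(enc.n_vocab)                     # the vocabulary size, about 100k
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;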




&lt;h1&gt;2. Vectors — Turning Words into Numbers&lt;/h1&gt;

&lt;p&gt;Computers don’t understand words.&lt;br&gt;
They understand &lt;strong&gt;numbers&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So each token is mapped to a &lt;strong&gt;vector&lt;/strong&gt; (a list of numbers). Real models use hundreds or thousands of numbers per token; the toy examples below use just four.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"cat" → [0.21, -0.34, 0.77, 0.11]
"dog" → [0.19, -0.30, 0.74, 0.10]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice something interesting?&lt;/p&gt;

&lt;p&gt;The vectors for &lt;strong&gt;cat and dog look similar&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That’s because they have &lt;strong&gt;similar meaning&lt;/strong&gt;.&lt;/p&gt;
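
&lt;p&gt;We can make “similar” precise with cosine similarity, which measures the angle between two vectors. A minimal sketch using the toy four-dimensional vectors above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math

def cosine_similarity(a, b):
    # 1.0 means same direction (similar meaning), 0.0 means unrelated
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

cat = [0.21, -0.34, 0.77, 0.11]
dog = [0.19, -0.30, 0.74, 0.10]

print(cosine_similarity(cat, dog))  # ≈ 0.999, i.e. nearly parallel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;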




&lt;h1&gt;3. Embeddings — Capturing Meaning&lt;/h1&gt;

&lt;p&gt;These vectors are called &lt;strong&gt;embeddings&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Embeddings represent &lt;strong&gt;semantic meaning&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Example relationships:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Word&lt;/th&gt;
&lt;th&gt;Meaning Relation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;King&lt;/td&gt;
&lt;td&gt;Male ruler&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queen&lt;/td&gt;
&lt;td&gt;Female ruler&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Paris&lt;/td&gt;
&lt;td&gt;Capital of France&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Embeddings capture these relationships mathematically.&lt;/p&gt;

&lt;p&gt;Famous example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;King - Man + Woman ≈ Queen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pretty cool, right?&lt;/p&gt;
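
&lt;p&gt;You can reproduce this with real pretrained embeddings. A sketch using gensim’s downloadable GloVe vectors (the model name here is just one of several available sets, and the similarity score is approximate):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pip install gensim  (downloads the vectors, roughly 130 MB, on first run)
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")

# king - man + woman ≈ ?
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # typically [('queen', ~0.77)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;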




&lt;h1&gt;4. Position Encoding — Remembering Word Order&lt;/h1&gt;

&lt;p&gt;Words alone are not enough.&lt;/p&gt;

&lt;p&gt;Compare:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Dog bites man
Man bites dog
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same words, very different meaning.&lt;/p&gt;

&lt;p&gt;Transformers solve this using &lt;strong&gt;positional encoding&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Each word receives a &lt;strong&gt;position signal&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Dog (position 1)
bites (position 2)
man (position 3)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows the model to understand &lt;strong&gt;sequence and grammar&lt;/strong&gt;.&lt;/p&gt;
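
&lt;p&gt;One classic scheme, from the original transformer paper, builds the position signal out of sine and cosine waves of different frequencies (many modern models use learned or rotary variants instead). A minimal sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

def sinusoidal_positions(seq_len, d_model):
    # Each position gets a unique pattern of sines and cosines;
    # nearby positions get similar patterns, which encodes order.
    positions = np.arange(seq_len)[:, None]   # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]  # (1, d_model/2)
    angles = positions / (10000 ** (dims / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# This signal is added to each token's embedding before attention runs
pe = sinusoidal_positions(seq_len=3, d_model=8)
print(pe.shape)  # (3, 8): one position vector each for "Dog", "bites", "man"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;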




&lt;h1&gt;5. Encoder vs Decoder&lt;/h1&gt;

&lt;p&gt;The original transformer architecture is built from two kinds of networks.&lt;/p&gt;

&lt;h3&gt;Encoder&lt;/h3&gt;

&lt;p&gt;Reads and understands input.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Translate English → French
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The encoder converts the sentence into &lt;strong&gt;meaning vectors&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;Decoder&lt;/h3&gt;

&lt;p&gt;Generates new text &lt;strong&gt;one token at a time&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The capital of France is → Paris
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Many modern chat models are primarily &lt;strong&gt;decoder-based&lt;/strong&gt;.&lt;/p&gt;
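
&lt;p&gt;The “one token at a time” loop is easy to sketch. Here &lt;code&gt;model(tokens)&lt;/code&gt; is a hypothetical stand-in for a real LLM forward pass that returns one score per vocabulary entry:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def generate(model, prompt_tokens, max_new_tokens=20, eos_token=0):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = model(tokens)  # one score per vocabulary entry
        next_token = max(range(len(logits)), key=lambda i: logits[i])  # greedy pick
        if next_token == eos_token:  # stop at the end-of-sequence token
            break
        tokens.append(next_token)   # feed the new token back in as input
    return tokens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;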




&lt;h1&gt;6. Self-Attention — The Secret Sauce&lt;/h1&gt;

&lt;p&gt;Self-attention allows each word to &lt;strong&gt;look at every other word&lt;/strong&gt; in the sentence.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"The animal didn't cross the street because it was tired"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The word &lt;strong&gt;"it"&lt;/strong&gt; must figure out what it refers to.&lt;/p&gt;

&lt;p&gt;Self-attention helps the model connect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;it → animal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of reading text strictly left-to-right, the model looks at &lt;strong&gt;relationships between words&lt;/strong&gt;.&lt;/p&gt;
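
&lt;p&gt;Underneath, self-attention is just a few matrix products. A minimal numpy sketch of scaled dot-product attention, with toy shapes (real models first project the input into separate query, key, and value matrices using learned weights):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

def self_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how strongly each word attends to each other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V               # mix the value vectors by attention weight

x = np.random.randn(3, 4)  # 3 tokens, 4-dimensional vectors (toy numbers)
print(self_attention(x, x, x).shape)  # (3, 4): each token now carries context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;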




&lt;h1&gt;7. Softmax — Turning Scores into Probabilities&lt;/h1&gt;

&lt;p&gt;When predicting the next word, the model produces raw &lt;strong&gt;scores&lt;/strong&gt; (called logits).&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Paris: 8.1
London: 3.2
Pizza: -1.4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Softmax converts these scores into probabilities.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Paris → 0.92
London → 0.07
Pizza → 0.01
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model then samples a token from this distribution (often simply the most likely one).&lt;/p&gt;
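
&lt;p&gt;Softmax itself is a few lines of Python. Running it on the scores above gives the probabilities shown:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math

def softmax(scores):
    m = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = {"Paris": 8.1, "London": 3.2, "Pizza": -1.4}
for word, p in zip(scores, softmax(list(scores.values()))):
    print(f"{word}: {p:.4f}")  # Paris: 0.9925, London: 0.0074, Pizza: 0.0001
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;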




&lt;h1&gt;8. Multi-Head Attention — Multiple Perspectives&lt;/h1&gt;

&lt;p&gt;Instead of one attention mechanism, transformers use &lt;strong&gt;multiple attention heads&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Think of it like &lt;strong&gt;a team of analysts&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;One head focuses on grammar&lt;br&gt;
One focuses on subject relationships&lt;br&gt;
One focuses on context&lt;/p&gt;

&lt;p&gt;Together they build a deeper understanding of the sentence.&lt;/p&gt;
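
&lt;p&gt;Mechanically, “multiple heads” means splitting the vectors into slices, running attention once per slice, and stitching the results back together. A toy sketch (real models give each head its own learned Q/K/V projections rather than plain slices):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

def attention(q, k, v):
    # scaled dot-product attention, as in the earlier sketch
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v

def multi_head_attention(x, num_heads):
    seq_len, d_model = x.shape
    head_dim = d_model // num_heads
    heads = []
    for h in range(num_heads):
        part = x[:, h * head_dim:(h + 1) * head_dim]  # this head's slice
        heads.append(attention(part, part, part))
    return np.concatenate(heads, axis=-1)  # stitch head outputs back together

x = np.random.randn(3, 8)                          # 3 tokens, 8 dimensions
print(multi_head_attention(x, num_heads=2).shape)  # (3, 8)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;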


&lt;h1&gt;9. Temperature — Controlling Creativity&lt;/h1&gt;

&lt;p&gt;Temperature controls &lt;strong&gt;how adventurous the AI becomes&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Temperature&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0.1&lt;/td&gt;
&lt;td&gt;Very predictable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;Balanced&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;Creative&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2.0&lt;/td&gt;
&lt;td&gt;Chaotic&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Example prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The best food in Italy is...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Low temperature → “pizza”&lt;br&gt;
High temperature → “truffle pasta with wild mushrooms”.&lt;/p&gt;
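
&lt;p&gt;Under the hood, temperature simply divides the scores before softmax. A sketch (the softmax helper is the same as in the earlier example, repeated so this snippet runs on its own):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(scores, temperature):
    # Temperatures below 1 sharpen the distribution; above 1 flatten it
    return softmax([s / temperature for s in scores])

scores = [8.1, 3.2, -1.4]              # Paris, London, Pizza
print(apply_temperature(scores, 0.1))  # ≈ [1.00, 0.00, 0.00]: always Paris
print(apply_temperature(scores, 2.0))  # ≈ [0.91, 0.08, 0.01]: other options appear
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;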


&lt;h1&gt;10. Knowledge Cutoff — When the Model Stopped Learning&lt;/h1&gt;

&lt;p&gt;LLMs are trained on huge datasets, but &lt;strong&gt;training eventually stops&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That date is called the &lt;strong&gt;knowledge cutoff&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If the cutoff is 2024, the model might not know events from 2025 unless external tools provide updated data.&lt;/p&gt;


&lt;h1&gt;Final Picture: How Everything Works Together&lt;/h1&gt;

&lt;p&gt;When you type a prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Explain black holes simply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI pipeline roughly looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tokenization&lt;/strong&gt; splits the text&lt;/li&gt;
&lt;li&gt;Tokens become &lt;strong&gt;embeddings&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Positional encoding&lt;/strong&gt; adds order&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-attention&lt;/strong&gt; analyzes relationships&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-head attention&lt;/strong&gt; extracts deeper context&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;decoder predicts the next token&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Softmax turns the scores into probabilities&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Temperature controls how adventurous the sampling is&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Repeat this thousands of times per response.&lt;/p&gt;

&lt;p&gt;And that’s how a machine writes paragraphs.&lt;/p&gt;




&lt;h1&gt;Closing Thought&lt;/h1&gt;

&lt;p&gt;Large Language Models are not magic.&lt;/p&gt;

&lt;p&gt;They are &lt;strong&gt;layers of clever mathematics turning language into patterns&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But when these pieces work together — tokens, vectors, attention, and probabilities — something remarkable happens:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Machines start speaking our language.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
