Govind kumar
From Words to Intelligence: How LLMs Actually Work (Without the Math Headache)

Large Language Models often feel magical. You type a sentence, and suddenly an AI writes code, explains physics, or drafts emails.

But under the hood, the system is doing something surprisingly structured.

Let’s walk through the core building blocks of modern AI models in a simple and fun way.

1. Tokenization — Breaking Language into Pieces

Before an AI understands text, it must split the sentence into smaller units called tokens.

Example sentence:

"I love artificial intelligence"

Tokenized form might look like:

["I", "love", "artificial", "intelligence"]

Sometimes tokens are even smaller:

["art", "ificial", "intelli", "gence"]

This depends on the model’s vocabulary size.

Vocabulary Size

This is the number of tokens a model knows.

Example:

| Model | Approx. vocabulary |
| --- | --- |
| Small models | ~30k tokens |
| Modern LLMs | 100k+ tokens |

Think of it like a dictionary the AI uses to read text.
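Here's a tiny sketch of how subword tokenization can work: greedy longest-match against a made-up vocabulary. Real models learn their vocabularies with algorithms like BPE and handle spaces, casing, and unknown bytes far more carefully; this toy only shows the splitting idea.

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenizer over a fixed vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible substring first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry matched: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens

vocab = {"art", "ificial", "intelli", "gence"}
print(tokenize("artificialintelligence", vocab))
# → ['art', 'ificial', 'intelli', 'gence']
```

Notice that the splits depend entirely on what is in the vocabulary — add "intelligence" as one entry and the last two pieces merge into a single token.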


2. Vectors — Turning Words into Numbers

Computers don’t understand words.
They understand numbers.

So each token becomes a vector (a list of numbers).

Example:

"cat" → [0.21, -0.34, 0.77, 0.11]
"dog" → [0.19, -0.30, 0.74, 0.10]

Notice something interesting?

The vectors for cat and dog look similar.

That’s because they have similar meaning.
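You can measure that similarity with cosine similarity, which compares the direction of two vectors. The `cat` and `dog` values are the illustrative ones above; the `car` vector is made up here purely for contrast:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: 1.0 means same direction, near 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

cat = [0.21, -0.34, 0.77, 0.11]
dog = [0.19, -0.30, 0.74, 0.10]
car = [0.80, 0.52, -0.10, 0.45]  # invented vector for an unrelated word

print(cosine_similarity(cat, dog))  # close to 1.0
print(cosine_similarity(cat, car))  # much lower
```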


3. Embeddings — Capturing Meaning

These vectors are called embeddings.

Embeddings represent semantic meaning.

Example relationships:

| Word | Meaning relation |
| --- | --- |
| King | Male ruler |
| Queen | Female ruler |
| Paris | Capital of France |

Embeddings capture these relationships mathematically.

Famous example:

King - Man + Woman ≈ Queen

Pretty cool, right?


4. Position Encoding — Remembering Word Order

Words alone are not enough.

Compare:

Dog bites man
Man bites dog

Same words, very different meaning.

Transformers solve this using positional encoding.

Each word receives a position signal:

Dog (position 1)
bites (position 2)
man (position 3)

This allows the model to understand sequence and grammar.
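One common scheme is the sinusoidal positional encoding from the original Transformer paper: even dimensions use sine, odd dimensions use cosine, at different frequencies. A minimal sketch:

```python
import math

def positional_encoding(position, d_model):
    # Sinusoidal positional encoding: each position gets a unique
    # pattern of sin/cos values across the embedding dimensions.
    pe = []
    for i in range(d_model):
        angle = position / (10000 ** (2 * (i // 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

# Each position produces a distinct signal that gets added to the token's
# embedding, so "Dog bites man" and "Man bites dog" look different.
for pos, word in enumerate(["Dog", "bites", "man"]):
    print(word, [round(x, 3) for x in positional_encoding(pos, 4)])
```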


5. Encoder vs Decoder

The original Transformer architecture is built from two types of networks.

Encoder

Reads and understands input.

Example:

Translate English → French

The encoder converts the sentence into meaning vectors.


Decoder

Generates new text one token at a time.

Example:

The capital of France is → Paris

Many modern chat models are primarily decoder-based.
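Here's a toy stand-in for a decoder: a lookup table of invented next-token probabilities plus a greedy loop. A real decoder computes these probabilities with embeddings and attention layers, but the one-token-at-a-time rhythm is the same:

```python
# Invented next-token probabilities standing in for a trained decoder.
next_token = {
    "The":     {"capital": 0.9, "dog": 0.1},
    "capital": {"of": 1.0},
    "of":      {"France": 0.8, "Spain": 0.2},
    "France":  {"is": 1.0},
    "is":      {"Paris": 0.95, "London": 0.05},
}

def generate(prompt_tokens, steps):
    tokens = list(prompt_tokens)
    for _ in range(steps):
        choices = next_token.get(tokens[-1])
        if not choices:
            break
        # Greedy decoding: always take the most probable next token.
        tokens.append(max(choices, key=choices.get))
    return tokens

print(generate(["The", "capital", "of", "France", "is"], 1))
# → ['The', 'capital', 'of', 'France', 'is', 'Paris']
```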


6. Self-Attention — The Secret Sauce

Self-attention allows each word to look at every other word in the sentence.

Example:

"The animal didn't cross the street because it was tired"

The word "it" must figure out what it refers to.

Self-attention helps the model connect:

it → animal

Instead of reading text strictly left-to-right, the model looks at relationships between words.
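A minimal sketch of scaled dot-product self-attention in plain Python, with tiny made-up 2-dimensional vectors (a real model would first project each vector into separate query, key, and value spaces with learned weights):

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    # Each position scores every position, turns the scores into
    # weights via softmax, then takes a weighted mix of the values.
    d = len(keys[0])
    output = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[i] for w, v in zip(weights, values))
                 for i in range(len(values[0]))]
        output.append(mixed)
    return output

# Invented vectors for "animal", "street", "it": "it" points in nearly
# the same direction as "animal", so it attends to "animal" most.
vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
out = self_attention(vecs, vecs, vecs)
print(out[2])  # "it" ends up pulled toward "animal"
```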


7. Softmax — Turning Scores into Probabilities

When predicting the next word, the model produces scores.

Example:

Paris: 8.1
London: 3.2
Pizza: -1.4

Softmax converts these scores into probabilities.

Paris → 0.992
London → 0.007
Pizza → 0.000

The model then picks the next token from this distribution — often, but not always, the most likely one.
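The softmax itself is only a few lines. Subtracting the maximum score first is a standard trick to keep the exponentials numerically stable:

```python
import math

def softmax(scores):
    # Shift by the max score so math.exp never overflows.
    m = max(scores.values())
    exps = {word: math.exp(s - m) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

scores = {"Paris": 8.1, "London": 3.2, "Pizza": -1.4}
probs = softmax(scores)
for word, p in probs.items():
    print(f"{word}: {p:.3f}")
```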


8. Multi-Head Attention — Multiple Perspectives

Instead of one attention mechanism, transformers use multiple attention heads.

Think of it like a team of analysts:

One head focuses on grammar
One focuses on subject relationships
One focuses on context

Together they build a deeper understanding of the sentence.
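You can picture the mechanics like this: slice each vector into per-head pieces, let each head attend over its own slice, then stitch the results back together. (Real transformers use learned projection matrices per head rather than plain slicing — this is a simplification of the splitting idea only.)

```python
def split_heads(vector, num_heads):
    # Divide one embedding into equal slices, one per attention head.
    size = len(vector) // num_heads
    return [vector[i * size:(i + 1) * size] for i in range(num_heads)]

embedding = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
heads = split_heads(embedding, num_heads=4)
print(heads)  # → [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

# Each head would run its own attention over its slice, producing four
# different "views" that get concatenated back into one vector.
merged = [x for head in heads for x in head]
assert merged == embedding
```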


9. Temperature — Controlling Creativity

Temperature controls how adventurous the AI becomes.

| Temperature | Behavior |
| --- | --- |
| 0.1 | Very predictable |
| 0.5 | Balanced |
| 1.0 | Creative |
| 2.0 | Chaotic |

Example prompt:

The best food in Italy is...

Low temperature → “pizza”
High temperature → “truffle pasta with wild mushrooms”
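Mechanically, temperature is just a divisor applied to the scores before softmax: dividing by a small number sharpens the distribution, dividing by a large number flattens it. The candidate scores below are made up:

```python
import math

def softmax_with_temperature(scores, temperature):
    # Scale scores by 1/temperature before the usual softmax.
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.5]  # invented logits for three candidate tokens

print(softmax_with_temperature(scores, 0.1))  # top token dominates
print(softmax_with_temperature(scores, 2.0))  # much more even spread
```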


10. Knowledge Cutoff — When the Model Stopped Learning

LLMs are trained on huge datasets, but training eventually stops.

That date is called the knowledge cutoff.

If the cutoff is 2024, the model might not know events from 2025 unless external tools provide updated data.


Final Picture: How Everything Works Together

When you type a prompt:

Explain black holes simply

The AI pipeline roughly looks like this:

  1. Tokenization splits the text
  2. Tokens become embeddings
  3. Positional encoding adds order
  4. Self-attention analyzes relationships
  5. Multi-head attention extracts deeper context
  6. The decoder predicts the next token
  7. Softmax selects the most probable word
  8. Temperature controls creativity

Repeat this thousands of times per response.
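The whole loop can be sketched end to end with a fake model standing in for the real network. Every score here is invented — the point is only the rhythm: score candidates, softmax, pick a token, append, repeat:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fake_model(tokens):
    # A real model would run embeddings, positional encoding, and
    # attention here; this stub just scores a canned continuation highest.
    continuation = ["black", "holes", "are", "collapsed", "stars"]
    candidates = continuation + ["pizza"]
    step = min(len(tokens) - 1, len(continuation) - 1)
    scores = [5.0 if c == continuation[step] else 0.0 for c in candidates]
    return candidates, scores

tokens = ["Explain"]
for _ in range(5):
    candidates, scores = fake_model(tokens)
    probs = softmax(scores)
    # Greedy decoding (temperature ~ 0): take the most probable token.
    tokens.append(candidates[probs.index(max(probs))])

print(" ".join(tokens))
# → Explain black holes are collapsed stars
```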

And that’s how a machine writes paragraphs.


Closing Thought

Large Language Models are not magic.

They are layers of clever mathematics turning language into patterns.

But when these pieces work together — tokens, vectors, attention, and probabilities — something remarkable happens:

Machines start speaking our language.
