How Large Language Models (LLMs) Generate Text
These notes summarize how Large Language Models (LLMs) generate text, based on what I learned from the DeepLearning.AI RAG course and some further exploration.
This is a mental model, not marketing.
High-Level Overview
A Large Language Model (LLM) is fundamentally a next-token prediction system.
Given a sequence of tokens as input, the model:
- Predicts the most probable next token
- Appends it to the sequence
- Repeats the process until a stop condition is reached (e.g. an end-of-sequence token or a length limit)
That’s it.
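To make the loop concrete, here is a minimal sketch of greedy next-token generation, using Hugging Face's GPT-2 purely as a stand-in model (any causal LLM follows the same pattern; the 10-token limit and greedy selection are just choices for this example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The sun is", return_tensors="pt").input_ids

for _ in range(10):                                      # generate at most 10 new tokens
    with torch.no_grad():
        logits = model(ids).logits                       # scores for every token in the vocabulary
    next_id = logits[0, -1].argmax()                     # greedy: pick the most probable next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)    # append it and repeat
    if next_id.item() == tokenizer.eos_token_id:         # stop at end-of-sequence
        break

print(tokenizer.decode(ids[0]))
```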
What LLMs Do Not Do
LLMs do not:
- Look up words in a dictionary at runtime
- Search the internet by default
- Reason like humans
Instead, they rely entirely on statistical patterns learned during training.
Two Core Components of an LLM
1️⃣ Training Data
LLMs are trained on massive text datasets:
- Books
- Articles
- Websites
- Code repositories
- Documentation
During training:
- The model learns statistical relationships between tokens
- It does not memorize exact sentences
- It learns generalizable language patterns
Example of a learned pattern:
After “the sun is”, tokens like shining, bright, or hot are statistically likely.
These patterns are encoded into the model’s parameters (weights).
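As a purely illustrative picture (the numbers below are made up, not taken from any real model), the learned distribution after "the sun is" might look like this:

```python
# Made-up numbers for illustration only.
next_token_probs = {
    " shining": 0.21,
    " bright":  0.14,
    " hot":     0.09,
    " a":       0.07,
    # ...tens of thousands of other tokens share the remaining probability mass
}
```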
2️⃣ Tokenizer and Vocabulary
Before training begins, every LLM is paired with a tokenizer.
The tokenizer:
- Splits text into tokens (sub-word units)
- Converts tokens into numeric IDs
- Defines a fixed vocabulary (e.g. 20k–100k tokens)
Important properties:
- The vocabulary is fixed at training time
- The model can only generate tokens from this vocabulary
- Different models use different tokenizers
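You can see all three properties in a few lines of code. Here I use GPT-2's tokenizer via Hugging Face purely as an example, since every model ships its own:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

print(tok.vocab_size)                   # fixed vocabulary size (50257 for GPT-2)
ids = tok.encode("the sun is shining")  # text -> numeric token IDs
print(ids)                              # a short list of integers
print(tok.decode(ids))                  # IDs -> text again
```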
Tokens Are Not Words
A token:
- Might be a full word
- Might be part of a word
- Might include spaces or punctuation
Example: "unbelievable" may be split into ["un", "believ", "able"].
This is why:
- Token counts ≠ word counts
- Prompt length matters
- Context limits exist
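A quick way to check this yourself (again with GPT-2's tokenizer as an example; the exact split varies by model):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

text = "unbelievable tokenization quirks"
print(len(text.split()))        # 3 words
print(tok.tokenize(text))       # the sub-word pieces the model actually sees
print(len(tok.tokenize(text)))  # usually more than 3 tokens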
How a Single Token Is Generated
At each step:
- The model takes the current token sequence
- Produces a probability distribution over every token in its vocabulary
- Selects one token (based on the decoding strategy, e.g. greedy or sampling)
- Appends it to the sequence
This repeats token by token.
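Here is a sketch of that single step in isolation, assuming `logits` is the score vector for the last position (as in the generation loop above); temperature sampling is just one common decoding strategy, shown for illustration:

```python
import torch

def pick_next_token(logits: torch.Tensor, temperature: float = 0.8) -> int:
    """One decoding step: scores -> probabilities -> one chosen token ID."""
    probs = torch.softmax(logits / temperature, dim=-1)    # distribution over the vocabulary
    return torch.multinomial(probs, num_samples=1).item()  # sample a single token ID

# Greedy decoding would simply be: logits.argmax().item()
```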
Why Output Feels Like “Reasoning”
LLMs appear to reason because:
- Language itself encodes reasoning patterns
- The model has seen millions of examples of explanations
- It predicts tokens that look like reasoning
But internally:
It’s still just predicting the next token.
Mental Model (Remember This)
LLMs generate text one token at a time based on probability, not understanding.
If you remember this, most confusion around LLM behavior disappears.
Why This Matters (Especially for RAG)
In RAG systems:
- The LLM does not know facts
- It only knows patterns
- Retrieved context steers token prediction
Good retrieval = better next-token probabilities.
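Concretely, "steers" just means the retrieved text becomes part of the token sequence the model conditions on. A minimal sketch, where `retrieve` is a hypothetical stand-in for whatever vector search you use:

```python
def build_rag_prompt(question: str, retrieve) -> str:
    chunks = retrieve(question, k=3)      # hypothetical: top-3 passages from a vector store
    context = "\n\n".join(chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```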
TL;DR
- LLMs are next-token predictors
- They don’t think or search by default
- Tokenizers define what models can generate
- Everything happens one token at a time
Understanding this mental model makes prompt engineering, RAG design, and debugging much easier.