
Louis Liu

What Really Happens When an LLM Chooses the Next Token🤯

LLM outputs sometimes feel stable. Sometimes they suddenly become random.

Often, the only thing that changed is a parameter.

So what actually happens at the moment an LLM decides which token comes next?

The short answer is simple:

it samples one token from a probability distribution.

Let’s break that moment down with visuals.


The Core Idea

Given a prompt, the model predicts a probability distribution over possible next tokens.

For example:

Twinkle twinkle little

At this point, the model assigns a probability to each candidate token.

You can imagine them laid out on a 0–100 scale:

  • Higher probability → larger segment
  • Lower probability → smaller segment

[Figure: probability distribution chart]
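If you want to poke at this yourself, here is a tiny Python sketch of such a distribution. The tokens and numbers are invented for illustration; a real model computes them over its entire vocabulary.

```python
# A toy next-token distribution for the prompt "Twinkle twinkle little".
# These tokens and probabilities are invented; a real model produces them
# from logits over its whole vocabulary.
next_token_probs = {
    "star": 0.46,
    "car": 0.27,
    "light": 0.12,
    "bat": 0.08,
    "girl": 0.04,
    "dog": 0.03,
}

# Laid out on a 0–100 scale, each token owns a segment proportional to its probability.
start = 0.0
for token, p in next_token_probs.items():
    print(f"{token:>6}: {start * 100:5.1f} – {(start + p) * 100:5.1f}")
    start += p
```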


Sampling: What Actually Happens

Next comes sampling.

A practical way to think about it:

  • Generate a random number on the 0–100 scale
  • See which segment it falls into
  • Output the corresponding token

Since “star” has the largest segment, it’s the most likely result:

Twinkle twinkle little star
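
Here is a minimal Python sketch of that three-step procedure, using the same invented distribution (the numbers are placeholders, not the ones from the visualizer):

```python
import random

# A minimal sketch of the sampling step: pick a random point on the 0–100
# scale and return whichever token's segment it lands in.
next_token_probs = [("star", 0.46), ("car", 0.27), ("light", 0.12),
                    ("bat", 0.08), ("girl", 0.04), ("dog", 0.03)]

def sample_token(probs):
    r = random.uniform(0, 100)        # random point on the 0–100 scale
    cumulative = 0.0
    for token, p in probs:
        cumulative += p * 100         # grow the segment boundary
        if r < cumulative:            # the point falls inside this token's segment
            return token
    return probs[-1][0]               # guard against floating-point rounding

print(sample_token(next_token_probs))  # "star" most often, but not always
```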

Temperature, Top-p, and Top-k

All three parameters affect only this sampling step.

From here on:

  • Temperature = 1
  • Top-p = 1
  • Top-k = 10

We’ll change one parameter at a time.


Temperature

Temperature does one thing:

it stretches or flattens probability differences.

  • Lower temperature → strong preferences → stable output
  • Higher temperature → flatter distribution → more randomness

In this example:

  • The gap between “star” and “car” is 19.6 points on the 0–100 scale
  • With Temperature = 0.5, the gap grows to 36.1 points

[Figure: temperature demo 1]

With Temperature = 1.68, lower-probability tokens become more competitive:

[Figure: temperature demo 2]

Key point:

Temperature doesn’t remove tokens.

It only changes how strongly the model prefers one over another.
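
Here is a small sketch of how temperature rescales a distribution, assuming we start from raw logits (the numbers below are invented, not the article’s). Notice that every token stays in play at every temperature; only the relative preferences change.

```python
import math

# A minimal sketch of temperature scaling on invented logits.
# Lower T sharpens the distribution; higher T flattens it.
logits = {"star": 2.0, "car": 1.3, "light": 0.5, "bat": 0.1}

def softmax_with_temperature(logits, temperature):
    scaled = {t: l / temperature for t, l in logits.items()}
    max_l = max(scaled.values())                 # subtract the max for numerical stability
    exps = {t: math.exp(l - max_l) for t, l in scaled.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

for T in (0.5, 1.0, 1.68):
    probs = softmax_with_temperature(logits, T)
    # Every token keeps a nonzero probability: temperature reweights, it never removes.
    print(f"T={T}: " + ", ".join(f"{t}={p:.2f}" for t, p in probs.items()))
```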


Top-p (Nucleus Sampling)

Top-p controls how much probability mass is kept.

The process is straightforward:

  • Start from the highest-probability token
  • Keep adding tokens until cumulative probability ≥ Top-p
  • Drop the rest

With Top-p = 0.6, only tokens covering 60% of total probability remain.

[Figure: top-p demo 1]

The remaining tokens are then renormalized:

[Figure: top-p demo 2]

This means:

  • The number of kept tokens is dynamic, not fixed
  • More peaked distributions keep fewer tokens
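
A minimal sketch of the top-p filter under the same assumptions (invented probabilities, plain Python):

```python
# A minimal sketch of the top-p (nucleus) filter, assuming we already have a
# probability distribution. Tokens are added in descending order until their
# cumulative probability reaches top_p; the survivors are then renormalized.
def top_p_filter(probs, top_p):
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:        # enough probability mass is covered
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}   # renormalize the survivors

# Invented distribution for illustration.
probs = {"star": 0.46, "car": 0.27, "light": 0.12, "bat": 0.08, "girl": 0.04, "dog": 0.03}
print(top_p_filter(probs, 0.6))   # only "star" and "car" survive (0.46 + 0.27 >= 0.6)
```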

Top-k

Top-k is simpler.

Keep only the top K tokens.

  • Top-k = 1 → always pick the most likely token
  • Top-k = 5 → sample from the top 5

Everything else is ignored.

[Figure: top-k demo 1]
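
And a matching sketch of the top-k filter, on the same invented distribution:

```python
# A minimal sketch of the top-k filter: keep the K most likely tokens,
# drop everything else, and renormalize.
def top_k_filter(probs, k):
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = ranked[:k]                              # keep only the top K tokens
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

probs = {"star": 0.46, "car": 0.27, "light": 0.12, "bat": 0.08, "girl": 0.04, "dog": 0.03}
print(top_k_filter(probs, 1))   # {'star': 1.0} -> always the most likely token
print(top_k_filter(probs, 5))   # sample from the top 5; "dog" is dropped
```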

In one line:

  • Top-k limits quantity
  • Top-p limits probability mass
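
Finally, here is one way the three parameters might compose in a single sampling step. The ordering (temperature, then top-k, then top-p, then sample) follows a common convention, but real implementations differ in the details; treat this as a sketch, not a reference implementation.

```python
import math
import random

# A sketch of one sampling step combining all three parameters (invented logits).
def sample_next_token(logits, temperature=1.0, top_k=10, top_p=1.0):
    # 1. Temperature: rescale the logits, then softmax into probabilities.
    exps = {t: math.exp(l / temperature) for t, l in logits.items()}
    total = sum(exps.values())
    ranked = sorted(((t, e / total) for t, e in exps.items()),
                    key=lambda kv: kv[1], reverse=True)

    # 2. Top-k: keep at most K tokens.
    ranked = ranked[:top_k]

    # 3. Top-p: keep tokens until the cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # 4. Renormalize the survivors and sample one token.
    total = sum(p for _, p in kept)
    r, cumulative = random.uniform(0, 1), 0.0
    for token, p in kept:
        cumulative += p / total
        if r < cumulative:
            return token
    return kept[-1][0]                 # guard against floating-point rounding

logits = {"star": 2.0, "car": 1.3, "light": 0.5, "bat": 0.1, "girl": -0.4, "dog": -0.7}
print(sample_next_token(logits, temperature=0.8, top_k=5, top_p=0.9))
```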

Demo

All visuals in this article come from:

👉 LLM Sampling Visualizer

If sampling parameters feel abstract, five minutes with this tool builds intuition faster than reading more text.

