Louis Liu

šŸ“ˆVisualizing LLM Parameters: Temperature, Top-p, and Top-k in Action

Ever wondered how large language models decide which word to generate next? The secret lies in probability distributions and sampling strategies—but instead of explaining it in abstract terms, let's see it in action.

This article uses visualizations to show how Temperature, Top-p, and Top-k influence the tokens LLMs choose. You can try the interactive visualizer yourself at the end of the article.


Basic Explanation

At each step, a large language model outputs a probability distribution over possible next tokens, conditioned on the prompt so far.

Suppose our prompt is:

Twinkle twinkle little

At this point, the model generates a probability table for the next token based on the existing prompt. The image below shows a 0–100 axis, with predicted tokens placed according to their probabilities. The higher a token’s probability, the larger its segment on the axis.

Next, the model performs random sampling from this probability distribution. You can think of it as drawing a random number along that 0–100 axis: the token whose interval contains the number is selected as the output.
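To make the idea concrete, here is a minimal Python sketch of this interval-based sampling. The token probabilities are made-up illustrative numbers, not output from a real model:

```python
import random

# Hypothetical next-token probabilities for the prompt
# "Twinkle twinkle little" (illustrative numbers only).
probs = {"star": 0.55, "car": 0.20, "bat": 0.12, "light": 0.08, "one": 0.05}

def sample_token(probs):
    """Draw a random number in [0, 1) and return the token
    whose cumulative-probability interval contains it."""
    r = random.random()
    cumulative = 0.0
    for token, p in probs.items():
        cumulative += p
        if r < cumulative:
            return token
    return token  # fallback for floating-point rounding

print(sample_token(probs))  # prints "star" about 55% of the time
```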

We can see that ā€œstarā€ has the highest probability, so the model is very likely to output:

Twinkle twinkle little star

With this basic principle in mind, let's look at how specific parameters affect this token selection process.

From now on, Temperature and Top-p are set to 1 by default, and Top-k is set to 10. In actual generation, these parameters work together in the same token selection process. For clarity, we will observe the effect of each parameter individually.


Temperature

Lowering the temperature amplifies the differences between candidate token probabilities. Tokens with higher original probabilities become even more likely, making the model’s output more stable. In this example, the gap between ā€œstarā€ and ā€œcarā€ is about 19.6 percentage points; lowering the temperature to 0.5 widens it significantly, to 36.1 points.

Conversely, raising the temperature reduces the differences, making lower-probability tokens more likely to be chosen, resulting in more random output. The image below shows the probabilities when temperature is increased to 1.68.

Note that temperature does not change which tokens are possible—it only affects their relative probability differences.
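As a sketch of what happens under the hood: temperature is usually applied to the model's logits before the softmax, which is equivalent to raising already-normalized probabilities to the power 1/T and renormalizing. The numbers below reuse the illustrative distribution from earlier, not the exact values in the screenshots:

```python
def apply_temperature(probs, temperature):
    """Rescale a distribution by temperature T.
    Equivalent to softmax(logits / T): T < 1 sharpens the
    distribution, T > 1 flattens it."""
    scaled = {t: p ** (1.0 / temperature) for t, p in probs.items()}
    total = sum(scaled.values())
    return {t: v / total for t, v in scaled.items()}

probs = {"star": 0.55, "car": 0.20, "bat": 0.12, "light": 0.08, "one": 0.05}
print(apply_temperature(probs, 0.5))   # "star" pulls further ahead
print(apply_temperature(probs, 1.68))  # gaps shrink; sampling gets more random
```

Note that every token keeps a nonzero probability at any finite temperature, which is exactly why temperature alone never removes candidates.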


Top-p

Top-p can be understood as the cumulative probability threshold.

The model starts with the highest-probability token and adds probabilities in descending order. Once the cumulative probability reaches the Top-p threshold, only those tokens are kept as candidates, while the rest are discarded.

For example, when Top-p = 0.6, the model only considers the smallest set of highest-probability tokens whose cumulative probability reaches 60%.

These remaining tokens are then renormalized to ensure the total probability sums to 1.

Therefore, Top-p controls the ā€œprobability coverage rangeā€ rather than the exact number of candidate tokens.
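Here is a minimal sketch of this filtering step, again with the illustrative distribution. With Top-p = 0.6, ā€œstarā€ alone (0.55) falls short of the threshold, so ā€œcarā€ is also kept and the pair is renormalized:

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose
    cumulative probability reaches top_p, then renormalize."""
    kept, cumulative = {}, 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}

probs = {"star": 0.55, "car": 0.20, "bat": 0.12, "light": 0.08, "one": 0.05}
print(top_p_filter(probs, 0.6))  # {'star': 0.733..., 'car': 0.266...}
```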


Top-k

Top-k controls the number of candidate tokens considered during selection.

When Top-k is 1, the model only considers the single highest-probability token, making the output fully deterministic (equivalent to greedy decoding).

When Top-k is 5, the model samples a token from the 5 highest-probability tokens, ignoring the rest.
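The corresponding sketch for Top-k, once more with the illustrative distribution; the surviving tokens are renormalized before sampling:

```python
def top_k_filter(probs, k):
    """Keep only the k highest-probability tokens and renormalize
    so the survivors sum to 1."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {t: p / total for t, p in top}

probs = {"star": 0.55, "car": 0.20, "bat": 0.12, "light": 0.08, "one": 0.05}
print(top_k_filter(probs, 1))  # {'star': 1.0}: greedy, fully deterministic
print(top_k_filter(probs, 3))  # top three tokens, renormalized
```

In real decoding libraries these filters are typically chained in a single pipeline (for example: apply temperature, then Top-k, then Top-p, then draw the random sample), which is what the combined behavior described earlier looks like in practice.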


Demo

The screenshots in this article are from the LLM Sampling Visualizer. You can use this tool to explore how Temperature, Top-p, and Top-k influence the model’s output.

