Ever wondered how large language models decide which word to generate next? The secret lies in probability distributions and sampling strategies. But instead of explaining it in abstract terms, let's see it in action.
This article uses visualizations to show how Temperature, Top-p, and Top-k influence the tokens LLMs choose. You can try the interactive visualizer yourself at the end of the article.
Basic Explanation
The output of a large language model is a probability distribution over the next token, conditioned on the given prompt.
Suppose our prompt is:
Twinkle twinkle little
At this point, the model generates a probability table for the next token based on the existing prompt. The image below shows a 0–100 axis, with predicted tokens placed according to their probabilities. The higher a token's probability, the larger its segment on the axis.
Next, the model performs random sampling from this probability distribution. You can think of it as generating a random number between 0 and 100: the token whose interval contains that number is selected as the output.
We can see that "star" has the highest probability, so the model is very likely to output:
Twinkle twinkle little star
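To make the sampling step concrete, here is a minimal Python sketch. The tokens and probabilities are made up for illustration; a real model would produce a distribution over its entire vocabulary.

```python
import random

# Hypothetical next-token distribution for "Twinkle twinkle little".
# The tokens and probabilities are illustrative, not from a real model.
next_token_probs = {
    "star": 0.55,
    "car": 0.35,
    "bat": 0.06,
    "dog": 0.04,
}

# random.choices picks a token with probability proportional to its weight,
# which is exactly the "random number lands in a token's interval" picture.
token = random.choices(
    population=list(next_token_probs),
    weights=list(next_token_probs.values()),
)[0]
print("Twinkle twinkle little", token)  # most often "star"
```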
With this basic principle in mind, let's look at how specific parameters affect this token selection process.
From now on, Temperature and Top-p are set to 1 by default, and Top-k is set to 10. In actual generation, these parameters work together in the same token selection process. For clarity, we will observe the effect of each parameter individually.
Temperature
Lowering the temperature amplifies the differences between candidate token probabilities. Tokens with higher original probabilities become even more likely, making the model's output more stable. In this example, the probability gap between "star" and "car" is about 19.6 percentage points; when the temperature is lowered to 0.5, the gap widens significantly, reaching 36.1.
Conversely, raising the temperature reduces the differences, making lower-probability tokens more likely to be chosen, resulting in more random output. The image below shows the probabilities when temperature is increased to 1.68.
Note that temperature does not change which tokens are possible; it only affects their relative probability differences.
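Under the hood, temperature divides the model's raw scores (logits) before the softmax that turns them into probabilities. The sketch below uses made-up logits, so the numbers won't match the screenshots, but it shows how a low temperature sharpens the distribution and a high one flattens it.

```python
import math

def apply_temperature(logits, temperature):
    # Divide the logits by the temperature, then apply softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.2, -0.5, -1.0]  # made-up logits for "star", "car", "bat", "dog"
for t in (0.5, 1.0, 1.68):
    print(t, [round(p, 3) for p in apply_temperature(logits, t)])
# T=0.5 concentrates probability on "star"; T=1.68 spreads it toward the tail.
```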
Top-p
Top-p (also known as nucleus sampling) can be understood as a cumulative probability threshold.
The model starts with the highest-probability token and adds probabilities in descending order. Once the cumulative probability reaches the Top-p threshold, only those tokens are kept as candidates, while the rest are discarded.
For example, when Top-p = 0.6, the model keeps only the smallest set of highest-probability tokens whose cumulative probability reaches 60%.
These remaining tokens are then renormalized to ensure the total probability sums to 1.
Therefore, Top-p controls the "probability coverage range" rather than the exact number of candidate tokens.
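As a sketch, using the same illustrative distribution as above, Top-p filtering amounts to a cumulative sum over tokens sorted by probability, followed by renormalization:

```python
def top_p_filter(probs, p):
    # Sort token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Keep the smallest prefix whose cumulative probability reaches p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to 1.
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

probs = [0.55, 0.35, 0.06, 0.04]  # "star", "car", "bat", "dog"
print(top_p_filter(probs, 0.6))   # keeps "star" and "car" (0.55 + 0.35 >= 0.6)
```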
Top-k
Top-k controls the number of candidate tokens considered during selection.
When Top-k is 1, the model only considers the highest-probability token, making the output highly deterministic.
When Top-k is 5, the model samples from the 5 highest-probability tokens and ignores the rest.
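A minimal sketch of Top-k filtering on the same illustrative distribution; as with Top-p, the surviving probabilities are renormalized to sum to 1:

```python
def top_k_filter(probs, k):
    # Keep the k highest-probability tokens; renormalize so they sum to 1.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in order)
    return {i: probs[i] / total for i in order}

probs = [0.55, 0.35, 0.06, 0.04]  # "star", "car", "bat", "dog"
print(top_k_filter(probs, 1))  # {0: 1.0} -> always "star": fully deterministic
print(top_k_filter(probs, 2))  # only "star" and "car" remain, renormalized
```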
Demo
The screenshots in this article are from the LLM Sampling Visualizer. You can use this tool to explore how Temperature, Top-p, and Top-k influence the model's output.
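If you would rather experiment in code than in the browser, here is a minimal end-to-end sketch combining all three parameters. The ordering shown (temperature, then Top-k, then Top-p) matches many common implementations, but libraries differ, and the logits here are made up for illustration.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=10, top_p=1.0):
    # 1. Temperature: rescale the logits, then softmax into probabilities.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # 2. Sort candidate indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # 3. Top-k: keep at most k candidates.
    order = order[:top_k]

    # 4. Top-p: keep the smallest prefix whose cumulative probability
    #    reaches the threshold.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # 5. Renormalize the survivors and sample one of them.
    total = sum(probs[i] for i in kept)
    weights = [probs[i] / total for i in kept]
    return random.choices(kept, weights=weights)[0]

tokens = ["star", "car", "bat", "dog"]
logits = [2.0, 1.2, -0.5, -1.0]  # made up for illustration
print(tokens[sample_next_token(logits, temperature=0.8, top_k=3, top_p=0.9)])
```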