Lini Abraham

Prompt Response Tuning with Temperature, Top-p, and Top-k

Language models like GPT don’t “think” in full sentences — they predict one token at a time, where a token is a chunk of text (often a word or part of a word) created through a process called tokenization. At each step, the model chooses the next token based on probabilities — and decoding parameters like temperature, top-k, and top-p control how predictable, random, or creative those token choices are.
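
To make that concrete, here is a minimal sketch of a single decoding step. The prompt and the probability values are made up for illustration; a real model computes this distribution itself at every step.

```python
import random

# Hypothetical next-token probabilities after the prompt "The cat sat on the"
# (made up for illustration; a real model produces these itself).
next_token_probs = {
    " mat": 0.55,
    " floor": 0.20,
    " sofa": 0.15,
    " moon": 0.07,
    " spreadsheet": 0.03,
}

# One decoding step: sample the next token in proportion to its probability.
tokens = list(next_token_probs)
weights = list(next_token_probs.values())
print(random.choices(tokens, weights=weights, k=1)[0])  # usually " mat", occasionally something rarer
```

Temperature, top-k, and top-p all act on this distribution before the sample is drawn.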

Temperature

Temperature controls how random or focused the model’s word choices are.

A low temperature (e.g., 0.2) makes the output more predictable — the model sticks to the most likely words.
A high temperature (e.g., 1.0 or more) makes the output more creative, possibly even risky or unusual.

| Temperature | Behavior |
| --- | --- |
| 0.0 | Deterministic, safest |
| 0.3 – 0.7 | Predictable, less risky |
| 1.0 | Balanced randomness (default for GPT) |
| >1.0 | Creative, more surprising, possibly noisy |
| >1.5 | Often too chaotic or nonsensical |

Temperature is usually in the range of 0.0 to 2.0.
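
Under the hood, temperature is typically applied by dividing the model's raw scores (logits) by the temperature before they are turned into probabilities with softmax. A minimal sketch, using made-up logits for four candidate tokens:

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0, 0.5]  # made-up raw scores for four candidate tokens

for temperature in (0.2, 1.0, 1.5):
    scaled = [s / temperature for s in logits]  # temperature scaling happens here
    probs = softmax(scaled)
    print(temperature, [round(p, 3) for p in probs])
```

Low temperatures sharpen the distribution so the top token dominates; high temperatures flatten it, giving unlikely tokens a real chance. In practice you rarely implement this yourself; hosted model APIs usually expose it as a request parameter (commonly named `temperature`).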

Top-k Sampling

Top-k limits the model to the k most likely tokens, then samples one token from that group in proportion to its probability.

Top-k = 1 → Always picks the most probable word (like greedy decoding).
Top-k = 40 → Picks from the 40 best guesses, adding variety without going off-topic.

| Top-k Value | Behavior |
| --- | --- |
| 1 | Safe but repetitive |
| 10–50 | Good diversity, still smart |
| 100+ | More variety, more risk |

The top-k value ranges from a minimum of 1 up to a maximum of the total vocabulary size (roughly 50,000 tokens for GPT models).
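
A minimal sketch of top-k sampling, reusing the same kind of made-up distribution as above (the values are illustrative, not from a real model):

```python
import random

def top_k_sample(probs, k):
    """Keep the k most likely tokens, then sample one in proportion to its probability."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens = [t for t, _ in top]
    weights = [p for _, p in top]  # random.choices renormalizes the weights implicitly
    return random.choices(tokens, weights=weights, k=1)[0]

probs = {" mat": 0.55, " floor": 0.20, " sofa": 0.15, " moon": 0.07, " spreadsheet": 0.03}

print(top_k_sample(probs, k=1))  # always " mat" (greedy decoding)
print(top_k_sample(probs, k=3))  # one of " mat", " floor", " sofa"
```

Because the cutoff is a fixed count, the same k tokens are kept whether the model is confident or unsure; that is the main difference from top-p below.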

Top-p Sampling (Nucleus Sampling)

Top-p chooses from the smallest set of tokens whose total probability adds up to at least p.
If Top-p = 0.9, the model picks from the most likely words that together make up 90% of the probability mass.
Unlike top-k, this list can grow or shrink dynamically depending on the situation.

| Top-p Value | Behavior |
| --- | --- |
| 0.7 | Very conservative |
| 0.9 | Balanced, avoids outliers |
| 1.0 | Like no filter; all options allowed |

Top-p values range from 0 to 1.

Top-p sampling, where p refers to the cumulative probability threshold, is also known as nucleus sampling. It works in the following way:

  • Sorts the tokens from most to least likely
  • Selects the smallest group of tokens whose cumulative probabilities add up to at least p (like 0.9)
  • Randomly picks one token from that set

This selected group is what we call the nucleus.

The nucleus is the tight cluster of highest-probability tokens — the model’s most confident guesses.

Instead of sampling from all possible tokens (many of which are low-probability and often nonsensical), we focus on the most meaningful subset — the nucleus of the probability mass.
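
Putting those steps together, here is a minimal sketch of nucleus sampling over the same made-up distribution (values are illustrative only):

```python
import random

def top_p_sample(probs, p):
    """Sort tokens by probability, keep the smallest set whose cumulative
    probability reaches p (the nucleus), then sample one token from it."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    tokens = [t for t, _ in nucleus]
    weights = [pr for _, pr in nucleus]
    return random.choices(tokens, weights=weights, k=1)[0]

probs = {" mat": 0.55, " floor": 0.20, " sofa": 0.15, " moon": 0.07, " spreadsheet": 0.03}

print(top_p_sample(probs, p=0.9))  # nucleus covers " mat", " floor", " sofa" (cumulative ≈ 0.90)
print(top_p_sample(probs, p=1.0))  # every token stays in play
```

Because the cutoff is based on cumulative probability rather than a fixed count, the nucleus grows when the model is unsure (many tokens share the probability mass) and shrinks when it is confident.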
