Temperature and Sampling

Ask a chatbot to describe a sunset, then open a new chat and ask the exact same question again. You’ll likely get two different paragraphs: different word choices, maybe a different structure, sometimes a wildly different tone. Nothing about the model changed between the two requests. The question was identical, word for word. So why didn’t the answer come out identical too?

The dial behind the choice

At each step, the model doesn’t just compute one answer and hand it over. It assigns a probability to every token in its vocabulary, which can run to 50,000 or 100,000 entries, and most of that probability usually lands on a short list of strong candidates. Imagine the top five are “the” at 40 percent, “a” at 25 percent, “her” at 12 percent, “his” at 8 percent, and “an” at 5 percent, with the rest of the vocabulary splitting the remaining 10 percent. The model then has to choose one, and how it chooses is governed by a setting called temperature, usually a number between 0 and roughly 2.
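The choice described above can be sketched in a few lines of Python. This is only an illustration built on the example numbers: the names NEXT_TOKEN_PROBS, sample_token, and most_likely_token are made up for this sketch, and the "<other>" entry is an assumed stand-in that lumps the remaining 10 percent into one bucket, which a real model would spread across thousands of tokens.

```python
import random

# Probabilities from the example above. "<other>" is a stand-in bucket
# (an assumption of this sketch) for the remaining 10 percent.
NEXT_TOKEN_PROBS = {
    "the": 0.40,
    "a": 0.25,
    "her": 0.12,
    "his": 0.08,
    "an": 0.05,
    "<other>": 0.10,
}


def sample_token(probs, rng):
    """Draw one token; each token's chance of winning equals its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


def most_likely_token(probs):
    """Skip the dice entirely and take the top candidate."""
    return max(probs, key=probs.get)


rng = random.Random(0)  # seeded only so the sketch is repeatable
draws = [sample_token(NEXT_TOKEN_PROBS, rng) for _ in range(10)]
print("ten sampled tokens:", draws)
print("top candidate:", most_likely_token(NEXT_TOKEN_PROBS))
```

Sampling and always taking the top candidate are the two ends of the range; temperature is what moves the model between them.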

A restaurant analogy helps here. Picture someone at a familiar restaurant with a menu of thirty dishes. At temperature 0, they order the exact dish they always order, the one they know is good, every single time. Push the temperature up slightly and they’re still likely to order the usual, but every so often they’ll try the second favorite instead. Push it further, toward 1.5 or 2, and they start pointing at things near the bottom of the menu they’ve barely tried, dishes with a 3 percent or 4 percent chance of being picked. Most of those gambles turn out fine. Some are inspired. A few are a mess nobody should have ordered.

Low temperature makes the model pick the highest-probability token almost every time, which produces flat, consistent, occasionally repetitive text. High temperature lets lower-probability tokens win more often, which produces variety and occasional surprise, at the cost of coherence when the gamble doesn’t pay off. At temperature 0, the model always takes the top candidate, so the same prompt produces nearly the same output every time. That can feel deterministic, even though the underlying process is still built around a distribution rather than a single fixed answer.
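To see what the dial does to the numbers, here is a minimal sketch that reshapes the example distribution at a few temperatures. It assumes the common formulation in which a model’s raw scores are divided by the temperature before being turned into probabilities; starting from probabilities instead, that works out to raising each one to the power 1/T and renormalizing. The function name apply_temperature and the "<other>" bucket are assumptions of this sketch, not part of any particular library.

```python
import math

# Example distribution from the text. "<other>" is an assumed stand-in
# for the remaining 10 percent of the vocabulary.
NEXT_TOKEN_PROBS = {
    "the": 0.40,
    "a": 0.25,
    "her": 0.12,
    "his": 0.08,
    "an": 0.05,
    "<other>": 0.10,
}


def apply_temperature(probs, temperature):
    """Return a reshaped copy of probs (all entries must be > 0).

    temperature < 1 sharpens the distribution toward the top token,
    temperature > 1 flattens it, and temperature == 1 leaves it unchanged.
    """
    if temperature <= 0:
        # Treat 0 as "always pick the top token": all the mass goes to it.
        top = max(probs, key=probs.get)
        return {tok: (1.0 if tok == top else 0.0) for tok in probs}
    # p ** (1 / T), written via logs to mirror dividing scores by T.
    scaled = {tok: math.exp(math.log(p) / temperature) for tok, p in probs.items()}
    total = sum(scaled.values())
    return {tok: value / total for tok, value in scaled.items()}


for t in (0, 0.5, 1.0, 2.0):
    reshaped = apply_temperature(NEXT_TOKEN_PROBS, t)
    row = ", ".join(f"{tok}={p:.2f}" for tok, p in reshaped.items())
    print(f"T={t}: {row}")
```

In the printed rows, the share for “the” grows as the temperature drops below 1 and shrinks as it rises above 1, while the long shots move in the opposite direction.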

Why this matters for reading AI behavior

Once you know this dial exists, a lot of casual observations about AI personality stop meaning what people think they mean. “This model is more creative than that one” is sometimes a real difference in training or architecture, and sometimes nothing more than one product being configured at temperature 0.3 for factual reliability and another at 1.0 for brainstorming. A coding assistant is usually tuned low on purpose: nobody wants a function definition that gambles on syntax. A tool marketed for poetry or brainstorming is often tuned higher, favoring novelty over safety. Same underlying model, different dial position, entirely different personality on the surface.

It also explains why identical prompts don’t produce identical replies, and why a single weird or incoherent answer doesn’t necessarily mean the model is broken. It might just mean the dice landed on the 4 percent option that particular time.
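A quick simulation makes the dice concrete. The sketch below replays the same single choice 1,000 times, as if the identical prompt had been sent 1,000 times, and tallies which token won. The distribution is the illustrative one from earlier, the "<other>" bucket is an assumed stand-in for the rest of the vocabulary, and tally_draws is a name invented for this example.

```python
import random
from collections import Counter

# Illustrative distribution from earlier in the piece. "<other>" is an
# assumed stand-in for the remaining 10 percent of the vocabulary.
NEXT_TOKEN_PROBS = {
    "the": 0.40,
    "a": 0.25,
    "her": 0.12,
    "his": 0.08,
    "an": 0.05,
    "<other>": 0.10,
}


def tally_draws(probs, n_requests, seed):
    """Make the same weighted choice n_requests times and count the winners."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    tokens = list(probs)
    weights = list(probs.values())
    return Counter(rng.choices(tokens, weights=weights, k=n_requests))


counts = tally_draws(NEXT_TOKEN_PROBS, n_requests=1000, seed=42)
for token, count in counts.most_common():
    print(f"{token!r}: {count} of 1000")
```

The favorite wins most often, but the long shots still win a handful of times each, and an odd reply now and then is what that looks like from the chat box.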

The dial isn’t the model’s mood

It’s tempting to talk about a model’s output as a reflection of its character: cautious, or bold, or unpredictable, as though these were traits the system carries around the way a person does. But temperature is not a personality. It’s a parameter sitting in the API call, set by whoever built the product on top of the model, frequently without any note to the person typing into the chat box. The “creative flair” a user attributes to an AI is often nothing more than a number an engineer typed into a configuration file that morning, and the same base model could sound completely different tomorrow if that number changes. What looks like temperament is a setting, adjusted upstream, invisible from where the user sits.

That configuration choice only has something to act on because of the underlying mechanism it’s steering: the model guessing its way through text one token at a time. If you want the fuller picture of what’s being sampled at each of these steps, the piece on next-word prediction covers the base operation that temperature is turning up or down.