Hallucinations: Why They Happen and Why They Don’t Go Away

Ask a chatbot for three academic papers on a niche topic and there’s a reasonable chance one of them doesn’t exist: a real-sounding journal, a plausible author, a publication year that fits, and a title nobody ever wrote. Ask it to cite the specific page of a court ruling and it might hand you a case number that looks exactly like the others around it, formatted correctly, confident in tone, entirely made up. Nothing in the output marks the difference between the citation that’s real and the one that isn’t. Both arrive in the same voice.

Why this isn’t a lie

A lie requires knowing the truth and saying something else. That’s not what is happening here. As covered in an earlier piece in this series, an LLM’s entire operation is guessing the most probable next token given everything written so far, run in a loop, one token at a time. When you ask it a question it has good data on, that process usually lands on the correct answer. When you ask it something obscure, or something with no clean answer in its training data at all, the exact same process runs and produces the most statistically plausible continuation anyway. The mechanism doesn’t have a mode where it checks first and generates second. There’s only generation, and it does not pause to notice which category the question falls into.
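A toy sketch can make the "same process either way" point concrete. Everything here is hypothetical: the hand-written probability table stands in for a real model, the prompts and numbers are invented for illustration, and the loop always takes the single most probable token, which is a simplification of how real systems pick tokens.

```python
# Toy illustration only: a hand-written table stands in for a real model.
# Every prompt, token, and probability below is invented for this sketch.
TOY_MODEL = {
    # Well-covered question: one continuation clearly dominates.
    "The capital of France is": {" Paris": 0.97, " Lyon": 0.03},
    "The capital of France is Paris": {".": 0.99, ",": 0.01},
    # Obscure question: nothing dominates, but something still ranks first.
    "The 1987 paper was written by": {
        " Smith": 0.22, " Chen": 0.21, " Garcia": 0.20,
        " Okafor": 0.19, " Ivanov": 0.18,
    },
    "The 1987 paper was written by Smith": {".": 0.90, " and": 0.10},
}


def generate(prompt, max_tokens=5):
    """Append the most probable next token, one at a time, until a period."""
    text = prompt
    for _ in range(max_tokens):
        distribution = TOY_MODEL.get(text)
        if distribution is None:
            break  # the toy table has nothing more to say
        # Pick the top-ranked token. Note there is no step that asks
        # whether the top probability is high enough to trust.
        token = max(distribution, key=distribution.get)
        text += token
        if token == ".":
            break
    return text


print(generate("The capital of France is"))       # The capital of France is Paris.
print(generate("The 1987 paper was written by"))  # The 1987 paper was written by Smith.
```

Both sentences come out equally fluent, even though the first rested on a 0.97 probability and the second on a 0.22 near-tie. The loop has no branch that treats the two cases differently.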

The clearest way to picture this is a student who has learned, across years of tests, that a confident wrong answer scores better than a blank one. Once they learn that “I don’t know” earns zero credit, while a smooth, plausible guess sometimes earns partial credit or even full marks if the grader isn’t checking closely, the student stops leaving blanks entirely. On exam day they write something for every question, whether they studied that material or not, and the handwriting looks equally sure either way. Ask them afterward which answers they actually knew and which ones they filled in on the spot, and often even they can’t tell you. The confident tone was never a signal of certainty. It was just the style they’d learned to produce regardless of what was underneath it. A language model is in that position on every single token, for every single answer, all the time.

Why this matters when you read AI news

This is the detail that gets lost whenever a headline announces that a new model has “reduced hallucinations by 40 percent” or that some update has “solved” the problem. A drop in the error rate is real and worth noting; better training data, more careful tuning, and added checks can genuinely shrink how often the model gets it wrong. But shrinking a rate isn’t the same as removing a mechanism, and the mechanism producing the errors is the identical one producing the correct answers people rely on. Any product that claims to have eliminated hallucination outright is either using an unusually generous definition of the word, or leaning on something added outside the model itself to catch its guesses before they reach you. Reading these claims with that distinction in mind is the difference between treating a “99% accurate” model as trustworthy by default and knowing to check the 1 case in 100 that matters.
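The arithmetic behind such headlines is worth running once. The sketch below assumes a made-up baseline error rate of 5 percent and a batch of 1,000 answers; neither number comes from any real model, and the function name is hypothetical.

```python
def remaining_errors(baseline_rate, relative_reduction, num_answers):
    """Return (new error rate, expected wrong answers out of num_answers)."""
    new_rate = baseline_rate * (1.0 - relative_reduction)
    return new_rate, round(new_rate * num_answers)


# Hypothetical: a 5% error rate "reduced by 40 percent", over 1,000 answers.
_, after_update = remaining_errors(0.05, 0.40, 1000)

# Hypothetical: a "99% accurate" model, over the same 1,000 answers.
_, at_99_percent = remaining_errors(0.01, 0.0, 1000)

print(after_update)   # 30
print(at_99_percent)  # 10
```

Under these assumed numbers, a 40 percent reduction still leaves about 30 wrong answers in every 1,000, and a 99 percent accurate model still leaves about 10. The rate moved; the wrong answers did not disappear, and nothing in the output says which ones they are.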

The part worth sitting with

Push the logic all the way and something clarifies. A model that never hallucinated would be a model that only ever produced text it was certain was true, which means it would need a way to fall silent the moment certainty ran out. But there is no certainty gauge built into next-token prediction, only plausibility, so a model built to never guess wrong would be a model built to never guess, which means a model that stops generating text altogether. Hallucination isn’t a bug riding alongside the real capability. It’s the visible edge of the same guessing process that also writes the working code and the accurate summary. That’s why it can’t be fixed from inside the mechanism, only managed from outside it, with tools that check the model’s output against something external before it reaches you. Retrieval, giving the model actual documents to ground its answer in rather than its own memory, is the main one, and it’s the subject of a later piece in this series.
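As a rough picture of what "managed from outside" can look like, here is a minimal sketch of a check that sits between the model and the reader. It is not retrieval and not any particular product's method. The trusted index, the citation titles, and the function names are all placeholders invented for illustration, and a real check would need far more careful matching than a normalized string lookup.

```python
# Hypothetical trusted index: in practice this might be a library catalogue
# or a database of court records. These entries are placeholders.
TRUSTED_INDEX = {
    "placeholder study of widgets (2019)",
    "an example survey of gadgets (2021)",
}


def normalize(title):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(title.lower().split())


def split_by_verification(citations, trusted_index=TRUSTED_INDEX):
    """Separate citations found in the trusted index from those that are not."""
    verified, unverified = [], []
    for citation in citations:
        if normalize(citation) in trusted_index:
            verified.append(citation)
        else:
            unverified.append(citation)
    return verified, unverified


# Pretend model output: one citation that exists in the index, one that doesn't.
model_output = [
    "Placeholder Study of Widgets (2019)",
    "A Plausible Theory of Sprockets (2020)",
]
verified, unverified = split_by_verification(model_output)
print(verified)    # ['Placeholder Study of Widgets (2019)']
print(unverified)  # ['A Plausible Theory of Sprockets (2020)']
```

The model's guessing is untouched here. The only thing that changed is that its output is compared against something other than itself before anyone reads it, and anything that fails the comparison gets flagged instead of passed along in the same confident voice.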

If you haven’t read it yet, Next-Word Prediction: The One Trick Behind Everything is the natural place to start, since everything above builds directly on that single mechanism.