What a Language Model Actually Is
Ask a chatbot for the capital of France and it answers instantly: Paris. Ask it something with no single clean source (a judgment call, or a niche question nobody ever wrote a tidy answer to) and it still answers instantly, often in the same confident tone. That gap is worth stopping on, because the software itself doesn’t treat the two cases any differently. To understand why, it helps to be precise about what a “model” even is here.
Software that generates, not software that retrieves
A large language model (LLM) is a program built from billions of numerical parameters, tuned during training on huge amounts of text so that, given a sequence of words, it can predict a plausible next word. Repeat that prediction, feeding each new word back in, and you get a plausible next stretch of words. That’s the entire job description. There is no table of questions and answers inside it, no row it looks up when you ask something. When you send it a prompt, it runs a computation and generates a fresh response, word by word, from patterns it absorbed during training rather than from any stored record.
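To make “generates, not retrieves” concrete, here is a deliberately tiny sketch. It assumes a hypothetical three-sentence corpus and uses a bigram counter (which word tends to follow which) as a stand-in for billions of parameters; `train_bigrams` and `generate` are illustrative names, not any library’s API. Real LLMs work on learned weights, not word counts, and note that even this toy stores word-to-word statistics, not question-and-answer pairs. The only point it shows is that output is built one word at a time, and the result can be a sentence that appears nowhere in the training text.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real LLM trains on vastly more text.
CORPUS = (
    "the cat sat on the mat . "
    "the dog sat on the mat . "
    "a dog ate the bone ."
)


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count which word follows which: the only 'knowledge' this toy keeps."""
    words = text.split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows: dict[str, Counter], prompt: str, max_words: int = 10) -> str:
    """Extend the prompt one word at a time, always taking the likeliest next word."""
    words = prompt.split()
    for _ in range(max_words):
        options = follows.get(words[-1])
        if not options:
            break  # nothing learned about what follows this word
        next_word, _count = options.most_common(1)[0]
        words.append(next_word)
        if next_word == ".":
            break  # end of sentence
    return " ".join(words)


model = train_bigrams(CORPUS)
print(generate(model, "a cat"))  # a cat sat on the mat .
```

The sentence “a cat sat on the mat .” is never stored in `CORPUS`; it is assembled on the spot from word-to-word patterns, which is the generate-versus-retrieve distinction in miniature.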
The clearest way to feel the difference is an analogy. A library stores exact text: you find the shelf, pull the book, and the sentence on page 40 is identical every time anyone reads it. Now picture someone who read every book in that library once, years ago, after which the library closed. Ask that person what page 40 of a specific book said, and they’ll reconstruct something plausible in their own words, blending memory with reasonable guesses where memory runs out. They’re not lying. They’re doing the only thing available to them: reconstructing rather than retrieving. That’s much closer to what an LLM does on every single reply, including when you ask it the exact same question twice in a row and get two slightly different answers. That last part has a concrete cause: most chat products don’t always take the single likeliest next word, but sample among several plausible ones, so each run can branch differently.
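Why can the same question produce two different answers? A minimal sketch, assuming a small hand-written table of next-word probabilities (`NEXT_WORD`, hypothetical) in place of what a trained model would compute. The same reply-building loop is run with two different ways of choosing each word: always taking the top choice (greedy), or drawing at random in proportion to probability (sampling). Real systems usually expose how adventurous the sampling is through settings such as temperature; this toy leaves that out.

```python
import random
from typing import Callable

# Hypothetical next-word probabilities, standing in for what a trained model computes.
NEXT_WORD: dict[str, dict[str, float]] = {
    "<start>": {"The": 0.6, "It": 0.4},
    "The": {"book": 0.7, "story": 0.3},
    "It": {"was": 1.0},
    "book": {"was": 0.8, "felt": 0.2},
    "story": {"was": 1.0},
    "was": {"sad": 0.5, "long": 0.3, "moving": 0.2},
    "felt": {"long": 1.0},
}

Picker = Callable[[dict[str, float]], str]


def reply(pick: Picker, max_words: int = 6) -> str:
    """Build a reply one word at a time, using `pick` to choose each next word."""
    words: list[str] = []
    current = "<start>"
    for _ in range(max_words):
        options = NEXT_WORD.get(current)
        if not options:
            break  # no continuation in the table: the reply ends here
        current = pick(options)
        words.append(current)
    return " ".join(words)


def greedy(options: dict[str, float]) -> str:
    """Always take the single likeliest next word."""
    return max(options, key=options.get)


def make_sampler(rng: random.Random) -> Picker:
    """Draw the next word at random, weighted by its probability."""
    def sample(options: dict[str, float]) -> str:
        return rng.choices(list(options), weights=list(options.values()))[0]
    return sample


print(reply(greedy))  # The book was sad  (identical on every run)

sampler = make_sampler(random.Random())  # unseeded: differs from run to run
for _ in range(3):
    print(reply(sampler))
```

Greedy picking gives the same reply every time; sampling from the very same probabilities gives replies like “It was long” or “The story was moving” on different runs. Neither path looks anything up: both reconstruct the reply word by word.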
Why this changes how you read AI news
Most confusion about AI products traces back to skipping this distinction. When a product says “now it knows your documents,” that’s almost never the model itself storing new facts. It’s usually a separate retrieval system feeding relevant text into the prompt before the model reconstructs its answer (the mechanism behind what’s called retrieval-augmented generation, or RAG). When a model states something false with total confidence, that’s not a database returning a corrupted row. It’s the reconstruction process filling a gap in memory with whatever pattern looked most plausible. Once you know the model is always reconstructing and never looking things up, “it made something up” and “it answered correctly” stop looking like two different modes of operation. They’re the same operation, landing on different outcomes depending on how well the training data covered that territory.
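A minimal sketch of the retrieval half of RAG, assuming a hypothetical in-memory list of documents and plain word-overlap scoring; `retrieve` and `build_prompt` are illustrative names. Real systems typically rank passages by embedding similarity instead, but the shape is the same: pick relevant text, paste it into the prompt, and hand that prompt to the model. Nothing about the model itself changes.

```python
import re

# Hypothetical user documents; a real system would index many files.
DOCUMENTS = [
    "Expense reports are due on the fifth business day of each month.",
    "The office closes at 6 pm on Fridays.",
    "Travel must be booked through the internal portal.",
]


def word_set(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the question."""
    q = word_set(question)
    ranked = sorted(docs, key=lambda d: len(q & word_set(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: list[str]) -> str:
    """Paste the best-matching text into the prompt the model will continue."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


print(build_prompt("When are expense reports due?", DOCUMENTS))
```

The model never “learns” `DOCUMENTS`. Drop the retrieval step and the same question goes back to being answered purely by reconstruction from training.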
The insight that gets lost
The detail that rarely gets said out loud: an LLM doesn’t keep its answers as stored entries anywhere, not word for word and not in some hidden index. (Text repeated often enough in training can come back nearly verbatim, but even then it comes out through the same generation process, not a lookup.) It rebuilds the answer from a general capacity to continue text, applied fresh to your specific prompt, every single time. That’s not an inefficiency someone forgot to optimize away. It’s the entire mechanism working exactly as designed, and it’s also why two people asking the same question, worded slightly differently, can get answers that feel like they came from different sources. There is no single source. There’s only ever the reconstruction, done again from scratch each time you ask.
Two pillars build directly on this: how the model actually sees your text before it can reconstruct anything (tokens), and how it represents meaning well enough to reconstruct coherent answers at all (embeddings). Both are coming next in this series.