Open Source vs Closed Models: What 'Open' Actually Means in AI
A company puts out a new model and calls it open source in the announcement. Tech press picks up the phrase and runs with it, sometimes in the headline. Developers start comparing it favorably to closed competitors on exactly those grounds. What actually got published, though, is a set of files containing the model’s trained weights: the long list of numbers that determine how it responds to input, ready to download and run on your own hardware. Nothing about what data trained it. Nothing about the code used to train it. No way to check how it was built or to rebuild it yourself. That’s a real thing, and it’s useful. It is not what “open source” has ever meant anywhere else in software, and the gap between the label and the reality is doing a lot of quiet work.
What openness actually requires
Openness in AI is not a single on-off switch. It’s a spectrum, and most releases sit somewhere in the middle of it while being marketed as if they sit at the top. At one end, a company hands out only the weights. You can run the model, fine-tune it, build products on it, and sometimes redistribute it, within whatever limits the attached license sets. At the other end, truer to the traditional software meaning of open source, everything is available: the weights, the training code, and the training data itself, all under a license that lets you actually use them. That full combination is rare. More common are partial releases: training code without the data behind it, or a paper describing the data in general terms without the data itself.
The difference is like being handed a finished cake versus being handed the recipe. With weights alone, you get the cake. You can eat it, you can slice it up and share it with other people, and it’s genuinely useful to have. But you have no ingredient list and no method. You can’t verify what’s actually in it, you can’t check whether a claimed ingredient was really used, and you can’t reproduce it in your own kitchen. Full openness would mean getting the recipe too: the exact ingredients, the quantities, the technique, so someone else could bake the same thing from scratch and confirm it matches. A weights-only release is a meaningfully thinner kind of open than that, even when the word attached to it is identical.
Why the distinction matters
This isn’t a pedantic argument about definitions for its own sake. Whether training data and training code are available determines whether anyone outside the company can check what a model actually learned from, including copyrighted material, biased sources, or content the company would rather not disclose. It determines whether independent researchers can reproduce a result, spot a flaw, or confirm a safety claim, rather than taking the company’s description of its own work on faith. It also shapes who can compete: a startup that gets weights alone can build on top of a model, but it can’t learn from or improve the process that created it, which keeps the actual expertise concentrated with whoever trained it in the first place. Calling that arrangement “open source” implies a level of inspectability that a license to download a finished file simply doesn’t provide.
The word is doing marketing work, not technical work
None of this means weights-only releases are worthless or that companies are lying by handing them out. They’re a genuine, useful form of access, just a much narrower one than the phrase “open source” implies to anyone who remembers what it meant before it got attached to AI models. In practice, “open” in this industry is almost always partial, and the missing part (the data, the code, or both) tends to be exactly the part that would let outsiders check the company’s claims. The word functions more as a positioning move than as a precise technical description, and it tells you far less about how inspectable or reproducible a system really is than most people assume when they read it.
For a closer look at the layer that “open weights” almost never covers, see Training Data: Where All That Text Actually Comes From. And for a contrast, a case where “open” describes something concrete and checkable rather than a marketing gesture, MCP: The Standard Becoming the USB of AI looks at a genuinely open protocol standard.