Fine-Tuning vs Prompting vs RAG: What to Actually Use

A team building a support bot for their software product spent six weeks collecting transcripts, cleaning them up, and fine-tuning a model so it could answer questions about their own features. The result was mediocre: it still got pricing details wrong, still referenced a plan tier they’d discontinued two months earlier, and updating any of it meant repeating the whole training run. Someone eventually tried just pasting the current help docs into the prompt alongside the question. That worked better, on the first try, for a fraction of the cost. The six weeks hadn’t been wasted because fine-tuning is bad. They’d been wasted because the team picked the tool before asking what kind of problem they actually had.

Three ways to adapt a model

There are three basic ways to take a general-purpose model and make it useful for a specific job, and they solve different problems.

Prompting is simply asking well: clear instructions, relevant context, maybe a couple of examples of the output you want, all typed directly into the conversation. It costs almost nothing beyond the time it takes to write a good prompt, needs no extra infrastructure, and takes effect immediately. It’s the right first move for almost everything.
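As a rough illustration of those ingredients, here is a minimal Python sketch that assembles instructions, context, and a couple of examples into one prompt string. The function name assemble_prompt and the sample text are made up for illustration, and nothing here calls a model.

```python
def assemble_prompt(instructions, context, examples, question):
    """Join instructions, optional context, worked examples, and the question.

    Hypothetical helper for illustration only; it just builds a string.
    """
    parts = [instructions.strip()]
    if context:
        parts.append("Context:\n" + context.strip())
    # Each example is a (question, answer) pair showing the output you want.
    for example_question, example_answer in examples:
        parts.append(
            f"Example question: {example_question}\n"
            f"Example answer: {example_answer}"
        )
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


prompt = assemble_prompt(
    instructions="Answer in one short sentence, in a friendly tone.",
    context="Pro plan: single sign-on included.",  # invented sample fact
    examples=[("Is there a free plan?", "Yes, the Starter plan is free.")],
    question="Does the Pro plan include single sign-on?",
)
print(prompt)
```

Changing the behavior means editing a string and running it again, which is why this layer is so quick to iterate on.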

RAG (retrieval-augmented generation) adds a step before the prompt reaches the model: relevant documents get pulled from wherever they live (a wiki, a database, a folder of PDFs) and inserted into the prompt at the moment of asking. It’s the answer when the model needs to know something specific, current, or proprietary that it couldn’t have learned during training: your actual pricing page, this week’s inventory, or the internal policy document nobody outside the company has ever seen.
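The retrieve-then-insert step can be sketched in a few lines of Python. This is a toy version under loud assumptions: the documents are invented, the "retrieval" is plain word overlap rather than the embedding search a real system would more likely use, and the names DOCS, retrieve, and build_rag_prompt are hypothetical.

```python
import re

# Invented sample documents; a real store would be a wiki, database, or PDF index.
DOCS = {
    "pricing": "The Pro plan costs 40 dollars per month. The Starter plan is free.",
    "refunds": "Refunds are available within 30 days of purchase.",
    "sso": "Single sign-on is available on the Pro plan only.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, docs, k=2):
    """Return up to k document names, ranked by words shared with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        docs.items(),
        key=lambda item: len(question_words & tokenize(item[1])),
        reverse=True,
    )
    # Drop documents that share no words with the question at all.
    return [name for name, text in ranked[:k] if question_words & tokenize(text)]


def build_rag_prompt(question, docs, k=2):
    """Insert the retrieved documents into the prompt at the moment of asking."""
    names = retrieve(question, docs, k)
    context = "\n".join(f"[{name}] {docs[name]}" for name in names)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


print(build_rag_prompt("Can I get a refund within 30 days?", DOCS))
```

Updating an answer here means editing an entry in DOCS. The model itself is never touched.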

Fine-tuning means retraining the model further on a set of your own examples, adjusting its internal weights so certain patterns of response become second nature. It’s the slowest and most expensive of the three, often by an order of magnitude or more, and it’s genuinely useful mainly when you need to change how the model behaves at a deeper level than facts: its tone, its output format, the way it structures a decision, not just what it knows.
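To make "a set of your own examples" concrete, here is a hedged sketch of what such a dataset might look like on disk. The prompt and completion field names and the one-JSON-object-per-line layout are assumptions for illustration; the format each training service expects differs, so check its documentation. Note that the examples teach a shape of answer (a summary line followed by numbered steps), not facts.

```python
import json

# Invented training pairs: every completion follows the same structure,
# because the goal is to teach a behavior, not to store knowledge.
EXAMPLES = [
    {
        "prompt": "How do I reset my password?",
        "completion": "Summary: Use the reset link.\nSteps:\n1. Open Settings.\n2. Click Reset password.",
    },
    {
        "prompt": "How do I invite a teammate?",
        "completion": "Summary: Send an invite from the Team page.\nSteps:\n1. Open Team.\n2. Click Invite.",
    },
]


def to_jsonl(examples):
    """Serialize each example as one JSON object per line."""
    return "\n".join(json.dumps(example) for example in examples)


print(to_jsonl(EXAMPLES))
```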

Think of it as onboarding a new employee. Prompting is giving them clear verbal instructions right before a task. RAG is handing them a reference manual they can flip through while they work. Fine-tuning is sending them through months of specialized retraining to change how they fundamentally behave and make decisions. You wouldn’t put every new hire through a months-long retraining program just so they could answer a question the manual already covers.

Why the distinction matters in practice

The practical test is simple: are you missing information, or are you missing a behavior? If the model doesn’t know something, prompting or RAG fixes it: prompting if the information is small and stable enough to paste in each time, RAG if it’s large, changes often, or won’t fit in a prompt at all. If the model knows the relevant facts but keeps producing the wrong shape of answer (the wrong tone, the wrong structure, ignoring a formatting rule you’ve stated ten different ways), that’s a behavior problem, and it’s the one case where fine-tuning starts to earn its cost.
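That test can be written down as a small decision helper. This Python sketch is one reading of the rule of thumb above, not a definitive procedure: the function choose_approach and its parameter names are hypothetical, and the prompt_fixes_tried check reflects the assumption that prompting stays the first move even for behavior problems.

```python
def choose_approach(
    missing_information,
    fits_in_prompt=True,
    changes_often=False,
    prompt_fixes_tried=False,
):
    """Suggest a layer: 'prompting', 'rag', or 'fine-tuning'."""
    if missing_information:
        # Knowledge problem: paste it in if it is small and stable, else retrieve it.
        if fits_in_prompt and not changes_often:
            return "prompting"
        return "rag"
    # Behavior problem: try stating the rule in the prompt before retraining.
    if not prompt_fixes_tried:
        return "prompting"
    return "fine-tuning"


# A pricing page that changes regularly is a knowledge problem, so retrieval fits.
print(choose_approach(missing_information=True, changes_often=True))
# A formatting rule that keeps being ignored after repeated prompt rewrites.
print(choose_approach(missing_information=False, prompt_fixes_tried=True))
```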

Cost and speed should weigh heavily here. A prompt change takes minutes to test and iterate on. A RAG setup takes days to stand up properly. A fine-tuning run takes real compute, a curated dataset, and a retraining cycle every time something needs to change, which is exactly the problem the support bot team ran into when their discontinued plan tier kept showing up in answers.

The wrong layer

Most of the time, the instinct to fine-tune shows up when the actual problem is one of the other two. Someone notices the model getting facts wrong and assumes it needs deeper training, when what it actually needs is better information at the moment of asking. Fine-tuning isn’t the wrong tool in general; it’s just reached for far more often than the problem calls for, usually at many times the cost and effort a prompt rewrite or a retrieval step would have taken. Before committing to a retraining run, it’s worth asking honestly whether the model is missing knowledge or missing a behavior, because those two answers point to completely different solutions, and only one of them is ever worth the price of changing what the model fundamentally is.

If you want the deeper mechanics of what that last option does inside the model, “From Raw Model to Assistant: Fine-Tuning and RLHF” covers it, and for the retrieval side, “RAG: Giving a Model Your Documents Without Retraining It” goes into how that gets built.