Function Calling: How a Model Actually Uses a Tool
Ask a chatbot what the weather is like today and something interesting happens. A little indicator appears, maybe “checking weather,” then a real answer shows up: 62 degrees, cloudy, rain expected after 4pm. Ask it to book a meeting and it seems to reach into your calendar, find an open slot, and put something there. It reads like the model stepped outside itself, looked something up, touched a real system somewhere. It didn’t. The model never leaves the conversation. It never queries a weather service, never opens your calendar, never runs a single line of code. All it ever does, in every one of these cases, is write text.
The order goes to the kitchen, not the customer
A useful way to picture this is a customer at a restaurant. The customer doesn’t walk into the kitchen and start cooking. They tell the waiter, clearly and specifically, what they want: the salmon, medium, no butter. The waiter and the kitchen, an entirely separate system with its own tools, ingredients, and staff, take that request and actually execute it. If the kitchen is out of salmon, or overcooks it, that happens on the kitchen’s side, not because the customer somehow cooked it wrong. Stating the request and fulfilling it are two different jobs, done by two different parties.
A model works the same way. When it decides a tool would help, it doesn’t reach for that tool. It generates a specific, structured piece of output, something like a function name and a set of arguments, written as plain text, using the exact same token-by-token process it uses to write a sentence. That output might look like a request to call a function named get_weather with the argument “location: Boston,” formatted so a program reading it knows precisely what’s being asked for. The model produces that request and stops. It has no way to do anything more with it.
A separate outer program, the application wrapped around the model, is watching for exactly this kind of output. When it sees a structured request, it reads the function name and arguments, and actually calls the real weather API or the real calendar system. That call goes out over an actual network and comes back with an actual result: a temperature, a confirmation, an error. That result gets typed back into the model’s context as new text, and the model picks up the conversation again, now with a real number in hand instead of a guess. The “waiter” here is ordinary software, following a rule roughly this simple: if the model’s output matches this pattern, run the function and report back what happened.
Why the distinction is worth holding onto
This matters because it locates where the actual action, and the actual risk of a real-world action, sits. The model only ever produces a request: send this email, delete this file, transfer this amount. Whether any of that actually happens depends entirely on the outer program, on what it’s connected to and on what it’s been built to do automatically versus hold up for a person to approve first. A model that “wants” to do something and a model that can do something are separated by that outer layer, and that gap is where the meaningful permission questions live, questions that deserve their own treatment later. For now, the point is simpler: generating a request costs the model nothing and carries no risk by itself.
It also explains why a tool-using model can sound confident about something it got wrong. If the tool was never wired up, or the call failed, or the arguments were malformed, the model may still narrate the outcome as a success, because writing “the weather is sunny” is exactly as easy for it as writing “I requested the weather.” The words carry no stamp showing whether real execution actually backed them up.
The request is not the deed
None of this is unique to weather or calendars. The same pattern shows up whether the tool is a search engine, a code interpreter, or a database query, and it holds just as well for the kind of standardized connection described in MCP: the standard becoming the USB of AI, which lets many different tools plug into this same request-and-execute pattern without custom wiring for each one. It will resurface in a future piece on AI agents, where several of these tool calls get chained together into multi-step action, one request feeding the next. But the basic fact stays fixed. A model “using a tool” is really a model asking someone else to use it. The actual execution, and everything that comes with taking a real action in the world, sits entirely outside the model, in whatever program decides to act on its request.