Agents: When AI Stops Answering and Starts Doing

Ask a plain chatbot to fix a bug and it will describe the fix: change this line, rename that variable, watch out for the edge case with empty input. You still have to open the file, make the change, run the tests, and push it yourself. Ask an agent to “fix this bug and open a pull request” and something different happens. It reads the failing test, opens the file, edits the code, runs the test suite again, notices a second file needs a matching change, edits that too, commits, and opens the pull request, all without you typing anything in between. Same underlying model, arguably the same “intelligence.” What changed is what happens after it forms an opinion.

Reason, act, observe, repeat

An agent runs in a loop. At each turn it looks at the goal and the current state of the world, reasons about what the single next step should be, and takes that step, usually by requesting a tool call: read this file, run this command, query this database. It doesn’t guess at the result. It waits, gets back whatever actually happened (the file’s real contents, the command’s real output, an error message it didn’t expect), and checks that against the goal before deciding what to do next. It holds onto everything from earlier steps, so a mistake three steps back can still get caught and corrected at step seven. This continues, step after step, until the agent judges the task done or runs into a stopping condition: too many steps, a failed check it can’t resolve, a limit set by whoever built it.

A one-shot reply never does any of this. It answers once, using only what it already knows or was told in the prompt, and stops. There’s no observing, no second step informed by a first one, no chance to notice partway through that the plan needs to change.
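The reason-act-observe loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. `run_agent`, `Action`, and the toy `read_file`/`write_file` tools are hypothetical names. `scripted_policy` is a hard-coded stand-in for the model, not real reasoning. What matters is the shape of the loop: pick one step, execute it, record what actually came back, and stop when the policy judges the task done or a step limit runs out. By contrast, a one-shot reply would be a single call to the policy, with no tools and no history.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Action:
    """One step the (hypothetical) model asks for."""
    tool: Optional[str]  # tool name, or None when the model judges the task done
    args: dict = field(default_factory=dict)


History = List[Tuple[Action, str]]
Policy = Callable[[str, History], Action]


def run_agent(goal: str, policy: Policy, tools: Dict[str, Callable[..., str]],
              max_steps: int = 10) -> Tuple[str, History]:
    history: History = []  # every (action, observation) pair; the policy sees all of it each turn
    for _ in range(max_steps):
        action = policy(goal, history)           # reason: choose the single next step
        if action.tool is None:
            return "done", history               # the policy judged the goal met
        if action.tool not in tools:
            observation = f"error: unknown tool {action.tool!r}"
        else:
            try:
                observation = tools[action.tool](**action.args)  # act
            except Exception as exc:             # a real error is an observation too
                observation = f"error: {exc}"
        history.append((action, observation))    # observe, and keep it for later turns
    return "step limit reached", history         # stopping condition set by the builder


# Toy environment: a one-file "repository" held in a dict.
files = {"app.py": "return x / 0  # bug"}


def read_file(path: str) -> str:
    return files[path]


def write_file(path: str, text: str) -> str:
    files[path] = text
    return "ok"


def scripted_policy(goal: str, history: History) -> Action:
    # Hard-coded stand-in for the model's reasoning, for illustration only.
    if not history:
        return Action("read_file", {"path": "app.py"})
    last_action, last_obs = history[-1]
    if last_action.tool == "read_file" and "bug" in last_obs:
        return Action("write_file", {"path": "app.py", "text": "return x / 1"})
    if last_action.tool == "write_file":
        return Action("read_file", {"path": "app.py"})  # re-check rather than assume the edit worked
    return Action(None)


status, history = run_agent("fix the bug in app.py", scripted_policy,
                            {"read_file": read_file, "write_file": write_file})
print(status, len(history))  # done 3
```

The third step has no equivalent in a one-shot reply. The policy reads the file back after editing it, and its decision to stop depends on what actually came back, not on what it expected to see.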

The clearest way to feel the difference is to picture the same new employee in two setups: one who can only answer questions from behind a desk, and one who's been handed a badge, a set of keys, and a task list. The desk-bound employee can tell you exactly how to file the expense report, which drawer the stapler is in, who to call about the broken printer. The one with the badge just goes and does those things: opens the supply closet, walks the form to the right office, calls the vendor. Their judgment hasn't necessarily improved. What changed is that they can now act on the world directly instead of only advising someone else to.

Why the distinction is worth being precise about

This matters because the two things fail in completely different ways. A one-shot answer that's wrong is a wrong sentence sitting in a chat window, waiting for a human to notice before it does anything. An agent that's wrong, or that misreads one observation early in the loop, can carry that error forward through several real actions before anyone looks. It might edit the wrong file, then commit that edit, then open a pull request built on top of it, each step compounding on the last, all before a person has a chance to intervene. The number of steps and the autonomy between them are exactly what makes agents useful for real, multi-step work, and exactly what makes a small early mistake expensive rather than merely wrong. For background on what a single tool-call step involves, see Function Calling: How a Model Actually Uses a Tool, which covers the request-and-execute mechanism an agent repeats at each turn of its loop.

The loop was never the hard part

Once you see it this way, an agent stops looking like a fundamentally smarter model and starts looking like an ordinary model wired into a loop and handed permissions: the ability to read this repository, write to that file, call this API, send that message, repeatedly and without asking first each time. The reasoning inside each step is the same reasoning a one-shot chatbot already does. What’s new, and what’s genuinely risky, was never the thinking. It’s the badge and the keys: what the outer system actually lets the model’s requests turn into out in the real world, now that there’s a loop patient enough, and willing enough, to keep issuing them until something stops it. The next question worth asking is what happens when that badge gets handed to more than one of these loops at once, which is exactly the territory covered in Multi-Agent Systems: When Several Agents Work Together.
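To make "the badge and the keys" concrete, here is a minimal sketch of a check the outer system might place between a model's requested tool call and the real side effect. `ToolGate`, `PermissionDenied`, and permission labels such as `repo:read` are hypothetical names chosen for illustration, not any particular framework's API. The model can request anything. Only tools whose permission has been granted ever run.

```python
from __future__ import annotations

from typing import Callable


class PermissionDenied(Exception):
    """Raised when a requested tool call is not allowed to run."""


class ToolGate:
    """Hypothetical check between a model's tool request and the real-world action."""

    def __init__(self, granted: set[str]):
        self.granted = granted  # permissions handed to this agent (its "badge and keys")
        self.tools: dict[str, tuple[str, Callable[..., str]]] = {}

    def register(self, name: str, permission: str, fn: Callable[..., str]) -> None:
        self.tools[name] = (permission, fn)

    def call(self, name: str, **args) -> str:
        if name not in self.tools:
            raise PermissionDenied(f"unknown tool {name!r}")
        permission, fn = self.tools[name]
        if permission not in self.granted:
            raise PermissionDenied(f"{name!r} needs {permission!r}")
        return fn(**args)  # only now does the request become an action


sent: list[str] = []


def send_message(text: str) -> str:
    sent.append(text)  # stands in for a real side effect
    return "sent"


gate = ToolGate(granted={"repo:read"})
gate.register("read_file", "repo:read", lambda path: f"contents of {path}")
gate.register("send_message", "chat:send", send_message)

print(gate.call("read_file", path="README"))  # contents of README
try:
    gate.call("send_message", text="hi")
except PermissionDenied as err:
    print("blocked:", err)  # blocked: 'send_message' needs 'chat:send'
```

Nothing about the model's reasoning changes here. The risk is set by the `granted` set, which the builder controls, and widening it is what hands the loop more keys.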