Token Costs: How AI Is Actually Billed, and Why the Bill Keeps Growing
Someone sets up an AI assistant to handle a recurring task: read a folder of invoices, check them against a spreadsheet, flag anything odd, and draft a summary email. It works well, so they let it run daily. A month later the bill arrives and it is several times higher than the simple chatbot subscription they cancelled to make room for it. Nothing about the task seemed that different. Somewhere between “ask a question” and “let it handle this,” the actual unit of billing quietly multiplied.
The unit is the token, and not all tokens cost the same
Every interaction with an AI model gets converted into tokens, small chunks of text of roughly a few characters each. Billing is built entirely around counting these. Input tokens are everything sent to the model to read: your prompt, any document pasted in, any prior conversation carried along for context, any result handed back from a tool. Output tokens are what the model generates in reply. Output tokens are usually priced noticeably higher per unit than input tokens, often several times more, because the model can process its input in bulk but has to generate its reply one token at a time, which takes more computation per token.
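The arithmetic behind a single request can be sketched in a few lines of Python. The two prices below are hypothetical round numbers chosen only to illustrate an output rate several times the input rate; actual rates vary by provider and model, and the function name is made up for this example.

```python
# Hypothetical prices in dollars per million tokens (assumptions, not real rates).
INPUT_PRICE_PER_MILLION = 1.00
OUTPUT_PRICE_PER_MILLION = 4.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request under the assumed prices."""
    input_cost = input_tokens * INPUT_PRICE_PER_MILLION / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_MILLION / 1_000_000
    return input_cost + output_cost


# A short question in, a short paragraph out.
chat_cost = request_cost(input_tokens=50, output_tokens=150)
print(f"${chat_cost:.5f}")  # well under one cent at these assumed prices
```

Under these assumed prices, the 150 output tokens account for most of the cost even though the amounts involved are tiny.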
A plain back-and-forth chat message is cheap by this measure. A short question in, a short paragraph out, might cost a fraction of a cent. An agent working through a multi-step task is different arithmetic altogether. Every tool call it makes (a web search, a database lookup, a file read) gets its result stuffed back into the context and billed again as input on the next step. A reasoning model’s intermediate steps, the scratch work it produces before committing to an answer, are themselves billed as output tokens, not treated as a free internal process. String together a dozen steps like this, each one re-reading everything that came before plus whatever new material just arrived, and the token count for a single task can run many times higher than one plain question and answer.
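A rough simulation shows how quickly this compounds. The sketch below assumes a simplified agent loop in which every step re-reads the whole context as input, produces a fixed amount of output (reasoning plus a tool call), and then appends that output and the tool result to the context. The function name and all token counts are hypothetical, and the model ignores any discount a provider might apply to repeated input.

```python
def agent_task_tokens(prompt_tokens, steps, output_per_step, tool_result_per_step):
    """Return (total_input, total_output) tokens billed across an agent's steps."""
    context = prompt_tokens
    total_input = 0
    total_output = 0
    for _ in range(steps):
        # The entire context so far is sent, and billed, as input on this step.
        total_input += context
        # Reasoning and the tool call are billed as output.
        total_output += output_per_step
        # The step's output and the tool's result are carried into the next step.
        context += output_per_step + tool_result_per_step
    return total_input, total_output


# Illustrative numbers only: a 500-token prompt, 12 steps,
# 300 output tokens and a 1,000-token tool result per step.
total_input, total_output = agent_task_tokens(500, 12, 300, 1000)
single_exchange = 500 + 300  # one plain question and answer
print(total_input, total_output)
print((total_input + total_output) / single_exchange)
```

With these made-up figures, the twelve-step task bills 91,800 input tokens and 3,600 output tokens, against 800 tokens in total for the single exchange. Most of the growth comes from input: the context is re-read on every step, so input tokens grow roughly with the square of the number of steps.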
The nearest everyday version of this is a household’s electricity bill. When the price per kilowatt-hour drops, monthly bills rarely shrink to match, because cheaper power makes it affordable to run more devices, for longer, at the same time: air conditioning left on, several screens going, gadgets that never fully switch off. The falling unit price doesn’t reduce consumption; it enables a lot more of it. Economists have a name for this pattern, the Jevons paradox, and it applies just as well here. A cheaper token doesn’t lead to fewer tokens used. It leads to systems that use vastly more of them, because doing so has finally become affordable.
Why this matters beyond curiosity about the bill
This matters for more than the invoice itself. It changes what “AI got cheaper” actually means. Comparing this year’s per-token price to last year’s and concluding that costs should be falling means comparing only half the equation. The other half is how many tokens a given task consumes, and that number keeps climbing as products move from single-shot answers to agents that plan, call tools, check their own work, and loop through several rounds before finishing. A task that used to be one question and one answer might now be twenty internal steps, each carrying forward everything before it.
Anyone budgeting for AI usage, whether a person paying for a personal assistant or a team estimating what a new product feature will cost to run, has to track consumption as closely as price. A falling price per unit and a rising total bill are not a contradiction. They are the expected outcome of making a resource cheap enough that people finally use it the way they always wanted to.
The bill is a function of two numbers, not one
The price per token has genuinely been falling substantially, year over year, and that trend is real and well documented across the industry. It would be easy to stop there and expect bills to follow the same downward line. But the number of tokens consumed per task is growing faster than the per-token price is falling, especially as agentic, multi-step use spreads into more products. So total spending on AI keeps rising for a lot of users and companies even as the per-unit price gets cheaper every quarter. A falling sticker price does not mean a falling bill. It means the bill is being multiplied by a second number growing even faster, and that second number is the one worth watching.
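The two-number relationship can be made concrete with a small projection. The sketch below assumes, purely for illustration, that the per-token price halves each year while tokens per task triple; neither rate is a measured figure, and the function and parameter names are made up for this example.

```python
def yearly_bills(price_per_million, tokens_per_task, tasks_per_year,
                 price_change, token_growth, years):
    """Project the annual bill as price and tokens per task change each year."""
    bills = []
    for _ in range(years):
        # Bill = price per token x tokens per task x number of tasks.
        bills.append(price_per_million * tokens_per_task * tasks_per_year / 1_000_000)
        price_per_million *= price_change  # e.g. 0.5 means the price halves
        tokens_per_task *= token_growth    # e.g. 3 means usage per task triples
    return bills


# Hypothetical starting point: $4 per million tokens, 1,000 tokens per task,
# 10,000 tasks a year.
bills = yearly_bills(4.0, 1_000, 10_000, price_change=0.5, token_growth=3, years=3)
print(bills)
```

Under these assumptions the bill goes from $40 to $60 to $90 over three years, growing by half each year even though the unit price was cut in half each time. The yearly multiplier on the bill is simply the product of the two rates, 0.5 × 3 = 1.5.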
For more on where those extra output tokens come from in the first place, see Models That “Reason”: Chain-of-Thought and Inference-Time Compute, and for the multi-step loops that multiply token usage the most, see Agents: When AI Stops Answering and Starts Doing.