📈 Why AI bills rise as costs fall
Agents eat tokens at rates that are impossible to forecast
Hi all,
Last week, we explored what tokenmaxxing means for CFOs and how firms can buffer unexpected AI costs.
Today we go further, showing why AI bills are so hard to forecast and what will happen as we crack that problem.
The token explosion paradox
We estimate that the number of tokens processed per quarter has grown by around 17,000x over four years.1
Token prices have collapsed during this time. Demand for machine intelligence is highly elastic: as prices fall, consumption rises proportionally more than the price drops, so total spending goes up even as the unit cost comes down.
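To see why a falling price can produce a rising bill, here is a minimal sketch that assumes a constant-elasticity demand curve. That curve is an illustrative assumption, not the model behind our estimate, and the function name, the elasticity of 1.5 and the baseline figures are all hypothetical.

```python
def spend_after_price_change(base_price, base_tokens, new_price, elasticity):
    """Total spend under an assumed constant-elasticity demand curve.

    Tokens demanded scale as (new_price / base_price) ** -elasticity.
    Price units are arbitrary; every input here is illustrative.
    """
    tokens = base_tokens * (new_price / base_price) ** (-elasticity)
    return new_price * tokens


# Hypothetical baseline: price of 10 per unit, 1,000,000 units consumed.
base = spend_after_price_change(10.0, 1_000_000, 10.0, 1.5)
# Same buyer after the price halves, with an assumed elasticity of 1.5.
halved = spend_after_price_change(10.0, 1_000_000, 5.0, 1.5)

print(f"Spend changes by a factor of {halved / base:.2f}")
```

With any elasticity above 1, the factor printed is greater than 1: the bill grows when the price falls. At an elasticity of exactly 1 the two effects cancel and spend stays flat.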
One reason is that cheaper tokens have made agents economically viable. At the same time, agents use tokens at rates that are orders of magnitude higher than those of chatbots answering single-turn queries. This shows up in the ratio of total tokens processed to output tokens: advanced models do a lot of processing below the surface that the user never sees.
A lot of this growth is driven by China’s domestic demand and its model providers, especially ByteDance and Alibaba.
The cost of the ghost token
When you use an AI agent, the final result you get is really just a summary of all the work the agent has undertaken. There may be dozens of tool calls to browse the web or load up a file to check and validate the work it has done. All of these steps consume tokens: they become hidden multipliers.
The first of these is token amplification.2 A coding agent that operates over 10 turns might need to re-read its full context every turn. That repetitive reading of context could use as many as 55x more tokens than a single-turn query for the same task.
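One simple way to arrive at a multiplier of this size is to assume that every turn adds an equal chunk of context and then re-reads everything accumulated so far, which gives 1 + 2 + ... + 10 = 55 chunks read over ten turns. The sketch below works through that arithmetic. The function name and the 1,000-token chunk size are hypothetical, and real agents add uneven amounts of context and may cache some of it.

```python
def total_tokens_read(turns, tokens_per_turn):
    """Tokens read when every turn re-reads all context accumulated so far.

    Assumes each turn adds the same amount of new context (a simplification).
    """
    total = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn  # new context added this turn
        total += context            # the whole context is read again
    return total


single_turn = total_tokens_read(1, 1_000)
ten_turns = total_tokens_read(10, 1_000)

print(ten_turns // single_turn)  # 55
```

Under this assumption, total reading grows with the square of the number of turns, so doubling the length of a session roughly quadruples the tokens read.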
Active inference is probably only 15-20% of total token consumption. The rest is invisible work that you, as a user, and possibly the company paying for it all, haven’t modelled.
The long tail of tool calls
Agents make anywhere between five and twenty-five tool calls per task. And each call adds more context, tokens and API costs. It also increases the likelihood that the model will need to retry the task to get it right.
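The retry effect compounds with the number of calls. As a rough sketch, assume each tool call fails independently with the same small probability. Both the independence and the 2% failure rate are hypothetical, chosen only to show the shape of the curve, and the function name is ours.

```python
def chance_of_retry(tool_calls, failure_rate_per_call):
    """Probability that at least one call fails, assuming independent calls."""
    return 1 - (1 - failure_rate_per_call) ** tool_calls


# Compare the low and high ends of the five-to-twenty-five range.
for calls in (5, 25):
    print(calls, round(chance_of_retry(calls, 0.02), 3))
```

With these assumed numbers, the chance of needing at least one retry rises from roughly 10% at five calls to roughly 40% at twenty-five. Each retry then pays the context-reading cost again.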





