Exponential View

📈 Why AI bills rise as costs fall

Agents eat tokens at rates that are impossible to forecast

Azeem Azhar and William Gildea
May 25, 2026
∙ Paid

Hi all,

Last week, we explored what tokenmaxxing means for CFOs and how firms can buffer unexpected AI costs.

Today we go further to show why AI bills are hard to forecast and what will happen as we crack that problem.


👾 Play our one‑off, AI‑themed word quiz created just for this edition – and win a prize.


The token explosion paradox

We estimate that the number of tokens processed per quarter has grown by around 17,000x over four years.1

Token prices have collapsed during this time. Demand for machine intelligence is highly elastic, meaning that as prices fall, consumption increases by more than the decline in price.
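To make the elasticity point concrete, here is a minimal sketch. It assumes a constant-elasticity demand curve, which is a textbook simplification and not something estimated in this piece. The prices, volumes and elasticity values are hypothetical, chosen only to show the direction of the effect.

```python
def tokens_demanded(price: float, base_price: float,
                    base_tokens: float, elasticity: float) -> float:
    """Constant-elasticity demand (an assumed functional form).

    Quantity scales with (price / base_price) ** -elasticity, so an
    elasticity above 1 means demand grows faster than price falls.
    """
    return base_tokens * (price / base_price) ** (-elasticity)


def total_spend(price: float, base_price: float,
                base_tokens: float, elasticity: float) -> float:
    """Bill = unit price x quantity demanded at that price."""
    return price * tokens_demanded(price, base_price, base_tokens, elasticity)


if __name__ == "__main__":
    # Hypothetical numbers: price falls 10x, from 10.0 to 1.0 per unit of tokens.
    for elasticity in (0.5, 1.5):
        before = total_spend(10.0, 10.0, 1.0, elasticity)
        after = total_spend(1.0, 10.0, 1.0, elasticity)
        print(f"elasticity={elasticity}: spend before={before:.2f}, after={after:.2f}")
```

With an elasticity above 1, the bill after the price cut is larger than before it. With an elasticity below 1, the bill shrinks. That is the whole paradox in two lines of arithmetic.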

One reason is that cheaper tokens have made agents economically viable. At the same time, agents consume tokens at rates orders of magnitude higher than chatbots answering single-turn queries. That shows up in the ratio of total tokens processed to output tokens: advanced models do a lot of processing below the surface that a user doesn’t see.

A lot of this growth is driven by China’s domestic demand and its model providers, especially ByteDance and Alibaba.

The cost of the ghost token

When you use an AI agent, the final result you get is really just a summary of all the work the agent has undertaken. There may be dozens of tool calls to browse the web or load up a file to check and validate the work it has done. All of these steps consume tokens: they become hidden multipliers.

The first of these is token amplification.2 A coding agent that operates over 10 turns might need to re-read its full context every turn. That repetitive reading of context could use as many as 55x more tokens than a single-turn query for the same task.
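A rough sketch of how amplification compounds, assuming the agent re-reads its entire context on every turn and each turn appends a fixed number of tokens. The function names and the token counts are hypothetical, and the sketch counts input tokens only. The ratio it produces depends entirely on those inputs, so it is not meant to reproduce the 55x figure.

```python
def agent_input_tokens(turns: int, base_context: int,
                       tokens_added_per_turn: int) -> int:
    """Total input tokens read across all turns.

    Assumes the full context is re-read each turn and grows by a
    fixed amount after every turn (tool output, model reply, etc.).
    """
    total = 0
    context = base_context
    for _ in range(turns):
        total += context                  # whole context is read again
        context += tokens_added_per_turn  # context grows before next turn
    return total


def amplification(turns: int, base_context: int,
                  tokens_added_per_turn: int) -> float:
    """Ratio of multi-turn input tokens to one single-turn read."""
    return agent_input_tokens(turns, base_context, tokens_added_per_turn) / base_context


if __name__ == "__main__":
    # Hypothetical task: 2,000-token starting context, 1,000 tokens added per turn.
    print(agent_input_tokens(10, 2000, 1000))  # 65000
    print(amplification(10, 2000, 1000))       # 32.5
```

Because the context grows every turn, total tokens read grow roughly with the square of the number of turns. Doubling the turns more than doubles the bill.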

Actual active inference is probably only 15-20% of the total token consumption. Put differently, total consumption is roughly five to seven times the visible work. The rest is invisible work that you, as a user, and possibly the company paying for it all, haven’t modelled.

The long tail of tool calls

Agents make anywhere between five and twenty-five tool calls per task. Each call adds more context, tokens and API costs, and it increases the likelihood that the model will need to retry the task to get it right.
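One way to see why more tool calls raise the odds of a retry is the sketch below. It assumes every call succeeds independently with the same probability and that any failed call forces the whole task to be retried. Both are simplifying assumptions, and the 95% per-call success rate is hypothetical.

```python
def task_success_probability(calls: int, per_call_success: float) -> float:
    """Chance that every tool call in a task succeeds.

    Assumes calls are independent with an identical success rate.
    """
    return per_call_success ** calls


def expected_attempts(calls: int, per_call_success: float) -> float:
    """Expected number of full-task attempts until one succeeds.

    Mean of a geometric distribution, assuming the whole task is
    retried from scratch whenever any single call fails.
    """
    return 1.0 / task_success_probability(calls, per_call_success)


if __name__ == "__main__":
    # Hypothetical 95% per-call success, at both ends of the 5 to 25 call range.
    for calls in (5, 25):
        p = task_success_probability(calls, 0.95)
        print(f"{calls} calls: success={p:.3f}, "
              f"expected attempts={expected_attempts(calls, 0.95):.2f}")
```

Under these assumptions, a five-call task succeeds first time about 77% of the time, while a twenty-five-call task does so only about 28% of the time. Every retry then re-spends the tokens of the attempts before it.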

A guest post by
William Gildea
Product Manager at Exponential View
© 2026 EPIIPLUS1 Ltd