🔮 Why AI isn’t showing up on your bottom line
A framework to understand your firm’s AI transformation
I had tea with a senior exec at a well-known public tech company last month. She has about a thousand engineers working for her, and nearly every one of them works with Claude Code. They are producing more lines of code, submitting more pull requests, getting more done. Productivity is up for individuals, but she isn’t seeing proportional gains at the organization level. As she put it to me: “one plus one plus one plus one equals one-and-a-half.”
She is not alone. Uber’s COO Andrew Macdonald went on record this week saying that the relationship between AI investment and results is not there yet:
I think maybe implicitly there is more that is getting shipped, but it’s very hard to draw a line between one of those stats and, ‘Okay, now we’re actually producing 25% more useful consumer features.’
AI has delivered something. I have felt it; my team has felt it; most users have felt it, which is why we keep returning and using more of it. Two years ago, only a dozen Anthropic customers were spending over $1 million a year on Claude; today, more than 1,000 do. More impressively still, Anthropic’s average corporate customer increased their spend by a factor of five in the past year.
But in more than three years since ChatGPT’s release, only 27% of executives say AI has met their ROI expectations. What do we make of the other 73%? Could their expectations be too high? Or too low? Do they even have the right class of expectations?
In a way, we can’t tell, but we can feel the vibes. And the vibes are that individual workers are getting faster and more productive. But for now, those individual gains from AI do not compound into firm-level ROI.
That is the puzzle we are going to solve in today’s essay.
Let’s go!
The productivity puzzle, restated
Back in 1987, Robert Solow famously pointed out that you could see the computer age everywhere but in the productivity statistics. He was right for the next few years, then wrong. Paul David realized this was a common problem with general-purpose technologies: they systematically depress measured productivity in their early stages because companies need to invest in all sorts of hard and soft know-how before the gains appear. Erik Brynjolfsson calls this the productivity J-curve: a dip while firms make complementary intangible investments, then a rise once those investments pay off.
Paul David’s 1990 paper on electrification is the canonical account of why a general-purpose technology can sit inside firms for ages before we see the results. The story he tells – building on Warren Devine’s earlier survey of the shift from shafts to wires – runs through three phases. And those phases map onto where AI is now.
Stage 1: The lightbulb
One of electricity’s first factory roles was the simplest – lighting. A brighter floor was safer than one lit by gas, and cleaner than one lit by oil. But work still flowed through the same sequence of people, machines, shafts and belts. Electricity had improved the workers’ immediate environment, but it did not change the factory’s operating logic. When ChatGPT was first released, it did something similar. It increased how quickly we could write emails; individuals sped up on some tasks, but the firm did not. This is Stage 1 of AI transformation: the lightbulb.
Most of the AI products we see today are all about individual productivity. Yes, there are enterprise plans for ChatGPT and Claude and whatever else. But the unit of work is still the task that the individual has to hand. The enterprise plan just lets them quickly access the corporate skills repository.
Stage 2: The group drive
The next stage of electricity adoption in factories focused on cost savings rather than productivity. Louis Bell wrote in 1891 that large central steam plants could be five to seven times more coal-efficient than small engines. Factories bought power from central stations and installed electric motors to drive their existing shafts and belts.
Then a professor of electrical engineering, F. B. Crocker, and his colleagues found another application. Electric power freed the shop floor from the shafting. The belts, tools and oil that made up the mess of the factory floor could all be moved; machines no longer had to be arranged in parallel lines beneath shafts.
The open question was how many motors a factory needed: one per tool, or one per group of tools? The latter became known as group drive. It was a single motor that powered a cluster of machines via a shared shaft.
Group drive preserved existing layouts, reused sunk capital, needed fewer motors, and gave many of electricity’s benefits without the cost of rebuilding the plant. It was cheaper and easier than the alternatives, so it won and dominated factories through the 1890s, 1900s and into the First World War.
AI agents are better than chatbots. They can handle whole workflows rather than single tasks. But, like group drive, they are attached to the existing organizational geometry. In the case of electricity, this was the shop floor layout. For AI, it’s the web of processes designed by the companies well before anyone knew what an LLM was.
An AI recruiting agent speeds up a process that was previously done by humans and an applicant tracking system. The recruiting pipeline may shrink from weeks to hours. A customer service agent takes on more tickets than the support team could before. The logic behind these examples is essentially cost-saving: fewer support tickets need humans, recruiters screen more candidates with the same headcount and marketers create more variants without additional staff. The firm isn’t making consequential decisions any faster. This is Stage 2, the group drive: machines turning faster on the same shafts.
Stage 3: The unit drive
It was only later, once the organizing logic of the factory moved from cost-saving to throughput, that the deeper value of unit drive, one motor per machine, became clear. In 1913, Ford’s Highland Park plant oriented machines and workers around the workflow rather than the geometry of shafts and belts. In the decade from 1919 to 1929, as more factories adopted unit drive, US manufacturing labor productivity grew by 5.4% a year.
The pattern for AI will mirror the pattern David spotted for electrification. Stage 1 speeds the individual, Stage 2 a workflow and Stage 3 the firm.
The ladder
You’ll be familiar with other maturity models, like Carnegie Mellon’s Capability Maturity Model. These are useful, but they treat each stage as a monolithic capability tier. A capability tier tells you how well a firm does a “fixed thing.”
Our ladder is different. The stages are organizing logics. The logic is the goal the firm is pursuing with the technology. A workshop that installed electric lighting was no worse than Ford’s factory, but it was pursuing a different goal: a safe way to illuminate the workshop floor. You didn’t get to an assembly line by adding more lightbulbs. Factories were pursuing cost savings and everything was organized to fit that goal, from layout to staffing to supplier relationships.
A maturity model will often tell you to do more of what you are doing to move forward. We reckon that what matters is what you are trying to do. And our stages hint at why companies deploying AI might get stuck. A firm does not graduate stage by stage on every axis. An individual developer who is 50% more productive with AI tools but must submit to traditional review cycles will find themselves in a queue. The product team that can prototype faster than ever but needs to wait for sign-offs will build up a backlog of features. A sales team, supercharged by AI-assisted proposals, may close deals faster than legal can review them.
As firms move towards Stage 2 on execution, the managerial layers, where decisions get made, stay where they were. We call that mismatch congestion. It is the buildup of individual and then team outputs waiting for somewhere to go. To become a Stage 3 firm, you need to rebuild around decision speed between workflows, not the speed of individual workflows. If you are already suffering from congestion, adding more workflows and more output to a blocked decision pipeline will only make things worse.
We discovered this at Exponential View. We could prototype, even build, new features faster than our usual processes for releasing them. We’re figuring out how to fix that – and we’re only a team of eight, so we are sympathetic to the challenges of larger orgs.
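Congestion can be made concrete with a toy queue. The sketch below is only an illustration: the function name and every number in it are hypothetical, chosen to echo the developer who becomes 50% more productive while review capacity stays fixed. It tracks how many finished items are still waiting for a decision at the end of each week.

```python
def simulate_backlog(produced_per_week, decided_per_week, weeks):
    """Return the number of items awaiting a decision at the end of each week.

    All inputs are hypothetical illustration values, not measurements.
    """
    backlog = 0
    history = []
    for _ in range(weeks):
        # New output arrives from individuals and teams.
        backlog += produced_per_week
        # The decision layer clears at most its fixed weekly capacity.
        backlog -= min(backlog, decided_per_week)
        history.append(backlog)
    return history


# Before AI: output and decision capacity are matched, so nothing queues.
before = simulate_backlog(produced_per_week=10, decided_per_week=10, weeks=8)

# After AI: output rises 50%, decision capacity is unchanged.
after = simulate_backlog(produced_per_week=15, decided_per_week=10, weeks=8)

print("before:", before)
print("after: ", after)
```

Under these assumed numbers the matched firm never accumulates a queue, while the faster firm adds five undecided items every week. Shipped output is capped at ten a week in both cases, which is one way to read “one plus one plus one plus one equals one-and-a-half.”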
To speed up those decisions, AI needs to be able to take them. Managerial oversight itself may be what prevents the cycle time from shortening. AI will need a new cognitive layer that allows the firm to interpret signals without a worker as an intermediary. If a customer sends a feature request to customer support, it typically passes through a support agent and then a product manager, who decides whether it belongs on the roadmap, before a developer codes it weeks or months later. With AI, the signal is observed by an agent. It orients against the roadmap and the codebase. It decides whether the feature is worth drafting. It builds it. Hours, not weeks. This is the Stage 3 firm, the unit drive.




