🔮 The only AI curve that matters

What 99%-reliable AI could look like by 2030

Oct 03, 2025

There is one AI metric that we keep a really close eye on:

How many actions can a system take at 99% reliability before a human must intervene?

We call this the 99% step length: the number of sequential actions an AI can execute with at least 99% reliability without human help.¹
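The 99% bar is demanding because errors compound across a chain of actions. Here is a minimal sketch that assumes each step succeeds independently with the same probability p, so an n-step chain succeeds with probability p ** n. That independence model is our simplification for illustration, not the authors' stated method. The function name `required_step_reliability` is hypothetical.

```python
def required_step_reliability(n_steps: int, target: float = 0.99) -> float:
    """Per-step success rate p such that p ** n_steps == target.

    Simplifying assumption (for illustration only): every step succeeds
    independently with the same probability, so an n-step chain succeeds
    with probability p ** n.
    """
    return target ** (1.0 / n_steps)

for n in (100, 10_000, 100_000):
    p = required_step_reliability(n)
    # Restate p as "about one failure every X steps" for readability.
    print(f"{n:>7,} steps: per-step success {p:.7f}, "
          f"about 1 failure per {1 / (1 - p):,.0f} steps")
```

For a 100-step chain, this works out to roughly 99.99% per step, or about one slip in every 10,000 actions. Each further hundredfold increase in step length demands a roughly hundredfold lower per-step error rate.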

Today’s frontier systems reliably manage around 100 steps at that threshold. By our estimates, that number could exceed 10,000 by 2029. A couple of years later, it might be three to ten times higher, roughly 30,000 to 100,000 steps. At that scale, an AI system could operate for weeks, potentially months, without supervision.²
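A quick way to sanity-check these dates is to ask how long steady exponential growth would take to cover the gaps. The sketch below assumes, purely for illustration, that the 99% step length doubles every seven months or so. That is the pace METR reports for its own task-length metric. The article does not say it uses this rate. The constants and helper names are hypothetical.

```python
import math

# Illustrative assumption: the 99% step length doubles roughly every 7 months,
# borrowing the pace METR reports for its own task-length metric.
DOUBLING_MONTHS = 7.0
START_STEPS = 100  # "around 100 steps" today (late 2025), per the article

def months_to_reach(target_steps: float, start: float = START_STEPS,
                    doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months of steady exponential growth needed to go from start to target."""
    return doubling_months * math.log2(target_steps / start)

def steps_after(months: float, start: float = START_STEPS,
                doubling_months: float = DOUBLING_MONTHS) -> float:
    """Projected step length after a given number of months of growth."""
    return start * 2 ** (months / doubling_months)

print(f"100 -> 10,000 steps: {months_to_reach(10_000):.1f} months")
print(f"10,000 -> 30,000 steps: {months_to_reach(30_000, start=10_000):.1f} months")
print(f"10,000 -> 100,000 steps: {months_to_reach(100_000, start=10_000):.1f} months")
```

At a seven-month doubling time, going from 100 to 10,000 steps takes about 46 months, which lands in 2029 from a late-2025 start. A further three- to tenfold gain takes roughly one to two more years. A slower doubling time would push every date out proportionally.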

Today, we’ll explain our 99% benchmark and show where we believe AI step length could go if current trends continue. Many things could derail them, but it’s important to understand what the world could look like if they hold.

Autonomy is an illusion below 99%

Earlier this year, researchers at METR published work measuring the length of software and coding tasks that AI systems can complete, where task length is how long the task takes a skilled human. It is a great benchmark, and one we use regularly.

METR’s headline finding: roughly every seven months, the length of tasks AI systems can complete doubles. Their methodology is sound, but they benchmark task length at 50% and 80% success rates. I’ve found that executives often question the usefulness of those levels: a process that works half the time isn’t one they want to trust.

Even at 90%, a system that fails one attempt in ten needs constant human monitoring. Only at around 99% do you approach the threshold where autonomous operation becomes viable.
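The choice of threshold changes the measured horizon dramatically. This minimal sketch assumes a constant, independent per-step success rate. It is our simplification, and the article does not specify a model. The code computes how many steps a system can chain before its overall success falls below 50%, 80%, 90% or 99%. The function name `horizon` and the per-step rate are hypothetical.

```python
import math

def horizon(p_step: float, target: float) -> int:
    """Longest chain of n steps with p_step ** n >= target.

    Assumes every step succeeds independently with probability p_step.
    """
    # Solve p ** n >= target for n: n <= ln(target) / ln(p).
    return math.floor(math.log(target) / math.log(p_step))

P_STEP = 0.9999  # hypothetical per-step success rate, chosen for illustration

for target in (0.50, 0.80, 0.90, 0.99):
    print(f"{target:.0%} overall success: up to {horizon(P_STEP, target):,} steps")
```

Under this model, the 99% horizon is about 69 times shorter than the 50% horizon (ln 0.5 / ln 0.99 ≈ 69) whatever the per-step rate. This is why a system that looks impressive at 50% can still need close supervision.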

Where are we now?

Exponential View