☄️ DeepSeek’s shock: 9 critical things you need to know
The trillion-dollar sell-off isn’t the real story
Happy Chinese New Year!
And what a New Year’s surprise: DeepSeek is still making the headlines.
Donald Trump urged:
The release of DeepSeek AI from a Chinese company should be a wakeup call for our industries that we need to be laser focused on competing to win.
Sam Altman had a more measured response:
DeepSeek’s R1 is an impressive model, particularly around what they’re able to deliver for the price. We will obviously deliver much better models, and it’s legit invigorating to have a new competitor! We will pull up some releases.
And when I spoke with Bloomberg yesterday, I said:
If you work in tech, what you’ve seen over the last thirty to fifty years is a constant decline in the cost of the underlying technology. And these executives will also know that it’s not just the hardware.
Software optimizations are actually often much bigger in magnitude than hardware ones. So I believe these executives would have known, would have expected, there to be significant cost declines and efficiencies.
The first time you build a frontier AI model, you don’t care about efficiency. You care about performance. You’ve got time in the months after to improve it. I think what is surprising is [DeepSeek’s] timing; I suspect that people are a little bit shocked.
On Monday, we shared a comprehensive overview of what DeepSeek’s rise means, but this is a fast-moving situation, and we’ve all had time to digest it further.
Here are nine critical updates. There is also a rather beautiful graph.
1. Market lurches
2. Proven methods, huge impacts
3. IP disputes ignite fast
4. The scaling race powers on
5. Cheaper AI, bigger demands
6. Software undercuts hardware
7. Chip controls matter
8. Guardrails? Still unclear
9. AI mainstream, future looms
Update 1: Sell now, think later
Takeaway: The markets overreacted. I expected them to correct. And they did.
As I wrote in my earlier commentary: “The markets went crazy today. A bit too crazy.” $600 billion was wiped from Nvidia alone, the largest single-day loss of market value in history. They’ve since stabilised somewhat. Chipmakers like Nvidia and ASML—both initially hammered by DeepSeek-induced jitters—have rebounded. Investors quickly realised that making the most capable AI models 90% cheaper is undoubtedly going to fuel more AI adoption.
As one technology infrastructure analyst at New Street Research aptly put it:
This is the Moore’s law of GenAI. Every time the cost of transistors came down 2x we ended up buying more than 2x transistors. And the same dynamic will likely play out with GenAI.
Lower costs free up enterprise budgets for additional AI-driven solutions, potentially driving broader market expansion. Of course, markets are markets, so don’t write anything off.
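A back-of-envelope sketch of that argument: if demand for AI compute is price-elastic (elasticity above 1), a 90% cost cut increases total spend rather than shrinking it. The elasticity values below are assumptions chosen for illustration, not empirical estimates.

```python
# Illustration of the "Moore's law of GenAI" argument: with a constant-
# elasticity demand curve, total spend = price * demand rises after a
# price cut whenever elasticity > 1. Numbers are illustrative only.
def total_spend(price, elasticity, base_price=1.0, base_demand=1.0):
    demand = base_demand * (price / base_price) ** (-elasticity)
    return price * demand

for e in (0.5, 1.0, 1.5):
    before = total_spend(1.0, e)
    after = total_spend(0.1, e)   # a 90% price drop, as with R1
    print(f"elasticity {e}: spend goes from {before:.2f} to {after:.2f}")
# At an assumed elasticity of 1.5, spend rises roughly 3.2x after the cut.
```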
Update 2: Breakthrough or best practices?
Takeaway: DeepSeek has combined known methods—reinforcement learning, mixture of experts and multi-token prediction—into a finely tuned recipe. These are best practices rather than magic.
There’s an argument to be made that R1 represents a near-perfect integration of standard best practices: mixture of experts to lower compute requirements, multi-token prediction to speed up responses, and a large-scale RL phase that automatically verifies code or math solutions to refine reasoning. These have all been used before in other models.
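To make the “mixture of experts” point concrete, here is a minimal sketch of top-k expert routing in PyTorch. The layer sizes, the expert design and the top-2 routing below are illustrative assumptions, not DeepSeek’s actual architecture; the point is simply that each token activates only a fraction of the network’s parameters, which is what lowers compute requirements.

```python
# Minimal sketch of top-k mixture-of-experts routing. Shapes and
# hyperparameters are toy values, not DeepSeek's configuration.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)          # the router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                  # x: (tokens, d_model)
        scores = self.gate(x)                              # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)     # route each token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e                      # tokens sent to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * self.experts[e](x[mask])
        return out   # only top_k of n_experts run per token: less compute

x = torch.randn(16, 64)
print(TinyMoE()(x).shape)   # torch.Size([16, 64])
```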
DeepSeek’s ability to produce a strong AI model at a lower cost does not represent a radical new breakthrough in AI economics; it is roughly “on trend” with the continual drop in training costs. I said as much when discussing OpenAI’s o3 model back in December:
early versions are often expensive, but we can assume that the performance we get … will cost us substantially less within no more than a couple of years.
And Dario Amodei, the boss of Anthropic, put the case even more clearly:
Every frontier AI company regularly discovers many of these [software optimisations]: frequently small ones (~1.2x), sometimes medium-sized ones (~2x), and every once in a while very large ones (~10x). Because the value of having a more intelligent system is so high, this shifting of the curve typically causes companies to spend more, not less, on training models
Dario goes as far as to argue that
DeepSeek-V3 is not a unique breakthrough or something that fundamentally changes the economics of LLMs; it’s an expected point on an ongoing cost reduction curve.
I’d suggest it is an expected point on that curve, but we’ve likely got here much more quickly than expected.
It’s a testament to how effectively these methods can be combined—and a sign that “innovation” sometimes looks a lot like engineering known techniques exceptionally well.
Take, for example, James Watt’s meticulous improvements to the steam engine between 1763 and 1776. He introduced a separate condensation chamber, an insulating jacket for the main cylinder and a better boiler to harness steam expansion—all incremental changes that collectively slashed coal usage by 75% for equivalent output. The result, of course, was a vast expansion of the use of coal in steam engines in England.
There is something unique this time: the company that was first to demonstrate the expected cost reductions was Chinese.
Update 3: Protecting your IP
Takeaway: DeepSeek’s alleged use of o1 outputs highlights the next big legal frontier: is training on a competitor’s proprietary model output a valid “fair use” derivative or a violation of intellectual property?
R1 has already been used to improve a number of open-source models via distillation, but was R1 itself developed by training on OpenAI’s API, on responses from GPT-4 and o1? I was told as much last week by someone running large parts of AI infrastructure who was well placed to make the claim.
OpenAI claims it has evidence to that effect. The anecdotal evidence is also quite convincing: ask DeepSeek’s V3 “what model are you?” and it will reply “ChatGPT” the majority of the time.
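For readers unfamiliar with the mechanics, “distillation” means training a “student” model to imitate a stronger “teacher”. Here is a minimal sketch using the textbook softened-KL loss (Hinton et al., 2015). The models, data and sizes are toy stand-ins, not anything DeepSeek or OpenAI actually runs; note too that a would-be distiller working through an API would only see sampled text, not the teacher logits used below.

```python
# Minimal sketch of knowledge distillation: a small "student" is trained
# to match a larger "teacher"'s output distribution. All components here
# are illustrative toys.
import torch
import torch.nn.functional as F

vocab, d_small, d_big = 100, 32, 128
teacher = torch.nn.Sequential(torch.nn.Embedding(vocab, d_big),
                              torch.nn.Linear(d_big, vocab))
student = torch.nn.Sequential(torch.nn.Embedding(vocab, d_small),
                              torch.nn.Linear(d_small, vocab))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab, (64,))   # toy "prompt" data
T = 2.0                                   # softening temperature

for step in range(100):
    with torch.no_grad():
        teacher_logits = teacher(tokens)  # via an API this would be sampled
                                          # text, not logits
    student_logits = student(tokens)
    # KL divergence between temperature-softened distributions is the
    # classic distillation objective
    loss = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    opt.zero_grad(); loss.backward(); opt.step()
```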
OpenAI’s statement underscores this concern:
We know [China]-based companies — and others — are constantly trying to distil the models of leading US AI companies. We engage in countermeasures to protect our IP, including a careful process for which frontier capabilities to include in released models, and believe . . . it is critically important that we are working closely with the US government to best protect the most capable models from efforts by adversaries and competitors to take US technology.