DeepSeek: everything you need to know right now.
The markets went crazy today. A bit too crazy.
My WhatsApp exploded over the weekend as we received an early Chinese New Year surprise from DeepSeek. The Chinese AI firm launched its reasoning model last week, and analysts belatedly woke up to it. The firm's consumer app jumped to number 1 in the Apple App Store, and American stock markets, heavily indexed on big tech, are taking a pounding.
We've been tracking DeepSeek for a while. I first wrote about it all the way back in EV#451 in Dec 2023, with the question: "Is China the new open-source leader?"
And last month when writing about DeepSeek V3, I wrote:
The gap between open source (like DeepSeek) and closed source (like OpenAI) is narrowing rapidly. It calls into question efforts to constrain open-source AI development… The elegance of the approach, more refined than brute force, ought to be a wake-up call for US labs following a 'muscle-car' strategy.
It was, I said back in December 2024, "the 'Chinese Sputnik' that demands our attention".
There are many significant ramifications from the R-1 release and the response to it: ramifications for geopolitics, for the speed of AI adoption and, if you hold any of your assets in the Nasdaq, for your own personal wealth.
At the time of writing, about $1.2 trillion has been wiped off the US markets, led by Nvidia getting a hammering.
Your three key takeaways:
Cost breakthrough: DeepSeek's R-1 rivals OpenAI's o1 in performance at 10% of the cost, enabling affordable, high-quality AI.
Open-source edge: Freely available, runs on modest hardware, sparking rapid developer adoption and open-source innovation.
Industry disruption: Big Tech (OpenAI, Google, Meta) and Nvidia's GPU business face pressure as low-cost models challenge closed, expensive systems.
R-1 at a glance
DeepSeek-R-1 was released just last week. It performs about as well as OpenAI's o1 reasoning model but is about a tenth the cost. DeepSeek's non-reasoning model, V3, is similarly disruptive: about as good as GPT-4o but one-fifth the price.
Early benchmarks already show it performing on par with OpenAI's o1 model.
What's truly intriguing about R-1 isn't merely that it matches o1's quality; it's that it is 90% cheaper and nearly twice as fast. Speed counts.
Today you can reach up to 275 tokens per second if you access R-1 through the American AI cloud, Groq. For reference, a human speaks at around 2-3 tokens per second. R-1 delivers so much more efficiency than o1 for a comparable level of performance, as illustrated by the extra whitespace in the chart below.
I would add that these are just benchmarks. In the real world, things might look different. I put an analytical task involving critiquing one of my essays to DeepSeek R-1, OpenAI o1 and OpenAI o1 Pro, and evaluated them subjectively.
OpenAI o1: Time to response: 49s, quality of response: C+
OpenAI o1 Pro: Time to response: 171s, quality of response: A-
DeepSeek R-1: Time to response: 49s, quality of response: B-
My experience is subjective, but R-1 is normally faster and better than o1, while o1 Pro, the pedantic and methodical version, is better still.
In the past 30 days, I've run 48 queries on various DeepSeek models compared to 160 or so on Claude and 45 on OpenAI's various models. (This excludes using services like Granola or Replit which may call OpenAI's APIs.) So DeepSeek is definitely making some headway in my usage.
Many users even prefer R-1's output to o1's; one noted that its writing quality feels "10x less lobotomized and has 10x more flair" than other models. Deedy Das, an investor at Menlo Ventures, argues that under some constraints R-1 is as good at coding as OpenAI's unreleased o3 model.
Moreover, R-1 is open-source1. Anyone can download and run it on their own hardware. I have R-1-8b (the second smallest model) running on my Mac Mini at home.
This local capability gives R-1 immense scalability.
For context, the previous DeepSeek-V3 model demonstrated how efficient these architectures can be. The model itself is huge, at 671B parameters (or variables); GPT-4 reportedly weighs in at 1.76 trillion parameters.
But DeepSeek uses a technique called mixture of experts so effectively that it only activates 37B parameters at any time, enough to run on just two consumer-grade Nvidia 4090 GPUs (costing roughly $2,000).
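For readers who like to see the mechanics, here is a minimal sketch of the top-k routing idea at the heart of mixture-of-experts. The dimensions, expert count and k below are invented for illustration; this is the general technique, not DeepSeek's actual architecture.
```python
# Minimal sketch of top-k mixture-of-experts routing (illustrative only;
# dimensions, expert count and k are invented, not DeepSeek's architecture).
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                              # x: (tokens, d_model)
        weights, idx = self.router(x).softmax(-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        # Only k of n_experts run for each token -- the "37B active parameters
        # out of 671B" idea in miniature.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

print(TinyMoE()(torch.randn(10, 64)).shape)            # torch.Size([10, 64])
```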
R-1 builds on these efficiency innovations, meaning you can even try its smallest version on a standard laptop; it takes only a few minutes to set up.
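If you want to try this at home, here is roughly how you can query a local model. It assumes Ollama is installed and that you have pulled one of the distilled R-1 variants; the exact model tag is an assumption, so check what is actually installed on your machine.
```python
# Sketch: query a locally running distilled R-1 via Ollama's HTTP API.
# Assumes Ollama is running and a model such as "deepseek-r1:8b" has been
# pulled; the tag is an assumption, so check what you actually have installed.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:8b",   # assumed tag
        "prompt": "In three bullet points, critique the 'Jevons paradox' argument for AI compute.",
        "stream": False,
    },
    timeout=600,
)
print(resp.json()["response"])       # distilled R-1 variants usually include a <think> trace
```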
So how did DeepSeek achieve such a breakthrough?
R-1 appears to borrow heavily from methods pioneered by reasoning models like o1. The remarkable achievement is how DeepSeek researchers managed to replicate advanced reasoning on what would typically be considered "lower-grade" hardware, simultaneously lowering both cost and latency. It's important to note that China already had a range of GPT-4-level models by late 2023; Qwen 72B, for instance, exceeded GPT-4 on certain Chinese benchmarks in December 2023. While o1 showcased a new paradigm of reasoning models by scaling test-time compute, DeepSeek demonstrated that replicating such reasoning may not be as resource-intensive as originally assumed.
Whereas the main US labs have access to the latest Nvidia chips, DeepSeek, at the mercy of US export restrictions, does not. This meant they used Nvidia's cut-down H800 chips. Necessity is the mother of invention. How many did they use? We don't know.
Our favourite semiconductors analyst suggests this was done on 50,000 chips, which would be a huge number. But I think the jury is out. Kai-Fu Lee told me last year that his "team at ZeroOne.ai managed to train a top-tier model on 'just' 2,000 H100 GPUs", about an order of magnitude less than the biggest US labs.
My view is that it is more likely that DeepSeek found huge numbers of optimisations and was able to do this on the cheap, hence the $6m number that is hanging around. Software lends itself to optimisations across the board. In signal processing, the Fast Fourier Transform algorithm delivers a 100-fold speed-up over other methods. In cryptography, elliptic curves reduced key sizes about 100-fold for an equivalent level of security compared to RSA. Changing compilers can yield 10x improvements.
So we know it can be done.
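To put a rough number on the FFT example, here is a back-of-the-envelope comparison of operation counts. It is purely illustrative and ignores constant factors, but it shows how an algorithmic change alone can buy orders of magnitude.
```python
# Back-of-the-envelope: why an algorithmic change alone can be worth ~100x.
# A naive DFT needs roughly N^2 operations; the FFT needs roughly N*log2(N).
import math

N = 4096                                  # e.g. a 4,096-sample signal
naive_ops = N ** 2
fft_ops = N * math.log2(N)
print(f"speed-up ~{naive_ops / fft_ops:.0f}x")   # ~341x at this N, before constant factors
```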
And here we have an exceptional engineering team who have documented the strategies they used to improve efficiency: using faster, less precise calculations when accuracy isn't needed, reducing unnecessary computations, and dropping down to PTX, the assembly-like language that sits just above Nvidia's GPUs. DeepSeek's own documentation suggests they achieved a 45-fold increase in training efficiency compared to standard practices. (Other good threads on this topic include this one, this one and this one, which is the best.)
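As a flavour of the first of those techniques, lower-precision arithmetic, here is a generic PyTorch sketch. It is an analogy only: DeepSeek's reported recipe involves FP8 training and hand-tuned kernels, which this does not reproduce.
```python
# Generic mixed-precision sketch: run the matmul-heavy parts in bfloat16.
# An analogy for "less precise when accuracy isn't needed", not DeepSeek's
# actual FP8 training setup.
import torch

model = torch.nn.Linear(1024, 1024)
x = torch.randn(32, 1024)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)                 # the matmul runs in bfloat16, cutting memory traffic

print(y.dtype)                   # typically torch.bfloat16 here
```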
This is the remarkable, jaw-dropping moment, the "holy hell, what have we been doing?" that is echoing around Silicon Valley and beyond right now. It fundamentally shifts the ROI equation.
Impact on key players
The math has changed. Google, OpenAI, Meta and Nvidia have all bet on capital spending, huge amounts of it, being the path forward. Cash would buy chips. Lots of chips. This was going to provide the moat, the source of advantage.
US model makers have been locked into a single paradigm of building ever-larger, more compute-hungry models. After all, the capital markets were willing to fund outsize spending on GPUs, so why not go for it?
With China's venture capital market becoming moribund, local players could not access enough capital. And even for those that could, such as the Qwen team at Alibaba or the Doubao team at ByteDance, export restrictions hampered access to compute.
Steven put it aptly when he observed that the history of computing is one of innovation followed by a scale-up, eventually disrupted by a "scale-out" approach: when bigger and faster methods are replaced by smaller, more numerous alternatives. According to Steven:
China faced an AI situation not unlike Cisco did in its early years. Many point to the Nvidia embargo as the cause, but the details donāt really matter. The point is they had different constraints: more engineers than data centers to train in. Inevitably, they would develop a different kind of solution.
One thing that is certain is that all firms will now look at model development practices with an emphasis on driving efficiencies. As I wrote about OpenAI's o3 in December:
Early versions are often expensive, but we can assume that the performance we get at $3,500 will cost us substantially less, perhaps a dollar or two, within no more than a couple of years.
The cost of GPT-4-quality results has declined by more than 99% in the last two years. GPT-4 launched in March 2023 at $36 per million tokens. Today, China's DeepSeek offers similar performance for $0.14, roughly 250 times cheaper.
But what does this mean, and for whom?
OpenAI
Of the model makers, OpenAI is likely the most affected. DeepSeek is a fast follower and it is following really fast, like getting-into-Sam's-personal-space fast. And its model is open-source to boot. In response, OpenAI has already taken defensive measures by offering o3-mini for free. Whether this is cost-effective or a loss leader remains unclear. If a competitor can replicate what you have done, only more cheaply, within four months, how defensible is your position?
Microsoft and AWS
Both companies are relatively model-agnostic, focusing primarily on providing computing infrastructure. In Microsoft's case, this also includes enterprise applications. They likely welcome R-1 because cheaper, more efficient models could spur faster and broader enterprise adoption. Satya Nadella has already referenced the Jevons paradox (an idea I explored in my earlier essay): when something becomes cheaper or more efficient, usage expands rather than levels off. I'll say much more about this below.
Microsoft recently announced Copilot Studio, which lets customers build agentic workflows. These eat up tokens for breakfast, so having access to cheaper models only makes the product more attractive.
Meta
Meta is distinctly concerned. It has gone from being a leading open-source model provider to trailing behind DeepSeek's R-1. According to The Information:
Leaders including AI infrastructure director Mathew Oldham have told numerous colleagues they are concerned that the next version of Meta's flagship AI, Llama, won't perform as well as the Chinese AI, DeepSeek, according to two Meta employees with direct knowledge of efforts to catch up.
The company has organised "war rooms" to analyze how R-1 was trained at such low cost and may restructure Llama to mirror DeepSeek's approach.
Google
Very few people talked about Google today; perhaps that is just where the vibe around the firm sits. Despite its impressive speed, large context window and competitive pricing, Google's Gemini 2.0 Flash Thinking (Experimental) reasoning model has received surprisingly little attention from the tech community.
The large context window is a real winner. You can dump a million tokens into it (several books) and it can reason back to you with up to 65,000 tokens. It's a great product. However, we can assume that Google hasn't been as deft as DeepSeek in its optimisations, relying instead on its practically infinite access to compute. That attitude may change.
Apple
For Apple, this might be good news. The company lags in AI research. Its most interesting models have tweaked around the edges: for example, it built the MM1 model family, whose largest model is only 30 billion parameters. Its internal efforts have been focused on on-device AI, while its data centre investments lag the others. An open-source model that can be locally tweaked and improved might offer Apple an unexpected advantage. The markets, which have pushed Apple stock up about 3%, seem to have recognised this.
Nvidia
Prima facie, Nvidia may have the most serious reason for concern. As Jeffrey Emanuel argues, DeepSeek's demonstration of state-of-the-art performance with far less compute undermines Nvidia's key growth driver, large-scale GPU sales. If the broader industry realises it can run top-performing AI models at a fraction of the usual cost, and if those methods are open-source and easily replicated, demand for expensive H100s could drop significantly. Nvidia's forward sales multiple of 20x and 75% gross margins make it especially vulnerable to any major shift in GPU orders.
Nvidia's 75% gross margin is a huge opportunity for AMD, Cerebras and Groq… and it comes under more pressure if we can do more with less.
I say prima facie because it's not as if compute has ever been demand-limited. Never in history. Quite the contrary: we always want more compute, whether biological or silicon. As Pat Gelsinger, the former boss of Intel, pointed out today:
Computing obeys the gas law. This means it fills the available space as defined by resources (capital, power, thermal budgets etc). As we saw in CMOS, PCs, multicore, virtualization, mobile and numerous others; making compute resources broadly available at radically lower price points will drive an explosive expansion, not contraction, of the market. AI will be in everything going forward and today, it is orders of magnitude too expensive to realize that potential.
There are occasions where a firm has a utilisation issue because it has had to scale its compute for peaks, leaving it lying fallow during the troughs. The story goes that this was the genesis of AWS, and of DeepSeek, too.
But the market yearns for compute. Price pressure only increases consumption.
Although, a 75% gross margin is pretty juicy and I think it's not unreasonable for investors to question their assumptions about Nvidia, not least because markets don't like surprises. If they have been surprised by DeepSeek seemingly making a mockery of OpenAI's and Meta's approach to models, what other surprises are there? A killer GPU waiting in the wings? It isn't so absurd. After all, it was just a year ago that Huawei surprised the world with a new phone built on an advanced homegrown chip.
The market depended on this
The US stock market depends on technology. Since the Global Financial Crisis of 2008, the Nasdaq 100 is up close to 20-fold. The recent run in tech, led by Nvidia, is premised on ongoing US AI dominance. And that dominance meant scaling compute.
With nearly a third of the US market held by retail investors (late to the party, early to leave) and a similar proportion of equities in passive funds, the edifice of the market is pretty vulnerable to a change in vibes. If the correction has legs, it could spill into the real economy.
And what of the large companies on track to spend nearly $300 billion on capital equipment to serve up powerful and expensive models?
It's incredibly unlikely they can walk back from those commitments. It would be a bloodbath in the markets, the sign of management teams that can't see a month ahead, let alone years ahead.
But equally, they can't build out assets that are underutilised. The first question Wall Street would ask on earnings calls would be: "So what is the utilisation of the new datacenter you put $50 billion of shareholders' capital into?"
Pat Gelsinger is obviously correct. Applications consume the compute available, but it doesn't simply happen by magic. New, compute-heavy applications need to be built, and they need to be marketed to eager customers.
So the simple reality is that the demonstration of lower-cost, more efficient models creates both the helpful economics for developing new, more sophisticated applications and the incentives to ensure those applications are widely used.
Big tech's hand forced
I'll put forward a few ideas for what we can do with these models.
When a system can process 250 tokens per second, the possibilities for real-time AI expand significantly. Customer service agents can respond almost instantly and latency can be low enough for seamless integration into gaming. More intriguing still is the prospect of multiple AIs thinking and conversing with one another at speeds beyond human comprehension. This could give rise to collaborative AI networks, the society of AI, filled with agents communicating with each other 50-100 times faster than humans do.
For instance, consider a network of a hundred high-end reasoning models operating in both collaborative and adversarial modes. They could tackle complex challenges, ranging from climate modeling to financial market simulations, by pooling diverse perspectives, cross-verifying each other's results and iterating on solutions collectively. In adversarial mode, the models would identify weaknesses or blind spots in each other's logic, driving continuous refinements and potentially giving rise to emergent intelligence. Such an approach might also lead to breakthroughs in content creation, generating personalised narratives, interactive storytelling or even new scientific insights, all at a speed and scale that no single model could match.
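As a toy illustration of such a network, here is a sketch of an adversarial propose-critique-revise loop running on a reasoning model through an OpenAI-compatible API. The endpoint, model name and loop structure are assumptions, not a production design.
```python
# Toy "society of AI": a propose-critique-revise loop on a reasoning model.
# The endpoint, model name and API key handling are assumptions; any
# OpenAI-compatible reasoning model could be swapped in.
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")  # assumed endpoint

def ask(prompt: str) -> str:
    reply = client.chat.completions.create(
        model="deepseek-reasoner",                        # assumed model name
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content

claim = "Cheaper inference will expand, not shrink, total AI compute demand."
for _ in range(3):                                        # adversarial rounds
    critique = ask(f"Find the weakest point in this argument:\n{claim}")
    claim = ask(f"Revise the argument to address this critique.\nCritique:\n{critique}\nArgument:\n{claim}")
print(claim)
```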
Token-intensive tasks become newly feasible.
Open vs. closed-source
DeepSeek's choice to open-source R-1 is already having a profound effect on the AI ecosystem. Developers are building on top of R-1. See, for example, this fully local AI research assistant emerging within days of release. HuggingFace is also aiming to reproduce and fully open-source R-1, including the training data and scripts that DeepSeek has not yet disclosed.
The power of open-source lies in its compounding nature: each breakthrough builds on prior innovations, accelerating the pace of discovery. This approach challenges proprietary model makers to consider their own openness. As startups, academics and hobbyists flock to R-1 rather than o1, it seems that the gap between open- and closed-source platforms may soon close entirely. In fact, with R-1, it appears that moment has already arrived.
Ecosystem considerations
DeepSeek's origins may be just as intriguing as its technical breakthroughs. The model was developed as a side project by High-Flyer, a hedge fund rather than a conventional AI lab, underscoring the growing synergy between AI and quantitative finance. In the United States, it is commonplace for some of the brightest minds from top universities to gravitate toward finance roles, drawn by high-paying opportunities in hedge funds and investment banks. As Arnaud Bertrand points out, this often pulls brilliant graduates away from working on innovation in the real economy.
China looked at the West, the U.S. in particular, and saw the overbearing importance of the finance industry at the expense of the real economy. Many of the country's most brilliant graduates from Ivy League schools ended up in the increasingly parasitic finance industry instead of working on projects that move society forward. Simply put: you want your best minds building real value, not just extracting it.
The success of DeepSeek highlights the potential reallocation of talent that may lie ahead. Rather than remaining tied to traditional finance, highly skilled technologists are beginning to shift toward AI research and development. This phenomenon can be described as a "brain drain" from traditional sectors, with specialists in mathematics, statistics and computer science moving into AI instead of staying in finance or related fields. (And the team is on fire. As I type this, they have just released a new multimodal open-source model.)
At the same time, this cross-pollination between finance and AI can yield powerful results. Quantitative finance often relies on advanced optimization and algorithmic techniques, exactly the kind of methods that can be invaluable in training and refining AI models. By merging financial ingenuity with cutting-edge machine learning, projects like DeepSeek push the boundaries of what is possible, even if they emerge from unexpected corners of the tech-finance landscape.
Proliferation
We need to acknowledge that advanced AI models will now proliferate further. As one analyst points out:
China's best model training crew come out with a powerful reasoning model - and show how to turn any other model into one.
The ideas of control in AI policy get harder if you need fewer than a million samples to convert any model into a "thinker". There is now an open-weight model floating around the internet which you can use to bootstrap any other sufficiently powerful base model into being an AI reasoner. AI capabilities worldwide just took a one-way ratchet forward.
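To make the "bootstrap any base model into a reasoner" idea concrete, here is a toy sketch of the underlying mechanism: supervised fine-tuning of a small base model on reasoning traces. The model (GPT-2) and the two example traces are invented stand-ins; the real recipe uses R-1-generated data at vastly larger scale and is not reproduced here.
```python
# Toy sketch of turning a base model into a "thinker": supervised fine-tuning
# on reasoning traces. GPT-2 and the two traces below are invented stand-ins;
# a real run would use R-1-generated data at far larger scale.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optim = torch.optim.AdamW(model.parameters(), lr=1e-5)

traces = [  # hypothetical chain-of-thought examples
    "Q: What is 17 * 24? Think: 17*24 = 17*20 + 17*4 = 340 + 68 = 408. A: 408",
    "Q: Is 91 prime? Think: 91 = 7 * 13, so it is not prime. A: No",
]

model.train()
for text in traces:
    batch = tok(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss  # standard next-token loss
    loss.backward()
    optim.step()
    optim.zero_grad()
```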
A huge accelerant
Overall, the dynamics that led to the development of DeepSeek R-1 point to a huge accelerant…
It changes the cost base for start-ups and corporates using GPT-4 or o1 models. Now they have a dramatically cheaper model. Expect an acceleration of AI adoption.
Speed and low costs open new classes of applications, like networks of reasoning models arguing with each other.
Open-source has a compounding innovation cycle. We'll only see better technology emerge as people tinker. Expect further cost reductions and capability improvements.
We are seeing the smart kids working in finance move into the real economy. Look what happens when your best and brightest do something other than chasing arbitrage.
And no, this does not mean that all that capex was wasted. The Jevons paradox means it will be put to more use. There is still room for technical innovation in AI, both to improve capabilities and efficiency. The approach used to develop o1 and R-1 can still be scaled, as highlighted by the breakthroughs in o3's performance. And even if we had exhausted all technical improvements (we have not), more compute means more economic activity can be done by these reasoning models, meaning more of the economy can benefit.
It will rightly lead to scrutiny of who the current AI leaders are. But it means even more at the macro scale.
The year of the wooden snake
On Wednesday, China celebrates the traditional New Year, the year of the Wood Snake, a once-in-sixty-year occurrence.
I asked my friend Christel what the Wooden Snake symbolises. She says (with my emphasis):
It teaches us "to bend like willow, yet rise like oak."
The Wooden Snake is a bridge between worlds. Its scales shimmer with the verdant vitality of the Wood element, symbolising growth, resilience, and the quiet strength of saplings bending in storms. Yet its gaze holds the lunar coolness of Yin, the feminine essence: introspective, intuitive, and veiled in the mystery of moonlit tides.
But do not mistake Yin's stillness for passivity. The Snake, though coiled in contemplation, is a shapeshifter. It knows when to strike with Yang's fiery precision: the masculine force of action, sunlight, and unapologetic boldness.
In 2025, the cosmos demands harmony between these dualities. Plans will unravel; surprises, like sudden storms, will drench the unprepared. Yet the Wooden Snake does not fear chaos. It thrives in adaptation, teaching us to shed old skins and rebuild with grace.
I wonder what New Year's surprise DeepSeek has planned for us!
Xīnnián kuàilè!
Azeem Azhar
1. Technically, "open-weight", since the training and data-processing code remain undisclosed.