DeepSeek's Deep Shock to the US AI Behemoths
Prabir Purkayastha
THE tech world was shocked when a little-known Chinese company released an AI model called DeepSeek that appears to match OpenAI's most advanced models at a small fraction of their cost. The industry has been buzzing for the last month, with leading US tech investors first following Nvidia's stock performance with bated breath and then bemoaning that AI's Sputnik moment – DeepSeek's AI models – had wiped nearly a trillion dollars off the value of leading tech companies. Interestingly, Nvidia, which manufactures high-end Graphics Processing Units (GPUs), took the biggest hit, losing nearly $600 billion in market value in a single day.
GPUs were originally developed for the parallel processing of image data, hence the name, but are now used for all parallel computational tasks, including AI models. The other eye-popping feature of this piece of Chinese frugal innovation is not simply that DeepSeek built its most advanced models at 3-5 per cent of the cost incurred by OpenAI, Anthropic, Google, Meta, etc., but that it did so in spite of stringent sanctions imposed by the US (with bipartisan support) on the export of advanced chips to China.
The sanctions specifically targeted China's AI advance by denying it the advanced GPUs thought essential for any major AI breakthrough. Sam Altman, the prevailing guru of OpenAI, had declared during his tour of India last year that any attempt to match the AI advances of the big US tech companies in building foundational AI models with small investments and a much smaller team was "totally hopeless". Almost in the same vein, India's tech guru Nandan Nilekani had argued that India should not build basic AI models but only use them in applications, ceding the tech baton completely to the US. His opinion was hotly contested by Aravind Srinivas, co-founder and CEO of the AI company Perplexity.
Sam Altman was obviously wrong. Not only did DeepSeek create a model on a shoestring budget that can go toe-to-toe with those of companies that have spent hundreds of millions of dollars, but it did so using hardware that was "designed" precisely to hamstring such advances. The H800 chips were developed by Nvidia specifically for the Chinese market and were supposed to prevent exactly such AI advances. The tech world is waking up belatedly to the simple historical truth that it is difficult to stop scientific and technological advances with just a bunch of trade restrictions.
The AI models we are discussing here are not the ChatGPT or DeepSeek chatbots that answer your questions and produce decent summaries, even research summaries – all of which can be considered superior versions of a Google search. Having "ingested" (been fed with) virtually all internet content, there is not much further that such ChatGPT-style tools can be stretched to generate new insights. The new models, while still using Large Language Models (LLMs) – the basis of ChatGPT and its counterparts – additionally have reasoning capabilities built on reinforcement learning. It has also been argued that for the holy grail of Artificial General Intelligence (AGI), the machine counterpart to biological intelligence, reasoning models are the way to go, even if AGI is not as close as Sam Altman and his AI tribe would have us believe. The new advances we are talking about are in these reasoning models, and here DeepSeek has been able to create models ahead of or on par with what the US digital behemoths can do. Or, as one news headline on the DeepSeek models put it: Did China Just Eat America's AI Lunch?
What has shocked the tech world is not that China has matched the AI development of the US tech giants, but that a company worth only $8 billion, with no previous tech feats to its name, has managed this at a small fraction of the cost: it spent just two months and under $6 million to build an AI model comparable to OpenAI's. On top of that, it did so using Nvidia's H800 chips, deliberately crippled to conform to US restrictions on hardware exports to China. For those who are deeply suspicious of any Chinese claims, DeepSeek has not only open-sourced the models but has also published detailed papers documenting what its team has done.
So what is the company behind DeepSeek, and who are the people behind it? They are a bunch of what the financial world calls "quants" – mathematics, modelling and programming specialists who work in finance. Quants were held partly responsible for blowing up Wall Street in 2008, the subprime disaster for global markets; though partially discredited by that meltdown, the world of finance cannot do without them. In China, the financial markets are more tightly controlled. The quant who set up DeepSeek is Liang Wenfeng, who, after a stumble in which his funds lost about a third of their $12 billion value in 2021, decided to channel some of his money and a team of his quants into AI.
It is not that DeepSeek found some brand-new mathematics to solve the problems of AI. Instead of just throwing money and computing power at the problem, it did some clever engineering to build and release two new models. These models have been analysed by, among others, Jeffrey Emanuel, a well-known techie familiar with the area, who writes that they "have basically world-competitive performance levels on par with the best models from OpenAI and Anthropic (blowing past the Meta Llama3 models and other smaller open source model players such as Mistral). These models are called DeepSeek-V3 (answer to GPT-4o and Claude 3.5 Sonnet) and DeepSeek-R1 (answer to OpenAI's O1 model)." The price? At most 5 per cent of what the others have spent or would have spent. Emanuel's guesstimate is that DeepSeek is 45x-50x more efficient than other cutting-edge platforms.
Not only have the DeepSeek models been released publicly, they have been released as open-source models under an MIT licence, with the code published on GitHub and the model weights openly downloadable. DeepSeek has also released two detailed technical reports explaining each step of what it has done. So the models, the theory, and how the problems were analysed and solved are all set down in a way that allows people not only to track and use what DeepSeek has done, but also to reproduce the work and run the models on their own servers.
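To give a concrete sense of what this openness means, here is a minimal sketch in Python of how anyone might download and query one of the released models on their own machine, using the widely available Hugging Face transformers library. The repository name below refers to one of the small distilled R1 variants and is an illustrative assumption; the full-size models require server-class hardware.

    # Minimal sketch: load an openly released DeepSeek model and ask it a question.
    # Assumes the Hugging Face `transformers` library (and PyTorch) are installed;
    # the repository name is an illustrative assumption.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed small distilled variant

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)     # downloads the tokenizer files
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)  # downloads the open weights

    # A simple reasoning-style prompt.
    prompt = "If a train covers 60 km in 45 minutes, what is its average speed in km/h?"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=200)

    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Nothing more than this, plus ordinary consumer hardware for the smaller variants, is needed to verify the open models first-hand – which is precisely the point of an open release.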
The DeepSeek market shock has three major implications for the digital world. One is that Nvidia, the major beneficiary of the AI boom, is in for a major correction of its stock price; that is already visible. The second is that many more players will now be willing to enter the AI race, knowing that the entry price is not as steep as the biggies had told them: the race is not necessarily won by the biggest, just as in animal evolution! The last is that technology sanctions don't work. They did not work against India in the nuclear and space sectors; neither have they worked against China's AI development.
And that is not all. If scaling up computing power is not the only way to improve models and gain a market edge, do we need the huge data centres that the AI industry was planning? For those who remember the development of microprocessors and the PC revolution, is the DeepSeek moment likely to provide a similar shock? Remember the IBM era, when entire rooms were built to house IBM machines, the hallmarks of computing advances?

It is this expectation of ever-bigger computing that led to Trump announcing the $500 billion Stargate project of OpenAI on the second day of his new presidential term. Implicit in this vision was a large number of data centres housing very large arrays of powerful GPUs, almost entirely from Nvidia. This brought into focus the issue of energy, as such data centres would be huge energy guzzlers. The plan, which dovetails with Donald Trump's vision of "drill, baby, drill", was to use natural gas. That, of course, would cause a jump in US greenhouse gas (GHG) emissions. Without such an immediate surge in energy demand, natural gas in the US will find it hard to compete with solar and wind energy, whose costs have dropped below those of natural gas and continue to fall. So not only has DeepSeek essentially deep-sixed the notion that bigger is better, it has also reduced the threat of a rapid increase in US GHG emissions.
As a well-known philosopher has said: "There are decades where nothing happens; and there are weeks where decades happen." This appears to be one of those moments. At least for AI.