DeepSeek-R1 is a new generative artificial intelligence model developed by the Chinese startup DeepSeek. It has caused a considerable stir because it is positioned as a competitor to leading models such as OpenAI o1. At the same time, DeepSeek-R1 is open source and more cost-effective than most AI models on the market. Let’s figure out how the Chinese developers managed to do it. And is it really that simple…
Artificial intelligence from DeepSeek overtook ChatGPT in the App Store rankings, sent Nvidia shares down 12%, and Meta and Microsoft shares down 4%. Chipmaking equipment suppliers ASML (Netherlands) and Tokyo Electron (Japan) also suffered losses, and AI- and cryptocurrency-mining-related assets were hit as well. And this is not all that the Chinese AI managed to do in just a week.
According to the data consultancy Preqin, US investments in artificial intelligence in 2023 exceeded China’s roughly sixfold: $26.6 billion versus $4 billion. How, then, did China manage to overtake the Americans in less than two years?
OpenAI and Google have not disclosed the exact cost of training AI models such as GPT-4 and Gemini (training being the most difficult and painstaking part of the work). But it is obviously a terribly expensive business. When OpenAI released GPT-3 in 2020, the cloud provider Lambda estimated that training this model, with its 175 billion parameters, cost more than $4.6 million on Tesla V100 cloud instances. OpenAI has not disclosed the size of GPT-4, released in March 2023, but reports have mentioned between 1 trillion and 1.8 trillion parameters.
In addition, OpenAI CEO Sam Altman vaguely estimated the cost of training at «more than» $100 million. Anthropic CEO Dario Amodei suggested that «by 2025 we can have a model worth $10 billion».
Epoch AI’s May 2024 technical report showed that the training compute of advanced AI models is growing 4-5 times a year. At that rate, by average estimates, the cost of training the most expensive AI models will reach $140 billion by 2030 (excluding researchers’ salaries).
If OpenAI had used Nvidia V100 GPUs in its supercomputer, completing GPT-4’s training would have taken about five to six months.
In other words, training — preparing the model on the data it learns from — is the most expensive, complex, and time-consuming part of creating it.
And this is where DeepSeek comes in, claiming to have trained V3 in just two months for only $5.6 million. And while the leading companies use up to 16,000 GPUs, the Chinese company got by with only about 2,000 Nvidia H800 PCIe chips. Some versions of DeepSeek’s models can even be run locally. How is this possible?
It all started with the release of the open-source DeepSeek-Coder model in November 2023, followed by DeepSeek-LLM, which could also generate text. In April 2024, DeepSeek-Math, built on an updated DeepSeek-Coder, was released. The same year brought two updates of DeepSeek-LLM: V2 and V2.5. In November 2024, a preliminary version of DeepSeek-R1 appeared, based on DeepSeek-V3-Base. At the end of the year, DeepSeek-V3, the successor to DeepSeek-V2, was released, and DeepSeek-R1, which made a splash in early 2025, was built on top of it.
DeepSeek-V3 was trained on 14.8 trillion tokens; DeepSeek-R1 was trained on top of DeepSeek-V3-Base with about 800,000 additional samples. Queries to R1 are 98% cheaper than to ChatGPT. Despite US restrictions on exporting powerful chips, DeepSeek made do with the available Nvidia H800s plus some innovations of its own.
At the same time, the generative AI model DeepSeek-R1 is open source and matches the performance of closed models such as OpenAI’s o1.
DeepSeek-R1 is built on a large base model called DeepSeek-V3 and uses a Mixture of Experts (MoE) architecture, which lets it handle complex tasks efficiently by activating only a fraction of its parameters during computation. The total parameter count is 671 billion (the model takes up about 400 GB), but only about 37 billion are activated while processing each request, which strikes a balance between performance and efficiency. In other words, it saves time and resources.
To achieve this, submodels with different areas of expertise (a «mixture of experts») are created. Depending on the user’s request, only the relevant experts are activated, and the computation is distributed among them.
The MoE architecture (which, incidentally, can also be a building block of AI agents) consists of several independent expert subnetworks specializing in different aspects of data processing. Its main components, illustrated in the sketch after this list, are:
Gating Network
Experts
Combining results (Weighted Summation)
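To make these pieces concrete, here is a minimal, self-contained sketch of such a layer in PyTorch. The sizes (8 experts, top-2 routing, 64-dimensional tokens) are illustrative toy values, not DeepSeek’s configuration:

```python
# A minimal Mixture-of-Experts layer, for illustration only. It shows the
# three components named above (gating network, experts, weighted summation),
# not DeepSeek's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Gating network: scores each expert for the incoming token.
        self.gate = nn.Linear(dim, num_experts)
        # Experts: independent feed-forward subnetworks.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Only the top-k experts run per token, so most
        # parameters stay idle for any given input.
        scores = self.gate(x)                                  # (tokens, num_experts)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)  # pick the k best experts
        weights = F.softmax(weights, dim=-1)                   # normalize their weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():  # weighted summation of the chosen experts' outputs
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

layer = MoELayer(dim=64)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```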
Thus, MoE activates only a fraction of the experts during computation, which reduces the cost of inference (running the model). Experts can also specialize in different types of tasks, which makes MoE more powerful than traditional dense transformer models. In addition, increasing the number of experts does not require a proportional increase in training costs. However, the gating network must be configured carefully so that a few experts are not overused while others sit idle (one classic countermeasure is sketched below). And even though only some of the experts are activated per token, the entire model still has to be held in memory.
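One classic way to keep the gate honest is an auxiliary load-balancing loss in the style of Switch Transformer-type MoE training. The sketch below is that textbook version, for illustration only; DeepSeek-V3’s technical report describes its own auxiliary-loss-free balancing strategy:

```python
# Textbook auxiliary load-balancing loss for MoE routing: it is minimized
# when the fraction of tokens routed to each expert and the average gate
# probability each expert receives are both uniform.
import torch
import torch.nn.functional as F

def load_balancing_loss(gate_logits: torch.Tensor, top_k_idx: torch.Tensor,
                        num_experts: int) -> torch.Tensor:
    probs = F.softmax(gate_logits, dim=-1)                          # (tokens, num_experts)
    # Fraction of tokens actually routed to each expert...
    routed = F.one_hot(top_k_idx, num_experts).float().sum(dim=1)   # (tokens, num_experts)
    load = routed.mean(dim=0)
    # ...versus the average gate probability each expert receives.
    importance = probs.mean(dim=0)
    return num_experts * torch.sum(load * importance)

logits = torch.randn(16, 8)              # 16 tokens, 8 experts (toy sizes)
_, idx = torch.topk(logits, k=2, dim=-1)
print(load_balancing_loss(logits, idx, num_experts=8))
```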
According to the DeepSeek-V3 technical report, each MoE layer combines one shared expert with 256 routed experts, of which only 8 are activated while processing each token.
Another important feature of DeepSeek-R1 is that it generates a «chain of thought» (Chain of Thought, CoT) before producing an answer. This approach lets the model improve the accuracy and logical consistency of its answers, especially on complex tasks requiring multistep reasoning.
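The chain of thought is visible through DeepSeek’s OpenAI-compatible API. The sketch below assumes the model name deepseek-reasoner and the reasoning_content response field from DeepSeek’s public API documentation; treat both as assumptions that may change over time:

```python
# Hedged sketch: querying DeepSeek-R1 through its OpenAI-compatible API and
# printing the chain of thought separately from the final answer.
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed model name, per DeepSeek's docs
    messages=[{"role": "user", "content": "A bat and a ball cost $1.10 together. "
               "The bat costs $1 more than the ball. How much is the ball?"}],
)
msg = resp.choices[0].message
print(msg.reasoning_content)  # the intermediate reasoning steps (assumed field)
print(msg.content)            # the final answer
```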
Interestingly, DeepSeek first trained a precursor, R1-Zero, using reinforcement learning without any preliminary supervised fine-tuning stage (the released R1 adds a small «cold-start» fine-tuning step before RL). This approach let the model develop reasoning and decision-making skills from feedback alone, making it capable of complex logical inference and coherent text generation.
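The R1 paper calls this algorithm Group Relative Policy Optimization (GRPO): for each prompt the model samples a group of answers, scores them with rule-based rewards (for example, correctness and formatting), and normalizes each reward against the group’s own mean and standard deviation, so no separate critic network is needed. A toy sketch of that normalization (the rewards are invented numbers, not DeepSeek’s actual reward functions):

```python
# GRPO-style advantages in miniature: each sampled answer's reward is
# normalized against its group's statistics, replacing a learned value model.
import numpy as np

def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Suppose 4 sampled answers to one math prompt earned these rule-based
# rewards (1.0 = correct and well-formatted, 0.0 = wrong):
rewards = np.array([1.0, 0.0, 1.0, 0.0])
print(group_relative_advantages(rewards))  # correct answers get positive advantage
```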
To increase efficiency and reduce hardware requirements, the developers also used distillation, creating simplified versions of the model that retain the main capabilities of the original but have far fewer parameters. In particular, they released distilled models based on Llama (the LLM family by Meta AI) and Qwen (the LLM family by Alibaba), ranging from 1.5 to 70 billion parameters.
Distilled versions can be deployed locally on your own hardware. There are several versions that you can find and download at this link; a minimal example of loading one of them is sketched below.
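For instance, the smallest published distilled checkpoint can be loaded with Hugging Face transformers. This is a minimal sketch, assuming you have the transformers and accelerate packages installed and enough memory for the 1.5B model:

```python
# Running a distilled DeepSeek-R1 model locally. The checkpoint name is the
# one published by DeepSeek on Hugging Face; the 1.5B variant fits on a
# consumer GPU (or runs, slowly, on a CPU).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

inputs = tokenizer("Solve step by step: what is 17 * 23?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```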
DeepSeek-R1 is most often compared to OpenAI’s o1. It demonstrates similar results on math, coding, and reasoning tasks, but achieves them at a much lower cost: using DeepSeek-R1 costs about $0.55 per million input tokens, versus about $15 for the same volume with OpenAI o1 (0.55 / 15 ≈ 3.7%, i.e., roughly 96% cheaper per input token).
According to Andriy Nikonenko, who works on Machine Learning & Data Science at Turnitin, independent tests have shown that OpenAI’s o1 is slightly superior to DeepSeek-R1, and the latest Anthropic Claude 3.5 Sonnet and Google Gemini 2.0 are better than DeepSeek-V3.
At the same time, DeepSeek-R1 and V3 are strong open-source models that set a new, higher baseline for open LLMs and outperform the LLaMA models. R1 may well become the benchmark for open-source reasoning models, making low-cost production AI more accessible.
Let’s start with the good news. Ordinary users benefit from this competition, which makes the technology more accessible. The recent release of the Sky-T1 model, which is also capable of reasoning and cost only about $450 to train, showed that powerful models can be cheap to build.
The advent of DeepSeek-R1 pushed OpenAI to open up general access to powerful models and lower prices for some services.
And now for the bad news.
Do not forget that DeepSeek-R1 is an artificial intelligence model from China, so you should be careful when using it. Chinese gadgets have repeatedly been caught leaking their owners’ information, and DeepSeek, as it turns out, is no exception. Wiz Research recently discovered that a DeepSeek database openly accessible on the Internet was not properly secured, allowing anyone to access over a million records, including user chat history, API keys, and other system parameters.
Most importantly, this vulnerability allowed for complete control of the database and potential privilege escalation within the DeepSeek environment, without any authentication or protection mechanism from the outside world.
Wiz researchers discovered the vulnerability by noticing ports 8123 and 9000 open on DeepSeek’s servers, which pointed to a publicly accessible interface to a ClickHouse database. After the problem was reported, DeepSeek closed access to the interfaces; however, it is unknown how much data had already been copied by unauthorized parties.
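To see why this mattered: port 8123 is ClickHouse’s HTTP interface, which accepts SQL directly in a URL parameter, so an exposed, unauthenticated instance is effectively a public database. A minimal illustration with a placeholder host (not DeepSeek’s real address):

```python
# Why an open port 8123 equals an open database: ClickHouse's HTTP interface
# accepts SQL via the "query" parameter, with no authentication unless one
# is explicitly configured.
import requests

host = "db.example.com"  # hypothetical exposed server, for illustration only
resp = requests.get(f"http://{host}:8123/", params={"query": "SHOW TABLES"})
print(resp.text)  # on an unsecured instance, this lists every table
```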
Some studies have also shown that DeepSeek-R1 can spread information aligned with Beijing’s official position and does not always provide accurate data: DeepSeek avoids answering 85% of questions on «sensitive topics» related to China.
For example, when asked about politically sensitive topics such as the Tiananmen Square events, the model refuses to answer.
In addition, there are concerns about the privacy of user data. There are already reports that DeepSeek collects data about users, including hardware details: IP addresses, phone models, language, and even «keystroke patterns or rhythms», and then sends it to servers in China.
Moreover, when DeepSeek-V3 was launched, there were suspicions of data theft from OpenAI: during testing, the Chinese AI model called itself ChatGPT. Later, one of the ChatGPT developers stated that DeepSeek uses OpenAI data to train its models. Former Meta developer Yangshun Tai also noticed suspicious compatibility between the DeepSeek and OpenAI client libraries: the Chinese company saved weeks of Node.js and Python client development by simply reusing OpenAI’s libraries.
It’s worth remembering that DeepSeek is a relatively new player in the artificial intelligence field. According to Wikipedia, it was founded in April-May 2023. The company’s hiring strategy focuses on technical aptitude rather than work experience, which leads to a workforce that consists mostly of recent graduates or developers with less established careers in AI.
News about the technical and financial benefits of DeepSeek’s AI models led many organizations and startups to rush to implement these tools in their products. However, they forgot that such steps also involve the transfer of confidential data. And this requires a high degree of trust…
In 1956, the term «artificial intelligence» (AI) was coined at the Dartmouth Conference in the United States, the event that became the starting point for active research in the field. A few years earlier, in 1950, the British mathematician Alan Turing had proposed the idea of a test to determine whether a machine can demonstrate intelligent behavior indistinguishable from a human’s.
Then, in 2017, China announced its plan to become the world leader in AI by 2030. Significant funding for research and development in this niche, along with government support, drove rapid progress: the Chinese companies Baidu, Alibaba (owner of the well-known AliExpress), and Tencent began developing AI technologies, and the government rolled out AI in areas including security and healthcare.
At the end of his presidency, Joe Biden signed an executive order to speed up the construction of data centers and other AI infrastructure in the United States. After coming to power in 2025, Donald Trump declared his intention to make the United States the leader in AI technologies and launched a project called Stargate, led by OpenAI, the Japanese conglomerate SoftBank, and Oracle. It envisages $500 billion of investment in US artificial intelligence infrastructure and the construction of 10 data centers in Texas over the next four years, with other states to follow.
And just a couple of weeks later, DeepSeek-R1 was released. Because this model can do much of what o1 does, but for free, OpenAI is reportedly considering halving the cost of the ChatGPT Plus subscription to $10. Against the backdrop of the escalating race with China, OpenAI has even given US government agencies special access to its models on dedicated infrastructure: ChatGPT Gov.
And then came the second blow: the Chinese tech giant Alibaba released its own generative AI model, Qwen2.5-Max, claiming it is better than DeepSeek V3.
Meanwhile, the European Union is trying to keep up with its competitors. The EU Artificial Intelligence Act (AI Act) came into force on August 1, 2024, establishing rules for the development and use of AI aimed at ensuring that the technology is safe and ethical. As for the AI technology race itself, while the US and China are actively investing and showing results, the EU has only just launched a «simplification» program to cut red tape and spur innovation.
At the end of January 2025, the European Commission published a document entitled «Competitiveness Compass», which sets out the EU’s economic development plan for the next five years, including measures to develop «green» technologies, artificial intelligence, and quantum computing. It proposes creating «AI Gigafactories» that would let startups and researchers train and develop models, and a separate strategy will focus on applying AI in sectors such as manufacturing, automotive, and financial services. The proposals also include initiatives in biotechnology, robotics, and space technology.
European Commission President Ursula von der Leyen emphasized that the EU’s business model over the past 20-25 years has relied on «cheap labor from China, probably cheap energy from Russia» and «partly on security outsourcing», but «those days are over».
On January 30, the European Commission adopted the fifth annual Work Programme under the European Defence Fund (EDF 2025), allocating more than €1 billion for joint defense research and development projects. The EDF 2025 Work Programme covers an artificial intelligence technology challenge as well as a call for research and development (R&D) «to promote synergies between civilian and defense innovation, focusing this year on space, energy resilience, ground combat and cyberspace».
The AI race is gaining momentum every day. While the US is leading the way in basic research and development of innovative technologies, China is focusing on large-scale implementation of AI in various industries, taking advantage of the large amount of data and rapid implementation. The EU, as usual, is lagging.