OpenAI CEO Sam Altman / Depositphotos
The developer of ChatGPT told the Financial Times that it has evidence that China’s DeepSeek used OpenAI data to train a competing artificial intelligence model.
The claim concerns signs of «distillation», a technique that developers use to improve the efficiency of smaller models by training them on the outputs of larger, more powerful ones. Distillation is common practice in the industry, but OpenAI’s terms of service prohibit using it to build a competing model, which is what DeepSeek is suspected of doing.
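For readers unfamiliar with the technique, here is a minimal sketch of distillation in its classic soft-label form, where a smaller «student» model is penalized for diverging from a larger «teacher» model's softened output probabilities. The function names, the temperature value and the scores are illustrative assumptions; neither OpenAI nor DeepSeek has described which variant, if any, was involved, and distillation through a public API would typically mean training on generated text rather than on raw scores.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical scores for three candidate answers (illustrative numbers only).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # prefers a different answer

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

The student that agrees with the teacher gets the smaller loss; training would adjust the student's parameters to push this number toward zero.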
«The problem is that the model was created for its own purposes,» a person close to OpenAI told the publication.
OpenAI’s terms of service state that users cannot «copy» any of the company’s services or «use the results to develop models that compete with OpenAI».
The release of DeepSeek’s latest reasoning model shook the artificial intelligence market, sending the shares of key companies in the industry tumbling. Nvidia alone fell more than 17% and lost almost $600 billion in market value in a single day; on Tuesday, the situation stabilized somewhat and the company’s shares gained 9%.
OpenAI, together with its key partner Microsoft, reportedly investigated accounts believed to belong to DeepSeek that were in use last fall and blocked them on suspicion of distillation.
Earlier, entrepreneur David Sacks, who is responsible for artificial intelligence policy in the Donald Trump administration, said that «data theft is quite possible».
«There’s a technique in artificial intelligence called distillation … where one model learns from another model and sort of sucks the knowledge out of the parent model», Sacks told reporters. «There’s substantial evidence that DeepSeek has done the same thing, extracting knowledge from OpenAI’s models, and I don’t think OpenAI is very happy about that».
Experts say that for smaller Chinese and American AI labs, stealing training data from companies like OpenAI is commonplace, since a full-fledged training process requires heavy investment. As a reminder, DeepSeek claimed to have used a cluster of 2,000 Nvidia H800 graphics cards and a total of $5.6 million to train its V3 model with 671 billion parameters, while training GPT-4 alone cost $100 million. Suspicions of data theft had already surfaced when the Chinese model launched and the model itself claimed that «it was ChatGPT».
«We know that Chinese companies, and others, are constantly trying to copy the models of leading American AI companies», OpenAI wrote in its latest statement. «We are taking countermeasures to protect our intellectual property».
OpenAI itself is currently fighting allegations of copyright infringement from publishers and content creators, including a lawsuit from The New York Times, which claims that Sam Altman’s company trains its models on the newspaper’s articles without permission.