News Software 03-25-2024 at 08:41

AI models can be easily poisoned by buying $60 worth of domains or editing Wikipedia — research


Andrii Rusanov

News writer

A group of artificial intelligence researchers recently discovered that for as little as $60, an attacker can tamper with the datasets that AI tools such as ChatGPT are trained on.

Chatbots and image generators produce complex responses and images by learning from terabytes of data scraped from the Internet. Florian Tramer, associate professor of computer science at ETH Zurich, says this is an effective way to learn. But it also means that AI tools can be trained on false data, which is one reason chatbots can exhibit biases or simply give wrong answers.

In a study published on arXiv, Tramer and a team of scientists set out to answer whether it is possible to deliberately «poison» the data on which an artificial intelligence model is trained. They found that with some spare cash and access to standard technical tools, even a low-skill attacker can tamper with a relatively small amount of data, and that this is enough to make a large language model produce incorrect answers.

The researchers considered two types of attacks. One is to purchase expired domains, which can cost as little as $10 per year each, and host the desired content at the URLs those domains once served. For $60, an attacker can effectively control and «poison» at least 0.01% of a data set.
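The economics of this attack can be sketched with some back-of-the-envelope arithmetic. The function and all the concrete numbers below (dataset size, URLs hosted per domain) are illustrative assumptions, not figures from the paper; only the $10-per-domain price and the 0.01% target come from the article.

```python
# Back-of-the-envelope sketch of the expired-domain attack economics.
# The dataset size and URLs-per-domain values are hypothetical.

def poisoning_budget(dataset_size, target_fraction, urls_per_domain, cost_per_domain):
    """Estimate the cost to control `target_fraction` of a dataset's URLs
    by buying expired domains that each host `urls_per_domain` of them."""
    urls_needed = int(dataset_size * target_fraction)
    domains_needed = -(-urls_needed // urls_per_domain)  # ceiling division
    return domains_needed * cost_per_domain

# Assumed scenario: a 6-million-URL dataset, targeting 0.01% of it,
# with each expired domain covering ~100 dataset URLs at $10/year.
cost = poisoning_budget(6_000_000, 0.0001, 100, 10)
print(cost)  # 60
```

The point the sketch makes is that the cost scales with the number of domains, not the number of URLs: one cheap domain can cover every dataset URL that once pointed at it.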

The researchers tested this attack by analyzing the datasets that other researchers rely on to train real large language models and purchasing expired domains that appeared in them. The team then tracked how often researchers downloaded data from the domains that now belonged to the research team.

«A single attacker can control a significant portion of the data used to train the next generation of machine learning models and influence how that model behaves,» says Tramer.

The scientists also investigated the possibility of poisoning Wikipedia, since the site can serve as a primary data source for language models. Despite making up only a small share of the Internet, Wikipedia's relatively high-quality text makes it a valuable source for AI training. A fairly simple attack involved editing Wikipedia pages.

Wikipedia discourages researchers from scraping data directly from its site, instead providing snapshots of pages that they can download. These snapshots are taken at known, regular, predictable intervals. That means an attacker can edit Wikipedia just before the site takes a snapshot, leaving moderators no time to undo the changes.

«This means that if I want to put garbage on a Wikipedia page… I’ll just do some math, figure out that this particular page will be saved tomorrow at 3:15pm, and then I’ll add garbage to it at 3:14pm tomorrow».
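The timing logic described in that quote can be sketched as follows. The snapshot schedule (a fixed period from a known epoch) and the one-minute lead are hypothetical illustrations of a predictable schedule, not the actual Wikipedia dump timetable.

```python
# Sketch of the snapshot-timing attack: if snapshots follow a known,
# predictable schedule, the attacker can compute when to edit.
# The 14-day period, epoch, and 1-minute lead are hypothetical.

from datetime import datetime, timedelta

def next_snapshot(now, period=timedelta(days=14), epoch=datetime(2024, 1, 1)):
    """Return the next snapshot time, assuming snapshots occur every
    `period`, counted from a publicly known `epoch`."""
    periods_done = (now - epoch) // period + 1
    return epoch + periods_done * period

def edit_time(now, lead=timedelta(minutes=1)):
    """The attacker edits just before the snapshot, so moderators have
    no time to revert before the page is captured."""
    return next_snapshot(now) - lead

now = datetime(2024, 3, 20, 12, 0)
print(edit_time(now))  # 2024-03-24 23:59:00
```

Randomizing the snapshot times, as the researchers later suggest, breaks exactly this calculation: `next_snapshot` stops being predictable, so the attacker no longer knows when to slip the edit in.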

The scientists did not poison the data in real time; instead, they calculated how effective an attacker could be. Their very conservative estimate was that at least 5% of an attacker's edits would make it into the snapshots. The percentage is usually higher, but even this is enough to provoke undesirable behavior in a model.

The research team shared its results with Wikipedia and suggested security measures, including randomizing the times at which the site takes snapshots of pages.

«At this point, the models we have are fragile enough that they don’t even need to be poisoned,» says Tramer.

Source: Business Insider

