A recent study by researchers from City St George's, University of London and the IT University of Copenhagen has demonstrated that large AI language models interacting in groups can spontaneously converge on shared conventions.
In the course of the study, the researchers had groups of large language models interact with one another. The models did not simply follow scripts and templates: they self-organized, reaching agreement on linguistic norms.
"Most studies so far have considered large language models in isolation, but real-world AI systems will increasingly involve many interacting models. We wanted to know: can these models coordinate their behavior, reaching the kind of consensus that forms a community? The answer is yes, and what they do together is different from what they do separately," explains lead author Ariel Flint Ashery, a researcher at City St George's.
The experiments were run on groups of 24 to 200 AI agents. Within each group, two models were randomly paired and asked to play a naming game. In the classic version of this game, one participant chooses an object and proposes a name they associate with it, while the other tries to guess the object from that name. Here, both AI models were instead asked to pick a symbol, such as a letter or a random string of characters, from a set of options. If both models picked the same symbol, they earned points; otherwise they lost points. The agents were also shown which symbols their partner had chosen.
At the same time, each model had access only to a limited memory of its own recent interactions and no information about the actions or decisions of the rest of the group. Nevertheless, repeated pairwise interactions led to the sudden emergence of coordinated, group-wide choices in the absence of any central coordination. In addition, the scientists identified collective biases that could not be attributed to any specific, individual AI model.
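The setup the article describes is close to the classic minimal naming game from earlier research on language conventions. The sketch below is a hypothetical simplification of that game, not the paper's LLM-based implementation: agents draw symbols from a fixed pool, a failed interaction teaches the hearer the speaker's symbol, and a successful one makes both participants keep only the agreed symbol. Even with purely local, pairwise information, the population converges on a single shared name.

```python
import random
from collections import Counter

# Minimal naming-game sketch (a classic toy model of convention formation,
# NOT the authors' LLM-based setup): agents pick symbols from a fixed pool,
# a failed interaction teaches the hearer the speaker's symbol, and a
# successful one makes both participants keep only the agreed symbol.

N_AGENTS = 24                     # population size, as in the smallest groups
SYMBOLS = list("ABCDEFGHIJ")      # fixed pool of candidate names (assumed)
ROUNDS = 20_000

inventories = [set() for _ in range(N_AGENTS)]   # symbols each agent remembers

for _ in range(ROUNDS):
    speaker, hearer = random.sample(range(N_AGENTS), 2)   # random pairing
    if not inventories[speaker]:                           # nothing learned yet:
        inventories[speaker].add(random.choice(SYMBOLS))   # pick from the pool
    name = random.choice(sorted(inventories[speaker]))
    if name in inventories[hearer]:      # success: both collapse to the name
        inventories[speaker] = {name}
        inventories[hearer] = {name}
    else:                                # failure: hearer learns the name
        inventories[hearer].add(name)

# After enough pairwise rounds the whole population shares one convention
counts = Counter(s for inv in inventories for s in inv)
print("dominant convention:", counts.most_common(1))
```

In this toy model the consensus arises purely from the success rule collapsing individual inventories, which mirrors the article's point that no agent ever sees the group as a whole.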
"Bias does not always come from within a particular agent. We were surprised to see that it can emerge between agents, simply from their interactions. This is a blind spot in most of the current work on AI safety, which focuses on individual models," said senior author Andrea Baronchelli, a professor at City St George's.
In further experiments, the researchers saw that small, persistent groups of AI agents could push the rest of the population toward their choice through a kind of critical-mass effect. The study used the language models Llama-2-70b-Chat, Llama-3-70B-Instruct, Llama-3.1-70B-Instruct, and Claude-3.5-Sonnet, with similar results across all of them. The researchers view their work as a starting point for further study of how human and AI reasoning converge and diverge, with the goal of helping to combat some of the most serious ethical dangers posed by large language models spreading biases absorbed from society.
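The critical-mass effect can be illustrated in the same toy model (again a hedged sketch, not the paper's experiment) by adding a committed minority that always answers with one fixed symbol, here "Z", and never updates; the parameter N_COMMITTED is an arbitrary illustrative choice. When the minority is large enough, the established convention of the ordinary agents tips over to the committed symbol.

```python
import random
from collections import Counter

# Critical-mass sketch in the same toy naming game (hypothetical, not the
# paper's LLM experiment): committed agents always say "Z" and never update,
# while everyone else starts from an established convention on "A".

N_AGENTS = 24
N_COMMITTED = 6            # try 1-2 vs. 5-6 to see the tipping behaviour
ROUNDS = 50_000

committed = set(range(N_COMMITTED))
inventories = [{"Z"} if i in committed else {"A"} for i in range(N_AGENTS)]

for _ in range(ROUNDS):
    speaker, hearer = random.sample(range(N_AGENTS), 2)
    name = random.choice(sorted(inventories[speaker]))
    if name in inventories[hearer]:          # success: both collapse to the name,
        for i in (speaker, hearer):          # unless the agent is committed
            if i not in committed:
                inventories[i] = {name}
    elif hearer not in committed:            # failure: ordinary hearer learns it
        inventories[hearer].add(name)

ordinary = Counter(s for i, inv in enumerate(inventories)
                   if i not in committed for s in inv)
print("symbols held by ordinary agents:", ordinary.most_common())
```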
"This study opens up new possibilities for further research on AI safety. It shows the depth of the implications of this new kind of agent that has begun to interact with us and will jointly shape our future. Understanding how they work is key to guiding our coexistence with AI rather than being subject to it. We are entering a world where AI does not just talk: it negotiates, agrees, and sometimes disagrees about shared behaviors, just like us," Andrea Baronchelli emphasizes.
The results of the study were published in the journal Science Advances.
Source: TechXplore