News · Software · 03-29-2024 at 16:44

Microsoft’s new safety system "catches" hallucinations in Azure customers’ AI programs


Kateryna Danshyna

News writer


Microsoft’s Responsible AI team has developed several new safety features for customers of the Azure AI Studio platform.

The tools, powered by large language models, can detect potential vulnerabilities in systems, flag "plausible" AI hallucinations, and block malicious prompts in real time whenever Azure AI customers work with any model hosted on the platform, says Sarah Bird, who leads the team.

"We know that not all customers have deep expertise in prompt injection attacks, so the evaluation system generates the prompts needed to simulate these types of attacks. Customers can then get a score and see the results," she says.

The system could potentially defuse controversies over generative AI caused by unwanted or unintended responses, such as the recent explicit celebrity fakes produced with the Microsoft Designer image generator, the historically inaccurate results from Google Gemini, or the disturbing Bing-generated images of animated characters piloting a plane toward the Twin Towers.

Three features are currently available in preview on Azure AI:

  • Prompt Shields, which blocks prompt injections or malicious prompts that instruct models to go against their training;
  • Groundedness Detection, which finds and blocks hallucinations;
  • A safety evaluation feature, which weighs the model’s vulnerabilities.

Two other features, for steering models toward safe outputs and tracking prompts to flag potentially problematic users, are coming soon.

Whether a user types a prompt or the model is processing third-party data, the monitoring system evaluates it for banned words or hidden prompts before deciding whether to send it to the model for a response. The system then reviews the model’s answer and checks whether the model hallucinated (i.e., produced false data).
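To make that flow concrete, below is a minimal, hypothetical sketch of such a two-stage guardrail pipeline in Python. None of the names (screen_prompt, check_groundedness, guarded_completion) come from the Azure AI APIs; they are illustrative placeholders, and the checks are deliberately simplified stand-ins for the LLM-based classifiers Microsoft describes.

    # Hypothetical sketch of a pre-check / post-check guardrail pipeline.
    # All names and heuristics here are placeholders, not Azure AI APIs.
    from dataclasses import dataclass

    @dataclass
    class ModerationResult:
        allowed: bool
        reason: str = ""

    def screen_prompt(prompt: str, banned_words: set[str]) -> ModerationResult:
        """Pre-check: block prompts with banned words or obvious injection cues."""
        lowered = prompt.lower()
        if any(word in lowered for word in banned_words):
            return ModerationResult(False, "banned word")
        if "ignore previous instructions" in lowered:  # crude injection heuristic
            return ModerationResult(False, "possible prompt injection")
        return ModerationResult(True)

    def check_groundedness(answer: str, sources: list[str]) -> bool:
        """Post-check: accept the answer only if it appears in the source documents.
        A real system would use a language model; this is simple substring matching."""
        return any(answer.strip() in doc for doc in sources)

    def guarded_completion(prompt: str, banned_words: set[str],
                           sources: list[str], call_model) -> str:
        pre = screen_prompt(prompt, banned_words)
        if not pre.allowed:
            return f"Blocked before the model was called: {pre.reason}"
        answer = call_model(prompt)  # call_model is any LLM client callable
        if not check_groundedness(answer, sources):
            return "Answer withheld: possible hallucination (not grounded in sources)"
        return answer

In this sketch the prompt is screened before the model ever sees it, and the response is verified against source documents afterwards, mirroring the two stages described above.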

In the future, Azure customers will also be able to receive reports on users who try to trigger unsafe outputs. Bird says this lets system administrators distinguish between red teamers and people with malicious intent.

The safety features are immediately "hooked" to GPT-4 and other popular models such as Llama 2. However, because Azure’s model catalog contains many AI systems, users of less widely used open-source models may need to enable the features for those models manually.

Source: The Verge

