
Researchers in the United States have developed a method of training artificial intelligence models that automatically removes labeling errors from the data before learning begins.
A team of scientists from the Center for AI Autonomy at Florida Atlantic University's College of Engineering and Computer Science has developed a method for automatically detecting and removing mislabeled examples that would otherwise degrade the performance of AI models. Many such models rely on support vector machines (SVMs), a family of supervised learning algorithms that determine the decisions the AI ultimately makes.
This approach is widely used in training AI models for image and speech recognition, as well as in medical diagnostics and text analysis. During training, the model searches for the boundary that best separates the different categories of data. If some examples are mislabeled, they can distort that decision boundary and reduce the model's performance in real-world conditions.
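As a rough illustration of this fragility (not an experiment from the study), the sketch below trains a linear SVM with scikit-learn on two clean clusters, then flips a single label and retrains. The toy data, the C value, and the comparison of hyperplane coefficients are all illustrative choices.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated 2-D clusters, 50 points per class
X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)),
               rng.normal(2.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clean = SVC(kernel="linear", C=100.0).fit(X, y)

# Flip one label deep inside class 0 to mimic an annotation error
y_noisy = y.copy()
y_noisy[0] = 1
noisy = SVC(kernel="linear", C=100.0).fit(X, y_noisy)

# The separating hyperplane shifts even though 99% of the data is unchanged
print("clean boundary normal:", clean.coef_[0])
print("noisy boundary normal:", noisy.coef_[0])
print("support vectors: clean =", clean.n_support_.sum(),
      "| noisy =", noisy.n_support_.sum())
```

Because the boundary depends on only a handful of support vectors, a single flipped label near the margin (or deep inside the other class) is enough to tilt it.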
Before training begins, the researchers apply a technique that automatically screens the data for anomalous examples that do not fit the rest of the set. Such points are flagged and removed, so the model works only with validated information from the start.
"SVMs are among the most powerful and widely used classifiers in machine learning, with applications ranging from cancer detection to spam filtering. What makes them particularly effective, but also uniquely vulnerable, is that they rely on only a small number of key data points, called support vectors, to draw the line between different categories. The consequences of this can be serious, whether it is a missed cancer diagnosis or a security system that cannot recognize a threat," explains Professor Dimitris Pados.
The removal of corrupted data from the training set is based on a mathematical technique called L1-norm principal component analysis (L1-PCA). It identifies and removes suspicious data points within each category based solely on how well they conform to the rest of that category's data.
"Data points that appear to deviate significantly from the others, often due to labeling errors, are flagged and deleted. Unlike many existing methods, this process does not require manual tuning or user intervention and can be applied to any AI model, making it scalable and practical," Dimitris Pados notes.
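The article does not give the full algorithm, but the general flow can be sketched in plain NumPy. The snippet below computes a class's leading L1-norm principal component using the well-known fixed-point iteration of Kwak (2008) and flags the points with the largest residual from that one-dimensional subspace. The residual-based scoring rule, the single-component subspace, and the keep_frac cutoff are assumptions for illustration, not the authors' published procedure.

```python
import numpy as np

def l1_pca_component(X, n_iter=100, seed=0):
    """Leading L1-norm principal component via the fixed-point
    iteration of Kwak (2008): w <- normalize(X.T @ sign(X @ w))."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        s = np.sign(X @ w)
        s[s == 0] = 1.0                      # break sign ties
        w_new = X.T @ s
        w_new /= np.linalg.norm(w_new)
        if np.allclose(w_new, w):
            break
        w = w_new
    return w

def keep_mask(X_class, keep_frac=0.95):
    """Keep the keep_frac of points that best conform to the class's
    L1-PCA subspace (residual scoring is an illustrative assumption)."""
    Xc = X_class - X_class.mean(axis=0)      # center the class
    w = l1_pca_component(Xc)
    resid = np.linalg.norm(Xc - np.outer(Xc @ w, w), axis=1)
    return resid <= np.quantile(resid, keep_frac)

# Example: drop flagged points from one class before training
X_class = np.random.default_rng(1).normal(size=(200, 5))
X_clean = X_class[keep_mask(X_class)]
print(X_class.shape, "->", X_clean.shape)    # (200, 5) -> (190, 5)
```

Because the sign-based iteration is far less sensitive to extreme values than ordinary (L2) PCA, points that deviate sharply from the class's dominant structure stand out in the residuals, which matches the deviation-based flagging the researchers describe.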
The results of the study were published in the journal IEEE Transactions on Neural Networks and Learning Systems.
Source: TechXplore