Japanese researchers from Osaka University have discovered that Vision Transformer (ViT) artificial intelligence models can spontaneously develop mechanisms for processing visual information similar to those of humans.
In a new study, the researchers demonstrated that the right training method allows AI to independently recreate human-like visual processing mechanisms. They compared human eye-tracking data with the visual attention patterns produced by ViT models. The models were trained with DINO, a self-supervised learning method, without fixed filters for image analysis.
The ViT models trained with DINO processed visual information in a way close to how adults watch video clips. By contrast, models trained with fixed filters and algorithms showed unnatural visual processing.
"Our models didn’t just randomly pay attention to visual scenes, they spontaneously developed specialized functions. One subgroup of models consistently focused on faces, another captured the contours of entire figures, and the third mainly paid attention to background features. This accurately reflects the way human visual systems segment and interpret scenes," explains Takuto Yamamoto, lead author of the study.
Further careful analysis confirmed that the abilities that brought the AI models' visual processing closer to that of humans arose naturally as a result of DINO training. These visual processing patterns were both qualitatively similar to human gaze and quantitatively consistent with established eye-tracking data, especially in scenes involving people.
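The article does not say which measure the researchers used for the quantitative comparison. As an illustration only, here is a minimal sketch of one measure commonly used in gaze research, the normalized scanpath saliency (NSS): the model's attention map is z-scored, and the values at the points where people actually looked are averaged. The function name, the toy map, and the fixation coordinates are all hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev


def normalized_scanpath_saliency(attention_map, fixations):
    """Average z-scored attention value at human fixation points.

    attention_map: 2D list of floats (rows x cols) for one video frame.
    fixations: list of (row, col) gaze positions recorded for that frame.
    Positive scores mean people looked where the model attended more than average.
    """
    values = [v for row in attention_map for v in row]
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return 0.0  # a flat map expresses no preference at all
    scores = [(attention_map[r][c] - mu) / sigma for r, c in fixations]
    return mean(scores)


# Made-up 3x3 attention map with a single peak in the centre.
toy_map = [
    [0.0, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.1, 0.0],
]

# Hypothetical gaze data: one viewer looks at the peak, another at the corners.
on_peak = normalized_scanpath_saliency(toy_map, [(1, 1)])
off_peak = normalized_scanpath_saliency(toy_map, [(0, 0), (2, 2)])
print(on_peak > 0, off_peak < 0)
```

In this toy case the score is positive when the gaze lands on the attended region and negative when it lands elsewhere, which is the kind of agreement a quantitative comparison with eye-tracking data looks for.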
"This result is remarkable because these models have never been told what a face is. However, they learned to prioritize faces, apparently because this maximized the information they received from the environment. This is a compelling demonstration that self-supervised learning can capture something fundamental about how intelligent systems, including humans, learn from the world," notes Shigeru Kitazawa, senior author of the study.
The results of the study were published in the journal Neural Networks.
Source: TechXplore