
Ukrainian researchers have created the first External Independent Testing (EIT, or ZNO) exam for AI and showed that even top models like ChatGPT would fail to graduate from school.
Finally, someone has tested AI on something other than «dead grandmother» tricks for extracting Windows 7 activation keys or checks of whether AI is sexist. Researchers have created a multi-format test called ZNOVision, which checks AI's knowledge of 13 school subjects in Ukrainian. The test includes more than 4,300 tasks across various subjects: physics, math, history, literature, and others. More than half of the questions include diagrams, maps, or graphs. Some tasks require logical reasoning, while others require a precise understanding of the Ukrainian wording.
The researchers tested six models: GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, Qwen2VL72B, Paligemma3B, and PaligemmaFT. The models were run and the data processed on De Novo, a cloud infrastructure with GPU clusters certified to meet Ukrainian security requirements.
The result: not a single model reached 70% correct answers. It turns out AI is not only a bad therapist that can gradually drive you crazy, but it also loses to Ukrainian schoolchildren. Gemini 1.5 Pro performed best with 67.5%. Claude 3.5 scored 64.3%, Qwen2VL 51.2%, and GPT-4o only 47%. For comparison, answering at random would yield about 22%.
The worst results were on the visual-text tasks, where the models had to process images and Ukrainian text simultaneously. Claude answered 26.7% of these questions correctly, GPT-4o 29%, and Qwen2VL 34.4%. On similar tasks in English, this figure usually exceeds 60%.
«Artificial intelligence should not be a monopoly of a few languages. Ukrainian should sound as confident in the systems of the future as English. And we at De Novo believe that we can create the technological basis for this here in Ukraine», says Maxim Ageev, CEO of De Novo.
ZNOVision was created not just for the sake of experimentation. It can be used to test Ukrainian-language models for education, support automation, content moderation, and localization. Startups can use it to evaluate their own AI products, and EdTech services can use it as a basis for adaptive tests. For now, though, Ukrainian is poorly integrated into AI systems, which leads to many errors. That is hardly surprising: chatbots like ChatGPT are extremely sensitive to spelling even in English, the language they were primarily developed in.
Source: ZNOVision