
This May, thirty leading mathematicians from around the world gathered in Berkeley, California, to pit their skills against OpenAI's o4-mini chatbot.
For two days, the mathematicians challenged the chatbot, which is built on a large language model, with complex math problems and were stunned by the result: o4-mini produced answers and solutions at nearly a professorial level.
"I have colleagues who have literally said that these models are close to mathematical genius," said Ken Ono, a mathematician at the University of Virginia who led the meeting.
Models such as o4-mini are lighter and more flexible than their predecessors. Like other large language models, they are trained to predict the next element in a sequence; o4-mini was additionally trained on specialized datasets with strong reinforcement from human experts.
To evaluate the effectiveness of o4-mini, OpenAI first turned to Epoch AI, a nonprofit organization that created 300 math problems for benchmarking LLMs. Traditional AI models answered only 2% of the tasks correctly, but OpenAI's model significantly outperformed them.
In September 2024, Epoch AI hired Elliot Glaser, who had recently received his PhD in mathematics, to work on a new benchmark collaboration, FrontierMath. The project included a large number of questions at various difficulty levels; the first three tiers covered undergraduate, graduate, and research-level tasks. In April 2025, Glaser found that o4-mini could solve only 20% of the problems.
After that, the analyst decided to test how the LLM would cope with the fourth tier of tasks, intended for academics. Only a small group of people in the world could even devise such questions, let alone answer them. The mathematicians who participated had to sign a non-disclosure agreement and communicate exclusively through the Signal messaging app.
Every problem that o4-mini failed to solve earned a mathematician a $7,500 prize. For the decisive meetings, the 30 leading mathematicians were divided into groups of six. Over two days, the scientists competed with one another to devise problems they could solve themselves but that would stump the OpenAI model.
According to Ken Ono, he posed a number-theory problem at the level of a strong doctoral dissertation. Over the next ten minutes, he watched o4-mini unfold the solution and display its reasoning. The chatbot spent the first few minutes surveying the literature on the topic, then solved a simpler problem as a warm-up exercise. Five minutes later, o4-mini presented a correct but daring solution to the problem Ono had posed.
"I was not prepared for such a confrontation with an LLM. I have never seen that kind of reasoning in a model before. This is exactly what scientists do. It's scary," Ono admits.
Although the mathematicians found ten problems that proved beyond o4-mini, they were impressed by the AI's progress over the past year: the model took only minutes to complete tasks that would have taken a human expert weeks or months.
As a result, the mathematicians concluded that AI models will probably soon be able to solve problems so complex that even the world's leading experts cannot. At that point, mathematicians may only need to pose the questions and wait for an answer.
Source: LiveScience