Anthropic’s Claude 3 Opus large language model (LLM) has overtaken OpenAI’s GPT-4 on the Chatbot Arena leaderboard for the first time.

«The King is dead,» wrote software developer Nick Dobos on Twitter in a post comparing GPT-4 Turbo and Claude 3 Opus.

Chatbot Arena is an open, crowdsourced platform for evaluating large language models. Its rating is built from a large number of human judgments of the models’ output, aggregated with the Elo rating system. The test works as follows: a user enters a query and picks the best answer from several responses produced by different models. Thousands of such user votes are used to rank the models.
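To make the rating mechanics concrete, here is a minimal sketch of the textbook Elo update applied to pairwise votes. It is an illustration only, not Chatbot Arena’s actual code: the K-factor of 32, the starting rating of 1000, and the model names are all assumptions chosen for the example.

```python
def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A's answer was chosen, 0.0 if B's was, 0.5 for a tie.
    k=32 is an assumed K-factor, not a value taken from the leaderboard.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Whatever A gains, B loses, so the total rating stays constant.
    return rating_a + delta, rating_b - delta


# Hypothetical models, both starting from an assumed rating of 1000.
ratings = {"model_a": 1000.0, "model_b": 1000.0}

# Each vote: (first model, second model, score for the first model).
votes = [
    ("model_a", "model_b", 1.0),
    ("model_a", "model_b", 1.0),
    ("model_a", "model_b", 0.0),
]

for first, second, score in votes:
    ratings[first], ratings[second] = update_elo(
        ratings[first], ratings[second], score
    )

print(ratings)
```

The more surprising a result is, the larger the adjustment: a win over a much higher-rated model moves both ratings further than a win over an equal one.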

The Chatbot Arena leaderboard was launched on May 3, 2023, and GPT-4 was added to the ranking on May 10. Since then, one version of GPT-4 or another has consistently held the top spot. Until now. That is why the emergence of a new leader is noteworthy. Moreover, one of Anthropic’s smaller models, Haiku, has also attracted attention with its performance on the leaderboard.

«For the first time, the best available models — Opus for complex tasks, Haiku for economy and efficiency — are available from a non-OpenAI vendor,» said independent AI researcher Simon Willison. «It’s reassuring — we all benefit from the diversity of the leading vendors in this area. But GPT-4 has been around for over a year at this point, and it’s taken this year for anyone to catch up».

Google’s Bard (Gemini Pro) follows Claude 3 Opus and two versions of GPT-4 in the ranking. However, while the top three positions are separated by only 2-3 Elo points, Bard is already 45 points behind third place. All other competitors scored below 1200 points.
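For a sense of scale, the standard Elo formula turns a rating gap into an expected head-to-head win rate. The sketch below assumes the leaderboard uses the conventional 400-point logistic scale; the function name is made up for this example.

```python
def win_probability(rating_gap):
    """Expected win rate of the higher-rated model for a given Elo gap.

    Assumes the conventional 400-point logistic Elo scale.
    """
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))


# Gaps mentioned in the ranking: 3 points within the top three, 45 points to Bard.
for gap in (3, 45):
    print(f"{gap:>2}-point gap: {win_probability(gap):.3f}")
```

Under that assumption, a 3-point gap is almost a coin flip (about 50.4%), while a 45-point gap means the higher-rated model is expected to be preferred in roughly 56% of comparisons.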

