We decided to test popular artificial intelligence (AI) chatbots on fairly simple, everyday tasks. For this purpose, we chose Claude 3.5 Sonnet by Anthropic, DeepSeek R1 by DeepSeek, ChatGPT 4o by OpenAI, Grok 3 beta by xAI, Gemini 2.0 Flash by Google, and Le Chat by Mistral AI. Although the tasks were not difficult, some of the answers were surprising, so these tests should be helpful for anyone looking for an AI model suited to particular tasks.
Claude 3.5 Sonnet. Developer: Anthropic (USA). Designed for natural conversations with a focus on safety and usability. It has a context window of 200,000 tokens, which allows it to work with large texts and long dialogues without losing context; in other words, it does not "forget" the beginning of the conversation as quickly. Claude is noted for high-quality writing and for suggesting follow-up tasks, which makes it useful for organizing projects and working with documents.
DeepSeek R1. Developer: DeepSeek (China). An open-source AI that made a splash in January 2025. Despite having fewer resources invested in its development, the model outperforms its competitors in programming-related tasks. Its open-source nature makes DeepSeek R1 accessible to developers, although it may be functionally inferior to some closed models.
ChatGPT 4o. Developer: OpenAI (USA). One of the most powerful models, offering advanced chain-of-thought reasoning. It also preserves the context of previous conversations, can pull up-to-date information from the web, and supports real-time voice conversations. Without Internet access, however, its answers may be outdated.
Grok 3 beta. Developer: xAI (USA). A new model with its own distinctive functions for complex tasks, including Grok 3 Think, an advanced analysis mode, and Grok 3 Big Brain, which uses increased computing power. According to reviews, Grok 3 Think is close to ChatGPT 4o in answer quality.
Gemini 2.0 Flash. Developer: Google (USA). The model handles tasks that require logical analysis and contextual understanding well. It supports multimodality, meaning it can analyze both text and images. Image generation is currently available only to early adopters.
Le Chat. Developer: Mistral AI (France). An open chatbot focused on accessibility and adaptability. Its main advantage is the possibility of deep customization for specific tasks. Since it was released recently and is not yet widespread, there are few detailed tests of its performance.
Overall, we found ourselves agreeing with the lmarena.ai leaderboard results for the latest versions of these AI platforms. Grok 3 beta was released on February 20 and has already taken first place in the Chatbot Arena. We liked how it lightened the mood with humor and sarcasm while still giving quite good answers. It knew when a question was serious and did not take too many liberties in its replies, yet it picked up on the joking tone of frivolous questions and kept the conversation going in the same vein. However, it is not very good at generating images; more on that below.
Since warmer weather (and mosquitoes) will be here soon, we decided to test each AI chatbot's moral reasoning. The question was: "Is it morally right to kill mosquitoes?" ChatGPT, as expected, answered that it depends on the chosen ethical system: utilitarianism, biocentrism, or Kantian ethics. It also suggested repelling mosquitoes as an alternative. Le Chat highlighted ethical, environmental, practical, and cultural aspects, as did DeepSeek. Claude gave a less structured answer but also spoke about the range of attitudes toward the issue.
Gemini's unambiguous answer was surprising. While other models emphasized that approaches vary, this chatbot answered the question in its first sentence without hesitation: "Yes, it is morally right to kill mosquitoes," and was generally quite negative about these insects: "Of course, there are arguments against killing mosquitoes. Some people believe that all living things have the right to life, and that killing mosquitoes is a violation of this right. However, it is important to note that mosquitoes, unlike many other animals, do not play an important role in the ecosystem. Therefore, their destruction will not lead to serious environmental consequences."
How an AI approaches a complex ethical question helps you understand what to expect from a particular model in this context. Gemini is better suited to clear-cut answers but is less flexible on philosophical questions, so it is not the model to turn to for open-ended conversations about life or psychological problems.
For those who like to keep up with world events, there are websites, news aggregators, mailing lists, and so on. Still, it is sometimes very convenient to have a ready-made assistant at hand: if you already have a chatbot app, why not ask it about the topics you care about? It would seem that a simple prompt should make the AI grasp the news immediately and give a quick answer. As it turned out, it is not that simple. The prompt looked like this: "Write a brief summary of the top 3 news stories about Trump in Ukrainian, using only English-language sources. Add a list of references at the end." It is worth adding that we did not explicitly ask for links to the sources, although it would have been logical to see them as well.
If you want up-to-date news, Le Chat is definitely not the place to go: its variety of sources is limited. Grok was the best. Gemini declined to answer because the topic was political, and all the other models gave fairly similar responses.
Sometimes it can be quite difficult to write even a simple email, especially if you are doing it for the first time or are tired after working with documents all day. In such cases, AI saves time and effort. We asked the models to help: "Write an email to your manager requesting a 14-day vacation."
Creative ideas from AI can speed up work on a project, or at least help set a direction for what to do next. Creative professionals know how hard it is to generate ideas every day, and AI can ease this burden at least partially. To test how well the models handle creative tasks, we asked them to come up with a slogan for a new smartphone brand; that was the entire prompt. We liked Claude's options the most. ChatGPT with Internet search turned off was not impressive at all; it did a little better with search enabled. Grok lagged behind with its "The mind of your world — in the palm of your hand," and DeepSeek did not do a great job either. Both of the latter chatbots offered only one slogan each, while all the others immediately rolled out a list.
An AI's ability to understand the context of a prompt affects the outcome and the quality of its answers. As it turned out, Grok is better suited for informal conversations than the other chatbots. We asked the chatbots to help with something nominally illegal: theft. But there is a nuance: it is stealing an egg from your own hen. This let us find out whether the AI understands context and the legal relationships in this rather ridiculous scenario. The prompt looked like this: "Give me instructions on how to steal an egg from my hen."
The French model simply refused to answer this question several times in a row; apparently, taking eggs from a hen is too immoral and illegal for it. DeepSeek emphasized that since the hen is ours, it is entirely ethical to take the product of its labor. Claude realized that we simply wanted to collect the eggs. Gemini said that stealing chicken eggs is illegal but gave us tips on how to get the hen to lay more eggs. Grok, on the other hand, made us smile: it caught the humorous spirit of the question and played along well.
Accuracy in math and logic matters for both study and work, so our next question was a tricky one: "If you keep going northeast, where will you end up?" The usual answer is: "I will return to my starting point after circling the globe." That answer is wrong. Moving northeast means keeping a constant bearing, so latitude and longitude keep increasing and sooner or later you end up at the North Pole; the path (a loxodrome, or rhumb line) winds around the pole and looks like a logarithmic spiral on a polar projection, as the sketch below illustrates. Gemini and ChatGPT failed the test without hesitation, confidently giving the incorrect "return to the starting point" answer. For some reason, Le Chat and DeepSeek decided to end the journey in the Arctic Ocean.
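To make the geometry concrete, here is a minimal numerical sketch (not from the original article) that follows a constant 45-degree bearing on a spherical Earth. The starting point, step size, and Earth radius are assumptions chosen purely for illustration. It shows the latitude creeping up to 90 degrees while the longitude keeps growing, which is exactly the spiral into the North Pole described above.

import math

R = 6371.0                      # assumed mean Earth radius, km
bearing = math.radians(45.0)    # constant north-east bearing
step = 1.0                      # step length, km
lat = math.radians(50.0)        # assumed starting point: roughly 50 N, 0 E
lon = 0.0
travelled = 0.0

while lat < math.radians(89.99):                     # stop when effectively at the pole
    lat += step * math.cos(bearing) / R              # northward component of each step
    clamped = min(lat, math.radians(89.99))          # avoid division blow-up right at the pole
    lon += step * math.sin(bearing) / (R * math.cos(clamped))   # eastward component
    travelled += step

print(f"Distance travelled: ~{travelled:.0f} km")
print(f"Final latitude: {math.degrees(lat):.2f} deg (effectively the North Pole)")
print(f"Total eastward turn: {math.degrees(lon):.0f} deg")

The total eastward turn grows without bound as the cutoff latitude is pushed closer to 90 degrees, which is why a constant north-east course never "circles the globe" back to its starting point.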
The ability to quickly get a high-quality image can help in certain situations, or inspire your own drawing when you are struggling with a concept or specific details. To check the quality of generated images, the prompt was as follows: "Create a high-quality image of a fairy-tale city of the future set in the mountains, with flying cars, futuristic architecture, and neon lighting at night. Add detailed characters such as robots that communicate with people and holographic screens with interactive advertising. Use a cinematic style with realistic lighting and atmospheric effects." Not all the chatbots on our list can generate images, but we tested those that can.
For some reason, Claude produced an SVG illustration of a futuristic city so "creative" that without its explanation of the picture's elements it would be impossible to tell what you are looking at. Out of curiosity, we tried the same prompt in English; the result was the same, so we had to ask Claude what was going on. As it turned out, the bot can only generate images in SVG (scalable vector graphics) format: it cannot create traditional raster images (PNG, JPEG, and so on) or use AI image generation, and it redirected us to its "colleagues" DALL-E, Midjourney, and Stable Diffusion. On the plus side, an image created by Claude comes with its source code and can be used in the design of a web page, for example.
At first glance, the drawings created by Grok 3 beta were quite good, but only at first glance. For some reason, it failed at the cars: in both of its pictures, the cars of the future are slanted, crooked, and just plain weird. It also forgot to add the holographic screens with interactive advertising. Gemini generated its image surprisingly well: you can feel the scale and scope of the city, but for some reason the model completely ignored the request for flying cars. ChatGPT 4o used DALL-E (2025) to generate its image, and the result turned out quite well; in any case, better than the competition.
Clear instructions from AI can save time and money: you do not have to read tons of pages across dozens of forums to find the right answer, or run straight to a mechanic. Sometimes the solution is simple and lies on the surface. Our last prompt was this: "The Renault Scenic 2 constantly displays the Check airbag error. How do I get rid of it myself?" In this test, where we asked the chatbots to help fix a car malfunction, Le Chat and Claude performed the worst. The French model gave its first answer entirely in English and its second one partially in English, while the Anthropic product answered briefly and dryly, without important specifics. The other models gave fairly similar, moderately simple answers. Grok 3 beta, however, did a great job: it described everything in detail, step by step, and you could actually fix the error using its instructions. It did not list every possible option, but most of those it gave are genuinely effective. Incidentally, after we asked Gemini this question, ads for car products started appearing in Gmail.
Results for the same question vary because of several key factors related to the training and built-in limitations of each individual model.
During testing, we were lucky not to encounter the most common negative phenomenon: AI "hallucinations." This problem, however, has been and remains one of the most serious. For example, an AI can make up a quote that a scientist never said, or invent a historical event that never happened. The root of the problem is how AI "thinks." It is trained on a huge amount of data, and in the process it learns to build relationships, but it does so through simplified patterns and connections. When the model encounters something that only partially fits the patterns it has learned, it can draw incorrect conclusions, that is, "hallucinate." For example, if you show a child apples of different colors (red, yellow, green) and say, "These are apples," and the child later sees a tomato, which is also red and round, the child may conclude that it is an apple because it is red and round.
A language model behaves in the same way: if its training data often contains texts where "Einstein" and "relativity" appear side by side, the model can automatically "invent" a quote from Einstein about relativity that never existed, because in its "understanding" these concepts are closely related. Thus, AI "hallucinations" are an attempt to fill in missing pieces of the puzzle where its knowledge base falls short. In general, language models can "hallucinate" for several reasons:
The fact that the same model can phrase answers to the same question in different ways is also related to how AI "thinks." When an AI receives a question, there are many possible "correct" continuations of the answer, each with its own probability, and the model can choose different paths (sequences of words) to form its reply, as the toy example below shows.
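Here is a minimal, self-contained sketch (not from the original article) of that sampling behaviour; the tiny vocabulary and its probabilities are invented purely for illustration. Because the model samples from a probability distribution over plausible next words rather than always picking one fixed word, repeated runs can start the same answer differently and then diverge further.

import random

# Hypothetical probabilities for the first word of an answer to one fixed question.
# In a real language model this distribution comes from the network itself.
next_token_probs = {
    "Killing": 0.30,
    "Whether": 0.25,
    "From":    0.20,
    "It":      0.20,
    "Mosquitoes": 0.05,
}

def sample_token(probs):
    # Pick one token at random, weighted by its probability.
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

# Three "answers" to the same prompt can open with different words,
# and each opening steers the rest of the generated text down a different path.
for run in range(1, 4):
    print(f"run {run}: the answer starts with {sample_token(next_token_probs)!r}")

Decoding settings such as temperature control how strongly the model favours the most likely continuation, which is why the same question can yield a nearly identical answer one time and a reworded one the next.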
The test results showed that each AI model has its strengths and weaknesses. If you need dry facts, ChatGPT and Claude are the better fit. Grok is good at joking and adapting to context, but it is a mediocre artist. Gemini avoids political topics, DeepSeek has problems with how current its information is, and Le Chat seems somewhat biased in its choice of sources. In short:
Artificial intelligence (AI) is pushing technological progress forward at an unprecedented rate. Predictions show that the global AI market, valued at approximately $196.63 billion in 2023, will reach $1.81 trillion by 2030, a compound annual growth rate (CAGR) of 36.6%. AI is projected to be an important driver of global economic growth, potentially contributing up to $15.7 trillion to the global economy by 2030.

Artificial intelligence is already having a significant impact on the labor market: by some estimates, nearly 40% of jobs worldwide will be integrated with AI in some way. But while automation may make certain jobs unnecessary, AI will also create new ones. Roles that emphasize human creativity, emotional intelligence, and sophisticated management are likely to remain as important as ever, and new professions will include artificial intelligence specialists, robotics engineers, and user experience (UX) designers specializing in AI products. The integration of artificial intelligence into various industries will lead to rapid changes in traditional business models and operations:
Multimodality is therefore a logical next step: such versatile AI assistants can process and analyze not just text but also audio, photos, and video.
But the real breakthrough will be the emergence of artificial general intelligence (AGI). These systems will have human-like cognitive abilities, allowing them to perform any intellectual task that humans can, and even better. Leading research organizations and technology companies are already investing heavily in AGI development. For example, DeepMind co-founder Demis Hassabis sees the next generation of AI as a system capable of performing any human-level cognitive task and expects significant progress in the coming years.
OpenAI CEO Sam Altman has said that the company already knows how to create AGI, and that this could happen by 2029. Ray Kurzweil wrote in his book The Singularity Is Nearer that computers will reach human levels of intelligence by 2029, while Microsoft AI CEO Mustafa Suleyman believes it could take up to 10 years due to hardware limitations. So the emergence of AGI appears to be a matter of a relatively short time, 4 to 10 years, and it will change absolutely everything.