
It is 10 times faster than GPT-4o: Inception Labs unveils Mercury, the first commercial-scale diffusion language model

Published by Vadym Karpus

For a long time, there have been active discussions about finding a better architecture for large language models (LLMs) that could become an alternative to transformers. It seems that California-based startup Inception Labs already has a promising solution. The company has introduced Mercury, the world’s first diffusion-based large language model designed for commercial use.

According to the independent testing platform Artificial Analysis, Mercury is 10 times faster than today’s advanced models. Its performance exceeds 1000 tokens per second on NVIDIA H100 GPUs, which was previously possible only on specialized chips.

How does it work?

«Transformers dominate LLM text generation and generate tokens sequentially. Diffusion models offer an alternative: they generate the entire text at once through a coarse-to-fine process», explained Andrew Ng, founder of DeepLearning.AI, in a post on X.

This last point is key to understanding why Inception Labs' approach looks interesting. Transformer-based LLMs generate text autoregressively, meaning they predict words (or tokens) from left to right, one at a time. Diffusion is the technique artificial intelligence typically uses to generate images and video, and it works differently: instead of moving from left to right, it produces the whole text at once. Generation starts from «noise», which is gradually refined until a clean stream of tokens emerges.
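The coarse-to-fine idea can be sketched in a few lines of Python. This is a toy illustration only, not Mercury's actual algorithm: it starts from a fully masked "noisy" sequence and, at each step, reveals several positions in parallel (here by copying from a known target, where a real diffusion LM would predict them with a neural network).

```python
import random

MASK = "_"

def toy_denoise(target, steps=3, seed=0):
    """Toy coarse-to-fine generation: begin with an all-mask sequence
    and unmask a batch of positions at every step, in parallel.
    A real diffusion LM predicts tokens with a model; here we simply
    copy from `target` to visualize the refinement process."""
    rng = random.Random(seed)
    seq = [MASK] * len(target)
    hidden = list(range(len(target)))       # positions still masked
    history = ["".join(seq)]                # snapshot after each step
    for step in range(steps):
        # Reveal an equal share of the remaining positions per step.
        k = max(1, len(hidden) // (steps - step))
        for i in rng.sample(hidden, k):
            seq[i] = target[i]
            hidden.remove(i)
        history.append("".join(seq))
    for i in hidden:                        # fill any leftovers
        seq[i] = target[i]
    return history, "".join(seq)

history, final = toy_denoise("diffusion", steps=3)
for snapshot in history:
    print(snapshot)
```

Note how every step updates many positions at once; an autoregressive model would need one forward pass per token, which is the intuition behind the speed gap the benchmarks below report.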

Mercury could change the rules of the game and open up new opportunities for LLMs. And according to testing, this approach dramatically increases text generation speed.

Mercury speed and performance

In tests on standard coding benchmarks, Mercury outperformed high-speed models such as GPT-4o Mini, Gemini 2.0 Flash, and Claude 3.5 Haiku.

In particular, the Mercury Coder Mini version reached 1109 tokens per second.

Performance comparison of Mercury with other language models

Moreover, the startup stated that diffusion models have an advantage in reasoning and structured responses, since they are not limited to attending only to previous tokens.


In addition, they can continuously refine their output, reducing hallucinations and errors. Diffusion methods also power image and video generators such as Midjourney and Sora.

The company also criticized current reasoning approaches, which require significant computing resources to generate complex answers.

«Generating long reasoning chains leads to huge computing costs and unacceptable latency. To make high-quality artificial intelligence affordable, a paradigm shift is needed», Inception Labs said.

The startup has released a preview version of Mercury Coder so that users can test its capabilities.

Recently, Anthropic presented Claude 3.7 Sonnet, the first hybrid reasoning model and «the best AI for IT people».

Source: analyticsindiamag