Date: 2024-06-26

Etched is one of NVIDIA’s competitors in the AI processor market. The startup takes a different approach to chip design, similar to the way ASICs were built for cryptocurrency mining: it specializes in a single class of generative AI models, the so-called transformers. The chips will not run other kinds of models, but Etched claims they deliver an order of magnitude higher performance on the ones they do support. The newly announced Sohu processors run Llama 70B, and an eight-chip Sohu server is claimed to process more than 500,000 tokens per second. According to the company, one such server can replace 160 NVIDIA H100 GPUs.

Sohu is the first specialized chip for transformer models, according to Etched. The trade-off for performance far above that of existing general-purpose solutions is that Sohu cannot run CNNs, LSTMs, SSMs, or any other type of AI model. The chip is manufactured on TSMC’s 4 nm process.

The company says that every major AI product on the market today (ChatGPT, Claude, Gemini, Sora) is based on transformers, and it predicts that within a few years every major AI model will run on specialized chips. Etched considers this shift inevitable.

Etched claims the Sohu processor is more than 10 times faster and cheaper than even NVIDIA’s next-generation Blackwell (B200) GPUs. A single Sohu server is said to generate Llama 70B tokens about 20 times faster than an H100 server (23,000 tokens/s) and about 10 times faster than a B200 server (~45,000 tokens/s). The figures are for FP8 precision without sparsity, 8x model parallelism, and 2048 input/128 output tokens. The 8xH100 numbers come from TensorRT-LLM 0.10.08 (the latest version at the time), while the 8xB200 numbers are estimates. “This is the same benchmark used by NVIDIA and AMD,” Etched says.
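The quoted ratios can be checked with simple arithmetic. The short Python sketch below uses only the throughput figures cited in this article (Etched’s own claims, with the B200 number being an estimate); the constant and function names are illustrative and do not come from Etched.

```python
# Throughput figures as cited in the article (tokens per second, Llama 70B).
# These are vendor claims and estimates, not independent measurements.
SOHU_SERVER_TPS = 500_000  # 8x Sohu server, claimed by Etched
H100_SERVER_TPS = 23_000   # 8x H100 server, TensorRT-LLM benchmark
B200_SERVER_TPS = 45_000   # 8x B200 server, estimate
GPUS_PER_SERVER = 8


def speedup(fast_tps: float, slow_tps: float) -> float:
    """Return how many times faster the first system is than the second."""
    return fast_tps / slow_tps


h100_speedup = speedup(SOHU_SERVER_TPS, H100_SERVER_TPS)
b200_speedup = speedup(SOHU_SERVER_TPS, B200_SERVER_TPS)

# Number of individual H100 GPUs needed to match one Sohu server.
h100_gpus_equivalent = h100_speedup * GPUS_PER_SERVER

print(f"Sohu server vs 8xH100 server: {h100_speedup:.1f}x")
print(f"Sohu server vs 8xB200 server: {b200_speedup:.1f}x")
print(f"H100 GPUs matched by one Sohu server: {h100_gpus_equivalent:.0f}")
```

With these inputs the ratios come out slightly above the rounded “20 times” and “10 times” figures. Using the rounded 20x factor, 20 servers of 8 GPUs each gives the 160 H100s mentioned at the start of the article.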

Criticizing the general-purpose architecture of GPUs, Etched notes that they are not getting better, only bigger. Over the past four years, their compute density (TFLOPS/mm²) has improved by only about 15%. Next-generation accelerators (NVIDIA B200, AMD MI300X, Intel Gaudi 3, AWS Trainium2, etc.) count two chips as one card to “double” their performance. According to the startup, with Moore’s Law slowing down, the only way to improve performance is to specialize.

The business case for specialized chips rests on their relatively low design cost compared with the cost of training and running AI models. Today, models cost more than $1 billion to train and will be used for tens of billions of dollars of inference. At this scale, a 1% improvement would justify a $50-100 million custom chip project. Etched also argues that ASICs are 10-100 times faster than GPUs.
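The reasoning behind the “1% improvement” figure can be sketched as a break-even calculation. The Python snippet below is a rough illustration only: the function names are hypothetical, and the $20 billion spend in the last line is an assumed example value rather than a reported number.

```python
# Back-of-the-envelope check of the "1% improvement" argument.
# Dollar amounts other than the $50-100 million range are illustrative assumptions.


def savings(total_spend_usd: float, improvement: float) -> float:
    """Money saved when total AI spend drops by the given fraction."""
    return total_spend_usd * improvement


def breakeven_spend(chip_project_cost_usd: float, improvement: float) -> float:
    """Total AI spend at which the savings equal the chip project cost."""
    return chip_project_cost_usd / improvement


IMPROVEMENT = 0.01  # the 1% improvement quoted in the article

for project_cost in (50e6, 100e6):  # the $50-100 million range from the article
    spend = breakeven_spend(project_cost, IMPROVEMENT)
    print(f"${project_cost / 1e6:.0f}M chip project breaks even at "
          f"${spend / 1e9:.0f}B of total spend")

# Assumed example: $20B of combined training and inference spend.
print(f"1% of $20B saves ${savings(20e9, IMPROVEMENT) / 1e6:.0f}M")
```

Under these assumptions, a 1% gain pays for a $50-100 million design effort once total spending reaches roughly $5-10 billion, which is below the “tens of billions” the article cites for operating AI models.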

“When [specialized] bitcoin miners entered the market in 2014, it became cheaper to throw away GPUs than to use them to mine bitcoin. Billions of dollars are at stake, and the same is happening with AI… The architecture that runs the fastest and cheapest on the hardware wins.”

As training runs scale from $1 billion to $100 billion, the risk of testing a new architecture rises sharply. Etched believes that effort is better directed at improving the efficiency of transformers than at simply scaling them.

“As soon as Sohu (and other ASICs) hit the market, we will reach a point of no return. Transformer killers will have to run faster on GPUs than transformers run on Sohu. If that happens, we will build an ASIC for that too!”

Etched, which has existed for only two years, was founded by Harvard dropouts Gavin Uberti (formerly of OctoML and Xnor.ai) and Chris Zhu, who, together with Robert Wachen and former Cypress Semiconductor CTO Mark Ross, set out to build a chip that does one thing: run AI models.

That goal in itself is not unusual. Many startups and tech giants are developing chips that exclusively run AI models, also known as inference chips: Meta has MTIA, Amazon has Inferentia, and so on. But Etched’s chips are unique in that they run only one type of model: transformers.

“In 2022, we predicted that transformers would take over the world. We’ve now reached a point in the evolution of artificial intelligence where specialized chips that can perform better than general-purpose GPUs are inevitable, and the world’s tech decision makers know it,” says Uberti, CEO of Etched.

How does Sohu achieve the claimed performance? In several ways, but the most obvious is a simplified hardware and software pipeline. Because Sohu does not run non-transformer models, the Etched team can remove hardware components that transformers do not need, and the same goes for the software stack.

“In short, our future customers won’t be able to afford not to switch to Sohu. Companies are willing to bet on Etched because speed and cost are important for the AI products they are trying to build,” says Uberti.

So far, no competitor has gone as far as Etched, but the race is just beginning. If more efficient technologies emerge or other AI model architectures gain popularity, the company says it will simply develop a new chip.

Sources: Etched, TechCrunch