News · Technologies · 06-18-2025 at 13:26

New Chinese programming AI MiniMax-M1 breaks DeepSeek R1 records — 1 million tokens, only $535k for training


Andrii Rusanov

News writer


Chinese artificial intelligence startup MiniMax is best known for its realistic generative video model Hailuo. Its large language model for programming, MiniMax-M1, is free for commercial use.

The open-source MiniMax-M1 is distributed under the Apache 2.0 license. This means that companies can use it for commercial applications and modify it as they see fit, without restrictions or fees. The model is available on Hugging Face and on GitHub.
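As a rough sketch of what getting started with the open weights might look like, assuming the standard Hugging Face transformers API and a repository id of MiniMaxAI/MiniMax-M1-80k (an assumption; check the model card for the exact name and hardware requirements):

```python
# Minimal sketch: loading MiniMax-M1 from Hugging Face with transformers.
# The repository id "MiniMaxAI/MiniMax-M1-80k" is an assumption; a model of
# this size also requires multi-GPU serving in practice.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-M1-80k"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # custom model code ships with the repository
    device_map="auto",       # shard across available GPUs
)

prompt = "Write a Python function that reverses a linked list."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```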

MiniMax-M1 features a context window of 1 million input tokens and up to 80,000 output tokens, one of the largest context windows available for long-context reasoning tasks. For comparison, OpenAI's GPT-4o has a context window of only 128,000 tokens, roughly enough to exchange the text of a literary novel in a single interaction. With 1 million tokens, MiniMax-M1 can exchange the contents of a small book collection. Google's Gemini 2.5 Pro also offers a 1-million-token upper limit, with a 2-million-token window in development.
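To put those window sizes in perspective, here is a back-of-the-envelope check using the common heuristic of roughly 0.75 English words per token (an assumption, not a figure from the article):

```python
# Back-of-the-envelope context-window arithmetic, assuming the common
# heuristic of ~0.75 English words per token and a ~90,000-word novel.
WORDS_PER_TOKEN = 0.75
NOVEL_WORDS = 90_000  # typical length of a literary novel (assumption)

for name, window in [("GPT-4o", 128_000), ("MiniMax-M1 / Gemini 2.5 Pro", 1_000_000)]:
    words = window * WORDS_PER_TOKEN
    print(f"{name}: {window:,} tokens ~ {words:,.0f} words "
          f"~ {words / NOVEL_WORDS:.1f} novels")

# GPT-4o: 128,000 tokens ~ 96,000 words ~ 1.1 novels
# MiniMax-M1 / Gemini 2.5 Pro: 1,000,000 tokens ~ 750,000 words ~ 8.3 novels
```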

Comparison of AI models on text tasks / MiniMax

According to the technical report, MiniMax-M1 requires only 25% of the FLOPs that DeepSeek R1 needs when generating 100,000 tokens. The model is available in MiniMax-M1-40k and MiniMax-M1-80k variants, which differ in maximum output length (40,000 and 80,000 tokens respectively). The architecture builds on the company's previous platform, MiniMax-Text-01, and comprises 456 billion parameters, of which 45.9 billion are active for any single token.
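Those parameter figures show what mixture-of-experts routing buys: only about a tenth of the network fires for each token. A quick sanity check using just the numbers above:

```python
# Sanity check on the MoE figures from the technical report.
total_params = 456e9    # 456 billion parameters in total
active_params = 45.9e9  # 45.9 billion activated per token

print(f"Active fraction per token: {active_params / total_params:.1%}")
# Active fraction per token: 10.1%
```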

The M1 model was trained using an innovative and highly efficient methodology. It is a hybrid mixture-of-experts (MoE) model with a "lightning attention" mechanism designed to reduce inference costs. The training cost was only $534,700. This efficiency is attributed to CISPO, a specialized reinforcement-learning algorithm that clips importance-sampling weights rather than token updates, as well as to the hybrid attention design, which helps optimize scaling. For comparison, DeepSeek R1's training cost $5.6 million (although there are doubts about this figure), while the cost of training OpenAI's GPT-4 is estimated to exceed $100 million.
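That description of CISPO suggests a PPO-style policy-gradient objective in which the importance-sampling ratio itself is clipped and frozen, so every token still contributes a gradient (PPO's clipping, by contrast, zeroes out updates for tokens whose ratio falls outside the clip range). A minimal sketch of that idea, with hypothetical clip bounds and tensor shapes, might look like this:

```python
# Illustrative sketch of CISPO-style clipping, based on the description
# above: clip the importance-sampling weight itself rather than dropping
# token updates as PPO's clipped objective does. The bounds and tensors
# are hypothetical; see the MiniMax-M1 report for the actual objective.
import torch

def cispo_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.2):
    ratio = torch.exp(logp_new - logp_old)  # importance-sampling weight
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)
    # detach() freezes the clipped weight as a fixed coefficient, so every
    # token still receives a gradient through logp_new.
    return -(clipped.detach() * advantages * logp_new).mean()
```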

This is the first release in the MiniMaxWeek series that the company announced on X. Apparently, users are in for five days of exciting announcements.

Source: VentureBeat


