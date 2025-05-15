Researchers from China and Canada presented a new AI-based data compression concept called LMCompress.

Users store large amounts of data on their own electronic media and share it frequently. This creates a need for better lossless compression tools, which shrink large amounts of information so that it can be transferred faster.

As the lead author of the study, Ming Li, explains, the concept rests on the idea that compressing data depends on understanding it. For example, a person who understands a subject well can summarize it.

As part of the study, Ming Li and his colleagues set out to demonstrate that the better AI-based large language models learn the data, the better they can compress it. This idea was first proposed by the mathematician Claude Shannon in 1948.

Shannon suggested that if the data to be transmitted is properly understood, it can be compressed, which reduces communication time. The idea was debated for a long time before AI-based large language models emerged.

Ming Li emphasizes that if a large language model understands the data properly, it can predict what should come next. This allows for much better data compression without any loss of information or of its quality. The key idea is the ability of AI to generate the data that the user plans to transmit, which, according to Li, will eliminate the need to transmit anything at all. After testing this approach, the researchers concluded that AI at least doubled compression performance for various types of data, including text, video, and audio files.
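
The link between prediction and compression can be illustrated with a toy sketch. It is not the LMCompress algorithm: instead of a large model it uses a simple adaptive byte-counting model, and the function name `code_length_bits`, the `order` parameter, and the sample text are all hypothetical choices made for illustration. It computes the ideal lossless code length, where each byte costs -log2(p) bits and p is the probability the model predicted for that byte. A model that sees more context predicts better and therefore needs fewer bits.

```python
import math
from collections import Counter, defaultdict


def code_length_bits(data: bytes, order: int) -> float:
    """Ideal code length in bits of `data` under an adaptive byte model.

    The model predicts each byte from the previous `order` bytes.
    A byte predicted with probability p costs -log2(p) bits, which is
    the length an arithmetic coder can approach without losing data.
    """
    counts = defaultdict(Counter)  # context -> how often each next byte followed it
    total_bits = 0.0
    for i, byte in enumerate(data):
        context = data[max(0, i - order):i]
        seen = counts[context]
        # Add-one smoothing over all 256 byte values so p is never zero.
        p = (seen[byte] + 1) / (sum(seen.values()) + 256)
        total_bits += -math.log2(p)
        # Update only after "coding" the byte, so a decoder could do the same.
        seen[byte] += 1
    return total_bits


# Toy input (an assumption for the demo): a highly repetitive sentence.
text = b"the quick brown fox jumps over the lazy dog. " * 200

raw_bits = 8 * len(text)                         # no model: 8 bits per byte
weak_bits = code_length_bits(text, order=0)      # ignores context
strong_bits = code_length_bits(text, order=3)    # uses the 3 previous bytes

print(f"raw:     {raw_bits:.0f} bits")
print(f"order 0: {weak_bits:.0f} bits")
print(f"order 3: {strong_bits:.0f} bits")
```

On this input the order-3 model needs fewer bits than the order-0 model, and both need fewer than the raw encoding. The same principle applies when the predictor is a large model, only with far better predictions.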

"LMCompress is a compression algorithm that uses large models (a large language model for text, large models for images, video, and so on). It compresses text more than twice as well as classical algorithms, images and audio twice as well, and video just under twice as well. Therefore, you can work about twice as fast when transferring data," Ming Li says.

The LMCompress algorithm may soon be improved and introduced for real-world applications. According to Ming Li, the research paves the way for a new era of data compression using AI-based large language models. In his opinion, this method will eventually be used in all gadgets without exception and will replace classical compression applications.

The results of the study are published in the journal Nature Machine Intelligence.

Source: TechXplore