Screenshot: HunyuanVideo / Tencent
Nearly a year ago, Sora, OpenAI's generative AI model for creating realistic videos, captured widespread attention. Now Tencent has announced a more open alternative: HunyuanVideo.
HunyuanVideo is the first major video generation model whose inference code and weights are openly published and accessible to everyone. Tencent claims that the model can create videos comparable to leading proprietary models, with high image quality, motion diversity, text-to-video consistency, and generation stability. With over 13 billion parameters, it is the largest of all open-source video generation models.
Tencent validated the model through professional human evaluation. According to the published results, HunyuanVideo surpasses leading contemporary proprietary models.
Instead of using separate models for generating text, images, and video, Tencent used a split-then-merge technique to achieve better video quality:
“HunyuanVideo features a Transformer design and uses the Full Attention mechanism for unified image and video creation. Specifically, we use a ‘Two-stream to One’ hybrid model for video creation. In the dual-stream phase, video and text tokens are processed independently through several Transformer blocks, allowing each modality to learn its respective modulation mechanisms unimpeded. In the single-stream phase, we merge video and text tokens and feed them into subsequent Transformer blocks for efficient multimodal information fusion. This design captures the complex interactions between visual and semantic information, enhancing the overall performance of the model”.
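To make the "dual-stream to single-stream" idea concrete, here is a minimal PyTorch sketch of the general pattern the quote describes: modality-specific transformer blocks first, then joint blocks over the concatenated token sequence. This is an illustration under stated assumptions, not code from the HunyuanVideo repository; the class names, layer choices, and dimensions are all hypothetical.

```python
import torch
import torch.nn as nn

class DualStreamBlock(nn.Module):
    """Dual-stream phase (illustrative): video and text tokens pass
    through separate transformer layers, so each modality learns its
    own representation without interference from the other."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.video_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.text_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, video_tokens, text_tokens):
        return self.video_layer(video_tokens), self.text_layer(text_tokens)

class SingleStreamBlock(nn.Module):
    """Single-stream phase (illustrative): one transformer layer over
    the merged sequence, so full attention mixes both modalities."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, tokens):
        return self.layer(tokens)

class DualToSingleStream(nn.Module):
    """Hypothetical end-to-end wrapper: a few dual-stream blocks
    followed by a few single-stream blocks."""
    def __init__(self, dim: int = 512, heads: int = 8,
                 n_dual: int = 2, n_single: int = 2):
        super().__init__()
        self.dual = nn.ModuleList(DualStreamBlock(dim, heads) for _ in range(n_dual))
        self.single = nn.ModuleList(SingleStreamBlock(dim, heads) for _ in range(n_single))

    def forward(self, video_tokens, text_tokens):
        # Phase 1: each modality is refined independently.
        for block in self.dual:
            video_tokens, text_tokens = block(video_tokens, text_tokens)
        # Phase 2: concatenate along the sequence axis and fuse jointly.
        tokens = torch.cat([video_tokens, text_tokens], dim=1)
        for block in self.single:
            tokens = block(tokens)
        return tokens

# Toy usage: 16 video tokens and 8 text tokens with a shared width.
model = DualToSingleStream()
video = torch.randn(1, 16, 512)
text = torch.randn(1, 8, 512)
out = model(video, text)  # shape: (1, 24, 512)
```

The point of the two-phase layout is that early layers do not have to spend capacity reconciling very different token statistics, while later layers still get full cross-modal attention where it matters for text-to-video consistency.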
Tencent states that the open publication of the code and weights of the base model and its applications is intended to bridge the gap between proprietary and open-source video base models, making higher-quality AI video generation accessible to more people. The project page on Hugging Face has more details, the official HunyuanVideo website hosts video demonstrations, and the code is available on GitHub.
Source: NeoWin