
Alibaba Releases Wan2.2 to Uplift Cinematic Video Production

The industry’s first open-source MoE-based large video generation models, offering superb control for global creators and developers

Alibaba recently released Wan2.2, the industry’s first open-source large video generation models to incorporate a Mixture-of-Experts (MoE) architecture, significantly elevating the ability of creators and developers to produce cinematic-style videos with a single click.

The Wan2.2 series features a text-to-video model (Wan2.2-T2V-A14B), an image-to-video model (Wan2.2-I2V-A14B), and Wan2.2-TI2V-5B, a hybrid model that supports both text-to-video and image-to-video generation within a single unified framework.

Built on the MoE architecture and trained on meticulously curated aesthetic data, Wan2.2-T2V-A14B and Wan2.2-I2V-A14B generate videos with cinematic-grade quality and aesthetics, offering creators precise control over key dimensions such as lighting, time of day, color tone, camera angle, frame size, composition, and focal length.

The two MoE models also demonstrate significant enhancements in producing complex motions, including vivid facial expressions, dynamic hand gestures, and intricate sports movements. Additionally, the models deliver realistic representations with enhanced instruction following and adherence to physical laws.

To address the high computational cost of video generation caused by long token sequences, Wan2.2-T2V-A14B and Wan2.2-I2V-A14B implement a two-expert design in the denoising process of diffusion models: a high-noise expert that focuses on overall scene layout and a low-noise expert that refines details and textures. Though each model comprises a total of 27 billion parameters, only 14 billion parameters are activated per step, reducing computational consumption by up to 50%.
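The two-expert design described above can be sketched as a simple routing rule: whichever expert runs is chosen by the current noise level, so only one 14B-parameter network is active per step. This is an illustrative sketch, not Wan2.2's actual code; the class names and the switch-point value are hypothetical.

```python
class Expert:
    """Stands in for a 14B-parameter denoising network."""
    def __init__(self, name):
        self.name = name

    def denoise(self, latent, t):
        # A real expert would predict noise for the latent at timestep t;
        # here we just report which expert handled the step.
        return f"{self.name} denoised step {t}"

class TwoExpertDenoiser:
    """Routes each denoising step to one of two experts, so only half
    of the combined 27B parameters are active at any given step."""
    def __init__(self, boundary_t=0.5):
        self.high_noise = Expert("high_noise")  # rough scene layout
        self.low_noise = Expert("low_noise")    # fine details, textures
        self.boundary_t = boundary_t            # hypothetical switch point

    def step(self, latent, t):
        # Early, noisy timesteps (t near 1.0) go to the high-noise expert;
        # late timesteps go to the low-noise expert for refinement.
        expert = self.high_noise if t >= self.boundary_t else self.low_noise
        return expert.denoise(latent, t)

denoiser = TwoExpertDenoiser()
print(denoiser.step(None, 0.9))  # handled by the high-noise expert
print(denoiser.step(None, 0.2))  # handled by the low-noise expert
```

The design choice is that capacity scales with the sum of both experts while per-step compute scales with only the active one.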

Wan2.2 incorporates fine-grained aesthetic tuning through a cinematic-inspired prompt system that categorizes key dimensions such as lighting, illumination, composition, and color tone. This approach enables Wan2.2 to accurately interpret and convey users’ aesthetic intentions during the generation process.
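A prompt organized along those categorized dimensions might look like the sketch below. The dimension names follow the article; the specific keyword values and the prompt layout are hypothetical, since the exact vocabulary Wan2.2 recognizes is not specified here.

```python
# Hypothetical cinematic prompt built from categorized aesthetic
# dimensions; values are illustrative, not a confirmed Wan2.2 syntax.
aesthetic_controls = {
    "lighting": "soft window light",
    "time of day": "golden hour",
    "color tone": "warm, desaturated",
    "camera angle": "low angle",
    "composition": "rule of thirds",
    "focal length": "85mm, shallow depth of field",
}

subject = "an elderly fisherman mending a net on a wooden pier"
prompt = subject + ", " + ", ".join(
    f"{dim}: {value}" for dim, value in aesthetic_controls.items()
)
print(prompt)
```

Structuring the prompt this way mirrors how the release describes the model interpreting aesthetic intent dimension by dimension.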

To enhance generalization capabilities and creative diversity, Wan2.2 was trained on a substantially larger dataset, featuring a 65.6% increase in image data and an 83.2% increase in video data compared to Wan2.1. As a result, Wan2.2 demonstrates improved performance in producing complex scenes and motions, as well as a greater capacity for artistic expression.

 

A Compact Model to Enhance Efficiency and Scalability

Wan2.2 also introduces its hybrid model Wan2.2-TI2V-5B, a dense model that utilizes a high-compression 3D VAE architecture to achieve a temporal and spatial compression ratio of 4x16x16, raising the overall information compression rate to 64. TI2V-5B can generate a 5-second 720P video in several minutes on a single consumer-grade GPU, offering efficiency and scalability to developers and content creators.
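To make the 4x16x16 figure concrete, the arithmetic below works out the latent grid a VAE with that compression ratio would produce for a 720P input. The frame count (121 frames for roughly 5 seconds at 24 fps) is an assumption for illustration, not a confirmed Wan2.2 spec.

```python
# Back-of-the-envelope latent-shape calculation for a 4x16x16
# (temporal x height x width) VAE compression ratio.
frames, height, width = 121, 720, 1280  # assumed 5-second 720P input
t_ratio, h_ratio, w_ratio = 4, 16, 16   # compression per dimension

latent_frames = -(-frames // t_ratio)   # ceiling division over time
latent_h = height // h_ratio
latent_w = width // w_ratio

print(latent_frames, latent_h, latent_w)  # 31 45 80
```

The diffusion model then denoises this much smaller 31x45x80 latent grid instead of raw pixels, which is what makes single-GPU generation tractable.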

Wan2.2 models are available to download on Hugging Face and GitHub, as well as Alibaba Cloud’s open-source community, ModelScope. A major contributor to the global open-source community, Alibaba open-sourced four Wan2.1 models in February 2025 and Wan2.1-VACE (Video All-in-one Creation and Editing) in May 2025. To date, the models have attracted over 5.4 million downloads on Hugging Face and ModelScope.

Written by dotdailydose

Comments

  1. The MoE architecture in Wan2.2 is a smart move for video generation — activating only relevant expert pathways per token keeps inference costs down while scaling model capacity. The hybrid TI2V model in a single framework is particularly useful for production pipelines where you switch between prompting modes.

  2. The MoE architecture is a smart move for video generation. Running full dense compute on every token has always been the bottleneck. Curious how the quality holds up on longer clips compared to dense models.
