Wan Video 2.2 - 14B Image-to-Video

10/14/2025

11:29:34 PM

Related Keywords & Tags

14b image-to-video,720p 24fps,base model,checkpoint,image-to-video generation,mixture-of-experts,text-to-video generation,theally,video diffusion model,wan video 2.2,wan video 2.2 i2v-a14b,wan2.2,wan2.2-i2v-a14b,wan2.2-t2v-a14b,wan2.2-ti2v-5b

A sleek white robot serving a cup of coffee to a man sitting in a cozy cafe booth under warm hanging ceiling lights, captured with natural smartphone photography.

Nostalgic 1990s photograph of a college student typing code on a vintage 90s computer in a dorm room with green wallpaper and carpet floor, lit by a desk lamp.

A mysterious figure wearing a full black cloak stands in a dimly lit back alley surrounded by tall buildings with visible pipes, steam leak, and electrical wiring, evoking a cyberpunk and slightly creepy atmosphere.

View from inside a car driving through a wet tropical highway with palm trees lining the road during a rainy daytime.

View from inside a car driving through a suburban neighborhood on a rainy, windy day with American-style houses lining the street.

Man wearing a white pinstripe suit and sunglasses standing near palm trees with a modern Miami skyscraper in the background, shot from a low camera angle.

African American man with afro and sunglasses wearing a pink suit standing on a Miami street at night with palm trees and neon-lit skyscrapers in the background, viewed from a low angle.

Group of young women dressed in denim shorts and crop tops enjoying a night out near palm trees, illuminated by neon lights and modern skyscrapers in Miami

Recommended Parameters

resolution: 720x480, 720x720

vae: Wan2.2-VAE - advanced

Tips

Wan2.2 benefits from a large-scale dataset with +65.6% more images and +83.2% more videos compared to Wan2.1.

The Mixture-of-Experts (MoE) architecture increases model capacity while keeping the computational cost the same.

The model supports stable video synthesis with reduced unrealistic camera movements, especially for image-to-video generation.

Version Highlights

Wan 2.2 14B for Image-to-Video on-site Generation

Creator Sponsors

Check out the official Wan2.2 GitHub repository for source code and updates.

Download the ComfyUI Repack of Wan2.2 models from HuggingFace.

Original Diffusers multi-part safetensors files are available at Wan-AI HuggingFace Repo.

Wan Video

Note: There are other Wan Video files hosted on Civitai, and some may be duplicates. This model card exists primarily to host the files used by Wan Video in the Civitai Generator.

These files are the ComfyUI Repack. The original files are available in Diffusers/multi-part safetensors format in the Wan-AI HuggingFace repo.

Wan2.2 is a major upgrade to our visual generative models. It is now open-sourced and offers more powerful capabilities, better performance, and superior visual quality. With Wan2.2, we have focused on incorporating the following technical innovations:

👍 MoE Architecture: Wan2.2 introduces a Mixture-of-Experts (MoE) architecture into video diffusion models. By separating the denoising process across timesteps with specialized, powerful expert models, it enlarges the overall model capacity while maintaining the same computational cost.

💪🏻 Data Scaling: Compared to Wan2.1, Wan2.2 is trained on significantly more data, with +65.6% more images and +83.2% more videos. This expansion notably enhances the model's generalization across multiple dimensions such as motion, semantics, and aesthetics, achieving top performance among all open-source and closed-source models.

🎬 Cinematic Aesthetics: Wan2.2 incorporates specially curated aesthetic data with fine-grained labels for lighting, composition, and color. This allows for more precise and controllable cinematic style generation, facilitating the creation of videos with customizable aesthetic preferences.

🚀 Efficient High-Definition Hybrid TI2V: Wan2.2 open-sources a 5B model built with our advanced Wan2.2-VAE, which achieves a compression ratio of 16×16×4. This model supports both text-to-video and image-to-video generation at 720P resolution with 24fps and can also run on consumer-grade graphics cards such as the 4090. It is one of the fastest 720P@24fps models currently available, capable of serving both the industrial and academic sectors.
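The MoE point in the list above can be illustrated with a small routing sketch. It is not the Wan2.2 implementation: it assumes two experts (one for noisier steps, one for cleaner steps), a hypothetical `boundary` value for the switch, and toy denoisers that operate on a single float. It only shows why per-step cost stays that of a single expert: exactly one expert runs at each timestep.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# A denoiser maps (latent, timestep) to an updated latent.
# Floats stand in for real tensors to keep the sketch self-contained.
Denoiser = Callable[[float, float], float]


@dataclass
class TimestepMoE:
    """Routes each denoising step to one expert based on the timestep.

    Assumption: two experts and a single switch point. The names and the
    boundary value are illustrative, not taken from the Wan2.2 code.
    """
    high_noise_expert: Denoiser
    low_noise_expert: Denoiser
    boundary: float  # hypothetical switch point, timesteps normalized to [0, 1]

    def select(self, t: float) -> Denoiser:
        # Early (noisy) steps go to one expert, late (clean) steps to the other.
        return self.high_noise_expert if t >= self.boundary else self.low_noise_expert

    def denoise(self, latent: float, timesteps: Iterable[float]) -> float:
        for t in timesteps:
            # Only the selected expert runs, so each step costs one forward pass.
            latent = self.select(t)(latent, t)
        return latent


if __name__ == "__main__":
    moe = TimestepMoE(
        high_noise_expert=lambda x, t: x - 1.0,  # toy update
        low_noise_expert=lambda x, t: x - 0.1,   # toy update
        boundary=0.5,
    )
    print(moe.denoise(10.0, [1.0, 0.75, 0.5, 0.25]))
```

Total parameters grow with the number of experts, but the parameters active at any one step do not, which is the capacity-versus-cost trade-off the description refers to.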

Wan2.2-T2V-A14B

The T2V-A14B model supports generating 5s videos at both 480P and 720P resolutions. Built with a Mixture-of-Experts (MoE) architecture, it delivers outstanding video generation quality. On our new benchmark Wan-Bench 2.0, the model surpasses leading commercial models across most key evaluation dimensions.

Wan2.2-I2V-A14B

The I2V-A14B model, designed for image-to-video generation, supports both 480P and 720P resolutions. Built with a Mixture-of-Experts (MoE) architecture, it achieves more stable video synthesis with reduced unrealistic camera movements and offers enhanced support for diverse stylized scenes.

Wan2.2-TI2V-5B

The TI2V-5B model is built with the advanced Wan2.2-VAE, which achieves a compression ratio of 16×16×4. This model supports both text-to-video and image-to-video generation at 720P resolution with 24fps and can run on a single consumer-grade GPU such as the 4090. It is one of the fastest 720P@24fps models available, meeting the needs of both industrial applications and academic research.
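To make the 16×16×4 compression ratio concrete, here is a rough arithmetic sketch. It assumes the ratio means 16× along each spatial axis and 4× along time, that 720P means 1280×720 pixels, and that the clip is 5 seconds long at 24fps. The function `latent_shape` is hypothetical, and the real VAE may handle the first frame or padding differently.

```python
import math


def latent_shape(width: int, height: int, num_frames: int,
                 spatial: int = 16, temporal: int = 4) -> tuple:
    """Approximate latent grid (frames, height, width) after compression.

    Assumption: plain ceiling division on each axis. This is illustrative
    arithmetic only, not the Wan2.2-VAE's actual shape logic.
    """
    return (
        math.ceil(num_frames / temporal),
        math.ceil(height / spatial),
        math.ceil(width / spatial),
    )


frames = 5 * 24  # assumed clip: 5 seconds at 24fps = 120 frames
shape = latent_shape(1280, 720, frames)
print(shape)  # (30, 45, 80)

# Reduction in the number of positions the diffusion model has to process.
pixels = frames * 720 * 1280
latents = shape[0] * shape[1] * shape[2]
print(pixels // latents)  # 1024, i.e. 16 * 16 * 4
```

Under these assumptions the diffusion model works on roughly 1024 times fewer spatio-temporal positions than the raw video has pixels, which helps explain why 720P@24fps generation is feasible on a single consumer GPU.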

GitHub: https://github.com/Wan-Video/Wan2.2

HuggingFace Repo (ComfyUI Repack): https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/diffusion_models