First unveiled in 2025, the Wan series has undergone continuous upgrades, reflecting Alibaba’s endeavours in AI-driven multimedia technologies. The Wan series is Alibaba’s AI platform for generating videos and images from text, media references, and audio, enabling creators to produce cinematic content with AI.
In December last year, Alibaba unveiled the latest evolution of its visual generation models, the Wan2.6 series.
Wan2.6 capabilities
The Wan2.6 series enables creators to appear in AI-generated videos as themselves and in their own voices with flexible multi-shot storytelling. The new features are designed to unlock creative possibilities for professional-grade content production with enhanced multi-person dialogue and extended duration for richer narratives.
The Wan2.6 series features a new reference-to-video generation model, Wan2.6-R2V. Users can upload a character reference video that captures both appearance and voice, then use text prompts to generate vivid new scenes starring that same character. Users can create videos featuring a person, animal or object, or even multiple subjects together, while preserving the distinctive look and sound of the original reference.
China’s first reference-to-video generation model
Powered by multimodal reference generation capabilities, Wan2.6-R2V is China’s first reference-to-video generation model. It changes the way short-form drama creators tell stories and streamlines their production processes.
The Wan2.6 series also includes enhancements to its text-to-video model (Wan2.6-T2V), its image-to-video model (Wan2.6-I2V), and to its two image generation models (Wan2.6-image and Wan2.6-T2I).
The new models introduce intelligent multi-shot storytelling capabilities that allow for more expressive narratives with visual consistency throughout, delivering realistic video scenes with richer sound effects. Supporting video outputs of up to 15 seconds, the models give creators more room to develop their stories.
Interleaved text-image output with advanced logical reasoning capabilities
For image generation, the Wan2.6 series enables users to create interleaved text-image output with advanced logical reasoning capabilities, supporting coherent, cinema-grade visual storytelling. Advanced understanding of lengthy Chinese and English text prompts enables creators to produce high-quality, expressive content that captures nuance and artistic intent.
Users can access and deploy the models through Model Studio, Alibaba Cloud’s AI development platform, and through Wan’s official website. The models will also be integrated into the Qwen App, Alibaba’s flagship AI application.