Use the Google Veo 3.1 model to generate videos guided by 1–3 reference images. It supports 720p and 1080p resolutions, as well as 16:9 and 9:16 aspect ratios. The video length is fixed at 8 seconds. Only "asset" reference types are supported.
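
The limits listed above can be checked on the client before a request is submitted. The following is a minimal sketch, not the provider's API: the `ReferenceImage` class, its `uri` and `reference_type` fields, and the `validate_reference_request` function are all hypothetical names chosen for illustration. Only the limits themselves (1 to 3 images, 720p or 1080p, 16:9 or 9:16, 8 seconds, "asset" reference type) come from the description.

```python
from dataclasses import dataclass

# Limits taken from the model description above.
ALLOWED_RESOLUTIONS = {"720p", "1080p"}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}
ALLOWED_REFERENCE_TYPES = {"asset"}
FIXED_DURATION_SECONDS = 8


@dataclass
class ReferenceImage:
    # Hypothetical structure: the real request schema may differ.
    uri: str
    reference_type: str = "asset"


def validate_reference_request(images, resolution, aspect_ratio,
                               duration_seconds=FIXED_DURATION_SECONDS):
    """Return a list of problems; an empty list means the request fits the limits."""
    problems = []
    if not 1 <= len(images) <= 3:
        problems.append(f"expected 1 to 3 reference images, got {len(images)}")
    for index, image in enumerate(images):
        if image.reference_type not in ALLOWED_REFERENCE_TYPES:
            problems.append(
                f"image {index}: unsupported reference type {image.reference_type!r}"
            )
    if resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"unsupported resolution {resolution!r}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio {aspect_ratio!r}")
    if duration_seconds != FIXED_DURATION_SECONDS:
        problems.append(f"duration is fixed at {FIXED_DURATION_SECONDS} seconds")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.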

Google

Veo 3.1: Generating Video from Start and End Frames

Use the Google Veo 3.1 model to generate a transition video between a provided start frame and end frame. It supports durations of 4, 6, or 8 seconds, resolutions of 720p and 1080p, and optional audio generation.
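
As an illustration of the options listed above, the sketch below assembles a request dictionary for a start/end-frame generation. It is a sketch under stated assumptions: the function name `build_transition_request` and every key in the returned dictionary are hypothetical, not the provider's actual schema. Only the allowed durations, the resolutions, and the fact that audio is optional come from the description.

```python
# Options taken from the model description above.
ALLOWED_DURATIONS = (4, 6, 8)
ALLOWED_RESOLUTIONS = ("720p", "1080p")


def build_transition_request(start_frame, end_frame, duration_seconds=8,
                             resolution="720p", generate_audio=False):
    """Build a request dict; all key names are hypothetical placeholders."""
    if not start_frame or not end_frame:
        raise ValueError("both a start frame and an end frame are required")
    if duration_seconds not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    return {
        "start_frame": start_frame,
        "end_frame": end_frame,
        "duration_seconds": duration_seconds,
        "resolution": resolution,
        "generate_audio": bool(generate_audio),
    }
```

For example, `build_transition_request("first.png", "last.png", duration_seconds=6)` yields a 6-second, 720p request without audio, while a duration of 5 raises `ValueError`.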

Google

Veo 3.1 Fast: Generating Videos from Reference Images

Use the Google Veo 3.1 Fast model to generate videos guided by 1–3 reference images. It supports 720p and 1080p resolutions, as well as 16:9 and 9:16 aspect ratios. The video duration is fixed at 8 seconds. Only "asset" reference types are supported.

Google

Veo 3.1 Fast: Generating Video from Start and End Frames

Generate videos by specifying a start frame and an end frame, combined with text prompts. The model interpolates between the two frames to generate coherent motion content. Use the Google Veo 3.1 Fast model (veo-3.1-fast-generate-001) for faster generation.

Kling

Kling v3.0 Pro: Image to Video

Kling 3.0 is a high-quality model designed for video generation. Its strengths lie in smooth motion and cinematography that closely mimics real-life footage, with excellent control over the rhythm of character movements, camera movements (zooms, pans, and tilts), and spatial relationships within scenes. It renders material texture, lighting variations, and details (including character clothing, props, and backgrounds) consistently. It is ideal for creating short films, storyboards for commercials, and dynamic proofs of concept, and its controllability can be further enhanced through clear shot script prompts. It supports an ultra-fast inference API, offers stable performance with no waiting time, and delivers exceptional value for money.

Kling

Kling v3.0: Standard Image-to-Video

Alibaba

Wan 2.1: Image to Video

Alibaba Tongyi Wan is renowned for its high image quality, strong temporal consistency, and ability to follow complex prompts, making it ideal for large-scale commercial video generation. Wan 2.1 enhances motion stability and texture detail, making it suitable for bulk production in e-commerce and advertising. Image-to-Video supports driving motion and camera movements using a single reference image, making it ideal for character dance sequences, product demonstrations, and style extensions. The real-time inference API offers stable performance, no waiting time, and affordable pricing.

Alibaba

Wan 2.2: Image to Video

Alibaba Tongyi Wan is renowned for its high image quality, strong temporal consistency, and ability to handle complex prompts, making it ideal for large-scale commercial video generation. Wan 2.2 enhances shot continuity and the naturalness of character movements, delivering more stable results in complex scenes. Its image-to-video generation supports driving both motion and camera work using a single reference image, making it suitable for dance performances, product demonstrations, and style extensions. The real-time inference API offers stable performance with no waiting time and is affordably priced.

Alibaba

Wan 2.5 Image-to-Video Preview

Alibaba Tongyi Wan is renowned for its high image quality, strong temporal consistency, and precise prompt adherence, making it ideal for large-scale commercial video generation. Wan 2.5 delivers further improvements in image clarity and prompt adherence, while the preview version facilitates rapid trial-and-error testing. Image-to-video generation supports using a single reference image to drive motion and camera work, making it suitable for dance performances, product demonstrations, and style extensions. The real-time inference API offers stable performance, zero wait times, and affordable pricing.

ByteDance

Seedance 1.5 Pro: Image to Video

The Seedance series is designed for production use and prioritizes stability and controllable output. Its image-to-video feature allows a single reference image to drive motion and camera work, making it suitable for dance performances, product demonstrations, and stylistic extensions. The real-time inference API delivers stable performance with no waiting time and is affordably priced.

Alibaba

Wan 2.6: Image to Video

The Wan 2.6 series is designed for production use and prioritizes stability and predictable output. Its image-to-video feature uses a single reference image to drive motion and camera work, making it suitable for dance performances, product demonstrations, and stylistic extensions. The real-time inference API delivers stable performance with no waiting time and is affordably priced.

OpenAI

Sora 2: Image to Video

The Sora 2 series is designed for production use and prioritizes stability and controllable output. Its image-to-video feature uses a single reference image to drive character movements and camera work, making it suitable for dance performances, product demonstrations, and stylistic extensions. The real-time inference API delivers consistent performance with no waiting time and is affordably priced.