Image to Video

Updated: 2026-05

1. What You’ll Learn on This Page

This page covers image-to-video generation: producing short video clips from still images. Comfy Cloud includes the Wan 2.2 and LTX-2.3 video generation models, both of which are available even on the free plan.

In class, students gain a firsthand understanding of how video-generating AI works, and of its current limitations, through hands-on experimentation. That grounding prepares them to use Runway later as a tool whose inner workings they already understand.

2. How Video-Generating AI Works (Recap)

As mentioned in “Diffusion Mechanism,” here are the key points:

  • Image version: Removing 2D noise in latent space
  • Video version: Removing 3D noise (including the time axis) in latent space

Wan 2.2 and LTX-2.3 are essentially “3D diffusion models.” Concepts such as prompts, sampling, and CFG carry over directly from image generation, so the intuition you have built with images applies to video as well.
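The 2D-versus-3D distinction is easiest to see in the latent tensor shapes. A minimal sketch (the channel count, spatial downscale factor, and step size here are illustrative assumptions, not the models' actual values):

```python
import numpy as np

# Image diffusion: denoise a 2-D latent of shape (channels, H/8, W/8).
image_latent = np.random.randn(16, 80, 80)      # e.g. a 640x640 image

# Video diffusion: the same idea with a time axis added,
# shape (channels, frames, H/8, W/8).
video_latent = np.random.randn(16, 81, 80, 80)  # e.g. 81 frames at 640x640

def denoise_step(latent, predicted_noise, step_size=0.1):
    # Each step subtracts predicted noise over the whole tensor at once,
    # so appearance and motion are denoised jointly, not frame by frame.
    return latent - step_size * predicted_noise

video_latent = denoise_step(video_latent, np.random.randn(*video_latent.shape))
print(video_latent.shape)  # (16, 81, 80, 80)
```

The only structural change from image to video is that extra `frames` axis; everything else (prompting, sampling, CFG) is unchanged.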

3. Main Models (Comfy Cloud Built-In)

3.1 Wan 2.2

An open-source video generation model developed by Tongyi Lab (Alibaba).

  • Image-to-Video (i2v): 1 image + text prompt → video
  • Text-to-Video (t2v): text prompt → video
  • Standard 5-second clip, 640×640 or 720p
  • Official estimate: Approx. 11.4 credits per video (including 4-step fast LoRA)

3.2 LTX-2.3

A video generation model developed by Lightricks.

  • Lightweight and fast, with strong performance in lip-syncing and specific workflows
  • Includes LoRA models for talkvid and celebvhq
  • Similar credit consumption (10–15 cr per video)

3.3 Wan 2.2 Animate

A character replacement workflow built into Comfy Cloud templates.

  • Video + character image → Reconstruct the video using that character
  • Change the character’s appearance while preserving poses and movements

4. Basic i2v Workflow (WAN 2.2)

Comfy Cloud Templates > Getting Started > “1.2 Starter: From Image to Video” is an introductory workflow for learning.

Key Components:

  • Load Image — The starting still image
  • Load Wan 2.2 i2v Model
  • Load VAE / CLIP (for Wan)
  • Prompt Encoding
  • K-Sampler (Video Version) — Combined with 4-step fast LoRA
  • VAE Decoding (Video Version)
  • Save Video (mp4 output)
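ComfyUI workflows are stored as JSON graphs of nodes wired together by `[node_id, output_index]` links. The components above can be sketched as such a graph; the `class_type` names below are illustrative assumptions, not the exact identifiers used by the Comfy Cloud template:

```python
# Hypothetical ComfyUI-style JSON prompt for the i2v workflow.
# Link format: ["source_node_id", output_index].
workflow = {
    "1": {"class_type": "LoadImage", "inputs": {"image": "start_frame.png"}},
    "2": {"class_type": "LoadWan22I2VModel", "inputs": {}},
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "gentle wind, leaves swaying"}},
    "4": {"class_type": "KSamplerVideo",
          "inputs": {"model": ["2", 0], "positive": ["3", 0],
                     "image": ["1", 0], "steps": 4}},
    "5": {"class_type": "VAEDecodeVideo", "inputs": {"samples": ["4", 0]}},
    "6": {"class_type": "SaveVideo",
          "inputs": {"frames": ["5", 0], "format": "mp4"}},
}

# Sanity check: every link points at an existing node.
for node_id, node in workflow.items():
    for value in node["inputs"].values():
        if isinstance(value, list):
            assert value[0] in workflow, f"dangling link in node {node_id}"
print("graph ok:", len(workflow), "nodes")
```

“Connecting nodes” in the template editor is exactly this: filling in each node's inputs with links to upstream outputs.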

Tip: Some templates include a learning guide, such as “Step 1 - Connect nodes.” Follow the instructions to complete the workflow.

5. Key Parameters (Video-Specific)

| Parameter | Description | Recommended Value |
| --- | --- | --- |
| frame_count | Total number of frames | 81 (5 seconds @ 16 fps) |
| fps | Frame rate | 16 |
| resolution | Output resolution | 640×640 (lightweight) / 1280×720 (standard) |
| steps | Sampling steps | 4 (with the 4-step fast LoRA) / 20 (standard) |
| prompt | Include movement instructions | “the woman turns her head slowly to the left”, etc. |

The key to writing prompts is to describe “movement,” not just a static scene. Examples include “a gentle breeze,” “ripples on the water,” or “a person walking.”
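The frame count and clip length are linked by a simple formula. A small sketch, assuming the common convention that a clip holds `fps × seconds + 1` frames (the extra frame being the starting image), which matches the “81 frames for a 5-second clip” figure:

```python
def frame_count(seconds, fps=16):
    """Frames needed for a clip of the given length.

    Assumption: fps * seconds + 1 frames, where the +1 is the start
    frame -- consistent with 81 frames for 5 seconds at 16 fps.
    """
    return fps * seconds + 1

print(frame_count(5))       # 81
print(frame_count(10))      # 161 -- roughly the practical upper limit
```

Doubling the duration roughly doubles the frame count, which is one reason long generations fail more often: there is simply twice as much spatiotemporal volume to keep coherent.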

6. Limitations of Video Generation (as of May 2026)

Limits that students should experience firsthand during class:

  • Duration: A realistic target is 5–10 seconds per generation; anything longer carries a high risk of failure.
  • Facial consistency: Faces may drift to look like different people midway through (Wan 2.2 suffers from this relatively less often).
  • Physics: The representation of gravity, inertia, and contact is still unreliable.
  • Multiple characters: Animating two or more characters at once is prone to failure.
  • Text: Legible text within videos (signs, subtitles) is virtually impossible.
  • Complex movements: Dance and combat scenes remain unstable.

While these issues have been partially addressed in top-of-the-line commercial models such as Runway Gen-4 and Sora 2, none of them are perfect. The design intent of this course is to ensure that students understand these limitations before proceeding to the Runway exercises.

7. Credit Usage

What can be done within the time available for class (based on a budget of 400 credits per student per month):

| Operation | Cost | Monthly Limit |
| --- | --- | --- |
| Wan 2.2 5-second video (standard) | approx. 11 cr | approx. 35 |
| LTX-2.3 5-second video | approx. 10–15 cr | approx. 25–40 |
| Wan 2.2 Animate (character replacement) | approx. 15–25 cr | approx. 16–26 |

When designing lessons, limit the number of videos to “one or two per student.” Conduct trial and error at the image stage, and move on to video production only once you are confident in the results.
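The monthly limits above follow directly from dividing the budget by the per-video cost. A quick sketch (the midpoint costs for the ranged entries are assumptions for illustration):

```python
import math

MONTHLY_BUDGET = 400  # credits per student per month

# Approximate per-video costs; midpoints assumed where the table
# gives a range (LTX: 10-15 cr, Animate: 15-25 cr).
costs = {
    "Wan 2.2 5-second video": 11.4,
    "LTX-2.3 5-second video": 12.5,
    "Wan 2.2 Animate": 20.0,
}

for name, cost in costs.items():
    videos = math.floor(MONTHLY_BUDGET / cost)
    print(f"{name}: ~{videos} videos per month")
# Wan 2.2 5-second video: ~35 videos per month
```

Planning with the upper end of each cost range is safer than the midpoint, since failed or discarded generations still consume credits.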

8. Tips for Writing Prompts

8.1 Describing Movement

  • Bad example: “a beautiful landscape” (describes only a still scene)
  • Good example: “a beautiful landscape, with a gentle breeze blowing through the grass and clouds drifting slowly”

8.2 Specifying Camera Movements

  • “static camera” — locked off, no camera movement
  • “slow zoom in” — gradual push toward the subject
  • “panning right” — the camera rotates to the right
  • “dolly forward” — the camera moves forward through the scene

The choice of camera movement can drastically change the look and feel of a clip.

8.3 Narrowing the Focus

Limiting the moving elements to a single one creates a sense of stability.

  • Instead of “the entire character moving,” have “only the character’s face move.”
  • Instead of “a scene where everything moves,” have “a scene where only the water’s surface ripples.”

9. Exercises (for Class Use)

Exercise A: Animating a Still Image

  • Select one of your images generated in C-1 or A-1
  • Load it into the Wan 2.2 i2v workflow
  • Prompt: Movement appropriate for the scene (wind, water, facial expressions, etc.)
  • Generate for 5 seconds and observe the results

Exercise B: The Effect of Prompts

  • Using the same static image as a starting point, vary only the motion prompts
  • “static, no movement” / “slow zoom in” / “gentle wind, leaves swaying”
  • Compare the differences in the results
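A controlled comparison like Exercise B means varying exactly one thing. A minimal sketch of the batch plan (the filename, field names, and fixed seed are hypothetical, not an actual Comfy Cloud API):

```python
base_image = "portrait_01.png"  # hypothetical file from Exercise A
motion_prompts = [
    "static, no movement",
    "slow zoom in",
    "gentle wind, leaves swaying",
]

# One job per prompt; image and seed stay fixed so any difference in
# the output comes only from the motion prompt.
jobs = [{"image": base_image, "prompt": p, "seed": 42}
        for p in motion_prompts]

for job in jobs:
    print(job["prompt"])
```

Fixing the seed (where the workflow exposes one) is what makes the side-by-side comparison meaningful.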

Exercise C: Finding the Limits

  • Intentionally try difficult movements
  • Examples: “a person running,” “two people having a conversation,” “text being displayed,” etc.
  • Observe where things break down → Understand the limits

This is an important experience before moving on to the Runway exercise.

10. Saving Videos

  • After the workflow runs, a video player will appear in the Save Video node
  • Click the three-dot menu in the upper-right corner of the player → select Download to save the MP4 file to your computer
  • You can also redownload past videos from the History panel

11. What’s Next

  • Prompt Play — Experiment with fun prompts
  • Algorithm Exposure — Experiments that peek inside the system
  • To Runway — An introduction to the world of video-generating AI and a bridge to Runway exercises