ControlNet

Updated: 2026-05

1. What You’ll Learn on This Page

The prompts we’ve looked at so far were ways to specify “what” to draw. ControlNet is a system that uses a separate image to specify “how” or “where” to draw.

  • Specify “medieval knight” in the prompt
  • Specify “jumping pose” in ControlNet’s Pose
  • Result: A medieval knight is drawn in the specified pose

This enables precise control over composition, posture, and perspective—something that is difficult to achieve with prompts alone.

2. Types of ControlNet

Comfy Cloud comes with over 20 pre-installed ControlNet models. Commonly used ones:

Type     | Input Image             | Purpose
OpenPose | Stick-figure skeleton   | Specifying character poses
Depth    | Depth map               | Specifying depth structure
Canny    | Edge-detected line art  | Specifying shape outlines
Scribble | Hand-drawn rough sketch | Specifying rough shapes
Lineart  | Line art                | Specifying comic-style outlines
Normal   | Normal map              | Specifying 3D surface orientation
Tile     | The image itself        | Increasing resolution, adding details

The tools that students tend to try first are OpenPose, Depth, and Scribble.

3. How It Works (Intuition)

In a typical workflow:

Prompt → CLIP → Diffusion process → Image

When ControlNet is added:

Prompt → CLIP ──────────────┐
                            ├→ Diffusion process → Image
Control image → ControlNet ─┘

ControlNet nodes analyze control images (such as pose images and depth images) and send additional instructions to the diffusion process, telling it to “create a structure exactly like this.” The prompt handles the content, while ControlNet handles the structure.
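As a very rough illustration of that split, the toy Python sketch below (not the real architecture; plain tensors standing in for real feature maps) adds structure features derived from a control image to prompt-driven features as a residual scaled by a strength value:

  import torch

  # Toy stand-ins: in the real model these are feature maps inside the denoiser,
  # and the control features come from ControlNet's own encoder.
  prompt_features  = torch.randn(1, 64)   # "what" to draw (from the prompt)
  control_features = torch.randn(1, 64)   # "how/where" to draw (from the control image)

  strength = 0.8
  # ControlNet-style injection: structure is added as a scaled residual,
  # so the prompt still drives content while the control image steers layout.
  combined = prompt_features + strength * control_features
  print(combined.shape)  # torch.Size([1, 64])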

4. Comfy Cloud’s ControlNet Templates

The “Introduction” section of the Comfy Cloud templates includes a ControlNet introductory workflow titled “2.2 Creator - Diffusion Guidance”. This workflow uses a Z Image Turbo + ControlNet configuration and is ideal for beginners looking to get started with the technology.

ControlNet is also integrated into Qwen Image Edit 2509 in the Popular category.

5. Basic Workflow (Example using OpenPose)

This builds on the minimal workflow by adding an image input, much like an advanced version of img2img. Differences from the minimal workflow:

  1. Add a Load Image node (for the pose image)
  2. Add an OpenPose Preprocessor node (to extract the pose from a photo; not needed if you already have a stick-figure image)
  3. Add a Load ControlNet Model node (select an OpenPose model)
  4. Add an Apply ControlNet node
  5. Connect the output of the Apply ControlNet node to the positive input of the KSampler

Key parameters on the Apply ControlNet node:

  • strength: Control strength (0.0–1.5; typically 0.7–1.0)
  • start_percent / end_percent: The percentage range of the diffusion process during which ControlNet is active (typically 0.0–1.0 to cover the entire process)
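For reference, the same idea can be sketched in Python outside the Comfy Cloud node UI using the Hugging Face diffusers library. The model IDs, file names, prompt, and values below are illustrative assumptions; controlnet_conditioning_scale and control_guidance_start/end play roughly the roles of strength and start_percent/end_percent:

  import torch
  from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
  from diffusers.utils import load_image

  # Attach an OpenPose ControlNet to an SD 1.5 checkpoint
  controlnet = ControlNetModel.from_pretrained(
      "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
  )
  pipe = StableDiffusionControlNetPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
  ).to("cuda")

  pose_image = load_image("pose_stick_figure.png")  # hypothetical stick-figure image

  image = pipe(
      prompt="a medieval knight in full armor, jumping pose",
      image=pose_image,
      num_inference_steps=25,
      controlnet_conditioning_scale=0.8,  # ~ strength
      control_guidance_start=0.0,         # ~ start_percent
      control_guidance_end=1.0,           # ~ end_percent
  ).images[0]
  image.save("knight.png")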

6. Pose Extraction Workflow

Use this workflow when you want to extract a pose automatically from a photo of a real person.

  1. Load any portrait (whether it’s a selfie or stock imagery)
  2. Pass it through the OpenPose Preprocessor node
  3. The result is a stick figure image (with keypoints for the face, hands, and feet connected by lines)
  4. Send that stick figure image to the ControlNet application node
  5. Specify the “character you want to draw” in the prompt and run it

Now you can “draw another person using your own pose.”
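The pose-extraction step on its own can be sketched in Python with the controlnet_aux package (the annotator repository and file names are assumptions):

  from controlnet_aux import OpenposeDetector
  from diffusers.utils import load_image

  # Download the OpenPose annotator weights and detect the pose in a portrait photo
  openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
  portrait = load_image("my_selfie.jpg")        # hypothetical input photo
  pose_image = openpose(portrait)               # returns the stick-figure keypoint image
  pose_image.save("pose_stick_figure.png")      # feed this into the ControlNet workflow above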

7. When to Use Depth

OpenPose is designed almost exclusively for people. If you want to specify the structure of landscapes or objects, Depth is a useful tool.

  1. Load any photo (of a building, interior, landscape, etc.)
  2. Run it through a depth preprocessor (such as MiDaS)
  3. A depth map is generated (grayscale, with the foreground in white and the background in black)
  4. Apply that depth map using ControlNet
  5. Specify “a different world with the same depth structure” in the prompt

For example, you can keep the layout of the same room while changing it from a Scandinavian style to a traditional Japanese architectural style.
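A Python sketch of this depth branch, again with controlnet_aux and diffusers (model IDs, file names, prompt, and values are assumptions):

  import torch
  from controlnet_aux import MidasDetector
  from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
  from diffusers.utils import load_image

  # 1. Estimate a depth map from any photo with MiDaS
  midas = MidasDetector.from_pretrained("lllyasviel/Annotators")
  room_photo = load_image("scandinavian_room.jpg")  # hypothetical input photo
  depth_map = midas(room_photo)

  # 2. Re-render the same depth structure under a new prompt
  controlnet = ControlNetModel.from_pretrained(
      "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
  )
  pipe = StableDiffusionControlNetPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
  ).to("cuda")

  image = pipe(
      prompt="a traditional Japanese tatami room, wooden beams, shoji screens",
      image=depth_map,
      controlnet_conditioning_scale=0.9,
  ).images[0]
  image.save("japanese_room.png")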

8. Stacking Multiple ControlNets

ControlNet nodes can be daisy-chained.

Example: Applying OpenPose (human pose) and Depth (background depth) simultaneously

  • Prompt: “A warrior holding a sword in a medieval market”
  • ControlNet 1: OpenPose (extracted from a photo of a person’s pose)
  • ControlNet 2: Depth (extracted from a photo of a scene resembling a medieval market)

An image that satisfies both constraints is produced. The relative influence of each is adjusted by balancing their strength values.
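In diffusers, stacking can be sketched by passing lists of ControlNets, control images, and scales; in Comfy Cloud you would instead chain two Apply ControlNet nodes. Model IDs, file names, and values are assumptions:

  import torch
  from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
  from diffusers.utils import load_image

  # Two ControlNets: one constrains the pose, the other the scene's depth
  controlnets = [
      ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16),
      ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16),
  ]
  pipe = StableDiffusionControlNetPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", controlnet=controlnets, torch_dtype=torch.float16
  ).to("cuda")

  image = pipe(
      prompt="a warrior holding a sword in a medieval market",
      image=[load_image("warrior_pose.png"), load_image("market_depth.png")],  # hypothetical files
      controlnet_conditioning_scale=[1.0, 0.6],  # pose dominates; depth guides the background
  ).images[0]
  image.save("warrior_market.png")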

9. Estimated Credit Usage

The computational load increases with each additional layer of ControlNet. This is an estimate, not an actual measurement:

  • Standard T2I: 0.3–0.5 cr
  • 1 ControlNet: 1–3 cr per image
  • 2 ControlNets: 2–5 cr per image

Limit the number of test runs during class. Focus on generating the final output using the settings you like best.

10. Exercises (for Class Use)

Exercise A: Pose Control with OpenPose

  • Prepare one selfie (or a photo of a famous pose)
  • Convert it into a stick figure using the OpenPose preprocessor
  • Prompt: “a samurai warrior in traditional armor”
  • Transform yourself into a samurai using your own pose

Exercise B: Generating Images from Rough Sketches Using Scribble

  • Draw a rough sketch on an iPad or paper (e.g., building silhouettes, character outlines)
  • Take a photo with your smartphone and upload it
  • Scribble preprocessor → ControlNet
  • Use a prompt to “bring the sketch to life”

Exercise C: Multiple Characters in the Same Pose

  • Generate 5–6 images from a single pose image by varying the prompts
  • Examples: “Ninja,” “Astronaut,” “Medieval Knight,” “Modern Office Worker,” etc.
  • Experience the sensation of different personas emerging from the same pose

11. Important Notes

  • Retains the style of the training data: Since ControlNet inherits the characteristics of the training data, it may break down with extreme compositions
  • Don’t set strength too high: If you push it well above 1.0, the prompt loses most of its influence and the output tends to simply trace the control image
  • Model compatibility: There are separate ControlNet models for SD 1.5, SDXL, and Flux. Check that the model selected in the Load ControlNet Model node matches the checkpoint you are currently using

12. What’s Next

  • LoRA — Additional layers trained on specific art styles or characters
  • Image to Video — Converting still images into videos
  • Algorithm Exposure — Experiments that provide insight into the model’s inner workings, such as CFG extremization and latent space interpolation