ControlNet

Updated: 2026-05

1. What You’ll Learn on This Page

The prompts we’ve looked at so far were ways to specify “what” to draw. ControlNet is a system that uses a separate image to specify “how” or “where” to draw.

  • Specify “medieval knight” in the prompt
  • Specify “jumping pose” in ControlNet’s Pose
  • Result: A medieval knight is drawn in the specified pose

This enables precise control over composition, posture, and perspective—something that is difficult to achieve with prompts alone.

2. Types of ControlNet

Comfy Cloud comes with over 20 pre-installed ControlNet models. Commonly used ones:

Type     | Input Image             | Purpose
OpenPose | Stick-figure skeleton   | Specifying character poses
Depth    | Depth map               | Specifying depth structure
Canny    | Edge-detected line art  | Specifying shape outlines
Scribble | Hand-drawn rough sketch | Specifying rough shapes
Lineart  | Line art                | Specifying comic-style outlines
Normal   | Normal map              | Specifying 3D surface orientation
Tile     | The image itself        | Increasing resolution, adding details

The tools that students tend to try first are OpenPose, Depth, and Scribble.

3. How It Works (Intuition)

In a typical workflow:

Prompt → CLIP → Diffusion process → Image

When ControlNet is added:

Prompt → CLIP ──────────────┐
                            ├→ Diffusion process → Image
Control image → ControlNet ─┘

ControlNet nodes analyze control images (such as pose images and depth images) and send additional instructions to the diffusion process, telling it to “create a structure exactly like this.” The prompt handles the content, while ControlNet handles the structure.
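As a very rough illustration of that split, the toy Python sketch below (not the real architecture; plain tensors standing in for real feature maps) adds structure features derived from a control image to prompt-driven features as a residual scaled by a strength value:

  import torch

  # Toy stand-ins: in the real model these are feature maps inside the denoiser,
  # and the control features come from ControlNet's own encoder.
  prompt_features  = torch.randn(1, 64)   # "what" to draw (from the prompt)
  control_features = torch.randn(1, 64)   # "how/where" to draw (from the control image)

  strength = 0.8
  # ControlNet-style injection: structure is added as a scaled residual,
  # so the prompt still drives content while the control image steers layout.
  combined = prompt_features + strength * control_features
  print(combined.shape)  # torch.Size([1, 64])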

4. Comfy Cloud’s ControlNet Templates

The “Introduction” section of the Comfy Cloud templates includes a ControlNet introductory workflow titled “2.2 Creator - Diffusion Guidance”. This workflow uses a Z Image Turbo + ControlNet configuration and is ideal for beginners looking to get started with the technology.

ControlNet is also integrated into Qwen Image Edit 2509 in the Popular category.

5. Basic Workflow (Example using OpenPose)

This builds on the minimal workflow by adding an image input, much like an advanced version of img2img. Differences from the minimal workflow:

  1. Add a Load Image node (for the pose image)
  2. Add an OpenPose Preprocessor node (to extract the pose from a photo; not needed if you already have a stick-figure image)
  3. Add a Load ControlNet Model node (select an OpenPose model)
  4. Add an Apply ControlNet node
  5. Connect the output of the Apply ControlNet node to the positive input of the KSampler

Key parameters on the Apply ControlNet node:

  • strength: Control strength (0.0–1.5; typically 0.7–1.0)
  • start_percent / end_percent: The percentage range of the diffusion process during which ControlNet is active (typically 0.0–1.0 to cover the entire process)
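For reference, the same idea can be sketched in Python outside the Comfy Cloud node UI using the Hugging Face diffusers library. The model IDs, file names, prompt, and values below are illustrative assumptions; controlnet_conditioning_scale and control_guidance_start/end play roughly the roles of strength and start_percent/end_percent:

  import torch
  from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
  from diffusers.utils import load_image

  # Attach an OpenPose ControlNet to an SD 1.5 checkpoint
  controlnet = ControlNetModel.from_pretrained(
      "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
  )
  pipe = StableDiffusionControlNetPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
  ).to("cuda")

  pose_image = load_image("pose_stick_figure.png")  # hypothetical stick-figure image

  image = pipe(
      prompt="a medieval knight in full armor, jumping pose",
      image=pose_image,
      num_inference_steps=25,
      controlnet_conditioning_scale=0.8,  # ~ strength
      control_guidance_start=0.0,         # ~ start_percent
      control_guidance_end=1.0,           # ~ end_percent
  ).images[0]
  image.save("knight.png")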

6. Pose Extraction Workflow

Use this workflow when you want to extract a pose automatically from a photo of a real person.

  1. Load any portrait (whether it’s a selfie or stock imagery)
  2. Pass it through the OpenPose Preprocessor node
  3. The result is a stick figure image (with keypoints for the face, hands, and feet connected by lines)
  4. Send that stick figure image to the ControlNet application node
  5. Specify the “character you want to draw” in the prompt and run it

Now you can “draw another person using your own pose.”
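The pose-extraction step on its own can be sketched in Python with the controlnet_aux package (the annotator repository and file names are assumptions):

  from controlnet_aux import OpenposeDetector
  from diffusers.utils import load_image

  # Download the OpenPose annotator weights and detect the pose in a portrait photo
  openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
  portrait = load_image("my_selfie.jpg")        # hypothetical input photo
  pose_image = openpose(portrait)               # returns the stick-figure keypoint image
  pose_image.save("pose_stick_figure.png")      # feed this into the ControlNet workflow above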

7. When to Use Depth

OpenPose is designed almost exclusively for people. If you want to specify the structure of landscapes or objects, Depth is a useful tool.

  1. Load any photo (of a building, interior, landscape, etc.)
  2. Run it through a depth preprocessor (such as MiDaS)
  3. A depth map is generated (grayscale, with the foreground in white and the background in black)
  4. Apply that depth map using ControlNet
  5. Specify “a different world with the same depth structure” in the prompt

For example, you can keep the layout of the same room while changing it from a Scandinavian style to a traditional Japanese architectural style.
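A Python sketch of this depth branch, again with controlnet_aux and diffusers (model IDs, file names, prompt, and values are assumptions):

  import torch
  from controlnet_aux import MidasDetector
  from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
  from diffusers.utils import load_image

  # 1. Estimate a depth map from any photo with MiDaS
  midas = MidasDetector.from_pretrained("lllyasviel/Annotators")
  room_photo = load_image("scandinavian_room.jpg")  # hypothetical input photo
  depth_map = midas(room_photo)

  # 2. Re-render the same depth structure under a new prompt
  controlnet = ControlNetModel.from_pretrained(
      "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
  )
  pipe = StableDiffusionControlNetPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
  ).to("cuda")

  image = pipe(
      prompt="a traditional Japanese tatami room, wooden beams, shoji screens",
      image=depth_map,
      controlnet_conditioning_scale=0.9,
  ).images[0]
  image.save("japanese_room.png")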

8. Stacking Multiple ControlNets

ControlNet nodes can be daisy-chained.

Example: Applying OpenPose (human pose) and Depth (background depth) simultaneously

  • Prompt: “A warrior holding a sword in a medieval market”
  • ControlNet 1: OpenPose (extracted from a photo of a person’s pose)
  • ControlNet 2: Depth (extracted from a photo of a scene resembling a medieval market)

An image that satisfies both constraints is produced. The relative influence of each is adjusted by balancing their strength values.
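In diffusers, stacking can be sketched by passing lists of ControlNets, control images, and scales; in Comfy Cloud you would instead chain two Apply ControlNet nodes. Model IDs, file names, and values are assumptions:

  import torch
  from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
  from diffusers.utils import load_image

  # Two ControlNets: one constrains the pose, the other the scene's depth
  controlnets = [
      ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16),
      ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16),
  ]
  pipe = StableDiffusionControlNetPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", controlnet=controlnets, torch_dtype=torch.float16
  ).to("cuda")

  image = pipe(
      prompt="a warrior holding a sword in a medieval market",
      image=[load_image("warrior_pose.png"), load_image("market_depth.png")],  # hypothetical files
      controlnet_conditioning_scale=[1.0, 0.6],  # pose dominates; depth guides the background
  ).images[0]
  image.save("warrior_market.png")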

9. Estimated Credit Usage

The computational load increases with each additional layer of ControlNet. This is an estimate, not an actual measurement:

  • Standard T2I: 0.3–0.5 cr
  • 1 ControlNet: 1–3 cr per image
  • 2 ControlNets: 2–5 cr per image

Limit the number of test runs during class. Focus on generating the final output using the settings you like best.

10. Exercises (for Class Use)

Exercise A: Pose Control with OpenPose

  • Prepare one selfie (or a photo of a famous pose)
  • Convert it into a stick figure using the OpenPose preprocessor
  • Prompt: “a samurai warrior in traditional armor”
  • Transform yourself into a samurai using your own pose

Exercise B: Generating Images from Rough Sketches Using Scribble

  • Draw a rough sketch on an iPad or paper (e.g., building silhouettes, character outlines)
  • Take a photo with your smartphone and upload it
  • Scribble preprocessor → ControlNet
  • Use a prompt to “bring the sketch to life”

Exercise C: Multiple Characters in the Same Pose

  • Generate 5–6 images from a single pose image by varying the prompts
  • Examples: “Ninja,” “Astronaut,” “Medieval Knight,” “Modern Office Worker,” etc.
  • Experience the sensation of different personas emerging from the same pose

11. Important Notes

  • Retains the style of the training data: Since ControlNet inherits the characteristics of the training data, it may break down with extreme compositions
  • Don’t set strength too high: If you push it well above 1.0, the prompt loses most of its influence and the output tends to simply trace the control image
  • Model compatibility: There are separate ControlNet models for SD 1.5, SDXL, and Flux. Check that the model selected in the Load ControlNet Model node matches the checkpoint you are currently using

12. What’s Next

  • LoRA — Additional layers trained on specific art styles or characters
  • Image to Video — Converting still images into videos
  • Algorithm Exposure — Experiments that provide insight into the model’s inner workings, such as CFG extremization and latent space interpolation