Minimum Workflow

Updated: 2026-05

1. What You’ll Learn on This Page

  • Configuring the minimal workflow for generating images from text
  • Being able to explain the function of each node in a single sentence
  • Seeing the diffusion model’s internal processing laid out as connections between nodes

The goal here is to get used to working with nodes. We’ll focus on the quality of the generated images in future sessions.

2. Overview of the Workflow

This is the default workflow that appears when you open Comfy Cloud. With this alone, you can generate a single image from text.

There are six main types of nodes, and data flows from left to right.

  1. Load Checkpoint
  2. CLIP Text Encoding × 2 (one for the positive prompt, one for the negative prompt)
  3. Empty Latent Image
  4. K-Sampler
  5. VAE Decode
  6. Save Image

The colors of the lines have specific meanings. They represent the types of data flowing between nodes, such as the model (purple), CLIP (yellow), latent image (pink), and pixel image (blue).
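
Under the hood, this entire graph is just data. As a minimal sketch, assuming ComfyUI's standard API (export) format, the same six nodes look roughly like the Python dict below; every connection is simply a [source node ID, output index] pair, which is exactly what the colored lines on the canvas represent (node IDs and prompt text here are illustrative and abbreviated).

```python
# Simplified sketch of the default workflow in ComfyUI API format
# (node IDs are arbitrary labels; values are abbreviated for readability).
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "v1-5-pruned-emaonly-fp16.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",               # positive prompt
          "inputs": {"text": "purple galaxy bottle", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",               # negative prompt
          "inputs": {"text": "text, watermark", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["4", 0], "seed": 42, "steps": 20, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "ComfyUI"}},
}
```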

3. The Role of Each Node

3.1 Loading a checkpoint

A node that loads the model from disk. A single .safetensors file actually contains three components:

  • Model: The core of the diffusion process (the component that reduces noise)
  • CLIP: A “text interpreter” that converts text into numerical vectors
  • VAE: A “transformer” that moves between the latent space and the image (pixels)

That’s why there are three output pins.

By default, Stable Diffusion 1.5 (v1-5-pruned-emaonly-fp16.safetensors) is loaded. Comfy Cloud comes pre-installed with over 900 models.
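
If you're curious, you can confirm those three components yourself by listing the tensor names stored in the checkpoint file. Below is a minimal sketch using the safetensors library; the key prefixes noted in the comments are the ones typically found in SD 1.5 checkpoints and may differ for other models.

```python
from collections import Counter
from safetensors import safe_open

# Count the tensors in the checkpoint by their top-level name prefix to reveal
# the three bundled components (prefixes typical of SD 1.5; other models differ).
with safe_open("v1-5-pruned-emaonly-fp16.safetensors", framework="pt", device="cpu") as f:
    prefixes = Counter(key.split(".")[0] for key in f.keys())

print(prefixes)
# Roughly expected groups:
#   model.*              -> the diffusion UNet ("Model" output)
#   cond_stage_model.*   -> the text encoder ("CLIP" output)
#   first_stage_model.*  -> the autoencoder ("VAE" output)
```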

3.2 CLIP Text Encoding (Prompt)

A node that converts human-written text into feature vectors that the AI can process internally. The same node type is used twice: once for the positive prompt and once for the negative prompt.

Positive side (example)

Beautiful scenery, nature, glass bottle, landscape, purple galaxy bottle

Negative side (example)

text, watermark

In the negative prompt, enter the elements you don’t want to appear in the image.
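
To make “feature vectors” concrete: SD 1.5’s text encoder is CLIP ViT-L/14, which turns a prompt into one 768-dimensional vector per token. The stand-alone sketch below uses the Hugging Face transformers library purely for illustration; it is not how Comfy Cloud runs the node internally.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# SD 1.5's text encoder is CLIP ViT-L/14; other checkpoints may use different encoders.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

prompt = "Beautiful scenery, nature, glass bottle, landscape, purple galaxy bottle"
tokens = tokenizer(prompt, padding="max_length", max_length=77,
                   truncation=True, return_tensors="pt")

with torch.no_grad():
    features = text_encoder(**tokens).last_hidden_state

# One 768-dimensional vector per token position; this is the conditioning
# that each CLIP Text Encoding node hands to the K-Sampler.
print(features.shape)  # torch.Size([1, 77, 768])
```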

3.3 Empty Latent Image

A node that prepares the “noise foundation” in latent space. Specify the width, height, and batch size.

It’s important to note that the work here is not done in pixel space. The diffusion model performs most of the generation within a highly compressed numerical array known as the “latent space.” Only at the very end is it decoded by the VAE into an image that humans can see.
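
For SD 1.5, that latent space has 4 channels and 1/8 the width and height of the final image, so the “empty” latent is just a small zero-filled tensor. A minimal illustrative sketch:

```python
import torch

def empty_latent(width: int, height: int, batch_size: int = 1) -> torch.Tensor:
    # SD 1.5 latents have 4 channels and 1/8 the spatial resolution of the
    # final image, so 512x512 pixels corresponds to a 4x64x64 latent.
    return torch.zeros([batch_size, 4, height // 8, width // 8])

latent = empty_latent(512, 512)
print(latent.shape)  # torch.Size([1, 4, 64, 64])
```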

3.4 K-Sampler

The heart of the workflow. It executes the diffusion process (a process that gradually reduces noise to bring out the image).

Key Parameters

  • Seed: The seed for the random noise. With the same seed and the same settings, results are reproducible.
  • Control after generation: How the seed is handled on each run (randomize / fixed / increment).
  • Steps: The number of denoising steps. More steps bring out more detail but increase the time and credits required.
  • cfg: How strongly the output follows the prompt. Low values allow more freedom; high values stay closer to the prompt, but values that are too high can make the image break down.
  • Sampler Name: The algorithm used to remove the noise. Choose one that works well with the model, such as euler or dpmpp_2m.
  • Scheduler: The strategy for how much noise is removed at each step.

All of these factors affect the results. On the next page, “Parameters,” we’ll adjust them one by one and compare the results.
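
To see how these parameters relate to one another outside the node graph, here is a rough stand-in using the diffusers library rather than Comfy Cloud itself (the model ID and values are illustrative): num_inference_steps corresponds to Steps, guidance_scale to cfg, and the fixed generator to a fixed Seed.

```python
import torch
from diffusers import StableDiffusionPipeline

# Rough diffusers equivalent of this workflow (model ID and values illustrative).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A fixed seed via a generator plays the same role as a fixed Seed on the K-Sampler.
generator = torch.Generator(device="cuda").manual_seed(42)

image = pipe(
    prompt="Beautiful scenery, nature, glass bottle, landscape, purple galaxy bottle",
    negative_prompt="text, watermark",
    num_inference_steps=20,   # Steps
    guidance_scale=7.0,       # cfg
    generator=generator,      # Seed
).images[0]
image.save("bottle.png")
```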

3.5 VAE Decoding

A node that converts a sequence of numbers in the latent space into images (pixels) that humans can see. VAE = Variational Autoencoder.

During training, data is compressed from “image → latent space,” and during generation, it is expanded from “latent space → image.” Connect the VAE output pin of the Load Checkpoint node to this node’s vae input.
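
Conceptually, decoding is a single call on the VAE: rescale the latent and expand it back to pixels. A rough sketch with diffusers’ AutoencoderKL, assuming latent stands in for a sampler output and using the 0.18215 scaling factor that SD 1.5 applies to its latents:

```python
import torch
from diffusers import AutoencoderKL
from PIL import Image

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")  # an SD 1.5-compatible VAE

latent = torch.randn(1, 4, 64, 64)  # stand-in for the K-Sampler's output latent

with torch.no_grad():
    # SD 1.5 latents are stored scaled by 0.18215, so undo that before decoding.
    pixels = vae.decode(latent / 0.18215).sample  # [1, 3, 512, 512], values roughly in [-1, 1]

# Map from [-1, 1] to 8-bit RGB and save as a normal image file.
img = ((pixels[0] / 2 + 0.5).clamp(0, 1).permute(1, 2, 0).numpy() * 255).astype("uint8")
Image.fromarray(img).save("decoded.png")
```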

3.6 Saving Images

A node that exports the pixel image to a file. The filename_prefix field determines the beginning of the output filename.

In Comfy Cloud, you can download generated images to your computer by right-clicking on them and selecting “Save Image.”
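
The prefix is combined with an automatically incrementing counter, so repeated runs don’t overwrite earlier images. As a purely illustrative sketch of that naming pattern (hypothetical helper; the exact format Comfy Cloud uses may differ):

```python
def output_filename(prefix: str, counter: int) -> str:
    # ComfyUI-style naming: prefix, then a zero-padded counter.
    return f"{prefix}_{counter:05d}_.png"

print(output_filename("ComfyUI", 1))   # ComfyUI_00001_.png
print(output_filename("bottle", 42))   # bottle_00042_.png
```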

4. Run

  1. Click the Run button in the upper-right corner of the screen.
  2. A preview image appears on the K-Sampler node, and the subject gradually becomes visible as the noise is removed.
  3. With SD 1.5, 512×512 resolution, and 20 steps, the process takes about 3 to 5 seconds to complete.

During execution, a progress bar appears at the top of the screen, and the node currently being processed is highlighted with a green border.

5. Estimated Credit Usage

Actual results from the free plan (400 credits/month).

Model           Resolution   Settings            Per Image            Images per 400 Credits
SD 1.5          512×512      20-step Euler       Approx. 0.3–0.5 cr   Approx. 1,000–1,200
Z Image Turbo   1024×1024    Standard Template   Approx. 2 cr         Approx. 200

If you’re using an SD 1.5 base, you can keep changing the seed and re-running generation as many times as you like without straining your budget. Z Image Turbo is better suited to final output.

You can check your credit balance by clicking on your avatar in the upper-right corner of the screen.

6. Give it a try

Quick exercises to help you get used to working with nodes. Try each one and see for yourself what happens when you change a setting.

  • Change the seed: Enter a different value in the K-Sampler seed field → The image will change even if the prompt remains the same
  • Rewrite the prompt: Change the positive side (e.g., purple galaxy bottle → red sunset wine glass)
  • Change the size: Set the width/height of the empty latent image to 768×768 → This increases computational load and credit consumption
  • Change the number of steps: Compare K-Sampler steps at 5, 20, and 40 → 5 gives a grainy image, while 40 produces a detailed one (see the sketch after this list)
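
For the steps comparison, a minimal sketch using the same diffusers stand-in as in section 3.4 (model ID and values illustrative, not Comfy Cloud itself): hold the prompt and seed fixed, vary only the step count, and compare the saved files side by side.

```python
import torch
from diffusers import StableDiffusionPipeline

# diffusers stand-in for the Comfy Cloud workflow; model ID illustrative.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

for steps in (5, 20, 40):
    # Re-create the generator each time so every run starts from the same seed.
    generator = torch.Generator(device="cuda").manual_seed(42)
    image = pipe(
        "Beautiful scenery, nature, glass bottle, landscape, purple galaxy bottle",
        negative_prompt="text, watermark",
        num_inference_steps=steps,
        guidance_scale=7.0,
        generator=generator,
    ).images[0]
    image.save(f"steps_{steps:02d}.png")
```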

7. What’s Next

  • Parameters — Compare the effects of steps, CFG, sampler, and seed side by side
  • Node Philosophy — Why a node-based approach? Differences from the Stable Diffusion Web UI
  • ControlNet — Specify composition and pose using a separate image