Crazy fast image generation with LCM LoRA for SDXL

Stable Diffusion keeps improving at an astounding pace! This time, it’s the idea of distilling a model into a Latent Consistency Model (LCM) for very, very fast image generation with a quality trade-off. On 24 Oct 2023, the distilled Segmind Stable Diffusion 1B (SSD-1B) model was released, followed by a better implementation in the form of Latent Consistency LoRAs for SDXL and SSD-1B, released on 9 Nov 2023.

Explanations by way of the links above:

“Latent Consistency Models (LCM) are a way to decrease the number of steps required to generate an image with Stable Diffusion (or SDXL) by distilling the original model into another version that requires fewer steps (4 to 8 instead of the original 25 to 50). Distillation is a type of training procedure that attempts to replicate the outputs from a source model using a new one.”

“The Segmind Stable Diffusion Model (SSD-1B) is a distilled 50% smaller version of the Stable Diffusion XL (SDXL), offering a 60% speedup while maintaining high-quality text-to-image generation capabilities.”

Installing

First off, make sure you update to the latest ComfyUI. See the sample workflow posted in response to an issue raised a couple of days ago.

If you, like me, already have SDXL Base 1.0, then just download and apply the LCM LoRA for SDXL, which “can then be applied to any fine-tuned version of the model without having to distil them separately.”

Alternatively, the full LCM SDXL model can be used to avoid the two steps of first loading SDXL and then loading the LoRA.

Similarly, you can download segmind/SSD-1B and then apply the LCM LoRA for SSD-1B, or you can just download the full LCM SSD-1B model.

ComfyUI workflow

As usual, I only regurgitate what I have read elsewhere:

The ideal number of inference steps is between 2 and 8.
LCMs must be used with the lcm sampler and the sgm_uniform scheduler.
The classifier-free guidance scale (cfg) must be between 1 (which ignores the negative prompt) and 2.
The full LCM model is loaded using a UNETLoader node instead of the usual CheckpointLoaderSimple node.
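To keep those rules in one place, here is a minimal Python sketch that checks a set of sampler settings against them. The field names (steps, cfg, sampler_name, scheduler) mirror the inputs of ComfyUI's KSampler node. The check_lcm_settings helper and the "typical SDXL" values are my own illustration, not part of ComfyUI.

```python
# Rules of thumb for LCM sampling, taken from the list above.
LCM_SAMPLER = "lcm"
LCM_SCHEDULER = "sgm_uniform"


def check_lcm_settings(inputs):
    """Hypothetical helper: return a list of problems (empty if all rules hold)."""
    problems = []
    if not 2 <= inputs["steps"] <= 8:
        problems.append("steps should be between 2 and 8")
    if not 1.0 <= inputs["cfg"] <= 2.0:
        problems.append("cfg should be between 1 and 2")
    if inputs["sampler_name"] != LCM_SAMPLER:
        problems.append("sampler_name should be 'lcm'")
    if inputs["scheduler"] != LCM_SCHEDULER:
        problems.append("scheduler should be 'sgm_uniform'")
    return problems


# Settings in the spirit of the 3-step LCM runs described in this post.
lcm_inputs = {"steps": 3, "cfg": 1.0, "sampler_name": "lcm", "scheduler": "sgm_uniform"}

# Illustrative non-LCM settings (assumed values, not the exact ones from my flow).
sdxl_inputs = {"steps": 20, "cfg": 8.0, "sampler_name": "euler", "scheduler": "normal"}

print(check_lcm_settings(lcm_inputs))
print(check_lcm_settings(sdxl_inputs))
```

The first call should report no problems, while the second flags every rule, which is a quick reminder that a regular SDXL sampler setup cannot simply be reused with an LCM.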

Here is a flow I used to quickly test the speed of the three models:

using the base SDXL model, a typical 20-step flow used by many of my previous posts
using the base SDXL model with the LCM LoRA, with only 3 steps
using the LCM derived from SSD-1B model, also with 3 steps (without the need to load a LoRA)

On my GeForce 2060, SDXL base takes just under a minute for a batch size of 2, the LCM LoRA is 6.5x faster at about 7 seconds, and the last is the fastest, though only by a second. Amazing.
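For anyone who wants to redo the arithmetic with their own numbers, a small sketch follows. The timings are rounded and partly assumed: the base figure is back-computed from 6.5 x 7 seconds, and the SSD-1B figure is simply "a second faster". The speedup function is a hypothetical helper.

```python
BATCH_SIZE = 2  # images generated per run


def speedup(baseline_s, candidate_s):
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


# Wall-clock seconds per batch; rounded, partly assumed values (see text).
timings_s = {
    "SDXL base, 20 steps": 45.5,      # assumed: 6.5 x 7 s
    "SDXL + LCM LoRA, 3 steps": 7.0,  # "about 7 seconds"
    "LCM SSD-1B, 3 steps": 6.0,       # "only by a second"
}

baseline = timings_s["SDXL base, 20 steps"]
for name, seconds in timings_s.items():
    per_image = seconds / BATCH_SIZE
    print(f"{name}: {per_image:.2f} s per image, {speedup(baseline, seconds):.1f}x vs base")
```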

Bit of a warning: my flow above often crashes or hangs Windows. I do not know why, but what I ultimately did was to run the first two models together and run the SSD-1B model separately. To disable a node, set its Mode to Never.
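If you would rather flip the mode in a saved workflow file than click through the nodes, something like the sketch below should work. It rests on my understanding that a workflow saved from the ComfyUI interface stores each node with a numeric "mode" field, where 0 means Always and 2 means Never. Treat those values, the set_mode helper and the tiny sample workflow as assumptions to check against your own file.

```python
import json

MODE_ALWAYS = 0  # assumed value for Mode = Always
MODE_NEVER = 2   # assumed value for Mode = Never


def set_mode(workflow, node_ids, mode):
    """Hypothetical helper: set the mode of the given node ids, return how many changed."""
    changed = 0
    for node in workflow.get("nodes", []):
        if node.get("id") in node_ids:
            node["mode"] = mode
            changed += 1
    return changed


# A made-up, heavily trimmed workflow with two branches.
workflow = {
    "nodes": [
        {"id": 1, "type": "CheckpointLoaderSimple", "mode": MODE_ALWAYS},
        {"id": 2, "type": "KSampler", "mode": MODE_ALWAYS},
        {"id": 3, "type": "UNETLoader", "mode": MODE_ALWAYS},
        {"id": 4, "type": "KSampler", "mode": MODE_ALWAYS},
    ]
}

# Switch off the SSD-1B branch (node ids 3 and 4 in this made-up example).
set_mode(workflow, {3, 4}, MODE_NEVER)
print(json.dumps(workflow, indent=2))
```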
