Less than a week after my post testing diffusers/controlnet-canny-sdxl-1.0, along come Stability AI’s own ControlNets, which they call Control-LoRAs! Not one but four of them - Canny, Depth, Recolor and Sketch models!
Get caught up:
Part 1: Stable Diffusion SDXL 1.0 with ComfyUI
Part 2: SDXL with Offset Example LoRA in ComfyUI for Windows
Part 3: CLIPSeg with SDXL in ComfyUI
Part 4: Two Text Prompts (Text Encoders) in SDXL 1.0
Part 5: Scale and Composite Latents with SDXL
Part 6: SDXL 1.0 with SDXL-ControlNet: Canny
Part 7: Fooocus KSampler Custom Node for ComfyUI SDXL
Part 8: SDXL 1.0 with SDXL-ControlNet: OpenPose (v2)
Part 9: This post!
Installing
Download the Rank 128 or Rank 256 (2x larger) Control-LoRAs from HuggingFace and place them in a new sub-folder models\controlnet\control-lora.
The Rank 128 models are impressively small, under 396 MB each for all four. Compare that to the diffusers’ controlnet-canny-sdxl-1.0, which comes in at 2.5 GB (fp16) or 5 GB (fp32)!
Stability AI also released stability-ComfyUI-nodes, which is installed in custom_nodes. It contains:
- a ColorBlend node which is required for the Recolor workflow.
- a GetImageSize node which sounds useful... if ComfyUI used the same primitive type for image dimensions.
- a ControlLoraSave to “create a Control Lora from a model and a controlnet.”
Experiment with Depth Map
The ControlNet control-lora-depth model will apply depth based on a reference depth map image. The example given actually uses a custom node to generate a depth map, but... being too lazy to create a depth map, I just downloaded the shark example from ComfyUI.
I wired it up as follows:
- Simply use a ControlNetApply node on the conditioning and image.
- Using a high strength of 1 results in a grid-like artefact, but as you can see, a strength of 0.4 works well for me.
Stability AI has a Portrait Depth Estimation API on Clipdrop for creating a depth map from a reference image.
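Conceptually, the strength value scales how strongly the control signal steers the UNet: 1.0 applies the full control signal, 0.0 disables it, and values in between soften the guidance. A minimal numpy sketch of that idea (my illustration only - apply_control is a hypothetical name, not ComfyUI’s actual implementation):

```python
import numpy as np

def apply_control(unet_features: np.ndarray,
                  control_residual: np.ndarray,
                  strength: float) -> np.ndarray:
    """Sketch: ControlNet guidance acts as a residual added to the
    UNet's feature maps, and strength scales that residual."""
    return unet_features + strength * control_residual

features = np.ones((2, 2))            # pretend UNet features
residual = np.full((2, 2), 4.0)       # pretend ControlNet output

full = apply_control(features, residual, 1.0)   # full guidance
mild = apply_control(features, residual, 0.4)   # the value that worked for me
```

At strength 0.4 the control residual contributes only 40% of its full magnitude, which matches the softer, artefact-free guidance I saw.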
Experiment with Canny Edge
The ControlNet control-lora-canny model will follow the edges of a reference image.
The workflow is very similar to the one I created for SDXL 1.0 with SDXL-ControlNet: Canny:
- Compared to the workflow above, the reference image is passed through a Canny node to detect the edges.
- I am using a ControlNetApplyAdvanced node, but the standard ControlNetApply node works as well.
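For intuition on what the Canny preprocessing step produces, here is a rough numpy sketch of gradient-based edge detection. A real Canny detector (like ComfyUI’s Canny node) adds Gaussian smoothing, non-maximum suppression, and hysteresis thresholding on top of this; sobel_edges is just my illustrative name:

```python
import numpy as np

def sobel_edges(img: np.ndarray, threshold: float) -> np.ndarray:
    """Mark pixels whose Sobel gradient magnitude exceeds a threshold.
    This is only the first stage of a Canny-style edge detector."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T  # Sobel kernel for the vertical direction
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2]
            gx[y, x] = np.sum(patch * kx)
            gy[y, x] = np.sum(patch * ky)
    mag = np.hypot(gx, gy)  # gradient magnitude
    return (mag > threshold).astype(np.uint8)

# A vertical step edge: left half dark, right half bright.
img = np.zeros((5, 5))
img[:, 3:] = 1.0
edges = sobel_edges(img, threshold=1.0)  # marks pixels along the step
```

The binary edge map this produces is what the ControlNet then uses as its conditioning image.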
Experiment with Recolor
The ControlNet control-lora-recolor model will re-color a black and white photo.
This workflow is more complicated and requires the ColorBlend node from the Stability AI Custom Nodes. Based on the Stability AI example, the main things to note are:
- The input latent_image to the KSampler node is a VAEEncode’d version of the black and white image; it is not an empty latent image.
- Therefore, the KSampler must not add noise (disable add_noise).
- I used the same prompt as the example, color photograph, detailed, DSLR, natural skin, highly detailed, but a descriptive prompt works too, e.g. a pretty young woman with pink hair.
- The output image after VAEDecode must pass through a ColorBlend node to merge the original bw_layer and the color_layer.
- Also, the standard ControlNetApply node works as well.
I also tried to re-color a genuine B&W war photo, and it worked pretty well. I won’t post an image of that experiment, as I do not know who owns the copyright.
Experiment with Sketch
The ControlNet control-lora-sketch model will color black and white line art.
I used SDXL to generate line art as the input image and I initially got terrible results. The image does get colored, but it is, how do I explain... blotchy? I have to experiment more:
- If I stop ControlNetApplyAdvanced’s end_percent at 0.5, I do not get good results; increasing it to 0.7 was better.
- I am also constantly getting RuntimeError: Tensor on device cuda:0 is not on the expected device meta! after each run (I am using the Rank 128 model).
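As I understand it, start_percent and end_percent limit the fraction of the sampling schedule during which the ControlNet is applied. A small sketch of that interpretation (assuming a simple linear mapping of percent onto steps - not ComfyUI’s exact scheduling code):

```python
def control_active_steps(total_steps: int,
                         start_percent: float,
                         end_percent: float) -> list:
    """Return the 0-indexed sampler steps on which ControlNet guidance
    is applied, assuming percent maps linearly onto the step schedule."""
    return [s for s in range(total_steps)
            if start_percent <= s / total_steps < end_percent]

# With 20 steps, end_percent=0.5 guides only the first half of sampling...
first_half = control_active_steps(20, 0.0, 0.5)
# ...while 0.7 keeps guiding further into the schedule, which worked better.
longer = control_active_steps(20, 0.0, 0.7)
```

Stopping the guidance too early leaves the sampler free to drift away from the line art in the remaining steps, which may explain the blotchy results at 0.5.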
Experiment with Revision?
Stability AI also describes Revision as a “novel approach of using images to prompt SDXL” (image mixing). Get the revision clip_vision_g.safetensors and chuck it in the models\clip_vision folder. I can appreciate the concept - provide sample images and have SDXL generate new images that are somehow... similar.
But after trying it numerous times, I have no idea what this model is trying to do! I seem to get a batch of random images that are nothing like either source image, sometimes filled with cars, despite both my source images being pictures of people. Weird. This will take me more time to figure out...
Update on 20 Aug 23: A day later, I stand corrected - read on to discover Revision along with me!