Stability AI just released a new SD-XL Inpainting 0.1 model. Here is how to use it with ComfyUI.

Installing SDXL-Inpainting

  1. Per the ComfyUI Blog, the latest update adds “Support for SDXL inpaint models”. With the Windows portable version, updating involves running the batch file update_comfyui.bat in the update folder.
  2. Go to the stable-diffusion-xl-1.0-inpainting-0.1/unet folder on Hugging Face,
  3. and download diffusion_pytorch_model.fp16.safetensors or diffusion_pytorch_model.safetensors. I use the former, renamed to diffusers_sdxl_inpaint_0.1.safetensors, because it is 5.14 GB compared to 10.3 GB for the latter! (The download can also be scripted; see the sketch after this list.)
  4. Place it in the ComfyUI models\unet folder.
  5. Re-start ComfyUI.
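
The manual steps above can also be scripted. Here is a minimal sketch using the huggingface_hub library; it assumes the weights live in the diffusers/stable-diffusion-xl-1.0-inpainting-0.1 repository on Hugging Face and that ComfyUI sits at the hypothetical path in COMFYUI_DIR, so adjust both for your setup.

```python
# Minimal sketch: fetch the fp16 UNet and copy it into ComfyUI's models/unet folder.
# Assumptions: the diffusers/stable-diffusion-xl-1.0-inpainting-0.1 repo id and the
# COMFYUI_DIR install path below are placeholders for your own environment.
import shutil
from pathlib import Path

from huggingface_hub import hf_hub_download  # pip install huggingface_hub

COMFYUI_DIR = Path(r"C:\ComfyUI_windows_portable\ComfyUI")  # hypothetical install path

# Download the ~5.14 GB fp16 UNet weights from the model's unet/ subfolder.
downloaded = hf_hub_download(
    repo_id="diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    filename="diffusion_pytorch_model.fp16.safetensors",
    subfolder="unet",
)

# Copy it under the renamed filename used in this post, then restart ComfyUI.
target = COMFYUI_DIR / "models" / "unet" / "diffusers_sdxl_inpaint_0.1.safetensors"
target.parent.mkdir(parents=True, exist_ok=True)
shutil.copyfile(downloaded, target)
print(f"Copied to {target}")
```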

SDXL Base In-painting

Here is a simplified version of the workflow I previously described using CLIPSeg with the SDXL model, this time using the SDXL base model for in-painting:

ComfyUI workflow using SDXL Base Model for In-painting

  • I used a LoadImage node to mask the image (right-click and Open In Mask Editor). The image is then...
  • wired to a GetImageSize node (from stability-ComfyUI-nodes), just for convenience, so I do not have to determine the source image dimensions manually,
  • and wired to a VAEEncodeForInpaint node, used in place of an EmptyLatentImage node. I increased grow_mask_by to avoid visible “edges” around the masked area.
  • Then the latent output from the VAEEncodeForInpaint node is wired up to the KSampler node.
  • The rest of the workflow is the typical SDXL workflow... I merely placed the final generated image beside the source image for easy comparison. (A code sketch of this node graph follows.)
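
For reference, here is a rough sketch of that graph in ComfyUI's API (prompt JSON) format, submitted to a locally running instance. It is a simplification of the screenshot: it uses the plain CLIPTextEncode node rather than the SDXL-specific prompt nodes, omits the GetImageSize convenience node, and the filenames, prompts, seed and sampler settings are placeholders, so treat the exact input names as approximate rather than authoritative.

```python
# Rough sketch of the SDXL base in-painting graph, posted to ComfyUI's /prompt endpoint.
# Node wiring mirrors the list above; filenames, prompts and sampler settings are placeholders.
import json
import urllib.request

graph = {
    # SDXL base checkpoint: outputs MODEL (0), CLIP (1), VAE (2).
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},
    # Source image: outputs IMAGE (0) and MASK (1) (the mask comes from the alpha channel).
    "2": {"class_type": "LoadImage", "inputs": {"image": "source.png"}},
    # Positive and negative prompts (plain CLIPTextEncode for simplicity).
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "an old man, a girl with red hair, dogs in the background",
                     "clip": ["1", 1]}},
    "4": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["1", 1]}},
    # VAEEncodeForInpaint replaces EmptyLatentImage; grow_mask_by softens the mask edge.
    "5": {"class_type": "VAEEncodeForInpaint",
          "inputs": {"pixels": ["2", 0], "vae": ["1", 2], "mask": ["2", 1],
                     "grow_mask_by": 16}},
    # KSampler takes the in-paint latent instead of an empty latent.
    "6": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["3", 0], "negative": ["4", 0],
                     "latent_image": ["5", 0], "seed": 42, "steps": 25, "cfg": 8.0,
                     "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
    "7": {"class_type": "VAEDecode", "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
    "8": {"class_type": "SaveImage",
          "inputs": {"images": ["7", 0], "filename_prefix": "sdxl_base_inpaint"}},
}

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": graph}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())
```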

SDXL-Inpainting

Here is the workflow, based on the example in the aforementioned ComfyUI blog.

ComfyUI workflow using the new SDXL Inpainting 0.1 Model

  • The UNETLoader node is used to load the diffusion_pytorch_model.fp16.safetensors file (renamed earlier to diffusers_sdxl_inpaint_0.1.safetensors),
  • and its model output is wired up to the KSampler node instead of the model output from the CheckpointLoaderSimple node used previously (see the sketch below).
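
In the same sketch format, the change amounts to adding a UNETLoader node and rerouting the KSampler's model input to it. The class_type and unet_name key below are my best recollection of the built-in loader node and may differ slightly across ComfyUI versions.

```python
# Sketch of the delta: swap the model source in the graph above for the in-paint UNet.
# The CLIP and VAE still come from the SDXL base checkpoint (node "1").
graph["9"] = {
    "class_type": "UNETLoader",
    # The file downloaded and renamed earlier, now sitting in models/unet.
    "inputs": {"unet_name": "diffusers_sdxl_inpaint_0.1.safetensors"},
}
# Point the KSampler at the in-painting UNet instead of the checkpoint's model output.
graph["6"]["inputs"]["model"] = ["9", 0]
```

Everything else (prompts, VAEEncodeForInpaint and the VAE decode) stays wired to the base checkpoint exactly as before.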

In this case, this model certainly produced a better image. Of course, I cherry-picked the best!

Summary

I only used this one image, so I draw no conclusions. Is the output always better with this model? The model is very large, and the SDXL Base model can also be used for in-painting.

Just a few words about what I noticed:

  • I did not use any words to describe the style of the image, and I left the second clip_l prompt blank.
  • Both models did a good job of picking up the art style and generating something that fit properly.
  • I originally just described the masked area, e.g. an old man..., but I found that describing the whole image worked better here. Hence my prompt also includes words about a girl with red hair... dogs in the background, etc. I have no idea which approach is better in general.
  • But the models both seem to prefer colors present in the rest of the image... I can’t get the old man in a jacket of a different color.