I previously tried Thibauld’s SDXL-controlnet: OpenPose (v2) ControlNet in ComfyUI with poses either downloaded from OpenPoses.com or created with OpenPose Editor. Here are a few more options for anyone looking to create custom poses.
The problem with SDXL
Many Stable Diffusion / SDXL images that include a person are either close-up shots or full-body shots in static poses (standing or sitting). Images with multiple people are rare. Additionally, SDXL cannot count, so a person may have multiple hands or feet and, most frequently, a varying number of fingers. In addition, figures further away will often have gnarled faces... something off and disturbing!
I previously tried to create an image with multiple people, as I described in my post on Composite Latents. Then I posted about using OpenPose and Control-LoRAs to control the way people are generated.
Today, I will expand on that, trying a few tools and methods to generate more anatomically “accurate” people in interesting poses - with the correct number of hands, legs and fingers. Fixing faces and fingers post hoc will be a topic for another day though (i.e. fixing them after the image is already generated).
Here are the tools I tried:
- hnmr293’s Posex on GitHub, which is also directly accessible at hnmr293.github.io/posex/ - it is very basic and similar to OpenPose Editor, so I won’t test it further.
- Yu Zhu’s open-pose-editor on GitHub, which is also directly accessible at OpenPoseAI.com.
- PoseMy.Art, which is directly accessible at app.PoseMy.art.
In the process, I discovered a few free 3D modelling tools - while these are not able to generate OpenPose “skeletons”, they may be used to generate Canny Edge maps or depth maps:
- PoseManiacs - this site provides reference pose images for artists, but I decided to test them with the Canny Edge Control-LoRA.
- VRoid Studio - a downloadable application to create 3D anime models.
- MakeHuman - an open-source downloadable 3D modelling application.
I also tried to find other pose-related LoRAs...
- Action SDXL by Zovya, trained to generate “dynamic poses, action camera angles and lots of energy and movement to your images.”
However, it lacks any pre-defined poses, and the only alternative for quick generation is to Set Random Pose under the File menu. Hands and feet are “added” to extremities but can be exported separately as a depth map or canny map.
Alas, I do not get good results trying to merge these with the OpenPose - the depth map has very hard edges that do not lead to smooth transitions to the pose.
- Set the image size in the floating window
- Adjust the camera until the preview looks right
- Click Generate on the toolbar
- Next to the preview window are the generated outputs, click to download them:
- the first is the pose
- the second is the depth map for the hands and feet (which can be toggled off in the Settings)
- the third is a 3D model of the hands and feet
- the fourth is a Canny edge map
Below is a ComfyUI workflow using the pose and the Canny edge map instead. The first ControlNet “understands” the OpenPose data, and the second ControlNet “understands” the Canny map:
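The chaining of the two ControlNets can be sketched in ComfyUI's API (JSON) workflow format. This is only a minimal fragment under my own assumptions, not the workflow exported for this post: the node ids, file names and prompt text are placeholders, and the sampler, negative prompt and VAE decode nodes are left out. The key point is that the conditioning output of the first ControlNetApply feeds the second one.

```python
import json


def build_two_controlnet_fragment(pose_image, canny_image,
                                  pose_model, canny_model,
                                  pose_strength=1.0, canny_strength=1.0):
    """Return an API-format fragment that chains two ControlNetApply nodes.

    All file names passed in are placeholders; use whatever exists in
    your own ComfyUI input/ and models/controlnet/ folders.
    """
    return {
        # Checkpoint outputs: 0 = MODEL, 1 = CLIP, 2 = VAE
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},  # placeholder
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "a dancer on a stage", "clip": ["1", 1]}},
        # First ControlNet: the OpenPose skeleton
        "3": {"class_type": "LoadImage", "inputs": {"image": pose_image}},
        "4": {"class_type": "ControlNetLoader",
              "inputs": {"control_net_name": pose_model}},
        "5": {"class_type": "ControlNetApply",
              "inputs": {"conditioning": ["2", 0], "control_net": ["4", 0],
                         "image": ["3", 0], "strength": pose_strength}},
        # Second ControlNet: the Canny edge map, fed by the first one's output
        "6": {"class_type": "LoadImage", "inputs": {"image": canny_image}},
        "7": {"class_type": "ControlNetLoader",
              "inputs": {"control_net_name": canny_model}},
        "8": {"class_type": "ControlNetApply",
              "inputs": {"conditioning": ["5", 0], "control_net": ["7", 0],
                         "image": ["6", 0], "strength": canny_strength}},
    }


if __name__ == "__main__":
    fragment = build_two_controlnet_fragment(
        "pose.png", "canny.png",                       # placeholder images
        "openpose.safetensors", "canny.safetensors")   # placeholder models
    print(json.dumps(fragment, indent=2))
```

The output of node "8" would then go to the positive input of a sampler. Lowering `canny_strength` is one way to let the pose dominate if the edge map is too harsh.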
You can see that the hands do influence the image generated, but are not properly “understood” as hands. The feet though are consistently accurate.
PoseMy.art is the most powerful online pose editor of those I tried, designed for artists, but with good tools to export poses too! The free version limits the number of models and poses available, and has a limitation of “5 uses per session” which can be overcome by “Refresh[ing] the page for more uses”.
After getting your pose just right, then:
- Click the last button in the toolbar, which, oddly, looks more like a Crop tool than an Export tool,
- From the bottom tool bar, adjust the image size, then
- Adjust the viewport using the left-/right-/middle- mouse buttons and scroll-wheel.
- Click respective buttons to export as a depth map, a Canny edge, or as an OpenPose “skeleton” with or without hands.
- For a Depth map, make sure to adjust the Depth slider so that there is sufficient variation between the limbs in the foreground and background.
- Regarding the Canny map - I did not find this usable because the hard lines are too strong, and there is some sort of shading or shadow applied.
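One rough way to check whether an exported depth map has "sufficient variation" is to measure how widely the figure's grey values are spread. The sketch below is my own heuristic, not something PoseMy.art provides: it takes a flat list of 8-bit grey values (loading them from the PNG with an image library is left out), ignores the black background, and returns the gap between a low and a high percentile. The percentiles and the idea of comparing against a threshold are assumptions.

```python
def depth_spread(pixels, background=0, low_pct=5, high_pct=95):
    """Return the gap between low and high percentiles of the figure's depth.

    pixels: iterable of 8-bit grey values (0-255) from a depth map.
    Pixels at or below `background` are treated as empty space and skipped.
    """
    figure = sorted(v for v in pixels if v > background)
    if not figure:
        raise ValueError("depth map contains no figure pixels")

    def pick(pct):
        # Nearest-rank style lookup into the sorted values
        index = round((len(figure) - 1) * pct / 100)
        return figure[index]

    return pick(high_pct) - pick(low_pct)


if __name__ == "__main__":
    flat = [0] * 50 + [200] * 50               # every limb at the same depth
    varied = [0] * 10 + list(range(100, 201))  # limbs at different depths
    print(depth_spread(flat), depth_spread(varied))
```

A spread near zero means the limbs all sit at about the same grey level, which suggests pushing the Depth slider further before exporting.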
Experiment to Compare ControlNets
To compare, here are the outputs I got using the Canny edge map, depth map, and OpenPoses without and with hands, as exported from PoseMy.art. The images are overlaid with the ControlNet images. As you can guess, the first two use the Stability AI Control-LoRAs and the last two use the SDXL-controlnet: OpenPose (v2) ControlNet.
I think using Control-LoRA’s Depth Map model gives the best results.
Experiment with Depth Maps
So I was wondering if I could use this method to “fix” hands on images I previously generated. In this example, the original image is the second one in the grid.
Using PoseMy.art, I created a Depth Map that roughly matches the original pose. I did this by eye, so of course they do not line up. I tested various ControlNetApply Strengths, and found that the output starts to conform to the model at around strength
3D Modelling Tools
None of these tools can generate OpenPoses... but I wanted to see if images exported could be used as a Canny Edge map source.
PoseManiacs is a great resource for artists. I downloaded Pose 0000698: Pose of receiving an attack from a sword:
I also tried VRoid Studio, which has a few Poses baked-in. Under Look > Outline, increase the outline widths. Then click the Camera icon and, under Poses & Animations, select a Preset Pose and set the Pic Size. Finally, click the Camera icon to export to a PNG.
The open-source MakeHuman tool is similar. Load a pre-defined pose under the Pose/Animate > Pose tabs. Select the resolution under the Rendering tab and click the Render button, then save the PNG.
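To turn a render from any of these tools into a Canny edge map inside ComfyUI, one option is an edge-detection node placed before the ControlNet. The fragment below is a sketch in ComfyUI's API format and assumes your build includes the built-in `Canny` preprocessor node; the file name and threshold values are placeholders, not settings taken from my tests.

```python
def build_canny_preprocess_fragment(render_name,
                                    low_threshold=0.4, high_threshold=0.8):
    """Return an API-format fragment: load a render, extract edges, preview.

    Thresholds are illustrative; lower values keep more (and noisier) edges.
    """
    if not 0.0 < low_threshold < high_threshold < 1.0:
        raise ValueError("expected 0 < low_threshold < high_threshold < 1")
    return {
        "1": {"class_type": "LoadImage", "inputs": {"image": render_name}},
        # Assumed node: ComfyUI's built-in Canny edge preprocessor
        "2": {"class_type": "Canny",
              "inputs": {"image": ["1", 0],
                         "low_threshold": low_threshold,
                         "high_threshold": high_threshold}},
        # Preview the edge map before wiring it into a ControlNetApply node
        "3": {"class_type": "PreviewImage", "inputs": {"images": ["2", 0]}},
    }


if __name__ == "__main__":
    print(build_canny_preprocess_fragment("makehuman_render.png"))  # placeholder
```

The image output of node "2" can replace a pre-made Canny map as the `image` input of a ControlNetApply node.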
Action SDXL LoRA
Alas, it’s heavily biased towards female superheroes!
For the prompt
full body, runner from above, rushing towards viewer, hand outstretched zdyna_pose, I get these from random seeds at cfg
0.7 with all else being equal.
The first output I got was... a female Flash? The second more closely achieves what I had in mind... The third is not a superhero and very creative, though the figure is running away from the camera... The fourth, a superhero again... By this time, I was about to give up. So I gave in and changed the prompt, adding the phrase
posing in zdyna_pose, dutch angle, foreshortening, motion blur, from above, from below, blurred foreground verbatim. I know this does not make sense, but, this being the author’s keyword set, I just tried it anyway. And guess what? A great result... but all are still women though.
Image slider is Image Compare Viewer by Kyle Wetton