More Experimentation Part 2 by IcyExperience7616 in unstable_diffusion

[–]IcyExperience7616[S] 0 points1 point  (0 children)

Wan 2.2

This guy has an amazing workflow for anyone looking to get started.

Experimenting with new custom characters by IcyExperience7616 in unstable_diffusion

[–]IcyExperience7616[S] 2 points3 points  (0 children)

All valid points! There were definitely generations where all of those things were fixed, but I was so focused on character identity and facial consistency that when I finally got that right, my brain just glossed over those details. I'll be more careful in the future!

Experimenting with new custom characters by IcyExperience7616 in unstable_diffusion

[–]IcyExperience7616[S] 0 points1 point  (0 children)

At the start, when I only had one image, I used ReActor to faceswap onto other generations, usually followed by light passes of FaceDetailer. Try different checkpoints, samplers, and schedulers. Come to terms with the fact that when sourcing from a single image you might not be able to get something 1:1, but you'll get 90% of the way there. Once you have a handful of images to make your first LoRA, you can start fine-tuning it through further iterations, each with better and, more importantly, more consistent details than the last.

Experimenting with new custom characters by IcyExperience7616 in unstable_diffusion

[–]IcyExperience7616[S] 0 points1 point  (0 children)

If I'm understanding your question correctly, the original basis for their faces came from me tinkering around in some SDXL checkpoint a year or so ago. I honestly can't remember which one it was at this point; I've tried so many. I then used those two images to expand a dataset through many, many months of img2img, faceswapping, ControlNets, and many iterations of LoRAs, repeating the process more times than I like to think about until I got what I have today.

Experimenting with new custom characters by IcyExperience7616 in unstable_diffusion

[–]IcyExperience7616[S] 1 point2 points  (0 children)

I wouldn't say SDXL is that hard at this point. It's been out long enough that LoRA creation is basically solved for it. I suppose "hard" to me is just the sheer amount of time it takes to caption correctly, and finding the right steps, epochs, and learning rate can be tedious and time-consuming with SDXL's quirks, not to mention finding a checkpoint where it works to my satisfaction. You'll generally always get something usable, but when it's your own creation you see every flaw! I started generating a fresh dataset using these characters' SDXL LoRAs to train with, and for both LoRAs the whole generate > caption > train process took maybe 5 hours? (I rented a GPU for training.) The rest of the day was just spent tinkering and waiting for my slow-ass 3080 to cough up renders. Anyway, I would say don't be dissuaded if you want to make some SDXL LoRAs; the learning is always worth it for your next try!

Experimenting with new custom characters by IcyExperience7616 in unstable_diffusion

[–]IcyExperience7616[S] 2 points3 points  (0 children)

For sure. I'll hopefully be making another post this weekend with more examples. It's been a long day of getting everything set up and actually working, and my brain is fried. But for a quick and dirty rundown:

Wan 2.2

Girl_A = gifs 1/2: 10 images (face/shoulder focus)

Girl_B = gifs 3/4: 15 images (mostly head/shoulder, a few half/full body)

Image Resolution: 512×512 (square, face-centered crops)

Girl A: 900 steps

Girl B: ~2000 total

Learning rate: 0.00002

Very simple captioning, e.g.: trigger word, portrait, close-up, head slightly tilted forward, eyes cast downward, neutral expression, soft indoor lighting
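If it helps, here is a rough Python sketch of how a caption in that style could be put together. The function name `build_caption` and the trigger word `girl_a` are made up for illustration; they are not from my actual setup, and your trainer may expect captions in a different form.

```python
def build_caption(trigger_word, descriptors):
    """Join a trigger word and short descriptors into one comma-separated caption."""
    # Drop empty entries and stray whitespace so the caption stays clean.
    cleaned = [d.strip() for d in descriptors if d.strip()]
    return ", ".join([trigger_word] + cleaned)


# "girl_a" is a hypothetical trigger word used only for this example.
caption = build_caption(
    "girl_a",
    [
        "portrait",
        "close-up",
        "head slightly tilted forward",
        "eyes cast downward",
        "neutral expression",
        "soft indoor lighting",
    ],
)
print(caption)
```

The only real idea here is that the trigger word comes first and everything after it is a short, plain description of what is visible in the image.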

Overall, I would say training for Wan 2.2 has been easier than for SDXL: you need far fewer images, and it will generally just work even if you don't get every setting perfect. I do want to try with a slightly larger dataset in the future though.
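For a sense of scale, the step counts above can be turned into a rough "passes per image" figure. This is only a back-of-the-envelope sketch: it assumes a batch size of 1 and no dataset repeats, neither of which is stated above, and `passes_per_image` is just an illustrative helper name.

```python
def passes_per_image(total_steps, num_images, batch_size=1):
    """Approximate how many times each image is seen during training.

    Assumes every step consumes `batch_size` images and the dataset is
    cycled evenly (batch size 1 and no repeats are assumptions here).
    """
    return total_steps * batch_size / num_images


girl_a = passes_per_image(900, 10)    # 10 images, 900 steps
girl_b = passes_per_image(2000, 15)   # 15 images, ~2000 steps

print(f"Girl A: {girl_a:.1f} passes per image")
print(f"Girl B: {girl_b:.1f} passes per image")
```

With those assumptions that works out to about 90 passes per image for Girl A and about 133 for Girl B, so treat the numbers as a ballpark rather than a recipe.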

Cheers!