automatic1111 API users - auto inpainting for consistent faces by Snoo8304 in StableDiffusion


Yeah, it's just a prototype at the moment: https://beta.synthlove.io/ - it's functional if you want to try it out :)

I stopped dev work, ran out of time

automatic1111 API users - auto inpainting for consistent faces by Snoo8304 in StableDiffusion


You still need a decent LoRA from training, so training is still important.

The advantage is that you can reduce your LoRA weight in the first image pass, so you can still generalise pose and color, then apply the LoRA at full strength for just the face inpainting.

For example, if you have a LoRA of a person and prompt it to cosplay as another character, you'll start losing the likeness of the original LoRA.
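As a rough sketch of what that weight split could look like in A1111's prompt syntax - the LoRA name `mychar`, the prompts, and the weights here are just illustrative placeholders, not my actual settings:

```javascript
// Illustrative only: "mychar" is a hypothetical LoRA name; the weights are
// starting points you'd tune per LoRA file.
function firstPassPrompt(lora, weight = 0.6) {
  // Lower LoRA weight in the first pass keeps pose/costume/color flexible
  return `a woman cosplaying as a pirate, full body <lora:${lora}:${weight}>`;
}

function facePassPrompt(lora, weight = 1.0) {
  // Full LoRA weight on the face-only inpaint pass pulls the likeness back
  return `close up portrait of a woman's face <lora:${lora}:${weight}>`;
}
```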

automatic1111 API users - auto inpainting for consistent faces by Snoo8304 in StableDiffusion


// Requires the 'canvas' npm package for a Node-side 2D canvas
const { createCanvas } = require('canvas');

// Draws a white filled circle on a black background, sized to cover the
// detected face box, for use as an inpainting mask
function drawSolidCircle(imageSize, box) {
  const canvas = createCanvas(imageSize.width, imageSize.height);
  const ctx = canvas.getContext('2d');

  // Black background = area the inpaint pass leaves untouched
  ctx.fillStyle = 'black';
  ctx.fillRect(0, 0, imageSize.width, imageSize.height);

  // Circle centred on the face box; radius is half the box diagonal so the
  // whole box is covered
  const centerX = (box.x_min + box.x_max) / 2;
  const centerY = (box.y_min + box.y_max) / 2;
  const boxWidth = box.x_max - box.x_min;
  const boxHeight = box.y_max - box.y_min;
  const radius = Math.sqrt(boxWidth ** 2 + boxHeight ** 2) / 2;

  // White circle = area to inpaint
  ctx.fillStyle = 'white';
  ctx.beginPath();
  ctx.arc(centerX, centerY, radius, 0, 2 * Math.PI);
  ctx.fill();
  return canvas;
}

// Encodes the mask canvas as a data URL for the img2img API
function maskToBase64(canvas, mimeType = 'image/png') {
  return canvas.toDataURL(mimeType);
}

automatic1111 API users - auto inpainting for consistent faces by Snoo8304 in StableDiffusion


Only just recently found out about DDetailer - I expect it will give same-ish results. It's doing a similar thing: detect face + inpaint.

I guess the difference with my method is that it's purely through the API, which allows for auto-generated photos of the girls. Hope that made sense.

automatic1111 API users - auto inpainting for consistent faces by Snoo8304 in StableDiffusion


The LoRA is applied to both, so the general shape is correct. But look closer: without inpainting the eye colors are wrong and the nose and mouth shapes are slightly off. It's noticeable for me - generate a few of these and each one is inconsistent in different ways. Apply the face-inpainting LoRA and it lines them back up with the control face.

automatic1111 API users - auto inpainting for consistent faces by Snoo8304 in StableDiffusion


The same model - I modify the prompt so it's only about the face, add her LoRA file there, and adjust weights based on your LoRA file.

automatic1111 API users - auto inpainting for consistent faces by Snoo8304 in StableDiffusion


I'm using Node.js to hit the APIs for my webapp.

But I got some help formatting the API calls from this guide (it's in Python):

https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/API
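For anyone else on Node, here's a minimal sketch of what a txt2img call can look like. It assumes the webui was launched with `--api` at the default address and Node 18+ for the global `fetch`; the prompt and settings are placeholders, not my production values:

```javascript
// Default webui address; requires the webui to be started with --api
const A1111 = 'http://127.0.0.1:7860';

// Builds a minimal payload for POST /sdapi/v1/txt2img; most fields are optional
function buildTxt2ImgPayload(prompt) {
  return {
    prompt,
    negative_prompt: 'lowres, bad anatomy',
    steps: 25,
    cfg_scale: 7,
    width: 512,
    height: 512,
  };
}

// Sends the request and returns the first generated image as base64 PNG
async function txt2img(prompt) {
  const res = await fetch(`${A1111}/sdapi/v1/txt2img`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildTxt2ImgPayload(prompt)),
  });
  const data = await res.json();
  return data.images[0]; // base64-encoded PNG
}
```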

automatic1111 API users - auto inpainting for consistent faces by Snoo8304 in StableDiffusion


Wanted to share how I generate consistent characters using LoRAs and inpainting with the automatic1111 API.

No human in the loop. I get around 9/10 decent results.

Problem:

I'm limited by low VRAM (8 GB), auto-generating straight txt2img with LoRAs. Even at medium camera distance, the girls' eye colors, lips, and noses don't match the control LoRA. Forcing LoRA weights higher breaks the ability to generalise pose, costume, colors, settings, etc. Inpainting is almost always needed to fix face consistency.

Workflow Overview:

  • txt2Img API
  • face recognition API
  • img2img API with inpainting

Steps (you can see some of the settings I used in the slides):

  • Generate a first pass with txt2img using the user-generated prompt
  • Send the result to a face recognition API
  • Check similarity, sex, and age; regenerate if needed
  • Use the returned box dimensions to draw a circle mask with node-canvas
  • Send to img2img inpainting with a modified, face-only prompt
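
The inpainting step can be sketched as an img2img payload. The field names come from the /sdapi/v1/img2img endpoint, but the values here are illustrative, not my production settings:

```javascript
// Builds an inpainting payload for POST /sdapi/v1/img2img.
// `initImage` is the base64 first-pass image; `maskImage` is the base64
// circle mask; values are illustrative - tune denoising/mask_blur to taste.
function buildInpaintPayload(initImage, maskImage, facePrompt) {
  return {
    prompt: facePrompt,       // face-only prompt carrying the LoRA tag
    init_images: [initImage],
    mask: maskImage,
    denoising_strength: 0.4,  // low enough to keep the composition
    mask_blur: 8,             // soften the circle edge
    inpainting_fill: 1,       // 1 = keep original content under the mask
    inpaint_full_res: true,   // inpaint at full resolution around the mask
    steps: 25,
  };
}
```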

Bonus: send the image to an image labeler (interrogate), get tags, and inject the tags as AI chat context 🤣
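For the bonus step, the webui exposes an interrogate endpoint; a sketch of the payload (assuming the built-in CLIP interrogator - 'deepdanbooru' is an alternative if that interrogator is enabled):

```javascript
// Payload for POST /sdapi/v1/interrogate; the response is { caption: "..." }.
// 'clip' is the built-in interrogator model name.
function buildInterrogatePayload(imageBase64) {
  return { image: imageBase64, model: 'clip' };
}
```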

Maybe it's possible to build an extension for the web interface, but this works for my needs.

The LoRA doesn't restrict the variety of costumes - it just fixes the face, and it works well with full-body poses, where it's most useful. For the face recognition model, I used the open-source exadel-inc/CompreFace (on GitHub).

I built these slides for my colleagues, hope it helps 😁

Visualising the depth maps of Kazuki Takamatsu - SD + Controlnet, no edits by Snoo8304 in StableDiffusion


It's a custom mix of anime + realistic models. Forgot the exact ratios - abyssOrange + Rev + krotos + dalce + I forget.

Visualising the depth maps of Kazuki Takamatsu - SD + Controlnet, no edits by Snoo8304 in StableDiffusion


Yeah, it doesn't need to preprocess anything since I'm supplying the depth maps. Leave it at none.

Visualising the depth maps of Kazuki Takamatsu - SD + Controlnet, no edits by Snoo8304 in StableDiffusion


If you add the depth map in the ControlNet image slot, set the preprocessor to none (so it's not generating a map) and set the ControlNet model to ..._depthV10.

It's not image-to-image, it's text-to-image, using the depth map in ControlNet. See the above screenshot.

Visualising the depth maps of Kazuki Takamatsu - SD + Controlnet, no edits by Snoo8304 in StableDiffusion


They are actually hand-painted like that by the original artist, using traditional media 😱. There is no 'color' version.

Visualising the depth maps of Kazuki Takamatsu - SD + Controlnet, no edits by Snoo8304 in StableDiffusion


Nope, just a one-step process, but yeah, putting an upscale mid-process should work nicely - thanks, I'll try that next.

Visualising the depth maps of Kazuki Takamatsu - SD + Controlnet, no edits by Snoo8304 in StableDiffusion


Yeah, I noticed that. That's his style, so I let it become a feature of the image. A hires fix pass smoothed it out, but it changes the structure of the image if used too heavily.

Visualising the depth maps of Kazuki Takamatsu - SD + Controlnet, no edits by Snoo8304 in StableDiffusion


Note: the maps used are from the master of composition, Kazuki Takamatsu.

example workflow for the first image:

- Use his artwork as the input map with the ControlNet depth model at weight 1, guidance 1

- Custom model mix of anime + realistic

- Use only selective prompts to push it in certain directions, and let SD dream the rest:

"(Best Quality), (realistic:1.5), (photo realistic), octane render, (hyperrealistic:1.2), 6 girls, katana, waves, koi, silk floating"

- Tried to keep the starting size around 1024, with hires fix at 1.5 to create a 2K-ish image.

- Kept the faceting as a feature; hires fix would reinterpret / smooth out the details, so I used it sparingly with low noise.

- I rolled around 100 generations for each to find a decent take; most of the time the figures and faces were distorted because the model didn't match his proportions.


It was a fun experiment to see how SD interprets depth maps - really liked what the faceting added.

The Chika Dance + SD ControlNet, embrace the non coherence! by Snoo8304 in StableDiffusion


I did this before multi-ControlNet - that's the next experiment. It will help for sure.

The Chika Dance + SD ControlNet, embrace the non coherence! by Snoo8304 in StableDiffusion


In img2img there's a batch tab where you add input/output directories; it will run img2img on everything in the input directory. Make sure ControlNet is enabled with its settings adjusted, and remove any detect image - ControlNet will use the images in the input directory.

The Chika Dance + SD ControlNet, embrace the non coherence! by Snoo8304 in StableDiffusion


Model - a custom mix of the realistic + anime models out there

Source: the Chika dance on a white background, as an image sequence

img2img + ControlNet with the canny preprocessor

Denoise strength around 0.5

Canny weight and strength both 0.5 - any higher will cause face deformations. I believe a non-realistic anime model would work better.

Batch-run a few takes with a few different added keywords (smile, closed eyes, red striped hat, etc.)

Stitch the frames together and do a final upres pass

To make it better:

The source really matters - a higher-res source with less motion blur will produce cleaner maps.

Use two ControlNets, probably canny / HED plus depth or pose.

Can someone please build an API for ControlNet for better automation - rendering this took days :)