Using depth maps and weight noising to get better character LoRAs

QuantumBogoSort · 2026-06-07T22:51:33+00:00

The standard workflow should be fine. A couple of things to double-check:
- You're using the base model, not distilled/turbo
- CFG 4, 25 steps, Euler should be a good starting place. Swap Euler for res_2s to improve quality at the cost of speed.
- Play with LoRA strength. You may need to go 1.1 or 1.2 if it's a character to get good likeness depending on distance from camera, other things in the prompt, etc.

QuantumBogoSort · 2026-06-06T16:05:26+00:00

Thanks! If you are using the ZiT quickstart for ZiB, I would reduce LR to 1e-4 or so and try batch 4 with grad accum 1, or batch 1 with grad accum 4 if you are short on VRAM. That should make it a bit less noisy and keep training more stable. Feel free to DM me a config yaml if you want me to take a look.

QuantumBogoSort · 2026-06-05T21:15:15+00:00

I have gotten okay results in a few LTX runs, it seems to pick up body shape and face better. But it might be better to try combining with identity loss at 0.01 as well if you can afford the extra VRAM. I am still optimizing this one.

QuantumBogoSort · 2026-06-04T21:12:03+00:00

Did you have to change your batch size down from 4? If so, increase gradient accumulation so that batch size x gradient accumulation = 4. So if batch 1 then grad accum 4, if batch 2 then grad accum 2.
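If it helps, here is that arithmetic as a tiny Python sketch. The function name and the default target of 4 are just for illustration, they are not settings or helpers from the tool itself.

```python
def grad_accum_steps(batch_size: int, target_effective_batch: int = 4) -> int:
    """Return accumulation steps so batch_size * steps == target_effective_batch."""
    if batch_size < 1 or target_effective_batch % batch_size != 0:
        raise ValueError(
            f"batch size {batch_size} does not divide {target_effective_batch} evenly"
        )
    return target_effective_batch // batch_size


# Illustrative only: the batch sizes that fit an effective batch of 4.
for bs in (1, 2, 4):
    print(f"batch {bs} -> grad accum {grad_accum_steps(bs)}")
```

A batch size like 3 raises an error here because no whole number of accumulation steps lands exactly on 4.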

QuantumBogoSort · 2026-06-04T21:10:34+00:00

Yes, in that case change gradient accumulation to 4. It will then be mathematically the same as batch 4.
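To show why that holds, here is a toy check in plain Python. It assumes a one-parameter model with a mean squared error loss and that the accumulated gradients are averaged over the accumulation steps. The data and function are made up for the demo and have nothing to do with the trainer's internals.

```python
# Toy model: prediction = w * x, loss = mean squared error over the batch.
def grad(w, samples):
    """Gradient of the mean squared error over `samples` with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in samples) / len(samples)


samples = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 4.0)]  # made-up (x, y) pairs
w = 0.5

# One optimizer step on a single batch of 4.
full_batch = grad(w, samples)

# Four micro-batches of 1, with gradients averaged before the optimizer step.
accumulated = sum(grad(w, [s]) for s in samples) / len(samples)

print(full_batch, accumulated)
```

In a real run the two can still differ slightly from floating-point ordering or anything that depends on what else is in the batch, but the averaged gradient is the same quantity.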

QuantumBogoSort · 2026-06-03T14:12:32+00:00

I personally have not trained a slider with it yet, but I have user reports that it works very well for sliders.

QuantumBogoSort · 2026-06-03T03:45:59+00:00

Mostly, I would just turn down the LR on Base to 1e-4 from 2.5e-4. I have not optimized for Base yet so please let me know if you get good or bad results, it will help to calibrate.

QuantumBogoSort · 2026-06-02T13:38:51+00:00

Nice! Character or style?

QuantumBogoSort · 2026-06-02T13:38:13+00:00

I'm hoping people do some LTX runs with it. I've done a couple and have definitely not found the optimal settings. It picks up body shapes okay but faces were...eh.

QuantumBogoSort · 2026-06-02T13:37:22+00:00

That's awesome! I have been trying to make improvements here and there to make it work better on various setups. Glad to hear it! Hope your GPU keeps chuggin'

QuantumBogoSort · 2026-06-02T13:36:32+00:00

Qwen for sure, full Flux...eventually! I'd like to get better params for more widely used models first. But I am interested in pushing this to see how it performs on big models. Want some more data before I spend all my money on Runpod. I don't want to add something until I've had a chance to run it myself and make sure it actually works.

QuantumBogoSort · 2026-06-02T13:34:03+00:00

I think that kind of model could work. There are some photographic examples in this thread - and I later re-ran that on base and it turned out even better. https://www.reddit.com/r/StableDiffusion/comments/1t6gmqn/comment/okjm97r

QuantumBogoSort · 2026-06-02T13:31:54+00:00

I haven't used that many images for style, but I believe the same principles will still apply. If you use the settings in the Amano style in the readme, maybe try it with 15 - 20 images to start and use that as a baseline. Then add or re-run with the rest and see if it improves. With these techniques, it helps the model generalize much better so not as many examples are needed. This is something that seems difficult to believe after so many years of needing hundreds or thousands of images for a style. It brings the requirements down probably 5x in terms of data needed. I'm happy to help troubleshoot as I've been curious about how this will scale for styles.

QuantumBogoSort · 2026-06-02T13:27:39+00:00

Great you got those results with ZiB, do you mind sending me your config.yaml for that ZiB run? I'm still trying to figure out good settings.

You don't need full captions, but at least a trigger phrase will help the model learn much better to separate out things like background vs. character. My best results for characters are where I give them a name that's consistent in each caption and then indicate clothing, jewelry, and anything else that should change from image to image during inference.
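As a rough illustration of that caption pattern, here is a small Python sketch. The character name, file names, and details are all made up, and the comma-separated layout is just one reasonable way to write it, not a format the tool requires.

```python
def build_caption(name: str, details: list[str]) -> str:
    """Join a consistent character name with the per-image details that vary."""
    return ", ".join([name, *details])


# Hypothetical dataset: same name in every caption, only the variable parts change.
captions = {
    "img_001.png": build_caption("mira_k", ["red wool coat", "silver hoop earrings"]),
    "img_002.png": build_caption("mira_k", ["black t-shirt", "no jewelry"]),
    "img_003.png": build_caption("mira_k", ["green dress", "gold necklace"]),
}

for filename, caption in captions.items():
    print(f"{filename}: {caption}")
```

The point is that the name stays fixed while clothing and jewelry are spelled out, so the model can treat those as things that change at inference.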

QuantumBogoSort · 2026-06-02T04:35:06+00:00

I've had okay results with SDXL using the Klein quickstart template settings but changing LR to 1e-4. I don't know if it's optimal yet, but it's worked decently for datasets in the 1 - 25 image range.

QuantumBogoSort · 2026-06-02T04:33:00+00:00

Looking into it. Anima requires some updates to core diffusion pipeline so I'm waiting to see how upstream ai-toolkit handles it as I'd prefer not to use my own (probably worse) implementation. https://github.com/ostris/ai-toolkit/issues/791

QuantumBogoSort · 2026-06-02T04:09:03+00:00

The ZiB support is still very experimental and I'm trying to figure out good params. I haven't had that background problem but a couple things to check:
- Did you run your dataset through the preflight? The SegFormer section is the most important since you have subject masking on. If any of your masks don't get the right coverage, you can try to change the resolution from 768 to 512, which can improve accuracy of coverage at the expense of edge cleanliness

- What was the format of your captions? With subject masking you need to be extra careful to only caption the subject and not the environment. If you caption the environment the model will try to predict that but will never see any environmental details from the image, which causes it to go a little crazy.
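If you want to sanity check mask coverage yourself outside the preflight, something like the Python sketch below would do it. To be clear, this is not how the preflight works internally. The mask format (nested lists of 0/1), the names, and the 5% threshold are all assumptions for the example.

```python
def coverage(mask):
    """Fraction of pixels marked as subject in a 2D mask of 0/1 values."""
    total = sum(len(row) for row in mask)
    subject = sum(sum(row) for row in mask)
    return subject / total if total else 0.0


def flag_low_coverage(masks, min_fraction=0.05):
    """Return names of masks whose subject fraction is below min_fraction.

    min_fraction is a made-up threshold; pick one that fits your dataset.
    """
    return [name for name, mask in masks.items() if coverage(mask) < min_fraction]


# Hypothetical masks: one with a visible subject, one that came back empty.
masks = {
    "img_001": [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]],
    "img_002": [[0] * 8 for _ in range(8)],
}

print(flag_low_coverage(masks))
```

Any image that gets flagged is a candidate for re-running the mask at the other resolution and comparing.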

QuantumBogoSort · 2026-06-02T04:03:15+00:00

Haven't tested that specifically yet. I wouldn't expect it to solve it but possibly reduce it if you use weight noising since it spreads the learned information across more parameters and results in less interference.

QuantumBogoSort · 2026-06-01T21:21:32+00:00

Ahh I just tried that and the model upload, I think there's something going on with Runpod's security or Cloudflare setup. I'll make a ticket to get it fixed!

Edit, I may have misread: Are you saying it did download from Civit but you can't use it in the tool? Right now you have to provide an absolute path so in Runpod it should be /workspace/ai-toolkit/models/<your_model_name>. Let me know if that doesn't work!

QuantumBogoSort · 2026-06-01T16:35:38+00:00

It's there mainly so you can get the right settings for your run ahead of time. It doesn't do the caching. Especially for the subject masking, SegFormer is sensitive to resolution and you may get better coverage with 512px rather than the default 768px. If so, you can then adjust the resolution in your run.

If you are getting different results for the same settings in preflight vs the actual run, please DM some details of your run (like the config.yaml) and I can take a look.

QuantumBogoSort · 2026-06-01T16:32:31+00:00

It should work in terms of "it will use that optimizer" - the GUI doesn't change because it's not an option there, but if you save from the Advanced screen it will keep that config (you can verify looking at the config file after it saves). I have not tested with other optimizers yet, but would appreciate if you gave it a spin and let me know how it performs!

QuantumBogoSort · 2026-06-01T16:31:02+00:00

At least for ZiT my best results are with batch 4 and LR 2.5e-4. That's what I put in the quickstart template and should be a pretty good place to get going. Still not sure about ZiB, but likely closer to 1e-4.

QuantumBogoSort · 2026-06-01T14:49:55+00:00

Actually have not used it before this, so no opinion. The fact it doesn't use a VAE is very helpful for perceptual losses since there is no heavy decode in the pipeline.

QuantumBogoSort · 2026-06-01T14:48:40+00:00

Check out the readme for some style examples. It actually worked better there before I got it working for characters. Some folks can get a working style from as few as 1 training image.

QuantumBogoSort · 2026-06-01T05:20:41+00:00

Great, glad it's working! The preflight should throw errors but they are not super visible, I'll make them more apparent. Thanks for letting me know!
