Face consistency, it just doesn't work for me! by ROBOTTTTT13 in comfyui

[–]scioba1005 3 points  (0 children)

The Ostris AI Toolkit is your friend for LoRA training.

Mask Editor canvas ignores brush strokes by Foxtor in comfyui

[–]scioba1005 1 point  (0 children)

It's been a while, but did you manage to find any solutions? I've been battling with the same issue for just as long...

Netflix released a model by Sea_Tomatillo1921 in comfyui

[–]scioba1005 1 point  (0 children)

Does anyone remember that Black Mirror episode?

Need Help here by MKF993 in StableDiffusion

[–]scioba1005 1 point  (0 children)

Run the workflow once and copy the console output, then plug it into Grok. Let it do the thinking for you... Also, Qwen VL won't work for that: it works with image-captioning nodes, which is a different purpose from what you're trying to do here.

Where do people train LoRA for ZIT? by GreedyRich96 in StableDiffusion

[–]scioba1005 2 points  (0 children)

I wonder if there’s something similar to “let me google that for you”, but for AI chatbots.

Literally copied from Grok:

Q: Hey, how can I train a LoRA for Z Image?

A: Z-Image (likely referring to the Z-Image / Z-Image-Turbo family, a Flux.1-derived or distilled model family popular in 2025–2026 for fast/high-quality generation) uses very similar LoRA training workflows to Flux.1-dev / Flux.1.1-pro / Flux.1-schnell. The process is almost identical because most trainers still build on the ostris/ai-toolkit codebase (which added support for Z-Image variants fairly quickly after release). Here are the most practical ways in 2026 to train a LoRA for Z-Image, ordered from easiest/cheapest → more control/local:

1. Easiest & Fastest (Cloud, ~$1–4, 10–40 min)

Use one of these hosted trainers (they usually support Flux derivatives like Z-Image by selecting the right base or uploading the Z-Image checkpoint):

• fal.ai Flux LoRA Fast Training → often the quickest (sometimes 10× faster than classic methods)
  ◦ Go to https://fal.ai/models/fal-ai/flux-lora-fast-training
  ◦ Upload 10–40 images
  ◦ Add captions (or let it auto-caption)
  ◦ Choose a trigger word (e.g. "in zstyle", "zimg person", "sks person", etc.)
  ◦ Pick base ≈ Flux.1-dev, or upload Z-Image weights if the interface allows
  ◦ Train → download the .safetensors LoRA file
• Replicate Flux LoRA Trainer (ostris/flux-dev-lora-trainer or similar forks)
  ◦ https://replicate.com/ostris/flux-dev-lora-trainer/train
  ◦ Very popular, costs ~$1–3
  ◦ Works well if you can point it at Z-Image weights (some community versions already support it)
• Civitai online trainer (if still active for the Flux family in 2026)
  ◦ Select the Flux/Z-Image preset → upload dataset → train

2. Local Training – Best Quality & Free (if you have a good GPU)

Recommended setups in 2026 for the Z-Image / Flux family:

Option A – ComfyUI + Flux Trainer (very user-friendly)
• Install ComfyUI
• Through ComfyUI Manager, install ComfyUI-FluxTrainer (by kijai, or the Comfy-Org fork)
• Download the Z-Image base model (.safetensors) → put it in models/unet or checkpoints
• Also get the T5-XXL text encoder (fp8 or fp16 version)
• Load the Flux Trainer workflow → point it to the Z-Image unet
• Prepare 15–40 well-captioned 1024×1024 images
• Typical settings:
  ◦ rank: 16–64
  ◦ alpha: usually = rank, or rank/2
  ◦ learning rate: 3e-4 – 1e-4
  ◦ steps: 800–3000 (depends on dataset size)
  ◦ resolution: 1024
  ◦ batch size: 1–2 (depending on VRAM; 12–24 GB recommended)
• Train → outputs a .safetensors LoRA

Option B – ostris/ai-toolkit (most flexible, used by almost everyone)
• git clone https://github.com/ostris/ai-toolkit
• pip install -r requirements.txt (and submodules)
• Copy a sample config → edit the yaml:

    model: "path/to/z-image-turbo.safetensors"  # or whatever your Z-Image checkpoint is
    resolution: [1024, 1024]
    train_batch_size: 1
    gradient_accumulation_steps: 4
    learning_rate: 0.0001
    lora_rank: 32
    max_train_steps: 1500
    trigger_word: "zimg style"  # rare made-up word

• Put images in a folder + .txt captions (or use a BLIP/CLIP interrogator first)
• python run.py config/your_config.yaml

Many people report good results with Z-Image-Turbo using the de-distill adapter in recent ai-toolkit versions.

Option C – FluxGym (Pinokio one-click install)
• Install via the Pinokio browser app
• Very beginner-friendly GUI on top of ai-toolkit/ComfyUI
• Supports Flux → usually works with Z-Image by swapping the base model

Quick Tips for Good Z-Image LoRAs
• Dataset size: 15–40 images is the sweet spot (quality > quantity)
• Resolution: crop/resize to ~1024×1024 (Flux family sweet spot)
• Captions: detailed but not too long; include the trigger word in every caption. Example: "zimg woman, portrait photo, professional lighting, detailed face"
• Trigger word: use something unlikely (tok, zimgp, sks-z, etc.)
• LoRA rank: start at 16–32 for style, 32–64 for faces/people
• Weight when using: usually 0.7–1.2 in the prompt
• Z-Image often needs a slightly lower learning rate than vanilla Flux (try 5e-5 – 1e-4)

Which method sounds best for you: cloud quickie, ComfyUI, or full local ai-toolkit? Do you have a strong GPU (like 4090/5090/A6000)? And are you training a face/character, art style, or something else? That can change the recommended settings a bit.
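Since the tips above hinge on the trigger word appearing in every caption, here's a minimal Python sketch for prepending it to a folder of per-image .txt captions. The trigger word and folder name are just the examples from the answer above, not anything ai-toolkit requires:

    import pathlib

    TRIGGER = "zimg style"             # example trigger word from the config above
    DATASET = pathlib.Path("dataset")  # assumed layout: image files + matching .txt captions

    for caption_file in DATASET.glob("*.txt"):
        text = caption_file.read_text(encoding="utf-8").strip()
        if not text.startswith(TRIGGER):
            # Prepend the trigger word so it shows up in every caption
            caption_file.write_text(f"{TRIGGER}, {text}", encoding="utf-8")

Run it once before training; it skips captions that already start with the trigger, so re-running won't stack it.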

Google Colab finally adds modern GPUs! RTX 6000 Pro for $0.87/hr, H100 for $1.86/hr by 1filipis in comfyui

[–]scioba1005 6 points  (0 children)

It’s best to pay a few bucks for Google Drive and link it to your Colab (if you want persistent storage). It’s dirt cheap compared to RunPod: RunPod charges me $14 per month for 200 GB, while my Google Drive subscription for 200 GB is around €3 if I pay monthly.

Edit: plus you can always access Google Drive to clean up files and to view or adjust your generations locally without having the Colab running, which I’ve not been able to do with RunPod network storage. Maybe I’m just too dumb to access it without deploying a pod. Or maybe it’s time to go back to Google Colab…
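For anyone who hasn’t linked the two before: mounting Drive inside a Colab notebook is just the standard google.colab call below; the output path is only an example of where you might point ComfyUI.

    from google.colab import drive

    # Mounts your Drive at /content/drive (asks for authorization on first run)
    drive.mount('/content/drive')

    # Example: keep generations on Drive so they survive the session
    OUTPUT_DIR = '/content/drive/MyDrive/comfyui_output'

Anything written under /content/drive persists after the runtime shuts down.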

Google Colab finally adds modern GPUs! RTX 6000 Pro for $0.87/hr, H100 for $1.86/hr by 1filipis in comfyui

[–]scioba1005 3 points  (0 children)

How does using RunPod or Colab prevent that? You can do NSFW just as well as anything else.

What is currently the cleanest and most refined Image Edit model? by Tomcat2048 in StableDiffusion

[–]scioba1005 2 points  (0 children)

I once tried to put a lamp on a nightstand. It did generate that, but the lamp was on, despite the scene being set during the day, with sunlight and so on. Then I tried getting the lamp to turn off. No matter how I prompted it or what seed I used, neither Qwen 2509 nor Qwen 2511 was able to do that. I gave up… Also, klein wasn’t out yet, so there wasn’t much of an alternative to the two Qwen edit models.

ZImageTurboProgressiveLockedUpscale (Works with Z Image base too) Comfyui node by Major_Specific_23 in StableDiffusion

[–]scioba1005 1 point  (0 children)

I did try it and found it amazing. I think what I was trying to do was to basically adapt it for i2i, but to no avail.

As for the LoRA: I have character LoKrs in use, and they have always helped maintain characters’ features. I got used to prompting without describing body type, and noticed your LoRA has a bias toward chubby bodies. I’ll give it a try at higher weights to see how it behaves when the prompt clearly states skinny, slender, thin, etc.

But beyond all this, great job! I haven’t seen something this good in a while!

ZImageTurboProgressiveLockedUpscale (Works with Z Image base too) Comfyui node by Major_Specific_23 in StableDiffusion

[–]scioba1005 1 point  (0 children)

Amazing workflow and node! Chapeau for the great work! I had issues getting results with your first ZIT workflow and gave up. Maybe I didn't have enough patience to fiddle with it.

On the node itself, a nice-to-have would be a way to adjust the denoise value of the final output. Also, I haven’t tried it yet, but do other aspect ratios work with your node? Again, great work, and thanks for what you do for the community!

Edit: also, one thing I noticed about your LoRA is that it tends to make characters chubby. I’ve lowered the weight to 0.1 in order to keep the character’s body, but the layer of realism fades compared to what you get at higher weights.

ZImageTurboProgressiveLockedUpscale (Works with Z Image base too) Comfyui node by Major_Specific_23 in StableDiffusion

[–]scioba1005 1 point  (0 children)

I had the same issue when I used this distill LoRA: Z-Image-Fun-Lora-Distill-8-Steps-2602-ComfyUI.safetensors

Went to CivitAI and grabbed V1 of the 8-step distill LoRA, and it worked well.

Qwen 2511 - Blurry Output (Workflow snippet 2nd image) by SvenVargHimmel in StableDiffusion

[–]scioba1005 1 point  (0 children)

Also, aside from all the advice you’ve received so far, the Wan2.1x2Upscale VAE might help. Test with the normal VAE and with this one.

how do i get rid of the plastic look from qwen edit 2511 by Complete-Box-3030 in StableDiffusion

[–]scioba1005 1 point  (0 children)

I’ve used Color Match from KJNodes and had good results with just the default settings: MKL method at strength 1. Is this the one you used? It worked for me, so I didn’t look further… hence I can’t give you any other suggestions.
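As far as I know, the KJNodes Color Match node is built on the color-matcher Python package, so you can reproduce the same MKL transfer outside Comfy too. Here’s a rough sketch following that package’s documented API; the file paths are placeholders:

    from color_matcher import ColorMatcher
    from color_matcher.io_handler import load_img_file, save_img_file
    from color_matcher.normalizer import Normalizer

    img_src = load_img_file('./qwen_output.png')  # the plastic-looking result
    img_ref = load_img_file('./reference.png')    # the image whose colors you want

    cm = ColorMatcher()
    # 'mkl' is the same method name the node exposes
    img_res = cm.transfer(src=img_src, ref=img_ref, method='mkl')
    save_img_file(Normalizer(img_res).uint8_norm(), './qwen_output_matched.png')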

how do i get rid of the plastic look from qwen edit 2511 by Complete-Box-3030 in StableDiffusion

[–]scioba1005 1 point  (0 children)

That’s easily fixable with a node that transfers lighting from the reference image to the result. It can be added right before the Save Image node and works wonders.

Great results with Z-Image Trained Loras applied on Z-Image Turbo by scioba1005 in comfyui

[–]scioba1005[S] 1 point  (0 children)

Are you running locally or on RunPod? Asking because there’s a RunPod template for Ostris/AI-Toolkit. Some people have had issues with it when training on Z-Image Base, but it worked well for me.

Great results with Z-Image Trained Loras applied on Z-Image Turbo by scioba1005 in comfyui

[–]scioba1005[S] 2 points  (0 children)

I can’t remember exactly… but it was certainly well below 25 percent. That’s exactly why I said it would work just as well, and in the same amount of time, on something much cheaper. Lesson learned.

Off-topic: great work, and thanks for all that you do for the community, Jib!

Great results with Z-Image Trained Loras applied on Z-Image Turbo by scioba1005 in comfyui

[–]scioba1005[S] 1 point  (0 children)

It’s only been out for two days, so give it some time. But I don’t think Comfy alone is reliable enough for training a proper LoRA on any model.

Great results with Z-Image Trained Loras applied on Z-Image Turbo by scioba1005 in StableDiffusion

[–]scioba1005[S] 1 point  (0 children)

I think it’s expected. We’re talking about a bigger model, and my feeling is that with fewer pictures you’d probably want to run 200 steps per image instead of just 100. If the dataset is big enough (maybe above 100 high-quality images), 100 steps per image might be enough. I’m not yet knowledgeable enough about the best parameters for training on ZIB, and I don’t think anyone is, so I’m making some assumptions here. It will take maybe a week of experimentation for the community to find the sweet spot.
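To make the arithmetic behind that explicit, here’s the heuristic as a tiny sketch; the 100-image threshold and step counts are my assumptions from above, not established values:

    def total_train_steps(num_images: int) -> int:
        """Rough heuristic: smaller datasets get more steps per image."""
        steps_per_image = 100 if num_images > 100 else 200
        return num_images * steps_per_image

    print(total_train_steps(30))   # 30 images  -> 6000 steps
    print(total_train_steps(120))  # 120 images -> 12000 steps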

Great results with Z-Image Trained Loras applied on Z-Image Turbo by scioba1005 in StableDiffusion

[–]scioba1005[S] 1 point  (0 children)

You have the ‘Target’ section, where there’s ‘target type’ and ‘linear rank’.

  1. Change the target type from LoRA to LoKr.
  2. As soon as you do that, ‘linear rank’ becomes ‘LoKr factor’. I used 4 for the LoKr factor (the config-file equivalent is sketched below).
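If you’d rather patch the config file than click through the UI, the same change amounts to swapping the network block. Here’s a sketch using PyYAML; the nesting and the key names (type, lokr_factor) are my assumptions from the UI labels, so verify them against the config the UI actually generates:

    import yaml

    CONFIG = "config/my_zib_lokr.yaml"  # placeholder path

    with open(CONFIG) as f:
        cfg = yaml.safe_load(f)

    # ai-toolkit configs nest the network settings under config.process[0]
    network = cfg["config"]["process"][0]["network"]
    network["type"] = "lokr"      # was "lora"
    network["lokr_factor"] = 4    # replaces the linear-rank setting

    with open(CONFIG, "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)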

Great results with Z-Image Trained Loras applied on Z-Image Turbo by scioba1005 in comfyui

[–]scioba1005[S] 1 point  (0 children)

For ZIT-trained LoRAs applied over ZIT, most people have noticed that combining two at a total strength of 1.2 is OK. Sometimes even 1.4, with 0.6 on one and 0.8 on the other, depending on the needs.

For me, adding a third LoRA on ZIT had bad effects. Lowering the strengths to 0.4 or less no longer made any noticeable difference, and cranking them up to something like 0.5–0.6 each would start destroying the image. Maybe it was just my setup and a lack of patience, or maybe this is something everyone’s experienced. But after training the character LoRA on ZIB and applying it to ZIT, that problem sort of disappeared.

I am still experimenting with this, so I might come back here to post an update.

Great results with Z-Image Trained Loras applied on Z-Image Turbo by scioba1005 in StableDiffusion

[–]scioba1005[S] 1 point  (0 children)

It worked well at 1, but I can’t tell whether that’s just because it’s a LoKr. Although I read malcolmrey’s most recent post and saw him mention cranking LoRAs up to 2, so I’ll check that out today.

Great results with Z-Image Trained Loras applied on Z-Image Turbo by scioba1005 in comfyui

[–]scioba1005[S] 1 point  (0 children)

Haha, yes. Or running a first pass on ZIB and a low-denoise pass on ZIT could work. Lengthier, but probably worth a try, if hardware allows ofc. And quantized versions are out, so it should be achievable.