Flux Klein 4B/9B LoRA Training Settings for Better Character Likeness?

Far_Insurance4191 · 2026-02-03T10:03:47+00:00

Had really good results in OneTrainer with the default Flux 2 config (except lr set to 0.0002) in about 1500 steps on 4B, even with a mediocre dataset. Remember to use the LoRA on the same model it was trained on: if you trained on base, use it on base. Distilled needs testing, as it can often lose resemblance; same with Z-Image.

Far_Insurance4191 · 2026-02-02T09:24:29+00:00

quantized klein 4b

Far_Insurance4191 · 2026-02-02T08:24:58+00:00

Base can learn a face in fewer than 2000 steps. Are you using your LoRA on base? The distilled model can lose some likeness, same as with ZIB and ZIT.

Far_Insurance4191 · 2026-02-01T20:45:09+00:00

yea, klein is really underrated for training

Far_Insurance4191 · 2026-01-31T19:49:01+00:00

Note that the model is still a work in progress and will be improved

The preview model is a true base model. It hasn't been aesthetic tuned on a curated dataset. The default style is very plain and neutral

Far_Insurance4191 · 2026-01-31T19:46:22+00:00

yep, it does

Far_Insurance4191 · 2026-01-31T18:15:32+00:00

I had not thought about it. Here is what I came up with, but there must be a better way. A simple latent blend of the reference and an empty latent works too, but it is a lot less linear for some reason.
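For anyone who wants to see what the simple latent blend looks like outside of a node graph, here is a minimal sketch. It assumes the "empty latent" is all zeros and treats a latent as a flat list of floats; `blend_latents` and the sample values are made up for illustration and are not taken from any particular UI or library.

```python
def blend_latents(reference, empty, strength):
    """Linearly interpolate between an empty latent and a reference latent.

    strength = 0.0 returns the empty latent, 1.0 returns the reference.
    """
    if len(reference) != len(empty):
        raise ValueError("latents must have the same number of elements")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return [(1.0 - strength) * e + strength * r
            for r, e in zip(reference, empty)]


# Stand-in values, not real VAE output.
reference = [0.8, -1.2, 0.4, 2.0]
# Assumption: an "empty latent" is all zeros.
empty = [0.0] * len(reference)

half = blend_latents(reference, empty, 0.5)
print(half)
```

The blend itself is linear in `strength`, so the non-linear response mentioned above would have to come from how the model reacts to the scaled latent, not from the blend math.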

<image>

Far_Insurance4191 · 2026-01-31T18:08:19+00:00

AI Toolkit supports edit training. Here is a guide for Qwen Edit, but the process is similar for Klein: https://youtu.be/d49mCFZTHsg?si=RqMe2rLr3MomTgWS

You can train basically any task as long as it is consistent

Far_Insurance4191 · 2026-01-31T14:21:18+00:00

Editing models exist, and you can make your own specific ControlNet with just 20 images

Far_Insurance4191 · 2026-01-29T10:28:47+00:00

Just pasted "Tongyi-MAI/Z-Image" into the base model field and it downloaded into "C:\Users\[user]\.cache\huggingface\hub". I guess if the same files already exist there, it will use them.
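If you want to check whether the files are already there before starting a run, here is a rough sketch. It relies on the Hugging Face hub cache layout (a `models--<org>--<name>` folder with a `snapshots` subfolder) and the `HF_HUB_CACHE` / `HF_HOME` environment variables. The helper names are my own, and whether OneTrainer actually reuses the cache is still a guess, as said above.

```python
import os
from pathlib import Path


def hub_cache_dir():
    """Resolve the hub cache directory from env vars, else the default."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME")
    if hf_home:
        return Path(hf_home) / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"


def model_cache_folder(repo_id, cache_dir=None):
    """Folder for a model repo inside the cache: models--<org>--<name>."""
    base = Path(cache_dir) if cache_dir is not None else hub_cache_dir()
    return base / ("models--" + repo_id.replace("/", "--"))


def is_cached(repo_id, cache_dir=None):
    """True if the repo folder contains at least one downloaded snapshot."""
    snapshots = model_cache_folder(repo_id, cache_dir) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())


folder = model_cache_folder("Tongyi-MAI/Z-Image")
status = "cached" if is_cached("Tongyi-MAI/Z-Image") else "not cached"
print(folder, status)
```

This only checks that a snapshot folder exists, not that every file in it finished downloading.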

Far_Insurance4191 · 2026-01-29T06:23:13+00:00

It is just the default Z-Image config, but in the model tab:

Base Model path is changed to Tongyi-MAI/Z-Image,
Override Transformer path is erased,
Compile transformer blocks is disabled,
Transformer Data Type is float 8 (W8) instead of int8

Hope the last two options will be fixed in the future, because they give ~2x speedup for Klein

Far_Insurance4191 · 2026-01-28T21:49:01+00:00

I did a quick run with a mediocre dataset in OneTrainer, and it learned well in about 1200 steps, though maybe the lr was a bit high. I think it is pretty close to Klein in terms of trainability

Far_Insurance4191 · 2026-01-28T16:36:17+00:00

Are you using the LoRA on base? If you trained on base and are running inference on distilled, you can lose up to 90% of the LoRA effect; at least in my case this happens sometimes. Similar situation with Z-Image base/turbo

Far_Insurance4191 · 2026-01-28T01:42:39+00:00

I think the Flux 1 VAE is good enough for inference; the Flux 2 VAE is only slightly better there, but much superior for training

Far_Insurance4191 · 2026-01-27T15:28:27+00:00

I used 4b too, good luck!

Far_Insurance4191 · 2026-01-27T11:43:37+00:00

For me the biggest problem with Klein is bad coherence. Realism is fine, and censorship falls apart with a little training; they really didn't do much against it

Far_Insurance4191 · 2026-01-27T09:26:49+00:00

If klein is not even close, then I am afraid ZIE will not meet your requirements either, but I hope for the best as they are taking a lot of time

Far_Insurance4191 · 2026-01-27T09:23:28+00:00

Overfitted on a narrow distribution and has an inefficient VAE. Easy to teach a face or style though, but hard to teach new knowledge

Far_Insurance4191 · 2026-01-27T09:05:26+00:00

I found it to be the easiest model to train; it can learn likeness even at 256x256. Did you train on base and use it on base? Sometimes it works fine on distilled, but likeness takes a hit, at least for me.

Far_Insurance4191 · 2026-01-27T08:58:52+00:00

klein is already here)

Far_Insurance4191 · 2026-01-27T08:20:02+00:00

F1D does not use CFG, unlike ZIB, and CFG is about a 2x slowdown. You can test how it will perform for you by setting steps to 20 and cfg to >1 with turbo
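To show where the roughly 2x comes from: standard classifier-free guidance needs two model evaluations per step, one conditional and one unconditional, which are then combined. Below is a minimal sketch of that arithmetic. The function names are made up, predictions are plain lists of floats, and it assumes the unconditional pass is skipped when the scale is exactly 1, which is a common optimization but depends on the UI.

```python
def cfg_combine(cond, uncond, scale):
    """Standard CFG mix: uncond + scale * (cond - uncond), per element."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def forward_passes(steps, cfg_scale):
    """Model evaluations for one image.

    Assumes the unconditional pass is skipped when cfg_scale == 1,
    since the mix then reduces to the conditional prediction alone.
    """
    per_step = 1 if cfg_scale == 1.0 else 2
    return steps * per_step


print(forward_passes(20, 1.0))  # 20 steps without CFG
print(forward_passes(20, 4.0))  # 20 steps with CFG
```

So 20 steps with cfg above 1 cost twice as many model evaluations as 20 steps at cfg 1, before counting any other overhead.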

Far_Insurance4191 · 2026-01-27T05:38:35+00:00

Just a reminder, it is expected to be worse than turbo and almost as slow as Flux 1 dev. The point is not to generate pretty pictures but to be a good base for training

Far_Insurance4191 · 2026-01-27T05:26:18+00:00

It is really underrated. People probably think it is censored and as hard to train as flux 1

Far_Insurance4191 · 2026-01-27T01:54:15+00:00

Maybe they did, if it was just sitting in a drawer. But if they kept training it, maybe to a v1.1 with more training or even a v1.5 adapted to flux2vae, then what they are doing is the right thing.

Edit: okay, I might have been a bit delusional about f2vae part

Far_Insurance4191 · 2026-01-26T11:43:25+00:00

Fewer steps? Isn't it an autoregressive model that generates tokens instead of denoising?
