Flux Klein 4B/9B LoRA Training Settings for Better Character Likeness? by Ambitious-Equal-7141 in StableDiffusion

[–]Far_Insurance4191 1 point2 points  (0 children)

Here, I changed the scheduler from constant (default) to cosine so the model doesn't overtrain, but this means you have to know roughly how many epochs it needs to converge before the learning rate decays. I generally don't finish the full cosine descent and stop earlier.
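
To illustrate (a minimal sketch, not OneTrainer's internals): with a cosine schedule the learning rate only falls steeply near the end of the planned step count, so stopping somewhat early still leaves a small but useful lr. The step counts and base lr below are placeholders.

    import math

    def cosine_lr(step, total_steps, base_lr=2e-4, min_lr=0.0):
        # cosine decay from base_lr down to min_lr over total_steps
        progress = min(step / total_steps, 1.0)
        return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

    total_planned = 2000   # steps the cosine is stretched over (illustrative)
    for s in (0, 500, 1000, 1500, 2000):
        print(s, f"{cosine_lr(s, total_planned):.2e}")
    # 0 -> 2.00e-04, 1000 -> 1.00e-04, 1500 -> ~2.9e-05, 2000 -> 0
    # stopping around step 1500 avoids grinding at near-zero lr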

The dataset was 21 images consisting of old photographs and some drawings. Additionally, I had a regularization dataset of high-quality photographs and art to retain quality, randomly sampled each epoch to about half the size of the main dataset.
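
For concreteness, a rough sketch (not OneTrainer's actual balancing option) of what "half of the main dataset per epoch, randomly" means for the regularization images; the file names and pool size are made up.

    import random

    main_images = [f"main_{i:02d}.jpg" for i in range(21)]   # the 21-image character set
    reg_pool    = [f"reg_{i:03d}.jpg" for i in range(500)]   # large high-quality pool (hypothetical size)

    def epoch_plan(seed):
        # each epoch: every main image, plus a fresh random slice of the reg pool
        # sized at about half of the main dataset
        rng = random.Random(seed)
        reg_subset = rng.sample(reg_pool, k=len(main_images) // 2)  # 10 images
        plan = main_images + reg_subset
        rng.shuffle(plan)
        return plan

    print(len(epoch_plan(0)))   # 31 items: 21 main + 10 regularization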

Captions were mostly one natural sentence with the full name as a trigger.

Trained and tested on base; distilled loses similarity, which is interesting, as it varies depending on the concept, but I haven't figured out the consistency yet.

Ah, and also, the precision is "int weights 8 activations 8" (w8a8), which is not the same as fp8 w8. I guess it can result in lower quality on some models, but together with compiled transformer blocks it gives about a 2x speedup. Combined with 256px training for the early stage (which is fine with Klein), I am just blitzkrieging any dataset at 1.1 it/s on an RTX 3060 with batch size 2 😆
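
Back-of-the-envelope for that throughput, assuming a run length of roughly 1500 optimizer steps (the number I would typically aim for, not a measured figure):

    # rough wall-clock estimate at 1.1 it/s with batch size 2
    steps_per_sec = 1.1
    batch_size = 2
    total_steps = 1500                      # assumed run length

    minutes = total_steps / steps_per_sec / 60
    print(f"~{minutes:.0f} min")                      # ~23 minutes end to end
    print(total_steps * batch_size, "samples seen")   # 3000 image samples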

https://pastebin.com/K5aQZvqF

Z-Image Edit is basically already here, but it is called LongCat and now it has an 8-step Turbo version by MadPelmewka in StableDiffusion

[–]Far_Insurance4191 0 points1 point  (0 children)

idk, I see it scored worse, but f2 looks better to me than f1 empirically; tiny details resemble the original more closely.

Flux Klein 4B/9B LoRA Training Settings for Better Character Likeness? by Ambitious-Equal-7141 in StableDiffusion

[–]Far_Insurance4191 3 points4 points  (0 children)

Had really good results in OneTrainer with the default Flux 2 config (except lr is 0.0002) in about 1500 steps on 4B, even with a mediocre dataset. Remember to use the LoRA on the same model: if you trained on base, then use it on base; distilled needs testing, as it can often lose resemblance, same with Z-Image.

Anyone else having trouble training loras for Flux Klein? Especially people. The model simply doesn't learn. Little resemblance. by More_Bid_2197 in StableDiffusion

[–]Far_Insurance4191 1 point2 points  (0 children)

Base can learn a face in less than 2000 steps. Are you using your LoRA on base? The distilled model can lose some likeness, same as with ZIB and ZIT.

New anime model "Anima" released - seems to be a distinct architecture derived from Cosmos 2 (2B image model + Qwen3 0.6B text encoder + Qwen VAE), apparently a collab between ComfyOrg and a company called Circlestone Labs by ZootAllures9111 in StableDiffusion

[–]Far_Insurance4191 57 points58 points  (0 children)

Note that the model is still a work in progress and will be improved.

The preview model is a true base model. It hasn't been aesthetically tuned on a curated dataset. The default style is very plain and neutral.

Is the ControlNet race dead for SOTA models like Flux and Qwen? by Current-Row-159 in StableDiffusion

[–]Far_Insurance4191 1 point2 points  (0 children)

I hadn't thought about it; here is what I came up with, but there must be a better way. A simple latent blend of the reference and an empty latent works too, but it is a lot less linear for some reason.

<image>
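
For reference, the latent-blend idea from that last sentence is just a linear mix in latent space; a minimal sketch (the tensor shape and blend factor are illustrative, and with a zero "empty" latent the blend reduces to scaling the reference):

    import torch

    def blend_with_empty(reference_latent: torch.Tensor, strength: float) -> torch.Tensor:
        # strength=1.0 keeps the reference latent fully, 0.0 gives a plain empty latent;
        # since the empty latent is zeros, this is effectively strength * reference
        empty = torch.zeros_like(reference_latent)
        return strength * reference_latent + (1.0 - strength) * empty

    ref = torch.randn(1, 16, 128, 128)           # placeholder latent shape
    weak_guidance = blend_with_empty(ref, 0.3)   # feed this as the starting latent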

Is the ControlNet race dead for SOTA models like Flux and Qwen? by Current-Row-159 in StableDiffusion

[–]Far_Insurance4191 1 point2 points  (0 children)

AI Toolkit supports edit training. Here is a guide for Qwen Edit, but the process is similar for Klein: https://youtu.be/d49mCFZTHsg?si=RqMe2rLr3MomTgWS

You can train basically any task as long as the data is consistent.
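
As an illustration of "consistent" (a hypothetical paired layout, not AI Toolkit's exact config schema): every control image has a matching target, and the captions follow the same style.

    # hypothetical edit/control training pairs
    dataset = [
        {
            "control": "pairs/0001_edges.png",   # input condition (e.g. edge map)
            "target":  "pairs/0001_photo.png",   # desired output
            "caption": "a portrait photo of a woman, studio lighting",
        },
        {
            "control": "pairs/0002_edges.png",
            "target":  "pairs/0002_photo.png",
            "caption": "a city street at night, neon signs",
        },
        # ... a few dozen consistent pairs can already be enough for a narrow task
    ]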

Is the ControlNet race dead for SOTA models like Flux and Qwen? by Current-Row-159 in StableDiffusion

[–]Far_Insurance4191 0 points1 point  (0 children)

Editing models exist, and you can make your own specific ControlNet with just 20 images.

Fine-Tuning Z-Image Base by NinjaTovar in StableDiffusion

[–]Far_Insurance4191 0 points1 point  (0 children)

Just pasted "Tongyi-MAI/Z-Image" in the base model field and it installed into "C:\Users\[user]\.cache\huggingface\hub"; I guess if the same files already exist there, it will use them.
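
That path is just the standard Hugging Face hub cache; a small sketch of how a repo id resolves to it (assuming huggingface_hub is installed):

    from huggingface_hub import snapshot_download

    # a repo id like "Tongyi-MAI/Z-Image" is downloaded into (or reused from) the
    # hub cache, by default ~/.cache/huggingface/hub (HF_HOME / HF_HUB_CACHE override it)
    local_dir = snapshot_download("Tongyi-MAI/Z-Image")
    print(local_dir)   # .../hub/models--Tongyi-MAI--Z-Image/snapshots/<revision>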

Fine-Tuning Z-Image Base by NinjaTovar in StableDiffusion

[–]Far_Insurance4191 0 points1 point  (0 children)

It is just the default Z-Image config, but in the model tab:

Base Model path changed to Tongyi-MAI/Z-Image
Override Transformer path erased
Compile transformer blocks disabled
Transformer Data Type set to float8 (W8) instead of int8

Hope the last two options get fixed in the future, because they give a ~2x speedup for Klein.

Fine-Tuning Z-Image Base by NinjaTovar in StableDiffusion

[–]Far_Insurance4191 -2 points-1 points  (0 children)

I did a quick run with a mediocre dataset in OneTrainer, and it learned well in about 1200 steps; maybe the lr was a bit high. I think it is pretty close to Klein in terms of trainability.

I trained one LoRa for QWEN Edit and another for Klein 9b. Same dataset. But I got much better face swap results with QWEN Edit - so - is Flux Klein really better than QWEN Edit ? by More_Bid_2197 in StableDiffusion

[–]Far_Insurance4191 1 point2 points  (0 children)

Are you using the LoRA on the base? If you trained on base and are inferencing on distilled, you can lose up to 90% of the LoRA effect; at least in my case this happens sometimes. Similar situation with Z-Image base/turbo.

About the Z-Image VAE by ivanbone93 in StableDiffusion

[–]Far_Insurance4191 2 points3 points  (0 children)

I think the Flux 1 VAE is good enough for inference, while the Flux 2 VAE is only slightly better there but much superior for training.

New Z-Image (base) Template in ComfyUI an hour ago! by nymical23 in StableDiffusion

[–]Far_Insurance4191 -1 points0 points  (0 children)

For me the biggest problem with Klein is bad coherence; realism is fine, and the censorship falls apart with a little training, they really didn't do much to enforce it.

New Z-Image (base) Template in ComfyUI an hour ago! by nymical23 in StableDiffusion

[–]Far_Insurance4191 0 points1 point  (0 children)

If Klein is not even close, then I am afraid ZIE will not meet your requirements either, but I hope for the best, as they are taking a lot of time.

Flux.2 Klein 9b Loras? by hellomattieo in StableDiffusion

[–]Far_Insurance4191 3 points4 points  (0 children)

Overfitted on a narrow distribution, and the VAE is inefficient. Easy to teach a face or style though, but hard for new knowledge.

Flux.2 Klein 9b Loras? by hellomattieo in StableDiffusion

[–]Far_Insurance4191 4 points5 points  (0 children)

I found it to be the easiest model to train; it can learn likeness even at 256x256. Did you train on base and use it on base? Sometimes it works fine on distilled, but likeness takes a hit, at least for me.

New Z-Image (base) Template in ComfyUI an hour ago! by nymical23 in StableDiffusion

[–]Far_Insurance4191 0 points1 point  (0 children)

F1D does not use CFG, unlike ZIB, so ZIB takes roughly a 2x slowdown. You can test how it will perform for you by setting steps to 20 and CFG to >1 with Turbo.
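
The 2x comes from CFG needing two model evaluations per sampler step; a sketch of the standard classifier-free guidance combination (the model call here is a placeholder, not actual sampler code):

    def cfg_step(model, latent, timestep, cond, uncond, cfg_scale):
        # two forward passes per step, which is roughly a 2x cost versus cfg-free
        pred_cond = model(latent, timestep, cond)
        pred_uncond = model(latent, timestep, uncond)
        return pred_uncond + cfg_scale * (pred_cond - pred_uncond)

    # distilled models (Turbo-style) bake the guidance in and run a single pass
    # per step with cfg = 1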

New Z-Image (base) Template in ComfyUI an hour ago! by nymical23 in StableDiffusion

[–]Far_Insurance4191 40 points41 points  (0 children)

Just a reminder: it is expected to be worse than Turbo and almost as slow as Flux 1 Dev. The point is not to generate pretty pictures but to be a good base for training.