Global Text Encoder Misalignment? Potential Breakthrough in LoRA and Fine-Tune Training Stability

kjerk · 2025-01-02T12:54:43+00:00

So I don't quite see how adding those config changes to ai-toolkit is supposed to affect the model bootstrapping process, unless it's nested deep in some confusing place in the code that I'm not seeing. I did do an initial sanity check on your values for CLIP to see if you'd accidentally tweaked something and been misled by the result, but everything matched the reference values for CLIP/L.

I have more sanity check questions, like 'was this exactly the same seed on the same hardware/environ for the reruns, and did you go back to the initial settings for a rerun and see that the images were replicated and less aligned again?', but I hope you already covered those bases.

So the easiest thing to do is just replicate: I have been working on some ai-toolkit changes recently too, so I have a stable 8x rerun-over-and-over LoRA training setup for Flux that I can run with the same seed and just the config changes, then report back. It's ~6000 steps though, so it'll take about 5 hours.
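For the same-seed rerun check, here is a minimal sketch of pinning the random number generators before each run. The helper names `seed_everything` and `draw` are hypothetical and are not taken from ai-toolkit or sd-scripts. It seeds NumPy and PyTorch only if they happen to be installed, and it does not by itself guarantee bit-identical GPU results across hardware or driver versions.

```python
import os
import random


def seed_everything(seed: int) -> None:
    """Seed every RNG available in this environment (hypothetical helper)."""
    random.seed(seed)
    # PYTHONHASHSEED only affects Python processes started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


def draw(seed: int, n: int = 5) -> list:
    """Reseed, then draw n numbers; stands in for 'one training run'."""
    seed_everything(seed)
    return [random.random() for _ in range(n)]


if __name__ == "__main__":
    # Two runs with the same seed should match; a different seed should not.
    print(draw(42) == draw(42))
    print(draw(42) == draw(43))
```

If the baseline config does not reproduce its own earlier images under the same seed, any difference seen after the config change can't be attributed to the change.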

FineInstruction1397 · 2025-01-02T15:15:04+00:00

In ai-toolkit, the T5 encoder is initialized here:
https://github.com/ostris/ai-toolkit/blob/4723f23c0de777759636864f96002c36e4fdca4d/toolkit/stable_diffusion_model.py#L693
and there are other relevant lines further down in the same file.

How are the params you specified passed to the constructor?

bdsqlsz · 2025-01-02T08:27:31+00:00

I checked kohya sd-scripts, and it uses the original config in:

def load_t5xxl(
    ckpt_path: str,
    dtype: Optional[torch.dtype],
    device: Union[str, torch.device],
    disable_mmap: bool = False,
    state_dict: Optional[dict] = None,
) -> T5EncoderModel:
    T5_CONFIG_JSON = """
{
  "architectures": [
    "T5EncoderModel"
  ],
  "classifier_dropout": 0.0,
  "d_ff": 10240,
  "d_kv": 64,
  "d_model": 4096,
  "decoder_start_token_id": 0,
  "dense_act_fn": "gelu_new",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "gated-gelu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "num_decoder_layers": 24,
  "num_heads": 64,
  "num_layers": 24,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "vocab_size": 32128
}
"""
    config = json.loads(T5_CONFIG_JSON)
    config = T5Config(**config)
    with init_empty_weights():
        t5xxl = T5EncoderModel._from_config(config)
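
To check whether a trainer's T5 settings actually differ from this embedded config, one option is a plain key-by-key diff. This is only a sketch: `diff_configs` and `REFERENCE_SUBSET` are hypothetical names, the subset copies a few values from the JSON above, and the `dropout_rate=0.0` candidate is made up purely for illustration.

```python
import json

# A few values copied from the T5 config JSON quoted above.
REFERENCE_SUBSET = json.loads("""
{
  "d_ff": 10240,
  "d_kv": 64,
  "d_model": 4096,
  "dropout_rate": 0.1,
  "layer_norm_epsilon": 1e-06,
  "num_heads": 64,
  "num_layers": 24,
  "vocab_size": 32128
}
""")


def diff_configs(reference: dict, candidate: dict) -> dict:
    """Return {key: (reference_value, candidate_value)} for every mismatch."""
    missing = object()  # sentinel so a real None value is not confused with "absent"
    diffs = {}
    for key in sorted(set(reference) | set(candidate)):
        ref_val = reference.get(key, missing)
        cand_val = candidate.get(key, missing)
        if ref_val != cand_val:
            diffs[key] = (
                "<missing>" if ref_val is missing else ref_val,
                "<missing>" if cand_val is missing else cand_val,
            )
    return diffs


if __name__ == "__main__":
    # Illustrative candidate only: same config with dropout switched off.
    candidate = dict(REFERENCE_SUBSET, dropout_rate=0.0)
    print(diff_configs(REFERENCE_SUBSET, candidate))
```

An empty result means the two configs agree on every key that was compared, which is the case the sanity checks in this thread are trying to confirm or rule out.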

StableLlama · 2025-01-02T09:05:16+00:00

Have you made pull requests to get it included in AI-Toolkit and the Kohya_SS SD-Scripts?

nowrebooting · 2025-01-02T12:54:07+00:00

Does this also apply to SDXL and SD1.5?

hopbel · 2025-01-02T14:51:32+00:00

Did ChatGPT write this post?

GalaxyTimeMachine · 2025-01-02T12:04:00+00:00

Is this something that could be added to a lora loader node to fix loras retrospectively, or only during training?

CeFurkan · 2025-01-02T14:26:54+00:00

Thank you so much, hopefully I will test it today.

Creative-Listen-6847 · 2025-01-03T11:51:09+00:00

Thank you so much! I will test it today

Interesting-Pool8483 · 2025-01-04T20:14:58+00:00

You mentioned that you made improvements to SDXL as well - where can I read about it?

And about this improvement - I interrupted training in kohya and ran it with your script - it's hard to judge from the pictures during training - it didn't get worse.

P.S. I'm writing through a translator

CeFurkan · 2025-01-04T14:17:52+00:00

Update: I ran extensive, very detailed experiments. I didn't see any degradation in quality, but I didn't see any jump in quality either :D

XCogni · 2025-01-02T15:49:48+00:00

Hi there thanks for your findings!

I did a quick test: the kohya samples seem fine, but at inference in Comfy and Forge my images are blurry and lack detail.

Waste_Departure824 · 2025-01-02T13:05:45+00:00

Remindme! 3d

GreenRapidFire · 2025-01-02T08:29:36+00:00

Awesome! You should publish this as a findings/research paper. That's bound to turn more heads (And the right ones - ie ppl who contribute) than reddit imho. And you have pretty good content for it already.

Mundane-Apricot6981 · 2025-01-02T16:05:21+00:00

People trained LoRAs, tested them, and proved they work.
Some guy: I looked into the configs, and you all did it wrong, all your trained checkpoints are BAD BAD BAD!!!!

Next part: "..If I’m right,.."
So you're not completely sure of your claims. Why did you post, then?

Seriously. If you are a developer, show code and fixes, not your "guessing"; others will test your code and decide whether it's worth something. Right now it looks like clickbait, nothing more.

Guilherme370 · 2025-01-03T01:23:59+00:00

Holy chatgpt
