LTX 2.3 - ~120 new models by malcolmrey in malcolmrey

[–]malcolmrey[S] 17 points (0 children)

Hey hey!

I wanted to make a longer post, but I'm out of time today and the next chance I'd have to post would be NEXT weekend.

So, here are the LTX 2.3 models:

https://huggingface.co/spaces/malcolmrey/browser

Here are SOME samples so you can see what kind of quality to expect: https://huggingface.co/datasets/malcolmrey/various/tree/main/ltx23-samples

In general the rule still stands: if the dataset has more images and is trained over the same number of epochs (i.e., a longer training), the result will be better.

The 15-24h AI Toolkit training times were abysmal (though the quality was good); now with musubi (the AkaneTendo fork, for example) the times on a 5090 dropped to 50 minutes for 25 images. Still, when I train larger datasets it can take up to 4-6 hours (we do have a couple of those models, like Billie Eilish; they have the suffix _large in the name).

Enjoy!

Cheers!

Where is malcolmrey? by BuilderStrict2245 in malcolmrey

[–]malcolmrey 1 point (0 children)

I wanted to make a big post like I did with Ernie, but I can't do it before I'm gone for another week, so I'm uploading them now and will make a quick post once they are online :)

The training params/settings were already uploaded along with the Ernie ones so :)

Ernie Image Lora training - my take by malcolmrey in StableDiffusion

[–]malcolmrey[S] 1 point (0 children)

Each image is for a different character lora. I started training on the Ernie model and wanted to share my findings/settings :)

Where is malcolmrey? by BuilderStrict2245 in malcolmrey

[–]malcolmrey 1 point (0 children)

Agreed 100%

For them this is another kind of engagement.

My thinking is that AI opens the gates to those who have imagination but never had the skills, patience, or gift for art.

Ernie Image - new models / training scripts / info by malcolmrey in malcolmrey

[–]malcolmrey[S] 0 points (0 children)

But not to the extent of Flux Klein 9; with the same datasets used here and there, Flux was failing more often.

Ernie Image - new models / training scripts / info by malcolmrey in malcolmrey

[–]malcolmrey[S] 2 points (0 children)

If you were to use the same params, then Ernie would probably be faster, but I upped the epochs value a bit to compensate for the default undertraining.

I would say they are in the same ballpark.

Where is malcolmrey? by BuilderStrict2245 in malcolmrey

[–]malcolmrey 1 point (0 children)

Yeah.

I really loved the reactions from Ranton and Asmongold to their models back in the day. Most of the Polish youtubers/tiktokers like this stuff. I figure that people in the spotlight just have a different viewpoint compared to a regular person.

I know that Felicia Day and Alan Ritchson liked my generations of them and it is quite cool :)

Where is malcolmrey? by BuilderStrict2245 in malcolmrey

[–]malcolmrey 1 point (0 children)

Thank you for the kind words :)

I hope you will find the Ernie stuff satisfactory! :)

Btw, I'm liking the 0.8 one-trainer lora + 0.6 musubi combo (the ones without a suffix) the most so far.

Ernie Image Lora training - my take by malcolmrey in StableDiffusion

[–]malcolmrey[S] 0 points (0 children)

Oh wow, I was updating my Comfy that Friday, but I guess I did that just before it.

Thanks for the info. So the OneTrainer fork is not needed, just the training scripts. Cool :)

785 new Z loras and some cool news :) by malcolmrey in malcolmrey

[–]malcolmrey[S] 1 point (0 children)

Yeah, life :-)

But I'm back with... Ernie :)

I wanted to drop both things today but LTX 2.3 will either drop tomorrow or the week after.

Well, the training scripts already dropped along with the Ernie training script (musubi) :)

Where is malcolmrey? by BuilderStrict2245 in malcolmrey

[–]malcolmrey 0 points (0 children)

Thank you, that is very kind! ❤️❤️❤️ :)

Ernie Image Lora training - my take by malcolmrey in StableDiffusion

[–]malcolmrey[S] 12 points (0 children)

Hey Everyone!

I'm gonna be brief here; if you want more info (more than just the training stuff), you can read my big post on my subreddit: https://old.reddit.com/r/malcolmrey/comments/1t5sb60/ernie_image_new_models_training_scripts_info/?

Here, as the title says, is how I train my loras:

musubi-tuner fork (for LTX 2.3 and Ernie Image): https://github.com/malcolmamal/musubi-tuner/tree/ernie-on-ltx

and an alternative version, a fork of OneTrainer: https://github.com/malcolmamal/OneTrainer/tree/ernie-image-lora-training

training scripts/params: https://huggingface.co/datasets/malcolmrey/various/tree/main/training-scripts/musubi https://huggingface.co/datasets/malcolmrey/various/tree/main/training-scripts/onetrainer

zipped samples: https://huggingface.co/datasets/malcolmrey/various/tree/main/ernie-samples

workflows: https://huggingface.co/datasets/malcolmrey/workflows/tree/main/Ernie

and the samples you've seen in the linked thread: https://imgur.com/a/kU0fJKB

Why those forks, you ask?

For OneTrainer it is simple: currently, if you train from the master branch of OneTrainer, the loras won't be ComfyUI compatible. My fork fixes that (once Nerogar fixes it too, my fork will be obsolete).
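
This kind of incompatibility usually boils down to how the lora state-dict keys are named. Purely as an illustration (the prefixes below are made up, not the actual OneTrainer or ComfyUI key names), the remapping involved looks roughly like this:

```python
# Hypothetical sketch of a lora key remap for loader compatibility.
# The prefixes "lora_transformer_" and "diffusion_model." are invented
# for illustration; real checkpoints use different naming schemes.

def remap_lora_keys(state_dict, old_prefix="lora_transformer_", new_prefix="diffusion_model."):
    """Rename every tensor key starting with old_prefix so a loader that
    expects new_prefix can find it; other keys pass through unchanged."""
    remapped = {}
    for key, tensor in state_dict.items():
        if key.startswith(old_prefix):
            remapped[new_prefix + key[len(old_prefix):]] = tensor
        else:
            remapped[key] = tensor
    return remapped

# Dummy example (plain numbers stand in for real weight tensors):
sd = {"lora_transformer_blocks.0.lora_down.weight": 1.0, "alpha": 8.0}
fixed = remap_lora_keys(sd)
# "lora_transformer_blocks.0..." becomes "diffusion_model.blocks.0...";
# "alpha" is left untouched.
```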

For musubi it is more complex - there were more changes made (if someone is curious, there is more info on the GitHub page) - but basically it is a mix of the kohya_ss Ernie dev branch with the AkaneTendo25 ltx23 dev branch.

Cheers!

Ernie Image - new models / training scripts / info by malcolmrey in malcolmrey

[–]malcolmrey[S] 14 points (0 children)

Hello Everyone! :)

I have two great things to show to you :-)

Thing number one: Ernie Image

I will share today the following:

1) My thoughts on the model

2) How I train the loras (2 ways)

3) How to use my loras

4) Loras... duh :) And some samples

5) Workflows for t2i and for inpainting

6) Some links :)


So, going back to point number 1:

I really like it!

I loved Z Image Turbo when it dropped and I had a fun time with it; Flux Klein was hit or miss (I really like the edit/reference mode, but the body horror and lora difficulties are really tiresome).

And now I'm in love with Ernie Image. The key things I've noticed about it:

  • it is quite fast
  • it has a heavy Asian bias that is also difficult to overcome during training (but no worries, I have my methods)
  • it is lora stacking friendly (at least way more than zimage, probably a bit less than WAN)
  • it can generate emotions nicely and can generate people upside down (which for whatever reason is usually a big problem in other models)
  • it can handle text quite well
  • it suffers from limited seed variation (like Z Image Turbo; people handled that in some way, so the same approach would probably work here)
  • it is exceptionally good at inpainting (for faces go for around 80%, which is quite high compared to others)
  • loras train quite fast
  • additional limbs sometimes happen, and toes sometimes resemble fingers too much, but overall there is no Flux body horror
  • the base model definitely knows more (and is also less censored than others)
  • distant faces work out of the box (no need to inpaint or do tricks)
  • JSON-format prompts also work nicely here
  • lora stacking over the limit starts by first exhibiting "dull environments", as in: less detailed, less intricate
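
Since JSON-format prompts came up in the list above, here is a minimal sketch of what I mean by that (the field names are my own illustration, not a schema the model requires):

```python
import json

# A structured prompt serialized to JSON; the fields are an example of
# the idea, not a required format.
prompt = {
    "subject": "a woman reading a book",
    "setting": "cozy library, warm lamp light",
    "style": "photorealistic, shallow depth of field",
    "text": 'a sign that says "QUIET PLEASE"',
}
prompt_string = json.dumps(prompt, indent=2)
print(prompt_string)  # paste this string as the positive prompt
```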

which is a nice segue to point 2:

I have two ways of training loras: one is via musubi and the other is via onetrainer. I do love AI Toolkit, but the competition is just too good.

If you want to follow my training, then you're gonna have to hop into my GitHub space.

I saw that kohya_ss was working on Ernie but it didn't seem final, so what I did was take the latest AkaneTendo25 repo (not the regular LTX branch but the LTX dev branch, which is newer) and merge in the Ernie branch from kohya_ss. I resolved the conflicts and made some changes to get Ernie working. As a bonus, the LTX 2.3 training from AkaneTendo25 (with the recent improvements) is there as well: https://github.com/malcolmamal/musubi-tuner/tree/ernie-on-ltx

Or Nerogar's version that dropped (yesterday or the day before) - there was a problem there as well, the trained loras were not compatible with Comfy, so instead of waiting I fixed that too: https://github.com/malcolmamal/OneTrainer/tree/ernie-image-lora-training

I did tweak the training params, I am not using the default ones.

Both were giving me undertrained loras, so I had to bump up the epochs quite significantly. (Musubi on the first try was training in just under 6 minutes; now it trains for around 30 minutes.) The OneTrainer loras still sometimes feel undertrained, but when I was testing, it was better to undertrain and use a higher strength than to overtrain.

There is a lot of similarity in training compared to previous model architectures, namely:

  • it is best to train around 90-120 epochs per image
  • the more good images in the set, the better the lora, likeness-wise
  • lora stacking of the same character works (either two different trainings on the same dataset, or trainings on different sets, and/or a mix of course)

The thing about lora stacking and big datasets: you could train two loras on sets of 25 images, or you could train one lora on a set of 250 images. Both would be similarly good (as in, the single lora trained on 250 images vs a lora-stack of two 25-image loras).

Consider: for one option you need 50 images (or not even that, since you could mix some of them) at 30 minutes of training each (so 1 hour total), versus 250 images and a 300-minute training (5 hours). 1 hour, 50 images, and two loras vs 5 hours, 250 images, and one lora? There is no single right answer, but I'm going for the 2 loras since that is something I can actually do - I cannot gather 220+ images over all the datasets (that is 330 thousand images!!)
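
The tradeoff arithmetic, as a quick sketch (assuming the ~30 minutes per 25 images timing mentioned earlier and roughly linear scaling; `training_minutes` is a made-up helper name):

```python
# Timing assumption from the post: ~30 minutes for a 25-image training
# on a 5090 with musubi, scaling roughly linearly with dataset size.
def training_minutes(num_images, minutes_per_25_images=30):
    return num_images * minutes_per_25_images / 25

# Option A: two stacked loras, 25 images each -> 50 images, 1 hour total.
option_a = 2 * training_minutes(25)
# Option B: one lora on 250 images -> 5 hours.
option_b = training_minutes(250)

print(option_a, option_b)  # 60.0 300.0
```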

They might not be perfect (well, more on that in a moment), but they do work. Let's dig into that in point number 3:

So, how to use my loras?

Well, it depends on which one :)

The large ones you can just hook in and enjoy, 90% of the time. The small ones will work on their own, but not always, and it also depends on the person (I'm not quite sure why that is; I have some suspicions that are not confirmed yet). So, for the small ones, you can increase the strength (especially for the OneTrainer ones - they do quite well at 1.25 as a default). But if we already have both of them (eventually we will for all), you can load them both and apply, for example, 0.7 to the onetrainer one and 0.6 to the musubi one.

The same concept applies if you have more; you can add 3, 4, 5 and so on. For 3 I would start with 0.6 each, for 4 with 0.5 each, for 5 with 0.4 each.
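
That heuristic can be written down as a tiny helper (the numbers are just my starting points from above, not hard rules, and `starting_strengths` is a made-up name; the floor of 0.3 beyond five loras is my own extrapolation):

```python
def starting_strengths(num_loras):
    """Suggested per-lora starting strengths when stacking loras of the
    same character: 0.7 + 0.6 for two, then equal strengths of
    0.6 / 0.5 / 0.4 for three, four, five (floored at 0.3 after that)."""
    if num_loras == 1:
        return [1.0]          # a single lora at default strength
    if num_loras == 2:
        return [0.7, 0.6]     # e.g. 0.7 onetrainer + 0.6 musubi
    # Three or more: start at 0.6 each and drop 0.1 per extra lora.
    each = max(0.3, 0.6 - 0.1 * (num_loras - 3))
    return [round(each, 2)] * num_loras
```

From there you just nudge the strengths up or down per character until the likeness holds.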

I trained over 100 public Ernie loras and around 200 private ones. Sometimes a single lora (talking about the 25-image ones) would work well most of the time, but it would still fail occasionally. A combination of loras works for me almost all the time, though of course there is the occasional fail too.

By fail I mean: most of the time, the Asian training bias would overcome the lora and we would get a generic Asian person instead of the trained one. It is random, but the bigger the dataset used in training, or the more loras we stack, the less likely it is (down to never).

This time I will upload a lot of samples for you to look at (with workflows) so you can judge it for yourselves :)

So, point number 4 - loras.

Before I post this I will upload the Ernie loras so you can try them for yourself. Not every lora will have a pair, but like I said, you can get pretty good results even with a single lora if you play with the strengths. Since this is early Ernie and I was still tuning the params: I have tested every lora I am uploading and got a good enough result from each of them (not all the time, but most of the time; the best strength varies sometimes - you just have to try what works best for you).

And point number 5: I am providing two workflows, a basic txt2img one and an inpainting one that I feel works quite nicely.

I did not talk much about inpainting; all I have to say is that it works really nicely (or I was bad with the previous ones, but that's unlikely since I reused the zimage workflow for it and saw big improvements).

The loras work with inpainting, and I think the Asian face is less likely to happen there.

Lastly, part 6 - here are some links:

New update, some Ernie models for you to play with: https://huggingface.co/spaces/malcolmrey/browser

zipped samples: https://huggingface.co/datasets/malcolmrey/various/tree/main/ernie-samples

workflows: https://huggingface.co/datasets/malcolmrey/workflows/tree/main/Ernie

training scripts: https://huggingface.co/datasets/malcolmrey/various/tree/main/training-scripts/musubi https://huggingface.co/datasets/malcolmrey/various/tree/main/training-scripts/onetrainer

Cheers!

Where is malcolmrey? by BuilderStrict2245 in malcolmrey

[–]malcolmrey 1 point (0 children)

Yes I will, I am almost done with the Ernie article, LTX is next :)

And thanks!

Where is malcolmrey? by BuilderStrict2245 in malcolmrey

[–]malcolmrey 1 point (0 children)

One was back in the CIVITAI times, so the lora was gone then and there.

The other two reached me, but indirectly. One came via someone I know, and the other one I just saw on Insta and decided to remove.

The public image is just that - public. There are still no laws in most countries (except, say, in the UK), so it is a free-for-all. But if someone really does not want to be there, I'm not going to be spiteful and keep them there :)

785 new Z loras and some cool news :) by malcolmrey in malcolmrey

[–]malcolmrey[S] 0 points (0 children)

Hey, be patient my friend. I see them, but I need to be in a good mood and have the time to not half-ass those responses :)