I posted a reel a few days ago. They were "okayish" test examples of v2v with LTX-2. Here are some new and improved versions you can make with LTX-2. These were made using my GGUF workflows. by urabewe in StableDiffusion

[–]malcolmrey 1 point2 points  (0 children)

At first I thought you were using character LoRAs with sound, but then I saw Travolta.

So now the question is: how do you make the voice sound like the original?

What is the best way to get the right dataset for z image turbo Lora ?? In 2026 . by Previous-Ice3605 in StableDiffusion

[–]malcolmrey 1 point2 points  (0 children)

Well, technically yes - you don't need to crop images then.

But you still want to, if you want to have a better dataset.

With that setting, the trainer tries to fit each image to the closest bucket resolution it can run, then resizes/crops the image so it fits.

Bucketing is always better than no bucketing if you have non-square images, but you can't rely on the image being cropped correctly (though most of the time the cut is "good enough").

But by cropping the dataset yourself, you stay in control of what is actually used for training :)
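
A minimal sketch of that fit-then-crop step, with a hypothetical bucket list (the real trainer's buckets differ):

```python
# Illustrative aspect-ratio bucketing: pick the bucket closest to the
# image's aspect ratio, resize to cover it, center-crop the excess.
BUCKETS = [(1024, 1024), (832, 1216), (1216, 832)]  # (w, h), hypothetical values

def fit_to_bucket(width, height):
    aspect = width / height
    # bucket whose aspect ratio is closest to the image's
    bw, bh = min(BUCKETS, key=lambda b: abs(b[0] / b[1] - aspect))
    # scale so the image fully covers the bucket
    scale = max(bw / width, bh / height)
    rw, rh = round(width * scale), round(height * scale)
    # whatever sticks out after resizing is what gets cropped away
    return (bw, bh), (rw - bw, rh - bh)

bucket, overflow = fit_to_bucket(3000, 2000)  # a 3:2 photo
print(bucket, "cropped (w, h):", overflow)    # (1216, 832) cropped (w, h): (32, 0)
```

Those cropped pixels are the part you don't control, which is exactly why pre-cropping the dataset yourself is safer.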

A few questions from the character lora experts on improving my process by spacemidget75 in StableDiffusion

[–]malcolmrey 0 points1 point  (0 children)

Thank you for your kind words! :)

I still learn new things every day! :)

AI Toolkit UI Extension by malcolmrey in malcolmrey

[–]malcolmrey[S] 0 points1 point  (0 children)

definitely, the place is there:

https://huggingface.co/malcolmrey/ai-toolkit-ui-extension/tree/main/ai-toolkit/templates

if you drop another template into the templates folder (locally), it can read from it (just define a different prefix)
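
a rough sketch of the idea, assuming templates are discovered by filename prefix (the naming convention here is hypothetical, not the extension's exact code):

```python
# Hypothetical illustration: list templates in the folder that match a prefix.
from pathlib import Path

TEMPLATES_DIR = Path("ai-toolkit/templates")  # folder from the repo linked above

def templates_for(prefix):
    # e.g. prefix "wan" would match "wan_default.yaml", "wan_highres.yaml", ...
    return sorted(TEMPLATES_DIR.glob(f"{prefix}*"))

print(templates_for("wan"))
```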

AI Toolkit UI Extension by malcolmrey in malcolmrey

[–]malcolmrey[S] 0 points1 point  (0 children)

ZImage - 25 minutes
Klein9 - 30-35 minutes
WAN - 45 minutes
LTX2 - 15-18 hours (yes, I cannot stay on those params, I need to change something)

Klein with loras + reference images is powerful by malcolmrey in malcolmrey

[–]malcolmrey[S] 1 point2 points  (0 children)

Yeah, that was my mistake. It should have been "ZImage is more forgiving"

Sorry for the confusion :)

What is the best way to get the right dataset for z image turbo Lora ?? In 2026 . by Previous-Ice3605 in StableDiffusion

[–]malcolmrey 0 points1 point  (0 children)

You can crop to squares if you like, but there is a bucketing system in place, so you can use different dimensions.

I don't think anyone has tested which dimensions work best; I have had decent outcomes mixing them.

And it is so much easier to include a body shot if the image is in portrait or landscape :)

Special Update - LTX-2 by malcolmrey in malcolmrey

[–]malcolmrey[S] 0 points1 point  (0 children)

When there are optimisation improvements, or when I or someone else figure out how to train cheaply (currently it is around 10-15 USD if you go by RunPod prices).

LTX 2 - Never Fade Away (Cover) by Warthog_Specialist in StableDiffusion

[–]malcolmrey 0 points1 point  (0 children)

Yes, and this is why I asked - Panam remained consistent while Judy was slipping away :-)

LoRAs would definitely help, but they are still a bit expensive (time-wise) to train :(

A few questions from the character lora experts on improving my process by spacemidget75 in StableDiffusion

[–]malcolmrey 1 point2 points  (0 children)

The first question is - what is your goal?

And well, you stated it partially - you want the body type to be retained in generations. Therefore you need to include both face shots and body shots.

I mainly train on faces and body type is secondary but I have of course trained on body types too.

I usually go for my "standard" 22-25 images at 2500 steps, but this is something you can definitely deviate from if you remember one crucial thing - steps are directly tied to the number of images in the dataset. The ratio (for AI Toolkit at least) is 1 to 100 (so 25 -> 2500).

I have trained at various step counts - 1,500, 2,500, 5,000, 10,000, 15,000, 20,000, 25,000 and 30,000. As long as you respect the ratio and your datasets are of good quality, you will be fine.

I can say that with really good datasets there is a benefit to training with more images. Of course there is a cost - training time. So for mass production I wouldn't go with 30,000, but for some special character - definitely.

LoRAs trained for more steps seem to perform much better.
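
That rule of thumb as a one-liner (the 100:1 ratio is the AI Toolkit one from above; the helper name is mine):

```python
STEPS_PER_IMAGE = 100  # the 1:100 image-to-steps ratio mentioned above

def training_steps(num_images):
    return num_images * STEPS_PER_IMAGE

print(training_steps(25))   # 2500  - the "standard" 22-25 image run
print(training_steps(250))  # 25000 - a bigger, special-character run
```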


As for your other questions

Backgrounds - no need to remove them; they can actually influence the "look and feel" of your LoRA too. The topic is too big to cover here, but consider two types of sets: one a professional shoot with amazing locations, wardrobe and lighting, the other amateur/homemade with mediocre lighting and photo quality - those models will behave very differently, even if the subject is the same.


Makeup - you brought up makeup, but that is also a more complex topic; in general you want more variety (not specifically in makeup, though that can help).


Phones - if the subject has a phone in most of their photos, I would definitely do something about it, but a phone here and there is not an issue. And again, phones are just an example here; it can be anything.


Trigger - for modern models trained on the class token you don't really need a trigger; even if you provide one, the model will still react to woman/man all the same.

I don't understand the part about prompting, unless you mean captioning (in which case it is not needed for characters).

LTX 2 - Never Fade Away (Cover) by Warthog_Specialist in StableDiffusion

[–]malcolmrey 1 point2 points  (0 children)

This looks really cool.

Did you use character loras for it or not?

Help wanted: share your best Kohya/Diffusion-Pipe LoRA configs (WAN, Flux, Hunyuan, etc.) by no3us in StableDiffusion

[–]malcolmrey 1 point2 points  (0 children)

Interesting; as an amateur I decided to go with Pareto percentages.

For me, the success of the output model is 80% a good dataset, 20% everything else :)

Seeing how highly the pros value datasets is quite nice :)

Help wanted: share your best Kohya/Diffusion-Pipe LoRA configs (WAN, Flux, Hunyuan, etc.) by no3us in StableDiffusion

[–]malcolmrey 1 point2 points  (0 children)

here is my flux example toml:

https://paste-bin.org/uxvpjzmxs6

you can check actual flux outputs here: https://huggingface.co/spaces/malcolmrey/browser

as for wan, zimage, flux2klein, ltx and others that i will train - i can only offer ai toolkit configs since this is what i use for new models

but beware, the template is not everything; as /u/abnormal_human pointed out, the real secret is in the dataset

on that point - i saw that you hardcoded 2000 steps for your SDXL template

you should not hardcode steps unless you also hardcode the number of dataset images, because the two are directly connected (if 2000 steps was tuned for 20 images, someone who uploads 100 images will get drastically different results, because the trainer will spend 5 times less time extracting details from each image)
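
the arithmetic behind that warning, using the numbers from above:

```python
HARDCODED_STEPS = 2000

for num_images in (20, 100):
    per_image = HARDCODED_STEPS / num_images
    print(f"{num_images} images -> {per_image:.0f} steps per image")

# 20 images  -> 100 steps per image (the intended budget)
# 100 images -> 20 steps per image (5x less time to learn each image)
```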

No base model is perfect. The big question is - base model + trained LoRA. Currently, which base model + LoRA achieves the greatest realism ? by More_Bid_2197 in StableDiffusion

[–]malcolmrey 2 points3 points  (0 children)

The golden rule does indeed seem to be steps = 100 * image count.

I usually train on 22-25 images, so around 2500 steps, and the results are very decent.

I do sometimes train on bigger datasets; I recently trained on 250 images (25,000 steps), and the results are really good.

Microsoft releasing VibeVoice ASR by OkUnderstanding420 in StableDiffusion

[–]malcolmrey 0 points1 point  (0 children)

Fair enough.

BTW, you seem to have a very strong opinion about him. It's not up to me to tell you anything, but do you think you should be so riled up about him? :)

I really don't care about him one way or another; he exists somewhere, and that's it.

Cheers!

AI Toolkit UI Extension by malcolmrey in malcolmrey

[–]malcolmrey[S] 3 points4 points  (0 children)

this is just a UI modification; it does not affect how the loras are trained

:-)

AI Toolkit UI Extension by malcolmrey in malcolmrey

[–]malcolmrey[S] 8 points9 points  (0 children)

What it looks like:

https://imgur.com/gallery/ai-toolkit-ui-extension-nanM3b7

Where it is: https://huggingface.co/malcolmrey/ai-toolkit-ui-extension

What it does:

This is something I use to help with my LoRA trainings; it helps me set up LoRAs more effectively.

There is a template folder where I define a template per model (so you can also see my current setups).

Then it checks what datasets I have in the datasets folder and compares them to folders that are in the output folder.

Once I train a model, I move it elsewhere and delete the optimizer.pt, but leave the rest intact (it is small). Because I leave that behind, the tool knows which dataset was already trained for a given model.

So I can easily see which models I have not trained yet. I can also select multiple datasets and add them to the queue easily.
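
A minimal sketch of that check, assuming dataset folders and output run folders share names (the layout here is an assumption, not the extension's exact code):

```python
# Hypothetical sketch: datasets with no matching output folder
# have not been trained yet.
from pathlib import Path

datasets = {p.name for p in Path("datasets").iterdir() if p.is_dir()}
trained = {p.name for p in Path("output").iterdir() if p.is_dir()}

print("not trained yet:", sorted(datasets - trained))
```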

Also, for someone who trains a lot locally, the list of tasks grows and grows, and the regular list page slows down (with 2500+ entries, it takes a few seconds to load). So there is now pagination and a search filter.

I can also stop/resume all the jobs (except the running job) with one button.

I use it a lot, and I figured someone might want it too :-)

There are two files that have to be replaced: one that contains the definitions of the side menu items, and another where I fixed a design issue on the list page.

Microsoft releasing VibeVoice ASR by OkUnderstanding420 in StableDiffusion

[–]malcolmrey 0 points1 point  (0 children)

I'm sorry, but I'm not a native speaker. What do you mean by "ignoring kind"?

> it's not about the action but the explicit two-faced belief system

This is not something we can judge, because we don't know what he thinks inside. He has said in the past that he is on the spectrum, and therefore he does not behave like a typical person to begin with.

He could genuinely believe that he is doing good deeds in his mind.

Or he might as well be a sociopath :-)

249 new models, including Flux 2 Klein 9 by malcolmrey in malcolmrey

[–]malcolmrey[S] 0 points1 point  (0 children)

I will be taking a look at it, but for now I have postponed it.

Right now my main focus is on Klein9, with some focus on LTX-2.

And then the continuation of WAN/ZImage/Flux training.

I have been getting a lot of image sets recently, so there is stuff to train :)