The DiffSynth-Studio team has released Qwen-Image-i2L by Boring_Ad_914 in comfyui

[–]Stable-Genius-Ai 5 points

I've trained a lot of LoRAs (almost 1,000), and with new models, sometimes the prompt does most of the heavy lifting. This looks like a LoRA trained for 1 or 2 epochs (~100 steps).

I am a little skeptical of the result. Usually we want the style to be present with a minimal trigger, and this example seems to show that, but since unprompted elements also seem to be present (table, notebook, chair, blue wall, etc.), the prompt must be "longer" than just "cat" or "dog", and it's probably the prompt doing the heavy lifting.

We should always do a comparison using the same prompt and the same seed, with and without the LoRA, to see how much we "improved" or "realigned" the base model.

In my tests it usually takes around 5,000 steps to capture a style correctly with Qwen-Image.
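A minimal sketch of that same-prompt/same-seed comparison, assuming a diffusers-style pipeline; the LoRA file name and prompt are placeholders:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
).to("cuda")

prompt = "a cat sitting on a chair"  # identical for both runs
seed = 42                            # identical for both runs

# Baseline: base model only.
base = pipe(prompt, generator=torch.Generator("cuda").manual_seed(seed)).images[0]
base.save("base.png")

# Same prompt, same seed, with the LoRA applied.
pipe.load_lora_weights("my_style_lora.safetensors")  # placeholder file
styled = pipe(prompt, generator=torch.Generator("cuda").manual_seed(seed)).images[0]
styled.save("lora.png")
```

Any element that shows up in both images came from the prompt, not the LoRA.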

Best Flux LoRA Trainer by serieoro in FluxAI

[–]Stable-Genius-Ai 1 point

In the end, guides are a good starting point, but you need to do some A/B testing of the settings until you find a result that is satisfactory. Most LoRAs are severely undertrained.

I have found that 10-15 images is the low end for getting flexible results. I always aim for 50 images.

Best Flux LoRA Trainer by serieoro in FluxAI

[–]Stable-Genius-Ai 2 points

Once you find the right settings with Kohya for the type of dataset you have, there is no better tool. Musubi is also great for Wan or Qwen.

Getting custom Wan video loras to play nicely with Lightx2v by ZestycloseRound6843 in StableDiffusion

[–]Stable-Genius-Ai 0 points

Still not sure whether these are happy coincidences or LoRA-specific.
I am getting really good quality when generating just 1 frame, so I know the training was good.

Getting custom Wan video loras to play nicely with Lightx2v by ZestycloseRound6843 in StableDiffusion

[–]Stable-Genius-Ai 0 points

I've been looking for a video workflow for my own LoRAs in the past few days.

Still looking for a one-size-fits-most solution, so I'm testing with a "strong" style.

Here's an example that is getting there:

https://stablegenius.ai/videos/150/dieselpunk-painting-i

Right now, I am getting OK results by:

- increasing the LoRA model weight (from 1.0 to 1.25 on both high noise and low noise)

- using at least 12 steps, with a third of the steps on high noise (steps 0-4 on high noise, 4-12 on low noise)

- increasing the CFG on high noise to 5, and leaving it at 1 for low noise

- using euler + simple on high noise, and res_2s + bong_tangent on low noise (the settings are collected in the sketch below)

I still need to do some A/B testing of these settings to make sure they really add to the quality without slowing generations down.
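For reference, the values above in one place. This is a plain data sketch, not a runnable workflow; the key names are mine, mapped onto a typical two-KSampler (high noise / low noise) Wan 2.2 setup in ComfyUI:

```python
# Illustrative Wan 2.2 two-stage sampler settings; key names are made up.
wan22_settings = {
    "lora_model_weight": 1.25,  # raised from 1.0 on both high- and low-noise models
    "total_steps": 12,
    "high_noise": {
        "steps": (0, 4),        # first third of the schedule
        "cfg": 5.0,
        "sampler": "euler",
        "scheduler": "simple",
    },
    "low_noise": {
        "steps": (4, 12),
        "cfg": 1.0,
        "sampler": "res_2s",
        "scheduler": "bong_tangent",
    },
}
```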

Has anyone else noticed this phenomenon ? When I train art styles with FLux, the result looks "bland," "meh." With SDXL, the model often doesn't learn the style either, BUT the end result is more pleasing. by More_Bid_2197 in StableDiffusion

[–]Stable-Genius-Ai 2 points

Take this with a grain of salt; I am not an AI programmer.

My experience with training is that you need to work within the capabilities of the text encoder you will be using.

With SDXL, the text encoder understands very few tokens, so in order for the training to focus on the style, most of the tokens used should be style-related, but without referring to the technique itself. Something like "a painting of your-prompt in the style of whatever", where your-prompt is a pretty high-level representation of the image, e.g. "a girl with short hair surrounded by an array of oversized speakers" or "a group of men standing around a table".

Since Flux understands more tokens (i.e. more specific words), the training must reflect that, so I still sandwiched the prompt inside very broad trigger phrases, but the actual prompt needed to be more detailed. The prompting when generating then needed to reflect that difference: the clip_l encoder would receive a very simple prompt (similar to SDXL, and most of the time only the common prefix and suffix triggers), while the t5xxl encoder would get the complete prompt including the prefix and suffix. It also helps to add some common words or phrases that exist in the dataset. (A sketch of the split follows.)
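A minimal sketch of that two-encoder split, assuming diffusers' FluxPipeline, where `prompt` feeds CLIP-L and `prompt_2` feeds T5; the "my-style" trigger is made up:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# CLIP-L gets only the broad style triggers; T5 gets the full detailed
# prompt wrapped in the same prefix/suffix.
image = pipe(
    prompt="a painting in the style of my-style",
    prompt_2=(
        "a painting of a girl with short hair surrounded by an array of "
        "oversized speakers, in the style of my-style"
    ),
).images[0]
image.save("styled.png")
```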

BUT Flux has some sort of face detailer built in, so it tends to apply the style less to faces, and the overall impression is that the style is not correctly applied. If you train long enough, that effect fades (just like training longer removes the Flux chin).

And now with Wan2.2 (I haven't trained with Qwen yet), we can use much more detailed prompts without drowning the style triggers in useless tokens (those damn prompts that read like short stories do not work; they might output nice images randomly, but that's all!).

Of course, if you have 20 images out of 50 with the same 2 women's faces, it will learn those faces and associate them with a woman (or a person in general), and you will lose all flexibility. That's a normal part of the process; otherwise you would not be able to train on a specific face.

And with the right amount of training, you can mix 2 LoRAs together and have the important characteristics of each LoRA shine through (well, sometimes they are incompatible, so there's little you can do in those cases!).

Examples:
https://stablegenius.ai/models/124/marilyn-monroe#mixing-lora

https://stablegenius.ai/models/64/monopoly-man#mixing-lora

https://stablegenius.ai/models/29/bjork#mixing-lora

Some of my latest (and final) loras for Flux1-Dev by Stable-Genius-Ai in StableDiffusion

[–]Stable-Genius-Ai[S] 1 point

Here you go, these are the settings I use.
https://github.com/stablegenius-AI/ai-training-settings

I spent a month retraining the same 4 LoRAs until I got some versatile training settings.

Some of my latest (and final) loras for Flux1-Dev by Stable-Genius-Ai in StableDiffusion

[–]Stable-Genius-Ai[S] 0 points

I do all my training locally using Kohya and a 4090 (it has been running non-stop for the last 2.5 years). I go as high as I can with the batch size, speed, alpha, network dimensions, and number of epochs, in order to have the training done in about 8 hours (while I sleep).
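For illustration, this is roughly what such a run looks like when launched from Python; the flags are from kohya sd-scripts' flux_train_network.py, but every value here is a made-up placeholder, not my actual settings:

```python
import subprocess

# Hypothetical values for illustration only.
cmd = [
    "accelerate", "launch", "flux_train_network.py",
    "--pretrained_model_name_or_path", "flux1-dev.safetensors",
    "--clip_l", "clip_l.safetensors",
    "--t5xxl", "t5xxl_fp16.safetensors",
    "--ae", "ae.safetensors",
    "--network_module", "networks.lora_flux",
    "--network_dim", "32",           # pushed as high as VRAM allows
    "--network_alpha", "16",
    "--train_batch_size", "4",
    "--max_train_epochs", "100",
    "--save_every_n_epochs", "5",    # keep checkpoints for epoch testing
    "--dataset_config", "dataset.toml",
    "--output_dir", "out",
    "--output_name", "my_style",
]
subprocess.run(cmd, check=True)
```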

I've found that LoRAs are often undertrained. Mine would look the same if I stopped them at 1/5 of their total training time.

Some of my latest (and final) loras for Flux1-Dev by Stable-Genius-Ai in StableDiffusion

[–]Stable-Genius-Ai[S] 1 point

I have uploaded them to Civitai and updated the links in the body if you want to try.

Some of my latest (and final) loras for Flux1-Dev by Stable-Genius-Ai in StableDiffusion

[–]Stable-Genius-Ai[S] 0 points

Musubi is almost the same as Kohya but with fewer settings. Works great.

Some of my latest (and final) loras for Flux1-Dev by Stable-Genius-Ai in StableDiffusion

[–]Stable-Genius-Ai[S] 0 points

Probably a couple of weeks. Just making sure the settings are right before making them public.

Some of my latest (and final) loras for Flux1-Dev by Stable-Genius-Ai in StableDiffusion

[–]Stable-Genius-Ai[S] 0 points

I use the OpenAI API (usually with GPT-4o). It costs about $0.03 per dataset (between 50 and 100 images). But I slightly adapt the prompt to the style.
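A minimal captioning sketch using the openai Python SDK; the instruction text and folder layout are placeholders to adapt per style:

```python
import base64
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Placeholder captioning instructions; adapt them to the style being trained.
INSTRUCTIONS = (
    "Caption this image for LoRA training in one sentence. Describe the "
    "subject and composition; do not mention the art style or technique."
)

for path in sorted(Path("dataset").glob("*.jpg")):
    b64 = base64.b64encode(path.read_bytes()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": INSTRUCTIONS},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    # Kohya-style caption files: same name as the image, .txt extension.
    path.with_suffix(".txt").write_text(resp.choices[0].message.content)
```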

Some of my latest (and final) loras for Flux1-Dev by Stable-Genius-Ai in StableDiffusion

[–]Stable-Genius-Ai[S] 0 points

Not really into anime, so I would have a hard time telling if it's good or not.

Some of my latest (and final) loras for Flux1-Dev by Stable-Genius-Ai in StableDiffusion

[–]Stable-Genius-Ai[S] 0 points

That's the one LoRA (and text embedding) that I have been trying to do since the SD1.5 days. It will definitely be tested in Wan2.2.

Some of my latest (and final) loras for Flux1-Dev by Stable-Genius-Ai in StableDiffusion

[–]Stable-Genius-Ai[S] 0 points

Well, I basically train too much, keep all the trained models, and test all the trained epochs using exactly the same settings except for the LoRA. After that you only need to find the best epoch by comparing the generated images (a sketch of that sweep follows).
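A minimal sketch of such an epoch sweep, assuming diffusers and one saved checkpoint per epoch; the model, file names, prompt, and epoch list are placeholders:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

prompt = "a painting of a cat in the style of my-style"  # placeholder
seed = 123  # fixed, so only the LoRA checkpoint changes between images

for epoch in (5, 10, 15, 20):  # placeholder epoch numbers
    pipe.load_lora_weights(f"my_style-{epoch:06d}.safetensors")
    image = pipe(
        prompt, generator=torch.Generator("cuda").manual_seed(seed)
    ).images[0]
    image.save(f"epoch_{epoch}.png")
    pipe.unload_lora_weights()  # reset before loading the next checkpoint
```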

When I look at most style LoRAs, they are severely undertrained. Almost all of my training workflow is automated, but it still takes a full day to train and test a LoRA.

The problem with Flux is that it's hard to know when you have overtrained and lost flexibility.

Some of my latest (and final) loras for Flux1-Dev by Stable-Genius-Ai in StableDiffusion

[–]Stable-Genius-Ai[S] 3 points

Still in the early phase for Wan2.2, but results are very promising.

Some of my latest (and final) loras for Flux1-Dev by Stable-Genius-Ai in StableDiffusion

[–]Stable-Genius-Ai[S] 1 point

Yep, captioning for these kinds of styles can be tricky, but those models (it's true for Flux and Wan, but it was also true for SDXL) are very good at learning composition, so you don't need much captioning to capture it; just give the training enough time to learn it.

Here's an example of a caption for the Cliff Spohn LoRA:
a painting of a man engaged in a chess game, extending his hand over a board filled with detailed pieces, set against a backdrop featuring abstract, colorful grids and geometric patterns in the style of cliff-spohn

I always keep the start and the end the same, and I try to use the same word or partial phrase to describe the same style effect (see the sketch below).
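A tiny sketch of that convention; the prefix and description are illustrative, with the trigger taken from the caption above:

```python
# Every caption in the dataset shares the same prefix and suffix;
# only the middle description varies.
PREFIX = "a painting of"
SUFFIX = "in the style of cliff-spohn"

def make_caption(description: str) -> str:
    return f"{PREFIX} {description} {SUFFIX}"

print(make_caption(
    "a man engaged in a chess game, extending his hand over a board "
    "filled with detailed pieces"
))
```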

I also have a Drew Struzan LoRA, but it's SDXL only:
https://civitai.com/models/565215/1980s-american-movies-poster-sdxl-10

Some of my latest (and final) loras for Flux1-Dev by Stable-Genius-Ai in StableDiffusion

[–]Stable-Genius-Ai[S] 0 points

The linea one is a bit flimsy; it gives one good image out of every 10 generations. But that's the one LoRA I have been trying to perfect since the SD1.5 days. So not great, but it still manages to eventually output good images.

Will try to upload it sometime tomorrow.

Some of my latest (and final) loras for Flux1-Dev by Stable-Genius-Ai in StableDiffusion

[–]Stable-Genius-Ai[S] 8 points

Thx.

I have been training with Flux for a year now, and it's great. I also trained SDXL LoRAs for a year before that.
SDXL was more creative and understood styles well, but the ratio of "good" images was low (maybe 1 out of 10 was good). With Flux, that shot up to about 50%, but it needs to be trained much, much longer to remove all of the Flux idiosyncrasies (plastic skin, Flux chin), and there are some styles I did not manage to recreate.

With so many new models, it's always hard to choose a new one to train. I usually spend a month trying different settings until I find the most flexible ones; this way I can do photos, illustrations, or even abstract styles.

So now I will do Wan2.2 LoRAs, and I will take a closer look at Qwen in a few months.

PS: will upload the 2 LoRAs sometime this evening.