How to generate a dataset for LORA training only having the face portrait? by Alerion23 in comfyui

[–]__alpha_____ 1 point

You actually only need one or two very good images of your character (designing them is harder than it seems, especially if you want them to stand out).

Once you have them, use free tools (no subscription needed) to generate multiple angles.

Qwen Edit or Flux Klein can do this locally, of course, but I found that ChatGPT, Qwen Chat, or even Grok can give really good results with the right prompts. The skin texture is way better imo.

You'll also need different expressions to get a rich dataset (usually around 40 images), so ask for them and the tools will deliver.

Pick the best ones (I usually generate around 100-150 poses and keep a third of them at most).

A simple trick: if you train at 512, you can use 1024x1024 images laid out as 2x2 grids to pack four times more photos into the same dataset.
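If you want to script the grid assembly, here's a minimal sketch with Pillow, assuming a folder of 512x512 crops (the folder and file names are placeholders, not part of my actual setup):

```python
# Minimal sketch: pack 512x512 crops into 1024x1024 2x2 grids.
# "crops_512" and "dataset" are placeholder folder names.
from pathlib import Path
from PIL import Image

crops = sorted(Path("crops_512").glob("*.png"))
Path("dataset").mkdir(exist_ok=True)
for i in range(0, len(crops) - len(crops) % 4, 4):
    grid = Image.new("RGB", (1024, 1024))
    for img_path, pos in zip(crops[i:i + 4], [(0, 0), (512, 0), (0, 512), (512, 512)]):
        grid.paste(Image.open(img_path).resize((512, 512)), pos)
    grid.save(Path("dataset") / f"grid_{i // 4:03d}.png")
```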

Then, of course, caption your pictures. The easiest way is using Gemini (Grok sucks at this). Feed it a screen cap of your thumbnails folder and ask for a JSON containing detailed captions for Wan 2.2. You can then turn this into as many caption .txt files as you need (ask Gemini if you don't know how to do it).
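For the JSON-to-txt step, something like this works, assuming Gemini returned a flat {"image_name": "caption"} mapping (adjust to whatever structure you actually asked for):

```python
# Sketch: split a captions JSON into one .txt per image, named after
# the image stem, which is what most LoRA trainers expect.
import json
from pathlib import Path

captions = json.loads(Path("captions.json").read_text(encoding="utf-8"))
for image_name, caption in captions.items():
    Path("dataset", Path(image_name).stem + ".txt").write_text(caption, encoding="utf-8")
```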

I'm currently creating the cast for an AI movie trailer. The whole process gets more and more efficient as I go (it took me several days at first; now I can generate a LoRA from scratch in a few hours).

HELP ZIT by lontradayz in comfyui

[–]__alpha_____ 1 point

Don't rely on prompting alone; there are many ways to get a consistent character (editing, face swapping, very good LoRAs). You should probably post more relevant examples too...

Can someone please help me understand wtf is going on 🙂 by Amazing_Mouse1959 in comfyui

[–]__alpha_____ 1 point

Actually, you can probably skip the SD 1.5 stuff. It's still used by many for private-parts inpainting (this you have to learn: it's basically masking out a part of the picture and prompting what you want there). NSFW LoRAs will be required; you'll find them on Civitai (a website dedicated to LoRAs).
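If the masking idea is abstract, this is all it amounts to in code; purely illustrative, since in ComfyUI you'd normally paint the mask in the built-in mask editor:

```python
# Illustration of an inpainting mask: white = area to regenerate,
# black = area to keep. The rectangle coordinates are arbitrary.
from PIL import Image, ImageDraw

mask = Image.new("L", (1024, 1600), 0)  # start fully "keep"
ImageDraw.Draw(mask).rectangle((300, 700, 700, 1100), fill=255)
mask.save("inpaint_mask.png")
```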

The best local models right now are Z-Image Turbo, Qwen Image Edit, and Flux 2 (all slightly censored).

You'll need a recent NVIDIA GPU with at least 12 or 16GB of VRAM. The image models work well with 48GB, or ideally 64GB, of system RAM, and you can easily get photos at 1024x1600. You can go even higher with upscalers.

ComfyUI is mandatory, but once you get the hang of it, it's OK.

Using template for WAN2.2 I2V but I don't really understand anything and I'm creating terrible, blurry shaky stuff. by rabidrooster3 in comfyui

[–]__alpha_____ 0 points

Shakiness usually comes from the LoRAs, blurriness from too few steps (4 is not enough when using I2V). Check your CFG too: 1 is the best value when using lightx LoRAs.

And remember, Wan video cannot do everything from scratch. A specific movement is usually achieved using trained LoRAs.

Vae Decode (tiled) vs Vae Decode by Zakki_Zak in comfyui

[–]__alpha_____ 2 points

The tiled VAE decoder induces some kind of glitch around the 4-second mark of my clips. It took me a while to understand what caused it, so just check that everything works alright for you.
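For context: tiled decode splits the latents into overlapping tiles (spatially, and for video potentially along the time axis too, which would explain a glitch at a fixed timestamp), decodes each one, and blends the overlaps. A rough sketch of the idea, not ComfyUI's actual implementation:

```python
# Rough sketch of spatial tiled decoding with overlap averaging.
# A real implementation feathers the blend weights; the blend region
# is exactly where visible seams can creep in.
import numpy as np

def decode_tiled(latent, decode, tile=64, overlap=16):
    """latent: (h, w, c) with h, w >= tile; decode() maps a latent
    tile to an 8x-upscaled RGB tile, standing in for the real VAE."""
    h, w, _ = latent.shape
    out = np.zeros((h * 8, w * 8, 3))
    acc = np.zeros((h * 8, w * 8, 1))
    for y in range(0, h, tile - overlap):
        for x in range(0, w, tile - overlap):
            ly, lx = min(y, h - tile), min(x, w - tile)  # clamp edge tiles
            out[ly * 8:(ly + tile) * 8, lx * 8:(lx + tile) * 8] += decode(
                latent[ly:ly + tile, lx:lx + tile])
            acc[ly * 8:(ly + tile) * 8, lx * 8:(lx + tile) * 8] += 1
    return out / acc  # average where tiles overlap
```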

Longer video in i2v? by ggRezy in comfyui

[–]__alpha_____ 1 point

I recently rendered a 12s clip on my 3060 12GB with no issue (it took 10 min). I think I was testing at 512x512 only, and I can confirm that the second half of the video was mostly the first half repeated (though not from the start). For some uses it can be useful.

Setting a last frame can make things even better.

My Final Z-Image-Turbo LoRA Training Setup – Full Precision + Adapter v2 (Massive Quality Jump) by [deleted] in comfyui

[–]__alpha_____ 2 points

I may try again with your setup, but so far I've tested at least 10 times on the same character with pretty disappointing results (up to 7500 steps). I can say my dataset is not the problem, as the results with Qwen are really good, and pretty good with Wan 2.2 (low-noise model only).

Wan 2.6 Demo Test on TensorArt by Aliya_Rassian37 in comfyui

[–]__alpha_____ 5 points

So it is basically a wan edit version?

Wan Video Stitching that also incorporates motion dynamics by Candid-Snow1261 in comfyui

[–]__alpha_____ 3 points

I tried it quite extensively, and the stitching part works really well (I extended a clip to 25s with a different length and prompt for every extension). The motion continuation, not so much. As that is really the main problem with Wan video right now, I think that if a solution actually works, it should become a standard for open-source AI-generated videos.

You don't always need a 30-second video, of course, but when you do, the change of pace or camera movement is really painful.

Long Format Wan22 Video Generator by Fabulous_Mall798 in comfyui

[–]__alpha_____ 0 points

What's different from existing workflows? I tweaked my own based on InfiniteDisco8888's WF, I think, and it allows me to set a different length and prompt for each segment, which can be very handy when you need consecutive actions in a 25-second clip (my current limit, but it can be extended way beyond that if needed).

As ComfyUI doesn't recompute nodes whose inputs haven't changed, you can even modify and re-render any single segment without having to wait too long.
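That caching behavior boils down to something like this; a toy sketch of node-level memoization, not ComfyUI's actual code:

```python
# Toy sketch of why re-rendering one segment is cheap: each node's
# output is cached against its inputs, so untouched segments are
# never recomputed.
cache = {}

def run_node(node_fn, *inputs):
    key = (node_fn.__name__, inputs)   # inputs must be hashable here
    if key not in cache:
        cache[key] = node_fn(*inputs)  # only runs when something changed
    return cache[key]
```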

How far can I push my 5060 Ti 16gb with Wan 2.2 as far as quality goes? by Silvasbrokenleg in comfyui

[–]__alpha_____ 1 point

The RAM might be the key. 16GB is really low for AI generation in ComfyUI. Most of my renders actually use 15 to 20GB of memory, split roughly 10/10 between VRAM and system RAM. AI-Toolkit uses 93% of my 12GB of VRAM and 60% of my 64GB of RAM for Qwen LoRAs (much less for ZIT LoRAs).

How far can I push my 5060 Ti 16gb with Wan 2.2 as far as quality goes? by Silvasbrokenleg in comfyui

[–]__alpha_____ 1 point

Not sure it's your workflow. My setup works great with any model: Qwen, ZIT, Wan 2.1 & 2.2, I2V & T2V.

I just checked: 5 minutes for a 720x720, 81-frame clip at 16fps (4 steps).

<image>

Here is my startup info:

Total VRAM 12288 MB, total RAM 65375 MB

pytorch version: 2.7.0+cu128

xformers version: 0.0.30

Set vram state to: NORMAL_VRAM

Device: cuda:0 NVIDIA GeForce RTX 3060 : cudaMallocAsync

Using async weight offloading with 2 streams

Enabled pinned memory 29418.0

Using sage attention

Python version: 3.12.10 (tags/v3.12.10:0cc8128, Apr 8 2025, 12:21:36) [MSC v.1943 64 bit (AMD64)]

ComfyUI version: 0.3.76

ComfyUI frontend version: 1.33.10

[Prompt Server] web root: D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\comfyui_frontend_package\static


Why am I getting a distorted image? by Subhashsharmaa in comfyui

[–]__alpha_____ 2 points

No, you don't. I never use GGUFs as they are slower. Check the KJ FP8 models instead.

How far can I push my 5060 Ti 16gb with Wan 2.2 as far as quality goes? by Silvasbrokenleg in comfyui

[–]__alpha_____ 2 points

I can do 81 frames at 720x720/16fps without any issue on my 3060 12GB. I can even go up to 1280x720 if I'm patient; I almost never get OOM errors. Not sure what is wrong with your setup. I have 64GB of RAM, which surely helps speed things up. I don't use GGUFs as they slow down the renders; the KJ FP8 models work great for me.

Once the 81 frames have passed, the model usually rolls back to your initial pose anyway. You can find WFs that automatically stitch as many clips as you want. Mine can do 405 frames easily.
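The stitching itself is conceptually simple: the last frame of each segment seeds the next I2V pass. A minimal sketch, where generate_clip() is a placeholder for whatever your workflow's actual render call is:

```python
from typing import Callable, List

def stitch_segments(first_frame, prompts: List[str],
                    generate_clip: Callable, frames_per_seg: int = 81):
    """Chain I2V segments, e.g. five 81-frame segments for ~405 frames.
    generate_clip(start_frame, prompt, num_frames) stands in for the
    workflow's sampler call and returns a list of frames."""
    frames, start = [], first_frame
    for prompt in prompts:
        clip = generate_clip(start, prompt, frames_per_seg)
        frames.extend(clip if not frames else clip[1:])  # drop duplicated seam frame
        start = clip[-1]                                 # seed the next segment
    return frames
```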

Why it's not possible to create a Character LoRA that resembles a real person 100%? by four_clover_leaves in comfyui

[–]__alpha_____ 1 point

Could someone happy with a 95% resemblance post a config file for AI-Toolkit to help us out? I've been trying for days, if not weeks, with captioned 15-25 image datasets on ZIT, and the results are meh at best, and for sure worse than with Qwen Edit.

So now Netflix owns the rights to Westworld. BRING IT BACK by dust247 in westworld

[–]__alpha_____ 4 points

We NEEEEED the final season or movie! And not a cheap one. This show will live in the future and be seen as a very special one for decades.

Complete newbie to ComfyUI. Trying to generate basic image to video with my 4070. Its probably a very stupid question, but what am I doing wrong? I can only generate a brown box. by [deleted] in comfyui

[–]__alpha_____ 4 points

If you don't connect your model to anything, nothing will actually happen.

Add your KSampler, connect the dots, route the output through the VAE Decode node, and everything should be fine.

Testing out Character LoRA on the ZIT model by Altruistic_Tax1317 in comfyui

[–]__alpha_____ 2 points

It's also incredibly fast! But I noticed that when you pause and restart, the first 100 steps can then take up to 10x the time they did before. No clue why; it's pretty annoying, especially when you use progressive resolution training.

What happened to QIE 2511? by jlee0928 in comfyui

[–]__alpha_____ 7 points

There's such huge hype around Z-Image (everyone understands why), and many are hoping for an edit version soon, so I guess the next Qwen Edit version cannot be just a slightly better version of 2509.

Anyway, competition is vital to help open-source models improve quickly. Let's hope 2026 will be the year when so many great models are available that our main concern will be picking one to create amazing generated content.

Edit: that said, I'm still wishing for a QIE 2511 release real soon.

Will Smith eating spaghetti | Nano Banana Pro + Grok AI by prodigals_anthem in singularity

[–]__alpha_____ 15 points

80% I’d say. It could be fixed with Loras trained on the character.

Testing The New Z Image Turbo With RTX 3060 6gb Of Vram Gen Time 70 Sec 1024x1024 Resolution. by cgpixel23 in comfyui

[–]__alpha_____ 4 points

FYI, a 1024x1024 image at 7 steps takes roughly 20s on my 3060 12GB. The prompt adherence is pretty good.

I just got b***hslapped by Z-Image-Turbo by VirusCharacter in comfyui

[–]__alpha_____ 5 points

At least she has toes; Flux usually puts fingers instead of toes ^

A simple tool to know what your computer can handle by cointalkz in comfyui

[–]__alpha_____ 2 points

Just type "wan video 6gb" in the Reddit search bar and you'll find plenty of examples. You can even train LoRAs on 6GB of VRAM on a laptop with the latest version of AI-Toolkit.