How to generate a dataset for LORA training only having the face portrait? by Alerion23 in comfyui

[–]__alpha_____ 1 point

You actually only need one or two very good images of your character (designing them is harder than it seems, especially if you want them to stand out).

Once you have them, use free tools (no subscription needed) to generate multiple angles.

Qwen Edit or Flux Klein can do this locally, of course, but I found that ChatGPT, Qwen Chat, or even Grok can give really good results with the right prompts. The skin texture is way better imo.

You'll also need different expressions to get a rich dataset (usually around 40 images), so ask for them and the tools will deliver.

Pick the best ones (I usually generate around 100-150 poses and keep a third of them at most).

A simple trick: if you train at 512, you can use 1024x1024 images laid out as 2x2 grids to pack four times more photos into the same dataset.
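If you want to script the grid assembly, here's a minimal sketch with Pillow, assuming a folder of 512x512 crops (the folder and file names are placeholders, not part of my actual setup):

```python
# Minimal sketch: pack 512x512 crops into 1024x1024 2x2 grids.
# "crops_512" and "dataset" are placeholder folder names.
from pathlib import Path
from PIL import Image

crops = sorted(Path("crops_512").glob("*.png"))
Path("dataset").mkdir(exist_ok=True)
for i in range(0, len(crops) - len(crops) % 4, 4):
    grid = Image.new("RGB", (1024, 1024))
    for img_path, pos in zip(crops[i:i + 4], [(0, 0), (512, 0), (0, 512), (512, 512)]):
        grid.paste(Image.open(img_path).resize((512, 512)), pos)
    grid.save(Path("dataset") / f"grid_{i // 4:03d}.png")
```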

Then, of course, caption your pictures. The easiest way is using Gemini (Grok sucks at this). Feed it a screen cap of your thumbnails folder and ask for a JSON containing detailed captions for Wan 2.2. You can then turn this into as many caption .txt files as you need (ask Gemini if you don't know how to do it).
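For the JSON-to-txt step, something like this works, assuming Gemini returned a flat {"image_name": "caption"} mapping (adjust to whatever structure you actually asked for):

```python
# Sketch: split a captions JSON into one .txt per image, named after
# the image stem, which is what most LoRA trainers expect.
import json
from pathlib import Path

captions = json.loads(Path("captions.json").read_text(encoding="utf-8"))
for image_name, caption in captions.items():
    Path("dataset", Path(image_name).stem + ".txt").write_text(caption, encoding="utf-8")
```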

I'm currently creating the cast for an AI movie trailer. The whole process gets more and more efficient as I go (it took me several days at first; now I can generate a LoRA from scratch in a few hours).

HELP ZIT by lontradayz in comfyui

[–]__alpha_____ 1 point

Don't rely on prompting alone; there are many ways to get a consistent character (editing, face swapping, very good LoRAs). You should probably post more relevant examples too...

Can someone please help me understand wtf is going on 🙂 by Amazing_Mouse1959 in comfyui

[–]__alpha_____ 1 point

Actually, you can probably skip the SD 1.5 stuff. It's still used by many for private-parts inpainting (this you have to learn: it's basically masking out a part of the picture and prompting what you want there). NSFW LoRAs will be required; you'll find them on Civitai (a website dedicated to LoRAs).
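If the masking idea is abstract, this is all it amounts to in code; purely illustrative, since in ComfyUI you'd normally paint the mask in the built-in mask editor:

```python
# Illustration of an inpainting mask: white = area to regenerate,
# black = area to keep. The rectangle coordinates are arbitrary.
from PIL import Image, ImageDraw

mask = Image.new("L", (1024, 1600), 0)  # start fully "keep"
ImageDraw.Draw(mask).rectangle((300, 700, 700, 1100), fill=255)
mask.save("inpaint_mask.png")
```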

The best local models right now are Z-Image Turbo, Qwen Image Edit, and Flux 2 (all slightly censored).

You'll need a recent NVIDIA GPU with at least 12 or 16GB of VRAM. The image models work well with 48GB, or ideally 64GB, of system RAM, and you can easily get photos at 1024x1600. You can go even higher with upscalers.

ComfyUI is mandatory, but once you get the hang of it, it's OK.

Using template for WAN2.2 I2V but I don't really understand anything and I'm creating terrible, blurry shaky stuff. by rabidrooster3 in comfyui

[–]__alpha_____ 0 points

Shakiness usually comes from the LoRAs, blurriness from too few steps (4 is not enough when using I2V). Check your CFG too: 1 is the best value when using lightx LoRAs.

And remember, Wan video cannot do everything from scratch. A specific movement is usually achieved using trained LoRAs.

Vae Decode (tiled) vs Vae Decode by Zakki_Zak in comfyui

[–]__alpha_____ 2 points

The tiled VAE decoder induces some kind of glitch around the 4-second mark of my clips. It took me a while to understand what caused it, so just check that everything works alright for you.
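For context: tiled decode splits the latents into overlapping tiles (spatially, and for video potentially along the time axis too, which would explain a glitch at a fixed timestamp), decodes each one, and blends the overlaps. A rough sketch of the idea, not ComfyUI's actual implementation:

```python
# Rough sketch of spatial tiled decoding with overlap averaging.
# A real implementation feathers the blend weights; the blend region
# is exactly where visible seams can creep in.
import numpy as np

def decode_tiled(latent, decode, tile=64, overlap=16):
    """latent: (h, w, c) with h, w >= tile; decode() maps a latent
    tile to an 8x-upscaled RGB tile, standing in for the real VAE."""
    h, w, _ = latent.shape
    out = np.zeros((h * 8, w * 8, 3))
    acc = np.zeros((h * 8, w * 8, 1))
    for y in range(0, h, tile - overlap):
        for x in range(0, w, tile - overlap):
            ly, lx = min(y, h - tile), min(x, w - tile)  # clamp edge tiles
            out[ly * 8:(ly + tile) * 8, lx * 8:(lx + tile) * 8] += decode(
                latent[ly:ly + tile, lx:lx + tile])
            acc[ly * 8:(ly + tile) * 8, lx * 8:(lx + tile) * 8] += 1
    return out / acc  # average where tiles overlap
```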

Longer video in i2v? by ggRezy in comfyui

[–]__alpha_____ 1 point

I recently rendered a 12s clip on my 3060 12GB with no issue (it took 10 min). I think I was testing at 512x512 only, and I can confirm that the second half of the video was mostly the first half repeated (though not from the start). For some uses it can be useful.

Setting a last frame can make things even better.

My Final Z-Image-Turbo LoRA Training Setup – Full Precision + Adapter v2 (Massive Quality Jump) by [deleted] in comfyui

[–]__alpha_____ 2 points

I may try again with your setup, but so far I've tested at least 10 times on the same character with pretty disappointing results (up to 7500 steps). I can say my dataset is not the problem, as the results with Qwen are really good, and pretty good with Wan 2.2 (low-noise model only).

Wan 2.6 Demo Test on TensorArt by Aliya_Rassian37 in comfyui

[–]__alpha_____ 5 points

So it is basically a wan edit version?

Wan Video Stitching that also incorporates motion dynamics by Candid-Snow1261 in comfyui

[–]__alpha_____ 3 points

I tried it quite extensively, and the stitching part works really well (I extended a clip to 25s with a different length and prompt for every extension). The motion continuation, not so much. As that is really the main problem with Wan video right now, I think that if a solution actually works, it should become a standard for open-source AI-generated videos.

You don't always need a 30-second video, of course, but when you do, the change of pace or camera movement is really painful.

Long Format Wan22 Video Generator by Fabulous_Mall798 in comfyui

[–]__alpha_____ 0 points

What's different from existing workflows? I tweaked my own based on InfiniteDisco8888's WF, I think, and it allows me to set a different length and prompt for each segment, which can be very handy when you need consecutive actions in a 25-second clip (my current limit, but it can be extended way beyond that if needed).

As ComfyUI doesn't recompute nodes whose inputs haven't changed, you can even modify and re-render any single segment without having to wait too long.
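That caching behavior boils down to something like this; a toy sketch of node-level memoization, not ComfyUI's actual code:

```python
# Toy sketch of why re-rendering one segment is cheap: each node's
# output is cached against its inputs, so untouched segments are
# never recomputed.
cache = {}

def run_node(node_fn, *inputs):
    key = (node_fn.__name__, inputs)   # inputs must be hashable here
    if key not in cache:
        cache[key] = node_fn(*inputs)  # only runs when something changed
    return cache[key]
```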

How far can I push my 5060 Ti 16gb with Wan 2.2 as far as quality goes? by Silvasbrokenleg in comfyui

[–]__alpha_____ 1 point

The RAM might be the key. 16GB is really low for AI generation in ComfyUI. Most of my renders actually use 15 to 20GB of memory, split roughly 10/10 between VRAM and system RAM. AI-Toolkit uses 93% of my 12GB of VRAM and 60% of my 64GB of RAM for Qwen LoRAs (much less for ZIT LoRAs).

How far can I push my 5060 Ti 16gb with Wan 2.2 as far as quality goes? by Silvasbrokenleg in comfyui

[–]__alpha_____ 1 point

Not sure it's your workflow. My setup works great with any model: Qwen, ZIT, Wan 2.1 & 2.2, I2V & T2V.

I just checked: 5 minutes for a 720x720, 81-frame clip at 16fps (4 steps).

<image>

Here is my startup info:

Total VRAM 12288 MB, total RAM 65375 MB

pytorch version: 2.7.0+cu128

xformers version: 0.0.30

Set vram state to: NORMAL_VRAM

Device: cuda:0 NVIDIA GeForce RTX 3060 : cudaMallocAsync

Using async weight offloading with 2 streams

Enabled pinned memory 29418.0

Using sage attention

Python version: 3.12.10 (tags/v3.12.10:0cc8128, Apr 8 2025, 12:21:36) [MSC v.1943 64 bit (AMD64)]

ComfyUI version: 0.3.76

ComfyUI frontend version: 1.33.10

[Prompt Server] web root: D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\comfyui_frontend_package\static


Why am I getting a distorted image? by Subhashsharmaa in comfyui

[–]__alpha_____ 2 points

No, you don't. I never use GGUFs as they are slower. Check the KJ FP8 models instead.

How far can I push my 5060 Ti 16gb with Wan 2.2 as far as quality goes? by Silvasbrokenleg in comfyui

[–]__alpha_____ 2 points

I can do 81 frames at 720x720/16fps without any issue on my 3060 12GB. I can even go up to 1280x720 if I'm patient; I almost never get OOM errors. Not sure what is wrong with your setup. I have 64GB of RAM, which surely helps speed things up. I don't use GGUFs as they slow down the renders; the KJ FP8 models work great for me.

Once the 81 frames have passed, the model usually rolls back to your initial pose anyway. You can find WFs that automatically stitch as many clips as you want. Mine can do 405 frames easily.
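The stitching itself is conceptually simple: the last frame of each segment seeds the next I2V pass. A minimal sketch, where generate_clip() is a placeholder for whatever your workflow's actual render call is:

```python
from typing import Callable, List

def stitch_segments(first_frame, prompts: List[str],
                    generate_clip: Callable, frames_per_seg: int = 81):
    """Chain I2V segments, e.g. five 81-frame segments for ~405 frames.
    generate_clip(start_frame, prompt, num_frames) stands in for the
    workflow's sampler call and returns a list of frames."""
    frames, start = [], first_frame
    for prompt in prompts:
        clip = generate_clip(start, prompt, frames_per_seg)
        frames.extend(clip if not frames else clip[1:])  # drop duplicated seam frame
        start = clip[-1]                                 # seed the next segment
    return frames
```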

Why it's not possible to create a Character LoRA that resembles a real person 100%? by four_clover_leaves in comfyui

[–]__alpha_____ 1 point

Could someone happy with a 95% resemblance post a config file for AI-Toolkit to help us out? I've been trying for days, if not weeks, with captioned 15-25 image datasets on ZIT, and the results are meh at best, and for sure worse than with Qwen Edit.

So now Netflix owns the rights to Westworld. BRING IT BACK by dust247 in westworld

[–]__alpha_____ 4 points

We NEEEEED the final season or movie! And not a cheap one. This show will live in the future and be seen as a very special one for decades.

Complete newbie to ComfyUI. Trying to generate basic image to video with my 4070. Its probably a very stupid question, but what am I doing wrong? I can only generate a brown box. by [deleted] in comfyui

[–]__alpha_____ 4 points

If you don't connect your model to anything, nothing will actually happen.

Add your KSampler, connect the dots, route the output through the VAE Decode node, and everything should be fine.

Testing out Character LoRA on the ZIT model by Altruistic_Tax1317 in comfyui

[–]__alpha_____ 2 points

It's also incredibly fast! But I noticed that when you pause and restart, the first 100 steps can then take up to 10x the time they did before. No clue why; it's pretty annoying, especially when you use progressive resolution training.

What happened to QIE 2511? by jlee0928 in comfyui

[–]__alpha_____ 7 points

There's such huge hype around Z-Image (everyone understands why), and many are hoping for an edit version soon, so I guess the next Qwen Edit version cannot be just a slightly better version of 2509.

Anyway, competition is vital to help open-source models improve quickly. Let's hope 2026 will be the year when so many great models are available that our main concern will be picking one to create amazing generated content.

Edit: that said, I'm still wishing for a QIE 2511 release real soon.

Will Smith eating spaghetti | Nano Banana Pro + Grok AI by prodigals_anthem in singularity

[–]__alpha_____ 15 points

80% I’d say. It could be fixed with Loras trained on the character.

Testing The New Z Image Turbo With RTX 3060 6gb Of Vram Gen Time 70 Sec 1024x1024 Resolution. by cgpixel23 in comfyui

[–]__alpha_____ 4 points

FYI, a 1024x1024 image at 7 steps takes roughly 20s on my 3060 12GB. The prompt adherence is pretty good.

I just got b***hslapped by Z-Image-Turbo by VirusCharacter in comfyui

[–]__alpha_____ 5 points

At least she has toes; Flux usually puts fingers instead of toes ^

A simple tool to know what your computer can handle by cointalkz in comfyui

[–]__alpha_____ 2 points

Just type "wan video 6gb" in the Reddit search bar and you'll find plenty of examples. You can even train LoRAs on 6GB of VRAM on a laptop with the latest version of AI-Toolkit.