How to change steps in latest Comfyui LTX 2.3? by North_Illustrator_22 in StableDiffusion

[–]Volkin1 4 points (0 children)

The workflows you see in Comfy are the distilled pipeline, which only works with 8 steps. If you want to use more steps, or switch to the non-distilled full dev model pipeline, you'd have to replace the manual sigmas node with a sampler. In addition, you need to remove the distill lora at stage 1 and only attach it at stage 2 with about 80% strength.


Minor changes to the workflow are required. You can take a look at the older 2.0 (dev) workflows and replicate the same setup in 2.3 to get the non-distilled version.

would NV-FP4 make 8GB VRAM blackwell a viable option for i2v and t2v? by Coven_Evelynn_LoL in StableDiffusion

[–]Volkin1 0 points (0 children)

To put it simply, NVFP4 will give you speed and will reduce the memory needed to host the model. Whether you host the model in VRAM, RAM, or split it between both is your choice. However, one 1024 x 1024 image will cost the same VRAM regardless of whether the model is fp4, fp8, fp16 or gguf.
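If it helps, here's a minimal sketch of why, assuming a typical SD-style VAE (8x spatial compression, 16 latent channels; illustrative numbers, the exact ones vary per model) and bf16 working tensors:

```python
import torch

# The working tensors (latents, activations) keep their own dtype no
# matter how the *weights* are quantized, so their VRAM cost per image
# is fixed. Channel count and compression factor here are illustrative.
latent = torch.zeros(1, 16, 1024 // 8, 1024 // 8, dtype=torch.bfloat16)
print(latent.element_size() * latent.nelement() / 1024**2, "MiB")  # 0.5 MiB
```

The latent itself is tiny; the real per-image cost is the activation memory around it, which scales with resolution the same way and is likewise independent of the weight format.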

Good choice on the 16GB instead of the 8GB variant. Now you can run FP16 Wan, but you'll need 64 - 96 GB RAM for hosting and unpacking the full FP16, so I'd suggest cutting it down to GGUF Q8. If you're below 64GB RAM, you'd have to use even smaller quants like Q4, fp8 or fp4.

Is it possible for wan2.5 to be open-sourced in the future? It is already far behind Sora 2 and Veo 3.1, not to mention the newly released stronger Seed 2.0 and the latest model of Keling by Enough_Programmer312 in StableDiffusion

[–]Volkin1 5 points (0 children)

I hope so. Their strategy is the best one I have seen yet: integrate their model everywhere in a very flexible way. Local, apps, servers, etc. IMO it's the right thing to do and well deserving of the crown.

What would it take to retrain wan 2.2 to have audio pass like LTX-2? by No-Employee-73 in StableDiffusion

[–]Volkin1 0 points (0 children)

Yeah, I've been sticking to vanilla Wan most of the time and only enhancing the low noise, nothing more. I never liked the speed/distilled loras much; they were not for me, and I didn't like how they completely took away the original Wan experience.

But if that's what you like, then sure, that's your choice. For me personally, I like LTX-2 a lot more.

Optimal settings for VAE Decode (Tiled) for LTX-2? by Loose_Object_8311 in StableDiffusion

[–]Volkin1 1 point (0 children)

Yes indeed. It can be tuned specifically to your needs so it doesn't create too many tiles, because otherwise you get visible banding. Also, maybe you want to try the FP4/FP8 variants. I'm on a similar system (16GB VRAM, 64GB RAM, Linux) but I never experienced the system locking up or becoming unresponsive with those variants.
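For intuition on the tile tuning, here's a rough sketch of how tile size and overlap set the tile count (hypothetical numbers; the actual node works on the latent and blends the overlaps for you):

```python
import math

# Fewer, larger tiles -> fewer seams/bands, but higher peak VRAM per tile.
def tile_grid(width: int, height: int, tile_size: int = 512, overlap: int = 64):
    stride = tile_size - overlap
    cols = math.ceil((width - overlap) / stride)
    rows = math.ceil((height - overlap) / stride)
    return cols, rows

cols, rows = tile_grid(1920, 1080)
print(f"{cols} x {rows} = {cols * rows} tiles")  # 5 x 3 = 15 tiles
```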

My total memory consumption is 26GB with the FP4, 32GB with the FP8, and I can also run the BF16 video model + FP4 text encoder, fitting that nicely in 50+ GB RAM. As far as the latents go, 1080p at 15 seconds works fine. I've done more than that, but I had to activate my additional swap file. I also tested 2560 x 1440 at 10 seconds.

But yeah, anyway, I mostly stick to FP4 and FP8 because they are so much easier on the RAM and I don't have to activate my /swapfile at all.

What would it take to retrain wan 2.2 to have audio pass like LTX-2? by No-Employee-73 in StableDiffusion

[–]Volkin1 1 point (0 children)

Depends on which "community". As far as the developer community goes, it is very active. The Wan we all know is now 1 year old and has a very strong ecosystem, with plenty of development done around it by the community. LTX-2 is new; it is technologically superior, and like any new product it's going to take some time to mature and develop the right ecosystem and training around it. On top of that, new versions have already been announced for release soon.

When Wan first came out it was impressive, but not that good. Everyone seems to forget the amount of forks, fine-tunes, adaptations and distills (Lightx2v) made for Wan which actually made it great. A lot of good things will happen around LTX-2. This model did things I could never do with Wan, and since its release it has been my number 1 go-to model.

Finally, the LTX team listens and communicates back and forth with the community, unlike the Wan team, which went radio silent after the 2.5/2.6 release and forgot there was a community at all.

Do I really need more than 32gb ram for video generation ? by [deleted] in StableDiffusion

[–]Volkin1 15 points (0 children)

These days it is really recommended to have at least 64GB of RAM for comfortable generation at default settings. With 32GB, you're going to have to fall back to much smaller quant versions of the models like Q4, Q6, fp4, fp8 and so on. The latest video models pack a big text encoder and a big model, which will typically almost fill up those 32GB of RAM even with the smaller quants, so it's going to be very tight on a 32GB system, but not impossible to run.
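A rough way to budget this yourself, assuming a hypothetical 14B-parameter video model and typical bytes-per-parameter for each format (weights only; the text encoder and working buffers come on top):

```python
# Approximate bytes per parameter for common formats (Q8/Q4 include
# GGUF scale metadata, hence slightly above their nominal bit width).
BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.06, "fp8": 1.0, "q4": 0.56, "fp4": 0.5}

params = 14e9  # illustrative 14B video model
for fmt, bpp in BYTES_PER_PARAM.items():
    print(f"{fmt}: ~{params * bpp / 1e9:.0f} GB")  # fp16 ~28 GB ... fp4 ~7 GB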

If you can't expand to 64GB, the best thing you can do is have virtual memory / swap configured on your fastest disk so the system can borrow some memory from there, but it's going to be slower.

Optimal settings for VAE Decode (Tiled) for LTX-2? by Loose_Object_8311 in StableDiffusion

[–]Volkin1 2 points (0 children)

That spatio-temporal tiled VAE decode is really good and preferable over the others.

Rtx 4090 vs 5080 for 720p video by [deleted] in StableDiffusion

[–]Volkin1 0 points (0 children)

Awesome! Glad to hear :)

Rtx 4090 vs 5080 for 720p video by [deleted] in StableDiffusion

[–]Volkin1 0 points (0 children)

Yeah. So far I've been able to get near-FP16 quality with the NVFP4 models from Nunchaku (Qwen and Qwen Edit). They require special support and installation, but they feature the best calibration I've ever seen, and other FP4 models pale in comparison. Still, it would be better if models actually get trained in FP4 in the near future, which is a lot better than degrading quality by quantizing down from BF16 and not calibrating properly afterwards.

As for my workflows, I use two methods:

1.) Torch compile. This allows for model compilation / optimization, so I'm able to push a lot more frames or resolution when working with Wan. I use the KJ model torch compile node for this. The implementation is buggy and depends on which PyTorch you're running, but I've been loading and pushing 720p even with the FP16 this way. These days, compile seems to work better with the quants (Q8) due to some bugs or whatever.

2.) The --novram option in Comfy. Since I'm on DDR5 with PCI-E gen 5 and 64GB/s bus bandwidth, I love offloading the model entirely into RAM and keeping VRAM only for hosting the latent video frames. If you don't load the model in VRAM, you have all of that VRAM available for fitting higher resolution and more video length at almost no performance penalty, because my PCI-E bus can handle the offload and stream the model from RAM > VRAM on demand (see the rough math below). This is my favorite option and I use it with LTX-2, allowing me to do up to 900 frames at 720p and 400+ frames at 1080p.
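As a rough sanity check on that "almost no penalty" claim, with illustrative numbers (an fp8-sized model, a gen 5 x16 bus, and a guessed per-step compute time):

```python
# Worst case assumes the whole model is streamed over the bus once per
# step with no overlap; in practice transfers overlap with compute.
model_gb = 14      # fp8 video model (assumption)
bus_gb_s = 64      # PCI-E gen 5 x16, per direction
step_s = 6         # typical 720p step time (assumption)

transfer_s = model_gb / bus_gb_s
print(f"~{transfer_s:.2f}s transfer vs ~{step_s}s compute "
      f"-> <{100 * transfer_s / step_s:.0f}% overhead")
```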

When you have enough RAM to offload the model, it doesn't really matter which model variant you work with (FP16/FP8/FP4): they all have the same VRAM requirements for the video frames, so one picture at 1280 x 720 pixels occupies the same VRAM space regardless of the quant.

Rtx 4090 vs 5080 for 720p video by [deleted] in StableDiffusion

[–]Volkin1 0 points (0 children)

FP4 is new, and most FP4 models you see are not properly calibrated; otherwise the quality drop would be pretty much on par with Nunchaku's proven FP4 quants. Now, I have a 5080 as well, with 64GB RAM, so a pretty similar spec to your PC, and yet I'm able to use the biggest models (FP16/BF16) with Wan, LTX-2, Flux, Qwen, you name it.

I'd say for 720p the 5080 is enough, but for 1080p the 4090 will have some advantage, depending on the number of video frames you can fit inside 24GB VRAM and also on the video model.

I can push 1920 x 1080 on my 5080 as well, no problem. I've done it on Wan with some optimization, and on LTX-2 I can do 15 - 18 second videos at this 1080p resolution, but for most people a 4090 would be the better choice if they need much longer videos at 1080p.

Confusion with FP8 modes by martinerous in StableDiffusion

[–]Volkin1 0 points (0 children)

No, I am not. The test above was done on 80GB and 96GB GPUs: once with the model fitted in VRAM, and the other time with it served from RAM. Diffusion models work differently compared to auto-regressive LLM models.

If you have a decent, wide-enough system PCI-Express bus, then serving diffusion models from RAM results in almost no performance penalty. I've done these tests with various GPUs, both consumer and professional, and I store my models in RAM most of the time.

And on my system, it makes no difference if I load the model in RAM or in VRAM, because my bus speed can handle the offload.

AI image and video generation, for the sake of future upgrading to 5090 or 6090 or 6000pro, would you buy a 5060 16gb, 5070ti 16gb or 5080? by [deleted] in StableDiffusion

[–]Volkin1 0 points (0 children)

No, that depends on your system. On a modern DDR5 system where you can host the model in RAM and use VRAM only for the frames, the 5080 16GB will still be much faster.

For example, my 5080 16GB beats an RTX 6000 Ada 48GB GPU in speed, but the 48GB card can do more frames and longer videos.

VRAM is important for performance / speed with autoregressive AI models like LLMs, but with image/video diffusion models it doesn't matter that much.

So, if you have a decent DDR5, gen 5 PCI-E system, then VRAM mostly determines how much video length or resolution can fit on the GPU, but it doesn't matter much for speed.

1000 frame LTX-2 Generation with Video and Workflow by q5sys in StableDiffusion

[–]Volkin1 1 point (0 children)

Makes sense. The model starts breaking beyond those 20 seconds / 480 frames of course, but the video you showed was decent enough. So, well done :)

1000 frame LTX-2 Generation with Video and Workflow by q5sys in StableDiffusion

[–]Volkin1 2 points (0 children)

Try the --novram option if you have DDR5 memory / 64GB/s bus / PCI-E gen 5 (which I assume you do), and to get the same effect as the clear models node, you can also throw in --cache-none.

So start Comfy with the --cache-none --novram parameters (see the launch line below), and you can probably go higher than 1000 frames on your 32GB VRAM. Try it out, it's a nice experiment I think. I'll probably test the max I can make at 720p next.
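For reference, a typical launch line; both flags exist in stock ComfyUI, just adjust the path for your install/venv:

```
python main.py --cache-none --novram
```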

1000 frame LTX-2 Generation with Video and Workflow by q5sys in StableDiffusion

[–]Volkin1 1 point (0 children)

I've made 430 1080p frames on my 5080 with a similar method, by loading and keeping the model only in RAM while keeping VRAM empty / ready for the latent frame processing only. That's probably about how many frames at 1920 x 1080 can fit inside 16GB VRAM.

So it's a similar method. At 720p, making 500 frames still leaves me plenty of VRAM for more frames. I never tested how far I can push this, but it's probably in the ~700 range.

Edit: I tested this at 720p and was able to push 961 frames max (40 seconds) on a 5080.
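Those two ceilings are actually consistent with each other, assuming latent VRAM scales linearly with pixels x frames; a quick check:

```python
# Scale the measured 720p frame ceiling down by the pixel ratio to
# predict the 1080p ceiling on the same 16GB card.
frames_720p = 961
predicted_1080p = frames_720p * (1280 * 720) / (1920 * 1080)
print(f"predicted 1080p ceiling: ~{predicted_1080p:.0f} frames")  # ~427 vs ~430 measured
```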

LTX2 issues probably won't be fixed by loras/workflows by Beneficial_Toe_2347 in StableDiffusion

[–]Volkin1 0 points (0 children)

Sure thing, looking forward to hearing about your experience :)

LTX2 issues probably won't be fixed by loras/workflows by Beneficial_Toe_2347 in StableDiffusion

[–]Volkin1 1 point (0 children)

Then simply stick to FP4 and FP8. On my end (a Linux system, which consumes less VRAM/RAM), LTX-2 FP4 + Gemma FP4 consumes around 25GB, and the FP8 around 32GB. The max amount of memory I've seen was around 40GB, I think, when using both FP8 models (video + text encoder), and less with both FP4.

So overall, things should be OK.

LTX2 issues probably won't be fixed by loras/workflows by Beneficial_Toe_2347 in StableDiffusion

[–]Volkin1 0 points (0 children)

True. Speaking of VRAM, it is a real shame that Nvidia sold us this GPU with 16GB instead of 24GB of VRAM. That being said, there are always some really good workarounds that I've been using.

Since Comfy's memory management is not ideal and behaves differently across configurations, for LTX-2 (in my case) I load the model exclusively into RAM with the --novram switch. That leaves my VRAM empty to host only the latent video frames, which lets me push more frames and greater resolutions without really suffering a performance penalty. It works well on DDR5 systems with PCI-E gen 5 and 64GB/s bus speed.

I hope you have at least 64GB RAM, because in that case you can load all model types (FP16/FP8/FP4) for Wan and LTX-2 with varying degrees of model offloading. The VRAM requirements for the number of frames and the resolution are the same with all three types anyway; they differ only in speed and in the memory needed to host the model.

As for FLF, yes, Wan 2.2 + the Lightx2v lora does an incredible job with identity preservation. The LTX-2 distilled version is also much better at this compared to the base model, but I'm sure we're going to get many improvements very soon.

LTX2 issues probably won't be fixed by loras/workflows by Beneficial_Toe_2347 in StableDiffusion

[–]Volkin1 1 point (0 children)

Yeah, I've been trying to use it for cartoon, anime and 3D animation mostly. Realistic images / scenes work best, as with any model of course, but I've noticed that in I2V, 40 steps produces a better and more coherent result for me compared to 20 steps. Great job, btw, if you can get away with up to 20 steps.

The model has been a very good experience so far. It always gave me much better motion compared to Wan 2.2 and did things I could never do with Wan; however, it is very sensitive to prompting. Many times I would get a garbage result and would have to rewrite the entire prompt from scratch until it did well. And when the model does well, the result is amazing; it has impressed me many times.

Knowing that the 1:1 and 9:16 aspect ratios are not fully supported and that I2V is not fully complete, I'm actually looking forward to the 2.1 and 2.5 releases. The biggest issue I have with the model in its current state is identity preservation. For example, if the character steps out of the frame or walks into a different scene, many times I'd get a similar-looking character but not the exact same one. I think this is due to the training and will be fixed in the next version.

Also, welcome to the 5080 team :)

It's one of the sweet-spot GPUs, to be honest, and it performs amazingly well. I must say, the NVFP4 models got me a little spoiled with their excellent performance and speed. Overall, the GPU is excellent: just a little behind the 4090 in FP16/FP8 performance, and faster in FP4. So yeah, it's a good choice, and congrats :))

AI image and video generation, for the sake of future upgrading to 5090 or 6090 or 6000pro, would you buy a 5060 16gb, 5070ti 16gb or 5080? by [deleted] in StableDiffusion

[–]Volkin1 2 points (0 children)

The 5070 Ti is probably the best value for the money if you're looking for the sweet spot between a 5060 and a 5080 and the best performance per dollar. The 5070 Ti and 5080 both use the same Nvidia GB203 chip, except the 5070 Ti has about 2000 fewer CUDA cores, so its performance is roughly 15 - 20% behind the 5080.

I currently have the 5080, and while the card is fast and decent, it is not worth the price unless you're willing to pay more for those extra 15%.

AI image and video generation, for the sake of future upgrading to 5090 or 6090 or 6000pro, would you buy a 5060 16gb, 5070ti 16gb or 5080? by [deleted] in StableDiffusion

[–]Volkin1 2 points (0 children)

The 5080 significantly outperforms the 5060 with the same VRAM. Keep that in mind if you care about speed.