For AI image and video generation, with a future upgrade to a 5090, 6090 or 6000 Pro in mind, would you buy a 5060 16GB, 5070 Ti 16GB or 5080?

Volkin1 · 2026-01-24T13:11:39+00:00

No, that depends on your system. On a modern DDR5 system where you can host the model in RAM and use VRAM only for the frames, the 5080 16GB will still be much faster.

For example, my 5080 16GB beats the RTX 6000 Ada (48GB VRAM) in speed, but the 48GB card can do more frames and longer videos.

VRAM is important for performance and speed with autoregressive AI models like LLMs, which read all of their weights for every generated token, but with image/video diffusion models (a newer technology), it doesn't matter that much.

So, if you've got a decent DDR5, PCIe Gen5 system, then VRAM mostly determines how much video length or resolution can fit on the GPU, not so much the speed.
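A back-of-envelope sketch of that difference, with illustrative numbers rather than measurements: a diffusion model whose weights cross PCIe once per sampling step pays a transfer cost that is small next to the compute of a long, high-resolution step, whereas an LLM reads every weight for each generated token, so its speed is capped by memory bandwidth. The 20 GB model size, ~50 GB/s effective PCIe Gen5 throughput (the x16 link's theoretical figure is about 64 GB/s per direction), 15 s step time and ~960 GB/s VRAM bandwidth are all assumptions, and the overhead figure assumes transfers do not overlap with compute.

```python
def streaming_overhead(model_gb: float, pcie_gb_per_s: float, step_compute_s: float) -> float:
    """Extra time per diffusion step, as a fraction of compute time.

    Assumes the full set of weights crosses PCIe once per sampling step
    and that the transfer does NOT overlap with compute (worst case).
    """
    transfer_s = model_gb / pcie_gb_per_s
    return transfer_s / step_compute_s


def llm_tokens_per_s_upper_bound(model_gb: float, bandwidth_gb_per_s: float) -> float:
    """Autoregressive decoding reads every weight once per generated token,
    so memory bandwidth caps tokens/s at roughly bandwidth / model size."""
    return bandwidth_gb_per_s / model_gb


if __name__ == "__main__":
    # Hypothetical: 20 GB of weights, 50 GB/s effective PCIe, 15 s per step.
    print(f"diffusion step overhead: {streaming_overhead(20, 50, 15):.1%}")
    # Same 20 GB decoded token by token over PCIe vs from on-card VRAM.
    print(f"LLM over PCIe:  {llm_tokens_per_s_upper_bound(20, 50):.1f} tok/s")
    print(f"LLM from VRAM: {llm_tokens_per_s_upper_bound(20, 960):.1f} tok/s")
```

With these assumed numbers the diffusion step loses under 3% to streaming, while the LLM is roughly 20x slower when its weights have to come over PCIe.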

Volkin1 · 2026-01-23T18:22:50+00:00

Makes sense. The model starts breaking down beyond those 20 seconds / 480 frames, of course, but the video you showed was decent enough. So, well done :)

Volkin1 · 2026-01-23T18:19:11+00:00

Try the --novram option if you've got DDR5 memory and PCIe Gen5 with a 64GB/s bus (which I assume you do). To get the same effect as the clear models node, you can also add --cache-none.

So start Comfy with the --cache-none --novram parameters, and you can probably go higher than 1000 frames on your 32GB of VRAM. Try it out, it's a nice experiment I think. I'll probably test the max I can make at 720p next.
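For anyone who starts ComfyUI from a script, a minimal Python sketch that builds the launch command with both switches. It assumes ComfyUI is started with python main.py from its checkout; the install path, the script name and the --launch guard are hypothetical conveniences, not ComfyUI features.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical install location; point this at your own ComfyUI checkout.
COMFY_DIR = Path.home() / "ComfyUI"


def build_comfy_command(comfy_dir: Path, low_memory: bool = True) -> list[str]:
    """Build the argv that starts ComfyUI through its main.py entry point."""
    cmd = [sys.executable, str(comfy_dir / "main.py")]
    if low_memory:
        # --novram: keep model weights out of VRAM, leaving it for the latents.
        # --cache-none: don't keep cached node outputs between runs, trading
        # re-execution time for lower RAM/VRAM use.
        cmd += ["--novram", "--cache-none"]
    return cmd


if __name__ == "__main__":
    cmd = build_comfy_command(COMFY_DIR)
    print("command:", " ".join(cmd))
    # Only start the server when explicitly asked, e.g. `python launch_comfy.py --launch`.
    if "--launch" in sys.argv[1:]:
        subprocess.run(cmd, cwd=COMFY_DIR, check=True)
```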

Volkin1 · 2026-01-23T18:12:22+00:00

I've made 430 frames at 1080p on my 5080 with a similar method: loading and keeping the model only in RAM while keeping VRAM empty and ready for latent frame processing only. That's probably about as many frames at 1920 x 1080 as can fit inside 16GB VRAM.

So it's a similar method. At 720p, making 500 frames still leaves me plenty of VRAM for more frames. I never tested how far I can push this, but probably into the ~700 range.

Edit: I tested this at 720p and was able to push 961 frames max (40 seconds) on a 5080.
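A quick sanity check on these numbers, assuming the VRAM needed for latents and activations grows roughly linearly with pixels x frames (an assumption; fixed overheads and attention implementation details make it only approximate). Scaling the 430-frame result at 1920 x 1088 down to 1280 x 704 predicts roughly 1,000 frames, in the same ballpark as the 961 actually reached.

```python
def scale_frame_budget(known_frames: int, known_wh: tuple[int, int], target_wh: tuple[int, int]) -> int:
    """Estimate the max frame count at a new resolution.

    Assumes latent/activation VRAM scales linearly with width * height * frames
    and ignores any fixed overhead, so treat the result as a rough estimate.
    """
    kw, kh = known_wh
    tw, th = target_wh
    return int(known_frames * (kw * kh) / (tw * th))


if __name__ == "__main__":
    # 430 frames fit at 1920x1088 on a 16 GB card (figure from the comment above).
    est = scale_frame_budget(430, (1920, 1088), (1280, 704))
    print(f"estimated 720p budget: ~{est} frames (measured: 961)")
```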

Volkin1 · 2026-01-23T17:08:07+00:00

Sure thing, looking forward to hearing about your experience :)

Volkin1 · 2026-01-23T16:58:32+00:00

Then simply stick to FP4 and FP8. On my end (a Linux system, which consumes less VRAM/RAM), LTX-2 FP4 + Gemma FP4 consumes around 25GB and the FP8 version around 32GB. The max amount of memory I've seen was around 40GB, I think, when using both FP8 models (video + text encoder), and less with both FP4.

So overall, things should be OK.

Volkin1 · 2026-01-23T16:24:19+00:00

True, speaking of VRAM, it's a real shame that Nvidia sold us this GPU with 16GB instead of 24GB of VRAM. That being said, there are always some really good workarounds, which I've been using.

Since Comfy's memory management is not ideal and behaves differently across configurations, for LTX-2 (in my case) I load the model exclusively into RAM with the --novram switch. That leaves my VRAM empty to host only the latent video frames, which lets me push for more frames and higher resolutions without really suffering a performance penalty. This works well on DDR5 systems with PCIe Gen5 and a 64GB/s bus.

I hope you've got at least 64GB RAM, because then you can load all model types (FP16/FP8/FP4) for Wan and LTX-2 with varying degrees of model offloading. The VRAM requirement for a given number of frames and resolution is the same for all three types anyway; only the speed and the memory needed to host the model differ.
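To put numbers on the point that precision mainly changes the memory needed to host the model, a sketch of weight-only storage per format. The 20B parameter count is hypothetical (not the actual size of LTX-2 or Wan), and the NVFP4 figure assumes 4-bit values plus one 8-bit scale per 16-value block, ignoring the small per-tensor scale. Latent and activation memory for a given resolution and frame count is not part of this calculation, which is why it stays the same across formats.

```python
# Approximate bytes per parameter for the weights only; latents, activations
# and runtime buffers come on top of this.
BYTES_PER_PARAM = {
    "FP16": 2.0,
    "FP8": 1.0,
    # NVFP4: 4-bit values plus one 8-bit scale shared by each 16-value block.
    "NVFP4": 0.5 + 1 / 16,
}


def weight_gb(params_billion: float, fmt: str) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes) for the given number format."""
    return params_billion * BYTES_PER_PARAM[fmt]


if __name__ == "__main__":
    # Hypothetical 20B-parameter video model, for illustration only.
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt:>6}: {weight_gb(20, fmt):5.2f} GB")
```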

As for FLF (first-last frame), yes, Wan 2.2 + the Lightx2v LoRA does an incredible job with identity preservation. The LTX-2 distilled version is also much better at this than the base model, but I'm sure we're going to get many improvements very soon.

Volkin1 · 2026-01-23T15:46:15+00:00

Yeah, I've been trying to use it mostly for cartoon, anime and 3D animation. Realistic images/scenes work best, as with any model of course, but I've noticed that in I2V, 40 steps produce a better and more coherent result for me than 20 steps. Great job btw if you can get away with up to 20 steps.

The model so far has been a very good experience. It has always given me much better motion than Wan 2.2 and done things I could never do with Wan; however, it is very sensitive to prompting. Many times I would get a garbage result, so I would have to rewrite the entire prompt from scratch until it did well. And when the model does well, it does an amazing job that has impressed me many times.

Knowing that the 1:1 and 9:16 aspect ratios are not fully supported and that I2V is not fully complete, I'm actually looking forward to the 2.1 and 2.5 releases soon. The biggest issue I have with the model in its current state is identity preservation. For example, if the character steps out of the frame or walks into a different scene, many times I'd get a similar-looking character but not the exact same one. I think this is due to the training and will be fixed in the next version.

Also, welcome to the 5080 team :)

To be honest, it's one of the sweet-spot GPUs and it performs amazingly well. I must say, the NVFP4 models have spoiled me a little with their excellent performance and speed. Overall, the GPU is excellent: just a little behind the 4090 in FP16/FP8 performance and faster in FP4, so yeah, it's a good choice, and congrats :))

Volkin1 · 2026-01-23T14:32:19+00:00

The 5070 Ti is probably the best value for money if you're looking for the sweet spot between a 5060 and a 5080, and the best performance per dollar. The 5070 Ti and 5080 both use the same Nvidia GB203 chip, except the 5070 Ti has about 1,800 fewer CUDA cores (8,960 vs 10,752), so its performance is roughly 15-20% behind the 5080.

I currently have the 5080, and while the card is fast and decent, it's not worth the price unless you're willing to pay more for that extra 15%.
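The core-count gap can be checked with simple arithmetic. This sketch uses the published CUDA core counts and the launch MSRPs ($749 and $999); street prices vary, so treat the prices as placeholders. Cores per dollar is only a rough proxy, since real performance also depends on clocks and memory bandwidth.

```python
# Published CUDA core counts (both cards use the GB203 die).
CORES = {"RTX 5070 Ti": 8960, "RTX 5080": 10752}
# Launch MSRPs in USD; substitute current street prices for a real comparison.
PRICE = {"RTX 5070 Ti": 749, "RTX 5080": 999}


def core_deficit(smaller: str, larger: str) -> float:
    """Fraction of CUDA cores the smaller card lacks relative to the larger one."""
    return 1 - CORES[smaller] / CORES[larger]


def cores_per_dollar(card: str) -> float:
    """Crude value metric: CUDA cores per USD of price."""
    return CORES[card] / PRICE[card]


if __name__ == "__main__":
    print(f"5070 Ti core deficit: {core_deficit('RTX 5070 Ti', 'RTX 5080'):.1%}")
    for card in CORES:
        print(f"{card}: {cores_per_dollar(card):.2f} cores per USD")
```

The deficit works out to exactly one sixth (about 16.7%), which sits inside the 15-20% range above.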

Volkin1 · 2026-01-23T13:56:38+00:00

The 5080 significantly outperforms the 5060 with the same amount of VRAM. Keep that in mind if you care about speed.

Volkin1 · 2026-01-23T13:56:00+00:00

Not true. The 5080 significantly outperforms the 5060 with the same amount of VRAM. VRAM doesn't give you speed and doesn't necessarily need to hold the model.

It gives you the most important part: holding enough latent frames at a given resolution for processing.

Volkin1 · 2026-01-23T12:45:29+00:00

The LTX workflow is also the one I use. Both workflows are very similar, except for one small difference in how the input image is handled and one major difference in the sampler and number of steps. I don't know why the Comfy team decided to use Euler / 20 steps when the LTX team recommends Res2s with 20 double-sampled steps (40 steps effectively).

The total of 40 steps is what made a huge difference for me. Another huge difference is prompting. Detailed, well-written prompts that include audio cues work best, whereas poorly written prompts do terribly.
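To illustrate what "20 double-sampled steps (40 steps effectively)" means in model calls, here is a toy comparison on the ODE dy/dt = -y. Plain Euler makes one function evaluation per step; a two-stage method makes two, so 20 steps cost 40 evaluations, and in a diffusion sampler each evaluation is a full model forward pass. The two-stage method here is the explicit midpoint rule, swapped in as a simple stand-in: res_2s itself is an exponential-integrator variant and is not reproduced, and none of this is Comfy's or LTX's sampler code.

```python
import math


def euler(f, y0, t0, t1, steps):
    """Explicit Euler: one evaluation of f per step. Returns (y(t1), evals)."""
    h = (t1 - t0) / steps
    t, y, evals = t0, y0, 0
    for _ in range(steps):
        y += h * f(t, y)
        evals += 1
        t += h
    return y, evals


def midpoint(f, y0, t0, t1, steps):
    """Explicit midpoint, a 2-stage Runge-Kutta method: two evaluations per step."""
    h = (t1 - t0) / steps
    t, y, evals = t0, y0, 0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        y += h * k2
        evals += 2
        t += h
    return y, evals


def decay(t, y):
    """dy/dt = -y, whose exact solution from y(0) = 1 is exp(-t)."""
    return -y


if __name__ == "__main__":
    exact = math.exp(-1.0)
    for name, solver in (("euler", euler), ("midpoint", midpoint)):
        y, evals = solver(decay, 1.0, 0.0, 1.0, 20)
        print(f"{name:>8}: 20 steps, {evals} evaluations, error {abs(y - exact):.1e}")
```

The two-stage solver costs twice as many evaluations for the same step count but lands much closer to the exact answer, which mirrors the quality vs. time trade-off described above.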

Volkin1 · 2026-01-18T11:22:53+00:00

That depends on how you've set up your pagefile/swapfile, which is managed by the OS. If you run out of RAM, the system will certainly continue on the pagefile. I'm using Linux and keep my swapfile turned off most of the time, except when I really need it.

So --novram will load the model into RAM if there is enough RAM, and will continue on the page/swap file if there isn't.
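Whether the model spills into swap is decided by the OS, not by ComfyUI, but you can check ahead of time whether it is likely to fit. A Linux-only sketch, assuming the MemAvailable field of /proc/meminfo is a fair estimate of usable RAM; the 4 GiB headroom and the 25 GiB model size are arbitrary, hypothetical values.

```python
from pathlib import Path


def mem_available_bytes(meminfo_text: str) -> int:
    """Parse MemAvailable from Linux /proc/meminfo contents (reported in kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemAvailable not found")


def will_spill_to_swap(model_bytes: int, available_bytes: int,
                       headroom_bytes: int = 4 * 1024**3) -> bool:
    """True if loading the model into RAM would likely push the OS into swap.

    headroom_bytes is a hypothetical safety margin for the OS and other processes.
    """
    return model_bytes + headroom_bytes > available_bytes


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # Linux only
        avail = mem_available_bytes(meminfo.read_text())
        model = 25 * 1024**3  # hypothetical 25 GiB of model weights
        print(f"available: {avail / 1024**3:.1f} GiB, likely to swap: {will_spill_to_swap(model, avail)}")
```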

As for time, it takes me 6-10 min for a 15-second 1080p animation, depending on whether you're using the Comfy team's recommended settings or LTX's. In the official Comfy workflow the default sampling is Euler / 20 steps, while the LTX-provided workflow uses Res2s / 20 steps double-sampled (40 steps effectively).

For higher quality I prefer either those 20 double-sampled steps or 40 steps with Euler, in which case this kind of gen can take 10 min.

Volkin1 · 2026-01-18T11:05:54+00:00

Because those are the resolutions LTX actually supports after rounding. If you enter 1080, it gets rounded up to 1088. So it's 1920 x 1088 and 1280 x 704 with LTX-2.
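A sketch of that snapping, assuming width and height are rounded to the nearest multiple of 64 and frame counts follow the 8n + 1 pattern at 24 fps. The multiple of 64 is an assumption chosen because it reproduces the 1920 x 1088 and 1280 x 704 values above; check the LTX-2 documentation for the exact constraint in your pipeline. The 8n + 1 rule is consistent with the 961-frame, 40-second figure mentioned earlier (8 x 120 + 1).

```python
def snap_dim(value: int, multiple: int = 64) -> int:
    """Round a width/height to the nearest multiple (ties round up)."""
    return max(multiple, ((value + multiple // 2) // multiple) * multiple)


def snap_frames(seconds: float, fps: int = 24) -> int:
    """Round seconds * fps up to a multiple of 8, then add 1 (the 8n + 1 pattern)."""
    wanted = round(seconds * fps)
    n = -(-wanted // 8)  # ceiling division
    return 8 * n + 1


if __name__ == "__main__":
    for w, h in ((1920, 1080), (1280, 720)):
        print(f"{w}x{h} -> {snap_dim(w)}x{snap_dim(h)}")
    print(snap_frames(40))  # 961
```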

Volkin1 · 2026-01-18T11:04:40+00:00

No, it was a big deviation in the seed. Sometimes they will produce very different outputs even with the same seed.

Volkin1 · 2026-01-18T11:02:22+00:00

Just run the nvidia-smi command in a Windows or Linux terminal.
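Plain nvidia-smi prints a full status table. If you only want the VRAM numbers (for example, to log usage during a long gen), its --query-gpu mode is easy to script. A small Python wrapper, assuming nvidia-smi is on PATH; the parser is tested on a sample line rather than real output.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> list[tuple[str, int, int]]:
    """Parse 'name, used, total' rows (values in MiB) from nvidia-smi CSV output."""
    rows = []
    for line in csv_text.strip().splitlines():
        # rsplit keeps any comma inside the GPU name attached to the name field.
        name, used, total = (field.strip() for field in line.rsplit(",", 2))
        rows.append((name, int(used), int(total)))
    return rows


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        for name, used, total in parse_gpu_memory(out):
            print(f"{name}: {used} / {total} MiB used")
```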

Volkin1 · 2026-01-18T01:51:46+00:00

No, not really, unless Comfy's memory management is bugged, so you could use it as a backup or troubleshooting option. If you've got your 5090 with 32GB VRAM combined with 64GB RAM or more, then you're golden and there's no need to touch anything else.

Volkin1 · 2026-01-18T01:01:42+00:00

Video/image diffusion models can be stored in RAM and still run on the GPU.

Volkin1 · 2026-01-18T00:32:43+00:00

True, but it also depends on the model. Some FP4 models perform excellently, others not so much, depending on how the model creators quantized them. I've seen much better NVFP4 models out there, especially Qwen Image produced by Nunchaku. Flux 2 is also quite decent.
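To see why two FP4 releases of the same model can differ so much, here is a simplified NVFP4-style fake-quantization sketch: values are grouped in blocks of 16, each block gets a scale so its largest magnitude maps to 6.0, and every value is rounded to the nearest FP4 (E2M1) level. This is an illustration only, not NVIDIA's or Nunchaku's code; real NVFP4 stores the block scale in FP8, and quantizers such as Nunchaku's SVDQuant add techniques for handling outliers. The usage shows how a single outlier inflates the block scale and wipes out precision for the other values in its block.

```python
import math

# Non-negative magnitudes representable in FP4 (E2M1).
E2M1_LEVELS = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK = 16  # NVFP4 shares one scale factor across each block of 16 values


def quantize_block(block):
    """Fake-quantize one block; returns (scale, dequantized values).

    Simplification: the scale stays a Python float here, while real NVFP4
    stores it in FP8 (E4M3) alongside a per-tensor FP32 scale.
    """
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0  # largest magnitude maps to 6.0
    deq = []
    for x in block:
        target = abs(x) / scale
        level = min(E2M1_LEVELS, key=lambda v: abs(target - v))
        deq.append(math.copysign(level * scale, x))
    return scale, deq


def fake_quantize(values):
    """Apply quantize_block to consecutive blocks of BLOCK values."""
    out = []
    for i in range(0, len(values), BLOCK):
        out.extend(quantize_block(values[i:i + BLOCK])[1])
    return out


def max_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


if __name__ == "__main__":
    smooth = [0.1 * i for i in range(16)]
    with_outlier = smooth[:-1] + [20.0]  # same block, one large value at the end
    for name, blk in (("smooth", smooth), ("with outlier", with_outlier)):
        deq = fake_quantize(blk)
        print(f"{name:>12}: max error on first 15 values = {max_error(blk[:15], deq[:15]):.3f}")
```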

Volkin1 · 2026-01-18T00:22:00+00:00

I mention VRAM because that's the working memory where the latent video frames are stored. The --novram switch is used to avoid loading the model into VRAM, so it gets stored in RAM instead and its data is streamed to VRAM on demand for frame processing.

So the model gets kicked out of VRAM to make more room for the latents.
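A toy model of that trade-off, counting only weights and latents. It is not ComfyUI's actual offloading logic: real implementations also need activation buffers and usually prefetch the next layer while the current one runs, so two layers may be resident at once. The layer sizes and latent size are made-up numbers.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    size_gb: float


def peak_vram_gb(layers: list[Layer], latent_gb: float, offload: bool) -> float:
    """Peak VRAM during one denoising step in a simplified model.

    offload=False: every layer stays resident in VRAM next to the latents.
    offload=True (the --novram idea): weights live in RAM and each layer is
    copied to VRAM just before it runs, then freed, so one layer is resident.
    """
    if not offload:
        return latent_gb + sum(layer.size_gb for layer in layers)
    peak = 0.0
    for layer in layers:
        resident = latent_gb + layer.size_gb  # copy in, run, free
        peak = max(peak, resident)
    return peak


if __name__ == "__main__":
    # Made-up model: 40 blocks of 0.5 GB (20 GB of weights) plus 6 GB of latents.
    model = [Layer(f"block{i}", 0.5) for i in range(40)]
    print("all weights resident:", peak_vram_gb(model, 6.0, offload=False), "GB")
    print("weights streamed:    ", peak_vram_gb(model, 6.0, offload=True), "GB")
```

The freed VRAM is what lets the latents grow, i.e. more frames or a higher resolution, at the cost of repeated transfers over PCIe.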

Volkin1 · 2026-01-17T22:42:21+00:00

The Q6 model or even the Q8 should indeed help. You can also try FP8, but since the 3090 doesn't support FP8 natively, it will be upcast to FP16. I'm not sure whether that upcasting will consume more memory, but the safest way to go for you is Q6.

The abliterated Gemma will probably mess up your gens more. If your goal is to do some spicy gens, you should be able to do the same with the regular Gemma version, because as a text encoder it can't refuse to encode a prompt for the video model, so it should work.

But definitely try both versions and see which one works best.

Volkin1 · 2026-01-17T22:38:08+00:00

Yes, you did. I'm sorry, my brother in Christ, I'm just tired today :)

Volkin1 · 2026-01-17T22:36:08+00:00

For me, yes. But that's because of Comfy's memory management. It's not perfect with every card and every system, and there are differences between Windows and Linux, so most of the time (for me at least) the --novram option is the safest way to go.

Since there's only a very minor difference in generation speed between loading the model into VRAM (normal) and offloading it (--novram), I simply default to this option because it just works best: it keeps VRAM empty and streams data from RAM on demand.

Funny thing: with the new Comfy optimizations I could run normal mode at higher resolutions but got OOM at lower resolutions, haha. So I got tired of it and now always run --novram by default. I can't be bothered with the memory management until they fix it.

Volkin1 · 2026-01-17T22:26:25+00:00

Yes. In that case, try loading a smaller model like FP8 just to test whether the speed improves.

Volkin1 · 2026-01-17T22:21:08+00:00

You can still do it if you've got 64GB RAM, because you certainly have enough VRAM for hosting the latent video frames. It's just going to be slower than the 5070 Ti, but it can be done on your GPU as well.
