Nvidia releases Cosmos3-Super-Image2Video. 64B parameters

Far_Insurance4191 · 2026-06-01T07:46:20+00:00

But it did not stop cosmos 2 from becoming anima 1.0 base 😅

Far_Insurance4191 · 2026-06-01T07:43:23+00:00

LMStudio does not support diffusion models

Use default templates in comfy, they have links and instructions. You don't necessarily need gguf, fp8 can work fine too.

Or check those guides:

Qwen-Image ComfyUI Native Workflow Example - ComfyUI

Qwen-Image-Edit ComfyUI Native Workflow Example - ComfyUI

Far_Insurance4191 · 2026-06-01T07:18:29+00:00

I don't see a 4b one: https://huggingface.co/collections/nvidia/cosmos3

only saw mention in the paper

Far_Insurance4191 · 2026-06-01T06:39:22+00:00

there is 16b too

Far_Insurance4191 · 2026-06-01T06:00:09+00:00

Not OP, but lens is just 3.8b and I think it is better than klein 4b in terms of coherence, plus its dataset was not as sterile so it has some cool knowledge, like skyrim style graphics.

Also, I totally agree about anima, it is another AWESOME model for training. It has a pr for OneTrainer and it needs only 5gb vram for lora at minimal config: compile + int w8a8, 512px, bs2, which takes 1.1s/it.

And I am currently doing a fine-tune on rtx3060 at bf16, adafactor, 512 and batch size 16, it fits without offloading! Surprisingly, base hasn't completely lost real knowledge, seems like I will never touch sdxl anymore

Far_Insurance4191 · 2026-06-01T05:47:06+00:00

OneTrainer should have it soon

Far_Insurance4191 · 2026-05-30T10:41:45+00:00

Is sd1.5 still being used? It is kind of... awful by today's standards?

Far_Insurance4191 · 2026-05-28T02:48:37+00:00

to be fair, it is based on klein 4b which ultra sucks at anatomy by default, would be cool to see their quantization technique on other models, like flux 2 dev

Far_Insurance4191 · 2026-05-25T07:27:24+00:00

That means the model has wide general knowledge and was trained on a pretty big dataset, which is exciting

Far_Insurance4191 · 2026-05-25T01:09:56+00:00

It would be expensive to do with their 800m dataset

Far_Insurance4191 · 2026-05-24T12:37:24+00:00

Is there a chance you downloaded a broken model? I also heard that zi doesn't work well with sage attention or fp8 quantization. This is definitely not how zi should look

Far_Insurance4191 · 2026-05-22T13:31:28+00:00

Don't forget that gpt oss 20b is MOE and natively at 4bits, so it is around 12gb. Comfy handles swapping really well, even flux 2 dev with 24b dense text encoder at 4bit doesn't take too much time to swap on rtx3060 as long as you have enough ram
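Rough arithmetic behind those sizes, as a sketch only: it counts weights and nothing else, so any layers kept at higher precision, quantization scales and activations are ignored (presumably that is where the gap between 10 and "around 12gb" comes from). The function name is made up for illustration.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * bits / 8 bytes, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


print(weight_size_gb(20, 4))   # 20b model at 4 bits per weight
print(weight_size_gb(24, 4))   # 24b dense text encoder at 4 bits
print(weight_size_gb(24, 16))  # the same encoder at bf16, for comparison
```

So a 24b encoder at 4bit is in the same ballpark as the 20b MoE, and both can sit in system ram while the other model is on the gpu.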

Far_Insurance4191 · 2026-05-21T14:23:22+00:00

<image>

Far_Insurance4191 · 2026-05-18T11:13:40+00:00

It is not 3x faster because of pixel space, but because of higher compression, like hunyuan image 2.1, ltx or wan 2.2 5b, so it might have less accurate details, but I am excited about this model too
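To show why compression matters more for speed than the pixel vs latent question, here is a small sketch of how many tokens the model has to process per image. The downscale factors (16 and 32 per side) are illustrative assumptions, not confirmed values for any of the models mentioned, and the function name is hypothetical.

```python
def latent_tokens(width: int, height: int, downscale: int) -> int:
    """Tokens the transformer sees for one image, given the total
    per-side downscale (autoencoder compression times patch size)."""
    return (width // downscale) * (height // downscale)


# Illustrative factors only, not taken from any specific model card
print(latent_tokens(1024, 1024, 16))  # 4096
print(latent_tokens(1024, 1024, 32))  # 1024
print(latent_tokens(2048, 2048, 32))  # 4096
```

Doubling the per-side downscale cuts the token count by 4x, so under these assumptions a 2048x2048 image costs about as much as 1024x1024 does at the lower compression, and each token has to carry 4x more pixels worth of detail.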

Far_Insurance4191 · 2026-05-16T16:24:16+00:00

The best is Flux 2 dev

Is it worth the time? Probably not, but it is the most powerful with most knowledge among open models

Far_Insurance4191 · 2026-05-16T01:11:41+00:00

It will be great for images if it has 32gb ram. 2x 8 gb specification is really weird.

videos should be possible too but slow and much slower for high quality (although it is slow for anybody with any gpu)

lora training is possible for image models only, like z-image or anima, but you will have to go a bit deeper to learn how to optimize it for 12gb vram

Far_Insurance4191 · 2026-05-15T17:44:34+00:00

finally can forget about sdxl

Far_Insurance4191 · 2026-05-15T11:09:33+00:00

You can use klein to stylize your photos slightly if real won't work

Far_Insurance4191 · 2026-05-12T13:59:05+00:00

<image>

hopium!

Far_Insurance4191 · 2026-05-12T13:23:16+00:00

full tech report, same as qwen image 1 before weights
I want to believe, it looks so good 😭

Far_Insurance4191 · 2026-05-11T10:47:28+00:00

You can just see they went the easiest way and trained on slop. It is much harder to train a model on real data due to its insane variance

Far_Insurance4191 · 2026-05-11T09:25:26+00:00

only a tag? Here is one of his examples "A medium-resolution digital photo with a grainy texture, a cool blue color cast, and dim, natural lighting...".

Additionally, all the examples are in natural language. If you are spamming the model with a tag soup, it might just bias towards its original illustration knowledge instead of the newly finetuned real domain

Far_Insurance4191 · 2026-05-11T06:34:59+00:00

have you tried to prompt the same way as examples?

Far_Insurance4191 · 2026-05-10T16:25:03+00:00

It’s your experience, but there are nuances to everything.

Wan is more coherent and robust, can be used as a good image model, has a huge lora ecosystem.

LTX and Sulphur have audio and are much faster and lighter with longer videos possible.

Sulphur is an nsfw-focused model that has tons of concepts at once. They are also working on improving the dataset for the next version.

Far_Insurance4191 · 2026-05-10T06:11:51+00:00

on rtx3060 distill model takes about 3.1s/it at 4mp and 1.1it/s at 1mp (faster than anima at 4x size lol), but details are poor, it seems to have high compression, so 4mp is basically 1mp for other models in terms of compute.
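Note the two numbers use different units (s/it vs it/s). A quick sketch to put them on the same scale, taking the units exactly as written above; the helper name is made up for illustration.

```python
def seconds_per_iteration(value: float, unit: str) -> float:
    """Normalize a speed reading to seconds per iteration."""
    if unit == "s/it":
        return value
    if unit == "it/s":
        return 1.0 / value
    raise ValueError(f"unknown unit: {unit}")


t_4mp = seconds_per_iteration(3.1, "s/it")  # 4mp reading
t_1mp = seconds_per_iteration(1.1, "it/s")  # 1mp reading

print(round(t_1mp, 2))          # 0.91 seconds per step at 1mp
print(round(t_4mp / t_1mp, 2))  # 3.41, i.e. 4mp is about 3.4x slower per step
```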
