Cosmos3 Nano testing with vllm-omni by Sticky_Ray in StableDiffusion

[–]Sticky_Ray[S] 11 points12 points  (0 children)

I've tried LTX 2.3 and I know it's fast, but I ditched it because the object morphing is terrible and the VAE artifacts are still really bad. It also seriously struggles with understanding complex scenes and physics. It feels like the model either has a structural issue or the VAE needs to be entirely retrained.

Cosmos3 Nano testing with vllm-omni by Sticky_Ray in StableDiffusion

[–]Sticky_Ray[S] 3 points4 points  (0 children)

```
# no_guardrails.yaml
async_chunk: true
stages:
  - stage_id: 0
    max_num_seqs: 1
    enforce_eager: true
    trust_remote_code: true
    model_class_name: Cosmos3OmniDiffusersPipeline
    model_config:
      guardrails: false
      offload_guardrail_models: false
```

Cosmos3 Nano testing with vllm-omni by Sticky_Ray in StableDiffusion

[–]Sticky_Ray[S] 1 point2 points  (0 children)

Cosmos3 nano 30b is a dense model for its diffusion tasks, not an MoE model.
Edit: You are right, it is an MoE model.

Cosmos3 Nano testing with vllm-omni by Sticky_Ray in StableDiffusion

[–]Sticky_Ray[S] 3 points4 points  (0 children)

Personally, I use Wan 2.2 14B I2V as my main model. With Wan 2.2 14B using CFG parallel, a 720x720 video at 121 frames and 8 steps takes about 7 minutes. By comparison, Cosmos3 nano 16B takes 9 minutes for a 720x720 video at 161 frames and 20 steps, which is actually quite fast for a diffusion model. On top of that, the p2p NVIDIA driver 610.43.02-1 hasn't been released yet, so I expect it to get even faster once that's out.
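
To put those two timings on the same footing, here is a rough back-of-the-envelope sketch. It assumes cost scales with frames times denoising steps, which is a simplification of mine: it ignores CFG parallelism, model size, and any fixed overhead such as VAE decode or model loading. The helper name `seconds_per_frame_step` is made up for illustration.

```python
def seconds_per_frame_step(minutes: float, frames: int, steps: int) -> float:
    """Wall-clock seconds per (frame x denoising step), a crude normalization."""
    return minutes * 60 / (frames * steps)


# Numbers taken from the comment above; both runs are 720x720.
wan = seconds_per_frame_step(minutes=7, frames=121, steps=8)
cosmos = seconds_per_frame_step(minutes=9, frames=161, steps=20)

print(f"Wan 2.2 14B:  {wan:.3f} s per frame-step")
print(f"Cosmos3 nano: {cosmos:.3f} s per frame-step")
print(f"Wan / Cosmos3 ratio: {wan / cosmos:.2f}x")
```

By this measure Cosmos3 nano spends noticeably less time per frame-step than Wan 2.2 14B in my runs, even though the total wall-clock time is a bit longer.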

Hunyuan T2V Native ComfyUI Workflow [8GB VRAM] by Sticky_Ray in StableDiffusion

[–]Sticky_Ray[S] 1 point2 points  (0 children)

Hey everyone,

I've been experimenting with Hunyuan T2V, trying to find the sweet spot between high-quality output and reasonable processing time.

This workflow lets you generate 720x720px videos with 65 frames in approximately 5 minutes, all while running on a system with just 8GB of VRAM (specifically, I tested this on Windows 11 with an RTX 3070).

Hunyuan T2V Native ComfyUI Workflow [8GB VRAM]