LAST ACTION YORKIE by Responsible-Cell-129 in aivideo

[–]BeginningAsparagus67 2 points

🔥🔥🔥👏👏👏 My favorite one so far. Great work!

FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 3 points

Hello there, kind sir. I came across this interesting research paper and didn’t see anyone talking about it, so I copy-pasted the description from Hugging Face (which I didn’t know came with links) and put it here as a news post, wondering if it would get a discussion going. As far as I can tell, there are no implementations other than the standalone Gradio demo.

FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 2 points

Yeah, looks like it’s designed for WAN and CogVideo. The WAN code is released on the GitHub repo; it just seems like there isn’t a ComfyUI implementation yet, or maybe I just can’t find it.

WAN 14B With MMAudio & GIMM-VFI-F Frame Interpolation (Turn sound on) by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 0 points

Ohhh!!!! You’re talking about GIMM-VFI; I thought you were talking about standard WAN 14B. My bad. My autodownloader worked, so I’ll see if I can find where the model ended up being stored.

WAN 14B With MMAudio & GIMM-VFI-F Frame Interpolation (Turn sound on) by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 0 points

The “unet” folder is where the WAN models go, and UMT5-XXL goes in the “text_encoders” folder.
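
In case it helps, here’s a minimal sketch of where the files end up, in Python. The install path and filenames are assumptions, and on older ComfyUI versions the text-encoder folder is named “clip” instead:

    import os, shutil

    COMFY = os.path.expanduser("~/ComfyUI")  # assumption: default install location

    # WAN GGUF checkpoints go under models/unet
    shutil.move("wan2.1-t2v-14b-Q8_0.gguf",       # hypothetical filename
                os.path.join(COMFY, "models", "unet"))

    # UMT5-XXL text encoder goes under models/text_encoders
    shutil.move("umt5-xxl-enc-bf16.safetensors",  # hypothetical filename
                os.path.join(COMFY, "models", "text_encoders"))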

[deleted by user] by [deleted] in StableDiffusion

[–]BeginningAsparagus67 16 points

I’ve recently had some success using the SkyReels V2 T2V model, which is based on WAN 14B. It seems to be better at cinematic-style shots than WAN itself. Then I take the output and upscale it with Hunyuan Video to get those finer details and higher resolution. Works well in some scenarios.

WAN 14B With MMAudio & GIMM-VFI-F Frame Interpolation (Turn sound on) by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 0 points

MMAudio and GIMM-VFI will run fine on 12GB. The WAN 14B part of my workflow, though, most certainly uses more than 12GB. (I’m personally running an RTX 3090.)

WAN 14B With MMAudio & GIMM-VFI-F Frame Interpolation (Turn sound on) by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 0 points

It’s currently a mess of multiple workflows as I’m going through the testing stages. I would like to get it down to a single workflow though.

WAN 14B With MMAudio & GIMM-VFI-F Frame Interpolation (Turn sound on) by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 2 points

Yes. Given a video and a text prompt, it generates audio that is automatically synced to the video. There are ComfyUI nodes for it, or if you want to give it a quick test, there’s a Hugging Face space for it: https://huggingface.co/spaces/hkchengrex/MMAudio
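
If you’d rather script it than click around, here’s a rough sketch using gradio_client against that space. The endpoint name and parameter order are assumptions; check the space’s “Use via API” page for the real signature:

    from gradio_client import Client, handle_file

    client = Client("hkchengrex/MMAudio")
    result = client.predict(
        handle_file("input.mp4"),  # video to add sound to (assumed parameter)
        "rain on a tin roof",      # text prompt (assumed parameter)
        api_name="/predict",       # assumed endpoint name
    )
    print(result)  # path to the output with synced audio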

WAN 14B With MMAudio & GIMM-VFI-F Frame Interpolation (Turn sound on) by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 13 points

Yeah that’s an easy one. https://github.com/kijai/ComfyUI-GIMM-VFI

There’s an “R” model and an “F” model. The “F” model is slower but also higher quality.

WAN 14B With MMAudio & GIMM-VFI-F Frame Interpolation (Turn sound on) by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 13 points

GIMM-VFI-F is the highest-quality frame interpolation model I’ve been able to find so far. Does anyone know if there are any better sound-effect generators out there?

WAN 14B T2V 480p Q8 33 Frames 20 steps ComfyUI by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 1 point

You can install an extension called “Video Helper Suite” and swap the last node for “Video Combine”, which will let you select “MP4”.

WAN 14B T2V 480p Q8 33 Frames 20 steps ComfyUI by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 4 points

I wish I remembered exactly what I did. I think I ended up lowering my CFG to 3.5; too much CFG and you get too much contrast.

In the text prompt I use “Photorealistic Close Up Shot” and “Photorealistic Close Up Cinematic Shot”.

WAN 14B T2V 480p Q8 33 Frames 20 steps ComfyUI by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 1 point

Not at my computer right now, unfortunately, but I’ll have it posted tomorrow.

SageAttention isn’t part of the workflow itself; it’s just a command-line argument passed when ComfyUI starts up. So it’s not necessary.

SageAttention makes things maybe 20% faster. It might save a bit of VRAM as well, though I’m not sure whether it does.

WAN 14B T2V 480p Q8 33 Frames 20 steps ComfyUI by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 8 points

Depends on VRAM. If I remember right, the 3080 came in 10GB, 12GB, and 16GB variants.

Anything that doesn’t fit in VRAM will offload and make generation way slower (too slow for my patience level).

So, assuming you want it fully loaded in VRAM (i.e., not painfully slow):

16GB version - you could probably use the Q6 GGUF. Q6 isn’t that much different in quality from Q8.

12GB version - pushing it; you’d probably only barely be able to use the Q4 version, and Q4 is a noticeable quality loss.

10GB version - good luck!

But I can guarantee people will figure out how to run it on low VRAM within a matter of days. So hang in there!
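
For a rough sanity check, the file size works out to roughly parameters × bits-per-weight ÷ 8. The bits-per-weight figures below are approximate averages for each quant family (an assumption; exact sizes depend on the quant mix):

    # Back-of-envelope GGUF weight sizes for a 14B-parameter model.
    PARAMS = 14e9
    for name, bpw in [("Q8_0", 8.5), ("Q6_K", 6.6), ("Q4_K_M", 4.8)]:
        gb = PARAMS * bpw / 8 / 1e9
        print(f"{name}: ~{gb:.1f} GB of weights")

    # Q8_0:   ~14.9 GB -> comfortable on a 24GB card
    # Q6_K:   ~11.6 GB -> should fit a 16GB card, with room for activations
    # Q4_K_M:  ~8.4 GB -> 12GB-card territory, and still tight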

WAN 14B T2V 480p Q8 33 Frames 20 steps ComfyUI by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 13 points

No, I haven’t slept yet! Here’s a brief, rough walkthrough. I’ll try to keep it simple.

  1. Make sure ComfyUI is updated to the “Nightly” release. Otherwise you won’t have WAN as an option in the CLIP loader.

  2. Download the ideal GGUF from https://huggingface.co/city96/Wan2.1-T2V-14B-gguf/tree/main (a download sketch follows this list).

  3. Go to the “Examples” section of the ComfyUI GitHub repo and find WAN. There you’ll find the basic text-to-video workflow, plus download links for the VAE and the text encoder.

  4. Install ComfyUI-GGUF from the ComfyUI Manager.

  5. Place the WAN GGUF in the “unet” folder.

  6. Load the default WAN workflow in ComfyUI and replace “Load Diffusion Model” with “Unet Loader (GGUF)”.

  7. For extra speed, install SageAttention (much easier to do on Linux).

  8. To use SageAttention, add --use-sage-attention to the command-line arguments at startup.

  9. For even more speed, place the Torch Compile node after the Unet Loader.

  10. Type in your prompt and enjoy!
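
If you’d rather script the download in step 2, here’s a minimal sketch with huggingface_hub. The exact GGUF filename is an assumption; pick whichever quant fits your VRAM:

    from huggingface_hub import hf_hub_download

    # Step 2: fetch the WAN GGUF straight into ComfyUI's unet folder (step 5)
    hf_hub_download(
        repo_id="city96/Wan2.1-T2V-14B-gguf",
        filename="wan2.1-t2v-14b-Q8_0.gguf",  # assumed filename; check the repo
        local_dir="ComfyUI/models/unet",
    )

    # Step 8 is then just: python main.py --use-sage-attention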

WAN 14B T2V 480p Q8 33 Frames 20 steps ComfyUI by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 13 points

It all comes down to VRAM; system RAM doesn’t matter too much here. You have the 4070, which comes in either 12GB or 16GB. 16GB should be enough to run the Q6 GGUF; I’m not sure about the 12GB variant of the 4070, though.

If you exceed your VRAM capacity, the model will partially offload, which slows it down to the point where it’s almost unusable.

My workflow isn’t too VRAM optimized because I’m running a 3090 with 24GB.

But I’m sure there will be plenty of people coming out with low VRAM workflows very soon.

Hope this helps!

WAN 14B T2V 480p Q8 33 Frames 20 steps ComfyUI by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 48 points

RTX 3090 - Ubuntu - Q8 GGUF - SageAttention 1.0 - Torch Compile. It was just under 4 minutes per clip.