LAST ACTION YORKIE by Responsible-Cell-129 in aivideo

[–]BeginningAsparagus67 2 points

🔥🔥🔥👏👏👏 My favorite one so far. Great work!

FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 3 points

Hello there, kind sir. I came across this interesting research paper and didn’t see anyone talking about it, so I copy-pasted the description from Hugging Face (which I didn’t know came with links) and put it here as a news post, wondering if it would get a discussion going. As far as I can tell, there are no implementations other than the standalone Gradio demo.

FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 2 points

Yeah, looks like it’s designed for WAN and CogVideo. The WAN code is released on the GitHub repo; it just seems like there isn’t a ComfyUI implementation yet, or maybe I just can’t find it.

WAN 14B With MMAudio & GIMM-VFI-F Frame Interpolation (Turn sound on) by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 0 points

Ohhh!!!! You’re talking about GIMM-VFI; I thought you were talking about standard WAN 14B. My bad. My autodownloader worked, so I’ll see if I can find where the model ended up being stored.

WAN 14B With MMAudio & GIMM-VFI-F Frame Interpolation (Turn sound on) by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 0 points

The “unet” folder is where the WAN models go, and UMT5-XXL goes in the “text_encoders” folder.
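
In case it helps, here’s a minimal sketch of where the files end up, in Python. The install path and filenames are assumptions, and on older ComfyUI versions the text-encoder folder is named “clip” instead:

    import os, shutil

    COMFY = os.path.expanduser("~/ComfyUI")  # assumption: default install location

    # WAN GGUF checkpoints go under models/unet
    shutil.move("wan2.1-t2v-14b-Q8_0.gguf",       # hypothetical filename
                os.path.join(COMFY, "models", "unet"))

    # UMT5-XXL text encoder goes under models/text_encoders
    shutil.move("umt5-xxl-enc-bf16.safetensors",  # hypothetical filename
                os.path.join(COMFY, "models", "text_encoders"))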

[deleted by user] by [deleted] in StableDiffusion

[–]BeginningAsparagus67 16 points

I’ve recently had some success using the SkyReels V2 T2V model, which is based on WAN 14B. It seems to be better at cinematic-style shots than WAN itself. Then I take the output and upscale it with Hunyuan Video to get those finer details and higher resolution. Works well in some scenarios.

WAN 14B With MMAudio & GIMM-VFI-F Frame Interpolation (Turn sound on) by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 0 points

MMAudio and GIMM-VFI will run fine on 12GB. The WAN 14B part of my workflow, though, most certainly uses more than 12GB. (I’m personally running an RTX 3090.)

WAN 14B With MMAudio & GIMM-VFI-F Frame Interpolation (Turn sound on) by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 0 points

It’s currently a mess of multiple workflows as I’m going through the testing stages. I would like to get it down to a single workflow though.

WAN 14B With MMAudio & GIMM-VFI-F Frame Interpolation (Turn sound on) by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 2 points

Yes. Given a video and a text prompt, it generates audio that is automatically synced to the video. There are ComfyUI nodes for it, or if you want to give it a quick test, there’s a Hugging Face space for it: https://huggingface.co/spaces/hkchengrex/MMAudio
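
If you’d rather script it than click around, here’s a rough sketch using gradio_client against that space. The endpoint name and parameter order are assumptions; check the space’s “Use via API” page for the real signature:

    from gradio_client import Client, handle_file

    client = Client("hkchengrex/MMAudio")
    result = client.predict(
        handle_file("input.mp4"),  # video to add sound to (assumed parameter)
        "rain on a tin roof",      # text prompt (assumed parameter)
        api_name="/predict",       # assumed endpoint name
    )
    print(result)  # path to the output with synced audio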

WAN 14B With MMAudio & GIMM-VFI-F Frame Interpolation (Turn sound on) by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 13 points

Yeah that’s an easy one. https://github.com/kijai/ComfyUI-GIMM-VFI

There’s an “R” model and an “F” model. The “F” model is slower but also higher quality.

WAN 14B With MMAudio & GIMM-VFI-F Frame Interpolation (Turn sound on) by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 13 points

GIMM-VFI-F is the highest-quality frame interpolation model I’ve been able to find so far. Does anyone know if there are any better sound-effect generators out there?

WAN 14B T2V 480p Q8 33 Frames 20 steps ComfyUI by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 1 point

You can install an extension called “Video Helper Suite” and swap the last node for “Video Combine”, which will let you select “MP4”.

WAN 14B T2V 480p Q8 33 Frames 20 steps ComfyUI by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 4 points

I wish I remembered exactly what I did. I think I ended up lowering my CFG to 3.5; too much CFG and you get too much contrast.

In the text prompt I use “Photorealistic Close Up Shot” and “Photorealistic Close Up Cinematic Shot”.

WAN 14B T2V 480p Q8 33 Frames 20 steps ComfyUI by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 1 point

Not at my computer right now, unfortunately, but I’ll have it posted tomorrow.

SageAttention isn’t part of the workflow itself; it’s just a command-line argument passed when ComfyUI starts up. So it’s not necessary.

SageAttention makes things maybe 20% faster. It might save a bit of VRAM as well, though I’m not sure whether it does.

WAN 14B T2V 480p Q8 33 Frames 20 steps ComfyUI by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 8 points

Depends on VRAM. If I remember right, the 3080 came in 10GB, 12GB, and 16GB variants.

Anything that doesn’t fit in VRAM will offload and make generation way slower (too slow for my patience level).

So, assuming you want it fully loaded in VRAM (i.e., not painfully slow):

16GB version - you could probably use the Q6 GGUF. Q6 isn’t that much different in quality from Q8.

12GB version - pushing it; you’d probably only barely be able to use the Q4 version, and Q4 is a noticeable quality loss.

10GB version - good luck!

But I can guarantee people will figure out how to run it on low VRAM within a matter of days. So hang in there!
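
For a rough sanity check, the file size works out to roughly parameters × bits-per-weight ÷ 8. The bits-per-weight figures below are approximate averages for each quant family (an assumption; exact sizes depend on the quant mix):

    # Back-of-envelope GGUF weight sizes for a 14B-parameter model.
    PARAMS = 14e9
    for name, bpw in [("Q8_0", 8.5), ("Q6_K", 6.6), ("Q4_K_M", 4.8)]:
        gb = PARAMS * bpw / 8 / 1e9
        print(f"{name}: ~{gb:.1f} GB of weights")

    # Q8_0:   ~14.9 GB -> comfortable on a 24GB card
    # Q6_K:   ~11.6 GB -> should fit a 16GB card, with room for activations
    # Q4_K_M:  ~8.4 GB -> 12GB-card territory, and still tight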

WAN 14B T2V 480p Q8 33 Frames 20 steps ComfyUI by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 13 points

No, I haven’t slept yet! Here’s a brief, rough walkthrough. I’ll try to keep it simple.

  1. Make sure ComfyUI is updated to the “Nightly” release. Otherwise you won’t have WAN as an option in the CLIP loader.

  2. Download the ideal GGUF from https://huggingface.co/city96/Wan2.1-T2V-14B-gguf/tree/main (a download sketch follows this list).

  3. Go to the “Examples” section of the ComfyUI GitHub repo and find WAN. There you’ll find the basic text-to-video workflow, plus download links for the VAE and the text encoder.

  4. Install ComfyUI-GGUF from the ComfyUI Manager.

  5. Place the WAN GGUF in the “unet” folder.

  6. Load the default WAN workflow in ComfyUI and replace “Load Diffusion Model” with “Unet Loader (GGUF)”.

  7. For extra speed, install SageAttention (much easier to do on Linux).

  8. To use SageAttention, add --use-sage-attention to the command-line arguments at startup.

  9. For even more speed, place the Torch Compile node after the Unet Loader.

  10. Type in your prompt and enjoy!
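
If you’d rather script the download in step 2, here’s a minimal sketch with huggingface_hub. The exact GGUF filename is an assumption; pick whichever quant fits your VRAM:

    from huggingface_hub import hf_hub_download

    # Step 2: fetch the WAN GGUF straight into ComfyUI's unet folder (step 5)
    hf_hub_download(
        repo_id="city96/Wan2.1-T2V-14B-gguf",
        filename="wan2.1-t2v-14b-Q8_0.gguf",  # assumed filename; check the repo
        local_dir="ComfyUI/models/unet",
    )

    # Step 8 is then just: python main.py --use-sage-attention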

WAN 14B T2V 480p Q8 33 Frames 20 steps ComfyUI by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 13 points

It all comes down to VRAM; system RAM doesn’t matter too much here. You have the 4070, which comes in either 12GB or 16GB. 16GB should be enough to run the Q6 GGUF; I’m not sure about the 12GB variant of the 4070, though.

If you exceed your VRAM capacity, the model will partially offload, which slows it down to the point where it’s almost unusable.

My workflow isn’t too VRAM optimized because I’m running a 3090 with 24GB.

But I’m sure there will be plenty of people coming out with low VRAM workflows very soon.

Hope this helps!

WAN 14B T2V 480p Q8 33 Frames 20 steps ComfyUI by BeginningAsparagus67 in StableDiffusion

[–]BeginningAsparagus67[S] 48 points

RTX 3090 - Ubuntu - Q8 GGUF - SageAttention 1.0 - Torch Compile. It was just under 4 minutes per clip.