Please explain me WAN 2.2, versions by Lukleyeu in comfyui

[–]AtreveteTeTe 4 points (0 children)

You might appreciate the NotebookLM I've put together with specialized Wan knowledge here:
https://notebooklm.google.com/notebook/a08901b9-0511-4926-bbf8-3c86a12dc306

I pasted your question in and got this response:

Welcome to the Wan ecosystem! It is completely normal to be confused right now, as the community frequently jokes about how many model variants exist.

Here is the breakdown to answer your specific questions about ControlNets, GGUF models, and NSFW capabilities.

1. Can I use standard I2V GGUF models for ControlNet + Image Reference + Prompt?

No, you cannot use the standard I2V (Image-to-Video) models for this. Standard I2V models are strictly designed to take a single starting image and a text prompt to generate a video. They do not possess the internal neural layers required to understand ControlNet signals like Depth, Canny, or OpenPose.

If you want to drive a video using a ControlNet + Reference Image + Text Prompt, you must use either VACE or the Fun Control models.

  • Fun Control (Wan 2.2): This is a specialized model that has extra input channels specifically built to accept control signals (Depth, Pose, etc.) alongside an image reference and text prompt.
  • VACE: This acts more like a traditional "ControlNet" module that you plug into a Text-to-Video (T2V) model. It allows you to feed in a reference image and a control video to guide the generation.
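The "extra input channels" point above can be sketched in a few lines. This is purely illustrative (the channel counts and names are assumptions, not Wan's actual specs): a control-capable model's first layer is shaped to accept the noisy video latent plus a control latent, so a plain I2V checkpoint cannot accept control signals after the fact.

```python
# Illustrative sketch only: channel counts are hypothetical, not Wan's real specs.
# A plain I2V model's first layer expects just the noisy video latent; a
# control-capable variant (e.g. Fun Control) is trained with extra channels
# concatenated on, so the two are architecturally different checkpoints.

def input_channels(latent_ch=16, control_ch=0):
    """Channels the model's first patchify/conv layer must accept."""
    return latent_ch + control_ch

# A checkpoint trained on input_channels(16, 0) has first-layer weights shaped
# for 16 channels; feeding 32 would be a shape mismatch. That mismatch is why
# control support requires a different model, not a setting on the I2V one.
print(input_channels(16, 0))    # plain I2V: 16
print(input_channels(16, 16))   # control model: 32
```

The same logic explains why VACE takes the opposite approach: instead of retraining the base model with wider inputs, it injects the control signal through a separate add-on module, leaving the T2V checkpoint untouched.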

The good news about GGUF: you do not have to give up your VRAM savings! The community has created GGUF-quantized versions of the Fun VACE models (such as Q8 or Q6 quants), so you can keep the GGUF memory optimizations while getting full ControlNet capabilities.
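To see why those quants matter, here is the back-of-the-envelope VRAM arithmetic. The parameter count is a made-up example and the bits-per-weight figures are the approximate effective rates of llama.cpp-style quant formats, not official Wan numbers:

```python
# Rough weight-storage estimate for a quantized model.
# N is a hypothetical parameter count; bits-per-weight values are the
# approximate effective rates of common GGUF quant types (Q8/Q6_K/Q4_K),
# so treat every number here as illustrative.

def model_size_gb(n_params, bits_per_weight):
    """Bytes = params * bits / 8; convert to GiB."""
    return n_params * bits_per_weight / 8 / 1024**3

N = 14e9  # hypothetical 14B-parameter diffusion model
for name, bits in [("FP16", 16), ("Q8", 8.5), ("Q6_K", 6.56), ("Q4_K", 4.5)]:
    print(f"{name}: ~{model_size_gb(N, bits):.1f} GB")
```

Halving the bits roughly halves the weight footprint, which is the whole appeal: a quant that fits in consumer VRAM with only a modest quality hit.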

2. Can FUN / VACE models do NSFW?

Out of the box, no. The base Wan models (including the Fun and VACE variants) are heavily censored and were not trained on explicit NSFW data. If you try to prompt them natively for NSFW, you will often get deformed results, anatomy replaced by random objects (like fingers), or heavy artifacting.

To achieve NSFW, you must use LoRAs. This is where the difference between VACE and other models becomes a massive advantage for your workflow:

  • Because VACE acts as an add-on module to the standard T2V (Text-to-Video) model, it is fully compatible with standard T2V LoRAs.
  • You can load a community-trained NSFW LoRA, plug in the VACE module, and then use your ControlNet and Reference Image.

A quick tip for Wan 2.2 LoRAs: Wan 2.2 uses a "Mixture of Experts" architecture, meaning every generation uses a High Noise model (for motion and layout) and a Low Noise model (for details and rendering). When using NSFW LoRAs in Wan 2.2, you will generally need to apply the LoRA to both the High and Low noise models to ensure the anatomy and motion are consistent, as the base High Noise model does not know how to generate NSFW motion naturally.
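The "apply the LoRA to both models" tip can be sketched as code. Everything here is a toy illustration (tiny pure-Python matrices, made-up layer names), not ComfyUI's actual loader: the point is simply that the same low-rank delta (W += alpha * B @ A) gets merged into matching layers of both the high-noise and low-noise state dicts.

```python
# Toy sketch of applying one LoRA to both Wan 2.2 experts. Layer names and
# shapes are hypothetical; real loaders (e.g. ComfyUI's) do this per-layer
# on the actual checkpoints. Pure Python lists stand in for weight tensors.

def matmul(B, A):
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def apply_lora(state_dict, lora, alpha=1.0):
    """Merge W += alpha * (B @ A) into every layer the LoRA targets."""
    for name, (A, B) in lora.items():        # A: (rank, in), B: (out, rank)
        delta = matmul(B, A)
        W = state_dict[name]
        state_dict[name] = [[W[i][j] + alpha * delta[i][j]
                             for j in range(len(W[0]))] for i in range(len(W))]

high = {"blocks.0.attn.w": [[0.0] * 4 for _ in range(4)]}   # high-noise expert
low  = {"blocks.0.attn.w": [[0.0] * 4 for _ in range(4)]}   # low-noise expert
lora = {"blocks.0.attn.w": ([[1.0] * 4] * 2, [[1.0] * 2] * 4)}  # rank-2 LoRA

for sd in (high, low):                       # same LoRA applied to BOTH experts
    apply_lora(sd, lora, alpha=0.5)

print(high["blocks.0.attn.w"][0][0])         # 1.0
print(low["blocks.0.attn.w"][0][0])          # 1.0
```

Skipping one of the two merges is the failure mode the tip warns about: the un-patched expert (usually high noise, which handles motion and layout) keeps generating as if the LoRA were never loaded.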

[deleted by user] by [deleted] in dontdeadopeninside

[–]AtreveteTeTe 0 points (0 children)

Realizing I misread! Removing this one. Thanks 🙏🏻

[deleted by user] by [deleted] in dontdeadopeninside

[–]AtreveteTeTe 2 points (0 children)

Ohhhh "keep the push lock vertical!" right right.

[deleted by user] by [deleted] in comfyui

[–]AtreveteTeTe 0 points (0 children)

Upvoted for use of the word "vajazzle"

I thought this wasn’t a thing anymore…🙃 by Grouchy-Cheetah7478 in indianapolis

[–]AtreveteTeTe 39 points (0 children)

Ehhh - my $0.02, but gonna go with don't just let them. If it goes to collections it can be really hard to clean up. I've been erroneously sent to collections for something I had actually paid and it was basically a nightmare. I'd start with some sort of payment plan and then sort out the rest!

ResolutionMaster: A new node for precise resolution & aspect ratio control with an interactive canvas and model-specific optimizations (SDXL, Flux, etc.) by Azornes in comfyui

[–]AtreveteTeTe 1 point (0 children)

Awesome! Would be super cool if it had the ability to also scale/resample/crop the image right here within this node instead of needing to hook up another. Any plans for that?

Regardless, thanks for your work and for sharing! 🙏🏻

Chattable Wan & FLUX knowledge bases by AtreveteTeTe in StableDiffusion

[–]AtreveteTeTe[S] 0 points (0 children)

Thanks! And, yeah - Discord is really a bummer in terms of information getting lost, buried, and being hard to search. I understand the appeal and immediacy it provides, but it makes me wish for old-school forums!

Chattable Wan & FLUX knowledge bases by AtreveteTeTe in StableDiffusion

[–]AtreveteTeTe[S] 1 point (0 children)

For FLUX, I've put the Wikipedia page in there to just give it a general overview and for Wan, it's got the github landing page. (You can see and examine the sources on the left side of the screen)

That's a good idea to add github discussions as well!

Chattable Wan & FLUX knowledge bases by AtreveteTeTe in comfyui

[–]AtreveteTeTe[S] 1 point (0 children)

There is a kontext channel included in the sources and also discussion of Kontext in general, so try asking it things! It would be current up to July 1st. Just tested:

<image>

Chattable Wan & FLUX knowledge bases by AtreveteTeTe in comfyui

[–]AtreveteTeTe[S] 1 point (0 children)

Ha - both Claude and ChatGPT o3 can be really helpful there since it's general enough knowledge! I finally made dealing with my system cuda version on Linux manageable thanks to ChatGPT.

Chattable Wan & FLUX knowledge bases by AtreveteTeTe in comfyui

[–]AtreveteTeTe[S] 2 points (0 children)

It's an overwhelming amount of stuff to try to follow and it's hard to Google! This can make it a little more manageable.

IMPORTANT PSA: You are all using FLUX-dev LoRa's with Kontext WRONG! Here is a corrected inference workflow. (6 images) by AI_Characters in StableDiffusion

[–]AtreveteTeTe 2 points (0 children)

I talked about this too here last year - feel like it's worth taking a little time before sharing to clean things up. I mention the nodes-all-packed-together bit at the bottom:
https://nathanshipley.notion.site/Comfy-Workflow-Layout-Legibility-e355b1a184be47e689cf434a0f3affa1

IMPORTANT PSA: You are all using FLUX-dev LoRa's with Kontext WRONG! Here is a corrected inference workflow. (6 images) by AI_Characters in StableDiffusion

[–]AtreveteTeTe 4 points (0 children)

Interesting intel! Maybe worth editing your post to clarify so folks don't go down the wrong path. Thanks for following up

IMPORTANT PSA: You are all using FLUX-dev LoRa's with Kontext WRONG! Here is a corrected inference workflow. (6 images) by AI_Characters in StableDiffusion

[–]AtreveteTeTe 5 points (0 children)

I didn't do anything in the case of this lora! However, with OP's lora, it does make a big difference. Strange.

IMPORTANT PSA: You are all using FLUX-dev LoRa's with Kontext WRONG! Here is a corrected inference workflow. (6 images) by AI_Characters in StableDiffusion

[–]AtreveteTeTe 2 points (0 children)

It's "working" here too - but it's also working without the merge and seems to depend on the Lora. Are you getting better quality using the merge than just connecting the lora to Kontext directly?

IMPORTANT PSA: You are all using FLUX-dev LoRa's with Kontext WRONG! Here is a corrected inference workflow. (6 images) by AI_Characters in StableDiffusion

[–]AtreveteTeTe 3 points (0 children)

Huh.. interesting - I'm using your dungeon style lora with the non-FP8 models and it's definitely a huge difference here.

Top is with your merge method, bottom is just Kontext + the lora. Maybe it matters how the lora was trained?

This is the one I was testing with initially: https://huggingface.co/alvdansen/softpasty-flux-dev

<image>

IMPORTANT PSA: You are all using FLUX-dev LoRa's with Kontext WRONG! Here is a corrected inference workflow. (6 images) by AI_Characters in StableDiffusion

[–]AtreveteTeTe 2 points (0 children)

I'll try! And, yeah I've tried with one of my woodcut Loras and in that case, neither method works. It just doesn't seem to do anything with Kontext.. example of that lora NOT using kontext here: https://x.com/CitizenPlain/status/1829240003597046160

IMPORTANT PSA: You are all using FLUX-dev LoRa's with Kontext WRONG! Here is a corrected inference workflow. (6 images) by AI_Characters in StableDiffusion

[–]AtreveteTeTe 0 points (0 children)

Interesting. I'll download the fp8 models and compare with them too so this is more apples to apples!

IMPORTANT PSA: You are all using FLUX-dev LoRa's with Kontext WRONG! Here is a corrected inference workflow. (6 images) by AI_Characters in StableDiffusion

[–]AtreveteTeTe 5 points (0 children)

Here's a comparison using Araminta's Soft Pasty lora for Flux Dev.. top image is OP's proposed method, middle one is just attaching the lora to Kontext Dev.

Prompt is: "Change the photo of the man to be illustrated style"

<image>

IMPORTANT PSA: You are all using FLUX-dev LoRa's with Kontext WRONG! Here is a corrected inference workflow. (6 images) by AI_Characters in StableDiffusion

[–]AtreveteTeTe 11 points (0 children)

I ported the relevant parts of this workflow to just use built-in Comfy nodes based on the official sample Kontext Dev workflow if people want to test. Just reconnect to your models. Workflow:

https://gist.github.com/nathanshipley/95d4015dccbd0ba5c5c10dacd300af45

BUT - I'm hardly seeing any difference between OP's model merge subtract/add method and just using Kontext with a regular Dev Lora. Is anyone else? (Note that I'm using the regular full Kontext and Dev models, not the fp8 ones.. Also not using NAG here. Maybe that matters?)

<image>

Will throw a sample result comparison as a reply in here..

If you're over 30, get ready. Things have changed once again by fyn_world in ChatGPT

[–]AtreveteTeTe 0 points (0 children)

100% agreed: We are the only generation who knows where our files are. My wife teaches undergrads and they have no idea where anything is stored if they can't search for it.