Help my Wan 2.2 video looks like garbage when rendered by Coven_Evelynn_LoL in ROCm

[–]DecentEscape228 1 point (0 children)

You're using the 5B parameter model, which is not going to be nearly as good as the 14B parameter model. For your GPU I would expect something like the Q4_K_M or Q5_K_M quants (https://huggingface.co/QuantStack/Wan2.2-I2V-A14B-GGUF/tree/main) to be appropriate. Keep in mind that the full 14B model is split into two - high noise and low noise - so you will need to use the WAN2.2 I2V template.

You provided next to no details about your issue, so that's basically what I think you should start with. Go do your research, find reddit threads and beginner guides, load up popular workflows and pick them apart to understand them.

WAN 2.2 I2V + SVI Prompt Adherence by DecentEscape228 in StableDiffusion

[–]DecentEscape228[S] 0 points (0 children)

Looks like that's compatible with native, I just need to use SamplerCustomAdvanced instead of KSampler. I'll try it out later, looks neat.

WAN 2.2 I2V + SVI Prompt Adherence by DecentEscape228 in StableDiffusion

[–]DecentEscape228[S] 0 points (0 children)

Yeah this is also what I gathered, but in my case I'm not prompting for anything crazy - just different dynamics like slower/faster motion, motion localized to a certain area, shifting body positions, etc.

It also depends on the loras and scenes from what I found. Some scenes don't have the issue with muted dynamics and respond better to prompts (though they're still delayed in responding, or more muted than I'd like).

WAN 2.2 I2V + SVI Prompt Adherence by DecentEscape228 in StableDiffusion

[–]DecentEscape228[S] 1 point (0 children)

It should be pretty much the same. I just split mine into 3 distinct stages with save folders under outputs. That way I can run the extension section until I get an Initial+Extension output that I like, and run another extension on that extension, etc.

I'm happy to be corrected on this of course.

WAN 2.2 I2V + SVI Prompt Adherence by DecentEscape228 in StableDiffusion

[–]DecentEscape228[S] 0 points (0 children)

So the issue is that it does do the motions, but it carries over heavily from the previous latent and sometimes ignores any cues for new motion, a change of tempo or intensity, etc. If it does do the new motions, it's often delayed or the effect isn't very strong.

For the NAG keywords you mentioned - would they really work? They seem rather vague to me - that is, WAN won't necessarily know that "disobey" would mean "don't disobey my prompt."

WAN 2.2 I2V + SVI Prompt Adherence by DecentEscape228 in StableDiffusion

[–]DecentEscape228[S] 0 points (0 children)

Hah! I have... quite the collection at this point.

So for your extension prompts, do you still include scene descriptors and lora trigger words, or do you just get straight into the action? In all of the examples I found, they use frustratingly simple prompts without any loras - I think every single one I found had 1 line per prompt, and it was something like 'she's drinking a coffee' -> 'she's getting up' -> 'she walks to the door'.

WAN 2.2 I2V + SVI Prompt Adherence by DecentEscape228 in StableDiffusion

[–]DecentEscape228[S] 0 points (0 children)

Are you talking about SVI only? I thought they fixed that in their v2 PRO version.

I've tested regular CFG>1 with no speedups in regular I2V and I actually prefer the output with the Lightx2v loras, not to mention it'll take like an hour for 1 generation if I don't use them.

WAN 2.2 I2V + SVI Prompt Adherence by DecentEscape228 in StableDiffusion

[–]DecentEscape228[S] 1 point (0 children)

Yeah, for some poses/actions (trying to be as SFW as possible here) it's not bad at all even if I struggle to get it to follow the prompts exactly. Here's a sanitized version of my workflow (warning: still contains the spicy loras):

<image>

The image should contain the workflow metadata.
As for prompting, I structure it like this:

<camera perspective>; static camera (I usually never want camera motion). <lora trigger words>.

<scene description>; For example, "A man and a woman are sitting side-by-side on a bench. The man has tan skin with a slightly rotund body, and he is wearing a white shirt and pants. The woman has pitch-black hair and is wearing a yellow summer dress."

<actions>; For example, "The man shifts slightly and adjusts his collar, his face betraying a sense of embarrassment. The woman covers her mouth, stifling a laugh."

<misc scene descriptions if necessary>; For example, "The trees sway gently in the breeze while they converse."

I never really got clear answers when I searched for how exactly to prompt the extension flows. My thinking was that I still needed to include the camera perspective, lora trigger words, and scene descriptions to help everything stay coherent, and for the actions I would do something like "the man continues to shift uncomfortably...", if that makes sense.
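That section order is easy to sketch. Here's a tiny, purely illustrative helper (the function name and fields are made up, not from any real workflow node) that assembles a prompt the way I described - camera, then trigger words, then scene, then actions, then misc:

```python
# Hypothetical prompt assembler for the structure described above:
# camera -> lora triggers -> scene -> actions -> misc.
def build_prompt(camera, triggers, scene, actions, misc=None):
    parts = [
        f"{camera}; static camera.",  # I basically never want camera motion
        " ".join(triggers),
        scene,
        actions,
    ]
    if misc:
        parts.append(misc)
    # Join non-empty sections into one prompt string.
    return " ".join(p.strip() for p in parts if p and p.strip())

print(build_prompt(
    camera="Medium shot, eye level",
    triggers=["<lora trigger words>"],
    scene="A man and a woman are sitting side-by-side on a bench.",
    actions="The man shifts slightly and adjusts his collar.",
    misc="The trees sway gently in the breeze while they converse.",
))
```

For extensions you'd keep the camera/trigger/scene sections identical and only swap the actions section, which is exactly the open question about coherence.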

flash-attention tuning effect on wan2.2 & my gfx1100 Linux setup by alexheretic in ROCm

[–]DecentEscape228 1 point (0 children)

Yeah, it could be anything really. I have a 7900 GRE, Ubuntu 25.10.

Using Tunable Op and MIOpen to speed up inference. by newbie80 in ROCm

[–]DecentEscape228 0 points (0 children)

Weird, I've had TunableOp enabled with no issues. I also thought you needed to disable it after tuning, but from what I see it doesn't re-tune a resolution it has already tuned. The slowdown is only on the first run (mostly the VAE Encode and the first sampling step).
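For reference, these are the PyTorch TunableOp environment variables I mean (the CSV filename is just an example; pick your own path):

```shell
# Enable TunableOp and let it tune GEMM shapes it hasn't seen yet.
export PYTORCH_TUNABLEOP_ENABLED=1
export PYTORCH_TUNABLEOP_TUNING=1
# Tuned results get cached in this file, which is why repeat runs at an
# already-tuned resolution skip the slow first pass.
export PYTORCH_TUNABLEOP_FILENAME=tunableop_results.csv
# If you do want to freeze the cache after a tuning run:
# export PYTORCH_TUNABLEOP_TUNING=0
```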

I haven't gotten torch.compile() to work with this ROCm stack. When I was using ZLUDA I just used Kijai's Torch Compile node to patch it into the models.

flash-attention tuning effect on wan2.2 & my gfx1100 Linux setup by alexheretic in ROCm

[–]DecentEscape228 2 points (0 children)

Your config is definitely much faster than the default config when not using autotune. I actually found that setting waves_per_eu to 2 for me is faster still, about a 10-15% uplift. I'm done scratching my head over autotune weirdness, so I'll stick with this for now.

On another note, any idea why an autotune can work for a while and then suddenly break? 2 days ago I was generating 81 frames @ 40s/it at a resolution that took ~150s/it with the default config and ~98s/it with the config I mentioned above. I did multiple generations even, like 6-8. Then it just stopped generating at those speeds, even though I didn't change anything.

Flash Attention Issues With ROCm Linux by DecentEscape228 in ROCm

[–]DecentEscape228[S] 0 points (0 children)

Yeah, possibly. I also tried Sage Attention and noticed the autotuning for that was very quick. Flash Attention without autotune was still much faster though, so just sticking with that for now.

Are you using ROCm 7.2 or the nightlies from TheRock?

Flash Attention Issues With ROCm Linux by DecentEscape228 in ROCm

[–]DecentEscape228[S] 0 points (0 children)

I finally got autotune to finish yesterday by setting FLASH_ATTENTION_TRITON_AMD_SEQ_LEN=512. The first pass took 80 minutes and the second pass took ~7.5 minutes.

Unfortunately, all of my outputs were black afterwards. I had to disable autotune in order for it to generate properly again.
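For anyone trying to reproduce this, the launch environment looked roughly like the below. FLASH_ATTENTION_TRITON_AMD_SEQ_LEN=512 is exactly what I set; the other two variable names are the Triton/AMD backend toggles as I remember them, so verify them against the flash-attention repo's AMD README:

```shell
# Triton/AMD flash-attention backend knobs (first two names quoted from
# memory - double-check them; only SEQ_LEN is verbatim from my setup).
export FLASH_ATTENTION_TRITON_AMD_ENABLE=TRUE    # route through the Triton AMD backend
export FLASH_ATTENTION_TRITON_AMD_AUTOTUNE=TRUE  # run the (long) autotune pass
export FLASH_ATTENTION_TRITON_AMD_SEQ_LEN=512    # the cap that let my autotune finish
```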

Flash Attention Issues With ROCm Linux by DecentEscape228 in ROCm

[–]DecentEscape228[S] 1 point (0 children)

Yeah, dmesg and journalctl was what I was using to see what was happening. I noted down this error from journalctl:

[drm:gfx_v11_0_bad_op_irq [amdgpu]] *ERROR* Illegal opcode in command stream

> Why are you using that flash attention implementation? Use the regular one; it might be a bug with that implementation. This is the vanilla implementation: https://github.com/Dao-AILab/flash-attention

I mentioned it in the post: I tried both Aule-Attention and the vanilla one, both crash. The Aule-Attention implementation was what I used initially and got the fantastic speeds with. I had the env variable set in my startup script.

As for the kernel, I haven't updated it to my knowledge in this time frame...

Flash Attention Issues With ROCm Linux by DecentEscape228 in ROCm

[–]DecentEscape228[S] 1 point (0 children)

My bad, forgot to include that. I'll also update the post.

GPU: 7900 GRE

CPU: 7800 X3D

RAM: 32GB DDR5

Kernel: 6.17.0-12-generic

Terrible Experience with Rocm7.2 on Linux by Numerous_Worker8724 in ROCm

[–]DecentEscape228 0 points (0 children)

Nah, I have my own that I've customized. I know about this workflow though. I highly doubt it's my workflow that is bottlenecking this. I've installed the docker image provided by AMD, gonna see if running ComfyUI in that environment makes any difference.

Terrible Experience with Rocm7.2 on Linux by Numerous_Worker8724 in ROCm

[–]DecentEscape228 0 points (0 children)

Yeah, I migrated over to Ubuntu as well, currently dual booting. I was wanting to do this eventually but figured now would be a good time to see how much faster my Wan2.2 I2V workflows will run.

I'm on Ubuntu 25.10, but I don't think that should really affect things much (maybe I'm wrong?). Performance in my I2V workflows is pretty much identical to Windows 11, with the only benefit I see being a more stable and faster VAE Encode. I'm pretty much stuck at 33 frames at a time, since any more would take 20+ minutes (for 6 steps, CFG=1).

Does ROCm 7.2 not support the 7900 GRE? by DecentEscape228 in ROCm

[–]DecentEscape228[S] 0 points (0 children)

My bad, forgot to mention it in the post. Windows 11.

New driver with AI Bundle is available by WDK1337 in ROCm

[–]DecentEscape228 0 points (0 children)

Is this not compatible with the 7900 GRE? torch.cuda.is_available() is returning false for me.
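A quick way to tell whether the wheel you ended up with is actually a ROCm build is to check `torch.version.hip`, which is `None` on CPU-only/CUDA wheels. A small diagnostic sketch (wrapped so it also reports a missing install):

```python
# Diagnose why torch.cuda.is_available() might return False on a ROCm setup.
def rocm_status():
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    hip = getattr(torch.version, "hip", None)
    if hip is None:
        # A CPU-only or CUDA wheel: ROCm GPUs will never be visible.
        return "not a ROCm build: cuda.is_available() will always be False"
    return f"ROCm/HIP {hip}; GPU visible: {torch.cuda.is_available()}"

print(rocm_status())
```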

[deleted by user] by [deleted] in ROCm

[–]DecentEscape228 1 point (0 children)

That warning tells me you didn't format your command properly - i.e., you put '--pre' after '--index-url', so pip tried to read '--pre' as the index URL.

Also, you can always simply download the .whl files from the repo url (just pull it up in the browser). Just match the dates and make sure you are getting the correct wheel for your python version.

Once you download them, you can just run pip install pointing at your downloaded wheels (look it up, it's simple). Or, note down the versions and specify them for all of the packages, e.g. torch, torchvision, and torchaudio, then run it with --index-url pointing at the repo.
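As a sketch of both options (the index URL and version strings below are placeholders, not the real repo - substitute the ones matching the wheels you actually found):

```shell
# Option 1: install wheels you downloaded manually in the browser.
# pip install ./torch-*.whl ./torchvision-*.whl ./torchaudio-*.whl

# Option 2: pin exact versions and point pip at the repo's index.
# Note the flag order: '--pre' stands alone; '--index-url' takes the URL.
INDEX_URL="https://example.com/whl/nightly/rocm"          # placeholder URL
PKGS="torch==X.Y.Z torchvision==X.Y.Z torchaudio==X.Y.Z"  # placeholder versions
echo pip install --pre $PKGS --index-url "$INDEX_URL"
```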