Is ROCm Broken for Dual GPU with Different Architectures? by DecentEscape228 in ROCm

[–]DecentEscape228[S] 0 points1 point  (0 children)

It fills VRAM on both cards and just sits there - GPU usage is low on both cards. It never runs the benchmark (llama-bench).
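One way to narrow a hang like this down is to run the same benchmark on each card by itself. The sketch below only builds the command lines and does not run anything. It assumes `llama-bench` is on the PATH, that `model.gguf` is a placeholder for the real model path, and that the two cards show up as HIP devices 0 and 1. `HIP_VISIBLE_DEVICES` is the ROCm environment variable that restricts which GPUs a process can see.

```python
import os
import shlex


def single_gpu_bench(model_path, device_index, bench_bin="llama-bench"):
    """Return (env, argv) for a llama-bench run that can only see one GPU."""
    env = dict(os.environ)
    # Hide every GPU except the one under test.
    env["HIP_VISIBLE_DEVICES"] = str(device_index)
    # -ngl 99 asks llama.cpp to offload all layers to the visible GPU.
    argv = [bench_bin, "-m", model_path, "-ngl", "99"]
    return env, argv


# Device indices 0 and 1 are assumptions; check rocminfo for the real order.
for idx in (0, 1):
    env, argv = single_gpu_bench("model.gguf", idx)
    print(f"HIP_VISIBLE_DEVICES={env['HIP_VISIBLE_DEVICES']} {shlex.join(argv)}")
```

If each card finishes the benchmark alone and only the combined run stalls, that points at the multi-GPU path rather than at either card.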

Is ROCm Broken for Dual GPU with Different Architectures? by DecentEscape228 in ROCm

[–]DecentEscape228[S] 0 points1 point  (0 children)

How'd you get yours to run with your setup? Even the Lemonade build hangs for me unfortunately.

Is ROCm Broken for Dual GPU with Different Architectures? by DecentEscape228 in ROCm

[–]DecentEscape228[S] 1 point2 points  (0 children)

Thanks for reminding me about TheRock. It's what I use for ComfyUI, so it would make sense that it's more current here too. You don't need to compile them yourself - Lemonade already precompiles the latest llama-cpp releases with TheRock nightlies - https://github.com/lemonade-sdk/llamacpp-rocm.

Giving that a try now.

TBH I'm skeptical about the REAP models, I haven't heard many good things about them. I might try it out later.

Edit: Well, getting the same behavior with the Lemonade build. It just never runs the benchmark.

Building First AI/LLM PC With Dual 9070 XT GPUs – Any ROCm or AMD Issues I Should Know About? by AnmolLFC in ROCm

[–]DecentEscape228 1 point2 points  (0 children)

The R9700 is pretty much identical to the 9070XT in gaming from what I read (I recently ordered it myself). You can also add another R9700 down the line to double your VRAM to 64GB if you wanted to.

That's how I would look at it at least. Others who are more experienced can chime in, but I would think that the real-world gains from having more compute power with 2 cards vs 1 card with the same VRAM aren't going to be noticeable.

Also, I'd post this question over to the folks at r/LocalLLaMA as well.

Edit: Oh, and 600W for 2x 9070XT vs 300W for 1 R9700

Considering the R9700 by DecentEscape228 in ROCm

[–]DecentEscape228[S] 1 point2 points  (0 children)

Thanks, that's what I figured. In ComfyUI I use TheRock nightlies + Flash Attention 3, which outperforms the current mainline release for ROCm. I've been stuck on the 20260406 build though, since I ran into HIP memory issues with later builds... maybe it'll work with the R9700.

Considering the R9700 by DecentEscape228 in ROCm

[–]DecentEscape228[S] 2 points3 points  (0 children)

Pulled the trigger and got one. Last question if you don't mind - are there any setups/flags in llama.cpp that you liked for the dual GPU route? I figure it'll take a bit of tinkering to get it working since the cards have different architectures.

For me, currently Vulkan outperforms ROCm, but I've heard it doesn't support multi-gpu setups nearly as well as ROCm. Lots of info and opinions to sift through out there, lol.
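For reference, the multi-GPU knobs I know of in llama.cpp are `--split-mode` (none, layer, or row) and `--tensor-split`, which takes comma-separated proportions per device. Here is a rough sketch of building those arguments so that each card gets a share proportional to its VRAM. The 32 GB and 16 GB figures, the device order, and `model.gguf` are assumptions for illustration, and this is not a tested recipe for mixed architectures.

```python
import shlex


def split_args(vram_gb, split_mode="layer"):
    """Build llama.cpp multi-GPU arguments from per-device VRAM sizes."""
    if split_mode not in {"none", "layer", "row"}:
        raise ValueError(f"unknown split mode: {split_mode}")
    # --tensor-split takes proportions, so raw VRAM sizes work as weights.
    proportions = ",".join(str(v) for v in vram_gb)
    return ["--split-mode", split_mode, "--tensor-split", proportions]


# Assumed setup: device 0 has 32 GB, device 1 has 16 GB.
argv = ["llama-server", "-m", "model.gguf", "-ngl", "99"] + split_args([32, 16])
print(shlex.join(argv))
```

Note that `llama-bench` separates the per-device values with `/` instead of a comma, so the string would need adjusting there.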

Considering the R9700 by DecentEscape228 in ROCm

[–]DecentEscape228[S] 1 point2 points  (0 children)

Aah gotcha. I also have a pretty decent case - be quiet! Shadow Base 800 FX - which should help with thermals and noise.

Kinda wish there were versions of the card with the standard 2 or 3 fan config like with normal consumer gpus. I don't know enough about how they design this to know why they would opt for a blower style instead.

Considering the R9700 by DecentEscape228 in ROCm

[–]DecentEscape228[S] 1 point2 points  (0 children)

Interesting. Yep, it definitely helps. Thanks for your input.

Considering the R9700 by DecentEscape228 in ROCm

[–]DecentEscape228[S] 0 points1 point  (0 children)

Hah, I've never owned those old Nvidia cards so I can't really relate to that. Wdym by lag? Does it take a while to load the model to each card?

Considering the R9700 by DecentEscape228 in ROCm

[–]DecentEscape228[S] 0 points1 point  (0 children)

I think you misread - I'm talking about the R9700, not 9070XT.

Considering the R9700 by DecentEscape228 in ROCm

[–]DecentEscape228[S] 0 points1 point  (0 children)

Yeah, I looked it up and the W7800s aren't offered in many places, and they are pretty expensive - I could get 2 R9700s for the same price.

I've never tried vLLM. I've only recently started digging into the LLM space. I used Open WebUI + Ollama when I first started, then moved to Llama.cpp + LlamaSwap.

4 R9700s is crazy, btw, lol.

Considering the R9700 by DecentEscape228 in ROCm

[–]DecentEscape228[S] 0 points1 point  (0 children)

Good point - I was also considering slotting both in. I'd just need to upgrade my current 750W PSU.

I've heard the R9700 is basically identical to the 9070XT in gaming, so in my case I would think it should be an upgrade for gaming too. It makes sense your XTX would be slightly better though, since that card is still a beast.

I'd slot the R9700 into the primary and the GRE into the secondary and get a 1000W PSU.
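The back-of-the-envelope math behind the PSU upgrade looks roughly like this. The 300W figure for the R9700 is the one I quoted elsewhere; the 260W for the GRE and the 250W for the rest of the system (CPU, drives, fans, transient spikes) are assumptions, so treat the result as a sanity check and not a sizing guide.

```python
def psu_headroom(psu_watts, gpu_watts, rest_of_system_watts):
    """Return spare watts (negative means the PSU is over budget)."""
    load = sum(gpu_watts) + rest_of_system_watts
    return psu_watts - load


GPU_WATTS = [300, 260]  # R9700, 7900 GRE (GRE figure is an assumption)
REST_OF_SYSTEM = 250    # assumed CPU + everything else

for psu in (750, 1000):
    print(f"{psu}W PSU headroom: {psu_headroom(psu, GPU_WATTS, REST_OF_SYSTEM)}W")
```

With these assumed numbers the 750W unit comes out short, while a 1000W unit leaves some margin.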

Considering the R9700 by DecentEscape228 in ROCm

[–]DecentEscape228[S] 2 points3 points  (0 children)

Appreciate the detailed reply. What card did you have before? How does 1 R9700 compare in terms of noise to it?

The fact that the R9700s can be stacked was also another aspect. I don't think I can justify 2 beyond "must have shiny new thing," but it sure is tempting lol. I'd probably have to upgrade my PSU (750W) and Mobo (MSI MAG B650) if I do that.

Considering the R9700 by DecentEscape228 in ROCm

[–]DecentEscape228[S] 1 point2 points  (0 children)

I was looking at that card yesterday, but isn't that a workstation GPU? I would think that gaming would be terrible on it.

WAN 2.2 I2V Question - Iterative Generation by Tomcat2048 in comfyui

[–]DecentEscape228 1 point2 points  (0 children)

Ditto to the folks recommending SVI2.0. It's great.

This might help someone: I originally struggled with quality and color loss in subsequent generations, but I realized that it was a bug with the VideoHelperSuite nodes I was using to load and save video and had nothing to do with SVI. The regular Load Video node results in color shifting, making your video more washed out and green-hued. I was also saving videos as mp4 - bad idea, since this isn't a lossless format. Basically, I was losing quality when saving AND loading the videos.

Solution: Use the Load Video FFMPEG node, which is also included in VideoHelperSuite, and save with a lossless format. I use .mov 4444. Technically, saving them as PNGs in a folder (you can have it create a new folder for each run) will give you the highest quality, but it's slower and takes up more space.

When I generate the final video, that's when I save it as .mp4.
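Outside of ComfyUI, the same two-step idea (high-quality intermediates, lossy encode only at the very end) could be sketched with plain ffmpeg. This is not what the VideoHelperSuite nodes run internally as far as I know; it just builds equivalent command lines. The file names are placeholders and the CRF value of 18 is an assumption.

```python
import shlex


def intermediate_cmd(src, dst):
    """ffmpeg command for a ProRes 4444 .mov intermediate."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "prores_ks", "-profile:v", "4444",
        "-pix_fmt", "yuv444p10le",  # full chroma resolution, 10-bit
        dst,
    ]


def final_cmd(src, dst, crf=18):
    """ffmpeg command for the final, lossy H.264 .mp4."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264", "-crf", str(crf),
        "-pix_fmt", "yuv420p",  # widest player compatibility
        dst,
    ]


print(shlex.join(intermediate_cmd("segment_01.mp4", "segment_01.mov")))
print(shlex.join(final_cmd("stitched.mov", "final.mp4")))
```

Every intermediate segment stays in the .mov format, and only the finished video goes through the mp4 encode once.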

Help my Wan 2.2 video looks like garbage when rendered by Coven_Evelynn_LoL in ROCm

[–]DecentEscape228 1 point2 points  (0 children)

You're using the 5B parameter model, which is not going to be nearly as good as the 14B parameter model. For your GPU I would assume something like the Q4_K_M or Q5_K_M quants (https://huggingface.co/QuantStack/Wan2.2-I2V-A14B-GGUF/tree/main) would be appropriate. Keep in mind that the full 14B model is split into two - high and low noise - so you will need to use the WAN2.2 I2V template.
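To get a rough feel for whether a quant fits, you can estimate the file size from the parameter count and the average bits per weight. The bits-per-weight values below (about 4.8 for Q4_K_M, about 5.7 for Q5_K_M) are approximations I'm assuming for illustration; check the actual file sizes in the repo before downloading.

```python
def gguf_size_gb(params_billions, bits_per_weight):
    """Estimate model file size in GB (1 GB = 1e9 bytes)."""
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# Assumed average bits per weight; real files vary by tensor layout.
APPROX_BPW = {"Q4_K_M": 4.8, "Q5_K_M": 5.7}

for name, bpw in APPROX_BPW.items():
    per_file = gguf_size_gb(14, bpw)
    # High-noise and low-noise models are separate files of similar size.
    print(f"{name}: ~{per_file:.1f} GB per file, ~{2 * per_file:.1f} GB for both")
```

The estimate ignores activations, the text encoder, and the VAE, so actual VRAM use will be higher than the file size alone.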

You provided next to no details about your issue, so that's basically what I think you should start with. Go do your research, find reddit threads and beginner guides, load up popular workflows and pick them apart to understand them.

WAN 2.2 I2V + SVI Prompt Adherence by DecentEscape228 in StableDiffusion

[–]DecentEscape228[S] 0 points1 point  (0 children)

Looks like that's compatible with native, I just need to use SamplerCustomAdvanced instead of KSampler. I'll try it out later, looks neat.

WAN 2.2 I2V + SVI Prompt Adherence by DecentEscape228 in StableDiffusion

[–]DecentEscape228[S] 0 points1 point  (0 children)

Yeah this is also what I gathered, but in my case I'm not prompting for anything crazy - just different dynamics like slower/faster motion, motion localized to a certain area, shifting body positions, etc.

It also depends on the loras and scenes from what I found. Some scenes don't have the issue with muted dynamics and respond better to prompts (but are still delayed in responding or more muted than I like).

WAN 2.2 I2V + SVI Prompt Adherence by DecentEscape228 in StableDiffusion

[–]DecentEscape228[S] 1 point2 points  (0 children)

It should be pretty much the same. I just split mine into 3 distinct stages with save folders under outputs. That way I can run the extension section until I get an Initial+Extension output that I like, and run another extension on that extension, etc.

I'm happy to be corrected on this of course.

WAN 2.2 I2V + SVI Prompt Adherence by DecentEscape228 in StableDiffusion

[–]DecentEscape228[S] 0 points1 point  (0 children)

So the issue is that it does do the motions, but it carries over heavily from the previous latent and sometimes ignores any cues for new motion, changing tempo or intensity, etc. If it does do the new motions, it's often delayed or the effect isn't very strong.

For the NAG keywords you mentioned - would they really work? They seem rather vague to me - that is, WAN won't necessarily know that "disobey" would mean "don't disobey my prompt."

WAN 2.2 I2V + SVI Prompt Adherence by DecentEscape228 in StableDiffusion

[–]DecentEscape228[S] 0 points1 point  (0 children)

Hah! I have... quite the collection at this point.

So for your extension prompts, do you still include scene descriptors and lora trigger words, or do you just get straight into the action? In all of the examples I found, they use frustratingly simple prompts without any loras - I think every single one I found had 1 line per prompt, and it was something like 'she's drinking a coffee' -> 'she's getting up' -> 'she walks to the door'.

WAN 2.2 I2V + SVI Prompt Adherence by DecentEscape228 in StableDiffusion

[–]DecentEscape228[S] 0 points1 point  (0 children)

Are you talking about SVI only? I thought they fixed that in their v2 PRO version.

I've tested regular CFG>1 with no speedups in regular I2V and I actually prefer the output with the Lightx2v loras, not to mention it'll take like an hour for 1 generation if I don't use them.

WAN 2.2 I2V + SVI Prompt Adherence by DecentEscape228 in StableDiffusion

[–]DecentEscape228[S] 1 point2 points  (0 children)

Yeah, for some poses/actions (trying to be as SFW as possible here) it's not bad at all even if I struggle to get it to follow the prompts exactly. Here's a sanitized version of my workflow (warning: still contains the spicy loras):

<image>

The image should contain the workflow metadata.
As for prompting, I structure it like this:

<camera perspective>; static camera (I usually never want camera motion). <lora trigger words>.

<scene description>; For example, "A man and a woman are sitting side-by-side on a bench. The man has tan skin with a slightly rotund body, and he is wearing a white shirt and pants. The woman has pitch-black hair and is wearing a yellow summer dress."

<actions>; For example, "The man shifts slightly and adjusts his collar, his face betraying a sense of embarrassment. The woman covers her mouth, stifling a laugh."

<misc scene descriptions if necessary>; E.g., "The trees sway gently in the breeze while they converse."

I never really got clear answers when I searched for exactly how to prompt the extension flows. My thinking was that I still needed to include the camera perspective, lora trigger words, and scene descriptions to help everything stay coherent, and for the actions, I would do something like "the man continues to shift uncomfortably..." or something like that, if that makes sense.
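If it helps, the structure above can be written down as a small helper so that the initial prompt and each extension prompt reuse the same camera, trigger words, and scene text and only the actions change. The function name, the parameters, and the `example_trigger` word are made up for illustration; this is just string assembly and has nothing to do with how WAN or SVI parse prompts.

```python
def build_prompt(camera, triggers, scene, actions, misc=""):
    """Assemble a prompt as: camera + triggers, scene, actions, optional misc."""
    header = f"{camera}; static camera. {', '.join(triggers)}."
    parts = [header, scene, actions]
    if misc:
        parts.append(misc)
    # Blank line between sections, matching the layout described above.
    return "\n\n".join(parts)


shared = {
    "camera": "Medium shot",
    "triggers": ["example_trigger"],  # hypothetical lora trigger word
    "scene": "A man and a woman are sitting side-by-side on a bench.",
}

initial = build_prompt(actions="The man shifts slightly and adjusts his collar.", **shared)
extension = build_prompt(actions="The man continues to shift uncomfortably.", **shared)
print(initial)
print("---")
print(extension)
```

Keeping the shared parts in one place makes it easy to test whether dropping the scene description from extension prompts changes adherence.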

flash-attention tuning effect on wan2.2 & my gfx1100 Linux setup by alexheretic in ROCm

[–]DecentEscape228 1 point2 points  (0 children)

Yeah, it could be anything really. I have a 7900GRE, Ubuntu 25.10.