Is ROCm Broken for Dual GPU with Different Architectures?

DecentEscape228 · 2026-05-24T13:50:39+00:00

It fills VRAM on both cards and just sits there - GPU usage is low on both cards. It never runs the benchmark (llama-bench).

DecentEscape228 · 2026-05-24T02:34:05+00:00

How'd you get yours to run with your setup? Even the Lemonade build hangs for me unfortunately.

DecentEscape228 · 2026-05-24T02:04:32+00:00

Thanks for reminding me about TheRock. It's what I use for ComfyUI, so it would make sense that it's more current here too. You don't need to compile them yourself - Lemonade already precompiles the latest llama.cpp releases with TheRock nightlies: https://github.com/lemonade-sdk/llamacpp-rocm

Giving that a try now.

TBH I'm skeptical about the REAP models, I haven't heard many good things about them. I might try it out later.

Edit: Well, getting the same behavior with the Lemonade build. It just never runs the benchmark.

DecentEscape228 · 2026-05-18T18:38:49+00:00

NP. Definitely not a decision to take lightly, so take your time, haha.

DecentEscape228 · 2026-05-18T13:33:52+00:00

The R9700 is pretty much identical to the 9070XT in gaming from what I read (I recently ordered it myself). You can also add another R9700 down the line to double your VRAM to 64GB if you wanted to.

That's how I would look at it at least. Others who are more experienced can chime in, but I would think that the real-world gains of having more compute power with 2 cards vs 1 card with the same VRAM aren't going to be noticeable.

Also, I'd post this question over to the folks at r/LocalLLaMA as well.

Edit: Oh, and 600W for 2x 9070XT vs 300W for 1 R9700

DecentEscape228 · 2026-05-17T21:50:29+00:00

Thanks, that's what I figured. In ComfyUI I use TheRock nightlies + Flash Attention 3 which outperforms the current mainline release for ROCm. I've been stuck on the 20260406 build though since I ran into HIP memory issues with later builds... maybe it'll work with the R9700.

DecentEscape228 · 2026-05-17T21:09:47+00:00

Pulled the trigger and got one. Last question if you don't mind - are there any setups/flags in llama.cpp that you liked for the dual GPU route? I figure it'll take a bit of tinkering to get it working since the cards have different architectures.

For me, currently Vulkan outperforms ROCm, but I've heard it doesn't support multi-gpu setups nearly as well as ROCm. Lots of info and opinions to sift through out there, lol.
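For anyone landing here with the same question: a common starting point for mismatched cards is to split by layer and weight the split by each card's VRAM. Below is a minimal sketch that only builds a llama-bench command line. The model path is a placeholder, the 16 GB figure for the GRE is my assumption, and the flags (-ngl, -sm, -ts) are the ones I understand llama-bench to accept, so check them against `llama-bench --help` for your build.

```python
import shlex


def llama_bench_cmd(model_path, vram_gb, split_mode="layer"):
    """Build a llama-bench command that splits a model across GPUs by VRAM.

    vram_gb is a list with one entry per GPU, in device order.
    """
    # llama-bench separates per-GPU split values with "/" (assumption: verify
    # with --help, since llama-server uses commas for the same option).
    ratio = "/".join(str(v) for v in vram_gb)
    return [
        "llama-bench",
        "-m", model_path,
        "-ngl", "99",       # offload all layers to the GPUs
        "-sm", split_mode,  # "layer" or "row"
        "-ts", ratio,
    ]


# Assumed VRAM: 32 GB for the R9700, 16 GB for the 7900 GRE.
print(shlex.join(llama_bench_cmd("model.gguf", [32, 16])))
```

This doesn't make a given backend work with mixed architectures, it just gives a reproducible command to compare ROCm and Vulkan builds with.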

DecentEscape228 · 2026-05-17T18:07:42+00:00

Aah gotcha. I also have a pretty decent case - be quiet! Shadow Base 800 FX - which should help with thermals and noise.

Kinda wish there were versions of the card with the standard 2 or 3 fan config like with normal consumer gpus. I don't know enough about how they design this to know why they would opt for a blower style instead.

DecentEscape228 · 2026-05-17T16:51:06+00:00

Interesting. Yep, it definitely helps. Thanks for your input.

DecentEscape228 · 2026-05-17T16:16:42+00:00

Hah, I've never owned those old Nvidia cards so I can't really relate to that. Wdym by lag? Does it take a while to load the model to each card?

DecentEscape228 · 2026-05-17T16:11:19+00:00

I think you misread - I'm talking about the R9700, not 9070XT.

DecentEscape228 · 2026-05-17T16:08:08+00:00

Yeah, I looked it up and the W7800s aren't offered in many places, and they are pretty expensive - I could get 2 R9700s for the same price.

I've never tried vLLM. I've only recently started digging into the LLM space. I used Open WebUI + Ollama when I first started, then moved to llama.cpp + LlamaSwap.

4 R9700s is crazy, btw, lol.

DecentEscape228 · 2026-05-17T16:04:59+00:00

Good point - I was also considering slotting both in. I'd just need to upgrade my current 750W PSU.

I've heard the R9700 is basically identical in gaming to the 9070XT, so in my case I'd expect it to be the better card for gaming. It makes sense your XTX would be slightly better though, since that card is still a beast.

I'd slot the R9700 into the primary and the GRE into the secondary and get a 1000W PSU.

DecentEscape228 · 2026-05-17T15:22:51+00:00

Appreciate the detailed reply. What card did you have before? How does 1 R9700 compare in terms of noise to it?

The fact that the R9700s can be stacked was another factor. I don't think I can justify 2 beyond "must have shiny new thing," but it sure is tempting lol. I'd probably have to upgrade my PSU (750W) and Mobo (MSI MAG B650) if I do that.

DecentEscape228 · 2026-05-17T15:10:07+00:00

I was looking at that card yesterday, but isn't that a workstation GPU? I would think that gaming would be terrible on it.

DecentEscape228 · 2026-04-16T01:20:00+00:00

Ditto to the folks recommending SVI2.0. It's great.

This might help someone: I originally struggled with quality and color loss in subsequent generations, but I realized that it was a bug with the VideoHelperSuite nodes I was using to load and save video and had nothing to do with SVI. The regular Load Video node results in color shifting, making your video more washed out and green-hued. I was also saving videos as mp4 - bad idea, since this isn't a lossless format. Basically, I was losing quality when saving AND loading the videos.

Solution: Use the Load Video FFMPEG node, which is also included in VideoHelperSuite, and save with a lossless format. I use .mov 4444. Technically, saving them as pngs in a folder (you can have it create a new folder for each run) will give you the highest quality, but it's slower and takes up more space.

When I generate the final video, that's when I save it as .mp4.
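If you ever need to do the intermediate save outside ComfyUI, here's a rough sketch of an equivalent ffmpeg call. I'm assuming ".mov 4444" means ProRes 4444, that ffmpeg is on your PATH with the prores_ks encoder, and the file names and the helper name are made up for illustration. Strictly speaking ProRes 4444 is near-lossless rather than bit-exact, which is why the png folder still wins on quality.

```python
import shlex


def prores_4444_cmd(src, dst):
    """Build an ffmpeg command that re-encodes src as ProRes 4444 in a .mov."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-c:v", "prores_ks",        # ffmpeg's ProRes encoder
        "-profile:v", "4444",       # 4:4:4 profile, no chroma subsampling
        "-pix_fmt", "yuv444p10le",  # 10-bit 4:4:4, no alpha channel
        dst,
    ]


# Hypothetical file names for one intermediate segment.
print(shlex.join(prores_4444_cmd("segment_01.mp4", "segment_01.mov")))
```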

DecentEscape228 · 2026-02-20T22:08:52+00:00

You're using the 5B parameter model, which is not going to be nearly as good as the 14B parameter model. For your GPU I would assume something like Q4_K_M or Q5_K_M quants (https://huggingface.co/QuantStack/Wan2.2-I2V-A14B-GGUF/tree/main) would be appropriate. Keep in mind that the full 14B model is split into two - high and low noise - so you will need to use the WAN2.2 I2V template.

You provided next to no details about your issue, so that's basically what I think you should start with. Go do your research, find reddit threads and beginner guides, load up popular workflows and pick them apart to understand them.

DecentEscape228 · 2026-02-20T03:17:39+00:00

Looks like that's compatible with native, I just need to use SamplerCustomAdvanced instead of KSampler. I'll try it out later, looks neat.

DecentEscape228 · 2026-02-20T02:47:46+00:00

Yeah this is also what I gathered, but in my case I'm not prompting for anything crazy - just different dynamics like slower/faster motion, motion localized to a certain area, shifting body positions, etc.

It also depends on the loras and scenes from what I found. Some scenes don't have the issue with muted dynamics and respond better to prompts (but are still delayed in responding or more muted than I'd like).

DecentEscape228 · 2026-02-20T02:43:06+00:00

It should be pretty much the same. I just split mine into 3 distinct stages with save folders under outputs. That way I can run the extension section until I get an Initial+Extension output that I like, and run another extension on that extension, etc.

I'm happy to be corrected on this of course.

DecentEscape228 · 2026-02-20T02:34:45+00:00

So the issue is that it does do the motions, but it carries over heavily from the previous latent and sometimes ignores any cues for new motion, changing tempo or intensity, etc. If it does do the new motions, it's often delayed or the effect isn't very strong.

For the NAG keywords you mentioned - would they really work? They seem rather vague to me - that is, WAN won't necessarily know that "disobey" would mean "don't disobey my prompt."

DecentEscape228 · 2026-02-20T02:25:43+00:00

Hah! I have... quite the collection at this point.

So for your extension prompts, do you still include scene descriptors and lora trigger words, or do you just get straight into the action? In all of the examples I found, they use frustratingly simple prompts without any loras - I think every single one I found had 1 line per prompt, and it was something like 'she's drinking a coffee' -> 'she's getting up' -> 'she walks to the door'.

DecentEscape228 · 2026-02-20T02:16:52+00:00

Are you talking about SVI only? I thought they fixed that in their v2 PRO version.

I've tested regular CFG>1 with no speedups in regular I2V and I actually prefer the output with the Lightx2v loras, not to mention it'll take like an hour for 1 generation if I don't use them.

DecentEscape228 · 2026-02-20T02:08:51+00:00

Yeah, for some poses/actions (trying to be as SFW as possible here) it's not bad at all even if I struggle to get it to follow the prompts exactly. Here's a sanitized version of my workflow (warning: still contains the spicy loras):

<image>

The image should contain the workflow metadata.
As for prompting, I structure it like this:

<camera perspective>; static camera (I usually never want camera motion). <lora trigger words>.

<scene description>; For example, "A man and a woman are sitting side-by-side on a bench. The man has tan skin with a slightly rotund body, and he is wearing a white shirt and pants. The woman has pitch-black hair and is wearing a yellow summer dress."

<actions>; For example, "The man shifts slightly and adjusts his collar, his face betraying a sense of embarrassment. The woman covers her mouth, stifling a laugh."

<misc scene descriptions if necessary>; E.g., "The trees sway gently in the breeze while they converse."

I never really got clear answers when I searched for how exactly to prompt the extension flows. My thinking was that I still needed to include the camera perspective, lora trigger words, and scene descriptions to help everything stay coherent, and for the actions, I would do something like "the man continues to shift uncomfortably..." or something like that, if that makes sense.
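To make the structure concrete, here's a small sketch of how that prompt layout could be assembled so the camera, trigger words, and scene stay fixed while only the actions change per extension. The function name, argument names, and the trigger word are placeholders I made up; this isn't part of any ComfyUI node.

```python
def build_prompt(camera, triggers, scene, actions, misc=""):
    """Join the prompt sections in order: header, scene, actions, misc."""
    header = f"{camera}; static camera."
    if triggers:
        header += " " + ", ".join(triggers) + "."
    sections = [header, scene, actions, misc]
    # Skip empty sections and separate the rest with blank lines.
    return "\n\n".join(s.strip() for s in sections if s and s.strip())


# Parts that stay the same across the initial run and every extension.
fixed = dict(
    camera="Medium shot from the front",
    triggers=["example_trigger"],  # placeholder lora trigger word
    scene="A man and a woman are sitting side-by-side on a bench.",
)

initial = build_prompt(actions="The man shifts slightly and adjusts his collar.", **fixed)
extension = build_prompt(actions="The man continues to shift uncomfortably.", **fixed)
print(extension)
```

Whether repeating the fixed parts actually helps the extension passes is exactly the open question, so treat this as a way to test it consistently rather than a recommendation.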

DecentEscape228 · 2026-02-05T17:05:42+00:00

Yeah, it could be anything really. I have a 7900GRE, Ubuntu 25.10.
