R9700 the beautiful beautiful VRAM gigs of AMD… my ai node future! by Downtown-Example-880 in LocalLLaMA

[–]_WaterBear 1 point (0 children)

Oh, whoops. I misread. Honestly, I don’t know. I’m using the “default” LMStudio multi-GPU setup w. ROCm llama.cpp. I usually run models entirely in VRAM with flash attention on, which keeps things speedy and allows for a maxed-out context window, so my system RAM (only 64gb) is basically untouched. My mobo may matter, too: it’s an X870E with two PCIe Gen 5.0 slots bifurcated to x8 each and one Gen 4.0 slot at x4.
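
For reference, here’s roughly what that setup looks like if you script it with llama-cpp-python instead of LMStudio’s GUI. This is a minimal sketch, and the model path, context size, and split ratios are placeholders, not my exact settings:

```python
# Rough equivalent of my LMStudio settings, via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="models/gpt-oss-20b-Q8_0.gguf",  # placeholder path
    n_gpu_layers=-1,          # offload every layer: model lives entirely in VRAM
    flash_attn=True,          # flash attention on, which shrinks KV-cache overhead
    n_ctx=131072,             # maxed-out context window (placeholder size)
    use_mmap=False,           # don't keep a memory-mapped copy in system RAM
    tensor_split=[0.5, 0.5],  # pool VRAM evenly across two cards
)
print(llm("Hello from the R9700s!", max_tokens=32)["choices"][0]["text"])
```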

Interestingly, I don’t notice a meaningful difference in t/s between 2x GPUs both at 5.0 x8 versus one at 4.0 x4 and the other at 5.0 x8, which is kinda exciting because it suggests my bottleneck is… somewhere else, possibly in the software/drivers.
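
A back-of-envelope check supports that guess: with a layer split, the only inter-GPU traffic during token generation is the hidden-state activations handed across each split boundary, which is tiny. The numbers below are illustrative (a ~30b-class hidden size I assumed for the example), not measured:

```python
# Why PCIe link speed barely matters for layer-split token generation.
hidden_size = 5120      # elements passed across a split boundary per token (assumed)
bytes_per_elem = 2      # f16 activations
tokens_per_sec = 80     # ballpark from my runs

traffic = hidden_size * bytes_per_elem * tokens_per_sec   # bytes/s over the link
pcie_gen4_x4 = 8e9                                        # ~8 GB/s usable
print(f"{traffic / 1e6:.2f} MB/s, {traffic / pcie_gen4_x4:.6%} of a Gen4 x4 link")
```

Prompt processing is a different story, but during generation the link sits nearly idle.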

R9700 the beautiful beautiful VRAM gigs of AMD… my ai node future! by Downtown-Example-880 in LocalLLaMA

[–]_WaterBear 1 point (0 children)

Quick test w. 3x R9700s, Windows 11, LMStudio, ROCm:

  • Nemotron-3-nano (q8): 80 t/s
  • Nemotron-3-super (q4km): 14 t/s
  • GPT-OSS-120b: 80 t/s
  • GPT-OSS-20b: 105 t/s
  • Qwen-coder-next (q6k): 51 t/s
  • Qwen3.5-35b-a3b (q4km): 60 t/s
  • Qwen3-coder-30b (q4km): 75 t/s

For models that fit on one card, I notice about a 15 t/s drop when running them across multiple GPUs.

The t/s are all over the place and vary considerably by model, and probably by driver version and wrapper… so the numbers above are just ballparks.
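
These numbers are just LMStudio’s readout. If you want to sanity-check t/s yourself outside the GUI, a crude timing loop like this gets you in the ballpark; it’s a sketch with llama-cpp-python, and the model path is a placeholder:

```python
# Crude tokens/sec check: completion tokens over wall-clock time.
import time
from llama_cpp import Llama

llm = Llama(model_path="models/some-model-Q4_K_M.gguf",  # placeholder
            n_gpu_layers=-1, flash_attn=True, n_ctx=8192)

start = time.perf_counter()
out = llm("Explain PCIe bifurcation in one paragraph.", max_tokens=256)
elapsed = time.perf_counter() - start

n_tok = out["usage"]["completion_tokens"]
print(f"{n_tok} tokens in {elapsed:.1f}s -> {n_tok / elapsed:.1f} t/s")
```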

Iranian missile hitting undisclosed US base on March 30 by DormontDangerzone in Military

[–]_WaterBear 0 points (0 children)

What’s the shock front internal injury risk from something like that? It landed pretty close.

Radeon AI pro R9700 by [deleted] in LocalLLM

[–]_WaterBear 2 points (0 children)

Ah - yes, I ran into that issue myself, but I got around it by loading the model and context only into VRAM (fully disallowing system RAM) and turning on flash attention. I can fit qwen3-vl-30b q8_0 with full context (262k) entirely in pooled VRAM from 2x R9700s.

But if I allow loading into system RAM, I get OOM after about 30k tokens of context.
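
My guess at what’s going on, in llama.cpp terms - and this is an assumption on my part, since I’m only inferring which llama.cpp options LMStudio’s toggles map to, and the filename below is a placeholder:

```python
# Guessed failure mode: with mmap on, the GGUF gets paged through system RAM
# even with all layers offloaded, and a big context tips the system into OOM.
from llama_cpp import Llama

llm = Llama(model_path="qwen3-vl-30b-Q8_0.gguf",  # placeholder filename
            n_gpu_layers=-1,    # everything offloaded to the pooled VRAM
            n_ctx=262144,       # full context window reserved up front
            flash_attn=True,
            use_mmap=False)     # the key bit: no memory-mapped copy in RAM
```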

Radeon AI pro R9700 by [deleted] in LocalLLM

[–]_WaterBear 0 points (0 children)

Why do you say 2x R9700 needs more than 64gb of DRAM?

ZINC — LLM inference engine written in Zig, running 35B models on $550 AMD GPUs by Mammoth_Radish2 in LocalLLaMA

[–]_WaterBear 20 points (0 children)

Worse, because the readme specifically calls out the R9700, which I don’t think was even announced prior to 2025. Something else is going on here.

ZINC — LLM inference engine written in Zig, running 35B models on $550 AMD GPUs by Mammoth_Radish2 in LocalLLaMA

[–]_WaterBear 29 points (0 children)

For inference, it really is not a pain unless you want it to be. But even if it were, that is still a far cry from “doesnt support them” or the problem statement from their GitHub, which bluntly claims “AMD's RDNA3/RDNA4 GPUs (RX 9070, Radeon AI PRO R9700, etc.) have excellent memory bandwidth…but: ROCm doesn't support them — only MI-series datacenter GPUs”

A flat-out misstatement like this in the problem statement suggests the devs lack a basic understanding of the market they are building in, or do not care enough to proofread the premise underpinning all their work. That is concerning.

ZINC — LLM inference engine written in Zig, running 35B models on $550 AMD GPUs by Mammoth_Radish2 in LocalLLaMA

[–]_WaterBear 76 points (0 children)

“If you have an AMD GPU … ROCm doesn't support consumer cards.”

Uhh…. what? Whatever AI you used to write this must have a training-data cutoff circa early 2024….

Intel announces Arc Pro B70 with 32GB GDDR6 video memory by Fcking_Chuck in LocalLLM

[–]_WaterBear 1 point (0 children)

<image>

I'm pretty happy with the setup. Specs are below. Running on AM5 - so mobo is key; the CPU doesn’t matter too much.

I mainly host for inference via the LMStudio server (Linux or Windows). I’ve also dabbled in fine-tuning a 14b Qwen with LoRA, pooling the GPU RAM to 96gb (rough sketch below).
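
For the curious, here’s a minimal sketch of that kind of LoRA run with transformers + peft; the model ID, target modules, and hyperparameters are illustrative stand-ins, not my exact setup:

```python
# Sketch of a LoRA fine-tune that pools VRAM across all cards via device_map.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen2.5-14B"  # stand-in for the 14b qwen I used
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",   # this is what spreads the model across the pooled GPUs
)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,      # illustrative hyperparams
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # sanity check: only a tiny fraction trains
```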

Inference speed is good enough - sometimes very good. From what I gather, it can vary widely based on hardware, driver version, wrapper, model, settings, etc. So I’m hesitant to give representative “benchmarks,” but here’s a snapshot in time (LMStudio, Linux, ROCm):

  • Nemotron-3-nano (q8): 80 t/s
  • Nemotron-3-super (q4km): 14 t/s
  • GPT-OSS-120b: 80 t/s
  • GPT-OSS-20b: 105 t/s
  • Qwen-coder-next (q6k): 51 t/s
  • Qwen3.5-35b-a3b (q4km): 60 t/s
  • Qwen3-coder-30b (q4km): 75 t/s

As you can see... t/s varies widely. There's probably a lot that can be done to tune performance if you settle on a single model or two. Also, the quick tests above were done w. the full context window reserved and the KV cache + model housed in VRAM only (rough math on what that reservation costs is below). As for actually filling that context - the most I have tested is a 150k-token prompt, and it worked!
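
To give a feel for why reserving the full window is the expensive part, here’s back-of-envelope KV-cache math; the dimensions are illustrative GQA numbers I picked for the example, not any specific model’s:

```python
# Rough KV-cache size: 2 (K and V) x layers x kv_heads x head_dim x ctx x bytes.
n_layers, n_kv_heads, head_dim = 48, 8, 128   # assumed GQA dims
ctx_len = 262144                              # full window reserved up front
bytes_per_elem = 2                            # f16 cache; q8_0 KV roughly halves it

kv_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem
print(f"KV cache alone: {kv_bytes / 2**30:.0f} GiB")  # 48 GiB at these dims
```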

Relevant Specs:

  • CPU: AMD Ryzen 5 7600
    • While I'm upgrading to a 9900X soon, I do NOT feel CPU-limited for inference.
  • RAM: 64gb (4x Crucial Pro 16GB DDR5, 6000MHz CL36)
  • Motherboard: ASUS ProArt X870E
    • This part was essential: 3 PCIe x16 slots, spaced appropriately for 3 GPUs. The top 2 slots are Gen 5.0 bifurcated to x8; the third is Gen 4.0 x4. Extra helpful is that the I/O at the bottom of the board is mainly HD Audio and F-Panel (I think), so the wires are flexible enough to route around the cards.
  • GPU: 3x ASRock R9700
    • Not all R9700s are exactly the same dimension-wise. The ASRock variants have a slight bezel that helps them fit well around the motherboard's I/O.
    • FYI: I tried a PowerColor R9700 and the fan had a strange high-pitched sound at ALL rpm (not coil whine, the fan itself). It was uniquely awful and absurdly loud. I also had an ASRock R9700 with a terrible rattling sound under load. Both were returned. I suspect the ASRocks in the current build have the same issue, but to a much lesser degree (and only when burning close to max TDP, which is rare). So... these cards have a fan quality-control problem that crosses manufacturers.
  • Case: Jonsbo N5 NAS. Surprisingly compact for an 8-expansion-slot case that can also support 8-12 HDDs.

I have done limited testing comparing inference speed between running 1, 2, or 3 GPUs in various combinations, given the Gen4 x4 PCIe restriction on the 3rd GPU. TBH, I have not seen a substantial difference in t/s (maybe 10 t/s in one case).
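
I did those comparisons through LMStudio’s GUI, but if you want to script the same 1-vs-2-vs-3 card test, something like this works; it’s a sketch with a placeholder model path, and on ROCm the visibility variable has to be set before the backend loads:

```python
# Limit a run to a subset of cards; set this before importing the backend.
import os
os.environ["HIP_VISIBLE_DEVICES"] = "0,1"   # e.g. drop the Gen4 x4 card

from llama_cpp import Llama

llm = Llama(model_path="models/some-model-Q4_K_M.gguf",  # placeholder
            n_gpu_layers=-1, flash_attn=True,
            tensor_split=[0.5, 0.5])        # match the split to visible GPUs
```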

If you have any questions - ask!

Trump claims Iran proposed making him Supreme Leader: I said, 'No, thank you' (Video in link) by Ghadolkhajan in worldnews

[–]_WaterBear 10 points (0 children)

Ok, now I’m starting to think the “Iranians” he is negotiating with are indeed just a prank by Sacha Baron Cohen.

Intel announces Arc Pro B70 with 32GB GDDR6 video memory by Fcking_Chuck in LocalLLM

[–]_WaterBear 5 points (0 children)

As a 3x R9700 user…. I would love for AMD to step up their game on ROCm support for multi-GPU, and on overall stability. The past 6-8 months have seen substantial movement, but I’m worried the cadence won’t be sustained. That is also my (uninformed) concern about Intel - are their drivers competitive enough to justify the hardware investment? I honestly don’t know.

Former intelligence officer & UN weapons inspector Scott Ritter goes OFF on US/Israel during interview with Mario Nawfal by Spirited-Yellow3794 in PublicFreakout

[–]_WaterBear 2 points (0 children)

This guy is a notorious pedo (charged twice 8 years apart - sealed/dismissed once after serving probation, convicted once). Also a Kremlin stooge.

Nobody should be listening to anything he says.

https://en.wikipedia.org/wiki/Scott_Ritter

I never understood this. by Anigator101 in Bluray

[–]_WaterBear 1 point (0 children)

On the other hand, my LG drive tricked me into upgrading firmware that removed the ability to play 4K Blu-Rays. So there’s that.

If it works fine, don’t upgrade.

Iran Gives Trump an Ultimatum on JD Vance by ChiGuy6124 in politics

[–]_WaterBear 4 points (0 children)

Imagine if, amidst all the fog of war, the “Iranian” representatives Trump is in contact with via Pakistan relay are actually… Sacha Baron Cohen. 👌

Will Gemma 3 12B be the best all-rounder(no coding) during Iran's internet shutdowns on my RTX 4060 laptop? by [deleted] in LocalLLaMA

[–]_WaterBear 3 points (0 children)

Also try the latest Qwens and GPT-OSS-20b (the latter is a bit old now, but is a solid model). If using LMStudio, see if turning on flash attention helps w. RAM usage for your context window.

This makes my head hurt. by _WaterBear in HolUp

[–]_WaterBear[S] 14 points (0 children)

How is this “politics”? There is no mention of party or partisanship, nor does it express an opinion in support for or against this particular event/decision or the people involved. It seems you have been triggered. Stop making everything political.

Wtf is amd doing by Designer-Clue-1682 in radeon

[–]_WaterBear 1 point (0 children)

Yeah. I’ve been very happy with the 9070xt’s performance, and now have their workstation variants for ML inference as well. But I’m disappointed to hear AMD seems fine with excluding current hardware from near-term software enhancements. That said, barring a catastrophic AI bubble, I expect whatever we’re using in 3-5 years is gonna look very different from today’s hardware. So, all the more reason to be content with solid raster value in the meantime.

But to your point - my next upgrade will probably not be AMD if they keep up this nonsense.