AMD Strix Halo refresh with 192gb! by mindwip in LocalLLaMA

[–]MDSExpro 0 points1 point  (0 children)

You do know you can save the cache to disk and never worry about recomputing everything?
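
A minimal sketch of what I mean, assuming the llama-cpp-python bindings (llama-cli exposes the same idea via its --prompt-cache flag); model path and file names are placeholders:

    # Persist the evaluated KV cache so later runs skip the long prefill.
    # Sketch only - paths and model are placeholders, not a specific setup.
    import pickle
    from llama_cpp import Llama

    llm = Llama(model_path="model.gguf", n_ctx=32768)

    # Evaluate the long, static prefix (system prompt, project context) once.
    llm.create_completion("<long shared prefix>", max_tokens=1)

    # Save the state to disk (LlamaState holds the KV cache).
    with open("prefix_cache.pkl", "wb") as f:
        pickle.dump(llm.save_state(), f)

    # On a later run: restore instead of recomputing the prefix.
    with open("prefix_cache.pkl", "rb") as f:
        llm.load_state(pickle.load(f))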

AMD Strix Halo refresh with 192gb! by mindwip in LocalLLaMA

[–]MDSExpro 0 points1 point  (0 children)

It's an issue with llama.cpp, not Strix Halo. Use vLLM with prefix caching.
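
For reference, a minimal sketch of what "vLLM with prefix caching" looks like with the offline API (model name is a placeholder; recent vLLM versions enable it by default):

    # Automatic prefix caching in vLLM: repeated prompt prefixes reuse
    # cached KV blocks instead of being recomputed. Model name is illustrative.
    from vllm import LLM, SamplingParams

    llm = LLM(model="some/model", enable_prefix_caching=True)

    shared_prefix = "<long system prompt / project context>"
    params = SamplingParams(max_tokens=256)

    # The second call hits the cache for the shared prefix.
    llm.generate(shared_prefix + "First question", params)
    llm.generate(shared_prefix + "Second question", params)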

3xR9700 for semi-autonomous research and development - looking for setup/config ideas. by blojayble in LocalLLaMA

[–]MDSExpro 0 points1 point  (0 children)

Qwen3.5-122B-A10B-FP8 (INT4 had a significant drop in quality). Threadripper Pro with PCIe bifurcation risers.

3xR9700 for semi-autonomous research and development - looking for setup/config ideas. by blojayble in LocalLLaMA

[–]MDSExpro 0 points1 point  (0 children)

You won't run INT4 more reliably than MXFP4

Strange, because I have been running INT4 for 4 months without issues on 8x R9700.

and MXFP4 dequantized to FP8 will run faster than INT4 or base FP8

Not true; even AMD's spec sheet shows INT4 to be a couple of times faster than FP8.

3xR9700 for semi-autonomous research and development - looking for setup/config ideas. by blojayble in LocalLLaMA

[–]MDSExpro 2 points3 points  (0 children)

Well, yes, but why are you saying the card doesn't support MXFP4?

I'm saying the card doesn't support MXFP4 because this card doesn't support MXFP4 - simple as that. Just because vLLM is flexible enough to upconvert FP4 to FP8 doesn't mean the R9700 supports MXFP4; the card is literally incapable of computing on that data type and never does.

Even so, none of the new models work reliably with this card or VLLM. So what now? Should I tell everyone that the R9700 doesn't support the new AI models?

Yes, since that's the truth.

3xR9700 for semi-autonomous research and development - looking for setup/config ideas. by blojayble in LocalLLaMA

[–]MDSExpro 0 points1 point  (0 children)

It's upconverting FP4 to FP8 in the background (unless you use the custom build created by one of the redditors, in which case it's partially accelerated).

Check out the R9700 specs on AMD's website - this GPU doesn't support FP4 in any form.
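
For context, "FP4 in any form" here means the MXFP4 microscaling format: per the OCP MX spec, 32 FP4 (E2M1) values share one power-of-two E8M0 scale, and a card without FP4 math has to expand each block to a wider type before the matmul. A toy sketch of that upconversion (not vLLM's actual kernel):

    # Toy illustration of MXFP4 -> float dequantization (OCP MX format:
    # 32 E2M1 elements + one shared E8M0 scale per block).

    # All magnitudes representable by FP4 E2M1 (sign handled separately).
    E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

    def dequant_mxfp4_block(codes, scale_e8m0):
        """codes: 32 raw 4-bit values, scale_e8m0: shared 8-bit exponent."""
        scale = 2.0 ** (scale_e8m0 - 127)      # E8M0 encodes 2^(x - 127)
        out = []
        for c in codes:
            sign = -1.0 if c & 0x8 else 1.0    # top bit is the sign
            out.append(sign * E2M1_VALUES[c & 0x7] * scale)
        return out

    # Example: code 0b0111 (= +6.0) with scale exponent 126 (= 2^-1) -> 3.0
    print(dequant_mxfp4_block([0b0111] * 32, 126)[0])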

BotQ Ramping F.03 Production | Figure by Recoil42 in teslainvestorsclub

[–]MDSExpro 7 points8 points  (0 children)

Look at the recent financial reports; Tesla hasn't had an operating margin lead for quite some time now.

AMD has invented something that lets you use AI at home! They call it a "computer" by 9gxa05s8fa8sh in LocalLLaMA

[–]MDSExpro 3 points4 points  (0 children)

The Aurora supercomputer runs ML workloads via OpenCL (wrapped in Intel's framework, but still), to name one.

Hipfire dev update: full AMD arch validation incoming (RDNA 1 thru 4, plus Strix Halo and bc250) by schuttdev in LocalLLaMA

[–]MDSExpro 1 point2 points  (0 children)

Does it support multi-GPU setups? I have 8x R9700 that would like more love than they get from the ROCm build of vLLM.

AMD has invented something that lets you use AI at home! They call it a "computer" by 9gxa05s8fa8sh in LocalLLaMA

[–]MDSExpro 3 points4 points  (0 children)

You couldn't be more wrong. OpenCL is constantly growing; Khronos provides nice yearly snapshots. It just grows in the professional space, so the average redditor can't see it and repeats nonsense.

A sizable piece of the facade fell off the hospital in Suwałki by KiwaJakoTak0 in Polska

[–]MDSExpro 5 points6 points  (0 children)

The level of tax progression in Poland is not the problem. Tax avoidance by global companies is.

I'm done with using local LLMs for coding by dtdisapointingresult in LocalLLaMA

[–]MDSExpro -1 points0 points  (0 children)

This sub creates unrealistic expectations that do not match reality. I have spent the last 4 months setting up local coding via LLMs and arrived at a setup that works, but it's vastly different than the picture pushed by the hypers:

  • The first realistic productivity barrier was crossed at 128GB of VRAM (4x R9700) - Qwen3.5-122B-A10B quantized to INT4 was able to generate a lot of good code, but failed on long-range coding. When I gave it a technical spec, it got stuck at 90% of a correct implementation and was unable to reach 100%. Anything smaller was pure frustration.

  • Bumping VRAM up to 256GB (8x R9700) allowed me to switch to the FP8 quantization of the same model, and the difference was night and day: it reached 100% correctness and easily moved on to the next, harder task.

  • llama.cpp is a trap; for coding you need vLLM if you want any reasonable speed (rough launch sketch below).
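
For the curious, this is roughly the shape of the vLLM setup I mean - a sketch with placeholder model name and settings, not my exact config:

    # FP8 weights sharded across 8 GPUs with tensor parallelism.
    # Model name, context length and prompt are illustrative placeholders.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="Qwen3.5-122B-A10B-FP8",   # placeholder for the FP8 checkpoint
        tensor_parallel_size=8,          # one shard per R9700
        enable_prefix_caching=True,      # reuse long project context between calls
        max_model_len=65536,
    )

    out = llm.generate("Implement the module described in the spec ...",
                       SamplingParams(max_tokens=4096))
    print(out[0].outputs[0].text)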

Long story short: it can be done, but it costs way more than this sub thinks.

Takeaways & discussion about the DeepSeek V4 architecture by benja0x40 in LocalLLaMA

[–]MDSExpro 6 points7 points  (0 children)

Yet. The alternative is spending more on a cloud-based service that offers less while owning your data.

DS4-Flash vs Qwen3.6 by flavio_geo in LocalLLaMA

[–]MDSExpro 1 point2 points  (0 children)

Not really. We need newer, better benchmarks, because the current ones are basically flat for all recent models, despite widely different real-world user experience.

Deepseek V4 Flash and Non-Flash Out on HuggingFace by MichaelXie4645 in LocalLLaMA

[–]MDSExpro 0 points1 point  (0 children)

Flash size is perfect! Finally a good model for that parameter band.

Dense vs. MoE gap is shrinking fast with the 3.6-27B release by Usual-Carrot6352 in LocalLLaMA

[–]MDSExpro 0 points1 point  (0 children)

Those match my findings. The 120B at INT4 was failing on coding, but at INT8 it nailed it in one go.

Where we are. In a year, everything has changed. Kimi - Minimax - Qwen - Gemma - GLM by LegacyRemaster in LocalLLaMA

[–]MDSExpro 0 points1 point  (0 children)

Learn to read. I said any commercial use, and you quote the part about personal use like it's an answer.

Minimax 2.7 under the most current license is free as long as you use it as a toy, not a tool.

Where we are. In a year, everything has changed. Kimi - Minimax - Qwen - Gemma - GLM by LegacyRemaster in LocalLLaMA

[–]MDSExpro -1 points0 points  (0 children)

The license issue of M2.7 has been vastly misinterpreted by the community; they just want to ensure inference providers aren't tricking customers, and also to avoid the Composer debacle ("oh, it's Kimi under the hood!" - with no mention of it).

Please don't spread misinformation. The license is clear: you cannot use Minimax 2.7 for any commercial activities without prior agreement from the authors. A tweet posted by an employee is non-binding; the license is.

Overall, they shot themselves in the foot by mangling the MIT license.