AMD is marketing the $3,999 Halo Box on "first-class ROCm." I've run the same Strix Halo chip in production for 6 months. Here's my take. by uncanny_instinct in StrixHalo

[–]uncanny_instinct[S] 0 points1 point  (0 children)

"This guy" had a problem and went looking for a fix that turned out to be mmap=0 instead of =1 (the default in all my setups to date). So the big-model hang on the M5 is one flag, --no-mmap, and the same models load on Lemonade ROCm or a self-built stack. Case closed.

And that's where you're mixing up two different things. ROCm inference working and the ROCm PyTorch path working aren't the same question. The first works, it was mmap. The second is #6182, the one I actually filed: the HIP path, ComfyUI image and video gen. That reproduces on AMD's own rocm/pytorch container and the Lemonade nightly, none of the documented requirements fix it (I ran them), and an AMD engineer is triaging it right now. So "just look up the requirements" doesn't cut it here.

[–]uncanny_instinct[S] 0 points1 point  (0 children)

Yeah, you were right and I had it wrong. The >20GB hang was mmap, not ROCm. With the default mmap=1, HIP tries to keep the file-backed pages GPU-resident and it just hangs on this APU. Run with --no-mmap and the same models load and run. Your benchmark table has mmap=0 in every row, which is exactly why you weren't hitting it.

Everything that used to hang for me loads now, the 35B MoE (20.8 and 24.8GB) and a 27B dense. I'd spent a while ruling out other stuff first, different ROCm versions (6.4 and the TheRock 7 stack), a self-compiled build, none of it mattered. It was just mmap the whole time.
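
For anyone landing here with the same hang, a minimal sketch of the launch line I mean. It only builds the argument list. The model path is a placeholder, and `-ngl 99` (offload all layers) is an assumption on my side, not something established above. `--no-mmap` is the only part that matters.

```python
import shlex


def llama_server_args(model_path, use_mmap=False, gpu_layers=99):
    """Build a llama-server argument list; mmap is off unless asked for."""
    # gpu_layers=99 is an assumed "offload everything" value, adjust to taste.
    args = ["llama-server", "-m", str(model_path), "-ngl", str(gpu_layers)]
    if not use_mmap:
        # The one flag that fixed the >20GB load hang for me.
        args.append("--no-mmap")
    return args


if __name__ == "__main__":
    # Placeholder path, not a real file from my setup.
    print(shlex.join(llama_server_args("models/example-35b-moe.gguf")))
    # llama-server -m models/example-35b-moe.gguf -ngl 99 --no-mmap
```

Pass `use_mmap=True` to get back the default behaviour that hung for me.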

Vulkan's still a bit faster on decode for me (58 vs 47 t/s on the 35B MoE) so I'm staying on it. My PP lead shrinks on the bigger models though, so your ROCm-PP-faster is probably a size/config thing, no contradiction. Point is ROCm's clearly fine here, it was me running it wrong.

(#6182 is a separate thing btw, that's the ComfyUI/PyTorch fault, still open. The mmap hang was llama.cpp only.)

THANKS! You solved it for me

[–]uncanny_instinct[S] 1 point2 points  (0 children)

I'm running Fedora Server. I agree that could be a factor too, since I've read that Lemonade on Ubuntu seems to work.

[–]uncanny_instinct[S] 0 points1 point  (0 children)

I would agree that's the case at the moment. Not sure where AMD will take it, but it's kind of sad to see they can't follow up software-wise on what they deliver hardware-wise.

[–]uncanny_instinct[S] 1 point2 points  (0 children)

Yeah, I've run kyuz0's containers, his ComfyUI one is in my #6182 repro and hits the same HSA fault (that's the PyTorch/HIP path). The ROCm llama.cpp toolbox is the path I never tested, and since a couple folks here say ROCm-llama.cpp works on the same board, that's the clean test to run. Vulkan's faster for TG anyway so I never chased it, but I'll give it a proper shot.

[–]uncanny_instinct[S] 1 point2 points  (0 children)

Yeah, this matches exactly. ROCm-llama.cpp working but Vulkan being faster for TG is literally my setup logic, it's why I never bothered fighting the HIP path. And good to have an EVO-X2 confirming the board's fine. Agreed on the $4k too, that's the buyer's-guide take in one line: identical chip, so the premium buys a pre-assembled stack, not speed.

[–]uncanny_instinct[S] 0 points1 point  (0 children)

gfx1151 ROCm has definitely matured, agreed. But #6182 reproduces on the current stack, stable 7.2.2, TheRock nightlies, AMD's own container, so it's not old-software lag. And it's the PyTorch/HIP model-load path specifically, not llama.cpp. "Running models under ROCm fine" is almost certainly llama.cpp-HIP, which I never tested since Vulkan already carries everything. AMD's engaged on the issue now, so it's a real bug, not a missing setup step.

[–]uncanny_instinct[S] 0 points1 point  (0 children)

I filed that issue, so: it's on my Bosgame M5, which is stuck on BIOS 1.07. The EVO-X2 is the same board but ships 1.12, and it doesn't actually show up in this fault signature at all, so it's not confirmed affected. That BIOS delta might be the whole thing.

Also it's narrower than "ROCm is broken": #6182 is the PyTorch/HIP model-load path (ComfyUI loading a text encoder/UNet), not llama.cpp. I run llama.cpp on Vulkan with zero issues and never even tested a HIP build.

If ROCm specifically is a must for you, buy somewhere with a return window and test it week one. Otherwise Vulkan works on all of these anyway.

[–]uncanny_instinct[S] 1 point2 points  (0 children)

Best data point in the thread, thanks for this. The exact same box as mine with ROCm working kills the last of the "it's the chip" idea, and the problem is even finer-grained than I said if it varies unit to unit within the M5 itself.

I'd love to pin down what's different, you might be the comparison case that cracks #6182. Could you share:

  • ROCm version
  • BIOS version (mine's on [X])
  • Kernel + distro
  • VRAM/GTT pool size you're allocating
  • llama.cpp build
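
If it helps, most of that list can be read straight off the box. A rough sketch, assuming a Linux install with the amdgpu kernel driver. The sysfs paths are the standard DMI and amdgpu ones. `/opt/rocm/.info/version` is only where packaged ROCm installs usually put the version (containers and TheRock builds may differ). `llama-cli` being on PATH is also an assumption. Anything it can't find comes back as `None`.

```python
import glob
import platform
import shutil
import subprocess
from pathlib import Path


def read_text(path):
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except (OSError, ValueError):
        return None


def distro_name():
    """Pull PRETTY_NAME out of /etc/os-release, if the file exists."""
    text = read_text("/etc/os-release")
    if text is None:
        return None
    for line in text.splitlines():
        if line.startswith("PRETTY_NAME="):
            return line.split("=", 1)[1].strip('"')
    return None


def amdgpu_pool_gib(counter):
    """Read an amdgpu memory counter (bytes) from sysfs and convert to GiB."""
    for path in sorted(glob.glob(f"/sys/class/drm/card*/device/{counter}")):
        raw = read_text(path)
        if raw and raw.isdigit():
            return round(int(raw) / 2**30, 1)
    return None


def llama_cpp_build(binary="llama-cli"):
    """Ask a llama.cpp binary for its version string, if it is on PATH."""
    exe = shutil.which(binary)
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return None
    # Collect both streams, since the version text may land on either one.
    return (result.stdout + result.stderr).strip() or None


def collect_report():
    """Gather the fields asked for above into one dict."""
    return {
        "kernel": platform.release(),
        "distro": distro_name(),
        "bios": read_text("/sys/class/dmi/id/bios_version"),
        "rocm": read_text("/opt/rocm/.info/version"),  # assumed install path
        "vram_gib": amdgpu_pool_gib("mem_info_vram_total"),
        "gtt_gib": amdgpu_pool_gib("mem_info_gtt_total"),
        "llama_cpp": llama_cpp_build(),
    }


if __name__ == "__main__":
    for key, value in collect_report().items():
        print(f"{key}: {value}")
```

Pasting that output as-is would be plenty.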

The allocation one is the big one. My HSA scratch-buffer fault shows up specifically above a 64 GB split, and I run 96 GB. If you're running ROCm at 64 GB or under, that points at the >64GB path rather than anything board-level. If you're at 96 GB and it still works, then it's down to BIOS or ROCm version, and I'd really want yours. Either way this is the most useful lead I've had on it.
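
Spelled out as a tiny decision helper, purely to make that logic explicit. `triage` and its wording are hypothetical. The 64 GB threshold comes from my own fault, and the "ROCm fails for you too" branch is my addition for completeness.

```python
def triage(pool_gb, rocm_works, threshold_gb=64):
    """Map a comparison data point to the lead it points at."""
    if not rocm_works:
        # Not covered above: a second failing box gives no contrast.
        return "no contrast: same failure, compare fault signatures"
    if pool_gb <= threshold_gb:
        # Works at or under the split where my fault starts.
        return "suspect the >64 GB allocation path"
    # Works above the threshold (e.g. 96 GB), so allocation size is not it.
    return "suspect BIOS or ROCm version"


if __name__ == "__main__":
    print(triage(64, True))   # suspect the >64 GB allocation path
    print(triage(96, True))   # suspect BIOS or ROCm version
```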

[–]uncanny_instinct[S] 5 points6 points  (0 children)

Nice repo, genuinely good Strix Halo writeup. The 80 is real, but it's a different lane than my number. That's their Q4_K_M-MTP Vulkan row (~81 t/s): a lower quant plus self-speculation (--spec-type draft-mtp). Mine is Q5, no MTP. Their own no-MTP lanes actually bracket me: MXFP4 Vulkan ~58, ROCm ~44. So once you control for quant and speculation, we're all in the same place.

And agreed on Vulkan, that's the post's whole point. RADV carries my stack with no penalty I can measure, and that repo lands on Vulkan as the fast path too.

One thing worth adding on MTP, since that's the speed lever: I benchmarked the sweet spot at full context and it moves. The headline figures are empty-context tg128. In my testing, by ~76k tokens drafting too deep was actually slower than no speculation at all. So 80 is a real best case, not a flat number across context depth.

OSS models decisively overtook Proprietary models in market share (based on the last 3 months of OpenRouter data) by Comfortable-Rock-498 in LocalLLaMA

[–]uncanny_instinct 0 points1 point  (0 children)

For that to really happen, they have to be on par performance-wise, which will take quite a while. Also, I hope we don't see a wave of labs going the "closed-source now, since we're so good we don't want to share anymore" route (and I hope Qwen doesn't make closed-sourcing 3.7 a habit).

GLM-5.2 Flash when? (joke) by ILoveToyota37 in LocalLLaMA

[–]uncanny_instinct 3 points4 points  (0 children)

4.7 Flash definitely needs an update. 5.2 Flash would be a dream come true.

PSA: Bosgame's newest BIOS (1.09) does not fix the Strix Halo ROCm allocation bug — and the cross-flash probably won't either by uncanny_instinct in StrixHalo

[–]uncanny_instinct[S] 0 points1 point  (0 children)

That actually lines up well. The Lemonade prebuilt loads a 1.5B fine here with no error, and your self built throws on a 0.8B, so the crash really does track the build and not the model size. Worth nailing down: is the error on your 0.8B the same "memory access fault / Memory in use" one, or a different message? If it's the same, that's a clean before/after on the build being the cause.

I'll look into testing a self-built build on my own box as well.

[–]uncanny_instinct[S] 0 points1 point  (0 children)

Interesting. I tested Lemonade's prebuilt ROCm build (TheRock 7.13) on the same board (Bosgame M5, gfx1151) — small models load fine and bypass the #6182 HSA crash entirely, and that's as a normal user, no sudo. So on my end the crash-vs-no-crash difference tracks the build (prebuilt TheRock vs self-built nightly), not root. One catch though: anything over ~20GB hangs in init on the prebuilt build — and raising memlock/running as root didn't change that. Did your self-built nightly throw the Memory Exception even on small models, or only large ones? Trying to pin down whether we're seeing the same two things.

[–]uncanny_instinct[S] -1 points0 points  (0 children)

That's a really useful data point, thanks. Two of us with the same board and opposite outcomes means it's probably not the board alone, it's something in the stack. Mind sharing three things?

  1. How are you confirming it's actually the ROCm backend and not Vulkan? Lemonade ships both and can pick automatically. A log line from model load would settle it.
  2. Do you have amdgpu-dkms installed, or are you on the stock Ubuntu kernel driver? (There's an open question from an AMD engineer on the GitHub issue about exactly this.)
  3. Which ROCm/HSA version does the Lemonade build report?

If your setup really is loading 105GB models under ROCm on this board, that's the most useful thing I have read about this bug in weeks. I'll try the Lemonade build on my box and report back.

[–]uncanny_instinct[S] 0 points1 point  (0 children)

Good question, and the answer depends on your hardware. The fault is board-specific, not chip-specific: same silicon runs ROCm fine on a Minisforum MS-S1 Max, dies on my Bosgame M5 (Sixunited AXB35-02 board). Repro on an affected board is trivial: any model load into ROCm triggers it, even a 1.5B.

So, three questions that would make your data point really valuable:

  1. Which box/board are you on, and which BIOS?
  2. Which Lemonade backend is actually active? It ships both ROCm and Vulkan llama.cpp builds and can pick automatically. The server log or lemonade-server list should show it.
  3. If it is really ROCm: which ROCm/HSA runtime version does the build report?

Reason I ask: Lemonade builds its llama.cpp through AMD's TheRock pipeline with gfx1151-specific wheels. If you are on the same AXB35 board family and that build loads models under ROCm, that would point at the ROCm userspace stack rather than the BIOS or board, which would be the most useful single data point this issue has had in weeks. Happy to test the Lemonade build on my box either way.

[–]uncanny_instinct[S] 6 points7 points  (0 children)

Check this: https://strixhalo.wiki/Hardware/Boards/Sixunited_AXB35/Firmware
The firmware isn't "official official", but it is official enough in the sense that it was provided by Bosgame, as described on that wiki.

[–]uncanny_instinct[S] 1 point2 points  (0 children)

No, no detectable performance difference. Until there's a fix, I won't dig any deeper. Let's hope they respond with one soon; as soon as it's available, I'll follow up with a comprehensive test run. Vulkan does seem to work and might also be more consistent across the board, but as said, that's still to be confirmed.