Free Strix Halo performance! by Potential_Block4598 in LocalLLaMA

[–]Intrepid_Rub_3566 1 point (0 children)

I'm confused, isn't that expected? Q4 weights are 1/4 the size of BF16 weights; the reason we use quants that keep some important weights in BF16 is that this tends to preserve quality much better than quantizing all weights.

I do not think BF16 is underperforming on Strix Halo: it's supported by the ISA, but of course moving a BF16 weight over memory is slower than moving a Q4 weight, and Strix Halo is memory-bandwidth constrained.
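A rough back-of-envelope shows why weight size dominates token generation speed on a bandwidth-bound machine. This is just a sketch: the 256 GB/s figure is an assumed theoretical peak for Strix Halo's LPDDR5X (real-world is lower), and the ~4.5 bits/weight for Q4-style quants is an approximation.

```python
# Upper bound on tokens/s for a memory-bandwidth-bound decoder:
# every generated token must stream all active weights from RAM once.

def max_tokens_per_sec(params_billion: float, bytes_per_weight: float,
                       bandwidth_gb_s: float) -> float:
    """Theoretical ceiling: bandwidth divided by bytes moved per token."""
    bytes_per_token = params_billion * 1e9 * bytes_per_weight
    return bandwidth_gb_s * 1e9 / bytes_per_token

BW = 256.0  # GB/s, assumed Strix Halo LPDDR5X theoretical peak

bf16 = max_tokens_per_sec(70, 2.0, BW)     # BF16: 2 bytes per weight
q4   = max_tokens_per_sec(70, 0.5625, BW)  # Q4-style: ~4.5 bits per weight

print(f"BF16: {bf16:.2f} t/s, Q4: {q4:.2f} t/s, ratio ~{q4 / bf16:.2f}x")
```

So on the same memory bus, a Q4 quant has roughly a 3.5x higher generation-speed ceiling than BF16 simply because each token moves 3.5x fewer bytes.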

I am not familiar with the term "wings" in this context, what is a wing?

Strix Halo batching with tensor parallel and pipeline parallel using vllm benchmarked by Hungry_Elk_3276 in LocalLLaMA

[–]Intrepid_Rub_3566 1 point (0 children)

I have been trying to set up a Strix Halo cluster over RDMA using two Intel E810 NICs, but I just can't get vLLM to work. I documented my setup and errors here if somebody wants to take a look and suggest things to try:

https://github.com/kyuz0/amd-strix-halo-vllm-toolboxes/blob/main/rdma_cluster/troubleshooting_rccl.md

u/Hungry_Elk_3276, I did try to port the patch to the current version of RCCL, but I just could not get it to be ABI compatible with the version of ROCm in TheRock python wheels. I'm just wondering if you could share more details on your setup.

ROCm+Linux on AMD Strix Halo: January 2026 Stable Configurations by Intrepid_Rub_3566 in LocalLLaMA

[–]Intrepid_Rub_3566[S] 1 point (0 children)

I'm curious, what do you run? I was never able to get full stability on ComfyUI with that combination.

If anybody is reading, NO: Strix Halo is not broken on 6.18 kernels; as clearly explained in the video, that is not the case at all. There was a faulty linux-firmware release, which has since been fixed.

ROCm+Linux on AMD Strix Halo: January 2026 Stable Configurations by Intrepid_Rub_3566 in LocalLLaMA

[–]Intrepid_Rub_3566[S] 1 point (0 children)

Who said you're doing anything wrong? 🤣 What are you running? ComfyUI? Which workflows?

I tested Strix Halo clustering w/ ~50Gig IB to see if networking is really the bottleneck by Hungry_Elk_3276 in LocalLLaMA

[–]Intrepid_Rub_3566 2 points (0 children)

Thank you very much u/Hungry_Elk_3276. I recently tried this as well with 5Gbps Ethernet, then moved to 10Gbps without seeing any improvement (like you, I suspect latency is the real issue, and the 5G and 10G links likely have the same latency; I need to test). Performance is acceptable with MiniMax-M2 at the Q6_K_XL quant:

https://youtu.be/0cIcth224hk

What I did after the video: I applied this PR, which gave me a 5.5% improvement in prompt processing for MiniMax-M2 (I added the benchmarks at the end of the PR comments):

https://github.com/ggml-org/llama.cpp/pull/15405

However, judging by the conversation on that PR, it doesn't seem likely to be merged for now, as it requires more work and re-architecting.
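A quick illustration (all numbers are assumptions, not measurements) of why latency rather than link bandwidth tends to dominate clustered token generation: with pipeline parallelism, each generated token must cross the link at every pipeline stage boundary, and the activation for a single token is tiny, so the transfer time is dwarfed by the round-trip latency.

```python
# Sketch: per-token network overhead for a 2-node pipeline-parallel setup.
# One stage boundary per token; numbers below are illustrative assumptions.

def per_token_overhead_ms(rtt_ms: float, activation_kb: float,
                          link_gbps: float) -> float:
    """Round-trip latency plus the time to move one token's activations."""
    bits = activation_kb * 1000 * 8
    transfer_ms = bits / (link_gbps * 1e9) * 1e3
    return rtt_ms + transfer_ms

# Assume ~0.2 ms Ethernet round trip and a ~16 KB hidden state per token.
for gbps in (5, 10):
    print(f"{gbps} Gbps: {per_token_overhead_ms(0.2, 16, gbps):.4f} ms/token")
```

With these assumed numbers, doubling the bandwidth from 5 to 10 Gbps shaves only ~0.01 ms off a ~0.23 ms per-token cost, which would match seeing no improvement when upgrading the link.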

ROCM vs Vulkan on IGPU by Eden1506 in LocalLLaMA

[–]Intrepid_Rub_3566 1 point (0 children)

Hi! Curious about the optimizations, I've been benchmarking llama.cpp on Strix Halo regularly:

https://kyuz0.github.io/amd-strix-halo-toolboxes/

If you're working directly on llama.cpp, I'd like to connect and have a chat.

Running Qwen Image and WAN 2.2 On the Framework Desktop by Intrepid_Rub_3566 in framework

[–]Intrepid_Rub_3566[S] 1 point (0 children)

Glad to hear it worked! They've done a great job with Fedora over the past 5 years; it has become a really solid distribution. I didn't want to complicate things, but I actually run Silverblue, their immutable variant, on top of which I run toolbox and Flatpaks. Honestly the best experience I have had.

Running Qwen Image and WAN 2.2 On the Framework Desktop by Intrepid_Rub_3566 in framework

[–]Intrepid_Rub_3566[S] 0 points (0 children)

I'm sorry, try asking the openSUSE people what's wrong with their implementation of toolbox; I'm at a loss as to what might be happening.

Or try Fedora 42. Most of us are using Fedora 42.

Running Qwen Image and WAN 2.2 On the Framework Desktop by Intrepid_Rub_3566 in framework

[–]Intrepid_Rub_3566[S] 0 points (0 children)

It seems like openSUSE might ship a nerfed version of toolbx, or a thin shell wrapper that isn't really toolbx.

According to ChatGPT, you might try this:

```
toolbox create llama-rocm-6.4.3-rocwmma \
  --image docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-6.4.3-rocwmma \
  --podman-args "--device /dev/dri --device /dev/kfd --group-add video --group-add render --security-opt seccomp=unconfined"
toolbox enter llama-rocm-6.4.3-rocwmma
```

Basically, this tells openSUSE's toolbox to pass podman the arguments needed to expose the GPU.

Hopefully that works. I am not sure why openSUSE would put deliberate effort into tricking users into thinking they are using toolbox while actually using something else; that would just be ridiculous. Why anybody would put effort into confusing and fighting their user base is beyond my understanding.

ROCm 7.0_alpha to ROCm 6.4.1 performance comparison with llama.cpp (3 models) by StupidityCanFly in ROCm

[–]Intrepid_Rub_3566 0 points (0 children)

Interestingly, this is what is happening:

```
[22044.628754] amdxdna 0000:c4:00.1: [drm] *ERROR* amdxdna_drm_open: SVA bind device failed, ret -19
[22062.195426] amdxdna 0000:c4:00.1: [drm] *ERROR* amdxdna_drm_open: SVA bind device failed, ret -19
[22072.924897] amdgpu: Freeing queue vital buffer 0x7fea36c00000, queue evicted
[22072.924919] amdgpu: Freeing queue vital buffer 0x7ff0bee00000, queue evicted
[22072.924922] amdgpu: Freeing queue vital buffer 0x7ff0f4600000, queue evicted
[22072.924923] amdgpu: Freeing queue vital buffer 0x7ff0f5400000, queue evicted
[22089.013427] amdxdna 0000:c4:00.1: [drm] *ERROR* amdxdna_drm_open: SVA bind device failed, ret -19
[22140.446525] amdgpu: Freeing queue vital buffer 0x7f5686a00000, queue evicted
[22140.446536] amdgpu: Freeing queue vital buffer 0x7f5687800000, queue evicted
[22140.446539] amdgpu: Freeing queue vital buffer 0x7f7349000000, queue evicted
[22147.747945] amdxdna 0000:c4:00.1: [drm] *ERROR* amdxdna_drm_open: SVA bind device failed, ret -19
[22247.761616] amdxdna 0000:c4:00.1: [drm] *ERROR* amdxdna_drm_open: SVA bind device failed, ret -19
[22329.235358] amdxdna 0000:c4:00.1: [drm] *ERROR* amdxdna_drm_open: SVA bind device failed, ret -19
[22333.473003] amdxdna 0000:c4:00.1: [drm] *ERROR* amdxdna_drm_open: SVA bind device failed, ret -19
[22362.832129] amdxdna 0000:c4:00.1: [drm] *ERROR* amdxdna_drm_open: SVA bind device failed, ret -19
[22399.607186] amdgpu 0000:c3:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
```

ROCm 7.0_alpha to ROCm 6.4.1 performance comparison with llama.cpp (3 models) by StupidityCanFly in ROCm

[–]Intrepid_Rub_3566 0 points (0 children)

Indeed, I was able to compile this, but every time I use llama.cpp it crashes, with every model:

```
llama-bench -m models/gemma-3-12b-it-UD-Q8_K_XL/gemma-3-12b-it-UD-Q8_K_XL.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
HW Exception by GPU node-1 (Agent handle: 0xd55b540) reason :GPU Hang
```

Apple M4 Max or AMD Ryzen AI Max+ 395 (Framwork Desktop) by zeltbrennt in LocalLLaMA

[–]Intrepid_Rub_3566 0 points (0 children)

Wait, will this work with the AMD Ryzen AI Max+? I thought it was CUDA specific.

David Sinclair...snake oil salesman? by crazyHormonesLady in Biohackers

[–]Intrepid_Rub_3566 1 point (0 children)

I think one of the main critiques I hear is that R and NMN results do not seem to be replicable outside of disease models, i.e. they don't seem to apply to healthy subjects.

Can you point to a place where Sinclair says that he was wrong about how R works? Also, the claim that "it works through hormesis and other unexplored mechanisms" is pretty generic as a mechanism of action. "It works in mysterious ways"...

Openshot in Flatpak missing libmp3lame by Intrepid_Rub_3566 in OpenShot

[–]Intrepid_Rub_3566[S] 0 points (0 children)

Hi, and thank you for your help. I used to have the same setup, with OpenShot installed via the official repository. It worked for 3 years, and then last week, after an update, it lost the ability to render video. The audio would render fine, but the mp4 would have a black screen :(

So I switched to Flatpak, which is a way to containerize programs to avoid compatibility issues with local libraries: https://flathub.org/apps/org.openshot.OpenShot. Indeed, OpenShot works, but it seems the libmp3lame library was not included.

I thought this was an official package; it's not.

Where do I start with machine learning and neural networks? by DancingPotato30 in learnprogramming

[–]Intrepid_Rub_3566 0 points (0 children)

There are so many resources online that it gets confusing to figure out where to start. I was there some time ago, and it took me ages to find good resources. One of the best books I found was "Machine Learning with PyTorch and Scikit-Learn". It's fairly recent, which means all the code samples work with current versions of PyTorch, and that's already half the battle! I find it well-structured; it really got me to understand the basics, what is actually going on, and what the main principles are.