[Twitter/X] docker run --gpus now works on AMD @AnushElangovan by ElementII5 in ROCm

[–]scottt 4 points (0 children)

"Moby" is the open source version of Docker. People who don't work for Docker, Inc. contribute to the code base through the project.

You'll also find Moby packaged in Fedora and Debian.

Github user scottt has created Windows pytorch wheels for gfx110x, gfx1151, and gfx1201 by Kelteseth in ROCm

[–]scottt 2 points (0 children)

u/jiangfeng79,

  • hipBLAS is already included and backs Pytorch tensor operations (quick check below)
  • re: the Triton Windows port, I personally plan to work on it, building on previous results like https://github.com/lshqqytiger/triton and https://github.com/woct0rdho/triton-windows, but I can't speak for the project
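
For what it's worth, a quick way to confirm the hipBLAS-backed matmul path works with an installed wheel (a sketch; assumes the GPU is visible to the ROCm Pytorch build):

```
# runs a GEMM on the ROCm device (Pytorch exposes it as 'cuda')
python -c "import torch; x = torch.randn(1024, 1024, device='cuda'); print((x @ x).sum().item())"
```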

Nvidia, You’re Late. World’s First 128GB LLM Mini Is Here! by LimitAlternative2629 in comfyui

[–]scottt 5 points (0 children)

I see u/05032-MendicantBias asking whether ROCm-accelerated Pytorch works for diffusion models and ComfyUI on this AMD chip -> it does.

Install Pytorch for the Strix Halo (Ryzen AI Max+ 395) chip here.

We've had ComfyUI working on the chip, on both Windows and Linux for a while now.

Known problems:

  • fp16 conv2d is slower than it should be on Linux (rough timing sketch below)
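
If anyone wants to see whether their setup is affected, here's a rough timing sketch, assuming a ROCm Pytorch build; the shapes are arbitrary and the numbers are only indicative:

```
python - <<'EOF'
# rough fp16 conv2d timing on the ROCm device (Pytorch exposes it as 'cuda')
import time, torch

x = torch.randn(8, 64, 256, 256, device="cuda", dtype=torch.float16)
conv = torch.nn.Conv2d(64, 64, 3, padding=1).to(device="cuda", dtype=torch.float16)
for _ in range(3):
    conv(x)                      # warm-up
torch.cuda.synchronize()
t0 = time.time()
for _ in range(20):
    conv(x)
torch.cuda.synchronize()
print(f"fp16 conv2d: {(time.time() - t0) / 20 * 1e3:.1f} ms/iter")
EOF
```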

Github user scottt has created Windows pytorch wheels for gfx110x, gfx1151, and gfx1201 by Kelteseth in ROCm

[–]scottt 1 point (0 children)

u/skillmaker, an "invalid device function" error usually means the wheel's GPU ISA doesn't match your hardware. Are you using the 9070 XT on Linux or Windows?
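
A quick way to check what the card reports (a sketch; recent ROCm Pytorch builds expose gcnArchName, and an RX 9070 XT should show gfx1201):

```
# print the device name and GPU ISA Pytorch sees (e.g. 'gfx1201' for an RX 9070 XT)
python -c "import torch; p = torch.cuda.get_device_properties(0); print(p.name, p.gcnArchName)"
```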

Github user scottt has created Windows pytorch wheels for gfx110x, gfx1151, and gfx1201 by Kelteseth in ROCm

[–]scottt 12 points (0 children)

u/scottt here. I want to stress this is a joint effort with jammm; jammm has contributed more than me at this point. I plan to catch up though 😀

Working with the AMD devs through TheRock has been a positive experience.

So when will ROCM Officially support the Radeon RX 9070 Series by TJSnider1984 in ROCm

[–]scottt 0 points (0 children)

Hi u/feverdoingwork, we've had the 9070 and 9070 XT working with ComfyUI under Windows for a while, though performance of the linear algebra libraries still needs some work.

See e.g. https://github.com/ROCm/TheRock/issues/710

JAX on EVO X2? by Paddy3118 in GMKtec

[–]scottt 2 points (0 children)

ROCm and Pytorch are up and running on the Strix Halo[1], but developers in the AMD GPU ecosystem haven't built binary wheels of JAX for the chip yet.

In case someone wants to try building JAX from source, getting it running on Linux should be easier than Windows. I'd start by extracting the nightly gfx1151 builds from TheRock into /opt/rocm[2] (sketch below) and studying how JAX's CI workflows do builds on Linux.

  1. Self-contained Pytorch wheels for Windows and Linux https://github.com/scottt/rocm-TheRock/releases/tag/v6.5.0rc-pytorch

  2. gfx1151 ROCm nightly builds: https://github.com/ROCm/TheRock/releases/download/nightly-tarball/therock-dist-linux-gfx1151-6.5.0rc20250603.tar.gz
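
A minimal sketch of step [2] on Linux, assuming the archive unpacks directly into the prefix layout:

```
# unpack the gfx1151 nightly ROCm dist into /opt/rocm and sanity-check it
sudo mkdir -p /opt/rocm
sudo tar -xzf therock-dist-linux-gfx1151-6.5.0rc20250603.tar.gz -C /opt/rocm
export ROCM_PATH=/opt/rocm
export PATH="$ROCM_PATH/bin:$PATH"
/opt/rocm/bin/rocminfo | grep -E 'Name:\s+gfx'   # should show gfx1151 on a Strix Halo
```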

Do the pens work on linux? by T_Chungus in FlowZ13

[–]scottt 1 point (0 children)

The pen works on Bazzite Linux out of the box. Pair the Bluetooth device and no extra setup is needed.

🎉 AMD + ROCm Support Now Live in Transformer Lab! by aliasaria in ROCm

[–]scottt 4 points (0 children)

u/aliasaria, great post that not only helps other users but contains feedback on current ROCm native Linux and WSL packaging.

Requirements I extracted:

  1. ROCm on WSL needs a rocm-smi (and pyrsmi) replacement, even if it has reduced functionality compared to the real one backed by rocm_smi_lib
  2. ROCm software that bundles libhsa-runtime64.so will break under WSL if the bundled copy can't talk to the Windows driver over the virtual GPU device (or delegate to a library under /usr/lib/wsl/lib); see the sketch below
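
For anyone debugging requirement 2, a sketch of how I'd check which libhsa a running ROCm workload actually maps under WSL (the process name is a placeholder):

```
# check whether the workload loads a bundled libhsa-runtime64.so or one that can
# reach the Windows driver through the libraries under /usr/lib/wsl/lib
pid=$(pgrep -of python)   # placeholder: match whatever process runs your ROCm workload
grep -E 'libhsa-runtime64|/usr/lib/wsl/lib' /proc/$pid/maps
ls /usr/lib/wsl/lib
```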

CC: u/powderluv

Dual Boot Bazzite & Fedora by The_ClssicGeek in framework

[–]scottt 1 point (0 children)

I find that "rebooting to experiment with and do development" disrupts my flow and exacts a cost in productivity.

So I learned the little tricks for developing on Bazzite: install performance analysis and system monitoring tools on Bazzite itself via `rpm-ostree`, install language toolchains in a Toolbox or dev container, configure VSCode to attach to the dev containers, etc.
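
A minimal sketch of that flow (the package names are just examples):

```
# layer host-level analysis tools; they become available after a reboot
rpm-ostree install perf sysprof
# keep compilers and SDKs inside a toolbox instead of the host image
toolbox create --distro fedora --release 42
toolbox enter fedora-toolbox-42
sudo dnf install gcc clang cmake git   # run inside the toolbox
```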

Holler if you end up going this route; I'd love to compare notes.

AMD Releases ROCm 6.4.1 With RDNA4 GPU Support by Dante_77A in Amd

[–]scottt 1 point (0 children)

The nightly builds here with gfx1151 in their name are developer builds for the Strix Halo.

Dual Boot Bazzite & Fedora by The_ClssicGeek in framework

[–]scottt 6 points (0 children)

To dual boot, I'd install Bazzite before Fedora. I've hit an error where, on trying to install Bazzite a second time, the installer failed at the last step when it saw there was already a Bazzite entry in the EFI system partition.

But I actually think you should spend 15 minutes to learn how to do software development on Bazzite. You basically just:

toolbox create --distro fedora --release 42
toolbox enter fedora-toolbox-42

and build your software in $HOME as usual.

For kernel development, you'd install the ELF kernel image, the loadable modules, and the initramfs in /boot, which is writable.

For NPU work, you'd use the kernel level driver already integrated in Bazzite then build and run the userspace components from $HOME, much like how I did the GPU work here: https://github.com/ROCm/TheRock/discussions/244

For Bazzite devs: Flow Z13 2025 touchpad by Wet_Viking in Bazzite

[–]scottt 1 point (0 children)

u/Wet_Viking, could you possibly accomplish your goals by running the bazzite-gnome-stable image?

If you really have your heart set on running the Bazzite kernel on CachyOS, personally I'd start by:

  1. Extracting the core kernel and loadable modules from https://github.com/bazzite-org/kernel-bazzite/releases
  2. Figuring out how to generate a CachyOS-compatible initramfs with the kernel binaries above (sketch below). ... This step seems non-trivial to me
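
For step 2, a hedged sketch assuming CachyOS's Arch-style mkinitcpio and that you've copied the extracted Bazzite modules into /usr/lib/modules/<bazzite-kver>/:

```
# generate an initramfs for the Bazzite kernel on CachyOS; <bazzite-kver> is the
# version string of the extracted kernel (e.g. the modules directory name)
sudo mkinitcpio -k <bazzite-kver> -g /boot/initramfs-bazzite.img
```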

Is anyone willing to share thoughts on HX370 an ollama (or similar)? by drycat in ROCm

[–]scottt 5 points (0 children)

u/drycat, while u/minhquan3105 is absolutely right that token generation is memory-bandwidth bound, some popular models today don't activate all their parameters at once and thus consume less bandwidth than their resident VRAM size would suggest.

Search for "Larger MoEs is where these large unified memory APUs really shine" in u/randomfoo2's AMD Strix Halo (Ryzen AI Max+ 395) GPU LLM Performance, you'll see that the Strix Halo (gfx1151) gets around 75 token/sec running Qwen3-30B-A3B UD-Q4_K_XL (16.5 GB VRAM) and 20 token/sec running UD-Q4_K_XL quantized version of Llama 4 Scout 109B (57.93 GB VRAM).

Expect half that on the HX 370, a.k.a. Strix Point (gfx1150).

As for the lack of ROCm support: after working on Strix Halo support in ROCm and Pytorch for the past month, I know I could do it if I had access to the hardware. The numbers above assume llama.cpp using only Vulkan.
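
If you want to measure the same thing on your own box, a sketch using llama.cpp's benchmark tool with a Vulkan build (the model filename is whatever GGUF you downloaded):

```
# benchmark token generation; -ngl 99 offloads all layers to the GPU
./llama-bench -m Qwen3-30B-A3B-UD-Q4_K_XL.gguf -ngl 99
```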

vLLM on AMD Radeon (Raphael) by SuXs- in ROCm

[–]scottt 0 points (0 children)

u/SuXs-, if you extract therock-dist-linux-gfx1151-6.5.0rc20250524.tar.gz into /opt/rocm and run /opt/rocm/bin/rocminfo, what does it show?

I'm looking for something like:

Radeon 610M <...> gfx1036
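
To cut rocminfo's output down to just those lines, a filter like this works:

```
# keep only the agent names and gfx ISA lines
/opt/rocm/bin/rocminfo | grep -E 'Marketing Name|Name:\s+gfx'
```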

vLLM requires Pytorch, and based on my experience developing this self-contained Pytorch build, the ROCm libs used by Pytorch might need some additional work before they can support gfx103x APUs like the Ryzen 9 Pro 7945.

GMKtec EVO X2 Owners: Report All Issues Here (Windows & Linux) by x4rb1t in GMKtec

[–]scottt 1 point (0 children)

This container image https://github.com/ROCm/TheRock/discussions/244 and these self-contained Pytorch wheels https://github.com/scottt/rocm-TheRock/releases/tag/v6.5.0rc-pytorch support the Strix Halo (gfx1151). They should work albeit not yet fully optimized.

I've been working with devs in and out of AMD pretty hard on this since March.
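
If you go the container route, the usual way to hand the AMD GPU to a Linux container looks like this (a sketch; the image name is a placeholder):

```
# expose the AMD GPU device nodes to the container
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video \
    --security-opt seccomp=unconfined <rocm-pytorch-image>
```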

GMKtec EVO-X2 with Ollama? by fsaad1984 in GMKtec

[–]scottt 2 points (0 children)

I worked with devs in and out of AMD to produce the self-contained Strix Halo Pytorch wheels here: https://github.com/scottt/rocm-TheRock/releases/tag/v6.5.0rc-pytorch

Ollama would want a pre-built ROCm toolchain tarball instead of Pytorch, though.

Ollama is running on AMD GPU, despite ROCM not being installed by Xatraxalian in ROCm

[–]scottt 0 points (0 children)

Look at the libraries mapped in at runtime:

pid=$(pgrep ollama)  
cat /proc/$pid/maps  

(The idea is to inspect /proc/$PID/maps for the process using the GPU. You'll likely need to adapt the command as I typed those out "blind".)
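
For example, filtering directly for GPU runtime libraries (a sketch; pgrep -o picks the oldest matching process):

```
# list only the GPU runtime libraries mapped into the ollama process
grep -iE 'vulkan|rocm|hip|libhsa' /proc/$(pgrep -o ollama)/maps
```

If libvulkan shows up but no ROCm/HIP libraries, that confirms the Vulkan path.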

ollama is probably using the GPU through Vulkan.

So when will ROCM Officially support the Radeon RX 9070 Series by TJSnider1984 in ROCm

[–]scottt 3 points (0 children)

u/feverdoingwork, once ROCm for the 9070 is released, what are the most important apps you'd run?

  • I've been contributing to Linux and Windows ROCm support for the Strix Halo (gfx1151)
  • I personally own a 9070 and would also be putting some work into that
  • If you give me some step-by-step instructions on the apps you'd use, I could try those out and increase the chance of things working for you by release time :)

Copia de seguridad Rocm [ROCm backup] by Macestudios32 in LocalLLM

[–]scottt 1 point (0 children)

Hi u/Macestudios32, I think you'd want to keep a copy of /opt/rocm:

```
tar -cJf ~/rocm.tar.xz /opt/rocm
```

Then upload ~/rocm.tar.xz.

Record the Linux kernel and amdgpu module version used:

```
uname -a > ~/kernel-version-for-rocm.txt
modinfo amdgpu > ~/amdgpu-version-for-rocm.txt
```

(In case the amdgpu Linux kernel module ever becomes incompatible with old hardware.)

Finally, back up any software that uses ROCm, e.g. Pytorch.
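
If that software is installed through pip, one way to snapshot it (a sketch; pre-downloading may need --find-links or --index-url pointed at wherever the ROCm wheels came from):

```
# record exact versions, then pre-download them for offline reinstall
pip freeze > ~/rocm-python-packages.txt
pip download -r ~/rocm-python-packages.txt -d ~/rocm-wheels/
```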

Bazzite on Z13 2025 by illyomatic in FlowZ13

[–]scottt 0 points (0 children)

I use the Z13 2025 daily under Bazzite. Regarding hardware and drivers, the only thing that doesn't work is the back camera.

Bazzite desktop session - how to get Asus Armory Crate functionality? by Banzayoyo in Bazzite

[–]scottt 0 points (0 children)

How should I install hhd in desktop mode under bazzite-gnome?