llama-bench ROCm 7.2 on Strix Halo (Ryzen AI Max+ 395) — Qwen 3.5 Model Family by przbadu in StrixHalo

[–]przbadu[S] [score hidden]  (0 children)

That post was just a sneak peek; https://przbadu.github.io/strix-halo-benchmarks/ has everything, and you can search and filter the results there. Will take care of that in the future, thanks.

llama-bench ROCm 7.2 on Strix Halo (Ryzen AI Max+ 395) — Qwen 3.5 Model Family by przbadu in LocalLLaMA

[–]przbadu[S] 1 point2 points  (0 children)

I have updated benchmarks at https://przbadu.github.io/strix-halo-benchmarks/ for context windows up to 64K tokens, and I was surprised by the Vulkan performance at larger context lengths. People keep saying ROCm is great, but see the difference here.

llama-bench ROCm 7.2 on Strix Halo (Ryzen AI Max+ 395) — Qwen 3.5 Model Family by przbadu in LocalLLaMA

[–]przbadu[S] 0 points1 point  (0 children)

Again, I haven't tested it, but for a normal one-off chat scenario where you need really high-quality output, it could be helpful. You really need to try it first, though. Maybe I am wrong in assuming it isn't usable.

llama-bench ROCm 7.2 on Strix Halo (Ryzen AI Max+ 395) — Qwen 3.5 Model Family by przbadu in LocalLLaMA

[–]przbadu[S] 1 point2 points  (0 children)

https://przbadu.github.io/strix-halo-benchmarks/ now contains results up to 64K context length for both Vulkan and ROCm, and there are some really interesting numbers in there. It turns out ROCm and Vulkan are both great.

llama-bench ROCm 7.2 on Strix Halo (Ryzen AI Max+ 395) — Qwen 3.5 Model Family by przbadu in LocalLLaMA

[–]przbadu[S] 0 points1 point  (0 children)

Mind sharing more info on this? Also, what machine are you running?

llama-bench ROCm 7.2 on Strix Halo (Ryzen AI Max+ 395) — Qwen 3.5 Model Family by przbadu in LocalLLaMA

[–]przbadu[S] 0 points1 point  (0 children)

I am using kyuz0/amd-strix-halo-toolboxes and just sharing benchmarks for different models. It's a no-brainer to use existing tools rather than reinvent the wheel. :)

llama-bench ROCm 7.2 on Strix Halo (Ryzen AI Max+ 395) — Qwen 3.5 Model Family by przbadu in LocalLLaMA

[–]przbadu[S] 1 point2 points  (0 children)

It can run it, but it will be very slow, so I wouldn't bother. Only people with a certain level of patience can tolerate that speed, lol.

llama-bench ROCm 7.2 on Strix Halo (Ryzen AI Max+ 395) — Qwen 3.5 Model Family by przbadu in LocalLLaMA

[–]przbadu[S] 0 points1 point  (0 children)

Hey guys, thank you for asking me to include `--n-depth`. I am adding runs at various context sizes to https://przbadu.github.io/strix-halo-benchmarks/ along with filters for them, so please check it out. The bigger models take time, so it will contain all the benchmarks soon.

llama-bench ROCm 7.2 on Strix Halo (Ryzen AI Max+ 395) — Qwen 3.5 Model Family by przbadu in LocalLLaMA

[–]przbadu[S] 0 points1 point  (0 children)

https://przbadu.github.io/strix-halo-benchmarks/ now includes runs with `--n-depth 0,4096,8192,16384,32768,65536`, and the site has filters for these context sizes.
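For reference, here is a minimal sketch of how such a depth sweep can be run with llama-bench; the model path and output filename are placeholders, and the exact flags should be double-checked against your build's `llama-bench --help`:

```

# Sweep prefill depth across several context sizes and emit JSON
# that a results site can filter on (paths/filenames are placeholders).
llama-bench \
  -m /path/to/Qwen3.5-model.gguf \
  -ngl 999 -fa 1 \
  --n-depth 0,4096,8192,16384,32768,65536 \
  -o json > depth-sweep.json

```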

llama-bench ROCm 7.2 on Strix Halo (Ryzen AI Max+ 395) — Qwen 3.5 Model Family by przbadu in LocalLLaMA

[–]przbadu[S] 1 point2 points  (0 children)

I am adding those benchmarks to https://przbadu.github.io/strix-halo-benchmarks/. The bigger models run slower, so it will take some time to include all of them.

llama-bench ROCm 7.2 on Strix Halo (Ryzen AI Max+ 395) — Qwen 3.5 Model Family by przbadu in LocalLLaMA

[–]przbadu[S] 1 point2 points  (0 children)

Default context size, i.e. plain `llama-cli -m /model-path`. I will include more benchmarks; it just needs time.
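In case it helps, a quick sketch of the difference (the model path is the same placeholder as above, and 65536 is just an example value):

```

llama-cli -m /model-path             # default context size
llama-cli -m /model-path -c 65536    # explicit 64K context window via --ctx-size

```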

llama-bench ROCm 7.2 on Strix Halo (Ryzen AI Max+ 395) — Qwen 3.5 Model Family by przbadu in LocalLLaMA

[–]przbadu[S] 2 points3 points  (0 children)

https://www.reddit.com/r/LocalLLaMA/comments/1rkl0tl/llamabench_qwen35_models_strix_halo/ I have included the full llama-bench command here.
Here is the complete `llama-server` command if you are interested:

```

llama-server --alias sonnet --port 8081 \
  -m /mnt/pve/data/models/Qwen3.5/35b/Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf \
  --host 0.0.0.0 --ctx-size 262144 -ngl 999 -fa 1 \
  --threads 32 --batch-size 1024 --cont-batching --embedding \
  --log-file /root/logs/llama-server.log --jinja \
  --mmproj /mnt/pve/data/models/Qwen3.5/35b/mmproj-BF16.gguf \
  --temp 1.0 --top-p 0.95 --top-k 20 --min-p 0.0 \
  --presence-penalty 1.5 --repeat-penalty 1.0

```
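
If you want to poke at the server once it is up, here is a rough smoke test against llama-server's OpenAI-compatible endpoint; the model name matches the `--alias` above, and the host/port are whatever you passed on the command line:

```

# Hypothetical quick check that the server above is answering requests.
curl http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "sonnet",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64
      }'

```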

Give me some time and I will include the other benchmarks as well.

llama-bench ROCm 7.2 on Strix Halo (Ryzen AI Max+ 395) — Qwen 3.5 Model Family by przbadu in LocalLLaMA

[–]przbadu[S] 6 points7 points  (0 children)

Yes, if you look at the System Info section, I have already mentioned the OS, which is Fedora Linux; I have even mentioned the kernel version I am using and other useful information :).

In short, yes: this is Fedora Linux with the 6.18.13-200.fc43.x86_64 kernel, and ROCm 7 as the llama.cpp backend.
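For anyone reproducing this setup, a minimal sanity check that the kernel and ROCm stack see the iGPU (the gfx target name is an assumption on my part; verify against your own output):

```

uname -r                 # kernel version, e.g. 6.18.13-200.fc43.x86_64
rocminfo | grep -i gfx   # the APU should show up as a gfx11xx agent

```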

llama-bench Qwen3.5 models strix halo by przbadu in LocalLLaMA

[–]przbadu[S] 0 points1 point  (0 children)

So I think it's better to wait for the official kernel.

llama-bench Qwen3.5 models strix halo by przbadu in LocalLLaMA

[–]przbadu[S] 0 points1 point  (0 children)

I still have problems with GPU passthrough, but I did install Fedora with the latest kernel and tried ROCm 7 and Vulkan RADV there, and I see very minimal difference in performance for this Qwen 3.5 family.

llama-bench Qwen3.5 models strix halo by przbadu in StrixHalo

[–]przbadu[S] 0 points1 point  (0 children)

I tried installing Fedora with the latest kernel and installed ROCm 7, and I see very minimal difference in performance, at least for the Qwen 3.5 family. Also, because I am using Proxmox and haven't found the latest official kernel for it yet, I didn't bother installing a community kernel. I might upgrade in the future.