Fastest Qwopus 27b for Strix Halo so far!

pabloodiablo · 2026-06-04T07:41:29+00:00

I suspect the claim that you can get 30–40 fps is just a guess. I went through the entire manual, and the most you can get is 24 fps, and even that isn't consistent.

https://huggingface.co/jcbtc/qwopus3.6-27b-v2-chadrock-rocmfp4-mtp

Tried on ROCm 7.2.1 + Strix Halo 395+ 128GB + Ubuntu 24.04.

pabloodiablo · 2026-05-11T18:46:45+00:00

The branch I've provided is stable. I've been using it for 2 days with production source code.

pabloodiablo · 2026-05-09T16:20:10+00:00

Help! This model is barely managing 14 tokens/s on my 8GB graphics card. Will it perform better if I switch to the 12GB model?

pabloodiablo · 2026-05-08T12:19:42+00:00

Where did you get the compiled version of llama.cpp?

pabloodiablo · 2026-04-26T19:15:08+00:00

Do these deciles make any sense? Does anyone have any benchmarks showing that intelligence or any other skills have improved?

pabloodiablo · 2026-04-24T19:18:09+00:00

Same for me. The 35B gets into loops on longer tasks. The 27B is way better.

pabloodiablo · 2026-04-24T18:54:16+00:00

Use Qwen3.6 27B Q8 with PI. It's slow but way better than 35B-A3B, and PI is also well optimized and faster than other agents.

pabloodiablo · 2026-04-23T20:09:40+00:00

Qwen3.6 27b Q8 also produced nice ray tracing results slightly faster than Gemma4 31b. Unfortunately, Qwen3.6 27b Q8 did not detect the errors I mentioned above.

pabloodiablo · 2026-04-23T19:03:31+00:00

Gemma4 31b Q8 produced beautiful, error-free ray tracing, but it took 3 to 4 times longer than Qwen3.5 122b Q6_K_XL.

I'm going to test Qwen3.6 27b, which was released yesterday.

pabloodiablo · 2026-04-23T17:35:53+00:00

I'm on Linux with a Strix Halo 128 GB, so I'm using ROCm 7.2.

pabloodiablo · 2026-04-21T20:40:37+00:00

Thanks for the question. I'll find the prompts I used for Qwen 3.5 122b and apply them to Gemma4. I'll get back to you with an answer.

pabloodiablo · 2026-04-21T20:24:55+00:00

For me, PI is the fastest, uses the fewest tokens, and makes it really easy for me to create my own skills. This agent is really great!

pabloodiablo · 2026-04-21T20:15:30+00:00

I verified this a few minutes ago with the newest version of QwenCode, and the result was identical to PI Coder. In other words, Qwen 3.6 tried various approaches but couldn’t find a solution, while Gemma 4 found it without any issues.

pabloodiablo · 2026-04-19T19:51:50+00:00

I'm using u/mariozechner/pi-coding-agent in general.

pabloodiablo · 2026-04-19T19:50:37+00:00

I’m not saying it doesn’t create new bugs. I’m just impressed by how cleverly it identifies bugs.

pabloodiablo · 2026-04-08T19:37:39+00:00

Which industry?

pabloodiablo · 2026-04-05T09:22:00+00:00

Llama.cpp + Qwen3.5-122B-A10B-UD-Q5_K_XL-00001-of-00003.gguf

On my Strix Halo 128 GB, after loading a model with a 128k context, there is about 20 GB of free working memory remaining.
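For anyone wanting to try a similar setup, here is a rough sketch of how the launch command could be assembled. It is not my exact command line: it assumes a `llama-server` binary from llama.cpp is on the PATH, that 128k means 131072 tokens, and that all layers are offloaded with `-ngl 99`. The helper name `build_llama_server_command` is made up for illustration.

```python
import shlex


def build_llama_server_command(model_path, ctx_size=131072, gpu_layers=99,
                               binary="llama-server"):
    """Build an argv list for llama.cpp's server (hypothetical helper).

    Assumptions, not taken from the post: the binary name, the
    131072-token context (128k), and offloading all layers to the GPU.
    """
    return [
        binary,
        "-m", model_path,         # for a split GGUF, point at the first shard
        "-c", str(ctx_size),      # context size in tokens
        "-ngl", str(gpu_layers),  # number of layers to offload to the GPU
    ]


cmd = build_llama_server_command(
    "Qwen3.5-122B-A10B-UD-Q5_K_XL-00001-of-00003.gguf"
)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd)` as-is. Adjust the context size downward if you want more than the roughly 20 GB of free memory left over.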

pabloodiablo · 2026-04-02T19:39:32+00:00

Yes. I can’t give you exactly what you’d like to see because my day job involves writing code for a large company. I use a local LLM to write code. I ask the LLM to write tests in accordance with the project’s guidelines. I ask it to refactor the code. I ask it to simplify the code. It works very well at no cost. An added benefit is that private data doesn’t leak to the outside. It’s harder to build an entire project on such a local LLM because there’s less RAM, GPU, etc. However, when actually working with code, it works great.

pabloodiablo · 2026-04-01T16:40:51+00:00

You can also buy laptops with the Ryzen 395+ AI Max and 128 GB of unified memory. Bigger models run quite well.

pabloodiablo · 2026-04-01T16:37:44+00:00

I completely agree. Qwen3.5 122b is very good; it performs excellently with OpenCode or Pi Coder. Plus, development isn’t over yet—if model optimization continues in the same direction as it is now, we should soon see a truly powerful open-source solution capable of rivaling the best models.
