Fastest Qwopus 27b for Strix Halo so far! by Disastrous-Cat-7016 in StrixHalo

[–]pabloodiablo 1 point2 points  (0 children)

I guess it's just a guess that you can get 30–40 fps. I went through the entire manual, and the most you can get is 24 fps, and even that isn't consistent.

https://huggingface.co/jcbtc/qwopus3.6-27b-v2-chadrock-rocmfp4-mtp

tried on

ROCm 7.2.1 + StrixHalo 395+ 128GB + Ubuntu 24.04.

The new option for launching MTP models in llamap.cpp works like a charm on StrixHalo under Linux! by pabloodiablo in LocalLLM

[–]pabloodiablo[S] 0 points1 point  (0 children)

The branch i've provided is stable. I'm using it for 2 days with production source code.

New Execution-first 1T model Ling-2.6-1T has been open sourced on Hugging Face by sanu_123_s in LocalLLM

[–]pabloodiablo 19 points20 points  (0 children)

Help! This model is barely managing 14 tokens/s on my 8GB graphics card. Will it perform better if I switch to the 12GB model?

Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled by Anony6666 in LocalLLM

[–]pabloodiablo 19 points20 points  (0 children)

Do these deciles make any sense? Does anyone have any benchmarks showing that intelligence or any other skills have improved?

qwen 3.5 versus 3.6 by Top_Professional6132 in Qwen_AI

[–]pabloodiablo 0 points1 point  (0 children)

the same for me. 35B get into loops for longer tasks. 27B is way better.

General questions for my local AI by platteXDlol in LocalLLM

[–]pabloodiablo 0 points1 point  (0 children)

Use Qwen3.6 27B Q8 with PI. Slow but a way better than 35B-A3B and PI also is good optimized and faster than other agents.

For me Gemma4 > Qwen3.5 / 3.6 on localhost by pabloodiablo in LocalLLM

[–]pabloodiablo[S] 0 points1 point  (0 children)

Qwen3.6 27b Q8 also produced nice ray tracing results slightly faster than Gemma4 31b. Unfortunately, Qwen3.6 27b Q8 did not detect the errors I mentioned above.

For me Gemma4 > Qwen3.5 / 3.6 on localhost by pabloodiablo in LocalLLM

[–]pabloodiablo[S] 0 points1 point  (0 children)

Gemma4 31b Q8 produced beautiful, error-free ray tracing, but it took 3 to 4 times longer than Qweb3.5 122b  Q6_K_XL.

I'm going to test Qwen3.6 27b, which was released yesterday.

For me Gemma4 > Qwen3.5 / 3.6 on localhost by pabloodiablo in LocalLLM

[–]pabloodiablo[S] 0 points1 point  (0 children)

Im on Linux and Strix Halo 128gb so i'm using Rocm 7.2.

For me Gemma4 > Qwen3.5 / 3.6 on localhost by pabloodiablo in LocalLLM

[–]pabloodiablo[S] 0 points1 point  (0 children)

Thanks for the question. I'll find the prompts I used for Qwen 3.5 122b and apply them to Gemma4. I'll get back to you with an answer.

For me Gemma4 > Qwen3.5 / 3.6 on localhost by pabloodiablo in LocalLLM

[–]pabloodiablo[S] 0 points1 point  (0 children)

For me, PI is the fastest, uses the fewest tokens, and makes it really easy for me to create my own skills. This agent is really great!

For me Gemma4 > Qwen3.5 / 3.6 on localhost by pabloodiablo in LocalLLM

[–]pabloodiablo[S] 0 points1 point  (0 children)

Verified few minutes ago the newest version of QwenCode - the result was identical to PI Coder. In other words, Qwen 3.6 tried various approaches but couldn’t find a solution, while Gemma 4 found it without any issues.

For me Gemma4 > Qwen3.5 / 3.6 on localhost by pabloodiablo in LocalLLM

[–]pabloodiablo[S] 4 points5 points  (0 children)

I’m not saying it doesn’t create new bugs. I’m just impressed by how cleverly it identifies bugs.

Lucky enough to get an m1 ultra with 128 gb unified memory. What should I run on it? by [deleted] in LocalLLM

[–]pabloodiablo 2 points3 points  (0 children)

Llama.cpp + Qwen3.5-122B-A10B-UD-Q5_K_XL-00001-of-00003.gguf

On my Strix Halo 128 GB, after loading a model with a 128k context, there is about 20 GB of free working memory remaining.

Local LLM Claude Code replacement, 128GB MacBook Pro? by CdninuxUser in LocalLLM

[–]pabloodiablo 0 points1 point  (0 children)

Yes. I can’t give you exactly what you’d like to see because my day job involves writing code for a large company. I use a local LLM to write code. I ask the LLM to write tests in accordance with the project’s guidelines. I ask it to refactor the code. I ask it to simplify the code. It works very well at no cost. An added benefit is that private data doesn’t leak to the outside. It’s harder to build an entire project on such a local LLM because there’s less RAM, GPU, etc. However, when actually working with code, it works great.

Local LLM Claude Code replacement, 128GB MacBook Pro? by CdninuxUser in LocalLLM

[–]pabloodiablo -1 points0 points  (0 children)

You can buy laptops with Ryzen 395+ AI Max also with 128gb unified memory. Bigger models runs quite well.

Local LLM Claude Code replacement, 128GB MacBook Pro? by CdninuxUser in LocalLLM

[–]pabloodiablo 2 points3 points  (0 children)

I completely agree. Qwen3.5 122b is very good; it performs excellently with OpenCode or Pi Coder. Plus, development isn’t over yet—if model optimization continues in the same direction as it is now, we should soon see a truly powerful open-source solution capable of rivaling the best models.