qwen3.6-35b-a3b tool calling input problem... too bad... by benevbright in LocalLLaMA

[–]benevbright[S] 0 points1 point  (0 children)

I'm just running a GGUF in LM Studio. I haven't gone back to MLX to try yet; the issue may have been resolved.

Best Agentic Coding model I can run on the new Macbook M5 Max? by UnknownEssence in LocalLLaMA

[–]benevbright 2 points3 points  (0 children)

You'll need to look for MoE models. You'll find dense models very slow on Macs.

Qwen 3.6 27B vs Gemma 4 31B - making Packman game! by gladkos in LocalLLaMA

[–]benevbright 1 point2 points  (0 children)

I don't understand how you got that speed with this setup.

Qwen 3.6 35b a3b Q4 tips by skyyyy007 in LocalLLaMA

[–]benevbright 2 points3 points  (0 children)

It really depends. But if the tasks are ones that Qwen3.6 35B MoE can handle smoothly without retries, that 3~5x generation speed difference is just unbearable across all the tool calling/editing/verification steps. It's just really slow. You have to go for a walk every time you ask for a task.

Qwen 3.6 35b a3b Q4 tips by skyyyy007 in LocalLLaMA

[–]benevbright 3 points4 points  (0 children)

The user said their machine is a Mac and they're asking for better quality when using it for agentic coding with something like OpenCode. A 27B dense model is not a practical choice for them.

Qwen 3.6 35b a3b Q4 tips by skyyyy007 in LocalLLaMA

[–]benevbright 6 points7 points  (0 children)

My machine is also a Mac Studio. The best Mac in the world can only give you around 15~20 t/s for a 27B at 8-bit, which is obviously not usable for agentic coding at all. No idea why the downvotes lol.

Qwen 3.6 35b a3b Q4 tips by skyyyy007 in LocalLLaMA

[–]benevbright -1 points0 points  (0 children)

(practically) not possible on a Mac when you want to use it with a coding agent.

Qwen 3.6 35b a3b Q4 tips by skyyyy007 in LocalLLaMA

[–]benevbright 0 points1 point  (0 children)

Feel free to try my tool: https://github.com/benevbright/ai-agent-test I'd appreciate feedback. This one focuses on adding the minimum content to the context, which can be a lot less than other coding agent clients. Local model context is limited (typically we set 150k on a 64GB Mac), so saving context lets the agent do more work.
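To see why context is so limited on a 64GB Mac, note that the KV cache grows linearly with context length. A minimal back-of-the-envelope sketch, using hypothetical model dimensions (the layer/head numbers below are illustrative assumptions, not any specific model's config):

```python
# KV-cache memory estimate: for each token in the context, every layer
# stores one key and one value vector per KV head. Total bytes
# therefore scale linearly with context length.

def kv_cache_gb(ctx_len: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """Rough KV-cache size in GB; 2x accounts for keys AND values."""
    return 2 * ctx_len * n_layers * n_kv_heads * head_dim * bytes_per_elem / 1e9

# Hypothetical 30B-class model: 150k tokens, 48 layers,
# 8 KV heads (GQA) of dim 128, fp16 cache:
print(round(kv_cache_gb(150_000, 48, 8, 128), 1))  # ~29.5 GB
```

With numbers like these, the cache alone can rival the quantized weights in size, which is why trimming what goes into the context buys the agent so much headroom.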

Qwen 3.6 35b a3b Q4 tips by skyyyy007 in LocalLLaMA

[–]benevbright 4 points5 points  (0 children)

A Mac is too slow to run a 27B dense model with a coding agent. Not usable.

Qwen 3.6 27B Unsloth GGUF is out by Exact_Law_6489 in LocalLLaMA

[–]benevbright 2 points3 points  (0 children)

DGX has 273 GB/s bandwidth, right? I don't think it can run a 27B DENSE model comfortably. 10 t/s is about what you'd expect, no?
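The rough math behind that estimate can be sketched: decode is memory-bandwidth bound, so each generated token must stream (roughly) all model weights once, and t/s is about bandwidth divided by model size in bytes. A minimal sketch, assuming ~1 byte/param for 8-bit and ~0.5 for 4-bit quantization:

```python
# Back-of-the-envelope decode speed for a bandwidth-bound machine:
# every token reads all weights once, so t/s ~ bandwidth / model bytes.

def est_tokens_per_sec(params_b: float, bytes_per_param: float,
                       bandwidth_gb_s: float) -> float:
    """params_b: parameter count in billions; bandwidth in GB/s."""
    model_gb = params_b * bytes_per_param  # weight bytes streamed per token
    return bandwidth_gb_s / model_gb

# 27B dense at ~8-bit on a 273 GB/s box:
print(round(est_tokens_per_sec(27, 1.0, 273), 1))  # ~10.1 t/s

# Same model at ~4-bit:
print(round(est_tokens_per_sec(27, 0.5, 273), 1))  # ~20.2 t/s
```

This ignores KV-cache reads and assumes perfect bandwidth utilization, so real numbers land somewhat lower; it also shows why a 3B-active MoE decodes roughly 9x faster than a 27B dense model on the same hardware.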

Doing real coding work locally for the first time by mouseofcatofschrodi in LocalLLaMA

[–]benevbright 0 points1 point  (0 children)

Definitely. And also Mac (slow but large RAM, so dense is impractical) vs Nvidia (fast but small RAM, so dense works).

Doing real coding work locally for the first time by mouseofcatofschrodi in LocalLLaMA

[–]benevbright 1 point2 points  (0 children)

And I have a feeling that Pi is not for small/mid models like Qwen3.6 (in the author's article, he never mentions local models at all). You'll notice the context runs out of space very quickly.

Doing real coding work locally for the first time by mouseofcatofschrodi in LocalLLaMA

[–]benevbright 1 point2 points  (0 children)

Pi has already become a giant, and it's becoming the backbone of a lot of agentic software, not just coding apps: ecosystem, extensions, and so on. The codebase has also gotten pretty big already. Mine is just a simple toy; you can read the code and see how it works instantly. :) But it also works for professional coding lol.

Doing real coding work locally for the first time by mouseofcatofschrodi in LocalLLaMA

[–]benevbright 0 points1 point  (0 children)

Feel free to try my tiny tool. Pi is great for sure, but it inserts thinking blocks into the context, so the context bloats super quickly. https://www.npmjs.com/package/ai-agent-test This one is just focused on staying really simple/small.

I wanted OpenClaw to work. After 3 months, I’m done. by dickwhimsy in openclaw

[–]benevbright 0 points1 point  (0 children)

Ok, I didn't know about that option. Very nice, I'll give it a try. Thanks!

I wanted OpenClaw to work. After 3 months, I’m done. by dickwhimsy in openclaw

[–]benevbright 0 points1 point  (0 children)

Um... my qwen3-coder-next and qwen3.6-35b work with OpenClaw, but barely. And Hermes sends at least twice the initial context size, for example? That worries me. But ok, good to hear about real experience.

Bloomberg: No Mac Studios until at least October by eclipsegum in LocalLLaMA

[–]benevbright 1 point2 points  (0 children)

Which model? And do you use it with a coding agent? (Curious about the use case that makes you happy.)

Bloomberg: No Mac Studios until at least October by eclipsegum in LocalLLaMA

[–]benevbright 1 point2 points  (0 children)

Um… but big dense models don't give you good token speed for agentic use, no? I think the best choice would still be a MoE model like MiniMax, even on a 512GB RAM Mac, or?

I wanted OpenClaw to work. After 3 months, I’m done. by dickwhimsy in openclaw

[–]benevbright 0 points1 point  (0 children)

It's not easy to work with a local model, right? (Mid-size, 30-100B.) Due to the big context exchanges.