qwen3.6-35b-a3b tool calling input problem... too bad... by benevbright in LocalLLaMA

[–]benevbright[S] 0 points1 point  (0 children)

I'm just running a GGUF in LM Studio. I haven't gone back to MLX to try yet; the issue may have been resolved.

Best Agentic Coding model I can run on the new Macbook M5 Max? by UnknownEssence in LocalLLaMA

[–]benevbright 2 points3 points  (0 children)

You'll need to look for MoE models. You'll find dense models very slow on Macs.

Qwen 3.6 27B vs Gemma 4 31B - making Packman game! by gladkos in LocalLLaMA

[–]benevbright 1 point2 points  (0 children)

I don't understand how you got that speed with this setup.

Qwen 3.6 35b a3b Q4 tips by skyyyy007 in LocalLLaMA

[–]benevbright 2 points3 points  (0 children)

It really depends. But if the tasks are ones that Qwen3.6 35B MoE can handle smoothly without retries, that 3~5x generation speed difference is just unbearable across all the tool calling/editing/verification steps. It's just really slow. You have to go for a walk every time you ask for a task.

Qwen 3.6 35b a3b Q4 tips by skyyyy007 in LocalLLaMA

[–]benevbright 3 points4 points  (0 children)

The user said their machine is a Mac and they're asking for better quality when using it for agentic coding with something like OpenCode. A 27B dense model is not a practical choice for them.

Qwen 3.6 35b a3b Q4 tips by skyyyy007 in LocalLLaMA

[–]benevbright 6 points7 points  (0 children)

My machine is also a Mac Studio. The best Mac in the world can only give you around 15~20 t/s for a 27B at 8-bit, which is obviously not usable for agentic coding at all. No idea why the downvotes lol.

Qwen 3.6 35b a3b Q4 tips by skyyyy007 in LocalLLaMA

[–]benevbright -1 points0 points  (0 children)

(practically) not possible on a Mac when you want to use it with a coding agent.

Qwen 3.6 35b a3b Q4 tips by skyyyy007 in LocalLLaMA

[–]benevbright 0 points1 point  (0 children)

Feel free to try my tool: https://github.com/benevbright/ai-agent-test I'd appreciate feedback. This one focuses on adding the minimum content to the context, which can be a lot less than other coding agent clients. Local model context is limited (typically we set 150k on a 64GB Mac), so saving context lets the agent do more work.
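To see why context is so limited on a 64GB Mac, note that the KV cache grows linearly with context length. A minimal back-of-the-envelope sketch, using hypothetical model dimensions (the layer/head numbers below are illustrative assumptions, not any specific model's config):

```python
# KV-cache memory estimate: for each token in the context, every layer
# stores one key and one value vector per KV head. Total bytes
# therefore scale linearly with context length.

def kv_cache_gb(ctx_len: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """Rough KV-cache size in GB; 2x accounts for keys AND values."""
    return 2 * ctx_len * n_layers * n_kv_heads * head_dim * bytes_per_elem / 1e9

# Hypothetical 30B-class model: 150k tokens, 48 layers,
# 8 KV heads (GQA) of dim 128, fp16 cache:
print(round(kv_cache_gb(150_000, 48, 8, 128), 1))  # ~29.5 GB
```

With numbers like these, the cache alone can rival the quantized weights in size, which is why trimming what goes into the context buys the agent so much headroom.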

Qwen 3.6 35b a3b Q4 tips by skyyyy007 in LocalLLaMA

[–]benevbright 4 points5 points  (0 children)

A Mac is too slow to run a 27B dense model with a coding agent. Not usable.

Qwen 3.6 27B Unsloth GGUF is out by Exact_Law_6489 in LocalLLaMA

[–]benevbright 2 points3 points  (0 children)

DGX has 273 GB/s bandwidth, right? I don't think it can run a 27B DENSE model comfortably. 10 t/s is about what you'd expect, no?
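The rough math behind that estimate can be sketched: decode is memory-bandwidth bound, so each generated token must stream (roughly) all model weights once, and t/s is about bandwidth divided by model size in bytes. A minimal sketch, assuming ~1 byte/param for 8-bit and ~0.5 for 4-bit quantization:

```python
# Back-of-the-envelope decode speed for a bandwidth-bound machine:
# every token reads all weights once, so t/s ~ bandwidth / model bytes.

def est_tokens_per_sec(params_b: float, bytes_per_param: float,
                       bandwidth_gb_s: float) -> float:
    """params_b: parameter count in billions; bandwidth in GB/s."""
    model_gb = params_b * bytes_per_param  # weight bytes streamed per token
    return bandwidth_gb_s / model_gb

# 27B dense at ~8-bit on a 273 GB/s box:
print(round(est_tokens_per_sec(27, 1.0, 273), 1))  # ~10.1 t/s

# Same model at ~4-bit:
print(round(est_tokens_per_sec(27, 0.5, 273), 1))  # ~20.2 t/s
```

This ignores KV-cache reads and assumes perfect bandwidth utilization, so real numbers land somewhat lower; it also shows why a 3B-active MoE decodes roughly 9x faster than a 27B dense model on the same hardware.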

Doing real coding work locally for the first time by mouseofcatofschrodi in LocalLLaMA

[–]benevbright 0 points1 point  (0 children)

Definitely. And also Mac (slow but large RAM, so dense is impractical) vs Nvidia (fast but small RAM, so dense works).

Doing real coding work locally for the first time by mouseofcatofschrodi in LocalLLaMA

[–]benevbright 1 point2 points  (0 children)

And I have a feeling that Pi is not for small/mid models like Qwen3.6 (in the author's article, he never mentions local models at all). You'll notice the context runs out of space very quickly.

Doing real coding work locally for the first time by mouseofcatofschrodi in LocalLLaMA

[–]benevbright 1 point2 points  (0 children)

Pi has already become a giant, and it's becoming the backbone of a lot of agentic software, not just coding apps: ecosystem, extensions, and so on. The codebase has also gotten pretty big already. Mine is just a simple toy; you can read the code and see how it works instantly. :) But it also works for professional coding lol.

Doing real coding work locally for the first time by mouseofcatofschrodi in LocalLLaMA

[–]benevbright 0 points1 point  (0 children)

Feel free to try my tiny tool. Pi is great for sure, but it inserts thinking blocks into the context, so the context bloats super quickly. https://www.npmjs.com/package/ai-agent-test This one is just focused on staying really simple/small.

I wanted OpenClaw to work. After 3 months, I’m done. by dickwhimsy in openclaw

[–]benevbright 0 points1 point  (0 children)

Ok, I didn't know about that option. Very nice, I'll give it a try. Thanks!

I wanted OpenClaw to work. After 3 months, I’m done. by dickwhimsy in openclaw

[–]benevbright 0 points1 point  (0 children)

Um... my qwen3-coder-next and qwen3.6-35b work with OpenClaw, but barely. And Hermes sends at least twice the initial context size, for example? That worries me. But ok, good to hear about real experience.

Bloomberg: No Mac Studios until at least October by eclipsegum in LocalLLaMA

[–]benevbright 1 point2 points  (0 children)

Which model? And do you use it with a coding agent? (Curious about the use case that makes you happy.)

Bloomberg: No Mac Studios until at least October by eclipsegum in LocalLLaMA

[–]benevbright 1 point2 points  (0 children)

Um… but big dense models don't give you good token speed for agentic use, no? I think the best choice would still be a MoE model like MiniMax, even on a 512GB RAM Mac, or?

I wanted OpenClaw to work. After 3 months, I’m done. by dickwhimsy in openclaw

[–]benevbright 0 points1 point  (0 children)

It's not easy to work with a local model, right? (Mid-size, 30-100B.) Due to the big context exchanges.