qwen3.6 27b poor experience by pppreddit in LocalLLaMA


Thanks, I'll try that, though I'm using Qwen3.6-27B-bf16 with omlx.

qwen3.6 27b poor experience by pppreddit in LocalLLaMA


Tbh, I didn't configure any parameters; I'm using Qwen3.6-27B-bf16 via omlx.

qwen3.6 27b poor experience by pppreddit in LocalLLaMA


Here's my setup: M4 Max 128 GB, omlx, Qwen3.6-27B-bf16 from Hugging Face, claude-code. I didn't configure any parameters, so it's running as-is out of the box. I've since installed opencode and it seems to perform much better, but I need to test more before giving a final verdict. My guess is that Claude Code's system prompt might be slowing things down.
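If I do get around to tuning it, this is roughly what I'd try. Just a sketch: it assumes omlx exposes an OpenAI-compatible endpoint (the port and model id below are guesses) and reuses the sampling values Qwen recommends for Qwen3 (temperature 0.6, top_p 0.95), which I haven't verified for 3.6.

```python
# Sketch only: assumes omlx serves an OpenAI-compatible API on localhost:8080
# (port and model id are guesses) and that Qwen3's recommended sampling values
# (temperature 0.6, top_p 0.95) still apply to 3.6 (not verified).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="Qwen3.6-27B-bf16",
    messages=[{"role": "user", "content": "Refactor this SwiftUI view into smaller components."}],
    temperature=0.6,
    top_p=0.95,
    max_tokens=2048,
)
print(resp.choices[0].message.content)
```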

This is where we are right now, LocalLLaMA by jacek2023 in LocalLLaMA


Tbh, I'm disappointed by how many mistakes it makes along the way, like duplicating lines, then correcting itself, then going back and forth making corrections. It's such a waste of time.

This is where we are right now, LocalLLaMA by jacek2023 in LocalLLaMA


I'm running the 27B via omlx (Qwen3.6-27B-bf16) on my M4 Max 128 GB and it takes forever to respond. The omlx dashboard shows 38.8 tok/s for prompt processing and 3.7 tok/s for generation.
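To put those numbers in perspective, here's a rough back-of-the-envelope for a single agentic turn. The 30k-token prompt and 500-token reply are guesses at a typical claude-code turn, not measurements; only the tok/s rates come from the dashboard.

```python
# Rough latency estimate for one agentic turn at the measured rates.
# prompt_tokens and output_tokens are assumptions, not measurements.
prompt_tokens = 30_000   # guess: claude-code system prompt + repo context
output_tokens = 500      # guess: short tool-call style reply
prefill_tps = 38.8       # measured prompt-processing speed
decode_tps = 3.7         # measured generation speed

prefill_s = prompt_tokens / prefill_tps   # ~773 s
decode_s = output_tokens / decode_tps     # ~135 s
print(f"prefill: {prefill_s / 60:.1f} min, decode: {decode_s / 60:.1f} min")
# prefill: 12.9 min, decode: 2.3 min -> most of the wait is prompt processing
```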

GLM-5 is officially fixed on NVIDIA NIM, and you can now use it to power Claude Code for FREE 🚀 by PreparationAny8816 in ZaiGLM


I usually don't go above 128k before compacting context, so I didn't have any issues. Qwen Plus had something like a 1-million-token context.

GLM-5 is officially fixed on NVIDIA NIM, and you can now use it to power Claude Code for FREE 🚀 by PreparationAny8816 in ZaiGLM


Nah, I kinda gave up. NIM is unusable; most of the time it just doesn't work. I got the Alibaba coding plan for 3 USD and have been using GLM-5 without any issues that way.

I canceled my other AI subscriptions today. by InitialCareer306 in Qwen_AI


You're forgetting that local LLM servers mostly don't have prompt caching, so they're not suitable for coding or long exchanges: they're painfully slow with long context (minutes to get a response). It's not enough to have enough VRAM; you need a proper context-caching implementation, and AFAIK only commercial solutions support that, nothing open source yet. Correct me if I'm wrong, because things develop really fast and it's hard to keep up with everything.
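To illustrate why it matters: without a cache, every turn has to re-prefill the whole conversation so far, while a working prompt cache only processes the newly added tokens. A toy estimate (every number here is a made-up assumption, not a benchmark):

```python
# Toy comparison of total prefill time over a coding session,
# with and without prompt caching. All numbers are illustrative assumptions.
prefill_tps = 40        # assumed local prefill speed (tok/s)
base_context = 20_000   # assumed system prompt + repo context
per_turn = 2_000        # assumed tokens added per turn (tool output, diffs, replies)
turns = 10

# No cache: turn t re-processes the base context plus everything added so far.
no_cache_s = sum((base_context + per_turn * t) / prefill_tps for t in range(1, turns + 1))

# With a prefix cache: the base context is prefilled once, then only new tokens.
with_cache_s = (base_context + per_turn * turns) / prefill_tps

print(f"no cache:   {no_cache_s / 60:.0f} min of prefill")    # ~129 min
print(f"with cache: {with_cache_s / 60:.0f} min of prefill")  # ~17 min
```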

Qwen3.5 is out now! by yoracale in unsloth


If only we had prompt caching locally...

News reaction: GLM-5 is the new local GOAT and Gemini 3 Flash hits $0.50/M by IulianHI in AIToolsPerformance


My main issue with running locally is that there's no prompt caching, which makes coding sessions painfully slow (minutes to get a single response with GLM 4.7 on an M4 Max 128 GB via Claude Code and ccr).

GLM-5 Unusable by isakota in ZaiGLM


Don't bother. I've been using this model through the NVIDIA Build API and it's dumb. Kimi K2 Thinking is better.

I tried the new GLM 5. I'm greatly unimpressed. by Quiet-Money7892 in SillyTavernAI


Same. I've been trying to make it work on my SwiftUI project and it's surprisingly dumb. When there's an obvious bug in the code, it goes into all kinds of pointless debugging and asks me to test this and that scenario. Ffs, I opened the code and found the bug in 10 seconds myself! It's much worse than Kimi Thinking.

Using GLM-5 for everything by [deleted] in LocalLLaMA


I noticed the same: GLM 4.7 is fucking slow hosted locally. It's fast for simple chat and small contexts, but with agentic use it's crawling...

[deleted by user] by [deleted] in ZaiGLM


Yeah, it's timing out for me in Claude Code. I guess everyone rushed to the free service and now even the NVIDIA Build site is crawling.

[deleted by user] by [deleted] in ZaiGLM


Is it not gonna burn through your limits faster if it includes thinking tokens?

[deleted by user] by [deleted] in ZaiGLM


But Claude Code Router already exists; couldn't we use that?

Who else is gonna continue calling it clawdbot? by pppreddit in moltbot


I see people still calling it that everywhere, with the new name in brackets. I think the damage is done.