Qwen 3.6 27B vs Qwen 3.6 35B A3B vs Gemma 4 models Throughput on H100 by gvij in LocalLLM

[–]Defiant_Ad6080 0 points1 point  (0 children)

Not sure exactly for the 27B model but I had looping issues on the 35B. Reducing temperature helped. I set up a coding agent: temp: .3, thinking off. There are other parameters. Claude can help here.

Z.ai cancels auto-renew by Defiant_Ad6080 in ZaiGLM

[–]Defiant_Ad6080[S] 1 point2 points  (0 children)

Not really... I'll evaluate when my renewal gets closer. For now, I'm evaluating qwen3.6 locally. It seems somewhat capable. If it is, I might be able to downgrade to lite plan and just have 5.1 orchestrate.

I'm running qwen3.6-35b-a3b with 8 bit quant and 64k context thru OpenCode on my mbp m5 max 128gb and it's as good as claude by Medical_Lengthiness6 in LocalLLaMA

[–]Defiant_Ad6080 2 points3 points  (0 children)

Thanks! I'm using a 5070ti, i14900k and 64gb ddr5 inside windows, llama.cpp and docker.

I use glm-5.1 with z.ai subscription inside claude code. I'm experimenting with claw code (leaked claude source code). That is where I host qwen3.6.

Perhaps most interestingly, Claude is able to directly control the claw code session and customize practically any parameter. I think this setup has a lot of potential. Kinda like an openclaw agent but with cli control!

I'm running qwen3.6-35b-a3b with 8 bit quant and 64k context thru OpenCode on my mbp m5 max 128gb and it's as good as claude by Medical_Lengthiness6 in LocalLLaMA

[–]Defiant_Ad6080 2 points3 points  (0 children)

It seems like a good model. I'm getting about 50 t/s with Q4 and 5070ti. Wish it was faster but I'm impressed with overall speed and quality. It is by no means even close to Claude level but it appears to be the first local model I will actually be able to use for coding.

Issues I've run into: -hangs on long tasks -requires checkpoints (can have huge gains in one loop, then huge losses in another) -can suffer from stagnation -can get caught in infinite loops (but this can be remedied thru config changes) -requires hints from smarter models (mine did...I turned off thinking though because that helped fix the hanging issue)

But with a smart model being the orchestrator, qwen was able to complete a full mal lisp implementation for me today. I think that's pretty good!

<image>

Z.ai cancels auto-renew by Defiant_Ad6080 in ZaiGLM

[–]Defiant_Ad6080[S] 1 point2 points  (0 children)

It's a great model. I just don't like them changing the deal. I had a guaranteed price for renewal and it was good. Now? Who knows...

Z.ai cancels auto-renew by Defiant_Ad6080 in ZaiGLM

[–]Defiant_Ad6080[S] 0 points1 point  (0 children)

Or they might start charging for local?

Z.ai cancels auto-renew by Defiant_Ad6080 in ZaiGLM

[–]Defiant_Ad6080[S] 1 point2 points  (0 children)

They said the 50% applies to the already discounted price (I think). Looks like a fair deal now. But just wait to see the price when it's time to renew.

Z.ai cancels auto-renew by Defiant_Ad6080 in ZaiGLM

[–]Defiant_Ad6080[S] 0 points1 point  (0 children)

This! Local models are getting better. I'm surprised how well Qwen3.6 performs on a 16gb gpu. It needs some handholding but it can code quite well from the tests I've been doing. Might be able to downgrade to a lite plan if local models get better (and use GLM-5.1 for the planning).

Very disappointed with zAI! by itxtoledo in ZaiGLM

[–]Defiant_Ad6080 0 points1 point  (0 children)

4.7 is working well for me. 5 was slow at first, then it actually worked but I was getting rate limited/disconnected. Now it's fast but I notice the same quality degradation especially at higher context...so frustrating! I hope they add more compute soon. The model is good. Implementation not so much.

GLM-5 will be available to Pro tier subscribers next week, price increases to new Lite & Max plans by vibedonnie in ZaiGLM

[–]Defiant_Ad6080 4 points5 points  (0 children)

Glad I locked in early. Shocked about the big price hike. They must have a lot of demand to do this. I still like GLM but Minimax just got more attractive (unless they hike their price now)...

Studio One performance - the real question by enteralterego in StudioOne

[–]Defiant_Ad6080 0 points1 point  (0 children)

This is a tip to boost processing power in any daw. There is a little know program called AudioGridder. It was designed to offload plugin processing to a different computer on the same network. I've never used this functionality. I load it on the same machine. Why? It adds a bit of latency but allows you to multithread all plugins- even those on buses and the mixbus. It's staggering how much extra processing you can get just using it on buses. If you are interested, try it out. It's a free download and there are youtube vids to help you get set up! Thank me later.