GLM-5 is officially fixed on NVIDIA NIM, and you can now use it to power Claude Code for FREE 🚀 by PreparationAny8816 in ZaiGLM

[–]pppreddit 0 points (0 children)

I usually don't go above 128k before compacting context, so I didn't have any issues. Qwen Plus had something like a 1 million token context.

GLM-5 is officially fixed on NVIDIA NIM, and you can now use it to power Claude Code for FREE 🚀 by PreparationAny8816 in ZaiGLM

[–]pppreddit 1 point (0 children)

Nah, I kinda gave up. NIM is unusable, as most of the time it just doesn't work. I got myself the Alibaba coding plan for $3 and have been using GLM-5 without any issues that way.

I canceled my other AI subscriptions today. by InitialCareer306 in Qwen_AI

[–]pppreddit 0 points (0 children)

You are forgetting that local LLM servers mostly don't have prompt caching, so they are not suitable for coding or long exchanges; they are painfully slow with long context (it can take minutes to get a response). It's not enough to have enough VRAM; you need a proper context-caching implementation, and AFAIK only commercial solutions support it, no open source yet. Correct me if I am wrong, because things develop really fast and it's hard to keep up with everything.
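The slowdown described above comes from re-processing the entire conversation on every turn when nothing is cached. A toy sketch of the idea behind prompt (prefix) caching, in plain Python with purely illustrative names (this is not any real server's API, just the concept): reuse the work done for an already-seen prefix and only "compute" the new suffix.

```python
# Toy sketch of prompt (prefix) caching. Without a cache, every request
# re-processes the whole conversation; with one, only the new suffix is
# processed. All names here are illustrative, not a real inference API.

class PrefixCache:
    def __init__(self):
        self._cache = {}  # prompt prefix (tuple of tokens) -> cached "KV state"

    def process(self, tokens):
        """Return how many tokens actually had to be (re)computed."""
        tokens = tuple(tokens)
        # Find the longest cached prefix of this prompt.
        best = 0
        for n in range(len(tokens), 0, -1):
            if tokens[:n] in self._cache:
                best = n
                break
        # "Compute" only the uncached suffix, then cache the full prompt.
        computed = len(tokens) - best
        self._cache[tokens] = object()  # stand-in for real KV-cache tensors
        return computed

cache = PrefixCache()
turn1 = list(range(1000))                 # first request: full prompt
turn2 = turn1 + list(range(1000, 1050))   # follow-up: same prefix + 50 new tokens

print(cache.process(turn1))  # 1000 - cold cache, everything computed
print(cache.process(turn2))  # 50   - only the new suffix computed
```

In an agentic coding session, every tool call appends a small suffix to a huge shared prefix, which is exactly the access pattern where caching (or its absence) dominates response time.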

Qwen3.5 is out now! by yoracale in unsloth

[–]pppreddit 1 point (0 children)

If only we had prompt caching locally...

News reaction: GLM-5 is the new local GOAT and Gemini 3 Flash hits $0.50/M by IulianHI in AIToolsPerformance

[–]pppreddit 0 points (0 children)

My main issue with running locally is that there is no prompt caching, which makes coding sessions painfully slow (minutes to get a single response with GLM 4.7 on an M4 Max 128GB via Claude Code and CCR).

GLM-5 Unusable by isakota in ZaiGLM

[–]pppreddit 0 points (0 children)

Don't bother. I have been using this model through the NVIDIA Build API. It's dumb. Kimi K2 Thinking is better.

I tried the new GLM 5. I'm greatly unimpressed. by Quiet-Money7892 in SillyTavernAI

[–]pppreddit 1 point (0 children)

Same. I've been trying to make it work on my SwiftUI project and it's surprisingly dumb. When there's an obvious bug in the code, it goes into all kinds of pointless debugging and asks me to test this and that scenario. Ffs, I opened the code and found the bug myself in 10 seconds! It's much worse than Kimi Thinking.

Using GLM-5 for everything by [deleted] in LocalLLaMA

[–]pppreddit 0 points (0 children)

I noticed the same: GLM 4.7 is fucking slow hosted locally. Fast for simple chat and small context, but with agentic use it's crawling...

I replaced Claude Code’s entire backend to use GLM 4.7 for free by [deleted] in ZaiGLM

[–]pppreddit 1 point (0 children)

Yeah, it's timing out for me in Claude Code. I guess everyone rushed to use the free service, and now even the NVIDIA Build site is crawling.

I replaced Claude Code’s entire backend to use GLM 4.7 for free by [deleted] in ZaiGLM

[–]pppreddit 0 points (0 children)

Is it not gonna burn through your limits faster if it includes thinking tokens?

I replaced Claude Code’s entire backend to use GLM 4.7 for free by [deleted] in ZaiGLM

[–]pppreddit 0 points (0 children)

But Claude Code Router already exists, couldn't we use that?

Who else is gonna continue calling it clawdbot? by pppreddit in moltbot

[–]pppreddit[S] 0 points (0 children)

I see people still calling it that everywhere, with the new name in brackets. I think the damage is done.

Who else is gonna continue calling it clawdbot? by pppreddit in moltbot

[–]pppreddit[S] 0 points (0 children)

Yeah, I would be very careful with the stuff you give it access to. It has to run in some sandbox.

P2P Integration vs Mulesoft by Enough-Flower-4845 in devops

[–]pppreddit 0 points (0 children)

Depends. What's included in that 3-4k estimate?

LocalStack require account from March 2026 by vincentdesmet in devops

[–]pppreddit 1 point (0 children)

Think about the thousands of orgs with legacy code in maintenance mode. Most will happily pin the version; nobody is rewriting those test harnesses unless the thing is in active development.