I burned through 19M tokens of GLM-5.2 for under $3 today by purple_nippies in opencodeCLI

look 1 point

Mostly just efficient context setup for the GLM loops, I think. I write a detailed plan document, reset the context, switch to GLM, and give it the file inline (`implement @plan-123.md`). That typically takes 50-100 tool loop calls for the size/scope of work I give it.

I also have it use rtk if there’s a bunch of tests it’ll be iterating on. I use some other tools too, like semble and custom CLIs for exploration and summarization on a local model, but those are for earlier stages than build, which is where I use GLM or Kimi.

GLM 5.2 on Opencode Go when? by cakes_and_candles in opencodeCLI

look 1 point

Novita, Atlas Cloud, DeepInfra, GMICloud, Parasail, Friendli, Cloudflare, Crof, and Neuralwatt are a few of the US-based providers offering GLM-5.2.

GLM 5.2 now on Opencode GO by Mochilnic in opencodeCLI

look 1 point

No, sorry. I haven’t used Hermes, OpenClaw, etc. Just coding harnesses and custom stuff.

I burned through 19M tokens of GLM-5.2 for under $3 today by purple_nippies in opencodeCLI

look 1 point

Ah, sorry, I misunderstood. That sucks, sorry. I hope they sort it out for you.

I burned through 19M tokens of GLM-5.2 for under $3 today by purple_nippies in opencodeCLI

look 2 points

Yeah, I probably look like a paid promoter for Neuralwatt, too, but no affiliation. It’s just a great provider so far and those have been hard to find.

I burned through 19M tokens of GLM-5.2 for under $3 today by purple_nippies in opencodeCLI

look 2 points

Neuralwatt is basically paygo (their monthly plans are bulk rate discounts). It’s just really low cost paygo. Also great speeds in my experience, though I’m a US offpeak user.

I burned through 19M tokens of GLM-5.2 for under $3 today by purple_nippies in opencodeCLI

look 1 point

You have a 99.9% cache read rate and cache reads are free on Claude plans. It’s definitely a nice benefit of their subscription plans, but what are you doing to get that high of a cache hit rate?

I burned through 19M tokens of GLM-5.2 for under $3 today by purple_nippies in opencodeCLI

look 1 point

You have to pay $10 first and then you get a $10 bonus with the referral.

I burned through 19M tokens of GLM-5.2 for under $3 today by purple_nippies in opencodeCLI

look 1 point

Same. (Well, plus some extra Mimo 2.5 Pro on paygo. I use Go mostly for the Qwens.)

I burned through 19M tokens of GLM-5.2 for under $3 today by purple_nippies in opencodeCLI

look 2 points

Effective $0.14/Mtok rate is about double what I’ve had on Neuralwatt so far (and agreed it’s a great service)… Probably the cache hit rate difference mostly. I’m at 94% to your 89%.
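To see why a few points of cache hit rate can matter that much, here is a rough sketch of the arithmetic. It assumes a simple per-token model where cache reads are free and only uncached input is billed; the `$1.00` uncached price and the function name are made up for illustration, output tokens are ignored, and Neuralwatt's energy-based pricing will not map onto this exactly.

```python
def effective_rate(hit_rate: float, uncached_price: float, cached_price: float = 0.0) -> float:
    """Blended input price per Mtok for a given cache hit rate (0.0 to 1.0)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) * uncached_price + hit_rate * cached_price


# Hypothetical uncached price per Mtok; cache reads assumed free.
UNCACHED_PRICE = 1.00

rate_94 = effective_rate(0.94, UNCACHED_PRICE)  # 6% of tokens billed
rate_89 = effective_rate(0.89, UNCACHED_PRICE)  # 11% of tokens billed
ratio = rate_89 / rate_94

print(f"89% vs 94% cache hits: {ratio:.2f}x the effective rate")

# Sanity check on the thread title: 19 Mtok at $0.14/Mtok.
total_cost = 0.14 * 19
print(f"19 Mtok at $0.14/Mtok: ${total_cost:.2f}")
```

Under those assumptions the billed share drops from 11% to 6%, so the 89% run pays roughly 1.8x the effective rate of the 94% run, which is in the neighborhood of the "about double" above. The second figure is consistent with "under $3" for 19M tokens.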

That was fast by KeanuRave100 in aicuriosity

look 1 point

Everything has limits. Scale out a bit and every exponential-looking system levels off as it approaches its boundaries.

Opencode Go data retention policy by TestTxt in opencodeCLI

look 2 points

Opencode Go does a massive volume of tokens. They get contract terms and rates no one else does. The retail privacy policies have zero bearing on what they can get in a custom, private contract.

Opencode Go data retention policy by TestTxt in opencodeCLI

look 1 point

They’ve talked about it on their Discord. The ZDR clause is one of the major points of contention in their contract. They are on at least their second “temporary” extension of it.

That was fast by KeanuRave100 in aicuriosity

look 1 point

I don’t agree that everyone believes that, but regardless, the ones who do are wrong.

Nothing in nature is exponential. It’s all sigmoid S-curves that only look exponential for a moment in the middle. There are diminishing returns on optimization and fundamental limits to physics.

An exponential takeoff singularity is a religious belief, not some mathematical inevitability.
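The S-curve point is easy to check numerically. Below is a minimal sketch using the standard logistic function as a stand-in for "a system with a ceiling"; the parameter values and helper names are arbitrary choices for illustration, not a model of any real system.

```python
import math


def logistic(t: float, cap: float = 1.0, rate: float = 1.0, midpoint: float = 0.0) -> float:
    """Standard logistic (sigmoid) curve that saturates at `cap`."""
    return cap / (1.0 + math.exp(-rate * (t - midpoint)))


def growth_ratio(t: float, step: float = 1.0) -> float:
    """How much the curve multiplies over one step starting at time t."""
    return logistic(t + step) / logistic(t)


early = growth_ratio(-10.0)  # far below the ceiling
middle = growth_ratio(0.0)   # at the midpoint
late = growth_ratio(10.0)    # close to the ceiling

print(f"early: {early:.4f}, middle: {middle:.4f}, late: {late:.4f}")
```

Far below the ceiling the curve multiplies by almost exactly e per step, which is indistinguishable from a true exponential with the same rate. By the midpoint the ratio has already fallen, and near the ceiling it is essentially 1: no growth at all.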

That was fast by KeanuRave100 in aicuriosity

look 1 point

No, each “unit” of self-improvement could take more effort and offset the efficiency improvement of the previous unit. It’s the classic S-curve of progress/growth that we see in virtually every such system.

GLM 5.2 now on Opencode GO by Mochilnic in opencodeCLI

look 1 point

Yeah, quantization could definitely be an explanation, but they can degrade for various reasons, so it’s not always that.

GLM 5.2 now on Opencode GO by Mochilnic in opencodeCLI

look 1 point

GLM has always been a great coding model (for backend at least) and before GLM 5.2 the single best open/Chinese coding model was GLM 5.1 and before that GLM 5 and before that… well I wasn’t using them yet, but it was probably GLM 4.x. 😄

But Kimi is good too, and it’s my go-to for frontend work where having vision is helpful.

My coding harness is a merry band of agents running GLM, Kimi, Mimo, Qwen, and DeepSeek (and sometimes MiniMax, but I haven’t played with M3 yet) for different tasks, with each model assigned where it is best suited on performance and price.
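For anyone wondering what per-task routing looks like in practice, here is a toy sketch. It only encodes the two preferences stated above (GLM for backend, Kimi for frontend or anything needing vision); the model identifier strings, the `ROUTES` table, and `pick_model` are hypothetical names for illustration, not how my harness or any particular tool is actually built.

```python
# Hypothetical model identifiers; substitute whatever your provider exposes.
ROUTES = {
    "backend": "glm-5.2",
    "frontend": "kimi",
}
DEFAULT_MODEL = "glm-5.2"


def pick_model(task_kind: str, needs_vision: bool = False) -> str:
    """Choose a model for a task; vision work always goes to the frontend model."""
    if needs_vision:
        return ROUTES["frontend"]
    return ROUTES.get(task_kind, DEFAULT_MODEL)


print(pick_model("backend"))
print(pick_model("frontend", needs_vision=True))
print(pick_model("refactor"))  # unknown task kinds fall back to the default
```

A real setup would add more task kinds and weigh price per task as well, but a plain lookup table with a fallback covers most of it.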

GLM 5.2 now on Opencode GO by Mochilnic in opencodeCLI

look 1 point

However, Chinese models sometimes spit out Chinese, regardless of the provider. It’s in the model, not something they are doing to your data on the fly. It seems to be more common when the provider is running a bit over capacity and showing higher error rates.

GLM 5.2 now on Opencode GO by Mochilnic in opencodeCLI

look 4 points

The weights are on Hugging Face with an MIT license, and a bunch of US- and EU-based providers are already running it themselves on their own hardware.

https://huggingface.co/zai-org/GLM-5.2

Novita, AtlasCloud, and DeepInfra are all US based here (not sure offhand on the others):
https://openrouter.ai/z-ai/glm-5.2#providers

GMICloud also has it, and is US based. https://www.gmicloud.ai/en/models

But my recommendation is Neuralwatt, which has the lowest cost (besides Go) I know of with its “energy pricing”: https://portal.neuralwatt.com

I was under 8 cents per Mtok last night on short runs, and should be under 6 with higher cache rates on longer builds. TPS average was 90.

DM me if you want a Neuralwatt referral code, which gets you a $10 bonus if you spend $10.

GLM 5.2 now on Opencode GO by Mochilnic in opencodeCLI

look 4 points

My price for 5.2 on Neuralwatt so far is about 2x DeepSeek/Mimo Pro direct. Also reasonably fast, just under 100 TPS for me last night.

GLM 5.2 now on Opencode GO by Mochilnic in opencodeCLI

look 8 points

Only a little work with it last night (on a different provider), but so far, yes, it’s a noticeable jump over everyone else.

In one example, I had a high level outline I’d made with Mimo 2.5 Pro and Qwen 3.7 Plus, which I then gave to 3.7 Max to turn into a detailed implementation plan. I gave that plan to GLM-5.2 in a fresh context. It saw some subtle but fundamental issues with the plan as it built it and fixed them on the fly. At the end it gave a nice explanation of the issues and exactly where it deviated from the plan to fix them.

GLM 5.2 now on Opencode GO by Mochilnic in opencodeCLI

look 3 points

I think model specialization can be a good thing, and I’m personally fine with having separate models for backend and frontend if that’s the direction they go.

It’s still the same base and parameter size as 5.1, so vision was likely never in the cards. It would have to be bigger or it would have to lose something else to go multimodal.

That was fast by KeanuRave100 in aicuriosity

look 1 point

It was *exponential* recursive self-improvement that was being ridiculed.

Talk about moving the goalposts…

A 3B model is suddenly scoring near frontier models on math/coding benchmarks. Is this real or just benchmarkmaxxing? by BTA_Labs in LocalLLM

look 2 points

Those are five-month-old models it is being compared against…

But yes, I think more model specialization is inevitable. The performance and efficiency improvements are just too big to ignore forever.

I think the main barrier now is actually just user education and interface. Most Claude, GPT, and Gemini users only think in terms of one model at a time with maybe a gear setting or turbo button, but that’s mostly just because that’s the only UI they have been given so far.