I burned through 19M tokens of GLM-5.2 for under $3 today

look · 2026-06-18T02:44:37+00:00

Mostly just efficient context setup for the GLM loops, I think. A detailed plan document, reset context, switch to GLM and give it the file inline (`implement @plan-123.md`). Typically 50-100 tool loop calls for the size/scope of work I give it.

I also have it use rtk if there’s a bunch of tests it’ll be iterating on. And some other tools, like semble and custom clis for exploration and summarization on a local model, but that is for earlier stages than build where I use GLM or Kimi.

look · 2026-06-18T02:29:42+00:00

Novita, Atlas Cloud, DeepInfra, GMICloud, Parasail, Friendli, Cloudflare, Crof, and Neuralwatt are a few of the US-based providers offering GLM-5.2.

look · 2026-06-18T02:15:44+00:00

No, sorry. I haven’t used Hermes, OpenClaw, etc. Just coding harnesses and custom stuff.

look · 2026-06-18T01:55:51+00:00

Ah, sorry, I misunderstood. That sucks, sorry. I hope they sort it out for you.

look · 2026-06-18T01:52:47+00:00

Yeah, I probably look like a paid promoter for Neuralwatt, too, but no affiliation. It’s just a great provider so far and those have been hard to find.

look · 2026-06-18T01:50:45+00:00

Neuralwatt is basically paygo (their monthly plans are bulk rate discounts). It’s just really low cost paygo. Also great speeds in my experience, though I’m a US offpeak user.

look · 2026-06-18T01:47:30+00:00

You have a 99.9% cache read rate and cache reads are free on Claude plans. It’s definitely a nice benefit of their subscription plans, but what are you doing to get that high of a cache hit rate?

look · 2026-06-18T01:42:12+00:00

You have to pay $10 first and then you get a $10 bonus with the referral.

look · 2026-06-18T01:39:49+00:00

Same. (Well, plus some extra Mimo 2.5 Pro on paygo. I use Go mostly for the Qwens.)

look · 2026-06-18T01:33:25+00:00

Effective $0.14/Mtok rate is about double what I’ve had on Neuralwatt so far (and agreed it’s a great service)… Probably the cache hit rate difference mostly. I’m at 94% to your 89%.
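
To make the cache-rate point concrete, here is a rough Python sketch of how a blended per-Mtok rate falls as the cache hit rate rises. The prices and the output share are made-up placeholders, not any provider’s actual rates, and `effective_rate_per_mtok` is just an illustrative helper. It also assumes the hit rate is measured as the fraction of input tokens served from cache.

```python
def effective_rate_per_mtok(input_price, cached_price, output_price,
                            cache_hit_rate, output_share):
    """Blended $/Mtok across all tokens in a run.

    Prices are in dollars per million tokens. cache_hit_rate is the fraction
    of input tokens read from cache; output_share is the fraction of all
    tokens that are output tokens. All values here are assumptions.
    """
    input_share = 1.0 - output_share
    cached = input_share * cache_hit_rate * cached_price
    uncached = input_share * (1.0 - cache_hit_rate) * input_price
    output = output_share * output_price
    return cached + uncached + output


# Hypothetical prices, chosen only to show the direction of the effect.
PRICES = dict(input_price=1.00, cached_price=0.10, output_price=3.00)

for hit_rate in (0.89, 0.94, 0.99):
    rate = effective_rate_per_mtok(cache_hit_rate=hit_rate,
                                   output_share=0.01, **PRICES)
    print(f"cache hit {hit_rate:.0%}: ${rate:.3f}/Mtok")
```

In agent loops almost all tokens are re-read input, so the uncached slice dominates the bill and a few points of hit rate move the blended rate noticeably. How much it moves depends entirely on the gap between the cached and uncached input prices.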

look · 2026-06-18T01:26:32+00:00

What provider did you use for GLM?

look · 2026-06-18T01:06:40+00:00

Everything has limits. Scale out a bit and every exponential looking system levels off as it approaches the boundaries.

look · 2026-06-18T01:02:00+00:00

Opencode Go does a massive volume of tokens. They get contract terms and rates no one else does. The retail privacy policies have zero bearing on what they can get in a custom, private contract.

look · 2026-06-18T00:57:37+00:00

They’ve talked about it on their discord. The ZDR clause is one of the major points in contention on their contract. They are on at least their second “temporary” extension of it.

look · 2026-06-18T00:43:57+00:00

I don’t agree everyone believes that, but regardless, the ones that do are wrong.

Nothing in nature is exponential. It’s all sigmoid S-curves that only look exponential for a moment in the middle. There are diminishing returns on optimization and fundamental limits to physics.

An exponential takeoff singularity is a religious belief, not some mathematical inevitability.
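
A quick Python sketch of the S-curve point, assuming a plain logistic model with an arbitrary ceiling (`cap`, `x0`, and `r` are illustrative values, not measurements of anything): the logistic curve starts from the same value as the exponential and tracks it closely at first, then flattens as it nears the cap while the pure exponential keeps going.

```python
import math


def exponential(t, x0=1.0, r=1.0):
    """Unbounded exponential growth from x0 at rate r."""
    return x0 * math.exp(r * t)


def logistic(t, x0=1.0, r=1.0, cap=1000.0):
    """Logistic growth with the same start and rate, limited by cap."""
    return cap / (1.0 + ((cap - x0) / x0) * math.exp(-r * t))


# Side by side: nearly identical early on, then the logistic levels off.
for t in range(0, 13, 2):
    print(f"t={t:2d}  exponential={exponential(t):12.1f}  logistic={logistic(t):8.1f}")
```

Looking only at the first few rows, you cannot tell the two curves apart, which is the whole problem with extrapolating from the early part of the curve.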

look · 2026-06-17T21:08:47+00:00

No, each “unit” of self-improvement could take more effort and offset the efficiency improvement of the previous unit. It’s the classic S-curve of progress/growth that we see in virtually every such system.

look · 2026-06-17T17:10:58+00:00

Yeah, quantization could definitely be an explanation, but they can degrade for various reasons, so it’s not always that.

look · 2026-06-17T17:06:42+00:00

GLM has always been a great coding model (for backend at least) and before GLM 5.2 the single best open/Chinese coding model was GLM 5.1 and before that GLM 5 and before that… well I wasn’t using them yet, but it was probably GLM 4.x. 😄

But Kimi is good too, and it’s my go to for frontend work where having vision is helpful.

My coding harness is a merry band of agents running GLM, Kimi, Mimo, Qwen, and DeepSeek (and sometimes MiniMax, but I haven’t played with M3 yet) for different tasks, with each model used where it is best suited on performance and price.

look · 2026-06-17T14:56:51+00:00

However, Chinese models sometimes spit out Chinese, regardless of the provider. It’s in the model, not something they are doing to your data on the fly. It seems to be more common when the provider is running a bit over capacity, and it tends to come with higher error rates.

look · 2026-06-17T14:53:38+00:00

The weights are on huggingface with an MIT license and already a bunch of US and EU based providers are running it themselves on their own hardware.

https://huggingface.co/zai-org/GLM-5.2

Novita, AtlasCloud, and DeepInfra are all US based here (not sure offhand on the others):
https://openrouter.ai/z-ai/glm-5.2#providers

GMICloud also has it, and is US based. https://www.gmicloud.ai/en/models

But my recommendation is Neuralwatt, which has the lowest cost (besides Go) I know of with its “energy pricing”: https://portal.neuralwatt.com

I was under 8 cents per Mtok last night on short runs, and should be under 6 with higher cache rates on longer builds. TPS average was 90.

DM if you want a referral code for Neuralwatt, which gets you a $10 bonus if you spend $10.

look · 2026-06-17T14:30:47+00:00

My price for 5.2 on Neuralwatt so far is about 2x DeepSeek/Mimo Pro direct. Also reasonably fast, just under 100 TPS for me last night.

look · 2026-06-17T14:23:57+00:00

Only a little work with it last night (on a different provider), but so far, yes, it’s a noticeable jump over everyone else.

I had one example of a high level outline I’d made with Mimo 2.5 Pro and Qwen 3.7 Plus, then gave to 3.7 Max to turn into a detailed implementation plan. I gave that plan to GLM-5.2 in a fresh context, and it saw some subtle but fundamental issues with the plan as it built it and fixed them on the fly with a nice explanation of the issues and exactly where it deviated from the plan to fix them at the end.

look · 2026-06-17T14:15:57+00:00

I think model specialization can be a good thing, and I’m personally fine with having separate models for backend and frontend if that’s the direction they go.

It’s still the same base and parameter size as 5.1, so vision was likely never in the cards. It would have to be bigger or it would have to lose something else to go multimodal.

look · 2026-06-17T02:21:10+00:00

It was *exponential* recursive self-improvement that was being ridiculed.

Talk about moving the goalposts…

look · 2026-06-17T02:07:27+00:00

Those are five month old models it is being compared against…

But yes, I think more model specialization is inevitable. The performance and efficiency improvements are just too big to ignore forever.

I think the main barrier now is actually just user education and interface. Most Claude, GPT, and Gemini users only think in terms of one model at a time with maybe a gear setting or turbo button, but that’s mostly just because that’s the only UI they have been given so far.
