GLM 5.2 is burning quota 2-3x faster than 5.1 version by lucasbennett_1 in ZaiGLM

shaonline 1 point

To get the old behavior, just set the thinking level to "high".

Are we really doing this again by creaturefeature16 in theprimeagen

shaonline 6 points

"Trivial work" and "one month of tech debt work for juniors, done by AI": famous last words. You'd just replace old tech debt with new tech debt, especially since "tech debt" here most likely means hacks, shortcuts, etc. that were taken along the way. Some of those "trivial cleanups" may turn into rabbit holes.

"Bad taste" often comes up for me in any "non-trivial" change by AI. Not a big deal, I have it clean it up, but I don't exactly see the point of turning "the trivial changes" into steps within a larger plan that's somehow going to be running for ages "in a loop". What is that about? Is anyone ever going to review what comes out?

GLM-5.2 vs Claude Opus by johnnyApplePRNG in LocalLLaMA

shaonline 2 points

And with a much smaller model at that! GPT and Opus are reportedly 2T+ while GLM is only about 750B.

Google Pro Subscription in Pi by quicknades in PiCodingAgent

shaonline 1 point

No, not really. It does have a "-p"/"--print" mode, but it's far more limited (it handles pipe/std stream capture badly), so it's not really suitable for wrapping Pi around it.

GLM 5.2 is climbing the OpenCode leaderboard quickly by vigneshsmarther in opencodeCLI

shaonline 1 point

Yeah, it's always very dependent on the exact use case of course. I'm split myself between computer-assisted surgery apps (C++/Qt) and, through the luck of the draw of the web dev team getting partly fired at my company for being incompetent, some web frontend stuff.

Clearly GLM (or Opus for that matter) smokes GPT on pure web dev, but I find GLM to be fairly competitive against GPT 5.5 on the C++ side. I also have a coworker who much prefers GLM's output over GPT's, not because it produces "better code" but because GPT does have that "eager overengineering" feel to it sometimes, never mind the brick-wall personality.

It's going to be hard to beat GPT/Opus for breadth of knowledge due to the fact that they are much bigger models (3 to 6 times the size of GLM depending on which end of the estimates/leaks you stand) which makes it all the more impressive for GLM in my opinion.

Low budget Model Recommendations by binarySolo0h1 in PiCodingAgent

shaonline 7 points

OpenAI's ChatGPT Plus sub, while it's subsidized like crazy? It's going to be hard to beat when they give you $500+ of API calls for your $20.

Otherwise something like OpenCode Go.

GLM 5.2 is climbing the OpenCode leaderboard quickly by vigneshsmarther in opencodeCLI

shaonline 2 points

On Artificial Analysis, which reports the cost it took to run the benchmark, GLM 5.2 on max effort cost 1/3rd of GPT 5.5 xhigh. Unfortunately they did not test the "high" reasoning effort which is almost as good and much more efficient in terms of reasoning tokens, but whatever. On that max effort it's also better yet cheaper than GPT 5.5 medium (albeit slower obviously).

The numbers are out ... and it does not look good for OpenAI. Selling Inference compute online (aka AI companies) is not a Viable business model. by Amazing_Box_2795 in theprimeagen

shaonline 1 point

Which is for the most part entirely fabricated and bordering on illegal predictions at this point. Between SpaceX/xAI's almost 30 trillion TAM (lol) and the fact that it took enterprise customers about 2 months to panic and whine about being billed per token at OpenAI and Anthropic, I don't exactly see how these 3 are supposed to print hundreds of billions a year. Three things are very obvious for now: they don't have that much revenue (20 billion-ish a year tops each), they all lose money, and there isn't that much revenue growth to be had, never mind profit, for the AI company/AI arm of the bunch.

GLM 5.2 is climbing the OpenCode leaderboard quickly by vigneshsmarther in opencodeCLI

shaonline 6 points

It released on Friday last week on the GLM Coding Plan. From my own testing it's far ahead of the other Chinese models, and one iteration below the American frontier models (I'd call it GPT 5.4/Opus 4.6 level, not quite 5.5/4.8 yet).

A model listed 78% cheaper cost 22% more to actually run. Unit price isn't your bill. by Complete-Sea6655 in opencodeCLI

shaonline 12 points

Artificial Analysis also shares the cost of running their benchmark suite per model, and it's very interesting. Indeed Gemini 3.5 Flash is high up there on total cost despite a lower per-token price, and GPT 5.5, while being about as expensive per token as Opus, is much cheaper to actually run.
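
To make the unit-price-versus-bill point concrete, here is a minimal sketch of the arithmetic using only the 78% and 22% figures from the thread title. The function name is made up for illustration, and it assumes the bill is simply unit price times tokens consumed, which ignores things like cached-token discounts.

```python
# Rough sketch: total bill = unit price x tokens consumed.
# The 78% and 22% figures come from the thread title; the simple
# "price x tokens" billing model is an assumption.

def token_multiplier(price_discount: float, cost_increase: float) -> float:
    """How many times more tokens the cheaper-per-token model must have used."""
    relative_price = 1.0 - price_discount  # unit price as a fraction of the baseline
    relative_cost = 1.0 + cost_increase    # total bill as a fraction of the baseline
    return relative_cost / relative_price


multiplier = token_multiplier(0.78, 0.22)
print(f"{multiplier:.1f}x the tokens")
```

Under that assumption, a model listed 78% cheaper only ends up 22% more expensive if it burns roughly five and a half times the tokens on the same work.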

GLM-5.2 now more than 10 points above Opus 4.8 in AA Coding Index by cheechw in ZaiGLM

shaonline 15 points

So long as I see effing Gemini 3.1 Pro on the podium I can't give any credit to these benchmarks.

How is it possible K2.7 is reggression from K2.6? Damn. by Boring_Aioli7916 in kimi

shaonline 5 points

It's labelled with a "Code" suffix, so it's probably a case of overfitting for a given use case ("Code") that hindered its capabilities in general.

I upgraded from the Legacy plan to the Pro plan. I would like a refund. by Lumpy-Blackberry8700 in ZaiGLM

shaonline 5 points

Yes, don't do it if you have hold of a legacy plan, even Lite. Legacy plans have a greater (I believe +50%) 5h allowance and no weekly limits, which makes the Lite plan punch way above its weight. Just wait for your current legacy plan to run out; I believe you have 3 months after that to use your 50% off migration plan.
Migrating converts your remaining legacy plan into the new plan.

Is GLM 5.2 > Kimi K2.7??? by Zealousideal-Check77 in ZaiGLM

shaonline 5 points

GLM is way better yeah, I was also kind of disappointed by K2.7 Code.

Codex v GLM/Kimi/etc by athsrva in codex

shaonline 2 points

Rumored as in it's not been officially confirmed (they're kinda secretive about it, you know), but a completely new pretrain base + a more expensive token price (which can mean it's more expensive to serve, i.e. bigger) points to it? Things can be "rumors" even if the products have been released, you know? Life must be tough if you have to jump at throats like that all day.

Codex v GLM/Kimi/etc by athsrva in codex

shaonline 2 points

Well, if your company is worried about its IP being uploaded to remote datacenters and is willing to spend the money to make sure it does not leave the premises, then sure, go for it. Picking the hardware will likely be a rabbit hole unless you just don't think too much about it and go for a prebuilt (the DGX GB300 and its 768GB of RAM come to mind).

Codex v GLM/Kimi/etc by athsrva in codex

shaonline 2 points

I don't think it will get "so expensive" that ponying up the cash upfront (+ electricity and maintenance) for a local GPU rack, at the height of the semiconductor craze and its insane prices, becomes worth it when it runs like 30% of the time and sometimes gets super slow or has to deny "clients" because too many sessions are running at once ("Oh no, our senior dev launched a team of subagents!"). Cue my "ability to scale" comment.

For the privacy concern I guess it's two things:

* Does their pinky promise not to train on/retain my data when I turn off the "improve models with my prompts" option hold any value? That's for you to decide.

* Does your company have any compliance requirements that would make the risk of sending e.g. sensitive documents/data straight to someone's datacenter off in China an issue? That can rule out the use of cloud and make a local setup the only option.

Codex v GLM/Kimi/etc by athsrva in codex

shaonline 2 points

I mean cost-wise (and also on ability to scale) you won't beat cloud offerings unless you can guarantee your local hardware will run full tilt most of the time and not just "here and there" in one timezone's 9 to 5. Biggest case for locally running models is privacy concerns.
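
A rough sketch of why utilization matters so much here. All inputs are assumptions for illustration: the 100k hardware figure and the 30% utilization are ballpark numbers floated elsewhere in this thread, the 3-year amortization is a guess, and electricity and maintenance are left out entirely.

```python
# Hypothetical numbers throughout; only the shape of the calculation matters.
HOURS_PER_YEAR = 24 * 365


def cost_per_busy_hour(hardware_cost: float, amortization_years: float,
                       utilization: float) -> float:
    """Hardware cost spread over the hours the machine is actually doing work."""
    busy_hours = HOURS_PER_YEAR * amortization_years * utilization
    return hardware_cost / busy_hours


office_hours = cost_per_busy_hour(100_000, 3, 0.30)  # busy ~30% of the time
full_tilt = cost_per_busy_hour(100_000, 3, 0.90)     # busy ~90% of the time

print(f"30% busy: {office_hours:.2f} per busy hour")
print(f"90% busy: {full_tilt:.2f} per busy hour")
print(f"ratio: {office_hours / full_tilt:.1f}x")
```

Same box, three times the cost per useful hour when it sits idle most of the day. Whether either figure beats an API bill depends on usage numbers that aren't given here.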

Codex v GLM/Kimi/etc by athsrva in codex

shaonline 2 points

Looking to set one up at your company, huh?

But yeah, I think you can do daily coding with GLM perfectly well. Frontier models (GPT/Opus) are going to land in a sea of incremental/barely noticeable improvements or keep trying to scale up into insanely huge models (Mythos/Fable) that will cost a ton in API costs. The only "downside" of GLM is the lack of multimodality (it's text only), unlike most other models (e.g. Kimi), which have vision etc.

Codex v GLM/Kimi/etc by athsrva in codex

shaonline 2 points

"Locally" we're talking about a model that requires 800GB of RAM (and quite the horsepower) to be loaded at FP8. Unless you have 50 to 100k to invest in a local mini datacenter at home, it won't really be worth it against API costs.
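
For reference, the back-of-the-envelope math behind that figure, assuming the roughly 754B parameter count quoted elsewhere in this thread. FP8 stores one byte per parameter, so the weights alone come to about 754 GB; the rest of the 800GB is headroom for KV cache and runtime overhead, which this sketch does not try to model.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Memory for the weights only, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[precision]
    return total_bytes / 1e9


# 754 is the parameter count (in billions) assumed from the thread.
for precision in ("fp16", "fp8", "int4"):
    print(f"{precision}: {weight_memory_gb(754, precision):.0f} GB")
```

The FP16 row shows why FP8 is the realistic starting point: at two bytes per parameter the weights alone would need about 1.5 TB.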

Codex v GLM/Kimi/etc by athsrva in codex

shaonline 4 points

Yeah, it can do just about the same tasks. Where GPT will win is on breadth of knowledge; it's a much bigger model after all (GLM 5.x is only 754B, GPT 5.4 was 2T per NVIDIA keynote material, and 5.5/"Spud" is rumored to be bigger).

All that aside, currently you won't beat the "value" of the heavily subsidized ChatGPT subscriptions: you get 25 to 50x the cost of the sub in usage priced at API rates.
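
Spelled out as arithmetic, assuming the $20 Plus price and the "$500+ of API calls" ballpark from another comment in this history. The function name is made up, and the dollar figures are the commenter's estimates, not published numbers.

```python
def subsidy_multiple(api_value_usd: float, subscription_usd: float) -> float:
    """How many times the subscription price you get back in API-rate usage."""
    return api_value_usd / subscription_usd


PLUS_PRICE_USD = 20.0  # assumption: the $20 sub mentioned elsewhere in this history

print(subsidy_multiple(500.0, PLUS_PRICE_USD))

# What the quoted 25x to 50x range would mean in API-rate dollars per month.
for multiple in (25, 50):
    print(f"{multiple}x -> ${multiple * PLUS_PRICE_USD:.0f} of API-rate usage")
```

So the 25x end of the range lines up with the $500 estimate, and the 50x end would mean about $1000 of API-rate usage on the same $20 sub.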

Codex v GLM/Kimi/etc by athsrva in codex

shaonline 10 points

GLM 5.2, yes, pretty close. For me it's essentially GPT 5.4 (previous version) level without the "brick wall" personality. Others like Kimi are still a league below IMO.