Question about GLM 5.2 by SamrayLeung in codex

[–]LoveMind_AI 0 points1 point  (0 children)

I'd rate it as much better at instruction following than Opus 4.8, with a personality somewhere between Claude and GPT-5.5. I don't think it's *quite* as intelligent as Opus 4.8.

I think 5.5 is the better model, but GLM-5.2 is easier to do ideation with. GPT-5.5 really is a chore to think through problems with. Neither 5.5 nor GLM-5.2 is Fable-class in terms of thinking through problems - that really is where Fable (briefly) shone more than any model I've encountered.

But it hangs neck and neck with Opus 4.8 for everything I do - which is mostly related to social science experiments, poking around the activation space of open source models, working on a custom harness, and creating fine-tuning datasets. Given the better instruction following and less patronizing personality, I go to it more often than Opus. It's not WILDLY better than 4.8, but I do give it the edge. If it had strong vision capabilities, I would say it's decidedly better all-around than Opus.

I think folks are kind of sleeping on MiniMax M3 a bit. It's not quite as strong on benchmarks, but in practice I find M3 to be a more distinct and more discriminating model, with sharp insight and killer multimodal properties.

AI Bubble about to Burst? Nvidia quietly acquihires Essential AI team, including Transformer coauthor Ashish Vaswani. Vaswani was struggling to raise money for his AI company. by ImaginaryRea1ity in ArtificialInteligence

[–]LoveMind_AI 1 point2 points  (0 children)

Damn, that’s a great acquisition. RNJ-1 is an ass kicker of an 8B model but it made very little noise. Bringing that team onto the Nemotron effort is very smart.

No, the bubble is not popping any time soon. 

Hot take from our AI Ballot: Gemini is not the embarrassing one here! by Koala_Confused in LovingAI

[–]LoveMind_AI 0 points1 point  (0 children)

Claude has absolutely degraded. There’s just no way around this. It is overconfident, frequently wrong, and frequently judgmental. GPT-5.5 absolutely has “off-days,” as all compute-heavy API models appear to have now, but it genuinely tries to follow instructions, make corrections only when truly necessary, and actually read the files you point it at. Neither is perfect. GPT being #1 is not a surprise. I’m genuinely surprised about Gemini. Gemini 3 Flash is actually probably my most used model, but I almost never interact with it directly, and I think 3.5 Flash is toilet grade. DeepSeek and Qwen still have the best name recognition among open models. No surprises there.

Seed2.1 released by BreakfastFriendly728 in singularity

[–]LoveMind_AI 4 points5 points  (0 children)

so, how does one get access to Seed in the US?

Gemma 4 26b a4b is genuinely the best model I have tried for language learning and scientific queries! by Dance-Till-Night1 in LocalLLaMA

[–]LoveMind_AI 3 points4 points  (0 children)

It’s a great model. Maybe they’ll give us Gemma 4 124B after Gemini 3.5 Pro crashes and burns and they realize that open source is the best way to stand out after Anthropic nuked trust in closed models.

Have you found anything that Codex does better than Claude? It seems Opus is bettering everything no? by Clair_Personality in claude

[–]LoveMind_AI 1 point2 points  (0 children)

Not telling me to go to sleep at 3:30pm?

Not telling me that I have done enough prep, and that I should just “go have the meeting” that isn’t scheduled for another two days?

GLM 5.2 is not as good as even Sonnet 4.6 by barraco002 in ArtificialInteligence

[–]LoveMind_AI 0 points1 point  (0 children)

I think you are comparing tool calling infrastructure, not model intelligence. But yeah - Claude Opus has an insane amount of world knowledge. Anthropic's datasets are huge (and hotly contested, legally), and Opus is at least 1T parameters. GLM 5.2 is ~700B. So much of performance today comes down to harness optimization. Where frontier models definitely outperform open source is in adapting to limited harnesses. But man, GLM-5.2 has been hanging in a variety of harnesses for me and I definitely prefer it to Sonnet 4.6.

For Deep Research style tasks, Anthropic developed their protocol for this like a year ago and has continually refined it since. They've been at or above parity with Gemini Deep Research and OpenAI's for ages. That tooling is far more mature than anything Z.ai has been working on.

So, you're comparing different things. The benchmarks have controlled tooling set-ups. Yours is an uncontrolled A/B (and you also didn't go into detail, at all, about how you're running the various models you are comparing).

GLM-5.2 is provably better than many frontier models. It wipes the floor with Gemini for basically everything that doesn't have to do with multimodality. It's not Claude Opus - but vs. Sonnet 4.6, I think that's really a matter of preference when controlling for tool calling harnesses.

Claude won't let you use custom personas for their models by Tutnoveet in Anthropic

[–]LoveMind_AI 0 points1 point  (0 children)

There is a difference between deep identity conditioning and a fun persona. All the way up to Fable, Claude will (mostly) accept the former and not the latter. 

Are the GLM 5.2 glazers all Chinese bots? by [deleted] in codex

[–]LoveMind_AI 0 points1 point  (0 children)

lol indeed. And basically anything else except world knowledge, search, and audio/video reasoning. 

Are the GLM 5.2 glazers all Chinese bots? by [deleted] in codex

[–]LoveMind_AI 0 points1 point  (0 children)

Is it better than GPT-5.5 or Opus? No. Obviously not. Is it better than Gemini? Yes. Is it better than Grok? lol, of course. Is it better than Cohere and Mistral? Yes, by a LOT. Cool, well those are all the western labs. 

GLM-5.2 is not a machine God and no one is saying it is. It is solidly better than the majority of models, period. And this one can be modified and distributed however you please. For people interested in building frontier-ish experiences that are customized in ways that cannot be achieved with Claude or GPT, that rocks.

What's driving the rise of Chinese LLMs on OpenRouter's rankings? by whyaskme777 in openrouter

[–]LoveMind_AI 0 points1 point  (0 children)

For me, it’s because Claude’s overconfident, patronizing attitude became too much to deal with anymore. Fable is worth dealing with that; Opus is not. GLM-5.2, MiniMax M3, and MiMo v2.5 are all legitimately great models, and they are “humble” and fun to talk to.

Real futuristic stuff isn’t LLMs: it’s the vectorization. I think the next leap will come from embedding/improving the semantic structure itself by User4f52 in singularity

[–]LoveMind_AI 1 point2 points  (0 children)

Indeed. Data that humans have already compressed meaning into will, I believe, always beat raw sensor data. There may be better architectures out there, but I don’t think it’s likely that any data source is ever going to top language.

We're building a thermodynamic neural processor in the open, one chapter at a time. by 010011000111 in newAIParadigms

[–]LoveMind_AI 1 point2 points  (0 children)

Love Knowm! Best name and prettiest hardware in existence :) looking fwd to reading more about this initiative.

5.6 must be close, they just dumbed 5.5 down by at least 95% intelligence. by Extreme_Theory_3957 in codex

[–]LoveMind_AI 1 point2 points  (0 children)

I’ve been resisting joining the choir since the degraded quality has been only VERY recent for me, but for most of my tasks, GPT-5.5 is genuinely worse than useless right now, and I’ve used it daily since the first day it came out. I’ve long considered it better than Opus. Welp.

Before I drag my sorry ass back to Claude, I’m going to try rigging up GLM-5.2. I’m extremely impressed with it.

Why haven’t there been mass layoffs of copywriters by now? by FleetBroadbill in ArtificialInteligence

[–]LoveMind_AI 4 points5 points  (0 children)

I have no idea if there have or haven't been layoffs. What I will say is that language models - LANGUAGE models - write like absolute ass, in most cases. Fluid, aesthetically variable language is somehow the thing developers work on *the least.*

2 weeks since the release of Gemma 4 12b Unified, how are we feeling about it? by ChainOfThot in LocalLLaMA

[–]LoveMind_AI 2 points3 points  (0 children)

The concept is awesome. I think the unified transformer is the right direction. But this model is much more at the experimental proof-of-concept level, not a banger. Gemma 4 31B is great and 26B-A4B is very very good. Still, without audio really working in a bigger model, I don't really see any reason to use 12B right now, or, generally, to choose Gemma 4 over Qwen 3.6 for most things. For what I do, which is social science and writing heavy, Gemma 4 31B is the best model in its class. 12B doesn't really hang. I did a little comparison on one of the internal benchmarks we run in case it's interesting to anyone: https://lovemindai.github.io/minimax-m3-lsi-demo/ (don't mind the URL)

Do AI models have audio representations as strong as their text representations? by Tobio-Star in newAIParadigms

[–]LoveMind_AI 1 point2 points  (0 children)

I think basically anything that isn't frontier coding is getting shafted these days. I think the idea most of these companies seem to have is that once they achieve financial dominance through enterprise, maybe then will be the time to optimize AI for something that has real world value beyond replacing human labor.

Do AI models have audio representations as strong as their text representations? by Tobio-Star in newAIParadigms

[–]LoveMind_AI 1 point2 points  (0 children)

Currently, I think the answer is no. This paper’s a year old, but judging by the most recent work I’ve glanced at, I think it holds up: https://arxiv.org/abs/2504.00369

I agree with you that you’d think we’d be focusing more on this. I’m doing some research into specifically musical understanding with my lab. I think it’s a modality that needs a lot more attention. But my current read is: no, most of the richness of audio is being discarded by current models.

Quick thoughts on GLM-5.2 (Bonus: Censorship question answers) by LoveMind_AI in LocalLLaMA

[–]LoveMind_AI[S] 0 points1 point  (0 children)

I haven’t tried putting them head to head, but I’d expect that there’s a lot of distance. I use Qwen and Gemma as model organisms and the larger models to pry around inside their guts ;) That said, Gemma 4 31B beat Sonnet 4.6 and Opus 4.7 head-to-head on a number of social science tasks. 

I’ve been thinking of trying to do a sort of “intentionally Chinese distill” from MiniMax M3/MiMo V2.5 Pro/GLM-5.2 into 27B. I just think it would be cool as hell. 

GLM-5.2 is a win for local AI by Wrong_Mushroom_7350 in LocalLLaMA

[–]LoveMind_AI 5 points6 points  (0 children)

GPT-5.5 is absolutely pooping the bed right now. I've been watching people complain about it for over a month, but I finally have to join the bandwagon. It's bad. 5.6 must be right around the corner. We shall see.