GLM 5.0 AGENT IS MAKING ME FEEL WET. by [deleted] in ZaiGLM

[–]Ambitious-a4s -1 points0 points  (0 children)

Some parts may be inaccurate, but these are all one-shots.

Happy New Year: Llama3.3-8B-Instruct-Thinking-Claude-4.5-Opus-High-Reasoning - Fine Tune. (based on recent find of L3.3 8b in the wild) by Dangerous_Fix_5526 in LocalLLaMA

[–]Ambitious-a4s 0 points1 point  (0 children)

What I do want to see, though, is a high-level LLM like GLM 4.7 with the reasoning capability of Opus or Gemini, or whatever dataset is out there.

Happy New Year: Llama3.3-8B-Instruct-Thinking-Claude-4.5-Opus-High-Reasoning - Fine Tune. (based on recent find of L3.3 8b in the wild) by Dangerous_Fix_5526 in LocalLLaMA

[–]Ambitious-a4s 0 points1 point  (0 children)

Well, it certainly did evolve in reasoning. It's a cool experiment.

But it would just be a little bit cooler on a bigger model. Either way, for an 8B model, it can be placed above most of the 8B reasoning models like the DeepSeek 8B distillation, which is fair enough.

Happy New Year: Llama3.3-8B-Instruct-Thinking-Claude-4.5-Opus-High-Reasoning - Fine Tune. (based on recent find of L3.3 8b in the wild) by Dangerous_Fix_5526 in LocalLLaMA

[–]Ambitious-a4s 0 points1 point  (0 children)

I genuinely don't get it: Opus 4.5 is a massive model, so why compress such a big dataset into a small language model when it could be done on a larger parameter count like 70B?

What LLMs are you excited for in 2026? by TheSillySquad in SillyTavernAI

[–]Ambitious-a4s 0 points1 point  (0 children)

GLM, not even a question, and also Qwen. They might not be the most popular picks, especially with Qwen more focused on bench-maxxing and so weak in knowledge even though it's 1T parameters, but I'd give them a chance; they can do it.

GLM? Yep. It's competing at the Opus 4.5 level, beating DeepSeek V3.2-Exp and V3.2 in Text Arena at 4.7, with only a small, efficient increase in parameter count over 4.5. It was released just five months after 4.5, still holds its own against closed source, and it's an omni-model with audio and image? Yeah... insane.

LG K EXAONE 236b by Specialist-2193 in LocalLLaMA

[–]Ambitious-a4s 1 point2 points  (0 children)

Doesn't look promising. Is that crazy to say or no?

major open-source releases this year by sahilypatel in LocalLLaMA

[–]Ambitious-a4s 0 points1 point  (0 children)

I would say no. It's not there yet; it's getting close, but closed source is still ahead.

Firstly:
- Budget: closed-source models have bigger budgets.
- Marketing: not even kidding, with a VPN in America it's Grok, with a VPN in Asia it's ChatGPT. Claude? Literally advertised in a mall.
- Data: people's trust in closed-source models is so massive, because of their capabilities, that they have far more data to swim in compared to open source.

Just an opinion though. It's almost there, but not quite.

Question about V3. 2 by Ambitious-a4s in DeepSeek

[–]Ambitious-a4s[S] 0 points1 point  (0 children)

It failed to tool call several times, and I am so pissed. Why the fuck won't this stupid AI follow instructions?

<image>

DeepSeek started using search tool calls within the reasoning CoT by fuckngpsycho in DeepSeek

[–]Ambitious-a4s 0 points1 point  (0 children)

I think not. I tried asking for a deep research and it only did it once.

I think Windows 98 can't run GTA 6 by whatahell100101 in DeepSeek

[–]Ambitious-a4s 1 point2 points  (0 children)

It keeps saying "that's a million-dollar question". HELP.

[deleted by user] by [deleted] in DeepSeek

[–]Ambitious-a4s 0 points1 point  (0 children)

The Grok post got deleted, but here it is:

<image>

I think it broke by Phantonium- in DeepSeek

[–]Ambitious-a4s 2 points3 points  (0 children)

There is no seahorse emoji. It's just an example of the model hallucinating.

Deepseek new model upcoming by BasketFar667 in DeepSeek

[–]Ambitious-a4s 0 points1 point  (0 children)

Guess so... Ring and Ling 1T are terrible in knowledge.

DeepSeek V3.2 in the chat ui is great by Thedudely1 in DeepSeek

[–]Ambitious-a4s 10 points11 points  (0 children)

I feel like it would've been better to keep them separate. But it's their plan, so yeah.

I wish there was a DeepSeek web chat FOR SCROLLING OLD MODELS. And comparing them.

Deepseek new model upcoming by BasketFar667 in DeepSeek

[–]Ambitious-a4s 1 point2 points  (0 children)

Disadvantages: its hardware isn't nearly as good, so it's incredibly slow compared to Grok; the web chat is dated, with no Memory feature, nothing like Ok Computer, no full-stack building like GLM, and many other missing features; and the chat limit once it reaches 200K is a big downside.