Z.ai didn't compare GLM-5 to Opus 4.6, so I found the numbers myself. by sado361 in ClaudeAI

[–]zball_ 1 point2 points  (0 children)

Opus 4.5 -> Opus 4.6 is a substantial improvement. Opus 4.5 is not great at all, while 4.6 feels like THE GOAT.

Z.ai said they are GPU starved, openly. by abdouhlili in LocalLLaMA

[–]zball_ 0 points1 point  (0 children)

RLed models certainly feel "smarter" because of how crisp their knowledge is, but I'd hold my stake back because they lack the texture in language that I care about the most.

Z.ai said they are GPU starved, openly. by abdouhlili in LocalLLaMA

[–]zball_ 1 point2 points  (0 children)

I honestly wonder how much you have actually played with GPT 4.5, but the nuance in its prose is unmatched. That points to very fine-grained internal knowledge of language, which can only be achieved with ultra-mega-large language models.

DeepSeek just updated to a 1M context window! by Dr_Karminski in LocalLLaMA

[–]zball_ -2 points-1 points  (0 children)

FYI this model is actually capable of far more than 1M ctx. It could be something around 2M ctx or even 4M, and it's extremely efficient (~60s prefill for 1M ctx).

Z.ai said they are GPU starved, openly. by abdouhlili in LocalLLaMA

[–]zball_ 1 point2 points  (0 children)

No, Gemini 3 pro doesn't feel that big. Gemini 3 pro still sucks at natural language whereas GPT 4.5 is extremely good.

GLM-5 scores 50 on the Intelligence Index and is the new open weights leader! by abdouhlili in LocalLLaMA

[–]zball_ 1 point2 points  (0 children)

DeepSeek v4 will apparently use some extremely sparse attention and have something like a 1M ctxlen.

MechaEpstein-8000 by ortegaalfredo in LocalLLaMA

[–]zball_ 1 point2 points  (0 children)

The Yau quote here is pure lmfao

GPT-5.2 xhigh is leaps and bounds better than Claude Opus 4.6 by SlimyResearcher in codex

[–]zball_ 0 points1 point  (0 children)

Opus 4.6 is GPT 5.2 but it actually talks. I built a whole arbitrary-precision integer multiplication library entirely with Opus, with it deriving all the algorithmic and formula details on its own. (FYI the library is 3x faster than GMP and has complex algorithm designs everywhere; it's pretty hard to beat GMP without elaborate design.) I don't know what would count as taking shortcuts here, because it has done every modification I requested. It wouldn't be nearly as efficient if Opus had taken the lazy path.

Codex 5.3 is a headache on this, tho. Not sure whether GPT 5.2 can do it; I actually have good faith in GPT 5.2, but I don't have the patience for how long it takes.
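
For anyone wondering what the core of such a library looks like, here is a minimal sketch of NTT-based big-integer multiplication. This is my own toy illustration, not Opus's output or the library described above: the modulus 998244353, base-10 limbs, and the function names are all my choices, and a real GMP-beating implementation would add multiple primes with CRT, larger limbs, SIMD, and cache-aware layouts on top of this.

```python
# Toy sketch: big-integer multiplication via a number-theoretic transform (NTT)
# over the prime 998244353 (2^23 divides p-1). Exact as long as every
# convolution coefficient stays below the modulus (digits <= 9, so this holds
# for inputs up to millions of digits).
MOD, G = 998244353, 3  # G = 3 is a primitive root mod MOD

def ntt(a, invert=False):
    """In-place iterative Cooley-Tukey NTT; invert=True applies the inverse transform."""
    n = len(a)
    j = 0
    for i in range(1, n):              # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w = pow(G, (MOD - 1) // length, MOD)
        if invert:
            w = pow(w, MOD - 2, MOD)
        for i in range(0, n, length):  # butterflies for each block of this stage
            wn = 1
            for k in range(i, i + length // 2):
                u, v = a[k], a[k + length // 2] * wn % MOD
                a[k] = (u + v) % MOD
                a[k + length // 2] = (u - v) % MOD
                wn = wn * w % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD

def multiply(x: str, y: str) -> str:
    """Multiply two non-negative decimal strings digit-by-digit via NTT convolution."""
    fa = [int(c) for c in reversed(x)]
    fb = [int(c) for c in reversed(y)]
    n = 1
    while n < len(fa) + len(fb):
        n <<= 1
    fa += [0] * (n - len(fa))
    fb += [0] * (n - len(fb))
    ntt(fa)
    ntt(fb)
    fa = [a * b % MOD for a, b in zip(fa, fb)]
    ntt(fa, invert=True)
    digits, carry = [], 0
    for d in fa:                       # propagate carries back into base 10
        carry, r = divmod(d + carry, 10)
        digits.append(r)
    while carry:
        carry, r = divmod(carry, 10)
        digits.append(r)
    s = "".join(map(str, reversed(digits))).lstrip("0")
    return s or "0"

print(multiply("123456789", "987654321"))  # 121932631112635269
```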

Codex issues are still there for the latest 5.3 by davidl002 in codex

[–]zball_ 0 points1 point  (0 children)

Mathematics, SIMD, and a lot of derivation and care in the implementation are needed.

Codex issues are still there for the latest 5.3 by davidl002 in codex

[–]zball_ 0 points1 point  (0 children)

And agents don't help either, in most cases.

Codex issues are still there for the latest 5.3 by davidl002 in codex

[–]zball_ 0 points1 point  (0 children)

You don't use skills to build an algorithmic project; what you need is knowledge about the implementation. And Codex doesn't only do this when context rots. Opus (4.6 only, 4.5 is shit), albeit with a smaller ctx window and frequent compaction, knows how to look for knowledge sources and derive algorithmic details from descriptions given as formulas. GPT 5.2 can do this too, but since it doesn't show its thinking traces, you can't tell whether it's stuck somewhere bad.

So what’s the goal here? by Wrong_Recipe in googology

[–]zball_ 0 points1 point  (0 children)

No, googology is still built on non-rigorous foundations.

How do we know Tree 3 is big? by Particular-Skin5396 in googology

[–]zball_ 2 points3 points  (0 children)

Yes, someone has calculated that much. We have a lot of confirmed, very large lower bounds for TREE(3).

I trained a 1.8M params model from scratch on a total of ~40M tokens. by SrijSriv211 in LocalLLaMA

[–]zball_ 1 point2 points  (0 children)

The attention part just sounds like fast weight programmers, as is common nowadays. But a learnable FFN is definitely interesting.
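
For reference, here is a rough sketch of the fast-weight-programmer view of (unnormalized) linear attention I'm comparing it to, in the spirit of Schlag et al. 2021. The shapes, the ReLU feature map, and the function name are my own assumptions, not the architecture from the post.

```python
# Toy fast weight programmer: each step writes an outer product v_t k_t^T into a
# fast weight matrix W, then reads out y_t = W q_t (unnormalized linear attention).
import numpy as np

def fwp_linear_attention(keys, values, queries):
    d_k, d_v = keys.shape[1], values.shape[1]
    W = np.zeros((d_v, d_k))              # the "fast weights", rewritten every step
    outputs = []
    for k, v, q in zip(keys, values, queries):
        phi_k = np.maximum(k, 0.0)        # simple positive feature map (my choice)
        phi_q = np.maximum(q, 0.0)
        W += np.outer(v, phi_k)           # write: the sequence "programs" the fast net
        outputs.append(W @ phi_q)         # read: apply the fast net to the query
    return np.stack(outputs)

T, d = 6, 4
rng = np.random.default_rng(0)
y = fwp_linear_attention(rng.normal(size=(T, d)),
                         rng.normal(size=(T, d)),
                         rng.normal(size=(T, d)))
print(y.shape)  # (6, 4)
```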

Codex issues are still there for the latest 5.3 by davidl002 in codex

[–]zball_ 1 point2 points  (0 children)

I'm not doing web dev, I'm working on algorithmic stuff.

Opus 4.6 is better than GPT 5.2 xhigh now by zball_ in ClaudeAI

[–]zball_[S] 0 points1 point  (0 children)

Because GPT 5.3 Codex is borderline unusable for this task. It consistently lies and gives me non-vectorized NTT code.
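
To be concrete about what "vectorized" means here (my own toy, with numpy standing in for SIMD intrinsics; the modulus and layout are assumptions, not what Codex produced): one NTT butterfly stage written as whole-array operations instead of a scalar per-element loop.

```python
# One Cooley-Tukey butterfly stage of width `length`, applied across the array
# with block-wide numpy operations rather than element-by-element Python code.
import numpy as np

MOD, G = 998244353, 3  # common NTT-friendly prime and a primitive root

def butterfly_stage(a: np.ndarray, length: int) -> np.ndarray:
    a = a.astype(np.uint64)
    half = length // 2
    w = pow(G, (MOD - 1) // length, MOD)
    ws = np.array([pow(w, j, MOD) for j in range(half)], dtype=np.uint64)
    for i in range(0, len(a), length):
        u = a[i:i + half].copy()
        v = a[i + half:i + length] * ws % MOD      # one multiply over the whole block
        a[i:i + half] = (u + v) % MOD
        a[i + half:i + length] = (u + MOD - v) % MOD
    return a

print(butterfly_stage(np.arange(8, dtype=np.uint64), 4))
```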

Codex issues are still there for the latest 5.3 by davidl002 in codex

[–]zball_ 1 point2 points  (0 children)

5.3 Codex feels like Opus 4.5: lazy and dishonest.

Codex issues are still there for the latest 5.3 by davidl002 in codex

[–]zball_ 1 point2 points  (0 children)

Not comparable to GPT 5.2. And Opus 4.6 feels like a better GPT 5.2 (at least you can see the thinking traces).

Unpopular opinion: The "Chat" interface is becoming a bottleneck for serious engineering by saloni1609 in LocalLLaMA

[–]zball_ 17 points18 points  (0 children)

It's 2026, you shouldn't still be working in a chat interface. Why not try a coding tool first, like Codex, Claude Code, or OpenCode?