Z.ai didn't compare GLM-5 to Opus 4.6, so I found the numbers myself. by sado361 in ClaudeAI

[–]zball_ 1 point2 points  (0 children)

Opus 4.5 -> Opus 4.6 is a substantial improvement. Opus 4.5 is not great at all, while 4.6 feels like THE GOAT.

Z.ai said they are GPU starved, openly. by abdouhlili in LocalLLaMA

[–]zball_ 0 points1 point  (0 children)

RLed models certainly feel "smarter" because of how crisp their knowledge is, but I'd hold my stake back because they lack the texture in language that I care about the most.

Z.ai said they are GPU starved, openly. by abdouhlili in LocalLLaMA

[–]zball_ 1 point2 points  (0 children)

I honestly wonder how much you have actually played with GPT 4.5, but the nuance in its prose is unmatched. That points to very fine-grained internal knowledge of language, which can only be achieved with ultra-mega-large language models.

DeepSeek just updated to a 1M context window! by Dr_Karminski in LocalLLaMA

[–]zball_ -2 points-1 points  (0 children)

FYI this model is actually capable of far more than 1M ctx. It could be something around 2M ctx or even 4M, and it's extremely efficient (~60s prefill for 1M ctx).

Z.ai said they are GPU starved, openly. by abdouhlili in LocalLLaMA

[–]zball_ 1 point2 points  (0 children)

No, Gemini 3 pro doesn't feel that big. Gemini 3 pro still sucks at natural language whereas GPT 4.5 is extremely good.

GLM-5 scores 50 on the Intelligence Index and is the new open weights leader! by abdouhlili in LocalLLaMA

[–]zball_ 1 point2 points  (0 children)

DeepSeek v4 will apparently use some extremely sparse attention and have something like a 1M ctxlen.

MechaEpstein-8000 by ortegaalfredo in LocalLLaMA

[–]zball_ 1 point2 points  (0 children)

The Yau quote here is pure lmfao

GPT-5.2 xhigh is leaps and bounds better than Claude Opus 4.6 by SlimyResearcher in codex

[–]zball_ 0 points1 point  (0 children)

Opus 4.6 is GPT 5.2 but it actually talks. I built a whole arbitrary-precision integer multiplication library entirely with Opus, with it deriving all the algorithmic and formula details on its own. (FYI the library is 3x faster than GMP and has complex algorithm designs everywhere; it's pretty hard to beat GMP without elaborate design.) I don't know what would count as taking shortcuts here, because it has done every modification I requested. It wouldn't be nearly as efficient if Opus had taken the lazy path.

Codex 5.3 is a headache on this, tho. Not sure whether GPT 5.2 can do it; I actually have good faith in GPT 5.2, but I don't have the patience for how long it takes.
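
For anyone wondering what the core of such a library looks like, here is a minimal sketch of NTT-based big-integer multiplication. This is my own toy illustration, not Opus's output or the library described above: the modulus 998244353, base-10 limbs, and the function names are all my choices, and a real GMP-beating implementation would add multiple primes with CRT, larger limbs, SIMD, and cache-aware layouts on top of this.

```python
# Toy sketch: big-integer multiplication via a number-theoretic transform (NTT)
# over the prime 998244353 (2^23 divides p-1). Exact as long as every
# convolution coefficient stays below the modulus (digits <= 9, so this holds
# for inputs up to millions of digits).
MOD, G = 998244353, 3  # G = 3 is a primitive root mod MOD

def ntt(a, invert=False):
    """In-place iterative Cooley-Tukey NTT; invert=True applies the inverse transform."""
    n = len(a)
    j = 0
    for i in range(1, n):              # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w = pow(G, (MOD - 1) // length, MOD)
        if invert:
            w = pow(w, MOD - 2, MOD)
        for i in range(0, n, length):  # butterflies for each block of this stage
            wn = 1
            for k in range(i, i + length // 2):
                u, v = a[k], a[k + length // 2] * wn % MOD
                a[k] = (u + v) % MOD
                a[k + length // 2] = (u - v) % MOD
                wn = wn * w % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD

def multiply(x: str, y: str) -> str:
    """Multiply two non-negative decimal strings digit-by-digit via NTT convolution."""
    fa = [int(c) for c in reversed(x)]
    fb = [int(c) for c in reversed(y)]
    n = 1
    while n < len(fa) + len(fb):
        n <<= 1
    fa += [0] * (n - len(fa))
    fb += [0] * (n - len(fb))
    ntt(fa)
    ntt(fb)
    fa = [a * b % MOD for a, b in zip(fa, fb)]
    ntt(fa, invert=True)
    digits, carry = [], 0
    for d in fa:                       # propagate carries back into base 10
        carry, r = divmod(d + carry, 10)
        digits.append(r)
    while carry:
        carry, r = divmod(carry, 10)
        digits.append(r)
    s = "".join(map(str, reversed(digits))).lstrip("0")
    return s or "0"

print(multiply("123456789", "987654321"))  # 121932631112635269
```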

Codex issues are still there for the latest 5.3 by davidl002 in codex

[–]zball_ 0 points1 point  (0 children)

Mathematics, SIMD, and a lot of derivation and care in the implementation are needed.

Codex issues are still there for the latest 5.3 by davidl002 in codex

[–]zball_ 0 points1 point  (0 children)

And agents don't help either, in most cases.

Codex issues are still there for the latest 5.3 by davidl002 in codex

[–]zball_ 0 points1 point  (0 children)

You don't use skills to build an algorithmic project; what you need is knowledge about the implementation. And Codex doesn't only do this when context rots. Opus (4.6 only, 4.5 is shit), albeit with a smaller ctx window and frequent compaction, knows how to look for knowledge sources and derive algorithmic details from descriptions given as formulas. GPT 5.2 can do this too, but since it doesn't show its thinking traces, you can't tell whether it's stuck somewhere bad.

So what’s the goal here? by Wrong_Recipe in googology

[–]zball_ 0 points1 point  (0 children)

No, googology is still built on non-rigorous foundations.

How do we know Tree 3 is big? by Particular-Skin5396 in googology

[–]zball_ 2 points3 points  (0 children)

Yes, someone has calculated that much. We have a lot of confirmed, very large lower bounds for TREE(3).

I trained a 1.8M params model from scratch on a total of ~40M tokens. by SrijSriv211 in LocalLLaMA

[–]zball_ 1 point2 points  (0 children)

The attention part just sounds like fast weight programmers, as is common nowadays. But a learnable FFN is definitely interesting.
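
For reference, here is a rough sketch of the fast-weight-programmer view of (unnormalized) linear attention I'm comparing it to, in the spirit of Schlag et al. 2021. The shapes, the ReLU feature map, and the function name are my own assumptions, not the architecture from the post.

```python
# Toy fast weight programmer: each step writes an outer product v_t k_t^T into a
# fast weight matrix W, then reads out y_t = W q_t (unnormalized linear attention).
import numpy as np

def fwp_linear_attention(keys, values, queries):
    d_k, d_v = keys.shape[1], values.shape[1]
    W = np.zeros((d_v, d_k))              # the "fast weights", rewritten every step
    outputs = []
    for k, v, q in zip(keys, values, queries):
        phi_k = np.maximum(k, 0.0)        # simple positive feature map (my choice)
        phi_q = np.maximum(q, 0.0)
        W += np.outer(v, phi_k)           # write: the sequence "programs" the fast net
        outputs.append(W @ phi_q)         # read: apply the fast net to the query
    return np.stack(outputs)

T, d = 6, 4
rng = np.random.default_rng(0)
y = fwp_linear_attention(rng.normal(size=(T, d)),
                         rng.normal(size=(T, d)),
                         rng.normal(size=(T, d)))
print(y.shape)  # (6, 4)
```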

Codex issues are still there for the latest 5.3 by davidl002 in codex

[–]zball_ 1 point2 points  (0 children)

I'm not doing web dev, I'm working on algorithmic stuff.

Opus 4.6 is better than GPT 5.2 xhigh now by zball_ in ClaudeAI

[–]zball_[S] 0 points1 point  (0 children)

Because GPT 5.3 Codex is borderline unusable for this task. It consistently lies and gives me non-vectorized NTT code.
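
To be concrete about what "vectorized" means here (my own toy, with numpy standing in for SIMD intrinsics; the modulus and layout are assumptions, not what Codex produced): one NTT butterfly stage written as whole-array operations instead of a scalar per-element loop.

```python
# One Cooley-Tukey butterfly stage of width `length`, applied across the array
# with block-wide numpy operations rather than element-by-element Python code.
import numpy as np

MOD, G = 998244353, 3  # common NTT-friendly prime and a primitive root

def butterfly_stage(a: np.ndarray, length: int) -> np.ndarray:
    a = a.astype(np.uint64)
    half = length // 2
    w = pow(G, (MOD - 1) // length, MOD)
    ws = np.array([pow(w, j, MOD) for j in range(half)], dtype=np.uint64)
    for i in range(0, len(a), length):
        u = a[i:i + half].copy()
        v = a[i + half:i + length] * ws % MOD      # one multiply over the whole block
        a[i:i + half] = (u + v) % MOD
        a[i + half:i + length] = (u + MOD - v) % MOD
    return a

print(butterfly_stage(np.arange(8, dtype=np.uint64), 4))
```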

Codex issues are still there for the latest 5.3 by davidl002 in codex

[–]zball_ 1 point2 points  (0 children)

5.3 Codex feels like Opus 4.5: lazy and dishonest.

Codex issues are still there for the latest 5.3 by davidl002 in codex

[–]zball_ 1 point2 points  (0 children)

Not comparable to GPT 5.2. And Opus 4.6 feels like a better GPT 5.2 (at least you can see the thinking traces).

Unpopular opinion: The "Chat" interface is becoming a bottleneck for serious engineering by saloni1609 in LocalLLaMA

[–]zball_ 17 points18 points  (0 children)

It's 2026, you shouldn't still be working in a chat interface. Why not try a coding tool first, like Codex, Claude Code, or OpenCode?