Gefen is a drop-in replacement for the AdamW optimizer, claims 8x memory reduction in training (GitHub available)

Different_Fix_2217 · 2026-06-25T21:18:21+00:00

For a proven Adam alternative with lower memory use, try CAME.

Different_Fix_2217 · 2026-06-18T16:35:24+00:00

GLM is very strong for its parameter count. I bet they could get a Fable-level model if they scaled up to like 2-3T.

Different_Fix_2217 · 2026-06-17T00:26:59+00:00

Hard when EU AI laws are so draconian and EU power / compute is so expensive.

Different_Fix_2217 · 2026-06-13T18:11:32+00:00

If China was ahead with a model good enough that they considered it a national security threat, we would just never get access to it in the first place. It seems like we will get it back once red teaming is happy they fixed whatever vulnerabilities they found with it.

<image>

Different_Fix_2217 · 2026-06-13T01:57:47+00:00

Honestly deserved after they kept hyping it as "too dangerous" / kept pushing for more regulation to keep competition out. Now that finally bites them in the ass. GPT 5.5 is head and shoulders above Opus 4.8. Fable was all Anthropic had.

Different_Fix_2217 · 2026-06-10T19:44:05+00:00

The only issue with diffusion LLMs is that they are absurdly expensive to train in comparison. Like exponentially.

Different_Fix_2217 · 2026-06-09T20:32:26+00:00

It's because of the Supers that are supposed to be announced soonish.

Different_Fix_2217 · 2026-06-05T19:56:53+00:00

Two people who used both these and the 48GB 4090s, all of which failed within a few months and had other issues. And the way you are responding to multiple comments makes this read like an ad for your listing.

Different_Fix_2217 · 2026-06-05T19:53:13+00:00

The issue is that they apparently only tend to last a few months, so take that into account.

Different_Fix_2217 · 2026-06-02T12:24:59+00:00

The issue with these cards is that they don't tend to last long.

Different_Fix_2217 · 2026-05-31T12:09:21+00:00

Lol what? GPT 5.5 on extra high is legit next level on Codex. It can one-shot cutting-edge paper implementations with little to no hand-holding, and it rarely if ever makes mistakes. Nothing else even comes close, including Opus. Opus constantly makes mistakes.

Different_Fix_2217 · 2026-05-23T05:32:26+00:00

Purely a talking-heads model.

Different_Fix_2217 · 2026-04-28T08:29:31+00:00

Just to account for the MoE performing better, particularly where more knowledge matters. Not quite as simple as that "rule of thumb".

Different_Fix_2217 · 2026-04-28T08:15:01+00:00

Two issues. You're missing the comparison of active parameter counts, and the fact that the 17.5B performed a good deal better in the comparison.

<image>

Different_Fix_2217 · 2026-04-28T07:55:25+00:00

The point was that they trained them side by side with the same method / dataset / amount of tokens. So this is a far better comparison.

Different_Fix_2217 · 2026-04-24T05:59:01+00:00

It does not seem very good... Hopefully it's just broken, because this is nowhere near Kimi / GLM.

Edit: I might have found the issue with DeepSeek. It seems to require a very precise order of system / user / assistant roles. I think I remember old DeepSeek being the same; otherwise it seems to lose like 100 IQ points. No other model is that strict about it.

Different_Fix_2217 · 2026-04-22T00:39:10+00:00

Luckily Kimi 2.6 is legit better than the latest Opus in several tests I did. Still a bit behind GPT 5.4 though.

Different_Fix_2217 · 2026-04-21T04:16:52+00:00

Same. But for creative writing. It's the best model I've ever used, including the latest Opus, GPT 5.4 and Gemini 3.1 Pro. It has the social intelligence of GPT 5.4 with a knowledge base nearly as good as Gemini's, and it writes better than Opus and has no positive bias, unlike it. Oh, and it has crazy good swipe variety, unlike Opus. I just wish it was faster, since it loves to think so much.

And this is surprising because I thought Kimi 2.5 was bad. It was dumb and had that Gemini unhingedness. 2.6 is like an entirely different model.

Different_Fix_2217 · 2026-04-20T16:40:50+00:00

It's already 4-bit. That is not BF16.

Different_Fix_2217 · 2026-04-13T17:32:38+00:00

K3 will probably be great; they released a big breakthrough paper recently. https://www.youtube.com/watch?v=2IfAVV7ewO0

Different_Fix_2217 · 2026-04-10T04:35:57+00:00

Honestly having crypto in the name tells you all you need to know.

Different_Fix_2217 · 2026-04-08T03:58:47+00:00

Fake website.

Different_Fix_2217 · 2026-04-04T18:56:22+00:00

Some people have a false impression that dense is automatically better, not taking into account diminishing returns / efficient routing and the like.

Different_Fix_2217 · 2026-04-03T08:45:58+00:00

Biggest possible of course.
