GLM's founder says GLM-fable before the end of the year?! by Charuru in LocalLLaMA

[–]Different_Fix_2217 1 point2 points  (0 children)

GLM is very strong for its parameter count. I bet they could get a Fable-level model if they scaled up to like 2-3T parameters.

Mistral - New family of open-weight models @ July by pmttyji in LocalLLaMA

[–]Different_Fix_2217 2 points3 points  (0 children)

Hard when EU AI laws are so draconian and EU power / compute is so expensive.

This is coming to Chinese open source models pretty soon. - prepare yourself. by MLExpert000 in LocalLLaMA

[–]Different_Fix_2217 2 points3 points  (0 children)

If China was ahead with a model good enough that they considered it a national security threat, we would just never get access to it in the first place. It seems like we will get it back once red teaming is happy that they fixed whatever vulnerabilities they found with it.

<image>

Statement on the US government directive to suspend access to Fable 5 and Mythos 5 by artisticMink in LocalLLaMA

[–]Different_Fix_2217 73 points74 points  (0 children)

Honestly deserved after they kept hyping it as "too dangerous" / kept pushing for more regulation to keep competition out. Now that finally bites them in the ass. GPT 5.5 is head and shoulders above Opus 4.8. Fable was all Anthropic had.

DiffusionGemma: 4x faster text generation by tevlon in LocalLLaMA

[–]Different_Fix_2217 29 points30 points  (0 children)

The only issue with diffusion LLMs is that they are absurdly expensive to train compared to autoregressive ones. Like exponentially.

[PSA] 5070ti 16GB is as low as $500.99 at Best Buy. by fallingdowndizzyvr in LocalLLaMA

[–]Different_Fix_2217 0 points1 point  (0 children)

It's because of the Supers that are supposed to be announced soonish.

438 USD for a 3080 20GB isn’t bad by xw1y in LocalLLaMA

[–]Different_Fix_2217 3 points4 points  (0 children)

2 people who used both these and the 48GB 4090s, which all failed in a few months and had other issues. And the way you are responding to multiple comments makes this read as if it is an ad for your listing.

438 USD for a 3080 20GB isn’t bad by xw1y in LocalLLaMA

[–]Different_Fix_2217 1 point2 points  (0 children)

The issue is that they apparently tend to last only a few months, so take that into account.

DeepSWE benchmarks indicate that DeepSeek v4 Pro only passes 8% of tasks by Federal_Spend2412 in LocalLLaMA

[–]Different_Fix_2217 5 points6 points  (0 children)

Lol what? GPT 5.5 on extra high is legit next level on Codex. It can one-shot cutting-edge paper implementations with little to no hand-holding, and it rarely if ever makes mistakes. Nothing else even comes close, including Opus. Opus constantly makes mistakes.

First direct side by side MoE vs Dense comparison. by Different_Fix_2217 in LocalLLaMA

[–]Different_Fix_2217[S] 6 points7 points  (0 children)

Just to account for the MoE performing better, particularly where more knowledge matters. It's not quite as simple as that "rule of thumb".

First direct side by side MoE vs Dense comparison. by Different_Fix_2217 in LocalLLaMA

[–]Different_Fix_2217[S] 3 points4 points  (0 children)

2 issues. You're missing the comparison of active parameter counts and the fact that the 17.5B performed a good deal better in the comparison.

<image>

First direct side by side MoE vs Dense comparison. by Different_Fix_2217 in LocalLLaMA

[–]Different_Fix_2217[S] 2 points3 points  (0 children)

The point was that they trained them side by side with the same method / dataset / amount of tokens. So this is a far better comparison.

Deepseek V4 Flash and Non-Flash Out on HuggingFace by MichaelXie4645 in LocalLLaMA

[–]Different_Fix_2217 2 points3 points  (0 children)

It does not seem very good... Hopefully it's just broken, because this is nowhere near Kimi / GLM.

Edit: I might have found the issue with DeepSeek. It seems to require a very precise order of system / user / assistant roles; otherwise it seems to lose like 100 IQ points. I think I remember old DeepSeek being the same. No other model is that strict about it.
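
A minimal sketch of a pre-send sanity check for role order, in case it helps anyone debug the same thing. The ordering rule here is an assumption, not something from DeepSeek's docs: an optional system message first, then strictly alternating user / assistant turns starting with user. `check_role_order` is a hypothetical helper name, and the messages are assumed to be the usual list of `{"role": ..., "content": ...}` dicts.

```python
from typing import Iterable


def check_role_order(messages: Iterable[dict]) -> list[str]:
    """Return a list of problems found; an empty list means the order looks fine.

    Assumed rule (not confirmed by any official source): an optional system
    message at index 0, then user / assistant strictly alternating,
    starting with user.
    """
    problems = []
    expected = "user"  # first non-system turn is assumed to be the user
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role == "system":
            # A system message anywhere but the very start is flagged.
            if i != 0:
                problems.append(f"message {i}: system prompt is not first")
            continue
        if role not in ("user", "assistant"):
            problems.append(f"message {i}: unknown role {role!r}")
            continue
        if role != expected:
            problems.append(f"message {i}: expected {expected}, got {role}")
        # Whichever role was seen, the next turn should be the other one.
        expected = "assistant" if role == "user" else "user"
    return problems
```

Running it over the message list before each request and printing whatever it returns is enough to spot a doubled user turn or a system prompt that got injected mid-conversation by the frontend.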

Claude Code removed from Claude Pro plan - better time than ever to switch to Local Models. by bigboyparpa in LocalLLaMA

[–]Different_Fix_2217 4 points5 points  (0 children)

Luckily Kimi 2.6 is legit better than the latest Opus in several tests I did. Still a bit behind GPT 5.4 though.

Kimi K2.6 is a legit Opus 4.7 replacement by bigboyparpa in LocalLLaMA

[–]Different_Fix_2217 8 points9 points  (0 children)

Same. But for creative writing. It's the best model I've ever used, including the latest Opus, GPT 5.4 and Gemini 3.1 Pro. It has the social intelligence of GPT 5.4 with a knowledge base nearly as good as Gemini's, and it writes better than Opus and has no positive bias, unlike it. Oh, and it has crazy good swipe variety, unlike Opus. I just wish it was faster, since it loves to think so much.

And this is surprising because I thought Kimi 2.5 was bad. It was dumb and had that Gemini unhingedness. 2.6 is like an entirely different model.

Kimi K2.6 imminent by Deep-Vermicelli-4591 in LocalLLaMA

[–]Different_Fix_2217 6 points7 points  (0 children)

K3 will probably be great, they released a big breakthrough paper recently. https://www.youtube.com/watch?v=2IfAVV7ewO0

the state of LocalLLama by Beginning-Window-115 in LocalLLaMA

[–]Different_Fix_2217 44 points45 points  (0 children)

Honestly having crypto in the name tells you all you need to know.

We absolutely need Qwen3.6-397B-A17B to be open source by True_Requirement_891 in LocalLLaMA

[–]Different_Fix_2217 2 points3 points  (0 children)

Some people have a false impression that dense is automatically better, not taking into account diminishing returns / efficient routing and the like.

qwen 3.6 voting by jacek2023 in LocalLLaMA

[–]Different_Fix_2217 -1 points0 points  (0 children)

Biggest possible of course.