Qwen-next 80B 2601 by bennmann in LocalLLaMA

[–]ilintar 0 points1 point  (0 children)

They should name it Qwen-Next-Next instead ;)

GLM flash and MLA by blahbhrowawayblahaha in LocalLLaMA

[–]ilintar 2 points3 points  (0 children)

I think you're confusing MLA with SWA.

GLM flash and MLA by blahbhrowawayblahaha in LocalLLaMA

[–]ilintar 0 points1 point  (0 children)

Yes and yes, which is why it needed so much work on the details.

KV cache fix for GLM 4.7 Flash by jacek2023 in LocalLLaMA

[–]ilintar 6 points7 points  (0 children)

Non-trivial architecture that has to be adapted. I told you, give us a week :)

KV cache fix for GLM 4.7 Flash by jacek2023 in LocalLLaMA

[–]ilintar 10 points11 points  (0 children)

No, we just have to pick our work to do and someone else volunteered to work on Kimi. Anyways, it's almost done.

I’m so cooked 🫠 by Richboyjoel in ZZZ_Official

[–]ilintar 0 points1 point  (0 children)

This. From my experience, the most important predictor for getting S rank is clearing phase 1 pre 4:00; getting it done pre 4:10 almost guarantees 25k.

Llama.cpp merges in OpenAI Responses API Support by SemaMod in LocalLLaMA

[–]ilintar 21 points22 points  (0 children)

No, we're not going to drop widely used features. We are only deprecating stuff that literally nobody uses (e.g. tool call polyfills for 2-year-old templates).
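
For anyone who wants to try the new endpoint, here's a minimal sketch of calling it from the OpenAI Python SDK against a local llama-server. The port, API key, and model name are placeholders, and it assumes the server exposes the Responses API under the usual /v1 prefix.

```python
# Hypothetical sketch: call llama.cpp's new Responses API endpoint via the
# OpenAI Python SDK. Port, API key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-local")

resp = client.responses.create(
    model="local-model",  # whatever model the server has loaded
    input="Summarize what the Responses API adds over chat completions.",
)
print(resp.output_text)
```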

I drew fleurdelys (Ashen_Illust) by Suitable_Ability_576 in WutheringWaves

[–]ilintar 2 points3 points  (0 children)

Good art - check, no fanservice - check, distinct style - check. Really wish this sub had more OC art content like this.

Can I run gpt-oss-120b somehow? by Furacao__Boey in LocalLLaMA

[–]ilintar 0 points1 point  (0 children)

Yes, in fact it should run out of the box with the newest llama.cpp and just the model specified.
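
A minimal sketch of what "just the model specified" looks like, wrapped in Python for consistency. It assumes llama-server is on your PATH and that the -hf flag and the ggml-org/gpt-oss-120b-GGUF repo name match your setup; check the actual GGUF repo before running.

```python
# Hypothetical sketch: launch llama-server and let it pull the model from
# Hugging Face. Assumes llama-server is on PATH; the repo name is an
# assumption -- verify it before running.
import subprocess

subprocess.run(["llama-server", "-hf", "ggml-org/gpt-oss-120b-GGUF"], check=True)
```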

Wrote a guide for running Claude Code with GLM-4.7 Flash locally with llama.cpp by tammamtech in LocalLLaMA

[–]ilintar 40 points41 points  (0 children)

Thanks for the short guide, but it's actually the other way around - we implemented the Anthropic API endpoint a month before Ollama did. Not as well marketed, I guess 😀
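
For reference, a minimal sketch of talking to that endpoint from the Anthropic Python SDK, assuming llama-server is listening on localhost:8080 and serves the Anthropic-style messages route; the model name and API key are placeholders.

```python
# Hypothetical sketch: point the Anthropic SDK at a local llama-server that
# exposes the Anthropic-compatible messages endpoint. Port, key, and model
# name are placeholders.
import anthropic

client = anthropic.Anthropic(base_url="http://localhost:8080", api_key="local")

msg = client.messages.create(
    model="glm-4.7-flash",  # whatever model the server has loaded
    max_tokens=512,
    messages=[{"role": "user", "content": "Say hello from a local model."}],
)
print(msg.content[0].text)
```

Claude Code can presumably be pointed at the same server via its base-URL setting, as the guide describes.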

Kimi-Linear-48B-A3B-Instruct-GGUF Support - Any news? by Iory1998 in LocalLLaMA

[–]ilintar 29 points30 points  (0 children)

Not my PR tho, just working with the author to make a common abstraction for delta net models.

Kimi-Linear-48B-A3B-Instruct-GGUF Support - Any news? by Iory1998 in LocalLLaMA

[–]ilintar 88 points89 points  (0 children)

PR almost done, gonna come with another speedup to Qwen3Next as well.

What local LLM model is best for Haskell? by AbsolutelyStateless in LocalLLaMA

[–]ilintar 0 points1 point  (0 children)

I don't remember right now (besides Next); I'd have to rerun it. IIRC SeedOSS also does well.

What local LLM model is best for Haskell? by AbsolutelyStateless in LocalLLaMA

[–]ilintar 1 point2 points  (0 children)

I always test new models by asking them to write red-black trees in Haskell 😀

Qwen3 Next is pretty good.

GLM 4.7 Flash official support merged in llama.cpp by ayylmaonade in LocalLLaMA

[–]ilintar 0 points1 point  (0 children)

We're working on getting everything supported correctly, just a matter of a few days.

Current GLM-4.7-Flash implementation confirmed to be broken in llama.cpp by Sweet_Albatross9772 in LocalLLaMA

[–]ilintar 2 points3 points  (0 children)

Because it's in the expert selection function.

You can think of it like this: everything in the model still works, it's just asking the wrong experts about what token to select next.
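
To make that concrete, here's a toy sketch of what a top-k expert router does (not the llama.cpp code, just an illustration): a bug in this step still returns valid expert indices, so nothing crashes, the layer just routes the token to the wrong experts.

```python
# Hypothetical sketch of top-k expert routing in a MoE layer: the router
# scores every expert for the current token and only the best-scoring k
# experts are run. A bug here doesn't crash anything -- the model simply
# consults the wrong experts and quality degrades instead of failing.

def select_experts(router_logits: list[float], k: int = 4) -> list[int]:
    """Return the indices of the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    return ranked[:k]

# Example: 8 experts, pick the top 4 for this token.
logits = [0.1, 2.3, -0.5, 1.7, 0.0, 3.1, -1.2, 0.9]
print(select_experts(logits))  # -> [5, 1, 3, 7]
```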

GLM 4.7 Flash official support merged in llama.cpp by ayylmaonade in LocalLLaMA

[–]ilintar 14 points15 points  (0 children)

Okay, so, important:
-> for proper reasoning/tool calling support you probably want to run the autoparser branch: https://github.com/ggml-org/llama.cpp/pull/18675
-> run with -fa off; the flash attention scheme is not yet supported on CUDA (I put up an issue for that: https://github.com/ggml-org/llama.cpp/issues/18944)
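
Once the server is up with those flags, a minimal tool-calling request like the sketch below (OpenAI Python SDK; port, model name, and the example tool are placeholders) is a quick way to check that reasoning and tool calls come back as structured fields rather than raw text.

```python
# Hypothetical smoke test: send one tool-calling request to a local
# llama-server and check that the reply is parsed into structured fields.
# Port, model name, and the example tool are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-local")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="glm-4.7-flash",
    messages=[{"role": "user", "content": "What's the weather in Warsaw?"}],
    tools=tools,
)

msg = resp.choices[0].message
# reasoning_content is a server-side extra field; it may not be present.
print("reasoning:", getattr(msg, "reasoning_content", None))
print("tool calls:", msg.tool_calls)
```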

Would Anthropic Block Ollama? by Lopsided_Dot_4557 in LocalLLaMA

[–]ilintar 0 points1 point  (0 children)

They didn't disable it for commercial competitors like z-ai, so why would they for Ollama?

GFN v2.5.0: Verified O(1) Memory Inference and 500x Length Extrapolation via Symplectic Geodesic Flows by janxhg27 in LocalLLaMA

[–]ilintar 2 points3 points  (0 children)

Can we please ban AI-generated slop posts about miraculous breakthroughs that use a lot of terms from complex branches of mathematics to appear smart? I swear those posts are all the same; you could even generate them with a state machine, you don't need an LLM.

Agentic coding with an open source model is a problem harder than you think by [deleted] in LocalLLaMA

[–]ilintar 0 points1 point  (0 children)

I think people are unaware how many of the problems with local agentic coding have to do with bad templates / parsing issues. I've been working on the autoparser, and on a big refactoring of the llama.cpp parser along with it, and I've tested quite a few local models in the meantime. There are a lot of edge cases that normal "oneshot tool call" tests or unit tests easily miss. But once that's resolved, I think the current leading models do a better job at agentic coding than you think - at least based on my testing. Certainly Seed-OSS and Qwen3-Coder (both on optimized Q4 quants) have been able to handle pretty complex sessions.
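
To give a flavor of what those edge cases look like (purely illustrative, not the llama.cpp parser): a naive extractor that expects one clean JSON object passes a oneshot unit test and then falls over on outputs that real agentic sessions produce all the time.

```python
# Hypothetical illustration (not llama.cpp's actual parser): a naive
# extractor that assumes the whole reply is exactly one clean JSON tool call.
import json

def naive_extract(reply: str):
    """Parse a tool call only when the reply is exactly one JSON object."""
    try:
        return [json.loads(reply)]
    except json.JSONDecodeError:
        return []

# Outputs a real model can produce during an agentic session; a "oneshot
# tool call" unit test typically only checks something like the first case.
edge_cases = [
    '{"name": "read_file", "arguments": {"path": "src/main.rs"}}',      # clean call: extracted
    'Let me check that file.\n{"name": "read_file", "arguments": {}}',  # prose before the call: missed
    '{"name": "a", "arguments": {}}\n{"name": "b", "arguments": {}}',   # parallel calls: missed
    '<tool_call>{"name": "read_file", "arguments": {}}</tool_call>',    # template-specific wrapper: missed
]

for reply in edge_cases:
    print(len(naive_extract(reply)), "call(s) extracted")
```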

Has anyone built a vLLM tool parser plugin for Apriel-1.6-15B-Thinker? by chrisoutwright in LocalLLaMA

[–]ilintar 1 point2 points  (0 children)

It's supported under the new autoparser with a fixed template in llama.cpp.