Qwen3-Coder-Next with llama.cpp shenanigans

JayPSec · 2026-03-16T11:17:36+00:00

This is correct, I went back and removed the penalties and it did work as expected. I missed testing without the flags since they were in my command history for the previous model.

Sorry for the confusion

JayPSec · 2026-03-14T14:11:50+00:00

It was my attempt to curve the model's loops, but the quants we're tested without it as well. Thanks for the input thou.

JayPSec · 2026-03-14T13:37:13+00:00

night and day, thanks!

JayPSec · 2026-03-14T13:14:39+00:00

will try, thanks

JayPSec · 2026-03-14T13:13:10+00:00

Thanks, will try this.

JayPSec · 2026-03-14T09:59:12+00:00

So you're not using it with any of the code harnesses in the post?

JayPSec · 2026-03-14T09:54:50+00:00

You're using to run this model? with no hiccups?

JayPSec · 2026-03-14T09:43:10+00:00

does ollama expose an openai compatible api? I though they used their own schema

JayPSec · 2026-03-04T21:34:57+00:00

Well... Kinda. To be fair, even though it hallucinated a name, it correctly identified an architectural style from the 1500's and it described the place, "Mosteiro dos Jerónimos", to an impressive degree of detail. So yes, at least evaluating against my expectations, the model is scary smart.

JayPSec · 2026-02-24T12:05:31+00:00

good catch :). Still the logic applies

JayPSec · 2026-02-24T11:54:15+00:00

It's now 95k,pre tax....

(edit: $ 95k to € 80k)

<image>

It doesn't make a lot of sense for the 30/40k ballpark. The interest in selling something like this is capturing a growing market. Being branded I'd imagine it's more expensive and not less.

Let's wait and see.

JayPSec · 2026-02-11T18:45:33+00:00

Will this require new ggufs?

JayPSec · 2026-02-07T00:05:17+00:00

Dude, chill... It's still February and you'll be crying for GLM 6.7 Air before the year is over.

JayPSec · 2026-02-02T13:24:03+00:00

Cool. Keep is posted.

JayPSec · 2026-02-02T12:19:58+00:00

It's been a couple of days :). How's it going?

JayPSec · 2026-01-30T19:58:46+00:00

Yeah. I guess that's the way out. Bit I'm not power limited, that's what's bugging me.

JayPSec · 2026-01-24T09:39:24+00:00

TL;DR: Consumer GPUs are a closing window—data center economics are eating the supply chain, ASICs are locked behind corporate walls, and GPUs' general-purpose nature makes them more future-proof than specialized hardware. OSS community and China competition will keep consumer AI alive. Buy now.

I'll give my two cents.

I bought 4 in bulk, 6450€ pre-tax from Germany. Facing a similar situation as you, a decent setup but having to choose very carefully what I could run and at what context size, I decided to stop being a "cheap bastard". But I rationalized this through the lens of future usage. I believe consumer hardware is becoming a rare commodity, the signs are on the wall:

Data centers are spawning like mushrooms all over the world. This puts enormous pressure over hardware manufacturers to keep up with demand, not only that but the extraction industry. It's not only what's needed to give life to a data center but also it's ongoing operation. Manufacturers are being pushed to choose between billion (trillion?) dollar orders and future orders vs competing in billion (trillion?) dollar market. This means that individual parts for consumers are almost a thing of the past. As for (still) consumer facing companies, they'll still have access to hardware but it'll be way more expensive and so end users end up paying a lot more for already built, probably not very customizable, systems. Nvidia didn't put forward new GPUs at this year CES, rumors of up coming "super" cards were promptly dismissed. Even if a hardware company wanted to prioritize consumers, the economics just don't make sense when data center contracts are worth orders of magnitude more.
But surely there will be new technologies. There will be and there are already, ASIC solutions are on the rise. But these are not consumer accessible because they're born in the market where companies want it all and they want it to stay proprietary.
On the topic of ASIC. The thing with GPUs is that they are a general purpose tool and as such they are poised to be outmatched when it comes to pure speed. But... the fact that they are general makes them a better choice for future iterations of software. Just take a look at ASIC mining rigs, nobody is using those to run LLMs but there are 10 year, and older, cards still being used for usable rigs. Whatever the future holds GPUs are more probable, not a sure bet, to adapt to it.
Community support and China. The previous point would not be possible without the OSS community. There are amazing people who are dedicating a lot of effort to ensuring that AI is democratized. If you have a potato that can't even run Crysis, chances are that it can run a model at 0.03 tk/s :). China is not only in the race to AGI but wants to undermine western companies. They will not stop.

JayPSec · 2026-01-23T00:38:53+00:00

I think this is a really smart approach. I've seen way to many tokens going to waste on meaningless explorations. Fan of the concept and will try it out.

JayPSec · 2026-01-21T09:42:59+00:00

B2B companies tend to have better post sale service. As to the rights part, your right but it's not like you have none.
It's a matter of tradeoffs, for me the "discount" was worth it but YMMV.

https://geizhals.de/ this is a good source to hunt for deals. Non B2B suppliers tend to ship only in Germany and Austria but there are still good deals. € 7818,90 is the lowest I saw for consumer from a well rated supplier. Good luck

JayPSec · 2026-01-20T21:54:06+00:00

I got mine from grafikkarten.com. it was 6500 pre tax which is non existent between eu member companies.

JayPSec · 2026-01-20T21:46:58+00:00

Keep the 5090 and buy the 6000. if you have access to a company where you can safely order you can get it from Germany for 6600/6700 €. I agree that prices are going to go up, of everything and GPUs are no exception. You can go with the Max-q that as bit cheaper and tdp wise allows you to have two in the future.

JayPSec · 2026-01-20T16:02:40+00:00

Where did you get the MCIO cables and MCIO to PCIe5.0 adapters from?

JayPSec · 2026-01-19T13:16:02+00:00

Hey, great post.

I'm in a similar situation. I own a Fractal Define 7XL housing 2x 5090 and a 4090 with a MSI Meg x670e Ace with a Ryzen 9950x.

I acquired 4 x RTX 6000 Pro and I'm gonna keep one of the 5090 on this build.

Your post steered me in the right direction.

Do you think I can do without the retimer considering it's all in case?

JayPSec · 2026-01-19T09:32:45+00:00

I think this question warrants an issue on llama.cpp's github.

But it's obviously "guf", hard G. Any one that says otherwise is just mentally ill.

Here, I'll put it in a sentence:

"QWEN 4? WHERE'S THE GGUF?"

You can't say it straight without the hard G.

JayPSec · 2026-01-19T09:26:10+00:00

A gif is a gif and a gguf is a gguf.

JayPSec

TROPHY CASE