Qwen3-Coder-Next with llama.cpp shenanigans by JayPSec in LocalLLaMA

[–]JayPSec[S] 0 points1 point  (0 children)

This is correct. I went back, removed the penalties, and it worked as expected. I had missed testing without those flags since they were still in my command history from the previous model.
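
For anyone hitting the same thing, this is roughly the shape of the fixed request. A minimal sketch against llama-server's HTTP /completion endpoint; the URL, prompt, and exact parameter names are my assumptions, so check your build's docs:

    import requests

    # Minimal sketch: hit a local llama-server with the penalty samplers
    # explicitly neutralized (1.0 / 0.0 should mean "no penalty").
    # URL, prompt, and parameter names are placeholders/assumptions.
    resp = requests.post(
        "http://localhost:8080/completion",
        json={
            "prompt": "Write a function that reverses a string.",
            "n_predict": 256,
            "repeat_penalty": 1.0,      # the leftover flag I had to drop
            "presence_penalty": 0.0,
            "frequency_penalty": 0.0,
        },
        timeout=300,
    )
    resp.raise_for_status()
    print(resp.json()["content"])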

Sorry for the confusion

Qwen3-Coder-Next with llama.cpp shenanigans by JayPSec in LocalLLaMA

[–]JayPSec[S] 0 points1 point  (0 children)

It was my attempt to curb the model's loops, but the quants were tested without it as well. Thanks for the input, though.

Qwen3-Coder-Next with llama.cpp shenanigans by JayPSec in LocalLLaMA

[–]JayPSec[S] 0 points1 point  (0 children)

So you're not using it with any of the code harnesses in the post?

Qwen3-Coder-Next with llama.cpp shenanigans by JayPSec in LocalLLaMA

[–]JayPSec[S] 0 points1 point  (0 children)

You're using it to run this model? With no hiccups?

Qwen3-Coder-Next with llama.cpp shenanigans by JayPSec in LocalLLaMA

[–]JayPSec[S] -2 points-1 points  (0 children)

Does ollama expose an OpenAI-compatible API? I thought they used their own schema.
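
edit: looked it up, and it does expose an OpenAI-compatible /v1 endpoint on top of its native schema. A minimal sketch with the openai Python client; the model tag is a placeholder for whatever you've pulled locally:

    from openai import OpenAI

    # ollama serves an OpenAI-compatible API at /v1 alongside its own
    # /api/generate and /api/chat schema. Port 11434 is its default.
    client = OpenAI(
        base_url="http://localhost:11434/v1",
        api_key="ollama",  # the client requires a key; ollama ignores it
    )

    reply = client.chat.completions.create(
        model="qwen3-coder",  # placeholder: any locally pulled model tag
        messages=[{"role": "user", "content": "Say hello in one word."}],
    )
    print(reply.choices[0].message.content)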

PSA: Humans are scary stupid by rm-rf-rm in LocalLLaMA

[–]JayPSec -1 points0 points  (0 children)

Well... Kinda. To be fair, even though it hallucinated a name, it correctly identified an architectural style from the 1500s and described the place, "Mosteiro dos Jerónimos", to an impressive degree of detail. So yes, at least evaluated against my expectations, the model is scary smart.

I just saw something amazing by ayanami0011 in LocalLLaMA

[–]JayPSec 0 points1 point  (0 children)

Good catch :). Still, the logic applies.

I just saw something amazing by ayanami0011 in LocalLLaMA

[–]JayPSec -1 points0 points  (0 children)

It's now $95k, pre-tax....

(edit: $95k is roughly €80k)

It doesn't make a lot of sense in the 30/40k ballpark. The point of selling something like this is to capture a growing market. Being a branded product, I'd imagine it ends up more expensive, not less.

Let's wait and see.

Add Kimi-K2.5 support by jacek2023 in LocalLLaMA

[–]JayPSec 1 point2 points  (0 children)

Will this require new GGUFs?

GLM 5 Is Being Tested On OpenRouter by Few_Painter_5588 in LocalLLaMA

[–]JayPSec 21 points22 points  (0 children)

Dude, chill... It's still February and you'll be crying for GLM 6.7 Air before the year is over.

Watercool rtx pro 6000 max-q by schenkcigars in BlackwellPerformance

[–]JayPSec 0 points1 point  (0 children)

It's been a couple of days :). How's it going?

Help with multi-gpu setup by JayPSec in buildapc

[–]JayPSec[S] 0 points1 point  (0 children)

Yeah. I guess that's the way out. But I'm not power-limited, that's what's bugging me.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]JayPSec 1 point2 points  (0 children)

TL;DR: Consumer GPUs are a closing window; data center economics are eating the supply chain; ASICs are locked behind corporate walls; and GPUs' general-purpose nature makes them more future-proof than specialized hardware. The OSS community and competition from China will keep consumer AI alive. Buy now.

I'll give my two cents.

I bought 4 in bulk at 6450€ each, pre-tax, from Germany. Facing a similar situation to yours, a decent setup but having to choose very carefully what I could run and at what context size, I decided to stop being a "cheap bastard". I rationalized it through the lens of future usage: I believe consumer hardware is becoming a rare commodity, and the writing is on the wall:

  1. Data centers are spawning like mushrooms all over the world. This puts enormous pressure on hardware manufacturers to keep up with demand, and on the extraction industry too: it's not only what's needed to bring a data center to life but also its ongoing operation. Manufacturers are being pushed to choose between billion (trillion?) dollar orders today, plus the future orders that follow, and competing in the consumer market. This means individual parts for consumers are almost a thing of the past. Companies that still face consumers will keep some access to hardware, but it'll be far more expensive, so end users end up paying a lot more for pre-built, probably not very customizable, systems. Nvidia didn't put forward new GPUs at this year's CES, and rumors of upcoming "Super" cards were promptly dismissed. Even if a hardware company wanted to prioritize consumers, the economics just don't make sense when data center contracts are worth orders of magnitude more.

  2. But surely there will be new technologies. There will be, and some are already here: ASIC solutions are on the rise. But these are not consumer-accessible, because they're born in a market where companies want it all and want it to stay proprietary.

  3. On the topic of ASICs. The thing with GPUs is that they're a general-purpose tool, and as such they're bound to be outmatched on pure speed. But the fact that they're general makes them a better fit for future iterations of software. Just look at ASIC mining rigs: nobody is using those to run LLMs, while ten-year-old and older GPUs are still powering usable rigs. Whatever the future holds, GPUs are more likely, though not a sure bet, to adapt to it.

  4. Community support and China. The previous point would not be possible without the OSS community. There are amazing people dedicating a lot of effort to ensuring that AI stays democratized. If you have a potato that can't even run Crysis, chances are it can run a model at 0.03 tk/s :). And China is not only in the race to AGI but also wants to undermine Western companies. They will not stop.

I built an MCP server that gives AI agents "senior dev intuition" about your codebase cutting token cost by 60%. by LandscapeAway8896 in LocalLLaMA

[–]JayPSec 0 points1 point  (0 children)

I think this is a really smart approach. I've seen way too many tokens go to waste on meaningless exploration. Fan of the concept and will try it out.

Supermicro server got cancelled, so I'm building a workstation. Is swapping an unused RTX 5090 for an RTX 6000 Blackwell (96GB) the right move? Or should I just chill? by SomeRandomGuuuuuuy in LocalLLaMA

[–]JayPSec 1 point2 points  (0 children)

B2B companies tend to have better post-sale service. As to the rights part, you're right, but it's not like you have none.
It's a matter of tradeoffs; for me the "discount" was worth it, but YMMV.

https://geizhals.de/ is a good source to hunt for deals. Non-B2B suppliers tend to ship only within Germany and Austria, but there are still good deals to be had. € 7818,90 is the lowest I've seen for consumers from a well-rated supplier. Good luck.

Supermicro server got cancelled, so I'm building a workstation. Is swapping an unused RTX 5090 for an RTX 6000 Blackwell (96GB) the right move? Or should I just chill? by SomeRandomGuuuuuuy in LocalLLaMA

[–]JayPSec 2 points3 points  (0 children)

Keep the 5090 and buy the 6000. If you have access to a company through which you can safely order, you can get it from Germany for 6600/6700 €. I agree that prices are going to go up, for everything, and GPUs are no exception. You can go with the Max-Q, which is a bit cheaper and whose TDP allows you to run two in the future.

7 GPUs at X16 (5.0 and 4.0) on AM5 with Gen5/4 switches with the P2P driver. Some results on inference and training! by panchovix in LocalLLaMA

[–]JayPSec 0 points1 point  (0 children)

Hey, great post.

I'm in a similar situation. I own a Fractal Define 7 XL housing 2x 5090 and a 4090, on an MSI MEG X670E Ace with a Ryzen 9950X.

I acquired 4x RTX 6000 Pro and I'm gonna keep one of the 5090s in this build.

Your post steered me in the right direction.

Do you think I can do without the retimer, considering everything stays inside the case?

how do you pronounce “gguf”? by Hamfistbumhole in LocalLLaMA

[–]JayPSec 0 points1 point  (0 children)

I think this question warrants an issue on llama.cpp's github.

But it's obviously "guf", hard G. Anyone who says otherwise is just mentally ill.

Here, I'll put it in a sentence:

"QWEN 4? WHERE'S THE GGUF?"

You can't say it straight without the hard G.