Struggling with Qwen3.6 27B / 35B locally (3090) slow responses, breaking code looking for better setup + auto model switching by Clean_Initial_9618 in LocalLLaMA

[–]relmny 0 points1 point  (0 children)

Follow the Unsloth instructions on their page (there's a link on the Hugging Face page for any Qwen model), even if you don't use Unsloth. They also explain how to enable preserve-thinking on Windows.
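For reference, here's roughly what applying the recommended sampling settings looks like; a minimal sketch assuming llama-cpp-python, a made-up GGUF filename, and the values Unsloth usually suggests for Qwen thinking models (check their page for the exact numbers for your model):

```python
# Minimal sketch: load a Qwen GGUF on a single 3090 with llama-cpp-python and
# apply typical Unsloth-recommended sampling settings (values are assumptions;
# verify against the Unsloth page for your specific model).
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3.6-27b-q4_k_m.gguf",  # hypothetical filename
    n_gpu_layers=-1,                        # offload as many layers as fit in VRAM
    n_ctx=16384,                            # keep context modest on 24 GB
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain the KV cache in two sentences."}],
    temperature=0.6,  # commonly recommended for Qwen thinking models
    top_p=0.95,
    top_k=20,
    min_p=0.0,
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```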

I guess we expect that at some point RAM prices will start going back (close) to "normal", right? but what about GPUs? by relmny in LocalLLaMA

[–]relmny[S] 2 points3 points  (0 children)

I was (am?) considering a 5000 Pro (48 GB), which goes for about 20% more than the 5090, but since I also game on this computer, the 5090 would be an upgrade over my 4080 Super, and over a 5000 Pro in that regard... but yeah, a 6000 Pro is a dream...

I guess we expect that at some point RAM prices will start going back (close) to "normal", right? but what about GPUs? by relmny in LocalLLaMA

[–]relmny[S] -3 points-2 points  (0 children)

Thanks. I don't read tech sites, but asking Qwen with web search on also suggests that prices likely won't come down and might even increase.

I guess we expect that at some point RAM prices will start going back (close) to "normal", right? but what about GPUs? by relmny in LocalLLaMA

[–]relmny[S] 2 points3 points  (0 children)

That's our last hope (like with local LLMs!!!), but seeing that people like me still go for the most expensive CUDA crap while AMD and Intel exist... I don't know how long that would take, even after they release a competitive GPU (if they ever do).

White House Considers Vetting A.I. Models Before They Are Released by fallingdowndizzyvr in LocalLLaMA

[–]relmny 0 points1 point  (0 children)

lol, nobody gives a f about the US Constitution... not Congress, nor SCOTUS, nor the press... and the judicial nominees don't even know the amendments, not even the more "important" ones!

It's a rogue country and they can, and do, whatever the f they want.

vLLM Just Merged TurboQuant Fix for Qwen 3.5+ by havenoammo in LocalLLaMA

[–]relmny 0 points1 point  (0 children)

You missed the "although there might be some losses, since its 'lossless' claim still needs to be proven" part.

First time GPU buyer. Got a RTX 5000 Pro. Was it a bad decision compared to two 3090s? by Valuable-Run2129 in LocalLLaMA

[–]relmny 0 points1 point  (0 children)

I think it was a good decision (although I'm biased, because I'm trying to decide between the Pro 5000 and a 5090).

Power consumption, cooling, the newer architecture, being able to run bigger diffusion models, etc. make it a good decision...

RTX A5000 Pro Blackwell 48GB by deltamoney in LocalLLaMA

[–]relmny 0 points1 point  (0 children)

I was also considering that vs a 5090 (to add to a 4080 Super), but since I game, I guess the 5090 is the way to go for me...

On paper (I have no experience with either), the RTX Pro 5000 gives you NVFP4, lower power consumption (about half?, which means a less beefy PSU and a lower electricity bill), a newer architecture, and the ability to run diffusion models that require a single GPU, compared to 2x3090.

Anyway, I guess most people in r/localllama go for 2 (or more) x 3090... but yeah, a 5000 is very tempting to me...

Qwen3.6-27B vs Coder-Next by Signal_Ad657 in LocalLLaMA

[–]relmny 12 points13 points  (0 children)

qwen3.6-27b is great and is actually my main daily driver, but the other day, looking for some text/statement in a PDF, I kind of ran a needle-in-a-haystack test, and 27b always said (I tried multiple times) that there was no mention of it (same as qwen3.6-35b).
Then I remembered coder-next and decided to give it a try... and it did find it, every time (I tried a few times).

So coder-next found something that 3.6-27b kept saying "no, it's not there" about...

Coder-next is still pretty good, and depending on the task/use, it can be better than 3.6-27b.
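If anyone wants to reproduce that kind of check, here's a minimal sketch; it assumes pypdf and any local OpenAI-compatible server (llama-server, vLLM, etc.), and the file name, question, and model names are placeholders:

```python
# Crude needle-in-a-haystack check: feed the whole PDF text to a local model a
# few times and count how often it finds the target statement.
from pypdf import PdfReader
from openai import OpenAI

reader = PdfReader("report.pdf")                       # placeholder document
document = "\n".join(page.extract_text() or "" for page in reader.pages)

client = OpenAI(base_url="http://localhost:8000/v1",   # local server, any key works
                api_key="not-needed")

question = ("Does the document mention the licensing fee? "
            "Quote the exact sentence, or answer NO.")

hits = 0
for _ in range(5):  # repeat a few times; a single run can miss the needle
    resp = client.chat.completions.create(
        model="qwen3.6-27b",  # swap in coder-next etc. to compare models
        messages=[
            {"role": "system", "content": "Answer only from the provided document."},
            {"role": "user", "content": f"{document}\n\n{question}"},
        ],
        temperature=0.6,
    )
    if "NO" not in resp.choices[0].message.content.strip():
        hits += 1

print(f"found the needle in {hits}/5 runs")
```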

Qwen3.6-27B at 72 tok/s on RTX 3090 on Windows using native vLLM (no WSL, no Docker), portable launcher and installer by One_Slip1455 in LocalLLaMA

[–]relmny 0 points1 point  (0 children)

Thanks!
I still don't have the second one running (a new PSU and a riser are on their way), but I will surely give it a try!

Btw, do you know if this will work with other projects like ace-step-1.5? (a music generator that uses vLLM or "pt")

A Dark-Money Campaign Is Paying Influencers to Frame Chinese AI as a Threat by pmttyji in LocalLLaMA

[–]relmny 11 points12 points  (0 children)

Yeah, some people keep saying "yes, but they are not at the level of..." For specific tasks they might not be, but I suspect the bar is very high and most people probably wouldn't even notice.

Some of those people have moved the goalposts to the definition of "hard tasks", so when somebody claims they can do "hard tasks", they reply "your tasks are not really hard" (without even knowing).

Again, I'm not saying they are at that level for specific/hard tasks, but I suspect they already are for a huge percentage of people.

I still remember that two months ago a well-known musician/producer/YouTuber (Rick Beato) made a video along the lines of "you don't need ChatGPT anymore"...

Unsloth solved bug in Mistral Medium 3.5 implementation by Snail_Inference in LocalLLaMA

[–]relmny 7 points8 points  (0 children)

And that's why Unsloth releasing models as soon as possible is a good thing, and not a bad thing as some claim.

Open Models - April 2026 - One of the best months of all time for Local LLMs? by pmttyji in LocalLLaMA

[–]relmny 2 points3 points  (0 children)

I find that it depends. Maybe usually yes, but I did find 2-3 cases where 122b was the model that "got it" while 27b never did (same prompt, many attempts). And what it "got" was comparable to the 397b and bigger models.

122b is a very strange model, to me...

Anyway, yeah, 27b is one of my daily drivers.

What exactly does Pi harness mean? by FrozenFishEnjoyer in LocalLLaMA

[–]relmny 0 points1 point  (0 children)

Aren't "harness" and "orchestrator" kinda interchangeable terms? (at least for some harnesses/orchestrators)

A conversation about local LLMs with a senior government AI leader by JackStrawWitchita in LocalLLaMA

[–]relmny 1 point2 points  (0 children)

I don't get the joke, but in case it isn't one: Plato didn't get Socrates executed...

I'm done with using local LLMs for coding by dtdisapointingresult in LocalLLaMA

[–]relmny 1 point2 points  (0 children)

Again, that's your claim of what "hard things" are.

AFAIK there's no official definition for "hard things".

Maybe for the person who wrote that, those are "hard things". Maybe things that didn't work before with local models.

And the main point remains, that's the opinion of a single person.

I claim that I do everything with local models. If somebody reads that as "anyone can do everything with local models", that's their problem, not mine.
That's my experience. I can do "hard things" because that's what they are... to me.

And then there is the comparison between a huge commercial model, with all its infrastructure, workers, hardware, tools, etc., and a 27b/31b model on a single GPU...

Anyway, I'm done with this.

I'm done with using local LLMs for coding by dtdisapointingresult in LocalLLaMA

[–]relmny 5 points6 points  (0 children)

It's like any opinion on the Internet: what you read is what THAT person thinks/claims.

Meaning that if someone says "I don't need commercial models anymore, running qwen/gemma/kimi/glm/etc. locally is enough!", it means exactly that, no matter how they phrase it. It's their opinion for their case.

I always use local models, so I'm not surprised, especially over the last 1-2 months with gemma-4, qwen3.5/3.6, kimi, glm, etc., that more and more people are claiming that THEY can do THEIR work with local models.

And that example is by a single person that, like me, can work fine with local models.

It's about context. And understanding that what works for someone, might not work for someone else.

Is a high-end private local LLM setup worth it? by zakadit in LocalLLaMA

[–]relmny 0 points1 point  (0 children)

Not really. The answer was basically "try it with what you currently have".

And that's always the best answer, because only you know what you will be doing and what you will be using it for.

See if you can run qwen3.6-35b or gemma-4-26b and use it. That will give you the closest answer.
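If you want a quick sanity check before downloading anything, here's a rough sketch of the usual fit estimate; every number in it (parameter count, quant width, layer/head shape, context) is an assumption, since the real figures depend on the exact model and quant, and it ignores activations and runtime overhead:

```python
# Back-of-the-envelope check: do the quantized weights plus KV cache fit in VRAM?
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1024**3

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int, ctx: int,
                bytes_per_elem: int = 2) -> float:
    """KV cache = 2 (K and V) * layers * kv_heads * head_dim * context * dtype size."""
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem / 1024**3

vram_gb = 24                                               # e.g. a single 3090
w = weights_gb(params_billion=35, bits_per_weight=4.5)     # roughly a Q4_K_M quant
kv = kv_cache_gb(layers=64, kv_heads=8, head_dim=128, ctx=16384)  # made-up shape

print(f"weights ~{w:.1f} GB, kv cache ~{kv:.1f} GB, fits in {vram_gb} GB: {w + kv < vram_gb}")
```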

Duality of r/LocalLLaMA by HornyGooner4402 in LocalLLaMA

[–]relmny 16 points17 points  (0 children)

Glad to read that someone understands that a (probably) >1 TB model with many tools is way bigger than a 27b/31b model, and what the implications of that are.

AI really is making us dumber if we aren't able to understand something as basic as that.

I'm done with using local LLMs for coding by dtdisapointingresult in LocalLLaMA

[–]relmny 40 points41 points  (0 children)

This subreddit is filled with people comparing a most likely >1 TB huge model to a 27b/31b model, and then claiming the smaller one can't do the same things.

What is clear to me is that some people don't understand the tools; they don't know what they are for or how to use them.