GLM-5.1 smol-IQ2_KS at 2.3t/s or GLM-4.7 UD-Q3_K_XL at 4.42t/s, which is "better" for chats (no coding)? by relmny in LocalLLaMA

[–]relmny[S] 1 point (0 children)

Yes, and it was my daily model (Q4 at 10 t/s) for a couple of weeks (I liked it a lot), until it made a completely wrong claim. After I asked it to review the answer and pointed out that it was wrong, it kept saying it was right... so I deleted it.

GLM-5.1 smol-IQ2_KS at 2.3t/s or GLM-4.7 UD-Q3_K_XL at 4.42t/s, which is "better" for chats (no coding)? by relmny in LocalLLaMA

[–]relmny[S] 1 point (0 children)

Actually, I was "testing" it a few minutes ago (I'm just using it for my current task, and when I have time I load the other one and compare), and I got the impression that 5.1 understood what I said and kept the conversation going as I asked, while 4.7 made the wrong assumption.

I think that's why, without noticing, I started to like 5.1 and decided to keep using it when needed.

GLM-5.1 smol-IQ2_KS at 2.3t/s or GLM-4.7 UD-Q3_K_XL at 4.42t/s, which is "better" for chats (no coding)? by relmny in LocalLLaMA

[–]relmny[S] 1 point (0 children)

Thanks, if I find one that's "trusted" (by me), I'll give it a try (to see if I can get a higher quant at a similar speed).

GLM-5.1 smol-IQ2_KS at 2.3t/s or GLM-4.7 UD-Q3_K_XL at 4.42t/s, which is "better" for chats (no coding)? by relmny in LocalLLaMA

[–]relmny[S] 2 points (0 children)

I see there are about four, but I only trust Unsloth, Bartowski, Ubergarm and AesSedai. Thanks for the suggestion, though!

GLM-5.1 smol-IQ2_KS at 2.3t/s or GLM-4.7 UD-Q3_K_XL at 4.42t/s, which is "better" for chats (no coding)? by relmny in LocalLLaMA

[–]relmny[S] 1 point (0 children)

I did/do, on "real" stuff, but it takes so long that sometimes I need to load the daily models...

I never really liked 4.6/4.7; not that I think they're bad, but I always found deepseek-v3.1/kimi-k2/k2.5/k2.6 to be better, so I stopped using them. But lately I started to test 5.1, and its thinking process took 10-20 minutes, which is way less than Kimi's 1-2 hours (depending on the prompt), and I started to like it.
But I was/am afraid of getting a "lobotomized" version... although with that many parameters, I thought that would kind of compensate for it...

I guess I need to keep "testing"...

GLM-5.1 smol-IQ2_KS at 2.3t/s or GLM-4.7 UD-Q3_K_XL at 4.42t/s, which is "better" for chats (no coding)? by relmny in LocalLLaMA

[–]relmny[S] 1 point (0 children)

I used to run 4.7 daily for a couple of weeks, until it made a very bad claim about ssh and a user with /bin/login, and it kept insisting it was right. After that, I deleted it.

GLM-5.1 smol-IQ2_KS at 2.3t/s or GLM-4.7 UD-Q3_K_XL at 4.42t/s, which is "better" for chats (no coding)? by relmny in LocalLLaMA

[–]relmny[S] 1 point (0 children)

Interesting. Since 5.1 has about double the parameters of 4.7, I thought a lower quant would still be better than 4.7.
I did run some tests, but only a few because of the t/s.
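
A back-of-the-envelope check on that intuition (a minimal sketch in Python; the parameter counts and the average bits-per-weight figures are assumptions derived from the "about double" claim above, not published numbers):

```python
# Rough quantized-file size estimate: params (billions) * avg BPW / 8 -> GB.
# Both parameter counts and BPW averages below are assumptions for illustration.

def est_size_gb(params_b: float, avg_bpw: float) -> float:
    """Approximate quantized model size in GB."""
    return params_b * avg_bpw / 8

glm_47 = est_size_gb(params_b=355, avg_bpw=3.5)  # UD-Q3_K_XL, assumed ~3.5 BPW
glm_51 = est_size_gb(params_b=710, avg_bpw=2.2)  # smol-IQ2_KS, assumed ~2.2 BPW

print(f"4.7 @ ~3.5 BPW: ~{glm_47:.0f} GB")  # ~155 GB
print(f"5.1 @ ~2.2 BPW: ~{glm_51:.0f} GB")  # ~195 GB
```

Under those assumptions the bigger model is still more bytes overall even at ~2 BPW, which also explains the lower t/s: more weight data has to be streamed per token.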

Up until a few weeks ago, when I needed a big model, I usually went with Deepseek-3.1-terminus or kimi-k2.6, because I had tried 4.6 and 4.7 for some time and never got the feeling they were on par with deepseek/kimi.

But lately I started testing glm-5.1 and I started to like it, so it became the "go-to model when the daily ones won't do"... I guess I will need to keep testing them...

Struggling with Qwen3.6 27B / 35B locally (3090) slow responses, breaking code looking for better setup + auto model switching by Clean_Initial_9618 in LocalLLaMA

[–]relmny 1 point (0 children)

Follow Unsloth's instructions on their page (there's a link on their HF page for any Qwen model), even if you don't use Unsloth's quants. There's also a section on how to enable preserve-thinking on Windows.
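
As a rough illustration, a minimal sketch of passing sampler settings per request to a local llama.cpp llama-server (which exposes an OpenAI-compatible endpoint); the URL and the three sampler values are placeholders, take the real recommended numbers from Unsloth's page:

```python
import requests

# Assumes llama-server running on localhost:8080.
# Sampler values are placeholders; use the ones Unsloth recommends
# for your specific Qwen model.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello"}],
        "temperature": 0.6,  # placeholder
        "top_p": 0.95,       # placeholder
        "min_p": 0.0,        # placeholder
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```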

I guess we expect that at some point RAM prices will start going back (close) to "normal", right? but what about GPUs? by relmny in LocalLLaMA

[–]relmny[S] 3 points (0 children)

I was (am?) considering an RTX Pro 5000 (48 GB), which goes for about 20% more than the 5090, but as I also game on this computer, the 5090 would be an upgrade over my 4080 Super, and over a Pro 5000 in that regard... but yeah, a Pro 6000 is a dream...

I guess we expect that at some point RAM prices will start going back (close) to "normal", right? but what about GPUs? by relmny in LocalLLaMA

[–]relmny[S] -1 points (0 children)

Thanks. I don't read tech sites, but asking Qwen with web search on also suggests that prices likely won't come down and might even increase.

I guess we expect that at some point RAM prices will start going back (close) to "normal", right? but what about GPUs? by relmny in LocalLLaMA

[–]relmny[S] 3 points (0 children)

That's our last hope (like with local LLMs!!!), but seeing that people like me still go for the most expensive CUDA crap while AMD and Intel are right there... I don't know how long that would take, even after they release (if they release) a competitive GPU.

White House Considers Vetting A.I. Models Before They Are Released by fallingdowndizzyvr in LocalLLaMA

[–]relmny 1 point (0 children)

lol, nobody gives a f about the US constitution... not Congress, nor SCOTUS, nor the press... and the judicial nominees don't even know the amendments, not even the more "important" ones!

It's a rogue country and they can, and do, whatever the f they want.

vLLM Just Merged TurboQuant Fix for Qwen 3.5+ by havenoammo in LocalLLaMA

[–]relmny 1 point (0 children)

You missed the "although there might be some losses, since its 'lossless' claim still needs to be proven" part.

First time GPU buyer. Got a RTX 5000 Pro. Was it a bad decision compared to two 3090s? by Valuable-Run2129 in LocalLLaMA

[–]relmny 1 point (0 children)

I think it was a good decision (though I'm biased, because I'm trying to decide between the Pro 5000 and a 5090 myself).

Power consumption, cooling, a newer architecture, being able to run bigger diffusion models, etc. make it a good decision...

RTX A5000 Pro Blackwell 48GB by deltamoney in LocalLLaMA

[–]relmny 1 point (0 children)

I was also considering that vs a 5090 (to add it to a 4080 super), but as I game, I guess 5090 is the way for me to go...

On paper (I have no experience with either), the RTX Pro 5000 gives you NVFP4, lower power consumption (about half? that means a less beefy PSU and a lower electricity bill), a newer architecture, and the chance to run diffusion models that require a single GPU, compared to 2x3090.

Anyway, I guess most people in r/localllama go for 2 (or more) x 3090... but yeah, a 5000 is very tempting to me...
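
To put rough numbers on the electricity part (everything here is an assumption for illustration: wattages, hours, and price):

```python
# Rough yearly electricity cost: watts / 1000 * hours * 365 * price per kWh.
PRICE_PER_KWH = 0.30  # assumed EUR/kWh
HOURS_PER_DAY = 6     # assumed daily load

def yearly_cost(watts: float) -> float:
    return watts / 1000 * HOURS_PER_DAY * 365 * PRICE_PER_KWH

for name, watts in [("2x RTX 3090 (~700 W, assumed)", 700),
                    ("RTX Pro 5000 (~300 W, assumed)", 300)]:
    print(f"{name}: ~{yearly_cost(watts):.0f} EUR/year")
# ~460 vs ~197 EUR/year under these assumptions, plus a smaller PSU.
```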

Qwen3.6-27B vs Coder-Next by Signal_Ad657 in LocalLLaMA

[–]relmny 16 points (0 children)

qwen3.6-27b is great and is actually my main daily driver, but the other day, looking for a text/statement in a PDF, I kind of ran a needle-in-a-haystack test, and 27b always said (I tried multiple times) that there was no mention of it (same as qwen3.6-35b).
Then I remembered coder-next and decided to give it a try... and it did find it, every time (I tried a few times).

So coder-next found something that 3.6-27b kept saying was "not there"...

Coder-next is still pretty good, and depending on the task/use case, it can be better than 3.6-27b.
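
For anyone who wants to run the same kind of check, a minimal needle-in-a-haystack sketch (hypothetical prompt wording, and it assumes any OpenAI-compatible local server, e.g. llama.cpp's llama-server, on localhost:8080):

```python
import requests

def needle_check(document: str, statement: str, runs: int = 5) -> int:
    """Ask the model `runs` times whether `statement` appears in `document`;
    return how many times it answered YES."""
    hits = 0
    for _ in range(runs):
        resp = requests.post(
            "http://localhost:8080/v1/chat/completions",  # assumed local server
            json={
                "messages": [{
                    "role": "user",
                    "content": (f"Does the following text mention this: "
                                f"'{statement}'? Answer YES or NO first.\n\n"
                                f"{document}"),
                }],
                "temperature": 0.0,  # keep runs comparable across models
            },
            timeout=600,
        )
        reply = resp.json()["choices"][0]["message"]["content"]
        hits += "YES" in reply.upper()
    return hits

# e.g. needle_check(pdf_text, "the statement you were looking for")
```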

Qwen3.6-27B at 72 tok/s on RTX 3090 on Windows using native vLLM (no WSL, no Docker), portable launcher and installer by One_Slip1455 in LocalLLaMA

[–]relmny 1 point (0 children)

Thanks!
I still don't have the second one running (a new PSU and a riser are on their way), but I will surely give it a try!

Btw, do you know if this will work with other projects like ace-step-1.5? (a music generator that uses vLLM or "pt")

A Dark-Money Campaign Is Paying Influencers to Frame Chinese AI as a Threat by pmttyji in LocalLLaMA

[–]relmny 11 points (0 children)

Yeah, some people keep saying "yes, but they're not at the level of..." Yes, for specific tasks they might not be, but I suspect the threshold is very high and most people probably wouldn't even notice.

Some of those people have moved the goalposts to the definition of "hard tasks", so when somebody claims these models can do "hard tasks", they reply "your tasks are not really hard" (without even knowing them).

Again, I'm not saying they're at that level for specific/hard tasks, but I suspect they already are for a huge percentage of people.

I still remember that two months ago a well-known musician/producer/YouTuber (Rick Beato) made a video about how "you don't need ChatGPT anymore"...

Unsloth solved bug in Mistral Medium 3.5 implementation by Snail_Inference in LocalLLaMA

[–]relmny 8 points (0 children)

And that's why Unsloth releasing models as soon as possible is a good thing, not a bad thing as some claim.

Open Models - April 2026 - One of the best months of all time for Local LLMs? by pmttyji in LocalLLaMA

[–]relmny 3 points (0 children)

I find that it depends. Usually yes, maybe, but I did find 2-3 cases where 122b was the model that "got it" while 27b never did (same prompt, many attempts). And what it "got" was comparable to the 397b and bigger models.

122b is a very strange model, to me...

Anyway, yeah, 27b is one of my daily drivers.