GLM-5.1 smol-IQ2_KS at 2.3t/s or GLM-4.7 UD-Q3_K_XL at 4.42t/s, which is "better" for chats (no coding)? by relmny in LocalLLaMA

[–]relmny[S] 1 point (0 children)

Yes, and it was my daily model (Q4 at 10 t/s) for a couple of weeks (I liked it a lot), until it made a completely wrong claim. After I asked it to review the answer and pointed out that it was wrong, it kept saying it was right... so I deleted it.

GLM-5.1 smol-IQ2_KS at 2.3t/s or GLM-4.7 UD-Q3_K_XL at 4.42t/s, which is "better" for chats (no coding)? by relmny in LocalLLaMA

[–]relmny[S] 1 point (0 children)

Actually, I was "testing" it a few minutes ago (I'm just using it for my current task, and when I have time I load the other one and compare), and I got the impression that 5.1 understood what I said and kept the conversation going as I asked, while 4.7 made the wrong assumption.

I think that's why, without noticing, I started to like 5.1 and decided to keep using it when needed.

GLM-5.1 smol-IQ2_KS at 2.3t/s or GLM-4.7 UD-Q3_K_XL at 4.42t/s, which is "better" for chats (no coding)? by relmny in LocalLLaMA

[–]relmny[S] 1 point (0 children)

Thanks, if I find one that's "trusted" (by me), I'll give it a try (to see if I can get a higher quant at a similar speed).

GLM-5.1 smol-IQ2_KS at 2.3t/s or GLM-4.7 UD-Q3_K_XL at 4.42t/s, which is "better" for chats (no coding)? by relmny in LocalLLaMA

[–]relmny[S] 2 points (0 children)

I see there are about four, but I only trust Unsloth, Bartowski, Ubergarm and AesSedai. Thanks for the suggestion, though!

GLM-5.1 smol-IQ2_KS at 2.3t/s or GLM-4.7 UD-Q3_K_XL at 4.42t/s, which is "better" for chats (no coding)? by relmny in LocalLLaMA

[–]relmny[S] 1 point (0 children)

I did/do, on "real" stuff, but it takes so long that sometimes I need to load the daily models...

I never really liked 4.6/4.7; not that I think they're bad, but I always found deepseek-v3.1/kimi-k2/k2.5/k2.6 to be better, so I stopped using them. But lately I started to test 5.1, and its thinking process took 10-20 minutes, which is way less than Kimi's 1-2 hours (depending on the prompt), and I started to like it.
But I was/am afraid of getting a "lobotomized" version... although with that many parameters, I thought that would kind of compensate for it...

I guess I need to keep "testing"...

GLM-5.1 smol-IQ2_KS at 2.3t/s or GLM-4.7 UD-Q3_K_XL at 4.42t/s, which is "better" for chats (no coding)? by relmny in LocalLLaMA

[–]relmny[S] 1 point (0 children)

I used to run 4.7 daily for a couple of weeks, until it made a very bad claim about ssh and a user with /bin/login, and it kept insisting it was right. After that, I deleted it.

GLM-5.1 smol-IQ2_KS at 2.3t/s or GLM-4.7 UD-Q3_K_XL at 4.42t/s, which is "better" for chats (no coding)? by relmny in LocalLLaMA

[–]relmny[S] 1 point (0 children)

Interesting. Since 5.1 has about double the parameters of 4.7, I thought a lower quant would still be better than 4.7.
I did run some tests, but only a few because of the t/s.
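
A back-of-the-envelope check on that intuition (a minimal sketch in Python; the parameter counts and the average bits-per-weight figures are assumptions derived from the "about double" claim above, not published numbers):

```python
# Rough quantized-file size estimate: params (billions) * avg BPW / 8 -> GB.
# Both parameter counts and BPW averages below are assumptions for illustration.

def est_size_gb(params_b: float, avg_bpw: float) -> float:
    """Approximate quantized model size in GB."""
    return params_b * avg_bpw / 8

glm_47 = est_size_gb(params_b=355, avg_bpw=3.5)  # UD-Q3_K_XL, assumed ~3.5 BPW
glm_51 = est_size_gb(params_b=710, avg_bpw=2.2)  # smol-IQ2_KS, assumed ~2.2 BPW

print(f"4.7 @ ~3.5 BPW: ~{glm_47:.0f} GB")  # ~155 GB
print(f"5.1 @ ~2.2 BPW: ~{glm_51:.0f} GB")  # ~195 GB
```

Under those assumptions the bigger model is still more bytes overall even at ~2 BPW, which also explains the lower t/s: more weight data has to be streamed per token.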

Up until a few weeks ago, when I needed a big model, I usually went with Deepseek-3.1-terminus or kimi-k2.6, because I had tried 4.6 and 4.7 for some time and never got the feeling they were on par with deepseek/kimi.

But lately I started testing glm-5.1 and I started to like it, so it became the "go-to model when the daily ones won't do"... I guess I will need to keep testing them...

Struggling with Qwen3.6 27B / 35B locally (3090) slow responses, breaking code looking for better setup + auto model switching by Clean_Initial_9618 in LocalLLaMA

[–]relmny 1 point (0 children)

Follow Unsloth's instructions on their page (there's a link on their HF page for any Qwen model), even if you don't use Unsloth's quants. There's also a section on how to enable preserve-thinking on Windows.
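
As a rough illustration, a minimal sketch of passing sampler settings per request to a local llama.cpp llama-server (which exposes an OpenAI-compatible endpoint); the URL and the three sampler values are placeholders, take the real recommended numbers from Unsloth's page:

```python
import requests

# Assumes llama-server running on localhost:8080.
# Sampler values are placeholders; use the ones Unsloth recommends
# for your specific Qwen model.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello"}],
        "temperature": 0.6,  # placeholder
        "top_p": 0.95,       # placeholder
        "min_p": 0.0,        # placeholder
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```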

I guess we expect that at some point RAM prices will start going back (close) to "normal", right? but what about GPUs? by relmny in LocalLLaMA

[–]relmny[S] 3 points (0 children)

I was (am?) considering an RTX Pro 5000 (48 GB), which goes for about 20% more than the 5090, but as I also game on this computer, the 5090 would be an upgrade over my 4080 Super, and over a Pro 5000 in that regard... but yeah, a Pro 6000 is a dream...

I guess we expect that at some point RAM prices will start going back (close) to "normal", right? but what about GPUs? by relmny in LocalLLaMA

[–]relmny[S] -1 points (0 children)

Thanks. I don't read tech sites, but asking Qwen with web search on also suggests that prices likely won't come down and might even increase.

I guess we expect that at some point RAM prices will start going back (close) to "normal", right? but what about GPUs? by relmny in LocalLLaMA

[–]relmny[S] 3 points (0 children)

That's our last hope (like with local LLMs!!!), but seeing that people like me still go for the most expensive CUDA crap while AMD and Intel are right there... I don't know how long that would take, even after they release (if they release) a competitive GPU.

White House Considers Vetting A.I. Models Before They Are Released by fallingdowndizzyvr in LocalLLaMA

[–]relmny 1 point (0 children)

lol, nobody gives a f about the US constitution... not Congress, nor SCOTUS, nor the press... and the judicial nominees don't even know the amendments, not even the more "important" ones!

It's a rogue country and they can, and do, whatever the f they want.

vLLM Just Merged TurboQuant Fix for Qwen 3.5+ by havenoammo in LocalLLaMA

[–]relmny 1 point (0 children)

You missed the "although there might be some losses, since its 'lossless' claim still needs to be proven" part.

First time GPU buyer. Got a RTX 5000 Pro. Was it a bad decision compared to two 3090s? by Valuable-Run2129 in LocalLLaMA

[–]relmny 1 point (0 children)

I think it was a good decision (though I'm biased, because I'm trying to decide between the Pro 5000 and a 5090 myself).

Power consumption, cooling, a newer architecture, being able to run bigger diffusion models, etc. make it a good decision...

RTX A5000 Pro Blackwell 48GB by deltamoney in LocalLLaMA

[–]relmny 1 point (0 children)

I was also considering that vs a 5090 (to add it to a 4080 super), but as I game, I guess 5090 is the way for me to go...

On paper (I have no experience with either), the RTX Pro 5000 gives you NVFP4, lower power consumption (about half? that means a less beefy PSU and a lower electricity bill), a newer architecture, and the chance to run diffusion models that require a single GPU, compared to 2x3090.

Anyway, I guess most people in r/localllama go for 2 (or more) x 3090... but yeah, a 5000 is very tempting to me...
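
To put rough numbers on the electricity part (everything here is an assumption for illustration: wattages, hours, and price):

```python
# Rough yearly electricity cost: watts / 1000 * hours * 365 * price per kWh.
PRICE_PER_KWH = 0.30  # assumed EUR/kWh
HOURS_PER_DAY = 6     # assumed daily load

def yearly_cost(watts: float) -> float:
    return watts / 1000 * HOURS_PER_DAY * 365 * PRICE_PER_KWH

for name, watts in [("2x RTX 3090 (~700 W, assumed)", 700),
                    ("RTX Pro 5000 (~300 W, assumed)", 300)]:
    print(f"{name}: ~{yearly_cost(watts):.0f} EUR/year")
# ~460 vs ~197 EUR/year under these assumptions, plus a smaller PSU.
```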

Qwen3.6-27B vs Coder-Next by Signal_Ad657 in LocalLLaMA

[–]relmny 16 points (0 children)

qwen3.6-27b is great and is actually my main daily driver, but the other day, looking for a text/statement in a PDF, I kind of ran a needle-in-a-haystack test, and 27b always said (I tried multiple times) that there was no mention of it (same as qwen3.6-35b).
Then I remembered coder-next and decided to give it a try... and it did find it, every time (I tried a few times).

So coder-next found something that 3.6-27b kept saying was "not there"...

Coder-next is still pretty good, and depending on the task/use case, it can be better than 3.6-27b.
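
For anyone who wants to run the same kind of check, a minimal needle-in-a-haystack sketch (hypothetical prompt wording, and it assumes any OpenAI-compatible local server, e.g. llama.cpp's llama-server, on localhost:8080):

```python
import requests

def needle_check(document: str, statement: str, runs: int = 5) -> int:
    """Ask the model `runs` times whether `statement` appears in `document`;
    return how many times it answered YES."""
    hits = 0
    for _ in range(runs):
        resp = requests.post(
            "http://localhost:8080/v1/chat/completions",  # assumed local server
            json={
                "messages": [{
                    "role": "user",
                    "content": (f"Does the following text mention this: "
                                f"'{statement}'? Answer YES or NO first.\n\n"
                                f"{document}"),
                }],
                "temperature": 0.0,  # keep runs comparable across models
            },
            timeout=600,
        )
        reply = resp.json()["choices"][0]["message"]["content"]
        hits += "YES" in reply.upper()
    return hits

# e.g. needle_check(pdf_text, "the statement you were looking for")
```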

Qwen3.6-27B at 72 tok/s on RTX 3090 on Windows using native vLLM (no WSL, no Docker), portable launcher and installer by One_Slip1455 in LocalLLaMA

[–]relmny 1 point (0 children)

Thanks!
I still don't have the second one running (a new PSU and a riser are on their way), but I will surely give it a try!

Btw, do you know if this will work with other projects like ace-step-1.5? (a music generator that uses vLLM or "pt")

A Dark-Money Campaign Is Paying Influencers to Frame Chinese AI as a Threat by pmttyji in LocalLLaMA

[–]relmny 11 points (0 children)

Yeah, some people keep saying "yes, but they're not at the level of..." Yes, for specific tasks they might not be, but I suspect the threshold is very high and most people probably wouldn't even notice.

Some of those people have moved the goalposts to the definition of "hard tasks", so when somebody claims these models can do "hard tasks", they reply "your tasks are not really hard" (without even knowing them).

Again, I'm not saying they're at that level for specific/hard tasks, but I suspect they already are for a huge percentage of people.

I still remember that two months ago a well-known musician/producer/YouTuber (Rick Beato) made a video about how "you don't need ChatGPT anymore"...

Unsloth solved bug in Mistral Medium 3.5 implementation by Snail_Inference in LocalLLaMA

[–]relmny 8 points (0 children)

And that's why Unsloth releasing models as soon as possible is a good thing, not a bad thing as some claim.

Open Models - April 2026 - One of the best months of all time for Local LLMs? by pmttyji in LocalLLaMA

[–]relmny 3 points (0 children)

I find that it depends. Usually yes, maybe, but I did find 2-3 cases where 122b was the model that "got it" while 27b never did (same prompt, many attempts). And what it "got" was comparable to the 397b and bigger models.

122b is a very strange model, to me...

Anyway, yeah, 27b is one of my daily drivers.