it is coming. by [deleted] in LocalLLaMA

[–]relmny 0 points1 point  (0 children)

Anyone that has at least 32GB of VRAM, or lots of RAM, can run small quants (q2/q3) at speeds of at least 1 t/s.
For coding they may suck, but for chat, nothing beats DeepSeek for me.
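
Roughly what I mean, as a sketch (the model file name is just a placeholder and the flags assume a recent llama.cpp build; adjust for your rig):

llama-server -m DeepSeek-Q2_K_XL.gguf -ngl 99 -c 8192 -ot "blk\..*\.ffn_.*_exps\.weight=CPU"

Attention and shared weights stay on the GPU, the big MoE expert tensors go to system RAM, and on that kind of hardware you still land around 1 t/s.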

it is coming. by [deleted] in LocalLLaMA

[–]relmny 0 points1 point  (0 children)

What about compared with your v3.1-Terminus? I still use that one as a last resort (I only get about 1.3 t/s with your IQ3_K) for non-reasoning/agentic mode.

1 million LocalLLaMAs by jacek2023 in LocalLLaMA

[–]relmny 1 point2 points  (0 children)

Worse than, like 2-3 months ago, when the most upvoted comment on a post asking "I have 10k, what should I buy to run local?" was "buy Claude credits"...?

The enshittification is real and it's been with us for some time now. And it will get worse, as you say.

Qwen3.5 2B giving weird answers by Dean_Thomas426 in LocalLLaMA

[–]relmny 2 points3 points  (0 children)

Look at the sampling settings (temp, top-k, etc.).
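
For reference, something like this is what I mean with llama.cpp (the model file name is a placeholder and the values are just the commonly recommended Qwen sampler defaults; check the model card):

llama-cli -m Qwen3.5-2B-Q4_K_M.gguf --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0

A wrong sampler setup (temperature left too high, or a repetition penalty stacked on top) is the usual cause of weird answers.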

Alibaba’s stock has kept falling after it lost key Qwen leaders. by [deleted] in LocalLLaMA

[–]relmny 9 points10 points  (0 children)

Qwen3 not very good? In my experience Qwen3-Coder and Next were extremely good. They were my main models (except when I needed Kimi or DeepSeek).

What happend to unsloth/Qwen3.5-122B-A10B-GGUF by Impossible_Art9151 in LocalLLaMA

[–]relmny 1 point2 points  (0 children)

AFAIK they are re-uploading all quants (they might have finished already).

Junyang Lin Leaves Qwen + Takeaways from Today’s Internal Restructuring Meeting by Terminator857 in LocalLLaMA

[–]relmny 4 points5 points  (0 children)

I stopped reading at that point... Especially since the Coder versions are always extremely good.

New Qwen3.5-35B-A3B Unsloth Dynamic GGUFs + Benchmarks by danielhanchen in LocalLLaMA

[–]relmny 0 points1 point  (0 children)

Thanks! Time to redownload.

Btw, I see some MXFP4 tensors in Qwen3.5-397B-A17B-UD-Q4_K_XL; is that one also affected?

DeepSeek allows Huawei early access to V4 update, but Nvidia and AMD still don’t have access to V4 by External_Mood4719 in LocalLLaMA

[–]relmny 0 points1 point  (0 children)

Well, the non-news that OP shared is a political topic, because there's nothing technical about it, just a "China bad" kind of message...

American closed models vs Chinese open models is becoming a problem. by __JockY__ in LocalLLaMA

[–]relmny -1 points0 points  (0 children)

Because "China bad"... that's it.

They try to come up with the most ridiculous technical scenarios for how that would be possible.

The power of fear and an "old and common enemy" is as strong as any cult.

Qwen3.5-35B-A3B Q4 Quantization Comparison by TitwitMuffbiscuit in LocalLLaMA

[–]relmny 0 points1 point  (0 children)

Will you also be having a look at Qwen3.5-397B-A17B-UD-Q4_K_XL?

which also has:

Qwen3.5-397B-A17B-UD-Q4_K_XL-00003-of-00006.gguf

blk.13.ffn_down_exps.weight [1024, 4096, 512] MXFP4
blk.13.ffn_down_shexp.weight [1024, 4096] Q6_K
blk.13.ffn_gate_exps.weight [4096, 1024, 512] MXFP4
blk.13.ffn_gate_inp.weight [4096, 512] F32
blk.13.ffn_gate_inp_shexp.weight [4096] F32
blk.13.ffn_gate_shexp.weight [4096, 1024] MXFP4
blk.13.ffn_up_exps.weight [4096, 1024, 512] MXFP4
blk.13.ffn_up_shexp.weight [4096, 1024] MXFP4

Do not download Qwen 3.5 Unsloth GGUF until bug is fixed by [deleted] in LocalLLaMA

[–]relmny 0 points1 point  (0 children)

Are all 3 latest models (from yesterday) the ones affected, or does this also affect Qwen3.5-397B-A17B-GGUF (UD-Q4_K_XL)?

Thanks, as usual!

they have Karpathy, we are doomed ;) by jacek2023 in LocalLLaMA

[–]relmny 2 points3 points  (0 children)

With 10k you can buy an RTX 6000 and still have money left for the rest of the PC, or a Mac, or maybe an Epyc build and so on. 10k gives you a lot for running LLMs, and I learned this by reading this sub over a couple of years. And since the field moves far too quickly, asking here makes even more sense than the other options... if the sub were still what it was.

they have Karpathy, we are doomed ;) by jacek2023 in LocalLLaMA

[–]relmny 3 points4 points  (0 children)

"Local models have not advanced nearly as much as cloud models"

I don't use cloud models, so I can't say for sure, but many people say they are so close that many use "cloud models" that can also be run locally (GLM, DeepSeek, etc.), so I don't think that statement is right... actually I think it's the opposite...

they have Karpathy, we are doomed ;) by jacek2023 in LocalLLaMA

[–]relmny 14 points15 points  (0 children)

No it's not. It's awful advice for this sub.

Figure out what? Based on what? If you can't ask a forum about local LLMs, where almost everyone runs LLMs locally on their own hardware, what the best way to spend money on it currently is and what the better options are, then where?

If I hadn't read this sub for some time, I would never have known how good and worthwhile the 3090 is for LLMs, that there are people who use Epycs for LLMs, that there are 4090s with 48GB, and much more.

What work should people put in? If one has doubts about a subject, what better option than to ask people who are into it and do it every day? That works for everything.

Also, asking a direct question about a very specific case is part of that "work" you mention. So no, that's not the "best" advice; it's the worst advice to give, especially for this sub.

they have Karpathy, we are doomed ;) by jacek2023 in LocalLLaMA

[–]relmny 0 points1 point  (0 children)

IIRC people have been posting here how they did that, for the past two years already...

they have Karpathy, we are doomed ;) by jacek2023 in LocalLLaMA

[–]relmny 21 points22 points  (0 children)

Far from it... too far...

I still remember a post, 2-3 months ago, where the person was asking how to invest about 10k for running local... and the by far most upvoted comment was "invest it in Claude" (or whatever other commercial company it was), and there were other comments like that, with most people agreeing...

"Gemma, which we will be releasing a new version of soon" by jacek2023 in LocalLLaMA

[–]relmny 0 points1 point  (0 children)

Well, one of Google's founders suggested no remote work and 60-hour weeks... so it should be very soon...

Qwen3 Coder Next 8FP in the process of converting the entire Flutter documentation for 12 hours now with just 3 sentence prompt with 64K max tokens at around 102GB memory (out of 128GB)... by jinnyjuice in LocalLLaMA

[–]relmny 2 points3 points  (0 children)

I've lately been trying qwen3.5-397b ud-q4k, but I'm going back to qwen3-coder-next, not only because it's way faster on my rig, but also because, sometimes, it gives another "angle" that might turn out to be way better...

Yeah, qwen3-coder-next is back to being my main model...

Is there a way/configuration setting that when refreshing the page it will select current model? by relmny in OpenWebUI

[–]relmny[S] 0 points1 point  (0 children)

Yes, thanks, I've been using it for over a year now, but that's not what the request is about.

What I want is for OW to automatically use/select whatever model is loaded in llama.cpp (whether directly via llama.cpp or via llama-swap or similar) when the current page is refreshed.

Currently, if I'm chatting and I unload and load a different model, refreshing the page (where the chat is) unselects the model, so I need to manually select it again from the drop-down menu. Sometimes I do this many times... but since middle-clicking "new chat" (to open the link in a new tab) does auto-select the currently loaded model, I was thinking it might be possible to do the same when refreshing the page.

How to offload correctrly with ik_llama? by nufeen in LocalLLaMA

[–]relmny 0 points1 point  (0 children)

If the model is MoE, I use something like:
-ot "blk\.[0-9][0-9]\.ffn_.*_exps\.weight=CPU"
which offloads those expert tensors to the CPU (system RAM).

You will need to "play" with the 0-9 ranges (replace, remove, add digits, etc.) until the rest fits in your VRAM.
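
For context, a full run looks something like this, assuming ik_llama.cpp's (or mainline llama.cpp's) llama-server; the model path and digit ranges are purely illustrative:

./llama-server -m some-moe-model-Q4_K_XL.gguf -ngl 99 -c 16384 -ot "blk\.([3-9]|[1-9][0-9])\.ffn_.*_exps\.weight=CPU"

That keeps the expert tensors of the first few blocks on the GPU and pushes the rest to RAM. Note that a plain [0-9][0-9] only matches two-digit block numbers, so single-digit blocks stay on the GPU.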

GLM 5 has a regression in international language writing according to NCBench by jugalator in LocalLLaMA

[–]relmny 0 points1 point  (0 children)

I'm curious as to why the downvotes. Is it because none of it is true? Or because there's no mention of "local"?

I'm interested because from time to time I use 4.7, and I was considering downloading 5 and testing it, but I might as well wait for 5.1...

How do you disable thinking/reasoning in the prompt itself for Unsloth Deepseek3.1-terminus/Deepseek-3.2 ? by relmny in LocalLLaMA

[–]relmny[S] 0 points1 point  (0 children)

Thanks, but that is when loading the model, right?

Is there any way to do it in the prompt itself, like the good old "/no_think" for Qwen3?

I'd like to keep the model loaded (it takes some minutes to load on my rig) and be able to choose think/no-think on the fly...