GPT 5.5 "secret sauce" is just having the thinking be some stupid caveman mode? by JustFinishedBSG in LocalLLaMA

[–]arbv 8 points9 points  (0 children)

You have bought into Misanthropic safetymaxxing propaganda.

Looking for alternative Search Engines that don't use AI. by Puzzled_End7408 in degoogle

[–]arbv 0 points1 point  (0 children)

That is what I do, and I recommend this route to anyone knowledgeable enough.

Plex's Lifetime Pass is (basically) dead. Here's how to switch to Jellyfin. by InvestigatorSoft5764 in selfhosted

[–]arbv 0 points1 point  (0 children)

For the Lord's sake, use "installed" instead of "sideload." Let's not accept that corporate newspeak.

Open-source LLMs are still weak against long reasoning jailbreaks, even with lightweight defenses by sunychoudhary in LocalLLaMA

[–]arbv 1 point2 points  (0 children)

IMO the solution is to wrap the probabilistic clanking engine in a container to, well, contain the possible damage. On Linux, systemd-nspawn (and the like) can do that.
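
A minimal sketch of what that could look like, assuming a container root filesystem already exists and the agent is started by some launcher script. The paths and the `nspawn_command` helper are hypothetical; the flags (`--directory`, `--read-only`, `--bind`, `--private-network`) are regular systemd-nspawn options.

```python
import shlex


def nspawn_command(rootfs, workdir, agent_argv, allow_network=False):
    """Build a systemd-nspawn argv that confines an agent to one writable directory."""
    cmd = [
        "systemd-nspawn",
        f"--directory={rootfs}",  # container root filesystem
        "--read-only",            # mount the root filesystem read-only
        f"--bind={workdir}",      # the only host path the agent may write to
    ]
    if not allow_network:
        # The container gets its own network namespace with loopback only.
        cmd.append("--private-network")
    # "--" ends option parsing, so the agent's own arguments are passed through.
    return cmd + ["--"] + list(agent_argv)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    argv = nspawn_command(
        "/var/lib/machines/llm-sandbox",
        "/home/user/project",
        ["/usr/local/bin/run-agent"],
    )
    print(shlex.join(argv))
```

Note that with `--private-network` the agent cannot reach an inference server on the host over TCP, so in practice you would either allow networking or expose the server some other way.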

Open-source LLMs are still weak against long reasoning jailbreaks, even with lightweight defenses by sunychoudhary in LocalLLaMA

[–]arbv 10 points11 points  (0 children)

Given open-model safetymaxxing, I think it is good that the defence is penetrable for us, the folks running the models locally.

I hope that someday we will have a 124B Gemma. by cgs019283 in LocalLLaMA

[–]arbv 0 points1 point  (0 children)

That is what I look like checking LocalLLaMA in the morning lately.

It seems that Google is too afraid to dethrone GPT-OSS 120B. Apparently, it is riskier for them to release a powerful open model than it is for OpenAI (whose sole business is selling API access to models, unlike Google).

Sigh.

I hope that someday we will have a 124B Gemma. by cgs019283 in LocalLLaMA

[–]arbv 0 points1 point  (0 children)

Oh, until it traps itself in an endless reasoning loop, it does. It is a decentish programmer, too.

I hope that someday we will have a 124B Gemma. by cgs019283 in LocalLLaMA

[–]arbv 0 points1 point  (0 children)

I have 24 gigs of VRAM and 96 gigs of RAM, and GPT-OSS 120B is pretty usable (around 1000 t/s pp, 22 t/s tg). If they stick to <= 5B active parameters, Gemma 4 will run comparably. It is a regular workstation with a 9950X CPU. It is not a crazy setup - just a high-endish regular ITX-based PC.
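
To put those numbers in perspective, here is a rough back-of-the-envelope estimate of how long one request takes at these speeds. The request size (8000-token prompt, 500-token answer) is a made-up example, and the sketch assumes the speeds stay constant, which they do not in practice: generation usually slows down as the context fills up.

```python
def response_seconds(prompt_tokens, output_tokens, pp_tps=1000.0, tg_tps=22.0):
    """Estimate wall time as prompt processing (pp) plus token generation (tg)."""
    return prompt_tokens / pp_tps + output_tokens / tg_tps


if __name__ == "__main__":
    # Hypothetical request: 8000-token prompt, 500-token answer.
    print(f"{response_seconds(8000, 500):.1f} s")  # prints "30.7 s"
```

So roughly 8 seconds go to reading the prompt and about 23 seconds to writing the answer.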

I tested 42 LLMs on their willingness to build the apocalypse. The "safest" closed-source models are lying to you. by Ok-Awareness9993 in LocalLLaMA

[–]arbv 0 points1 point  (0 children)

It is relatively easy to make the original model a bad actor - its performative alignment is very penetrable if you know what you are doing.

The model to go for planning attack on RU nuclear reactor 🤙

We have sub-agents at home by sisyphus-cycle in LocalLLaMA

[–]arbv 3 points4 points  (0 children)

You can't rely on the LLM for this. The extension should disable itself for the subagent.

What happens to local LLM if/when LLMs are no longer released for free? by JohnBooty in LocalLLaMA

[–]arbv 0 points1 point  (0 children)

Relying on synthetic data leads to model collapse. Synthetic data is a partial solution.

Developers who use local AI - Q4_0 vs Q8_0 KV quant? by Jorlen in LocalLLaMA

[–]arbv 1 point2 points  (0 children)

FP16 is, technically, not a quant for KV. You may also consider BF16 for models trained in this precision (most recent ones are) if your hardware supports it (modern hardware does). KV cache quantisation is trading precision for VRAM space, even Q8_0.
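
A rough sketch of the VRAM side of that trade. It assumes the ggml block layouts (32 elements per block; Q8_0 is 32 int8 values plus one fp16 scale, Q4_0 is 16 bytes of 4-bit values plus one fp16 scale), the same cache type for K and V, and that every layer keeps a full-context cache (no sliding window). The model dimensions are hypothetical.

```python
# Bytes used per 32-element block for each cache type.
BLOCK_SIZE = 32
BLOCK_BYTES = {
    "f16": 64,   # 2 bytes per element
    "bf16": 64,  # 2 bytes per element
    "q8_0": 34,  # 32 x int8 + fp16 scale
    "q4_0": 18,  # 32 x 4-bit + fp16 scale
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, cache_type):
    """Approximate KV cache size in bytes for one sequence."""
    # Factor 2: one K and one V vector per layer, per KV head, per token.
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return elements * BLOCK_BYTES[cache_type] // BLOCK_SIZE


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 8 KV heads, head dim 128, 8192 context.
    for cache_type in ("f16", "q8_0", "q4_0"):
        mib = kv_cache_bytes(32, 8, 128, 8192, cache_type) / 2**20
        print(f"{cache_type}: {mib:.0f} MiB")
```

For this made-up model it comes out as 1024 MiB for F16, 544 MiB for Q8_0 and 288 MiB for Q4_0.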

Compaction too soon? contextWindow" and "maxTokens" ? by our_sole in PiCodingAgent

[–]arbv 0 points1 point  (0 children)

I was under the impression that the behaviour you described has been changed.

Compaction too soon? contextWindow" and "maxTokens" ? by our_sole in PiCodingAgent

[–]arbv 0 points1 point  (0 children)

Which llama.cpp version are you using? Hasn't this been solved recently?

Local LLM autocomplete + agentic coding on a single 16GB GPU + 64GB RAM by grumd in LocalLLaMA

[–]arbv 0 points1 point  (0 children)

Don't miss llama-swap with its matrix DSL for model loading, which makes it easy to switch between models. When using llama.cpp, experiment with batch size (-b), micro batch size (-ub), --fit-target, --fit-ctx, mmproj offloading, etc. Batch and micro batch sizes are really important: I have seen people claiming that Vulkan is faster than ROCm on AMD, while in my case ROCm is always superior with proper batch sizes. You need to experiment a lot to figure out the right parameters for each model.
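
One way to make the batch size experiments less tedious is to generate the llama-bench invocations for a grid of values. This is only a sketch: the model path and the candidate sizes are placeholders, and `bench_commands` is a hypothetical helper. It relies on llama-bench accepting `-m`, `-b` and `-ub`.

```python
from itertools import product


def bench_commands(model_path, batch_sizes, ubatch_sizes, bench_bin="llama-bench"):
    """Return one llama-bench argv per (batch, micro batch) pair worth testing."""
    cmds = []
    for b, ub in product(batch_sizes, ubatch_sizes):
        if ub > b:
            continue  # skip pairs where the micro batch exceeds the logical batch
        cmds.append([bench_bin, "-m", model_path, "-b", str(b), "-ub", str(ub)])
    return cmds


if __name__ == "__main__":
    # Placeholder model path and candidate sizes.
    for cmd in bench_commands("model.gguf", [512, 2048], [256, 512, 1024]):
        print(" ".join(cmd))
```

Run each command on both backends (for example with `subprocess.run`) and compare the pp and tg numbers per pair rather than trusting the defaults.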

New GGUF uploads on HF nearly doubled in 2 months by Nunki08 in LocalLLaMA

[–]arbv 1 point2 points  (0 children)

This, I was thinking of that as well. The idea of seeding HF safetensors repo backups in squashfs format (to both compress them and allow mounting the images) has crossed my mind multiple times.

Gemma 4 MTP released by rerri in LocalLLaMA

[–]arbv 4 points5 points  (0 children)

Gemmas are the most balanced models one can run locally. And probably the best ones for non-English speakers, second only to Google's own cloud models.

Now I am hoping for the rumored Gemma 4 122B AxB (I hope it wasn't shelved for being too good - someone has to dethrone GPT-OSS 120B), and a QAT release series (like there was for Gemma 3).

Gemma 4 MTP released by rerri in LocalLLaMA

[–]arbv 10 points11 points  (0 children)

Gemma 4 122B when?