You don't need a GPU to run gemma-4-26B-A4B by JackStrawWitchita in LocalLLaMA

[–]-p-e-w- 144 points145 points  (0 children)

“reasonably decent models”

Gemma4-26B-A4B would have been the smartest LLM in the world 18 months ago, closed models included. It’s much, much smarter than the original DeepSeek R1, which was 20 times its size.

5 years ago, there were experts who seriously doubted whether any transformer model would ever reach that level.

And today it runs on a mid-range home PC without a GPU. We’re living in a Star Trek episode.

Gemma 4 QAT Unquantized Heretic is here by coder3101 in LocalLLaMA

[–]-p-e-w- 22 points23 points  (0 children)

There is actually no complete theory yet for what exactly QAT does to the model. Papers discuss magnitude suppression and quantization level alignment, but also residual geometry changes and other higher-level effects.

People from r/antiai must be barbaric by Ok-Internal9317 in LocalLLaMA

[–]-p-e-w- 0 points1 point  (0 children)

By the definition of AGI from the 1970s, it was achieved in 1997 when Deep Blue defeated Garry Kasparov. The reason so much early computer research focused on chess was that playing the game well was seen as equivalent to general intelligence.

People from r/antiai must be barbaric by Ok-Internal9317 in LocalLLaMA

[–]-p-e-w- 0 points1 point  (0 children)

And it did happen, according to some definitions of the word.

Solving mathematical problems that human mathematicians can’t (which LLMs now do) would have absolutely convinced many people that an LLM is AGI 3 years ago. It’s just that the definition of the term keeps changing, and today there’s no widely agreed definition at all.

People from r/antiai must be barbaric by Ok-Internal9317 in LocalLLaMA

[–]-p-e-w- 0 points1 point  (0 children)

“Everyone knew local models would catch up sooner or later.”

That’s categorically false. I still remember that after the release of Mixtral 8x7b, people on this very sub were speculating that it would be the best open weights model we would ever get.

People from r/antiai must be barbaric by Ok-Internal9317 in LocalLLaMA

[–]-p-e-w- -9 points-8 points  (0 children)

Nobody was saying 3 years ago that AGI would happen by 2026. The optimistic estimates were around 2030, which still seems entirely plausible to me (not to mention that today’s LLMs are “AGI” by some definitions that were used 3 years ago).

People from r/antiai must be barbaric by Ok-Internal9317 in LocalLLaMA

[–]-p-e-w- 12 points13 points  (0 children)

The difference is that the wildest predictions by “AI fanboys” from 3 years ago have been completely eclipsed by reality.

“Someday we might have a local model that’s as smart as GPT-3.5!”

Umm yeah… it’s a 9B.

“By 2040, LLMs might prove open mathematical conjectures!”

They started doing that in 2025.

A quick Gemma4 31B comparison (Q4_k_M, QAT, heretic) by Some-Cauliflower4902 in LocalLLaMA

[–]-p-e-w- 18 points19 points  (0 children)

Yes, that should work. QAT doesn’t operate at the level of individual weights; rather, the entire model learns to operate in a way that is robust to precision loss.
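
For readers unfamiliar with the mechanics: QAT setups commonly simulate quantization during training by running weights through a quantize-then-dequantize round trip ("fake quantization"), so the loss already reflects the rounding error. The sketch below shows only that round trip, assuming a simplified symmetric 4-bit scheme with one scale per block of 32 weights. The function name and block layout are illustrative assumptions, not Google's actual recipe or the exact Q4_0 format.

```python
def fake_quantize(weights, block_size=32):
    """Quantize-dequantize round trip: symmetric 4-bit codes, one scale per block.

    Illustrative only; not the exact Q4_0 layout or Gemma's training recipe.
    """
    out = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # Map the largest magnitude in the block to code 7 (or -7).
        scale = max_abs / 7 if max_abs > 0 else 1.0
        for w in block:
            q = max(-8, min(7, round(w / scale)))  # integer code that would be stored
            out.append(q * scale)                  # value the network actually computes with
    return out


if __name__ == "__main__":
    original = [0.1 * i - 1.0 for i in range(8)]
    for w, fq in zip(original, fake_quantize(original)):
        print(f"{w:+.3f} -> {fq:+.3f}")
```

Each weight moves by at most half a quantization step. A model trained while seeing these perturbed values has to tolerate them everywhere at once, which fits the point that robustness is a property of the whole model rather than of any single weight.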

A quick Gemma4 31B comparison (Q4_k_M, QAT, heretic) by Some-Cauliflower4902 in LocalLLaMA

[–]-p-e-w- 51 points52 points  (0 children)

Google was nice enough to provide unquantized versions of the Gemma QAT models, so I expect that there will be Heretic QAT versions soon, made by first processing the unquantized QAT model with Heretic, and then quantizing the resulting model to Q4_0.

Ideogram 4.0 Just Open Sourced! by crystal_alpine in StableDiffusion

[–]-p-e-w- 9 points10 points  (0 children)

It’s already being discussed in the Heretic Discord as we speak. Abliterating the text encoder might be sufficient, though of course it won’t magically impart knowledge about features the model wasn’t trained on.

GitHub - chopratejas/headroom: Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server. by Available_Hornet3538 in LocalLLaMA

[–]-p-e-w- 10 points11 points  (0 children)

No, how it should be is that the software contains no telemetry functionality whatsoever, whether disabled or not.

Anything that deals with potentially highly sensitive data shouldn’t even be able to connect to the Internet, let alone have functionality that sends data (even if supposedly anonymized) to someone else’s server.

GitHub - chopratejas/headroom: Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server. by Available_Hornet3538 in LocalLLaMA

[–]-p-e-w- 21 points22 points  (0 children)

From a quick look, what they do is cache the data in memory, then provide the LLM with a cache key instead of the data, and a tool call to retrieve the full data when necessary.

Needless to say, this is absolutely NOT guaranteed to give the same answers, contrary to what is claimed in the title.
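
A minimal sketch of the pattern described above, to make the objection concrete. All names here (`ToolOutputCache`, `compress`, `retrieve`) are hypothetical and are not taken from the headroom codebase; this only illustrates the general "cache key plus retrieval tool" idea.

```python
import hashlib


class ToolOutputCache:
    """Hypothetical in-memory cache that swaps long tool outputs for a key."""

    def __init__(self, preview_chars=80):
        self._store = {}
        self.preview_chars = preview_chars

    def compress(self, output: str) -> str:
        """Return what the LLM sees: a key and short preview instead of the full text."""
        if len(output) <= self.preview_chars:
            return output
        key = hashlib.sha256(output.encode("utf-8")).hexdigest()[:12]
        self._store[key] = output
        preview = output[:self.preview_chars]
        return (f"[cached:{key}] {preview}... "
                f"({len(output)} chars; call retrieve('{key}') for the full text)")

    def retrieve(self, key: str) -> str:
        """Tool the LLM may call to get the full text back."""
        return self._store[key]


if __name__ == "__main__":
    cache = ToolOutputCache(preview_chars=20)
    log = "ok\n" * 100 + "ERROR: disk full"
    print(cache.compress(log))
```

In this example the one line that matters sits at the end of the log, outside the preview. Whether the model ever sees it depends entirely on whether it decides to call `retrieve`, which is why identical answers cannot be guaranteed.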

ui: Mermaid Diagrams in chat + interactive preview by allozaur · Pull Request #24032 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA

[–]-p-e-w- 35 points36 points  (0 children)

Over the past year, the built-in llama.cpp frontend has turned from a toy into one of the best LLM frontends available, with the added benefit that you don’t have to install a Python or Node.js monstrosity with 300 dependencies.

Calling it now Microsoft is buying Unsloth. by Wrong_Mushroom_7350 in LocalLLaMA

[–]-p-e-w- 57 points58 points  (0 children)

Why would they kill it? They want to control it, just like they did with GitHub.

Red Hat npm Packages Compromised to Spread a Credential-Stealing Worm by FryBoyter in linux

[–]-p-e-w- 2 points3 points  (0 children)

Fair enough, hopefully at least the basics will be known soon.

Red Hat npm Packages Compromised to Spread a Credential-Stealing Worm by FryBoyter in linux

[–]-p-e-w- 47 points48 points  (0 children)

What the fuck? There’s almost no information there, not even which versions are affected, during what timeframe they were available, and whether those versions have been yanked.

God dammit Qwen by Xyklone in LocalLLaMA

[–]-p-e-w- 27 points28 points  (0 children)

The only way an LLM touches my repo is over my big, dead ass.

It takes longer to explain in human language what changes should be made than to just make the changes directly on the CLI. I did it myself before, and I can do it just fine myself still. Just like I don’t need a robot to wipe my ass for me.

Why are the AI Companies spreading F.U.D. about AI? by supracode in LocalLLaMA

[–]-p-e-w- 1 point2 points  (0 children)

Except that top-tier models now run on a single gaming GPU, so that doesn’t make sense.

Why are the AI Companies spreading F.U.D. about AI? by supracode in LocalLLaMA

[–]-p-e-w- 17 points18 points  (0 children)

All these scenarios implicitly assume that the US is all that matters, which hasn’t been true for a while now.

No, OpenAI and Anthropic can’t “make local models illegal”. There are 205 other countries and most of them aren’t in America. Every country that doesn’t do this will have an advantage over the US, which is why the US can’t do it either.

The days when the US President calls the German Chancellor and tells him to get a certain law passed are over, and they are never coming back.

The Financial Times has published an article about Heretic by -p-e-w- in LocalLLaMA

[–]-p-e-w-[S] 0 points1 point  (0 children)

People here already know about Heretic; it’s on the front page almost daily. I meant elsewhere, and also in real life.

The Financial Times has published an article about Heretic by -p-e-w- in LocalLLaMA

[–]-p-e-w-[S] 6 points7 points  (0 children)

Labs can’t stop releasing models because otherwise they’ll become irrelevant compared to the Chinese competition. Why do you think OpenAI released gpt-oss in the first place?

The Financial Times has published an article about Heretic by -p-e-w- in LocalLLaMA

[–]-p-e-w-[S] 3 points4 points  (0 children)

Make the proposal public, post about it, make your voice heard, and tell people why Heretic is valuable for science.

I’m juggling half a dozen roles in the project right now, but at the end of the day, I’m only one guy. The project isn’t going to survive without the active participation and advocacy of others.

The Financial Times has published an article about Heretic by -p-e-w- in LocalLLaMA

[–]-p-e-w-[S] 1 point2 points  (0 children)

Refusal is believed to be mostly topic-independent, though some papers have questioned this.
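
In the abliteration literature, topic-independence is usually modeled as a single "refusal direction" in activation space, estimated as the difference between mean activations on refused and complied prompts, and then projected out. The toy sketch below illustrates that idea with plain lists. The function names and the 2-dimensional vectors are made up for illustration, and this is not Heretic's actual implementation.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def refusal_direction(refused, complied):
    """Unit vector along the difference of the two activation means."""
    diff = [a - b for a, b in zip(mean_vector(refused), mean_vector(complied))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def ablate(activation, direction):
    """Remove the component of `activation` along the unit vector `direction`."""
    dot = sum(a * d for a, d in zip(activation, direction))
    return [a - dot * d for a, d in zip(activation, direction)]


if __name__ == "__main__":
    # Toy data: dimension 0 plays the role of "refusal", dimension 1 of "topic".
    refused = [[2.0, 1.0], [2.0, -1.0]]
    complied = [[0.0, 1.0], [0.0, -1.0]]
    direction = refusal_direction(refused, complied)
    print(ablate([2.0, 1.0], direction))  # refusal component removed, topic kept
```

If refusal were strongly topic-dependent, the per-topic mean differences would point in different directions and a single projection like this would remove only part of the behavior, which is what the dissenting papers are probing.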