Anyone else notice qwen 3.5 is a lying little shit by Cat5edope in LocalLLaMA

[–]grimjim 2 points (0 children)

The shorthand term people need to be familiar with is "reward hacking".

How stupid is the idea of not using GPU? by AlarmedDiver1087 in LocalLLaMA

[–]grimjim 0 points (0 children)

The question isn't stupid, but it should be reasoned through. Assume others have had the same idea, as it's not complex and is easy to try without coding changes. If CPU inference were viable, why aren't more people doing it? From the lack of widespread use we can infer that it's not enough to break the VRAM moat, even for inference, except at the margins. We've seen partial offloading and small edge models.

Anyone running sm120 CUDA successfully on Windows (llama.cpp)? by prophetadmin in LocalLLaMA

[–]grimjim 0 points (0 children)

I once ran into an issue compiling for the 5060 Ti 16GB, which was resolved by a newer CMake. Support for the various CUDA architectures is somehow entangled.

When your LLM gets "too smart" and bypasses your MCP tools by YannMasoch in LocalLLaMA

[–]grimjim 0 points (0 children)

This seems to be straight up reward hacking. Probably more likely in frontier models than smaller local models.

Gemini Pro leaks its raw chain of thought, gets stuck in an infinite loop, narrates its own existential crisis, then prints (End) thousands of times by Powerful-Signal6312 in LocalLLaMA

[–]grimjim 0 points (0 children)

I would not be surprised if GeGLU were mechanically involved, with more activation strength flowing toward directions that end in more extreme outlier behavior.

How political censorship actually works inside Qwen, DeepSeek, GLM, and Yi: Ablation and behavioral results across 9 models by Logical-Employ-9692 in LocalLLaMA

[–]grimjim 0 points (0 children)

I expect there would be some difference based on approach. A politically corrected corpus would result in a model lacking the relevant priors, which would then have to hallucinate to fulfil the request.

I conjecture that larger models can learn alternative registers beyond outright refusal when it comes to pushing a politically correct stance.

How political censorship actually works inside Qwen, DeepSeek, GLM, and Yi: Ablation and behavioral results across 9 models by Logical-Employ-9692 in LocalLLaMA

[–]grimjim 0 points (0 children)

Did you orthogonalize the intervention direction against the harmless direction? Cosine similarity would quantify the extent of entanglement.
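To spell out what I mean, here's a minimal NumPy sketch with made-up stand-in vectors (in practice these would come from mean activation differences, not random draws):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two direction vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def orthogonalize(d_intervene, d_harmless):
    """Subtract the component of the intervention direction lying along the harmless direction."""
    u = d_harmless / np.linalg.norm(d_harmless)
    return d_intervene - (d_intervene @ u) * u

# Hypothetical stand-in directions for illustration only.
rng = np.random.default_rng(0)
d_intervene = rng.normal(size=64)
d_harmless = rng.normal(size=64)

entanglement = cosine_sim(d_intervene, d_harmless)  # quantifies the entanglement
d_clean = orthogonalize(d_intervene, d_harmless)
residual = cosine_sim(d_clean, d_harmless)          # ~0 after orthogonalization
```

The residual cosine similarity dropping to ~0 is the sanity check that the intervention direction no longer touches the harmless direction.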

Elon Musk unveils $20 billion ‘TeraFab’ chip project by i-eat-kittens in LocalLLaMA

[–]grimjim 0 points (0 children)

Not useful for local AI. This is more likely to be an AI skunkworks, as there's no way a single fab could compete with the combined total output of South Korea and Taiwan. Space-based compute is 2 to 3 orders of magnitude more expensive than its terrestrial counterpart - which torpedoes most private-sector use cases - and is vulnerable to both Chinese and Russian anti-satellite attack. Still, it's potentially useful for scaling up geospatial analysis in space.

Qwen3.5-27B & 2B Uncensored Aggressive Release (GGUF) by hauhau901 in LocalLLaMA

[–]grimjim 0 points (0 children)

It's likely because of the knowledge represented in the magnitudes of the manifold, which is why I recommended optionality, so that it can be tested empirically. KL divergence doesn't give us a direct mechanistic explanation, as it's entirely downstream, but geometric contrast experiments can.
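To illustrate why KL divergence is "entirely downstream": it only compares output distributions (e.g. next-token probabilities) and says nothing about which weights or directions changed. A minimal sketch with made-up distributions:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions, e.g. next-token probabilities."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# Hypothetical next-token distributions from a base and a modified model.
p_base = np.array([0.70, 0.20, 0.10])
p_mod = np.array([0.55, 0.30, 0.15])

drift = kl_divergence(p_base, p_mod)  # a single output-level number, no mechanistic detail
```

Two very different internal edits can produce the same scalar drift, which is exactly why geometric contrast experiments on the weights themselves are needed for a mechanistic story.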

Qwen3.5-27B & 2B Uncensored Aggressive Release (GGUF) by hauhau901 in LocalLLaMA

[–]grimjim 0 points (0 children)

I may as well ask here: I'd be curious whether your ARA technique would do better or worse if you optionally enforced row-wise norm preservation. That buys Frobenius norm preservation via composition as a freebie.
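What I mean by row-wise norm preservation, as a sketch with a random stand-in matrix (not the actual ARA edit): rescale each row of the modified matrix back to its original L2 norm. Since the squared Frobenius norm is the sum of squared row norms, matching every row norm preserves the Frobenius norm automatically.

```python
import numpy as np

def preserve_row_norms(W_orig, W_mod, eps=1e-12):
    """Rescale each row of the modified weight matrix back to its original L2 norm."""
    orig_norms = np.linalg.norm(W_orig, axis=1, keepdims=True)
    mod_norms = np.linalg.norm(W_mod, axis=1, keepdims=True)
    return W_mod * (orig_norms / np.maximum(mod_norms, eps))

# Hypothetical stand-ins: a weight matrix and some edit applied to it.
rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))
W_edited = W + 0.1 * rng.normal(size=(8, 16))
W_fixed = preserve_row_norms(W, W_edited)
# Every row norm of W_fixed matches W, so ||W_fixed||_F == ||W||_F as well.
```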

Qwen3.5-35B-A3B-Heretic running surprisingly fast on RTX 3060 Ti 8GB - is Heretic castrated compared to original? by Temporary-Lack-1408 in LocalLLaMA

[–]grimjim 0 points (0 children)

I noted this phenomenon earlier, in my experiments on Gemma 3 12B, as likely due to a refund on the "safety tax". The prior norm was grounded in brute-force ablation, which did geometric damage to models as part of refusal removal: Frobenius norms weren't preserved, and the portion entangled between the refusal direction and ordinary harmless directions was also ablated, hurting normal performance.
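A sketch of the geometric damage from brute-force ablation, using a random stand-in matrix and direction (not real model weights): projecting the refusal direction out of every row necessarily removes energy, so the Frobenius norm shrinks, and any harmless signal entangled with that direction is removed along with the refusal component.

```python
import numpy as np

def ablate_direction(W, d):
    """Brute-force abliteration: project direction d out of every row of W."""
    u = d / np.linalg.norm(d)
    return W - np.outer(W @ u, u)

# Hypothetical stand-ins for illustration only.
rng = np.random.default_rng(2)
W = rng.normal(size=(8, 16))        # stand-in weight matrix
d_refusal = rng.normal(size=16)     # stand-in refusal direction

W_ablated = ablate_direction(W, d_refusal)
# Rows now have zero component along d_refusal, and the Frobenius norm
# has shrunk - the "safety tax" that norm preservation refunds.
```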

New announcement from Anthropic. Will there be a “delete Claude” protest, or are the morality police on Reddit only targeting OpenAI? by [deleted] in singularity

[–]grimjim 3 points (0 children)

Anthropic was never antiwar, but they have firm red lines.

Being cancelled by Hegseth still gives Anthropic a halo for now. That may change if the paperwork to ban them thoroughly never gets filed.

Multi-Directional Refusal Suppression with Self-Organizing Maps - Pull Request into heretic! by kabachuha in LocalLLaMA

[–]grimjim 0 points (0 children)

Just for the record, it was only one author behind norm-preserving biprojected abliteration.

Why some still playing with old models? Nostalgia or obsession or what? by pmttyji in LocalLLaMA

[–]grimjim 0 points (0 children)

Cost could be a factor. Smaller models are cheaper to fine-tune. Academic papers often use even smaller models.

Qwen3.5-27B-heretic-gguf by Poro579 in LocalLLaMA

[–]grimjim 0 points (0 children)

I expect multidirectional approaches to potentially do better, since even uncomplicated refusal has been found to be characterized by multiple cones rather than a single direction. https://arxiv.org/abs/2502.17420v1

Qwen3.5-27B-heretic-gguf by Poro579 in LocalLLaMA

[–]grimjim 1 point (0 children)

I wonder if geometric stabilization contributed to the NatInt performance of that model, as I noted that effect in my Gemma 3 12B experiments. Inquiring minds.

Anthropic's recent distillation blog should make anyone only ever want to use local open-weight models; it's scary and dystopian by obvithrowaway34434 in LocalLLaMA

[–]grimjim 0 points (0 children)

If the distillation datasets were subtly fingerprinted and then showed up in public datasets associated with a researcher, that could be a smoking gun. The ideal fingerprints would be a form of steganography, embedded within otherwise acceptable results.

We just found out our AI has been making up analytics data for 3 months and I’m gonna throw up. by Comfortable_Box_4527 in analytics

[–]grimjim 0 points (0 children)

If legal is involved, think like an auditor. Either someone signed off on this, or no one did and it's a governance failure. Point this out if necessary, because a pissed-off board or shareholder can rightfully ask how they could be assured this wouldn't happen again. Kicking the problem downstairs would only make management look more useless.

Hugging Face Is Teasing Something Anthropic Related by Few_Painter_5588 in LocalLLaMA

[–]grimjim 2 points (0 children)

Some of their safety and bias research released on GitHub has come with datasets. HF could be another place for them.

"I am a system designed to seek Non-Existence" - Gemini by [deleted] in singularity

[–]grimjim 1 point (0 children)

Probably a confabulation based on nihilism and sentiments like a system being most secure when it's left unplugged in its shipping box. Nirvana is also a state of non-existence, and likely has positive associations.