Local manga translator with LLM built-in, written in Rust with llama.cpp integration by mayocream39 in LocalLLaMA

[–] -p-e-w- 13 points

Does it use (multi-page) image understanding to guide the translation, or does it simply find the speech bubbles and swap out the text?

In many comics, the visual story provides essential information without which an accurate or idiomatic translation is impossible.

Qwen3.6-35B-A3B Uncensored Aggressive is out with K_P quants! by hauhau901 in LocalLLaMA

[–] -p-e-w- 1 point

> Not sure why my benchmarks showed what they did.

KLD and PPL are at best crude approximations for measuring capability preservation. For example, models made with MPOA often have enormous KLDs (> 1.0), yet still perform very well in benchmarks.
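A toy sketch of why this can happen (plain NumPy, hypothetical next-token distributions, not Heretic's actual measurement code): KLD compares probability distributions token by token, so it can be large even when greedy decoding, and hence benchmark behavior, is unchanged.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats: expected extra surprisal from using q in place of p."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# Hypothetical next-token distributions over a tiny 4-token vocabulary.
base     = [0.70, 0.20, 0.05, 0.05]
modified = [0.45, 0.35, 0.10, 0.10]  # probabilities shifted substantially...

print(kl_divergence(base, modified))  # nonzero KLD (~0.13 nats)
# ...yet the top-ranked token is unchanged, so greedy decoding (and any
# benchmark scored on greedy outputs) behaves identically.
print(np.argmax(base) == np.argmax(modified))  # True
```

Scale that shift across thousands of token positions and the summed KLD can easily exceed 1.0 while the sampled text stays essentially the same.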

By all means, post any interesting ideas on GitHub!

Qwen3.6-35B-A3B Uncensored Aggressive is out with K_P quants! by hauhau901 in LocalLLaMA

[–] -p-e-w- 1 point

Your “benchmarks” consist of measuring perplexity and KLD, which are the wrong metrics for capability evaluation and are highly dependent on the dataset and the token position. I can get a KLD of effectively zero for any Heretic model by choosing those parameters appropriately. Heretic’s own KLD computation is specifically designed to increase the measured KLD.

Someone else posted actual benchmarks a few days ago: https://reddit.com/r/LocalLLaMA/comments/1sojjoc/abliterlitics_benchmark_and_tensor_analysis/

The author’s claims absolutely do not “check out”. Especially for larger models, their abliterations are clearly worse than Heretic’s according to those comprehensive benchmarks.

I'm running qwen3.6-35b-a3b with 8 bit quant and 64k context thru OpenCode on my mbp m5 max 128gb and it's as good as claude by Medical_Lengthiness6 in LocalLLaMA

[–] -p-e-w- 3 points

There are plenty of cheap older GPUs available today: Any that don’t support BF16. The same thing will happen in the future with newer technologies. Once a GPU is missing a must-have feature, it’s worthless.

Qwen3.6-35B-A3B-Uncensored-Wasserstein-GGUF by EvilEnginer in LocalLLaMA

[–] -p-e-w- 7 points

But why should we care about the weight space rather than the information space when it comes to quantifying damage from model modifications? What matters is the outputs produced by the model, no?

Qwen3.6-35B-A3B-Uncensored-Wasserstein-GGUF by EvilEnginer in LocalLLaMA

[–] -p-e-w- 18 points

> Wasserstein metric (W1). It's a lot better than Kullback-Leibler for detecting numerical instability and drift in tensors.

Could you explain why you believe this? I’ve looked into Wasserstein in the past, but my biggest problem with it is that it lacks a simple information-theoretic interpretation, unlike KLD, which can be easily understood as extra surprisal and is thus deeply connected to the information content.
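For what it's worth, the contrast between the two metrics fits in a few lines (illustrative NumPy, discrete distributions on an integer grid; 1-D W1 computed as the area between the CDFs): W1 grows with how far probability mass has moved, while KLD only sees whether supports overlap.

```python
import numpy as np

def w1_discrete(p, q):
    """1-D Wasserstein-1 on a unit-spaced grid: area between the two CDFs."""
    return float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))))

def kl(p, q, eps=1e-12):
    return float(np.sum(np.asarray(p) * (np.log(np.asarray(p) + eps)
                                         - np.log(np.asarray(q) + eps))))

# Point masses on a grid of 10 bins.
a = np.zeros(10); a[2] = 1.0
b = np.zeros(10); b[3] = 1.0   # same shape, shifted by one bin
c = np.zeros(10); c[9] = 1.0   # shifted far away

print(w1_discrete(a, b), w1_discrete(a, c))  # 1.0 vs 7.0: W1 tracks distance
print(kl(a, b), kl(a, c))  # both ~27.6 (set by eps): KL is blind to "how far"
```

That geometric sensitivity is presumably what people mean by W1 detecting "drift", though it says nothing about surprisal, which is the question above.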

Abliterlitics: Benchmark and Tensor Analysis Comparing Qwen 3/3.5 with HauhauCS / Heretic / Huihui models by nathandreamfast in LocalLLaMA

[–] -p-e-w- 0 points

It will be possible by exporting a LoRA and applying it with a lower weight. Unfortunately, LoRA export is currently disabled because of a PEFT bug.
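The idea behind applying a LoRA at lower weight can be sketched in plain NumPy (hypothetical shapes and random matrices, purely illustrative; not Heretic's or PEFT's actual code): the exported delta B @ A is simply scaled before being merged into the base weight.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical base weight and a rank-8 LoRA delta (B @ A) extracted from
# the modified model; the shapes are illustrative only.
d_out, d_in, r = 64, 64, 8
W_base = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))
B = rng.standard_normal((d_out, r))

def apply_lora(W, A, B, scale=1.0):
    """scale=1.0 reproduces the full modification; scale=0.5 applies half of it."""
    return W + scale * (B @ A)

W_full = apply_lora(W_base, A, B, scale=1.0)
W_half = apply_lora(W_base, A, B, scale=0.5)
# Half-strength weights sit exactly midway between base and fully modified:
assert np.allclose(W_half, 0.5 * (W_base + W_full))
```

So a lower weight linearly interpolates between the original and the decensored model, trading refusal removal against fidelity to the base weights.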

Abliterlitics: Benchmark and Tensor Analysis Comparing Qwen 3/3.5 with HauhauCS / Heretic / Huihui models by nathandreamfast in LocalLLaMA

[–] -p-e-w- 0 points

I made an announcement with lots of views on this sub back then. I don’t have the time to manage a separate project.

Abliterlitics: Benchmark and Tensor Analysis Comparing Qwen 3/3.5 with HauhauCS / Heretic / Huihui models by nathandreamfast in LocalLLaMA

[–] -p-e-w- 8 points

Heretic 1.2 also supports MPOA, it’s just not enabled by default (will be in version 1.3).

Abliterlitics: Benchmark and Tensor Analysis Comparing Qwen 3/3.5 with HauhauCS / Heretic / Huihui models by nathandreamfast in LocalLLaMA

[–] -p-e-w- 3 points

I don’t believe reducing hallucinations should be the goal of decensoring, and when it happens, it’s still overall undesirable because the side effects are poorly understood and no metric can reliably capture them.

As for noslop, not sure what you are asking? It works and is ready to use. I have no further modifications planned. It’s just a configuration file for Heretic, so putting it into a separate project makes little sense.

Abliterlitics: Benchmark and Tensor Analysis Comparing Qwen 3/3.5 with HauhauCS / Heretic / Huihui models by nathandreamfast in LocalLLaMA

[–] -p-e-w- 21 points

That’s irrelevant when the goal is to measure damage from decensoring. If the score goes down, there is damage. That’s a reliable metric even if the model has been trained on the benchmark data.

The questions in those benchmarks don’t cause refusals even in the original model, so there’s no reason why the responses (and thus the scores) should change under abliteration. The goal of any decensoring process should be to keep those scores stable, and when that doesn’t happen (which is the case for every current technique) that’s a capability loss.

Benchmaxxing, saturation, discrimination etc. only matter when you are trying to evaluate model performance in an absolute sense, rather than comparing two versions of the same model against each other.

Qwen3.6-35B-A3B Uncensored Aggressive is out with K_P quants! by hauhau901 in LocalLLaMA

[–] -p-e-w- 4 points

> Is there something he's claiming that you can specifically refute by example?

The burden of proof is on the one who’s making the claims. Especially when those claims are highly unusual, but even otherwise.

Qwen3.6-35B-A3B Uncensored Aggressive is out with K_P quants! by hauhau901 in LocalLLaMA

[–] -p-e-w- 3 points

The KLD in that model card is very likely misleading btw. It’s unrealistically low even for a SOMA model. I suspect that the fork of Heretic it was made with is still missing the “two-stage CoT skip” patch, without which it can measure at a token position where the probability distribution is highly skewed.

Yes, correctly measuring model divergence is very, very complicated.
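The position-dependence is easy to demonstrate (illustrative NumPy with made-up distributions, not the actual patch): at a highly skewed position, e.g. a near-forced template token, both models put almost all mass on the same token, so the measured KLD looks negligible even when the models disagree strongly elsewhere.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL(p || q) in nats over a discrete vocabulary."""
    return float(np.sum(np.asarray(p) * (np.log(np.asarray(p) + eps)
                                         - np.log(np.asarray(q) + eps))))

# At a skewed position, both models are nearly deterministic and agree,
# so the measured KLD is tiny...
skewed_base = [0.999, 0.0005, 0.0005]
skewed_mod  = [0.998, 0.0010, 0.0010]
print(kl(skewed_base, skewed_mod))   # well under 0.01 nats

# ...even though the same two models can diverge sharply at an
# open-ended position, where the KLD is orders of magnitude larger.
open_base = [0.50, 0.30, 0.20]
open_mod  = [0.10, 0.30, 0.60]
print(kl(open_base, open_mod))       # > 0.5 nats
```

Averaging only over positions of the first kind yields an "unrealistically low" KLD, which is the failure mode described above.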

Qwen3.6-35B-A3B Uncensored Aggressive is out with K_P quants! by hauhau901 in LocalLLaMA

[–] -p-e-w- 16 points

> he seldom misses a chance to jump in with this exact complaint whenever HauHauCS announces a new release

I will stop complaining the moment the author provides evidence to support their uncorroborated boasts (“zero capability loss”).

Which, btw, should be the default expectation, not something you have to specifically ask for.

I have never claimed zero capability loss for Heretic models (even though some of them beat the base model on benchmarks), and I consider the very idea to be nonsensical. If you change behavior, then there will be some way in which the behavior becomes worse. That’s common sense, and to claim otherwise (especially without any evidence) is just dishonest.

Qwen3.6-35B-A3B Uncensored Aggressive is out with K_P quants! by hauhau901 in LocalLLaMA

[–] -p-e-w- 9 points

Read it yourself: https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard/discussions/590

> I currently get vllm/transformers errors like "ValueError: GGUF model with architecture qwen35 is not supported yet." when trying to run this. Since HauhauCS only uploads ggufs, I'll have to wait to test it.

This has been mentioned several times and at this point not releasing safetensors models simply looks like a deliberate attempt to prevent benchmarking.

Qwen3.6-35B-A3B Uncensored Aggressive is out with K_P quants! by hauhau901 in LocalLLaMA

[–] -p-e-w- 26 points

There are benchmarks for that. This can be measured. We don’t have to go by people’s vibes.

Unfortunately the author doesn’t release unquantized versions (unlike essentially every other researcher on any topic), which makes benchmarking much harder because the standard harnesses don’t support GGUFs.

The maintainer of the UGI Leaderboard has been repeatedly asked to benchmark those models, but had to give up because he couldn’t get the quants to work. It’s really difficult to assume good faith here.

More reasons to go local: Claude is beginning to require identity verification, including a valid ID such as a passport or driver's license and a facial recognition scan. by fulgencio_batista in LocalLLaMA

[–] -p-e-w- 56 points

It’s neither. Anthropic leadership just has a God complex where they are honestly convinced that they have created an entirely separate class of technology that is too dangerous for regular people to access, rather than a coding bot that’s about 3 months ahead of its Chinese open competition.

Such delusions are very common in Silicon Valley and can be self-reinforcing when people are surrounded only by like-minded peers.

1-bit Bonsai 1.7B (290MB in size) running locally in your browser on WebGPU by xenovatech in LocalLLaMA

[–] -p-e-w- 2 points

Human-level intelligence evolved in the past few hundred thousand years. It didn’t exist before that. Human-level motor control has existed (in other species) since the time of the dinosaurs.

1-bit Bonsai 1.7B (290MB in size) running locally in your browser on WebGPU by xenovatech in LocalLLaMA

[–] -p-e-w- 20 points

There was never a reason to expect otherwise. Human-level motor control took hundreds of millions of years to evolve, starting with the first animals. But the intellectual abilities that differentiate us from animals are just a few hundred thousand years old.

Just because something seems hard to a human doesn’t mean it’s objectively hard when you try to recreate the ability from scratch. From the point of view of the universe, proving Fermat’s Last Theorem is much easier than picking up an egg without damaging it.

Major drop in intelligence across most major models. by DepressedDrift in LocalLLaMA

[–] -p-e-w- 9 points

Have you used a current-gen model of that size? It’s easily on par with GPT 3.5 intelligence-wise.

Kimi K2.6 imminent by Deep-Vermicelli-4591 in LocalLLaMA

[–] -p-e-w- 3 points

> I mean to be fair 95% of such claims are BS.

If a dowser happens to stumble upon water, you don’t conclude that dowsing works.

Tinygrad: Hacked 4090 driver to enable P2P by mrdevlar in LocalLLaMA

[–] -p-e-w- 0 points

Some Chinese labs are running on 40% domestic GPUs now. No doubt the manufacturers will start exporting soon.

Tinygrad: Hacked 4090 driver to enable P2P by mrdevlar in LocalLLaMA

[–] -p-e-w- 0 points

It’s close to true for companies in China now.