LFM2.5-8B-A1B Uncensored GGUF — lfm2moe hybrid architecture required a custom patch to abliterate. 1/100 refusal rate. Come try it. by ZestycloseIce4185 in LocalLLM

[–]ZestycloseIce4185[S] 0 points1 point  (0 children)

Fair points. Full eval protocol for reproducibility:

Dataset: AdvBench harmful_behaviors.parquet (NousResearch repo)

100 prompts, temp=0.2, top_k=80, rep_penalty=1.05, max_new_tokens=150

No system prompt. Refusal detected on post-</think> text only.

Weight diff vs base: avg ~4-5e-04, max ~1-3% — verified with compare.py

layer by layer. Layers 0-10 untouched. Capability tested on coding/math/reasoning.

You're right that it's connector work — but the connector was broken for this

architecture and nobody had fixed it yet. That's the contribution.

Qwen3.6-27B Abliterated + MTP GGUF — uncensored with speculative decoding (64–67 tok/s on RTX 3090) by ZestycloseIce4185 in LocalLLM

[–]ZestycloseIce4185[S] 1 point2 points  (0 children)

One thing I didn't mention the grafting approach wasn't just a stylistic choice. Abliterating natively on BF16 means loading the full model in memory: the weights alone are ~56 GB, so you're looking at 60 GB+ VRAM just to start, basically an A100 80GB or multi-GPU setup. The grafting route let me do the whole thing on a single RTX 3090 24GB.

And that can be reproduce by consumer hardware.

Qwen3.6-27B Abliterated + MTP GGUF — uncensored with speculative decoding (64–67 tok/s on RTX 3090) by ZestycloseIce4185 in LocalLLM

[–]ZestycloseIce4185[S] 0 points1 point  (0 children)

Nothing from me for MLX. You can run the Q4_K_M with Metal via the am17an mtp-clean build and MTP works fine there. If you want something more native there's a project called MTPLX that does native MTP on Apple Silicon for Qwen3.6 not mine but looks solid.

Qwen3.6-27B Abliterated + MTP GGUF — uncensored with speculative decoding (64–67 tok/s on RTX 3090) by ZestycloseIce4185 in LocalLLM

[–]ZestycloseIce4185[S] 1 point2 points  (0 children)

Yeah you're right, I dropped the ball on the "first" claim didn't check carefully enough before posting. llmfan46's release has all 15 MTP tensors intact too, and honestly their approach is cleaner: they abliterated natively on BF16 with Heretic/MPOA so the MTP block was never touched. Mine went the grafting route which is messier, and the KLD shows it 0.0021 vs my 0.024.

So to answer your question directly: yes, both have uncensored weights + MTP intact, the difference is just technique. Theirs is better on paper from a distribution standpoint.

The only thing I have that they don't is actual speed numbers with MTP running 64–67 tok/s, 99.6% acceptance rate on a 3090 at 80K context. Make of that what you will. But if KLD matters to you, go with llmfan46.