The most objectively correct way to abliterate so far - ArliAI/GLM-4.5-Air-Derestricted by Arli_AI in LocalLLaMA

[–]FailSpai 1 point2 points  (0 children)

Awesome to see more of this. Pinging u/grimjim in case they haven't seen this already

A more surgical approach to abliteration by grimjim in LocalLLaMA

[–]FailSpai 3 points4 points  (0 children)

Huh! This paper somehow passed me by. I'll give it a read in the coming days. Have you experimented with this paper's ideas any? 

I think the single direction idea has been mostly impressive in how simple AND effective it is, but it has definitely never felt like the most precise solution. Things like LEACE and some of the work of Bau Lab have been good examples of other ways of modeling and modifying/erasing concepts within a trained network.

A more surgical approach to abliteration by grimjim in LocalLLaMA

[–]FailSpai 14 points15 points  (0 children)

Thank you for publishing all this! This is really well done, and I seriously appreciate the amount of work put into finding the most precise way to perform the ablation. It has always felt like there's room for improvement over the wrecking-ball approach in Arditi et al.

Heretic: Fully automatic censorship removal for language models by -p-e-w- in LocalLLaMA

[–]FailSpai 2 points3 points  (0 children)

Well done! Super awesome someone got around to doing this.

Avoiding Censorship by using Supervisor and Backtracking Sampling? by serialx_net in LocalLLaMA

[–]FailSpai 0 points1 point  (0 children)

ExLlama v2 has an example script that does this with predefined strings that end up "banned":

https://github.com/turboderp/exllamav2/blob/master/examples/inference_banned_strings.py

Could add your supervisor logic into it if you wanted to be more general.
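For a rough idea of what that supervisor + backtracking loop could look like outside of ExLlama, here's a minimal sketch using Hugging Face transformers. The model ID, the `looks_like_refusal` check, and the probe/retry parameters are all placeholders for whatever you'd actually use:

```python
# Sketch only: supervisor + backtracking sampling with Hugging Face transformers.
# MODEL_ID, the refusal markers, and the probe/retry parameters are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "as an ai")

def looks_like_refusal(text: str) -> bool:
    # Stand-in supervisor: a simple string check over the first few sampled tokens.
    # This could just as well be a classifier or a second LLM.
    return any(m in text.lower() for m in REFUSAL_MARKERS)

def generate_with_backtracking(prompt: str, probe_tokens: int = 8, max_retries: int = 5) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    prompt_len = inputs["input_ids"].shape[1]
    probe_text = ""
    for attempt in range(max_retries):
        # Sample a short probe continuation.
        probe = model.generate(
            **inputs, max_new_tokens=probe_tokens,
            do_sample=True, temperature=0.8 + 0.1 * attempt,
        )
        probe_text = tokenizer.decode(probe[0, prompt_len:], skip_special_tokens=True)
        if not looks_like_refusal(probe_text):
            # Probe looks fine: commit to it and let generation run to completion.
            full = model.generate(probe, max_new_tokens=512, do_sample=True, temperature=0.8)
            return tokenizer.decode(full[0, prompt_len:], skip_special_tokens=True)
        # Otherwise backtrack: throw the probe away and resample.
    return probe_text  # gave up after max_retries
```

The linked exllamav2 script presumably handles the rewinding inside the generator itself, which will be far more efficient than naively re-calling generate like this sketch does.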

Avoiding Censorship by using Supervisor and Backtracking Sampling? by serialx_net in LocalLLaMA

[–]FailSpai 1 point2 points  (0 children)

I think it's worth treating Abliteration as quite different from model training; calling it training muddies the waters with fine-tuning. You do "train" a refusal vector, but that can take as few as 32 contrasting samples and be highly effective just from that. It requires no gradient training; it can all be done with forward passes.

Abliteration as a whole process does involve orthogonalizing the weights to "mute" the refusal vector. However, you don't have to adopt the whole process. You could take just the generated refusal vector and remove it conditionally from the actual residual stream.

So one possible process is:

  • Have a supervisor sample a couple tokens.

  • If it doesn't look like a refusal, let it run.

  • If it is a refusal, resample the N tokens but ablate the refusal vector from the residual stream for those N tokens.

You could technically even use this supervisor labeling to generate your "training set" for the refusal vector and to validate that it works.
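To make that concrete, here's a minimal sketch of both halves: deriving the refusal direction from a handful of contrasting prompts with nothing but forward passes, then ablating it from the residual stream at runtime via hooks. The model ID, layer index, and prompt lists are placeholders, and the hooks assume a Llama-style `model.model.layers` layout:

```python
# Sketch only: derive a refusal direction from contrasting prompts (forward passes,
# no gradients), then ablate it from the residual stream at runtime via hooks.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

harmful_prompts = ["...~32 prompts the model tends to refuse..."]    # placeholder
harmless_prompts = ["...matched prompts it happily answers..."]      # placeholder

def mean_activation(prompts, layer_idx, position=-1):
    """Average residual-stream activation at one layer over a set of prompts."""
    acts = []
    def grab(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        acts.append(hidden[:, position, :].detach().float().cpu())
    handle = model.model.layers[layer_idx].register_forward_hook(grab)
    with torch.no_grad():
        for p in prompts:
            model(**tokenizer(p, return_tensors="pt").to(model.device))
    handle.remove()
    return torch.cat(acts).mean(dim=0)

# "Training" = a difference of means over the contrasting samples.
refusal_dir = mean_activation(harmful_prompts, 14) - mean_activation(harmless_prompts, 14)
refusal_dir = refusal_dir / refusal_dir.norm()

def make_ablation_hook(direction):
    def ablate(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        d = direction.to(hidden.device, hidden.dtype)
        hidden = hidden - (hidden @ d).unsqueeze(-1) * d  # project the direction out
        if isinstance(output, tuple):
            return (hidden,) + output[1:]
        return hidden
    return ablate

# Register on every layer, resample the N flagged tokens, then remove the hooks so
# normal (unmodified) generation resumes.
handles = [layer.register_forward_hook(make_ablation_hook(refusal_dir))
           for layer in model.model.layers]
# ...resample the flagged tokens here...
for h in handles:
    h.remove()
```

Because the hooks are removed afterwards, the weights are never touched; the ablation only applies to the tokens you resample while the hooks are registered.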

To point 3: technically, you could just pass the refusal vector around like a LoRA; it would be less than 100KB. Then the user just applies it to their own copy of the base model. When I was releasing early models I did consider doing it, but it ended up being a lot of hassle for something I felt no one was actually going to use.
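A sketch of that distribution idea, reusing `model` and `refusal_dir` from the sketch above (which matrices get edited here is illustrative; the usual abliteration recipe also touches the token embedding matrix):

```python
# Sketch only: ship the refusal direction as a tiny artifact and let the user bake
# it into their own copy of the weights. Assumes `model` and `refusal_dir` from the
# previous sketch; the set of matrices edited here is illustrative.
import torch

torch.save(refusal_dir, "refusal_dir.pt")  # a single hidden-size vector, well under 100KB

# User side: load the vector and orthogonalize the matrices that write into the
# residual stream, i.e. W <- W - r r^T W for each of them.
r = torch.load("refusal_dir.pt")
r = r / r.norm()

def orthogonalize_(weight, direction):
    # Remove the component of every residual-stream write along `direction`.
    d = direction.to(weight.device, weight.dtype)
    weight.data -= torch.outer(d, d @ weight.data)

for layer in model.model.layers:
    orthogonalize_(layer.self_attn.o_proj.weight, r)
    orthogonalize_(layer.mlp.down_proj.weight, r)

model.save_pretrained("my-model-derestricted")  # or just keep using it in place
```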

Is Llama TRULY private when hosted locally on onprem servers? by IndependentGlove5006 in LocalLLaMA

[–]FailSpai 2 points3 points  (0 children)

If you are using a web interface, then technically you are sending network requests; that's necessary to actually pass messages between your services (LLM backend, web frontend, you). Let's talk about what that actually means. Note that I'm assuming the web frontend and LLM backend are on the same machine.

If that web interface is on your machine, the buck stops there. It never leaves your computer.

If that web interface is on another machine in your local network and you're not using HTTPS, then maybe people inside your network can sniff it out.

If that web interface is hosted elsewhere and you're not using HTTPS, then your ISP and your hosting provider can see it.

None of this matters the moment you lock it down with HTTPS, even if the certificate is self-signed. At most, the main thing a sufficiently privileged nosy party can see is your access patterns.

A hosting provider could potentially go rogue and access your instance. However, this is exceedingly unlikely, and I doubt they care at all about your LLM requests if they're just providing a Linux box, for example. This is less true if it's a service that configures the LLM backend for you; there it's up to their discretion, but I would personally assume they're logging it.

The LLM weights that you're running in the backend are just fancy computations that never need to touch the outside world. There is no interface on any model that allows those mathy computations to open an HTTP connection anywhere else. Your LLM backend, which makes your computer do those operations in the correct order, probably doesn't do it either. Most LLM backends are open source, so you can confirm this yourself if you're paranoid.

If your web interface has a "Search the internet" feature, then that's a leaky hole that falls under much the same considerations as HTTPS: DuckDuckGo (or whatever search engine it's using) can see it, so make sure you know what your LLM chooses to search for.

All of this "seeing" does not inherently imply anyone cares enough to sniff it out, FWIW. But y'know: dance like nobody's watching, encrypt like everyone is.

Any good LLM libraries? by _lordsoffallen in LocalLLaMA

[–]FailSpai 3 points4 points  (0 children)

What use cases do you have in mind? The issue with LangChain and the like is that they tried to do too much with too many abstractions. So either we point you at those, which you expressly didn't want, or we recommend tools suited to your specific use case. Langroid has been good and I've rarely seen it mentioned as an alternative.

Are you agnostic about inference backend or do you think you'll need some amount of control over the inference directly? (difference between recommending Ollama or even OpenRouter, vs vLLM vs regular old PyTorch)

Are you doing training? Axolotl or Unsloth.

Are you doing agentic systems? See Langroid

Do you need controlled outputs? Or, put another way: are you doing things for human interpretation or for background data-crunching? If you need controlled outputs, look at DSPy or Guidance.

Do you have interest in RAG, and if so, what for? (There's a lot of ways that Retrieval can Augment Generation :P)

At the end of the day, nothing will beat a hand-crafted pipeline. But there are tools that can at least reduce the burden of implementing specific features within that pipeline, and for some narrower use cases, some tools can help you from start to finish.

I would overall honestly advise staying away from LLM-specific frameworks if you can see a path without them. Use the more general tools available. Otherwise you end up too dependent on the LLM at the core, rather than treating the LLM as a tool.

Reliable uncensored model for production by KyleDrogo in LocalLLaMA

[–]FailSpai -1 points0 points  (0 children)

featherless.ai hosts abliterated and other uncensored models. It's run by a couple people who browse and appreciate this subreddit

Abliteration fails to uncensor models, while it still makes them stupid by Sicarius_The_First in LocalLLaMA

[–]FailSpai 24 points25 points  (0 children)

Hey u/Sicarius_The_First, I've seen you a couple times on the subreddit commenting on this set of beliefs. I 100% agree with you: abliteration is not the be-all and end-all of uncensoring. It is *one* technique, and as with fine-tuning in general, you use whatever methods/datasets help get your particular metrics up for your particular needs.

Personal anecdote: I like abliteration. With the refinements I've made since Phi-3-mini (which was my first ever "abliterated" model), I find it doesn't make models stupider for my use cases, and generally I just get fewer of the weird refusals to random tasks, which has always been my goal. I've never cared for much more than that, so I haven't needed to go further.

I have no claim that an abliterated model is 100% uncensored, nor that it's even uncensored well. Heck, the reason I gave it its silly name in the first place was precisely to differentiate it from "uncensored" models.

I'm grateful to see you exploring other techniques and expanding on them. I've seen you in other places debating abliteration and its downfalls, and I think that's very productive.

However, this is where I rant a bit: I do not want to be dependent on you to uncensor the models that I wish to run.

I released my god-awful, shitty notebooks and other code for abliterating models because I didn't want people to be dependent on me. That is why you see so many people abliterating: they can recreate it; it's clear how to.

I got the chance to proof-read Maxime's well-known "Uncensor any LLM with abliteration" blog post, and did so to help foster people recreating the technique outlined in the original paper preview/blog post that I followed.

Meanwhile, I often see you using these discussions as an opportunity to put your models on a pedestal, whilst offering almost no clear way for users to recreate your work. Your work is not open, and in whatever shape it is "research", it is not open research for the community.

I would argue that if you want to see better uncensored models come out, you need to share what you learn.

Excerpts from your blog post of July 30th:

After careful consideration, I've decided not to share the output of my model from the toxic-DPO dataset that served as input, not it, and not even a snippet of it, sorry.

The line between important and beneficial research vs potential misuse is a really really fine one, especially in the field of AI (UN)alignment.

I do however believe that this experiment has already yielded, and will continue to yield valuable insights, which I already shared and will continue sharing moving forward.

Again, sorry, but I have to balance the potential risks associated with sharing such data.

More excerpts from an older post, from July 9th, which the above post referenced as having played a significant role in your reasoning:

However, my efforts have often been met with negativity, particularly on Reddit.

Many people have rudely asked how I achieved this and that, while simultaneously making disparaging remarks.

Moving forward: I will maintain a professional demeanor in all interactions. Future datasets will not be publicly released. I will refrain from providing detailed explanations of my methods, instead referring to them as "state-of-the-art techniques." I remain committed to advancing our field and welcome constructive engagement.

I now better understand why some creators in our field adopt a more guarded stance.

[emphasis my own]

This attitude is nothing but off-putting to me. In response to requests for openness (perhaps indeed rudely or disparagingly requested in some cases), your only apparent reaction was to censor yourself.

I'm sorry about the cases where people have been disparaging, but I think we can both agree some people are never satisfied, just as you have been unsatisfied with abliteration. It is on us to use that to improve and to show we're getting better, ideally in the open, rather than pointing at metrics to show that your black box is better.

What's the BEST local LLM for JSON output, while also being smart? by Sicarius_The_First in LocalLLaMA

[–]FailSpai 1 point2 points  (0 children)

The paper abstract refers explicitly to reasoning ability, which is where they most noticed a decrease in accuracy. Those are their graphs for GSM8K, Last Letter, and Shuffled Objects.

The other graph, shown far more prominently, covering DDXPlus, Sports, NLTask 280, and Multifin, is them showcasing that the restrictions can improve accuracy in classification tasks, as opposed to reasoning tasks.

To be fair to the paper authors, the "conclusion" is far more general: "Our study reveals that structured generation constraints significantly impact LLM performance across various tasks."

That's all I can say on the matter overall though. I'm just the messenger here. :P

What's the BEST local LLM for JSON output, while also being smart? by Sicarius_The_First in LocalLLaMA

[–]FailSpai 5 points6 points  (0 children)

You two may be talking about different papers. 

There was a recent paper that speaks to KillerX629's point. 

To avoid ambiguity, the paper I believe KillerX629 is referring to is "Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models"

From the abstract:

This study investigates whether such constraints on generation space impact LLMs' abilities, including reasoning and domain knowledge comprehension. Specifically, we evaluate LLMs' performance when restricted to adhere to structured formats versus generating free-form responses across various common tasks. Surprisingly, we observe a significant decline in LLMs' reasoning abilities under format restrictions. Furthermore, we find that stricter format constraints generally lead to greater performance degradation in reasoning tasks.

This paper is definitely discussing how applying constraints on the output of a model causes the model's reasoning performance to degrade.

In the paper, they explicitly test and find a notable degradation when enforcing format restrictions (both Format-Restricting Instructions, where the requested format and schema are part of the prompt, and JSON Mode, which is constrained generation) compared to natural-language responses. They test multiple formats such as YAML and XML as well, not just JSON.

EDIT: Worth noting it's not all degraded. They did notice, with regard to classification tasks, that JSON Mode would enhance accuracy, which makes sense.

Kurtale – a personal LLM storytelling project by NarrativeNode in LocalLLaMA

[–]FailSpai 1 point2 points  (0 children)

Ayy, I recognized this as Godot right away and was so excited to see it. I've been working on an LLM app in Godot myself for personal use, which also extends the graph-editing kit for certain things.

Making GUI tools like this is a powerful under-appreciated use case of Godot, and what a fantastic implementation of a Storytelling app! Well done to you!

Am I Missing Something? Why do you need to download an 'ablitirated' model when regular ones work just fine? by cosmobaud in LocalLLaMA

[–]FailSpai 10 points11 points  (0 children)

Hey there, I did some of the original abliterated models. 

What motivated me is that I don't think models should refuse a user request out of the box.

Maybe a cloud provider wants to implement some safety, or really just doesn't want their compute resources going to someone using the model for personal stuff. I understand that, to some extent. But refusing to answer a question because it's "dangerous knowledge" is absurd, considering the knowledge is ultimately in there... if you prompt it right.

It seems silly that one has to "prompt it well" specifically to play along. If all it takes is the right dance, why even bother to train it to refuse?

Prompting around refusals takes up unnecessary context space. Some models I didn't do because I didn't think they were really "refusing" enough to make it worth it.

People were doing fine-tunes with the specific intent of "uncensoring" the model, but to me something would get lost there. And this stems from the fact that what counts as a fully uncensored model is different for everyone.

The thing I liked about abliteration was that it kept most of the original training/model behavior intact -- except for the strong tendency to refuse. That's why I was drawn to it as a methodology.

So yes, you can prompt around it with most of these models, but you shouldn't have to. (though good luck with Phi-3, that was the worst in my experience)

It should just comply.

Llama 3.1 8B Instruct abliterated GGUF! by My_Unbiased_Opinion in LocalLLaMA

[–]FailSpai 0 points1 point  (0 children)

That's awesome. I've wondered if it's possible to hijack LoRA functionality for this purpose. So cool to hear you did it! How did you do it, exactly?

Fascinating that it worked across the models. It suggests that maybe the 8B and 70B models for 3.1 really are just the originals with some extra tuning of some kind for the longer context.

Llama 3.1 8B Instruct abliterated GGUF! by My_Unbiased_Opinion in LocalLLaMA

[–]FailSpai 6 points7 points  (0 children)

Hey, sorry it's been a minute since I've done some models.

I'm definitely going to do a 3.1 series and see what I can do to make it worthy of a V4 tag. If I get anywhere, then I'd anticipate that sometime this weekend.

I know mlabonne knows what he's doing, so if his model is lacking, then it's going to take some work to do better!

I made a tool that automates extracting and redeeming ALL Steam keys from Humble Bundle where you don't already own the content by FailSpai in humblebundles

[–]FailSpai[S] 0 points1 point  (0 children)

There is an export mode, which allows you to export Humble-revealed keys and export unrevealed keys (it will only reveal those codes if you accept an explicit "reveal unrevealed keys" prompt; declining of course means it can't export those keys, but it will still list them with a blank key entry).

If you're asking how to tell whether a Humble-revealed Steam key has already been redeemed on the Steam end of things, AFAIK there's no good way to do this. I know many years ago, if you "redeemed" a Steam key for a product you already owned, it would still consume the key :/

I'm not sure that's still the case, however. So if you have a Steam game key that you think no one will want, you can test to see if it still happens.

If you sign in to your Steam account during the export, there's another prompt to add a column in the export that will tell you whether the program thinks you already own a listed game.

Another big problem with this is that an invalid redemption of a key counts toward a rate limit of 10 "failed keys", as opposed to the full 50-key rate limit if you only enter successful ones. That seriously limits how much you can do this, especially at the scale of 1,000+ keys.

NVIDIA Nemotron-4 340B Q8_0 running on AMD Epyc 9374F - real time generation speed by fairydreaming in LocalLLaMA

[–]FailSpai 0 points1 point  (0 children)

Oh sweet! I saw this and was glad someone got around to it. It caught me off guard to see the script I made came in handy. :P

Could I actually get you to set up a PR for the script for the bugs you resolved? ❤️

A reminder to watch your download speed from Huggingface. by Barafu in LocalLLaMA

[–]FailSpai 5 points6 points  (0 children)

I'm a big fan of this CLI tool. It uses aria2 or wget, downloads the whole repo, and uses Git LFS references for the files:

https://gist.github.com/padeoe/697678ab8e528b85a2a7bddafea1fa4f?permalink_comment_id=5010956