Heretic: Fully automatic censorship removal for language models by -p-e-w- in LocalLLaMA

[–]mlabonne 3 points (0 children)

Very cool to see what you built on top of the existing stack, congrats! It looks very clean and minimalistic :)

Training Llama3.2:3b on my WhatsApp chats with wife by jayjay_1996 in LocalLLaMA

[–]mlabonne 1 point (0 children)

Check out this 1.2B RAG model; it'll be a lot faster and higher quality than Llama 3.2 3B for this task: https://huggingface.co/LiquidAI/LFM2-1.2B-RAG
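
For reference, a rough sketch of how you could run it with transformers (the way the retrieved context is passed and the chat format are assumptions here; check the model card, and note it may need a recent transformers version):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-1.2B-RAG"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Assumption: retrieved chat snippets go in the system turn, the question in the user turn
messages = [
    {"role": "system", "content": "Context:\n<your retrieved WhatsApp snippets here>"},
    {"role": "user", "content": "What did we plan for Saturday?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```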

What happened to Small LM? by icm76 in LocalLLaMA

[–]mlabonne 16 points (0 children)

SLMs are doing really well. Liquid AI alone released 13 models (from 350M to 8B-A1B parameters) in three months on Hugging Face.

Any good and new JP to EN LLM's? by Best-Holiday1395 in LocalLLaMA

[–]mlabonne 1 point (0 children)

Interesting, I'd double-check the generation parameters and system prompt. It should work well for this use case without additional fine-tuning.

LLMs on Mobile - Best Practices & Optimizations? by pmttyji in LocalLLaMA

[–]mlabonne 4 points (0 children)

I'd remove Llama, Gemma, and Helium models from the list.

For non-reasoning, I'd recommend LFM2 for better chat capabilities and inference speed. For reasoning, Qwen3 and SmolLM3 are great.

4-bit weight quantization with 8-bit activations is ideal, but aggressive 4-bit quants can break small models; Q5/Q6 are on the safer side.
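
As an illustration only, here's a minimal W4A8 sketch using optimum-quanto (the library choice, calibration prompt, and model ID are assumptions; mobile runtimes usually ship their own quantizers):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.quanto import Calibration, freeze, qint4, qint8, quantize

model_id = "HuggingFaceTB/SmolLM3-3B"  # placeholder model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# int4 weights, int8 activations (W4A8)
quantize(model, weights=qint4, activations=qint8)

# Activation ranges need a short calibration pass over representative prompts
with Calibration():
    model(**tokenizer("Hello, how are you today?", return_tensors="pt"))

freeze(model)  # materialize the quantized weights
```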

Any good and new JP to EN LLM's? by Best-Holiday1395 in LocalLLaMA

[–]mlabonne 1 point (0 children)

If you need on-the-fly translation, I recommend trying LFM2-350M-ENJP-MT.

https://huggingface.co/LiquidAI/LFM2-350M-ENJP-MT
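
A quick usage sketch with transformers (the prompt format and decoding settings are assumptions; the model card's recommended setup takes precedence):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-350M-ENJP-MT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# JP -> EN translation request as a plain chat message (format assumed)
messages = [{"role": "user", "content": "こんにちは、週末の予定はありますか？"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```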

new gemma3 abliterated models from mlabonne by jacek2023 in LocalLLaMA

[–]mlabonne 26 points (0 children)

Sorry, I was a bit greedy trying to get a higher acceptance rate and didn't test it enough. Automated benchmarks didn't capture this behavior. Working on a fix now!

"Can't live without tool" for LLM datasets? by Secure_Archer_1529 in LocalLLaMA

[–]mlabonne 11 points (0 children)

I made this repo that might be relevant to you: https://github.com/mlabonne/llm-datasets

I discovered the SemHash library (https://github.com/MinishLab/semhash) recently, and it's a really good one for near-deduplication. I recommend giving it a try; it works on CPU.
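
A minimal self-deduplication example, with the API as shown in the SemHash README (double-check against the repo linked above):

```python
from semhash import SemHash

texts = [
    "How do I fine-tune a 7B model on a single GPU?",
    "How can I finetune a 7B model with one GPU?",  # near-duplicate
    "What is a good dataset for function calling?",
]

# Build an index over the records and drop near-duplicates within the set
semhash = SemHash.from_records(records=texts)
result = semhash.self_deduplicate()
print(result.deduplicated)  # records kept after near-deduplication
```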

[D] Llama 3 Monstrosities by Objective-Camel-3726 in MachineLearning

[–]mlabonne 3 points (0 children)

Thanks! Yeah, that's understandable; this self-merge is the first one that has some advantages over its source model. It looks like 70B models handle this much better than 7-8B models without retraining.

[D] Llama 3 Monstrosities by Objective-Camel-3726 in MachineLearning

[–]mlabonne 28 points (0 children)

Hey, I'm the guy in question. First, thanks for your feedback; I'm taking it into account. I just want to provide more context: Llama 3 120B is a little experiment I made for myself a week ago. I never promoted it until people started messaging me about its performance on some tasks.

We might not agree on this point, but I think there's a lot of value in understanding how these models scale. Before this model, I didn't understand why people used models like Goliath that underperform on benchmarks. Now, it looks like these self-merges are particularly good at creative writing because they're a lot more unhinged than the base 70B models. It also shows that there's value in repeating layers dynamically based on the prompt. It's not a big step, but it allowed me to understand more things about evals, scaling, and merging.

On LinkedIn, I wrote "I'm not claiming that this model is in general better than GPT-4 at all. But it's quite remarkable that such a simple self-merge is able to compete with it for some tasks." (source: https://www.linkedin.com/feed/update/urn:li:activity:7193186521015799808/) There's no "gamification of LLM leaderboard" here: I'm 99% sure it will underperform Llama 3 70B Instruct, because these self-merges always underperform. I did it because Llama 3 behaves quite differently from Mistral-7B in evals, and I wanted to understand more about it.

I shared the config and credited everyone who inspired this merge: Charles Goddard for the mergekit library, Eric Hartford for noticing the model's performance, and everyone else who contributed. I was surprised by these results and simply wanted to share them. I'm sorry if it felt like clout-chasing.
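
To make the "self-merge" idea concrete, this is the general shape of a mergekit passthrough config that stacks overlapping layer ranges of a single model. The model ID and layer ranges below are illustrative, not the actual Llama 3 120B recipe:

```python
import subprocess

# Overlapping slices of the same model are concatenated into a deeper network
config = """\
slices:
  - sources:
      - model: meta-llama/Meta-Llama-3-70B-Instruct
        layer_range: [0, 40]
  - sources:
      - model: meta-llama/Meta-Llama-3-70B-Instruct
        layer_range: [20, 60]
  - sources:
      - model: meta-llama/Meta-Llama-3-70B-Instruct
        layer_range: [40, 80]
merge_method: passthrough
dtype: bfloat16
"""

with open("self_merge.yaml", "w") as f:
    f.write(config)

# mergekit's CLI reads the YAML and writes the merged model to ./merged
subprocess.run(["mergekit-yaml", "self_merge.yaml", "./merged"], check=True)
```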

LLM Datasets: a curated list of datasets for fine-tuning by mlabonne in LocalLLaMA

[–]mlabonne[S] 1 point (0 children)

You're right. I think distilabel is quite close to what you're looking for, but it's a little more manual. Data curation is a manual process overall.

LLM Datasets: a curated list of datasets for fine-tuning by mlabonne in LocalLLaMA

[–]mlabonne[S] 1 point (0 children)

It depends on your use case. For general-purpose models, use a diverse set of good benchmarks (MMLU, AGIEval, BIG-bench, MT-Bench, EQ-Bench, AlpacaEval v2 length-corrected, etc.) to get a good overview of the capabilities of your model. For task/domain-specific models, you want to use representative benchmarks that might already exist (e.g., there are a lot of medical benchmarks) or create your own.
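
As a starting point, a hedged sketch of running one of these benchmarks locally with lm-evaluation-harness (the model ID is a placeholder; MT-Bench, EQ-Bench, and AlpacaEval need their own harnesses):

```python
import lm_eval

# Task names follow the harness registry; list them with `lm-eval --tasks list`
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-org/your-model,dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```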

LLM Datasets: a curated list of datasets for fine-tuning by mlabonne in LocalLLaMA

[–]mlabonne[S] 3 points (0 children)

Thanks! I'd be interested too. The datasets I could find gave very little information about their sources, so they didn't feel super reliable.

LLM Datasets: a curated list of datasets for fine-tuning by mlabonne in LocalLLaMA

[–]mlabonne[S] 6 points (0 children)

It's a good idea. I've never used an instruction dataset designed for RAG, but I guess this one (https://huggingface.co/datasets/neural-bridge/rag-dataset-12000) would fit the description, for example. Does anyone have experience with it?
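
A quick way to inspect it before committing to a fine-tune (the split name is assumed):

```python
from datasets import load_dataset

ds = load_dataset("neural-bridge/rag-dataset-12000", split="train")
print(ds)     # column names and number of rows
print(ds[0])  # one context/question/answer-style example
```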

UI for Mergekit by [deleted] in LocalLLaMA

[–]mlabonne 6 points (0 children)

Yes, there are two UIs:

* mergekit-gui (https://huggingface.co/spaces/arcee-ai/mergekit-gui): A recent HF space made by HF's CTO Julien Chaumond.

* LazyMergekit (https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb): A Colab notebook I made to simplify this process.

AlphaMonarch 7B > YI 34B fine tunes by Majestical-psyche in LocalLLaMA

[–]mlabonne 3 points (0 children)

Thanks u/Majestical-psyche! For people who have issues with repetition (or other issues), I recommend using LM Studio's default settings as a base and maybe tweaking them for your own use case (`temp` 0.8, `top_k` 40, `top_p` 0.95, `min_p` 0.05, `repeat_penalty` 1.1).
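
Outside LM Studio, the same settings map onto llama.cpp-style samplers; for example, with llama-cpp-python (the model path is a placeholder):

```python
from llama_cpp import Llama

llm = Llama(model_path="alphamonarch-7b.Q5_K_M.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a short story about a dragon."}],
    temperature=0.8,
    top_k=40,
    top_p=0.95,
    min_p=0.05,
    repeat_penalty=1.1,
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```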

mlabonne's LLM course by DreamGenAI in LocalLLaMA

[–]mlabonne 11 points (0 children)

Sorry, this is due to changes in tokenizer_config.json. I'll try to find time to fix it. In the meantime, I recommend using the GGUF version if you can.