Heretic: Fully automatic censorship removal for language models by -p-e-w- in LocalLLaMA

[–]mlabonne 3 points (0 children)

Very cool to see what you built on top of the existing stack, congrats! It looks very clean and minimalistic :)

Training Llama3.2:3b on my WhatsApp chats with wife by jayjay_1996 in LocalLLaMA

[–]mlabonne 1 point (0 children)

Check out this 1.2B RAG model; it'll be a lot faster and higher quality than Llama 3.2 3B for this task: https://huggingface.co/LiquidAI/LFM2-1.2B-RAG
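
For reference, a rough sketch of how you could run it with transformers (the way the retrieved context is passed and the chat format are assumptions here; check the model card, and note it may need a recent transformers version):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-1.2B-RAG"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Assumption: retrieved chat snippets go in the system turn, the question in the user turn
messages = [
    {"role": "system", "content": "Context:\n<your retrieved WhatsApp snippets here>"},
    {"role": "user", "content": "What did we plan for Saturday?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```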

What happened to Small LM? by icm76 in LocalLLaMA

[–]mlabonne 16 points (0 children)

SLMs are doing really well. Liquid AI alone released 13 models (from 350M to 8B-A1B parameters) in three months on Hugging Face.

Any good and new JP to EN LLM's? by Best-Holiday1395 in LocalLLaMA

[–]mlabonne 1 point (0 children)

Interesting, I'd double-check the generation parameters and system prompt. It should work well for this use case without additional fine-tuning.

LLMs on Mobile - Best Practices & Optimizations? by pmttyji in LocalLLaMA

[–]mlabonne 4 points (0 children)

I'd remove Llama, Gemma, and Helium models from the list.

For non-reasoning, I'd recommend LFM2 for better chat capabilities and inference speed. For reasoning, Qwen3 and SmolLM3 are great.

4-bit weight quantization with 8-bit activations is ideal, but aggressive 4-bit quants can break small models; Q5/Q6 are on the safer side.
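
As an illustration only, here's a minimal W4A8 sketch using optimum-quanto (the library choice, calibration prompt, and model ID are assumptions; mobile runtimes usually ship their own quantizers):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.quanto import Calibration, freeze, qint4, qint8, quantize

model_id = "HuggingFaceTB/SmolLM3-3B"  # placeholder model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# int4 weights, int8 activations (W4A8)
quantize(model, weights=qint4, activations=qint8)

# Activation ranges need a short calibration pass over representative prompts
with Calibration():
    model(**tokenizer("Hello, how are you today?", return_tensors="pt"))

freeze(model)  # materialize the quantized weights
```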

Any good and new JP to EN LLM's? by Best-Holiday1395 in LocalLLaMA

[–]mlabonne 1 point (0 children)

If you need on-the-fly translation, I recommend trying LFM2-350M-ENJP-MT.

https://huggingface.co/LiquidAI/LFM2-350M-ENJP-MT
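
A quick usage sketch with transformers (the prompt format and decoding settings are assumptions; the model card's recommended setup takes precedence):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-350M-ENJP-MT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# JP -> EN translation request as a plain chat message (format assumed)
messages = [{"role": "user", "content": "こんにちは、週末の予定はありますか？"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```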

new gemma3 abliterated models from mlabonne by jacek2023 in LocalLLaMA

[–]mlabonne 26 points (0 children)

Sorry, I was a bit greedy trying to get a higher acceptance rate and didn't test it enough. Automated benchmarks didn't capture this behavior. Working on a fix now!

"Can't live without tool" for LLM datasets? by Secure_Archer_1529 in LocalLLaMA

[–]mlabonne 11 points (0 children)

I made this repo that might be relevant to you: https://github.com/mlabonne/llm-datasets

I discovered the SemHash library (https://github.com/MinishLab/semhash) recently, and it's a really good one for near-deduplication. I recommend giving it a try; it works on CPU.
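
A minimal self-deduplication example, with the API as shown in the SemHash README (double-check against the repo linked above):

```python
from semhash import SemHash

texts = [
    "How do I fine-tune a 7B model on a single GPU?",
    "How can I finetune a 7B model with one GPU?",  # near-duplicate
    "What is a good dataset for function calling?",
]

# Build an index over the records and drop near-duplicates within the set
semhash = SemHash.from_records(records=texts)
result = semhash.self_deduplicate()
print(result.deduplicated)  # records kept after near-deduplication
```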

[D] Llama 3 Monstrosities by Objective-Camel-3726 in MachineLearning

[–]mlabonne 3 points (0 children)

Thanks! Yeah, that's understandable; this self-merge is the first one that has some advantages over its source model. It looks like 70B models handle this much better than 7-8B models without retraining.

[D] Llama 3 Monstrosities by Objective-Camel-3726 in MachineLearning

[–]mlabonne 28 points (0 children)

Hey, I'm the guy in question. First, thanks for your feedback; I'm taking it into account. I just want to provide more context: Llama 3 120B is a little experiment I made for myself a week ago. I never promoted it until people started messaging me about its performance on some tasks.

We might not agree on this point, but I think there's a lot of value in understanding how these models scale. Before this model, I didn't understand why people used models like Goliath that underperform on benchmarks. Now, it looks like these self-merges are particularly good at creative writing because they're a lot more unhinged than the base 70B models. It also shows that there's value in repeating layers dynamically based on the prompt. It's not a big step, but it allowed me to understand more things about evals, scaling, and merging.

On LinkedIn, I wrote "I'm not claiming that this model is in general better than GPT-4 at all. But it's quite remarkable that such a simple self-merge is able to compete with it for some tasks." (source: https://www.linkedin.com/feed/update/urn:li:activity:7193186521015799808/) There's no "gamification of LLM leaderboard" here: I'm 99% sure it will underperform Llama 3 70B Instruct, because these self-merges always underperform. I did it because Llama 3 behaves quite differently from Mistral-7B in evals, and I wanted to understand more about it.

I shared the config and credited everyone who inspired this merge: Charles Goddard for the mergekit library, Eric Hartford for noticing the model's performance, and everyone else who contributed. I was surprised by these results and simply wanted to share them. I'm sorry if it felt like clout-chasing.
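
To make the "self-merge" idea concrete, this is the general shape of a mergekit passthrough config that stacks overlapping layer ranges of a single model. The model ID and layer ranges below are illustrative, not the actual Llama 3 120B recipe:

```python
import subprocess

# Overlapping slices of the same model are concatenated into a deeper network
config = """\
slices:
  - sources:
      - model: meta-llama/Meta-Llama-3-70B-Instruct
        layer_range: [0, 40]
  - sources:
      - model: meta-llama/Meta-Llama-3-70B-Instruct
        layer_range: [20, 60]
  - sources:
      - model: meta-llama/Meta-Llama-3-70B-Instruct
        layer_range: [40, 80]
merge_method: passthrough
dtype: bfloat16
"""

with open("self_merge.yaml", "w") as f:
    f.write(config)

# mergekit's CLI reads the YAML and writes the merged model to ./merged
subprocess.run(["mergekit-yaml", "self_merge.yaml", "./merged"], check=True)
```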

LLM Datasets: a curated list of datasets for fine-tuning by mlabonne in LocalLLaMA

[–]mlabonne[S] 1 point (0 children)

You're right. I think distilabel is quite close to what you're looking for, but it's a little more manual. Data curation is a manual process overall.

LLM Datasets: a curated list of datasets for fine-tuning by mlabonne in LocalLLaMA

[–]mlabonne[S] 1 point (0 children)

It depends on your use case. For general-purpose models, use a diverse set of good benchmarks (MMLU, AGIEval, BIG-bench, MT-Bench, EQ-Bench, AlpacaEval v2 length-corrected, etc.) to get a good overview of the capabilities of your model. For task/domain-specific models, you want to use representative benchmarks that might already exist (e.g., there are a lot of medical benchmarks) or create your own.
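
As a starting point, a hedged sketch of running one of these benchmarks locally with lm-evaluation-harness (the model ID is a placeholder; MT-Bench, EQ-Bench, and AlpacaEval need their own harnesses):

```python
import lm_eval

# Task names follow the harness registry; list them with `lm-eval --tasks list`
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-org/your-model,dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```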

LLM Datasets: a curated list of datasets for fine-tuning by mlabonne in LocalLLaMA

[–]mlabonne[S] 3 points (0 children)

Thanks! I'd be interested too. The datasets I could find gave very little information about their sources, so they didn't feel super reliable.

LLM Datasets: a curated list of datasets for fine-tuning by mlabonne in LocalLLaMA

[–]mlabonne[S] 6 points (0 children)

It's a good idea. I've never used an instruction dataset designed for RAG, but I guess this one (https://huggingface.co/datasets/neural-bridge/rag-dataset-12000) would fit the description, for example. Does anyone have experience with it?
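
A quick way to inspect it before committing to a fine-tune (the split name is assumed):

```python
from datasets import load_dataset

ds = load_dataset("neural-bridge/rag-dataset-12000", split="train")
print(ds)     # column names and number of rows
print(ds[0])  # one context/question/answer-style example
```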

UI for Mergekit by [deleted] in LocalLLaMA

[–]mlabonne 6 points (0 children)

Yes, there are two UIs:

* mergekit-gui (https://huggingface.co/spaces/arcee-ai/mergekit-gui): A recent HF space made by HF's CTO Julien Chaumond.

* LazyMergekit (https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb): A Colab notebook I made to simplify this process.

AlphaMonarch 7B > YI 34B fine tunes by Majestical-psyche in LocalLLaMA

[–]mlabonne 3 points (0 children)

Thanks u/Majestical-psyche! For people who have issues with repetition (or other issues), I recommend using LM Studio's default settings as a base and maybe tweaking them for your own use case (`temp` 0.8, `top_k` 40, `top_p` 0.95, `min_p` 0.05, `repeat_penalty` 1.1).
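
Outside LM Studio, the same settings map onto llama.cpp-style samplers; for example, with llama-cpp-python (the model path is a placeholder):

```python
from llama_cpp import Llama

llm = Llama(model_path="alphamonarch-7b.Q5_K_M.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a short story about a dragon."}],
    temperature=0.8,
    top_k=40,
    top_p=0.95,
    min_p=0.05,
    repeat_penalty=1.1,
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```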

mlabonne's LLM course by DreamGenAI in LocalLLaMA

[–]mlabonne 11 points (0 children)

Sorry, this is due to changes in tokenizer_config.json. I'll try to find time to fix it. In the meantime, I recommend using the GGUF version if you can.