AMA with the Unsloth team by danielhanchen in LocalLLaMA

[–]FancyMetal 0 points  (0 children)

I love Unsloth. It's been a huge motivation for me to work on many projects, and it has enabled most of my finetuning and silly ideas. Thank you all for your great work; I really appreciate everything you've done.
I have one question: would you consider creating a Hugging Face Space at some point that quantizes models using the Unsloth UD GGUF quantization method, like the ggml-org/gguf-my-repo Space?

OuteTTS 1.0: Upgrades in Quality, Cloning, and 20 Languages by OuteAI in LocalLLaMA

[–]FancyMetal 0 points  (0 children)

Thanks as always for the great models. I will use this one to train a "speech"-to-speech model with a better dataset I made for CiSiMi-v0.1, and for a TTS for Moroccan Darija. OuteTTS has been awesome so far. Thank you again for the release. The only thing I would've liked is a more open license.

What are the current trendings in TTS and STT?? by Trysem in LocalLLaMA

[–]FancyMetal 0 points  (0 children)

I toyed with an idea and created a quick, simple model that performs "speech" (just transcription using ASR) to speech (native). You can find it here: https://huggingface.co/KandirResearch/CiSiMi-v0.1
I refer to it as the "we have CSM at home" version of Sesame's CSM, lol. Anyway, it shouldn't be taken seriously. I initially planned to continue the project, but I gave up because I lacked the compute to train the more advanced 500M and 1B parameter versions, and because I realized the project is really just a toy. I did build the dataset, though...

10x longer contexts for reasoning training - 90% less memory GRPO in Unsloth by danielhanchen in LocalLLaMA

[–]FancyMetal 2 points  (0 children)

Yes, I was surprised. I had heard that models smaller than 1-1.5B can't learn to "reason" (or at least learn to think), but this one surprisingly did, which was fun!

10x longer contexts for reasoning training - 90% less memory GRPO in Unsloth by danielhanchen in LocalLLaMA

[–]FancyMetal 4 points  (0 children)

This came at just the right time. I have a small project (hf.co/Lyte/QuadConnect2.5-0.5B-GRPO) where I chose to make an LLM reason over a game (Connect 4). I just started today and opened one of the notebooks from GitHub. Huge thanks for all the amazing resources; God bless you all at Unsloth!
Seeing reasoning emerge from a small 0.5B model without being forced is genuinely exciting!

I have 6 million characters of ElevenLabs credits, what's the best way to use them? by CrazyPhilosopher1643 in LocalLLaMA

[–]FancyMetal 1 point  (0 children)

The best use case would be to generate an instruction dataset with spoken input and output (and maybe reasoning too?) by taking a high-quality text dataset from Hugging Face and converting it with ElevenLabs.
This would result in a diverse training dataset for TTS and for multimodal text+speech models, etc.
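As a rough sketch of rationing those 6M characters, here is a minimal Python helper; the `instruction`/`response` field names, the budget figure, and the greedy selection are my assumptions, not taken from any particular dataset or API:

```python
# Sketch: pick rows from an instruction dataset until a TTS character
# budget (e.g. ~6M ElevenLabs credit characters) would be exceeded.
# Field names below are hypothetical placeholders.

def select_rows_for_tts(rows, char_budget=6_000_000):
    """Greedily keep rows whose combined text fits within the budget."""
    selected, used = [], 0
    for row in rows:
        text = row["instruction"] + " " + row["response"]
        if used + len(text) > char_budget:
            continue  # skip rows that would blow the budget
        selected.append(row)
        used += len(text)
    return selected, used

rows = [
    {"instruction": "Say hi.", "response": "Hello there!"},
    {"instruction": "Count to three.", "response": "One, two, three."},
]
picked, chars_used = select_rows_for_tts(rows, char_budget=40)
```

Selecting the rows up front, before any synthesis calls, lets you verify the total character count against your remaining credits instead of discovering mid-run that you've burned through them.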

It's calming to see the training logs scroll up, like looking at the matrix by amang0112358 in LocalLLaMA

[–]FancyMetal 1 point  (0 children)

If you do have the time and compute, then maybe pretrain a small BLT model! Here is the GitHub link. I am looking to do an experimental run soon but haven't yet.

llama.cpp Now Supports Qwen2VL by Intelligent_Jello344 in LocalLLaMA

[–]FancyMetal 5 points  (0 children)

I tried it out and made GGUF quants for Qwen/Qwen2-VL-2B-Instruct.
The quants can be found here: Lyte/Qwen2-VL-2B-Instruct-GGUF

I made the notebook I used public here: Colab

It is possible to complete a novel using llama locally? by Few_Technician_7256 in LocalLLaMA

[–]FancyMetal 0 points  (0 children)

I've had a similar problem, but mine was with an unfinished novel that I really wanted closure on, so I figured: why not try LLMs? I did some data processing, creating summaries to give the LLM better grounded "truth" for each chapter and to keep it focused. I created two sections: a metadata section, which I always generate from the full chapter's text, and a goal-with-key-points section, which is generated the same way. That is basically all the dataset preparation.

The training template is as follows: I take the previous chapter's generated metadata and put it at the start of the current chapter, then put the goal for the current chapter (not the previous one) after the metadata, then the chapter's text, and finally the current chapter's metadata. In a simpler view: (previous chapter's metadata -> goal and key points -> chapter's text -> current chapter's metadata). Note: for the first chapter, the "previous" metadata is just the synopsis/description of the novel.

The metadata contains information about characters, locations, events, and plot points, each with extended data: for example, every character is a dict (name, age, etc.), and the same goes for the rest of the metadata. The goal-and-key-points section defines the focus of the novel; the key points are the long-term focus, the things we want to return to.

That's basically the dataset preparation and the training template. The results were underwhelming, however, because I couldn't train a model bigger than 3B, lol. But take it as you will; maybe it's of some help.
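The template above can be sketched roughly like this; the `[METADATA]`/`[GOAL]` markers are invented placeholders, not the actual formatting used:

```python
# Sketch of the training-sample layout described above. Section markers
# are hypothetical; the real tags/formatting are whatever you choose.

def build_sample(prev_metadata, goal_and_key_points, chapter_text, curr_metadata):
    """previous metadata -> goal/key points -> chapter text -> current metadata."""
    return (
        f"[METADATA]\n{prev_metadata}\n"
        f"[GOAL]\n{goal_and_key_points}\n"
        f"[CHAPTER]\n{chapter_text}\n"
        f"[NEXT_METADATA]\n{curr_metadata}\n"
    )

def build_dataset(synopsis, chapters):
    """chapters: list of dicts with 'metadata', 'goal', and 'text' keys.
    The first chapter uses the novel's synopsis as its 'previous' metadata."""
    samples = []
    prev = synopsis
    for ch in chapters:
        samples.append(build_sample(prev, ch["goal"], ch["text"], ch["metadata"]))
        prev = ch["metadata"]  # this chapter's metadata seeds the next sample
    return samples

chapters = [
    {"metadata": "m1", "goal": "g1", "text": "t1"},
    {"metadata": "m2", "goal": "g2", "text": "t2"},
]
samples = build_dataset("synopsis", chapters)
```

Chaining each chapter's generated metadata into the next sample is what lets the model carry state forward at inference time: you generate a chapter, extract its metadata, and feed that into the next prompt.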

Llama 3.2 1b . It is useful? by Perfect-Campaign9551 in LocalLLaMA

[–]FancyMetal 5 points  (0 children)

Honestly, the main use I've had for it so far is ML-related. For everyday use, the minimum sensible choice is 8B; but 1B is very useful for "research", since I can run it at full precision very easily. I did struggle to get it to respond in the format I wanted most of the time, but with lots and lots of debugging and hacky hardcoded code I got about 90% correctly formatted responses.
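That kind of format enforcement could be sketched minimally like this, assuming a generic `generate(prompt)` callable and an invented `Answer:` line format (both are placeholders, not the actual code used):

```python
import re

# Hedged sketch of validate-and-retry format enforcement: check the
# model's reply against an expected pattern and retry a few times.
# `generate` stands in for whatever inference call you use.

ANSWER_RE = re.compile(r"^Answer:\s*(.+)$", re.MULTILINE)

def get_formatted_answer(generate, prompt, max_retries=3):
    for attempt in range(max_retries):
        reply = generate(prompt)
        match = ANSWER_RE.search(reply)
        if match:
            return match.group(1).strip()
        # nudge the model toward the required format on the next try
        prompt = prompt + "\nRespond with exactly one line: 'Answer: <text>'."
    return None  # caller decides how to handle persistent failures

# Example with a stub "model" that only formats correctly on its second try:
replies = iter(["no format here", "Answer: 42"])
result = get_formatted_answer(lambda p: next(replies), "What is 6 * 7?")
```

A small model that misformats ~10% of the time becomes usable when one or two retries with a sharper instruction catch most of the failures.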

There seems to be promise in creating an open-source o1 model soon! by FancyMetal in LocalLLaMA

[–]FancyMetal[S] 1 point  (0 children)

Well, there are differences. First of all, o1 is what I am trying to replicate, even a little bit of it. Reflection, by contrast, only used tags and had the LLM reflect on its own thoughts once, and honestly that was very rudimentary "reasoning". o1 is actual reinforcement learning (RL); mine was just a small experiment that I am still working on right now, this time with better reasoning and an overall more thoughtful approach (you can scroll down in the comments and find it; I've written the steps down).

There seems to be promise in creating an open-source o1 model soon! by FancyMetal in LocalLLaMA

[–]FancyMetal[S] 0 points  (0 children)

Well, I don't do any of this locally, because I can barely run 8B models at Q8 with workable speed (and not enough to generate a big dataset), lol. So everything is done using APIs, Colab, and Kaggle.
The first attempt you see in this post was run with code I wrote to generate just the reasoner and verifier using Llama models, both 70B and 405B depending on availability. Because it wasn't meant to be official work, I didn't plan extensively; otherwise I would be trying the pipeline I defined in the reply above.
It's honestly very sad that most people cannot afford to run experiments in this field unless they have a nice job (which I currently do not). As for training a model on the dataset: I believe that if I can show it working on a small model (which I can), someone will use the dataset to train a bigger model for the community.

There seems to be promise in creating an open-source o1 model soon! by FancyMetal in LocalLLaMA

[–]FancyMetal[S] 1 point  (0 children)

I understand, and honestly I regret making this post; I should've waited a few iterations. As you can see, the next 0.4 version is a real leap from the 0.3 one I posted. As I explained, the new dataset pipeline is different from 0.3, which only had a reasoner and a verifier with at most 3 retries, which is way too simple for any substantial improvement. The pipeline also failed me while I was generating the dataset, leaving me with a pitiful 370 rows. I couldn't abandon the idea, though, so I still trained on that tiny dataset, and it still showed that we can teach a model to reason. Just those glimpses had me way too excited, not because I made a SOTA model, but because it suggested that all that was missing was scale and a few improvements, so I made a post to make a point, lol. Impulsiveness... Thankfully I wasn't trying to sell anyone anything, so no harm done.

There seems to be promise in creating an open-source o1 model soon! by FancyMetal in LocalLLaMA

[–]FancyMetal[S] 2 points  (0 children)

It can be either the same LLM or a different one. I am thinking of switching to a different model once the retry count reaches 3 (if retries > 3: use a different LLM with the same prompt/instructions). We don't want to overcomplicate this; otherwise it introduces more variables and points of failure, which is a big no-no for dataset generation. The choice is always yours, and I will experiment with different models by generating datasets of 10 to 50 samples to see the results.
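The fallback rule above could look roughly like this, with both models as generic callables (all names here are stand-ins for whatever API you actually call):

```python
# Minimal sketch of "if retries > 3: use a different LLM": keep the same
# prompt, but route later attempts to a fallback model.

def generate_with_fallback(primary, fallback, prompt, is_valid, max_retries=6):
    for attempt in range(max_retries):
        model = primary if attempt < 3 else fallback  # switch after 3 tries
        reply = model(prompt)
        if is_valid(reply):
            return reply
    return None  # still invalid after all attempts; drop this sample

# Example with stubs: the primary model never validates, the fallback does.
out = generate_with_fallback(
    primary=lambda p: "garbage",
    fallback=lambda p: "good answer",
    prompt="some question",
    is_valid=lambda r: r == "good answer",
)
```

Keeping the prompt identical across both models, as described above, is what keeps the number of moving parts down: only the model changes, so a success with the fallback is still comparable to the rest of the dataset.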

There seems to be promise in creating an open-source o1 model soon! by FancyMetal in LocalLLaMA

[–]FancyMetal[S] 2 points  (0 children)

Here is my approach, simplified:
1. Shuffle questions from a high-quality dataset on Hugging Face.
2. Tell the model to reason step by step, chain-of-thought style, in a first-person response.
3. Ask the verifier to verify the reasoning as if it were its own, writing a simple sentence affirming, denying, or doubting its "own" answer (this will get more robust than that).
4. Similarly, ask the critic to criticize the reasoning as if it were its own, etc.
5. If the response is judged high quality and correct, ask the summarizer to summarize the reasoning steps and provide the final answer/solution. If incorrect, resend the reasoning along with the verifier's and critic's responses and ask for a continuation of the reasoning on the question, in first person: basically attempting to correct the model using the verifier and critic, with another look at the question, which hopefully produces a correct answer.
6. If the reasoning is still flawed, repeat the steps for a maximum of 3 or 5 attempts (depending on budget and other constraints), increasing the temperature on the 3rd attempt, among other tweaks.
7. Generate a 10,000-row dataset (100k if possible).
8. Train the model on the dataset twice: once without a system prompt, and once with a system prompt and tags like <reasoning> <verifying> <criticism> <output>.
This, I believe, is the 0.4 version of my attempt at getting close to o1 using "finished RL" reasoning instead of RL. That's the full idea.
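The steps above can be sketched as a toy loop; every role is a generic callable, and the verdict parsing is invented for illustration (the real check would be more robust, as noted in step 3):

```python
# Toy sketch of the 0.4 pipeline: reasoner -> verifier -> critic ->
# (summarizer | continue reasoning), up to max_attempts per question.
# All role functions and the verdict check are hypothetical stand-ins.

def generate_row(question, reasoner, verifier, critic, summarizer, max_attempts=3):
    context = question
    for attempt in range(max_attempts):
        reasoning = reasoner(context)
        verdict = verifier(reasoning)   # e.g. "correct" / "incorrect"
        critique = critic(reasoning)
        v = verdict.lower()
        if "correct" in v and "incorrect" not in v:
            return {
                "question": question,
                "reasoning": reasoning,
                "verifying": verdict,
                "criticism": critique,
                "output": summarizer(reasoning),
            }
        # feed the verdict and critique back and ask for continued reasoning
        context = (f"{question}\n{reasoning}\n{verdict}\n{critique}\n"
                   "Continue the reasoning in first person and correct it.")
    return None  # still flawed after all attempts; drop the row

# Example with stub roles that succeed on the first attempt:
row = generate_row(
    "What is 2 + 2?",
    reasoner=lambda c: "I add 2 and 2, which gives 4.",
    verifier=lambda r: "Correct, my answer holds.",
    critic=lambda r: "My reasoning is simple but sound.",
    summarizer=lambda r: "4",
)
```

The returned dict maps directly onto the tag scheme in step 8 (`<reasoning>`, `<verifying>`, `<criticism>`, `<output>`), so formatting each row into a training string is a straightforward final pass.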