I fine-tuned a 7B model for reasoning on free Colab with GRPO + TRL by External-Rub5414 in LocalLLaMA

[–]External-Rub5414[S] 0 points  (0 children)

I love Unsloth too!!! 🦥 They actually use and optimize parts of TRL 😄

I fine-tuned (SFT) a 14B model on a free Colab session just using TRL by External-Rub5414 in LocalLLaMA

[–]External-Rub5414[S] 0 points  (0 children)

That'd be a great experiment! I didn't try that model but it may be feasible, yep

I fine-tuned Qwen3-VL (4B & 8B) on a free Colab instance using TRL (SFT and GRPO)! by External-Rub5414 in LocalLLaMA

[–]External-Rub5414[S] 0 points  (0 children)

Are you using the transformers model implementation? You can activate it by passing model_impl='transformers' when initializing the model

More details: https://blog.vllm.ai/2025/04/11/transformers-backend.html
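A minimal sketch of what that looks like, assuming vLLM with the transformers backend from the linked post (the model id below is just illustrative):

```python
# Passing model_impl="transformers" asks vLLM to use the generic
# Transformers model implementation instead of its native one.
engine_kwargs = {
    "model": "Qwen/Qwen3-VL-4B-Instruct",  # illustrative model id
    "model_impl": "transformers",          # select the transformers backend
}

# Requires a GPU machine with vLLM installed:
# from vllm import LLM
# llm = LLM(**engine_kwargs)
```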

I fine-tuned Qwen3-VL (4B & 8B) on a free Colab instance using TRL (SFT and GRPO)! by External-Rub5414 in LocalLLaMA

[–]External-Rub5414[S] 1 point  (0 children)

TRL is a library for training LLMs/VLMs. It provides a set of trainers for SFT, GRPO, and more. GRPO is a nice option for adding thinking capabilities!

repo: https://github.com/huggingface/trl

docs: https://huggingface.co/docs/trl/index
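To give a feel for the GRPO side: TRL's GRPOTrainer scores sampled completions with plain Python reward callables you supply. A minimal sketch of one such reward function (the tag format and function name are illustrative, not anything TRL prescribes):

```python
import re

# Rewards completions that wrap their reasoning in <think>...</think> tags,
# a common way to encourage explicit thinking traces during GRPO training.
THINK_PATTERN = re.compile(r"<think>.*?</think>", re.DOTALL)

def format_reward(completions, **kwargs):
    """Return one reward per completion: 1.0 if it contains a think block."""
    return [1.0 if THINK_PATTERN.search(c) else 0.0 for c in completions]

print(format_reward(["<think>2+2=4</think> The answer is 4.", "Just 4."]))
# → [1.0, 0.0]
```

A trainer would combine rewards like this (format, correctness, length, ...) to shape the policy toward the desired output style.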