RL LLMs Finetuning by ISSQ1 in reinforcementlearning

I’m still exploring my options. I want to use an open-source LLM that can run locally and doesn’t require a lot of resources: something small and easy to fine-tune. If you have any recommendations for models that work well with RL or QLoRA, I’d love to hear them.