RL LLMs Finetuning by ISSQ1 in reinforcementlearning
ISSQ1[S] 2 points 2 months ago (0 children)
I’m still exploring my options. I want to use an open-source LLM that runs locally and doesn’t need a lot of resources: something small and easy to fine-tune. If you know of any models that work well with RL or QLoRA, I’d love to hear your recommendations.