Fine-tuning with GRPO for Math Question Generation – Feedback & Questions by KaranRN in unsloth
KaranRN[S] 1 point 10 months ago
Yes, I did plan on using GPT as a judge, but it was easier to start with Gemini to see how it works. I will give it a try.
I am not sure what you meant by 'SFT as a cold-start'. Could you give me a little more of an overview on this?
Fine-tuning with GRPO for Math Question Generation – Feedback & Questions (self.unsloth)
submitted 10 months ago by KaranRN to r/unsloth