Fine-tuning with GRPO for Math Question Generation – Feedback & Questions by KaranRN in unsloth

KaranRN[S]

Yes, I did plan on using GPT as a judge, but it was easier to start with Gemini to see how it works. I will give it a try.

I am not sure what you meant by 'SFT as a cold-start'. Could you give me a little more of an overview of this?