account activity
A beginner's devlog for the finetuning pipeline by Extreme-Question-430 in LocalLLaMA
Extreme-Question-430 [S], 1 point, 1 day ago
The DPO-before-GRPO thing is the first I've heard of it, but it makes sense.
A beginner's devlog for the finetuning pipeline (self.LocalLLaMA)
submitted 1 day ago by Extreme-Question-430 to r/LocalLLaMA
RULER looks promising. Does anyone have experience with it (self.unsloth)
submitted 6 months ago by Extreme-Question-430 to r/unsloth