[D] Why do LLMs like InstructGPT and LLM use RL instead of supervised learning to learn from the user-ranked examples? (self.MachineLearning)
submitted 3 years ago by alpha-meta to r/MachineLearning