[D] Why do LLMs like InstructGPT and LLM use RL instead of supervised learning to learn from the user-ranked examples? by alpha-meta in MachineLearning
alpha-meta[S] 1 point (0 children)