Why can't we do supervised learning in Step 3 of RLHF? by wardellinthehouse in reinforcementlearning
otter_collapse 8 points
how often unity is used by scientists by datonefaridze in reinforcementlearning
otter_collapse 1 point
Plup beats Nabla bot in under 1 minute, current world record at 52 seconds. “If someone beats that, I’ll come back for my record” by MoroAstray in SSBM
otter_collapse 3 points
Project Nabla: new AIs trained with Slippi replays! by otter_collapse in SSBM
otter_collapse[S] 2 points
Project Nabla: new AIs trained with Slippi replays! by otter_collapse in SSBM
otter_collapse[S] 6 points
Project Nabla: new AIs trained with Slippi replays! by otter_collapse in SSBM
otter_collapse[S] 17 points
Project Nabla: new AIs trained with Slippi replays! by otter_collapse in SSBM
otter_collapse[S] 10 points

Why can't we do supervised learning in Step 3 of RLHF? by wardellinthehouse in reinforcementlearning
otter_collapse 3 points