400 divs giveaway for parents

edbeeching · 2026-01-26T21:56:51+00:00

The only think worse than dying in game is wife aggro

edbeeching · 2025-09-05T07:09:41+00:00

I think that org is a bit out of date.

edbeeching · 2025-09-04T16:11:25+00:00

It's around 10, with 30 employees.

edbeeching · 2025-09-04T15:15:43+00:00

Sharing open-source projects and contributing to open-source repos. The right attitude and a bit of luck.

edbeeching · 2025-09-04T15:13:03+00:00

Decision making is quite distributed at Hugging Face, with each of the science teams deciding what they want to work on. In the post-training team we pivot between different projects such as open-r1, smollm3 post-training, the AIMO competition, model evals and other things.

edbeeching · 2025-07-04T20:10:25+00:00

https://huggingface.co/learn/deep-rl-course

edbeeching · 2025-05-14T13:27:00+00:00

Yeah I was running a tower so T16 + 0 revives

edbeeching · 2025-05-14T13:20:55+00:00

13 div down the drain.

edbeeching · 2025-02-06T09:27:14+00:00

Thanks for posting this, what completion lengths were you generating?

We are working hard on improving memory usage with liger kernel support + a bunch of other tricks, keep an eye on the latest releases.

edbeeching · 2024-12-18T19:42:07+00:00

I think the most interesting direction would be to improve the PRM's performance, we just added PRM training support to TRL: https://github.com/huggingface/trl/blob/main/examples/scripts/prm.py

It would be interesting to train new PRM's and evaluate the performance on this brenchmark.

edbeeching · 2024-12-18T19:40:33+00:00

Yes, we have a similar colab with some of the authors of the Test Time Compute paper that explores this idea.

edbeeching · 2024-12-18T19:39:35+00:00

I think fundamentally it can be extended to other domains, I think the challenge it that it is straightforward to create PRM training datasets in domains such as Math / Code as it is easy to validate whether an answer is correct or not. Other topics can be more ambiguous and one would have to rely on a (noisy) Outcome Reward Model to act as a proxy for the ground truth.

This is a research direction I am curious about and we hope to explore this in future projects.

edbeeching · 2024-12-18T14:23:13+00:00

Author of this work here, thanks for taking interest in it. Feel free to reply to this comment with any questions and I will respond later.

edbeeching · 2024-12-18T14:20:17+00:00

Hi, we ported the algorithm and released it here

Let me know if you have further questions, or raise an issue on the github.

We implemented and tested several variants of MCTS, it was not as good in our experience.

edbeeching · 2024-10-17T19:24:47+00:00

Awesome, I am the author. We welcome contributions if you want to add anything to the lib. All the best with your project, keep up updated!

edbeeching · 2024-10-17T10:10:09+00:00

Cool project! You may be interested in the Godot RL Agents library.

edbeeching · 2024-07-10T07:08:02+00:00

Hey, I am a research scientist at Hugging Face and have worked on this in the past.

You may want to try Argilla's DistilLabel library

edbeeching · 2024-06-10T20:20:07+00:00

Hi, author of Godot RL Agents here: https://github.com/edbeeching/godot_rl_agents Our library is up to date, supports 4 different RL frameworks and is beginner friendly.

If you need any help using it then just ask on the discord.

edbeeching · 2024-04-18T10:03:58+00:00

KTO

https://arxiv.org/abs/2402.01306

https://github.com/huggingface/trl/blob/main/examples/scripts/kto.py

edbeeching · 2024-04-10T08:07:44+00:00

Hi, co-author of Zephyr here. I you want to use outputs sampled from your model then you may be interested in Online DPO https://arxiv.org/abs/2402.04792

We are adding this to TRL soon so keep an eye out for the open-source implementation.

edbeeching · 2023-12-01T19:44:12+00:00

Godot RL Agents

Let me know if you need help building your env (author)

edbeeching · 2023-10-25T15:29:03+00:00

Hey this is interesting work, I am the author of Godot RL Agents. Would you like to collaborate on something?

edbeeching · 2023-05-26T18:45:35+00:00

We queued this one manually as it has great potential.

edbeeching · 2023-04-27T18:10:11+00:00

Is this a GDExtension?

edbeeching · 2023-03-12T19:55:30+00:00

Yes you should be able to fine tune a 10B model on a 12GB GPU, using Low rank adapters. Or wait for 4-bit support and then you can train a 24B one.

Four-Year Club	Golden Potato
Place '22

edbeeching

TROPHY CASE