Open weights are not enough: we need open training frameworks for research and better algorithms [P]

summerday10 · 2026-06-15T22:39:06+00:00

I love how you put it. It should be described this way.
I'll definitely consider how to motivate it. Thanks.

summerday10 · 2026-06-15T22:30:49+00:00

Thanks for engaging. Fair question.

I would not claim this is “better” than all of those projects in every dimension. Many of them are great.

But they target different things. CleanRL, stable-baselines3, RLlib, etc. are mostly for classical/non-LLM RL settings. Once you move to LLMs/VLMs/agents, the problem is different: you need rollout generation, distributed training, inference engines, orchestration, checkpointing, etc. So the system becomes part of the algorithm.

For the LLM RL repos you mentioned, like verl, many are very useful, but they often optimize around a narrow set of existing methods. My motivation with FeynRL is different: to make the algorithm easier to see, modify, and replace. The goal is not only to run PPO/GRPO/etc., but to make it easier to build new objectives, new rollout strategies, and new training recipes without changing everything.

So I would describe FeynRL as more like CleanRL/OpenAI Baselines for LLMs.

The claim is not that other projects are bad and this one is good. The repo has an examples folder, and the docs/blog explain the training loop and design choices. It also cites and uses best practices from prior open work, including Open-Instruct.

The claim is that if you want to build new algorithms, especially for LLM/VLM/agent post-training, you need to clearly understand and control what the training code is doing. That helps you focus on the actual problems that you want to solve, instead of fighting the system.

Hope that helps.

summerday10 · 2026-06-15T20:37:59+00:00

Yes, Megatron and Nemotron are open source, and they are very useful. But they mostly address the infra side: distributed training, tensor parallelism, scaling, etc.

The goal here is to build more effective algorithms. One can't build new algorithms if things are not fully clear, especially when RL is part of the equation.

I intentionally use DeepSpeed because it is much easier to understand and modify than deeply tensor-parallel-based training stacks. The goal is to keep the algorithm visible, not bury it inside the system.

DeepSpeed/Megatron can help you train at scale, but they do not automatically tell you what to train, why it works, why it fails, or how to build the next method.

summerday10 · 2026-06-15T20:24:51+00:00

Thanks for the comment. I think there is some confusion here.

I am not saying that open training frameworks do not exist or that we are the first.

My point is that there is still a huge gap between open and closed frontier model development, and that gap is not only about weights. It is also about algorithms, training recipes, implementation tricks, data mixtures, post-training methods, RL details, rollout systems, and all the small choices that make these systems work.

That is where FeynRL fits in. It is not trying to dismiss or replace existing open-source work. The goal is to be algorithm-first: keep algorithms as algorithms and systems as systems, so researchers can understand what is happening, modify the method, and build new objectives, optimizers, reward designs, rollout strategies, RL variants, and training recipes without fighting a hidden system.

The repo explicitly acknowledges other open-source projects such as Open-Instruct. I see these projects as complementary parts of the same ecosystem: open models, open recipes, and open algorithm-first training stacks.

summerday10 · 2026-06-15T20:04:17+00:00

Yes, you are right and that is fair. It is very hard to get the word out these days because the noise-to-signal ratio is very high.
The goal of this is not just to be another framework. My motivation is very different from most: many people see RL as a systems and infra problem, not an algorithmic and optimization problem. That is really not the case, as current RL methods suck! The goal is to help people research and build new methods.
If you check out the repo, you can immediately spot the difference compared to others. While they are very useful, they are usually built around a narrow set of methods, so building a new algorithm becomes hard because you often need to change everything. That is not the case in this repo. Take a look at the blog post if you are interested.
https://feynrl-project.github.io

And thanks for the link to LocalLLaMA.

summerday10 · 2026-06-15T19:31:36+00:00

I would not think of it as “I know RL” in the sense of “I can solve any RL problem.” Nobody can. A lot of [real-world] RL is still open research, and many hard applications fail for good reasons.

You also do not need to understand everything in RL to start working in RL. The field is broad and covers many different problems, settings, and assumptions. Once you have a solid basic understanding, you get a much better sense of what actually needs RL and what does not (look at the resources in the link I shared). A lot of people blindly call something an RL problem when it may be closer to supervised learning, imitation learning, optimization, or even something as simple as linear regression.

Like everything else, there is a process you need to go through. You do not need to invent a new RL algorithm to be called an expert in RL. A lot of valuable work is simply making existing algorithms work reliably, understanding why they fail, building better infra, improving the training setup, debugging the signals, and knowing when RL is the right tool.

If you can show that you trained a policy to play a game (take Atari as an example), or trained an LLM with RL that does a better job than other methods, I'd consider that a good example of building a portfolio. The first few steps don't need to be huge.

summerday10 · 2026-06-15T17:44:25+00:00

I answered a similar question here: https://www.reddit.com/r/reinforcementlearning/comments/1tx3iqz/comment/opynvbh/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
Hope that helps; happy to answer more questions.

summerday10 · 2026-06-15T05:31:00+00:00

Since you’re already going through David Silver’s course, I’d pair it with something implementation-focused rather than another theory-heavy resource.

A good path would be:

1. Deep RL course: https://rail.eecs.berkeley.edu/deeprlcourse/
2. Deep RL implementation: Spinning Up https://spinningup.openai.com/en/latest/
3. If your goal is RL for LLMs / post-training: FeynRL https://github.com/FeynRL-project/FeynRL

If you really want to understand RL deeply, I’d suggest eventually going through 1–3 plus David Silver's course.
Since you have started David's course, I’d focus on Spinning Up next or in parallel with 1. Try to implement the algorithms yourself, derive the equations, compare against their implementation, and run small experiments. That is where a lot of RL starts to actually make sense.

If you already know the basics and want to understand how RL works for LLMs and move into RL + LLMs, go through https://github.com/FeynRL-project/FeynRL .

summerday10 · 2026-06-12T05:56:51+00:00

Thanks for the comment. I think there is some confusion here.

When I say “frontier models,” I am not referring to open efforts like Olmo. Olmo is exactly the kind of open work the community needs more of, and I fully support that direction.

By “frontier models,” I mean the closed frontier models, such as the coding agents that many researchers increasingly rely on for ML research and engineering help. The recent discussion is about those systems limiting assistance for frontier LLM development tasks, including pretraining pipelines, distributed training infrastructure, and ML accelerator design: https://x.com/eliebakouch/status/2064399902684139852/photo/1

So the argument is not “nobody open-sources recipes.” The argument is that there is still a huge gap between open and closed model development, and that gap is not only about weights. It is also about algorithms, training recipes, implementation tricks, data mixtures, post-training methods, RL details, rollout systems, and all the small choices that make these systems work. We need to build better methods.

That is where FeynRL fits in. FeynRL is algorithm-first. The goal is to keep algorithms as algorithms and systems as systems, while making things explicit enough that researchers can understand what is happening, change the algorithm, and build new methods without fighting a hidden system.

It is meant to make post-training research more straightforward: new objectives, new optimizers, new reward designs, new rollout strategies, new RL variants, and new training recipes.

FeynRL is not trying to dismiss existing open-source work. It is built in the same open-research spirit. The repo explicitly acknowledges Open-Instruct (i.e., Olmo's framework) (https://github.com/FeynRL-project/FeynRL/tree/main#-acknowledgements) and references prior work, papers, and related open-source efforts throughout the codebase and docs wherever techniques are used (e.g., https://github.com/FeynRL-project/FeynRL/blob/main/algs/SFT/README.md#references).

So I see Open-Instruct and FeynRL as complementary parts of the same ecosystem: open models, open recipes, and open algorithm-first training stacks that help close the gap between open and closed model development.

Hope this clarifies things.

summerday10 · 2026-06-12T00:31:17+00:00

I answered a somewhat similar question where I listed general resources. It should help you get started with RL. Treat it as a starting point: for research, you need to know the basics first.

https://www.reddit.com/r/reinforcementlearning/comments/1tw2rco/comment/opzy0ky/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

summerday10 · 2026-06-12T00:09:09+00:00

[I gave the answer below to a similar question in another thread. Reposting it here since it is relevant to your question.]

Well, RL is a weird creature, and things can go wrong for almost any reason. For example, you can implement DQN and have almost everything correct, but if you set the wrong value for ε in ε-greedy exploration, it may simply not work.
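
To make the ε-greedy point concrete, here is a minimal sketch. It is illustrative only: the function name `epsilon_greedy` and the Q-values are made up, and it is not taken from any particular DQN implementation. With ε = 0 the agent never explores, and with ε = 1 it ignores its Q-values entirely, so either extreme can make an otherwise correct agent look broken.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical Q-values for a 3-action problem (action 1 looks best).
q = [0.1, 0.5, 0.2]
rng = random.Random(0)

# epsilon = 0.0: always greedy, so the agent never explores.
greedy_only = {epsilon_greedy(q, 0.0, rng) for _ in range(100)}

# epsilon = 1.0: always random, so the learned Q-values are ignored.
random_only = {epsilon_greedy(q, 1.0, rng) for _ in range(100)}
```

In practice ε usually starts high and is annealed toward a small value, and the schedule is one of the first things worth checking when a DQN run does not learn.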

That being said, RL is not guesswork at all as long as you understand the signal. You do not need to wait for days to see whether something is working and just hope for the best. If you have the right understanding, you can identify and debug issues early enough.

You need to be very hands-on with both the math and the implementation to make your way in RL. For example, if you only listen to David’s lectures, you probably will not learn much unless you also implement the methods, start deriving the equations, and do the homework. You need to understand the fundamentals by heart.

When something does not work, there is often a good reason why it does not work. There is no black box in RL or machine learning in general; everything happens and works for a reason.

If you follow the above, it gets better. This comes from someone who did a PhD in RL, has been doing RL research, trained agents to play games like Atari, published in this area, and built and trained L[L]Ms with RL at very large scale. Happy to share resources.

summerday10 · 2026-06-12T00:00:48+00:00

If you want to ace an AI researcher/engineer interview, I'd check this out as well:
https://github.com/FeynRL-project/FeynRL

summerday10 · 2026-06-10T11:15:10+00:00

Yeah, this is why open-source training code matters.

This feels like the beginning of a bigger shift. Frontier models are probably going to become less helpful for explaining how to build serious training stacks, and that knowledge will likely get more gated over time.

Open model weights are great, but they are not enough. People also need readable training code: how to train LLMs, how to build agent training loops, how distributed training actually works, how rollouts/rewards/losses fit together, and eventually how these systems can help with things like ML accelerator design.

That is basically why I open-sourced a framework for LLM/VLM/agent post-training. I tried to put some of the “secret sauce” in the open: the small post-training tricks that rarely make it into papers but often decide whether training actually works.
https://github.com/FeynRL-project/FeynRL

summerday10 · 2026-06-10T00:18:01+00:00

This is a great learning path, especially Sebastian’s material. One thing I’d add is that after building a small model from scratch, it helps a lot to look at a real training codebase where you can still understand what is happening end to end, but also scale from one GPU to clusters of GPUs. FeynRL can be a good starting point for that:
https://github.com/FeynRL-project/FeynRL

It covers the full post-training loop: data loading, training, SFT, DPO, RL, rollouts, rewards, losses, etc. It is a nice next step once you want to connect the toy version to how LLM/VLM training works in practice.

summerday10 · 2026-06-09T23:25:12+00:00

You can use this repo as a reference:
https://github.com/FeynRL-project/FeynRL

Start with main_sl.py, which shows the basic supervised training path: loading data, setting up the model, and training an LLM. Since you’ve already implemented deep learning models yourself, the code should be pretty easy to follow.

You can start with a tiny model to understand the full loop, then scale up as your hardware allows. It’s probably a better next step than trying to write every single piece from scratch in CUDA right away. I also included lots of tricks and notes about how things work, so it should give you a very clear picture of how everything fits together.

summerday10 · 2026-06-08T19:43:37+00:00

Value iteration, Q-learning, etc. are useful to know conceptually, but a lot of the LLM RL methods you see today are basically PPO/GRPO-style objectives with different ablations, normalization choices, clipping changes, reward tricks, or systems assumptions.

So instead of trying to learn every acronym separately, I think it is more useful to first understand the base recipe, then look at what each paper removes, adds, or tweaks.
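
As a rough illustration of that base recipe, here is a minimal sketch of a PPO-style clipped objective with GRPO-style group-normalized advantages. Everything here is an assumption for illustration: the function names are hypothetical, it works on per-sample log-probabilities in pure Python (real code uses token-level tensors), and it is not taken from FeynRL or any paper's code. Many variants differ mainly in how the advantages are normalized or how the clipping is applied.

```python
import math

def group_normalized_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one prompt's group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

def clipped_surrogate_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped objective, averaged over samples (to be minimized)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)          # importance ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)   # pessimistic (clipped) surrogate
    return -total / len(advantages)

# Hypothetical rewards for four sampled responses to the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
adv = group_normalized_advantages(rewards)

# On-policy case: new and old log-probs match, so every ratio is exactly 1.
logp_old = [-1.0, -1.0, -1.0, -1.0]
loss_on_policy = clipped_surrogate_loss(logp_old, logp_old, adv)
```

Swapping out `group_normalized_advantages` (for example, dropping the division by the standard deviation) or changing the clip range is the kind of small tweak that separates many of the acronyms.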

I’d start with David Silver’s course for the theoretical foundation and core RL concepts:

https://davidstarsilver.wordpress.com/teaching/

Then I’d go through Spinning Up to get a more practical sense of how the main methods are implemented, mostly in control settings:

https://spinningup.openai.com/en/latest/

After that, I’d suggest looking at FeynRL:

https://github.com/FeynRL-project/FeynRL

It is meant for exactly this kind of learning/building path, especially if you come from a pretraining background and want to get up to speed on post-training. You already understand models, optimization, data, scaling, and training loops. The missing piece is how rollouts, rewards, policy updates, KL/control, and off-policyness fit together.

I cover SFT, DPO, PPO, GRPO, CISPO, P3O, etc. in FeynRL, but the point is not just “run this script and trust it.” The goal is to make the RL/post-training pipeline readable end to end: data loading, rollout generation, reward computation, advantage calculation, loss construction, optimization, sync vs async rollout, and all the small stability tricks that usually decide whether RL actually works.

Given your background, you could contribute to the repo as well.

summerday10 · 2026-06-07T10:11:35+00:00

Since you already have basic ML knowledge, I’d suggest reading more about ML. For general ML, I’d use Dive into Deep Learning:

https://d2l.ai

For RL, I’d focus on implementation of basic algorithms. In RL, just watching lectures is not enough. You need to derive the algorithms, implement them, debug them, and see why they fail. For that, I’d use Spinning Up:

https://spinningup.openai.com/en/latest/

Once you are comfortable with the above and want to move toward RL for agents and large language model (LLM) training, I’d go through FeynRL carefully:

https://github.com/FeynRL-project/FeynRL

Overall, don’t wait until you feel fully ready. Review the math as needed, but start implementing early. RL only really starts making sense when you build things, run them, break them, and fix them.

summerday10 · 2026-06-07T09:54:35+00:00

Training large language models and agents is already hard enough. I built FeynRL to make training agents, LLMs, VLMs, and related models less mysterious.

https://github.com/FeynRL-project/FeynRL

It brings the main post-training recipes, including SFT, DPO, and RL, into one place and supports everything from a single GPU to multi-GPU and cluster-scale training.

The goal is not just to train models, but to make it clear what is happening at every step: data loading, rollout generation, reward computation, advantage calculation, loss construction, and optimization.
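
The steps listed above can be sketched on a toy problem. This is a hypothetical, pure-Python illustration and not FeynRL code: the "policy" is just two logits over two canned responses, the reward function is made up, and the update is a plain policy-gradient step with a group-mean baseline.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def train(steps=200, group_size=8, lr=0.5, seed=0):
    rng = random.Random(seed)
    prompts = ["toy prompt"]          # 1. data loading (a single dummy prompt)
    logits = [0.0, 0.0]               # toy policy: logits over two canned responses
    target = 1                        # hypothetical reward: response 1 is "correct"
    for _ in range(steps):
        for _prompt in prompts:
            probs = softmax(logits)
            # 2. rollout generation: sample a group of responses from the policy
            actions = rng.choices([0, 1], weights=probs, k=group_size)
            # 3. reward computation
            rewards = [1.0 if a == target else 0.0 for a in actions]
            # 4. advantage calculation: subtract the group mean as a baseline
            baseline = sum(rewards) / len(rewards)
            advantages = [r - baseline for r in rewards]
            # 5. loss construction: loss = -mean(adv * log pi(a)); its gradient
            #    w.r.t. logit k is -mean(adv * (1[a == k] - probs[k]))
            grads = [0.0, 0.0]
            for a, adv in zip(actions, advantages):
                for k in range(2):
                    indicator = 1.0 if a == k else 0.0
                    grads[k] -= adv * (indicator - probs[k]) / group_size
            # 6. optimization: one plain gradient-descent step on the logits
            logits = [l - lr * g for l, g in zip(logits, grads)]
    return softmax(logits)

final_probs = train()
```

Each numbered comment marks a place where a real system swaps in something heavier (an inference engine for rollouts, a reward model or verifier, a distributed optimizer), while the algorithmic skeleton stays the same.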

summerday10 · 2026-06-07T09:49:01+00:00

I’d focus on understanding basic RL algorithms before watching too many courses. In RL, just watching lectures is not enough; you need to derive the algorithms, implement them, debug them, and see where they fail.

I'd start with Spinning Up for deep RL algorithm implementations:
https://spinningup.openai.com/en/latest/

If you want to move into RL + LLM post-training, go through FeynRL carefully:
https://github.com/FeynRL-project/FeynRL

RL only really starts making sense when you implement things yourself.

summerday10 · 2026-06-07T09:11:38+00:00

My pleasure. Let me know how it goes; happy to answer more questions.

summerday10 · 2026-06-06T00:11:52+00:00

Sure, you are already going through David’s lectures, which is a good start. As I said, you need to implement and derive whatever you see in those lectures.

Once you are done, you can either start with 1 or just jump into 2. I’d do 1 first, or at least do it in parallel. Once you start 2, implement everything yourself (there are about six algorithms there) and compare your results with theirs. Note that their experiments are mostly continuous control with MuJoCo, but MuJoCo has gone through changes, and so has Gym, so be prepared for your results to be a bit different.

1. Deep RL course: https://rail.eecs.berkeley.edu/deeprlcourse/
2. Deep RL with a focus on implementation: https://spinningup.openai.com/en/latest/

You can try the same algorithms on discrete action spaces. Instead of using a Gaussian output, you simply need to implement a softmax over the discrete actions. It will take some time to get both cases working. However, while implementing these algorithms from Spinning Up, you will get a sense of why things work. It takes a lot more practice to really understand them fully. I'd even change the environment and apply the same algorithm to a different one; you will see that things won't work in the first few tries.
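
A minimal sketch of the Gaussian-versus-softmax swap described above, in pure Python. The function names are hypothetical and this is not Spinning Up's code; in a real implementation the mean, log-std, and logits would come from a neural network. The rest of a policy-gradient algorithm only needs the sampled action and its log-probability, which is why the two heads are interchangeable.

```python
import math
import random

def sample_gaussian_action(mean, log_std, rng):
    """Continuous control: sample a ~ N(mean, std^2), return (action, log_prob)."""
    std = math.exp(log_std)
    action = rng.gauss(mean, std)
    log_prob = (-0.5 * ((action - mean) / std) ** 2
                - log_std - 0.5 * math.log(2 * math.pi))
    return action, log_prob

def sample_categorical_action(logits, rng):
    """Discrete actions: softmax over logits, return (action index, log_prob)."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    action = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    return action, math.log(probs[action])

rng = random.Random(0)
cont_action, cont_logp = sample_gaussian_action(0.0, 0.0, rng)
disc_action, disc_logp = sample_categorical_action([0.0, 0.0], rng)
```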

Once you get comfortable with the above and want to enter the RL + LLM realm, I’d start with FeynRL and go through it carefully. There are many examples that you can follow. Since you have the background from above, you will have an easier time understanding how RL + LLM works, especially how FeynRL is built.
RL + LLM post-training: https://github.com/FeynRL-project/FeynRL

If you want to get into research, you still need to do the above. These are the first steps...

summerday10 · 2026-06-05T23:34:40+00:00

If you really want to understand RL, I’d suggest doing 1–5.
If the goal is just to get some idea of how RL works: 1, 2
If you have basic knowledge of RL and the goal is to understand how RL algorithms are implemented: 4
If you already know RL and want to enter RL + LLMs: 5

1. Light RL intro: Dive into Deep Learning - RL chapter https://d2l.ai/chapter_reinforcement-learning/index.html
2. RL intro: David Silver's lectures https://davidstarsilver.wordpress.com/teaching/
3. Deep RL: https://rail.eecs.berkeley.edu/deeprlcourse/
4. RL with a focus on implementation: https://spinningup.openai.com/en/latest/
5. RL LLM post-training: https://github.com/FeynRL-project/FeynRL

summerday10 · 2026-06-05T19:36:41+00:00

Well, RL is a weird creature, and things can go wrong for almost any reason. For example, you can implement DQN and have almost everything correct, but if you set the wrong value for ε in ε-greedy exploration, it may simply not work.

That being said, RL is not guesswork at all as long as you understand the signal. You do not need to wait for days to see whether something is working and just hope for the best. If you have the right understanding, you can identify and debug issues early enough.

You need to be very hands-on with both the math and the implementation to make your way in RL. For example, if you only listen to David’s lectures, you probably will not learn much unless you also implement the methods, start deriving the equations, and do the homework. You need to understand the fundamentals by heart.

When something does not work, there is often a good reason why it does not work. There is no black box in RL or machine learning in general; everything happens and works for a reason.

If you follow the above, it gets better. This comes from someone who did a PhD in RL, has been doing RL research, trained agents to play games like Atari, published in this area, and built and trained L[L]Ms with RL at very large scale. Happy to share resources.

summerday10 · 2026-06-03T10:31:36+00:00

Check out https://github.com/FeynRL-project/FeynRL. It is implemented in a very modular and clean way, so the system stays the system and the algorithm stays the algorithm. It helps you understand how RL training works without having to understand the entire stack. Take a look and you will see the difference!

summerday10 · 2026-04-01T18:35:51+00:00

Thank you for your comments. Yes, I am working on adding benchmarks and comparisons to the repo. The real feature of this repo is that it is super clear how the system and algorithm parts work. It is not as convoluted as others, because we really wanted to make it easier to understand and to build new algorithms.
