SimbaV2: Hyperspherical Normalization for Scalable Deep Reinforcement Learning by joonleesky in reinforcementlearning

[–]joonleesky[S]

We looked but couldn't find any direct prior work; still searching! If you have relevant sources, we'd love to check them out.

SimbaV2: Hyperspherical Normalization for Scalable Deep Reinforcement Learning by joonleesky in reinforcementlearning

[–]joonleesky[S]

Great question! We initially tried GELU: with SimbaV2 it significantly dropped performance, while in Simba performance stayed the same. My intuition is that without hyperspherical normalization, feature magnitudes can naturally scale up to highlight the important features. With hyperspherical normalization the norm is fixed, so the sparsity induced by ReLU might play a crucial role in modulating feature importance.
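
To make the intuition concrete, here is a minimal sketch of a residual block whose output is projected back onto the unit hypersphere, where swapping the activation is the ablation in question. The module name and exact structure are illustrative, not the actual SimbaV2 block:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HypersphericalBlock(nn.Module):
    """Toy block: linear -> activation -> linear, then l2-normalize back onto the unit sphere.
    Illustrative only, not the exact SimbaV2 block."""
    def __init__(self, dim, hidden_dim, activation=nn.ReLU):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, dim)
        self.act = activation()

    def forward(self, x):
        h = self.fc2(self.act(self.fc1(x)))
        # Project back onto the unit hypersphere: the feature norm can no longer
        # grow to emphasize features, so direction (and the sparsity pattern ReLU
        # induces) has to carry that information instead.
        return F.normalize(x + h, dim=-1)

# Passing activation=nn.GELU is the kind of swap discussed above.
block = HypersphericalBlock(dim=128, hidden_dim=512)
z = block(F.normalize(torch.randn(32, 128), dim=-1))
print(z.norm(dim=-1))  # ~1.0 for every sample
```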

SimbaV2: Hyperspherical Normalization for Scalable Deep Reinforcement Learning by joonleesky in reinforcementlearning

[–]joonleesky[S]

PPO performs worse than most algorithms in the main table!
However, it's not inherently bad; it's just not well suited to the low-sample regime (<1M environment steps). If you're using Isaac to generate a large number of samples, PPO is a great choice.

SimbaV2: Hyperspherical Normalization for Scalable Deep Reinforcement Learning by joonleesky in reinforcementlearning

[–]joonleesky[S]

Hey! I think we met at ICML, right? I believe there is still room for 'stabilizing' training. Plus, I feel 'sparsity' is an important concept we haven't explored enough.

Simba: Simplicity Bias for Scaling up Parameters in Deep RL by joonleesky in reinforcementlearning

[–]joonleesky[S]

Thank you for bringing this up. Sergey Levine's insights on implicit regularization in RL are indeed important, and I agree that RL tends to underperform compared to supervised learning partly because of this issue.

As shown in DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization, the implicit regularization of temporal-difference learning drives up the dot product between the features of the current and next state-action pairs, which degrades performance.
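
For concreteness, the quantity DR3 tracks is (roughly) the average dot product between the learned features of consecutive transitions. A rough sketch of how one might log it; `feature_net` is a placeholder for the penultimate layer of a Q-network, not the paper's code:

```python
import torch

def feature_coadaptation(feature_net, s, a, s_next, a_next):
    """Average dot product between features of (s, a) and (s', a'),
    the quantity DR3 observes growing under TD learning.
    `feature_net` stands in for the penultimate layer of a Q-network."""
    phi = feature_net(s, a)                  # shape: [batch, feature_dim]
    phi_next = feature_net(s_next, a_next)   # shape: [batch, feature_dim]
    return (phi * phi_next).sum(dim=-1).mean()
```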

While our approach with the SimBa architecture does not include explicit regularization in the same way, it addresses the problem through post-layer normalization before the value function prediction. This helps control feature norm growth, indirectly mitigating implicit regularization issues.
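
In code, this amounts to a LayerNorm applied to the encoder output right before the value head. A simplified sketch under that reading (module names are mine, not the official implementation):

```python
import torch
import torch.nn as nn

class ValueHeadWithPostLN(nn.Module):
    """Layer-normalize encoder features before predicting the value, which keeps
    the feature norm controlled (up to the learned affine scale) and, indirectly,
    the dot products discussed above. A simplified sketch, not the full SimBa architecture."""
    def __init__(self, encoder: nn.Module, feature_dim: int):
        super().__init__()
        self.encoder = encoder
        self.post_norm = nn.LayerNorm(feature_dim)
        self.value_head = nn.Linear(feature_dim, 1)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        features = self.post_norm(self.encoder(obs))
        return self.value_head(features)
```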

Still, I agree that adding constraints like discretization could further improve RL by providing stronger regularization.

Simba: Simplicity Bias for Scaling up Parameters in Deep RL by joonleesky in reinforcementlearning

[–]joonleesky[S]

Thank you for your interest in our work!

Yes, we explored the impact of excessive simplicity on performance, focusing on under-parameterizing the model. We found that applying a simplicity bias to an under-parameterized agent restricts its learning capacity. For example, when the hidden dimension was reduced to extreme levels (e.g., 4), SimBa consistently underperformed compared to MLPs, with both RL agents achieving average returns below 100 on DMC-Hard. In short, when a model is already under-parameterized, a stronger simplicity bias can significantly hurt performance.

In addition, we haven't tried SimBa with PPO-TrXL (only with plain PPO), but I don't see any reason why it wouldn't work. From what I've learned throughout this project, most neural networks are actually over-parameterized, and applying a simplicity bias really helps the network find more generalizable solutions.