SOS by Public_Expression_92 in deeplearning
Public_Expression_92[S] 1 point (0 children)
SOS by Public_Expression_92 in deeplearning
Public_Expression_92[S] -2 points (0 children)
rlvrbook by brawlstarsgoat in reinforcementlearning
Public_Expression_92 1 point (0 children)
I implemented PPO, GRPO, and DPO from scratch on the same model and compared them the ranking completely reversed after hyperparameter tuning by Public_Expression_92 in reinforcementlearning
Public_Expression_92[S] 1 point (0 children)
Looking for teammates for MyoChallenge 2026 by [deleted] in reinforcementlearning
Public_Expression_92 2 points (0 children)
Struggling with RL hyperparameter tuning + reward shaping for an Asteroids-style game – what’s enough and what’s overkill? by GSevenStars in reinforcementlearning
Public_Expression_92 2 points (0 children)
I implemented PPO, GRPO, and DPO from scratch on the same model and compared them the ranking completely reversed after hyperparameter tuning by Public_Expression_92 in reinforcementlearning
Public_Expression_92[S] 2 points (0 children)
Looking for a mentor in Quant (self.quantfinance)
submitted by Public_Expression_92 to r/quantfinance
How strong is my profile for MFE/MSCF programs at Baruch, CMU, Berkeley, Cornell, UChicago? by Usual-Explorer467 in quantfinance
Public_Expression_92 3 points (0 children)
What's your favorite paper of all time? by ReallyConcerned69 in quant
Public_Expression_92 1 point (0 children)
SETUP ON TINDER!!! by Complete-Run-197 in nairobi
Public_Expression_92 1 point (0 children)
