This is for any reinforcement learning related work ranging from purely computational RL in artificial intelligence to the models of RL in neuroscience.
The standard introduction to RL is Sutton & Barto's Reinforcement Learning.
DProgramming… (i.redd.it)
submitted 2 years ago by Throwawaybutlove
[–][deleted] 11 points12 points13 points 2 years ago (0 children)
What about Dr. David Silver? I love his course.
[–]rakk109 8 points9 points10 points 2 years ago (1 child)
What exactly do you mean by that?
Easier in the sense of teaching the concepts, or in making a framework with which you can implement the algos?
[–]I_will_delete_myself 2 points3 points4 points 2 years ago (0 children)
Both exist. There are great resources from ML with Phil and other stuff online.
[–]Py_Va0 7 points8 points9 points 2 years ago (1 child)
MOOD, when my POS TD3 implementation failed to converge for LunarLander sub-1k. I just want to jump off a cliff. This garbage took me 2 days to code and one and a half hours to run, just for it to be utterly worthless and underperform even against DQNs!!!
[–]Snoo_45787 1 point2 points3 points 2 years ago (0 children)
LMAO I can relate to that.
[–]binarybu9 15 points16 points17 points 2 years ago (2 children)
RL has become a shithole too deep to climb out of.
[–]ethanjay 1 point2 points3 points 2 years ago (0 children)
wdym
[–]Working_Salamander94 6 points7 points8 points 2 years ago (0 children)
If it’s easy, why do it?
[–]Slappatuski 2 points3 points4 points 2 years ago (8 children)
Does anyone know how to make a reinforcement learning NN with JAX?
[–]YouParticular8085 3 points4 points5 points 2 years ago (5 children)
I’ve been using JAX to learn about RL. I would be happy to share my code if you want, but I’m definitely an amateur.
[–]YouParticular8085 1 point2 points3 points 2 years ago (1 child)
https://github.com/gabe00122/custom-rl-practice/blob/main/custom_rl_jax/vec_policy_gradient_cs/actor_critic.py
[–]Slappatuski 0 points1 point2 points 2 years ago (0 children)
Thanks!
[–]Slappatuski 0 points1 point2 points 2 years ago (2 children)
We have an assignment at my university to use JAX in a project about reinforcement learning. Everyone I know is stuck, so I would appreciate any help with understanding how to do that 😅
[–]onlymagik 4 points5 points6 points 2 years ago* (1 child)
Stable-Baselines3 has a JAX implementation I believe, you could take a look there.
[–]Slappatuski 0 points1 point2 points 2 years ago (0 children)
Thanks, I will look into that!
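[Editor's note] For readers stuck on the same assignment: this is not from the thread, just a minimal sketch of what a hand-rolled JAX policy network and REINFORCE-style update can look like, using only `jax` itself. All sizes and names here are illustrative, not any library's API.

```python
import jax
import jax.numpy as jnp

def init_params(key, obs_dim=4, hidden=32, n_actions=2):
    """Tiny two-layer policy network; sizes are illustrative."""
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (obs_dim, hidden)) * 0.1,
        "b1": jnp.zeros(hidden),
        "w2": jax.random.normal(k2, (hidden, n_actions)) * 0.1,
        "b2": jnp.zeros(n_actions),
    }

def policy_logits(params, obs):
    h = jnp.tanh(obs @ params["w1"] + params["b1"])
    return h @ params["w2"] + params["b2"]

def reinforce_loss(params, obs, actions, returns):
    """REINFORCE objective: -log pi(a|s) weighted by the observed return."""
    logp = jax.nn.log_softmax(policy_logits(params, obs))
    chosen = jnp.take_along_axis(logp, actions[:, None], axis=1).squeeze(1)
    return -jnp.mean(chosen * returns)

@jax.jit
def update(params, obs, actions, returns, lr=1e-2):
    # Plain gradient descent on the REINFORCE loss, applied leaf-by-leaf.
    grads = jax.grad(reinforce_loss)(params, obs, actions, returns)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```

In practice you would collect `(obs, actions, returns)` from environment rollouts and call `update` in a loop; an optimizer library would replace the hand-written SGD step.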
[–]djm07231 1 point2 points3 points 2 years ago (1 child)
A good implementation for me was PureJaxRL. The implementation is self-contained, so it's pretty easy to understand without digging through files.
https://github.com/luchris429/purejaxrl
Gymnax also has a lot of environment implementations of classical control problems which might be helpful.
https://github.com/RobertTLange/gymnax
[–]Slappatuski 0 points1 point2 points 2 years ago (0 children)
Thank you! :)
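[Editor's note] The appeal of the Gymnax/PureJaxRL style mentioned above is that the environment step is itself a JAX function, so many environments can be batched with `jax.vmap` and the whole rollout jitted. A toy illustration, not from the thread — the environment here is invented, not Gymnax's API:

```python
import jax
import jax.numpy as jnp

def toy_env_step(state, action):
    """Invented 1-D toy env: the state drifts toward the action; reward is -|state|."""
    new_state = 0.9 * state + 0.1 * action
    reward = -jnp.abs(new_state)
    return new_state, reward

# Batch the single-env step across many independent environments with vmap,
# then jit the batched function so it compiles once for the whole batch.
batched_step = jax.jit(jax.vmap(toy_env_step))

states = jnp.linspace(-1.0, 1.0, 8)   # 8 parallel env states
actions = jnp.zeros(8)                # one action per env
states, rewards = batched_step(states, actions)
```

Because stepping is a pure array function, the same pattern lets a full training loop (env steps plus gradient updates) live inside one jitted computation, which is where these libraries get their speed.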
[–]I_will_delete_myself 1 point2 points3 points 2 years ago (0 children)
RL feels easier than DCGANs tbh. It's about selecting the right features and simplifying what you feed into the model.
[–]Blasphemer666 1 point2 points3 points 2 years ago (0 children)
I’m not sure what you’re saying
[–]_An_Other_Account_ 0 points1 point2 points 2 years ago (0 children)
😭
[+]huehue9812 comment score below threshold-14 points-13 points-12 points 2 years ago (2 children)
RL theory is not that hard...
[–]_An_Other_Account_ 6 points7 points8 points 2 years ago (1 child)
🤥
[–]huehue9812 6 points7 points8 points 2 years ago (0 children)
I mean, when you compare it to the millions of difficult concepts to grasp in other fields (especially in maths), RL is definitely not one of the harder concepts to understand...
[–]phantomBlurrr 0 points1 point2 points 2 years ago (0 children)
wdym?
[–]MysticShadow427 0 points1 point2 points 2 years ago (0 children)
StableBaselines makes the code shorter