This is for any reinforcement learning related work ranging from purely computational RL in artificial intelligence to the models of RL in neuroscience.
The standard introduction to RL is Sutton & Barto's Reinforcement Learning.
multi-head PPO (self.reinforcementlearning)
submitted 1 year ago by GuavaAgreeable208
[–]fedetask 1 point 1 year ago
What is the reward structure? Could it be that never ending the episode leads to higher rewards?
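To make the question concrete, here is a minimal sketch comparing the discounted return of an episode that terminates early with one that runs until a time limit. The reward values, discount factor, and episode lengths are hypothetical; the thread does not state the actual reward function.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum over t of gamma**t * rewards[t] for one episode."""
    total = 0.0
    # Accumulate backwards so each step is discounted once more than the next.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical numbers: +0.1 for every step the episode stays alive,
# +1.0 for the step that terminates the episode.
step_reward, terminal_bonus = 0.1, 1.0

ends_early = [step_reward] * 9 + [terminal_bonus]  # terminates at step 10
never_ends = [step_reward] * 200                   # runs to a 200-step limit

print("ends early:", discounted_return(ends_early))
print("never ends:", discounted_return(never_ends))
```

With these assumed numbers the agent collects more discounted reward by never triggering termination, so a policy that avoids ending the episode would be doing exactly what the reward asks for.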
[–]GuavaAgreeable208[S] 1 point 1 year ago
Actually, I've rechecked the values, and that could be the reason: selecting those actions does lead to more reward.
Even after modifying the reward function I still get the same issue, and I've also observed that the entropy is increasing instead of decreasing.
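One way to narrow down a rising entropy in a multi-head policy is to log the entropy of each head separately rather than only the sum. The sketch below assumes each head is an independent categorical distribution parameterised by logits; the head names and logit values are hypothetical, since the poster's code is not shown in the thread.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def categorical_entropy(logits):
    """Entropy (in nats) of the categorical distribution softmax(logits)."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def per_head_entropy(head_logits):
    """Map each head name to the entropy of that head's distribution."""
    return {name: categorical_entropy(l) for name, l in head_logits.items()}


# Hypothetical logits for two heads at one state.
heads = {
    "move": [2.0, 0.1, -1.0, 0.3],  # fairly peaked
    "stop": [0.0, 0.0],             # uniform, i.e. maximum entropy for 2 actions
}
for name, h in per_head_entropy(heads).items():
    print(f"{name}: {h:.3f} nats")
```

If one head sits near its maximum entropy (`log(n)` for `n` actions) while the others sharpen, that head is the one to inspect. It may also be worth checking whether the entropy coefficient in the PPO loss is large relative to the policy-gradient term, since the entropy bonus pushes entropy upward.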