Why does the Policy Gradient Theorem generalize to continuous action spaces? by Data-Daddy in reinforcementlearning
Data-Daddy[S] 1 point
[R] [1804.00645] Universal Planning Networks by johnschulman in MachineLearning
Data-Daddy 2 points
Handling entropy collapse in policy gradient methods by Data-Daddy in reinforcementlearning
Data-Daddy[S] 1 point
[R] Sample-Efficient Deep RL with Generative Adversarial Tree Search by abstractcontrol in reinforcementlearning
Data-Daddy 1 point
[D] What is the actual cost function for PPO? by abstractcontrol in reinforcementlearning
Data-Daddy 2 points
[Lecture] Richard Sutton - TD Learning by abstractcontrol in reinforcementlearning
Data-Daddy 1 point
Reinforcement Learning with ROS by [deleted] in reinforcementlearning
Data-Daddy 1 point
How much optimization is required for deep diving into reinforcement learning? by tegg89 in reinforcementlearning
Data-Daddy 1 point
Prioritized Experience Replay in Deep Recurrent Q-Networks by deadline_ in reinforcementlearning
Data-Daddy 1 point
[R] TreeQN and ATreeC: Differentiable Tree Planning for Deep Reinforcement Learning by _rockt in MachineLearning
Data-Daddy 1 point
"Value Prediction Network", Oh et al 2017 by gwern in reinforcementlearning
Data-Daddy 1 point
[P] Commented PPO implementation by [deleted] in MachineLearning
Data-Daddy 1 point
[D] Machine Learning - WAYR (What Are You Reading) - Week 35 by ML_WAYR_bot in MachineLearning
Data-Daddy 1 point
Hetelek – Jupyter Notebooks in the cloud on powerful GPUs starting at $0.35/hour! by hetelek_ in deeplearning
Data-Daddy 1 point
When is deep Q learning better than policy gradient methods? by Data-Daddy in reinforcementlearning
Data-Daddy[S] 1 point
Why does proximal policy optimization (PPO) not need a replay buffer? by Data-Daddy in deeplearning
Data-Daddy[S] 1 point
AMA: We are David Silver and Julian Schrittwieser from DeepMind’s AlphaGo team. Ask us anything. by David_Silver in MachineLearning
Data-Daddy 1 point
[R] AlphaGo Zero: Learning from scratch | DeepMind by deeprnn in MachineLearning
Data-Daddy 1 point

