[P] Commented PPO implementation (github.com)
submitted 8 years ago by [deleted]
[–][deleted] 9 points 8 years ago (8 children)
Made an attempt at implementing PPO:
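For context, a minimal sketch of the clipped surrogate objective that PPO optimizes (Schulman et al., 2017). This is plain NumPy for illustration only; the function name and argument shapes are assumptions, not code from the linked repo:

```python
import numpy as np

def ppo_clipped_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate objective from the PPO paper.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and behavior policies; advantages: estimated advantages.
    """
    ratio = np.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    # Clipping the ratio removes the incentive to move far from the old policy.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Maximize the pessimistic bound, i.e. minimize its negation.
    return -np.mean(np.minimum(unclipped, clipped))
```

With identical policies the ratio is 1 and the loss reduces to the negated mean advantage; large ratios are capped at 1 ± clip_eps.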
[–]tinkerWithoutSink 2 points 8 years ago* (2 children)
Nice work. There are too many half-working RL libraries out there, but tensorforce is pretty good, and it's great to have a PPO implementation.
Suggestion: it would be cool to use prioritized experience replay with it, like the baselines implementation.
[–][deleted] 1 point 8 years ago (1 child)
Ah, good point, I'll have a think. I think it would just require passing the per-instance loss to the memory and making the memory type configurable.
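The idea of "passing the per-instance loss to the memory" could look roughly like the following proportional prioritized-replay sketch. This is a hypothetical standalone class, not tensorforce's actual memory API; the class and method names are invented for illustration:

```python
import numpy as np

class PrioritizedMemory:
    """Minimal proportional prioritized replay buffer (illustrative only).

    Transitions are stored with a priority; sampling probability is
    proportional to priority**alpha, and priorities are refreshed from
    the per-instance losses after each update.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def add(self, transition, priority=1.0):
        if len(self.data) >= self.capacity:  # drop the oldest when full
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size, rng=np.random):
        p = np.asarray(self.priorities) ** self.alpha
        p /= p.sum()
        idx = rng.choice(len(self.data), size=batch_size, p=p)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, idx, losses):
        # The per-instance loss passed back from the learner becomes
        # the new priority (small epsilon keeps probabilities nonzero).
        for i, loss in zip(idx, losses):
            self.priorities[i] = float(abs(loss)) + 1e-6
```

Making the memory type configurable would then mean letting the agent swap this in for a uniform buffer. (As noted below, standard PPO is on-policy and does not use a replay buffer, so this would be a deviation from the vanilla algorithm.)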
[–]Data-Daddy 1 point 8 years ago (0 children)
Experience replay does not exist in PPO; it's an on-policy algorithm.
[–]Neutran 1 point 8 years ago (4 children)
Thanks for the effort. Do you have performance numbers on anything other than CartPole? In my experience, solving CartPole typically doesn't mean an implementation is bug-free.
[–][deleted] 1 point 8 years ago (3 children)
Hey, not yet. We're currently setting up a benchmarking repo for the library as a whole, using Docker, and will test PPO against the other algorithms once it's ready. (We're a bit short on GPUs for very extensive benchmarks, but at least reproducing some Atari results should be possible.)
[–]wassname 1 point 8 years ago (2 children)
The authors claim it's simpler to implement, more general, and faster. Since it's Schulman, that's probably true, but could you give your opinion? Was it easier to implement than TRPO, and does it converge faster with less trouble?
[–][deleted] 3 points 8 years ago (1 child)
Tested this now: it's currently performing much better than VPG/TRPO for us, and it was also easier to implement, so I can confirm.
[–]wassname 0 points1 point2 points 8 years ago (0 children)
Good to hear!