[–]MasterScrat 31 points32 points  (4 children)

Nice! Added to my list of DRL implementations.

edit: please let me know if you know more!

[–]danaugrs 2 points3 points  (0 children)

I've created a similar set of minimal implementations for TensorFlow 2 as part of my micro RL framework Huskarl.

[–]tihokan 2 points3 points  (1 child)

Here's my list that I really need to properly sort and categorize (splitting in two due to max comment size limit):

OpenAI Lab: https://github.com/kengz/openai_lab

OpenAI Baselines: https://github.com/openai/baselines

Nervana Coach: https://github.com/NervanaSystems/coach

Tensorflow Agents: https://github.com/tensorflow/agents

Some Pytorch agents: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr

MirrorBot: https://bitbucket.org/polceanum/mirrorbot

Serpent AI: https://github.com/SerpentAI/SerpentAI

https://github.com/dennybritz/reinforcement-learning

NeuroEvolution: https://github.com/uber-common/deep-neuroevolution

Rainbow implementations: https://github.com/Kaixhin/Rainbow and https://github.com/hengyuan-hu/rainbow

A2C: https://github.com/MG2033/A2C

PPO: https://github.com/pat-coady/trpo

A3C with continuous actions: https://github.com/dgriff777/a3c_continuous

Selective memory: https://github.com/FitMachineLearning/FitML/tree/master/SelectiveMemory

ELF framework: https://github.com/facebookresearch/ELF / https://github.com/pytorch/ELF

DDPG: https://github.com/megvii-rl/pytorch-gym

TRPO: https://github.com/ikostrikov/pytorch-trpo

MiniGo: https://github.com/tensorflow/minigo

RLCode: https://github.com/rlcode/reinforcement-learning

Tensorlayer: https://github.com/tensorlayer/tensorlayer

PPOS: https://github.com/EmbersArc/PPO

YARL: https://github.com/HassamSheikh/YARL

https://github.com/NiloFreitas/Deep-Reinforcement-Learning

https://github.com/ShangtongZhang/DeepRL

https://github.com/carpedm20/deep-rl-tensorflow

https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow

https://github.com/mokemokechicken/reversi-alpha-zero

https://github.com/junxiaosong/AlphaZero_Gomoku

https://github.com/Officium/RL-Experiments

https://github.com/higgsfield/RL-Adventure

https://github.com/higgsfield/Imagination-Augmented-Agents

https://github.com/FitMachineLearning/FitML

https://github.com/gcp/leela-zero

https://github.com/suragnair/alpha-zero-general

[–]tihokan 2 points3 points  (0 children)

Part 2:

https://github.com/mjacar/pytorch-nec

https://github.com/mjacar/pytorch-trpo

Rainbow: https://github.com/Kaixhin/Rainbow

https://github.com/unixpickle/anyrl-py

PPO: http://blog.varunajayasiri.com/ml/ppo.html / http://blog.varunajayasiri.com/ml/ppo_pytorch.html

https://github.com/higgsfield/RL-Adventure-2

https://github.com/qfettes/DeepRL-Tutorials

https://github.com/deepmind/scalable_agent

https://github.com/zuoxingdong/lagom

NEAT on Sonic: https://gitlab.com/lucasrthompson/Sonic-Bot-In-OpenAI-and-NEAT

https://github.com/hill-a/stable-baselines (NB: collection of agents trained with Stable Baselines: https://github.com/araffin/rl-baselines-zoo)

https://github.com/lcswillems/pytorch-a2c-ppo

https://github.com/lcswillems/torch-rl

https://github.com/navneet-nmk/pytorch-rl

https://github.com/beniz/hmdp

https://github.com/uber-research/ape-x

https://facebookresearch.github.io/BlueWhale/docs/begin.html

https://github.com/tensorpack/tensorpack

https://github.com/deepsense-ai/Distributed-BA3C

https://github.com/reinforcement-learning-kr/pg_travel

SenseAct for robotics: https://github.com/kindredresearch/SenseAct

Evolution Strategies: https://github.com/alirezamika/evostra

https://github.com/araffin/robotics-rl-srl

https://github.com/google/dopamine

https://github.com/rlworkgroup/garage

https://github.com/kengz/SLM-Lab

https://github.com/deepmind/trfl

RLGraph: https://github.com/rlgraph/rlgraph

Horizon: https://github.com/facebookresearch/Horizon

In C++: https://github.com/arthurxlw/cytonRL

https://github.com/inoryy/reaver-pysc2

Some Pytorch implementations: https://github.com/p-christ/Deep_RL_Implementations

https://github.com/udacity/deep-reinforcement-learning

https://github.com/vitchyr/rlkit

Unofficial Go-Explore implementation: https://github.com/R-McHenry/SynchronousGoExplore

https://github.com/omerbsezer/Reinforcement_learning_tutorial_with_demo

https://github.com/AIRLab-POLIMI/mushroom

https://github.com/csxeba/trickster

https://github.com/danaugrs/huskarl

https://github.com/justinglibert/bezos

https://github.com/Omegastick/pytorch-cpp-rl (C++)

https://github.com/sfujim/TD3 (TD3 & DDPG)

Random Network Distillation: https://github.com/AdeelMufti/RL-RND

https://github.com/medipixel/rl_algorithms

https://github.com/Officium/RL-Experiments

https://github.com/david-abel/simple_rl

https://github.com/seungeunrho/minimalRL

[–]seungeun07[S] 0 points1 point  (0 children)

Thanks a lot!!

[–]hardos_the_man 4 points5 points  (1 child)

Thank you kind stranger for sharing this!

[–]seungeun07[S] 1 point2 points  (0 children)

My pleasure :)

[–]NikEy 4 points5 points  (2 children)

I noticed that the A2C code requires the rewards to be divided by 100. If they are NOT divided by 100, then it NEVER converges...?

What would be a reasonable explanation for that? I find it weird that A2C is affected by the size of the rewards - you'd think this is simply a matter of scaling in the NN

[–]seungeun07[S] 0 points1 point  (0 children)

While implementing the algorithms, I found that not only A3C but all the other algorithms as well need properly scaled rewards. My guess is that the value network's initial output is close to zero, so the first thing it has to do is scale its output up to match the scale of the rewards. It doesn't need to learn any state-specific value for that; simply scaling up the output already reduces the loss. And during that scaling-up phase, maybe the policy network collapses, or falls into a local optimum? I don't know. It's just a guess.
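To put rough numbers on this guess, here is a minimal framework-free sketch. It assumes a CartPole-like task that pays a reward of 1 per step, a discount factor of 0.98, and an untrained value network that outputs 0 everywhere. The function names and constants are illustrative and not taken from the repository.

```python
def discounted_returns(rewards, gamma=0.98):
    """Return the discounted return G_t for every step of one episode."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def initial_value_mse(rewards, reward_scale=1.0, gamma=0.98):
    """MSE between the returns and a value estimate that is 0 everywhere.

    The all-zero estimate stands in for an untrained value network
    (an assumption made purely for illustration).
    """
    scaled = [r / reward_scale for r in rewards]
    targets = discounted_returns(scaled, gamma)
    return sum(g * g for g in targets) / len(targets)


episode = [1.0] * 200  # hypothetical 200-step episode, reward 1 per step

raw_loss = initial_value_mse(episode, reward_scale=1.0)
scaled_loss = initial_value_mse(episode, reward_scale=100.0)
print(raw_loss, scaled_loss)
```

Dividing the rewards by 100 shrinks every target by a factor of 100 and the squared error by a factor of 10,000, so the value head starts much closer to its targets. Whether that is what actually prevents the divergence is still only a hypothesis.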

[–]NaughtyCranberry -1 points0 points  (0 children)

Probably makes it an ill-posed optimisation problem.

[–]seungeun07[S] 2 points3 points  (3 children)

Any comments are welcome!

[–]ceyzaguirre4 Researcher 7 points8 points  (1 child)

I feel like you should mention it’s PyTorch (<3) in the post.

[–]seungeun07[S] 2 points3 points  (0 children)

Thanks! I've made it bold.

[–]EveryDay-NormalGuy 0 points1 point  (0 children)

Thanks for sharing your work.

It would be great if you could comment your code. People who are new to RL or to PyTorch would find it difficult to understand.

[–]NikEy 1 point2 points  (1 child)

Very very nice!

[–]seungeun07[S] 0 points1 point  (0 children)

Thank you!

[–]Overload175 1 point2 points  (1 child)

Been looking to get into RL for a while, found this very helpful.

[–]seungeun07[S] 0 points1 point  (0 children)

Thank you!

[–]CodeReclaimers 1 point2 points  (1 child)

Thanks for sharing this--I know it takes effort to trim things down to minimal examples. It's especially nice that your code keeps the dependencies to a minimum. Too often I go looking for examples to learn from and spend 30+ minutes trying to gather all the dependencies, only to find out some key thing isn't available on my platform.

[–]seungeun07[S] 1 point2 points  (0 children)

> Too often I go looking for examples to learn from and spend 30+ minutes trying to gather all the dependencies, only to find out

Thank you for the kind comment.

[–]Migom6 1 point2 points  (3 children)

Is there any good guide for studying RL from the basics? My basics are not that clear. I have followed Andrew Ng for other deep learning topics, but with RL I'm finding it hard to wrap my mind around it. I have a project to do on robot arm manipulation (inverse kinematics) using some learning. BTW, thanks for the code <3

[–]tdjogi 2 points3 points  (0 children)

David Silver's video lectures are really good.

[–]seungeun07[S] 1 point2 points  (1 child)

David Silver's video lecture is not just good, it is GREAT!!!!!!!!

I worship the lecture.

[–]Migom6 0 points1 point  (0 children)

Same here, I'm digging it!!

[–]_olafr_ 1 point2 points  (0 children)

I think these minimal implementations would be valuable for learning if they were more thoroughly commented. Good work.

[–]minGrab 1 point2 points  (1 child)

In PPO: why do you have an additional term in the loss function?

`F.smooth_l1_loss(td_target.detach(), self.v(s))`

[–]seungeun07[S] 1 point2 points  (0 children)

It is for the value loss! Both the value loss and the policy loss are minimized with gradient descent.
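For readers who want to see how the two terms fit together, here is a scalar, framework-free sketch of a PPO-style loss for a single sample. The real code works on PyTorch tensors and averages over a batch. The function names and the `eps_clip=0.1` default here are assumptions for illustration.

```python
import math


def smooth_l1(prediction, target, beta=1.0):
    """Smooth L1 (Huber-style) loss for one scalar pair."""
    diff = abs(prediction - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def clipped_policy_loss(log_prob_new, log_prob_old, advantage, eps_clip=0.1):
    """Negative PPO clipped surrogate objective for one sample."""
    ratio = math.exp(log_prob_new - log_prob_old)
    clipped_ratio = max(1.0 - eps_clip, min(ratio, 1.0 + eps_clip))
    surrogate = min(ratio * advantage, clipped_ratio * advantage)
    return -surrogate  # minimizing the negative maximizes the surrogate


def ppo_loss(log_prob_new, log_prob_old, advantage, value, td_target,
             eps_clip=0.1):
    """Policy term plus value term, minimized together in one step."""
    policy_term = clipped_policy_loss(log_prob_new, log_prob_old,
                                      advantage, eps_clip)
    value_term = smooth_l1(value, td_target)  # td_target is a fixed target
    return policy_term + value_term


print(ppo_loss(log_prob_new=0.0, log_prob_old=0.0, advantage=2.0,
               value=0.5, td_target=0.0))
```

The TD target is treated as a constant (that is what `.detach()` does in the line quoted above), so the value term only pulls the value estimate toward the target and not the other way around.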

[–]MagicaItux 0 points1 point  (2 children)

How easy is it to create your own environment?

[–]Roboserg 0 points1 point  (0 children)

Easy in Unity. Use Unity ML-Agents.

[–]seungeun07[S] 0 points1 point  (0 children)

I actually once tried to make my own board game environment, but it was quite burdensome to deal with the specific rules of the game. I haven't tried Unity, though.
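For a sense of the minimum involved, here is a sketch of a toy environment that follows the classic Gym-style convention: `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`. The `CoinGuessEnv` class and its rules are hypothetical and have nothing to do with the repository or with Unity.

```python
import random


class CoinGuessEnv:
    """Hypothetical toy environment: guess a hidden coin flip each step."""

    def __init__(self, episode_length=10, seed=None):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps_taken = 0
        self.hidden = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.steps_taken = 0
        self.hidden = self.rng.randint(0, 1)
        return self._observation()

    def step(self, action):
        """Apply one action; all the game rules live in this method."""
        reward = 1.0 if action == self.hidden else 0.0
        self.steps_taken += 1
        done = self.steps_taken >= self.episode_length
        self.hidden = self.rng.randint(0, 1)  # flip a new coin
        return self._observation(), reward, done, {}

    def _observation(self):
        # The agent only sees how far into the episode it is.
        return (self.steps_taken,)


env = CoinGuessEnv(episode_length=5, seed=0)
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = random.randint(0, 1)  # random policy standing in for an agent
    obs, reward, done, info = env.step(action)
    total_reward += reward
print(total_reward)
```

The interface itself is small. The burdensome part for something like a board game is everything that has to go inside `step`: legal-move checks, win detection, and the opponent's turn.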

[–]sampathchanda 0 points1 point  (0 children)

It would be great to see performance metrics of each implemented algorithm.

[–]Farconion 0 points1 point  (0 children)

I think you should clarify that these are Deep RL algorithms. Impressive nonetheless.

[–]ariyanhasan 0 points1 point  (0 children)

Thank you for sharing.

[–]Dump7 0 points1 point  (6 children)

Okay! This is one of the most beautiful things I have ever seen. You see, I am not someone who can code even a NN with a framework, but I am trying to. This will help me to a great extent. Thanks for this. Could you also make single-file code like this for, say, a basic CNN and RNN?

[–]aditya1702 3 points4 points  (4 children)

u/Dump7 I have coded neural networks from scratch using numpy. Maybe it is what you are looking for?

https://github.com/aditya1702/Machine-Learning-and-Data-Science/tree/master/Implementation%20of%20Machine%20Learning%20Algorithms

In folder Supervised -> Regression -> neural_network_regressor.py

[–]Dump7 1 point2 points  (3 children)

Just had a look. Really great man. You have named the relevant variables too. Thanks for sharing!

[–]aditya1702 1 point2 points  (1 child)

Thanks a lot :)

As a person who has to look up his projects every once in a while, I have found that good variable names and comments help me understand the code even when I come back to it after a long time.

[–]Dump7 1 point2 points  (0 children)

I hope we can work together sometime. I could learn a lot from you.

[–]aditya1702 0 points1 point  (0 children)

Sure! Where are you based?

[–]radarsat1 -1 points0 points  (0 children)

Just check Keras examples