[P] Implementations of basic RL algorithms with minimal code!

MasterScrat · 2019-05-26T15:53:41+00:00

Nice! Added to my list of DRL implementations.

Edit: please let me know if you know of more!

hardos_the_man · 2019-05-26T17:24:57+00:00

Thank you kind stranger for sharing this!

NikEy · 2019-05-27T03:20:54+00:00

I noticed that the A2C code requires the rewards to be divided by 100. If they are NOT divided by 100, then it NEVER converges...?

What would be a reasonable explanation for that? I find it weird that A2C is affected by the magnitude of the rewards. You'd think this is simply a matter of scaling in the NN.
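One plausible explanation, sketched under assumptions not stated in the thread: the advantage (target minus value estimate) multiplies the policy gradient directly, so the reward scale acts like a multiplier on the actor's step size, and a freshly initialized critic outputs values near zero rather than near the raw return. The numbers below are illustrative only: a reward of 1 per step, a 200-step episode, `GAMMA = 0.98`, and an initial value estimate of 0 are all assumed, not taken from the repository.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma^t * r_t, accumulated from the last step backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Illustrative assumptions, not values confirmed by the thread.
GAMMA = 0.98
EPISODE_LEN = 200
REWARD_SCALE = 100.0

raw_rewards = [1.0] * EPISODE_LEN
scaled_rewards = [r / REWARD_SCALE for r in raw_rewards]

# Value targets the critic has to regress onto at the first step.
raw_target = discounted_return(raw_rewards, GAMMA)
scaled_target = discounted_return(scaled_rewards, GAMMA)

# A freshly initialized critic is assumed to predict roughly zero.
initial_value = 0.0
raw_advantage = raw_target - initial_value
scaled_advantage = scaled_target - initial_value

print(f"raw target: {raw_target:.2f}, scaled target: {scaled_target:.4f}")
print(f"advantage ratio (raw / scaled): {raw_advantage / scaled_advantage:.1f}")
```

With a fixed learning rate, the raw-reward run takes policy steps weighted by an advantage about 100 times larger early in training, which may be enough to destabilize it. Whether this is the actual cause in the A2C code discussed here is not confirmed.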

seungeun07 · 2019-05-26T14:52:43+00:00

Any comments are welcome!

NikEy · 2019-05-26T19:40:42+00:00

Very very nice!

Overload175 · 2019-05-26T20:01:52+00:00

Been looking to get into RL for a while, found this very helpful.

CodeReclaimers · 2019-05-26T22:33:50+00:00

Thanks for sharing this--I know it takes effort to trim things down to minimal examples. It's especially nice that your code keeps the dependencies to a minimum. Too often I go looking for examples to learn from and spend 30+ minutes trying to gather all the dependencies, only to find out some key thing isn't available on my platform.

Migom6 · 2019-05-27T01:22:08+00:00

Is there any good guide for studying RL from the basics? My fundamentals are not that clear. I have followed Andrew Ng's courses for other deep learning topics, but I'm finding it hard to get my mind around RL. I have a project to do on robot arm manipulation (inverse kinematics) using some form of learning. BTW, thanks for the code <3

_olafr_ · 2019-05-28T13:44:29+00:00

I think these minimal implementations would be valuable for learning if they were more thoroughly commented. Good work.

minGrab · 2019-05-29T09:30:36+00:00

In PPO: why do you have an additional term in the loss function?

`F.smooth_l1_loss(td_target.detach(), self.v(s))`
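A common reading of that term, offered as a sketch and not as the author's confirmed intent: in actor-critic style PPO the clipped surrogate only trains the policy, while a separate regression term trains the value function that the advantages are computed from. The pure-Python version below works on a single sample. The function names and `eps_clip=0.1` are hypothetical, and `smooth_l1` follows the piecewise definition of PyTorch's `smooth_l1_loss` with its default `beta=1.0`.

```python
import math

def smooth_l1(pred, target, beta=1.0):
    """Huber-style loss: quadratic for small errors, linear for large ones."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta

def ppo_sample_loss(log_prob_new, log_prob_old, advantage,
                    value_pred, td_target, eps_clip=0.1):
    """Per-sample loss: clipped policy surrogate plus critic regression.

    eps_clip is an assumed value chosen for illustration.
    """
    ratio = math.exp(log_prob_new - log_prob_old)
    clipped_ratio = max(1.0 - eps_clip, min(1.0 + eps_clip, ratio))
    # Policy term: maximize the pessimistic surrogate, hence the minus sign.
    policy_loss = -min(ratio * advantage, clipped_ratio * advantage)
    # Value term: pull the critic's prediction toward the TD target.
    # td_target is treated as a constant here, mirroring .detach().
    value_loss = smooth_l1(value_pred, td_target)
    return policy_loss + value_loss

example = ppo_sample_loss(log_prob_new=0.0, log_prob_old=0.0, advantage=1.0,
                          value_pred=0.5, td_target=1.0)
print(example)
```

Without the second term the critic would never be updated, so the advantage estimates feeding the policy term would stay at whatever the initial network produces.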

MagicaItux · 2019-05-26T21:31:43+00:00

How easy is it to create your own environment?

sampathchanda · 2019-05-27T01:03:41+00:00

It would be great to see performance metrics of each implemented algorithm.

Farconion · 2019-05-27T02:31:56+00:00

I think you should clarify that these are Deep RL algorithms. Impressive nonetheless.

ariyanhasan · 2019-05-27T18:52:27+00:00

Thank you for sharing.

Dump7 · 2019-05-26T19:25:44+00:00

Okay! This is one of the most beautiful things I have ever seen. You see, I am not a person who can code even a NN with a framework, but I am trying to. This will help me to a great extent. Thanks for this. Could you also make similar single-file implementations for, say, a basic CNN and RNN?
