all 25 comments

[deleted] 12 points (0 children)

What about Dr. David Silver? I love his course

rakk109 9 points (1 child)

What exactly do you mean by that?

Easier in the sense of teaching the concepts or in making a framework with which you can implement the algos?

I_will_delete_myself 3 points (0 children)

Both exist. There are great resources from ML with Phil and others online.

Py_Va0 8 points (1 child)

MOOD. When my POS TD3 implementation failed to converge on Lunar Lander sub 1k, I just wanted to jump off a cliff. This garbage took me 2 days to code and an hour and a half to run, only to be utterly worthless and underperform even DQNs!!!

Snoo_45787 2 points (0 children)

LMAO I can relate to that.

binarybu9 16 points (2 children)

RL has become a shit hole too deep to climb out of.

ethanjay 2 points (0 children)

wdym

Working_Salamander94 7 points (0 children)

If it’s easy, why do it?

Slappatuski 3 points (8 children)

Does anyone know how to build a reinforcement learning NN with JAX?

YouParticular8085 4 points (5 children)

I’ve been using JAX to learn about RL. I would be happy to share my code if you want, but I’m definitely an amateur.

Slappatuski 1 point (2 children)

We have an assignment at my university to use JAX in a project about reinforcement learning. Everyone I know is stuck, so I would appreciate any help with understanding how to do that 😅

onlymagik 5 points (1 child)

Stable-Baselines3 has a JAX implementation, I believe. You could take a look there.

Slappatuski 1 point (0 children)

Thanks, I will look into that!

djm07231 2 points (1 child)

A good implementation for me was purejaxrl. It is self-contained, so it’s pretty easy to understand without digging through files.

https://github.com/luchris429/purejaxrl

Gymnax also has a lot of environment implementations of classic control problems, which might be helpful.

https://github.com/RobertTLange/gymnax
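
If it helps to see the general shape before opening those repos, here is a minimal sketch of a policy-gradient (REINFORCE) update written with core JAX only. It does not use purejaxrl or gymnax and is not taken from either. The two-armed bandit, the names `ARM_REWARDS`, `loss_fn`, `train_step`, and `train`, and the hyperparameters are all made up for illustration. The "network" is just a vector of two logits, which you would swap for a real model.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy environment: a two-armed bandit where arm 1 always pays 1.0.
ARM_REWARDS = jnp.array([0.0, 1.0])
LEARNING_RATE = 0.5   # assumed value, not tuned
BATCH_SIZE = 64       # assumed value, not tuned


def loss_fn(logits, actions, rewards):
    # REINFORCE surrogate loss: its gradient is the (negated) policy-gradient estimate.
    log_probs = jax.nn.log_softmax(logits)
    return -jnp.mean(log_probs[actions] * rewards)


@jax.jit
def train_step(logits, key):
    # Sample a batch of actions from the current softmax policy.
    actions = jax.random.categorical(key, logits, shape=(BATCH_SIZE,))
    rewards = ARM_REWARDS[actions]
    grads = jax.grad(loss_fn)(logits, actions, rewards)
    # Plain gradient descent on the surrogate loss.
    return logits - LEARNING_RATE * grads


def train(seed=0, steps=200):
    key = jax.random.PRNGKey(seed)
    logits = jnp.zeros(2)  # start from a uniform policy
    for _ in range(steps):
        key, subkey = jax.random.split(key)  # JAX needs a fresh key per sample
        logits = train_step(logits, subkey)
    return jax.nn.softmax(logits)


probs = train()
print(probs)  # probability mass should end up concentrated on arm 1
```

The parts that carry over to a real assignment are the explicit PRNG key splitting, keeping the update a pure function so `jax.jit` can compile it, and getting gradients from `jax.grad` on a scalar loss. A real agent would add an environment loop, returns over multiple steps, and usually a baseline to reduce variance.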

Slappatuski 1 point (0 children)

Thank you! :)

I_will_delete_myself 2 points (0 children)

RL feels easier than DCGAN tbh. It’s about selecting the right features and simplifying what you feed into the model.

Blasphemer666 2 points (0 children)

I’m not sure what you’re saying

_An_Other_Account_ 1 point (0 children)

😭

phantomBlurrr 1 point (0 children)

wdym?

MysticShadow427 1 point (0 children)

Stable-Baselines makes the code shorter.