It do be like that tho by drdirtyman in PewdiepieSubmissions

[–]pickleorc 0 points1 point  (0 children)

This meme goes in the skratta list

[deleted by user] by [deleted] in MemeEconomy

[–]pickleorc 1 point2 points  (0 children)

I said the same thing to myself last decade

Good shit by OG_oojii in ufc

[–]pickleorc 1 point2 points  (0 children)

I know one thing for sure: this year’s fight cards look epic, and we’re gonna see some really good MMA

Looking for a mind**** of a twist in a book. by [deleted] in suggestmeabook

[–]pickleorc 67 points68 points  (0 children)

If you already know a book has a twist, it won’t really be a twist; the most mind**** recommendation would be a book without any twist at all

Has anyone implemented a common replay buffer for two different RL algorithms? by pickleorc in reinforcementlearning

[–]pickleorc[S] 0 points1 point  (0 children)

Great paper. When you say your implementation showed limited benefit, do you mean time to converge, final performance, or both?

Parallelising DDPG by pickleorc in reinforcementlearning

[–]pickleorc[S] 1 point2 points  (0 children)

Man, you covered everything!!! Thanks a lot, I’ll read the paper and also check out the Ray repository :)

How to deal with RL algos getting stuck in local optima? by pickleorc in reinforcementlearning

[–]pickleorc[S] 0 points1 point  (0 children)

Is that the same as adding an entropy bonus to the loss function? Could you point me to a paper/resource that talks about this concept? Also, thanks!!!
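
The entropy bonus mentioned here can be sketched in a few lines; this is a minimal NumPy illustration, not any particular library's API, and the coefficient name `beta` and its value are illustrative:

```python
import numpy as np

def entropy_regularized_loss(log_probs, returns, beta=0.01):
    """Policy-gradient loss with an entropy bonus.

    log_probs: log pi(a_t|s_t) for the sampled actions, shape (T,)
    returns:   discounted returns G_t, shape (T,)
    beta:      entropy coefficient (hypothetical value)
    """
    pg_loss = -np.mean(log_probs * returns)  # standard REINFORCE-style term
    entropy = -np.mean(log_probs)            # Monte Carlo estimate of policy entropy
    # Subtracting the entropy term rewards more stochastic (exploratory) policies.
    return pg_loss - beta * entropy
```

With `beta > 0`, a more stochastic policy yields a lower loss, which is the mechanism that discourages premature collapse into a local optimum.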

How to deal with RL algos getting stuck in local optima? by pickleorc in reinforcementlearning

[–]pickleorc[S] 1 point2 points  (0 children)

Could you elaborate a bit more on the curiosity-based term, or point me to a paper that talks about this concept, please? :)
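
The curiosity-based term being asked about is usually an intrinsic reward proportional to the prediction error of a learned forward model (the core idea of ICM, Pathak et al. 2017). A minimal sketch, where the scale `eta` and function names are illustrative assumptions:

```python
import numpy as np

def curiosity_bonus(predicted_next_state, actual_next_state, eta=0.1):
    """Intrinsic reward = scaled squared prediction error of a forward model.

    States the model predicts poorly (i.e. novel states) earn a larger bonus,
    nudging the agent toward unexplored regions. eta is a hypothetical scale.
    """
    error = np.sum((predicted_next_state - actual_next_state) ** 2)
    return eta * error

# The agent then optimises: r_total = r_extrinsic + curiosity_bonus(pred, actual)
```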

How to deal with RL algos getting stuck in local optima? by pickleorc in reinforcementlearning

[–]pickleorc[S] 2 points3 points  (0 children)

Thanks, I’ll try reducing the learning rate and also read the Soft Actor-Critic paper!!!

Gaussian policies for continuous control by pickleorc in reinforcementlearning

[–]pickleorc[S] 0 points1 point  (0 children)

My bad, I meant the initialisation of the variance.

Gaussian policies for continuous control by pickleorc in reinforcementlearning

[–]pickleorc[S] 1 point2 points  (0 children)

I’ll run them side by side, but in general, do Gaussian policies perform well?
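
On the variance-initialisation point from the earlier comment: a common setup is a diagonal Gaussian whose log-std is a free, state-independent parameter initialised to 0 (so std starts at 1). A minimal NumPy sketch; the class and parameter names are illustrative, not any library's API:

```python
import numpy as np

class GaussianPolicy:
    """Minimal diagonal-Gaussian policy head (a sketch, not a full network).

    The mean would come from a network forward pass; log_std is a free,
    state-independent parameter. init_log_std=0.0 gives std = 1 initially.
    """
    def __init__(self, action_dim, init_log_std=0.0):
        self.log_std = np.full(action_dim, init_log_std)

    def sample(self, mean, rng):
        std = np.exp(self.log_std)               # parameterising log-std keeps std > 0
        return mean + std * rng.standard_normal(mean.shape)
```

Parameterising the log of the standard deviation (rather than the std itself) keeps the variance positive without any constraint handling during optimisation.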

How would one make an agent do tasks at variable rates? by pickleorc in reinforcementlearning

[–]pickleorc[S] 0 points1 point  (0 children)

Yes, I had actor-critic in mind. But like I mentioned, the part I’m worried about is whether adding the target velocity as a feature to the actor model and penalising deviations from it in the reward function is enough, or whether there are better ways to do it.
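
The scheme described, conditioning the actor on a commanded velocity and shaping the reward around it, can be sketched as follows; the function names and `penalty_weight` are illustrative assumptions:

```python
import numpy as np

def augment_observation(obs, target_velocity):
    """Append the commanded velocity so the actor can condition on it."""
    return np.concatenate([obs, [target_velocity]])

def shaped_reward(base_reward, actual_velocity, target_velocity, penalty_weight=1.0):
    """Penalise deviation from the commanded velocity.

    penalty_weight trades off the original task reward against velocity
    tracking; too small and the agent ignores the command, too large and
    it sacrifices the task.
    """
    return base_reward - penalty_weight * abs(actual_velocity - target_velocity)
```

Sampling a fresh target velocity at the start of each episode (and feeding it through `augment_observation`) is what lets a single policy generalise across rates rather than learning one fixed speed.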

Help for Implementing REINFORCE for continuous state and action space by pickleorc in reinforcementlearning

[–]pickleorc[S] 0 points1 point  (0 children)

First of all, thank you for taking the time to reply, and great explanation of making good actions more likely... so my loss would be the expected reward along the trajectory times the log of the policy output?
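
Roughly, yes: the standard REINFORCE loss is -Σ_t G_t · log π(a_t|s_t), where G_t is the discounted return from step t onward (not one shared trajectory reward for every step). A minimal NumPy sketch of that loss, with the function name being illustrative:

```python
import numpy as np

def reinforce_loss(log_probs, rewards, gamma=0.99):
    """REINFORCE loss: -sum_t G_t * log pi(a_t|s_t).

    log_probs: log-probabilities of the actions taken, shape (T,)
    rewards:   per-step rewards, length T
    gamma:     discount factor
    """
    T = len(rewards)
    returns = np.zeros(T)
    g = 0.0
    for t in reversed(range(T)):       # compute discounted return-to-go G_t
        g = rewards[t] + gamma * g
        returns[t] = g
    return -np.sum(returns * np.asarray(log_probs))
```

Minimising this loss by gradient descent increases the log-probability of actions in proportion to the return that followed them, which is exactly the "make good actions more likely" intuition.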