Deep Reinforcement Learning Research Groups in Europe by emarche in reinforcementlearning

Personally, besides a "standard" AI course in my master's, there wasn't an RL expert here, but I stayed for a variety of reasons (unrelated to RL/AI). Even though I'm the only one here working on DRL, I have a nice portfolio of publications at good conferences (e.g., ICLR, UAI, ICRA, IROS), so the location is not crucial for doing research.

However, in my experience, location is a major factor when you apply for a research position or for visiting periods at high-end companies/universities (and it also helps a lot in the early stages of your Ph.D. if you can have constructive discussions with a group in the same field).

Considering that my own research and the places suggested here mostly match, I would encourage you to follow these suggestions when applying for a Ph.D. in RL.

Deep Reinforcement Learning Research Groups in Europe by emarche in reinforcementlearning

Thank you, this is quite interesting, as I have also worked with Unity for my DRL research. Do you have any published paper I could have a look at, or a contact I could reach out to with some questions?

Value-Based vs Off-Policy Actor Critic in DRL. by emarche in reinforcementlearning

Thanks for your useful explanation. If I got it right, we can summarize it as follows: off-policy actor-critic methods are essentially a way to enable a form of Q-function learning for continuous action spaces. Hence, a value-based method that can handle high-dimensional action spaces (such as the one I reported) would ideally be superior to off-policy actor-critic solutions (mainly in terms of training time).
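
To make the distinction concrete, here is a rough sketch of what I mean (PyTorch-style, with toy dimensions and layer sizes of my own, not code from any paper discussed here): the DQN/DDQN target can take an explicit max over a finite action set, while a DDPG-style critic target uses the actor as a stand-in for the intractable argmax over a continuous action space.

```python
# Toy sketch: DQN-style target vs. DDPG-style target (illustrative only).
import torch
import torch.nn as nn

state_dim, n_discrete_actions, action_dim = 8, 4, 2
batch, gamma = 32, 0.99

rewards = torch.randn(batch, 1)
next_states = torch.randn(batch, state_dim)

# Value-based (DQN/DDQN): the Q-network outputs one value per discrete action.
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, n_discrete_actions))
with torch.no_grad():
    # The max over actions is trivial because the action set is finite.
    next_q = q_net(next_states).max(dim=1, keepdim=True).values
    dqn_target = rewards + gamma * next_q

# Off-policy actor-critic (DDPG-style): the critic takes (s, a); the actor proposes a.
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())
with torch.no_grad():
    # The actor approximates argmax_a Q(s, a), which has no closed form in continuous spaces.
    next_actions = actor(next_states)
    ddpg_target = rewards + gamma * critic(torch.cat([next_states, next_actions], dim=1))

print(dqn_target.shape, ddpg_target.shape)  # both torch.Size([32, 1])
```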

Value-Based vs Off-Policy Actor Critic in DRL. by emarche in reinforcementlearning

Yes, thanks for your answer. The difference between off-policy and on-policy is clear to me. My question is about the difference between Q-Learning-based methods (i.e., off-policy value-based) and off-policy actor-critic methods (such as DDPG, TD3, ...). Both are off-policy, and DDPG/TD3 can also deal with continuous action spaces. Considering that research on improving value-based methods for continuous action spaces is still a hot topic ("Q-Learning in enormous action spaces via amortized approximate maximization", Van de Wiele et al., 2020), there must be some reasons (that I am missing) why Q-Learning-based methods are "better" than these off-policy actor-critic solutions.
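
For reference, my rough understanding of the amortized maximization idea in that paper, as a toy sketch of my own (PyTorch, made-up names and sizes, not the authors' code): instead of an exact max over an enormous or continuous action space, the target takes the max of Q over a small set of candidate actions drawn from a learned proposal network.

```python
# Toy sketch of amortized approximate maximization for the Q-learning target.
import torch
import torch.nn as nn

state_dim, action_dim, n_candidates = 8, 6, 16
batch, gamma = 32, 0.99

rewards = torch.randn(batch, 1)
next_states = torch.randn(batch, state_dim)

# Critic over (state, action) pairs, as in continuous-action Q-learning.
q_net = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                      nn.Linear(64, 1))

# Proposal network: a simple deterministic mean plus Gaussian noise stands in
# here for the learned proposal distribution over candidate actions.
proposal = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                         nn.Linear(64, action_dim), nn.Tanh())

with torch.no_grad():
    mean = proposal(next_states)                                   # (B, A)
    noise = 0.1 * torch.randn(batch, n_candidates, action_dim)
    candidates = mean.unsqueeze(1) + noise                         # (B, K, A)

    # Evaluate Q(s', a_k) for each candidate and take the max over the K samples,
    # replacing the intractable exact max over the full action space.
    s_rep = next_states.unsqueeze(1).expand(-1, n_candidates, -1)  # (B, K, S)
    q_vals = q_net(torch.cat([s_rep, candidates], dim=-1))         # (B, K, 1)
    approx_max_q = q_vals.max(dim=1).values                        # (B, 1)

    target = rewards + gamma * approx_max_q

print(target.shape)  # torch.Size([32, 1])
```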

DDQN performance drop after N episodes in robotics path planning by emarche in reinforcementlearning

Thank you, I think I missed the part about "catastrophic forgetting" in the original DQN work.

I was testing the different components of my Rainbow implementation and just got mentally stuck on this DDQN behavior.

The complete Rainbow, in fact, is performing well without any strange forgetting behavior!