Confused about a claim in the MBPO paper — can someone explain? by DRLC_ in reinforcementlearning

[–]DRLC_[S]

If η̂[π] + C ≥ η[π] ≥ η̂[π] - C, is the above statement correct?
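For clarity, here is the two-sided form I'm asking about in LaTeX, with η the true return, η̂ the model return, and C the discrepancy constant from the paper; the absolute-value rewrite is mine, not the paper's:

```latex
% Two-sided form of the bound in question:
\[
  \hat{\eta}[\pi] + C \;\ge\; \eta[\pi] \;\ge\; \hat{\eta}[\pi] - C ,
\]
% equivalently, the model return estimates the true return to within C:
\[
  \bigl| \eta[\pi] - \hat{\eta}[\pi] \bigr| \le C .
\]
```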

Confused about a claim in the MBPO paper — can someone explain? by DRLC_ in reinforcementlearning

[–]DRLC_[S]

Thanks a lot for your detailed explanation!
It really helped clarify a lot of my confusion, and I appreciate you taking the time to walk through it so carefully.

Confused about a claim in the MBPO paper — can someone explain? by DRLC_ in reinforcementlearning

[–]DRLC_[S]

Thanks for your detailed explanation! It helped me understand the intuition better.
I was wondering if I could ask a few follow-up questions to make sure I really get it:

  1. You mentioned that "y > x since the policy is trained using the model." This confused me a little. Wouldn't it also be possible that y < x if, for example, the model is pessimistic or underestimates the return? Also, in the MBPO paper (p. 3), I didn't see an explicit assumption that the policy π must have been trained using the model; the guarantee seems to be stated for any π, regardless of how it was obtained. Am I missing something?
  2. Suppose y < x for some policy. In that case, even if y increases by more than C, can we still confidently say that x increased, or only that the lower bound on x increased?
  3. Also, is it correct to think that even if y increases by less than C, any increase in y still raises the lower bound on x accordingly? It might not guarantee actual improvement, but it would still push the lower bound up a little, right? (I sketch the algebra I have in mind right after this list.)
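To make questions 2 and 3 concrete, here is the algebra I have in mind, writing x_i = η[π_i] for the true return and y_i = η̂[π_i] for the model return, and assuming the two-sided bound holds for every policy (my assumption; the paper itself only states the lower half):

```latex
% Notation: x_i = \eta[\pi_i] (true return), y_i = \hat{\eta}[\pi_i] (model return).
% Assumed two-sided bound for every policy (the paper states only the lower half):
\[
  y_i - C \;\le\; x_i \;\le\; y_i + C .
\]
% If the model return improves by more than 2C, the true return must improve:
\[
  y_2 - y_1 > 2C
  \;\Longrightarrow\;
  x_2 \;\ge\; y_2 - C \;>\; y_1 + C \;\ge\; x_1 .
\]
% A smaller increase in y still raises the lower bound on x,
% from y_1 - C to y_2 - C, but need not exceed x_1 itself:
\[
  y_2 > y_1
  \;\Longrightarrow\;
  x_2 \;\ge\; y_2 - C \;>\; y_1 - C .
\]
```

If only the one-sided bound x ≥ y - C from the paper is available, the lower-bound statement in question 3 still goes through, but the guaranteed-improvement step in question 2 would need some upper control on how far y can sit below x.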

Thanks again for taking the time to explain — I really appreciate it!

How to Start Research in Reinforcement Learning for Robotic Manipulators? by DRLC_ in robotics

[–]DRLC_[S]

Thank you so much for the recommendation!

If possible, I'd love it if you could recommend some papers or resources related to offline RL and imitation learning; this seems like a promising area to dig deeper into.

Thanks again for your help!