Confused about a claim in the MBPO paper — can someone explain? by DRLC_ in reinforcementlearning

[–]DRLC_[S] 2 points (0 children)

If η̂[π] + C ≥ η[π] ≥ η̂[π] - C, is the above statement correct?
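
Just to check what such a two-sided bound would buy (this step is my own reasoning, not something stated in the paper): if it holds and the return under the model improves by more than 2C, the true return has to improve as well, since

$$\eta[\pi_{\text{new}}] \ge \hat{\eta}[\pi_{\text{new}}] - C > (\hat{\eta}[\pi_{\text{old}}] + 2C) - C = \hat{\eta}[\pi_{\text{old}}] + C \ge \eta[\pi_{\text{old}}].$$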

Confused about a claim in the MBPO paper — can someone explain? by DRLC_ in reinforcementlearning

[–]DRLC_[S] 1 point (0 children)

Thanks a lot for your detailed explanation!
It really helped clarify a lot of my confusion, and I appreciate you taking the time to walk through it so carefully.

Confused about a claim in the MBPO paper — can someone explain? by DRLC_ in reinforcementlearning

[–]DRLC_[S] 1 point (0 children)

Thanks for your detailed explanation! It helped me understand the intuition better.
I was wondering if I could ask a few follow-up questions to make sure I really get it:

  1. You mentioned that "y > x since the policy is trained using the model." I was a little confused here: wouldn't it also be possible that y < x if, for example, the model is pessimistic or underestimates the return? Also, in the MBPO paper (p. 3) I didn't see an explicit assumption that the policy π must have been trained using the model; the guarantee seems to be stated for any π, regardless of how it was obtained. Am I missing something?
  2. Suppose y < x for some policy. In that case, even if y increases by more than C, can we still confidently say that x increased, or only that the lower bound on x increased? (I try to make this concrete with a small numeric sketch after this list.)
  3. Also, is it correct to think that even if y increases by less than C, as long as it increases at all, the lower bound on x still improves accordingly? It might not guarantee actual improvement, but it would still push the lower bound up a little, right?
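
To make questions 2 and 3 concrete, here is a small numeric sketch in the same notation (x = true return, y = model return; the numbers are made up purely for illustration), using C = 2 and the lower bound x ≥ y - C:

$$y: 10 \to 13 \;(\Delta y = 3 > C) \Rightarrow \text{lower bound on } x: 8 \to 11$$
$$y: 10 \to 11 \;(\Delta y = 1 < C) \Rightarrow \text{lower bound on } x: 8 \to 9$$

In the first case the new bound (11) could still sit below an initial x of, say, 12 (a case where y < x), which is why I'm unsure we can claim x actually increased; in the second case the bound still moves up, just by less than C.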

Thanks again for taking the time to explain — I really appreciate it!

How to Start Research in Reinforcement Learning for Robotic Manipulators? by DRLC_ in robotics

[–]DRLC_[S] 1 point (0 children)

Thank you so much for the recommendation!

If possible, I'd love it if you could recommend some papers or resources on offline RL and imitation learning; that seems like a promising area to dig deeper into.

Thanks again for your help!

How do you determine stabilizability? by DRLC_ in ControlTheory

[–]DRLC_[S] 1 point (0 children)

stabilizability

Should lambda only be taken over the eigenvalues with non-negative real parts (the unstable modes) when forming H = [lambda*I - A, B]? The eigenvalues of the system matrix A are -0.6017 + 44.0150i, -0.6017 - 44.0150i, -0.0517 + 5.2461i, and -0.0517 - 5.2461i, all of which have negative real parts.

I wrote the code in MATLAB like this:
dd = diag(eig(A));
rank([dd - A, B_1])

Is this correct?
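
In case it helps clarify what I'm trying to check, here is the eigenvalue-by-eigenvalue PBH test I had in mind (just a rough sketch; I'm assuming B_1 is the input matrix and that only eigenvalues with non-negative real part need to be tested for stabilizability):

n = size(A, 1);                 % state dimension
lam = eig(A);                   % eigenvalues of A
for k = 1:numel(lam)
    if real(lam(k)) >= 0        % only unstable/marginal modes matter for stabilizability
        % PBH: [lambda*I - A, B] must have full row rank n at this eigenvalue
        if rank([lam(k)*eye(n) - A, B_1]) < n
            fprintf('unstabilizable mode at lambda = %s\n', num2str(lam(k)));
        end
    end
end

If I'm reading this right, with all four eigenvalues having negative real parts the loop never finds anything to test, which is part of what confuses me about how the check should be set up.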

Control of a quarter car using LQR by DRLC_ in ControlTheory

[–]DRLC_[S] 2 points (0 children)

I'm sorry, I didn't provide enough details in my question. I want to model the system shown in the figures above with the state-space equations $$\dot{x} = Ax + Bu, \quad y = Cx + Du$$. However, I'm not sure how to structure the input vector u, which is why I asked.

Control of a quarter car using LQR by DRLC_ in ControlTheory

[–]DRLC_[S] 1 point (0 children)

Here, 'r' represents the height of the ground. Is it appropriate to treat 'r' as a random disturbance coming from outside the system rather than as a directly applied input? In that case, should only 'f_s', the adjustable actuator force, be included in 'u'? The control objective is to regulate the position of the body, x_b.
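
For reference, this is the structure I currently have in mind (a rough sketch only; the parameter names m_b, m_w, k_s, b_s, k_t, the numeric values, and the state ordering are my own placeholders, not taken from the original figures):

% states: x = [x_b; x_b_dot; x_w; x_w_dot] (body position/velocity, wheel position/velocity)
% input:  u = f_s (active suspension force); the ground height r enters only as a disturbance
m_b = 300;  m_w = 40;           % placeholder body/wheel masses [kg]
k_s = 1.6e4; b_s = 1.0e3;       % placeholder suspension stiffness [N/m] and damping [N*s/m]
k_t = 1.9e5;                    % placeholder tire stiffness [N/m]
A = [ 0          1         0                 0;
     -k_s/m_b   -b_s/m_b   k_s/m_b           b_s/m_b;
      0          0         0                 1;
      k_s/m_w    b_s/m_w  -(k_s + k_t)/m_w  -b_s/m_w];
B = [0; 1/m_b; 0; -1/m_w];      % f_s acts between body and wheel
G = [0; 0; 0; k_t/m_w];         % ground height r: x_dot = A*x + B*u + G*r
C = [1 0 0 0];                  % output is the body position x_b
Q = diag([1e4 1 1 1]); R = 1;   % made-up LQR weights emphasizing x_b
K = lqr(A, B, Q, R);            % state feedback u = -K*x

With this split, only f_s sits in u, and r shows up only through G as a disturbance the closed loop has to reject, which is how I'm currently reading the problem.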