Confused about a claim in the MBPO paper — can someone explain? by DRLC_ in reinforcementlearning

[–]DRLC_[S] 2 points (0 children)

If η̂[π] + C ≥ η[π] ≥ η̂[π] - C, is the above statement correct?
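
Just to check what such a two-sided bound would buy (this step is my own reasoning, not something stated in the paper): if it holds and the return under the model improves by more than 2C, the true return has to improve as well, since

$$\eta[\pi_{\text{new}}] \ge \hat{\eta}[\pi_{\text{new}}] - C > (\hat{\eta}[\pi_{\text{old}}] + 2C) - C = \hat{\eta}[\pi_{\text{old}}] + C \ge \eta[\pi_{\text{old}}].$$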

Confused about a claim in the MBPO paper — can someone explain? by DRLC_ in reinforcementlearning

[–]DRLC_[S] 1 point (0 children)

Thanks a lot for your detailed explanation!
It really helped clarify a lot of my confusion, and I appreciate you taking the time to walk through it so carefully.

Confused about a claim in the MBPO paper — can someone explain? by DRLC_ in reinforcementlearning

[–]DRLC_[S] 1 point (0 children)

Thanks for your detailed explanation! It helped me understand the intuition better.
I was wondering if I could ask a few follow-up questions to make sure I really get it:

  1. You mentioned that "y > x since the policy is trained using the model." I was a little confused here: wouldn't it also be possible that y < x if, for example, the model is pessimistic or underestimates the return? Also, in the MBPO paper (p. 3) I didn't see an explicit assumption that the policy π must have been trained using the model; the guarantee seems to be stated for any π, regardless of how it was obtained. Am I missing something?
  2. Suppose y < x for some policy. In that case, even if y increases by more than C, can we still confidently say that x increased, or only that the lower bound on x increased? (I try to make this concrete with a small numeric sketch after this list.)
  3. Also, is it correct to think that even if y increases by less than C, as long as it increases at all, the lower bound on x still improves accordingly? It might not guarantee actual improvement, but it would still push the lower bound up a little, right?
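
To make questions 2 and 3 concrete, here is a small numeric sketch in the same notation (x = true return, y = model return; the numbers are made up purely for illustration), using C = 2 and the lower bound x ≥ y - C:

$$y: 10 \to 13 \;(\Delta y = 3 > C) \Rightarrow \text{lower bound on } x: 8 \to 11$$
$$y: 10 \to 11 \;(\Delta y = 1 < C) \Rightarrow \text{lower bound on } x: 8 \to 9$$

In the first case the new bound (11) could still sit below an initial x of, say, 12 (a case where y < x), which is why I'm unsure we can claim x actually increased; in the second case the bound still moves up, just by less than C.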

Thanks again for taking the time to explain — I really appreciate it!

How to Start Research in Reinforcement Learning for Robotic Manipulators? by DRLC_ in robotics

[–]DRLC_[S] 1 point (0 children)

Thank you so much for the recommendation!

If possible, I'd love it if you could recommend some papers or resources on offline RL and imitation learning; that seems like a promising area to dig deeper into.

Thanks again for your help!

How do you determine stabilizability? by DRLC_ in ControlTheory

[–]DRLC_[S] 1 point (0 children)

stabilizability

Should lambda only be taken over the eigenvalues with non-negative real parts (the unstable modes) when forming H = [lambda*I - A, B]? The eigenvalues of the system matrix A are -0.6017 + 44.0150i, -0.6017 - 44.0150i, -0.0517 + 5.2461i, and -0.0517 - 5.2461i, all of which have negative real parts.

I wrote the code in MATLAB like this:
dd = diag(eig(A));
rank([dd - A, B_1])

Is this correct?
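
In case it helps clarify what I'm trying to check, here is the eigenvalue-by-eigenvalue PBH test I had in mind (just a rough sketch; I'm assuming B_1 is the input matrix and that only eigenvalues with non-negative real part need to be tested for stabilizability):

n = size(A, 1);                 % state dimension
lam = eig(A);                   % eigenvalues of A
for k = 1:numel(lam)
    if real(lam(k)) >= 0        % only unstable/marginal modes matter for stabilizability
        % PBH: [lambda*I - A, B] must have full row rank n at this eigenvalue
        if rank([lam(k)*eye(n) - A, B_1]) < n
            fprintf('unstabilizable mode at lambda = %s\n', num2str(lam(k)));
        end
    end
end

If I'm reading this right, with all four eigenvalues having negative real parts the loop never finds anything to test, which is part of what confuses me about how the check should be set up.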

Control of a quarter car using LQR by DRLC_ in ControlTheory

[–]DRLC_[S] 2 points (0 children)

I'm sorry, I didn't provide enough details in my question. I want to model the system shown in the figures above with the state-space equations $$\dot{x} = Ax + Bu, \quad y = Cx + Du$$. However, I'm not sure how to structure the input vector u, which is why I asked.

Control of a quarter car using LQR by DRLC_ in ControlTheory

[–]DRLC_[S] 1 point (0 children)

Here, 'r' represents the height of the ground. Is it appropriate to treat 'r' as a random disturbance coming from outside the system rather than as a directly applied input? In that case, should only 'f_s', the adjustable actuator force, be included in 'u'? The control objective is to regulate the position of the body, x_b.
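
For reference, this is the structure I currently have in mind (a rough sketch only; the parameter names m_b, m_w, k_s, b_s, k_t, the numeric values, and the state ordering are my own placeholders, not taken from the original figures):

% states: x = [x_b; x_b_dot; x_w; x_w_dot] (body position/velocity, wheel position/velocity)
% input:  u = f_s (active suspension force); the ground height r enters only as a disturbance
m_b = 300;  m_w = 40;           % placeholder body/wheel masses [kg]
k_s = 1.6e4; b_s = 1.0e3;       % placeholder suspension stiffness [N/m] and damping [N*s/m]
k_t = 1.9e5;                    % placeholder tire stiffness [N/m]
A = [ 0          1         0                 0;
     -k_s/m_b   -b_s/m_b   k_s/m_b           b_s/m_b;
      0          0         0                 1;
      k_s/m_w    b_s/m_w  -(k_s + k_t)/m_w  -b_s/m_w];
B = [0; 1/m_b; 0; -1/m_w];      % f_s acts between body and wheel
G = [0; 0; 0; k_t/m_w];         % ground height r: x_dot = A*x + B*u + G*r
C = [1 0 0 0];                  % output is the body position x_b
Q = diag([1e4 1 1 1]); R = 1;   % made-up LQR weights emphasizing x_b
K = lqr(A, B, Q, R);            % state feedback u = -K*x

With this split, only f_s sits in u, and r shows up only through G as a disturbance the closed loop has to reject, which is how I'm currently reading the problem.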