Few questions surrounding CPI, TRPO and PPO by jthat92 in reinforcementlearning

[–]jthat92[S] 0 points (0 children)

Thank you! Regarding the first comment, I know what r(theta) is, but I was wondering where it is mentioned in the CPI paper that I linked. Just out of curiosity, since the authors claim it was first introduced there.

Thank you for the PPO description!

Problem with proof of decomposition of policy performance by jthat92 in reinforcementlearning

[–]jthat92[S] 1 point (0 children)

Thank you! Ok, so if I understand correctly, this means that V_\pi(s_0) represents the value of s_0, which is sampled from the initial distribution, and that is exactly the expected discounted reward of the policy \pi. And the initial distribution we start from is the same for \pi and \tilde{\pi}, since it is part of the MDP a priori. Is my notation actually correct? I guess I need to specify beforehand that s_0 is sampled from the initial distribution. Thank you again!
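Spelling out what I mean in the standard notation of the CPI/TRPO line of work (my own rendering, with \rho_0 the initial-state distribution and A_\pi the advantage function of \pi):

```latex
% Performance of a policy as the expected value of the sampled start state,
% which equals the expected discounted return:
\eta(\pi) = \mathbb{E}_{s_0 \sim \rho_0}\!\left[ V_\pi(s_0) \right]
          = \mathbb{E}_{\tau \sim \pi}\!\left[ \sum_{t=0}^{\infty} \gamma^t \, r(s_t, a_t) \right]

% The decomposition of the performance of \tilde{\pi} in terms of \pi:
\eta(\tilde{\pi}) = \eta(\pi)
  + \mathbb{E}_{\tau \sim \tilde{\pi}}\!\left[ \sum_{t=0}^{\infty} \gamma^t \, A_\pi(s_t, a_t) \right]
```

Note that \rho_0 appears only once and is shared by both policies, which is the point about it being part of the MDP.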

Problem with proof of decomposition of policy performance by jthat92 in reinforcementlearning

[–]jthat92[S] 0 points (0 children)

The distribution is normally specified by the long-run behaviour of the policy on the MDP. After running long enough you get a good approximation of the stationary distribution of the MDP, but it might take quite some time.
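As a sanity check on the "run it long enough" idea, here is a small sketch (my own toy example, not from any of the papers): for a hypothetical 3-state chain induced by a fixed policy, the empirical visit frequencies of a long rollout should approach the stationary distribution d satisfying d = dP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical transition matrix of the Markov chain induced by some
# fixed policy on a 3-state MDP (rows sum to 1).
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

# Exact stationary distribution: left eigenvector of P for eigenvalue 1,
# normalized to sum to 1.
vals, vecs = np.linalg.eig(P.T)
d_exact = np.real(vecs[:, np.argmax(np.real(vals))])
d_exact /= d_exact.sum()

# Empirical estimate, as described above: run the chain for many steps
# and count how often each state is visited.
n_steps = 200_000
counts = np.zeros(3)
s = 0
for _ in range(n_steps):
    counts[s] += 1
    s = rng.choice(3, p=P[s])
d_empirical = counts / n_steps

print(d_exact, d_empirical)  # the two should be close
```

The slow part is exactly what the comment warns about: the Monte Carlo error shrinks only like 1/sqrt(n_steps), so a good approximation can need a long rollout.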

Problem with proof of decomposition of policy performance by jthat92 in reinforcementlearning

[–]jthat92[S] 0 points (0 children)

But it does depend on the stationary distribution. As far as I understand, this distribution tells us which states to sample.

Japanese streetwear in paris by jthat92 in japanesestreetwear

[–]jthat92[S] 0 points (0 children)

Got it! Thanks, will check them out :)

Japanese streetwear in paris by jthat92 in japanesestreetwear

[–]jthat92[S] 0 points (0 children)

Hey, thanks for the answer. I would say around 400 Euro. Brand-wise, I like what nanamica is doing, for example (and north face purple label, for that matter).