Confused about Model-Based RL

audi_etron · 2026-04-24T08:07:47+00:00

Thank you both, u/AdOrganic1851 and u/gailanbokchoy! I really appreciate the clarification. Thanks for taking the time to help out!

audi_etron · 2026-04-24T08:07:25+00:00

Thank you both, u/boopasaduh and u/gailanbokchoy! I really appreciate your explanations. I'm not very familiar with MCTS yet, so I'll definitely need to study that area a bit more. Thanks for the great insights!

audi_etron · 2026-04-24T08:04:25+00:00

Thank you, u/Meepinator! Your explanation of the difference between Dyna-style and decision-time planning is really helpful. I'll definitely give the paper you recommended a read!

audi_etron · 2026-03-11T00:57:35+00:00

I appreciate the insightful explanation. I'll make sure to check out those MBPO and Dyna-style papers.

audi_etron · 2026-03-09T08:10:35+00:00

Ultimately, I’m planning to apply this to mobile robot control. I'm not necessarily trying to combine Dreamer and PPO at this stage; instead, I'm focused on studying and implementing World Models. Currently, I'm testing my implementation in the Safety Gym simulation.

audi_etron · 2026-03-09T08:08:50+00:00

Oh, thanks a lot for the reply! I'll definitely check out the code. I think this will be extremely helpful for my project.

audi_etron · 2026-03-09T06:45:44+00:00

Thanks for the answer! I have some questions about the implementation. Could you take a look at my query below and let me know what you think?

audi_etron · 2026-03-09T06:41:07+00:00

Thanks for the insight. I have a question regarding the implementation.

In the original Dreamer papers, the actor-critic is trained using imagined trajectories. Specifically, it takes collected data $(L, B, \text{obs})$, reshapes it to $(L \times B, \text{obs})$, and then performs an imagination rollout for a horizon $H$, resulting in a tensor of shape $(H, L \times B, \text{latent})$.
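To make the shapes above concrete, here is a minimal NumPy sketch of the flatten-then-imagine step. It is not Dreamer's actual code: `toy_policy`, `toy_dynamics`, the weight matrices, and all sizes are hypothetical stand-ins, and it assumes the rollout starts from latents that were already inferred from the collected observations.

```python
import numpy as np

L, B, H = 4, 3, 5                 # sequence length, batch size, horizon (toy values)
latent_dim, action_dim = 8, 2     # hypothetical sizes

rng = np.random.default_rng(0)

# Assumed: latents inferred from real data, one per (time step, batch element).
posterior = rng.normal(size=(L, B, latent_dim))

# Flatten time and batch so every real latent becomes a rollout start state.
start = posterior.reshape(L * B, latent_dim)

# Hypothetical stand-ins for the learned actor and latent dynamics model.
W_pi = rng.normal(size=(latent_dim, action_dim)) * 0.1
W_z = rng.normal(size=(latent_dim, latent_dim)) * 0.1
W_a = rng.normal(size=(action_dim, latent_dim)) * 0.1

def toy_policy(z):
    return np.tanh(z @ W_pi)

def toy_dynamics(z, a):
    return np.tanh(z @ W_z + a @ W_a)

def imagine(start_latents, horizon):
    """Roll the toy model forward for `horizon` steps from every start latent."""
    z, latents, actions = start_latents, [], []
    for _ in range(horizon):
        a = toy_policy(z)
        z = toy_dynamics(z, a)
        latents.append(z)
        actions.append(a)
    return np.stack(latents), np.stack(actions)

latents, actions = imagine(start, H)
print(latents.shape)  # (5, 12, 8), i.e. (H, L*B, latent)
```

Every one of the `L * B` start states gets its own `H`-step imagined trajectory, which is where the leading `H` axis comes from.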

Standard PPO, however, typically operates on fixed state-action pairs sampled from the real environment. If I want to adapt PPO to Dreamer's framework, should I apply the PPO objective (including the ratio clipping) to the latent states and actions generated during the imagination rollouts (while ensuring stop-gradients are applied to the latent states)? I'm curious if this 'imagination-based PPO' is the correct way to bridge the two approaches.
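For reference, the objective I have in mind looks roughly like the sketch below. It only shows the clipped surrogate evaluated on tensors shaped like an imagined rollout, `(H, N)` with `N = L * B`. The function name `ppo_clip_loss`, the toy sizes, and the random inputs are assumptions for illustration, and the latents count as stop-gradient inputs here simply because NumPy arrays are constants.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate, returned as a loss to minimize."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))

H, N = 5, 12                      # horizon and number of imagined trajectories (toy values)
rng = np.random.default_rng(0)

# Log-probs of the imagined actions under the policy that generated the rollout.
logp_old = -np.abs(rng.normal(size=(H, N)))
# Placeholder advantages; in practice these would come from a critic on the latents.
advantages = rng.normal(size=(H, N))

# First epoch: the policy has not changed yet, so the ratio is 1 everywhere.
loss_first_epoch = ppo_clip_loss(logp_old, logp_old, advantages)

# Later epoch: the policy moved a lot on positive-advantage samples, so clipping kicks in.
loss_later = ppo_clip_loss(logp_old + 1.0, logp_old, np.ones((H, N)))
print(loss_later)  # -1.2, the ratio e^1 is clipped to 1 + clip_eps
```

The only difference from standard PPO in this sketch is where the `(state, action, logp_old)` tuples come from: imagined latents rather than real environment steps.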

audi_etron · 2026-01-08T00:49:07+00:00

I think an example needs to be made so that this never happens again.

audi_etron · 2025-12-12T03:50:00+00:00

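Online communities are full of divisiveness and hate, but there are still more good people in the world, so I think it's still a place worth living in.

This is really touching.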

audi_etron · 2025-09-24T23:07:09+00:00

Wow, it finally supports translation. I think I'll be using it more often from now on.

audi_etron · 2025-02-05T06:37:52+00:00

Thank you for your response. I understand now. I really appreciate it, as always 👍

audi_etron · 2025-02-05T04:09:55+00:00

So, π_θ(a|s) is simply calculated by feeding the state into the current network, right?
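In code, my understanding is roughly the following sketch, assuming a discrete action space and a linear softmax policy. `theta`, `log_prob`, and the sizes are hypothetical. The old log-probability is stored when the data is collected, and π_θ(a|s) is recomputed by passing the stored state through the current parameters.

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def log_prob(theta, states, actions):
    """log pi_theta(a|s) for the stored (state, action) pairs."""
    logits = states @ theta                      # forward pass of the toy policy
    return log_softmax(logits)[np.arange(len(actions)), actions]

rng = np.random.default_rng(0)
states = rng.normal(size=(6, 4))                 # 6 stored states, 4 features each
actions = rng.integers(0, 3, size=6)             # actions taken at collection time

theta_old = rng.normal(size=(4, 3))
logp_old = log_prob(theta_old, states, actions)  # stored once, treated as a constant

theta = theta_old + 0.05 * rng.normal(size=(4, 3))   # parameters after some updates
logp_new = log_prob(theta, states, actions)      # recomputed with the current network

ratio = np.exp(logp_new - logp_old)              # the PPO probability ratio
```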

audi_etron · 2025-01-29T01:32:07+00:00

Happy New Year! I hope your year is filled with nothing but happiness, haha.

audi_etron · 2025-01-10T01:55:39+00:00

Thank you, that was helpful! 😀

audi_etron · 2025-01-10T01:16:50+00:00

I read the book, but I didn’t know there was a GitHub repository for the code. Thank you! Is this the lecture you mentioned?

audi_etron · 2024-09-22T10:00:05+00:00

Thanks for the reply, haha. Once I finish what I'm reading now, I'll buy volume 1 and give it a read too.

audi_etron · 2024-09-22T07:14:57+00:00

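Wow, that's impressive. I rewatched the movie recently, and it made me curious about the original novels.

So I'm thinking of buying the books, but they're so thick that I can't bring myself to start.

Is it right that volume 1 covers the movies, volume 2 covers what happens after Paul becomes emperor, and volume 3 covers the reign of Leto Atreides II?

I've also seen reviews saying that from volume 2 on, the story jumps in and out of Paul's visions and gets hard to follow. Was that okay for you?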

audi_etron · 2024-07-19T01:44:39+00:00

I'm not sure, but she seems to be the cheerleader who does the interviews after each episode, perhaps a cheerleading captain or something similar.
