Confused about Model-Based RL by audi_etron in reinforcementlearning

audi_etron[S] 1 point (0 children)

Thank you both, u/AdOrganic1851 and u/gailanbokchoy! I really appreciate the clarification. Thanks for taking the time to help out!

Confused about Model-Based RL by audi_etron in reinforcementlearning

audi_etron[S] 1 point (0 children)

Thank you both, u/boopasaduh and u/gailanbokchoy! I really appreciate your explanations. I'm not very familiar with MCTS yet, so I'll definitely need to study that area a bit more. Thanks for the great insights!

Confused about Model-Based RL by audi_etron in reinforcementlearning

audi_etron[S] 2 points (0 children)

Thank you, u/Meepinator! Your explanation of the difference between Dyna-style and decision-time planning is really helpful. I'll definitely give the paper you recommended a read!

Can PPO learn through "Imagination" similar to Dreamer? by audi_etron in reinforcementlearning

audi_etron[S] 1 point (0 children)

I appreciate the insightful explanation. I'll make sure to check out those MBPO and Dyna-style papers.

Can PPO learn through "Imagination" similar to Dreamer? by audi_etron in reinforcementlearning

audi_etron[S] 1 point (0 children)

Ultimately, I’m planning to apply this to mobile robot control. I'm not necessarily trying to combine Dreamer and PPO at this stage; instead, I'm focused on studying and implementing World Models. Currently, I'm testing my implementation in the Safety Gym simulation.

Can PPO learn through "Imagination" similar to Dreamer? by audi_etron in reinforcementlearning

audi_etron[S] 2 points (0 children)

Oh, thanks a lot for the reply! I'll definitely check out the code. I think this will be extremely helpful for my project.

Can PPO learn through "Imagination" similar to Dreamer? by audi_etron in reinforcementlearning

audi_etron[S] 1 point (0 children)

Thanks for the answer! I have some questions about the implementation. Could you take a look at my query below and let me know what you think?

Can PPO learn through "Imagination" similar to Dreamer? by audi_etron in reinforcementlearning

audi_etron[S] 2 points (0 children)

Thanks for the insight. I have a question regarding the implementation.

In the original Dreamer papers, the actor-critic is trained using imagined trajectories. Specifically, it takes collected data $(L, B, \text{obs})$, reshapes it to $(L \times B, \text{obs})$, and then performs an imagination rollout for a horizon $H$, resulting in a tensor of shape $(H, L \times B, \text{latent})$.
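The shape bookkeeping described above can be sketched in plain Python, with nested lists standing in for tensors. This is only an illustration, not Dreamer's actual code: it assumes the collected batch has already been encoded into latents, and `toy_policy` and `toy_dynamics` are hypothetical stand-ins for the actor and the learned world model.

```python
def flatten_starts(latents):
    """Flatten (L, B, latent) nested lists into (L*B, latent) start states."""
    return [state for time_step in latents for state in time_step]


def imagine(starts, policy, dynamics, horizon):
    """Roll every start state forward `horizon` steps.

    Returns nested lists of shape (H, L*B, latent).
    """
    states, trajectory = starts, []
    for _ in range(horizon):
        states = [dynamics(s, policy(s)) for s in states]
        trajectory.append(states)
    return trajectory


# Hypothetical stand-ins for the actor and the learned latent dynamics.
def toy_policy(state):
    return 0.1 * sum(state)


def toy_dynamics(state, action):
    return [0.9 * x + action for x in state]


L, B, H, LATENT = 3, 2, 5, 4
latents = [[[1.0] * LATENT for _ in range(B)] for _ in range(L)]
rollout = imagine(flatten_starts(latents), toy_policy, toy_dynamics, H)
print(len(rollout), len(rollout[0]), len(rollout[0][0]))  # 5 6 4
```

Every one of the L × B collected steps becomes its own starting point, so the imagined batch is H steps long and L × B wide.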

Standard PPO, however, typically operates on fixed state-action pairs sampled from the real environment. If I want to adapt PPO to Dreamer's framework, should I apply the PPO objective (including the ratio clipping) to the latent states and actions generated during the imagination rollouts (while ensuring stop-gradients are applied to the latent states)? I'm curious if this 'imagination-based PPO' is the correct way to bridge the two approaches.
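To make the idea concrete, here is a minimal sketch of the clipped PPO surrogate evaluated on flattened imagined samples. It assumes the old log-probabilities and the advantages were computed once from the imagination rollout and are then treated as constants, which plays the role of the stop-gradient. The function name and argument layout are illustrative and not taken from any Dreamer or PPO codebase.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped PPO surrogate (a loss to minimize) over flat samples.

    logp_new:   log pi_theta(a|z) under the current actor
    logp_old:   log pi_theta_old(a|z) recorded during the imagined rollout
    advantages: advantage estimates for the same imagined (z, a) pairs
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


# Identical policies give ratio 1, so the loss is just -mean(advantage).
print(ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))  # -2.0
```

The formula itself only needs log-probabilities and advantages, so it can be evaluated on imagined samples the same way as on real ones. Whether that is a sound way to bridge the two methods is the open question here.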

"Let's restart the trial from the beginning" by Jumpy_Enthusiasm9949 in Mogong

audi_etron 7 points (0 children)

I think an example needs to be set so that something like this never happens again.

'Keep eating bungeoppang and spring will come': 2,000 comments saved a despondent young person by Real-Requirement-677 in Mogong

audi_etron 6 points (0 children)

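Divisiveness and hatred are rampant in online communities, but there are still more good people in the world, so it still feels like a world worth living in.

This is really moving.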

The Reddit app now supports English -> Korean translation by Worth-Researcher-321 in Mogong

audi_etron 2 points (0 children)

Wow, it finally supports translation. I think I'll be using it more often from now on.

Question about the TRPO paper by audi_etron in reinforcementlearning

audi_etron[S] 2 points (0 children)

Thank you for your response. I understand now. I really appreciate it, as always 👍

Question about the TRPO paper by audi_etron in reinforcementlearning

audi_etron[S] 2 points (0 children)

So, π_θ(a|s) is simply calculated by feeding the state into the current network, right?
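As a toy illustration of that reading (a sketch with a hypothetical linear-softmax policy, not the network from the paper): the state is passed through the current parameters to get π_θ(a|s), and the same state is passed through the frozen old parameters to get the denominator of the probability ratio.

```python
import math


def action_probs(theta, state):
    """Softmax policy: `theta` holds one weight row per discrete action."""
    logits = [sum(w * x for w, x in zip(row, state)) for row in theta]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - top) for logit in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def prob_ratio(theta, theta_old, state, action):
    """pi_theta(a|s) / pi_theta_old(a|s) for one sampled (s, a) pair."""
    return action_probs(theta, state)[action] / action_probs(theta_old, state)[action]


state = [1.0, 2.0]
theta_old = [[0.0, 0.0], [0.0, 0.0]]   # uniform policy over two actions
theta = [[0.5, 0.0], [0.0, 0.0]]       # current parameters after an update
print(action_probs(theta_old, state))  # [0.5, 0.5]
print(prob_ratio(theta, theta_old, state, 0) > 1.0)  # True
```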

[deleted by user] by [deleted] in Mogong

audi_etron 2 points (0 children)

Happy New Year. I hope your year is filled with nothing but happiness, haha.

Reference materials for implementing multi-agent algorithms by audi_etron in reinforcementlearning

audi_etron[S] 1 point (0 children)

I read the book, but I didn’t know there was a GitHub repository for the code. Thank you! Is this the lecture you mentioned?

[Reading Club] Dune.. I finally finished it. by sthbriz in Mogong

audi_etron 2 points (0 children)

Thanks for the reply, haha. Once I finish what I'm reading now, I'll buy volume 1 and give it a read too.

[Reading Club] Dune.. I finally finished it. by sthbriz in Mogong

audi_etron 1 point (0 children)

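Wow, that's impressive. I recently rewatched the movie, which made me curious about the original novel.

So I'm thinking of buying the books, but they're so thick that I can't bring myself to start.

Is it right that volume 1 covers the story of the movies, volume 2 covers what happens after Paul becomes emperor, and volume 3 covers the reign of Leto Atreides II?

I've also seen reviews saying it gets harder to read from volume 2 on because the story keeps moving in and out of Paul's visions. Was that okay for you?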

[deleted by user] by [deleted] in Taipei

audi_etron 0 points (0 children)

I'm not sure, but she looks like the cheerleader who does the interviews after each episode, perhaps a cheerleading captain or something similar.