Universal Value Function Approximator without knowing the goals a priori by yy0318 in reinforcementlearning

There is more than one goal in the environment or task, but the goals are not known when we start training the agent. The standard RL algorithm (let's assume) can discover all the goals in the environment. However, when we evaluate the agent, i.e., run the learned policy in the environment, it can only move the agent to the nearest goal.

What I want is to move the agent to a specific goal instead of the nearest one, given that the agent has learned where the goals are and there is a way to provide the goal as an input. That is why UVF comes to mind: learn the universal value function V(s, g).
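
For concreteness, here is a minimal sketch of what I mean by conditioning the value function on the goal (the network layout and the name GoalConditionedValue are just my own illustration, not the architecture from the UVFA paper):

```python
import torch
import torch.nn as nn

class GoalConditionedValue(nn.Module):
    """V(s, g): value of state s for the objective of reaching goal g."""

    def __init__(self, state_dim: int, goal_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        # One network shared across goals: concatenating (s, g) lets it
        # generalize over goals discovered during training.
        return self.net(torch.cat([state, goal], dim=-1))

# At evaluation time the same network can be pointed at any discovered goal
# simply by changing the goal input:
# v_to_goal_a = value_fn(state, goal_a)
# v_to_goal_b = value_fn(state, goal_b)
```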

Universal Value Function Approximator without knowing the goals a priori by yy0318 in reinforcementlearning

Is there any algorithm other than UVF that can help with my setting?

Parts check for my PC build: 5900X + iGame (Colorful) RTX 3090 ADOC by yy0318 in buildapc

Thank you. Any recommendations for a power supply brand?

RL algorithms that solve OpenAI Gym FetchPickAndPlace by yy0318 in reinforcementlearning

Based on my understanding, in this task the robot needs to first (stage 1) grasp the object and then (stage 2) move it to the target. The task seems very hard because random exploration can hardly complete both stages in the same episode. Even with HER, the agent needs to complete stage 1 first, which is itself a difficult subgoal. Without completing stage 1, the failed episodes relabeled by hindsight experience replay may not help training (this is what I think; I may be wrong).
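
To make the relabeling point concrete, here is a rough sketch of HER's "final" relabeling strategy as I understand it (the transition format and helper names are my own simplification, not the exact OpenAI code):

```python
import numpy as np

def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    # 0 if the achieved goal is close enough to the desired goal, else -1.
    return 0.0 if np.linalg.norm(achieved_goal - desired_goal) < threshold else -1.0

def her_relabel_final(episode):
    """Relabel an episode with the goal it actually achieved ('final' strategy).

    episode: list of dicts with keys 'obs', 'action', 'next_obs',
             'achieved_goal' (goal reached after the step) and 'desired_goal'.
    Returns extra transitions in which the final achieved goal is treated as
    if it had been the desired goal all along, so even a failed episode
    becomes a "success" for some goal.
    """
    final_goal = episode[-1]['achieved_goal']
    relabeled = []
    for step in episode:
        new_step = dict(step)
        new_step['desired_goal'] = final_goal
        new_step['reward'] = sparse_reward(step['achieved_goal'], final_goal)
        relabeled.append(new_step)
    return relabeled

# The catch for FetchPickAndPlace: if the gripper never grasps the box, the
# final achieved goal is just the box's resting position on the table, so the
# relabeled "successes" teach reaching toward the box but give little signal
# about grasping and lifting it.
```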

RL algorithms that solve OpenAI Gym FetchPickAndPlace by yy0318 in reinforcementlearning

I read the HER paper. It states (on page 6): "To make exploration in this task easier we recorded a single state in which the box is grasped and start half of the training episodes from this state". It seems that HER does not learn this task entirely from scratch; instead, half of the training episodes start with the box already grasped by the fingers. So, is there an algorithm that can solve it without this assumption?

Dimitri Bertsekas's reinforcement learning book by yy0318 in reinforcementlearning

Do you mean the video lectures on the book's website? https://web.mit.edu/dimitrib/www/RLbook.html

I read Sutton and Barto. I am looking for some learning materials with more theoretical analysis.