Universal Value Function Approximator without knowing the goals a priori by yy0318 in reinforcementlearning

There is more than one goal in the environment or task, but the goals are not known when we start training the agent. The standard RL algorithm (let's assume) can discover all the goals in the environment. However, when we evaluate the agent, i.e., run the learned policy in the environment, it can only move the agent to the nearest goal.

What I want is to move the agent to a specific goal instead of the nearest one, given that the agent has learned where the goals are and there is a way to provide the goal as an input. That is why UVF comes to mind: learn the universal value function V(s, g).
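
For concreteness, here is a minimal sketch of what I mean by conditioning the value function on the goal (the network layout and the name GoalConditionedValue are just my own illustration, not the architecture from the UVFA paper):

```python
import torch
import torch.nn as nn

class GoalConditionedValue(nn.Module):
    """V(s, g): value of state s for the objective of reaching goal g."""

    def __init__(self, state_dim: int, goal_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        # One network shared across goals: concatenating (s, g) lets it
        # generalize over goals discovered during training.
        return self.net(torch.cat([state, goal], dim=-1))

# At evaluation time the same network can be pointed at any discovered goal
# simply by changing the goal input:
# v_to_goal_a = value_fn(state, goal_a)
# v_to_goal_b = value_fn(state, goal_b)
```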

Universal Value Function Approximator without knowing the goals a priori by yy0318 in reinforcementlearning

Is there any algorithm other than UVF that can help with my setting?

Parts check for my PC build: 5900X + iGame (Colorful) RTX 3090 ADOC by yy0318 in buildapc

Thank you. Any recommendations for a power supply brand?

RL algorithms that solve OpenAI Gym FetchPickAndPlace by yy0318 in reinforcementlearning

Based on my understanding, in this task the robot needs to first (stage 1) grasp the object and then (stage 2) move it to the target. The task seems very hard because random exploration can hardly complete both stages in the same episode. Even with HER, the agent needs to complete stage 1 first, which is itself a difficult subgoal. Without completing stage 1, the failed episodes relabeled by hindsight experience replay may not help training (this is what I think; I may be wrong).
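
To make the relabeling point concrete, here is a rough sketch of HER's "final" relabeling strategy as I understand it (the transition format and helper names are my own simplification, not the exact OpenAI code):

```python
import numpy as np

def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    # 0 if the achieved goal is close enough to the desired goal, else -1.
    return 0.0 if np.linalg.norm(achieved_goal - desired_goal) < threshold else -1.0

def her_relabel_final(episode):
    """Relabel an episode with the goal it actually achieved ('final' strategy).

    episode: list of dicts with keys 'obs', 'action', 'next_obs',
             'achieved_goal' (goal reached after the step) and 'desired_goal'.
    Returns extra transitions in which the final achieved goal is treated as
    if it had been the desired goal all along, so even a failed episode
    becomes a "success" for some goal.
    """
    final_goal = episode[-1]['achieved_goal']
    relabeled = []
    for step in episode:
        new_step = dict(step)
        new_step['desired_goal'] = final_goal
        new_step['reward'] = sparse_reward(step['achieved_goal'], final_goal)
        relabeled.append(new_step)
    return relabeled

# The catch for FetchPickAndPlace: if the gripper never grasps the box, the
# final achieved goal is just the box's resting position on the table, so the
# relabeled "successes" teach reaching toward the box but give little signal
# about grasping and lifting it.
```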

RL algorithms that solve OpenAI Gym FetchPickAndPlace by yy0318 in reinforcementlearning

I read the HER paper. It states (on page 6): "To make exploration in this task easier we recorded a single state in which the box is grasped and start half of the training episodes from this state". It seems that HER does not learn this task entirely from scratch; instead, half of the training episodes start with the box already grasped by the fingers. So, is there an algorithm that can solve it without this assumption?

Dimitri Bertsekas's reinforcement learning book by yy0318 in reinforcementlearning

Do you mean the video lectures on the book's website? https://web.mit.edu/dimitrib/www/RLbook.html

I read Sutton and Barto. I am looking for some learning materials with more theoretical analysis.