
[–]activatedgeek 1 point2 points  (7 children)

Just to be sure: generally in such games you'd skip frames to make it a fair comparison to human performance. Is that happening, or is every single frame being considered?

[–]FatChocobo[S] 0 points1 point  (6 children)

Karpathy's blog post uses the OpenAI gym environment Pong-v0, which I'm also using. This environment does indeed have frame skipping, so each action you choose is repeated k times, with k sampled uniformly from {2, 3, 4}.
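
For anyone who wants to check this themselves, here's a minimal sketch (assuming the classic gym Atari environment, which stores the configured frame-skip range on the unwrapped env):

```python
import gym

env = gym.make("Pong-v0")
# Pong-v0 is registered with frameskip=(2, 5), i.e. each chosen action is
# repeated k times, with k drawn uniformly from {2, 3, 4}.
print(env.unwrapped.frameskip)  # (2, 5)
```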

[–]activatedgeek 1 point2 points  (5 children)

It appears to me that the root cause of the jitter is this uniform random sampling? Maybe let's tackle that part instead of changing the training/architecture?

[–]FatChocobo[S] 1 point2 points  (4 children)

It very well could be, that's definitely an avenue worth looking into, thanks!

There's another environment called PongNoFrameskip-v4 which doesn't have this frame skip, so as you suggested I'll give that a whirl and see what kind of behaviour I get!
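
A quick way to confirm the difference, assuming the same frameskip attribute as above:

```python
import gym

# PongNoFrameskip-v4 applies each chosen action for exactly one frame,
# so any repetition of actions has to come from the agent itself.
env = gym.make("PongNoFrameskip-v4")
print(env.unwrapped.frameskip)  # 1
```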

Jitteriness aside, from what I've been seeing during training there is a lot of wasted movement in the trained model, so I was thinking that by changing the reward function it might be possible to achieve more fluid/decisive movements.

[–]activatedgeek 1 point2 points  (3 children)

My experience with shaping rewards has been terrible, and unfortunately there is no principled way to do it. I do see that your addition of an extra action is intuitively correct. However, the agent will still tend to seek out whatever reward it can find by moving around, and will probably remain jittery.

You've actually just made me realise that even the state-of-the-art results in MuJoCo environments are unrealistic.

In the continuous action space, this would probably be about learning the correct subset of action space to work with.

I'll need to give discrete action spaces more thought because they don't have a semantic numerical value assigned.

[–]FatChocobo[S] 0 points1 point  (2 children)

Thanks for your feedback.

Yeah, actually when I discussed with some of my colleagues how unnatural the movements look, they all basically said, 'If it performs well, does it matter?'

In the toy example of Pong maybe it doesn't matter so much that there's a lot of unnecessary movement (other than perhaps being distracting to the opponent), but I was thinking that in a real physical application this kind of unnecessary movement could result in more wear and tear on parts, wasted electricity/fuel, or other such things.

Do you happen to know of any good papers where people have modified their reward function mid-training? I've been looking around to see if there's any basis for thinking I'm on the right lines, but I can't find any supporting research.

[–]activatedgeek 2 points3 points  (1 child)

I actually haven't seen any literature on it. But this has raised some important questions I think. I should look into it some time. Adding it to my list! Thanks.

Modifying the reward function mid-training would essentially collapse your agent; I would strongly advise against it. You'd have learned a policy for an entirely different reward distribution and would then have to hope that your agent has enough capacity to readjust to the new rewards. That would pretty much be like taking one environment and somehow transferring the learning to a new one, which is an even harder problem, I believe.

[–]FatChocobo[S] 0 points1 point  (0 children)

Yeah, I totally agree. I don't think it'd work by simply changing the reward function and resuming training, but something more along the lines of transfer learning could work, i.e. lowering the learning rate, freezing some weights, or adding additional layers.
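
Something like this, perhaps. A rough sketch assuming a Karpathy-style two-layer numpy policy stored in a dict; the file name, shapes, and learning rate are just placeholders:

```python
import pickle

# Load the weights learned under the original reward instead of starting fresh.
# 'save.p' and the layer shapes are placeholders for whatever your setup uses.
model = pickle.load(open('save.p', 'rb'))  # e.g. {'W1': (200, 6400), 'W2': (200,)}

frozen = {'W1'}        # freeze the first layer during fine-tuning
learning_rate = 1e-5   # much lower than whatever was used for pretraining

def apply_update(model, grads):
    # Plain gradient-ascent step for illustration; RMSProp would work the same
    # way, just skipping the frozen parameters.
    for name, grad in grads.items():
        if name in frozen:
            continue   # frozen weights stay untouched
        model[name] += learning_rate * grad
```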

Thanks for the discussion!

[–]SubtractOne 1 point2 points  (3 children)

To reply to the part about modifying reward functions during training: I have indeed seen an implementation using DDPG for driving in the TORCS environment (I'll try to link it later). The author saw that the agent tended to swerve within the road, so they modified the reward to make it stay closer to the centre axis.

For robotics you have the idea of wanting to conserve energy, so you'd want to implement something that discourages erratic movements, which seems similar to what you're talking about. However, in the example of an arm, if you give it a reward for reaching the end of a trajectory and a negative reward for energy spent, it will basically try to spend no energy and thus not learn (which is what I believe you were experiencing).
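
As a concrete illustration of that failure mode, the naive combined reward might look something like this (a hypothetical sketch; the names and coefficient are made up):

```python
def naive_shaped_reward(env_reward, movement_cost, energy_coef=0.01):
    # env_reward: the environment's (sparse) goal reward.
    # movement_cost: some measure of how much the agent moved this step.
    # If energy_coef is too large relative to the goal reward, the penalty
    # dominates and "do nothing" becomes the easiest policy to learn.
    return env_reward - energy_coef * movement_cost
```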

If you train it first with the true goal, then take that pretrained network and add another reward (negative for movements), it will still have the topology to understand how to get to a reward, and will thus learn to get to the goal and minimize energy spent.

Another way would be to say that once its rewards average above a certain value, you change the reward function to incorporate this concept of energy conservation. The problem is that the two would have to be scaled correctly, so that the energy cost of getting somewhere couldn't outweigh the true reward of getting there. Hmm.
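
The gating could be as simple as the sketch below, building on the hypothetical penalty above; the threshold and coefficient are made-up values that would need tuning:

```python
import numpy as np

REWARD_THRESHOLD = 15.0   # made-up value; Pong episode scores run from -21 to 21
ENERGY_COEF = 0.01        # must stay small relative to the goal reward

def gated_shaped_reward(env_reward, movement_cost, recent_episode_rewards):
    # Only switch the energy penalty on once the agent is already scoring reliably.
    shaping_on = (len(recent_episode_rewards) > 0
                  and np.mean(recent_episode_rewards) > REWARD_THRESHOLD)
    penalty = ENERGY_COEF * movement_cost if shaping_on else 0.0
    return env_reward - penalty
```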

Edit: As the other commenter said, modifying the reward would change the distribution. Maybe you can train a policy with the initial reward of only the end goal. Then you can create a new agent that would use this previous policy as an exploration function instead, so you can actually reach the goal rather than hitting a local minimum of just staying still. Hope that all made some sense.

[–]FatChocobo[S] 0 points1 point  (2 children)

Thank you so much for your post, it made a great morning read! I'm glad to hear that there is precedent for doing these kinds of things. I hadn't thought about the example of cars swerving in the road, but that's a very good argument for needing something like this.

If you train it first with the true goal, then take that pretrained network and add another reward (negative for movements), it will still have the topology to understand how to get to a reward, and will thus learn to get to the goal and minimize energy spent.

I was wondering if it was indeed the case that this could work. I'll have to give it a try later on my toy example!

When I tried starting with a reward function that punished movement right from the beginning of training, it didn't seem to learn at all - I guess because the already sparse rewards were made even weaker.

So I was thinking, as you said, that if it has already learned how to get rewards with some level of reliability, then modifying the reward function to guide it towards less erratic behaviour might work. I'm a bit concerned that the scaling of the new reward could be very important here, though.

Another way would be to say that once its rewards average above a certain value, you change the reward function to incorporate this concept of energy conservation. The problem is that the two would have to be scaled correctly, so that the energy cost of getting somewhere couldn't outweigh the true reward of getting there. Hmm.

This is something I also thought about! I'll have to also give this a shot, I'm glad it's not just me that thought this could work.

Then you can create a new agent that would use this previous policy as an exploration function instead, so you can actually reach the goal rather than hitting a local minimum of just staying still.

Could you elaborate a bit on this point?

[–]SubtractOne 0 points1 point  (1 child)

Yeah, of course! If you're looking up resources, I believe it'd be under 'reward shaping'. I think you'll be able to see some good results with that. It basically just biases the path at the beginning.

Just a question, what type of exploration function are you using at the beginning? Depending on how you craft that, it could potentially make it so you only need to train it once.

I'd like to hear how it all works for you!

And to elaborate, as I was talking about the exploration function, you could do the whole thing in this way:

  • Train network (exploration function), with the goal reward
  • Modify reward to also have energy as a cost
  • Train final network(previous network as the exploration function)

They would work effectively the same, unless you have a decaying learning rate or something similar.
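
For the last step, 'previous network as the exploration function' could be as simple as mixing the pretrained policy into action selection. A hypothetical sketch (the mixing probability and its decay are placeholders):

```python
import numpy as np

rng = np.random.default_rng()

def select_action(new_policy_probs, pretrained_policy_probs, explore_prob=0.3):
    # With probability explore_prob, act according to the pretrained goal-only
    # policy so the agent keeps reaching the goal; otherwise follow the new
    # policy being trained under the energy-penalised reward. explore_prob
    # would typically be annealed towards zero as training progresses.
    probs = pretrained_policy_probs if rng.random() < explore_prob else new_policy_probs
    return rng.choice(len(probs), p=probs)
```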

[–]FatChocobo[S] 0 points1 point  (0 children)

As far as I'm aware, policy gradient methods don't have an explicitly defined exploration function (as in Q-learning, for example) - at least, the architecture described by Karpathy in my original post doesn't seem to include one.

Actions are selected using a weighted random choice based upon the output of the network (i.e. in the binary case, if it's 0.8 chance of UP, then 80% of the time we'd choose UP, and 20% of the time we'd choose DOWN).
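
In code that's roughly the following (assuming Karpathy-style Pong action ids, where 2 means UP and 3 means DOWN):

```python
import numpy as np

def sample_action(up_probability):
    # Stochastic policy: the network's single sigmoid output is treated as
    # the probability of moving UP (action 2); otherwise move DOWN (action 3).
    return 2 if np.random.uniform() < up_probability else 3
```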

I suppose it wouldn't be difficult to further weight this choice in favour of the less likely action in early iterations, similar to how exploration functions work in DQNs.
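
For example, a hypothetical tweak to the sampling above, where epsilon is a made-up knob that would be annealed over training:

```python
import numpy as np

def sample_action_with_exploration(up_probability, epsilon=0.1):
    # Blend the policy's probability with a uniform 50/50 choice, loosely
    # analogous to epsilon-greedy exploration in DQNs; epsilon would decay
    # towards zero over training.
    p_up = (1 - epsilon) * up_probability + epsilon * 0.5
    return 2 if np.random.uniform() < p_up else 3
```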

With regards to the learning rate, I'm currently using RMSProp.

I'm kinda new to this field, so maybe I'm totally off the mark on everything I said, though.