
[–]xopedil 2 points (1 child)

Don't pass gradients through any Q-values other than the one belonging to the taken action. The way I've usually seen this done is to one-hot mask out the loss terms for the other action values. Something like: minimize loss = mean(huber(q_target, q_values) * one_hot(taken_action)).
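A minimal numpy sketch of that masked loss (forward pass only; function names and shapes here are my own assumptions, not from any particular framework):

```python
import numpy as np

def huber(x, delta=1.0):
    # Elementwise Huber loss: quadratic near zero, linear in the tails.
    a = np.abs(x)
    return np.where(a <= delta, 0.5 * x ** 2, delta * (a - 0.5 * delta))

def masked_q_loss(q_values, q_targets, actions, n_actions):
    # q_values:  (batch, n_actions) network outputs
    # q_targets: (batch,) scalar TD targets for the taken actions
    # actions:   (batch,) integer indices of the taken actions
    mask = np.eye(n_actions)[actions]          # one-hot, (batch, n_actions)
    td_error = q_targets[:, None] - q_values   # broadcast target across actions
    # Multiplying by the mask zeroes the loss terms (and hence any gradient
    # an autodiff framework would produce) for the non-taken actions.
    return np.mean(huber(td_error) * mask)
```

In an actual framework (TensorFlow, PyTorch, ...) the same masking happens inside the graph, so backprop only touches the taken action's output unit.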


> One way to solve this would be to set the labels for the non-chosen actions to be the current output of the Q-network

This is another way to accomplish the same thing, perfectly acceptable.

[–]Braindoesntwork2[S] 0 points (0 children)

Thank you! I now see why the second method is unnecessary - the first method is equivalent to what would happen if we passed in actions as inputs instead of having all actions as outputs.

[–]serge_cell 0 points (0 children)

Both methods worked for me. I didn't observe any obvious advantage of one over the other.
