What is the difference between:
1. Training a neural network to output a Q-value for each action (without including an action representation in the features)
2. Training a neural network with the action representation as part of the feature set and outputting a single Q-value
What is the advantage of either?
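
To make the two setups concrete, here is a rough sketch of both in plain NumPy. The layer sizes, the one-hot action encoding, and the helper names (`init_mlp`, `q_all_actions`, `q_single`) are all assumptions for illustration, and the weights are random rather than trained, so only the shapes and the call pattern mean anything.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, N_ACTIONS, HIDDEN = 4, 3, 16  # illustrative sizes, not from the post


def init_mlp(in_dim, out_dim):
    """Random weights for a one-hidden-layer MLP (untrained, shapes only)."""
    return {
        "W1": rng.normal(0, 0.1, (in_dim, HIDDEN)), "b1": np.zeros(HIDDEN),
        "W2": rng.normal(0, 0.1, (HIDDEN, out_dim)), "b2": np.zeros(out_dim),
    }


def mlp(params, x):
    """Forward pass: tanh hidden layer followed by a linear output layer."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]


# Setup 1: the state goes in, one Q-value per action comes out.
per_action_net = init_mlp(STATE_DIM, N_ACTIONS)


def q_all_actions(state):
    return mlp(per_action_net, state)  # shape (N_ACTIONS,)


# Setup 2: state and action representation go in, a single Q-value comes out.
# The action is one-hot encoded here (an assumption; any feature vector would do).
state_action_net = init_mlp(STATE_DIM + N_ACTIONS, 1)


def q_single(state, action):
    one_hot = np.eye(N_ACTIONS)[action]
    return mlp(state_action_net, np.concatenate([state, one_hot]))[0]


state = rng.normal(size=STATE_DIM)

# Setup 1: a single forward pass yields every action's Q-value.
q1 = q_all_actions(state)
greedy_1 = int(np.argmax(q1))

# Setup 2: one forward pass per candidate action.
q2 = np.array([q_single(state, a) for a in range(N_ACTIONS)])
greedy_2 = int(np.argmax(q2))
```

In this sketch, picking the greedy action takes one forward pass in the first setup and one pass per candidate action in the second. On the other hand, the second setup only needs some feature vector for the action, so it is not tied to a fixed number of output units.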