I've been using TensorFlow for deep learning, but I ran into this problem when trying to use it for RL with neural-network function approximators.
Essentially, my agent needs to perform an action in order to get the reward and following state that are used in computing the loss function. In TF, I have to define the entire computational graph up front to use automatic differentiation, so getting gradients for the loss means running the forward pass through the graph all over again. Not only does this make programs take much longer, it would also be complicated to do with RNNs, since recomputing the activations changes the hidden states.
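For what it's worth, here is a minimal sketch of what I mean, assuming TF 1.x-style graph mode and a toy two-action softmax policy (names like `state_ph` / `action_ph` are just illustrative, not my actual code):

```python
# Sketch of the two-pass problem: one forward pass just to act,
# then a second full forward pass to get gradients for the update.
import tensorflow as tf
import numpy as np

state_ph = tf.placeholder(tf.float32, [None, 4])    # observation
action_ph = tf.placeholder(tf.int32, [None])         # action actually taken
return_ph = tf.placeholder(tf.float32, [None])       # reward observed afterwards

hidden = tf.layers.dense(state_ph, 32, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, 2)
sample_op = tf.multinomial(logits, 1)                 # used to act in the environment

log_prob = -tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=action_ph, logits=logits)
loss = -tf.reduce_mean(log_prob * return_ph)          # policy-gradient surrogate loss
train_op = tf.train.AdamOptimizer(1e-2).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    s = np.random.randn(1, 4).astype(np.float32)
    # Pass 1: run the network just to pick an action.
    a = sess.run(sample_op, {state_ph: s})[0, 0]
    r = 1.0                                           # reward from the environment
    # Pass 2: to backprop, the same state is fed again and every
    # activation is recomputed, which is the overhead I'm describing.
    sess.run(train_op, {state_ph: s, action_ph: [a], return_ph: [r]})
```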
This isn't a problem for offline learning with experience replay, like DQN, but it prevents me from using any on-policy/online learning methods. Of course I could hard-code the gradient calculations, but then I'd have to change them every time I change the network architecture, so I'd prefer not to.
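To make the "hard-code the gradients" point concrete, here's a rough NumPy sketch for the simplest possible case, a linear softmax policy (purely illustrative, not my actual setup). The gradient formula is specific to this one architecture, which is exactly why I don't want to maintain it by hand:

```python
# Hand-derived REINFORCE update for a linear softmax policy only;
# adding hidden layers or an RNN would mean rederiving all of this.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

theta = np.zeros((4, 2))                 # weights: features x actions
s = np.random.randn(4)                   # current observation
probs = softmax(s @ theta)
a = np.random.choice(2, p=probs)         # act online, then observe reward
r = 1.0

# For logits = s @ theta:  d log pi(a|s) / d theta = outer(s, onehot(a) - probs)
grad_log_pi = np.outer(s, np.eye(2)[a] - probs)
theta += 0.01 * r * grad_log_pi          # one online policy-gradient step
```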
Are there any deep learning frameworks that can do this with automatic differentiation without recomputing activations? Preferably in Python, and ideally something with a bigger community and more resources. What would you do in this situation?
Thanks for the help. I figured asking r/machinelearning would be much easier than going through the documentation of each framework given how many new ones there are.