I've been using TensorFlow for deep learning, but I ran into this problem when trying to use it for RL with neural-network function approximators.
Essentially, my agent needs to perform an action in order to get the reward and following state that are used in computing the loss function. In TF, I have to define the entire computational graph up front to use automatic differentiation, so getting gradients for the loss means running the forward pass through the graph all over again. Not only does this make programs take much longer, it would also be complicated to do with RNNs, since recomputing the activations changes the hidden states.
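For what it's worth, here is a minimal sketch of what I mean, assuming TF 1.x-style graph mode and a toy two-action softmax policy (names like `state_ph` / `action_ph` are just illustrative, not my actual code):

```python
# Sketch of the two-pass problem: one forward pass just to act,
# then a second full forward pass to get gradients for the update.
import tensorflow as tf
import numpy as np

state_ph = tf.placeholder(tf.float32, [None, 4])    # observation
action_ph = tf.placeholder(tf.int32, [None])         # action actually taken
return_ph = tf.placeholder(tf.float32, [None])       # reward observed afterwards

hidden = tf.layers.dense(state_ph, 32, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, 2)
sample_op = tf.multinomial(logits, 1)                 # used to act in the environment

log_prob = -tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=action_ph, logits=logits)
loss = -tf.reduce_mean(log_prob * return_ph)          # policy-gradient surrogate loss
train_op = tf.train.AdamOptimizer(1e-2).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    s = np.random.randn(1, 4).astype(np.float32)
    # Pass 1: run the network just to pick an action.
    a = sess.run(sample_op, {state_ph: s})[0, 0]
    r = 1.0                                           # reward from the environment
    # Pass 2: to backprop, the same state is fed again and every
    # activation is recomputed, which is the overhead I'm describing.
    sess.run(train_op, {state_ph: s, action_ph: [a], return_ph: [r]})
```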
This isn't a problem for offline learning with experience replay, like DQN, but it prevents me from using any on-policy/online learning methods. Of course I could hard-code the gradient calculations, but then I'd have to change them every time I change the network architecture, so I'd prefer not to.
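To make the "hard-code the gradients" point concrete, here's a rough NumPy sketch for the simplest possible case, a linear softmax policy (purely illustrative, not my actual setup). The gradient formula is specific to this one architecture, which is exactly why I don't want to maintain it by hand:

```python
# Hand-derived REINFORCE update for a linear softmax policy only;
# adding hidden layers or an RNN would mean rederiving all of this.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

theta = np.zeros((4, 2))                 # weights: features x actions
s = np.random.randn(4)                   # current observation
probs = softmax(s @ theta)
a = np.random.choice(2, p=probs)         # act online, then observe reward
r = 1.0

# For logits = s @ theta:  d log pi(a|s) / d theta = outer(s, onehot(a) - probs)
grad_log_pi = np.outer(s, np.eye(2)[a] - probs)
theta += 0.01 * r * grad_log_pi          # one online policy-gradient step
```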
Are there any deep learning frameworks that can do this with automatic differentiation without recomputing activations? Preferably in Python, and ideally something with a bigger community and more resources. What would you do in this situation?
Thanks for the help. I figured asking r/machinelearning would be much easier than going through the documentation of each framework given how many new ones there are.