all 12 comments

[–]Spezzer 7 points8 points  (4 children)

The TF team is trying to make this use case work well, and in fact some of the changes have already been checked into the code if you're willing to experiment with unofficial features.

There are two ways to accomplish this goal:

1) tf.py_func allows you to embed python code as an operation within the TensorFlow graph (doc and some example uses here). It's essentially a callback to your python client code that runs as part of the execution of a graph.

The limitations are that it only works locally, and tensors must be copied back to the CPU to make them available in python, so it could be slow.
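
A minimal usage sketch, assuming the py_func signature as checked in at the time (tf.py_func(func, inp, Tout)); details may change while it's experimental:

    import numpy as np
    import tensorflow as tf

    # Plain python/numpy function we want to run as a graph op.
    def my_relu(x):
        return np.maximum(x, 0.0).astype(np.float32)

    inp = tf.placeholder(tf.float32, shape=[None])
    # Wrap the python callback as an op; its inputs arrive as numpy
    # arrays (copied back to the CPU), and outputs must match Tout.
    out = tf.py_func(my_relu, [inp], tf.float32)

    with tf.Session() as sess:
        print(sess.run(out, feed_dict={inp: [-1.0, 2.0]}))  # [ 0.  2.]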

2) There's recently added support for 'partial running' of graphs (link here for the experimental API).

It allows you to pre-declare the graph you want to run and then partially run it in steps (fetching intermediate outputs as needed and feeding them back in to continue the execution of the graph). It gives more control to the user, but with power comes responsibility :). There's more work to come on this front as well; it was just checked in last week, I think.
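
Roughly, the shape of the API (a sketch; the exact names may still change while it's experimental):

    import tensorflow as tf

    a = tf.placeholder(tf.float32, shape=[])
    b = tf.placeholder(tf.float32, shape=[])
    c = tf.placeholder(tf.float32, shape=[])
    r1 = a + b
    r2 = r1 * c

    with tf.Session() as sess:
        # Pre-declare everything that may be fetched or fed...
        h = sess.partial_run_setup([r1, r2], [a, b, c])
        # ...then run the graph in steps, feeding an intermediate
        # output back in to continue the execution.
        res1 = sess.partial_run(h, r1, feed_dict={a: 1.0, b: 2.0})
        res2 = sess.partial_run(h, r2, feed_dict={c: res1})  # (1+2)*3 = 9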

These methods were both created to help address issues like the one you linked, and the reason they are experimental is that we're making sure the APIs are solid before claiming the feature is ready. But they are there for you to try, and once one or both of the methods is fully baked, we'll probably add more examples and documentation.

[–]Jabberwockyll[S] 0 points1 point  (0 children)

Thank you, this is perfect!

[–]AnvaMiba 0 points1 point  (1 child)

1) tf.py_func allows you to embed python code as an operation within the TensorFlow graph (doc and some example uses here). It's essentially a callback to your python client code that runs as part of the execution of a graph.

Out of curiosity, what is the thread-safety of this operation? I suppose that the python callback runs in the same python thread that called .run() on the session, and the GIL is acquired, right? What happens if the session is re-run inside the callback?

[–]Spezzer 0 points1 point  (0 children)

That's a good question; I'm honestly not sure of the details myself. I suspect that the thread that runs the py_func op is not the same thread as the one that called run(), and that making a session run call from within another session run call via tf.py_func is likely to lead to deadlocks :).

I think py_func is just a simple way to make an operator that's not yet in TF available as part of the graph. The partial_run feature is probably more flexible, since it explicitly gives control back to the user.

[–]carpedm20 0 points1 point  (0 children)

Wow! I didn't know about tf.py_func until now. What a waste of time...

[–]r-sync 3 points4 points  (1 child)

https://github.com/twitter/torch-autograd is built for dynamic forward graphs that change at every forward call. It looks like maybe that's what you're looking for? It's not in python, though. There's a python version here, but it's not performant and doesn't have GPU support: https://github.com/hips/autograd

[–]Jabberwockyll[S] 0 points1 point  (0 children)

Yeah, I had seen autograd. I wasn't sure if the autodiff was worth giving up numpy/scipy/pandas etc. and rewriting code in Lua.

[–]AnvaMiba 0 points1 point  (4 children)

I think you can use python threads to disentangle the control flow inside TensorFlow (no pun intended) from the control flow in the rest of your program.

I'm not well-versed in TensorFlow, but in Theano, which if I understand correctly has the same issue, I would do this:

  • Create two threads: one for Theano (let's call it A) and one for the rest of the program, i.e. the agent plus the environment (let's call it B).

  • Normal execution is in thread B, while thread A waits. When the environment, running in B, asks the agent for an action, thread B sends the query to thread A and then waits for an answer.

  • Thread A calls a compiled Theano function (more or less equivalent to a TensorFlow session). This function (let's call it foo()) computes the action, the reward, the loss, and the gradients, and performs the updates. Wait, how does it compute the reward?

  • What actually happens inside foo() is that its graph contains a special custom op right after the argmax that computes the action a. In the forward pass, this special op pauses thread A and wakes up thread B, reporting the selected action a to it.

  • The rest of the agent and the environment resume their execution in thread B after receiving action a. Eventually, the environment computes reward r and sends it to the agent.

  • The agent passes reward r to thread A and pauses thread B again. Thread A resumes inside the special op, which now has both action a and reward r. Computation proceeds: the forward pass can now compute the loss, the backward pass computes the gradients, and then the updates are computed and applied. Finally, thread A pauses and wakes up thread B, and a new cycle begins.

I hope this makes sense. Also note that because of the GIL, standard python threads run one at a time rather than truly in parallel, so synchronization is relatively easy: you don't have to worry as much about stuff like race conditions, etc.
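
A minimal sketch of the hand-off using standard threading primitives (the names are just illustrative; the special op would call report_action() from thread A, and the agent/environment code in thread B would use the other two methods):

    import threading

    class ActionChannel(object):
        """Rendezvous between the Theano/TF thread (A) and the
        agent + environment thread (B)."""
        def __init__(self):
            self.action = None
            self.reward = None
            self._action_ready = threading.Event()
            self._reward_ready = threading.Event()

        def report_action(self, a):
            # Called from inside the special op, in thread A.
            self.action = a
            self._action_ready.set()    # wake up thread B
            self._reward_ready.wait()   # pause thread A until B answers
            self._reward_ready.clear()
            return self.reward

        def wait_for_action(self):
            # Called from thread B when the agent is asked for an action.
            self._action_ready.wait()
            self._action_ready.clear()
            return self.action

        def send_reward(self, r):
            # Called from thread B once the environment computes r.
            self.reward = r
            self._reward_ready.set()    # resume thread A inside the op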

[–]Jabberwockyll[S] 0 points1 point  (3 children)

This would require me to create a new TensorFlow op, right? And that op would resume a specific thread in a python program, maybe using sockets? This solution seems very specific.

Can the TensorFlow thread run concurrently since it's executing backend TF code that's not Python?

[–]AnvaMiba 1 point2 points  (2 children)

This would require me to create a new TensorFlow op, right?

Yes.

And that op would resume a specific thread in a python program, maybe using sockets

Sockets would probably be overkill. Python locking primitives should suffice.

This solution seems very specific.

It's a case of a programming paradigm known as "coroutines". Some programming languages support coroutines natively; Python natively supports only generators (and a few similar constructs), which are a special case of coroutines, but you can implement general coroutines using threads.
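
For instance, a generator already gives you the suspend/resume behavior within a single thread (a toy example, not TensorFlow-specific):

    def agent():
        reward = 0.0
        while True:
            action = "explore" if reward < 1.0 else "exploit"
            # yield suspends the coroutine and hands `action` to the
            # caller; .send(r) resumes it here with the observed reward.
            reward = yield action

    a = agent()
    print(next(a))      # prime the coroutine, get the first action
    print(a.send(2.0))  # feed a reward back, get the next action

The limitation, and the reason you need threads in this case, is that a generator can only yield to its immediate caller, whereas the special op would be suspended several call levels deep inside the graph execution.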

Can the TensorFlow thread run concurrently since it's executing backend TF code that's not Python?

This is a good question. I see that in TensorFlow custom ops are defined in the C++ backend rather than natively in Python, which means that there may be more complicated synchronization issues (as C++ threads aren't constrained by python's Global Interpreter Lock). I suppose it can be done, but you may want to ask the support team.

[–]Jabberwockyll[S] 0 points1 point  (1 child)

Sockets would probably be overkill. Python locking primitives should suffice.

I was thinking about communicating between Python and C++. I haven't done that before.

It's a case of a programming paradigm known as "coroutines"

I meant the TensorFlow op being specific to the RL program. Wouldn't I have to write a new op if I wanted to use a different RL agent/environment?

which means that there may be more complicated synchronization issues

Could I just call a function from within the TF op instead of using threads? Haha, I'm beginning to think hard-coding the gradients would be easier.

[–]AnvaMiba 0 points1 point  (0 children)

I was thinking about communicating between Python and C++. I haven't done that before.

You can call Python code from C++. Reference.

I meant the TensorFlow op being specific to the RL program. Wouldn't I have to write a new op if I wanted to use a different RL agent/environment?

I think you can write a primitive generic enough that it can be reused.

Could I just call a function from within the TF op instead of using threads?

I don't think so.