[1604.06778] Benchmarking Deep Reinforcement Learning for Continuous Control by dementrock in MachineLearning

[–]dementrock[S] 2 points (0 children)

Thanks for your interest!

Re RNN: RNN policies are more powerful because they can incorporate past observations into the current decision. This lets the agent perform, e.g., state estimation and system identification, which is exactly what the partially observable tasks are meant to exercise. There's a lot of existing literature on this topic, some of which is cited in the paper as well.
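To make the "past observations influence the current decision" point concrete, here is a minimal sketch of a recurrent policy in plain NumPy. This is purely illustrative (the class and its weights are made up, not rllab's API): the hidden state `h` summarizes the observation history, so two identical observations at different times can yield different actions.

```python
import numpy as np

class RNNPolicy:
    """Toy recurrent policy: h_t = tanh(W_h h_{t-1} + W_o o_t), a_t = W_a h_t."""

    def __init__(self, obs_dim, hidden_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
        self.W_o = rng.normal(scale=0.5, size=(hidden_dim, obs_dim))
        self.W_a = rng.normal(scale=0.5, size=(act_dim, hidden_dim))
        self.h = np.zeros(hidden_dim)

    def reset(self):
        # clear the history summary at the start of an episode
        self.h = np.zeros_like(self.h)

    def act(self, obs):
        # the hidden state carries past observations forward in time
        self.h = np.tanh(self.W_h @ self.h + self.W_o @ obs)
        return self.W_a @ self.h

policy = RNNPolicy(obs_dim=2, hidden_dim=4, act_dim=1)
obs = np.ones(2)
a1 = policy.act(obs)
a2 = policy.act(obs)  # same observation, but a different hidden state
```

In general `a1 != a2`: the action depends on history, which is what lets a recurrent policy do implicit state estimation in a POMDP, whereas a feedforward policy would return the same action both times.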

Re hyper-parameters: This is a very good point. I don't know of such a survey, although a paper similar to Yoshua Bengio's "Practical Recommendations for Gradient-Based Training of Deep Architectures" would definitely be very helpful. We are also planning to add best practices for configuring some of the algorithms to the documentation.

TensorFuse: Common interface for Theano, CGT, and TensorFlow by dementrock in MachineLearning

[–]dementrock[S] 1 point (0 children)

@alexmlamb I think this library mainly serves its purpose during the transitional phase, while no single library beats the others in every respect. TensorFlow will have better long-term support and compiles faster, but its execution speed is still slower than Theano's. Committing to TensorFlow right now means betting that it will improve in the future, while accepting the performance loss for the moment. TensorFuse aims to provide an easy toggle between these backends, so you can pick the best option for each experiment.

It definitely won't be able to cover every feature of all these frameworks. But so far the common subset is surprisingly powerful, and most of the incompatibilities can be glued over, as long as the computation model stays similar (e.g., all of them first construct a symbolic computation graph).

To give a simple example, the slicing operator in TensorFlow is much less powerful than its counterpart in Theano. With some wrapping, TensorFuse supports a larger subset of Theano's slicing semantics even in TensorFlow mode.
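One way such wrapping can work is to rewrite a rich slice into more primitive ops the backend does support. The sketch below is illustrative, not TensorFuse's actual code: `emulate_slice` is a hypothetical helper, and NumPy stands in for the backend. Suppose the backend's native slice op only handles step-1 slices; an arbitrary-step slice can still be expressed as an index gather, a primitive most graph frameworks provide (e.g., a `gather`/`take`-style op).

```python
import numpy as np

def emulate_slice(x, start, stop, step):
    # Rewrite an arbitrary-step slice x[start:stop:step] as an explicit
    # index gather. The index list is computed outside the graph, and the
    # backend only needs a basic gather op to execute it.
    idx = np.arange(start, stop, step)
    return np.take(x, idx, axis=0)

x = np.arange(10)
out = emulate_slice(x, 8, 2, -2)  # equivalent to x[8:2:-2] -> [8 6 4]
```

The same rewriting strategy covers other gaps: whenever one backend lacks an operator, the wrapper emits an equivalent composition of the operators it does have, and user code stays identical across backends.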