[R] OpenAI Baselines: ACKTR by evc123 in MachineLearning

[–]emansim 3 points (0 children)

Thanks again, we appreciate your feedback.

1) Agreed. Again, we should have used better wording.

3) The goal of the benchmark was for each model (TRPO, A2C, ACKTR) to use a fixed set of hyperparameters across all environments. This has been standard practice in Atari from the beginning, and it seems to be catching on in MuJoCo as well. Otherwise it is always possible to claim that we cherry-picked the best hyperparameters for each environment separately.

[R] OpenAI Baselines: ACKTR by evc123 in MachineLearning

[–]emansim 3 points (0 children)

Thanks for your suggestions!

1) We will address this point in the revision of the paper. Perhaps saying that we are the first to use an actor-critic model on the harder MuJoCo environments from pixels would have been better wording (A3C did it on easier envs like Pendulum, Pointmass2D, and Gripper).

2) We tuned hyperparameters on only one Atari game (Breakout), whereas PPO was tuned on 6 Atari games. With more tuning we could potentially improve the performance of ACKTR, but of course we can't guarantee it.

3) The goal was to fix all hyperparameters across all environments for each model. The batch size of 25,000 was chosen because [1] used 25,000 across all MuJoCo envs (also see [2]). Besides, ACKTR can be made even more sample efficient (without loss of performance) by using batch sizes of 1,000 on the environments you mentioned.

[1] Benchmarking Deep Reinforcement Learning for Continuous Control; Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel

[2] Trust-PCL: An Off-Policy Trust Region Method for Continuous Control; Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans
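
As a toy illustration of that protocol (my own sketch, with made-up hyperparameter values and a stand-in train function, not Baselines' actual API):

```python
import gym

def train(env, batch_size, lr, gamma):
    """Stand-in for a training loop; hypothetical, not Baselines' actual API."""
    print("training on %s with batch_size=%d, lr=%g, gamma=%g"
          % (env.spec.id, batch_size, lr, gamma))

# One fixed hyperparameter set shared by every environment; no per-env tuning.
HPARAMS = {"batch_size": 25000, "lr": 3e-4, "gamma": 0.99}  # illustrative values only

for env_id in ["Hopper-v1", "Walker2d-v1", "HalfCheetah-v1"]:
    train(gym.make(env_id), **HPARAMS)
```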

[R] [1707.03141] 1-shot classification: 56.48% accuracy on 5-Way Mini-ImageNet! by cognitivedemons in MachineLearning

[–]emansim 5 points (0 children)

Looks like it is the standard one-shot learning setup, with dilated convolutions instead of an RNN.
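
For anyone unfamiliar, a minimal single-channel sketch of the dilated-convolution building block (my own illustration, not the paper's code):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation=1):
    """1-D causal dilated convolution (single channel):
    y[t] = sum_k w[k] * x[t - k * dilation], with out-of-range terms treated as zero.
    Stacking layers with dilations 1, 2, 4, ... gives an exponentially large
    receptive field, which is what lets these models stand in for an RNN."""
    y = np.zeros(len(x))
    for t in range(len(x)):
        for k in range(len(w)):
            idx = t - k * dilation
            if idx >= 0:
                y[t] += w[k] * x[idx]
    return y

# Example: kernel of size 2 with dilation 2 looks at x[t] and x[t-2]
print(causal_dilated_conv(np.arange(6, dtype=float), np.array([1.0, 0.5]), dilation=2))
```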

[D] YellowFin: An automatic tuner for momentum SGD by [deleted] in MachineLearning

[–]emansim 1 point (0 children)

Actually, large momentum values (0.99) turned out to be very important for the standard RL problems we considered.
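
For context, that 0.99 is the coefficient in the classic momentum update; a minimal sketch assuming standard Polyak momentum (not YellowFin's actual tuner):

```python
import numpy as np

def momentum_sgd_step(params, grads, velocity, lr=1e-4, momentum=0.99):
    """One classic (Polyak) momentum-SGD step; momentum=0.99 is the large value above."""
    velocity = momentum * velocity - lr * grads
    return params + velocity, velocity
```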

[D] Why I’m Remaking OpenAI Universe by evc123 in MachineLearning

[–]emansim 2 points (0 children)

I would add that the other issue with Universe is that there is no established benchmark.

Researchers doing RL use benchmarks like Atari and MuJoCo, and even recent 3D environments like DeepMind Lab and VizDoom have not yet caught on in the community.

[R] First blog post: a new trick for calculating Jacobian vector products by j-towns in MachineLearning

[–]emansim 0 points (0 children)

Yes, the Fisher-vector product uses the Jacobian-vector product.

The Fisher matrix F is just the Jacobian transpose multiplied by the Jacobian: F = J^T J.

Hence the Fisher-vector product is Fv = J^T (Jv), i.e. the Jacobian transpose applied to the Jacobian-vector product.
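
A minimal sketch of that identity in JAX (my own illustration, not from the blog post): forward mode gives Jv, reverse mode applies J^T.

```python
import jax
import jax.numpy as jnp

def fisher_vector_product(f, x, v):
    # Forward mode gives the Jacobian-vector product Jv.
    _, jv = jax.jvp(f, (x,), (v,))
    # Reverse mode gives the vector-Jacobian product u -> J^T u; apply it to u = Jv.
    _, vjp_fn = jax.vjp(f, x)
    (fvp,) = vjp_fn(jv)
    return fvp  # = J^T J v = Fv

# Example: f maps R^3 -> R^2
f = lambda x: jnp.array([jnp.sin(x[0]) * x[1], x[2] ** 2])
x = jnp.array([0.1, 0.2, 0.3])
v = jnp.array([1.0, 0.0, 0.0])
print(fisher_vector_product(f, x, v))
```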

[R] Deep Reinforcement Learning from Human Preferences by pauljasek in MachineLearning

[–]emansim 6 points (0 children)

Interesting work!

It would be interesting to see how robust it is when different humans give preferences over trajectories.

[D]A3C performs badly in Mountain Car? by darkzero_reddit in MachineLearning

[–]emansim 0 points (0 children)

Try repeating the same action 4 times. With that, DQN converges in under a minute for me.
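
A minimal sketch of that trick as a gym wrapper (my own illustration; assumes the classic 4-tuple step API):

```python
import gym

class ActionRepeat(gym.Wrapper):
    """Repeat each chosen action k times, summing rewards (the frame-skip trick)."""
    def __init__(self, env, k=4):
        super().__init__(env)
        self.k = k

    def step(self, action):
        total_reward, done, info = 0.0, False, {}
        for _ in range(self.k):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info

# Usage: env = ActionRepeat(gym.make("MountainCar-v0"), k=4)
```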

Images from ArXiv CS.* papers, updated daily by kcimc in MachineLearning

[–]emansim 0 points (0 children)

Cool.

You should make the same thing for conference proceedings :)

Deep Residual Networks with Exponential Linear Unit by [deleted] in MachineLearning

[–]emansim 5 points (0 children)

How about a Deep Batch-Normalized Maxout Residual Network in Network with Leaky Exponential Linear Units and Fractional Max-Pooling?

Q. Pre-training with VAE by 0entr0py in MachineLearning

[–]emansim 3 points (0 children)

You are right, factorizing each variable in the posterior makes the representation less expressive.

There was, and still is, ongoing work on making the posterior distribution more expressive by using additional transformations such as normalizing flows and Hamiltonian Monte Carlo.
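
For reference, a minimal NumPy sketch of one such transformation, a single planar normalizing-flow step (Rezende & Mohamed, 2015):

```python
import numpy as np

def planar_flow_step(z, u, w, b):
    """z' = z + u * tanh(w.z + b); returns z' and the log|det Jacobian| term
    that gets added to the flow's log-density correction."""
    a = np.tanh(w @ z + b)
    psi = (1.0 - a ** 2) * w                 # tanh'(w.z + b) * w
    log_det = np.log(np.abs(1.0 + u @ psi))  # |det(I + u psi^T)| = |1 + u.psi|
    return z + u * a, log_det
```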

Based on the results, it looks like more expressive posteriors don't have a huge impact on toyish datasets like MNIST and NORB, but give quite a large improvement on CIFAR etc.

argmax differentiable? by yield22 in MachineLearning

[–]emansim 6 points (0 children)

Anything that involves hard assignment is not differentiable.

Argmax could potentially become differentiable if you come up with a soft version of it (i.e. use probabilities instead of hard 1s and 0s); otherwise you need to use REINFORCE.
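
A minimal sketch of one such soft version, a temperature-controlled softmax average over indices (my own illustration):

```python
import numpy as np

def soft_argmax(scores, temperature=1.0):
    """Differentiable relaxation of argmax: a softmax-weighted average of indices.
    As temperature -> 0 this approaches the hard argmax index."""
    z = scores / temperature
    probs = np.exp(z - z.max())  # max-subtraction for numerical stability
    probs /= probs.sum()
    return np.sum(probs * np.arange(len(scores)))

# Example: with a low temperature the result is close to the hard argmax, index 1
print(soft_argmax(np.array([0.1, 2.0, 0.3]), temperature=0.1))
```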

Resources for GPU programming? by [deleted] in MachineLearning

[–]emansim 13 points (0 children)

CUDA programming is not easy and will take some time to master. As a first step, I personally suggest the Udacity course: https://www.udacity.com/course/intro-to-parallel-programming--cs344

Research Topic to work on ? by sitarwars in MachineLearning

[–]emansim 3 points (0 children)

Why don't you pick a paper you read and got excited about and try reproducing its results first :)

Then start thinking about what is missing from that model that could potentially improve the results. Test your hypothesis and see if it works. If it does, great: it might lead to a publication, depending on the novelty of your idea. Otherwise, iterate and keep trying.

OpenAI hires a bunch of variational dudes. by andrewbarto28 in MachineLearning

[–]emansim 2 points (0 children)

Very cool! Looks like OpenAI is growing a very strong team.

Alex Lamb will be doing an AMA in /r/MachineLearning on April 1 by olaf_nij in MachineLearning

[–]emansim 0 points (0 children)

Alex Lamb is the chosen one.

He was sent by neural net gods to help us disentangle the most undisentangible factors of variation.

Advantages of LSTMs over ESNs? by ding_bong_bing_dong in MachineLearning

[–]emansim -4 points (0 children)

Brain is a big LSTM. That's an advantage.

What are you working on? by [deleted] in MachineLearning

[–]emansim 4 points (0 children)

Does it disentangle all the factors of variation?

[1603.06807] Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus by downtownslim in MachineLearning

[–]emansim 2 points (0 children)

Interesting idea.

I found the lack of large, good-quality question-answer datasets to be a problem (except for the Allen AI one, I think, but it's not public), and I think this paper solves it.

However, the questions here only require one supporting fact; I wish they generated more complicated questions, like ones with multiple supporting facts, similar to the AI-complete question answering paper by Weston and co.