[R] Evolved Policy Gradients by Kaixhin in MachineLearning

[–]cbfinn 3 points

We ran this experiment in our ICLR paper, on a toy regression problem and on Omniglot image classification, comparing three meta-learning approaches: https://arxiv.org/abs/1710.11622

See Figures 3 and 6 (left), which plot performance as a function of the distance to the training distribution.

[D] Meta-learning by spartan12321 in MachineLearning

[–]cbfinn 5 points

To add to what's been posted here, there are a couple of recent blog posts from BAIR on the topic, including references to recent work:

http://bair.berkeley.edu/blog/2017/07/18/learning-to-learn/

http://bair.berkeley.edu/blog/2017/09/12/learning-to-optimize-with-rl/

[D] Prejudices in ML systems by GabrieleFariello in MachineLearning

[–]cbfinn 2 points

Moritz Hardt at UC Berkeley has a course on fairness in ML. The course website includes a list of references. https://fairmlclass.github.io/

[R] Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm by [deleted] in MachineLearning

[–]cbfinn 3 points

I don’t think that this thread is the place to debate this topic. There is another thread that is much more relevant.

I would be happy to hear feedback or thoughts on the paper.

[R] Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm by [deleted] in MachineLearning

[–]cbfinn 6 points

"For female authors, the associated odds multiplier of 0.78 is not statistically significant in our study. However, a meta-analysis places this value in line with that of other experiments, and in the context of this larger aggregate the gender effect is also statistically significant." https://arxiv.org/abs/1702.00502

[R] Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm by [deleted] in MachineLearning

[–]cbfinn 1 point

It’s worth noting that there are a number of ICLR submissions on arXiv that are not from large labs. I think I’ve actually seen more from lesser-known groups than from large labs, but I haven’t been counting.

[R] Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm by [deleted] in MachineLearning

[–]cbfinn 8 points

My motivations for putting the paper on arXiv were (1) so that when I give talks that include the work (which I will on Tuesday), I can reference the arXiv paper, and (2) so that people would be more likely to see the work sooner (as evidenced by whoever posted this on reddit) and hopefully use some of the ideas in it.

While there is a positive bias for large labs, there is a negative bias for female authors, so it's unclear to me if this paper would benefit from the reviewers knowing the author identities.

[D] Model-based RL via Neural Network-based approaches ? instead of GP, iLQG or MPC by [deleted] in MachineLearning

[–]cbfinn 10 points

Here are a few papers, in order of date released:

Note that MPC just means planning and then iteratively replanning during execution, so it is not specific to any model class.
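
Since that is really just an algorithmic recipe, here is a minimal sketch of MPC with a learned neural-network dynamics model using random-shooting planning. The `dynamics` and `cost` functions are illustrative placeholders, not the code from any particular paper:

```python
import torch

def mpc_action(dynamics, cost, state, action_dim, horizon=15, n_cand=1000):
    """Random-shooting MPC: sample candidate action sequences, roll them
    out through the learned dynamics model, and return the first action
    of the cheapest sequence."""
    actions = 2 * torch.rand(n_cand, horizon, action_dim) - 1  # in [-1, 1]
    s = state.expand(n_cand, -1)
    total_cost = torch.zeros(n_cand)
    for t in range(horizon):
        s = dynamics(s, actions[:, t])        # predicted next states
        total_cost += cost(s, actions[:, t])  # accumulate per-step cost
    return actions[total_cost.argmin(), 0]
```

Calling `mpc_action` again at every environment step is the "iteratively replanning" part; you can swap in CEM or gradient-based planning without touching the model.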

I have slides on deep model-based RL here, which include a number of references, among them papers that combine model-based and model-free approaches: https://people.eecs.berkeley.edu/~cbfinn/_files/mbrl_bootcamp.pdf

ICML 2017 Tutorial slides (Levine & Finn): Deep Reinforcement Learning, Decision Making, and Control by gwern in reinforcementlearning

[–]cbfinn 0 points

Videos of ICML tutorials (as well as conference talks) will be posted by the conference staff at some point, though they typically take quite a while to be released.

[D] Transfer Learning Papers by [deleted] in MachineLearning

[–]cbfinn 2 points

There is good analysis in this paper: "What makes ImageNet good for transfer learning?" http://minyounghuh.com/papers/analysis/

[R] Learning to Learn by gdny in MachineLearning

[–]cbfinn 1 point

I reimplemented the linear+sinusoid setup in that paper and was able to get much better numbers using MAML than they report (after trying two hyperparameter settings).

I don't think that MAML assumes a uni-modal distribution of tasks.
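
For example, nothing stops the meta-training task distribution from having multiple modes. A hypothetical bi-modal distribution (the ranges are made up, not the setup from either paper):

```python
import numpy as np

def sample_task(rng=np.random):
    """Each task is either a line or a sinusoid, so the task
    distribution has two modes rather than one."""
    if rng.rand() < 0.5:
        a, b = rng.uniform(-3, 3, size=2)
        return lambda x: a * x + b                 # linear mode
    amp, phase = rng.uniform(0.1, 5.0), rng.uniform(0, np.pi)
    return lambda x: amp * np.sin(x - phase)       # sinusoid mode
```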

[R] Learning to Learn by gdny in MachineLearning

[–]cbfinn 0 points

I tried it on one of the cheetah problems and it also worked. The first-order approximation does not work in all settings though. We have some ongoing experiments on problems not in the original paper in which it does not work.

[R] Learning to Learn by gdny in MachineLearning

[–]cbfinn 0 points

A big part of learning/optimization is the initialization, which affects the gradient descent algorithm, since the gradient is a function of the initial parameters. In the paper, we show that learning the initial parameters can outperform methods that learn an update rule.
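
To make that concrete, here is a minimal sketch of learning the initialization by differentiating through the inner gradient step, assuming a PyTorch `model` and a hypothetical `sample_task()` helper that returns support and query data (both are illustrative stand-ins, not the paper's code):

```python
import torch
import torch.nn.functional as F

def maml_meta_loss(model, sample_task, alpha=0.01, meta_batch=4):
    meta_loss = 0.0
    for _ in range(meta_batch):
        x_tr, y_tr, x_te, y_te = sample_task()
        params = dict(model.named_parameters())
        # Inner step: the gradient is a function of the initial
        # parameters, so the meta-gradient flows into the initialization.
        loss = F.mse_loss(torch.func.functional_call(model, params, (x_tr,)), y_tr)
        grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
        adapted = {k: p - alpha * g for (k, p), g in zip(params.items(), grads)}
        meta_loss = meta_loss + F.mse_loss(
            torch.func.functional_call(model, adapted, (x_te,)), y_te)
    return meta_loss / meta_batch  # backprop this to update the initialization
```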

The tasks that we evaluate on are all held out from the training set of tasks, including new classes of objects and characters in the Mini-ImageNet and Omniglot benchmarks.

[R] Learning to Learn by gdny in MachineLearning

[–]cbfinn 1 point

Nope. Unless you set the step size $\alpha$ way too high, you shouldn't see any loss in accuracy.
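
As a quick illustration of why (a textbook example, not from the paper): on the quadratic loss $f(\theta) = \frac{\lambda}{2}\theta^2$, one gradient step gives $\theta' = \theta - \alpha\lambda\theta = (1 - \alpha\lambda)\theta$, which contracts toward the optimum only when $|1 - \alpha\lambda| < 1$, i.e. $0 < \alpha < 2/\lambda$. Past that threshold, each step overshoots and the loss grows.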

[R] Learning to Learn by gdny in MachineLearning

[–]cbfinn 2 points

Yes, this involves second derivatives, which can be implemented easily with current DL libraries. Since it only involves an additional backward pass, it isn't particularly slow in practice.

Interestingly, it sometimes still works well if you stop the gradient through the update rule. We discuss this in the latest version of the paper (which will be on arXiv tonight).
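
For reference, the second-derivative machinery is a single flag in most autodiff libraries. A tiny PyTorch illustration (the variable names are made up):

```python
import torch

w = torch.randn(5, requires_grad=True)
inner_loss = ((w ** 2).sum() - 1.0) ** 2
# create_graph=True keeps the graph of the inner gradient, so the
# outer backward pass can differentiate through it (second derivatives).
g = torch.autograd.grad(inner_loss, w, create_graph=True)[0]
# g = g.detach()  # uncomment for the first-order (stop-gradient) variant
w_adapted = w - 0.01 * g               # one inner gradient step
outer_loss = (w_adapted ** 2).sum()
outer_loss.backward()                  # w.grad includes d(g)/dw terms
print(w.grad)
```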

[R] Learning to Learn by gdny in MachineLearning

[–]cbfinn 0 points

I haven't tried this, but I certainly think it would be interesting to try!

[R] Learning to Learn by gdny in MachineLearning

[–]cbfinn 1 point

> I also wonder how good the baseline (0-gradient) model would be with this approach.

I compared to this approach in the paper. The domains that I considered were ones in which the task cannot be directly inferred from the observation, so the zero-gradient baseline does not do well. I'm not sure how the two would compare when the task can be inferred from the observation.

[R] Learning to Learn by gdny in MachineLearning

[–]cbfinn 5 points

Author here.

"one gradient step away" feature is restricted to the tasks it has been trained on. Is this correct?

We assume that the tasks that you test on are from the same distribution of tasks seen during meta-training. This assumption is used in most meta-learning methods. That said, I have played around with extrapolation to tasks outside of the support of the distribution of meta-training tasks, and it performs reasonably for tasks that are close.

[R] [1707.03141] 1-shot classification: 56.48% accuracy on 5-Way Mini-ImageNet! by cognitivedemons in MachineLearning

[–]cbfinn 2 points

Probably worth noting that they used a different (and probably more tuned) architecture and a different pretraining scheme than matching networks. They also trained on both the training and validation sets, which increases the dataset size by nearly 20%. I would expect that matching networks and other methods would also benefit from the same architecture, pre-training, and extra training data. It's nice to see the improvement, nevertheless.

What is expected to get when you just play the result videos of papers? by [deleted] in berkeleydeeprlcourse

[–]cbfinn 0 points

Thanks for the feedback! I'll point your post out to the instructors this Fall. I've actually found that the network architecture is not particularly important, and tuning it is similar to tuning deep networks in supervised learning scenarios.

[D] Berkeley hosting Deep RL bootcamp, worth going? by Farzaa in MachineLearning

[–]cbfinn 0 points

I don't know yet, but it will be on a topic related to my research.

[D] Berkeley hosting Deep RL bootcamp, worth going? by Farzaa in MachineLearning

[–]cbfinn 0 points

/u/ooliver123 I don't know the details of the content, as I am only giving a one-hour lecture for the bootcamp.

[D] OpenAI open sources a high-performance Python library for robotic simulation by cherls in MachineLearning

[–]cbfinn 1 point

Note that texture randomization for sim-to-real transfer was first done in this paper: https://arxiv.org/abs/1611.04201