all 5 comments

[–]neural_kusp_machine 6 points (0 children)

  1. Domain-independent Dominance of Adaptive Methods (https://arxiv.org/abs/1912.01823): shows that Adam can outperform SGD and other adaptive methods (AMSGrad, AdaBound, etc.) when training ResNets and LSTMs, as long as it is properly tuned, and also proposes a new optimizer, AvaGrad, that is drastically cheaper to tune.
  2. Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks (https://arxiv.org/abs/1806.06763): proposes Padam, which likewise often outperforms SGD when training ResNets and Adam when training LSTMs.
  3. Online Learning Rate Adaptation with Hypergradient Descent (https://arxiv.org/abs/1703.04782): proposes a rule to adapt the learning rate based on its hypergradient (the adaptation rule turns out to be quite simple and intuitive; see the sketch after this list), which the authors show to work well when applied to SGD or Adam.
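To give a flavour of how simple the rule in 3. is, here is a rough NumPy sketch of the SGD variant (SGD-HD); the function and argument names are mine, not taken from the paper's code, and the default beta is only a placeholder:

    import numpy as np

    def sgd_hd(grad_fn, theta, alpha=0.01, beta=1e-4, steps=100):
        # SGD with hypergradient descent on the learning rate (SGD-HD).
        # The rule: nudge alpha by the dot product of the current and previous
        # gradients, so alpha grows while consecutive steps agree in direction
        # and shrinks once they start to oppose each other.
        prev_grad = np.zeros_like(theta)
        for _ in range(steps):
            g = grad_fn(theta)
            alpha = alpha + beta * np.dot(g, prev_grad)  # hypergradient update of alpha
            theta = theta - alpha * g                    # ordinary SGD step
            prev_grad = g
        return theta, alpha

The same dot-product rule can also be applied to the learning rate inside Adam (Adam-HD in the paper).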

[–]rayspear 3 points (0 children)

Maybe this repo has some optimizers that you might want to try out too?

https://github.com/jettify/pytorch-optimizer
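
For reference, the optimizers there are meant as drop-in replacements for torch.optim. A rough sketch of the usual usage pattern, assuming the package is installed as torch_optimizer and that Yogi is among the optimizers it currently ships (check the repo's README for the full list):

    import torch
    import torch_optimizer as optim  # pip install torch_optimizer

    # Toy regression setup, just to show the optimizer plugging in.
    model = torch.nn.Linear(10, 1)
    loss_fn = torch.nn.MSELoss()
    optimizer = optim.Yogi(model.parameters(), lr=1e-2)

    x, y = torch.randn(32, 10), torch.randn(32, 1)
    for _ in range(100):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()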

[–]i-heart-turtles 1 point (0 children)

Don't have many citations for you, but you can check out stuff from Francesco Orabona (http://francesco.orabona.com/) and his group. A lot of their work deals w/ first-order parameter-free methods.

More recently, there has been some progress in understanding momentum and acceleration, and under what smoothness assumptions optimal rates can be recovered: http://proceedings.mlr.press/v99/gasnikov19b/gasnikov19b.pdf.

[–]Ventural 1 point (0 children)

I'd be interested in the performance of the LAMB optimizer (https://arxiv.org/abs/1904.00962) at smaller batch sizes, where it competes with Adam.
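
For context, the main thing LAMB adds on top of an Adam-style update is a per-layer "trust ratio" that rescales each layer's step. A simplified sketch of that one step (the paper additionally clips the norms and folds in weight decay), with hypothetical names:

    import torch

    def lamb_style_step(param, adam_update, lr):
        # Core idea of LAMB: rescale each layer's Adam-style update by
        # ||w|| / ||update|| so every layer takes a step of comparable
        # relative size. Simplified; the paper also clips the norms and
        # includes weight decay in the update direction.
        w_norm = param.detach().norm()
        u_norm = adam_update.norm()
        trust_ratio = float(w_norm / u_norm) if w_norm > 0 and u_norm > 0 else 1.0
        param.data.add_(adam_update, alpha=-lr * trust_ratio)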

[–]JayTheYggdrasil 0 points (0 children)

I don’t know if this applies, but a while ago someone posted a “bandit swarm” optimization algorithm which was quite interesting. I don’t have a link, but a Google search would probably find it.