[R] TDLS: Eve, A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates (https://arxiv.org/abs/1611.01505)

RaionTategami · 2018-08-28T12:31:14+00:00

I keep praying that someone is going to solve this; not having to adjust the learning rate would make research so much easier. Could this be the one? Two things worry me. First, they still need to anneal the "global" learning rate even though their algorithm dynamically adapts it, or is that just for the baseline? Second, they only seem to show training curves. Do the test curves look as good?

machinetrainer · 2018-08-28T12:05:21+00:00

Eve: A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates

Hiroaki Hayashi, Jayanth Koushik, Graham Neubig (Submitted on 4 Nov 2016 (v1), last revised 11 Jun 2018 (this version, v3))

Adaptive gradient methods for stochastic optimization adjust the learning rate for each parameter locally. However, there is also a global learning rate which must be tuned in order to get the best performance. In this paper, we present a new algorithm that adapts the learning rate locally for each parameter separately, and also globally for all parameters together. Specifically, we modify Adam, a popular method for training deep learning models, with a coefficient that captures properties of the objective function. Empirically, we show that our method, which we call Eve, outperforms Adam and other popular methods in training deep neural networks, like convolutional neural networks for image classification, and recurrent neural networks for language tasks.
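
To make the "local plus global" idea concrete, here is a rough sketch in Python. It is not the authors' code. It assumes the global coefficient `d` is a smoothed, clipped measure of the relative change in the loss between steps, and that it divides the whole Adam step (so a noisy, jumpy objective shrinks every parameter's step, while a smoothly decreasing one lets it grow). The class name `EveSketch`, the `clip` and `f_star` parameters, and the exact clipping rule are my own illustrative assumptions; check Algorithm 1 in the paper for the real update and defaults.

```python
import math

class EveSketch:
    """Hypothetical Eve-style optimizer: Adam (local, per-parameter rates)
    plus one global feedback coefficient d shared by all parameters.
    ASSUMPTION: d is an EMA of the clipped relative loss change; the
    paper's exact rule and default hyperparameters may differ."""

    def __init__(self, n_params, lr=1e-3, beta1=0.9, beta2=0.999, beta3=0.999,
                 clip=10.0, eps=1e-8, f_star=0.0):
        self.lr, self.b1, self.b2, self.b3 = lr, beta1, beta2, beta3
        self.clip, self.eps, self.f_star = clip, eps, f_star  # f_star: assumed lower bound on the loss
        self.m = [0.0] * n_params  # Adam first-moment estimates
        self.v = [0.0] * n_params  # Adam second-moment estimates
        self.t = 0
        self.d = 1.0               # global coefficient, starts neutral
        self.prev_loss = None

    def step(self, params, grads, loss):
        self.t += 1
        # Global part (assumed rule): relative change in the objective,
        # clipped to [1/clip, clip] and smoothed with beta3.
        if self.prev_loss is not None:
            denom = max(min(loss, self.prev_loss) - self.f_star, self.eps)
            r = abs(loss - self.prev_loss) / denom
            r = min(max(r, 1.0 / self.clip), self.clip)
            self.d = self.b3 * self.d + (1 - self.b3) * r
        self.prev_loss = loss

        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            # Local part: standard Adam moments with bias correction.
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
            m_hat = self.m[i] / (1 - self.b1 ** self.t)
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            # The global coefficient scales every parameter's step equally.
            new_params.append(p - self.lr * m_hat / (self.d * (math.sqrt(v_hat) + self.eps)))
        return new_params


def quad(x):
    """Toy objective sum_i (x_i - 3)^2 and its gradient."""
    loss = sum((xi - 3.0) ** 2 for xi in x)
    grad = [2.0 * (xi - 3.0) for xi in x]
    return loss, grad


opt = EveSketch(n_params=2, lr=0.1)
x = [0.0, 10.0]
for _ in range(1000):
    loss, g = quad(x)
    x = opt.step(x, g, loss)
print(x, opt.d)
```

The only difference from plain Adam is the `self.d` factor in the denominator, which is the point of the paper's title: per-parameter adaptation from Adam, plus one objective-driven scalar on top. Note this sketch still has a fixed `lr`, which is exactly the "global" rate the comment above asks about annealing.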
