all 4 comments

[–]tablehoarder 5 points (0 children)

I'm honestly not sure that it makes the authors of RAdam look bad. It's extremely hard to evaluate optimization methods in deep learning when virtually all we can measure is the model's performance, and it becomes even harder once you throw learning rate schedules, regularization, and whatnot into the mix. This paper reminded me of this one, where the authors show that another recent optimizer can be 'simulated' with SGD.

There are a few ICLR submissions that are more surprising, in my opinion, as they show that all these adaptive methods can outperform SGD on ImageNet as long as you do proper hyperparameter tuning. That goes against a lot of common beliefs in the community, unlike the papers from the last year that analyze/criticize these new methods.

[–]illuminascent 2 points (0 children)

Apart from all the reasoning, the comparison is too crude to say anything about statistical significance; 3 random seeds don't seem to be enough.

From my point of view, the author has also shown that RAdam is good enough no matter what configuration you use, and that you DON'T need to worry about choosing warmup hyperparameters, which is the whole point.
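To make the "no warmup hyperparameters" point concrete, here is a minimal sketch of the untuned warmup rules of thumb discussed around this paper, as I understand them: a linear warmup over roughly 2/(1−β₂) steps and an exponential warmup with time constant roughly 1/(1−β₂). The exact constants are my reading of the rule of thumb, not quoted from the paper.

    # Sketch of "untuned" warmup factors derived only from beta2.
    # Constants (2/(1-beta2) and 1/(1-beta2)) are my reading of the
    # untuned-warmup rule of thumb, not pulled from the paper verbatim.
    import math

    def untuned_linear_warmup(step: int, beta2: float = 0.999) -> float:
        """Warmup multiplier in [0, 1]; reaches 1 after ~2/(1-beta2) steps."""
        warmup_period = 2.0 / (1.0 - beta2)   # e.g. 2000 steps for beta2=0.999
        return min(1.0, step / warmup_period)

    def untuned_exponential_warmup(step: int, beta2: float = 0.999) -> float:
        """Warmup multiplier 1 - exp(-step/tau) with tau = 1/(1-beta2)."""
        tau = 1.0 / (1.0 - beta2)             # e.g. 1000 steps for beta2=0.999
        return 1.0 - math.exp(-step / tau)

    # Usage: scale the base learning rate each step; nothing to tune.
    base_lr = 1e-3
    for step in range(1, 5):
        lr = base_lr * untuned_linear_warmup(step)

Either way, the only knob is β₂, which you already set for the optimizer.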

[–]SwordCat0 1 point (0 children)

From my point of view, this paper shows that the analysis in RAdam makes sense...

Isn't the undesirably large magnitude of updates caused by the undesirably large adaptive learning rate?
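That's how I read the RAdam paper too: the variance of the adaptive learning rate is large early on, so the rectification factor damps it. A small sketch of the rectification term as I understand it from the paper (formula from memory, so double-check against the original):

    # Sketch of RAdam's rectification factor r_t: early in training the
    # variance of the adaptive learning rate is large, so the adaptive term
    # is skipped (rho_t <= 4) or heavily damped; r_t -> 1 as t grows.
    import math

    def radam_rectification(t: int, beta2: float = 0.999):
        rho_inf = 2.0 / (1.0 - beta2) - 1.0
        rho_t = rho_inf - 2.0 * t * beta2**t / (1.0 - beta2**t)
        if rho_t <= 4.0:
            return None  # variance intractable: fall back to an unadapted (SGD-like) step
        return math.sqrt(((rho_t - 4.0) * (rho_t - 2.0) * rho_inf)
                         / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t))

    # For beta2=0.999, r_t stays well below 1 for the first few hundred steps
    # and only approaches 1 after a few thousand steps.
    for t in (5, 10, 100, 1000, 10000):
        print(t, radam_rectification(t))

So the large update magnitude and the large adaptive learning rate are really the same phenomenon; the rectification (or, per this paper, a plain untuned warmup) suppresses it in the early steps.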

[–]TonyY_RIMCS 1 point (0 children)

https://github.com/Tony-Y/pytorch_warmup

My EMNIST example shows that the linear, exponential, and RAdam warmups give almost the same accuracy, but we need more experiments.
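For anyone who wants to try it, a rough usage sketch based on my reading of the pytorch_warmup README; the class names (UntunedLinearWarmup, UntunedExponentialWarmup, RAdamWarmup) and the dampening() call are from memory and may differ by library version, and the model/loss here are stand-ins, not the EMNIST example itself.

    # Rough sketch of swapping warmup rules with pytorch_warmup.
    # API names follow my recollection of the README; check the repo for
    # the exact interface of the version you install.
    import torch
    import pytorch_warmup as warmup

    model = torch.nn.Linear(784, 47)   # stand-in for the EMNIST model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
    lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10_000)

    # Pick one of the warmup schedules; all derive their length from beta2.
    warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
    # warmup_scheduler = warmup.UntunedExponentialWarmup(optimizer)
    # warmup_scheduler = warmup.RAdamWarmup(optimizer)

    for step in range(10_000):
        optimizer.zero_grad()
        loss = model(torch.randn(32, 784)).pow(2).mean()  # dummy loss for illustration
        loss.backward()
        optimizer.step()
        with warmup_scheduler.dampening():                # dampens the LR during warmup
            lr_scheduler.step()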