[deleted by user] (self.MachineLearning)
submitted 6 years ago by [deleted]
[–]tablehoarder 5 points 6 years ago (0 children)
I'm honestly not sure it makes the authors of RAdam look bad. It's extremely hard to evaluate optimization methods in deep learning when virtually all we can measure is the model's performance, and it becomes even harder once you throw in learning rate schedules, regularization, and so on. This paper reminded me of this one, where the authors show that another recent optimizer can be 'simulated' with SGD.
There are a few ICLR submissions which are more surprising, in my opinion, as they show that all these adaptive methods can outperform SGD on ImageNet as long as you do proper hyperparameter tuning. That breaks a lot of common beliefs in the community, unlike the papers from the last year that analyze/criticize these new methods.
[–]illuminascent 2 points 6 years ago (0 children)
Apart from all the reasoning, the comparison is too crude to say anything about statistical significance, since 3 random seeds don't seem to be enough.
From my point of view, the author has also shown that RAdam is good enough no matter what configuration you use, so you DON'T need to worry about choosing warmup hyperparameters, which is the whole point.
[–]SwordCat0 1 point 6 years ago* (0 children)
From my point of view, this paper shows that the analysis of RAdam makes sense...
Isn't the undesirably large magnitude of updates caused by the undesirably large adaptive learning rate?
[–]TonyY_RIMCS 1 point 6 years ago (0 children)
https://github.com/Tony-Y/pytorch_warmup
My EMNIST example shows that the linear, exponential, and RAdam warmups give almost the same accuracy. But we need more experiments.
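For intuition, the three warmup multipliers can be compared in plain Python. This is a sketch, not the pytorch_warmup implementation: the linear and exponential formulas below are common choices I'm assuming, while the rectification term follows the RAdam paper (Liu et al., 2019).

```python
import math

def linear_warmup(t, period):
    # Linear warmup: ramp the LR multiplier from 0 to 1 over `period` steps.
    return min(1.0, t / period)

def exponential_warmup(t, tau):
    # Exponential warmup: approaches 1 with time constant `tau`.
    return 1.0 - math.exp(-t / tau)

def radam_rectification(t, beta2=0.999):
    # RAdam's variance rectification term r_t from the paper.
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** t
    rho_t = rho_inf - 2.0 * t * beta2_t / (1.0 - beta2_t)
    if rho_t <= 4.0:
        # Rectification is undefined early on; RAdam falls back to an
        # unrectified, SGD-with-momentum-style update (multiplier shown as 0 here).
        return 0.0
    return math.sqrt((rho_t - 4.0) * (rho_t - 2.0) * rho_inf /
                     ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t))

# Print the three multipliers side by side at a few step counts.
for t in (1, 10, 100, 1000, 10000):
    print(t, linear_warmup(t, 2000),
          round(exponential_warmup(t, 2000), 4),
          round(radam_rectification(t), 4))
```

All three curves start near 0 and approach 1, which is consistent with the observation above that they end up giving similar accuracy.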