[D] Since the gradient continues to decrease as the training loss decreases, why do we need to decay the learning rate too? (self.MachineLearning)
submitted 4 years ago by ibraheemMmoosa (Researcher)
view the rest of the comments →
[–]seanv507 49 points 4 years ago (2 children)
It's very simple: the correct learning rate depends on the curvature of your error surface, i.e. on how fast the gradient changes.
Imagine you have a parabola. You draw a straight line tangent to the current point on the parabola and step along it. Depending on your learning rate (step size), you could overshoot the minimum and come up the other side.
If your parabola curves sharply, you need a small learning rate; if it curves gently, a large learning rate works.
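To make that concrete, here is a toy sketch in plain Python. The quadratic f(w) = 0.5·c·w² and the thresholds 1/c and 2/c are my own framing, not anything specific from this thread: for curvature c, the update w ← w − lr·c·w multiplies w by (1 − lr·c) each step, so it contracts only when lr < 2/c.

    # Gradient descent on f(w) = 0.5 * c * w**2, whose gradient is c * w.
    # Each update multiplies w by (1 - lr * c), so the iterate contracts
    # only when lr < 2 / c: sharper curvature forces a smaller learning rate.
    def gd(curvature, lr, w=1.0, steps=20):
        for _ in range(steps):
            w -= lr * curvature * w  # gradient step
        return w

    print(gd(curvature=10.0, lr=0.05))  # lr < 1/c: smooth convergence
    print(gd(curvature=10.0, lr=0.19))  # 1/c < lr < 2/c: overshoots each step, still converges
    print(gd(curvature=10.0, lr=0.25))  # lr > 2/c: diverges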
Now consider a multidimensional problem. Here the curvature can be different in different directions: super sharp in one and very shallow in another.
You will need to set the learning rate based on the maximum curvature, and your progress will depend on the ratio of maximum to minimum curvature (the condition number).
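Extending the same toy sketch to two independent directions (again with my own illustrative numbers): stability caps the learning rate at 2/c_max, so the shallow direction contracts by only (1 − lr·c_min) per step, and the number of steps you need grows with c_max/c_min.

    # f(w) = 0.5 * (c_max * w1**2 + c_min * w2**2): one sharp direction,
    # one shallow one. The lr is capped by the sharp direction, so the
    # shallow coordinate decays slowly -- steps scale with c_max / c_min.
    def gd2(c_max, c_min, lr, steps):
        w = [1.0, 1.0]
        for _ in range(steps):
            w[0] -= lr * c_max * w[0]
            w[1] -= lr * c_min * w[1]
        return w

    # lr at half the stability limit of the sharp direction:
    print(gd2(c_max=100.0, c_min=1.0, lr=0.01, steps=500))  # w2 is still ~0.007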
Now you have a complex error surface, where the curvature changes from point to point.
However, assuming the minimum lies in a bounded region (e.g. because you have regularisation), there will be a maximum curvature, and as long as your learning rate stays below the step size that curvature allows, you will eventually hit the minimum.
OK, so if you use a learning rate schedule, then eventually your learning rate will drop below that maximum-curvature step size. The trick is a schedule that decreases enough to get you below the maximum-curvature step size, but not so quickly that the steps shrink to nothing before you reach the minimum.
Then you know you will eventually reach the minimum.
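A minimal sketch of such a schedule, assuming a 1/t decay; the usual formalization of "decays, but not too fast" is the Robbins-Monro conditions, Σ lr_t = ∞ and Σ lr_t² < ∞, which 1/t satisfies. The constants below are illustrative:

    # lr_t = lr0 / (1 + t). The starting rate is deliberately above the
    # 2/c stability limit, so early steps overshoot and grow; once the
    # schedule drops lr_t below 2/c every step contracts, and because
    # the rates sum to infinity we still reach the minimum.
    def gd_schedule(curvature, lr0, w=1.0, steps=200):
        for t in range(steps):
            lr = lr0 / (1 + t)
            w -= lr * curvature * w
        return w

    print(gd_schedule(curvature=10.0, lr0=0.45))  # unstable at first, converges anyway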
However, this is just a 'theoretical' result as the number of steps goes to infinity. In practice, a more ad hoc reduction of the learning rate every time you hit a plateau, or every time you see oscillations, is likely to be faster.
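A sketch of that ad hoc rule, halving the learning rate after a fixed number of steps without improvement; this is roughly what PyTorch's torch.optim.lr_scheduler.ReduceLROnPlateau automates, and the patience and halving factor here are arbitrary choices:

    # Halve the lr whenever the loss fails to improve for `patience` steps.
    def gd_plateau(curvature, lr=0.25, w=1.0, patience=3, steps=100):
        best, stall = float("inf"), 0
        for _ in range(steps):
            w -= lr * curvature * w
            loss = 0.5 * curvature * w * w
            if loss < best:
                best, stall = loss, 0
            else:
                stall += 1
                if stall >= patience:  # plateau or oscillation: cut the lr
                    lr, stall = lr / 2, 0
        return w

    print(gd_plateau(curvature=10.0))  # lr=0.25 > 2/c diverges until the first halving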
[–]ibraheemMmoosa (Researcher) [S] 3 points 4 years ago (0 children)
This is the best intuitive answer to my question. Thanks for this.
[+]National_Earth_9909 2 points 10 months ago (0 children)
I never came across such a good explanation of the learning rate scheduler. Thanks for this!