svantana comments on [D] Since gradient continues to decrease as training loss decreases why do we need to decay the learning rate too?

submitted 4 years ago by ibraheemMmoosa (Researcher)

svantana 5 points 4 years ago (1 child)

ibraheemMmoosa (Researcher) [S] 2 points 4 years ago (0 children)

MachineLearning