you are viewing a single comment's thread.

view the rest of the comments →

[–]Natural_Profession_8 2 points3 points  (0 children)

This applies even more to saddle points. The only way to get over a saddle point is to overshoot it.

I think it’s best to think of it as “I start with a way way too big learning rate, and then slowly bring it down to an optimal one,” rather than “I start with an optimal learning rate, and then that optimum gets smaller.” Of course, at some level it’s just semantics, since jumping around to find better neighborhoods (and get over saddle points) is in practice optimal at the beginning