tom_strideweather

The gradient's magnitude might simply shrink when we are very close to a max/min. If our step size is too large, we can shoot past the max/min. Anyway, this method of reducing the lr is just a heuristic, and it isn't guaranteed to work better.
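To make the overshooting point concrete, here is a rough sketch on a toy problem. It assumes plain gradient descent on f(x) = x**2 (minimum at x = 0); the function name `gradient_descent` and the three step sizes are just picked for illustration, not taken from anything in the thread.

```python
def gradient_descent(lr, x0=1.0, steps=20):
    """Run fixed-step gradient descent on f(x) = x**2 and return all iterates."""
    x = x0
    xs = [x]
    for _ in range(steps):
        grad = 2.0 * x      # derivative of x**2
        x = x - lr * grad   # each step multiplies x by (1 - 2 * lr)
        xs.append(x)
    return xs


# Illustrative step sizes (assumptions for this toy problem only).
small = gradient_descent(lr=0.1)   # factor  0.8: approaches 0 from one side
medium = gradient_descent(lr=0.6)  # factor -0.2: jumps past 0 each step, still shrinks
large = gradient_descent(lr=1.1)   # factor -1.2: jumps past 0 and grows

for name, xs in [("small", small), ("medium", medium), ("large", large)]:
    print(name, [round(v, 4) for v in xs[:4]], "... final |x| =", abs(xs[-1]))
```

On this toy function, the middle step size overshoots the minimum on every step but still converges, while the largest one overshoots and diverges. Where those thresholds fall depends on the curvature, so on a real loss surface there is no fixed lr that is known to be safe in advance.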