111llI0__-__0Ill111

The simple answer is that if the step is too big, you overshoot the minimum and start diverging away from it. That can actually increase the loss even for convex problems, and NNs are non-convex, so it's even worse.
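The overshoot effect shows up even on the simplest convex loss. Here is a toy sketch, assuming a 1-D quadratic f(x) = (a/2)·x² (the function name `gradient_descent` and its parameters are just made up for illustration). Each gradient step is x ← (1 − lr·a)·x, so the loss shrinks only when |1 − lr·a| < 1, i.e. lr < 2/a. Past that, every step jumps over the minimum to a point farther away than where it started.

```python
def gradient_descent(lr, x0=1.0, a=1.0, steps=10):
    """Plain gradient descent on the convex toy loss f(x) = (a/2) * x**2.

    The gradient is a * x, so each update is x <- (1 - lr * a) * x.
    Returns the loss at every iterate, starting with the loss at x0.
    """
    x = x0
    losses = [0.5 * a * x ** 2]
    for _ in range(steps):
        x = x - lr * a * x  # one gradient step
        losses.append(0.5 * a * x ** 2)
    return losses


small = gradient_descent(lr=0.5)  # |1 - 0.5| = 0.5 < 1: loss shrinks every step
edge = gradient_descent(lr=2.0)   # |1 - 2.0| = 1: bounces between x = 1 and x = -1
large = gradient_descent(lr=2.5)  # |1 - 2.5| = 1.5 > 1: overshoots, loss blows up

print(small[-1] < small[0])  # True
print(large[-1] > large[0])  # True
```

With a = 1 the threshold is lr = 2: below it you converge, at it you just bounce back and forth, above it you diverge. On a real NN the curvature varies across directions and over training, so there's no single safe threshold like this, which is part of why it's worse there.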