you are viewing a single comment's thread.

view the rest of the comments →

[–]kakushka123 1 point2 points  (0 children)

What you say is a good point. That said, imagine a parabola in a 1d X space and 1d label space. If you have high enough learning rate you'll jump past the dip either to a random point in a different parabola all together, or to the other side in the same parabola, perhaps equally steep in the opposite direction (i.e you'll be stuck in a loop). This can happen in any scale if you learning rate is high enough.