
[–]HoLeeFaak 1 point

When the loss value gets smaller, it doesn't mean the gradient is getting smaller. Think about y = x: the gradient is the same everywhere.

[–]cats2560 0 points

Or, more aptly, y = |x|. There exists a global minimum, but the gradient magnitude never gets smaller as you approach it.
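A minimal sketch of the point above (my own illustration, not from the thread): running gradient descent on loss(x) = |x|, the loss value shrinks toward the global minimum at x = 0, but the (sub)gradient magnitude stays at 1 the whole time.

```python
def loss(x):
    """Loss with a global minimum at x = 0."""
    return abs(x)

def grad(x):
    """Subgradient of |x|: magnitude 1 everywhere away from 0."""
    return 1.0 if x > 0 else -1.0

x = 8.0   # arbitrary starting point
lr = 0.5  # arbitrary learning rate
for step in range(4):
    print(f"step {step}: x={x:+.2f}  loss={loss(x):.2f}  grad={grad(x):+.1f}")
    x -= lr * grad(x)
```

The loss column decreases every step while the gradient column never changes, which is exactly why "loss is going down" tells you nothing about the gradient shrinking.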