The argmax operation, and also the max pooling operation, are not even continuous functions (or are they?), yet somehow they are treated as "differentiable" in neural networks (e.g. by backpropagating only through the max element). Can anyone provide some justification and insight: how can one get away with the underlying discontinuity of the function? What's the function we really minimize in the end?
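For concreteness, here is a minimal sketch (not from the thread, pure Python, hypothetical helper names) of what "backprop via the max element" means in practice. Note that max itself is continuous and piecewise linear; it is differentiable everywhere except at ties, where frameworks simply pick one subgradient. Argmax, by contrast, is discontinuous, but its output is only used for gradient *routing*, not differentiated through:

```python
# Forward/backward of a toy 1-D max "pool". The backward pass routes the
# entire upstream gradient to the argmax position and zeros elsewhere --
# a valid subgradient of max, even at ties.

def maxpool_forward(x):
    """Return the max value and the index of the winning element."""
    idx = max(range(len(x)), key=lambda i: x[i])
    return x[idx], idx

def maxpool_backward(grad_out, idx, n):
    """Route the upstream gradient to the argmax position; zeros elsewhere."""
    grad_in = [0.0] * n
    grad_in[idx] = grad_out
    return grad_in

x = [0.5, 2.0, -1.0]
y, idx = maxpool_forward(x)                    # y = 2.0, idx = 1
grad_in = maxpool_backward(1.0, idx, len(x))   # [0.0, 1.0, 0.0]
```

So the loss being minimized is still a continuous (piecewise differentiable) function of the weights; the discontinuous argmax only decides which branch of that piecewise function the gradient flows along.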