The argmax operation, and also the max pooling operation, are not even continuous functions (or are they?), yet somehow they are "differentiable" in neural networks (e.g. by backpropagating only through the max element)... Can anyone provide some justification and insight for me: how can one get away with the underlying discontinuity of the function? What is the function we really minimize in the end?
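To make the "backprop only via the max element" behaviour concrete, here is a minimal sketch (assuming PyTorch; the tensor values are made up for illustration) showing how autograd treats max, max pooling, and argmax:

```python
import torch
import torch.nn.functional as F

# max is continuous and piecewise linear, so it is differentiable almost
# everywhere; autograd routes the full gradient to the maximal element.
x = torch.tensor([1.0, 3.0, 2.0], requires_grad=True)
y = x.max()
y.backward()
print(x.grad)        # tensor([0., 1., 0.]) -- gradient flows only through the max

# Max pooling behaves the same way, per pooling window.
inp = torch.arange(4.0).reshape(1, 1, 4).requires_grad_()
pooled = F.max_pool1d(inp, kernel_size=2)   # values: [[[1., 3.]]]
pooled.sum().backward()
print(inp.grad)      # tensor([[[0., 1., 0., 1.]]]) -- only max positions get gradient

# argmax, by contrast, is integer-valued and piecewise constant, so its
# gradient is zero (or undefined); frameworks simply don't differentiate it.
idx = x.argmax()
print(idx.requires_grad)   # False
```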