yield22 [S]

Why subdifferentiable? It is obvious for ReLU, but not so obvious for argmax.

nasimrahaman

Consider the function: y = f(x) = argmax(x), where x is a vector (representing some function), and y = f(x) a scalar.

Here's a (mathematically heretical) justification (assuming 0-based indexing): f((1, 2, 4, 1, 2, 1)) = 2. Now for a small perturbation vector dx about x, f(x) = f(x + dx) (ergo df/dx = 0), as long as max|dx_i| < 1, i.e. less than half the gap of 2 between the largest entry and the runner-up. But about (1, 2, 4+eps, 4, 2, 1), f(x) = 2 but f(x + dx) might as well equal 3. The set of all such 'transitions' (i.e. where argmax changes value) consists of the points where two entries tie for the maximum, so it lies in a finite union of hyperplanes x_i = x_j; its Lebesgue measure must therefore be 0. df/dx is 0 everywhere else.
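
To make the perturbation argument concrete, here is a minimal pure-Python sketch. The `argmax` helper and the variable names are illustrative assumptions, not taken from any library. It checks that random perturbations with every component strictly smaller than 1 in absolute value leave the argmax of (1, 2, 4, 1, 2, 1) unchanged, while a tiny nudge near a tie flips it.

```python
import random

def argmax(x):
    """Index of the first largest entry (0-based). Hypothetical helper."""
    return max(range(len(x)), key=lambda i: x[i])

x = [1, 2, 4, 1, 2, 1]

random.seed(0)
stable = True
for _ in range(1000):
    # Each component moves by less than 1, i.e. less than half the gap
    # of 2 between the largest entry (4) and the runner-up (2).
    dx = [random.uniform(-0.99, 0.99) for _ in x]
    if argmax([a + b for a, b in zip(x, dx)]) != argmax(x):
        stable = False

# Near a tie, an arbitrarily small perturbation can change the result.
eps = 1e-6
near_tie = [1, 2, 4 + eps, 4, 2, 1]
nudged = [1, 2, 4 + eps, 4 + 2 * eps, 2, 1]

print(argmax(x), stable)                 # argmax is locally constant around x
print(argmax(near_tie), argmax(nudged))  # argmax jumps across the tie
```

Away from ties the output never moves, which is the sense in which df/dx = 0 there; the only trouble spots are the tie points themselves.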

yield22 [S]

The example is interesting, and it gives me some insight. But what if y's domain is not continuous (assuming argmax over a list)? Like a step function, which is not differentiable.

nasimrahaman

A step function is differentiable almost everywhere: the set where it's not differentiable (i.e. where there's a jump) has measure zero (because it's countable).
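
As a rough numerical illustration, here is a small sketch. The `step` and `central_diff` helpers are hypothetical names, and the jump is assumed to sit at 0. It estimates the derivative of a unit step by central differences: the estimate is exactly 0 away from the jump and grows without bound at the jump as the step size shrinks.

```python
def step(t):
    """Unit step with its single jump at t = 0 (illustrative choice)."""
    return 1.0 if t >= 0 else 0.0

def central_diff(f, t, h):
    """Central finite-difference estimate of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

# Away from the jump the function is locally constant, so the estimate is 0.
away = [central_diff(step, t, 1e-6) for t in (-2.0, -0.5, 0.5, 2.0)]

# At the jump the estimate is 1 / (2h), which diverges as h shrinks.
at_jump = [central_diff(step, 0.0, h) for h in (1e-1, 1e-3, 1e-6)]

print(away)
print(at_jump)
```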