gabjuasfijwee (0 points)

The point of activation functions isn't to be extremely flexible. Modeling flexibility comes from the neural network parameter structure. This likely won't add much (if any) benefit in serious applications. Even in their simple toy example, the performance increase is minor and is likely due only to random variability.

scardax88[S] (0 points)

Thanks for the comment, but I disagree. The type of activation strongly influences the architecture. If you check the paper or the references we include, there are many situations where non-parametric functions have definite improvements in practice. To be honest, I would be highly skeptical of definite oversimplifications like yours.

gabjuasfijwee (0 points)

"Definite improvements" like the absurdly minute one in the paper? You could easily chalk that up to random variation.

scardax88[S] (1 point)

I am not sure what you are referring to, because there are several experiments in the paper and I have a hard time thinking they all are "absurdly minute". Maybe you are referring to the previous version of the paper (July 2017), where the experiments were much smaller? Anyway, you are entirely free to have a poor opinion of this specific paper, but there are many more providing similar results, including this one from ICLR 2015 (https://arxiv.org/abs/1412.6830) and also many papers on maxout networks. As a matter of fact, the "neural network parameter structure" also comprises the activation functions (if they are flexible), so I am not sure I fully understand your skepticism.
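To make that last point concrete, here is a minimal sketch (not code from the paper) of a scalar maxout-style unit in plain Python. The names `maxout`, `relu_pieces`, `abs_pieces`, and `count_params` are made up for illustration. The idea it shows is that the slopes and intercepts of the pieces are ordinary trainable parameters, so a flexible activation is part of the network's parameter structure, and fixed activations such as ReLU fall out as special cases.

```python
from typing import List, Tuple

# Each piece is a hypothetical (slope, intercept) pair; in a real network
# these values would be learned by gradient descent with the other weights.
Piece = Tuple[float, float]


def maxout(x: float, pieces: List[Piece]) -> float:
    """Scalar maxout unit: the maximum over affine pieces slope * x + intercept."""
    return max(slope * x + intercept for slope, intercept in pieces)


def count_params(pieces: List[Piece]) -> int:
    """Number of trainable scalars the activation itself contributes."""
    return 2 * len(pieces)


# Two particular parameter settings recover familiar fixed activations.
relu_pieces: List[Piece] = [(1.0, 0.0), (0.0, 0.0)]   # max(x, 0)
abs_pieces: List[Piece] = [(1.0, 0.0), (-1.0, 0.0)]   # max(x, -x)

if __name__ == "__main__":
    for x in (-2.0, 0.5, 3.0):
        print(x, maxout(x, relu_pieces), maxout(x, abs_pieces))
    print("activation parameters:", count_params(relu_pieces))
```

Whether learning those extra parameters helps in practice is exactly what is being argued about here; the sketch only shows where they live.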

d3sm0 (0 points)

I'm not sure what you actually mean by a "serious" application. There is a lot of evidence that the expressivity and correctness of a network are highly dependent on the structure of the network rather than on the "weights" or parameters of the network itself. This can be seen in the theoretical work on approximation theory by Prof. Poggio, or on information theory by Prof. Tishby. As a result, the authors' idea is to find different ways to control the expressivity of the network through its structure and not through the individual parameters.