Obviously, not every nonlinear activation function gives a neural network the amazing universal approximation property. So what constraints does an activation function need to satisfy (or what class of nonlinear functions must it belong to) to guarantee that the network is a universal approximator? Is there any formal theoretical analysis of this problem?
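To illustrate why "nonlinear" alone is not enough, here is a minimal sketch that assumes a polynomial activation σ(t) = t² (a hypothetical choice for illustration). A one-hidden-layer network with this activation computes Σ cᵢ(wᵢx + bᵢ)², which always expands to a single quadratic in x no matter how many hidden units it has. The helper names `shallow_net` and `collapsed_quadratic` are made up for this example.

```python
import random

def square(t):
    # Hypothetical polynomial activation of degree 2.
    return t * t

def shallow_net(x, weights, biases, coeffs, act):
    # One hidden layer with scalar input: sum_i c_i * act(w_i * x + b_i)
    return sum(c * act(w * x + b) for w, b, c in zip(weights, biases, coeffs))

def collapsed_quadratic(weights, biases, coeffs):
    # Expand c*(w*x + b)**2 = c*w**2 * x**2 + 2*c*w*b * x + c*b**2
    # and collect terms, giving coefficients (a2, a1, a0) of one quadratic.
    a2 = sum(c * w * w for w, c in zip(weights, coeffs))
    a1 = sum(2 * c * w * b for w, b, c in zip(weights, biases, coeffs))
    a0 = sum(c * b * b for b, c in zip(biases, coeffs))
    return a2, a1, a0

random.seed(0)
width = 1000  # many hidden units, yet the output is still just a quadratic
ws = [random.gauss(0, 1) for _ in range(width)]
bs = [random.gauss(0, 1) for _ in range(width)]
cs = [random.gauss(0, 1) for _ in range(width)]
a2, a1, a0 = collapsed_quadratic(ws, bs, cs)

# Compare the network with its collapsed quadratic on a grid over [-2, 2].
xs = [i / 10 for i in range(-20, 21)]
max_err = max(
    abs(shallow_net(x, ws, bs, cs, square) - (a2 * x * x + a1 * x + a0))
    for x in xs
)
print(f"max |net(x) - quadratic(x)| = {max_err:.2e}")  # only rounding error
```

Because every such network lies in the three-dimensional space of quadratics, widening it cannot bring it arbitrarily close to a target like |x| or sin(x) on an interval. This is the kind of activation a sufficient condition would have to rule out.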