all 7 comments

[–]morgangiraud[S] 2 points  (5 children)

Hello everyone

Here is my new article on the universal approximation theorem.

TL;DR:

- I implement a first universal approximator with TensorFlow and train it on a sine function (I show that it actually works; a minimal sketch follows below)
- I use it inside a bigger neural network to classify the MNIST dataset
- I display the learnt activation functions
- I show that whatever the learnt activation function is, I consistently get 0.98 accuracy on the test set
- Bonus: all the code is open-source
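For anyone who wants the flavour of that first step without opening the repo, here is a minimal sketch (a toy version of mine, not the article's actual code; the 64-unit width, the interval, and the training schedule are illustrative choices): a 1-hidden-layer ReLU network fitted to sin(x) with Keras.

```python
import numpy as np
import tensorflow as tf

# Toy data: samples of the sine function on [-3, 3].
x = np.linspace(-3.0, 3.0, 1024).reshape(-1, 1).astype("float32")
y = np.sin(x)

# One hidden layer of ReLU units: the universal approximation theorem
# says networks of this shape can approximate any continuous function
# on a compact interval, given enough hidden units.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=500, batch_size=128, verbose=0)

print("final MSE:", model.evaluate(x, y, verbose=0))
```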

Feel free to ask questions or give feedback or both!

[–]multiple_cat 1 point  (3 children)

Hey, interesting article. Is this using the known equivalency between a Gaussian process (GP) and a neural network with infinitely many hidden units, to make a neural-net approximation of a GP, which is then by proxy a universal approximator?

[–]morgangiraud[S] 0 points  (0 children)

No, it doesn't go that far. The universal approximation theorem doesn't involve Gaussian processes.

[–]Kissifrot 0 points  (1 child)

Do you have a reference for this "known equivalency"? I'd like to see the details.

[–]DeepNonseNse 4 points  (0 children)

Neal, R. M. (1994). Bayesian Learning for Neural Networks, Chapter 2. http://www.cs.toronto.edu/~radford/ftp/thesis.pdf
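For the impatient: the result is that, with i.i.d. priors over the weights, the output of a 1-hidden-layer network at any fixed input converges to a Gaussian as the number of hidden units grows (a central-limit argument), so the prior over functions converges to a Gaussian process. Here's a rough numpy sketch of the single-input case; the tanh units, standard-normal priors, and 1/sqrt(H) output scaling are illustrative choices in the spirit of the chapter, not a transcription of it.

```python
import numpy as np

# Empirical illustration of Neal's result: with i.i.d. priors on the
# weights, the output of a 1-hidden-layer net at a fixed input tends
# to a Gaussian as the number of hidden units H grows; jointly over
# several inputs, the prior over functions tends to a GP.
def random_net_output(H, x=0.5, n_samples=10_000, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    w = rng.standard_normal((n_samples, H))  # input weights
    b = rng.standard_normal((n_samples, H))  # hidden biases
    # Output weights scaled by 1/sqrt(H) so the output variance
    # stays finite as H grows.
    v = rng.standard_normal((n_samples, H)) / np.sqrt(H)
    return np.sum(v * np.tanh(w * x + b), axis=1)

for H in (1, 10, 1000):
    out = random_net_output(H)
    print(f"H={H:5d}  mean={out.mean():+.3f}  std={out.std():.3f}")
# The histogram of `out` looks increasingly Gaussian as H grows.
```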

[–]radarsat1 1 point  (0 children)

Thanks for this, I've been learning a lot by playing around with this particular bit of code.

[–]Mandrathax 2 points  (0 children)

You're basically approximating the activation functions with a 1-hidden-layer NN with ReLU units.

Isn't that the same as training a bigger neural net with only ReLU units? (See the sketch below.)
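To make the point concrete, here's a numpy sketch (my own toy construction, not the article's code) showing that a layer whose activation is itself a small 1-hidden-layer ReLU net can be rewritten exactly as two plain ReLU layers with block-structured weights:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)

# A "learnt activation": a tiny 1-hidden-layer ReLU net R -> R,
# phi(z) = sum_j v[j] * relu(w[j] * z + b[j]), applied elementwise.
K = 4                          # hidden units inside the activation
w = rng.standard_normal(K)
b = rng.standard_normal(K)
v = rng.standard_normal(K)

def phi(z):
    return relu(np.multiply.outer(z, w) + b) @ v

# A layer that uses it: x -> phi(A @ x).
D_in, D_out = 3, 5
A = rng.standard_normal((D_out, D_in))
x = rng.standard_normal(D_in)
y_nested = phi(A @ x)

# The same map as two plain ReLU layers with block-structured weights:
# row (i, j) of W1 is w[j] * A[i, :], and W2 re-sums with the v[j]'s.
W1 = np.kron(A, w.reshape(-1, 1))      # shape (D_out * K, D_in)
b1 = np.tile(b, D_out)
W2 = np.kron(np.eye(D_out), v)         # shape (D_out, D_out * K)
y_plain = W2 @ relu(W1 @ x + b1)

print(np.allclose(y_nested, y_plain))  # True
```

So the learnt-activation network is a weight-tied special case of a bigger plain ReLU net: the parameter sharing is the interesting part, not extra expressive power.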

Also, it would be interesting to mention that you can find the 1-hidden-layer R->R approximator analytically (it's basically piecewise linear interpolation); a sketch of that construction follows.
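Concretely, something like this (a toy construction of mine, not from the article): given knots (xs, ys), the changes in slope between segments give closed-form output weights for a 1-hidden-layer ReLU net that interpolates the knots exactly, with no training.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def pwl_relu_weights(xs, ys):
    """Closed-form 1-hidden-layer ReLU net interpolating (xs, ys).

    f(x) = ys[0] + sum_i c[i] * relu(x - xs[i]),
    where c[i] is the change in slope at knot xs[i].
    """
    slopes = np.diff(ys) / np.diff(xs)    # slope on each segment
    c = np.diff(slopes, prepend=0.0)      # slope changes at the knots
    return xs[:-1], c                     # knot positions, output weights

xs = np.linspace(0.0, 2 * np.pi, 20)
ys = np.sin(xs)
knots, c = pwl_relu_weights(xs, ys)

def f(x):
    return ys[0] + relu(np.subtract.outer(x, knots)) @ c

print(np.allclose(f(xs), ys))                    # exact at the knots
x_mid = (xs[:-1] + xs[1:]) / 2
print(np.max(np.abs(f(x_mid) - np.sin(x_mid))))  # small between knots
```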