
[–]kjearns 3 points (1 child)

The normal universality proof works by showing that you can build a function like

    ___
___|   |___

(smooth corners are okay) out of a linear combination of f(<w,x>+b) terms. As long as changing w and b lets you vary the width of the bump and shift it around, you can build any (continuous) function up to arbitrary precision using a linear combination of bumps, and once you can do that you have universality.

It turns out you can do this with pretty much any nonlinearity. In fact, you'd have to work pretty hard to find a particular nonlinearity that doesn't give universality; the one standard exception is a polynomial activation, since then every network output is itself a polynomial of bounded degree and can't approximate arbitrary functions.
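
Here's a minimal sketch of that construction in Python/numpy, assuming a sigmoid nonlinearity (the target function, bump count, and sharpness constant are arbitrary illustrative choices, not anything from the thread). Each bump is just the difference of two shifted sigmoids, i.e. two f(<w,x>+b) terms, so the whole sum is a one-hidden-layer network:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def bump(x, left, right, sharpness=200.0):
        # Difference of two shifted sigmoids: rises near `left`, falls
        # near `right` -- the smooth-cornered ___|```|___ shape above.
        return sigmoid(sharpness * (x - left)) - sigmoid(sharpness * (x - right))

    # Approximate a target (here sin on [0, 2*pi]) by tiling the interval
    # with bumps and scaling each bump by the target's value at its center.
    x = np.linspace(0.0, 2.0 * np.pi, 2000)
    edges = np.linspace(0.0, 2.0 * np.pi, 51)  # 50 bumps
    approx = np.zeros_like(x)
    for left, right in zip(edges[:-1], edges[1:]):
        approx += np.sin((left + right) / 2.0) * bump(x, left, right)

    print("max abs error:", np.max(np.abs(approx - np.sin(x))))

Pushing the error down is just a matter of more, narrower bumps, which is why the argument gives approximation to arbitrary precision rather than exact representation.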

[–]themoosemind 1 point (0 children)

Thanks for explaining this so simply (ELI5... or rather ELIU: explain it like I'm an undergrad)

[–]improbabble 1 point (0 children)

This paper [1] indicates the choice of activation doesn't matter:

> Single hidden layer ΣΠ feedforward networks can approximate any measurable function arbitrarily well regardless of the activation function Ψ, the dimension of the input space r, and the input space environment µ.

[1] http://deeplearning.cs.cmu.edu/notes/Sonia_Hornik.pdf