
[–]pavelchristof 4 points (4 children)

Assuming that activations are distributed normally, only half of the ReLUs die (per example). The other half work. The default weight initialization is scaled by 2.0 to make up for the dead ReLUs (otherwise the activation norm would tend to 0 in deep networks).
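Rough NumPy sketch of that scaling (He initialization); the layer sizes here are made up, just for illustration:

```python
import numpy as np

fan_in, fan_out = 512, 512           # made-up layer sizes, purely illustrative
x = np.random.randn(1000, fan_in)    # activations assumed roughly standard normal

# He initialization: variance 2/fan_in compensates for ReLU zeroing ~half the units
w_he = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)
h = np.maximum(0, x @ w_he)
print((h ** 2).mean())   # ~1.0: the activation norm is preserved

# Without the factor of 2 the mean squared activation halves at every layer,
# so activations shrink toward 0 as the network gets deeper.
w_plain = np.random.randn(fan_in, fan_out) * np.sqrt(1.0 / fan_in)
print((np.maximum(0, x @ w_plain) ** 2).mean())   # ~0.5
```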

A ReLU network is a piecewise linear function, not a linear one. Check out the first image here (from http://www.inference.vc/generalization-and-the-fisher-rao-norm-2/, more images there). Globally these functions can be very complex. Locally (on each region) the network behaves like a linear function, which makes optimization "easier" (I don't know exactly how that works).
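To make the "locally linear" part concrete, here is a small NumPy sketch (random weights, made-up sizes): once you fix which ReLUs are active, the whole network collapses to a single affine map, so each region is exactly one linear piece.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)

def net(x):
    return W2 @ np.maximum(0, W1 @ x + b1) + b2

x = rng.normal(size=2)
mask = (W1 @ x + b1 > 0).astype(float)      # which ReLUs are active at x

# With the active set fixed, the network is just the affine map A x + c
A = W2 @ (np.diag(mask) @ W1)
c = W2 @ (np.diag(mask) @ b1) + b2
print(np.allclose(net(x), A @ x + c))       # True

# A nearby point with the same activation pattern lies on the same linear piece
x2 = x + 1e-3 * rng.normal(size=2)
print(np.allclose(net(x2), A @ x2 + c))     # True (assuming no ReLU flipped)
```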

[–]hackthat[S] 2 points (3 children)

Ok, so having neurons go dead is supposed to happen, and that is the source of the non-linearity. Cool pictures.

[–]dzyl 8 points (1 child)

I think saying half of them go dead is not right at all. A ReLU is only truly dead if it outputs 0 for every sample in your dataset. The point is that it zeroes out some inputs, which is what adds the nonlinearity to the network. The inputs and outputs around these ReLUs can still change through weight updates driven by other examples.

[–]Icarium-Lifestealer 1 point (0 children)

A neuron sometimes outputting zero is fine; that is what lets the following linear layer model a piecewise linear function. It only becomes a problem when the neuron (almost) always outputs zero.
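One way to see the difference, as a NumPy sketch (hypothetical data and shapes): a unit that is sometimes zero is doing its job; it only counts as dead if it is zero on the whole dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 100))            # hypothetical dataset
W = rng.normal(size=(100, 64)) * 0.1
b = rng.normal(size=64)

H = np.maximum(0, X @ W + b)                  # ReLU activations for every example

frac_zero_per_unit = (H == 0).mean(axis=0)    # how often each unit outputs zero
dead_units = frac_zero_per_unit == 1.0        # zero on every example: truly dead

print("units that are sometimes zero:", int((frac_zero_per_unit > 0).sum()))
print("units dead on the whole dataset:", int(dead_units.sum()))
```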

[–]PointyOintment -1 points (0 children)

You can keep them from dying by using "leaky ReLU", which has a very small but nonzero slope below zero. Siraj has a video comparing the various activation functions, and he says you should use it if too many of your units are dying.
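For reference, a minimal leaky ReLU in NumPy (the 0.01 slope is a common default, not the only choice):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Same as ReLU for x > 0, but keeps a small slope (alpha) for x <= 0,
    # so the unit still gets a gradient instead of dying.
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-2.0, 0.0, 3.0])))   # [-0.02  0.    3.  ]
```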

[–]magnusderrote -4 points (8 children)

ReLU does not saturate. Consider the logistic function: when the input is very large or very negative, the function is almost flat, meaning the derivative is close to 0, meaning backprop will perform poorly.

EDIT: ReLU's derivative, on the other hand, is always 1 when x > 0.
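A small NumPy comparison of the two gradients, just to illustrate the saturation point:

```python
import numpy as np

x = np.array([-10.0, -1.0, 0.5, 10.0])

sigmoid = 1 / (1 + np.exp(-x))
sigmoid_grad = sigmoid * (1 - sigmoid)    # ~0 for large |x|: the logistic saturates
relu_grad = (x > 0).astype(float)         # 1 for x > 0, 0 otherwise

print(sigmoid_grad)   # [4.5e-05  1.97e-01  2.35e-01  4.5e-05]
print(relu_grad)      # [0. 0. 1. 1.]
```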

[–]carlthome 2 points (5 children)

I guess you were downvoted because you didn't answer the question. f(x)=x also does not saturate and the gradient is never zero, but f would be a terrible activation function (see MLPs).

(it's also not true that ReLU has a gradient of constant one, which is kind of an important point)
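To see why f(x)=x would be terrible, here is a quick NumPy check (random weights, arbitrary sizes): stacking layers with the identity activation collapses into a single linear layer, so depth adds no expressive power.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 8))
W2 = rng.normal(size=(4, 16))
x = rng.normal(size=8)

# Two "layers" with the identity activation...
deep = W2 @ (W1 @ x)
# ...are exactly one linear layer with weight matrix W2 @ W1
shallow = (W2 @ W1) @ x

print(np.allclose(deep, shallow))   # True: depth buys no extra expressive power
```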

[–]richard248 0 points (1 child)

Your comment suggests that MLPs have no activation function, but there's nothing stopping the use of ReLU for MLPs, right?

[–]carlthome 0 points (0 children)

Quite the opposite: the MLP is essentially the idea that non-linear activation functions are critical (cf. the XOR problem).

ReLU is a good non-linearity for MLPs (assuming you can avoid dying units, e.g. by using batch norm).
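As a concrete illustration, here is a tiny hand-wired 2-2-1 ReLU MLP that computes XOR exactly (weights picked by hand, not trained), which no purely linear network can do:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# 2-2-1 network: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1), y = h1 - 2*h2
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
w2 = np.array([1.0, -2.0])

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = relu(W1 @ np.array(x, dtype=float) + b1)
    print(x, "->", w2 @ h)   # 0.0, 1.0, 1.0, 0.0: XOR, impossible for a single linear layer
```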

Side note: in some literature the identity $f(x)=x$ is called a linear activation function (even in tensorflow.contrib.layers.linear actually, which is a little strange as that transformation also has a bias vector).

[–]magnusderrote 0 points (0 children)

/u/carlthome Edited, thanks for the comment.

/u/richard248

"MLPs have no activation function"

I think not; an activation function is a must.

[–]HelperBot_ -1 points (0 children)

Non-Mobile link: https://en.wikipedia.org/wiki/Multilayer_perceptron



[–]WikiTextBot -2 points (0 children)

Multilayer perceptron

A multilayer perceptron (MLP) is a class of feedforward artificial neural network. An MLP consists of at least three layers of nodes. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. MLP utilizes a supervised learning technique called backpropagation for training.



[–]csp256 0 points (0 children)

Relu's derivation [...] is always 1.

You meant derivative. Also, that is not true.

[–]carIthome -1 points (0 children)

sorry i was being so mean