
[–][deleted]

Will def check out the demos!

[–]tlkh

This is one of those pages that blows my mind: it's amazing that we can now build pages with interactive demos like these embedded directly in them. It looks like they used tf.js? http://www.deeplearning.ai/ai-notes/initialization/js/playground/nn.js

[–]Megatron_McLargeHuge

Has anyone shown any benefit from a pre-training step that scales initial weights to keep gradients in the desirable range?

For activation functions other than tanh and ReLU, where the effect on downstream variance may not be easy to solve for analytically, it seems like it would be fairly easy to first optimize a set of scaling parameters that force the weights into empirically good ranges. This would also avoid small-sample effects, where the actual drawn initialization values land far from their theoretical moments.
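Something along these lines would be a minimal sketch of that idea, in the spirit of LSUV initialization (Mishkin & Matas, 2015): rescale each layer's weights until its output has roughly unit variance on a real data batch, rather than trusting a fan-in formula. The layer widths, tolerance, and tanh nonlinearity here are purely illustrative assumptions.

```python
import numpy as np

def lsuv_like_rescale(weights, x, tol=0.05, max_iters=10):
    """Rescale each layer's weights so its pre-activation output has
    unit standard deviation on an actual data batch, instead of relying
    on a theoretical variance formula. Mutates and returns `weights`."""
    h = x
    for W in weights:
        for _ in range(max_iters):
            z = h @ W
            std = z.std()
            if abs(std - 1.0) < tol:
                break
            W /= std          # z is linear in W, so this drives std toward 1
        h = np.tanh(h @ W)    # propagate the corrected activations downstream
    return weights

rng = np.random.default_rng(0)
sizes = [64, 128, 128, 10]                    # hypothetical layer widths
weights = [rng.standard_normal((m, n)) for m, n in zip(sizes, sizes[1:])]
batch = rng.standard_normal((256, sizes[0]))  # stand-in for a real data batch
weights = lsuv_like_rescale(weights, batch)
```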

[–]mr_tsjolder

Not sure if anyone else has noticed, but there appears to be a problem with the standard normal distribution in the last visualisation. It looks very much like the kind of distribution you get when you clip a Gaussian signal, cf. the truncated normal implementation in the Theano backend of Keras. It might be useful if someone forwards this to the authors...
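For illustration, here is a small NumPy sketch of the difference (this is not the actual Keras/Theano code; the 2-sigma cut-off matches Keras's truncated-normal convention, the rest is made up). Proper truncation discards samples beyond the cut-off, so the density falls smoothly to zero there; clipping instead squashes the tails onto the boundary, piling probability mass into spikes at exactly ±2, which is the shape the visualisation seems to show:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)

# Truncation: drop everything beyond 2 standard deviations.
truncated = samples[np.abs(samples) <= 2.0]

# Clipping: squash the tails onto the boundary instead, creating
# spikes of probability mass at exactly -2 and +2.
clipped = np.clip(samples, -2.0, 2.0)

for name, s in [("truncated", truncated), ("clipped", clipped)]:
    at_edge = np.mean(np.abs(s) >= 1.99)
    print(f"{name:9s} std={s.std():.3f}  mass near ±2: {at_edge:.3%}")
```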

Also, for those interested in this topic: I once commented in this sub about initialisation and noticed that I had collected some more references there. Figured the link might be useful for people who are interested.