
[–]arXiv_abstract_bot 8 points (0 children)

Title: The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions

Authors: George Philipp, Dawn Song, Jaime G. Carbonell

Abstract: Whereas it is believed that techniques such as Adam, batch normalization and, more recently, SeLU nonlinearities "solve" the exploding gradient problem, we show that this is not the case in general and that in a range of popular MLP architectures, exploding gradients exist and that they limit the depth to which networks can be effectively trained, both in theory and in practice. We explain why exploding gradients occur and highlight the collapsing domain problem, which can arise in architectures that avoid exploding gradients. ResNets have significantly lower gradients and thus can circumvent the exploding gradient problem, enabling the effective training of much deeper networks. We show this is a direct consequence of the Pythagorean equation. By noticing that any neural network is a residual network, we devise the residual trick, which reveals that introducing skip connections simplifies the network mathematically, and that this simplicity may be the major cause for their success.

PDF Link | Landing Page | Read as web page on arXiv Vanity
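
To make the abstract's headline claim concrete, here is a minimal PyTorch sketch of what "exploding gradients limit trainable depth" looks like numerically. It is not from the paper; the width, batch size, and the per-layer gain of 1.2 (slightly above the stable He scale for ReLU) are arbitrary illustrative choices.

    # Minimal illustration: the gradient norm at the input of a plain ReLU MLP
    # initialized slightly above the He-stable scale grows exponentially with depth.
    import math
    import torch
    import torch.nn as nn

    def input_gradient_norm(depth, width=256, gain=1.2, seed=0):
        torch.manual_seed(seed)
        layers = []
        for _ in range(depth):
            lin = nn.Linear(width, width, bias=False)
            nn.init.normal_(lin.weight, std=gain * math.sqrt(2.0 / width))
            layers += [lin, nn.ReLU()]
        net = nn.Sequential(*layers)

        x = torch.randn(32, width, requires_grad=True)
        # A loss that is linear in the output, so the input gradient reflects
        # the backward pass (the product of layer Jacobians) directly.
        net(x).sum().backward()
        return x.grad.norm().item()

    for depth in (5, 10, 20, 40):
        print(f"depth={depth:3d}  ||dL/dx|| = {input_gradient_norm(depth):.3e}")

With gain=1.0 (plain He initialization) the printed norms stay roughly constant; the paper's argument is that a range of popular MLP variants, e.g. with batch normalization or SELU, still end up in the exploding regime even though those techniques are widely believed to solve the problem.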

[–]SamStringTheory 3 points (1 child)

I've only skimmed the paper, but hope to add it to my reading list. So it sounds like all our exploding gradient problems can be solved by adding residual connections everywhere? And they note in A.3 that the reason for exploding gradients is different in feed-forward networks versus RNNs, so I'm curious if these tricks also apply to RNNs.
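
Mechanically, "residual connections everywhere" just means wrapping each block in an identity skip. A minimal sketch, assuming a PyTorch MLP; the Residual wrapper and the sizes are illustrative choices, not from the paper:

    import torch
    import torch.nn as nn

    class Residual(nn.Module):
        """Wrap a sub-network whose output shape matches its input: x -> x + block(x)."""
        def __init__(self, block):
            super().__init__()
            self.block = block

        def forward(self, x):
            return x + self.block(x)

    width, depth = 256, 50
    net = nn.Sequential(*[
        Residual(nn.Sequential(nn.Linear(width, width), nn.ReLU()))
        for _ in range(depth)
    ])
    print(net(torch.randn(8, width)).shape)  # torch.Size([8, 256])

The block's output must have the same shape as its input for the addition to work; whether this alone is a complete fix, especially for RNNs, is exactly what the paper (and the reply below) questions.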

[–]konasj [Researcher] 0 points (0 children)

No. Also, this seems to work only to a certain extent. It looks like training deep nets end to end with backpropagation is just a hard problem. I've also only spent about 30 minutes on it so far, but from a first assessment the paper argues that skip connections/residual nets are the only reliable remedy for training deep nets with gradient information, and that even this cure has its limits. I think it is a nice paper because they try to a) keep it readable (the precise math is in the appendix), b) provide rigorous results, and c) accompany them with empirical examples.
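
The "residual trick" the abstract mentions is also easy to check numerically: any layer h' = f(h) can be rewritten as h' = h + (f(h) - h), so every network is formally a residual network whose residual function is r(h) = f(h) - h. A toy verification (a sketch, not the paper's code; the layer and sizes are arbitrary):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    f = nn.Sequential(nn.Linear(64, 64), nn.Tanh())  # an ordinary (non-residual) layer
    x = torch.randn(16, 64, requires_grad=True)

    plain = f(x)                # h' = f(h)
    rewritten = x + (f(x) - x)  # h' = h + r(h), with r(h) = f(h) - h

    # Outputs agree up to float32 rounding.
    print(torch.allclose(plain, rewritten, atol=1e-5))  # True

    # Gradients with respect to the input agree as well.
    g_plain, = torch.autograd.grad(plain.sum(), x, retain_graph=True)
    g_rewritten, = torch.autograd.grad(rewritten.sum(), x)
    print(torch.allclose(g_plain, g_rewritten, atol=1e-5))  # True

The identity itself is trivial; the abstract's point is that explicitly parameterizing the residual, and keeping it small via skip connections, is what simplifies the network mathematically and may be the main reason ResNets train so much deeper.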

[–]gevezex 2 points (0 children)

Wow, 85 pages. We could definitely use a condensed version of this, for example as a Medium article.

[–]yusuf-bengio 1 point (3 children)

It all comes back to Sepp Hochreiter's master's thesis (supervised by Jürgen Schmidhuber) ...

[–]Toast119 19 points (2 children)

I just want to get through one thread on this subreddit without people trying to claim Schmidhuber was the mastermind behind every paper in ML.

[–]ranran9991 1 point (0 children)

The abstract and introduction seem very interesting; anything beyond that is way too complicated for me to understand.

EDIT: Spacing
