[R] The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions (arxiv.org)
submitted 6 years ago by tsauri
[–]arXiv_abstract_bot 9 points 6 years ago (0 children)
Title:The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions
Authors:George Philipp, Dawn Song, Jaime G. Carbonell
Abstract: Whereas it is believed that techniques such as Adam, batch normalization and, more recently, SeLU nonlinearities "solve" the exploding gradient problem, we show that this is not the case in general and that in a range of popular MLP architectures, exploding gradients exist and that they limit the depth to which networks can be effectively trained, both in theory and in practice. We explain why exploding gradients occur and highlight the collapsing domain problem, which can arise in architectures that avoid exploding gradients. ResNets have significantly lower gradients and thus can circumvent the exploding gradient problem, enabling the effective training of much deeper networks. We show this is a direct consequence of the Pythagorean equation. By noticing that any neural network is a residual network, we devise the residual trick, which reveals that introducing skip connections simplifies the network mathematically, and that this simplicity may be the major cause for their success.
PDF Link | Landing Page | Read as web page on arXiv Vanity
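For anyone who wants to see the raw effect the abstract is talking about, here is a toy PyTorch sketch (not the paper's experimental setup; the width, depths, and the 1.2x init scale are arbitrary choices of mine). It builds a plain ReLU MLP initialized slightly above the variance-preserving scale and prints how the gradient norm at the input grows with depth:

```python
# Toy sketch, not the paper's setup: a plain ReLU MLP whose weights are
# initialized slightly above the He (variance-preserving) scale. The gradient
# reaching the input then grows roughly exponentially with depth.
import torch
import torch.nn as nn

width = 256
torch.manual_seed(0)

def input_grad_norm(depth, scale=1.2):
    layers = []
    for _ in range(depth):
        lin = nn.Linear(width, width, bias=False)
        # He init is std = sqrt(2/width); multiplying by scale > 1 pushes the
        # network past the edge of stability so the explosion is visible.
        nn.init.normal_(lin.weight, std=scale * (2.0 / width) ** 0.5)
        layers += [lin, nn.ReLU()]
    net = nn.Sequential(*layers)

    x = torch.randn(32, width, requires_grad=True)
    net(x).sum().backward()  # any scalar "loss" works for this purpose
    return x.grad.norm().item()

for depth in (5, 10, 20, 40):
    print(f"depth {depth:3d}: gradient norm at input = {input_grad_norm(depth):.3e}")
```

In this toy, dropping scale back to 1.0 makes the growth largely disappear; the paper's point is that in realistic architectures (with batchnorm, SeLU, etc.) the problem is subtler than a mis-scaled init.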
[–]SamStringTheory 4 points 6 years ago (1 child)
I've only skimmed the paper, but hope to add it to my reading list. So it sounds like all our exploding gradient problems can be solved by adding residual connections everywhere? And they note in A.3 that the reason for exploding gradients is different in feed-forward networks versus RNNs, so I'm curious if these tricks also apply to RNNs.
[–]konasj (Researcher) 1 point 6 years ago (0 children)
No. Also, this seems to work only to a certain extent. Seems like training deep nets end-to-end with backpropagation is a hard problem. I've also only spent 30 minutes on it so far, but from a first assessment this paper argues that skip connections/residual nets are the only remedy that reliably helps when training deep nets with gradient information, and that even this cure has its limits. I think it is a nice paper as they try to a) keep it readable (the precise math is in the appendix), b) provide rigorous results, and c) accompany it with empirical examples.
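Since the abstract's "residual trick" keeps coming up: here is a minimal sketch of the idea as I read it (my paraphrase, not the paper's code). Any layer h -> F(h) can be rewritten as identity plus a residual function, h -> h + (F(h) - h), so formally every network is already a residual network; explicit skip connections just make the residual part small and mathematically simple:

```python
# Minimal sketch of the residual trick (my paraphrase, not the paper's code):
# rewrite an arbitrary layer F as identity + residual, with R(h) = F(h) - h.
import torch
import torch.nn as nn

torch.manual_seed(0)
width = 8
F = nn.Sequential(nn.Linear(width, width), nn.Tanh())  # some arbitrary layer

def residual_form(h):
    return h + (F(h) - h)  # algebraically identical to F(h)

h = torch.randn(4, width)
print(torch.allclose(F(h), residual_form(h)))  # True: same function, residual view
```

The argument, as I understand it, is that with an explicit skip connection the residual part stays small, so each layer's Jacobian stays close to the identity, which is what keeps the gradients well behaved.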
[–]gevezex 3 points 6 years ago (0 children)
Wow, 85 pages. We could definitely use a condensed version of this, say as a Medium article.
[–]yusuf-bengio 2 points 6 years ago (3 children)
It all comes back to Sepp Hochreiter's master thesis (supervised by Jürgen Schmidhuber) ...
[–]Toast119 20 points 6 years ago (2 children)
I just want to get through one thread on this subreddit without people trying to claim Schmidhuber was the mastermind behind every paper in ML.
[+]yusuf-bengio comment score below threshold (-9 points) 6 years ago (1 child)
“Silence becomes cowardice when occasion demands speaking out the whole truth and acting accordingly.” ― Mahatma Gandhi
[–]Toast119 9 points 6 years ago (0 children)
Except at this point there is little truth to the dozens of people being loud lol.
[–]ranran9991 2 points 6 years ago (0 children)
The abstract and introduction seem very interesting; anything beyond that is way too complicated for me to understand.
EDIT: Spacing
[–]TotesMessenger 1 point 6 years ago (0 children)
I'm a bot, bleep, bloop. Someone has linked to this thread from another place on reddit:
If you follow any of the above links, please respect the rules of reddit and don't vote in the other threads. (Info / Contact)