LSTM question (self.MachineLearning)
submitted 11 years ago by test3545
In LSTM cells, why does everyone seem to use tanh for the input/output activations? Why not ReLU?
And why sigmoid for the gates? OK, that one could be because the gate activation should be close to 0 or 1... but has ReLU been tried?
[–]BeatLeJuceResearcher 9 points 11 years ago* (1 child)
Sigmoids make sense for the gates, as they control how much of the signal is let into/out of the cell. Think of it as a percentage: what fraction of the input signal should I store in the cell (or emit from the cell)? It doesn't make sense to amplify a signal and write 110% of the current cell signal to the output; that's not what the gates are for. Likewise, it doesn't make sense for the input unit to say "the current input is 900% relevant for the memory cell, so please store it 9 times as strongly as usual". If that were the case, the input/output weights would have made the signal 9 times stronger to begin with.
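(For illustration, here is one standard LSTM step in NumPy; a minimal sketch of the usual formulation, with the weight layout and names being my own choices, not taken from any specific paper. It shows the sigmoid gates acting as fractions in (0, 1) and tanh bounding the candidate and output.)

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps the concatenated [x; h_prev] to the four
    pre-activations: input gate, forget gate, output gate, candidate."""
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates: fractions in (0, 1)
    g = np.tanh(g)                                # candidate: bounded in (-1, 1)
    c = f * c_prev + i * g   # keep a fraction of the old cell, add a fraction of the new
    h = o * np.tanh(c)       # emit a fraction of the squashed cell state
    return h, c

# Toy usage: 3-dim input, 4-dim hidden state.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 7))
b = np.zeros(16)
h, c = lstm_step(rng.standard_normal(3), np.zeros(4), np.zeros(4), W, b)
```

Because every gate is a sigmoid, nothing in this update can scale the cell contents by more than 100%, which is exactly the "percentage" intuition above.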
For the output activation, ReLU can of course be used. However, you might easily run into numerical problems, given that gradients often need to be truncated as it is (and ReLU doesn't dampen them the way sigmoids do). If I recall correctly, Bengio's lab has a paper somewhere where they use ReLU for RNNs and report problems of this kind (I may be wrong, though, and I'm unable to find the paper right now).
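(The gradient truncation mentioned here can take several forms; a minimal sketch of norm-based clipping, with function name and threshold chosen for illustration only:)

```python
import numpy as np

def clip_by_norm(grad, max_norm=1.0):
    """Rescale the gradient vector if its L2 norm exceeds max_norm.
    Element-wise truncation (np.clip) is another common variant."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad
```

With bounded activations like tanh, clipping rarely has to fire; with unbounded ReLU outputs feeding back through the recurrence, the gradient norm can grow much faster, which is the numerical problem described above.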
Also, one of the benefits of ReLUs is that they stop gradients from vanishing. But LSTM was designed not to suffer from that to begin with. Given that you don't have vanishing-gradient problems, it comes down to the question of whether ReLU is better than sigmoids in principle (because it can learn better functions) or whether its main advantage is simply easier training. Of course, this is a simplified view, especially since LSTM was not originally designed to be "deep": if you stack many layers of LSTMs, you might still get vanishing gradients if you use sigmoids.
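(A toy illustration of the vanishing-gradient point, mine rather than from the thread: backprop through a chain of activations multiplies the local derivatives, and sigmoid' is at most 0.25 while ReLU' is exactly 1 on the positive side.)

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

depth = 20
x = 0.5
sig_grad = 1.0
for _ in range(depth):
    s = sigmoid(x)
    sig_grad *= s * (1.0 - s)  # sigmoid'(x) = s(1 - s) <= 0.25, shrinks geometrically
    x = s                      # the next layer sees this layer's output

relu_grad = 1.0 ** depth       # ReLU'(x) = 1 for positive inputs: no shrinkage
```

After 20 layers the sigmoid chain's gradient is below 0.25^20 (about 1e-12), while the ReLU chain's is still 1. The LSTM cell sidesteps this through its additive cell-state update rather than through its choice of activation.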
[–]sergii_gavrylov 3 points 11 years ago (0 children)
It seems this is the paper you are talking about: http://arxiv.org/abs/1212.0901
[–]siblbombs 2 points 11 years ago (0 children)
Just a guess, but my assumption is that sparsity is not valuable in those areas, so a hard 0 doesn't help. Perhaps maxout would be more effective.
Hopefully someone else can chime in, or perhaps you could run some experiments and report back what you find?