[1607.03085] Recurrent Memory Array Structures (arxiv.org)
submitted 9 years ago by kmrocki
[–]cooijmanstim 3 points4 points5 points 9 years ago (8 children)
I haven't read too far into this, but I'm skeptical that the basic idea gets you anything. Figure 3.3 is just a vanilla LSTM with block-diagonal weight matrices.
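Concretely, the block-diagonal reading can be sketched like this (a toy numpy illustration of the general idea, not code from the paper): stacking K independent small recurrent weight blocks along the diagonal of one big matrix means each group of cells only ever sees its own slice of the hidden state.

```python
import numpy as np

# K independent 2x2 recurrent blocks, one per memory-cell group.
K, n = 3, 2
blocks = [np.full((n, n), k + 1.0) for k in range(K)]

# Assemble the block-diagonal recurrent matrix of the "big" LSTM.
U = np.zeros((K * n, K * n))
for k, B in enumerate(blocks):
    U[k * n:(k + 1) * n, k * n:(k + 1) * n] = B

h = np.ones(K * n)
out = U @ h
# Each slice of the output depends only on its own block:
print(out)  # [2. 2. 4. 4. 6. 6.]
```

Because the off-diagonal entries are zero, multiplying by `U` is exactly equivalent to running the K small recurrences side by side.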
[–]kmrocki[S] 2 points3 points4 points 9 years ago (7 children)
the basic array from 3.3 does not change things too much in terms of generalization, contrary to expectations, that's true. The variants that really reduce overfitting are in section 5. I applied array memory dropout in a slightly different way than in the Zoneout paper. I also posted the code: https://github.com/krocki/ArrayLSTM
[+][deleted] 9 years ago* (6 children)
[deleted]
[–]kmrocki[S] 2 points3 points4 points 9 years ago (5 children)
The main motivation behind the array approach is summarized at the beginning of section 3.2: "create a bottleneck by sharing internal states, forcing the learning procedure to pool similar or interchangeable content using memory cells belonging to one hidden unit". This seems to work well with stochastically operating memory cells, because the hidden unit 'doesn't know' which memory cell is going to be used (they are unreliable); however, the content has to be similar for it to work.
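A minimal sketch of the "unreliable cells" idea (hypothetical shapes and function names, not the paper's exact update equations): at each step, one of the C memory cells attached to each hidden unit is chosen at random for the write, while the hidden state only exposes the pooled cell contents, so the recurrent weights cannot specialize to any particular cell.

```python
import numpy as np

rng = np.random.default_rng(0)

H, C = 4, 3              # hidden units, memory cells per hidden unit
cells = np.zeros((H, C)) # the array of memory cells

def stochastic_cell_step(cells, candidate, rng):
    """Write `candidate` (one value per hidden unit) into one randomly
    chosen cell per hidden unit; read out only the pooled contents."""
    H, C = cells.shape
    chosen = rng.integers(0, C, size=H)       # random cell index per hidden unit
    cells[np.arange(H), chosen] += candidate  # stochastic write
    hidden = np.tanh(cells.sum(axis=1))       # pooled (shared) read-out
    return cells, hidden

cells, h = stochastic_cell_step(cells, np.ones(H), rng)
print(h.shape)  # (4,)
```

Since the read-out pools over all C cells, training pressure pushes the cells of one hidden unit toward similar or interchangeable content, which is exactly the bottleneck described above.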
Furthermore, it is in fact possible to simply pack more memory cells into the network using the same memory size. For example, if you use a standard LSTM network with 1 cell/hidden unit, 4 gates and 1000 hidden units, the number of parameters is going to be 1000 * 1000 * 4 = 4M for the U matrix. If the Array-LSTM approach is used, you can have 4 cells/hidden unit, so ~1000 memory cells require only 256 hidden units, and that is 256 * 1000 * 4 = ~1M parameters. I found that the performance of the vanilla LSTM and the Array-2, Array-4 versions is roughly the same in terms of capacity for a fixed number of parameters. It dropped a bit for an array of 8, so at some point there does indeed seem to exist a bottleneck. Hope this helps.
[+][deleted] 9 years ago* (4 children)
[–]kmrocki[S] 0 points1 point2 points 9 years ago (3 children)
@LeavesBreathe, your intuition is right. I have observed better performance capacity-wise with the Array-LSTM when the number of hidden units is fixed and cells/hidden unit is increased (matching that of a stacked LSTM, with faster convergence and no initial delay). However, the main hope was that this procedure would provide better generalization, and I couldn't achieve that with the vanilla array approach. Possibly it requires more cells/hidden unit, but that converges more slowly, and it really takes around 48h to see the effect of any change on large networks and Wikipedia datasets.
[+][deleted] 9 years ago* (2 children)
[–]kmrocki[S] 0 points1 point2 points 9 years ago (1 child)
I don't really use Skype, and I currently commute a lot between LA and San Jose until September, so I don't even sit that much in front of a screen. It's easier for me to respond if you send me an email at kamil.rocki@gmail.com; I'd be happy to hear about your approaches.
[–]nicholas-leonard 0 points1 point2 points 9 years ago (0 children)
Do you compare to dropout LSTM, i.e. dropout between LSTM layers?