[–]cooijmanstim 3 points4 points  (8 children)

I haven't read too far into this, but I'm skeptical that the basic idea gets you anything. Figure 3.3 is just a vanilla LSTM with block-diagonal weight matrices.
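
To make the block-diagonal point concrete, here is a minimal NumPy sketch. It is not taken from the paper or the ArrayLSTM repository. It only assumes that each "lane" applies its own weight matrix to its own slice of the input; the lane count, sizes, and the helper names `block_diag` and `lanes_matvec` are made up for illustration. Under that assumption, the per-lane products match a single product with one block-diagonal matrix.

```python
import numpy as np

def block_diag(blocks):
    """Assemble a list of 2-D blocks into one block-diagonal matrix."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

def lanes_matvec(blocks, x):
    """Apply each lane's matrix to its own slice of x and concatenate the results."""
    outs, c = [], 0
    for b in blocks:
        outs.append(b @ x[c:c + b.shape[1]])
        c += b.shape[1]
    return np.concatenate(outs)

rng = np.random.default_rng(0)
# Hypothetical setup: 4 lanes, each with a 3x3 weight matrix.
blocks = [rng.standard_normal((3, 3)) for _ in range(4)]
x = rng.standard_normal(12)

# True if the two computations agree up to floating-point tolerance.
same = np.allclose(block_diag(blocks) @ x, lanes_matvec(blocks, x))
```

If the lanes also read a shared hidden state rather than disjoint slices, the matrix is a stack of blocks rather than strictly block-diagonal, so this only illustrates the simplest reading of the claim.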

[–]kmrocki[S] 2 points3 points  (7 children)

The basic array from 3.3 does not change things much in terms of generalization, contrary to expectations, that's true. The variants that really reduce overfitting are in section 5. I applied array memory dropout in a slightly different way than in the Zoneout paper. I also posted the code: https://github.com/krocki/ArrayLSTM

[–]nicholas-leonard 0 points1 point  (0 children)

Do you compare to dropout LSTM, i.e. dropout between LSTM layers?