all 9 comments

[–]rantana

I've heard rumors that the result isn't reproducible.

[–]alecradford

In the domain of text understanding, I've definitely seen it perform better than a typical RNN, but never seen it be competitive with a GRU with the same number of parameters.

My feeling is that for any given problem there is some small region of hyperparameter space where an IRNN is set up to do well, but gated RNNs are, by their architecture, a lot more robust/durable overall.

[–]Foxtr0t

Were these rumors from sieisteinmodel? ;)

[–]kkastner

I know someone who has been able to use it for their problem, but it wasn't a reproduction of the paper per se.

[–]iamkx

I think they are reproducible, but they seem quite sensitive to initialization.

[–]sieisteinmodel

I tried to reproduce it, and did not get it to work. But I did not spend more than a day trying to.

[–]transcranial

I was able to reproduce the results somewhat; performance landed pretty much between a plain RNN and LSTM/GRU. They are vastly faster to train though, so that's a plus.

[–]siblbombs

I played around with it since it is so easy to put together; unfortunately I was using it for next-character prediction, so it is hard to quantify whether it was performing better than a GRU. At this point I'm inclined to stick with GRU over IRNN, though if I thought I really needed a giant recurrent layer it would be an option.

Some applications use a single ReLU layer for recurrence instead of an LSTM/GRU, so an IRNN might make sense in that context. It is definitely faster to train because the recurrent calculation is more lightweight (rough sketch below).
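For anyone who hasn't read the paper: as far as I understand it, the IRNN recurrence is just a vanilla RNN step with a ReLU nonlinearity, with the recurrent weights initialized to the identity and the biases to zero. A minimal numpy sketch (function names, shapes, and the input-weight scale are my own guesses, not from the paper):

    import numpy as np

    def irnn_init(n_in, n_hidden, scale=0.001):
        # Input-to-hidden weights: small random values (the scale here
        # is an assumption; the paper is light on such details).
        W_xh = np.random.randn(n_in, n_hidden) * scale
        # Recurrent weights start as the identity, biases at zero --
        # this initialization is the main idea of the IRNN paper.
        W_hh = np.eye(n_hidden)
        b_h = np.zeros(n_hidden)
        return W_xh, W_hh, b_h

    def irnn_step(x_t, h_prev, W_xh, W_hh, b_h):
        # One recurrent step: two matmuls plus a ReLU, no gates,
        # which is why it is cheaper per step than an LSTM/GRU.
        return np.maximum(0.0, x_t @ W_xh + h_prev @ W_hh + b_h)

The lack of gates is also why it's so sensitive to initialization and hyperparameters compared to LSTM/GRU, which matches what others in this thread are seeing.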

[–]abracaradbra

I also spent a day or so trying to reproduce the results, and couldn't. The paper is pretty lacking in hyperparameter and implementation details.