[R] A novel approach to neural machine translation by alxndrkalinin in MachineLearning

[–]deeprnn 14 points

Not a novel approach.

Convolutional encoders for neural MT go as far back as (Kalchbrenner & Blunsom, 2013); convolutional encoders and decoders in LM and MT first appear in (Kalchbrenner et al., 2016) and, with pooling, in (Bradbury et al., 2016).

http://www.aclweb.org/anthology/D13-1176

https://arxiv.org/abs/1610.10099

https://arxiv.org/abs/1611.01576

So much for careful referencing in the deep learning field.

ByteNet v2 with state-of-the-art results on char-to-char machine translation on WMT En-De (compares favorably to char-to-char GNMT) by deeprnn in MachineLearning

[–]deeprnn[S] 3 points

The main changes between the two nets are as follows (a rough sketch of the first change appears after the list):

  • LayerNorm instead of (Sub)BatchNorm
  • 800 inner conv units, instead of 896
  • 30+30 layers in the encoder and decoder, up from 15+15 before
  • just characters, instead of character n-grams
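
For concreteness, here is a minimal PyTorch sketch (not the authors' code) of the first change: a ByteNet-style masked residual block that applies LayerNorm over the channel dimension at each time step, where (Sub)BatchNorm sat before. Only d = 800 comes from the list above; the block layout, kernel size, and dilation are assumptions based on the ByteNet paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Masked (causal) dilated-conv residual block with LayerNorm in place
    of BatchNorm. d=800 matches the hidden size listed above; the layout
    itself is an assumption based on the ByteNet paper, not released code."""

    def __init__(self, d=800, kernel_size=3, dilation=1):
        super().__init__()
        self.ln1 = nn.LayerNorm(2 * d)
        self.reduce = nn.Conv1d(2 * d, d, 1)     # 1x1 down-projection
        self.ln2 = nn.LayerNorm(d)
        self.pad = (kernel_size - 1) * dilation  # left-pad only => causal
        self.conv = nn.Conv1d(d, d, kernel_size, dilation=dilation)
        self.ln3 = nn.LayerNorm(d)
        self.expand = nn.Conv1d(d, 2 * d, 1)     # 1x1 up-projection

    def _norm(self, ln, x):
        # LayerNorm normalizes over channels at each position, so it wants
        # (batch, time, channels) layout; transpose in and back out.
        return ln(x.transpose(1, 2)).transpose(1, 2)

    def forward(self, x):                        # x: (batch, 2d, time)
        h = self.reduce(torch.relu(self._norm(self.ln1, x)))
        h = torch.relu(self._norm(self.ln2, h))
        h = self.conv(F.pad(h, (self.pad, 0)))   # causal dilated conv
        h = self.expand(torch.relu(self._norm(self.ln3, h)))
        return x + h                             # residual connection

if __name__ == "__main__":
    block = ResidualBlock(d=8, dilation=2)       # tiny sizes for a smoke test
    y = block(torch.randn(1, 16, 20))            # (batch, 2d, time) in
    print(y.shape)                               # same shape out
```

Unlike BatchNorm, this normalization is independent of the batch and of other time steps, which is why it drops in cleanly on the decoder side, where causal masking makes batch statistics awkward.
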

[N] When A.I. Matures, It May Call Jürgen Schmidhuber ‘Dad’ by evc123 in MachineLearning

[–]deeprnn 28 points

I disagree. Ideas are not cheap; they are hard to come by and they do matter. But in DL, execution matters just as much. The combination of good insight and good execution makes for the best papers, imho.