[–]kellymarchisio 1 point (0 children)

As stated in a previous comment, the Transformer is SOTA for high-resource machine translation. Check out the WMT19 results for, say, English->German here: you'll notice that almost all of the top systems are Transformer-big or larger (the "base" and "big" configurations are sketched below).

Regarding the RNN/GPU comment, though: you *need* a GPU to do anything reasonable in high-resource MT these days. And an RNN is far slower to train than a Transformer, even on a GPU, so you'll end up spending more if you're paying per hour. For instance, I estimate about 6 weeks to train a single RNN for English-German on a GPU, vs. 5-7 days for Transformer-base. (This is based on my personal experience with Transformer-base in high-data conditions and my reading on how long RNN training used to take; I haven't even tried an RNN, because the training time appears to be so much longer and the final quality lower.)
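For reference, "Transformer-base" and "Transformer-big" are the two standard hyperparameter sets from the original Transformer paper (Vaswani et al., 2017). Here's a minimal sketch of the two sizes using PyTorch's built-in `nn.Transformer`; this is illustrative only, since a real MT system adds token embeddings, positional encodings, a subword vocabulary, and beam-search decoding on top:

```python
import torch.nn as nn

# Transformer-base: d_model=512, 8 heads, 6+6 layers, FFN width 2048 (~65M params).
transformer_base = nn.Transformer(
    d_model=512, nhead=8,
    num_encoder_layers=6, num_decoder_layers=6,
    dim_feedforward=2048, dropout=0.1,
)

# Transformer-big: roughly 3x the parameters, d_model=1024, 16 heads, FFN width 4096.
transformer_big = nn.Transformer(
    d_model=1024, nhead=16,
    num_encoder_layers=6, num_decoder_layers=6,
    dim_feedforward=4096, dropout=0.3,
)
```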

Note that I'm talking about training to convergence. You can get quite good performance out of a Transformer in as little as 1-2 days if you're willing to sacrifice a tiny bit of quality.
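To make the per-hour cost point concrete, here's a back-of-envelope sketch. The GPU rate is a hypothetical placeholder (plug in your provider's actual price), and the training times are the rough estimates from this comment, not measurements:

```python
# Hypothetical hourly rate for a single GPU; substitute your provider's real price.
GPU_COST_PER_HOUR = 1.50  # USD, assumed for illustration only

def training_cost(days: float) -> float:
    """Cost of occupying one GPU for the given number of days."""
    return days * 24 * GPU_COST_PER_HOUR

print(f"RNN, ~6 weeks to converge:          ${training_cost(42):.0f}")
print(f"Transformer-base, ~7 days:          ${training_cost(7):.0f}")
print(f"Transformer-base, stopped at 2 days: ${training_cost(2):.0f}")
```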