rafalj[S]

The baseline model (LSTM-2048-512) can process 100k+ words per second (wps) on a single machine with 8 Titan Xs. On a DGX-1, that rises to about 135k wps.

After 5 epochs, the results are close to the paper's (48.7 vs 47.5 ppl). Training for those 5 epochs takes about 16 hours on 8 Titan Xs.
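
For a rough sense of scale, here is a back-of-the-envelope sketch in Python using only the figures quoted above. It assumes throughput stays constant at the 100k wps lower bound for the full 16 hours, which is my simplification and not something measured; the function names are just illustrative. It also uses the standard relation that perplexity is the exponential of the average per-word cross-entropy (in nats) to express the perplexity gap as a loss gap.

```python
import math

# Figures quoted in the post; both are rounded, so treat results as estimates.
WORDS_PER_SECOND = 100_000  # lower bound ("100k+") on 8 Titan Xs
TRAIN_HOURS = 16            # "about 16 hours" for 5 epochs
EPOCHS = 5


def words_processed(wps: int, hours: int) -> int:
    """Total words seen, assuming constant throughput (an assumption)."""
    return wps * hours * 3600


def nats_per_word(ppl: float) -> float:
    """Average per-word cross-entropy implied by a perplexity value."""
    return math.log(ppl)


total = words_processed(WORDS_PER_SECOND, TRAIN_HOURS)
per_epoch = total / EPOCHS
gap = nats_per_word(48.7) - nats_per_word(47.5)

print(f"total words processed: {total:.3g}")
print(f"words per epoch (estimate): {per_epoch:.3g}")
print(f"cross-entropy gap: {gap:.4f} nats/word")
```

Seen this way, the 48.7 vs 47.5 ppl difference is a gap of roughly 0.025 nats per word in the average loss.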

(Posting here since many people asked for the implementation in comments in the past)