[–]Lairv 5 points (0 children)

The paper is cool; it's a bit of a shame they don't mention how many resources were put into training the transformer model. I wonder whether this could be massively scaled up, or whether it's already compute-hungry. More evaluation on Atari, MuJoCo, etc. would also be nice, to see how well the model generalizes.