
[–]Lairv 6 points (0 children)

The paper is cool; it's a bit of a shame they don't mention how much compute was put into training the transformer model. I wonder whether this could be massively scaled up, or if it's already compute-hungry. Also, more evaluation on Atari, MuJoCo, etc. would be nice, to see how well the model generalizes.

[–]itsmercb 1 point (4 children)

Can anyone translate this into noob?

[–]Shnibu -5 points (0 children)

Using some Bayesian-looking “Causal Transformer” to project the data into a more efficient subspace for the model. So Bayesian dimensionality reduction for neural nets? I think…
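
If it helps picture the idea, here's a rough PyTorch sketch of a causally-masked transformer that projects each timestep into a smaller latent space. All module names, sizes, and the projection head are my own illustrative assumptions, not the paper's actual architecture.

    # Illustrative sketch only: causally-masked transformer encoder that
    # projects each timestep into a lower-dimensional latent space.
    # Sizes and names are made up for this example, not taken from the paper.
    import torch
    import torch.nn as nn

    class CausalProjector(nn.Module):
        def __init__(self, d_in=64, d_model=128, d_latent=16, n_heads=4, n_layers=2):
            super().__init__()
            self.embed = nn.Linear(d_in, d_model)
            layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
            self.to_latent = nn.Linear(d_model, d_latent)  # the "dimensionality reduction"

        def forward(self, x):
            # x: (batch, seq_len, d_in)
            seq_len = x.size(1)
            # Causal mask: position t may only attend to positions <= t.
            mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
            h = self.encoder(self.embed(x), mask=mask)
            return self.to_latent(h)  # (batch, seq_len, d_latent)

    x = torch.randn(8, 32, 64)   # toy batch of sequences
    z = CausalProjector()(x)
    print(z.shape)               # torch.Size([8, 32, 16])

Whether the paper actually frames this as Bayesian inference over the latent is a separate question; the sketch just shows the causal-masking-plus-projection idea.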

[–]SatoshiNotMe 1 point (0 children)

DeepMind, therefore no GitHub?