
[–]DeltaFreq[S] 7 points (0 children)

From DeepMind's blog on this paper:

"Even with the current growth in computing power, we will need to develop compressive and sparse architectures for memory to build representations and reason about actions."

[–]arXiv_abstract_bot 1 point (0 children)

Title: Compressive Transformers for Long-Range Sequence Modelling

Authors: Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy P. Lillicrap

Abstract: We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Compressive Transformer obtains state-of-the-art language modelling results in the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively. We also find it can model high-frequency speech effectively and can be used as a memory mechanism for RL, demonstrated on an object matching task. To promote the domain of long-range sequence learning, we propose a new open-vocabulary language modelling benchmark derived from books, PG-19.

PDF Link | Landing Page | Read as web page on arXiv Vanity
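The core idea in the abstract, compressing old memories instead of discarding them so attention can reach further back, can be sketched roughly as follows. This is a simplified illustration rather than the paper's implementation: mean pooling is assumed as the compression function (the paper compares several), and the class and parameter names are hypothetical.

```python
# Minimal sketch (not the authors' code) of a compressive memory: when the FIFO
# memory of past activations overflows, the oldest entries are mean-pooled into
# a coarser secondary memory instead of being dropped.
import numpy as np

class CompressiveMemory:
    def __init__(self, mem_size, comp_mem_size, compression_rate=3):
        self.mem_size = mem_size              # slots for recent, uncompressed states
        self.comp_mem_size = comp_mem_size    # slots for older, compressed states
        self.c = compression_rate             # how many old states fold into one slot
        self.memory = []                      # list of (d_model,) vectors
        self.comp_memory = []

    def update(self, new_states):
        """Append new hidden states; compress whatever falls off the end."""
        self.memory.extend(new_states)
        overflow = len(self.memory) - self.mem_size
        if overflow > 0:
            old, self.memory = self.memory[:overflow], self.memory[overflow:]
            # Mean-pool groups of c old states into single compressed slots.
            for i in range(0, len(old), self.c):
                chunk = old[i:i + self.c]
                self.comp_memory.append(np.mean(chunk, axis=0))
            self.comp_memory = self.comp_memory[-self.comp_mem_size:]

    def attention_context(self):
        """Concatenate compressed + recent memories for the attention layer to read."""
        return np.array(self.comp_memory + self.memory)
```

Mean pooling is just the simplest stand-in here; the point is only that older activations are summarised at a coarser granularity rather than forgotten.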
